(If you're here for the bonus question, I'll give you some pointers: this was generated on my own gaming computer with an RTX 4060 with 16GB of VRAM, and the original size at which it was generated is 1024x1024.)
Let's be honest, AI is tough to get into for the average person. There are a lot of technical requirements in both hardware and software, lots of commands to run, and it can potentially break at any step of the process. And that's just being a user; we're not even talking about doing research or development.
Originally, when I got started with AI image gen, it was because I wanted to try something with deepseek. I gave it the following:
- I am on Windows. My hardware is the aforementioned RTX, how much RAM I have, etc. [letting it know the context so it doesn't make stuff up to fill in the gaps]
- I might already have some dependencies installed, I'm not sure. So make sure to check for them [avoiding useless steps that would just waste time]
- I want to start image generation on my computer but I have no idea where to start. What do I install? Write a guide that:
- Uses the command-line interface as much as possible [just faster to work with this, you can copy and paste]
- Orders steps from least effort to most effort [makes sure dependency check and the 'easy' stuff gets out of the way first]
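The "check for dependencies first" step from that prompt can be sketched like this (my own minimal sketch, not deepseek's exact output; the package names are just examples):

```python
import importlib.util

def is_installed(pkg):
    """True if the package can be imported from this environment."""
    return importlib.util.find_spec(pkg) is not None

# Example packages an image-gen setup tends to need; adjust to your guide.
for pkg in ["torch", "transformers", "numpy"]:
    print(pkg, "already installed" if is_installed(pkg) else "missing")
```

Pasting this output back into the chat means the guide can skip anything you already have.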
And it worked. Any time I ran into an error in the cmd output, I gave the command and the entire log to deepseek, told it which step I was on, and it told me what to run, which would usually fix it. If it didn't, I sent the new log until it did. Its knowledge cutoff is August 2024, so it sometimes suggests older versions of things; to get around that, you can make it look online for the latest info. For example, once I got ComfyUI working, I pasted the startup log, which had a warning about an old Torch version. By making deepseek search the web, we found that my ComfyUI version could actually use a newer one. With web search you don't spend hours opening dozens of tabs just to figure out which PyTorch/CUDA version is compatible with your interface and then installing it; the AI finds it for you.
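When reporting something like that Torch warning, the useful thing to paste back is the exact versions you have installed. A quick way to collect them (a generic Python sketch, not deepseek's exact commands):

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(pkg):
    """Version string of an installed package, or None if it's missing."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None

# Paste this output into the chat so it knows exactly what you're running.
print("torch:", installed_version("torch"))
print("pip:", installed_version("pip"))
```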
And it works, and the whole thing took less than an hour.
I have an interface for image gen that runs in a Python virtual environment so that all its dependencies are contained, I have Docker Desktop to run my LLMs, everything works great, and as far as I'm aware I'm on the latest versions of everything.
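For anyone wondering what the "contained dependencies" part looks like, a minimal virtual-environment setup is just a couple of commands (Linux/macOS syntax shown; on Windows the activate script lives under Scripts\ instead of bin/, and the environment name here is just an example):

```shell
# Create an isolated environment so the UI's packages don't touch system Python
python3 -m venv imagegen-env

# Activate it; from now on, everything pip installs stays inside imagegen-env/
. imagegen-env/bin/activate

# Confirm pip now resolves to the one inside the environment
pip --version
```

If the interface ever breaks, you can delete the folder and start over without touching the rest of your system.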
-> It's not just for AI. LLMs are great with anything that has a lot of content to train on, and tech is up there. You could use this same prompt for Linux, for example, and get more people away from Windows. For anything a bit techy that you need to troubleshoot or tinker with, AIs like deepseek can provide a guide and you just have to follow it.
When it came to installing image gen, it was even able to suggest some models to download (though they were a bit outdated) and help me troubleshoot. I didn't even know what a VAE was. It told me where to put the files I downloaded and what I needed to get started.
Then from there you learn by yourself. I downloaded better models and some LoRAs, and now I'm installing ComfyUI alongside Automatic1111's (with the same kind of instructions to deepseek) so I can try Flux, which gives amazing results. I know more about how AI works behind the scenes now that I've done this, with deepseek's help. It probably helped that I already had Python and knew what it is, as well as what pip is and the other stuff needed to run commands. But if you have any questions, open a second deepseek window, send it the command, and ask: "What's pip? What does this do? What's a virtual environment in Python?"
Again, the point isn't so much image gen as it is getting you through the slog and working on stuff. You can use this to troubleshoot Photoshop, your VPS, your code, and who knows what else. It's easier with open source because the code and documentation are just there for the AI (with web lookup it can even read the code directly if you send it the page), which I hope and feel will strengthen open source over proprietary software. The curl project has already used AI to fix 100 bugs, including one they didn't even know was there.
Oh, I'll update with the answer to the bonus question in some hours haha.
Lol you're the closest, and not that far off. Most people said 3-9 minutes. The answer is actually:
13 seconds, with proper GPU offloading (deepseek told me which launch arguments to use). PyTorch and of course the GPU architecture probably play a part in it too. The model is actually meant to make 1024x1024, so that's what I generate at.
That particular picture was me trying different samplers to see the differences. The model's own limitations create a lot of artefacts, but that's why I wanted to try Flux, and... it's scary:
This generates in around 40-60 seconds. I have TeaCache, which cuts generation time in half, but sometimes the model takes a while to parse the prompt with the TeaCache workflow; that's a bottleneck and I'm not sure why it happens. Flux also handles text, but I'm not quite sure how to get that working yet.
Anyone who's scared of this tech... get acquainted with it intimately. It's not going away; this runs on the same computer that generated the picture in the OP and it's completely free.
Last time I tried it, it took 3-5 minutes on my "gaming" laptop, depending on the number of iterations.
Are you using CPU by chance? On some interfaces you have to specifically tell it to use GPU.
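A quick way to check (a generic PyTorch sketch; this assumes your interface runs on PyTorch, which most of them do):

```python
def cuda_status():
    """Report whether PyTorch can actually see a CUDA GPU."""
    try:
        import torch  # if torch isn't installed, the UI can't be using it either
    except ImportError:
        return "torch not installed"
    return "cuda" if torch.cuda.is_available() else "cpu only"

# If this prints "cpu only", the interface is silently falling back to the CPU.
print(cuda_status())
```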
I never got to make my point about the time either, so I'll make it here lol: it's incredible how efficient image gen has become; it's actually faster than an LLM generating a full response. You can, today, host LLMs and image gen on your own servers and make the interface available to employees/friends/whoever. Businesses don't have to rely on the huge OpenAI datacenters; they can host these models on their own existing infrastructure. It's better for the environment, and since AI is clearly not going away, open source AI is what we should have more of.
No, it's just that my GPU isn't great. (Though I love it because it has served me well so far.) It only has 4GB of VRAM, which is awful for anything generative. Then there's a quirk related to floating-point precision where the objectively better setting doesn't work on my GPU, so bad becomes much worse.