(If you're here for the bonus question, I'll give you some pointers: this was generated on my own gaming computer with an RTX 4060 with 16GB VRAM, and the original size at which it was generated is 1024x1024.)

Let's be honest, AI is tough to get into for the average person. There are a lot of technical requirements in both hardware and software, lots of commands to run, and it can potentially break at any step of the process. And that's just being the user; we're not even talking about doing research or training models yourself.

Originally, I got started with AI image gen because I wanted to try something with deepseek. I gave it the following:

  • I am on Windows. My hardware is the aforementioned RTX, my GBs of RAM, etc. [giving it context so it doesn't make stuff up to fill in the gaps]
  • I might already have some dependencies installed, I'm not sure, so make sure to check for them [avoiding useless steps that would just waste time; see the sketch after this list]
  • I want to start image generation on my computer but I have no idea where to start. What do I install? Write a guide that:
    • Uses the command-line interface as much as possible [just faster to work with, since you can copy and paste]
    • Orders steps from least effort to most effort [so the dependency check and the 'easy' stuff get out of the way first]
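
For illustration, here's a minimal sketch of what that dependency check can boil down to (my own example here, not deepseek's actual output): a few lines of Python that report whether the basics are already on the system.

```python
# Minimal dependency check: report the Python version and whether
# pip and git are already on PATH before installing anything.
import shutil
import sys

print("python:", sys.version.split()[0])
for tool in ("pip", "git"):
    path = shutil.which(tool)
    print(f"{tool}: {path or 'NOT FOUND, install this first'}")
```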

And it worked. Any time I ran into an error in the cmd output, I gave the command and the entire log to deepseek, told it which step I was on, and it told me what to run, which would usually fix it. If it didn't, I sent it the log again until it did. It can suggest somewhat outdated stuff since its knowledge cutoff is August 2024, so you can make it look online for the latest info. Once I got ComfyUI working, for example, I pasted the startup log, which had a warning about an old Torch version. By making deepseek look online, we found that for my ComfyUI version I could use a higher one. With web search you don't spend hours opening dozens of tabs just to find which PyTorch CUDA version is compatible with your interface and then installing it; the AI finds it for you.
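
If you want to see the same information deepseek pulled out of my startup log, a quick check like this (a sketch I'm adding for reference, not part of the original guide) prints the installed PyTorch build and the CUDA runtime it was compiled against:

```python
# Print the installed PyTorch build and its CUDA runtime, the two
# version numbers you need when checking compatibility.
import torch

print("torch version:", torch.__version__)        # e.g. 2.1.0+cu121
print("CUDA available:", torch.cuda.is_available())
print("CUDA runtime:", torch.version.cuda)        # None on CPU-only builds
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```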

All of it works, and it took less than an hour.

I have an interface for image gen that runs in a virtual Python environment so that all dependencies are contained, I have Docker Desktop to run my LLMs with, everything works great, and as far as I'm aware I'm on the latest versions of everything.
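
If "virtual environment" is new to you: it's a per-project sandbox for Python packages, so an image-gen install can't clobber anything else on your system. A tiny sketch (mine, for illustration) using only the standard library:

```python
# Create a self-contained environment in ./sd-env with its own pip;
# the CLI equivalent is `python -m venv sd-env`.
import venv

venv.create("sd-env", with_pip=True)
# On Windows, activate it with: sd-env\Scripts\activate
# Packages installed afterwards stay inside sd-env.
```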

-> It's not just for AI; LLMs are great with anything that has a lot of content to train on, and tech is up there. You could use this same prompt for Linux, for example, and get more people away from Windows. For anything a bit techy that you need to troubleshoot or tinker with, AIs like deepseek can provide a guide and you just have to follow it.

When it came to installing image gen, it was even able to suggest some models to download (though they were a bit outdated) and help me troubleshoot. I didn't even know what a VAE was. It told me where to put the files I downloaded and what I needed to get started.

Then from there you learn by yourself. I downloaded better models and some LoRAs, and now I'm installing ComfyUI alongside AUTOMATIC1111's web UI (with the same instructions to deepseek) so I can try Flux, which gives amazing results. I know more about how AI works behind the scenes now that I've done this, helped along by deepseek. It probably helps that I already had Python and knew what it is, as well as what pip is and the other stuff needed to run commands. But if you have any questions, open a second deepseek window, send it the command, and ask "what's pip? what does this do? what's a virtual environment in Python?"

Again, the point isn't so much about image gen as it is about getting through the slog and working on stuff. You can use this to troubleshoot things like Photoshop, your VPS, your code, and who knows what else. It's easier with open source because the code and documentation are just there for the AI (with web lookup it can even read the code directly if you send it the page), which I hope and feel will strengthen open source over proprietary software. The cURL project has already used AI to fix 100 bugs, including one they didn't even know was there.

Oh, and I'll update with the answer to the bonus question in a few hours haha.

[–] Commiejones@lemmygrad.ml 7 points 1 week ago (1 children)

I love how this is getting downvoted.

In the early '00s in high school, I submitted a digital painting in art class and my teacher rejected it: "this isn't art, anyone can do this." I heard the same sort of thing about electronic music.

Even if you don't want to use AI to generate art, it is still a really powerful tool. I know nothing about programming. I bullied Deepseek into building a program for me. My program breaks now and then because of YouTube changing shit, but I just go bully deepseek into fixing it and it works again.

I suppose I should just say that I made the program but I still don't feel right saying that.

[–] CriticalResist8@lemmygrad.ml 6 points 1 week ago

Honestly, the more I tinker with AI image gen the more I get it. It's fun, but it's a different kind of fun from doing it by hand. This is more tinkering and gambling; it definitely has a "one more turn" feel to it, seeing what the tool can come up with.

And the more I toy with it, the more all barriers between whether this is art or not break down, imo. There's a lot of work that can go into AI gen, the same way you can pick up a pen and doodle in the margins of your notebook, or you can learn ink techniques and make manga. You have the models (literally hundreds of them, plus merged models that combine two or more into one), but also LoRAs, which skew the result in a certain way; you have to select the sampler (the algorithm that removes the noise at each step) and the schedule (how much noise is removed at each step), etc.

Did I not make it, because the output comes from a computer? No, I did, and I totally get why now that I've dabbled, even if this will ruffle some feathers. Does the producer not make the beat because he "only" sets up synthesizers other people made in Ableton and then plays chords other people discovered before him?

It's not just typing in the prompt either, because the words you type in both the positive and negative prompt have an influence, and part of it is figuring out those words. You can also give them different strengths, and you have to phrase things the way the model expects to receive information. It's like a puzzle, though like I said there's randomness to it. A picture you feel looks cool (though I totally get it if other people don't share the enthusiasm looking at it lol) could have taken a few hours to get just right. There's something personal to it, because it's my combination of prompts and parameters that spat this out.
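
As a sketch of what "different strengths" looks like in the A1111-style web UI's prompt syntax (the terms and numbers below are made up for illustration): `(term:1.3)` boosts a term's weight and `[term]` tones it down.

```python
# Example prompt strings using A1111-style attention syntax;
# the web UI parses the parentheses/brackets, not Python itself.
positive = "castle on a cliff, (dramatic lighting:1.3), [fog], highly detailed"
negative = "(blurry:1.4), lowres, watermark"
```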

The castle in the OP picture was made with a certain sampler. Here are two more:

The only difference between those three pictures was the sampler algorithm; everything else (the seed, the steps, the prompt, the resolution) was exactly the same. This is just one way models can give wildly different results.
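
For anyone curious how that experiment looks outside a web UI, here's a sketch using Hugging Face's diffusers library (my own illustration, not the setup from the post): the seed, prompt, and step count are pinned, and only the scheduler object changes between runs.

```python
# Same seed, same prompt, same steps; only the sampler (scheduler) changes.
import torch
from diffusers import (DPMSolverMultistepScheduler, EulerDiscreteScheduler,
                       StableDiffusionXLPipeline)

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = "a castle on a cliff at sunset"
for name, sched in (("euler", EulerDiscreteScheduler),
                    ("dpmpp", DPMSolverMultistepScheduler)):
    pipe.scheduler = sched.from_config(pipe.scheduler.config)
    gen = torch.Generator("cuda").manual_seed(42)   # pinned seed
    image = pipe(prompt, num_inference_steps=30, generator=gen).images[0]
    image.save(f"castle_{name}.png")
```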

There's a lot that goes into it, and look, I'm from a graphic design background; non-designers who made their own logo or used clipart they found on Google are nothing new. If something's good, it's good. You know it when you see it. Some artists will use AI as part of a process and then bring the result into Photoshop or other software for the finishing touches. Others will just take whatever the AI spits out and call it good enough. Corridor Crew used genAI available in a smartphone app to green-screen themselves out without an actual green screen, and even to change their appearance on camera in different ways, all from raw video footage and a prompt. They made a short film with it using the rest of their VFX skills; someone who knows what they're doing will use this as part of their process.

And beyond art and image gen there's the very undeniable tech aspect, as you found out as well. If you want to get something specific out of YouTube but there's no existing solution, just make your own with deepseek. If you want open source software but it's not working on your machine, deepseek it instead of going back to proprietary.

[–] Horse@lemmygrad.ml 4 points 1 week ago* (last edited 1 week ago) (1 children)

this was generated on my own gaming computer with an RTX 4060 16GB Vram, and the original size at which it was generated is 1024x1024

From experience, probably ~30 seconds to a minute depending on batch size, though I usually cap out at 768x768 because I only ever generate images for art reference and TTRPG tokens.

[–] CriticalResist8@lemmygrad.ml 4 points 1 week ago (1 children)

Lol you're the closest, and not that far off. Most people said 3-9 minutes. The answer is actually:

13 seconds

That's with proper GPU offloading (deepseek told me which launch arguments to use) and the right PyTorch build, and of course the GPU architecture probably plays a part in it too. The model is actually meant to make 1024x1024, so that's what I generate at.
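
I won't pretend these are the exact launch arguments I used, but for a flavor of what "GPU offloading" means, here's the diffusers version of the same idea (an illustrative sketch, not my actual setup):

```python
# Trade a little speed for VRAM: keep submodels in system RAM and move
# them to the GPU only while they're actually running.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()   # offload idle submodels to system RAM
pipe.enable_attention_slicing()   # compute attention in slices, less VRAM
image = pipe("a castle on a cliff", num_inference_steps=25).images[0]
image.save("castle.png")
```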

That particular picture was me trying different samplers to see the differences. The model's own limitations create a lot of artefacts, but that's why I wanted to try Flux, and... it's scary:

This generates in around 40-60 seconds. I have TeaCache, which cuts generation time in half, but sometimes it takes a while for the model to parse the prompt with the TeaCache workflow; that's a bottleneck and I'm not sure why it happens. Flux also handles text, but I'm not quite sure how to get that working.

Anyone who's scared of this tech... get acquainted with it intimately. It's not going away; this runs on the same computer that generated the picture in the OP, and it's completely free.

[–] ksynwa@lemmygrad.ml 1 points 1 week ago (1 children)

Last time I tried it, it took 3-5 minutes on my "gaming" laptop depending on the number of iterations.

[–] CriticalResist8@lemmygrad.ml 1 points 1 week ago (1 children)

Are you using the CPU by chance? On some interfaces you have to specifically tell it to use the GPU.

I never got to make my point about the time either, so I'll just make it here lol: it's incredible how efficient image gen has become; it's actually faster than an LLM generating a full response. You can, today, host LLMs and image gen on your own server and make the interface available to employees/friends/whoever. Businesses basically don't have to rely on the huge OpenAI datacenters; they can host these models on their own existing infrastructure. It's better for the environment, and since AI is clearly not going away, open source AI is what we should have more of.
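
To make that concrete, here's what talking to a self-hosted model can look like from the client side. This assumes an Ollama-style server on its default port; the model tag is a placeholder, since I haven't said which runner or model is in use.

```python
# Query a locally hosted LLM over HTTP (Ollama-style API, default port).
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "deepseek-r1:8b",   # placeholder model tag
        "prompt": "What is a Python virtual environment?",
        "stream": False,
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```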

[–] ksynwa@lemmygrad.ml 2 points 1 week ago

No, it's just that my GPU isn't great. (Though I love it because it has served me well so far.) It only has 4GB of VRAM, which is awful for any generative stuff. Then there's a quirk related to floating-point precision where the objectively better setting doesn't work on my GPU, so bad becomes much worse.

[–] PoY@lemmygrad.ml 3 points 1 week ago (1 children)

While it can certainly help you do things, it can also lead you down rabbit holes that can really bork your system if you blindly run whatever it gives you. It's a pretty mixed bag, but if you're specific enough it generally won't screw you over.

[–] CriticalResist8@lemmygrad.ml 4 points 1 week ago

Exercise caution, and don't hesitate to look up the commands (both on Google and in another chat window) before executing them.

Since this is a virtual environment, I'm pretty safe. On VPSes I make sure to double-read every command before running anything and do diagnostics first, like with anything I copy from the internet. LLMs are very bad at asking questions before jumping to an answer, so give it as much context as you can up front, including your OS.

I also try to make sure I can undo whatever it asks me to do if need be, which is actually easier when you have a single window with every instruction in it rather than a dozen tabs, some of which I haven't opened in over 45 minutes while troubleshooting a minor problem inside a bigger problem lol.

[–] m532@lemmygrad.ml 3 points 1 week ago (1 children)

A picture of this size would take my CPU (an 8400F) ~3-4 minutes, depending on the diffusion model.

My guess is 12 seconds for your gpu.

[–] CriticalResist8@lemmygrad.ml 5 points 1 week ago

I'm not going to spoil the answer yet but I just wrote it on another comment reply in this thread 🤐