
Question is, will SD3 be downloadable? I downloaded and ran the early SD locally and it is really great.

Or did we lose Stable Diffusion to SaaS too, like we did with many of the LLMs that started off so promising as far as self-hosting goes?



Sounds like it’ll be downloadable. FTA:

> In early, unoptimized inference tests on consumer hardware our largest SD3 model with 8B parameters fits into the 24GB VRAM of a RTX 4090 and takes 34 seconds to generate an image of resolution 1024x1024 when using 50 sampling steps. Additionally, there will be multiple variations of Stable Diffusion 3 during the initial release, ranging from 800m to 8B parameter models to further eliminate hardware barriers.
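Back-of-envelope, the quoted numbers are consistent with fp16 weights (a rough sketch; the bytes-per-parameter figure is an assumption, and activations, text encoders, and the VAE need extra headroom):

```python
# Rough VRAM estimate for the 8B SD3 model (assumption: fp16 weights,
# i.e. 2 bytes per parameter; everything else needs additional memory).
params = 8e9
bytes_per_param = 2  # fp16
weights_gb = params * bytes_per_param / 1024**3

print(f"{weights_gb:.1f} GB")  # ~14.9 GB of weights, leaving headroom in 24 GB
```

That leaves roughly 9 GB of a 4090's 24 GB for activations and the rest of the pipeline, which matches the "fits, but unoptimized" framing.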


The 800m model is super exciting


It will probably suck. These models aren’t quite good enough for most tasks (other than toy fun exploration). They’re close in the sense that you can get there with a lot of work and dice rolling. But I would be pessimistic about a smaller model actually getting you where you want.


Yeah, SDXL is probably better than SD3 800M if I had to guess. I’m looking forward to the quality advancements with LCM Loras, or an SD3 Turbo!


It has about as many parameters as SD v1.5, but hopefully with a better architecture, so I think it could end up being better for VRAM-constrained users than SD v1.5.


Not really. Look at SDXL Turbo. Sure, it's fast, but the images it produces are not very good.

I'm not even sure what the use case is.


SDXL turbo is great in my experience! Way better than any alternative at that speed (e.g. sd1.4 or sd2). For img2img it's fantastic. And since the 800m model is using new insights since then and generally a cleaner dataset from the looks of it, I could imagine it's decent for some tasks (or better than sd2 turbo at least, which is enough to be fun and useful in my eyes).


But what's the use case?

I'd rather wait 30 seconds and get a much higher quality image than some mediocre image in 1 second.

Hell, even if it took 5 minutes per image, and produced even better images, I would prefer that.


A lot of use cases are cost-limited. DALL-E 3 makes great images but costs $0.12 per image, so it gets extremely expensive at scale (1k generations is already $120). Cost is billed by GPU time, so the faster you can generate, the cheaper it is. We can get images in under 250 ms now, which is fast enough to fit in a web request.
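The GPU-time arithmetic can be sketched out; the hourly rate below is a made-up example, not a quoted price:

```python
# Cost per image when billing by GPU time (hourly rate is an assumed example).
gpu_cost_per_hour = 2.00    # assumed cloud GPU rate, $/hr
seconds_per_image = 0.25    # ~250 ms per generation, per the comment above
cost_per_image = gpu_cost_per_hour / 3600 * seconds_per_image

dalle3_cost = 0.12          # per-image price cited above
print(f"${cost_per_image:.5f} vs ${dalle3_cost:.2f}")
```

Even with a generous hourly rate, self-hosted fast generation comes out orders of magnitude cheaper per image than a per-call API price.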

Some use cases might be generating profile pictures or banners for users or unique profile pictures for bots in online games. Discord, steam, social media, whatnot, you could just type what you want your profile picture to be and make it on the fly. They're small, aren't expected to be extremely high quality, and cheap enough.

Testing on https://fastsdxl.ai/ - "high quality profile picture of a cartoon cat holding a Bouquet of flowers"

To be clear, it's not perfect, but this is a fairly complex prompt and I find the majority of seeds would be "good enough" for thumbnail profile pictures. I think we're almost there for "cheap, good enough" use cases.


I feel like you're grasping at use cases here... I also feel like people would find a low quality profile picture to be terrible.


GitHub's profile pictures are 40x40 px in issues/PRs/commits/etc.; they're very rarely seen above that, and I think SDXL Lightning creates acceptable 1024x1024 images in many cases. Downscaling to 512x512 hides a lot of the "AI artifacts".

Places like Steam, Discord, etc. very rarely show profile pictures above that size either.
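The artifact-hiding effect comes from the averaging in the downscale. A minimal stdlib-only sketch of a 2x box filter on a grayscale image (in practice you'd use Pillow's `Image.resize` with a good resampling filter, but this shows the mechanism):

```python
# Minimal 2x box-filter downscale for a grayscale image stored as a list of
# rows. Averaging each 2x2 block blurs away small high-frequency artifacts,
# which is why a 1024x1024 render often looks cleaner at 512x512.
def downscale_2x(img):
    h, w = len(img), len(img[0])
    return [
        [
            (img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) // 4
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]

# A checkerboard (the harshest possible pixel-level noise) averages to flat gray.
img = [[0, 255, 0, 255],
       [255, 0, 255, 0],
       [0, 255, 0, 255],
       [255, 0, 255, 0]]
print(downscale_2x(img))  # [[127, 127], [127, 127]]
```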


Ok but... Is generating these things really a problem or feature that people want addressed?


The use case is you can print out the great images and put them on your wall for decoration.

It is just art. I think AI art shows what profit-obsessed gadget makers we have become culturally. We can't even figure out that the use case for art is hanging it on the wall for decoration, or as a conversation piece. A few will make a name for themselves and make it into galleries.

Infinite supply means the value tends towards zero. Good luck monetizing anything with those economic characteristics.


I feel pretty comfortable estimating that 99%, if not 99.9% of art is not printed out and hung on a wall for decoration.


FWIW, whether you use Turbo or an accelerator (e.g. Nvidia's TensorRT), there is plenty of guesswork in finding the prompt you want. Iterating quickly with low steps in Euler A, finding a great prompt that works, then switching to higher fidelity (I like DPM++ 2M at 3x the steps) goes a long way toward getting it all "just right".

If you are getting what you want most of the time, then you are a better 'prompt engineer' than I am.
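That two-phase workflow can be sketched as pseudocode; `generate` here is a hypothetical stand-in for whatever pipeline call your tooling exposes, not a real API:

```python
# Sketch of the draft-then-refine workflow: cheap exploratory passes with a
# fast sampler, then one high-step pass once the prompt is locked in.
# `generate` is a hypothetical stand-in for a real image-generation call.
def generate(prompt, seed, sampler, steps):
    return {"prompt": prompt, "seed": seed, "sampler": sampler, "steps": steps}

DRAFT_STEPS = 10

# Phase 1: iterate quickly over prompt variants with Euler A at low steps,
# keeping the seed fixed so only the prompt changes between drafts.
candidates = [generate(p, seed=42, sampler="euler_a", steps=DRAFT_STEPS)
              for p in ("a castle",
                        "a castle at dusk",
                        "a castle at dusk, oil painting")]
best = candidates[-1]  # pretend we eyeballed the drafts and picked this one

# Phase 2: re-render the winner with a higher-fidelity sampler at 3x the steps.
final = generate(best["prompt"], seed=42, sampler="dpmpp_2m", steps=3 * DRAFT_STEPS)
print(final["steps"])  # 30
```

The fixed seed is what makes the iteration meaningful: the final render stays close to the draft you approved, just rendered more carefully.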


Latency does matter. Realistic workflows are not one-shot outputs; they're slow, iterative improvements and changes to an image generated from a fixed, or at least manually adjusted, seed. 5 minutes would be a killer.

30 seconds is probably a decent sweet spot, imo.


The more interesting use cases I've seen have been real-time video filters and music-based generation for live performances, which is why speed matters more than accuracy.

For example, it could take an old video game, say Morrowind, and patch the graphics on screen in real time. Or people could watch a video of themselves with the style updated live, similar to a Snapchat filter.


The linked paper did look at SDXL Turbo and found that its images were about as good as SDXL's and better than a lot of models that were popular a little while ago. The compromise from using it, if there is any, is hard to detect, and it is much faster.

But the difference is academic; progress is so fast that it is reasonable to expect all these models will be obsolete in a year or two.


Thanks for pointing that out. Super promising.


Yeah, the weights will all be downloadable. 800m, 2b, and 8b models are currently planned.


Will there be refiner models too?


It looks like it. I really hope they do. Running SDXL right now is proper fun. I don't even use it for anything specific, just to amuse myself at times :D


Same question here. Can anyone point to the source that says they are going to publish the weights?



