Animating Prompts with Stable Diffusion (replicate.com)
158 points by davedx on Sept 2, 2022 | 56 comments



The pace of open-source development around Stable Diffusion is nothing short of mind-blowing, and I can't wait to see what we'll be able to do just 6 months from now.

The possibilities for non-skeuomorphic media formats are pretty insane, especially once we get into animation territory.

I've been using Midjourney and SD to create a sci-fi filmverse called SALT. Here are some details about how I put it together, including all available "episodes": https://twitter.com/fabianstelzer/status/1565085199322456069


> The vision for @SALT_VERSE is a multi-plot, community owned CC0-crypto filmverse

The generated art concept is pretty cool but why couldn't we leave crypto out of it for once?


Here’s why: https://twitter.com/fabianstelzer/status/1565726070963408896...

That said, SALT will be 100% open to anyone, regardless of whether they’re using crypto or not


None of that explains what it has to do with cryptocurrency.

You say "a fun idea in crypto gaming is composability", but I don't see why that is specific to cryptocurrency.

(FWIW, I think the idea of composable game worlds created by ML prompts is absolutely fascinating, and I also think a future in which all money is outside the control of the state and corporations is interesting. I just don't see the relationship between the two).


Open source, closed marketplace.


Yeah, why couldn't you ricardobeat? The GP sure didn't mention it in his comment.


You know how machines can be very good and efficient at some things but terrible at others when compared to humans?

I have yet to see anything from this world made by the latest AI-generated-images boom.

For example, I really, really like Midjourney. It creates images that feel artistic and all, but I'm starting to think I'm misjudging it, because it appears to be a tool that makes great combinations that look fascinating because they are so novel and out of this world.

Crystals growing over electronics, porcelain bubbles, a viola that turns into plasma, etc... all amazing, but all of these are a genre in art. Combining things, making things transition into other things, making them look like something else - these are all procedures that humans can master (it's just that the computer can do it much more quickly).

Considering that all of this is simply teaching a computer to make predictions by degrading images and trying to re-create them again, I think this is going to cause a revolution in tooling once we can actually guide the output precisely. Right now it's just a fascinating toy for "out of this world" image creation, and anything made by AI looks like the imaginary bridges and buildings you can find on Euro banknotes (made like that in order not to favour any particular country over the others). That said, I think the toy in its current state has high exploratory value.


> Combining things, making things transition into other things, making them look like something else - these are all procedures that humans can master (it's just that the computer can do it much more quickly).

Have you not just described creativity (and genetic recombination) in the abstract? I understand this ability to combine and synthesize and cross-breed to be one of the more important components of human intelligence, from which almost everything else wonderful about us is derived. And it shouldn't escape notice that replication and recombination are also at the centre of the biological information processes of life.

Something that recombines things better and more "creatively" than us, in its very nascent days, feels quite important to me. Respectfully, it doesn't feel like simply a "genre of art" that it's doing better than us.


I'm not talking about creativity here. It exists to create images; obviously it's a creative tool by definition. I'm describing the nature of creativity, and no, the creative process is not simply fitting things together somehow.


The creative process is very much fitting together and remixing things.

Humans are not an island. I recommend watching the "Everything is a Remix" series; it does a good job of expounding on this.


See, you are making a straw man argument.


What the sibling commenter is saying is relevant, and I'm kinda confused why you dismiss it as a straw man.

> I'm describing the nature of creativity, and no, the creative process is not simply fitting things together somehow.

Many non-naive folks in the sciences would agree that creativity (as far as the universe is concerned with human activity as a physical phenomenon) is basically just aimless recombination of semiotic structures, seeking stability that helps it (and associated structures) to persist. It's not unlike the function that genetic recombination performs in the biological strata of information. Genetic recombination is creativity in the biological substrate. Our version is just much higher-dimensional, but it's the same shit (the same forces in the universe) underlying it.

Your saying that human creativity is "something more" feels to me like an example of elevating the subjective conscious experience of it.


How do you actually create the animations? I'd love to be able to do this myself to make it easier to publish my music creations.


The model behind Stable Diffusion works similarly to pareidolia[1]: recognising "shapes" in random noise following the prompt's theme, and refining that "mental image" until it generates something that matches a recognizable image.

In these animations it's easy to see, as the shape recognition is not very stable. For example, in the last video, when the prompt changes from "people" to "bears", you can see a backpack turning into a bear head, which then turns into an open-mouthed bear head. Then the arm of the person carrying the backpack also turns into more bear heads.

The next steps in the evolution of this technique should be exploring the relation between noise and subjects, so that you can create variations of the same image while keeping the recognizable parts stable.

[1] https://en.wikipedia.org/wiki/Pareidolia
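
To illustrate, here is a minimal sketch of one way to explore that noise/subject relation, assuming the Hugging Face diffusers API (the model id, seed, and prompts are only illustrative): reuse the same starting latent noise for every prompt, so the "shapes" the model hallucinates from stay in roughly the same places while only the subject changes.

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Fixed latent noise: the "shapes" the model sees in it stay put across prompts.
    generator = torch.Generator("cuda").manual_seed(42)
    latents = torch.randn(
        (1, pipe.unet.config.in_channels, 64, 64),  # 64x64 latent -> 512x512 image
        generator=generator, device="cuda", dtype=torch.float16,
    )

    for prompt in ["a group of hikers with backpacks", "a group of bears in a forest"]:
        image = pipe(prompt, latents=latents.clone()).images[0]
        image.save(f"{prompt.replace(' ', '_')}.png")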


That's actually a pretty good analogy for diffusion models. Another commenter linked https://www.reddit.com/r/dalle2/comments/vnw3z9/a_house_in_t..., showing how "flat stability" (zoom out / scrolling) is possible via in/out painting with masks.

The linked technique zooms and doesn't use masks, hence the instability.


It's a pretty compelling, trippy effect even if we don't have good ways to work around it yet.

This is some amazing work that takes advantage of it: https://twitter.com/xsteenbrugge/status/1558508866463219712?...


This reminds me of part of the opening scene of Adaptation: https://youtu.be/6Geq3wVvaNE?t=170

It cuts off before the full montage but you should get the idea.


Dall-E introduced outpainting yesterday: https://openai.com/blog/dall-e-introducing-outpainting/


It looks like DALLE is just performing inpainting on pieces of the image outside the original frame.

The zoom effect is scaling the original image larger while keeping the frame size, then performing image-to-image generation on it, or using the newly defined image in the frame and sending it back through diffusion (at least that's my guess).

It's a different process from what DALL-E has, since inpainting does not overwrite already-generated pieces. Stable Diffusion can also do inpainting.

You can make sliding images this way: slowly translate the image out of the frame and then fill in the blank space with inpainting.
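
A minimal sketch of that sliding idea, assuming diffusers' inpainting pipeline (the model id, prompt, and step size are illustrative): translate each frame a few pixels, then inpaint the strip of blank canvas that slides into view.

    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
    ).to("cuda")

    shift = 32  # pixels the image slides per frame
    frames = [Image.open("start.png").convert("RGB").resize((512, 512))]

    for _ in range(60):
        canvas = Image.new("RGB", (512, 512))
        canvas.paste(frames[-1], (-shift, 0))        # slide the previous frame left
        mask = Image.new("L", (512, 512), 0)
        mask.paste(255, (512 - shift, 0, 512, 512))  # white = blank strip to fill in
        frames.append(pipe(
            prompt="a vast alien landscape, detailed matte painting",
            image=canvas, mask_image=mask,
        ).images[0])

    frames[0].save("slide.gif", save_all=True, append_images=frames[1:], duration=100)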


Looks quite nice, trending on artstation.


> trending on artstation

Could anyone explain why this phrase is repeated everywhere?


It seems like the model was trained on images that were "trending on artstation", see here: https://www.reddit.com/r/DiscoDiffusion/comments/u01cnw/how_...

So it might be that all images have a distinctive look, or influences of it, and this phrase is becoming kind of an inside joke/meme.


Images with this description tend to be good, so including it suggests to the model that it should make good-quality images. Other terms include "4k" and "unreal engine 5". All things held equal, it's not unreasonable to think that the model has the capacity to draw poor images intentionally.

It's a stupid temporary problem


I guess this phrase is present in the training set for stable diffusion.


If you want to run it, this is the main model on Replicate: https://replicate.com/deforum/deforum_stable_diffusion

It's by deforum. Here are links to their Discord, GitHub, and Colab: https://deforum.github.io/


Every single day SD never fails to amaze me. Imagine a music video generated like this in a few minutes…


Someone animated the entire history of the Earth with SD:

https://twitter.com/xsteenbrugge/status/1558508866463219712


I like how some people randomly pop up next to dinosaurs.



Something like this? https://youtu.be/Ip7p9DQXhSA


I've been playing with this for a while; within a day is doable. But it's more like hours than minutes for more interesting content.


This is something I've been thinking about for YEARS now. But I was going for a very very different approach.


It would be nice to have some more documentation. I managed to get it up and running, but when I try some prompts, it seemingly starts and then, after about 10 minutes, crashes with an internal server error.

I'm guessing it's running out of memory perhaps.


I'm having similar issues with the Web version. Their repo contains a nice python notebook so I'm going to try to get it working locally. https://github.com/deforum/stable-diffusion/blob/main/Deforu...


This is so cool!

What I would really love is to have multiple prompts. So, for example, prompt 1 is a fast-moving description of the action, and prompt 2 is a slow-moving fade between artistic styles.


It supports multiple prompts, and you can control the timing via frame numbers.

> Provide 'frame number : prompt at this frame', separate different prompts with '|'. Make sure the frame number does not exceed the max_frames.

Checking out the code, it wouldn't be too difficult to add a system for movement keyframes either.
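
For reference, a hypothetical sketch of how that "frame : prompt" string could be parsed into a schedule (the actual deforum parsing may differ; the helper names here are made up):

    def parse_prompt_schedule(spec: str, max_frames: int) -> dict[int, str]:
        """Parse "0: a forest | 60: a city" into {0: "a forest", 60: "a city"}."""
        schedule = {}
        for entry in spec.split("|"):
            frame, prompt = entry.split(":", 1)
            frame = int(frame.strip())
            assert frame < max_frames, "frame number exceeds max_frames"
            schedule[frame] = prompt.strip()
        return schedule

    def prompt_for_frame(schedule: dict[int, str], frame: int) -> str:
        # Use the most recent keyframed prompt at or before this frame.
        return schedule[max(k for k in schedule if k <= frame)]

    schedule = parse_prompt_schedule("0: a forest of people | 60: a forest of bears", 120)
    print(prompt_for_frame(schedule, 45))  # -> "a forest of people"
    print(prompt_for_frame(schedule, 90))  # -> "a forest of bears"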


You can create it yourself relatively easily. It's basically a for loop where in each frame you move/zoom the image slightly; every X frames you apply a different prompt. You have to play around with the weights to get a nice result.
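
A minimal sketch of that loop, assuming diffusers' img2img pipeline (the model id, prompts, zoom factor, and strength are all illustrative): zoom each frame slightly, switch the prompt every X frames, and feed the result back in as the next init image.

    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    prompts = ["a crystal cave", "a porcelain city", "a plasma storm"]
    frame = Image.open("seed.png").convert("RGB").resize((512, 512))
    zoom = 1.02               # 2% zoom per frame
    frames_per_prompt = 30
    frames = []

    for i in range(len(prompts) * frames_per_prompt):
        # Zoom: enlarge, then crop back to 512x512 around the centre.
        w = h = int(512 * zoom)
        zoomed = frame.resize((w, h)).crop(
            ((w - 512) // 2, (h - 512) // 2, (w + 512) // 2, (h + 512) // 2)
        )
        # Switch prompts every `frames_per_prompt` frames; strength controls how
        # far the model may drift from the previous frame.
        frame = pipe(prompts[i // frames_per_prompt], image=zoomed, strength=0.45).images[0]
        frames.append(frame)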


So what would happen if you trained this model on frames from movies that were annotated? You could leverage CC descriptions and maybe use a classifier to automate a lot of the annotation. Would it be able to create novel but contiguous predictions for the next frame?


The problem would still be temporal stability. You'd have to train on series of frames, but then I'm not sure diffusion models are well suited for that?


How is this zoom effect achieved? Do they zoom in a little and then re-imagine the image?


I think it's actually the opposite: shrink the image and give it a white border, then re-imagine. Stitch the frames together and play them the opposite way to give the zoom effect. Shrink to keep the same pixel dimensions, or just enlarge the canvas and place the image in the center.

Some good examples of this being done with Dalle2 last month -- https://youtu.be/TW2w-z0UtQU?t=244
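
A small sketch of that shrink-and-outpaint idea (the helper is hypothetical; the resulting canvas/mask pair would go to an inpainting model as in the sketch above, with the frames played in reverse to get the zoom):

    from PIL import Image

    def shrink_with_border(img: Image.Image, scale: float = 0.8):
        """Shrink a square frame into the centre of a blank canvas and return
        (canvas, mask), where white mask pixels mark the border to repaint."""
        size = img.size[0]
        small = img.resize((int(size * scale), int(size * scale)))
        canvas = Image.new("RGB", (size, size), "white")
        offset = (size - small.size[0]) // 2
        canvas.paste(small, (offset, offset))
        mask = Image.new("L", (size, size), 255)  # 255 = region to re-imagine
        mask.paste(0, (offset, offset, offset + small.size[0], offset + small.size[1]))
        return canvas, mask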


I don't think that's the case; it looks to be zoom-and-reinterpret. I wonder what tradeoffs doing it the other way around would bring?

       rot_mat = cv2.getRotationMatrix2D(center, angle, scale) # the zoom variable is passed as scale
https://github.com/deforum/stable-diffusion/blob/5241ce95058...

Edit: having looked carefully at the video, these are pretty different. Inpainting keeps the original crop the same, whereas this version allows the model to reinterpret the original.
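
For context, a small sketch of applying such a matrix to a frame with OpenCV's warpAffine (file names and zoom factor are illustrative):

    import cv2

    frame = cv2.imread("frame.png")
    h, w = frame.shape[:2]
    center = (w / 2, h / 2)
    angle, scale = 0.0, 1.02  # no rotation, 2% zoom in per frame

    rot_mat = cv2.getRotationMatrix2D(center, angle, scale)
    zoomed = cv2.warpAffine(frame, rot_mat, (w, h), borderMode=cv2.BORDER_REFLECT)
    cv2.imwrite("frame_zoomed.png", zoomed)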


I like to think it's a consequence of my thread from the other day: https://news.ycombinator.com/item?id=32659407

(I know… but let me think it, guys)


I wonder how much better this would work if the network had been trained with blurring as the degradation function instead of noise (or with a combination of both)?
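
For what it's worth, a hypothetical sketch of what a blur-based degradation operator could look like, in the spirit of "cold diffusion" style experiments (this is not how Stable Diffusion is actually trained):

    import torch
    import torchvision.transforms.functional as TF

    def degrade(img: torch.Tensor, t: int, max_t: int = 1000) -> torch.Tensor:
        """Increasing Gaussian blur standing in for the usual additive-noise schedule."""
        sigma = 0.1 + 10.0 * t / max_t
        kernel = int(2 * round(3 * sigma) + 1)  # odd kernel covering ~3 sigma
        return TF.gaussian_blur(img, kernel_size=kernel, sigma=sigma)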


Is there a step-by-step guide somewhere on how to run repos like these in the cloud?

I don't have a fast machine myself, but I would not mind renting a VM somewhere to play with it.

Any tips?


You can try it on Google Colab [1] to get started.

There are quite a few tutorials for getting started with SD. They tend to explain how to install it locally, but some also explain how to build your own instance on Google Colab. I know this one in Spanish[2], and you can look for more.[3]

[1] https://colab.research.google.com/github/altryne/sd-webui-co...

[2] https://youtu.be/5z223SxlAcA?t=1910

[3] https://www.youtube.com/results?search_query=stable+diffusio...


I would prefer to work on the command line.

And be able to chain together my own workflow, combining different tools.

So I would prefer to rent a VM and not use Google Colab.


Try Lambda Labs or CoreWeave for affordable GPU servers. I'm sure there are more, but I've had good experiences with these in the past.


They have a Docker image; I would start with that. It's pretty easy to set up a container in the cloud these days. They even have a tool to customize the container: https://github.com/replicate/cog


Before all the webuis came out, I had success following the README.

It looks like they've got a nice python notebook: https://github.com/deforum/stable-diffusion/blob/main/Deforu...

For other cases I would recommend this repo, which has a user script feature: https://github.com/AUTOMATIC1111/stable-diffusion-webui#user...


It reminds me of those weird old MTV animations from the '90s. It's amazing what kind of content can be created with this.


I wonder if it will be possible to instead provide it with two keyframes and have it interpolate the frames in between?
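
One hypothetical way to do that: spherically interpolate between the latent noise (and/or prompt embeddings) of the two keyframes and render each intermediate latent. A rough sketch, with illustrative names and shapes:

    import torch

    def slerp(a: torch.Tensor, b: torch.Tensor, t: float) -> torch.Tensor:
        # Spherical interpolation between two flattened latent vectors.
        a_n, b_n = a / a.norm(), b / b.norm()
        omega = torch.acos((a_n * b_n).sum().clamp(-1, 1))
        return (torch.sin((1 - t) * omega) * a + torch.sin(t * omega) * b) / torch.sin(omega)

    latent_a = torch.randn(1, 4, 64, 64, generator=torch.Generator().manual_seed(1))
    latent_b = torch.randn(1, 4, 64, 64, generator=torch.Generator().manual_seed(2))

    in_between = [
        slerp(latent_a.flatten(), latent_b.flatten(), t).view_as(latent_a)
        for t in torch.linspace(0, 1, 24).tolist()
    ]
    # Each latent in `in_between` would then be diffused/decoded into one frame.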



I can't keep up!


In the future we will say “trending on Hacker News”


Infinite fractals based on human culture.



