Animating Prompts with Stable Diffusion (replicate.com)
158 points by davedx on Sept 2, 2022 | 56 comments



The pace of open-source development around Stable Diffusion is nothing short of mind-blowing, and I can't wait to see what we'll be able to do just 6 months from now.

The possibilities for non-skeuomorphic media formats are pretty insane, especially once we get into animation territory.

I've been using Midjourney and SD to create a sci-fi filmverse called SALT. Here are some details about how I put it together, including all available "episodes": https://twitter.com/fabianstelzer/status/1565085199322456069


> The vision for @SALT_VERSE is a multi-plot, community owned CC0-crypto filmverse

The generated art concept is pretty cool but why couldn't we leave crypto out of it for once?


Here’s why: https://twitter.com/fabianstelzer/status/1565726070963408896...

That said, SALT will be 100% open to anyone, regardless of whether they’re using crypto or not


None of that explains what it has to do with cryptocurrency.

You say "a fun idea in crypto gaming is composability", but I don't see why that is specific to cryptocurrency.

(FWIW, I think the idea of composable game worlds created by ML prompts is absolutely fascinating, and I also think a future in which all money is outside the control of the state and corporations is interesting. I just don't see the relationship between the two).


Open source, closed marketplace.


Yeah, why couldn't you ricardobeat? The GP sure didn't mention it in his comment.


You know how machines can be very good and efficient at some things but terrible at others when compared to humans?

I have yet to see anything from this world made by the latest AI-generated-images boom.

For example, I really, really like Midjourney. It creates images that feel artistic and all, but I'm starting to think I'm misjudging it, because it appears to be a tool that makes great combinations that look fascinating because they are so novel and out of this world.

Crystals growing over electronics, porcelain bubbles, a viola that turns into plasma, etc... all amazing, but all of these are a genre in art. Combining things, making things transition into other things, making them look like something else - these are all procedures that humans can master (it's just that the computer can do it much more quickly).

Considering that all of this is simply teaching a computer to make predictions by degrading images and trying to re-create them again, I think this is going to cause a revolution in tooling once we can actually guide the output precisely. Right now it's just a fascinating toy for "out of this world" image creation, and anything made by AI looks like the imaginary bridges and buildings you can find on Euro banknotes (made like that in order not to favour any particular country over the others). That said, I think the toy in its current state has high exploratory value.


> Combining things, making things transition into other things, making them look like something else - these are all procedures that humans can master (it's just that the computer can do it much more quickly).

Have you not just described creativity (and genetic recombination) in the abstract? I understand this ability to combine and synthesize and cross-breed to be one of the more important components of human intelligence, from which almost everything else wonderful about us is derived. And it shouldn't escape notice that replication and recombination are also at the centre of the biological information processes of life.

Something that recombines things better and more "creatively" than us, in its very nascent days, feels quite important to me. Respectfully, it doesn't feel like simply a "genre of art" that it's doing better than us.


I'm not talking about creativity here. It exists to create images; obviously it's a creative tool by definition. I'm describing the nature of creativity, and no, the creative process is not simply fitting things together somehow.


The creative process is very much fitting together and remixing things.

Humans are not an island. I recommend watching the "Everything is a Remix" series; it does a good job of expounding on this.


See, you are making a straw man argument.


What the sibling commenter is saying is relevant, and I'm kinda confused why you dismiss it as a straw man.

> I'm describing the nature of creativity, and no, the creative process is not simply fitting things together somehow.

Many non-naive folks in the sciences would agree that creativity (as far as the universe is concerned with human activity as a physical phenomenon) is basically just aimless recombination of semiotic structures, seeking stability that helps it (and associated structures) to persist. It's not unlike the function that genetic recombination performs in the biological strata of information. Genetic recombination is creativity in the biological substrate. Our version is just much higher-dimensional, but it's the same shit (the same forces in the universe) underlying it.

Your saying that human creativity is "something more" feels to me like an example of elevating the subjective conscious experience of it.


How do you actually create the animations? I'd love to be able to do this myself to make it easier to publish my music creations.


The model behind Stable Diffusion works similarly to pareidolia[1]: recognising "shapes" in random noise following the prompt's theme, and refining that "mental image" until it generates something that matches a recognizable image.

In these animations it's easy to see, as the shape recognition is not very stable. For example, in the last video, when the prompt changes from "people" to "bears", you can see a backpack turning into a bear head, which then turns into an open-mouthed bear head. Then the arm of the person carrying the backpack also turns into more bear heads.

The next steps in the evolution of this technique should be exploring the relation between noise and subjects, so that you can create variations of the same image while keeping the recognizable parts stable.

[1] https://en.wikipedia.org/wiki/Pareidolia
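
To illustrate, here is a minimal sketch of one way to explore that noise/subject relation, assuming the Hugging Face diffusers API (the model id, seed, and prompts are only illustrative): reuse the same starting latent noise for every prompt, so the "shapes" the model hallucinates from stay in roughly the same places while only the subject changes.

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Fixed latent noise: the "shapes" the model sees in it stay put across prompts.
    generator = torch.Generator("cuda").manual_seed(42)
    latents = torch.randn(
        (1, pipe.unet.config.in_channels, 64, 64),  # 64x64 latent -> 512x512 image
        generator=generator, device="cuda", dtype=torch.float16,
    )

    for prompt in ["a group of hikers with backpacks", "a group of bears in a forest"]:
        image = pipe(prompt, latents=latents.clone()).images[0]
        image.save(f"{prompt.replace(' ', '_')}.png")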


That's actually a pretty good analogy for diffusion models. Another commenter linked https://www.reddit.com/r/dalle2/comments/vnw3z9/a_house_in_t..., showing how "flat stability" (zoom out / scrolling) is possible via in/out painting with masks.

The linked technique zooms and doesn't use masks, hence the instability.


It's a pretty compelling, trippy effect even if we don't have good ways to work around it yet.

This is some amazing work that takes advantage of it: https://twitter.com/xsteenbrugge/status/1558508866463219712?...


This reminds me of part of the opening scene of Adaptation: https://youtu.be/6Geq3wVvaNE?t=170

It cuts off before the full montage but you should get the idea.


Dall-E introduced outpainting yesterday: https://openai.com/blog/dall-e-introducing-outpainting/


It looks like DALLE is just performing inpainting on pieces of the image outside the original frame.

The zoom effect is scaling the original image larger while keeping the frame size, then performing image-to-image generation on it, or using the newly defined image in the frame and sending it back through diffusion (at least that's my guess).

It's a different process from what DALL-E has, since inpainting does not overwrite already-generated pieces. Stable Diffusion can also do inpainting.

You can make sliding images this way: slowly translate the image out of the frame and then fill in the blank space with inpainting.
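
A minimal sketch of that sliding idea, assuming diffusers' inpainting pipeline (the model id, prompt, and step size are illustrative): translate each frame a few pixels, then inpaint the strip of blank canvas that slides into view.

    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
    ).to("cuda")

    shift = 32  # pixels the image slides per frame
    frames = [Image.open("start.png").convert("RGB").resize((512, 512))]

    for _ in range(60):
        canvas = Image.new("RGB", (512, 512))
        canvas.paste(frames[-1], (-shift, 0))        # slide the previous frame left
        mask = Image.new("L", (512, 512), 0)
        mask.paste(255, (512 - shift, 0, 512, 512))  # white = blank strip to fill in
        frames.append(pipe(
            prompt="a vast alien landscape, detailed matte painting",
            image=canvas, mask_image=mask,
        ).images[0])

    frames[0].save("slide.gif", save_all=True, append_images=frames[1:], duration=100)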


Looks quite nice, trending on artstation.


> trending on artstation

Could anyone explain why this phrase is repeated everywhere?


It seems like the model was trained on images that were "trending on artstation", see here: https://www.reddit.com/r/DiscoDiffusion/comments/u01cnw/how_...

So it might be that all images have a distinctive look, or influences of it, and this phrase is becoming kind of an inside joke/meme.


Images with this description tend to be good, so including it suggests to the model that it should make good-quality images. Other terms include "4k" and "unreal engine 5". All things held equal, it's not unreasonable to think that the model has the capacity to draw poor images intentionally.

It's a stupid temporary problem


I guess this phrase is present in the training set for stable diffusion.


If you want to run it, this is the main model on Replicate: https://replicate.com/deforum/deforum_stable_diffusion

It's by deforum. Here are links to their Discord, GitHub, and Colab: https://deforum.github.io/


Every single day SD never fails to amaze me. Imagine a music video generated like this in a few minutes…


Someone animated the entire history of the Earth with SD:

https://twitter.com/xsteenbrugge/status/1558508866463219712


I like how some people randomly pop up next to dinosaurs.



Something like this? https://youtu.be/Ip7p9DQXhSA


I've been playing with this for a while; within a day is doable. But it's more like hours than minutes for more interesting content.


This is something I've been thinking about for YEARS now. But I was going for a very very different approach.


It would be nice to have some more documentation. I managed to get it up and running, but when I try some prompts, it seemingly starts and then, after about 10 minutes, crashes with an internal server error.

I'm guessing it's running out of memory perhaps.


I'm having similar issues with the Web version. Their repo contains a nice python notebook so I'm going to try to get it working locally. https://github.com/deforum/stable-diffusion/blob/main/Deforu...


This is so cool!

What I would really love is to have multiple prompts. So, for example, prompt 1 is a fast-moving description of the action, and prompt 2 is a slow-moving fade between artistic styles.


It supports multiple prompts, and you can control the timing via frame numbers.

> Provide 'frame number : prompt at this frame', separate different prompts with '|'. Make sure the frame number does not exceed the max_frames.

Checking out the code, it wouldn't be too difficult to add a system for movement keyframes either.
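
For reference, a hypothetical sketch of how that "frame : prompt" string could be parsed into a schedule (the actual deforum parsing may differ; the helper names here are made up):

    def parse_prompt_schedule(spec: str, max_frames: int) -> dict[int, str]:
        """Parse "0: a forest | 60: a city" into {0: "a forest", 60: "a city"}."""
        schedule = {}
        for entry in spec.split("|"):
            frame, prompt = entry.split(":", 1)
            frame = int(frame.strip())
            assert frame < max_frames, "frame number exceeds max_frames"
            schedule[frame] = prompt.strip()
        return schedule

    def prompt_for_frame(schedule: dict[int, str], frame: int) -> str:
        # Use the most recent keyframed prompt at or before this frame.
        return schedule[max(k for k in schedule if k <= frame)]

    schedule = parse_prompt_schedule("0: a forest of people | 60: a forest of bears", 120)
    print(prompt_for_frame(schedule, 45))  # -> "a forest of people"
    print(prompt_for_frame(schedule, 90))  # -> "a forest of bears"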


You can create it yourself relatively easily. It's basically a for loop where in each frame you move/zoom the image slightly; every X frames you apply a different prompt. You have to play around with the weights to get a nice result.
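
A minimal sketch of that loop, assuming diffusers' img2img pipeline (the model id, prompts, zoom factor, and strength are all illustrative): zoom each frame slightly, switch the prompt every X frames, and feed the result back in as the next init image.

    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    prompts = ["a crystal cave", "a porcelain city", "a plasma storm"]
    frame = Image.open("seed.png").convert("RGB").resize((512, 512))
    zoom = 1.02               # 2% zoom per frame
    frames_per_prompt = 30
    frames = []

    for i in range(len(prompts) * frames_per_prompt):
        # Zoom: enlarge, then crop back to 512x512 around the centre.
        w = h = int(512 * zoom)
        zoomed = frame.resize((w, h)).crop(
            ((w - 512) // 2, (h - 512) // 2, (w + 512) // 2, (h + 512) // 2)
        )
        # Switch prompts every `frames_per_prompt` frames; strength controls how
        # far the model may drift from the previous frame.
        frame = pipe(prompts[i // frames_per_prompt], image=zoomed, strength=0.45).images[0]
        frames.append(frame)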


So what would happen if you trained this model on frames from movies that were annotated? You could leverage CC descriptions and maybe use a classifier to automate a lot of the annotation. Would it be able to create novel but contiguous predictions for the next frame?


The problem would still be temporal stability. You'd have to train on series of frames, but then I'm not sure diffusion models are well suited for that?


How is this zoom effect achieved? Do they zoom in a little and then re-imagine the image?


I think it's actually the opposite: shrink the image and give it a white border, then re-imagine. Stitch the frames together and play them the opposite way to give the zoom effect. Shrink to keep the same pixel dimensions, or just enlarge the canvas and place the image in the center.

Some good examples of this being done with Dalle2 last month -- https://youtu.be/TW2w-z0UtQU?t=244
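
A small sketch of that shrink-and-outpaint idea (the helper is hypothetical; the resulting canvas/mask pair would go to an inpainting model as in the sketch above, with the frames played in reverse to get the zoom):

    from PIL import Image

    def shrink_with_border(img: Image.Image, scale: float = 0.8):
        """Shrink a square frame into the centre of a blank canvas and return
        (canvas, mask), where white mask pixels mark the border to repaint."""
        size = img.size[0]
        small = img.resize((int(size * scale), int(size * scale)))
        canvas = Image.new("RGB", (size, size), "white")
        offset = (size - small.size[0]) // 2
        canvas.paste(small, (offset, offset))
        mask = Image.new("L", (size, size), 255)  # 255 = region to re-imagine
        mask.paste(0, (offset, offset, offset + small.size[0], offset + small.size[1]))
        return canvas, mask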


I don't think that's the case; it looks to be zoom-and-reinterpret. I wonder what tradeoffs doing it the other way around would bring?

       rot_mat = cv2.getRotationMatrix2D(center, angle, scale) # the zoom variable is passed as scale
https://github.com/deforum/stable-diffusion/blob/5241ce95058...

Edit: having looked carefully at the video, these are pretty different. Inpainting keeps the original crop the same, whereas this version allows the model to reinterpret the original.
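
For context, a small sketch of applying such a matrix to a frame with OpenCV's warpAffine (file names and zoom factor are illustrative):

    import cv2

    frame = cv2.imread("frame.png")
    h, w = frame.shape[:2]
    center = (w / 2, h / 2)
    angle, scale = 0.0, 1.02  # no rotation, 2% zoom in per frame

    rot_mat = cv2.getRotationMatrix2D(center, angle, scale)
    zoomed = cv2.warpAffine(frame, rot_mat, (w, h), borderMode=cv2.BORDER_REFLECT)
    cv2.imwrite("frame_zoomed.png", zoomed)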


I like to think it's a consequence of my thread from the other day: https://news.ycombinator.com/item?id=32659407

(I know… but let me think it, guys)


I wonder how much better this would work if the network had been trained with blurring as the degradation function instead of noise (or with a combination of both)?
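
For what it's worth, a hypothetical sketch of what a blur-based degradation operator could look like, in the spirit of "cold diffusion" style experiments (this is not how Stable Diffusion is actually trained):

    import torch
    import torchvision.transforms.functional as TF

    def degrade(img: torch.Tensor, t: int, max_t: int = 1000) -> torch.Tensor:
        """Increasing Gaussian blur standing in for the usual additive-noise schedule."""
        sigma = 0.1 + 10.0 * t / max_t
        kernel = int(2 * round(3 * sigma) + 1)  # odd kernel covering ~3 sigma
        return TF.gaussian_blur(img, kernel_size=kernel, sigma=sigma)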


Is there a step-by-step guide somewhere on how to run repos like these in the cloud?

I don't have a fast machine myself, but I would not mind renting a VM somewhere to play with it.

Any tips?


You can try it on Google Colab [1] to get started.

There are quite a few tutorials for getting started with SD. They tend to explain how to install it locally, but some also explain how to build your own instance on Google Colab. I know this one in Spanish[2], and you can look for more.[3]

[1] https://colab.research.google.com/github/altryne/sd-webui-co...

[2] https://youtu.be/5z223SxlAcA?t=1910

[3] https://www.youtube.com/results?search_query=stable+diffusio...


I would prefer to work on the command line.

And be able to chain together my own workflow, combining different tools.

So I would prefer to rent a VM and not use Google Colab.


Try Lambda Labs or CoreWeave for affordable GPU servers. I'm sure there are more, but I've had good experiences with these in the past.


They have a Docker image; I would start with that. It's pretty easy to set up a container in the cloud these days. They even have a tool to customize the container: https://github.com/replicate/cog


Before all the webuis came out, I had success following the README.

It looks like they've got a nice python notebook: https://github.com/deforum/stable-diffusion/blob/main/Deforu...

For other cases I would recommend this repo, which has a user script feature: https://github.com/AUTOMATIC1111/stable-diffusion-webui#user...


It reminds me of those weird old MTV animations from the '90s. It's amazing what kind of content can be created with this.


I wonder if it will be possible to instead provide it with two keyframes and have it interpolate the frames in between?
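
One hypothetical way to do that: spherically interpolate between the latent noise (and/or prompt embeddings) of the two keyframes and render each intermediate latent. A rough sketch, with illustrative names and shapes:

    import torch

    def slerp(a: torch.Tensor, b: torch.Tensor, t: float) -> torch.Tensor:
        # Spherical interpolation between two flattened latent vectors.
        a_n, b_n = a / a.norm(), b / b.norm()
        omega = torch.acos((a_n * b_n).sum().clamp(-1, 1))
        return (torch.sin((1 - t) * omega) * a + torch.sin(t * omega) * b) / torch.sin(omega)

    latent_a = torch.randn(1, 4, 64, 64, generator=torch.Generator().manual_seed(1))
    latent_b = torch.randn(1, 4, 64, 64, generator=torch.Generator().manual_seed(2))

    in_between = [
        slerp(latent_a.flatten(), latent_b.flatten(), t).view_as(latent_a)
        for t in torch.linspace(0, 1, 24).tolist()
    ]
    # Each latent in `in_between` would then be diffused/decoded into one frame.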



I can't keep up!


In the future we will say “trending on Hacker News”


Infinite fractals based on human culture.



