The pace of open-source development around Stable Diffusion is nothing short of mind-blowing, and I can't wait to see what we'll be able to do just 6 months from now.
The possibilities for non-skeuomorphic media formats are pretty insane, especially once we get into animation territory.
None of that explains what it has to do with cryptocurrency.
You say "a fun idea in crypto gaming is composability", but I don't see why that is specific to cryptocurrency.
(FWIW, I think the idea of composable game worlds created by ML prompts is absolutely fascinating, and I also think a future in which all money is outside the control of the state and corporations is interesting. I just don't see the relationship between the two).
You know how machines can be very good and efficient at some things but terrible at others compared to humans?
I have yet to see anything from that world in the latest boom of AI-generated images.
For example, I really, really like Midjourney; it creates images that feel artistic and all. But I'm starting to think I'm misjudging it, because it appears to be a tool that makes great combinations that look fascinating because they are so novel and out of this world.
Crystals growing over electronics, porcelain bubbles, a viola that turns into plasma, etc. All amazing, but all of these form a genre in art. Combining things, making things transition into other things, making them look like something else: these are all procedures that humans can master (it's just that the computer can do it much more quickly).
Considering that all this is essentially teaching a computer to predict by degrading images and then trying to re-create them again, I think this is going to revolutionize tooling once we can actually guide the output precisely. Right now it's just a fascinating toy for "out of this world" image creation, and anything made by AI looks like the imaginary bridges and buildings you find on Euro banknotes (designed that way so as not to favour any particular country over the others). That said, I think the toy in its current state has high exploratory value.
> Combining things, making things transition into other things, making them look like something else: these are all procedures that humans can master (it's just that the computer can do it much more quickly).
Have you not just described creativity (and genetic recombination) in the abstract? I understand this ability to combine, synthesize, and cross-breed to be one of the more important components of human intelligence, from which almost everything else wonderful about us is derived. And it shouldn't escape notice that replication and recombination are also at the centre of the biological information processes of life.
Something that recombines things better and more "creatively" than we do, in its very nascent days: that feels quite important to me. Respectfully, it doesn't feel like it's simply a "genre of art" that it's doing better than us.
I'm not talking about creativity here. It exists to create images; obviously it's a creative tool by definition. I'm describing the nature of creativity, and no, the creative process is not simply fitting things together somehow.
What the sibling commenter is saying is relevant, and I'm kinda confused why you dismiss it as a straw man.
> I'm describing the nature of creativity and no, the creative process is not simply fitting things together somehow.
Many non-naive folks in the sciences would agree that creativity (as far as the universe is concerned with human activity as a physical phenomenon) is basically just aimless recombination of semiotic structures, seeking stability that helps them (and associated structures) persist. It's not unlike the function that genetic recombination performs in the biological stratum of information. Genetic recombination is creativity in the biological substrate. Our version is just much higher-dimensional, but it's the same shit (the same forces in the universe) underlying it.
Your saying human creativity is "something more" feels to me like an example of elevating the subjective conscious experience of it.
The model behind Stable Diffusion works similarly to pareidolia[1]: recognising "shapes" in random noise following the prompt's theme, and refining that "mental image" until it generates something that matches a recognizable image.
In these animations it's easy to see, as the shape recognition is not very stable. For example, in the last video, when the prompt changes from "people" to "bears", you can see a backpack turning into a bear head, which then turns into an open-mouthed bear head. And then the arm of the person carrying the backpack is also turned into more bear heads.
The next step in the evolution of this technique should be exploring the relationship between the noise and the subjects, so that you can create variations of the same image while keeping the recognizable parts stable.
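The "recognising shapes in noise and refining them" idea above can be sketched with a toy loop. This is a deliberately simplified illustration, not the real sampler: `toy_denoise_step` stands in for the model's denoising step, and the fixed `target` stands in for whatever structure the prompt makes the model "see" in the noise.

```python
# Toy illustration (NOT the real diffusion sampler): start from pure noise and
# repeatedly nudge the image toward the structure the model "recognizes" in it.
import random

random.seed(0)

def toy_denoise_step(x, target, strength=0.1):
    # Stand-in for one denoising step: move each "pixel" slightly toward
    # the structure a hypothetical model sees; here just a fixed target.
    return [xi + strength * (ti - xi) for xi, ti in zip(x, target)]

target = [0.0, 1.0, 1.0, 0.0]             # the "shape" the prompt implies
x = [random.gauss(0, 1) for _ in target]  # start from pure noise
start_error = sum(abs(xi - ti) for xi, ti in zip(x, target))

for _ in range(50):                       # iterative refinement
    x = toy_denoise_step(x, target)

end_error = sum(abs(xi - ti) for xi, ti in zip(x, target))
print(end_error < start_error)  # the noise converges toward the recognized shape
```

The instability seen in the videos corresponds to the "target" itself shifting between frames, so different shapes get recognized in nearly identical noise.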
It looks like DALL-E is just performing inpainting on the pieces of the image outside the original frame.
The zoom effect is scaling the original frame larger while keeping the frame, then performing image to image generation on it, or using the newly defined image in the frame and sending it back through diffusion (at least that’s my guess).
It’s a different process than what DALL-E does, since inpainting does not overwrite already-generated pieces. Stable Diffusion can also do inpainting.
You can make sliding images this way: slowly translate the image out of the frame and then fill in the blank space with inpainting.
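The sliding trick described above amounts to shifting the frame and building an inpainting mask over the newly exposed strip. Here is a minimal pure-Python sketch of the geometry only; the actual inpainting model call is omitted, and `slide_frame` is a hypothetical helper name.

```python
# Sketch of the "sliding" trick: translate the frame by a few columns, then
# mark the newly exposed strip as the inpainting mask. A real pipeline would
# hand the shifted frame plus mask to an inpainting model to fill the strip.
def slide_frame(frame, shift):
    """Shift a row-major grayscale frame left by `shift` columns.
    Newly exposed columns on the right become None (to be inpainted)."""
    h, w = len(frame), len(frame[0])
    shifted = [row[shift:] + [None] * shift for row in frame]
    # Mask is True wherever the model must generate new content.
    mask = [[col >= w - shift for col in range(w)] for _ in range(h)]
    return shifted, mask

frame = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
shifted, mask = slide_frame(frame, shift=1)
print(shifted[0])  # [2, 3, 4, None]
print(mask[0])     # [False, False, False, True]
```

Repeating this each frame, with the previous output as the next input, produces the continuous sideways pan.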
Images with this description tend to be good, so including it suggests to the model that it should make good-quality images. Other such terms include "4k" and "unreal engine 5". All things held equal, it's not unreasonable to think the model has the capacity to draw poor images intentionally.
It would be nice to have some more documentation. I managed to get it up and running, but when I try some prompts, it seemingly starts, and then after about 10 minutes it crashes with an internal server error.
What I would really love is to have multiple prompts. So, for example, prompt 1 is a fast-moving description of the action, and prompt 2 is a slow-moving fade between artistic styles.
You can relatively easily create it yourself.
Basically a for loop where in each frame you move/zoom the image slightly, and every X frames you apply a different prompt. You have to play around with the weights to get a nice result.
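That loop can be sketched as follows. `img2img` and `zoom` are hypothetical stand-ins for a real image-to-image call (e.g. Stable Diffusion) and an image zoom/crop; this toy version only records the prompt schedule so the structure is visible.

```python
# Sketch of the animation loop: zoom slightly each frame, switch prompts
# every `frames_per_prompt` frames. The real image operations are stubbed
# out as comments; only the schedule is computed here.
def make_animation(n_frames, prompts, frames_per_prompt, zoom_step=1.02):
    log = []
    for i in range(n_frames):
        prompt = prompts[(i // frames_per_prompt) % len(prompts)]
        # Real pipeline would do roughly:
        #   image = zoom(image, zoom_step)
        #   image = img2img(image, prompt, strength=0.4)
        log.append((i, prompt))
    return log

schedule = make_animation(6, ["a forest", "a city"], frames_per_prompt=3)
print(schedule)
# [(0, 'a forest'), (1, 'a forest'), (2, 'a forest'),
#  (3, 'a city'), (4, 'a city'), (5, 'a city')]
```

The `strength` value is the weight you'd tune: too high and frames diverge wildly, too low and the prompt change never takes effect.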
So what would happen if you trained this model on frames from movies that were annotated? You could leverage CC descriptions and maybe use a classifier to automate a lot of the annotation. Would it be able to create novel but contiguous predictions for the next frame?
The problem would still be temporal stability. You'd have to train on series of frames, but then I'm not sure diffusion models are well suited for that?
I think it's actually the opposite: shrink the image and give it a white border, then re-imagine. Stitch the frames together and play them in reverse to give the zoom effect.
Shrink to keep the same pixel dimensions, or just enlarge the canvas and place the image in the centre.
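The "enlarge the canvas and place the image in the centre" step looks something like this. It is a pure-Python sketch of the padding geometry only; `pad_with_border` is a hypothetical helper, and the re-imagining of the white border by img2img is not shown.

```python
# Sketch of the zoom-out trick: place the previous frame in the centre of a
# larger white canvas; the real pipeline would then let img2img re-imagine
# the white border. Playing the frames in reverse gives a zoom-in effect.
WHITE = 255

def pad_with_border(img, border):
    h, w = len(img), len(img[0])
    canvas = [[WHITE] * (w + 2 * border) for _ in range(h + 2 * border)]
    for y in range(h):
        for x in range(w):
            canvas[y + border][x + border] = img[y][x]
    return canvas

img = [[1, 2],
       [3, 4]]
padded = pad_with_border(img, border=1)
print(padded)
# [[255, 255, 255, 255],
#  [255,   1,   2, 255],
#  [255,   3,   4, 255],
#  [255, 255, 255, 255]]
```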
Edit: having looked carefully at the video, these are pretty different. Inpainting keeps the original crop the same, whereas this version allows the model to reinterpret the original.
I wonder how much better this would work if the network had been trained with blurring as the degradation function instead of added noise (or with a combination of both)?
You can try it on Google Colab [1] to get started.
There are quite a few tutorials for getting started with SD.[2] They tend to explain how to install it on your CPU, but some also explain how to build your own instance on Google Colab. I know this one in Spanish[2], and you can look for more.[3]
They have a Docker image; I would start with that. It's pretty easy to set up a container in the cloud these days. They even have a tool to customize the container: https://github.com/replicate/cog
I've been using Midjourney and SD to create a sci-fi filmverse called SALT; here are some details about how I put it together, including all available "episodes": https://twitter.com/fabianstelzer/status/1565085199322456069