Yeah I did have a few false starts. Total time is more like 3 months vs 1 month for the final model. For small scale training I found it’s necessary to use a long lr warmup period, followed by constant lr.
There’s code on my GitHub (glid3)
edit: The architecture is identical to SD except I trained on 256px images with cosine noise schedule instead of linear. Using the cosine schedule makes the unet converge faster but can overfit if overtrained.
edit 2: Just tried it again and my model is also pretty bad at hands actually. It does get lucky once in a while though.
There’s code on my GitHub (glid3)
edit: The architecture is identical to SD except I trained on 256px images with cosine noise schedule instead of linear. Using the cosine schedule makes the unet converge faster but can overfit if overtrained.
edit 2: Just tried it again and my model is also pretty bad at hands actually. It does get lucky once in a while though.