> Are those 2048 x 2048 images still sensible? SD 1.5 is best used at 512x512 and may produce sensible images up to 768. It generates monstrosities above that. Similarly, SD XL is good up to 1024.
You can do significantly higher resolutions with various tricks like tiled diffusion, which is also a memory efficiency hack. (The stable-diffusion-webui tiled diffusion extension uses 2560×1280 direct [no upscale step] generation with an SD 1.5-based model as one of its examples.)
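For what it's worth, diffusers ships a MultiDiffusion-style pipeline (StableDiffusionPanoramaPipeline) that does this kind of direct tiled denoising; a rough sketch of using it is below, with the model choice and resolution just being illustrative, not what the webui extension uses:

```python
# Rough sketch of direct tiled generation (MultiDiffusion) via diffusers'
# StableDiffusionPanoramaPipeline; model id and resolution are illustrative.
import torch
from diffusers import StableDiffusionPanoramaPipeline, DDIMScheduler

model_id = "stabilityai/stable-diffusion-2-base"
scheduler = DDIMScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPanoramaPipeline.from_pretrained(
    model_id, scheduler=scheduler, torch_dtype=torch.float16
).to("cuda")

# The pipeline denoises overlapping latent windows and averages them at each
# step, so the canvas can be much wider than the model's native 512/768.
image = pipe("a panoramic photo of a mountain range at sunset",
             height=512, width=2048).images[0]
image.save("panorama.png")
```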
Upscaling the image in chunks creates loads of semantic issues. For example, the bottom of a tree might look like it's far away in the mountains while its top is right near you. You don't see problems like these in non-upscaled images.
> Upscaling the image in chunks creates loads of semantic issues.
No, tiled upscaling generally doesn't have that problem to any significant degree (no more than direct generation at a natively supported size, which doesn't completely avoid that kind of issue either), since the composition at that level is already set before the upscale. Direct tiled generation does have it, if you aren't using something like ControlNet to prevent it.
> You don't see problems like these in non-upscaled images.
You actually occasionally do, but it's fairly rare.
Each tile is conditioned on the lowres input, so if that input doesn't have semantic discontinuities, they don't appear. They will eventually creep in if you keep doing this indefinitely, but with a reasonable image-size-to-tile ratio (say under 6x) it works well. With manual or object-detection-assisted tiling and proper conditioning (a ControlNet side channel, especially a custom-trained ControlNet/T2I-Adapter), it can be pushed further.
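For concreteness, here's a minimal sketch of that upscale-then-refine flow using diffusers' img2img pipeline; the model id, tile/overlap/strength values, and file names are illustrative assumptions, and real extensions blend the overlapping regions rather than hard-pasting tiles:

```python
# Minimal sketch (not any extension's actual implementation) of tiled upscaling:
# upscale the whole image first, then run low-strength img2img on each tile so
# every tile stays anchored to the same global composition.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def tiled_upscale(lowres: Image.Image, prompt: str, scale: int = 2,
                  tile: int = 512, overlap: int = 64) -> Image.Image:
    # A plain resize sets the composition; diffusion only refines detail per tile.
    base = lowres.resize((lowres.width * scale, lowres.height * scale), Image.LANCZOS)
    out = base.copy()
    step = tile - overlap
    # Clamp the last row/column so every crop is exactly tile x tile
    # (assumes the upscaled image is at least `tile` pixels in each dimension).
    xs = sorted(set(list(range(0, base.width - tile, step)) + [base.width - tile]))
    ys = sorted(set(list(range(0, base.height - tile, step)) + [base.height - tile]))
    for y in ys:
        for x in xs:
            crop = base.crop((x, y, x + tile, y + tile))
            # Low strength keeps each tile close to its lowres content, which is
            # what prevents the per-tile semantic drift described above.
            refined = pipe(prompt=prompt, image=crop, strength=0.3,
                           guidance_scale=7.0).images[0]
            out.paste(refined, (x, y))  # real extensions feather/blend the overlaps
    return out

hires = tiled_upscale(Image.open("gen_512.png").convert("RGB"), "a mountain landscape")
hires.save("gen_1024_refined.png")
```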