NovelAI Aspect Ratio Bucketing Source Code Release (MIT Licensed)

Anlatan
2 min read · Nov 2, 2022

Training with aspect ratio bucketing can greatly improve the quality of generated images (and we personally don’t want another base model trained with center crops), so we have decided to release the NovelAI bucketing code under a permissive MIT license.

https://github.com/NovelAI/novelai-aspect-ratio-bucketing

The repository provides an implementation of aspect ratio bucketing for training generative image models as described in our previous blogpost:

Aspect Ratio Bucketing

One common issue with existing image generation models is that they are prone to producing images with unnatural crops, because these models are trained to produce square images. Most photos and artworks, however, are not square. The model can only process images of a single size at a time, and during training it is common practice to operate on many training samples at once to use the GPUs efficiently. As a compromise, square images are chosen, and during training only the center of each image is cropped out and shown to the image generation model as a training example.
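To make the compromise concrete, here is a minimal sketch of the square center-crop step described above. The function name and return convention are illustrative, not taken from any particular library:

```python
def center_crop_box(width, height):
    """Compute the square center-crop box (left, top, right, bottom)
    for an image of the given size. Illustrative helper showing the
    common square-crop preprocessing used in training pipelines."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)

# A 1920x1080 photo loses 840 columns of pixels: a sword tip or a
# crown near the image edges simply never reaches the model.
print(center_crop_box(1920, 1080))  # → (420, 0, 1500, 1080)
```

Everything outside the returned box is discarded, which is exactly the darkened region in the figure above.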

Knight wearing a crown with darkened regions removed by the center crop

For example, humans are often generated without feet or heads, and swords consist only of a blade, with the hilt and point cropped outside the frame.
As we are creating an image generation model to accompany our storytelling experience, it is important that our model can produce proper, uncropped characters; generated knights should not be holding a metallic-looking straight line extending to infinity.

Another issue with training on cropped images is that it can lead to a mismatch between the text and the image.

For example, an image with a `crown` tag will often no longer contain a crown after a center crop is applied and the monarch has thereby been decapitated.

We found that using random crops instead of center crops only slightly improves these issues.

Using Stable Diffusion with variable image sizes is possible, although going too far beyond the native resolution of 512x512 tends to introduce repeated image elements, and very low resolutions produce indiscernible images.

Still, this indicated to us that training the model on variable-sized images should be possible. Training on single, variable-sized samples would be trivial, but also extremely slow and more prone to training instability due to the lack of regularization provided by mini-batches.

We hope to see many non-cropped images in the future!


