The Magic behind NovelAIDiffusion

Oct 19, 2022

We’ve been quite busy watching NovelAI spread across social media, and we’re honestly surprised by the interesting theories people have come up with about how NovelAIDiffusion works. No, we do not feed the enslaved AI in the basement 1 bite a day for its hard work…

The magic powering our Diffusion models is challenging to explain, so let’s simplify it for this post.

First off, the AI doesn’t have a database of images.
It has no image files to pull from or piece together.
Our AI is not a human-written algorithm that copies and mixes already existing images.
It does not photobash, it does not work like a collage, and it does not match up existing visual data.

With the help of a deep learning algorithm, our artificial intelligence generates original images from scratch.
The AI has essentially learned how to create images, much as a person would: by studying examples and building its knowledge into a latent mathematical space.

Imagine the AI’s knowledge as a web in a latent mathematical space.

The AI compresses its knowledge by generalizing common styles and subjects into an abstract form of understanding. Once prompted with text, the diffusion model navigates this learned knowledge and, over a series of steps, forms an image out of nothing but noise.
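To make the idea of forming an image from noise over a series of steps a little more concrete, here is a toy sketch in Python. It is not the actual NovelAI sampler and not how Stable Diffusion is implemented: the names `toy_denoiser` and `generate`, the eight-value "image", and the one-line rule standing in for the learned network are all assumptions made up for illustration. The only point is the shape of the loop: start with random noise, repeatedly ask a denoiser which part looks like noise, and remove a little of it each step.

```python
import random


def toy_denoiser(x, prompt_value):
    """Hypothetical stand-in for the learned network.

    A real model estimates the noise using its learned weights. Here the
    "knowledge" is a single rule: every pixel should equal prompt_value.
    """
    return [pixel - prompt_value for pixel in x]


def generate(prompt_value, num_pixels=8, steps=20, seed=0):
    """Turn random noise into an "image" by removing noise step by step."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(num_pixels)]  # start: pure noise
    for step in range(steps):
        predicted_noise = toy_denoiser(x, prompt_value)
        remaining = steps - step
        # Remove an equal share of the estimated noise at every step.
        x = [p - n / remaining for p, n in zip(x, predicted_noise)]
    return x


if __name__ == "__main__":
    print("noise :", [round(p, 2) for p in generate(0.5, steps=0)])
    print("result:", [round(p, 2) for p in generate(0.5)])
```

Nothing resembling a stored picture is consulted anywhere in this loop. The result comes only from the starting noise and the denoiser's rule; in a real model, that rule is replaced by learned weights.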

Img2Img or the Upload Image Function

While the model cannot refer to images within its training data, you can give it your own art as a reference that it should modify according to your prompt.

This is called “Image to Image” and can be a powerful tool for artists to gain inspiration and see how their works could turn out when taken in different directions. When passing the model an image, you still provide a prompt to direct the generation, and you also specify how strongly the model may modify your image.
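As a rough sketch of what such a strength setting can mean in practice: some open-source Stable Diffusion img2img pipelines turn the strength value into a number of denoising steps. The uploaded image is noised part of the way, and only the remaining steps are run. The function below illustrates that convention. The name `img2img_schedule` and its parameters are hypothetical, and this post does not state that the NovelAI slider works exactly this way.

```python
def img2img_schedule(total_steps, strength):
    """Return (steps_skipped, steps_run) for a strength between 0 and 1.

    Assumed convention: strength 0 leaves the image untouched, while
    strength 1 runs the full schedule, as if starting from pure noise.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    steps_run = min(int(total_steps * strength), total_steps)
    steps_skipped = total_steps - steps_run
    return steps_skipped, steps_run


if __name__ == "__main__":
    for strength in (0.0, 0.25, 0.5, 1.0):
        skipped, run = img2img_schedule(40, strength)
        print(f"strength={strength}: skip {skipped} steps, run {run} steps")
```

Under this convention, a low strength leaves few steps in which the picture can change, so the output stays close to the upload. A high strength gives the prompt more room to take over.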

Here is an example:

Combine an uploaded image with a text prompt and adjust the slider to give the AI freedom to change the initial image.

Model Details

Our model is based on the original Stable Diffusion model, which was trained on about two billion images from the LAION dataset (~150TB).
The finetuning dataset we used consisted of about 5.3 million images (~6TB) with very detailed text tagging data.

**Facts:** The model is a 1.6GB binary file with no access to any of its training data.
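A quick back-of-the-envelope calculation with the figures above shows why a file of this size cannot be holding its training images. The sketch assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and treats the approximate figures quoted in this post as exact. The variable names are only for illustration.

```python
# Approximate figures quoted in this post, in decimal units (an assumption).
MODEL_BYTES = 1.6e9              # ~1.6GB model file
BASE_IMAGES = 2e9                # ~two billion LAION images
BASE_DATASET_BYTES = 150e12      # ~150TB
FINETUNE_IMAGES = 5.3e6          # ~5.3 million finetuning images
FINETUNE_DATASET_BYTES = 6e12    # ~6TB

# Average size of one training image in each dataset.
avg_base_image_bytes = BASE_DATASET_BYTES / BASE_IMAGES
avg_finetune_image_bytes = FINETUNE_DATASET_BYTES / FINETUNE_IMAGES

# Model size spread evenly over every image seen in training.
model_bytes_per_image = MODEL_BYTES / (BASE_IMAGES + FINETUNE_IMAGES)

if __name__ == "__main__":
    print(f"average base image:      {avg_base_image_bytes / 1e3:.0f} KB")
    print(f"average finetune image:  {avg_finetune_image_bytes / 1e6:.2f} MB")
    print(f"model bytes per image:   {model_bytes_per_image:.2f}")
```

That works out to less than one byte of model per training image, against an average of roughly 75 KB per image in the base dataset.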

We found that tag-based labeling allowed for much more versatile control over outputs. NovelAI Diffusion builds upon the anime knowledge already existing within Stable Diffusion.

During the finetuning process, these images and their corresponding tags are shown to the model to let it learn. The model itself is small, roughly 1.6GB, and can produce images completely on its own without referring to any outside data. This size does not change during the whole training process.
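The fixed size follows from how training works in general: learning adjusts the values of a fixed set of numbers (the weights) rather than appending data to the model. The toy below uses a two-weight linear model as a stand-in, which is an assumption for illustration and has nothing to do with the actual architecture. `make_samples` and `train` are hypothetical names. However many examples the toy model sees, it still has exactly two weights.

```python
import random


def make_samples(count, seed=0):
    """Create (x, y) training pairs that follow the rule y = 3x + 1."""
    rng = random.Random(seed)
    samples = []
    for _ in range(count):
        x = rng.uniform(-1.0, 1.0)
        samples.append((x, 3.0 * x + 1.0))
    return samples


def train(weights, samples, learning_rate=0.1):
    """Plain stochastic gradient descent on a model y = w * x + b."""
    w, b = weights
    for x, y in samples:
        error = (w * x + b) - y
        # Nudge the existing weights; nothing is added to the model.
        w -= learning_rate * error * x
        b -= learning_rate * error
    return [w, b]


if __name__ == "__main__":
    for count in (10, 1000, 5000):
        trained = train([0.0, 0.0], make_samples(count))
        print(f"{count} samples seen -> {len(trained)} weights: {trained}")
```

Seeing more examples makes the two weights more accurate, but none of the samples end up stored inside the model.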

We trained our models over the course of three months. Over many training checkpoints, we were able to see how the AI was learning, correct issues, and make adjustments.

To train these models, we used compute nodes with 8x A100 80GB SXM4 cards linked together via NVSwitch, and 1TB of system RAM. Over the course of the past few months, we utilized several of these nodes to research and develop the models, eventually refining them to what they are now.

We hope this gives an insight into things and clears up some of the mysteries and misunderstandings surrounding AI Image Generation.

Feel free to let us know if you still have questions, and we might even force one of our developers to create something more technical!

Please use our tool responsibly,

The NovelAI team.

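Japanese Version (translated into English):

Let’s talk about the magic behind how NovelAIDiffusion works. We have been busy watching NovelAI spread across social media and elsewhere, and we are honestly surprised by the interesting theories everyone has come up with about how our AI image generation tool works.
For example... we do not feed the AI enslaved in our basement only 1 bite of food a day.

https://twitter.com/rim4746/status/1579707147331723266

The “magic” that powers diffusion models is hard to explain, so let’s simplify it for this post.

To begin with, the AI has no database of images.
There are no image files for it to pull from a database or stitch together. Our AI is not a human-written algorithm that copies and mixes existing images. It does not photobash. It does not collage existing visual data or piece it together to match.

With the help of deep learning algorithms, our artificial intelligence creates original images from scratch. The AI has essentially learned how to create images, just as a person would.

Our artificial intelligence generates original images from scratch using these deep learning algorithms, by studying and building its knowledge into a latent mathematical space.

The AI compresses its knowledge by generalizing common styles and subjects into an abstract form of understanding. When prompted with text, the diffusion model takes the results of navigating through that knowledge and forms an image from nothing but noise over processing steps and time.

Img2Img or the Upload Image Function:

The model cannot refer to images in its training data, but you can give it your own illustration as a reference to modify according to your prompt.

This is called “Image to Image” and is a powerful tool for artists to see how their work might turn out when taken in various directions, and to gain inspiration.

When passing an image to the model, you need to provide a prompt that directs the generation and to specify how much the model may change the image.

Here is an example:

Model Details:

Our model is based on the original Stable Diffusion model, which was trained on about two billion images from the LAION dataset (~150TB). The finetuning dataset we used consisted of about 5.3 million images (~6TB) with very detailed text tagging data.

We found that tag-based labeling allows more flexible control over outputs. NovelAI Diffusion builds on the anime knowledge that already exists within #StableDiffusion.

Finetuning process: images and their corresponding tags are shown to the model so that it can learn. The model itself is small, about 1.6GB, and generates images entirely on its own without referring to any external data. This size stays the same throughout the entire training process.

We trained the models over three months. Across many training checkpoints, we were able to see how the AI was learning, fix problems, and make adjustments.

To train these models, we used compute nodes equipped with 8x A100 80GB SXM4 cards linked via NVSwitch and 1TB of system RAM. Over the past few months, we used several of these nodes to research and develop the models, refining them into what they are now.

We hope this offers some insight into AI image generation and clears up the mysteries and misunderstandings around it.
If you have any questions, please feel free to ask. We might even force one of our developers to create something more technical.

Please use our tool responsibly.
The NovelAI team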

Written by Anlatan
