Introducing NovelAI Diffusion V4 Full

Anlatan
6 min read · Feb 28, 2025

The next generation Anime & Furry AI image generation model

Today, we’re announcing our brand new state of the art AI image model, NovelAI Diffusion V4!

NOTICE:
To enjoy our latest image generation model, NovelAI Diffusion V4 Full, you are required to switch your subscription to our new payment processor.

Rest assured, you will not be charged right now, and your billing schedule will remain unchanged. Your next payment will occur on your original subscription renewal date.

Users who do not manually update their billing info to switch will eventually be switched to our new payment processor at a later date automatically. However, for users who manually update their billing information now, we will credit bonus Anlas as a Thank You.

You will be awarded 5000 Anlas if you use a credit card, and 2000 Anlas if you use PayPal.

Update your billing information here.

Since the release of V4 Curated at the end of 2024, our team has been working hard to get the full model ready, and here it is!

NovelAI Diffusion V4 is our most advanced image generation model yet. We’ve enhanced this model to generate higher quality images with greater precision, offering users even finer control over their creations. V4 not only includes a more comprehensive, albeit less curated, dataset, but that dataset also extends one month further than V4 Curated’s.

To accomplish this, we built V4 from the ground up and added several brand new key features.

This model is designed to fully support our new features while delivering significant improvements in both fidelity and detail, and has been trained with roughly 230,000 hours of H100 compute.

Even though NovelAI Diffusion V4 uses more computing power than V3 overall, it is more powerful and still runs roughly as fast.

Let’s review the new features that make V4 such a significant improvement:

Natural Language Prompting

Natural language prompting is a new feature we’re particularly excited about. We made it a priority to support natural language as a first-class feature, allowing users to describe entire scenes in plain English. For our experienced users, this opens up new possibilities by allowing them to combine regular text descriptions with tags to achieve effects that weren’t possible before. To support this greatly enhanced text understanding, we extended the prompt context size to 512 tokens and replaced our CLIP text encoder with the T5 text encoder, giving you all the space you need to elaborate on your vision and resulting in stronger prompt adherence.
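As a quick illustration (this example prompt is ours, not from the announcement), a V4 prompt can carry the scene in plain English and still lean on familiar tags for specifics:

A girl with long silver hair sits by a rain-streaked window at night, reading a book under warm lamplight. 1girl, indoors, rain, night, from side

The sentence describes the overall scene, while the trailing tags pin down details you want the model to respect.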

Multi-Character Prompting

We’ve also added multi-character prompting, addressing a major limitation of V3 and other models. V4 lets you specify prompts for each character separately, supporting up to 6 distinct characters in a single image, without any unwanted blending between them.

Save the same characters in an image, then pop them into a new prompt via drag and drop.
Get the same consistent visuals, easy!

Character Positioning

When prompting for multiple characters, you can use our new character positioning feature for even more control over the image composition. Previous models would normally place characters automatically, but you can now override that placement and choose where you want each character to be positioned. This can help the AI follow your intended composition much more easily!

Action Tags

That’s not all for multiple characters: Action Tags bring you a new level of control over character interactions. They allow you to specify which character performs what action, and on which character, giving you precise control over character relationships in your scenes.

Here’s how to use them: add action tags to your character prompts (a combined example follows the list below):

source#(active side): Specifies the character initiating an action

  • Example: source#hug → The character performs the hugging action

target#(passive side): Specifies the character receiving an action

  • Example: target#hug → The character gets hugged

mutual#(mutual action): Used when characters perform the same action together

  • Example: mutual#hug → The specified characters hug each other
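Putting these together, a hypothetical two-character setup (our own example, not taken from the announcement) could look like this:

Character 1 prompt: 1boy, short black hair, source#hug
Character 2 prompt: 1girl, long brown hair, target#hug

Here the first character initiates the hug and the second receives it. Using mutual#hug in both character prompts instead would have them hug each other.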

Focused Inpainting

Focused Inpainting is a new feature coming to all our image generation models. A new button next to the brush and eraser tools lets you select a region of your image with a rectangular selection. You can then either paint your inpainting mask as usual inside this selection box or leave it empty, in which case the entire content of the box will be inpainted.

The important bit is that, before inpainting, this region gets upscaled to approximately one megapixel (the selection size itself is limited to a smaller maximum). This allows the inpainting process to add more detail at a higher resolution. Additionally, if you are an Opus subscriber, it lets you inpaint regions of large images at zero Anlas cost.

A slider, positioned where the brush size slider sits for the regular inpainting brush, adjusts the “context” around the selected area: this context is not inpainted itself, but lets the model see the surroundings of your selected region. It is marked with the translucent red border inside the selection box.

This may sound a little complicated, but in practice, it is quite easy to use.
If you would like to improve the face details of a character in your image, you just press the selection tool button and put the selection rectangle around the character’s face. Generate, and the level of detail should improve.

Improved Image Clarity

Image fidelity has seen substantial improvements. Besides our new model design, we switched from SDXL’s VAE to the Flux VAE, which makes images clearer. We’ve also given more computing power to the parts of the U-Net that focus on fine details, making your images even sharper.

Text Rendering

Another major addition to V4 is text rendering, made possible by our improved U-Net architecture and the switch to a T5 text encoder. Generate images with text in speech bubbles, or anywhere else you want. You can also use natural language to specify style and placement. While this feature is not perfect yet, it is already very useful, and we aim to improve it in future releases.

Updated Quality Tags & Undesired Content Presets

Due to some updates made to the dataset, we have also updated the default Quality Tags and the Undesired Content presets for the V4 Full model. The main thing of note is the new “no text” tag that is included in the default Quality Tags. This tag works to keep generations clean of most undesired kinds of text while generally not affecting text that you prompt for. In some cases, it can suppress desired outputs like sound effect text elements. If you want to enjoy a nice smattering of “sound effects” in your image, you can turn off the Quality Tags option and instead manually add “, best quality, very aesthetic, absurdres” to the end of your base prompt.
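For example (an illustrative prompt of ours), if your base prompt is “1girl, city street, explosion, sound effects text”, you would turn off Quality Tags and write it as “1girl, city street, explosion, sound effects text, best quality, very aesthetic, absurdres”, keeping the quality boost while leaving out the “no text” suppression.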

Built-in Furry Model

If you’re a fan of furry and kemono art styles, you can now begin your prompts with the “fur dataset,” tag and then prompt away as you did with our previous Furry models!
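For example (our illustrative prompt, not an official recommendation), something like “fur dataset, male, anthro, wolf, forest, night” starts with the trigger tag and then continues with the kind of tags you would have used with our previous Furry models.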

Excluded at launch
Vibe Transfer is expected to be added in the very near future.

These features mark a major leap in AI image generation. NovelAI offers better quality, smarter understanding, and unmatched control, making it easier than ever to bring your ideas to life.

Log in now to unlock the power of NovelAI Diffusion V4. Your next masterpiece is just a prompt away!
