
We’ve finally received our new inference hardware! As part of this upgrade, we’re currently migrating our operations to a brand-new compute cluster. You may have noticed some speed improvements already, and this change will improve server and network stability as well.
Since everything is finally coming together, it’s time to announce the release schedule for our upcoming 70-billion-parameter text generation model, Llama 3 Erato.
Built with Meta Llama 3: Erato
In order to add our special sauce, we continued pre-training the Llama 3 70B base model on hundreds of billions of tokens of training data, spending more compute than we did on even our previous text generation model, Kayra. As always, we then fine-tuned it on our high-quality literature dataset, making it our most powerful storytelling model yet.
Llama 3 Erato will be released for Opus users next week, so get ready; the wait is almost over!
Until then, we are busy migrating to the new cluster and switching our text generation models, Kayra and Clio, to a new inference stack that serves these unquantized models more efficiently. However, this stack does not play well with our Classifier Free Guidance (CFG) sampler, so we will need to say goodbye to CFG sampling.
To make up for this, we are releasing two new samplers, both of which will also be supported for Erato: Min P and Unified Sampling.
The popular Min P sampler sets a simple threshold at a fixed proportion of the top token’s probability and prunes any tokens that fall below it.
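For the technically curious, here is a minimal sketch of the idea in PyTorch. The function name `min_p_filter` and its signature are our own illustration, not the actual implementation:

```python
import torch

def min_p_filter(logits: torch.Tensor, min_p: float = 0.1) -> torch.Tensor:
    """Prune tokens whose probability falls below min_p times the top token's probability."""
    probs = torch.softmax(logits, dim=-1)
    threshold = min_p * probs.max(dim=-1, keepdim=True).values
    # Pruned tokens get -inf logits, so a later softmax renormalizes over the survivors.
    return logits.masked_fill(probs < threshold, float("-inf"))
```

With `min_p = 0.1`, for example, any token less than a tenth as likely as the most likely token is pruned before sampling.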
Unified Sampling is designed to replace your entire sampling chain, so you can use it on its own without any other samplers. It’s based on new research, so quality should be a slight improvement over your existing presets while being much simpler to configure.
To use it as intended, navigate to the Change Settings Order button, enable Unified and Randomness, and disable all the others; then set Randomness to 1. Unified has three parameters: Linear, Quad, and Conf. Increasing Linear prioritizes high-probability tokens, increasing Quad deletes the lowest-probability tokens, and increasing Conf prioritizes the highest-probability tokens, but only when the top token’s probability is itself small.
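If you like to peek under the hood, the sketch below shows one plausible way three such knobs could act on the log-probabilities. It is purely illustrative, reconstructed only from the behavior described above; the actual formula lives in our docs, and `unified_sample` is a hypothetical name:

```python
import torch

def unified_sample(logits: torch.Tensor, linear: float = 1.0,
                   quad: float = 0.0, conf: float = 0.0) -> torch.Tensor:
    """Illustrative only: one way Linear, Quad, and Conf could reshape the distribution."""
    logp = torch.log_softmax(logits, dim=-1)
    max_logp = logp.max(dim=-1, keepdim=True).values
    top_p = max_logp.exp()  # probability of the most likely token
    # Linear sharpens the distribution globally; Conf adds extra sharpening
    # only when the top token's probability is small (the model is uncertain).
    sharpened = (linear + conf * (1.0 - top_p)) * logp
    # Quad penalizes tokens quadratically by how far they trail the top token,
    # effectively deleting the lowest-probability ones.
    score = sharpened - quad * (logp - max_logp) ** 2
    return torch.multinomial(torch.softmax(score, dim=-1), num_samples=1)
```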
To learn more, head on over to our docs.
We’re excited to see the new Presets you create and to hear how they perform on your stories.
See you next week, when you’ll get to meet the newest addition to our text generation model roster!