A New Model — Clio is Coming to Opus!

May 23, 2023

Finally, the time is upon us!
Our first model, trained from scratch on our own custom datasets totaling more than 1.5 trillion tokens (~6TB of text), is live for Opus subscribers right now. Clio, our new model, has only 3 billion parameters, but she knows how to use them, performing well above Krake’s level. The small size also means that Clio generations are blazing fast. Oh, and she offers a context size of 8192 tokens as well, which is 4x as much as our previous models!

**Fresh out of the Shoggy (H100 Supercompute Cluster) oven:** Clio is the muse of history, the proclaimer and celebrator of great deeds and accomplishments! She packs a powerful punch above her size, like all NovelAI models!

Originally, Clio was intended only as our first proof-of-concept model. A small model lets us experiment easily and work out all the little issues that invariably pop up in a large endeavor, such as creating a large language model from scratch, without costing us too much time and compute. But Clio surprised us by performing even better than we expected, hitting 73% LAMBADA accuracy during the pretraining phase and going above 74% during the finetuning process. Thanks to the improved tokenizer, Clio is also quite proficient at Japanese, even though the amount of Japanese text in the training data was very small.

We decided to let our Opus subscribers play with Clio first, as an Experimental Feature release, while we iron out any last wrinkles.
In two weeks, we plan to release Clio for all subscription tiers.

Having completed the proof-of-concept model successfully, we will begin scaling up to larger models.

Now let’s get down to details:

For those who are interested in more technical details, we have prepared some evaluation results.

First, here is a table showing how our pretrained base model, NovelAI-LM-3B-508k, performs on the most commonly used evaluation metrics in comparison to other models of similar and bigger sizes. As can be seen, it outperforms almost all free and open source models up to and including GPT-NeoX 20B. However, LLaMa-7B, which was pretrained on a similar amount of tokens and is more than twice the size, still outperforms it.

For reference, we also provide the evaluation results of the GPT-3 Curie and Davinci API models as determined by EleutherAI.

NovelAI LM 3B Comparison Sheet — Zero shot evaluations of NovelAI-LM-3B-508k compared to other models. (Higher is better.)

In addition to these evaluation results, we also continuously monitored the model’s LAMBADA accuracy during training. You can see the results in the two figures below. We trained the model at a batch size of 2.85M tokens. As can be seen from these plots, the model had already achieved a LAMBADA accuracy of 67.9% after 285B tokens seen (step 100k), which is both higher and earlier than any other 2.7B to 3B model. At 855B tokens seen (step 300k), it reached 70.66% accuracy, outmatching the RedPajama 7B model’s accuracy at 800B tokens seen while being less than half its size.
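The token counts quoted above follow directly from the step count and the fixed batch size. Here is a small sketch of that arithmetic; the function name and constant are our own, only the numbers come from this post.

```python
# Batch size stated in this post: 2.85M tokens per training step.
BATCH_TOKENS = 2_850_000


def tokens_seen(step: int, batch_tokens: int = BATCH_TOKENS) -> int:
    """Total tokens processed after `step` steps at a fixed batch size."""
    return step * batch_tokens


# The two checkpoints mentioned in the text: step 100k and step 300k.
for step in (100_000, 300_000):
    print(f"step {step:,}: {tokens_seen(step) / 1e9:.0f}B tokens seen")
```

Step 100k works out to 285B tokens and step 300k to 855B tokens, matching the figures in the paragraph above.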

These plots also show one of the difficulties we encountered while training. Since the hardware is still very new, we had to deal with some instabilities during training, where the process crashed or got stuck. When this happened, we had to resume the run from the last checkpoint. Because of this, these evaluation charts show multiple runs in different colors.
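For readers unfamiliar with checkpointing, here is a toy sketch of the resume pattern described above. It is not our training code: the JSON checkpoint file, the `train` function, and the `save_every` and `crash_at` parameters are all hypothetical stand-ins, and a real run would store model and optimizer state rather than just a step counter.

```python
import json
import os
import tempfile


def train(total_steps, ckpt_path, save_every=10, crash_at=None):
    """Toy loop: resume from the last checkpoint if one exists, save periodically."""
    step = 0
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            step = json.load(f)["step"]  # pick up where the last save left off
    start = step
    while step < total_steps:
        if crash_at is not None and step == crash_at:
            raise RuntimeError(f"simulated crash at step {step}")
        step += 1  # stand-in for one real training step
        if step % save_every == 0:
            with open(ckpt_path, "w") as f:
                json.dump({"step": step}, f)
    return start, step


ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
try:
    train(100, ckpt, crash_at=37)  # first run dies partway through
except RuntimeError:
    pass
resumed_from, final_step = train(100, ckpt)  # second run resumes and finishes
print(resumed_from, final_step)
```

In this toy, the second run starts from the last saved step rather than from the crash point, so the steps in between are redone. Each restart is a separate run, which is why a chart of such a training process shows several differently colored segments.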

LAMBADA OpenAI accuracy (Higher is better.)

LAMBADA OpenAI perplexity (Lower is better.)
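As a rough guide to reading the two charts: LAMBADA asks the model to predict the final word of a passage. Accuracy is the fraction of passages where the model's top prediction is the correct word, and perplexity is the exponential of the average negative log-likelihood of the correct word. The sketch below assumes that common definition; the function name and the toy numbers are made up for illustration and are not our evaluation code or results.

```python
import math


def lambada_metrics(results):
    """results: list of (log-prob of the target word, whether the top prediction matched)."""
    n = len(results)
    accuracy = sum(1 for _, hit in results if hit) / n
    mean_nll = -sum(logprob for logprob, _ in results) / n
    return accuracy, math.exp(mean_nll)


# Made-up per-passage values, for illustration only.
toy = [
    (math.log(0.9), True),
    (math.log(0.5), True),
    (math.log(0.1), False),
    (math.log(0.6), True),
]
acc, ppl = lambada_metrics(toy)
print(f"accuracy={acc:.2f}, perplexity={ppl:.2f}")
```

A model that puts more probability on the correct final word gets a lower perplexity even when its accuracy is unchanged, which is why the two curves are tracked separately.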

We will be shipping Clio with a variety of Default Preset choices.
Please keep in mind that, as a proof-of-concept model, Clio is aimed primarily at storytelling. With some preset choices and adjustments, she is also adept at Text Adventure, Chat, Lists and other formats. Check out our Default Preset descriptions and try them out depending on what you need!

Official Modules V2 for Euterpe and Sigurd

To tide over our non-Opus subscribers’ hunger for text generation news until Clio comes to all tiers in two weeks, we are also updating all existing default modules and releasing an extended set of official modules for our Euterpe and Sigurd models. They are powered up too, based on our own internally developed modules V2 technology.

A number of new modules are joining our previous lineup. Make sure to check out the full list of changes below.

Last but by no means least, we are finally releasing our completely revamped version of the Text Adventure module, which works significantly better than the previous version.

We would also like to take this opportunity to talk a little about the delay regarding the release of modules V2. While developing and training our official modules, we noticed that this type of module is not only more powerful than modules V1, but also requires a fair bit more babysitting and adjustment to get truly good results. Our Krake model has unfortunately proven himself to be quite stubborn in refusing to work with our new technology. These are also the reasons why custom user modules V2 are not available. They would require a lot of adjusting of training parameters and most likely multiple training runs to finally end up with a module that works well, which would be both a frustrating and expensive experience for our users.

Module notes:

• Fixed the Text Adventure module.

• Added several shortcuts to Text Adventure, such as “> l” for “> You look around”, “> x (target)” for “> You examine (target)”, etc.

• Overhauled module preambles, improving the overall efficiency.

• Renamed “Theme: Airships” to “Theme: Steampunk”, and balanced it more towards the latter.

• Balanced contents for all modules.
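To picture the Text Adventure shortcuts listed above, here is a purely illustrative sketch of the mapping from short form to long form. It covers only the two shortcuts named in the notes, the `expand_shortcut` function is hypothetical, and it says nothing about how the module itself handles shortcuts internally.

```python
def expand_shortcut(line: str) -> str:
    """Expand a Text Adventure shortcut into its long form (illustration only)."""
    if not line.startswith("> "):
        return line
    command, _, target = line[2:].partition(" ")
    if command == "l" and not target:
        return "> You look around"
    if command == "x" and target:
        return f"> You examine {target}"
    return line  # anything else is left untouched


print(expand_shortcut("> l"))
print(expand_shortcut("> x rusty key"))
```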

All default modules for Sigurd and Euterpe have been updated to V2.

The full set of newly added modules is:

• General: Prose Augmenter (alternative general module which focuses on verbose output, for users who are struggling with short output and scenes that are over too quickly)

• Style: Jane Austen (the first female author module)

• Style: William Shakespeare (outputs script style)

• Theme: Ancient China

• Theme: Ancient Greece

• Theme: Ancient India

• Theme: Anthropomorphic Animals

• Theme: Contemporary (no-fantasy modern module, for slice of life etc.)

• Theme: Gaming (video game culture)

• Theme: Golden Age Scifi (1940s-1960s low tech scifi adventures)

• Theme: Hard SF (scifi that focuses on realism)

• Theme: Light Novels (ultimate light novel crossover module)

• Theme: Music (bands, touring, etc.)

• Theme: Nature (the great outdoors, hiking, mountaineering, etc. — usually solitary, low dialogue to differentiate this from Theme: Travel)

• Theme: Noir (film noir in prose)

• Theme: Weird West (western/splatterpunk with fantasy elements)

Thank you for your Support

On behalf of the entire NovelAI team, thank you guys so much for staying patient and keeping us on our toes. Without your support, none of this would be possible!

We look forward to hearing your feedback on Clio: her performance for her small size, her presets, and your general impressions!

We have even more cool things coming this year, and we hope you’re as excited as we are.

Japanese Clio Release Translation:
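Announcement regarding Clio:

A new model, Clio, is here!

Finally, the time has come!

Built on our own custom datasets, our first original AI model, Clio, is here!

This model was trained on custom datasets made up of 1.5 trillion tokens (about 6 terabytes) of text and stories, and subscribers to the Opus plan can use it starting right now!

Our new model Clio has only 3 billion parameters, but even with so few parameters she efficiently outperforms our existing model Krake.

A small model means lighter processing, and generation is indeed blazing fast. On top of that, she offers a context size of 8192 tokens, 4x that of our existing models!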

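Originally, Clio was intended to be our first experimental model.

We figured that developing a small model would let us easily experiment with and resolve the little issues that inevitably come up in a large endeavor, such as creating a large language model from scratch, without spending much time or compute.

However, Clio performed beyond our expectations. She reached 73% LAMBADA accuracy during the pretraining phase and went above 74% during the finetuning process. The improved tokenizer makes it possible to use limited training data efficiently, so Clio is also good at handling Japanese even though the amount of Japanese text in the training data was very small.

This time, we are letting Opus subscribers play with Clio first as an experimental feature release, and in two weeks we plan to release Clio for all subscription tiers.

Once testing of this experimental model concludes successfully, we intend to move on to developing large language models.

Fresh out of Shoggy, our H100 supercomputer cluster, Clio is a muse, the goddess who proclaims and celebrates the accomplishment of great deeds!

Like other NovelAI models, she packs a powerful punch above her size!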

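Details:
For those interested in more technical details, we have prepared some evaluation results.

First, here is a table comparing the performance of our pretrained base model, NovelAI-LM-3B-508k, on commonly used evaluation metrics against other models of similar or larger size. As you can see, it outperforms almost all free and open source models up to GPT-NeoX 20B.

However, LLaMa-7B, which was pretrained on a similar amount of tokens and is more than twice the size, still outperforms NovelAI-LM-3B-508k.

For reference, we also include the evaluation results of the GPT-3 Curie and Davinci API models as determined by EleutherAI.

Zero-shot evaluations of NovelAI-LM-3B-508k compared to other models. (Higher is better.)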

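In addition to these evaluation results, we continuously monitored the model’s LAMBADA accuracy during training. The results are shown in the two figures below. We trained the model at a batch size of 2.85M tokens.

These graphs show that the model had already reached a LAMBADA accuracy of 67.9% after 285B tokens seen (step 100k), which is higher and earlier than any other 2.7B to 3B model.

At 855B tokens seen (step 300k), it reached 70.66% accuracy, surpassing the RedPajama 7B model’s accuracy at 800B tokens seen while being less than half its size.

However, these graphs also show a problem we ran into during training.

Note the different colors of the lines in these charts. They are evidence of how we dealt with an unstable training environment.

Because the hardware we use is still very new, we repeatedly experienced the training process crashing or getting stuck. When that happened, we had to resume the run from the last checkpoint. These evaluation charts show the multiple runs in different colors.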