The new Multiverse: Mixtral with an x
Mixtral rules, truly!!!
As we get closer and closer to SuperIntelligence, everybody involved gets more stressed and more anxious, and we realize the stakes are getting higher and higher.
👋🏾 Hello Little Coders!
This week has been insane so far, and it truly is: a flurry of open model releases, new startups, and a lot more happening! 👇🏾
Rumour or no???

If this GPT 4.5 screenshot is real, then the costs are absurd. Perhaps OpenAI has enough trust in their models, or they know for sure that no one else can provide such a high-quality model (👀 looking at you, Google)!
Perhaps they'll actually launch the model after I go to sleep, or someone's just feeding the rumour mill!
Impressive Mixtral + Mistral
If you are like me, you might be confused by the x in Mixtral. Because this is a MiXture of eXperts model, I guess they decided to name it Mixtral (it comes from the company Mistral AI).
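To make the name a little more concrete, here is a toy top-2 mixture-of-experts layer in PyTorch: a small router picks a couple of expert feed-forward networks per token and mixes their outputs. This is just a sketch of the concept, not Mixtral's actual implementation.

```python
# Toy mixture-of-experts layer: a router scores experts per token and sends
# each token to its top-2 experts. Conceptual only; NOT Mixtral's real code.
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)   # one score per expert, per token
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(n_experts)]
        )
        self.top_k = top_k

    def forward(self, x):                          # x: (tokens, dim)
        weights, idx = self.router(x).softmax(-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):                # route each token to its k-th choice
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(ToyMoE()(tokens).shape)                      # torch.Size([10, 64])
```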
Mixtral 8×7B MoE comes in two variants:
1. Base Model
2. Instruct Model
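If you want to try it yourself, here's a minimal loading sketch with Hugging Face transformers. I'm assuming the model IDs below are the official releases, and note that the full-precision weights need a lot of memory (quantized builds need far less, more on that below).

```python
# Minimal sketch: load Mixtral Instruct (or the base model) with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # or "mistralai/Mixtral-8x7B-v0.1" for the base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

prompt = "[INST] Explain mixture-of-experts models in one sentence. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```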
In fact, it's so impressive in my tests that I started recommending it to everyone! If you haven't seen my model testing, here's the video.
Side Note: It's not just Mixtral that impressed me; Zephyr 7B Beta was also super impressive!
Quantized models have started showing up, thanks to TheBloke, but I'm personally not sure about their quality yet! If you happen to test them, drop me a message and educate me 🙂
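For the quantized route, something like this llama-cpp-python sketch should work. The GGUF file name below is just an example I haven't verified, so grab the exact file for the quantization level you want from the model card.

```python
# Sketch: run a quantized Mixtral GGUF file locally with llama-cpp-python.
# The file name is an illustrative example, not a verified download path.
from llama_cpp import Llama

llm = Llama(
    model_path="./mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf",  # example file name
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if they fit; 0 = CPU only
)
out = llm("[INST] Summarise what a mixture-of-experts model is. [/INST]", max_tokens=128)
print(out["choices"][0]["text"])
```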
Meanwhile Mixtral derivatives have started showing up:
Dolphin Mixtral (without benchmarks but with some solid datasets in the model fine-tuning)
Disco Mixtral with some nice benchmark scores
What does it take to fine-tune these? Dolphin’s creator wrote:
“It took 3 days to train 1.5 epochs on 4x A100s using qLoRA and Axolotl”
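For a rough idea of what a qLoRA setup looks like in code: you load the base model in 4-bit and attach small LoRA adapters, so only the adapters get trained. This is not the Dolphin/Axolotl recipe (Axolotl expresses similar knobs in a YAML config), and the target modules and hyperparameters below are illustrative assumptions only.

```python
# Minimal qLoRA sketch: 4-bit base model via bitsandbytes + LoRA adapters via peft.
# Hyperparameters and target modules are illustrative, not Dolphin's actual settings.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1", quantization_config=bnb, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections, as an example
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()   # only the small adapter weights are trainable
# ...hand `model` to your training loop (Trainer, Axolotl, etc.) from here.
```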
More and More New Models:
The way SOLAR-10.7B was built is new and innovative:
Built on the Llama2 architecture, SOLAR-10.7B incorporates the innovative Upstage Depth Up-Scaling. We then integrated Mistral 7B weights into the upscaled layers, and finally, continued pre-training for the entire model.
It claims to surpass the recent Mixtral 8X7B model, but I haven’t tested it yet.
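For intuition only, here is a rough sketch of what depth up-scaling could look like mechanically, going purely by the description quoted above: duplicate a Llama-style decoder stack (initialised here from Mistral 7B weights) and splice the two copies into a deeper model, which would then be continually pre-trained. The overlap size and everything else here are my assumptions, not Upstage's published recipe.

```python
# Rough conceptual sketch of depth up-scaling (my interpretation, not Upstage's code):
# splice two copies of the decoder stack together, dropping m overlapping blocks.
import copy
import torch.nn as nn
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
layers = base.model.layers                  # nn.ModuleList of decoder blocks
n, m = len(layers), 8                       # m = assumed number of overlapping blocks to drop

upper = [copy.deepcopy(block) for block in layers[: n - m]]  # copy 1 minus its last m blocks
lower = [copy.deepcopy(block) for block in layers[m:]]       # copy 2 minus its first m blocks
base.model.layers = nn.ModuleList(upper + lower)             # deeper, "up-scaled" stack
base.config.num_hidden_layers = len(base.model.layers)
print(base.config.num_hidden_layers)        # 2 * (n - m) layers
# ...continued pre-training of the whole up-scaled model would follow here.
```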
Phi-2 is a Transformer with 2.7 billion parameters. It was trained using the same data sources as Phi-1.5, augmented with a new data source that consists of various NLP synthetic texts and filtered websites (for safety and educational value).
They have shown some amazing benchmark scores but I’m not yet ready to believe them 🙂
Note: This model is licensed for research purposes only!
Stability AI Subscriptions
As ironic as it sounds, Stability AI has decided not to release their recent models under a commercial license, and has instead launched a new subscription/membership model if you want to use them commercially.
Am I a fan? Nope!
Do I want them to make money? Absolutely Yes!
Can they strike a balance between $$$ and Open+Commercial? I hope so!
APIs and Pricing war!
The hot news here is the price cuts, which you should take advantage of!
If you haven’t seen my recent rant on it, here’s the video.
This newsletter already sounds like an essay. Perhaps I'll stop here and see you in the next one, Sunday or Monday, with a lot more links!
Meanwhile, OpenAI has announced SuperAlignment Grants - make some money 🤑
What are you reading this week? Let me know!
Have feedback or an interesting project? Hit reply 🙂