Mistral AI's Paper Re: Mixtral of Experts

Jan 19, 2024

The new developments coming out of the AI space remind me of the early tech days. I was a kid, but I remember them through my access to bulletin boards and PC Magazine as a middle and high schooler.

Mistral is doing some innovative things, and they are doing them open source, which runs contrary to the other big players in an AI space that is effectively being subsumed into big tech (more to come re: Meta/Facebook). I think this will prove to be an Achilles heel in the medium term for big tech's AI businesses. More on that in later posts.

Source: https://arxiv.org/pdf/2401.04088.pdf

So what is Mistral.ai, and did I make a typo when I said Mixtral?

No. Mistral.ai is the OpenAI challenger based in Paris that is making some cutting-edge Large Language Models (LLMs), including Mixtral 8x7B. Mixtral belongs to a relatively new class of LLMs called Mixture of Experts (MoE) models, which route each input to a small number of specialized sub-networks ("experts") instead of running the whole model every time. In theory, these models can perform at the level of the granddaddy-sized OpenAI models while using far less compute per input. So the theory goes.

Why does all this matter?

I think OpenAI is yesterday’s news. Before you bite my head off, I only mean this in the sense that there is a broader ecosystem of open-source-focused computer scientists, technologists, entrepreneurs and academics globally. What is coming is an era of localized LLMs that live on your device and do not need to be powered by mega cloud computers.

To be clear, there will still be a place for large, resource-heavy LLMs, just as we still use supercomputers today and will use the quantum computers of the future that, “they say,” will crack all codes. However, for many everyday uses, normal humans will not need computing power that takes up data centers in Colorado.

I think we are heading toward a world where supercomputer-class capabilities can run on personal computers or smartphones and drastically augment our daily lives.

Figure 1: Mixture of Experts Layer. Each input vector is assigned to 2 of the 8 experts by a router.

What intrigues me about the Mixtral model, as discussed in the paper, is that the MoE structure only activates a few experts for each input, so the model does far less work per input than its total size suggests. I think lightweight, specialized MoE models that can run locally are the future.
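For the technically curious, here is how I picture the routing in Figure 1. This is a minimal Python sketch under my own assumptions, not Mistral's implementation: the toy experts are random linear maps (the real ones are feedforward blocks), and the names `moe_layer`, `gate` and `experts` are made up for illustration. The router scores all 8 experts, keeps the top 2, and blends their outputs using softmax weights computed over just those two scores, which is how the paper describes the layer.

```python
import math
import random


def softmax(scores):
    """Turn a list of scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def moe_layer(x, gate, experts, top_k=2):
    """Route x to the top_k highest-scoring experts and mix their outputs."""
    logits = matvec(gate, x)  # one router score per expert
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    chosen = ranked[:top_k]  # only these experts are run
    weights = softmax([logits[i] for i in chosen])
    out = [0.0] * len(x)
    for w, i in zip(weights, chosen):
        y = experts[i](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, chosen, weights


def random_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
DIM, N_EXPERTS = 4, 8  # toy sizes; 8 experts as in Figure 1

gate = random_matrix(N_EXPERTS, DIM)  # hypothetical router weights
expert_mats = [random_matrix(DIM, DIM) for _ in range(N_EXPERTS)]
# Toy experts: each one is just a random linear map (an assumption).
experts = [lambda v, m=m: matvec(m, v) for m in expert_mats]

x = [random.gauss(0, 1) for _ in range(DIM)]
output, chosen, weights = moe_layer(x, gate, experts)
print("experts used:", chosen)
print("mixing weights:", [round(w, 3) for w in weights])
```

The point to notice is that the other 6 experts are never called for this input, which is where the efficiency comes from.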

Given the speed of advancement, I am most excited about a version of the future where we are all able to run lightweight, specialized models on smartphones, smart watches, IoT devices, etc., while larger, resource-intensive models that need expensive processing and GPU power run on cloud infrastructure in data centers (e.g., biotech or big pharma doing drug discovery research will use data-center-level compute power).

Are we living through the dinosaur moment for the legal profession?

As a lawyer by training, I have long been very interested in the intersection of technology and the law. In my opinion, we are living through a time of change the world has not seen since some mix of the Industrial Revolution and the era of 1919 to 1945, which brought us a total re-ordering of the world map. As such, I keep asking myself: are we living in the dinosaur moment for lawyers?

What has pushed me to pay very close attention over the years are the developments that led to digital assets, starting with Bitcoin, and now the breakthroughs kicked off by GPT-3.5 in 2022 that have led to great leaps in artificial intelligence. I can’t help but think that this must have been what Oppenheimer, Einstein and their contemporaries felt like during the early 20th century with the breakthroughs in physics and math of that era.

Of course the eras are very different, but in many ways the impact of a new technology on the world, and the optimism and pessimism surrounding it, does seem to rhyme with our times.

Now back to my broader musings of “what does this mean for lawyers?”

Well, I think it means “we aren’t in Kansas anymore,” to paraphrase Dorothy in The Wizard of Oz. As a late Gen X’er / early Millennial, all I can recall from a professional standpoint is basically chaos: 9/11 when I was a sophomore in college, followed by the Great Financial Crisis upon graduation from law school in 2008, with living abroad during a global lockdown sprinkled in for good measure. Given all this, I am primed to be aware of drastic change, and I think there are major changes ahead for how the business of the law is conducted.

What am I doing about it?

Well, as with most things in life, I believe in embracing change and not fighting the inevitable. To do this, I have decided to play around with these open source models in a more serious way, with the goal of building a product that can take advantage of the shifts in technology that will allow new winners to rise and may take down a few giants in the process. I will be chronicling my research, findings and learning in the hope that some of you out there may find it useful to read.

“Corey, we need a clone of you.”

What I am most curious to explore comes from something a teammate said to me a few years ago, when I was handling a number of pressing matters. They said, “Corey, we need a clone of you.” Maybe, just maybe, I will find a way to use AI to clone my 15 years of legal and management experience and create a simple and easy interface to turn what once seemed like science fiction into reality.

My hope is that by writing and sharing my experiences some of you will reach out and say hello as fellow technologists, executives, founders and curious people so that we can connect and share ideas, contacts and maybe even build together.

Digital Corey
