Beyond Meta: Large Concept Models Will Win
~ Meta’s new AI is a huge leap for safety and explainability… and here’s the next one ~
TL;DR — Meta’s AI research team released their newest innovation: a brain that thinks in concepts instead of just words. Those concepts were hard-wired into the network, drawn from a fixed embedding space called SONAR. And, with a simple tweak, the Meta model can become many times more powerful, developing its own hierarchies of concepts. The trick: send a message into the Concept-Encoder, then back out again, decoded… and repeat, again and again. If that “translation of a translation of a translation” begins to DRIFT away from the original encoded Concept-Vector, then we know that the concept is ‘unstable’. This is analogous to “self-consistency loss” in video AI. Train the network with that rate of concept-drift as its loss function, letting it learn its OWN concepts. This will give us ‘explainability’ — we will know the ideas inside the network. And that explainability will give us AI safety, because we can detect when a model is being malicious. Welcome to a new world!
Looking Back at Now
Currently, publicly available AI has various problems. The issue worrying many bright minds: “Will the AI try to destroy us? Or will it misunderstand our intentions and do something catastrophically wrong?” That is the “AI alignment” problem. And when we try to understand WHY an AI does what it does, we have no simple technique to elucidate its intentions and thoughts. That is the “explainability” problem — and it’s hard to be sure an AI is safe or sane if you cannot explain why it behaves a certain way!
Meta just took on BOTH of those issues in their paper, “Large Concept Models.” They strapped a fixed sentence-embedding system called SONAR onto their model: a pair of pre-trained networks that convert strings of word-tokens into concept-vectors and back. (With lots of math, of course!) When they asked the network to spit a concept back out, as words, it did. The network learned which words, in combination, form entire concepts.
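If you want to poke at that round trip yourself, Meta publishes SONAR’s text encoder and decoder as an open-source package. Below is a minimal sketch; the package, class, and argument names follow my reading of the facebookresearch/SONAR examples, so treat them as assumptions to verify before running:

```python
# pip install sonar-space  (plus a compatible torch/fairseq2 install)
# Names below are recalled from the facebookresearch/SONAR examples --
# verify against the repo before relying on them.
from sonar.inference_pipelines.text import (
    TextToEmbeddingModelPipeline,
    EmbeddingToTextModelPipeline,
)

# Text -> concept-vector (one 1024-dim SONAR embedding per sentence)
text2vec = TextToEmbeddingModelPipeline(
    encoder="text_sonar_basic_encoder",
    tokenizer="text_sonar_basic_encoder",
)
# Concept-vector -> text
vec2text = EmbeddingToTextModelPipeline(
    decoder="text_sonar_basic_decoder",
    tokenizer="text_sonar_basic_encoder",
)

sentences = ["The quick brown fox jumped over the lazy dog."]
embeddings = text2vec.predict(sentences, source_lang="eng_Latn")
round_trip = vec2text.predict(embeddings, target_lang="eng_Latn")
print(round_trip)  # a paraphrase carrying (roughly) the same concept
```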
Now, a particular concept can be represented in words via many different combinations — that ONE concept-vector can be decoded into many different sentences. Each of those sentences refers to the same idea. Or, at least, they should all be pretty close.
That’s where the next leg of innovation can step in! We are going beyond Meta…
Take a sentence, such as “The quick brown fox jumped over the lazy dog,” and send it into the Concept-Encoder. That entire sentence will be represented as a single high-dimensional vector. When sent back out through the decoder, that vector becomes a sentence again, such as “The lazy dog continued to sleep while the brown fox jumped over him quickly.” Basically the same idea.
Now, send that second version through again. “The fox jumped past a sleeping dog.” Oh, hmmm — the output is starting to “drift”. This is what happens when you translate a translation of a translation… try it on Google Translate! We can identify when that drift happens, because the sentence will be encoded as a slightly different concept-vector. The concept-encoding drifts whenever the network does NOT have a clear idea of that concept.
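Measuring that drift is easy to sketch once you have an encoder and a decoder to loop between. Here is a minimal version, assuming hypothetical encode() and decode() callables that stand in for a SONAR-style pipeline (placeholders, not Meta’s API), with cosine distance as the yardstick:

```python
import torch
import torch.nn.functional as F

def concept_drift(sentence, encode, decode, rounds=3):
    """Decode and re-encode a sentence repeatedly ("translation of a
    translation") and report how far each re-encoding has wandered from
    the ORIGINAL concept-vector, measured as cosine distance."""
    original = encode(sentence)          # concept-vector for the input sentence
    current = original
    drift_per_round = []
    for _ in range(rounds):
        paraphrase = decode(current)     # concept-vector -> a fresh sentence
        current = encode(paraphrase)     # re-encode that paraphrase
        distance = 1.0 - F.cosine_similarity(original, current, dim=-1)
        drift_per_round.append(distance.item())
    return drift_per_round               # near-zero = stable concept; rising = blurry
```

A stable concept gives a flat curve near zero; a concept the network only half-understands gives a curve that climbs with every round.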
Drift = Incoherence
When a Large Concept Model is given the task of “translating the translation of the translation,” any drift in that concept-vector indicates that the concept is ‘blurry’ and needs more clarity. We can use that rate of drift as the network’s ‘loss’, a training signal that encourages the network to form concepts which stay consistent.
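And here is a sketch of how that drift could become a trainable loss, assuming the encoder and decoder are ordinary PyTorch modules and that the decode step has been made differentiable (say, by feeding soft token distributions back into the encoder); a real text pipeline would need more care than this:

```python
import torch
import torch.nn.functional as F

def drift_loss(encoder, decoder, sentences):
    """Self-consistency drift loss: encode -> decode -> re-encode, then
    penalize any movement in concept space. Assumes decoder(z) returns a
    differentiable representation the encoder can consume (a big
    simplification of a real text pipeline)."""
    z0 = encoder(sentences)          # original concept-vectors, shape (batch, dim)
    regenerated = decoder(z0)        # regenerate the sentences from the concepts
    z1 = encoder(regenerated)        # re-encode the regeneration
    # 1 - cosine similarity is zero only when the concept regenerates itself exactly
    return (1.0 - F.cosine_similarity(z0, z1, dim=-1)).mean()

# Typical training step (sketch):
#   loss = drift_loss(encoder, decoder, batch)
#   loss.backward()
#   optimizer.step()
```

One caveat worth flagging: drift-loss on its own has a lazy solution (map every sentence to the same vector, and drift goes to zero), so in practice it would likely be paired with a reconstruction or contrastive term that keeps different ideas on different concept-vectors.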
By relying on that drift-loss, we can throw away Meta’s original SONAR system! That SONAR concept-space was a single layer of abstraction, built separately before the network was trained. And SONAR itself was not trained — it was kept constant, with no NEW concepts learned. Instead, we can use drift-loss to train many layers of abstraction, from words to phrases to sentences, paragraphs, whole essays! And those concepts are NOT pre-ordained and fixed in place the way Meta’s SONAR concepts were. We train the layers by how consistently they regenerate themselves in translations of translations.
With layers of concepts, a whole essay is digested as a sequence of lower-level concepts, which are then encoded into successively higher levels (a rough sketch of such a stack follows below). At the highest level of abstraction, the network would see numerous concepts registering together, as a “sparse encoding” across that space of concepts. The combinatorial variety of those sparse encodings is the space of ideas that the network can understand. More concept-layers will yield deeper insights into a wider swath of ideas. Those AIs will understand what they are saying, and we can verify whether they are trying to trick us. I huffed a sigh of relief — try it! It feels good.
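Here is that rough sketch: a stack of concept layers, each pooling a window of lower-level concept-vectors into one higher-level vector, with the top level projected onto a learned ‘concept vocabulary’ to produce the sparse encoding. Every name and design choice is hypothetical; this is one plausible shape for the idea, not Meta’s architecture:

```python
import torch
import torch.nn as nn

class ConceptLayer(nn.Module):
    """One rung of the hierarchy: pools a window of lower-level concept
    vectors (e.g. sentences) into one higher-level vector (e.g. a paragraph).
    Hypothetical design, not Meta's."""
    def __init__(self, dim, window=4, heads=8):
        super().__init__()
        self.window = window
        self.mix = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)

    def forward(self, x):                          # x: (batch, n, dim)
        b, n, d = x.shape
        pad = (-n) % self.window                   # pad so n splits into whole windows
        if pad:
            x = torch.cat([x, x.new_zeros(b, pad, d)], dim=1)
        x = self.mix(x)                            # let lower concepts attend to each other
        # average each window of lower concepts into one higher concept
        return x.reshape(b, -1, self.window, d).mean(dim=2)

class ConceptHierarchy(nn.Module):
    """Sentences -> paragraphs -> sections -> essay. Drift-loss can be applied
    at every level; the top level becomes a sparse code over a learned
    'concept vocabulary'."""
    def __init__(self, dim=1024, levels=3, concept_vocab=4096):
        super().__init__()
        self.layers = nn.ModuleList([ConceptLayer(dim) for _ in range(levels)])
        self.to_concepts = nn.Linear(dim, concept_vocab)

    def forward(self, sentence_vectors):           # (batch, n_sentences, dim)
        x = sentence_vectors
        per_level = []
        for layer in self.layers:
            x = layer(x)
            per_level.append(x)                    # each level can get its own drift-loss
        sparse_code = self.to_concepts(x.mean(dim=1))   # scores over the concept vocabulary
        return per_level, sparse_code
```

Run a 32-sentence essay through three such levels with a window of 4 and it compresses to 8 paragraph-vectors, then 2 section-vectors, then (after padding) 1 essay-vector, whose sparse code is the network’s own statement of what the essay is about, exactly the thing we want to be able to read off for explainability.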