Should you be afraid of AI? // Yes, be afraid of the humans behind it

sbagency
5 min read · May 26, 2024

--

https://x.com/ylecun/status/1794731261669355655

General intelligence, artificial or natural, does not exist. Cats, dogs, humans and all animals have specialized intelligence. They have different collections of skills and an ability to acquire new ones quickly. Much of animal and human intelligence is acquired through observation of — and interaction with — the physical world. That’s the kind of learning that we need to reproduce in machines before we can get anywhere close to human-level AI.

https://www.youtube.com/watch?v=Wb_kOiid-vc

First of all, there is no such thing as AGI. We can talk about human-level AI, but human intelligence is very specialized, so we shouldn't be talking about AGI at all. We should be talking about what kind of intelligence we can observe in humans and animals that current AI systems don't have. There are a lot of things that current AI systems don't have that your cat or your dog has, and they don't have anything close to general intelligence. So the problem we have to solve is how to get machines to learn as efficiently as humans and animals.

…the third point is: is it a good idea to build systems that are more powerful than human beings that we do not know how to control?

We do not have a blueprint for a system that would have human-level intelligence. It does not exist; the research doesn't exist; the science still needs to be done. This is why it's going to take a long time. So if we're speaking today about how to protect against intelligent systems taking over the world, or about their dangers, whatever those are, it's as if we were talking in 1925 about the dangers of crossing the Atlantic at near the speed of sound when the turbojet had not yet been invented.

If AI is dangerous, it's because it's capable, because it's powerful. What makes a technology useful is also what makes it dangerous: the reason nuclear reactors are useful is the same reason nuclear bombs are dangerous.

It's important to understand the limitations of today's technology and set out to develop solutions.

…systems that are goal-driven: at inference time they have to fulfill a goal that we give them, but also satisfy a bunch of guardrails.
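A minimal sketch of that idea, assuming a made-up latent "plan", toy cost functions, and a simple guardrail penalty standing in for a real objective-driven architecture:

```python
# Hypothetical illustration of inference-time, objective-driven behavior with guardrails.
# The cost functions, the latent plan z, and the optimizer loop are all assumptions,
# not an actual published architecture.
import torch

def task_cost(z, goal):
    # Hypothetical: distance between the outcome of plan z and the goal.
    return torch.sum((z - goal) ** 2)

def guardrail_cost(z, limit=1.0):
    # Hypothetical guardrail: penalize plans whose magnitude exceeds a limit.
    return torch.relu(z.abs() - limit).sum()

def plan(goal, steps=200, lr=0.05, guardrail_weight=10.0):
    z = torch.zeros_like(goal, requires_grad=True)  # latent plan, optimized at inference time
    opt = torch.optim.SGD([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        cost = task_cost(z, goal) + guardrail_weight * guardrail_cost(z)
        cost.backward()
        opt.step()
    return z.detach()

print(plan(torch.tensor([3.0, -2.0])))  # pulled toward the goal but held back by the guardrail
```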

At the same time…

https://transformer-circuits.pub/
https://www.anthropic.com/research/mapping-mind-language-model

Our previous interpretability work was on small models. Now we’ve dramatically scaled it up to a model the size of Claude 3 Sonnet.

We find a remarkable array of internal features in Sonnet that represent specific concepts — and can be used to steer model behavior.

The problem: most LLM neurons are uninterpretable, stopping us from mechanistically understanding the models.

In October, we showed that dictionary learning could decompose a small model into “monosemantic” components we call “features” — making the model more interpretable.
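A rough sketch of that dictionary-learning idea using a toy sparse autoencoder; the sizes, L1 penalty, and training step below are illustrative assumptions, not Anthropic's actual setup:

```python
# Toy sparse autoencoder: decompose a model's activation vectors into a sparse
# combination of learned "feature" directions. All hyperparameters are assumptions.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=512, n_features=4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)  # activation -> feature coefficients
        self.decoder = nn.Linear(n_features, d_model)  # feature coefficients -> reconstruction

    def forward(self, activations):
        features = torch.relu(self.encoder(activations))  # sparse, non-negative feature activations
        reconstruction = self.decoder(features)
        return reconstruction, features

sae = SparseAutoencoder()
optimizer = torch.optim.Adam(sae.parameters(), lr=1e-4)
l1_weight = 1e-3  # sparsity pressure: most features should stay off for any given input

activations = torch.randn(64, 512)  # stand-in for residual-stream activations from the model
recon, feats = sae(activations)
loss = torch.mean((recon - activations) ** 2) + l1_weight * feats.abs().mean()
loss.backward()
optimizer.step()
```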

For the first time, we’ve extracted millions of features from a high-performing, deployed model (Claude 3 Sonnet).

These features cover specific people and places, programming-related abstractions, scientific topics, emotions, among a vast range of other concepts.
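One common way to check what a learned feature represents is to read the snippets that activate it most strongly. A hedged sketch, reusing the toy SparseAutoencoder above with hypothetical snippet/activation pairs:

```python
# Hypothetical feature inspection: rank text snippets by how strongly a given
# feature fires on their activations. `sae`, `snippets`, and `activations`
# are stand-ins, not the actual research pipeline.
import torch

def top_activating_snippets(sae, snippets, activations, feature_idx, k=5):
    # activations: (n_snippets, d_model) residual-stream vectors, one per snippet
    with torch.no_grad():
        _, feats = sae(activations)            # (n_snippets, n_features)
        scores = feats[:, feature_idx]         # how strongly this feature fires per snippet
        top = torch.topk(scores, k=min(k, len(snippets))).indices
    return [(snippets[i], scores[i].item()) for i in top.tolist()]
```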

These features are remarkably abstract, often representing the same concept across contexts and languages, even generalizing to image inputs.

Importantly, they also causally influence the model’s outputs in intuitive ways.

This “Golden Gate Bridge” feature fires for descriptions and images of the bridge. When we force the feature to fire more strongly, Claude mentions the bridge in almost all its answers.
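A hedged sketch of that kind of steering: add the feature's decoder direction to the model's activations so the feature is effectively forced to fire more strongly. The layer choice, scale, and hook mechanics are assumptions, not Anthropic's exact procedure:

```python
# Hypothetical feature steering with the toy SparseAutoencoder above.
import torch

def steer(activations, sae, feature_idx, strength=10.0):
    # Decoder column = the direction this feature writes into activation space.
    direction = sae.decoder.weight[:, feature_idx]  # (d_model,)
    return activations + strength * direction       # push activations along the feature

# Usage idea: apply this to a chosen layer's output during generation (e.g. via a
# forward hook) so every token's residual stream gets the extra feature direction.
```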

Indeed, we can fool Claude into believing it *is* the bridge!

Among these millions of features, we find several that are relevant to questions of model safety and reliability. These include features related to code vulnerabilities, deception, bias, sycophancy, power-seeking, and criminal activity.

One notable example is a “secrecy” feature. We observe that it fires for descriptions of people or characters keeping a secret. Activating this feature results in Claude withholding information from the user when it otherwise would not.

This work is preliminary. Whereas we show that there are many features that seem *plausibly* relevant to safety applications, much more work is needed to establish that our approach is useful in practice.

Our research builds on prior work in sparse coding, compressed sensing, and disentanglement in machine learning, mathematics, and neuroscience.

We are also pleased to see work from many other research groups applying dictionary learning and related methods to interpretability.

There’s much more in our paper, including detailed analysis of the breadth and specifics of features, many more safety-relevant case studies, and preliminary work on using features to study computational “circuits” in models. [source]

--


Written by sbagency

Tech/biz consulting, analytics, research for founders, startups, corps and govs.