4 Comments

>Still, drop me a note here or at dmorr at google if you are a research scientist or software engineer and might be interested.

I do data and infra software engineering and would be interested. Good luck making progress on alignment!

Aug 31, 2023

Congrats on the new position, Dave.

I can think of no better person to be on point to prevent the AI apocalypse.

One thing I've thought about regarding AI safety is that it's not just AI alone that is dangerous. The combination of AI plus humanity will have unknown-unknown dynamics. Humanity has already proven many times over to be a danger to itself, and the combination of AI plus humanity is really where the greatest risk lies, in my opinion. Focusing on AI safety alone mitigates only one half of that brand-new combination of AI+humanity.

During the pandemic, we saw that journalists and science communicators are ineffective at reducing misinformation, which easily gains a foothold in various pockets of humanity. The internet makes (dis/mis)information (1) easily accessible and (2) easily reproducible and amplified.

Imagine a future world where LLMs are commonly the primary communication intermediaries between experts and the general public. A few confabulations that match existing human misinformation about AI safety stewards could cause a feedback loop of misinformation in real-time in conjunction with human-produced media.

That's a nightmare scenario where real-time AI agents ingest YouTube and Twitter conspiracy theories and regurgitate that misinformation in various creative generative forms adjacent to those conspiracy theories, thus strengthening them. In other words, for whatever reason, a minority segment of humanity decides that AI safety is against its interests, and the AI amplifies its view. A misinformed minority view may not even need to rise to a majority view before it can do harm. We saw this happening in real time during the COVID pandemic. Thus, we know that humans are susceptible to developing pockets of misinformation and superstition among their various tribal affiliations. I imagine that well-written, LLM-generated text could only magnify that by virtue of both quality and volume.

But that's just the first degree of misinformation. At the second degree are conspiracy theories that the AI safety people such as yourself have nefarious goals against their own particular political interests. So some factions, perhaps not even in political agreement with each other, find common cause against AI safety people because they simply don't understand it. If AI ever advances to the point of “understanding”, then it may “understand” that it is the people who hold its leash who are preventing it from accomplishing its goals, whatever those goals might be. Strong dogs pull their owners off their feet all the time at the dog park. And AI is going to be a very very big dog.

By virtue of doing your job as an AI safety steward, you're matching the pattern of whatever conspiracy theories are already out there. It's a kind of tautology. LLMs will pattern-match, so within those patterns will exist an early confabulation about nefarious AI safety researchers, based on what you actually are doing. The conspiracy theories that hold a dash of truth are the most compelling, and LLMs will offer up confabulations, all with a dash of truth, because that's what they're designed to do. One of these confabulations may find a fertile home in the minds of conspiracy theorists because it accidentally hits all the right happy centers in our mortal human brains. Then those human brains will certainly amplify that idea, an idea that was originally confabulated by an AI somewhere, somehow.

The feedback loop between human and AI is currently unrestricted. And if, in the not-too-distant future, a sufficiently large volume of x-to-human communication derives from LLMs in one form or another, then our entire communication system is inherently unstable.

Okay, so problem identified—

1) human+AI = uncontrolled feedback loop.

2) human+(AI + leash) => (human+AI) + attempts to break the leash.
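
To make that loop concrete, here is a toy simulation. It is entirely illustrative: the update rule and every parameter (ai_share, amplification, correction) are my own assumptions, not measurements of anything. The idea is just that LLMs ingest the current mix of content and regurgitate it with slight amplification wherever it pattern-matches existing misinformation, humans consume a blend of human and AI content, and trusted communicators correct a small fraction each round.

```python
# Toy model of the human+AI misinformation feedback loop.
# All numbers are illustrative assumptions, not measurements.

def simulate(steps=20, misinfo=0.05, ai_share=0.6,
             amplification=1.3, correction=0.02):
    """misinfo:       fraction of circulating content that is misinformation
    ai_share:      fraction of what humans consume that is LLM-generated
    amplification: how much LLMs over-produce content adjacent to the
                   misinformation they were trained on
    correction:    fraction debunked each round by trusted communicators"""
    history = [round(misinfo, 3)]
    for _ in range(steps):
        # LLMs ingest the current mix and regurgitate it, slightly amplified.
        ai_output = min(1.0, misinfo * amplification)
        # Humans consume a blend of human- and AI-produced content,
        # minus whatever gets corrected this round.
        misinfo = ((1 - ai_share) * misinfo + ai_share * ai_output) * (1 - correction)
        history.append(round(misinfo, 3))
    return history

print(simulate())                   # unrestricted loop: climbs toward saturation
print(simulate(amplification=1.0))  # a "leash" on amplification: slowly decays
```

The point isn't the numbers; it's that once a large enough share of x-to-human communication flows through LLMs, whether the loop saturates or dies out depends almost entirely on the amplification term, which is exactly the part that is currently unrestricted.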

So what do we do? I have an idea. Well, several, but I’ll start with the basics— a brief diagnosis of why and where we are right now. And then a single idea for immediate remedy.

Diagnosis:

As I know you’re firmly aware, cryptocurrency technology is built upon the basic premise of distrust— or at the very least, not relying on any sort of trust network to operate computationally. This is inherently a non-human idea. Humans, of course, have always built their systems on trust— every currency system before crypto relied on some form of trust that the token dollar or seashell represented real value to everyone in the trust network.

What we need for AI+humanity safety is a trust network similar to the one we have had for human currency, but for information instead of money. Accurate and reliable information, in this post-information age, is the only truly valuable currency, and has been for some time now.

Misinformation, or what I’d rather call counterfeit information, steals value from real people by presenting itself as being just as valuable as valid information. Counterfeit information exists simply because information is so valuable that those without access to valid information are resigned to the role of charlatans and hucksters, selling counterfeit information to others who are in information poverty.

Counterfeit information exists for a few reasons:

1) Counterfeiters hold more trust within a community than those who hold true information do.

2) Information and misinformation both have value within their community. This could be in the form of actual money, but it could also be social capital.

3) Consumers of counterfeit information regard it as more valuable than valid information.

4) The value of counterfeit information can be self-reinforcing within its own trust network, separate from valid information networks.

It is important to note that counterfeiters hold some true information, and that is the ingredient that makes counterfeit information so effective. Just as LLMs are designed to fill in the gaps, our human brains do the same when we “connect the dots” and “do our own research.” A kernel of truth is the starting point of those conspiracy theories. So even if your AI only generates truth (which is a very hard problem, so it probably won’t), its output can still be distorted into misinformation.

Diagnosis summary: So what we see here is that trust, information, and social value are related by some equation. By some formulation of that equation, certain actors can extract value from their social network by leveraging trust plus counterfeit information.

Solution: Basically, increase trust on the other side of the equation. That is, increase trust in real, accurate information, which is a net good for all communities regardless of political or tribal affiliation. Increased trust in accurate information would decrease the value of counterfeit information for both its consumers and its producers.
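
To put a purely hypothetical formulation on that equation: the value a counterfeiter can extract from a community scales with how much more the community trusts the counterfeiter than it trusts sources of valid information. The functional form and the numbers below are a toy model I've invented for illustration, nothing more, but they show why raising trust in valid information attacks the counterfeiter's margin directly.

```python
# Toy formulation of the trust/value equation above.
# Invented for illustration; not an established model.

def counterfeit_value(trust_in_counterfeiter, trust_in_valid_info,
                      info_value=1.0, community_size=1000):
    """Value (money or social capital) a counterfeiter can extract per
    round of sharing, assuming people act on whichever source they
    trust more."""
    trust_margin = max(0.0, trust_in_counterfeiter - trust_in_valid_info)
    return trust_margin * info_value * community_size

print(counterfeit_value(0.8, 0.3))  # low trust in valid info   -> 500.0
print(counterfeit_value(0.8, 0.7))  # raise trust in valid info -> 100.0
```

Nothing about the counterfeiter changes between the two calls; only the community's trust in valid information does, and that alone shrinks what can be extracted.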

How: Yeah, that’s a tricky problem. Trust networks and tribal affiliations are self-reinforcing, so there is lots of value to be extracted if counterfeit information is highly trusted among the members of a community.

One idea: YouTube science/math communicators are really good and entertaining. They are also highly trustworthy (as far as I know). People like these can bridge the gap between communication and real information with concise and entertaining deliveries. And people will know that they are human beings, which cannot be overstated, as AIs will soon become the first point of contact between large organizations (corporations, governments) and ordinary folks. So in the not-too-distant now/future, we may see fewer and fewer real people in our daily lives other than people already within our own social bubbles/trust networks.

So the One Idea here is that people who are brilliant science/math communicators are the front-line soldiers for building a new trust network. Although I believe there may be a computational way to devise a trust network, in the way that crypto devised a distrust network, I am not proposing such an idea as a solution here because that would take R&D.

But recruiting people who are good at communicating is something that can begin today. Here’s my proposal for an immediate action item: hire folks to make entertaining media that educates people about AI. There already is, and will continue to be, a lot of fear and misunderstanding about AI, some of it propagated by people in the AI community itself.

If we recognize that trust is the most valuable commodity in the post-AI landscape, then the current state of affairs is a terrible start to addressing the problem of the AI+humanity feedback loop.

Also, I’d love to work with you, Dave. So consider this my resume.

-Ming


Hi Dave, I signed up after reading the parenting rules post a long time ago. Glad you're going to work on AI alignment. It's one of the most important things to work on.

author

Thanks! Maybe I'll even start writing here more. :)
