Multi-Agent Hide and Seek | OpenAI


Video

Transcript

on Earth the simple rules of natural

selection and competition led to the

evolution of increasingly intelligent

life-forms today we ask if comparably

simple rules of multi-agent competition

can also lead to intelligent behavior in

a new virtual world these agents are

playing hide and seek

these agents have just begun learning

but they’ve already learned to chase and

run away this is a hard world for a

hider who has only learned to flee

however after training in millions of

rounds of hide-and-seek the hiders find

a solution

the hiders learn to use rudimentary

tools to their advantage by grabbing and

locking these blocks they can create

their own shelter the seekers are locked

in place for a brief period at the start

of the game giving hiders a chance to

prepare even so the hiders must learn to

collaborate accomplishing tasks that

would be impossible for any single

individual the hiders are not the only

ones who can learn to use tools after

many generations of failing to break

into the shelter the seekers learned to

jump over obstacles using ramps however

after many millions of rounds of having

their shelter breached the hiders

learned to take away the primary tool

the seekers have at their disposal note

that we did not explicitly incentivize

any of these behaviors as each team

learns a new skill it implicitly changes

the challenges the other team faces

creating a new pressure to adapt we’ve

also put these agents into a more

open-ended environment randomizing the

objects team sizes and walls in this

world they learn to construct their own

shelter from scratch requiring that they

arrange multiple objects into precise

structures to prevent seekers from using

the ramps the hiders move them to the

edge of the play area and lock them in

place we originally believed this would

be the final strategy that the agents

learned however we found that after more

training the seekers discover that they

can jump on top of boxes and surf them

to the hiders' shelter

in the last stage of emergent strategy

that we observe the hiders learn to lock

as many boxes as they can before

constructing their fort in order to

defend against box surfing so how do

agents acquire these skills they’re

trained using reinforcement learning an

algorithm inspired by the way animals on

Earth learn the agents play thousands of

rounds of hide-and-seek in parallel for

many days they train against each other

as well as past versions of themselves

using an algorithm called self-play
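
The idea of training against both the current opponent and past versions of it can be illustrated with a small sketch. This is not the system shown in the video: the three-spot toy game, the `Agent` class, the weight update rule, and the parameters `p_current` and `snapshot_every` are all hypothetical choices made for illustration. The hider picks a spot, the seeker searches one spot, and each side is rewarded only for the outcome of the round, with no reward for any specific behavior.

```python
import copy
import math
import random

N_SPOTS = 3  # hypothetical toy world: the hider picks a spot, the seeker searches one


class Agent:
    """Toy policy: one preference weight per spot (a stand-in, not a neural network)."""

    def __init__(self):
        self.weights = [1.0] * N_SPOTS

    def act(self, rng):
        # Sample a spot with probability proportional to its weight.
        return rng.choices(range(N_SPOTS), weights=self.weights)[0]

    def update(self, action, reward, lr=0.1):
        # Crude reinforcement-style rule: boost rewarded actions, damp punished ones.
        self.weights[action] *= math.exp(lr * reward)
        total = sum(self.weights)
        # Rescale so the weights always sum to N_SPOTS and never overflow.
        self.weights = [w / total * N_SPOTS for w in self.weights]


def sample_opponent(current, past, rng, p_current=0.5):
    """Return the live opponent or, with probability 1 - p_current, a frozen past version."""
    if not past or rng.random() < p_current:
        return current
    return rng.choice(past)


def train(rounds=2000, snapshot_every=200, seed=0):
    rng = random.Random(seed)
    hider, seeker = Agent(), Agent()
    past_hiders, past_seekers = [], []
    for t in range(1, rounds + 1):
        # Each learner faces the current rival or one of the rival's earlier snapshots.
        hider_opponent = sample_opponent(seeker, past_seekers, rng)
        seeker_opponent = sample_opponent(hider, past_hiders, rng)

        # Hider's round: +1 for staying hidden, -1 for being found.
        spot = hider.act(rng)
        found = spot == hider_opponent.act(rng)
        hider.update(spot, -1.0 if found else 1.0)

        # Seeker's round: +1 for finding the hider, -1 otherwise.
        guess = seeker.act(rng)
        found = guess == seeker_opponent.act(rng)
        seeker.update(guess, 1.0 if found else -1.0)

        # Periodically freeze copies so later training can revisit old strategies.
        if t % snapshot_every == 0:
            past_hiders.append(copy.deepcopy(hider))
            past_seekers.append(copy.deepcopy(seeker))
    return hider, seeker, past_hiders, past_seekers
```

Calling `train()` returns the two final agents along with the lists of frozen snapshots. Mixing in past versions is meant to keep an agent from forgetting how to handle strategies its rival used earlier; the 50/50 split here is an arbitrary assumption.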

coevolution and competition on Earth led

to the only generally intelligent

species known to date humans while this

world is far less complex than Earth we

have found evidence that simple rules

can lead to increasingly intelligent

behavior from multi-agent interaction we

hope that with a much larger and more

diverse environment truly complex and

intelligent agents will one day emerge

[Music]