Transcript
on earth the simple rules of natural
selection and competition led to the
evolution of increasingly intelligent
life-forms today we ask if comparably
simple rules of multi-agent competition
can also lead to intelligent behavior in
a new virtual world these agents are
playing hide and seek
these agents have just begun learning
but they’ve already learned to chase and
run away this is a hard world for a
hider who has only learned to flee
however after training in millions of
rounds of hide-and-seek the hiders find
a solution
the hiders learn to use rudimentary
tools to their advantage by grabbing and
locking these blocks they can create
their own shelter the seekers are locked
in place for a brief period at the start
of the game giving hiders a chance to
prepare even so the hiders must learn to
collaborate accomplishing tasks that
would be impossible for any single
individual the hiders are not the only
ones who can learn to use tools after
many generations of failing to break
into the shelter the seekers learned to
jump over obstacles using ramps however
after many millions of rounds of having
their shelter breached the hiders
learned to take away the primary tool
the seekers have at their disposal note
that we did not explicitly incentivize
any of these behaviors as each team
learns a new skill it implicitly changes
the challenges the other team faces
creating a new pressure to adapt we’ve
also put these agents into a more
open-ended environment randomizing the
objects team sizes and walls in this
world they learn to construct their own
shelter from scratch requiring that they
arrange multiple objects into precise
structures to prevent seekers from using
the ramps the hiders move them to the
edge of the play area and lock them in
place we originally believed this would
be the final strategy that the agents
learned however we found that after more
training the seekers discover that they
can jump on top of boxes and surf them
to the hiders' shelter
in the last stage of emergent strategy
that we observe the hiders learn to lock
as many boxes as they can before
constructing their fort in order to
defend against box surfing so how do
agents acquire these skills they’re
trained using reinforcement learning an
algorithm inspired by the way animals on
earth learn the agents play thousands of
rounds of hide-and-seek in parallel for
many days they train against each other
as well as past versions of themselves
using an algorithm called self-play
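
The idea of training against both the current opponent and past versions of it can be sketched roughly as follows. This is a minimal illustration, not the setup used in the video: the `OpponentPool` class, the `train` function, the `play_round` and `update` callbacks, and all numeric settings (`max_size`, `p_current`, `snapshot_every`) are hypothetical names and values chosen for the example.

```python
import copy
import random


class OpponentPool:
    """Keeps frozen snapshots of earlier policies to train against (hypothetical helper)."""

    def __init__(self, max_size=50, seed=0):
        self.snapshots = []
        self.max_size = max_size          # assumed cap on stored past versions
        self.rng = random.Random(seed)

    def add(self, policy):
        # Deep-copy so later updates to the live policy do not alter the snapshot.
        self.snapshots.append(copy.deepcopy(policy))
        if len(self.snapshots) > self.max_size:
            self.snapshots.pop(0)         # drop the oldest snapshot

    def sample(self, current_policy, p_current=0.5):
        # With probability p_current (or when no snapshot exists yet) face the
        # live policy; otherwise face a uniformly chosen past version.
        if not self.snapshots or self.rng.random() < p_current:
            return current_policy
        return self.rng.choice(self.snapshots)


def train(policy, play_round, update, rounds=1000, snapshot_every=100, pool=None):
    """Generic self-play loop; play_round and update are supplied by the caller."""
    pool = pool if pool is not None else OpponentPool()
    for step in range(rounds):
        opponent = pool.sample(policy)
        reward = play_round(policy, opponent)   # outcome of one game
        update(policy, reward)                  # any reinforcement learning update rule
        if (step + 1) % snapshot_every == 0:
            pool.add(policy)                    # freeze the current version
    return pool
```

Mixing in past versions is a common way to keep a learner from overfitting to whatever its opponent happens to be doing right now; the 50/50 split used here is only a placeholder.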
coevolution and competition on earth led
to the only generally intelligent
species known to date humans while this
world is far less complex than Earth we
have found evidence that simple rules
can lead to increasingly intelligent
behavior from multi-agent interaction we
hope that with a much larger and more
diverse environment truly complex and
intelligent agents will one day emerge
[Music]