Transcript
So my presentation is going to be on feedback loops in opinion modeling. I'm Danielle Ensign, and my mentor is Jeff Wu. I'll briefly give an overview: I'm going to talk about why this is a problem we should study, then give a brief literature review of lenses on opinion modeling, and then talk about the particular thing we studied, which is what happens when models produce data that then goes back into models, along with an application to temperature decay.
So why study opinion modeling? Well, in AI safety it would be useful to understand how preferences change over time. In open-endedness and data augmentation, it would be good to understand what processes are producing the data we're feeding into our models. And in AI fairness, there is a concern that as language models generate text in the world they may affect the opinion ecosystem that exists out there, so it would be good to understand how that happens and how it affects the models themselves.
First, a brief literature review. We have language modeling, which is essentially where you take lots of data, feed it into a language model, and the model captures a snapshot in time. This is useful, but it doesn't quite capture a lot of the dynamic questions we might ask, like how the process is changing over time. There is some previous work here, like a Facebook-scale simulator, or studies of GitHub, and these find that there are these spikes, but it's hard to get a detailed analysis when you're just studying these learning systems. Another thing you can do is agent-based or physics-style models, and one thing people find there is that reality is very nuanced: it's hard to concretely define the relevant quantities because they all interact with each other. So one thing people do is take a systems-theory approach, where they study the structure of many interacting parts, but the problem there is that it's very hard to choose the right level of fine-grainedness for your models. One thing you can do is use empirical laws we see in real-world data and use those to validate your models. Finally, there's another perspective of looking at networks, where you look at how opinions move across the network, the density of the network, and things like that, and that can give you some insight into what's happening.
There are a lot of other lenses, but we decided to study one particular problem: the case where models output data that is fed back into the models themselves. Concretely, right now there are models like GPT-3 that are outputting text that's going onto the internet, and this data is going to go back into future models, so it would be good to understand what's happening and what we should be worried about here.
So here's the setup we're going to use: we take some data, feed it into a trained model, generate some data from that model, use that generated data to train another model, generate some more data, and repeat. You can imagine a couple of variations of this: maybe we fine-tune the model instead of training one from scratch, or maybe we're in a classification setting where we label a data distribution and that leads to a trained model.
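As a minimal sketch of this loop (my own illustration, not the project's code), here is the train-generate-retrain skeleton in Python; `train` and `generate` are hypothetical stand-ins for whichever model family is plugged in, and the coin and n-gram examples below are concrete instantiations.

```python
# Minimal sketch of the train -> generate -> retrain loop described above.
# `train(data)` and `generate(model, n)` are hypothetical stand-ins for the
# model family under study (coins, n-gram models, transformers, ...).
def feedback_loop(initial_data, train, generate, steps=10, n_samples=1000):
    data = initial_data
    models = []
    for _ in range(steps):
        model = train(data)                 # fit a fresh model on the current data
        data = generate(model, n_samples)   # replace the data with model samples
        models.append(model)
    return models
```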
Very concretely, let's consider a coin setting. We flip a lot of coins; in this case we ended up with the same number of heads and tails, so our new probability of heads is 0.5. We do this again and end up with 13 heads and 7 tails, so our new probability of heads is 0.65, and we can repeat this multiple times. There are a couple of things you find when you start doing this formal analysis on coins or linear classifiers. The first insight is that more data tends to lead to a smaller step size, which makes sense, since more samples give you a better estimator. The second insight is that, in this run, we ended up at all tails. The reason is that there is always some probability of producing the same outcome over and over, and once that happens the process is stuck there forever. So you could imagine ending at all heads or all tails, or, more generally, with more than two outcomes in a discrete setting, the probabilities do a random walk and eventually one of them hits zero, at which point we're back to the two-token setting. So that's some of the theory.
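Here is an illustrative simulation of that coin setting (my own sketch, not the exact code from the project): re-estimate the heads probability from flips of the previous estimate, and the estimate random-walks until it absorbs at 0 or 1, with more flips per step giving smaller steps.

```python
import random

def coin_feedback(p0=0.5, n_flips=20, max_steps=100_000, seed=0):
    """Repeatedly re-estimate P(heads) from flips of the current estimate.
    Returns the trajectory; it eventually absorbs at 0.0 or 1.0."""
    rng = random.Random(seed)
    p, trajectory = p0, [p0]
    for _ in range(max_steps):
        heads = sum(rng.random() < p for _ in range(n_flips))
        p = heads / n_flips              # new estimate = empirical frequency
        trajectory.append(p)
        if p in (0.0, 1.0):              # all heads or all tails: stuck forever
            break
    return trajectory

# More flips per step means smaller steps, so absorption takes longer on average.
for n in (10, 20, 100):
    print(n, "flips per step, steps until absorption:", len(coin_feedback(n_flips=n)) - 1)
```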
One other thing worth talking about on the theory side is temperature. For temperature, I have a graph here: when the red line lies on top of the green line, that's temperature 1.0, and when the red line is horizontal, that's temperature 0.0. What you see is that when we sample with temperature below 1.0, probabilities of 0.5 or higher get pushed up and probabilities below 0.5 get pushed down. This has real-world implications, because it means that when we sample from models this way we are perpetuating existing biases. This is bias in the technical sense; it's less clear whether you can argue that it applies to bias in the real-world sense, but it certainly seems like it would speed up the collapse issue we're talking about.
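As an illustration of that picture, here is a sketch of standard temperature scaling (my own example, with made-up numbers): probabilities are raised to the power 1/T and renormalized, so with T below 1 the majority outcome is pushed up and the minority outcome is pushed down.

```python
import numpy as np

def apply_temperature(probs, T):
    """Standard temperature scaling: p_i ** (1/T), renormalized.
    T < 1 sharpens the distribution, T > 1 flattens it."""
    scaled = np.asarray(probs, dtype=float) ** (1.0 / T)
    return scaled / scaled.sum()

for p in (0.4, 0.5, 0.6, 0.9):
    print(p, "->", round(apply_temperature([p, 1 - p], T=0.7)[0], 3))
# With T = 0.7: 0.6 -> ~0.641 and 0.4 -> ~0.359, while 0.5 stays at 0.5.
```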
So that's some theory, but how does this apply in practice? First, we looked at some n-gram models, where you can actually run the theory. The theory suggests that the model should collapse to a single path on the graph from the start token to the end token, and that path should have no cycles, because a cycle represents a point with two different places to go, and one of those directions is going to be collapsed away. And in fact that's what we found: just doing a basic n-gram model, this sort of traditional NLP modeling, we found that after ten thousand iterations of this step it collapsed to "by being missed i will not wish the apart cousin of duty."
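To make that loop concrete, here is an illustrative bigram version (my own sketch with a made-up toy corpus, not the project's code): fit bigram counts on a corpus, sample a fresh corpus from the fitted model, refit, and repeat; run for enough iterations, the sampled corpus tends to collapse towards a single repeated sentence.

```python
import random
from collections import defaultdict, Counter

START, END = "<s>", "</s>"

def fit_bigram(corpus):
    """Count bigram transitions, including start and end markers."""
    counts = defaultdict(Counter)
    for sent in corpus:
        tokens = [START] + sent.split() + [END]
        for a, b in zip(tokens, tokens[1:]):
            counts[a][b] += 1
    return counts

def sample_sentence(counts, rng, max_len=30):
    """Walk the bigram graph from START until END (or a length cap)."""
    tok, out = START, []
    while len(out) < max_len:
        successors = counts[tok]
        tok = rng.choices(list(successors), weights=list(successors.values()))[0]
        if tok == END:
            break
        out.append(tok)
    return " ".join(out)

def bigram_feedback(corpus, steps=2000, n_sentences=50, seed=0):
    """Refit on model samples each step; the corpus tends to collapse."""
    rng = random.Random(seed)
    for _ in range(steps):
        counts = fit_bigram(corpus)
        corpus = [sample_sentence(counts, rng) for _ in range(n_sentences)]
    return corpus

toy = ["the cat sat on the mat", "the dog sat on the rug", "a cat saw a dog"]
print(set(bigram_feedback(toy)))  # usually ends up as a single surviving sentence
```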
So that's great, but transformers are the more modern language models, so the question is what happens with transformers. There is a tricky question of how you measure collapse, and one way you can do it is by tracking entropy: if we generate lots of sentences, compute the probability of each sentence by multiplying the probabilities of each word, and then average the negative log of those probabilities over lots of sentences, we get a rough estimate of the entropy of the model itself.
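A small sketch of that estimate (assuming a hypothetical helper `token_probs(sentence)` that returns the model's probability for each token in a sampled sentence): the entropy is approximated by the average negative log probability of the samples.

```python
import math

def estimate_entropy(sampled_sentences, token_probs):
    """Monte Carlo estimate of model entropy (in nats).
    `token_probs(sentence)` is a hypothetical helper returning the model's
    per-token probabilities, so their product is p(sentence)."""
    neg_log_probs = []
    for sentence in sampled_sentences:
        log_p = sum(math.log(p) for p in token_probs(sentence))  # log of the product
        neg_log_probs.append(-log_p)
    return sum(neg_log_probs) / len(neg_log_probs)  # E[-log p(sentence)]
```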
As a reminder, here's what we're doing: we're feeding data into a trained model and feeding its samples back in, and I'd like you to guess what you think is going to happen with a transformer.
OK, so what we find is that there are two regimes. The first is that the entropy basically just shoots off towards randomness: the model roughly becomes a uniform generator; it still centers on some things, but the generations become very random. The other behavior we find is the one shown here, where the entropy shoots up initially and then goes down to the collapse the theory predicted. Concretely, it starts out with fairly standard output, like 'we have this plankton or aggressive wildlife some days in the season'. If you've looked at the outputs of language models, you know that a lot of the outputs are pretty weird, and what seems to be happening is that the language models are getting used to outputs that are weirder than the inputs they were trained on, so the generation quality seems to decrease. Eventually it hits a peak where it's now used to how weird its output is, and then we get cycles. For example, it might just repeat something like 'hi hello hi hello', and once that cycle appears in a generated output, the next model sees it, it becomes more likely to be produced in the future, and it gets perpetuated. If you look at the most common tokens at the peak, they're pretty common tokens, but what happens afterwards is that the model focuses on particularly weird little loops: in this case it really liked to say 'twitter' a lot or 'ally' a lot, and in another run it eventually just focused on saying 'enemy' over and over. The important point here is that we get this collapse behavior.
So that's the theory. The theory also suggests that if we sample with temperature below 1.0, we would expect it to collapse quicker, and with temperature above 1.0 we would expect it to run off to high entropy. And that is what we find. With temperature below 1.0, it collapses very quickly: you'll recall that over here it took about 250 steps, whereas here it only took about 20, and with very low temperature it collapses almost immediately to essentially just the most common sentence. With higher temperatures it has some time to fiddle around before it collapses, and with temperatures above 1.0 it just takes off to essentially the maximum entropy it can reach.
For future work in this direction, it would be good to have a better understanding of what's happening in this process, and to run more runs and understand the variability; some of these runs take a very long time, so it would be good to know whether there are other kinds of outcomes beyond the general patterns I've described. It would also be good to understand whether we are perpetuating real-world biases: there's an argument that temperature leads these models to be mode-seeking, outputting the most common thing, which then goes back into later models, so you can imagine it should perpetuate bias, but that's hard to argue conclusively. And finally there's this temperature decay phenomenon, where lower-temperature samples are fed back into the models, the model gets used to them, and we get less and less entropy, which seems like a problem. One other thing is that all of this theory seems a little iffy, because in practice, when model outputs go into the real world, there's some filter on what data actually gets out there. So really we should incorporate into this setup some kind of feedback mechanism that filters what data goes back in; this is analogous to a Go-playing system, where you only feed back the games that go well, so that's a relevant piece here as well. That's my presentation; thank you to everyone, my mentors, and everyone who was able to help. I'm open to questions now.
Cool, so the first question we have is: for the language model at T = 1, does entropy always increase and then decrease? Yeah, so sometimes it does seem to just take off to really high entropy, and sometimes it goes up and then back down; we haven't ever seen it just go down. Honestly it's kind of weird, but it goes down pretty consistently; it doesn't do much of a random walk, whereas the theory suggests it should be doing a bit of a random walk, so there are a lot of open questions there.
There's a question here: would you see the same effect if you extend the dataset instead of replacing it with model samples? So this is another direction that I think is really interesting; it's essentially the question of grounding. I have a couple of slides on this: you can imagine that instead of just feeding the generated data into a trained model, we also feed some ground-truth data into the model. In principle this can help quite a bit, because we're just doing a random walk at each step, and if you bias the random walk towards a particular distribution, we would expect it to roughly stay around there. For small n-gram models we did find that this helped, and it's relevant in practice because these models are going to be feeding back into themselves while also taking in real-world data. But we haven't validated this on language models, and I think it's a really interesting question.
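As an illustrative sketch of grounding in the coin setting (again my own example, not the project's code): mix a fixed fraction of flips from the true coin into each refit, and the estimate tends to hover around the ground-truth probability instead of absorbing at 0 or 1.

```python
import random

def grounded_coin_feedback(p_true=0.5, ground_frac=0.1, n_flips=20,
                           steps=10_000, seed=0):
    """Each step, `ground_frac` of the flips come from the real coin and the
    rest from the current estimate; returns the long-run average estimate."""
    rng = random.Random(seed)
    p, history = p_true, []
    for _ in range(steps):
        n_real = int(ground_frac * n_flips)
        flips = [rng.random() < p_true for _ in range(n_real)]        # real data
        flips += [rng.random() < p for _ in range(n_flips - n_real)]  # model data
        p = sum(flips) / n_flips
        history.append(p)
    return sum(history) / len(history)

print(grounded_coin_feedback(ground_frac=0.1))  # stays near 0.5 on average
print(grounded_coin_feedback(ground_frac=0.0))  # no grounding: absorbs at 0 or 1
```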
Then, what are the implications of this work for semi-supervised learning? I think that's a tricky question; I would need to think more about it. You could certainly imagine that, in the limit of labeling data from just some small set of data, you may end up with some feedback loops, but it's less clear. I think that's a nuanced and tricky question, and I'm not sure I have a great answer for you on that one.
The final question we have here is: were you able to look at what happens if the outputs are only some smallish percentage of the input for the next training step, mixed with new real-world data? Yeah, so that's this grounding setting, and we have not run it on language models. I did run it for the n-gram experiments, and honestly my intuition is that just a very small percentage would help significantly. It's worth pointing out that in practice we're only going to be doing a couple of steps, not the 10,000 steps we needed to converge, so in practice, if you have some of this grounding data, that may make a big difference.
There was also a question: have you considered other settings apart from the coin setting? Yeah, so we did that setting with lots of different models. We looked at a linear classifier, and it's a little different there, because instead of this collapse it just randomly walks; if you don't have any bias, it just keeps cycling. The general insight of having either a random walk or, for discrete things, a collapse seems to hold in a lot of settings, but I think it's worth doing a more detailed analysis there, because for some more complex models you might be able to say something more interesting. OK, that's all the time I have, so I'm going to pass it off to Jonathan. Thank you, everyone.