Music Generation | Christine Payne | OpenAI Scholars Demo Day 2018 | OpenAI


Transcript

Thank you. Hi, I'm Christine Payne. I'm really excited to be here, and I want to thank you all for being here as well. And I especially want to thank OpenAI for sponsoring us this summer; it really has been an incredible program. So I thought I would dive right in (the presentation's not clicking, sorry) with some sample generations. This summer I was working on training an LSTM to generate music. This first one is a sample from when I trained my LSTM on classical piano music.

[Music]

Okay. And then I took the same neural net, and instead of training it on classical music I trained it on jazz music, and then asked it to generate new pieces.

All right. So a nice way to think about music generation is as a language modeling problem. I won't get into the details now, but for language modeling we usually use an architecture like an LSTM or a Transformer, and we train it on the task of next-word prediction: we give it a prompt, a sequence of words, and then we ask it what the next word should be. Once you have a model that's really good at this, it's actually pretty easy to turn it into a generator: you just take the word that was predicted, feed it back into the model, ask it to predict the next word, and so forth, and you can create a text, or a piece of music, as long as you'd like. The trouble, of course, is that to do this we need to be able to translate music into tokens that we can feed sequentially into a model like this. People have tried to do this in the past, but because it's a difficult problem, there have often been some pretty strong limitations on what you can do with the music.
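The feed-the-prediction-back-in loop described above can be sketched in a few lines. This is a toy illustration, not the talk's actual code: the `model` callable, the integer token vocabulary, and the uniform stand-in model are all hypothetical placeholders for whatever trained next-token model and music encoding you use.

```python
import random

def generate(model, prompt_tokens, n_steps, seed=0):
    """Autoregressive sampling: ask the model for a distribution over the
    next token, sample one, append it to the history, and repeat."""
    rng = random.Random(seed)
    tokens = list(prompt_tokens)
    for _ in range(n_steps):
        probs = model(tokens)  # hypothetical model: P(next token | history)
        next_tok = rng.choices(range(len(probs)), weights=probs)[0]
        tokens.append(next_tok)
    return tokens

# Toy stand-in "model": uniform distribution over a 4-token vocabulary.
uniform_model = lambda history: [0.25, 0.25, 0.25, 0.25]
out = generate(uniform_model, [0, 1], n_steps=8)
print(out)  # the 2-token prompt followed by 8 sampled tokens
```

The same loop works for text or music; only the tokenization changes, which is why the encoding question below matters so much.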

So it's kind of clear how, if you had one note at a time, you could ask the model to predict the next note. Or maybe you constrain it and say every time step will have exactly four notes, and then predict the next set of four notes; if you're doing Bach chorales, that can work really well. Another limit you might impose is on the note range, so the span from the lowest note to the highest note might be pretty small. But I was really curious about whether we could extend this and make it more general. The problem is that when you start looking at general music, it's really complicated: effectively you can have any number of notes at any time, in any combination. So I'm defining here what I call a musical time step. Ideally, we want to tell the model that at this moment in time we're playing a D and a B, then you move forward to a G, and so forth through the piece. But here we've run into a problem, because we're trying to fit in an extra D that's faster than the rate we were sampling at. We could imagine sampling more often, but that causes other issues, because we end up with lots of moments where nothing is happening at all, and LSTMs don't really love that; we basically get long stretches of empty time steps, which is not ideal.

Alternatively, we might want a method that's more flexible and has some more complicated notion of time, like how long to wait until the next time step. So we have both the sampling-frequency problem and the problem I noted before of how wide a note range we want, and of whether we want to be flexible about how many notes we play at once. To solve these problems, I'm proposing two different kinds of encodings, which I'm calling the chordwise encoding and the notewise encoding, and I like to think about these in parallel to language models: chordwise is very similar to a word-based language model, and notewise is much more like character-by-character prediction. For chordwise, what I'm doing (I've only shown some of them here) is writing, for every one of the 88 piano keys, a 0 or a 1 for whether that note is being played at that time step. So at the very first moment, say there are just the two C's sounding, so we have the two 1's. You can imagine this could explode pretty quickly: there are 2 to the 88th possible combinations. Fortunately, piano music is a lot more predictable than that; we're limited by how many fingers we have and by what sounds good. I found that across most of classical music, the vocabulary came out to around 55,000 different combinations of notes.
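One way to picture the chordwise bookkeeping: each time step becomes an 88-character 0/1 string, one character per piano key, and the model's "word" vocabulary is the set of distinct strings seen in training. This is a minimal sketch under my own assumptions, not the talk's actual code; the pitch numbers are MIDI-style, with 21 as the lowest piano key.

```python
def chordwise_token(pitches):
    """Encode one time step as an 88-char 0/1 string, one character per
    piano key (MIDI pitches 21..108)."""
    keys = ["0"] * 88
    for p in pitches:
        keys[p - 21] = "1"
    return "".join(keys)

# A short piece as a list of time steps (each a set of sounding MIDI pitches):
# two C's held for two steps, then a lone G for two steps.
piece = [{48, 60}, {48, 60}, {55}, {55}]

tokens = [chordwise_token(step) for step in piece]
vocab = sorted(set(tokens))  # one "word" per distinct note combination
print(len(vocab))  # 2 distinct chord-words in this toy piece
```

In a real pipeline each distinct string would then be mapped to an integer id, exactly as a word-based language model maps words; the roughly 55,000 combinations mentioned above would be that id table.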

The other system I'm proposing is the notewise system, where I literally say: first it's going to be a C, then another C, then a wait (the wait sort of marks the end of this time step), then you end that first C down in the bottom, you play a G, and then you wait again. So it's very much like a character-level system. This turns out to be pretty nice: it has a much smaller vocabulary, because you only have the notes, the note-ends, and the waits, and it also has the really nice feature that you can pretty easily encode notes that last longer. I also tried playing around with modeling violin, and there you really need to be able to have notes that last for a long time.
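The notewise scheme can be sketched as an event stream: note-on tokens, note-off tokens, and a wait token closing each time step. The token spellings here (`p60`, `end60`, `wait`) are my own illustrative choices, not necessarily the exact vocabulary used in the talk.

```python
def notewise_tokens(piece):
    """Turn a list of time steps (sets of sounding MIDI pitches) into an
    event stream of note-ons, note-offs, and one 'wait' per time step."""
    tokens, sounding = [], set()
    for step in piece:
        tokens += [f"end{p}" for p in sorted(sounding - step)]  # releases
        tokens += [f"p{p}" for p in sorted(step - sounding)]    # new notes
        tokens.append("wait")                                   # advance time
        sounding = set(step)
    return tokens

# Mirrors the example above: two C's, then the bottom C ends and a G begins
# while the upper C keeps sounding.
piece = [{48, 60}, {55, 60}]
print(notewise_tokens(piece))
# ['p48', 'p60', 'wait', 'end48', 'p55', 'wait']
```

Note that a note held across several time steps emits no extra tokens at all, which is why long sustained notes, like the violin's, are cheap to encode in this scheme.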

I made a quiz, which I invite people to come over to the table later and try out. I have pairs of songs: one is a human-composed piece and one is an AI-composed piece, and the task is to guess which is which. Encouragingly enough, people are really bad at this: most people were scoring around twos and threes, and only a few did better, although often people would email me and say, "but I'm a professional musician." So in general, people are finding this hard. So the

take-home from this is that these better encodings can actually lead to some really interesting music generation. And I'm only suggesting two encodings here; I think there's probably the possibility for even more interesting ones. Maybe we can look at ways of including some known music theory: thirds, scales, things like that. I tinkered with this a little bit, but I think there's still a lot that can be done. Another interesting possibility

is that right now I'm training my model on pretty much all composers at once and then asking it to generate. It would be really fun to train it that way and then fine-tune it on kind of neat, quirky combinations; like, what if you smashed Chopin and jazz together, or something like that? I think there could be some fun artistic possibilities there.

And lastly, I think the big problem, both for this and for the language-model generation world, is how you get longer-term structure. The pieces I was creating would often sound really good for the first 30 seconds, or even a minute, and then as they went on you'd realize there's really nothing long-term going on. So ideally you'd love to have some sense of the big picture, of where you are in the piece, and then of the immediate next notes that will sound good. In the interest of time I skipped a lot of detail, so I'll open it up to questions now; feel free to ask about any of these or whatever you'd like. Thank you.

So the question is: did I hit any dead ends, things that I tried that just were not working at all? I would say that I feel that way a little bit about the chordwise system. I feel intuitively that it should work, and it actually turns out to be very good at memorizing pieces: I can prompt it with a little bit of Mozart and it'll continue to generate 45 seconds or a minute of Mozart perfectly. But it has a really hard time moving away from the training set and getting into interesting new patterns. I feel like there should be a way to get around that, but it seems like a bit of a dead end right now, and that's part of why I moved over to notewise. Go ahead.

Oh, the question is how many minutes of training data did I put into it. I knew someone was going to ask me that. It's actually a very large amount; I don't know it in minutes, but I took data from a classical archives site where everyone has submitted MIDI recordings of pieces, so it'll have, say, the entire set of Chopin études and the entire set of Beethoven sonatas. It's really a pretty broad range of all of the famous classical music pieces, so on the piano side I wasn't finding I was limited by data. When I tried to do violin and piano duos, I was more limited, because there's a much smaller set for that.

That's a super interesting question: have we done models like this on bird songs or whale songs? I imagine so, but I do not know the answer. It sounds like it could be really fascinating, but I don't know. Thank you. One more question? Okay.

Yeah, so one thing that I would really love to see in these pieces, but that I don't see right now, is the way in music we often have an idea of, say, one, one, and then go: you have a theme or a short idea, you repeat the short idea, and then the third time you do it, it starts the same way but goes off into a longer idea. Occasionally I see that here, but not regularly enough that I could say it's really learned this, and I kind of would expect a good music model to get that. That's one of those longer-term structure things I'd love to be able to get. All right, thank you; I'm going to pass it on.

[Applause]