GPT-4 Developer Livestream ｜ OpenAI

Video

Transcript

foreign

did the gpd4 developer demo live stream

honestly it’s kind of hard for me to

believe that this day is here open AI

has been building this technology really

since we started the company but for the

past two years we’ve been really focused

on delivering gpt4

that started with rebuilding our entire

training stack actually training the

model

and then seeing what it was capable of

trying to figure out its capabilities

its risks working with Partners in order

to test it in real world scenarios

really tuning Its Behavior optimizing

the model getting it available

so that you can use it and so today our

goal is to show you a little bit of how

to make gbto4 shine

how to really get the most out of it you

know where it’s kind of you know

weaknesses are where we’re still working

on it and just how to really use it as a

good tool a good partner

um so if you’re interested in

participating in the Stream uh that if

you go to our Discord so that’s

discord.gg openai there’s comments in

there and we’ll take a couple of

audience suggestions

so the first thing I want to show you is

the first task that gpd4 could do that

we never really got 3.5 to do

and the way to think about this is all

throughout training that you know you’re

constantly doing all this work it’s 2

A.M the pager goes off you fix the model

and you’re always wondering is it gonna

work

is all this effort actually going to pan

out and so we all had a pet task that we

really liked and that we would all

individually be trying to see is the

model capable of it now

and I’m going to show you the first one

that we had a success for four but never

really got there for 3.5

so I’m just going to copy the top of our

blog post from today going to paste it

into our Playground now this is our new

chat completions playground that came

out two weeks ago I’m going to show you

first with GPT 3.5 4 has the same API to

it the same playground

the way that it works is you have a

system message where you explain to the

model what it’s supposed to do and we’ve

made these models very steerable so you

can provide it with really any

instruction you want whatever you dream

up and the model will adhere to it

pretty well and in the future it will

get increasingly increasingly powerful

at steering the model very reliably

you can then paste whatever you want as

a user the model will return messages as

an assistant and the way to think of it

is that we’re moving away from sort of

just raw text in raw text out where you

can’t tell where different parts of the

conversation come from but towards this

much more structured format that gives

the model the opportunity to know well

this is the user asking me to do

something that the developer didn’t

attend I should listen to the developer

here

all right so now time to actually show

you the task that I’m referring to so

everyone’s familiar with summarize

this let’s say article into a sentence

okay getting a little more specific uh

but where every word begins with G

so this is 3.5 let’s see what it does

yeah it kind of didn’t even try

just gave up on the task this is pretty

typical for 3.5 trying to do this

particular kind of task if it’s you know

sort of a very kind of stilted article

or something like that maybe it can

succeed but for the most part 3.5 just

gives up

but let’s try the exact same prompt

the exact same system message

in gbt4

so kind of borderline whether you want

to count AI or not but so let’s say AI

doesn’t count

that’s cheating

so fair enough the model happily accepts

my feedback

so now to make sure it’s not just good

for G’s I’d like to turn this over to

the audience I’ll take a suggestion on

what letter to try next in the meanwhile

while I’m waiting for our moderators to

pick the lucky lucky letter I will give

a try with a

um but in this case I’ll say gpd4 is

fine

why not

also pretty good summary

so I’ll hop over to our Discord

all right

wow if people are are being a little

ambitious here I’m really trying to put

the model through the paces we’re going

to try Q uh which if you think about

this for a moment I want the audience to

really think about how would you do a

summary of this article that all starts

with Q it’s not easy

it’s pretty good that’s pretty good

all right so I’ve shown you summarizing

an existing article I want to show you

how you can flexibly combine ideas

between different articles so I’m going

to take this article that was on Hacker

News yesterday

copy paste it

into the same conversation so it has all

the context of what we’re just doing I’m

going to say find one common theme

between this article and the gpd4 blog

so this is an article about Pinecone

which is a python web app development

framework and it’s making the technology

more accessible user friendly if you

don’t think that was insightful enough

you can always give some feedback and

say that was not insightful

enough

please no I’ll just even just leave it

there leave it up to the model to decide

so Bridging the Gap between powerful

technology and practical applications

seems not bad and of course you can ask

for any other kind of task you want

using its flexible language

understanding and synthesis you can ask

for something like

now turn the GT4 blog post into a

rhyming poem

picked up on open AI evalues open source

for all helping to guide answering the

call which by the way if you’d like to

contribute to this model please give us

evals we have an open source evaluation

framework that will help us guide and

all of our users understand what the

model is capable of and to take it to

the next level

so there we go this is consuming

existing content using gpt4 with with a

little bit of creativity on top

but next I want to show you how to build

with gpt4 what it’s like to create with

it as a partner

and so the thing we’re going to do

is we’re going to actually build a

Discord bot

I’ll build it live and show you the

process show you debugging show you what

the model can do where its limitations

are and how to work with with them in

order to sort of achieve New Heights so

the first thing I’ll do is tell the

model that this time it’s supposed to be

an AI programming assistant

its job is to write things out in

pseudocode first and then actually write

the code and this approach is very

helpful so that the model break down the

problem into smaller pieces and then

that way you’re not kind of asking it to

just come up with a super hard solution

to a problem all in one go

it also makes it very interpretable

because you can see exactly what the

model was thinking and you can even

provide Corrections if you’d like

so here is the prompt that we’re going

to ask it this is the kind of thing that

3.5 would totally choke on if you’ve

tried anything like it but so we’re

going to ask for a Discord bot that uses

the gpd4 API to read images and texts

now there’s one problem here which is

this model’s training cutoff is in 2021

which means it has not seen our new chat

completions format so I literally just

went to the blog post from two weeks ago

copy pasted from the blog post including

the response format it has not seen the

new image extension to that and so I

just kind of wrote that up and you know

just

very minimal detail about how to include

images so and now the model can actually

leverage the doc that documentation that

it did not have memorized that it does

not know

okay

and in general these models are very

good at using information that it’s been

trained on in new ways and synthesizing

new content and you can see that right

here that it actually wrote an entirely

new bot

now let’s

actually see if this bot is going to

work in practice so you should always

look through the code to get a sense of

what it does don’t run untrusted code

from humans or from AIS

and one thing to note is that the

Discord API has changed a lot over time

and particularly that there’s one

feature that has changed a lot since

this model was trained

give it a try in fact yes we are missing

the intense keyword this is something

that came out in 2020

. so the model does know it exists but

it doesn’t know which version of the

Discord API we’re using so are we out of

luck well not quite we can just simply

paste to the model exactly the error

message not even going to say hey this

is from running your code could you

please fix it

we’ll just let it run

and the model says oh yeah whoops the

intense argument here’s the correct

here’s the correct code

now let’s give this a try once again

kind of making sure that we understand

what the code is doing

now a second issue that can come up is

it doesn’t know what environment I’m

running in and if you notice it says hey

here’s this inscrutable error message

which if you’ve not used jupyter

notebook a lot with async IO before you

probably have no idea what this means

but fortunately

once again you can just sort of say to

the model hey

I am using Jupiter

and would like to make this work

can you fix it

and the specific problem is that there’s

already an event Loop running so you

need to use this Nest async i o Library

you need to call Net Nest I sync IO dot

apply the model knows all of this

correctly instantiates all of these

these pieces into the bot it even helps

hopefully tells you oh you’re running in

Jupiter well you can do this bang pip

install in order to install the package

if you don’t already have it that was

very helpful

so now we’ll run and it looks like

something happened

so the first thing I’ll do

go over to our Discord

and I will paste in

a screenshot

of our Discord itself so remember gpt4

is not just a language model it’s also a

vision model in fact it can flexibly

accept inputs that intersperse images

and text arbitrarily kind of like a

document now the image feature is in

preview so this is going to be a little

sneak peek it’s not yet publicly

available it’s something we’re working

with one partner called be my eyes in

order to really start to develop it and

get it ready for prime time

but you can ask anything you like for

example I can’t you know I’ll say gp4

hello world

can you describe this image

and painstaking detail

all right which first of all think of

how you would do this yourself there’s a

lot of different things you could latch

onto a lot of different pieces of the

system you could describe and we can go

over to the actual code and we can see

that yep we in fact received the message

have formatted an appropriate request

for our API

and now we wait

um because you know one of the things we

have to do is we have to make the system

faster that’s one of the things that

we’re working on optimizing in the

meanwhile I just want to say to the

audience that’s watching we’ll take an

audience request next so if you have an

image and a task you’d like to

accomplish please submit that to the

Discord our moderators will pick one

that will run

so we can see that the Discord oh it

looks like we have a response perfect

so it’s a screenshot of a Discord

application interface pretty good did

not even describe it it knows that it’s

Discord it’s probably Discord written

there somewhere where it just kind of

knows this from from prior experience

server icon label gpd4 describes the

interface in great detail talks about uh

all the people telling me that I’m

supposed to do Q uh very very kind

audience

and describes a much of the uh the

notification messages and the users that

are in the channel and so there you go

that’s some that’s some pretty good

understanding now this next one if you

notice first of all we got a post but

the model did not actually see the

message so is this a failure of the

model or of the system around the model

well we can take a look

and if you notice here content is an

empty string we received a blank message

contents

the reason for this is a dirty trick

that we played on the AI

so if you go to the Discord

documentation

and you scroll through it all the way

down to uh I can see it hard for me to

even find honestly to the message

content

intent you’ll see this was added as of

September 2022 as a required field so in

order to receive a message that does not

explicitly tag you you now have to

include this new intent in your code

remember I said intensive change a lot

over time this is much newer than the

model as possible is possibly able to

know so maybe we’re out of luck we have

to debug this by hand but once again we

can try to use gpd4’s language

understanding capabilities

to solve this now keep in mind this is a

document of like I think this is like

ten thousand fifteen thousand words

something like that it’s not formatted

very well this is literally a command a

copy paste like this is what it’s

supposed to parse through to find in the

middle of that document that oh yeah

message contents that’s required now but

let’s see if it can do it

so we will ask for I I am receiving

blank message contents

can you

why could this be happening

how do I fix it

so one thing that’s new about gpd4 is

context length

32 000 tokens is kind of the upper limit

that we support right now and the model

is able to flexibly use long documents

it’s something we’re still optimizing so

we recommend trying it out but not

necessarily sort of really really

scaling it up just yet unless you have

an application that really benefits from

it so if you’re really interested in

Long context please let us know we want

to see what kinds of applications it

unlocks but if you see

it says oh yeah message content intent

was not enabled and so you can either

ask the model to write some code for you

or you could

I actually just you know do it the

old-fashioned way

either way is fine

I think this is a augmenting tool makes

you much more productive but it’s still

important that you are in the driver’s

seat and are the manager and knows

what’s what’s going on so now we’re

connected once again

and uh Boris would you like to rerun the

message

once again we can see that we have

received it even though the bot was not

explicitly tagged

seems like a pretty good

pretty good description interesting this

is an interesting image actually looks

like it’s a dolly generated one and

let’s actually try this one as well

so what’s funny about this image oh it’s

already been submitted

so once again we can verify this making

the right API calls

squirrels do typically eat nuts we don’t

expect them to use a camera or act like

a human so I think that’s that’s a

pretty good explanation of why that

image is funny

so I’m going to show you one more

example of what you can do with this

model

so I have here a nice hand-drawn mock-up

of a joke website definitely worthy of

being put up on my refrigerator

so I’m just going to take out my phone

literally take a photo

of this mock-up

and then I’m going to send it

to our Discord

all right going to send it to our

Discord

and this is of course the rockiest part

making sure that we actually send it to

the right Channel

which in fact I think maybe I did not

sent it to the wrong Channel

it’s funny it’s always the uh the sort

of non-ai parts of these demos that are

the hardest part to do

and here we go

technology is now solved

and now we wait

so the thing that’s amazing in my mind

is that

what’s going on here is we’re talking to

a neural network

and this neural network was trained to

predict what comes next right it played

this like this game of sort of being

shown a partial document and then

predicted what comes next across an

unimaginably large amount of content and

from there it learns all of these skills

that you can apply and all these very

flexible ways and so we can actually

take now this output so literally we

just said to

output the HTML from that picture

and here we go

actual working JavaScript

filled in the jokes

for comparison

this was the original

of our mock-up

and so there you go going from

hand-drawn

beautiful art

if I do say so myself to working website

and this is all just potential right we

you can see lots of different

applications we ourselves are still

figuring out new ways to use this so

we’re going to work with our partner

we’re going to scale up from there but

please be patient because it’s going to

take us some time to really make this

available for everyone

so I have one last thing to show you

I’ve shown you reading existing content

I’ve shown you how to

build with the system as a partner the

last thing I’m going to show

is how to work with the system to

accomplish a task that none of us like

to do but we all have to

so you may have guessed the thing we’re

going to do is taxes

now note that GPT is not a certified tax

professional nor am I so you should

always check with your your Tax Advisor

but it can be helpful to understand some

dense content to just be able to empower

yourself to to be able to sort of solve

problems and get a get a handle on

what’s Happening when you could not

otherwise so once again I’ll do a system

message in this case I’m going to tell

it that it’s tax GPT which is not a

specific thing that we’ve trained into

this model you can be very creative if

you want with the system message to

really get the model in the mood of what

is your job what are you supposed to do

so I pasted in

the tax code this is about 16 Pages

worth of of tax code and there’s this

question about Allison Bob they got

married at one point uh and that here

are their their incomes and they take a

standard deduction they’re filing

jointly so first question what is their

standard deduction for 2018

. so while the model is chugging I’m

going to solve this problem by hand to

show you what’s involved so the standard

deduction is the basic standard

deduction plus the additional the basic

one is 200 percent for a joint return of

subparagraph C which is here okay so

additional doesn’t apply the limitation

doesn’t apply

um okay now these apply oh wait special

rules for taxable year 2018 which is the

one we care about through 2025 you have

to substitute twelve thousand for three

thousand so two hundred percent of

twelve thousand twenty four thousand is

the final answer

if you notice the model got to the same

conclusion

and you can actually read through its

explanation

and to tell you the truth the first time

I tried to approach this problem myself

I could not figure it out I spent half

an hour reading through the tax code

trying to figure out this like back

reference and why there’s some program

like just what’s even going on it was

only by asking the model to spell out

its reasoning and then I followed along

that I was like oh I get it now I

understand how this works and so that I

think is where the power of the system

lies it’s not perfect but neither are

you and together is this amplifying tool

that lets you just reach New Heights

and you can go further you can say okay

now calculate their total liability

and here we go it’s doing the

calculation

honestly I every time it does it it’s

just it’s amazing this model is so good

at Mental Math it’s way way better than

I am at Mental Math it’s not hooked up

to a calculator like that’s another way

that you could really try to enhance

these systems but it has these raw

capabilities that are so flexible it

doesn’t care if it’s code it doesn’t

care if it’s language it doesn’t care if

it’s tax all of these capabilities in

one system that can be applied

towards the problem that you care about

towards your application towards

whatever you build

and so to end it the final thing that I

will show is I a little other dose of

creativity which is now summarize this

problem into a rhyming poem

and there we go a beautiful beautiful

poem about doing your taxes so thank you

everyone for tuning in I hope you

learned something about what the model

can do how to work with it and honestly

we’re just really excited to see what

you’re going to build I I’ve talked

about openai evals please contribute we

think that this model improving it bring

it to the next level is something that

everyone can contribute to and that we

think it can really benefit a lot of

people and we want your help to do that

so thank you very much we’re so excited

to see what you’re going to build

foreign