Music and Machine Learning (Google I/O ’19)

[GOOGLE LOGO MUSIC] JESSE ENGEL: I am Jesse. ADAM ROBERTS: And I’m Adam. JESSE ENGEL: And we are from– thanks– we’re from a
project within Google called Project Magenta. It’s part of the machine
learning group within Google. And we work specifically on– it’s an open source
research project. So we do cutting edge
machine learning research. But we’re really
interested in the role that machine learning can
play for creative technologies and for artists and musicians. So everything we do we
put out in open source. And we also focus on
building tools for developers and for artists so that they
can actually actively explore using AI and machine learning
in the creative process. And so, if you’re interested
more in the project, just a real quick plug. You can go to If you’re more on
the research side, you can find all sorts of
research papers and data sets. If you’re a musician, there’s
a lot of integrated tools you can try. And we also have JavaScript
and other libraries to use for coders. And so I’m going to
pass it over to Adam who’s going to talk about what
this means more in practice. ADAM ROBERTS: Yes. So let’s look at
one concrete example of the type of work we do. This is a project that
just– we released last week called Groove. And it actually started with
just inviting and actually hiring drummers,
professional drummers to come into the office. We recorded them playing on
an electronic drum kit that allowed us to capture the
symbolic representation of their performances. And then we trained
machine learning models to do various tasks with
this data that we thought could be interesting to
use in a creative context. So I’ll get into some of
those tasks in just a moment. But first, I just want to kind
of go through how we actually seed this stuff into the world. So first, we’re
a research group. So we write an academic paper. We submit it to a machine
learning conference. We also release the data sets
so that other researchers can take that and either
reproduce our results or even hopefully expand
on it, improve upon it. But we also always put our
stuff into open source. So we want coders to have access
to this technology, as well. So we’ll release a TensorFlow
implementation of our models. We, also, typically,
and in this case, we re-implemented
in TensorFlow.js. And this is just a
really useful technology for building interfaces
and applications on top of these methods. It makes it really
easy to do that when you put it into JavaScript. And then, lastly,
in this case, we put the data set into
TensorFlow Datasets, which is just like a single-line API
for accessing that data set so you can train new
models, or you can use it for whatever purpose you’d like. And then finally, we
typically build some sort of tools for either
musicians or artists, or whatever types of
creators we’re targeting. And in this case, we took our
JavaScript implementation, and we built some plugins called
Magenta Studio for Ableton Live. So if you’re not familiar
with Ableton Live, it’s a professional
software package that people use to produce
music or to compose music. And these plugins
add new functionality to that library that
didn’t exist before. So I want to focus
in on Drumify, which is one of the two plugins
that we built from the models we made with our Groove data set. And what this plug-in
does is it actually lets you take any
sort of rhythmic music and turn it into a drumbeat,
or to create a drumbeat that kind of accompanies it well. So imagine you’re a producer. You’re starting a new song. And you have a bass line
that you like a lot. But maybe you either don’t
have access to a drum kit or you’re not a talented or
skillful drummer yourself. You can use the Drumify
plug-in to take that bass line and create an accompanying
drumbeat to continue your compositional process. So let’s just hear a quick
example of this in practice. So first, you’re going to hear
a bass line that somebody made. [BASS LINE PLAYING] So now we’re going to
take that and we’re going to turn it into a drum
beat to accompany with Drumify. [BASS LINE WITH DRUMS PLAYING] [LAUGHTER] [APPLAUSE] So that was just using the bass
line, the onsets of the bass notes extracting that rhythm. And then with just
a few clicks, you can create a drum beat
to go along with that. So that’s just one
example of the types of things we’re working on. There’s a lot more We have a lot more projects. Everything is free, open
source, easy to use. So please check that out
if you’re interested. But now we want to transition
over and focus a bit on some of the creators
that have actually started using this technology
in their artistic practice. So, specifically, we have
two musicians here today that we’re really excited about. The first is Claire Evans. So she’s one third
of the band YACHT. She’s also an artist in
her own right and a very accomplished author. Her book is called “Broad Band– The Untold Story of the
Women Who Made the Internet.” And I highly
recommend this book. Yeah. [APPLAUSE] I consider it to be
required reading. So definitely check that out. But today she’s not going
to be talking about that. She’s going to be talking
about how her band has recently adopted some of these
machine learning technologies in their process. And we can see where that’s
taking them as a band. So let’s invite Claire
up to the stage. CLAIRE EVANS: Thanks, guys. Hi. Hi, everyone. Hi, I’m Claire, obviously. And I’m here as a
representative of my band. Yeah, we’re playing
tonight on the main stage. So if you like this, if
you find this interesting, we’re going to be playing
a lot of the songs that I’m talking about here
today on the main stage. So, for anyone who isn’t
familiar with YACHT, I’m just going to start this
with a quick sort of preface so you get a sense of
who we are and where we’re coming from when I
start talking about how we get into machine learning. So YACHT was founded in 2002
by my partner Jona Bechtolt, who’s sitting over there. It was named after
this kind of decrepit looking sign that he saw on the
street in Portland, Oregon– YACHT, Young Americans
Challenging High Technology. We have no idea what
this business did. It is actually
frankly unGoogleable. Some things are. We tried to find out many
times over the years. But even though we’ve had a
lot of different incarnations of the band, it’s been
17 years of making music, we’ve kept this acronym. Because it kind of
articulates something really core about who we
are, which is that we want to stay in constant engaged
dialogue with technology. I mean, obviously we’re not
Luddites, because I’m here. And we’re not particularly
adversarial, either. It’s just challenging
in the sense that we want to remain
engaged, always. And we always want to be
aware of the kind of push and pull between using tools
and having our work be affected by the tools that we use,
the tools shaping our work. I want to say that I’m
not a coder at all. I only understood, like, 20% of
what Adam and Jesse were even saying just now, and we’ve
been working with them for three years. So our relationship
to technology in the context of
art making has always been from the
outside looking in. And we’re interested in
tools, in getting access to interesting tools, and
then in finding our own novel ways of using those tools,
ways that are kind of sideways. We like to force
creative applications of noncreative technologies,
both consumer-facing and non-consumer-facing, just to
see what we can do with them and see how they can be applied
to our historically pretty DIY, punk rock operation. If we have one basic
prime directive that dominates what we do, it’s
to do as much as possible with as little as possible,
which is something that we picked up from reading
a lot of Buckminster Fuller and drawing the requisite
analogies to our own background in decentralized punk rock
communities in the Pacific Northwest. I’m going to tell
you about a couple of projects we’ve done to give
you a sense of who we are. So, a couple of years ago,
we revealed new album cover artwork exclusively via fax. We did that by building a web
application that identified nearby fax machines to
our fans, like at FedEx or UPS or their parents’ offices
and sent the artwork directly to them with an
editioned cover letter. You know, a fax machine
transmits information through sound, which is
basically what music does. And we liked the
idea of activating a dormant technology,
or a latent technology, as a way of showing what
creative possibilities exist at the brink of obsolescence. And, on that step, we recently
wrapped a four year project to reactivate a really
dormant piece of technology in downtown Los Angeles,
a public artwork from 1975 called the “Triforium”,
which was originally designed as the world’s first
polyphonoptic instrument, so an instrument that could
synchronize light and sound into an original, new art form. The computer system that
it was built with in 1975, obviously, was not up to snuff. So it had been broken
for a very long time. But we got some money, and we
got an interdisciplinary team together, and managed
to bring the lights back using a custom built
LED installation. And we managed to salvage
the original 8-bit paper tape code that ran the
original computer system so that it could
be responsive to live musical input once again. So, again, this
is a cohabitation of old and new technologies. And we love making use of
things, things that are just waiting to be re-explored
and reimagined, and again, with as little
resources as possible. We became really
interested in machine learning about four years ago. Because we felt like it was
maybe the next step for us. We were interested both in the
reflective qualities of machine learning, like the
way that it would help us to understand ourselves,
and the generative qualities of machine learning, the
ways that it would help us to make something entirely new. So I’m speaking to you now
from basically the tail end of a year long project of
trying to find a machine learning driven compositional
process that would work for our purposes, that would
allow us to make music that wasn’t just passable as
human being generated music, but as something genuinely
interesting and meaningful and in line with our back
catalog of recordings. And so we didn’t just want
to make a record using machine learning. We wanted to make a YACHT
record using machine learning. And that’s a much
different proposition. I’ll get into the nitty gritty. Because I figure
that’s the theme. This is basically what we
used to make our record. We experimented with a bunch
of different strategies. But we found that the best
compositional tool for us was Magenta’s MusicVAE model,
which is a latent space interpolation model that allowed
us to find, essentially, like– I mean, I know this isn’t a
technical way of explaining it– but it allowed
us to find melodies hidden in between songs
from our own back catalog. And this is what the user
facing side of that model looks like when
we were initially recording the record last May. It’s a CO lab notebook,
so not exactly the kind of thing musicians are used
to bringing into the studio. And, unfortunately,
we started doing this before Magenta made user
friendly Ableton Live plugins for musicians. But you know, whatever–
it gives us street cred. So I’m OK with it. So in order to work this way,
to bring something like a Colab notebook into the studio, we
have to do a lot of prep work. So, first, we manually annotated
our entire back catalog of music– that’s 82 songs– into MIDI. And then we broke out
all of the bass lines, vocal melodies, keyboard lines,
drum parts into four bar loops. Then we ran pairs of those loops
through the Colab notebook at different
temperatures, sometimes dozens, if not
hundreds, of times in order to generate
this massive body of melodic information
that we could then use as sort of source material
for creating new songs. When we had this massive
amount of musical information, that’s when the human
being process began. This is when we started
manually culling through all of this MIDI data trying to
find interesting moments, things that spoke to us,
things that felt interesting, things that we wanted
to explore further. As some of you might know,
using machine learning to make a song with structure,
with a beginning, middle, and end, with a verse, chorus,
verse, is a little bit, still, out of our reach. But that’s a good thing because
the melody was the model’s job, but the arrangement
and the performance was entirely our job. So to demonstrate what I mean
a little bit more concretely, let’s focus on just a single
melody from a single song. So I’m going to play
you a melody that came straight out of
the MusicVAE model. It was one of several different
MIDI sequences generated by an interpolation between
two different YACHT songs, one called “Hologram”
and one that has a swear word in the title,
so I feel like I probably shouldn’t say it out loud. AUDIENCE: Say it! CLAIRE EVANS: “[...]
Fuck You Till I’m Dead.” [LAUGHS] OK. [ONE NOTE MELODY WITH METRONOME PLAYING] So this particular
melody for us was exceptionally
aesthetically interesting. But like every melody generated
by the MusicVAE model, it’s just an endless
sequence of notes that goes on and on until it stops. It’s not exactly pop material. So this is where
the rules come in. And I don’t mean
technical rules. I mean human rules, working
rules for our specific process. We have always thrived, like
many artists, I believe, under self-imposed constraints. Because when you sit
down to write a pop song about anything in the
world, it’s overwhelming. But if you have some
boundaries in place, you can begin to think
about it more concretely. So for us we decided that
every single song that we’re going to create
with this process had to be interpolated
from existing melodies from our back catalog. We hoped that this
would result in songs that had that kind
of indefinable YACHT feeling, which we don’t
know how to quantify and I don’t think the
model can, either. But that’s what we decided
would be our parameters. We decided also that we
could not add any notes. We could not add any harmonies. We could not jam or
improvise or otherwise interpret or, essentially,
be creative in any way. There was no additive
alteration, only subtractive or transpositional changes. So we could assign any
melody to any instrument. So that melody we just heard
could have been a keyboard line, could’ve been a bass line,
could’ve been a guitar line, could’ve been a vocal melody. That was our decision to make. We could transpose melodies
to our working key. And we could structure
and cut up and collage as much as we wanted. Now we’ll talk about lyrics,
another important element of any song. So for this project,
we collaborated with a different creative
technologist, Ross Goodwin, who was a free agent when
we started working with him. He’s now with Google’s Artists
and Machine Intelligence Group, which is something
that happened kind of a lot during our process. We worked with Ross to
create a lyrics model with sort of the same
ethos as our melodic model. So we wanted to
have it be kind of reflective of our
own inspiration, our own background, our own
history, our own back catalog. So the model that
we built with Ross was trained on a corpus
of 20 megabytes of text. So that’s about 500,000 pages
or approximately two million words. And these are all
from bands that we considered to be our influences,
music we grew up listening to, music our parents listened to,
our own music, our friends, and collaborators, and peers. We saw this as an opportunity
to kind of teach the machine our values, our history,
our community, and where we come from as artists. The end result was this. So this is one
instance, one block of output from the lyrics model
that Ross created with us. Which we printed out on
a single sheet of dot matrix printer paper. Because, did you know, they
still make dot matrix printers, and you can buy them on Amazon. And we wanted to visualize it
as something really physical. So we had this
massive block of text, one continuous sheet that
we brought into the studio with us. And I literally sat down
on the studio floor, highlighting
interesting passages. It’s interesting
because it contains a range of low to high
temperature material. So the low temperature stuff,
because it’s taking less risks, is much more repetitive,
much more simplistic. It’s kind of a punk
rock lyrics engine. It taps into the more
elemental things in songs. So there’s entire pages
and pages and pages of repetitive phrases,
like, I want your brain, or I want to rock,
or I love you. That’s the really low
temperature material. And then the high
temperature material is full of nonsensical
run on sentences and lots of really weird proper
nouns and names of things. And so in order
to make songs that had a range of emotional
sort of feeling, we combined a lot
of low temperature and high temperature
material into the same songs. And as with the melodies, we
didn’t take anything as is. We really went through
manually and combed through and looked for exceptionally
interesting phrases, or images, or passages, or things
that spoke to us and felt like they were
meaningful to who we are and where we’re coming from. The biggest influences on our
working method with the text were really kind of like low
tech, anti-technological, really. I mean, we were looking at
William S Burroughs cut up writing methods
and the Dadaists. High tech, low tech is
kind of our operandus. So in order to
actually make songs with this giant block of
text and this giant pile of melodies, we actually had to
take the interesting passages and then place them
on top of the melodies that we decided would be the
interesting vocal melodies. The problem is the
melodies generated by VAE don’t have any relationship
to the human body, least of all to our human bodies,
or to our competencies as performers and singers. And they certainly
have no relationship to the internal rhythms
of the English language. So a lot of the time we had
to break the lyrics on top of the melodies in order to
sort of force them into working. And that meant we had to pull
apart syllables and pronounce things in really weird
ways and do things that were deeply unintuitive
and which will certainly lead to a lot of people
listening to this music and mishearing
lyrics constantly. It’s like [INAUDIBLE] maker. So let’s take a
closer look at some of the lyrics we
decided to work with. Here’s a passage from a song. “I want your phone to my brain. I want you to call my name. I want you to do it, too. Oh, won’t you come? Won’t you come? Won’t you work on my
head, be my number nine? To be alive, to be
with you, like a weed. I can feel it in my
head like a dog in bed.” I know. [APPLAUSE] So speaking as someone who
normally writes songs that have a relationship
to meaning or cadence or are personal in some way,
singing these kinds of lyrics really forced me to step
outside of my embodied habits and develop a relationship
to words first as sounds. And then to grow to love and
appreciate the meaning that comes after sound. It’s pretty liberating. But the lyrics also contain
these really strong, strange images that I
would never have written. Like, “I can feel it in my
head like a dog in bed.” I mean, what a phrase. It has the form of
idiomatic English. But the meaning is
completely sideways. And yet, it also still
kind of means something. Because I think
we can all easily imagine a feeling that’s
as warm and willful and present as a dog that’s
sneaking into bed at night. And that’s the magic of
working with this stuff. It really opens you up to new
ways of thinking about language or thinking about music and
of thinking of the interplay between those two things. OK. So let’s hear how these
lyrics fit on that melody that I just played you, which
we determined at the outset would be a good vocal melody. [METRONOME AND MUSIC PLAYING] [LAUGHS] [APPLAUSE] Thanks. So one of the most interesting
and challenging things of working in this
way is actually performing the
generative material. Like I said earlier, it’s often
far beyond our competencies. And sometimes the things
that sound simple– I mean, this sounds simple,
but it’s like sideways from the embodied patterns of
play of singing and performance that we’re accustomed to. And I can’t tell
you how many times we were in the studio just trying
to nail, like, some seemingly simple guitar line. But just because it was
slightly different than what we were accustomed to doing,
it was impossible to do. And that happened a lot. And it was kind of
brain breakingly difficult in many moments. But at the same time,
it often forcibly pushed us outside of our
comfort zone, pushed us outside of the patterns
that we had fallen into, and often patterns we
hadn’t even perceived were there to begin with. And it forced us
to play differently and think differently
about how we work. OK. Finally, I want to play you the
first minute of the final song with everything together. So you’ll hear the
first chorus, which has its own, like,
amazing, made up idiom. And then you’ll hear the
verse I played you before. So again, melodically,
everything you hear was generated by
the MusicVAE model. But the performance,
arrangement, production, structure, everything
else is ours. And this is kind of what we see
as a collaborative strategy. It’s not so much about,
like, as an artist being replaced by
machine learning, but rather being
given the opportunity to focus our energies
in different directions in different places than
we’re accustomed to. It’s not about revoking
control at all. It’s not about letting go. It’s about holding on and
letting the process change you. You could pump it, maybe. [MUSIC PLAYING] It’s like a real song! So obviously, this
is just one way of working with machine
learning to make music. And it’s not even, like, really
the right way, I don’t think. There’s countless approaches. And many of them are going
to be far more technical. Again, we are getting
in where we fit in. We are engaging at the
level that we know how. But beyond the challenges
that it faces, like, that it brings to workflow,
because it’s definitely not intuitive or fun to pull up the
Colab notebook in the browser, in the studio,
the challenges are really satisfying and exciting. Because they’re the
kinds of challenges that make you stop and consider
what you’re actually doing. And, for us, the process has
been infuriating at times, but ultimately, really
deeply gratifying. And the best way
I can describe it is that it feels like
you’re doing a puzzle. And then when you’re done,
the picture on the puzzle is not what’s on the box. But who cares? Because who cares
what’s on the box? OK. There’s more to talk about. We can talk about
it in the panel. Thank you. [APPLAUSE] JESSE ENGEL: That’s awesome. Cool. Yeah. You can sit here. Yeah. So we’re really grateful
to YACHT for coming to us so early in this process. Because you can see
that a lot of our tools, through your story, were just
sort of in their early stages. And we got a lot of
really useful feedback about how an artist
would actually want to interact with
different types of things in machine learning. And this next project
we’re going to introduce I feel sort of shows how we’ve
come along this process to now make these tools
available to the point where someone can just
do a project in a shorter amount of time. And so we’ve done a project
with the Flaming Lips that’s very specifically for– AUDIENCE: Woo! ADAM ROBERTS: –hey– that’s
very specifically for I/O. And so we’re going to give
you a sneak peek of things
for the concert tonight. And then we’re going to have
a nice discussion panel here talking about all these things. So we’ll play a video. CLAIRE EVANS: One more. ADAM ROBERTS: One more. [VIDEO PLAYBACK] – Any time that the
Flaming Lips have stumbled upon a new
little instrument, it’s changed what we created. – The goal of the
Magenta team is to really explore the role of
machine learning in creativity, in the creative process to
enable people to express themselves in new ways. Piano Genie is the great
work of an intern we had, Chris Donahue. He designed an algorithm to
be able to take piano playing and try to reproduce the piano
playing only hitting a couple buttons on a controller. And it naturally
comes out sounding a lot more like a
professional piano player. – One of the things
that we really focus on that’s very different
from a lot of other machine learning projects is
how can we let people manipulate these algorithms. – It’s been really exciting
collaborating with the Flaming Lips. Because they’re so creative
in their approach to music that we just sort of showed
them everything we have. We’re hoping to create
a new experience where the audience can co-create music
with the band in real time, using Piano Genie. – So we made an intelligent
musical instrument and melody creator out of
fruit with Google. [MUSIC PLAYING] [APPLAUSE] When I played the
fruit, I’m touching it. And I didn’t know exactly what
it was going to do every time. – Each one is
announcing what it is. – Banana. – Green apple. – Maybe it’s in the key of G. [STRANGE ELECTRONIC VOICE] – So we worked with Google AI. And they sent us the
software called Piano Genie. You hit a note, and it
automatically plays music. [MUSIC PLAYING] – And the more that
we play with it, the more it understands
what it’s playing against and who it’s playing with. – So you play a different
rhythm or a different note, and this goes on and on. So it actually wrote a melody
that we would not have written. – Instead of the machine
doing it for you, you’re kind of encouraging
the machine to do something. It’s kind of cool. 1, 2, 3– if you’re
a banana, that’s pretty good for a banana. [LAUGHTER] [END PLAYBACK] [APPLAUSE] JESSE ENGEL: So– WAYNE COYNE: All right. Am I– am I working? There you go. Yeah. Yeah. JESSE ENGEL: Yeah,
there you are. WAYNE COYNE: All right. JESSE ENGEL: So
this is Wayne Coyne. WAYNE COYNE: Hello, everybody. [APPLAUSE] JESSE ENGEL: Yes, hello. So we wanted to start this
discussion because there’s obviously a lot of hype
about machine learning and artificial intelligence. And when you come to
one of these projects, there can be a lot
of preconceptions about what it is that’s
going to be like to interact with these things. So, maybe, Claire, I
want to start with you, just talking about how
those preconceptions were met or what was different
than what you expected. CLAIRE EVANS: Oh. Yeah, I mean, I’m not ashamed
to say that, at the outset, we thought we would just
push a button and make songs. We thought that’s where
we were at in terms of the technological
development of AI. I mean, that’s maybe what
the hype of the mainstream makes us believe. That it’s going to come
for our jobs in this way that’s so visceral. But we really thought we could
sort of put all of our songs into a machine and then it
would give us a new YACHT song. And we found out very
quickly, of course, that that’s not at
all where we’re at, which was really exciting. Because it meant that we got
to be the humans in the loop. And we got to have way more
control over the process than we initially
believed we would. I mean, initially
we thought it would be about committing to
whatever the machine made and then we would have to play
it and perform it and make it our own. But actually, we got to kind
of co-create with the models. And we had to take a
much stronger hand. And we had to come up
with rule sets and systems and processes that
were uniquely our own. So that, even if
you gave the same– we gave other musicians or
really anyone in this room the exact same lyrics
generated machine output and the same notation data,
we’d all make different records. Because it’s really about
the personal interpretation of what to do with this,
all this source material. So I was pleasantly surprised
at the lack of sophistication, I suppose. Yeah. JESSE ENGEL: Yeah. Wayne, does that vibe
with your experience? WAYNE COYNE: I would
say our thing had so much momentum going. And I always have– I feel, like, I have
too many questions. You know? I have questions
even to ask you. What was the word? OK. So there’s the light thing
that’s in Los Angeles, right? CLAIRE EVANS: Triforium. Yeah. WAYNE COYNE: Yeah. So what’s the word
it says it’s– CLAIRE EVANS: Polyphonoptic. I know. I figured you would like that. WAYNE COYNE: What is that? I mean, is it even a word? CLAIRE EVANS: It’s,
like, phonoptic. It’s not a real– I mean, it’s a word. WAYNE COYNE: OK. CLAIRE EVANS: So
any word is a word. WAYNE COYNE: But
it’s a made up word? CLAIRE EVANS: It’s– every
word is a made up word. WAYNE COYNE: I was, like, what? [LAUGHTER] Well, that’s true. So, I mean, from my experience,
I didn’t really have any idea– how– really what
we were going to do. I think we were
deciding what we were going to do based on what
we did five minutes ago. I mean, everything
that we’ve done starts to accelerate with
ideas and energy and momentum. And I think that’s why you
guys wanted the Flaming Lips, you know? Because, like, they’re going
to do something, you know? And you talked about these
self-imposed rules or whatever. And I think we were lucky that
the only rule we had is you have to do it now. You have to get going. And I love that. I mean, a lot of times you do,
with time, you second guess. What is this? Is this any good? And you go back and redo it. And, sometimes, with that
energy of do it, do it, do it, you make decisions and
you’re 20 decisions in before you’ve had time
to be too insecure about it or whatever. So I think the more that– and we’re very lucky because
we had you guys sitting there every step of the way. If something didn’t
work, we just blamed you. And said you know, you’ve
got to fix this thing. You know? And to me that’s always
the intimidating part of a new thing. We have a brand new
Volvo and I don’t quite know how to turn on the Sirius
stations in the car yet. You know, you’re driving. You’re trying not
to kill anybody. And then you’re trying to get
to the last Beatles station that you were on. And so I always get intimidated
that I’m going to just turn the car off or something. So if I don’t know how it
works, I’m always afraid. But having you guys there sort
of showing us how it worked. And then, we kind of immediately
want to go, oh, well, I want to do this. I want to do that. And I think that’s not a luxury
that very many people have. But I felt like
that encouraged us to be as absurd as
you guys, you know, thankfully allowed us to be. I mean, the idea that
this quickly went from– there’s a piano there
with this little device. And then 30 seconds later,
there’s a bowl of fruit there, and Steven’s playing this
amazing classical kind of piece using bananas and
strawberries and oranges. And when you’re there
and all that’s happening, it’s exhilarating. And I think for someone
like me that idea that we’re not playing
it on a keyboard, we’re not playing it on a
guitar, we’re not playing it– we’re playing it on
fruit, just takes it in to this other realm. And so, some people say, well,
that’s what, maybe– that’s what children would do. You know, that say yeah,
can I play musical notes on an orange or a strawberry. And I’d be like, yeah. Yeah. ADAM ROBERTS: Yeah, so even
though we were there, right, so we could help get around sort
of any issues you came across, there is some level
of unpredictability to these algorithms. Right? WAYNE COYNE: Yeah. ADAM ROBERTS: And I
think you guys have both experienced that in your work. So I’m curious to
hear, maybe starting with Claire from your process. Because there was
parts of things that you’re giving
up some agency to, right, like, you were letting
it write lyrics for you. And in your case, you’re going
to be doing a live performance where you don’t know exactly
what it’s going to play. WAYNE COYNE: Yeah. ADAM ROBERTS: How do you,
even though you’re giving up a little bit of this
agency, how do you keep control and make sure that
your artistic vision is still shining through on top of this? CLAIRE EVANS: Yeah. I mean, I think
ultimately nothing goes out into the world
with our name on it that hasn’t been
fussed over for months on the computer in
our living room. We have the ultimate agency. And the fact that we’re even
doing the project to begin with is like a kind of
aesthetic provocation that we’ve decided
we want to do. But at the same time– I don’t know– I mean, I think
what’s being replaced here in terms of our process
is the initial jam, the initial sound gathering,
the initial filling up the notebooks
with lyrics ideas, the sort of generative moment. And instead of jamming in
a room with each other, we’re jamming in a server
with a model, right? And what we end up with is
a mass of source material to work with. And in my mind, that’s
when the work begins. That’s when the
creative work begins. And I’m a writer, too,
so I believe strongly that writing is editing. What you put on the page
in the very beginning is just like this thing that
happens in a fugue state that’s a total mess. And then it’s useless until you
actually do something with it. And I think it’s the same thing. I mean, a notebook full of
lyrics ideas isn’t a song. A jam you did in the studio
that sounded really cool in the moment isn’t a song. What you do with those things,
how you bring them together, how you structure them, how
you arrange them, produce them, perform them,
commit to them, and then perform them for
possibly years to come, that’s the song. WAYNE COYNE: Yeah. CLAIRE EVANS: And
so I don’t really feel that much like we
gave up anything, really. We just sort of
sped up or changed the nature of the process. I don’t even think
it was faster. Actually, I think it was slower. I think it took way longer
to manually annotate our back catalog in
MIDI and come up with a corpus of 2
million words of things that we thought would be
the same things that we’re kicking around in our
head on some level. I think it was actually
more tedious in many ways. So yeah, it’s, like, you
decide what you give up. But you also really
determine the parameters of how you do it. And then what you do
with it afterwards is what really matters. JESSE ENGEL: And that was more
in a compositional process, where you have to deal with
how much are you controlling and how much are you editing. And so Wayne, with this thing
that’s more of a performance, right, you have this
[INAUDIBLE] interaction where it’s, like,
how much are you controlling what’s happening
and how much is there a chance element in the performance? WAYNE COYNE: Right. I mean, the way we looked
at it, and even the way we looked at even– you guys are– we’re
accepting that we’re bringing a collaborator in. You know what I mean? Which we do that all the time. And there’s sometimes
you regret it, you know? [LAUGHTER] We had a digital trumpet
part on a song once. We thought it was
great and whatever. And we had a trumpet
player come to the studio. And we said, yeah, play
whatever you want, whatever. You know? And little by little,
we didn’t like what he played, for whatever reason. It wasn’t his fault. We were
stuck on a certain idea. And, by the end of
it, we said, well, can you play exactly
what we played, only play it on a real trumpet? And, after he left, we actually
didn’t even use his trumpet, you know? But all this is a process of
saying do I like this thing? How much do I care about it? How much does it matter to me? And you go through all
these different reasons. And there’s no real reason. Your reason at the end
of it is I just like it. I don’t have any
deep meaning to it. And so for us, I
mean, we went into it knowing we want to do
this collaboration. And unlike the trumpet
player, everything that we would touch– and I said this to
you guys there– we were starting with
sounds that I really liked to begin with. And a lot of times I’ll
start to hear the sounds and I’m already starting
to think of a song. I’m, like, hey, play that again. You know, you’re
going off on things. I’m, like wait. Go back to this other sound. So, to me, that’s really
all I’m ever doing. It’s sparking something
that’s saying, oh, I can turn that into a song. I already know what I want
to sing to that or whatever. And knowing that we’re going to
turn it into something that we have to perform– we never consider that
when we’re thinking of what it’s going to be. Yeah, to me, those are
just two different worlds. You know? JESSE ENGEL: Yeah. WAYNE COYNE: And
knowing that we know that we want the audience to be
part of it, I think as we went, our initial surge of an idea
quickly turned into a way to make that something
that we thought– when we’re there
with an audience and they’re doing
this thing with us, they’ll understand their
contribution to the thing. We talked about
interactive things where you’re not quite sure
what you’re doing to make it go. You know, you walk into a lot of
things where you sort of feel, like, you’re supposed
to move your arm or stand in a certain place
and it does something. But you can’t quite
tell what it’s doing. And I think we’ve
gone to great effort to say when this
thing is happening, good, bad, indifferent,
you know you contributed to its growing, its dying,
its boredom, its excitement. You contributed to it. Which to me, is
already what we want. Because to me, that’s
where the fun is. That’s where the
energy comes from. We’re not really saying we’re
going to create the greatest piece of music ever. We might. But to me, that’s not what
concerts or anything is. It’s we’re all going to
participate in this thing. We’re going to create
our own energy, create our own time,
our own happening. JESSE ENGEL: And
so you mentioned there’s some unpredictability
in the interaction with this thing. And a lot of music
and technology has always co-evolved with
that unpredictability. Guitar amplifiers
weren’t meant to distort. But then people turned
them up and they found this sounds great. You know? Or the 808 drum was
originally supposed to be an accurate reproduction
of a drum set, which it’s not. But, man, it sounds really
good in electronic and hip hop music. WAYNE COYNE: For sure, yeah. JESSE ENGEL: So with
these new technologies, machine learning,
like, Claire, did you have any experiences where
they failed in ways that, you know, maybe the failure
was the interesting aspect? CLAIRE EVANS: Yeah. I mean, I think it’s
all about the failure. The minute it’s perfect is
when I stop being interested, I think. I think those moments where the
melodic information deviates from anything a human
being would do or perform, or those moments
where a tool doesn’t meet your expectations of
what it’s supposed to do, but it does something
completely different that’s more interesting
like the 808 or the NSynth, by the way, which is an amazing
neural synthesizer that Google makes. But, yeah, those are
the moments where I think that’s the most
interesting output. Because it allows
you to determine what your own taste is sometimes. I think that taste is often
a response to something rather than something that’s
coming directly from you. You see something. You’re, like, I like it. I don’t like it. And that’s how you kind
of determine who you are, what you want to be doing, how
you want to express yourself. And I think it’s very
interesting and helpful to have this sort of semi-neutral– I mean, I know saying
that AI is neutral is kind of a loaded thing– but
semi-neutral other party in the room that is not
one of your band mates that proposes an idea. And we can all agree with it. And we can all disagree with it. No one’s feelings are hurt if
we dispense with that idea, because it didn’t
come from any of us. We can fall in love with
it and move forward with it as a group or not. And we can determine who we are
in terms of our relationship to each other by
agreeing or disagreeing on some other generated output. But, I mean, the
NSynth, for example, is kind of like the
808 in many ways. Because it’s this tool that
is supposed to do one thing, it’s supposed to sort of split
the difference between lots of different sounds
using latent space. And it kind of, in my opinion,
sort of fails at that. I mean, it doesn’t convincingly
split the difference between a car horn and a flute. Because the sort
of sample rate is so low that it ends up
sounding kind of reedy and wonky and weird. And initially we thought
that that was, like– we thought that was a failure. We thought it was
not interesting. It wasn’t until we started
thinking about it like the 808, where we realized that it was
a tool with its own aesthetic, its own kind of weird,
wonky, reedy sound that it became really
interesting to us. And now it’s a huge
part of our record. Because it is, again,
one of these objects that’s both high tech and
low tech at the same time. And it sounds really lo-fi. But it takes, like, millions
of dollars of machine learning research to make that sound. That juxtaposition is
pretty fascinating to us. ADAM ROBERTS: Yeah, so I
think, unfortunately, we’re out of time. But let’s thank Claire– WAYNE COYNE: Yay! ADAM ROBERTS: –and
Wayne for coming. [APPLAUSE] WAYNE COYNE: Yep. ADAM ROBERTS: Thanks. All right, thank you. And they’re both
playing tonight. So check out the show. [GOOGLE LOGO MUSIC]

10 thoughts on “Music and Machine Learning (Google I/O’19)”

  1. So you have your algorithm identifying things, but it can only identify stuff it's familiar with in familiar scenes. What's missing? If I see a pink sheep on a staircase, my mind adapts, but a computer will likely relate it to something it is familiar with and identify a pile of flowers on a staircase. So in short, AI today is good at deep learning with things it can get familiar with, but is not good at adaptive learning like a human or even an ant colony can be.
    So where do you start? Well, I would suggest a robot-like ant colony where you challenge the colony with things unfamiliar to it, to test a new system's ability to adapt, or that sort of thing. Imagine that AI in the future ran more of our lives but wasn't very adaptive; this could lead to some small or big problems as new challenges arose. It also gets one thinking about the nature of consciousness: is consciousness simply a mechanism that allows life to adapt to new challenges?

  2. Almost switched to YouTube music from Spotify. The AI / Machine learning replaces the 1000s of playlists I have. YouTube is better at being the music plug 🔌
