Multimedia – Understanding Technology – by CS50 at Harvard


[MUSIC PLAYING] SPEAKER 1: Multimedia, odds are you see
it every day, you hear it every day, but what is it? Well, let’s start with audio, what
you hear coming out of a computer. Turns out computers are really good
at recording and playing back audio, and they’re really good at
generating audio as well. And they can do so using
any number of file formats where a file format is just a way of
storing zeros and ones on disk in a way that certain software
knows how to interpret it. So let’s start with a
particularly common file format for musical instruments known as MIDI. It turns out that using the MIDI format, M-I-D-I, you can effectively store the musical notes that compose some song. And you can do this for
different instruments, and you can then play
these instruments together by telling the computer
to interpret those notes and then render them based on
particular choices of instruments. For instance, this here is a
program called GarageBand on Mac OS, and I’ve preloaded a MIDI file
that I've downloaded online, and I daresay you will soon recognize the tune. Let me go ahead and hit play. [MUSIC PLAYING] All right, well, that
doesn’t sound as good as you might remember it sounding
in the movie, but why is that? Well, that’s because my computer was
synthesizing that music based only on those musical notes. So that wasn’t an actual recording
of an orchestra performing that song, but rather it was a
computer synthesizing or generating the music based on an
interpretation of those notes. So MIDI is especially common among musicians who want to share music with each other. It's especially common in the digital music space, where you do want the computer to synthesize the music for you.
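
To make that distinction concrete, here is a minimal sketch, in Python, of the spirit of what a MIDI file stores: a list of note events rather than recorded sound. The instrument names, note numbers, and tuple layout are invented purely for illustration; real MIDI files use a compact binary encoding of similar events, and it is the playback software that synthesizes the actual audio.

    # Each event: (instrument, MIDI-style note number, start beat, duration in beats).
    song = [
        ("piano", 60, 0.0, 1.0),   # middle C
        ("piano", 64, 1.0, 1.0),   # E above middle C
        ("piano", 67, 2.0, 2.0),   # G above middle C
        ("bass",  36, 0.0, 4.0),   # a low C held underneath
    ]

    for instrument, note, start, duration in song:
        print(f"{instrument}: play note {note} at beat {start} for {duration} beat(s)")
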
But, of course, we humans are generally in the habit of listening to songs as we know and love them, on the radio, from CDs back in the day, or on streaming media services. And those are songs that have actually been performed, typically by humans, and recorded often in concert or in a sound studio, so they sound really, really good and really, really pristine. Well, you don't have to use MIDI for those kinds of experiences; rather, you can use any
number of other file formats. For instance, one of the
earliest formats for audio and still one of the most
common for uncompressed audio is called the wave file format, which
can store data in an uncompressed form so that you have really, really high-quality versions of some audio recording. But also popular, and perhaps more popular among consumers, is that known as MP3, an MPEG audio format, which is a file format for audio that uses compression to significantly reduce, generally by a factor of more than 10, just how many bits are necessary to
store some song on your hard drive or on your music device or on your
phone or any other form of technology where you might store music. And it does so by really
throwing away zeros and ones that we humans can’t necessarily hear. Now, some people will
disagree, and true audiophiles might insist that, actually, you can tell the difference among these file formats, and that may very well be the case, because there's a trade-off here. If you want to use fewer bits
and really fewer megabytes to store your audio
files, you might indeed have to sacrifice some of the quality. But the upside is that you might be
able to store on your phone or your iPod or some other device 10 times as much
music as a result of that compression. So audio compression is generally
what’s known as lossy, L-O-S-S-Y, whereby you’re actually losing
some of the quality or the fidelity of the music, but the gain is that
you're using far less space to store that information. A similar file format in spirit is AAC, which is commonly used for audio files as well as for the audio inside video files. And that's something that you might see when you download files via iTunes, for instance, or the like. And then there are streaming
services these days like Google Play and the Amazon store
and Apple Music and Spotify, Pandora, and others that don’t necessarily
transfer files outright to your computer, but stream the
bits to you so that they’re actually being played in real time so long as
your internet connection can keep up with the required bandwidth. So how do we think about the
quality of these recordings, whether we’re using any
number of these file formats? Well, you can think of it in
terms of at least two parameters. One is sampling frequency,
the number of times per second that we actually take a
digital snapshot, so to speak, of what it is the human would otherwise be hearing in person, so as to then represent it digitally using zeros and ones. And the second parameter would be the bit depth: just how many bits are you using for that snapshot in time, some number of times per second, in order to represent the pitch and the volume and what it is the human is hearing. And if you multiply those two values together, the bit depth and the sample rate, you get just how many total bits are necessary to store, for instance, one second of music. And these file formats vary and allow you to vary exactly what these parameters are. So by using fewer bits, you might be able to save space but get a lower quality recording, or if you want a super high quality recording, you might use a higher bit rate altogether.
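
To put rough numbers on that, here is a quick back-of-the-envelope calculation in Python. The figures are just common example values (CD-style audio uses 44,100 samples per second at 16 bits per sample, per channel) and are not specific to any one file format.

    sample_rate = 44_100   # snapshots (samples) taken per second
    bit_depth = 16         # bits stored for each snapshot

    bits_per_second = sample_rate * bit_depth
    seconds = 3 * 60       # a three-minute song

    total_bits = bits_per_second * seconds
    print(f"{bits_per_second:,} bits for one second of audio, per channel")
    print(f"about {total_bits / 8 / 1_000_000:.0f} MB, uncompressed, for one channel of that song")

That works out to roughly 16 megabytes for a single uncompressed channel, which is exactly the kind of size a lossy format like MP3 can cut by a factor of 10 or more.
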
So now let's transition to graphics, what we see in the world of multimedia. Turns out here, too, there are multiple file formats for representing graphics. And what is a graphic? Well, a graphic, really, if you think about it, is just a whole bunch of dots, otherwise known as pixels, both horizontally and vertically. Indeed, most images that you and
I see on the web, on our phones, on our computers are
rectangular in nature, though you can make some of the images transparent, so they might appear to be other shapes. But at the end of the day,
all file formats for images are rectangular in nature,
and you can think of them as just a grid of pixels or dots. Now in the simplest
form, each of those dots might just be represented
by a single bit, a 1 or a 0. So for instance, here if
you look far enough back, is what appears to be a
very happy smiley face. But it’s pretty simply implemented. If you think of, again,
this rectangular region as just having a whole
bunch of dots or pixels, I've pretty much colored in, in black, only those dots necessary to convey the idea of
a happy face and left in white any of the dots that are
otherwise part of our background. And you might then
consider the white pixels to be represented with a
one, and the black pixels to be represented with
a zero or vice versa. It doesn’t really matter, so long as
we're consistent in our file format. And so if you take a step back, you can, kind of, sort of, see the same image even among those zeros and ones, though it's really hard to see, but that might be the simplest mapping from binary to an image. You simply have to decide that there's some number of bits horizontally and some number of bits vertically. And if it's a 1, it's a white pixel, and if it's a 0, it's a black pixel, or equivalently vice versa.
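
As a tiny illustration, here is that idea in Python: an 8-by-8 grid of bits in which, by our own arbitrary convention, 1 means a white pixel and 0 means a black one. The particular pattern is made up, but squint and it is a smiley face.

    rows = [
        "11111111",
        "11011011",
        "11011011",
        "11111111",
        "10111101",
        "11000011",
        "11111111",
        "11111111",
    ]

    for row in rows:
        # Print two '#' characters for each black (0) pixel and two spaces for each white (1) pixel.
        print("".join("##" if bit == "0" else "  " for bit in row))
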
But, of course, we don't generally use black-and-white images alone, on the internet, on our phones, or on our computers. Indeed, the world would be pretty
boring if it only looked like that. And that’s, indeed, how it
looked way back in the day, even before there was anything digital and before we had file formats like this, when you just had black-and-white TV. But that would really be similar in spirit to what we're looking at here, with some grayscale as well. But here let's focus on
color and the introduction of color in a digital context,
RGB, red, green, blue. If you’ve ever heard this acronym,
and even if you haven’t, this represents the three colors that
can be mixed together really to give us any color that we want– RGB meaning red, green, and blue. So using three different
values, how much red do you want, how much green do you
want, how much blue do you want, you can tell a computer to colorize
each of those dots in a certain way. Now if you have none of these colors,
you’ll actually get a black dot. And if you have all of these colors
mixed together in equal form, you'll get a white dot. But it's in the gradations in between that you get all sorts of disparate colors. So let's consider this. Here are three bytes before you, and
each is a byte, because each of these is 8 bits where, again,
a bit is just a 0 or a 1. So I have eight bits here, eight
bits here, and eight bits here. The first byte of bits, first
eight bits, is, of course, all ones apparently. The second byte is all zeros, and
the third byte is all zeros as well. So if you view each
of these bytes, 1, 2, 3 as representing how much of a
certain color red, green, blue, RGB, this appears to be a lot of
red, because all of these bits are ones, no green and no blue. So our RGB, red, green, blue, is lots of red, no green, no blue. And so indeed this is how a computer, typically using eight bits per color, or 24 bits in total, 8 plus 8 plus 8, would represent the color we know as red. So that is to say, if you
think of this whole screen as just one dot– it’s
not quite a square. It’s a rectangle in
this case– but if you think of this whole
screen as just one dot, if a computer wanted
to make this dot red, it would store a pattern of 24 bits,
the first eight of which are all ones, the second eight of which are all zeros, and the third eight of which are all zeros as well. And it will interpret those three bytes as meaning give me a lot of red, give me no green, give me no blue, and thus you get a whole screen full
of red or a whole pixel full of red. What if we change it up? What if we have a zero byte, a byte with
all ones, and then another zero byte, thereby making the red zero, the green all ones, and the blue all zeros. Well, indeed, we'll get a screen
filled with all green using that encoding of 24 bits. And you might guess in the end here, if
we have zeros and zeros and then ones, RGB, this time we’re going to get blue. That’s how a computer
using 24 bits would represent a dot that's entirely blue. Meanwhile, if you wanted to represent black, you would use all zeros for
each of the R, G and B values, and if you wanted to represent white,
you would use all ones for each of the R, G, and B values. And you can get any number of colors in between these extremes, in any number of variations of red, green, and blue, by just mixing those colors together in different quantities.
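
Here is a minimal sketch of that 24-bit idea in Python: one byte each for red, green, and blue, packed side by side. The helper function is our own invention for illustration, not part of any particular file format.

    def pack_rgb(red, green, blue):
        # Each channel is a value from 0 to 255 (8 bits); shift red and green into place.
        return (red << 16) | (green << 8) | blue

    print(f"{pack_rgb(255, 0, 0):024b}")      # red: the first 8 bits are ones
    print(f"{pack_rgb(0, 255, 0):024b}")      # green: the middle 8 bits are ones
    print(f"{pack_rgb(0, 0, 255):024b}")      # blue: the last 8 bits are ones
    print(f"{pack_rgb(0, 0, 0):024b}")        # black: all zeros
    print(f"{pack_rgb(255, 255, 255):024b}")  # white: all ones
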
Now it turns out, when we talk about graphical file formats, we don't typically talk in terms of, or think in terms of, binary. We rather use something
called hexadecimal. Whereas binary just has
two digits, zero and one and whereas recall decimal has
10 digits zero through nine, hexadecimal is a little different. It has 16 possible digits. And it’s a little weird, but it’s
at least pretty straightforward. Those 16 digits are zero
through nine, and then A through F. In other words, 0, 1, 2,
3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F. And so, of course, zero is the
smallest number we can represent, and 15 is going to be the
largest number we can represent, which is to say that F represents a 15. So, in fact, let’s consider an example. Here is a pattern of eight
bits, all of which are one. Let me go ahead and add a little
bit of space to these eight bits just to separate them
into two groups of four, because it turns out one of the
nice features of hexadecimal, mathematically, is that each hexadecimal digit, zero through F, represents, in total, four bits, which is to say that we can take
a number in binary like this, look at it as two halves,
one half of a byte followed by another half of a byte, and
use one hexadecimal digit instead of four binary digits to
represent the first four bits. And then one other hexadecimal digit
to represent the other four bits. So we can take something that takes eight symbols to represent and whittle it down to just two, which
is pretty convenient. And so, in fact, it turns
out that in hexadecimal if we had all zeros, in
hexadecimal that would just be 0. But if we have all ones, 1, 1, 1, 1, and we convert that to hexadecimal, that's going to be, if this is the ones place and the twos place and the fours place and the eights place, the number 15, otherwise known in hexadecimal as F, which is to
say if you have a byte of bits, 8 bits, all of which are ones,
you can think of that same byte as being two hexadecimal digits
FF; as opposed to thinking of it as 1, 1, 1, 1, 1, 1, 1, 1, it's just FF. So it's a more succinct
way of representing the exact same information. And so, accordingly, if you want to think about red a little more succinctly, you don't have to think about it in terms of eight ones and eight zeros and eight zeros; you can think of it in terms of FF0000, just because it's more succinct. Similarly, for green, 00FF00, and for blue, 0000FF. It's just a more succinct way of explaining oneself, and indeed a lot of graphical editing programs, Photoshop being one of the most popular, actually use this notation, certainly instead of binary and also often instead of decimal, just by convention.
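
To see why two hexadecimal digits per channel are enough, here is a small Python sketch: each hex digit stands for four bits, so one byte is exactly two hex digits, and the familiar six-digit color codes are just three such bytes in a row.

    red_channel = 0b11111111   # eight ones in binary...
    print(hex(red_channel))    # ...prints 0xff, which is just FF in hexadecimal

    for name, code in [("red", "FF0000"), ("green", "00FF00"), ("blue", "0000FF")]:
        r = int(code[0:2], 16)
        g = int(code[2:4], 16)
        b = int(code[4:6], 16)
        print(f"{name}: #{code} has R={r}, G={g}, B={b}")
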
So now let's consider some specific file formats. If you're a PC user, you might not have seen this in a while, but odds are when you did, you saw it for quite a few years: this beautiful rolling hill with a beautiful cloudy sky behind it. This was, of course, the wallpaper or the background image that came by default with Windows XP, an operating system from Microsoft for PC computers. So the very first time you turned on
your computer and, perhaps, logged in, you would see a screen like this
and maybe some of your icons and your recycle bin and the like. Now as an aside and spoiler, this
is what that same hill apparently looks like today. So it hasn’t necessarily aged well,
but for our purposes what’s interesting here is what this image was stored as. It turns out that this image originally
was a bitmap file, BMP, or bitmap, B-I-T-M-A-P, to pronounce it out loud. And that file format really
is what that word implies. It’s a map of bits. It’s a grid of bits, which is perfectly
consistent with our definition earlier of a very simple
smiley face using just zeros and ones, or black and white dots. This image, clearly, has many more colors than that, and indeed it's the case in general that the graphical file formats
on computers support dozens of colors, hundreds
of colors, thousands, maybe even millions of colors,
certainly, more than just black and white alone. But there’s a finite
amount of information here. And even though this looks like a
beautifully crisp green grassy area and a beautifully blue sky with
some very smooth clouds, if we actually zoom in
on those clouds, you’ll see that indeed an image is
really just a grid of dots. In fact, let me zoom in on those clouds,
and I’ve not done any alterations. I simply used a graphical
program to take that same sky and zoom in, zoom in,
zoom in as much as I can. And as soon as you zoom in enough, you
see that that cloud that previously looked especially smooth to
the human eye, really isn’t. It’s just that my human eyes
can’t really see dots, especially clearly when they’re really small,
and there’s a very high resolution so to speak– a lot of pixels
horizontally and a lot of pixels vertically in an image. But if I do zoom in on that, I actually do see the pixelation, so to speak, whereby you actually see the dots. And you can see that those clouds are really just roughly represented as a grid of dots or a map of pixels,
a rectangular region of pixels. So that’s all very
interesting now, because it would seem that we don’t
have an endless ability to zoom and zoom and
zoom in and see more and more detail unless that
information’s already there. And so, much like with audio, when you
have the choice over just how many bits to use, so in the world
of images do you have discretion over how many bits to use. How many bits do you use to represent each dot's color? And that might indeed be just 8 bits for red, 8 bits for green, 8 bits for blue, a.k.a. 24-bit color, but resolution also comes into play. If you have an image that's only 100
pixels, for instance, by 100 pixels, horizontally by vertically,
it might only be this big. Now that might not be big enough to fill your whole background wallpaper on your computer, and so you might try to scale it up or zoom in on it. But when you do that, you're taking only a limited amount of information, 100 pixels by 100 pixels, and you're essentially just duplicating those pixels, making them bigger and blotchier just to fill your screen.
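
Here is a minimal Python sketch of what that kind of scaling really does: each original pixel is simply duplicated into a bigger block, so no new detail appears. The tiny two-by-two "image" is made up purely for illustration.

    def scale_up(pixels, factor):
        scaled = []
        for row in pixels:
            wide_row = [p for p in row for _ in range(factor)]  # repeat each pixel sideways
            for _ in range(factor):                             # repeat the whole row downward
                scaled.append(list(wide_row))
        return scaled

    tiny = [["red", "blue"],
            ["blue", "red"]]

    for row in scale_up(tiny, 3):
        print(row)
    # Each original pixel becomes a 3-by-3 block of the same color; nothing sharper appears.
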
Better would be to not start with an image with so few pixels, but rather get a much higher resolution image. And indeed, this is what you get with newer and better camera phones these days and newer, bigger, better digital cameras: among other things, you get higher and higher resolution. More and more dots, so that the
dots ultimately that we humans see are so small on our screens,
it looks ever more smooth than, say, an image like this. So generally speaking,
higher resolution gives us higher fidelity and a cleaner image. The other factors in cameras
certainly play into that as well. But there’s something
else I notice here. It seems a little silly that I’m
using the same number of bits to represent the color of every
one of the dots on the screen. Because even though I do
see a few different shades of gray or white in there
and light blue and dark blue, I see a lot of identical
blue throughout this image. There’s a lot of redundancy,
and indeed if we rewind, there’s a whole lot of
blue in this image itself. There’s a whole bunch of
similar white it would seem in the middles of the clouds. There’s a whole bunch of
similar-looking green. And yet we are using, it would seem by default, 24 bits for every pixel, which just seems wasteful when one pixel is identical to the one next to it. So it turns out that graphical file formats can often be compressed, and this can be done in different ways. It can be done losslessly or lossily. So earlier you'll recall that I proposed
shrinking audio files by throwing away information that maybe my human
ears can't necessarily hear, or that my non-audiophile ears might not even notice are missing. And that would be lossy compression, in that I'm just throwing information away, assuming
that the user’s not going to notice. But that’s not always necessary. Sometimes you can do
lossless compression, whereby you can use fewer bits
to store the same information. You just have to store
it more intelligently. So consider this example here where you
have an apple against a blue backdrop and that, much like our blue sky,
seems pretty consistent throughout. And so it seems a
little silly intuitively to record an image like
this on disk as follows. If you think of me as being a verbalization of a file format: make this pixel blue, make this
pixel blue, make this pixel blue, make this pixel blue, make this
pixel blue, make this pixel blue. I'm literally saying the same sentence, or, more technically, using the same 24 bits for every pixel across that entire row, even though my sentence isn't changing. And so instead what a clever
file format might do is this. This is not what the
user sees, but this is what the file format could store with
respect to all of that redundant blue. Just remember, for instance, the leftmost pixel's color, as by saying, this pixel is blue, and then for the rest of the row, or
scanline, as it's called in an image, just say: and so are the rest of the pixels in this row. So I can say, much more concisely, essentially: repeat this color throughout the entirety of the rest of the row, thereby saving myself any number of sentences, not to mention any number of 24-bit patterns. And I can do the same here: make this pixel blue, and then repeat that color again and again and again. Now it gets a little less efficient
as soon as we hit like the stem on the apple, because then
that sentence has to change. Then we have to say something like make
this pixel brown, make this pixel blue, and then repeat again. So we have to, kind of, stop and start
if there’s some obstruction in the way. And the same thing for
the red apple itself. But just look, based on all the white, at
how much information we’re potentially saving or how many bits
we’re potentially saving, and yet we’re saving those bits in a
way that the original information is recoverable. Just because we don’t
store 24 bits representing blue for every one of these dots
on the screen, doesn’t mean we can’t display blue there just
by interpreting this file format a little more cleverly. And so this is indeed how a file format might actually losslessly compress itself, using fewer bits to store the same image, but in a way where you can recover the original image itself.
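
Here is a minimal sketch of that idea, often called run-length encoding, in Python. The color names stand in for 24-bit pixel values, and real formats are more elaborate, but the spirit is the same: store each run of identical pixels once, with a count, and the original row remains fully recoverable.

    def run_length_encode(row):
        runs = []
        for pixel in row:
            if runs and runs[-1][0] == pixel:
                runs[-1][1] += 1          # same color as before: extend the current run
            else:
                runs.append([pixel, 1])   # new color: start a new run
        return runs

    row = ["blue"] * 10 + ["brown"] * 2 + ["blue"] * 10   # a scanline crossing the apple's stem
    print(run_length_encode(row))
    # [['blue', 10], ['brown', 2], ['blue', 10]]: three entries instead of 22 pixels.
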
Now let's take a look at another example, this time of lossy compression. Here is a beautiful sunflower
taken somewhere here on campus at Harvard University. This is a high quality JPEG photograph
where JPEG is a popular file format for photographs especially. And this image here was somewhat
compressed, but not very compressed. In fact, only if I put my face
really awkwardly close to the screen do I see that it’s a little
bit blotchy way up close. But from just a foot or so or
beyond, it looks perfectly pristine. But not if we compress
this image further. Suppose that this image is just too
big to fit on my Facebook profile page, or it’s just too big to email
to a friend via my phone. In other words, I need to use
fewer bits or fewer megabytes even if it’s a really big
file to store this same image and convey the gist of
the image to that friend. Now I see a little bit of blue
and I do see a bunch of yellow, but it’s not quite
the same clean pattern that we saw with the apple
or even the blissful blue sky above the green grassy hill. And so if I were instead using
a file format that can still be compressed, but lossily, where we're
actually throwing information away, this might be the before image. And now wait for it. This might be the after image. So it’s still clearly a
sunflower, though it looks a little more sickly at this point. But it definitely looks blotchier. In fact, from a foot or
more away, I can actually see that my sky has
become very pixelated. It almost looks like Super Mario
Bros. back in the old Nintendo systems where you could really see the big dots. And the greenery here is
just a grid of pixels too, and even the flower
has really just become a collection of dots that I ever
so clearly see on the screen. And certainly this flower
looks none too good anymore. So let's rewind. This was before, after, before, after. And so this is what it means
to lossily compress an image. I cannot go from this pretty poor
version back to the original, if I have achieved this compression by
just throwing away some of those bits. So whereas before I was very cleverly just remembering repetition in the image, in this case, using this file format, especially when you really turn the virtual knob and say compress this as much as you can, what my graphical software is essentially going to do is start to use approximations. Well, does this leaf here really need
to be 20 different shades of green? How about just two? And that’s why I get this big green
blotch here and this other green blotch here. Does this sky really need to
be 30 different shades of blue? How about two shades of
blue and two shades of gray? And so that might be a way to use
less information to still represent the same sky. I don't know, in this file format, just how clear the sky used to be, because those dots have essentially been thrown away and aggregated in this way. But it makes for a much smaller file.
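
Here is a deliberately simplified Python sketch of that "fewer shades" idea: snap many nearly identical values down to a handful of representatives. Real JPEG compression is far more sophisticated than this, but the spirit, throwing away variation the eye may not miss, is the same, and it is irreversible.

    def quantize(value, levels=2):
        step = 256 // levels
        return (value // step) * step + step // 2   # snap the value to the middle of its bucket

    blues = [200, 203, 198, 210, 190, 205, 199, 208]   # eight slightly different shades of blue
    print([quantize(b) for b in blues])
    # All eight collapse to the same value, which compresses very well,
    # but the original variation is gone for good.
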
And so what are the formats at our disposal? Well, there's any number of options out there today, but perhaps the most common are these. There's the bitmap file
format, which was commonly used originally in Windows and other
contexts, though not super common these days, certainly not on the web, but it does indeed lay out all of your pixels essentially in a grid of zeros and ones on disk. Meanwhile, there's GIF, which is
commonly used for low quality images in multiple senses of the word. This is often used for icons on the
screen or clip art that you might see, and it’s also increasingly used for
internet memes or the kinds of images that you might forward
along to friends or see popping up on your screen in large
part, because GIFs can be animated. So they're, sort of, a very low-end version of a video file; really, it's like a video file with just a few images inside of it that often play on repeat, one after the other, creating the illusion of some form of animation. But the resolution of GIFs tends to be not very high, and although they can be losslessly compressed, as we saw with the apple before, they only support 8-bit color. And 8 bits
implies that we can only have a total of 256 colors in the
image itself, which limits the range. And so they tend not to look great, especially when large, for things like photographs of humans and grassy knolls. JPEG, meanwhile, is the file format we saw just a moment ago with that beautiful sunflower. This actually supports 24-bit color, but is lossily compressed, so you might lose some information
when shrinking those image files, but it allows you so many more colors
that you can see images typically with much higher fidelity
at much greater quality. Meanwhile, there’s PNGs as well. PNGs are commonly used
for high quality graphics that you might want to print or resize,
supporting 24-bit color as well, and are generally used for
images that you might indeed want to use in multiple contexts. Not so much photographs, but other artwork that's higher quality than GIFs. And here's just a few examples. This is, perhaps, the most ridiculous
animated gif that I could find. This here being a cat
flying through the sky. And this is an animated gif in the
sense that it’s really just one image after another, after
another, after another, and they’re repeating again
and again and again and again. So even though it looks
like motion, really you’re just seeing a bunch of
images each of which has the cat in a slightly
different position, and its rainbow and the stars
in a slightly different position. And if you loop these again and
again, it looks like the cat’s moving, but really you’re just seeing a whole
bunch of images every split second. Meanwhile, here is another JPEG in
addition to the sunflower earlier. This is a beautiful shot of the ceiling
here in Sanders Theater at Harvard University, and JPEG really
lends itself to photography, because you have not only
a huge range of colors, you also have the choice not really
to compress the files very much. The fact that my sunflower
got so ugly on the screen was because I deliberately said compress
that sunflower as much as you can, but that doesn’t need to be the case. If you can afford to
spend the bytes on disk or you can afford to post a
really big image on the internet, then you can certainly
use minimal compression and capture a really beautiful image. As for a PNG, here might be
a good opportunity for a PNG, a really high resolution
version of say Harvard’s crest that you might want to print small
on some piece of paper or large on a banner or the like. And so this might lend itself
especially to an application like that. Of course, we don’t have an
infinite amount of information at our disposal in graphics. Rather we only have
the pixels and the dots and the colors that are there when that
image was saved in some file format. And so it's all too common to see in popular television and film, sort of, abuses of what it means to be a multimedia format, and a graphical file format at that, such that there has entered the lexicon this notion of enhance, where enhance essentially means, apparently, in the media: make this image as clearly readable as possible, no matter what format it was saved in. And we can see some examples of
that with this popular TV show here. SPEAKER 2: We know. SPEAKER 3: That at 9:15
Ray Santoya was at the ATM. SPEAKER 2: The question is
what was he doing at 9:16? SPEAKER 3: Shooting the 9
millimeter at something. Maybe he saw the sniper. SPEAKER 2: [INAUDIBLE] SPEAKER 3: Right. Go back one. SPEAKER 2: What do you see? SPEAKER 3: Bring his
face up full screen. SPEAKER 2: His glasses. SPEAKER 3: There’s a reflection. SPEAKER 2: [INAUDIBLE] baseball team. That’s their logo. SPEAKER 3: And he’s talking
to whoever’s wearing a jacket. SPEAKER 2: We may have a witness. SPEAKER 3: To both shootings. SPEAKER 1: All right,
let’s take a closer look at exactly what we just saw. So they’re watching this video
of some bad guy presumably, and they’re trying to
identify the suspect. So they’re really just
looking at what’s called a frame in a video, which
for all intents and purposes is just an image inside of a video. Because what's a video? Well, much like the animation
we saw a moment ago, a video really is just
a set of images being shown really fast to
the human eye generally at a rate of 24 frames
or images per second or as many as 30 frames
or images per second, thereby creating the illusion of
motion or really motion pictures. But really it’s just a
whole bunch of pictures being shown to us super quickly. So here’s one such picture. And here apparently is the
key to solving this mystery. Indeed, if we enhance that
glint in this fellow’s eye, we apparently see exactly this. And by the magical incantation of
enhance do we apparently see this. And this is where reality breaks down. If this is the entirety
of the information that has been stored in some
file format and indeed you can see the pixels and the
pixelation, the blotchiness because only so many bits
and only so much resolution was used to store that
image and we are looking at a tiny, tiny, tiny fraction
of it in the reflection of that fellow's sunglasses, this is all
the information that we might have. Now, you might stare
at this all day long and, kind of, sort
of, think that you see who it is that had
perpetrated this crime, but you’re certainly not going
to get from that anything close to the resolution of
this, unless the original video and, therefore, the original frame
or image was as high resolution as this output suggests. So the information, the bits,
the pixels aren’t just there. And even cartoons of today
like Futurama know this. SPEAKER 4: Magnify that death spear. Why is it still blurry? SPEAKER 5: That’s the
resolution we have. Making it bigger
doesn’t make it clearer. SPEAKER 4: It does on CSI Miami. SPEAKER 1: All right, and
what better segue, then, to video file formats themselves than these excerpts from some actual videos. Indeed, you can think of a video file
format as very reminiscent of something from the real world. In fact, as a kid if you either made
or played with these little flip books, you might have had the ability to
actually see something animated really by just flipping through some physical
pieces of paper really quickly. Well, that’s all a video
format is in the digital age. It is simply a file format that contains
essentially a whole bunch of images inside of it, each of which is shown
to you so fast that there appears to be the illusion, just like this, of motion. And you're seeing 24 images per second, 30 images per second, and it's not necessarily that they're all PNGs or JPEGs or GIFs or actual images inside of it; there are actually more complicated and sophisticated ways of storing the information, so that you're not just storing each of the frames.
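
To see why simply storing every frame as raw pixels would be hopeless, here is a rough calculation in Python, using common example values (1080p at 30 frames per second with 24-bit color) rather than any particular standard's numbers.

    width, height = 1920, 1080
    bits_per_pixel = 24
    frames_per_second = 30

    bits_per_frame = width * height * bits_per_pixel
    bits_per_second = bits_per_frame * frames_per_second
    print(f"about {bits_per_second / 1_000_000:,.0f} megabits for each second of uncompressed video")
    # Roughly 1,500 megabits per second, which is exactly why video gets compressed.
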
You can actually use algorithms and mathematics to actually go from one frame to another. And indeed, there are some
very clever opportunities when it comes to videos for
compressing video formats themselves. We can certainly leverage within
frames, or intraframe so to speak, the exact same techniques that we saw earlier with something like a GIF and an apple, where we can actually leverage the fact that there's redundancy in a given frame of a video, throw that information away, and just remember the whole sky is blue, or the whole rest of some line or row in a file is blue, and, therefore, save
on information and bits. But with videos you have another
opportunity, because you don't just have an individual picture; you have a picture and every subsequent picture, which might look very similar as well. In fact, if I hold very
still for multiple seconds, odds are almost everything
in this video is staying the same except
for my mouth, apparently my pointer finger and my
lips and eyes as I blink, but everything else about
me is pretty much the same. So why would you in your file format
store all of the various colors that we see behind me and around me? You don’t need to do that. You can also leverage something
called interframe compression, whereby in simplest form you can take a
look at the current frame of a video and look at the next frame
and decide what has changed. And maybe look another
frame after that, see what has changed, and another frame
after that and see what has changed. And essentially store not every
image from the starting point to the ending point, but really just the
differences between those frames that are adjacent. So for instance, if we start off with
this bee here on a trio of flowers and he moves and he moves
and he moves, if we were not compressing this video and these four frames that compose the video, we could just store each of those images essentially as is, even though the flowers are not moving and the leaves are not moving. The only thing that's moving is the bee. Or we can be more clever
about this just as we were with the blue sky behind the apple. We can recognize that between
picture one and picture four, or the first four frames of this
video, the only thing that’s moving is indeed that bee. So maybe we should store just what
we’ll call keyframes or a snapshot in time of what the video looks like. And then on each subsequent
frame, essentially, just remember what information
has changed, in this case, the position of the bee and leave
it to the computer playing the video to infer or interpolate
these inner frames based on those so-called keyframes. Use a bit of clever math, use some
algorithms to actually figure out that, oh, here’s where the bee now is. Let me redraw the exact same flower and
the exact same leaves behind that bee. But I now only have to
store really as many bits as it takes to remember
where the bee now is there, where the bee now is here,
and then just for good measure to keep everything synchronized
maybe every few frames we’ll have another keyframe that,
even though it's a little expensive, stores the entirety of the frame. Just in case something goes wrong, we can guarantee ourselves that we can reconstruct what the video actually is, even if there's a little bit of a glitch otherwise.
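
Here is a minimal Python sketch of that keyframe-plus-differences idea: store one full frame, then for each following frame record only the pixels that changed. The tiny made-up grids stand in for real frames; actual codecs work on blocks and motion vectors rather than single pixels.

    def diff(previous, current):
        # Record (row, column, new_value) for each pixel that changed between frames.
        return [(r, c, current[r][c])
                for r in range(len(current))
                for c in range(len(current[r]))
                if current[r][c] != previous[r][c]]

    keyframe   = [["blue", "blue", "blue"],
                  ["blue", "bee",  "blue"],
                  ["blue", "blue", "blue"]]

    next_frame = [["blue", "blue", "blue"],
                  ["blue", "blue", "bee"],
                  ["blue", "blue", "blue"]]

    print(diff(keyframe, next_frame))
    # [(1, 1, 'blue'), (1, 2, 'bee')]: two small changes instead of nine whole pixels.
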
So what are the file formats that we have at our disposal? Well, in the video world, the terminology gets a little more complicated, in that there are a number of different solutions to the problem of storing video. And indeed these are what the
world might call containers. And a container is just
as the name implies, it’s a digital container inside of which
you can put multiple types of data. And the types of data you
might put into a container would be a video track,
like the actual footage that you see on the
screen, and an audio track, which is the actual audio that you hear, maybe a secondary audio track. If a film has been dubbed from
one language into another, you might have multiple audio
tracks in the same container. And then the software on your computer
or even on your TV for that matter that’s playing back
this video can actually choose between those
multiple audio formats. You might have closed captions or
some other track inside the container. So long story short, a
container really is just that. It’s this bucket inside of which
is the video and the audio, but maybe multiple formats thereof, so that you can play them back based on your own preferences.
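
Purely as a mental model, here is what a container might hold, sketched as a Python dictionary. The layout is invented for illustration; real containers like MP4 or Matroska use carefully specified binary structures, but the idea of one bucket holding several tracks is the same.

    movie = {
        "video": {"codec": "H.264", "resolution": "1920x1080", "fps": 24},
        "audio": [
            {"codec": "AAC", "language": "English"},
            {"codec": "AAC", "language": "Spanish"},   # a dubbed second audio track
        ],
        "subtitles": [{"format": "closed captions", "language": "English"}],
    }

    # A player can pick among the tracks based on the viewer's preferences.
    print([track["language"] for track in movie["audio"]])
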
So AVI is a very popular format that's been commonly used in the Windows world for years, as has been DivX. MP4 and QuickTime have been more common on the side of Macs, although MP4 is now pretty much universal across all browsers and operating systems and more. Matroska is more of an open
source container that’s meant to be even more
versatile than these others on this screen, capable of storing any number of file formats inside. And as to those formats inside, they might indeed be video. They might indeed be audio. But within those worlds, realize, there are different ways of storing and encoding information, and those innermost wrappers use what are called codecs, where a codec is just a way of encoding information in a video or in an audio file format. And there's any number of these
options as well, but perhaps one of the most common these days is something called H.264 for video, which is a way of storing video on disk inside of a container, or MPEG-4 Part 2, a bit more verbosely named, a popular alternative there too. And then in the world of audio
files, two terms we’ve seen before, and this is where the world
gets a little confusing, sometimes the container formats are
the same as the actual media formats. And in this case, AAC and
MP3 can be standalone files that you download and listen to
in iTunes or some other software, or they can be tracks inside of a
container that actually provide a video with the audio that accompanies it. But there aren’t just these two
dimensional file formats, if you will. There are increasingly three
dimensional or virtual formats as well that allow you to capture
the entirety of spaces like this. In fact, this is a picture that is
knowingly a little bit distorted, because if you look up and
around in reality at this space, it doesn’t look so
wide and stretched out. And the stage definitely
isn’t curved like this, but essentially what
you’re looking at now is a 360 degree photograph
of this exact stage. And that image, even
though it’s effectively a sphere that captures the
entirety of this space, it’s essentially like you’ve taken
a sphere and cut it around the edges and then flattened it out, much
like flattening a globe of the earth into a rectangular
region, and what you get is something that’s a little distorted. But if you kind of stare
at this for just a moment and you imagine that the
wooden stage here is really meant to be a straight
line and all of these seats are supposed to be put
together side by side, you can imagine re-forming a
sphere out of this otherwise flat two-dimensional image and
putting yourself inside of it and being able to experience
a space like this. So increasingly, in some of these same file formats that we've discussed, among them JPEG, for instance, for photographs, you have the ability to inject what's called metadata: some additional, often textual, data that the human
looking at an image doesn’t see. But programs like Photoshop
and browsers and applications can actually read and
realize, oh, this image has not only a grid
of pixels, compressed or otherwise, color or otherwise,
that I can display to the user, there’s also some additional
metadata that tells me how to display this image in a
way that’s much more immersive, so that the image effectively
wraps around the user. Now the user might look
a little silly doing so, but if he or she has a headset
quite like this one here, he or she can take a look
at this image, pull it up on the digital screen that’s
before him, and thanks to two small lenses, left eye and
right eye, start to look up and down and left and right and all around him or
her and actually see a space like this and experience it in 360
degree virtual reality. So this is just a taste then of the
file formats that currently exist, that are on the horizon today, and
just who knows what more will exist. But at the end of the day, it all
boils down to bits, to zeros and ones, how you arrange them on disk, and
what features you provide to the users with which to capture their imagination.
