Source: Unsplash

COVID-19 has left a lot of musicians stranded at home with internet access, a lot of free time, and wanting to find ways to keep playing music with people without being in the same room. It often leads to a conversation like this:

“Hey I’ve finished the drums, I’ll put it in the Dropbox.”

“Can’t wait to hear it man but I’m almost out of space, for I’m one of those people trying to get by with a free Dropbox plan.”

“I’ll just do an MP3 bounce, that will save some space. It should be okay shouldn’t it?”

“Yeah this is just for Instagram, it should be fine.”

It should be fine, but it isn’t

It turns out there is more at stake than just sound quality when you encode your takes to MP3. A side effect of the format that I recently discovered and feel like not many people know about is that it adds a bit of silence to the start of the file, which makes things downright confusing for anyone trying to line up their playing with someone else’s. It’s just one of the many ways computers don’t make it easy to play in time with other people, let alone yourself. This has been happening for decades, since the dawn of cheap recording hardware and software.

Sound on (delayed) sound

My first experience with recording sound over other sound I’d already recorded on a computer was probably in CoolEdit Pro. I plugged the gooseneck microphone that came with our SoundBlaster into the mic input, pointed it at my acoustic guitar, pressed record, and played the chords to Paranoid Android. That means it was probably 1997.

Cool Edit Pro Cool Edit Pro. Source: Sound on Sound magazine, 2002. So it’s not the version I used but you get the idea.

The next exciting step was creating a new track, arming the track, taking a deep breath, pressing record, and playing one of the other guitar parts from Paranoid Android. Then, for the moment of truth I pressed play to hear the results of this bare-bones proof of concept. Unfortunately the truth that it revealed was that it was terrible. The latency between me playing and the computer actually recording it made me sound about a third of a second out of time with myself. Almost comically badly out of time. Almost as bad as trying to jam over Zoom. Sure, I could just drag that second file back to be in time with the first one, but there are two problems with that:

  1. I might never get it exactly right, and
  2. I shouldn’t have to – I feel like it’s the computer’s job to keep track of that.

And I still do.

It’s the computer’s job to keep us in time

Fast forward 22 years and the technology is a lot better. I have an audio interface made by a company called RME that does a great job of convincing me that when I hear back the thing I played, it’s exactly when I played it. And that’s important. Good timing, or ‘feel’ as musicians like to call it, is subjective, and a musician’s sense of what sounds like ‘good’ time is honed over countless hours of playing. As a bass player, I feel like my taste in feel is what makes me me more than anything else. And so if the computer that’s supposed to be recording me isn’t recording my notes exactly where I wanted to put them…

Yikes.

It really gives me the creeps that you can work your whole life on being good at playing things in time only for a computer to come along and move it slightly.

It almost makes you not want to record music.

But if that’s a bit drastic, we have to make do with what we’ve got, and during COVID-19 lockdown what we’ve got is emailing/Dropboxing audio files to each other and trying to make something sound good.

Source: Unsplash This part of the article was a bit light on images so here’s another microphone in an otherwise out-of-focus studio

Using your Logic

Unfortunately, we all pretty much make up how to do this as we go along. What happened last week was Simon recorded some keys and vocals and sent it to Danny (those are their real names), who recorded some drums in time with what Simon did. A sensible way to make sure everything lands where the players wanted it is:

“I recorded from the start of your take. Line up my file with the start of yours and it should be in time.”

Here’s the scenario: Danny and Simon were both using Logic Pro X. They uploaded their parts, in M4A and MP3 format respectively, but that shouldn’t matter, should it? Isn’t it all audio? I had bought a new computer recently and hadn’t gotten around to installing Logic yet, but I did have Ableton Live 10 installed. I thought, “That’s okay, I should still be able to just record something in Ableton that’s in time with what’s happened so far, and send it to the other guys.” Using different software shouldn’t matter, should it?

The drum beat’s out of time

Imagine my surprise when it was my turn to add bass, and Danny and Simon sounded out of time with each other. And it was difficult to troubleshoot. Part of the problem is we weren’t all using the same software or saving in the same formats. But on the other hand, why should we have to? It’s all audio, shouldn’t we be able to just line them up? I would have thought so.

I pestered the other guys for a while to upload WAV versions of everything, and then it all sounded right. But why should the format matter? I (and everybody else I asked) assumed that MP3 just encodes the audio coming in and has no business adding or subtracting anything while it’s doing it. But we were wrong. Maybe MP3 was never intended for this sort of thing, but that doesn’t stop us from trying to use it.

Problem #1: the end of the file

I knew about this one from trying to listen to Dark Side of the Moon in iTunes. The MP3 format does add arbitrary bits of silence at the end of tracks, completely destroying those albums where the songs seamlessly segue into one another and makes you wish you had a CD player. This happens because during encoding, the file is dealt with in blocks of 1152 samples. The length of the file has to be a multiple of that, and if it’s not, it will add some silence to the end.

Hang on, what? Samples?

Oh, right. I should explain a bit about how digital audio works. But not too much. Just enough to understand what’s going on with the lengths of our recordings.

When your sound card records audio, it records the volume of what’s coming in, and it does this 44100 times per second for CD quality, or 48000 times per second for slightly better than CD quality. This number is called the sampling rate and it basically determines the quality of the audio: the higher the sampling rate, the better. So let’s do some maths about those 1152 samples.

48000 / 1152 = about 42ms

Where were we?

So. The file might have up to 42 milliseconds of silence added to the end when it gets converted to MP3. Apple figured out a way to get iTunes to account for this at some point, by analysing every song in your library and figuring out where the silence starts. Millions of people sat confused while their computer spent hours “Determining Gapless Playback Information”. More on that later.

So I knew about the silence that MP3 adds at the end. But after figuring out which question to ask Google, I found out that it also adds silence at the start of the file.

Problem #2: the start of the file

When you start asking Google questions like this, you’ll inevitably be lead to forums, which will inevitably be populated with people who answer questions whether they know the answer or not. But it’s not the internet’s fault. These people exist in life too, and it’s your job as a discerning human to figure out who to listen to. Luckily, I listened to the person who said, “it’s to do with the MP3 codec,” and the person who replied and added, “There’s more information here,” with a link to this incredibly informative technical FAQ for the LAME MP3 Encoder (one of many pieces of software that encode audio in MP3 format), written in June 2000 and thankfully still available online. I won’t make you read it all and understand it all. I won’t even make myself read it all and understand it all. But it does very succinctly answer why the LAME MP3 encoder adds silence to the beginning of the file. I didn’t quite understand the explanation, but I’ll have a go at paraphrasing it. Enter the MP3 encoder:

“Ok, let’s start at the beginning of the file. Firstly, what happened 42 milliseconds before that?”

“Why?”

“I just need to know.”

“Nothing happened, that’s before the start of the file.”

“Let’s add some silence then.”

“Why?”

“That’s just how it works.”

It works on a time window, so it has to add some silence to get started. The encoder and the decoder both do it. Instead of trying to wrap my head around the LAME FAQ, finding out what MDCT stands for, and wondering why “What does MDCT stand for?” wasn’t one of the frequently asked questions, I decided it would be a better use of my time to just accept that this is an unavoidable side-effect of the MP3 format, and I should just move ahead and try to test when and why it happens.

The Test

It’s fairly simple to test. Record a sound - probably a kick drum because it has a decisive attack that’s easy to spot in a waveform. Export it as a WAV and an MP3. Import those files again and see if they’re in time with each other.

/assets/mp3/Untitled.png

They’re not. The MP3 is 47ms late, and I’d categorise that as a bad flam. That’s not surprising, it just proves that this delay is present in an MP3 exported from Ableton. What’s surprising is that 47ms is more than we were expecting. Let’s keep going. Is Ableton to blame? Let’s take Ableton’s MP3 encoding out of the equation by encoding the WAV file to an MP3 using LAME in the command line, without reading the manual and hoping for the best:

lame ableton.wav abletonlame.mp3

I unintentionally brought more bitrates into the mix here, because lame by default uses 128kbps. Because it’s still 1998. Reassuringly, this lame-encoded file had the same delay as the one straight from Ableton, so the bitrate doesn’t seem to affect the delay.

/assets/mp3/Untitled%201.png

So encoding with something else (I only tested LAME and ffmpeg, but that uses LAME too) gives us the same result. Is it still Ableton’s fault? Could it be the way that Ableton imports MP3 files? I tried importing the same files into Logic Pro X (yes, I finally got around to installing it).

/assets/mp3/Untitled%202.png

Now that’s even scarier. If it were consistent, I could blame MP3. But Logic and Ableton seem to be handling them differently. What about MP3 files exported from Logic? I did an M4A too just to see if that was handled differently.

/assets/mp3/Untitled%203.png

That’s nice to know. Somebody at Apple has clearly thought of this. People using Logic will want to export and import MP3 files and they’ll want them to be the right length. And they are. Well done Apple. How does Ableton handle files exported from Logic?

/assets/mp3/Untitled%204.png

Not well, it seems. The M4A is in time with the WAV, but the MP3 is not. Curiously, it’s delayed by 22ms, which is not the same delay as the MP3 files from Ableton or Lame. Sorry, guy in the forums who said he just chops off 1100 samples from MP3 files. That’s not going to cut it. Apple must be using their own encoder; I wouldn’t put it past them. It’s a very Apply way to handle it.

So we have an MP3 file being treated differently by different applications. The horror. Apple handles its own files okay. So how is Apple doing it?

Peeling the Apple

I read somewhere in one of many forum posts that maybe Logic embeds something about the appended silence in the ID3 tags. That seemed worth investigating. Remember ID3 tags? That’s where information like song title, artist, album, length and genre are stored. Or, at least, that’s where they used to be stored back in my day. If you’ve been listening to MP3 music for as long as I have you probably encountered WinAmp at some point.

/assets/mp3/Untitled%205.png The most 1999 song I could find.

It was definitely the first MP3 player software I used. I liked it so much I even registered it. I stuck with it long after iTunes had achieved domination. It could even rip those ‘copy protected CDs’ that iTunes couldn’t. Long after Justin Frankel sold the company to AOL for $400 million, it was still a great player, and you could get nerdy with the input/output plugins and do weird stuff like encode MOD files to MP3 if you so wanted. And it had a great ID3 tag editor.

/assets/mp3/Untitled%206.png Yes, that’s a Windows app running on a Mac.

See that big Comment field? You can put anything in there. And Apple pretty much did. People online struggled to make sense of how Apple was using ID3, but storing information about gapless playback in the comments was a great way to ensure backward compatibility while still attaching their own data to the file. Using the command line utility id3info, part of id3lib, we can see the tags in a file that was ripped from CD by iTunes.

*** Tag information for 01 minipops 67 [120.2] [source field mix].mp3
=== TT2 (Title/songname/content description): minipops 67 [120.2] [source field mix]
=== TP1 (Lead performer(s)/Soloist(s)): Aphex Twin
=== TP2 (Band/orchestra/accompaniment): Aphex Twin
=== TCM (Composer): James
=== TAL (Album/Movie/Show title): SYRO
=== TRK (Track number/Position in set): 1/12
=== TPA (Part of a set): 1/1
=== TYE (Year): 2014
=== TCO (Content type): Electronica
=== COM (Comments): (iTunPGAP)[eng]: 0
=== TEN (Encoded by): iTunes 12.1.0.50
=== COM (Comments): (iTunNORM)[eng]: 00000ADF 00000B24 000092A4 000091FC 00005C8D 000160DB 000091F3 000092F9 00020334 00007905
=== COM (Comments): (iTunSMPB)[eng]: 00000000 00000210 00000AA4 0000000000C1854C 00000000 008C5B6E 00000000 00000000 00000000 00000000 00000000 00000000
=== COM (Comments): (iTunes_CDDB_1)[eng]: AF0F260C+291068+12+150+21719+69062+91856+112017+126377+153392+157786+190884+220356+239779+266927
=== COM (Comments): (iTunes_CDDB_TrackNumber)[eng]: 1
*** mp3 info
MPEG1/layer III
Bitrate: 256KBps
Frequency: 44KHz

Oh wow. I had to scroll back to 2014 to find a CD I had ripped myself. I guess I don’t do that anymore. I’ve now been not ripping CDs with iTunes for longer than I ever ripped CDs with iTunes. And now iTunes has been superseded by Apple Music. But let’s not get too sentimental, iTunes has always been a bit of a nightmare. I don’t think anyone would be able to identify when the good ol’ days of iTunes were. It was just kind of there and the only option if you had an iPod/Phone/Pad and you put up with its problems and it eventually made itself obsolete by nudging you towards signing up for Apple Music instead. Also, Apple just wouldn’t stop touching it and slightly changing things on every update. For example: you used to be able to right click on a song and click Create MP3 Version. That was great. Until one day, they moved that from the right-click menu to the File menu. Why would they do such a thing? Can’t people just leave things where they are, now that we’ve gotten used to them? What problem did this change solve?

Anyway. The important part is this line:

=== COM (Comments): (iTunSMPB)[eng]: 00000000 00000210 00000AA4 0000000000C1854C 00000000 008C5B6E 00000000 00000000 00000000 00000000 00000000 00000000

SMPB. As far as I can tell, that’s the seamless playback bit. The values that follow must be how iTunes reminds itself how much silence was added at the start and end of the file. I suspected that because it’s Apple, they would have used the same thing when encoding MP3 files in Logic. That might be an unfair assumption. Apple’s a big place. The people who work on iTunes probably haven’t even met the people who work on Logic. But let’s have a look at the ID3 tags of one of the files exported from Logic.

*** Tag information for logic320.mp3
=== TSS (Software/Hardware and settings used for encoding): Logic Pro X 10.3.1
=== COM (Comments): (iTunNORM)[eng]: 000000F7 000000F7 00000D14 00000D15 000001F8 000001F8 00007B3E 00007B3E 000001F8 000001F8
=== COM (Comments): (iTunSMPB)[eng]: 00000000 00000210 00000870 000000000000BB80 00000000 00008700 00000000 00000000 00000000 00000000 00000000 00000000
*** mp3 info
MPEG1/layer III
Bitrate: 320KBps
Frequency: 48KHz

Sure enough, that iTunSMPB record is there. To see if this is indeed how Logic tells itself about the silence at the start and end of the file, let’s see what it does when we remove the ID3 tags altogether. I am no stranger to removing ID3 tags.

A digression about removing ID3 tags

At some point in 1999 or so, I realised I didn’t need to spend hours avoiding studying by making sure all the tags were consistent in my huge folder of MP3s leeched from Napster. If the files were named consistently (in the form artist - title.mp3) Winamp would figure out what you meant. Just one of the beautiful things about WinAmp. Thanks, Justin.

So I renamed all the files, which was much faster than opening and closing that ID3 tag editor window that just had too many fields, and then batch-removed all the tags. It worked. But I’d live to regret it a few years later, when I (badly) copied all those files to my new computer and managed to lose all of Windows’s long file names, leaving me with a folder of tagless MP3s with names like STEVIE~1.MP3, STEVIE~2.MP3 and STEVIE~3.MP3. The happy ending is a few years later I tried again after learning how to clone drives using dd and all my filenames were back. The cliffhanger ending is that iTunes doesn’t care what the filenames are, had no ID3 tags to go by, and imported a whole lot of songs with no artists or titles. I still don’t know what some of those songs are called. I guess the moral of the story is don’t pirate MP3s in the ’90s, just wait a few decades for Spotify to be invented and then listen to anything you want. Except Jill Scott’s first album. At time of writing, that’s not on Spotify for some reason.

Where were we?

Oh yeah, I was removing the tags from a Logic-exported MP3 to see how Logic likes those apples once the label’s been peeled off.

/assets/mp3/Untitled%207.png

Sure enough, without the ID3 tags Logic doesn’t recognise its own children. And there’s a bunch of silence at the end now too. I think that confirms iTunSMPB is how Logic knows what the real start and end of the file should be. Just for completeness, let’s compare how Ableton handles the MP3 files from Logic with and without the tags.

/assets/mp3/Untitled%208.png

No change, but it’s still not right. This is either a bug or a case of “Apple doesn’t tell people how this works”. Just to see how far the Apple core goes, let’s try one more Apple-sanctioned method of MP3 encoding. The one I used for years before I became a friend o’ the terminal. The good ol’ “drag it into iTunes and convert it to an MP3 and drag it out again”.

/assets/mp3/Untitled%209.png

As you’d expect it’s the same situation: without the tags it doesn’t know what to do. And I won’t bother including another screenshot to show Ableton still doesn’t like it with or without the tags.

Conclusion

I could have concluded this thousands of words ago, but I loved going down that rabbithole and finding out why MP3 does the things it does, and when it does them. The lesson is if you want to collaborate on music with people online, don’t compress things with MP3. Or make sure you’re all using the same software. Or both. It would be interesting to test this in more DAWs, but I don’t have any. Except for Reaper.

Reaping the rewards

Remember Justin from WinAmp? With his millions of cold hard AOL dollars he went on to found Cockos, the company behind Reaper, a cheap and cheerful and very capable alternative in the world of recording software. Let’s see how it likes MP3 files.

/assets/mp3/Untitled%2010.png

Well this is interesting. It handled everything except the one straight from Ableton. It even made sense of the one from Logic with the tags removed, which Logic couldn’t handle. Thank you again, Justin.

Update (30 May 2020): what about Pro Tools?

Thanks to Luke Howard, an Avid fan for many years, for importing and exporting that kick drum so I can find out how Pro Tools deals with this. The answer is: differently, again.

/assets/mp3/Untitled%2011.png

So that’s a slightly different delay than all the others. And it leaves no clues in the ID3 tags:

*** Tag information for a kick pro tools.mp3
=== TYER (Year): 2020
=== TRCK (Track number/Position in set): 1
*** mp3 info
MPEG1/layer III
Bitrate: 320KBps
Frequency: 48KHz

And in case it would be helpful to have exact(ish) numbers:

  • Ableton: 2256 samples
  • Pro Tools: 1200 samples
  • Logic: 1056 samples

I say exact(ish) because I zoomed in in Audacity and selected the gap between 0.5 seconds and the start of the kick drum, which isn’t as sudden-looking as I’d hoped. If you zoom in far enough, nothing’s sudden.