In the beginning, back in 1955, computers were the size of a room and cost tens of thousands of dollars, with circuits made of mechanical switches and orange-hot metal in glass tubes. Electronic music was the property of the avant-garde, with its tape-manipulation experiments and untouchable instruments.
Herbert Belar and Harry Olson, two engineers at the Radio Corporation of America, have been working in secret on an Electronic Music Synthesizer that can make the ideal version of any sound. Don’t you hate the scratching of bows on violins? The pressing of valves on woodwinds? The “clatters and rattles” of hairless apes pawing at wood and metal with their squishy extremities? This machine does away with all those impurities.
But maybe you’re some kind of humanist. Maybe you’ve survived 2023 and you’re sick of STEMlords using computers to replace artists. The synthesizer can’t give you the toil of a real performance, but, with a few basic principles of sound design, we can use it to invent entirely new instruments.
RCA applied for a patent on this synthesizer in 1951, but it wasn’t issued until ’58. Here in the future, when you apply for a patent, it starts expiring immediately. But any “Patent Pending” notice on ads for $50 salad tossers can tell you that patents aren’t issued immediately. Before 1995, a U.S. patent wouldn’t start expiring until it was issued, and a pending application stayed unpublished the whole time. So a greedy company could keep its applications pending, sometimes for years, to one day inflict its patents on unsuspecting rivals, like termites hollowing out the walls. RCA, “a corporation of Delaware,” was one of those companies.
Let’s read this patent to see what secrets it holds about the synthesizer. First it starts with a bunch of repetitive, unreadable garbage. “A further object of the invention is to provide an improved method of and means for producing a series of synthesized tones under the control of a suitable coded record.” “A still further object of the invention is to provide a method of and means for selecting tones of a musical scale in rapid succession in response to a coded record.” Oh, I see, that’s completely different. Thanks.
Nothing else I read about this was so boring. I bet lawyers do this to put you to sleep. Eventually this patent explains how the synthesizer works, but it’s very clear that this is one possible design. These components can be implemented in all sorts of ways which will be covered under this patent.
But while the patent was in hiding, RCA submitted a paper to the Journal of the Acoustical Society of America on the features of the synthesizer. Olson’s name comes first, but it’s my understanding that Belar did most of the work. Classic. This paper isn’t choked in legalese, thankfully. The circuit diagrams are laid out more sensibly, but the components and connections are almost identical to the patent. So I’ll refer to both the patent and the paper in this explanation.
You can define any sound using a handful of metrics like frequency, loudness, and timbre. You can assign numbers to these metrics and store these numbers as binary code on perforated paper tape. Each perforation represents a bit; filled circles are zeroes and empty circles are ones. Most Western music uses 12 notes per octave, and 4 bits give you 16 settings. That leaves one for rests and three for anything else. Humans have a hearing range of about 10 octaves, so you only need 4 bits to specify the octave. This machine uses 3, which covers eight octaves. The “master volume” is dead simple: 4 bits for 16 levels of loudness, including silence. You can define growth and decay times with any real number, but there’s only so much room on the tape. Timbre is even more complicated; you can’t easily map it to a number line. This synthesizer uses three bits for growth and decay, and four for timbre. We’ll see how Olson & Belar deal with these problems soon.
Overall, we have 18 bits of data for one sound. Each element of a sound has its own colored section on the tape. Each row is a unit of time. If you’ve used a tracker, this might look familiar. The tape feeder moves between 2 and 8 inches per second. Each row is a quarter inch tall, so the synthesizer can interpret up to 32 instructions per second, which is higher than the frame rate of this video.
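If you want to poke at the numbers, here is a minimal Python sketch of one tape row. The bit widths (4 for the note, 3 for the octave, 4 for volume, 3 for growth and decay, 4 for timbre) come from the description above; the field names, their order on the tape, and the helper functions are my own assumptions for illustration, not something the patent or paper specifies.

```python
# Bit widths from the text. Field names and ordering are illustrative assumptions.
FIELDS = [("note", 4), ("octave", 3), ("volume", 4), ("envelope", 3), ("timbre", 4)]
ROW_BITS = sum(width for _, width in FIELDS)  # total bits in one row


def pack_row(**values):
    """Pack field values into one integer, first field in the highest bits."""
    row = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        row = (row << width) | value
    return row


def unpack_row(row):
    """Split a packed row back into its named fields."""
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = row & ((1 << width) - 1)
        row >>= width
    return fields


def rows_per_second(inches_per_second, row_height_inches=0.25):
    """How many rows pass the brushes each second at a given tape speed."""
    return inches_per_second / row_height_inches
```

At the fastest tape speed, `rows_per_second(8)` gives the 32 instructions per second mentioned above; at the slowest, `rows_per_second(2)` gives 8.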
The synthesizer has two channels of sound so the machine can prepare the next note while it plays the current one. This also accounts for sounds ringing out and decaying on top of other sounds.
As the tape moves, contact brushes activate relays in response to holes in the paper. A relay is a kind of switch. This brush triggers eight relays and represents the 1s place of a binary number. The next brush triggers four relays and represents the 2s place. The third brush triggers two relays, and the last just one.
Twelve of these input wires connect to tuning forks, creating sine waves with rock-stable frequencies.[1] When the paper is full, every switch is off. The zero wire, connected to nothing, runs through these contacts to the output. These tuning forks run from F# to F, so if you want to play an E natural, you’ll want to connect the 11 wire. You punch a hole in the 1s, 2s, and 8s places, leaving the 4s alone, and the wires reroute accordingly.
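The relay tree is easier to follow as code. This is a rough Python model, assuming sixteen input wires numbered 0 to 15, with wire 0 connected to nothing and wires 1 through 12 connected to the forks from F# up to F. The `WIRES` list, the "spare" labels, and the `relay_tree` function are hypothetical names of mine; the source doesn't say which spare wire carries what.

```python
# Wire 0 is the rest; wires 1-12 are the tuning forks, F# up to F.
# The labels for wires 13-15 are placeholders (an assumption).
WIRES = [None, "F#", "G", "G#", "A", "A#", "B", "C", "C#", "D", "D#", "E", "F",
         "spare 13", "spare 14", "spare 15"]


def relay_tree(inputs, holes):
    """Route one of 16 inputs to the output.

    holes maps a binary place value (1, 2, 4, 8) to True if that
    position is punched. Each stage halves the number of live wires:
    8 relays for the 1s place, then 4, then 2, then 1.
    """
    wires = list(inputs)
    for place in (1, 2, 4, 8):
        punched = holes.get(place, False)
        # Every relay in this stage picks one wire out of a pair.
        wires = [wires[i + 1] if punched else wires[i]
                 for i in range(0, len(wires), 2)]
    return wires[0]
```

Punching the 1s, 2s, and 8s places, `relay_tree(WIRES, {1: True, 2: True, 8: True})`, selects wire 11, the E natural. With no holes at all, the output is wire 0, the rest.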
The selected sine wave is clipped to a square wave, then shifted up or down some number of octaves by multipliers and dividers. A shaper circuit turns the square wave into a sawtooth wave. This resistor-capacitor pair is called a differentiator: it takes the incoming voltage and gives you its rate of change. For example, a square wave gives you zero volts most of the time, but its rising and falling edges give you short positive and negative pulses. This differentiator connects to a triode—the vacuum-tube ancestor to the modern-day transistor. When current flows through this triode, this capacitor gets charged. When the triode sees a positive voltage on the differentiator, it stops the flow of current and quickly discharges the capacitor. This pattern of slow charging and quick discharging makes a sawtooth wave. These obnoxious waves are rich in overtones, so they can be filtered into other sounds.[2]
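Here is an idealized digital sketch of that signal chain in Python: clip the sine to a square wave, differentiate it to find the edges, and use the positive pulses to reset a ramp. It ignores everything that makes the real circuit analog (component values, the triode's behavior, the octave shifting), and the function name, sample rate, and linear ramp are my assumptions.

```python
import math


def sine_to_sawtooth(freq_hz, num_samples, sample_rate=48000):
    """Return (square, edges, saw) sample lists for a sine at freq_hz."""
    sine = [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(num_samples)]

    # Clipping: keep only the sign of the sine wave.
    square = [1.0 if s >= 0 else -1.0 for s in sine]

    # Differentiator: zero almost everywhere, with a positive pulse on each
    # rising edge and a negative pulse on each falling edge.
    edges = [0.0] + [square[i] - square[i - 1] for i in range(1, num_samples)]

    # "Capacitor": charges slowly, then discharges at once on a positive pulse.
    saw, level = [], 0.0
    step = freq_hz / sample_rate  # climbs from 0 to about 1 over one period
    for edge in edges:
        if edge > 0:
            level = 0.0
        else:
            level += step
        saw.append(level)
    return square, edges, saw
```

Calling `sine_to_sawtooth(100, 2400)` covers five periods of a 100 Hz tone. The ramp keeps climbing straight through the negative pulses and only resets on the positive ones, which is what gives the sawtooth the same frequency as the original sine.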
This synth uses three kinds of filters. Low-pass filters let low frequencies through and stop high ones, making things sound muffled. If you’ve heard loud dance music leaking out of a house party, you’ve heard a low-pass filter. High-pass filters let high frequencies through, making things sound tinny. If you’ve heard a hit song through terrible phone speakers, you’ve heard a high-pass filter.
A resonance filter accentuates a specific frequency and reduces all the others. It’s hard to think of a real-life equivalent, but combined with one of the other filters, it makes this familiar sound: [beep].
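To make the first two filters concrete, here is a small Python sketch of the simplest digital stand-ins I know: a one-pole low-pass, and a high-pass built by subtracting the low-passed signal from the original. These are textbook approximations of a resistor-capacitor filter, not the synthesizer's actual filter circuits, and the helper names and 48 kHz sample rate are assumptions.

```python
import math


def low_pass(samples, cutoff_hz, sample_rate=48000):
    """One-pole low-pass: each output moves a fraction of the way to the input."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = dt / (rc + dt)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


def high_pass(samples, cutoff_hz, sample_rate=48000):
    """High-pass: whatever the low-pass removed."""
    low = low_pass(samples, cutoff_hz, sample_rate)
    return [x - l for x, l in zip(samples, low)]


def sine(freq_hz, num_samples, sample_rate=48000):
    """Unit-amplitude test tone."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(num_samples)]


def peak(samples):
    """Largest absolute value, a crude loudness measure."""
    return max(abs(s) for s in samples)
```

With a 500 Hz cutoff, a 100 Hz tone passes through `low_pass` almost untouched while a 5,000 Hz tone comes out at roughly a tenth of its height. `high_pass` does the opposite.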
Sounds get quieter when you take away their overtones. This sawtooth wave, for example, [beep] is much louder than a sine wave with the same height. [beep] So this compensator network makes up the difference by raising the volume on filtered waves.
The process of filtering a basic wave shape, rich in overtones, to create new sounds is called subtractive synthesis. But this synthesizer supports other features. One of the leftover note settings connects to a white noise generator. Noise is essential for making unpitched percussive sounds like cymbals and snare drums. White noise contains all frequencies of sound equally, just as white objects reflect all light. Like the sawtooth wave, this noise goes through the filter network to make other kinds of noise.
Tuning forks provide stable frequencies, but sometimes you want to slide between notes or apply vibrato. This machine uses a few tricks to detune the forks.
This circuit turns a wave of a given frequency into a direct current, or DC, voltage. The higher the frequency, the higher the voltage. Switching tuning forks causes these sharp vertical edges, similar to square waves. To achieve a frequency glide—what the Italians call portamento—you need to soften these edges with a low-pass filter. For reference, here’s what a low-pass filter does to a square wave. Once you’ve softened the edges, you convert the DC voltage back into an alternating current, or AC, waveform, and you get that gliding effect.
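The same idea works in software. In this Python sketch, a list of per-sample frequencies stands in for the DC voltage, a one-pole low-pass softens the jump between notes, and an oscillator that accumulates phase stands in for the conversion back to AC. The function names and the smoothing constant are hypothetical, and none of this models the actual circuit.

```python
import math


def glide(freqs_hz, smoothing=0.001):
    """Low-pass a stepped frequency control so jumps become slides."""
    out, f = [], freqs_hz[0]
    for target in freqs_hz:
        f += smoothing * (target - f)  # move a small fraction toward the target
        out.append(f)
    return out


def oscillator(freqs_hz, sample_rate=48000):
    """Turn a per-sample frequency control back into a waveform."""
    phase, out = 0.0, []
    for f in freqs_hz:
        phase += 2 * math.pi * f / sample_rate
        out.append(math.sin(phase))
    return out
```

For example, `oscillator(glide([440] * 1000 + [660] * 20000))` holds at 440 Hz, then slides up toward 660 Hz instead of jumping. A larger `smoothing` value makes the slide faster.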
Later analog synthesizers used voltages alone to control an oscillator’s frequency, so vibrato was simple: you’d feed a sine wave into the oscillator. Using waves to modulate, or change, the frequency of other waves is called frequency modulation, or FM. That’s not possible with tuning forks; their frequency is fixed by their size, density, and stiffness. So Olson & Belar instead change the intensity, or amplitude, to trick our ears into hearing vibrato. The trick is pretty complicated, but if you’re curious about the details, the explanation starts on page 32 of the patent.
Speaking of amplitude, every sound has some kind of volume contour over time, called an envelope. Take the violin: when you don’t touch the strings, the violin is silent. When you bow the string, the sound slowly grows until it hits a maximum. Then, once you stop bowing, it slowly decays into silence. But when you strike a pair of hi-hat cymbals, the sound grows and decays much faster.
With eight settings for growth and decay, you don’t get much control. I believe four settings adjust the growth and four adjust the decay, but the patent and paper use different circuits. There’s no need for a “sustain” or “silence” setting because the amplifier can only get so loud, and the synthesizer defaults to being silent.[3]
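A growth-and-decay envelope is simple to sketch in Python. This version assumes linear ramps measured in samples, which is my simplification; the real circuits differ between the patent and the paper, and the function name and parameters are hypothetical.

```python
def envelope(gate, growth_samples, decay_samples):
    """Volume contour for a note.

    gate is a sequence of booleans: True while the note is held.
    The level climbs while the gate is on and falls while it is off.
    """
    level, out = 0.0, []  # the synthesizer defaults to silence
    for held in gate:
        if held:
            # The amplifier can only get so loud, so cap the level at 1.0.
            level = min(1.0, level + 1.0 / growth_samples)
        else:
            level = max(0.0, level - 1.0 / decay_samples)
        out.append(level)
    return out
```

A slow growth and slow decay, like `envelope(gate, 20000, 40000)`, is closer to the bowed violin. A fast pair, like `envelope(gate, 50, 2000)`, is closer to the hi-hat. Multiply the result sample by sample with a waveform to shape its volume.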
Now we have everything we need to make a sound. So how do we record it? From the ’60s to the ’90s, most audio recordings went to a multitrack magnetic reel-to-reel tape. You might have one track for vocals, another track for guitar, a third for bass, and so on. A mixing engineer would adjust the volume of each track so nothing sounds too quiet or too loud on its way to a stereo half-inch “master tape.” The contents of that tape would be cut onto a twelve-inch lacquer disc, which forms a template for vinyl records.
This machine cuts audio directly to a sixteen-inch lacquer disc. It fits six three-minute tracks, and it’s extra-wide to fit six playback heads. These tracks could be mixed and transferred all at once to a single track on a second disc. You could repeat this process indefinitely to fit unlimited tracks on a disc, though I’m sure it would sound like noisy trash eventually.
All that forms the first iteration, or Mark I, of the RCA Electronic Music Synthesizer. This machine cost $25,000; $300,000 in 2023. This was no mass-produced product—it was a custom-built piece of laboratory equipment. But in 1959, Olson, Belar, and—wait. Jim Timmens, best known for his work on Sesame Street? The three of them submitted a paper on the Mark II. It has four channels with two paper tapes—that’s twice the recording power. It can play ten octaves instead of eight; the tapes are two bits wider. The octave circuitry uses nine dividers instead of a mixture of multipliers and dividers.
Most importantly, the engineers found a use for the other note settings. There are now two sets of 12 variable frequency oscillators so you can play with all sorts of nonstandard scales.
Olson & Belar hoped the synthesizer would be used to make hits. Jim Timmens’s “Obelin,” the composition discussed in the paper, reminds me of the original Super Mario RPG soundtrack. But despite a banger from the Sesame Street guy, the synthesizer mainly saw use from avant-garde academics like Milton Babbitt.
These synthesizers make a surprising breadth of sounds for their age. All the principles of sound design were there from the beginning. The Mark I even fooled three-quarters of listeners into thinking this was a real piano.
But it had its limitations. Take the binary code. Modern synthesizers have knobs so the user can, for example, control growth and decay precisely. But if you change a knob, your perfect sound is gone forever.
At least your paper tapes will make the same sounds every time, right? Nope. These synthesizers were difficult to maintain. For $25,000, the tape feeder jammed, the tubes blew out, the relays blackened from arcing. In 1976, someone broke into the studio and vandalized the synthesizer. By that time, the world had moved on. It was difficult to replace those ancient tubes, so the synthesizer was never quite the same.
But couldn’t it make the most realistic sounds? No. RCA fooled musicians with a fake piano, but the New York Times reported that full band excerpts were much less convincing. Take this so-called “hillbilly band.” [beep] I feel like I’m returning to Zork.
But surely this machine could at least be used to defeat the Russians?[4] It is the 1950s, after all. The Times claimed that the synthesizer could be used to synthesize any human voice, which “might be of some value in psychological warfare.” I couldn’t find any recordings of this alleged synthesized speech, and the paper is vague about the results. The Sunday paper article rolls back the claims, but the reporter worries about… deepfaked speeches from world leaders? Gosh. I can’t believe we’ve been talking about this for 70 years.