Fun with Formants: Vocal Sounds with Crystal
In this tutorial we'll look at making vocal sounds with Crystal. Specifically, the sounds will be vowel sounds, so the goal is to make Crystal go "ooh" and "ah". One note before we get started: this effort is aimed at human-like sounds. Crystal is useful for making synthetic sounds that don't occur in nature; if you want truly human vocal sounds, you're probably better off with a sampler, although even with a sampler, realistic human voices are notoriously difficult to achieve. In this tutorial, however, we're aiming for synthetic sounds with a vowel-like character.
First, a bit of background on that musical instrument in your mouth. The recognizable vowel sounds of the human voice are due to formants: resonances created by the various cavities in your head. Each formant acts much like a narrow band pass filter, and each vowel corresponds to a particular set of formants. You create this set of filters in your head when making sounds by altering the size and shape of empty volumes such as the nasal cavity, mouth, and pharynx. By using your vocal cords as a sound source and passing that sound through one of these sets of formant filters, you get an interesting variety of sounds. The set need not be complex: 3 filters usually suffice to create a sound that we can recognize as an "ooh" or an "ah".
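To make this source-filter idea concrete, here is a minimal Python sketch. It is not how Crystal works internally; it simply generates a naive sawtooth (standing in for the vocal cords) and runs it through three parallel band pass filters tuned to the male sung "ah" frequencies from the table below, then sums the results. The filter is the constant-peak-gain band pass biquad from Robert Bristow-Johnson's Audio EQ Cookbook; the sample rate, the Q of 20, the parallel routing, and all function names are assumptions chosen for illustration.

```python
import math

SAMPLE_RATE = 44100  # assumed sample rate in Hz (illustrative only)

def sawtooth(freq, n_samples, sr=SAMPLE_RATE):
    """Naive (non-band-limited) sawtooth in the range [-1, 1)."""
    return [2.0 * ((i * freq / sr) % 1.0) - 1.0 for i in range(n_samples)]

def bandpass_coeffs(center, q, sr=SAMPLE_RATE):
    """RBJ cookbook band pass biquad with 0 dB gain at the center frequency."""
    w0 = 2.0 * math.pi * center / sr
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)                 # normalized b0, b1, b2
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)  # normalized a1, a2
    return b, a

def biquad(signal, b, a):
    """Direct form I biquad: y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def formant_voice(f0, formants, q=20.0, n_samples=SAMPLE_RATE // 10):
    """Sawtooth source -> one band pass filter per formant (in parallel) -> sum."""
    source = sawtooth(f0, n_samples)
    mixed = [0.0] * n_samples
    for center in formants:
        b, a = bandpass_coeffs(center, q)
        for i, y in enumerate(biquad(source, b, a)):
            mixed[i] += y
    return mixed

# Male sung "ah" (F1, F2, F3 in Hz) through a sawtooth an octave below middle C
out = formant_voice(130.81, (700, 1200, 2600))
print(len(out), max(abs(s) for s in out))
```

Swapping in another row of the table (for example male sung "oo", 350/640/2550 Hz) is all it takes to change the vowel, which is exactly what the Crystal patches below do with the delay filters.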
Crystal is well-suited for this kind of application since it has a bank of band pass filters. Where, you say, is this filter bank? It is the set of 4 delays. Each delay has a band pass filter. By simply routing audio through these delays (and setting the delay times to zero, so that we get no echoes), we get a bank of 4 filters.
To get started, just pick a simple sawtooth oscillator, set the filter bank to the appropriate frequencies for the desired formant, and route the audio from the voice through the filter bank. What are the appropriate frequencies for various formants? There are many places on the web where you can find tables of formant frequencies for various vowel sounds. Here's a table that you might find useful (from "The Talk Box and Formant Filtering" by Hans Mikelson):
Formant frequencies are in Hz, listed as F1 / F2 / F3; the last column gives the relative levels of F1 / F2 / F3 in dB.

Vowel | Male spoken | Male sung | Female spoken | Female sung | Child spoken | Amplitudes (dB)
"ee" | 270 / 2290 / 3010 | 300 / 1950 / 2750 | 310 / 2790 / 3310 | 400 / 2250 / 3300 | 370 / 3200 / 3730 | -4 / -24 / -28
"i" | 390 / 1990 / 2550 | 375 / 1810 / 2500 | 430 / 2480 / 3070 | 475 / 2100 / 3450 | 530 / 2730 / 3600 | -3 / -23 / -27
"e" | 530 / 1840 / 2480 | 530 / 1500 / 2500 | 610 / 2330 / 2990 | 550 / 1750 / 3250 | 690 / 2610 / 3570 | -2 / -17 / -24
"ae" | 660 / 1720 / 2410 | 620 / 1490 / 2250 | 860 / 2050 / 2850 | 600 / 1650 / 3000 | 1010 / 2320 / 3320 | -1 / -12 / -22
"ah" | 730 / 1090 / 2440 | 700 / 1200 / 2600 | 850 / 1220 / 2810 | 700 / 1300 / 3250 | 1030 / 1370 / 3170 | -1 / -5 / -28
"aw" | 570 / 840 / 2410 | 610 / 1000 / 2600 | 590 / 920 / 2710 | 625 / 1240 / 3250 | 680 / 1060 / 3180 | 0 / -7 / -34
"u^" | 440 / 1020 / 2240 | 400 / 720 / 2500 | 470 / 1160 / 2680 | 425 / 900 / 3375 | 560 / 1410 / 3310 | -1 / -12 / -34
"oo" | 300 / 870 / 2240 | 350 / 640 / 2550 | 370 / 950 / 2670 | 400 / 800 / 3250 | 430 / 1170 / 3260 | -3 / -19 / -43
"u" | 640 / 1190 / 2390 | 500 / 1200 / 2675 | 760 / 1400 / 2780 | 550 / 1300 / 3250 | 850 / 1590 / 3360 | -1 / -10 / -27
"er" | 490 / 1350 / 1690 | 400 / 1150 / 2500 | 500 / 1640 / 1960 | 450 / 1350 / 3050 | 560 / 1820 / 2160 | -5 / -15 / -20
Download the following bank file to get Crystal patches which demonstrate this technique:
Mac Download
Windows Download
If you look at the filter frequencies in the table, notice that they are relatively low; the first formant frequencies in particular are fairly close to the fundamental frequencies of notes around middle C (about 262 Hz). Because these are very narrow band pass filters, a note only comes through strongly when some of its harmonics fall inside the filters' passbands, so the effective range will not be very wide. In other words, as you try out these patches, you'll have to hunt around on your keyboard to find a range where they sound good. The range may only be a few notes...not unlike the human voice (well, mine at least).
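You can see the narrow-range effect with a little arithmetic. A sawtooth at fundamental f0 only has energy at whole-number multiples of f0, so a narrow filter only "speaks" when one of those harmonics lands inside its passband. The Python sketch below counts, for each note from C3 to C5, how many of the three male sung "ah" filters catch a harmonic. It assumes each passband is center/Q wide (the usual -3 dB bandwidth of a band pass filter) and uses an arbitrary Q of 20; Crystal's actual filter shape and Q scale may differ, and the function names are hypothetical.

```python
def note_freq(midi_note):
    """Equal-tempered frequency in Hz, with A4 (MIDI note 69) = 440 Hz."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)

def filters_hit(f0, formants, q):
    """Count filters whose passband (assumed center/Q wide) contains a harmonic of f0."""
    hits = 0
    for center in formants:
        half_bw = center / q / 2.0          # half of the assumed -3 dB bandwidth
        k = max(1, round(center / f0))      # harmonic number nearest the center
        if abs(k * f0 - center) <= half_bw:
            hits += 1
    return hits

AH_MALE_SUNG = (700, 1200, 2600)  # F1, F2, F3 in Hz, from the table above

for note in range(48, 73):        # C3 to C5; MIDI note 60 is middle C
    f0 = note_freq(note)
    print(note, round(f0, 1), filters_hit(f0, AH_MALE_SUNG, q=20))
```

Notes where all three filters catch a harmonic are the sweet spots; raising Q narrows the passbands and makes them rarer, which is why you end up hunting around the keyboard.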
A few things to note about how these patches were created: First, you want the filters to be very narrow band pass filters. You can make them especially narrow by turning up the Q value and by increasing the feedback (be careful about turning feedback parameters all the way up). Second, once you have the filters configured, route the voice to the filters and turn off the dry output of the voice. Third, adjust the relative volumes of the filter outputs to taste.
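Two bits of arithmetic are handy when dialing in these settings. For a band pass filter, the bandwidth in Hz is the center frequency divided by Q, so doubling Q halves the bandwidth. And if you'd like a starting point for the relative filter volumes, the table's amplitude column can be converted from dB to a linear gain with 10^(dB/20). The sketch below assumes Crystal's output volume controls behave like linear gains, which may not be the case, so treat the numbers as a rough guide; Q = 20 is an arbitrary example value.

```python
def bandwidth_hz(center_hz, q):
    """-3 dB bandwidth of a band pass filter with the given Q."""
    return center_hz / q

def db_to_gain(db):
    """Convert a level in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)

# "ah": male sung frequencies paired with the table's relative amplitudes (dB)
for center, db in zip((700, 1200, 2600), (-1, -5, -28)):
    print(f"{center} Hz: bandwidth at Q=20 is {bandwidth_hz(center, 20):.0f} Hz, "
          f"relative gain {db_to_gain(db):.3f}")
```

The third formant's -28 dB works out to a gain of only about 0.04, which is a useful reminder that the higher filters usually want to sit well below the first one in the mix.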
The "oh" through "er" patches each demonstrate a single vowel, that is, a sawtooth wave through a bank of 3 filters, with each filter frequency taken from the table above. That's nice, but Crystal is built for moving, responsive, interactive sounds, so let's make it go from "ooh" to "ah".
What we want to do is make the filter frequencies go from the values for "oo" to the values for "ah". This is a job for modulation, so go to the modulation matrix and set it up to modulate the filter frequencies. There are a number of different ways to do this with Crystal, but the "ooF-awF MW" patch does it like this: use 3 rows of the modulation matrix to control delay filters 1, 2, and 3. The low value for each row corresponds to a frequency for "oo", and the high value corresponds to a frequency for "ah".
To do this, choose the modulation wheel as the "Source" for the first three rows of the modulation matrix. Set the targets for those three rows to "Delay 1 Filter Freq", "Delay 2 Filter Freq", and "Delay 3 Filter Freq". The mod wheel will now control the filter frequencies of those three delays.
Next, set the "Low" value for each mod matrix row to the corresponding frequency for male sung u^ (400, 720, and 2500 Hz), and the high values to the corresponding frequencies for male sung ah (700, 1200, and 2600 Hz). (The u^ values are close to the male sung "oo" values of 350, 640, and 2550 Hz.) Now, with the mod wheel all the way down, the filters produce male sung u^, and with it all the way up, male sung ah.
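Conceptually, each mod matrix row maps the wheel position onto the range between its low and high values. The Python sketch below assumes a straight linear mapping in Hz with the wheel position normalized to 0.0 to 1.0; Crystal may scale modulation differently, so this illustrates the idea rather than Crystal's exact behavior. An exponential mapping (equal wheel steps give equal frequency ratios) is included for comparison, since we hear frequency roughly logarithmically. The function names are hypothetical.

```python
U_HAT = (400, 720, 2500)   # male sung u^ (F1, F2, F3 in Hz): wheel all the way down
AH = (700, 1200, 2600)     # male sung ah (F1, F2, F3 in Hz): wheel all the way up

def mod_row(low, high, amount):
    """Assumed linear mapping of a 0..1 modulation amount onto [low, high]."""
    return low + amount * (high - low)

def mod_row_exp(low, high, amount):
    """Alternative exponential mapping: equal steps give equal frequency ratios."""
    return low * (high / low) ** amount

def filter_settings(amount, curve=mod_row):
    """Frequencies for delay filters 1-3 at a given (normalized) wheel position."""
    return tuple(curve(lo, hi, amount) for lo, hi in zip(U_HAT, AH))

print(filter_settings(0.0))               # wheel down: u^
print(filter_settings(0.5))               # halfway, linear
print(filter_settings(0.5, mod_row_exp))  # halfway, exponential
print(filter_settings(1.0))               # wheel up: ah
```

Halfway up, the linear version puts F1 at 550 Hz while the exponential version puts it at about 529 Hz (the geometric mean of 400 and 700), so the two curves feel slightly different under the wheel.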
Go ahead and try the "oo-ah MW" patch. Hold down a note in the part of the keyboard where it sounds good, and move the mod wheel up and down. The sound will go from oo to ah!
Next, instead of using the mod wheel to modulate the filter frequencies, let's use a modulation envelope. That's what the "oo-ah ME" patch does. It starts out with oo, goes to ah, and returns to oo when you release the key.
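A modulation envelope does the same job as the wheel, but moves the amount for you over time. The Python sketch below uses a hypothetical attack/sustain/release shape (the 0.3 s attack and 0.5 s release are made-up values, and Crystal's envelopes have their own segments and curves): the amount rises from 0 to 1 during the attack, holds while the key is down, and falls back to 0 after release, which with the linear mapping gives oo, then ah, then oo again.

```python
def mod_envelope(t, note_off, attack=0.3, release=0.5):
    """Hypothetical attack/sustain/release envelope in 0..1; times in seconds."""
    if t < note_off:
        return min(t / attack, 1.0)              # rise toward "ah", then hold
    level = min(note_off / attack, 1.0)          # level reached when the key is released
    return max(level * (1.0 - (t - note_off) / release), 0.0)  # fall back toward "oo"

def lerp(low, high, amount):
    """Linear mapping of a 0..1 amount onto [low, high] (an assumption, as before)."""
    return low + amount * (high - low)

U_HAT = (400, 720, 2500)   # male sung u^, Hz
AH = (700, 1200, 2600)     # male sung ah, Hz

for t in (0.0, 0.15, 0.3, 1.0, 1.25, 1.5):   # key released at t = 1.0 s
    amount = mod_envelope(t, note_off=1.0)
    print(t, [round(lerp(lo, hi, amount)) for lo, hi in zip(U_HAT, AH)])
```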
Next, let's add a bit of chorus-like movement by using pulse width modulation: let an LFO modulate the pulse width of voice 1. That's what the "oo-ah ME PWM" patch does.
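Pulse width modulation means an LFO sweeps the duty cycle of a pulse wave (the fraction of each cycle spent "high"), which keeps shifting the harmonic content and gives that chorus-like movement. The Python sketch below generates a naive pulse wave whose width follows a sine LFO; the sample rate, LFO rate, center width, and depth are assumed values, not settings taken from the patch.

```python
import math

SAMPLE_RATE = 44100  # assumed sample rate in Hz

def pwm_pulse(freq, n_samples, lfo_rate=0.8, center=0.5, depth=0.3,
              sr=SAMPLE_RATE):
    """Naive +1/-1 pulse wave whose duty cycle is swept by a sine LFO.

    With the assumed defaults the width moves between 0.2 and 0.8.
    """
    out = []
    for i in range(n_samples):
        t = i / sr
        width = center + depth * math.sin(2.0 * math.pi * lfo_rate * t)
        phase = (freq * t) % 1.0          # position within the current cycle
        out.append(1.0 if phase < width else -1.0)
    return out

wave = pwm_pulse(130.81, SAMPLE_RATE)     # one second, an octave below middle C
print(len(wave), min(wave), max(wave))
```

Feeding a wave like this into the formant filters instead of a plain sawtooth gives the filters a constantly changing set of harmonic levels to work with.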
Finally, let's add a second voice harmonized a major 3rd above the original for two-voice harmony. Listen to the "oo - ah ME PWM 2V" patch to hear this.
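In equal temperament a major 3rd is 4 semitones, so the second voice runs at 2^(4/12), about 1.26, times the frequency of the first (slightly wider than the just-intonation ratio of 5/4). The tiny Python sketch below computes this; `transpose` is a hypothetical helper, not a Crystal parameter.

```python
def transpose(freq_hz, semitones):
    """Shift a frequency by a number of equal-tempered semitones."""
    return freq_hz * 2.0 ** (semitones / 12.0)

MAJOR_THIRD = 4  # semitones

root = 261.63                          # middle C in Hz (rounded)
third = transpose(root, MAJOR_THIRD)   # the E above middle C
print(f"{root:.2f} Hz + {third:.2f} Hz, ratio {third / root:.4f}")
```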
Experiment with different oscillators as the sound source. Try crossfading two voices based on note-on velocity (see the VelXFade preset for a velocity crossfade example). Instead of the mod wheel or mod envelope, try using an LFO to modulate between oo and ah. Try different amplitude envelopes for the 2 voices. Try...well, you get the idea :-).
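If you try the velocity crossfade, it helps to know what a crossfade law looks like. The Python sketch below maps MIDI note-on velocity (0 to 127) to a pair of gains, so soft notes favor one voice and hard notes the other. The linear law, the equal-power alternative, and the function names are assumptions for illustration; the VelXFade preset may use different curves. The equal-power version keeps gA² + gB² = 1, which avoids the loudness dip a linear crossfade can have in the middle when the two voices are uncorrelated.

```python
import math

def linear_xfade(velocity):
    """Gains (voice A, voice B) for a MIDI velocity 0..127, linear law."""
    x = max(0, min(velocity, 127)) / 127.0
    return 1.0 - x, x

def equal_power_xfade(velocity):
    """Equal-power law: gA**2 + gB**2 == 1 at every velocity."""
    x = max(0, min(velocity, 127)) / 127.0
    return math.cos(x * math.pi / 2.0), math.sin(x * math.pi / 2.0)

for vel in (1, 64, 127):
    print(vel, linear_xfade(vel), equal_power_xfade(vel))
```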