2. Source–filter theory of speech production
3. Vocal-tract cavity properties and formant frequencies
4. Adaptive design of speech sound inventories
5. Concluding remarks
2. Source–filter theory of speech production
The mapping between VT properties and acoustic signals has been investigated over many decades (Chiba & Kajiyama 1941; Stevens & House 1955, 1961; Fant 1960; Flanagan 1972; Stevens 1998) and, as documented in the last of these cited works, is now reasonably well understood for the major classes of speech sounds. At the core of this understanding lies the assumption that speech outputs can be analysed as the response of a set of VT filters to one or more sources of sound energy. A further assumption, which holds to a first approximation in most cases, is that the source and filter properties of the vocal tract are independent.
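As a rough illustration of the source–filter assumption, the Python sketch below passes an idealised voiced source through a single resonator standing in for one VT filter. The sampling rate, the 500 Hz centre frequency, the 80 Hz bandwidth and all function names are assumptions chosen for illustration, not values taken from the works cited above; a real vocal tract has several resonances, and a real glottal source does not have a flat spectrum.

```python
import math

FS = 10000  # sampling rate in Hz (assumed for this sketch)


def impulse_train(f0, fs, n):
    """Idealised periodic source: one unit pulse every round(fs / f0) samples."""
    period = round(fs / f0)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def resonator(x, fc, bw, fs):
    """Two-pole digital resonator centred near fc Hz with bandwidth bw Hz."""
    r = math.exp(-math.pi * bw / fs)           # pole radius set by the bandwidth
    a1 = 2.0 * r * math.cos(2.0 * math.pi * fc / fs)
    a2 = -r * r
    y1 = y2 = 0.0                              # previous two output samples
    out = []
    for s in x:
        y = s + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def magnitude_at(x, f, fs):
    """Magnitude of the discrete Fourier transform of x evaluated at f Hz."""
    re = sum(v * math.cos(2.0 * math.pi * f * i / fs) for i, v in enumerate(x))
    im = sum(v * math.sin(2.0 * math.pi * f * i / fs) for i, v in enumerate(x))
    return math.hypot(re, im)


# Same filter, two different sources: the resonance stays at 500 Hz
# whichever fundamental frequency drives it.
for f0 in (100, 125):
    source = impulse_train(f0, FS, 2000)
    output = resonator(source, fc=500, bw=80, fs=FS)
    ratio = magnitude_at(output, 500, FS) / magnitude_at(output, 1500, FS)
    print(f"F0 = {f0} Hz: output level at 500 Hz is {ratio:.1f}x that at 1500 Hz")
```

Under the independence assumption, changing `f0` moves the harmonics of the source but leaves the resonance peak of the filter where it is, which is what the two loop iterations are meant to show.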
A source in the vocal tract is any modulation of the airflow that creates audible energy. Such sound-producing modulations occur in the vicinity of constrictions either at the glottis (i.e. the space between the vocal folds of the larynx) or in the supralaryngeal regions of the vocal tract. Several types of source may be distinguished. One is (quasi-) periodic and consists of cycles of varying airflow attributable to vocal-fold vibration or voicing. Sounds produced with a voiced source have a fundamental frequency (F0) equal to the repetition rate of vocal-fold vibration. They include vowels (e.g. /a/ and /u/), nasal consonants (e.g. /m/), liquids (e.g. /r/ and /l/) and glides (e.g. /w/). Other sources are aperiodic and include (i) turbulence noise generated as air flows rapidly through an open, non-vibrating glottis (referred to as ‘aspiration’), (ii) turbulence noise generated as air flows rapidly through a narrow supralaryngeal constriction (referred to as ‘frication’),1 and (iii) a brief pulse of excitation caused by a rapid change in oral air pressure (referred to as a ‘transient source’). Examples of the use of these aperiodic sources are, respectively, the aspirated /h/, the fricatives /f/ and /s/ and the stop consonants /p/ and /t/ (both of which, in stressed-syllable-initial position, tend to be associated with a rapid reduction in oral air pressure at the moment of VT opening). Some speech sounds have multiple sources operating simultaneously or in succession. For example, the fricative /z/ is produced with a voiced source and a simultaneous turbulence noise (i.e. frication) source, while the stop consonant /t/ may be produced with, in quick succession, a transient source, a frication source and an aspiration source, as the mouth opens (Fant 1973).
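The distinction between periodic and aperiodic sources can be made concrete with a small Python sketch. It uses deliberately crude stand-ins: a unit pulse train for voicing, Gaussian noise for turbulence and a single pulse for a transient. The sampling rate, noise level, random seed and function names are all assumptions made for illustration; none of them models real glottal or turbulent airflow.

```python
import random

FS = 10000  # sampling rate in Hz (assumed for this sketch)


def voiced_source(f0, n):
    """Quasi-periodic stand-in for voicing: one unit pulse per cycle of F0."""
    period = round(FS / f0)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def noise_source(n, level=0.1, seed=0):
    """Aperiodic stand-in for aspiration or frication: Gaussian noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, level) for _ in range(n)]


def transient_source(n, at=0):
    """Aperiodic stand-in for a transient: a single brief pulse."""
    return [1.0 if i == at else 0.0 for i in range(n)]


def periodicity(x, lag):
    """Autocorrelation of x at the given lag, divided by the energy of x.

    Values near 1 indicate repetition at that lag; values near 0 indicate none.
    """
    num = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
    den = sum(v * v for v in x)
    return num / den if den else 0.0


n = 2000
lag = round(FS / 100)                      # one period of a 100 Hz voice
voiced = voiced_source(100, n)             # e.g. a vowel
noise = noise_source(n)                    # e.g. /s/ or /h/
mixed = [v + w for v, w in zip(voiced, noise)]   # simultaneous sources, as in /z/
burst = transient_source(n)                # e.g. the release of /p/ or /t/

for name, signal in [("voiced", voiced), ("noise", noise),
                     ("voiced + noise", mixed), ("transient", burst)]:
    print(f"{name:15s} periodicity at the F0 lag: {periodicity(signal, lag):.2f}")
```

In this sketch the voiced source scores close to 1 at the lag corresponding to F0, the noise and transient sources score close to 0, and the mixed source falls in between, reflecting the simultaneous presence of a periodic and an aperiodic component.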
All of these sources, both periodic and aperiodic, are well suited for evoking responses from the VT filters. Under normal conditions, each source has an energy level sufficient to generate highly audible speech sounds. Moreover, each source has an amplitude spectrum that is fairly broadband, ensuring that even VT filters in the higher-frequency range (1–5 kHz) will tend to be excited.
How, then, does the vocal tract act to filter sound energy generated by the sources? Any fully or partially enclosed volume of air has certain...