Alain de Cheveigné
Ircam-CNRS, 1 place Igor Stravinsky, 75004 Paris, France
(Received 7 June 2001; revised 10 October 2001; accepted 9 January 2002)

An algorithm is presented for the estimation of the fundamental frequency (F0) of speech or musical sounds. It is based on the well-known autocorrelation method with a number of modifications that combine to prevent errors. The algorithm has several desirable features. Error rates are about three times lower than the best competing methods, as evaluated over a database of speech recorded together with a laryngograph signal. There is no upper limit on the frequency search range, so the algorithm is suited for high-pitched voices and music. The algorithm is relatively simple and may be implemented efficiently and with low latency, and it involves few parameters that must be tuned. It is based on a signal model (periodic signal) that may be extended in several ways to handle various forms of aperiodicity that occur in particular applications. Finally, interesting parallels may be drawn with models of auditory processing.

© 2002 Acoustical Society of America. [DOI: 10.1121/1.1458024]

PACS numbers: 43.72.Ar, 43.75.Yy, 43.70.Jt, 43.66.Hg [DOS]
Portions of this work were presented at the 2001 ASA Spring Meeting and the 2001 Eurospeech conference. Electronic mail: firstname.lastname@example.org

The fundamental frequency (F0) of a periodic signal is the inverse of its period, which may be defined as the smallest positive member of the infinite set of time shifts that leave the signal invariant. This definition applies strictly only to a perfectly periodic signal, an uninteresting object (supposing one exists) because it cannot be switched on or off or modulated in any way without losing its perfect periodicity. Interesting signals such as speech or music depart from periodicity in several ways, and the art of fundamental frequency estimation is to deal with them in a useful and consistent way.

The subjective pitch of a sound usually depends on its fundamental frequency, but there are exceptions. Sounds may be periodic yet "outside the existence region" of pitch (Ritsma, 1962; Pressnitzer et al., 2001). Conversely, a sound may not be periodic, but yet evoke a pitch (Miller and Taylor, 1948; Yost, 1996). However, over a wide range pitch and period are in a one-to-one relation, to the degree that the word "pitch" is often used in the place of F0, and F0 estimation methods are often referred to as "pitch detection algorithms," or PDA (Hess, 1983). Modern pitch perception models assume that pitch is derived either from the periodicity of neural patterns in the time domain (Licklider, 1951; Moore, 1997; Meddis and Hewitt, 1991; Cariani and Delgutte, 1996), or else from the harmonic pattern of partials resolved by the cochlea in the frequency domain (Goldstein, 1973; Wightman, 1973; Terhardt, 1974). Both processes yield the fundamental frequency or its inverse, the period.

Some applications give for F0 a different definition, closer to their purposes. For voiced speech, F0 is usually defined as the rate of vibration of the vocal folds. Periodic vibration at the glottis may produce speech that is less perfectly periodic because of movements of the vocal tract that filters the glottal source waveform. However, glottal vibration itself may also show aperiodicities, such as changes in amplitude, rate or glottal waveform shape (for example, the duty cycle of open and closed phases), or intervals where the vibration seems to reflect several superimposed periodicities (diplophony), or where glottal pulses occur without an obvious regularity in time or amplitude (glottalizations, vocal creak or fry) (Hedelin and Huber, 1990). These factors conspire to make the task of obtaining a useful estimate of speech F0 rather difficult.

F0 estimation is a topic that continues to attract much...