# Introduction

The spectral centroid (SC) is a low-level spectral-domain feature of a signal that is useful in signal classification and identification applications. Researchers have proposed it for several applications, such as estimating the timbral brightness of music [1], discriminating between speech and music [2,3,4], speaker recognition [5], noisy speech recognition [6,7], and identification of musical instruments [8]. The spectral centroid was also incorporated as one of the audio low-level features for audio content in the MPEG-7 multimedia standard [9]. In [10], an AR(2) model-based dynamic estimation of the spectral centroid of a narrowband acoustic Doppler volume backscattering signal was proposed.

The spectral centroid represents the "center of gravity" of the magnitude or power spectrum of a signal. Perceptually, the spectral centroid is a measure of the brightness of a sound. The unit of the centroid is the unit of frequency, Hz. Intuitively, the spectral centroid of a single-tone signal is the frequency of the tone itself. Similarly, the spectral centroid of a signal consisting of two equal-amplitude real sinusoids is the mean frequency of the two sinusoids.

Most natural or real signals (e.g. speech, voice, audio) are nonstationary. Classification of such signals requires extraction of dynamic features that change with time. When the spectral centroid is used as a feature, it is estimated dynamically from short segments of the signal (one value for each segment), and the spectral centroid vector thus obtained for the entire signal becomes a feature vector for the classification system. Estimating the spectral centroid from a short segment of signal data is challenging because of windowing effects. To the best of the author's knowledge, no systematic study of finite data effects on the estimation of the spectral centroid has been reported in the literature.

In this paper, a systematic study is carried out on the estimation of the spectral centroid from finite data of different lengths. The windowing effects on the estimation error are investigated using certain deterministic signals that appear frequently in speech and audio content. A novel algorithm is proposed to counter the finite window effects and to estimate the spectral centroid more accurately. Well-structured signals are used to make benchmarking easy; nevertheless, the algorithm can be applied to any kind of real signal.

The remainder of the paper is organized as follows. The mathematical basics of the spectral centroid are introduced in Section II. The short time Fourier transform (STFT) for estimating the magnitude spectrum of the signal dynamically is presented in Section III. The proposed algorithm, along with its flowchart, is discussed in Section IV. Section V discusses the details of the simulations and the test signals used in them. Section VI presents the results and a discussion of the findings. Finally, conclusions are drawn in Section VII.


# Spectral Centroid

Mathematically, the spectral centroid of a continuous-time signal $y(t)$ is given by

$$SC = \frac{\int_{0}^{\infty} f\, Y(f)\, df}{\int_{0}^{\infty} Y(f)\, df} \tag{1}$$

where $Y(f)$ is the one-sided magnitude spectrum of the signal $y(t)$.

Its counterpart for the discrete-time signal $y(n)$ is given by

$$SC = \frac{\sum_{k=0}^{K-1} k\, |Y(k)|}{\sum_{k=0}^{K-1} |Y(k)|} \tag{2}$$

where $|Y(k)|$ is the one-sided magnitude spectrum of the signal $y(n)$, $k$ is the discrete frequency index and $K$ is the number of frequency bins.

For example, the magnitude spectrum of a tone signal of unit amplitude and frequency $F$ is an impulse at $F$ Hz on the frequency axis. The spectral centroid of this signal is $F$ Hz itself. Similarly, the magnitude spectrum of a signal consisting of two tones of equal amplitude and frequencies $F_1$ and $F_2$ contains two equal-amplitude impulses at $F_1$ Hz and $F_2$ Hz on the frequency axis. The spectral centroid of this signal is the mid frequency of $F_1$ and $F_2$, i.e. $(F_1 + F_2)/2$ Hz. If the amplitudes of the two tones are not equal, the spectral centroid is biased towards the higher-amplitude tone. Figure 1 illustrates these cases. In each case, the sum of amplitudes is selected to be unity, so that the amplitude spectrum resembles a probability function.
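The tone examples above amount to an amplitude-weighted mean of the tone frequencies. The following minimal sketch shows that arithmetic for ideal (impulse) spectra. The function name `tone_centroid` and the frequencies 1000 Hz and 3000 Hz are illustrative assumptions, not values taken from the paper.

```python
def tone_centroid(freqs_hz, amps):
    """Amplitude-weighted mean frequency of a set of ideal tones (impulse spectra)."""
    total = sum(amps)
    if total <= 0:
        raise ValueError("amplitudes must sum to a positive value")
    return sum(f * a for f, a in zip(freqs_hz, amps)) / total


# Hypothetical tone frequencies in Hz, chosen only for illustration.
f1, f2 = 1000.0, 3000.0

print(tone_centroid([f1], [1.0]))               # single tone: the tone frequency itself
print(tone_centroid([f1, f2], [0.5, 0.5]))      # equal amplitudes: the mid frequency
print(tone_centroid([f1, f2], [0.7, 0.3]))      # biased towards the stronger lower tone
print(tone_centroid([f1, f2], [0.15, 0.85]))    # biased towards the stronger upper tone
```

Because the amplitudes in each case sum to one, the division by `total` has no effect here; it is kept so that the function also works for unnormalized amplitudes.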


# Short Time Fourier Transform

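When the Fourier transform is applied to short segments of data to analyze the signal dynamically, it is called the short time Fourier transform (STFT). To carry out the short-term analysis, the given signal $x(n)$ is divided into overlapping frames of size $N$; each frame is weighted by a window function $w(n)$, typically a Hamming or a Hanning window, and analyzed using the Fourier transform. The matrix formed by arranging the STFT coefficients as columns is popularly known as a spectrogram and is given by

$$Y(k,l) = \frac{1}{N}\left|\sum_{n=0}^{N-1} x(n+lM)\, w(n)\, e^{-j 2\pi k n / N}\right|^{2}, \qquad 0 \le k \le K-1,\; 0 \le l \le L-1 \tag{3}$$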
where $k$ is the discrete frequency index, $l$ is the time frame index, $M$ is the hop size, $K$ is the total number of bins of the one-sided STFT and $L$ is the total number of frames. The spectral centroid is computed from the magnitude spectrum of each frame of the signal, thus yielding an SC vector of length $L$, given by

$$SC(l) = \frac{\sum_{k=0}^{K-1} k\, Y(k,l)}{\sum_{k=0}^{K-1} Y(k,l)}, \qquad 0 \le l \le L-1 \tag{4}$$
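The frame-wise centroid computation described above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function name `spectral_centroid_direct` and its parameters are hypothetical, a Hamming window is assumed, the one-sided magnitude spectrum is used as the weight, and the result is expressed in Hz by weighting with the bin frequencies rather than the bin index $k$.

```python
import numpy as np


def spectral_centroid_direct(x, fs, frame_size=512, hop=None, nfft=4096):
    """Frame-wise spectral centroid in Hz from the one-sided magnitude spectrum."""
    x = np.asarray(x, dtype=float)
    if hop is None:
        hop = frame_size // 2                      # 50% overlap (assumed default)
    if len(x) < frame_size:
        raise ValueError("signal is shorter than one frame")
    window = np.hamming(frame_size)                # assumed window type
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)      # bin frequencies, 0 .. fs/2
    n_frames = 1 + (len(x) - frame_size) // hop
    sc = np.full(n_frames, np.nan)
    for l in range(n_frames):
        frame = x[l * hop : l * hop + frame_size] * window
        mag = np.abs(np.fft.rfft(frame, n=nfft))   # one-sided magnitude spectrum
        total = mag.sum()
        if total > 0:                              # leave NaN for an all-zero frame
            sc[l] = np.sum(freqs * mag) / total
    return sc


if __name__ == "__main__":
    fs = 44100.0
    n = np.arange(int(0.5 * fs))
    x = np.sin(2 * np.pi * (fs / 4) * n / fs)      # tone at the middle of the band
    sc = spectral_centroid_direct(x, fs)
    print(len(sc), sc.mean(), sc.std())
```

The mean and standard deviation of the returned vector correspond to the kind of per-signal summary used when comparing an estimate with the true tone frequency.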

# Proposed Algorithm for Spectral Centroid Estimation

The input signal data is segmented into overlapping frames of size W with 50% overlap, i.e. with a hop size of W/2. For each frame, the short time Fourier transform (STFT) is computed using the FFT algorithm with Nfft points between [0, Fs/2]. The one-sided magnitude spectrum is computed from the FFT output.

The algorithm for computing the spectral centroid is given in Figure 2. When the steps in the dashed boxes A, B and C are eliminated, the algorithm computes the spectral centroid using equation (4) directly; this is called the direct method here.

In the proposed method, a threshold STH is applied to the magnitude spectrum of each frame (operation A) and a peak detection algorithm is applied to the spectral coefficients above the threshold (operation B). Once the peaks are detected, the magnitude spectrum is modified by keeping only the peak values and setting all other coefficients to zero. The spectral centroid is then computed using this modified magnitude spectrum (operation C). In this way, the spurious spectral coefficients (artifacts) produced by the finite data length are removed from the computation, resulting in a more accurate estimate of the spectral centroid.
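Operations A, B and C can be sketched for a single frame as below. The paper does not specify the peak detection algorithm, so a simple local-maximum rule is assumed here; the function name `spectral_centroid_peaks` and the parameter `sth_fraction` (the threshold as a fraction of the spectral maximum, passed explicitly) are hypothetical.

```python
import numpy as np


def spectral_centroid_peaks(mag, freqs, sth_fraction):
    """Centroid of one frame using only thresholded spectral peaks.

    mag          : one-sided magnitude spectrum of the frame
    freqs        : frequency (Hz) of each bin
    sth_fraction : threshold as a fraction of the spectral maximum
    """
    mag = np.asarray(mag, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    if mag.max() <= 0:
        return float("nan")                        # no spectral energy in this frame

    # Operation A: keep only coefficients at or above the threshold.
    above = mag >= sth_fraction * mag.max()

    # Operation B: assumed peak rule, a bin larger than its left neighbour
    # and not smaller than its right neighbour.
    left = np.r_[-np.inf, mag[:-1]]
    right = np.r_[mag[1:], -np.inf]
    is_peak = above & (mag > left) & (mag >= right)

    # Operation C: centroid of the modified spectrum (peaks kept, rest zeroed).
    peaks = np.where(is_peak, mag, 0.0)
    return float(np.sum(freqs * peaks) / np.sum(peaks))


if __name__ == "__main__":
    freqs = 100.0 * np.arange(8)
    mag = np.array([0.05, 0.1, 0.3, 1.0, 0.3, 0.1, 0.05, 0.02])  # one peak with leakage
    print(spectral_centroid_peaks(mag, freqs, sth_fraction=0.2))
    print(np.sum(freqs * mag) / np.sum(mag))       # direct centroid of the same spectrum
```

In this toy spectrum the leakage-like side values pull the direct centroid away from the peak frequency, while the thresholded-peak version returns the peak frequency itself.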

# Simulations

The DFT spectrum is computed with 4096 points; thus, for a sampling frequency of 44100 Hz, the spectrum is computed with a resolution of 44100/4096 ≈ 10.77 Hz, with the frequency grid starting at 0 Hz.

The algorithm is tested on three categories of simulated test signals:

* Tones
* Sum of Tones
* Band Limited Unit Impulse Trains

# a) Test Data Set:1 (Tones)

In the first category, a set of 41 sine wave signals with frequencies 96.9 Hz, 635.23 Hz, 1173.56 Hz, …, 21091.77 Hz, 21630.10 Hz, with a uniform spacing of 538.33 Hz and random amplitudes in the range [0,1], is generated. These spot frequencies are selected so as to coincide with the DFT grid points on the frequency line (0 to Fs/2), i.e. 0 Hz to 22050 Hz, where Fs = 44100 Hz.
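The 41 spot frequencies can be reproduced from the DFT grid. The sketch below assumes a 4096-point DFT at Fs = 44100 Hz; the starting bin 9 and the step of 50 bins are inferred from the listed values (96.9 Hz and a 538.33 Hz spacing) and are not stated explicitly in the text. The function name `grid_tone_frequencies` is hypothetical.

```python
import numpy as np


def grid_tone_frequencies(fs=44100.0, nfft=4096, first_bin=9, bin_step=50, n_tones=41):
    """Tone frequencies (Hz) that fall exactly on DFT grid points."""
    bins = first_bin + bin_step * np.arange(n_tones)   # inferred bin indices
    return bins * fs / nfft


tone_freqs = grid_tone_frequencies()
print(len(tone_freqs))
print(tone_freqs[:3])
print(tone_freqs[-2:])
```

Placing every tone on a grid point means that the true spectral peak coincides with a DFT bin, so any estimation error can be attributed to windowing rather than to frequency quantization.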


# b) Test Data Set:2 (Sum of Tones)

In the second category, sums of 5, 10 or 50 sine waves of distinct frequencies are generated. In each case, the sine waves are separated by a uniform spacing of 10.77 Hz, 96.90 Hz or 495.26 Hz. These spacings are selected so that the generated frequencies coincide with the DFT grid points. In each set of 5, 10 or 50 frequencies, the first frequency is taken from one of the 41 spot frequencies of the first category; the total number of composite signals generated under this category is therefore 41 × 3 × 3 = 369.
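The count of 369 composite signals follows from enumerating every combination of starting frequency, number of tones and spacing. The sketch below assumes the three spacings correspond to 1, 9 and 46 bins of a 4096-point DFT at 44100 Hz (about 10.77 Hz, 96.90 Hz and 495.26 Hz), which is an inference from the listed values. It does not address how components that would exceed Fs/2 are handled, since the text does not say.

```python
from itertools import product

import numpy as np

FS = 44100.0
NFFT = 4096
BIN_HZ = FS / NFFT                          # DFT grid spacing in Hz

start_bins = 9 + 50 * np.arange(41)         # inferred bins of the 41 spot frequencies
tone_counts = (5, 10, 50)                   # number of sine waves per composite signal
spacing_bins = (1, 9, 46)                   # inferred spacings in DFT bins

# Every (start bin, tone count, spacing) combination is one composite signal.
configs = list(product(start_bins, tone_counts, spacing_bins))


def component_frequencies(start_bin, n_tones, step_bins):
    """Frequencies (Hz) of the sine waves in one composite signal."""
    return (start_bin + step_bins * np.arange(n_tones)) * BIN_HZ


print(len(configs))
print(component_frequencies(9, 5, 9))
```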


# c) Test Data Set:3 (Band Limited Unit Impulse Trains)

In the third category, a set of Band Limited Unit Impulse Trains (BLUITs), each with a different fundamental frequency, is generated. The frequencies of the 41 sine waves of the first category are used as fundamentals, giving 41 sets of BLUITs. The spectral envelope of each BLUIT can be constant (i.e. 0 dB/octave) or decay at a rate of 12 dB/octave. The fundamental frequencies and the number of harmonics in each BLUIT (= 0.5 Fs/F0) are given in Table 2.
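One way to synthesize such a signal is as a sum of harmonically related cosines up to Fs/2. The sketch below is an assumption-laden illustration: the function name `bluit`, the use of cosines, the rounding of the harmonic count down to an integer, and the mapping of a decay in dB/octave to harmonic amplitudes are all choices made here, not details given in the paper.

```python
import numpy as np


def bluit(f0, fs=44100.0, duration=0.5, decay_db_per_octave=0.0):
    """Band-limited impulse train as a sum of harmonics of f0 below fs/2.

    Returns the signal, the harmonic frequencies (Hz) and their amplitudes.
    """
    n_harm = int(np.floor(0.5 * fs / f0))          # number of harmonics, 0.5*Fs/F0
    h = np.arange(1, n_harm + 1)                   # harmonic numbers 1..n_harm
    # Amplitude drops by decay_db_per_octave for every doubling of frequency.
    amps = 10.0 ** (-decay_db_per_octave * np.log2(h) / 20.0)
    t = np.arange(int(round(duration * fs))) / fs
    phases = 2.0 * np.pi * f0 * h[:, None] * t[None, :]
    x = np.sum(amps[:, None] * np.cos(phases), axis=0)
    return x, h * f0, amps


if __name__ == "__main__":
    x, harm_freqs, amps = bluit(5480.2002, decay_db_per_octave=12.0)
    print(len(x), harm_freqs, amps)
```

With `decay_db_per_octave=0.0` all harmonics have equal amplitude, which corresponds to the flat spectral envelope case.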


# Results

In this section, the results obtained by applying both the direct and the proposed methods are presented, together with a performance comparison of the two. The SC estimation results for the Test Set 1 (tone) signals, with frequencies spanning 96.8994 Hz to 21630.1025 Hz and a duration of 0.5 s (Hamming window size 512, Fs = 44100 Hz), are given in Table 2 for both methods. Each row of Table 2 corresponds to the estimated SC vector of a particular tone of 0.5 s duration, i.e. a full-length signal of 22050 samples. Both the mean (µ) and the standard deviation (σ) of this estimated spectral centroid vector are computed and given in the 3rd column of Table 2.

The estimation errors of the direct method are large at both the lowest and the highest frequencies in the range. For the lowest (start) frequency the error (true value minus estimate) is negative, and for the highest (end) frequency it is positive. This means that the direct method overestimates the SC at lower frequencies and underestimates it at higher frequencies. For lower frequencies, the spectral mass is unevenly distributed on either side of the tone frequency, with more of it on the right (higher-frequency) side. Hence, the estimated values shift towards the higher side of the frequency axis.

Similarly, for higher frequencies, the estimated values shift towards the lower side of the frequency axis. As the tone frequency is swept from the lowest frequency (96.8994 Hz) to the highest frequency (21630.1025 Hz), the mean error (µ) reduces and approaches zero at the middle of the range, i.e. around Fs/4 (Table 2). For each tone, the standard deviation (σ) is also computed. The estimation results of the proposed method for the same set of signals are given in the 5th and 6th columns of Table 2. This method estimates the SC exactly, and hence both the mean error (µ) and the standard deviation (σ) are zero. The spectral threshold STH is chosen as a fraction 0.2 of the maximum value of the magnitude spectrum, which corresponds to about -14 dB below the peak value (20 log10 0.2 ≈ -14 dB). This is approximately the side lobe level (SLL) of the spectrum of a rectangular window. For other windows the SLL is lower than -13 dB, although the main lobe is wider than that of a rectangular window, which in any case does not affect the peak detection process.
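The opposite signs of the direct-method error at the two ends of the tone range can be reproduced with a single-frame experiment. This is a minimal sketch assuming a 512-sample Hamming window, a 4096-point FFT and Fs = 44100 Hz; the function name `direct_centroid_hz` is hypothetical and the printed values are not claimed to match the tabulated results.

```python
import numpy as np


def direct_centroid_hz(frame, fs, nfft=4096):
    """Direct spectral centroid (Hz) of one Hamming-windowed frame."""
    mag = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n=nfft))
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    return float(np.sum(freqs * mag) / np.sum(mag))


fs = 44100.0
n = np.arange(512)
for f0 in (96.8994, 21630.1025):               # lowest and highest test tones
    frame = np.sin(2 * np.pi * f0 * n / fs)
    est = direct_centroid_hz(frame, fs)
    # Error defined as true value minus estimate.
    print(f"tone {f0:.4f} Hz: estimate {est:.1f} Hz, error {f0 - est:+.1f} Hz")
```

For the low tone, all leakage lies above 0 Hz and stretches up to Fs/2, so the weighted mean is pulled upward; for the high tone the situation is mirrored.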

The estimation results of Table 2 are also shown in Figure 3(a) for both the direct (solid line) and the proposed (dashed line) methods. For the direct method, the RMS range of the estimated centroid is marked with red vertical lines at each point. For the proposed method the estimated value is exactly equal to the true value, hence the RMS range is zero and no red vertical lines are seen on the dashed line. Figure 3(b) shows similar results for a window size of 256. The estimation error follows a more regular pattern for the 512-sample window than for the 256-sample window, because with 256 samples the data has become too short to give a meaningful estimate. However, the error is almost symmetric around the middle frequency, i.e. Fs/4. This symmetry would be disturbed if the window size were reduced further. The error is larger for lower frequencies, because fewer cycles of the signal are included in the short segment. The window size should therefore be selected carefully, based on the lowest frequency under consideration, so that a considerable number of signal cycles are included in the window. Figure 4 shows the magnitude spectrum of a single frame and the corresponding estimated spectral centroid vectors for three of the tones. The results show that the estimate of the proposed method is always better than that of the direct method. The accuracy is very good for larger spacings of the tone frequencies, because the spectral peaks are better separated.


# Conclusions

In this paper, windowing effects on spectral centroid estimation are investigated using three types of well-structured signals: Tones, Sum of Tones and Band Limited Unit Impulse Trains. These test signals are considered because they appear frequently in speech and audio content. The spectral centroid is estimated using two methods: (1) the direct method, using equation (4), and (2) the proposed method, which applies a threshold and peak detection to the magnitude spectrum. The proposed algorithm is shown to estimate the spectral centroid more accurately than the direct method for all the signals under consideration and for all window lengths.
![Figure 1(a) shows a sine wave of frequency 5480.20 Hz and unity amplitude; naturally, the SC is the same frequency, 5480.20 Hz. In Figure 1(b), the signal consists of two sine waves of frequencies 5480.20 Hz and 11401.83 Hz with equal amplitudes of 0.5; here the SC is the mean of the two frequencies, i.e. 8441.02 Hz. In Figure 1(c), the signal consists of two sine waves, 5480.20 Hz (amplitude 0.70) and 11401.83 Hz (amplitude 0.30); here the SC (7256.69 Hz) shifts to the left of the mid (mean) value because the first sine wave has the higher amplitude. In Figure 1(d), the signal consists of two sine waves, 5480.20 Hz (amplitude 0.15) and 11401.83 Hz (amplitude 0.85); in this case the SC (10513.59 Hz) shifts to the right of the mid value because the second sine wave has the higher amplitude.](image-2.png "")
![Fig. 1: Description of the spectral centroid. Four cases of F1 and F2 are given in (a) through (d). In each case the sum of spectral amplitudes is selected to be unity. The spectral centroid in each case is shown as a red star mark](image-3.png "Fig. 1")
![x(n) is divided into overlapping frames of size N; each frame is weighted by a window function w(k), typically a Hamming or a Hanning window, and analyzed using the Fourier transform. A matrix is formed by arranging the short time Fourier transform (STFT) coefficients as columns and is popularly known as a spectrogram, given by](image-4.png "")
![Fig. 3: SC estimation error for Test Set 1 (tone) signals of frequencies spanning 96.8994 Hz to 21630.1025 Hz and 0.5 s duration: (a) window size of 512; (b) window size of 256](image-5.png "Fig. 3")
![Fig. 4: Magnitude spectrum of a single frame of tone signals of frequencies 96.8994 Hz, 10863.501 Hz and 21630.1025 Hz on the left side, (a), (c) and (e), for a window length of 512 samples. Corresponding estimated spectral centroid vectors on the right side, (b), (d) and (f)](image-6.png "Fig. 4")
![Fig. 6: (a) SC estimation error for Test Set 2 (sum of tones with a frequency spacing of 200 Hz) signals with lowest frequency spanning 96.8994 Hz to 21630.1025 Hz and 0.5 s duration (window size 512) for both direct (solid line) and proposed (dashed line) methods. (b) Same as (a) for a window size of 256](image-7.png "Fig. 6")
![Fig. 9: SC estimation error for Test Set 3 (BLUITs with fundamental frequency spanning 96.8994 Hz to 21630.1025 Hz, 0.5 s duration, spectral slope 0 dB/octave) for both direct (red line) and proposed (blue line) methods for (a) 256-sample window, (b) 512-sample window, (c) 768-sample window, (d) 1024-sample window](image-8.png "Fig. 9")
![Figure 9 shows the estimation results for Test Set 3 (BLUITs with a fundamental frequency spanning 96.8994 Hz to 21630.1025 Hz, 0.5 s duration and a spectral slope of 0 dB/octave) for window sizes of 256, 512, 768 and 1024 samples. Again, the results of the proposed method are far better than those of the direct method, which fails even for larger window sizes. In Figure 10, the estimation errors for Test Set 3 (BLUIT) signals with a spectral slope of -12 dB/octave are shown for window sizes of 256, 512, 768 and 1024 samples. It can be observed that in all cases the mean error of the proposed method is drastically lower than that of the direct method. Moreover, as the window length increases, the standard deviation of the estimation error reduces faster for the proposed method than for the direct method.](image-9.png "Fig. 10")
![](image-10.png "")
![](image-11.png "")

Thus the total data set comprises 450 (= 41 + 369 + 40) differently structured test signals.
Table 2: SC estimation results for Test Set 1 (tones) using the direct and the proposed methods

| Tone no. | True spectral centroid (Hz) (1) | Spectral centroid estimated by direct method (Hz) (2) | SC est. error, direct method (Hz) (1) - (2) | Spectral centroid estimated by proposed method (Hz) (3) | SC est. error, proposed method (Hz) (1) - (3) |
|---|---|---|---|---|---|
| 1 | 96.8994 | 634.033 ± 103.2825 | -537.13 | 96.8994 ± 0 | 0 |
| 2 | 635.2295 | 1107.9746 ± 114.8693 | -472.75 | 635.2295 ± 0 | 0 |
| 3 | 1173.5596 | 1608.3049 ± 107.8578 | -434.75 | 1173.5596 ± 0 | 0 |
| 4 | 1711.8896 | 2117.5429 ± 99.1608 | -405.65 | 1711.8896 ± 0 | 0 |
| 5 | 2250.2197 | 2623.4037 ± 94.3929 | -373.18 | 2250.2197 ± 0 | 0 |
| 6 | 2788.5498 | 3131.4122 ± 89.3034 | -342.86 | 2788.5498 ± 0 | 0 |
| 7 | 3326.8799 | 3642.626 ± 86.5037 | -315.75 | 3326.8799 ± 0 | 0 |
| 8 | 3865.21 | 4158.9111 ± 81.3724 | -293.7 | 3865.21 ± 0 | 0 |
| 9 | 4403.54 | 4669.3952 ± 79.7196 | -265.86 | 4403.54 ± 0 | 0 |
| 10 | 4941.8701 | 5183.2597 ± 76.1674 | -241.39 | 4941.8701 ± 0 | 0 |
| 11 | 5480.2002 | 5698.366 ± 75.0958 | -218.17 | 5480.2002 ± 0 | 0 |
| 12 | 6018.5303 | 6217.31 ± 71.2696 | -198.78 | 6018.5303 ± 0 | 0 |
| 13 | 6556.8604 | 6730.1438 ± 70.2138 | -173.28 | 6556.8604 ± 0 | 0 |
| 14 | 7095.1904 | 7247.1104 ± 67.5403 | -151.92 | 7095.1904 ± 0 | 0 |
| 15 | 7633.5205 | 7764.439 ± 66.7206 | -130.92 | 7633.5205 ± 0 | 0 |
| 16 | 8171.8506 | 8283.9504 ± 64.5743 | -112.1 | 8171.8506 ± 0 | 0 |
| 17 | 8710.1807 | 8797.6702 ± 64.3643 | -87.49 | 8710.1807 ± 0 | 0 |
| 18 | 9248.5107 | 9316.0265 ± 63.4916 | -67.52 | 9248.5107 ± 0 | 0 |
| 19 | 9786.8408 | 9834.3672 ± 63.363 | -47.53 | 9786.8408 ± 0 | 0 |
| 20 | 10325.1709 | 10353.7776 ± 62.7943 | -28.61 | 10325.1709 ± 0 | 0 |
| 21 | 10863.501 | 10867.8735 ± 62.924 | -4.37 | 10863.501 ± 0 | 0 |
| 22 | 11401.8311 | 11387.1281 ± 63.1554 | 14.7 | 11401.8311 ± 0 | 0 |
| 23 | 11940.1611 | 11905.6919 ± 63.1063 | 34.47 | 11940.1611 ± 0 | 0 |
| 24 | 12478.4912 | 12424.4625 ± 63.3744 | 54.03 | 12478.4912 ± 0 | 0 |
| 25 | 13016.8213 | 12938.3795 ± 63.3481 | 78.44 | 13016.8213 ± 0 | 0 |
| 26 | 13555.1514 | 13457.8904 ± 64.7948 | 97.26 | 13555.1514 ± 0 | 0 |
| 27 | 14093.4814 | 13975.7026 ± 65.3201 | 117.78 | 14093.4814 ± 0 | 0 |
| 28 | 14631.8115 | 14493.3087 ± 67.2566 | 138.5 | 14631.8115 ± 0 | 0 |
| 29 | 15170.1416 | 15006.592 ± 67.9661 | 163.55 | 15170.1416 ± 0 | 0 |
| 30 | 15708.4717 | 15525.9341 ± 71.2219 | 182.54 | 15708.4717 ± 0 | 0 |
| 31 | 16246.8018 | 16042.2271 ± 72.4166 | 204.57 | 16246.8018 ± 0 | 0 |
| 32 | 16785.1318 | 16557.6751 ± 75.7913 | 227.46 | 16785.1318 ± 0 | 0 |
| 33 | 17323.4619 | 17069.2012 ± 76.9287 | 254.26 | 17323.4619 ± 0 | 0 |
| 34 | 17861.792 | 17586.7432 ± 81.2453 | 275.05 | 17861.792 ± 0 | 0 |
| 35 | 18400.1221 | 18099.7754 ± 83.0062 | 300.35 | 18400.1221 ± 0 | 0 |
| 36 | 18938.4521 | 18610.6458 ± 87.3431 | 327.81 | 18938.4521 ± 0 | 0 |
| 37 | 19476.7822 | 19118.961 ± 90.5547 | 357.82 | 19476.7822 ± 0 | 0 |
| 38 | 20015.1123 | 19632.1551 ± 97.5527 | 382.96 | 20015.1123 ± 0 | 0 |
| 39 | 20553.4424 | 20138.8757 ± 102.7702 | 414.57 | 20553.4424 ± 0 | 0 |
| 40 | 21091.7725 | 20639.2884 ± 109.3145 | 452.48 | 21091.7725 ± 0 | 0 |
| 41 | 21630.1025 | 21136.0545 ± 116.1857 | 494.05 | 21630.1025 ± 0 | 0 |
© 2015 Global Journals Inc. (US)
* [1] Emery Schubert, Joe Wolfe. Does Timbral Brightness Scale with Frequency and Spectral Centroid? Acta Acustica united with Acustica, vol. 92, 2006.
* [2] E. Scheirer, M. Slaney. Construction and evaluation of a robust multifeature speech/music discriminator. Proc. IEEE ICASSP, 1997.
* [3] E. Wold, T. Blum, D. Keislar, J. Wheaton. Content-based classification, search, and retrieval of audio. IEEE Multimedia Mag., vol. 3, 1996.
* [4] G. Peeters, A. L. Burthe, X. Rodet. Toward automatic music audio summary generation from signal analysis. Proceedings of the Third International Conference on Music Information Retrieval, Paris, France, 2002.
* [5] Jia Min Karen Kua et al. Investigation of Spectral Centroid Magnitude and Frequency for Speaker Recognition. The Speaker and Language Recognition Workshop, Brno, Czech Republic, 28 June - 1 July 2010.
* [6] Jingdong Chen et al. Recognition of Noisy Speech Using Dynamic Spectral Subband Centroids. IEEE Signal Processing Letters, vol. 11, no. 2, February 2004.
* [7] Bojana Gajić, Kuldip K. Paliwal. Robust Speech Recognition in Noisy Environments Based on Subband Spectral Centroid Histograms. IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 2, March 2006.
* [8] M. Chandwadkar, M. S. Sutaone. Selecting Proper Features and Classifiers for Accurate Identification of Musical Instruments. International Journal of Machine Learning and Computing, vol. 3, no. 2, April 2013.
* [9] B. S. Manjunath et al. Introduction to MPEG-7. 1st edition, Wiley, 2002.
* [10] Xiao-Jiao Tao et al. Narrowband Acoustic Doppler Volume Backscattering Signal-Part II: Spectral Centroid Estimation. IEEE Transactions on Signal Processing, vol. 50, no. 11, November 2002.