Speech Quality Assessment Using Audio Features

Authors

Renuka Devi M.N, Assistant Professor
VTU Research Center, PESIT Bangalore South Campus, Bengaluru, Visvesvaraya Technological University, Belagavi, Karnataka; Dept. of CSE, Dayananda Sagar University, Bengaluru, Karnataka, India.
Gowri Srinivasa, Professor
Dept. of Computer Science and Engineering, PESIT Bangalore South Campus (currently PES University), Bengaluru, Karnataka, India.

Abstract

This paper discusses the design of features that aid in classifying the quality of a speaker’s speech. The data used in this study are drawn from TED Talks. Since most TED speakers are high achievers and expert orators, their talks provide a rich source of audio cues that characterize speech appealing to a large audience. The features used to categorize speech quality can serve as the basis for analyzing the speech quality of novice speakers. Such a system can draw a novice speaker’s attention to specific areas of improvement, such as increasing amplitude or maintaining vocal consistency, and facilitate directed effort towards improving the quality of one’s speech. We use a speaker classification technique designed and developed in house, based on audio features including Short Term Energy (STE), Zero Crossing Rate (ZCR), mean power, pitch, magnitude, and standard deviation. Finally, we use an unsupervised method, hierarchical clustering, to group speakers into 6 categories.
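To make the frame-level features named in the abstract concrete, the following is a minimal sketch of how Short Term Energy, Zero Crossing Rate, and mean power might be computed from a list of audio samples. It is an illustration under stated assumptions, not the authors' in-house implementation: the function names, the frame length of 400 samples, and the hop of 160 samples are all hypothetical choices, and STE and ZCR are taken in their common textbook forms (sum of squared samples per frame, and fraction of adjacent sample pairs that change sign).

```python
import math


def split_frames(signal, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def short_term_energy(frame):
    """Sum of squared samples in one frame (a common definition of STE)."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def mean_power(signal):
    """Average squared amplitude over the whole signal."""
    return sum(x * x for x in signal) / len(signal)


def mean_and_std(values):
    """Mean and population standard deviation of a list of numbers."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


def summarize(signal, frame_len=400, hop=160):
    """Hypothetical per-speaker summary: frame_len and hop are assumed values."""
    frames = split_frames(signal, frame_len, hop)
    ste_mean, ste_std = mean_and_std([short_term_energy(f) for f in frames])
    zcr_mean, zcr_std = mean_and_std([zero_crossing_rate(f) for f in frames])
    return {
        "ste_mean": ste_mean,
        "ste_std": ste_std,
        "zcr_mean": zcr_mean,
        "zcr_std": zcr_std,
        "mean_power": mean_power(signal),
    }
```

A per-speaker summary of this kind, one row per talk, is the sort of feature vector that could then be passed to a hierarchical clustering routine. The standard deviations are included here on the assumption that a low spread across frames is one plausible proxy for vocal consistency.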