Address:
Saarland University
Publications
Cinematic Analysis of Automotive Personalization
Christoph Endres, Michael Feld, Tim Schwartz and Christian Müller
Proceedings of the 2nd International Workshop on Multimodal Interfaces for Automotive Applications (MIAA 2010) in Conjunction with IUI 2010, ISBN 978-1-60558-997-8, ACM
Personalization has become an important aspect in the design of cars and the human machine interface (HMI). Successful personalization bears the potential of increasing both safety and customer satisfaction. In order to reveal innovative concepts in automotive personalization, we present in this paper a cinematic study -- an analysis of what the movie industry "invented" and what might be worth following up on in research. Our notion of personalization is twofold: a) tailoring the car or parts of the car to a specific user (group) and b) the car or parts of the car becoming a persona -- the latter being relevant with respect to current automotive research activities on the "emotional" car. Based on the analysis of popular movies and TV series from the last four decades, we introduce a scheme that describes personalization concepts as imagined (and maybe even anticipated) by film-makers, whose creative drives have not been slowed down by the restraints of feasibility and costs.
Multilingual Speaker Age Recognition: Regression Analyses on the Lwazi Corpus
Michael Feld, Etienne Barnard, Charl van Heerden, Christian Müller
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU 2009)
Multilinguality represents an area of significant opportunities for automatic speech-processing systems: whereas multilingual societies are commonplace, the majority of speech-processing systems are developed with a single language in mind. As a step towards improved understanding of multilingual speech processing, the current contribution investigates how an important paralinguistic aspect of speech, namely speaker age, depends on the language spoken. In particular, we study how certain speech features affect the performance of an age recognition system for different South African languages in the Lwazi corpus. By optimizing our feature set and performing language-specific tuning, we are working towards true multilingual classifiers. As they are closely related, ASR and dialog systems are likely to benefit from an improved classification of the speaker.
In a comprehensive corpus analysis on long-term features, we have identified features that exhibit characteristic behaviors for particular languages. In a follow-up regression experiment, we confirm the suitability of our feature selection for age recognition and present cross-language error rates. The mean absolute error ranges between 7.7 and 12.8 years for same-language predictors and rises to 14.5 years for cross-language predictors.
An Integrated Development Environment for Speech-Based Classification
Michael Feld, Christian Müller
Proceedings of the 13th International Conference "Speech and Computer" SPECOM 2009
This paper presents a new machine learning framework for speech-based classification tasks that was developed in conjunction with the Agender project (age and gender recognition for telephone applications). The main goal of this framework is to provide a completely integrated development environment supporting all processes from design through evaluation to deployment of classification systems. It is intended for both researchers and application developers and specializes in audio signals as the resource to be classified. We show that the proposed framework outperforms other tools in several aspects.
Speaker Classification for Mobile Devices
Michael Feld, Christian Müller
Proceedings of the 2nd IEEE International Interdisciplinary Conference on Portable Information Devices (Portable 2008)
User adaptivity is a key topic in the context of mobile devices and applications, and speech is one of the sources of information that has more recently been discovered for this purpose. While considerable work has already been done on both finding algorithms and designing well-performing implementations for this speaker classification task on the desktop platform as part of the AGENDER approach, efforts to bring the results to portable platforms in a working framework have been rather scarce so far. This work seeks to state the major aspects that make mobile speaker classification different from its desktop counterpart, and proposes a number of changes and enhancements to the existing infrastructure to fulfill the requirements emerging from them.
Embedded Modules for Speaker Classification
Michael Feld
Proceedings of the Second IEEE International Conference on Semantic Computing 2008 (ICSC 2008)
Classifying speakers and their context is a research topic that increasingly finds its way into market-ready products. This paper describes how a speech-based classification problem can be split into components that are then combined in a classification module, which can be compiled for a specific platform and scenario with its respective technical requirements and limitations. We focus on the AGENDER Speaker Classification approach to show how a theoretical model can be transformed into a finished Embedded Module and present a tool that facilitates this in a fully automated build process.
Integrated Speaker Classification for Mobile Shopping Applications
Michael Feld and Gerrit Kahl
Proceedings of the 5th International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems (AH 2008), July 28 - August 1, 2008, Hannover, Germany
This paper presents an approach in which speaker classification gives recommender systems in a mobile shopping environment new ways to bootstrap user models and avoid common problems such as the "early rater". In a concrete shopping scenario, we introduce the speech-controlled Mobile ShopAssist demonstrator that allows a new customer to more quickly find a product that fulfills his or her demographic group’s specific requirements by exploiting features extracted from speech using the AGENDER speaker classification system. We propose a method for computing preference scores based on the user's profile and demonstrate how the application’s GUI can be adapted to deliver the recommendations to the user.
Towards a Multilingual Approach on Speaker Classification
Christian Müller, Michael Feld
Proceedings of the 11th International Conference "Speech and Computer" SPECOM 2006, 25-29 June 2006, St. Petersburg, Russia.
This paper outlines a framework for a multilingual speaker classification system which is based on an underlying language identification module. First, the Agender speaker classification technology is introduced, a two-layered approach which primarily recognizes the speakers' age and gender but also incorporates novel domain-independent aspects that can be applied to other speaker characteristics like emotions or cognitive load. Then, it is pointed out that one of its major drawbacks consists of the fact that it has not been verified that the chosen set of speech features also works for other languages, especially for those with different phonological aspects. To overcome this drawback, it is suggested to extend Agender with a language identification module. The module presented here is designed to meet the requirements of a specific telephone-based application (which itself is not within the focus of this paper): The languages German, English and Turkish shall be discriminated on the basis of the initial utterance of the speaker; for each of the possible languages, hypotheses about the nature of the initial utterance are available; the domain encompasses a list of English product names. Although the suggested method is as yet only partly implemented, the first evaluation results are very promising: Turkish could be identified with an accuracy of 71.75%, German with an accuracy of 78.39%, and English with an accuracy of 79.89%. Besides this, the paper outlines the use of the language identification module within a multilingual version of Agender.
Erzeugung von Sprecherklassifikationsmodulen für multiple Plattformen
Michael Feld
Portierung von Merkmalsextraktion auf die PocketPC-Plattform
Michael Feld
This work presents a software solution for the PocketPC platform that extracts phonetic features from a digital speech signal; these features serve as the basis for other applications. The development of the solution is described step by step, from the definition of the criteria through the design to the implementation. The result is a finished library that provides the required features. Beforehand, a brief overview is given of the M3I project, within which this work was carried out, and of computer-aided speech analysis in general. In addition, several software-engineering aspects of application porting are addressed and made concrete with regard to the mobile platform.