teaching machines

Final Project Design Document – voice-activated video player

November 22, 2011. Filed under cs491 mobile, fall 2011, postmortems.

For my final project, I’ll be building an incremental prototype that will be used for the Reader-animated storybooks research project I’m involved with.

The big idea is to build a framework for multi-page animated storybooks in which the animation on each page is triggered by the user reading the on-screen words aloud. In the process, we will experiment with different methods of animation (e.g., video files, a flip-book approach, or perhaps an interface that an artist familiar with Flash could use directly for simple keyframe animation) and, hopefully, different methods of speech recognition (the built-in Android SpeechRecognizer class for a cloud-based solution, CMU Sphinx for on-device, offline functionality, or even some direct audio signal processing).

That said, this assignment will be a small first step. Users will be able to select videos from a menu screen and activate controls via voice commands such as “Play Video 3”, “Stop”, “Rewind”, and “Menu”.
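To make the voice-command idea concrete, here is a rough sketch of how recognized text might be mapped to player actions. Everything in it is an assumption for illustration: the class name VoiceCommandParser, the Action enum, and the exact phrases it accepts are hypothetical, not the project's actual code. It is plain Java with no Android dependency, so it only covers the step after recognition. On Android, the list of candidate transcripts would typically come from RecognitionListener.onResults() via the SpeechRecognizer.RESULTS_RECOGNITION key.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser that maps recognized speech text to player commands. */
final class VoiceCommandParser {

    /** Actions named in the post, plus UNKNOWN for anything unmatched. */
    enum Action { PLAY, STOP, REWIND, MENU, UNKNOWN }

    /** A parsed command; videoNumber is -1 unless the action is PLAY. */
    static final class Command {
        final Action action;
        final int videoNumber;

        Command(Action action, int videoNumber) {
            this.action = action;
            this.videoNumber = videoNumber;
        }
    }

    // Assumed phrasing: "play video <digits>". Spelled-out numbers such as
    // "play video three" are not handled in this sketch.
    private static final Pattern PLAY_VIDEO =
            Pattern.compile("^play video (\\d{1,4})$");

    /** Parses a single transcript, ignoring case and extra whitespace. */
    static Command parse(String utterance) {
        String text = utterance.trim()
                .toLowerCase(Locale.ROOT)
                .replaceAll("\\s+", " ");

        Matcher m = PLAY_VIDEO.matcher(text);
        if (m.matches()) {
            return new Command(Action.PLAY, Integer.parseInt(m.group(1)));
        }
        switch (text) {
            case "stop":   return new Command(Action.STOP, -1);
            case "rewind": return new Command(Action.REWIND, -1);
            case "menu":   return new Command(Action.MENU, -1);
            default:       return new Command(Action.UNKNOWN, -1);
        }
    }

    /** Returns the first candidate transcript that parses to a known command. */
    static Command parseBest(List<String> hypotheses) {
        for (String hypothesis : hypotheses) {
            Command command = parse(hypothesis);
            if (command.action != Action.UNKNOWN) {
                return command;
            }
        }
        return new Command(Action.UNKNOWN, -1);
    }
}
```

Checking every candidate transcript rather than only the top one is a deliberate choice here: a recognizer's first guess may be a near miss (say, "play video tree") while a later guess matches a command exactly.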

In addition to producing custom video player code that can be re-used, this assignment should help the research group start to understand how the network latency that comes with SpeechRecognizer will affect the user experience.