SENG 440: Lecture 22 – Speech Recognition
Dear students,
Today we will create an app called Recog that presents anagrams for the user to unscramble. But instead of typing, the user will speak the answer. We’ll use Android’s speech recognition facilities to make this happen.
Next lecture we will explore the new CameraX API that was just announced at Google I/O 2019. See the TODO below for the assigned exercises.
Consider this extra credit opportunity:
On Friday, 24 May at 1 PM, Chris Johnson will be presenting a seminar entitled Computational Making in JE445. Attend and write down a response on a quarter sheet for an extra 0.5 Pakipaki.
Recog
Following are the exercises assigned last time. We will use and discuss the solutions that you’ve submitted as we assemble our app, but I include my solutions here for reference.
- Write extension function anagram for String. It generates a new String that is a random shuffling of the letters of the original String. Ensure that the shuffling isn't identical to the original String. Possible solution:

  fun String.anagram(): String {
      var candidate: String
      do {
          candidate = this.toCharArray().toList().shuffled().joinToString("")
      } while (candidate == this)
      return candidate
  }
- Write extension function hasSameLetters for String that accepts another String as a parameter. It returns true if the two Strings have the same letters. For example, "these".hasSameLetters("sheet") returns true. We need this method because some anagrams may have multiple correct answers, and we want to accept any of them, not just the one we have in mind. For example, reab could unscramble to either bear or bare. Both have the same pronunciation. So it is with break and brake. Can you think of others? Possible solution:

  fun String.hasSameLetters(other: String) = this.toCharArray().sorted() == other.toCharArray().sorted()
- Define class RandomWordFetcher to extend AsyncTask and accept a MainActivity, a host string, and a key string as constructor parameters. In the background, fetch the JSON from the Words API endpoint at https://wordsapiv1.p.mashape.com/words/, with the query parameters random=true, frequencyMin=4, and letters=5. Send also the headers X-RapidAPI-Host and X-RapidAPI-Key with the specified strings. The resulting JSON will be formatted according to Words API. In the main thread, assign the randomly generated word to the word property of MainActivity. Possible solution:

  class RandomWordFetcher(
      context: MainActivity,
      private val host: String,
      private val key: String
  ) : AsyncTask<Unit, Unit, String>() {
      private val context = WeakReference(context)

      override fun doInBackground(vararg p0: Unit): String {
          val endpoint = "https://wordsapiv1.p.mashape.com/words/"
          val parameters = mapOf("random" to "true", "frequencyMin" to "4", "letters" to "5")
          val url = parameterizeUrl(endpoint, parameters)
          val headers = mapOf("X-RapidAPI-Host" to host, "X-RapidAPI-Key" to key)
          val json = getJson(url, headers)
          val word = json.getString("word")
          return word
      }

      override fun onPostExecute(word: String) {
          super.onPostExecute(word)
          context.get()?.let {
              it.word = word
          }
      }
  }
- Define method generateRandomWord in MainActivity to start up a new RandomWordFetcher task. Pass string resources with IDs words_api_host and words_api_key. Possible solution:

  private fun generateRandomWord() {
      RandomWordFetcher(
          this,
          resources.getString(R.string.words_api_host),
          resources.getString(R.string.words_api_key)
      ).execute()
  }
- Define property word in MainActivity to show the word in wordLabel and listen to the user. Possible solution:

  var word: String = ""
      set(value) {
          field = value
          wordLabel.text = value.anagram()
          listen()
      }
- Define method listen to start a new speech recognition activity. Use request code 440. Possible solution:

  private fun listen() {
      // For dialog approach.
      val intent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH)
      intent.putExtra(RecognizerIntent.EXTRA_PROMPT, wordLabel.text)
      intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, "en-US")
      startActivityForResult(intent, 440)

      // For non-dialog approach.
      // recognizer.startListening(recognizeIntent)
  }
- Define method onActivityResult to respond to the results of listening. Grab the candidate utterances and check to see if any of the candidates is a correct unscrambling. Possible solution:

  override fun onActivityResult(requestCode: Int, resultCode: Int, data: Intent?) {
      when (requestCode) {
          440 -> {
              if (resultCode == RESULT_OK && data != null) {
                  val candidates = data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS)
                  checkCandidates(candidates)
              }
          }
          else -> super.onActivityResult(requestCode, resultCode, data)
      }
  }
- Define method checkCandidates to accept an ArrayList of String. If any element of the list has the same letters as the current word, generate a new word. Otherwise, pop up a toast advising the player to try again and listen to the user for another attempt. Possible solution:

  private fun checkCandidates(candidates: ArrayList<String>) {
      if (candidates.any { word.hasSameLetters(it) }) {
          generateRandomWord()
      } else {
          Toast.makeText(this, "Nope. Try again.", Toast.LENGTH_SHORT).show()
          listen()
      }
  }
- Define field recognitionListener to be an instance of RecognitionListener. If an error occurs, pop up a toast warning. If no error occurs, grab the candidate utterances and check to see if any of the candidates is a correct unscrambling. Possible solution:

  private val recognitionListener = object : RecognitionListener {
      override fun onReadyForSpeech(p0: Bundle?) {}
      override fun onRmsChanged(p0: Float) {}
      override fun onBufferReceived(p0: ByteArray?) {}
      override fun onPartialResults(p0: Bundle?) {}
      override fun onEvent(p0: Int, p1: Bundle?) {}
      override fun onBeginningOfSpeech() {}
      override fun onEndOfSpeech() {}

      override fun onError(p0: Int) {
          Toast.makeText(this@MainActivity, "Error. Starting over.", Toast.LENGTH_SHORT).show()
      }

      override fun onResults(results: Bundle) {
          results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)?.let {
              checkCandidates(it)
          }
      }
  }
- Define method initializeSansDialog to create a new SpeechRecognizer that calls back to recognitionListener. Once it's constructed, generate a new random word. (If you take this route, see the permission note after this list.) Possible solution:

  private fun initializeSansDialog() {
      recognizeIntent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH).apply {
          putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
          putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE, packageName)
          putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true)
      }
      recognizer = SpeechRecognizer.createSpeechRecognizer(this)
      recognizer.setRecognitionListener(recognitionListener)
      generateRandomWord()
  }
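One practical note on the dialog-free approach from the last two exercises: the RecognizerIntent dialog takes care of microphone access for us, but a raw SpeechRecognizer records audio on our app's behalf, so the app must hold the RECORD_AUDIO permission, declared in the manifest and, on Android 6.0 and up, granted at runtime. Here is a minimal sketch using the stock ContextCompat/ActivityCompat calls. The helper name ensureRecordAudioPermission and the request code 441 are my own placeholders, not part of the exercises.

// Assumes imports of android.Manifest, android.content.pm.PackageManager,
// androidx.core.app.ActivityCompat, and androidx.core.content.ContextCompat.
// AndroidManifest.xml also needs <uses-permission android:name="android.permission.RECORD_AUDIO" />.
private fun ensureRecordAudioPermission() {
    val permission = Manifest.permission.RECORD_AUDIO
    if (ContextCompat.checkSelfPermission(this, permission) == PackageManager.PERMISSION_GRANTED) {
        initializeSansDialog()
    } else {
        // The user's answer arrives in onRequestPermissionsResult with this request code.
        ActivityCompat.requestPermissions(this, arrayOf(permission), 441)
    }
}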
The full source can be found on GitHub. The master branch contains the final version, and the todo branch has the exercises incomplete.
TODO
Next lecture we will create an app called Two-Face that allows the user to take a split image on the front-facing camera. The left half and right half are taken at separate times. You can achieve some strange effects with such a camera, like a selfie in which you have two tongues, a before and after shot, or a blend of you and your sister.
The UI has 7 widgets:
- a TextureView named previewView that shows the camera preview
- two capture buttons to take a picture and retain just half of it
- two ImageViews that show the captured half-picture
- two reset buttons that dispose of a previously taken half-picture
The exercises don’t require much knowledge of the widgets, but here’s a quick breakdown of the flow. The preview is always active and sits in the background. The two capture buttons are visible initially. As soon as a half-picture is taken, its capture button is hidden, and the ImageView and reset button appear in its place. When reset is hit, the ImageView and reset button are hidden, and the capture button is made visible.
Consult the following resources as you complete your exercises:
We’ll start with the following code, which implements the flow described above and finagles the transformations to get the images to appear correctly:
class MainActivity : PermittedActivity() {
    private lateinit var captureUseCase: ImageCapture
    private lateinit var previewView: TextureView
    private lateinit var leftImageView: ImageView
    private lateinit var rightImageView: ImageView
    private lateinit var leftResetButton: ImageButton
    private lateinit var rightResetButton: ImageButton
    private lateinit var leftCaptureButton: ImageButton
    private lateinit var rightCaptureButton: ImageButton
    private var leftBitmap: Bitmap? = null
    private var rightBitmap: Bitmap? = null

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)

        leftImageView = findViewById(R.id.leftImageView)
        rightImageView = findViewById(R.id.rightImageView)
        leftResetButton = findViewById(R.id.leftResetButton)
        rightResetButton = findViewById(R.id.rightResetButton)
        leftCaptureButton = findViewById(R.id.leftCaptureButton)
        rightCaptureButton = findViewById(R.id.rightCaptureButton)
        previewView = findViewById(R.id.leftView)

        // Only the capture buttons are visible until a half-picture is taken.
        leftImageView.visibility = View.INVISIBLE
        rightImageView.visibility = View.INVISIBLE
        leftResetButton.visibility = View.INVISIBLE
        rightResetButton.visibility = View.INVISIBLE

        requestPermissions(arrayOf(Manifest.permission.CAMERA, Manifest.permission.WRITE_EXTERNAL_STORAGE), 100, {
            // Defer use case setup until previewView has been attached and measured.
            previewView.post {
                initializeUseCases()
            }
            registerCallbacks()
        }, {
            Log.d("FOO", "Bad...")
        })

        // Hide the status and navigation bars so the preview fills the screen.
        window.decorView.systemUiVisibility =
            View.SYSTEM_UI_FLAG_IMMERSIVE_STICKY or
            View.SYSTEM_UI_FLAG_FULLSCREEN or
            View.SYSTEM_UI_FLAG_HIDE_NAVIGATION or
            View.SYSTEM_UI_FLAG_LAYOUT_STABLE or
            View.SYSTEM_UI_FLAG_LAYOUT_HIDE_NAVIGATION or
            View.SYSTEM_UI_FLAG_LAYOUT_FULLSCREEN
    }

    private fun registerCallbacks() {
        leftCaptureButton.setOnClickListener {
            takeLeftImage()
        }
        rightCaptureButton.setOnClickListener {
            takeRightImage()
        }
        leftResetButton.setOnClickListener {
            leftBitmap = null
            syncLeft()
        }
        rightResetButton.setOnClickListener {
            rightBitmap = null
            syncRight()
        }
    }

    // Scale the image just enough to completely cover the frame while preserving
    // its aspect ratio. (A worked example follows the class.)
    private fun sizeToCover(frame: View, image: Bitmap): SizeF {
        val frameAspect = frame.width / frame.height.toFloat()
        val imageAspect = image.width / image.height.toFloat()
        val scaledWidth: Float
        val scaledHeight: Float
        if (frameAspect >= imageAspect) {
            scaledWidth = frame.width.toFloat()
            scaledHeight = scaledWidth / imageAspect
        } else {
            scaledHeight = frame.height.toFloat()
            scaledWidth = scaledHeight * imageAspect
        }
        return SizeF(scaledWidth, scaledHeight)
    }

    // Mirror the image horizontally and scale it to cover the frame.
    private fun coverAnchoredLeft(frame: View, image: Bitmap): Matrix {
        val scaled = sizeToCover(frame, image)
        val xform = Matrix()
        xform.postScale(-1f, 1f, image.width * 0.5f, 0f)
        xform.postScale(scaled.width / image.width.toFloat(), scaled.height / image.height.toFloat())
        return xform
    }

    private fun coverAnchoredRight(frame: View, image: Bitmap): Matrix {
        val scaled = sizeToCover(frame, image)
        val xform = Matrix()
        xform.postScale(-1f, 1f, image.width * 0.5f, 0f)
        xform.postScale(scaled.width / image.width.toFloat(), scaled.height / image.height.toFloat())
        return xform
    }

    // Counter-rotate and scale the TextureView so the preview isn't stretched or sideways.
    private fun updatePreviewTransform(textureSize: Size) {
        val textureAspect = textureSize.height / textureSize.width.toFloat()
        val scaledWidth: Float
        val scaledHeight: Float
        if (previewView.width > previewView.height) {
            scaledHeight = previewView.width.toFloat()
            scaledWidth = previewView.width * textureAspect
        } else {
            scaledHeight = previewView.height.toFloat()
            scaledWidth = previewView.height * textureAspect
        }
        val centerX = previewView.width * 0.5f
        val centerY = previewView.height * 0.5f
        val xform = Matrix()
        xform.postRotate(-viewToRotation(previewView).toFloat(), centerX, centerY)
        xform.preScale(scaledWidth / previewView.width.toFloat(), scaledHeight / previewView.height.toFloat(), centerX, centerY)
        previewView.setTransform(xform)
    }

    // Show either the left capture button or the left half-picture and its reset button.
    private fun syncLeft() {
        if (leftBitmap == null) {
            leftResetButton.visibility = View.INVISIBLE
            leftImageView.visibility = View.INVISIBLE
            leftCaptureButton.visibility = View.VISIBLE
        } else {
            leftCaptureButton.visibility = View.INVISIBLE
            leftImageView.visibility = View.VISIBLE
            leftResetButton.visibility = View.VISIBLE
            leftBitmap?.let {
                leftImageView.setImageBitmap(it)
                leftImageView.imageMatrix = coverAnchoredRight(leftImageView, it)
            }
        }
    }

    private fun syncRight() {
        if (rightBitmap == null) {
            rightResetButton.visibility = View.INVISIBLE
            rightImageView.visibility = View.INVISIBLE
            rightCaptureButton.visibility = View.VISIBLE
        } else {
            rightCaptureButton.visibility = View.INVISIBLE
            rightImageView.visibility = View.VISIBLE
            rightResetButton.visibility = View.VISIBLE
            rightBitmap?.let {
                rightImageView.setImageBitmap(it)
                rightImageView.imageMatrix = coverAnchoredLeft(rightImageView, it)
            }
        }
    }
}
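To make sizeToCover concrete, suppose (these numbers are just for illustration) the frame is a 1080 × 2280 portrait view, so its aspect ratio is about 0.47, and the image is a 3264 × 2448 landscape capture with an aspect ratio of about 1.33. Since the frame's aspect ratio is the smaller of the two, the else branch runs: the height is pinned to 2280 and the width scales to 2280 × 1.33 ≈ 3040, so the image spills off the sides of the frame rather than leaving empty bars above and below.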
The exercises that you will collectively complete are listed below. Check your email for your assigned exercise and a link to submit your solution. Become an expert on your particular corner of the app, investigating background material as needed. Build on top of the activity and others' code as you complete your exercise. No solution should be very long, but some have more overhead, like curly braces and method signatures, than others.
- Write method createPreviewUseCase that accepts parameters for the screen resolution as a Size and the screen aspect ratio as a Rational. Return a Preview for the front-facing camera, a target resolution that matches the screen resolution, a target aspect ratio that matches the screen aspect ratio, and a rotation that matches the orientation of previewView. Add an OnPreviewOutputUpdateListener to the use case that does the following:
  - Link the SurfaceTexture of previewView to the preview output.
  - Trigger updatePreviewTransform to set up the transformation of the preview. Pass the output's texture size.
- Write method initializeUseCases that uses CameraX to bind preview and capture use cases. Compute the screen size and aspect ratio of previewView using the dimensions yielded by previewView.display.getRealMetrics.
- Write method createCaptureUseCase that accepts parameters for the screen resolution as a Size and the screen aspect ratio as a Rational. Return an ImageCapture for the front-facing camera, a target aspect ratio that matches the screen aspect ratio, and a rotation that matches the orientation of previewView.
- Write method viewToRotation that accepts a View parameter. Examine the view's display rotation. Return 0, 90, 180, or 270 accordingly. (A rough sketch of this and the other non-camera helpers appears after this list.)
- Write method imageProxyToBitmap that accepts an ImageProxy parameter. Return the wrapped image as a Bitmap. (You'll find more reference code dealing with Image than ImageProxy. Luckily, they have the same interface.)
- Write method transformBitmap that accepts parameters for a Bitmap and a number of degrees as an Int. Return a new version of the bitmap that has been rotated about its center by the specified number of degrees. Use a version of Bitmap.createBitmap that accepts a Matrix.
- Write method meldBitmaps that accepts parameters for a left Bitmap and a right Bitmap. Return a new bitmap in which the two images appear side-by-side. Use the ARGB_8888 pixel format.
- Write method saveBitmap that accepts a Bitmap parameter. It stores the Bitmap as a JPEG in the top-level external storage directory under the name twoface.jpg.
- Write method meldBitmapsAndSaveMaybe that melds the two bitmaps and saves the combined bitmap, but only if both leftBitmap and rightBitmap are valid.
- Write method takeLeftImage that triggers the captureUseCase to take a picture in-memory. On a successful capture, convert the captured image to a bitmap, close the image, transform it according to the rotation, and crop it to contain only the left half of the image. Update the leftBitmap field, try melding the bitmaps and saving the result, then sync the left half of the UI.
- Write method takeRightImage that triggers the captureUseCase to take a picture in-memory. On a successful capture, convert the captured image to a bitmap, close the image, transform it according to the rotation, and crop it to contain only the right half of the image. Update the rightBitmap field, try melding the bitmaps and saving the result, then sync the right half of the UI.
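These exercises are yours, so I'll leave the camera-facing work (the use cases, imageProxyToBitmap, takeLeftImage, and takeRightImage) untouched. But to give you a sense of the scale I have in mind, here is a rough sketch of how the non-camera helpers might look. The method names come from the list above; everything else is one plausible shape among several, not the official solution.

// Assumes imports of android.graphics.Bitmap, android.graphics.Canvas, android.graphics.Matrix,
// android.os.Environment, android.view.Surface, android.view.View, java.io.File, and java.io.FileOutputStream.

// Map the view's display rotation to whole degrees.
private fun viewToRotation(view: View) = when (view.display.rotation) {
    Surface.ROTATION_90 -> 90
    Surface.ROTATION_180 -> 180
    Surface.ROTATION_270 -> 270
    else -> 0
}

// Rotate a bitmap about its center by the given number of degrees.
private fun transformBitmap(bitmap: Bitmap, degrees: Int): Bitmap {
    val matrix = Matrix().apply { postRotate(degrees.toFloat()) }
    return Bitmap.createBitmap(bitmap, 0, 0, bitmap.width, bitmap.height, matrix, true)
}

// Draw the two halves side by side on a new ARGB_8888 bitmap.
private fun meldBitmaps(left: Bitmap, right: Bitmap): Bitmap {
    val melded = Bitmap.createBitmap(left.width + right.width, maxOf(left.height, right.height), Bitmap.Config.ARGB_8888)
    val canvas = Canvas(melded)
    canvas.drawBitmap(left, 0f, 0f, null)
    canvas.drawBitmap(right, left.width.toFloat(), 0f, null)
    return melded
}

// Store the bitmap as twoface.jpg in the top-level external storage directory.
private fun saveBitmap(bitmap: Bitmap) {
    val file = File(Environment.getExternalStorageDirectory(), "twoface.jpg")
    FileOutputStream(file).use { out ->
        bitmap.compress(Bitmap.CompressFormat.JPEG, 100, out)
    }
}

// Meld and save only when both halves have been captured.
private fun meldBitmapsAndSaveMaybe() {
    val left = leftBitmap
    val right = rightBitmap
    if (left != null && right != null) {
        saveBitmap(meldBitmaps(left, right))
    }
}

Writing to external storage is also why the starter activity requests WRITE_EXTERNAL_STORAGE alongside CAMERA in onCreate.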
See you next time!
P.S. It’s time for a haiku! I had a hard time finding one that satisfied me, so you get three that didn’t.
It understands me
When I speak, and when I don’t
It undersits me
How to forget things
“Order Mom some flowers, please”
“Make this recurring?”
Share files with SpeechDrop
One phone speaks, the other hears
Now supports images!