SENG 440: Lecture 22 – Speech Recognition
Dear students,
Today we will create an app called Recog that presents anagrams for the user to unscramble. But instead of typing, the user will speak the answer. We’ll use Android’s speech recognition facilities to make this happen.
Next lecture we will explore the new CameraX API that was just announced at Google I/O 2019. See the TODO below for the assigned exercises.
Consider this extra credit opportunity:
On Friday, 24 May at 1 PM, Chris Johnson will be presenting a seminar entitled Computational Making in JE445. Attend and write down a response on a quarter sheet for an extra 0.5 Pakipaki.
Recog
Following are the exercises assigned last time. We will use and discuss the solutions that you’ve submitted as we assemble our app, but I include my solutions here for reference.
- Write extension function anagram for String. It generates a new String that is a random shuffling of the letters of the original String. Ensure that the shuffling isn’t identical to the original String. Possible solution.
  fun String.anagram(): String {
    var candidate: String
    // Reshuffle until the result differs. This never terminates for input whose letters are all identical.
    do {
      candidate = this.toCharArray().toList().shuffled().joinToString("")
    } while (candidate == this)
    return candidate
  }
- Write extension function hasSameLetters for String that accepts another String as a parameter. It returns true if the two Strings have the same letters. For example, "these".hasSameLetters("sheet") returns true. We need this method because some anagrams may have multiple correct answers, and we want to accept any of them, not just the one we have in mind. For example, reab could unscramble to either bear or bare. Both have the same pronunciation. So it is with break and brake. Can you think of others? Possible solution.
  // sorted() returns a new List<Char>, and lists compare by content.
  // (sort() sorts in place and returns Unit, so comparing its results is always true.)
  fun String.hasSameLetters(other: String) = this.toCharArray().sorted() == other.toCharArray().sorted()
- Define class RandomWordFetcher to extend AsyncTask and accept a MainActivity, a host string, and a key string as constructor parameters. In the background, fetch the JSON from the Words API endpoint at https://wordsapiv1.p.mashape.com/words/, with the query parameters random=true, frequencyMin=4, and letters=5. Also send the headers X-RapidAPI-Host and X-RapidAPI-Key with the specified strings. The resulting JSON will be formatted according to Words API. In the main thread, assign the randomly generated word to the word property of MainActivity. Possible solution.
  class RandomWordFetcher(
    context: MainActivity,
    private val host: String,
    private val key: String
  ) : AsyncTask<Unit, Unit, String>() {
    private val context = WeakReference(context)

    override fun doInBackground(vararg p0: Unit): String {
      val endpoint = "https://wordsapiv1.p.mashape.com/words/"
      val parameters = mapOf("random" to "true", "frequencyMin" to "4", "letters" to "5")
      val url = parameterizeUrl(endpoint, parameters)
      val headers = mapOf("X-RapidAPI-Host" to host, "X-RapidAPI-Key" to key)
      val json = getJson(url, headers)
      val word = json.getString("word")
      return word
    }

    override fun onPostExecute(word: String) {
      super.onPostExecute(word)
      context.get()?.let {
        it.word = word
      }
    }
  }
- Define method generateRandomWord in MainActivity to start up a new RandomWordFetcher task. Pass string resources with IDs words_api_host and words_api_key. Possible solution.
  private fun generateRandomWord() {
    RandomWordFetcher(
      this,
      resources.getString(R.string.words_api_host),
      resources.getString(R.string.words_api_key)
    ).execute()
  }
- Define property word in MainActivity to show the word in wordLabel and listen to the user. Possible solution.
  var word: String = ""
    set(value) {
      field = value
      wordLabel.text = value.anagram()
      listen()
    }
- Define method listen to start a new speech recognition activity. Use request code 440. Possible solution.
  private fun listen() {
    // For dialog approach.
    val intent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH)
    intent.putExtra(RecognizerIntent.EXTRA_PROMPT, wordLabel.text)
    // EXTRA_LANGUAGE_MODEL takes a model constant; the language tag belongs in EXTRA_LANGUAGE.
    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "en-US")
    startActivityForResult(intent, 440)

    // For non-dialog approach.
    // recognizer.startListening(recognizeIntent)
  }
- Define method onActivityResult to respond to the results of listening. Grab the candidate utterances and check to see if any of the candidates is a correct unscrambling. Possible solution.
  override fun onActivityResult(requestCode: Int, resultCode: Int, data: Intent?) {
    when (requestCode) {
      440 -> {
        if (resultCode == RESULT_OK && data != null) {
          val candidates = data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS)
          checkCandidates(candidates)
        }
      }
      else -> super.onActivityResult(requestCode, resultCode, data)
    }
  }
- Define method checkCandidates to accept an ArrayList of String. If any element of the list has the same letters as the current word, generate a new word. Otherwise, pop up a toast advising the player to try again and listen to the user for another attempt. Possible solution.
  private fun checkCandidates(candidates: ArrayList<String>) {
    if (candidates.any { word.hasSameLetters(it) }) {
      generateRandomWord()
    } else {
      Toast.makeText(this, "Nope. Try again.", Toast.LENGTH_SHORT).show()
      listen()
    }
  }
- Define field recognitionListener to be an instance of RecognitionListener. If an error occurs, pop up a toast warning. If no error occurs, grab the candidate utterances and check to see if any of the candidates is a correct unscrambling. Possible solution.
  private val recognitionListener = object : RecognitionListener {
    override fun onReadyForSpeech(p0: Bundle?) {}
    override fun onRmsChanged(p0: Float) {}
    override fun onBufferReceived(p0: ByteArray?) {}
    override fun onPartialResults(p0: Bundle?) {}
    override fun onEvent(p0: Int, p1: Bundle?) {}
    override fun onBeginningOfSpeech() {}
    override fun onEndOfSpeech() {}

    override fun onError(p0: Int) {
      Toast.makeText(this@MainActivity, "Error. Starting over.", Toast.LENGTH_SHORT).show()
    }

    override fun onResults(results: Bundle) {
      results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)?.let {
        checkCandidates(it)
      }
    }
  }
- Define method initializeSansDialog to create a new SpeechRecognizer that calls back to recognitionListener. Once it’s constructed, generate a new random word. Possible solution.
  private fun initializeSansDialog() {
    recognizeIntent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH).apply {
      putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
      putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE, packageName)
      putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true)
    }
    recognizer = SpeechRecognizer.createSpeechRecognizer(this)
    recognizer.setRecognitionListener(recognitionListener)
    generateRandomWord()
  }
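If you want to poke at the answer-checking logic without a device or a microphone, here is a minimal sketch in plain Kotlin. The helper names normalize, sameLetters, and anyCandidateMatches are hypothetical and not part of the app. The normalization step is my assumption: a recognizer may hand back capitalized text or stray spaces, so the sketch lowercases each candidate and keeps only its letters before comparing.

```kotlin
// Hypothetical helpers for experimenting outside of Android.

/** Lowercases the text and keeps letters only, so "Bare " and "bare" compare equal. */
fun normalize(text: String): String =
  text.lowercase().filter { it.isLetter() }

/** True if both strings use exactly the same letters, counting repeats. */
fun sameLetters(a: String, b: String): Boolean =
  normalize(a).toList().sorted() == normalize(b).toList().sorted()

/** True if any recognizer candidate is a rearrangement of the word's letters. */
fun anyCandidateMatches(word: String, candidates: List<String>): Boolean =
  candidates.any { sameLetters(word, it) }

fun main() {
  val heard = listOf("Bare", "bar", "beer")
  println(anyCandidateMatches("bear", heard))            // true: "Bare" matches
  println(anyCandidateMatches("bear", listOf("beard")))  // false: extra letter
}
```

Note that this accepts any rearrangement of the letters, not just dictionary words. That is probably fine here, since the candidates come from a speech recognizer.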
The full source can be found on GitHub. The master branch contains the final version, and the todo branch has the exercises incomplete.
TODO
Next lecture we will create an app called Two-Face that allows the user to take a split image on the front-facing camera. The left half and right half are taken at separate times. You can achieve some strange effects with such a camera, like a selfie in which you have two tongues, a before and after shot, or a blend of you and your sister.
The UI has 7 widgets:
- a TextureView named previewView that shows the camera preview
- two capture buttons to take a picture and retain just half of it
- two ImageViews that show the captured half-picture
- two reset buttons that dispose of a previously taken half-picture
The exercises don’t require much knowledge of the widgets, but here’s a quick breakdown of the flow. The preview is always active and sits in the background. The two capture buttons are visible initially. As soon as a half-picture is taken, its capture button is hidden, and the ImageView and reset button appear in its place. When reset is hit, the ImageView and reset button are hidden, and the capture button is made visible.
Consult the following resources as you complete your exercises:
We’ll start with the following code, which implements the flow described above and finagles the transformations to get the images to appear correctly:
class MainActivity : PermittedActivity() {
  private lateinit var captureUseCase: ImageCapture
  private lateinit var previewView: TextureView
  private lateinit var leftImageView: ImageView
  private lateinit var rightImageView: ImageView
  private lateinit var leftResetButton: ImageButton
  private lateinit var rightResetButton: ImageButton
  private lateinit var leftCaptureButton: ImageButton
  private lateinit var rightCaptureButton: ImageButton
  private var leftBitmap: Bitmap? = null
  private var rightBitmap: Bitmap? = null

  override fun onCreate(savedInstanceState: Bundle?) {
    super.onCreate(savedInstanceState)
    setContentView(R.layout.activity_main)

    leftImageView = findViewById(R.id.leftImageView)
    rightImageView = findViewById(R.id.rightImageView)
    leftResetButton = findViewById(R.id.leftResetButton)
    rightResetButton = findViewById(R.id.rightResetButton)
    leftCaptureButton = findViewById(R.id.leftCaptureButton)
    rightCaptureButton = findViewById(R.id.rightCaptureButton)
    previewView = findViewById(R.id.leftView)

    leftImageView.visibility = View.INVISIBLE
    rightImageView.visibility = View.INVISIBLE
    leftResetButton.visibility = View.INVISIBLE
    rightResetButton.visibility = View.INVISIBLE

    requestPermissions(arrayOf(Manifest.permission.CAMERA, Manifest.permission.WRITE_EXTERNAL_STORAGE), 100, {
      previewView.post {
        initializeUseCases()
      }
      registerCallbacks()
    }, {
      Log.d("FOO", "Bad...")
    })

    window.decorView.systemUiVisibility =
      View.SYSTEM_UI_FLAG_IMMERSIVE_STICKY or
      View.SYSTEM_UI_FLAG_FULLSCREEN or
      View.SYSTEM_UI_FLAG_HIDE_NAVIGATION or
      View.SYSTEM_UI_FLAG_LAYOUT_STABLE or
      View.SYSTEM_UI_FLAG_LAYOUT_HIDE_NAVIGATION or
      View.SYSTEM_UI_FLAG_LAYOUT_FULLSCREEN
  }

  private fun registerCallbacks() {
    leftCaptureButton.setOnClickListener {
      takeLeftImage()
    }
    rightCaptureButton.setOnClickListener {
      takeRightImage()
    }
    leftResetButton.setOnClickListener {
      leftBitmap = null
      syncLeft()
    }
    rightResetButton.setOnClickListener {
      rightBitmap = null
      syncRight()
    }
  }

  private fun sizeToCover(frame: View, image: Bitmap): SizeF {
    val frameAspect = frame.width / frame.height.toFloat()
    val imageAspect = image.width / image.height.toFloat()
    val scaledWidth: Float
    val scaledHeight: Float
    if (frameAspect >= imageAspect) {
      scaledWidth = frame.width.toFloat()
      scaledHeight = scaledWidth / imageAspect
    } else {
      scaledHeight = frame.height.toFloat()
      scaledWidth = scaledHeight * imageAspect
    }
    return SizeF(scaledWidth, scaledHeight)
  }

  private fun coverAnchoredLeft(frame: View, image: Bitmap): Matrix {
    val scaled = sizeToCover(frame, image)
    val xform = Matrix()
    xform.postScale(-1f, 1f, image.width * 0.5f, 0f)
    xform.postScale(scaled.width / image.width.toFloat(), scaled.height / image.height.toFloat())
    return xform
  }

  private fun coverAnchoredRight(frame: View, image: Bitmap): Matrix {
    val scaled = sizeToCover(frame, image)
    val xform = Matrix()
    xform.postScale(-1f, 1f, image.width * 0.5f, 0f)
    xform.postScale(scaled.width / image.width.toFloat(), scaled.height / image.height.toFloat())
    return xform
  }

  private fun updatePreviewTransform(textureSize: Size) {
    val textureAspect = textureSize.height / textureSize.width.toFloat()
    val scaledWidth: Float
    val scaledHeight: Float
    if (previewView.width > previewView.height) {
      scaledHeight = previewView.width.toFloat()
      scaledWidth = previewView.width * textureAspect
    } else {
      scaledHeight = previewView.height.toFloat()
      scaledWidth = previewView.height * textureAspect
    }
    val centerX = previewView.width * 0.5f
    val centerY = previewView.height * 0.5f
    val xform = Matrix()
    xform.postRotate(-viewToRotation(previewView).toFloat(), centerX, centerY)
    xform.preScale(scaledWidth / previewView.width.toFloat(), scaledHeight / previewView.height.toFloat(), centerX, centerY)
    previewView.setTransform(xform)
  }

  private fun syncLeft() {
    if (leftBitmap == null) {
      leftResetButton.visibility = View.INVISIBLE
      leftImageView.visibility = View.INVISIBLE
      leftCaptureButton.visibility = View.VISIBLE
    } else {
      leftCaptureButton.visibility = View.INVISIBLE
      leftImageView.visibility = View.VISIBLE
      leftResetButton.visibility = View.VISIBLE
      leftBitmap?.let {
        leftImageView.setImageBitmap(it)
        leftImageView.imageMatrix = coverAnchoredRight(leftImageView, it)
      }
    }
  }

  private fun syncRight() {
    if (rightBitmap == null) {
      rightResetButton.visibility = View.INVISIBLE
      rightImageView.visibility = View.INVISIBLE
      rightCaptureButton.visibility = View.VISIBLE
    } else {
      rightCaptureButton.visibility = View.INVISIBLE
      rightImageView.visibility = View.VISIBLE
      rightResetButton.visibility = View.VISIBLE
      rightBitmap?.let {
        rightImageView.setImageBitmap(it)
        rightImageView.imageMatrix = coverAnchoredLeft(rightImageView, it)
      }
    }
  }
}
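The arithmetic in sizeToCover is easier to trust once you’ve run a few numbers through it. Below is a minimal sketch of the same cover calculation in plain Kotlin, with the Android View and Bitmap types swapped out for bare integer dimensions. The names ScaledSize and coverSize are hypothetical stand-ins for this sketch only.

```kotlin
/** Scaled image size, in pixels. A stand-in for android.util.SizeF in this sketch. */
data class ScaledSize(val width: Float, val height: Float)

/**
 * Scales an image, keeping its aspect ratio, so that it fully covers the frame.
 * One dimension matches the frame exactly; the other matches or overflows it.
 */
fun coverSize(frameWidth: Int, frameHeight: Int, imageWidth: Int, imageHeight: Int): ScaledSize {
  val frameAspect = frameWidth / frameHeight.toFloat()
  val imageAspect = imageWidth / imageHeight.toFloat()
  return if (frameAspect >= imageAspect) {
    // The frame is relatively wider: match widths and let the height overflow.
    val width = frameWidth.toFloat()
    ScaledSize(width, width / imageAspect)
  } else {
    // The frame is relatively taller: match heights and let the width overflow.
    val height = frameHeight.toFloat()
    ScaledSize(height * imageAspect, height)
  }
}

fun main() {
  // A square 400x400 image in a wide 1000x500 frame scales to 1000x1000.
  println(coverSize(1000, 500, 400, 400))
  // The same image in a tall 500x1000 frame also scales to 1000x1000.
  println(coverSize(500, 1000, 400, 400))
}
```

In both cases the scaled image is at least as large as the frame in each dimension, which is what lets the ImageView crop instead of letterbox.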
The exercises that you will collectively complete are listed below. Check your email for your assigned exercise and a link to submit your solution. Become an expert on your particular corner of the app, investigating background material as needed. Build on top of the activity and others’ code as you complete your exercise. No solution should be very long, but some have more overhead like curly braces and method signatures than others.
- Write method createPreviewUseCase that accepts parameters for the screen resolution as a Size and the screen aspect ratio as a Rational. Return a Preview configured for the front-facing camera, a target resolution that matches the screen resolution, a target aspect ratio that matches the screen aspect ratio, and a rotation that matches the orientation of previewView. Add an OnPreviewOutputUpdateListener to the use case that does three things:
  - Link the SurfaceTexture of previewView to the preview output.
  - Trigger updatePreviewTransform to set up the transformation of the preview. Pass the output’s texture size.
- Write method initializeUseCases that uses CameraX to bind preview and capture use cases. Compute the screen size and aspect ratio of previewView using the dimensions yielded by preview.display.getRealMetrics.
- Write method createCaptureUseCase that accepts parameters for the screen resolution as a Size and the screen aspect ratio as a Rational. Return an ImageCapture configured for the front-facing camera, a target aspect ratio that matches the screen aspect ratio, and a rotation that matches the orientation of previewView.
- Write method viewToRotation that accepts a View parameter. Examine the view’s display rotation. Return 0, 90, 180, or 270 accordingly.
- Write method imageProxyToBitmap that accepts an ImageProxy parameter. Return the wrapped image as a Bitmap. (You’ll find more reference code dealing with Image than ImageProxy. Luckily, they have the same interface.)
- Write method transformBitmap that accepts parameters for a Bitmap and a number of degrees as an Int. Return a new version of the bitmap that has been rotated about its center by the specified number of degrees. Use a version of Bitmap.createBitmap that accepts a Matrix.
- Write method meldBitmaps that accepts parameters for a left Bitmap and a right Bitmap. Return a new bitmap in which the two images appear side-by-side. Use the ARGB_8888 pixel format.
- Write method saveBitmap that accepts a Bitmap parameter. It stores the Bitmap as a JPEG in the top-level external storage directory under the name twoface.jpg.
- Write method meldBitmapsAndSaveMaybe that melds the two bitmaps and saves the combined bitmap, but only if both leftBitmap and rightBitmap are valid.
- Write method takeLeftImage that triggers the captureUseCase to take a picture in-memory. On a successful capture, convert the captured image to a bitmap, close the image, transform it according to the rotation, and crop it to contain only the left half of the image. Update the leftBitmap field, try melding the bitmaps and saving the result, then sync the left half of the UI.
- Write method takeRightImage that triggers the captureUseCase to take a picture in-memory. On a successful capture, convert the captured image to a bitmap, close the image, transform it according to the rotation, and crop it to contain only the right half of the image. Update the rightBitmap field, try melding the bitmaps and saving the result, then sync the right half of the UI.
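If the cropping and melding exercises feel abstract, it may help to see the index arithmetic on its own first. The following is a minimal sketch in plain Kotlin, not a solution to the exercises: PixelGrid is a hypothetical stand-in for Bitmap that stores row-major pixel values, and crop, leftHalf, rightHalf, and meld are names I made up for illustration. On Android you would more likely reach for Bitmap.createBitmap and Canvas.drawBitmap instead of copying pixels by hand.

```kotlin
/** A hypothetical stand-in for a bitmap: row-major pixel values. */
class PixelGrid(val width: Int, val height: Int, val pixels: IntArray) {
  init {
    require(pixels.size == width * height) { "pixel count must equal width * height" }
  }

  operator fun get(x: Int, y: Int): Int = pixels[y * width + x]
}

/** Copies columns startX until startX + newWidth from every row. */
fun crop(source: PixelGrid, startX: Int, newWidth: Int): PixelGrid {
  val out = IntArray(newWidth * source.height)
  for (y in 0 until source.height) {
    for (x in 0 until newWidth) {
      out[y * newWidth + x] = source[startX + x, y]
    }
  }
  return PixelGrid(newWidth, source.height, out)
}

fun leftHalf(source: PixelGrid): PixelGrid = crop(source, 0, source.width / 2)

fun rightHalf(source: PixelGrid): PixelGrid {
  val half = source.width / 2
  return crop(source, source.width - half, half)
}

/** Places two grids of equal height side by side. */
fun meld(left: PixelGrid, right: PixelGrid): PixelGrid {
  require(left.height == right.height) { "halves must have the same height" }
  val width = left.width + right.width
  val out = IntArray(width * left.height)
  for (y in 0 until left.height) {
    for (x in 0 until width) {
      out[y * width + x] = if (x < left.width) left[x, y] else right[x - left.width, y]
    }
  }
  return PixelGrid(width, left.height, out)
}

fun main() {
  // Two 4x1 "photos": one all 1s, one all 2s.
  val first = PixelGrid(4, 1, intArrayOf(1, 1, 1, 1))
  val second = PixelGrid(4, 1, intArrayOf(2, 2, 2, 2))
  val combined = meld(leftHalf(first), rightHalf(second))
  println(combined.pixels.toList()) // [1, 1, 2, 2]
}
```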
See you next time!
P.S. It’s time for a haiku! I had a hard time finding one that satisfied me, so you get three that didn’t.
It understands me
When I speak, and when I don’t
It undersits me

How to forget things
“Order Mom some flowers, please”
“Make this recurring?”

Share files with SpeechDrop
One phone speaks, the other hears
Now supports images!