
SENG 440: Lecture 22 – Speech Recognition

May 23, 2019. Filed under lectures, semester1-2019, seng440.

Dear students,

Today we will create an app called Recog that presents anagrams for the user to unscramble. But instead of typing, the user will speak the answer. We’ll use Android’s speech recognition facilities to make this happen.

Next lecture we will explore the new CameraX API that was just announced at Google I/O 2019. See the TODO below for the assigned exercises.

Consider this extra credit opportunity:

On Friday, 24 May at 1 PM, Chris Johnson will present a seminar entitled Computational Making in JE445. Attend and write down a response on a quarter sheet for an extra 0.5 Pakipaki.

Recog

Following are the exercises assigned last time. We will use and discuss the solutions that you’ve submitted as we assemble our app, but I include my solutions here for reference.

  1. Write extension function anagram for String. It generates a new String that is a random shuffling of the letters of the original String. Ensure that the shuffling isn’t identical to the original String. Possible solution.
    fun String.anagram(): String {
      var candidate: String
      do {
        candidate = this.toCharArray().toList().shuffled().joinToString("")
      } while (candidate == this)
      return candidate
    }
    
  2. Write extension function hasSameLetters for String that accepts another String as a parameter. It returns true if the two Strings have the same letters. For example, "these".hasSameLetters("sheet") returns true. We need this method because some anagrams may have multiple correct answers, and we want to accept any of them, not just the one we have in mind. For example, reab could unscramble to either bear or bare. Both have the same pronunciation. So it is with break and brake. Can you think of others? Possible solution.
    fun String.hasSameLetters(other: String) =
      this.toCharArray().sorted() == other.toCharArray().sorted()
    
  3. Define class RandomWordFetcher to extend AsyncTask and accept a MainActivity, a host string, and a key string as constructor parameters. In the background, fetch the JSON from the Words API endpoint at https://wordsapiv1.p.mashape.com/words/, with the query parameters random=true, frequencyMin=4, and letters=5. Also send the headers X-RapidAPI-Host and X-RapidAPI-Key with the specified strings. The resulting JSON will be formatted according to the Words API documentation. In the main thread, assign the randomly generated word to the word property of MainActivity. The solution relies on two small helpers, parameterizeUrl and getJson; a sketch of possible implementations appears just after this exercise list. Possible solution.
    class RandomWordFetcher(
      context: MainActivity,
      private val host: String,
      private val key: String
    ) : AsyncTask<Unit, Unit, String>() {
      private val context = WeakReference(context)
    
      override fun doInBackground(vararg p0: Unit): String {
        val endpoint = "https://wordsapiv1.p.mashape.com/words/"
        val parameters = mapOf("random" to "true", "frequencyMin" to "4", "letters" to "5")
        val url = parameterizeUrl(endpoint, parameters)
        val headers = mapOf("X-RapidAPI-Host" to host, "X-RapidAPI-Key" to key)
        val json = getJson(url, headers)
        val word = json.getString("word")
        return word
      }
    
      override fun onPostExecute(word: String) {
        super.onPostExecute(word)
        context.get()?.let {
          it.word = word
        }
      }
    }
    
  4. Define method generateRandomWord in MainActivity to start up a new RandomWordFetcher task. Pass string resources with IDs words_api_host and words_api_key. Possible solution.
    private fun generateRandomWord() {
      RandomWordFetcher(
        this,
        resources.getString(R.string.words_api_host),
        resources.getString(R.string.words_api_key)
      ).execute()
    }
    
  5. Define property word in MainActivity whose setter shows the scrambled word in wordLabel and starts listening to the user. Possible solution.
    var word: String = ""
      set(value) {
        field = value
        wordLabel.text = value.anagram()
        listen()
      }
    
  6. Define method listen to start a new speech recognition activity. Use request code 440. Possible solution.
    private fun listen() {
      // For dialog approach.
      val intent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH)
      intent.putExtra(RecognizerIntent.EXTRA_PROMPT, wordLabel.text)
      intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
      intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "en-US")
      startActivityForResult(intent, 440)
    
      // For non-dialog approach.
      // recognizer.startListening(recognizeIntent)
    }
    
  7. Define method onActivityResult to respond to the results of listening. Grab the candidate utterances and check to see if any of the candidates is a correct unscrambling. Possible solution.
    override fun onActivityResult(requestCode: Int, resultCode: Int, data: Intent?) {
      when (requestCode) {
        440 -> {
          if (resultCode == RESULT_OK && data != null) {
            val candidates = data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS)
            checkCandidates(candidates)
          }
        }
        else -> super.onActivityResult(requestCode, resultCode, data)
      }
    }
    
  8. Define method checkCandidates to accept an ArrayList of String. If any element of the list has the same letters as the current word, generate a new word. Otherwise, pop up a toast advising the player to try again and listen to the user for another attempt. Possible solution.
    private fun checkCandidates(candidates: ArrayList<String>) {
      if (candidates.any { word.hasSameLetters(it) }) {
        generateRandomWord()
      } else {
        Toast.makeText(this, "Nope. Try again.", Toast.LENGTH_SHORT).show()
        listen()
      }
    }
    
  9. Define field recognitionListener to be an instance of RecognitionListener. If an error occurs, pop up a toast warning. If no error occurs, grab the candidate utterances and check to see if any of the candidates is a correct unscrambling. Possible solution.
    private val recognitionListener = object : RecognitionListener {
      override fun onReadyForSpeech(p0: Bundle?) {}
      override fun onRmsChanged(p0: Float) {}
      override fun onBufferReceived(p0: ByteArray?) {}
      override fun onPartialResults(p0: Bundle?) {}
      override fun onEvent(p0: Int, p1: Bundle?) {}
      override fun onBeginningOfSpeech() {}
      override fun onEndOfSpeech() {}
    
      override fun onError(p0: Int) {
        Toast.makeText(this@MainActivity, "Error. Starting over.", Toast.LENGTH_SHORT).show()
      }
    
      override fun onResults(results: Bundle) {
        results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)?.let {
          checkCandidates(it)
        }
      }
    }
    
  10. Define method initializeSansDialog to create a new SpeechRecognizer that calls back to recognitionListener. Once it’s constructed, generate a new random word. Possible solution.
    private fun initializeSansDialog() {
      recognizeIntent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH).apply {
        putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
        putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE, packageName)
        putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true)
      }
    
      recognizer = SpeechRecognizer.createSpeechRecognizer(this)
      recognizer.setRecognitionListener(recognitionListener)
    
      generateRandomWord()
    }
    

The full source can be found on GitHub. The master branch contains the final version, and the todo branch leaves the exercises incomplete.
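
Exercise 3’s solution calls two helper functions, parameterizeUrl and getJson, that are not defined in the list above. Here is a minimal sketch of how they might be written, assuming HttpURLConnection and the org.json classes that ship with Android; the versions in the repository may differ.

import org.json.JSONObject
import java.net.HttpURLConnection
import java.net.URL
import java.net.URLEncoder

// Append the query parameters to the endpoint, URL-encoding each key and value.
fun parameterizeUrl(endpoint: String, parameters: Map<String, String>): String {
  val query = parameters.entries.joinToString("&") { (key, value) ->
    "${URLEncoder.encode(key, "UTF-8")}=${URLEncoder.encode(value, "UTF-8")}"
  }
  return "$endpoint?$query"
}

// Issue a GET request with the given headers and parse the response body as JSON.
fun getJson(url: String, headers: Map<String, String>): JSONObject {
  val connection = URL(url).openConnection() as HttpURLConnection
  headers.forEach { (name, value) -> connection.setRequestProperty(name, value) }
  try {
    val body = connection.inputStream.bufferedReader().use { it.readText() }
    return JSONObject(body)
  } finally {
    connection.disconnect()
  }
}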

TODO

Next lecture we will create an app called Two-Face that lets the user take a split image with the front-facing camera. The left half and right half are taken at separate times. You can achieve some strange effects with such a camera, like a selfie in which you have two tongues, a before-and-after shot, or a blend of you and your sister.

The UI has 7 widgets: a TextureView showing the live camera preview, left and right ImageViews for the captured halves, left and right capture ImageButtons, and left and right reset ImageButtons.

The exercises don’t require much knowledge of the widgets, but here’s a quick breakdown of the flow. The preview is always active and sits in the background. The two capture buttons are visible initially. As soon as a half-picture is taken, its capture button is hidden, and the ImageView and reset button appear in its place. When reset is hit, the ImageView and reset button are hidden, and the capture button is made visible.

Consult the following resources as you complete your exercises:

We’ll start with the following code, which implements the flow described above and finagles the transformations to get the images to appear correctly:

class MainActivity : PermittedActivity() {
  private lateinit var captureUseCase: ImageCapture

  private lateinit var previewView: TextureView
  private lateinit var leftImageView: ImageView
  private lateinit var rightImageView: ImageView

  private lateinit var leftResetButton: ImageButton
  private lateinit var rightResetButton: ImageButton
  private lateinit var leftCaptureButton: ImageButton
  private lateinit var rightCaptureButton: ImageButton

  private var leftBitmap: Bitmap? = null
  private var rightBitmap: Bitmap? = null

  override fun onCreate(savedInstanceState: Bundle?) {
    super.onCreate(savedInstanceState)
    setContentView(R.layout.activity_main)

    leftImageView = findViewById(R.id.leftImageView)
    rightImageView = findViewById(R.id.rightImageView)

    leftResetButton = findViewById(R.id.leftResetButton)
    rightResetButton = findViewById(R.id.rightResetButton)
    leftCaptureButton = findViewById(R.id.leftCaptureButton)
    rightCaptureButton = findViewById(R.id.rightCaptureButton)

    previewView = findViewById(R.id.leftView)

    leftImageView.visibility = View.INVISIBLE
    rightImageView.visibility = View.INVISIBLE
    leftResetButton.visibility = View.INVISIBLE
    rightResetButton.visibility = View.INVISIBLE

    requestPermissions(arrayOf(Manifest.permission.CAMERA, Manifest.permission.WRITE_EXTERNAL_STORAGE), 100, {
      previewView.post {
        initializeUseCases()
      }
      registerCallbacks()
    }, {
      Log.d("FOO", "Bad...")
    })

    window.decorView.systemUiVisibility =
      View.SYSTEM_UI_FLAG_IMMERSIVE_STICKY or
      View.SYSTEM_UI_FLAG_FULLSCREEN or
      View.SYSTEM_UI_FLAG_HIDE_NAVIGATION or
      View.SYSTEM_UI_FLAG_LAYOUT_STABLE or
      View.SYSTEM_UI_FLAG_LAYOUT_HIDE_NAVIGATION or
      View.SYSTEM_UI_FLAG_LAYOUT_FULLSCREEN
  }

  private fun registerCallbacks() {
    leftCaptureButton.setOnClickListener {
      takeLeftImage()
    }

    rightCaptureButton.setOnClickListener {
      takeRightImage()
    }

    leftResetButton.setOnClickListener {
      leftBitmap = null
      syncLeft()
    }

    rightResetButton.setOnClickListener {
      rightBitmap = null
      syncRight()
    }
  }

  private fun sizeToCover(frame: View, image: Bitmap): SizeF {
    val frameAspect = frame.width / frame.height.toFloat()
    val imageAspect = image.width / image.height.toFloat()

    val scaledWidth: Float
    val scaledHeight: Float

    if (frameAspect >= imageAspect) {
      scaledWidth = frame.width.toFloat()
      scaledHeight = scaledWidth / imageAspect
    } else {
      scaledHeight = frame.height.toFloat()
      scaledWidth = scaledHeight * imageAspect
    }

    return SizeF(scaledWidth, scaledHeight)
  }

  private fun coverAnchoredLeft(frame: View, image: Bitmap): Matrix {
    val scaled = sizeToCover(frame, image)
    val xform = Matrix()
    xform.postScale(-1f, 1f, image.width * 0.5f, 0f)
    xform.postScale(scaled.width / image.width.toFloat(), scaled.height / image.height.toFloat())
    return xform
  }

  private fun coverAnchoredRight(frame: View, image: Bitmap): Matrix {
    val scaled = sizeToCover(frame, image)
    val xform = Matrix()
    xform.postScale(-1f, 1f, image.width * 0.5f, 0f)
    xform.postScale(scaled.width / image.width.toFloat(), scaled.height / image.height.toFloat())
    return xform
  }

  private fun updatePreviewTransform(textureSize: Size) {
    val textureAspect = textureSize.height / textureSize.width.toFloat()

    val scaledWidth: Float
    val scaledHeight: Float

    if (previewView.width > previewView.height) {
      scaledHeight = previewView.width.toFloat()
      scaledWidth = previewView.width * textureAspect
    } else {
      scaledHeight = previewView.height.toFloat()
      scaledWidth = previewView.height * textureAspect
    }

    val centerX = previewView.width * 0.5f
    val centerY = previewView.height * 0.5f

    val xform = Matrix()
    xform.postRotate(-viewToRotation(previewView).toFloat(), centerX, centerY)
    xform.preScale(scaledWidth / previewView.width.toFloat(), scaledHeight / previewView.height.toFloat(), centerX, centerY)

    previewView.setTransform(xform)
  }

  private fun syncLeft() {
    if (leftBitmap == null) {
      leftResetButton.visibility = View.INVISIBLE
      leftImageView.visibility = View.INVISIBLE
      leftCaptureButton.visibility = View.VISIBLE
    } else {
      leftCaptureButton.visibility = View.INVISIBLE
      leftImageView.visibility = View.VISIBLE
      leftResetButton.visibility = View.VISIBLE

      leftBitmap?.let {
        leftImageView.setImageBitmap(it)
        leftImageView.imageMatrix = coverAnchoredRight(leftImageView, it)
      }
    }
  }

  private fun syncRight() {
    if (rightBitmap == null) {
      rightResetButton.visibility = View.INVISIBLE
      rightImageView.visibility = View.INVISIBLE
      rightCaptureButton.visibility = View.VISIBLE
    } else {
      rightCaptureButton.visibility = View.INVISIBLE
      rightImageView.visibility = View.VISIBLE
      rightResetButton.visibility = View.VISIBLE

      rightBitmap?.let {
        rightImageView.setImageBitmap(it)
        rightImageView.imageMatrix = coverAnchoredLeft(rightImageView, it)
      }
    }
  }
}

The exercises that you will collectively complete are listed below. Check your email for your assigned exercise and a link to submit your solution. Become an expert on your particular corner of the app, investigating background material as needed. Build on top of the activity and others’ code as you complete your exercise. No solution should be very long, but some carry more overhead, such as curly braces and method signatures, than others.

  1. Write method createPreviewUseCase that accepts parameters for the screen resolution as a Size and the screen aspect ratio as a Rational. Return a Preview configured for the front-facing camera, with a target resolution that matches the screen resolution, a target aspect ratio that matches the screen aspect ratio, and a rotation that matches the orientation of previewView. Add an OnPreviewOutputUpdateListener to the use case that does three things:
    • Link the SurfaceTexture of previewView to the preview output.
    • Trigger updatePreviewTransform to set up the transformation of the preview. Pass the output’s texture size.
  2. Write method initializeUseCases that uses CameraX to bind the preview and capture use cases. Compute the screen size and aspect ratio of previewView using the dimensions yielded by previewView.display.getRealMetrics.
  3. Write method createCaptureUseCase that accepts parameters for the screen resolution as a Size and the screen aspect ratio as a Rational. Return an ImageCapture configured for the front-facing camera, with a target aspect ratio that matches the screen aspect ratio and a rotation that matches the orientation of previewView.
  4. Write method viewToRotation that accepts a View parameter. Examine the view’s display rotation. Return 0, 90, 180, or 270 accordingly. (A rough sketch of this helper, along with those for exercises 6 and 7, appears after this list.)
  5. Write method imageProxyToBitmap that accepts an ImageProxy parameter. Return the wrapped image as a Bitmap. (You’ll find more reference code dealing with Image than ImageProxy. Luckily, they have the same interface.)
  6. Write method transformBitmap that accepts parameters for a Bitmap and a number of degrees as an Int. Return a new version of the bitmap that has been rotated about its center by the specified number of degrees. Use a version of Bitmap.createBitmap that accepts a Matrix.
  7. Write method meldBitmaps that accepts parameters for a left Bitmap and a right Bitmap. Return a new bitmap in which the two images appear side-by-side. Use the ARGB_8888 pixel format.
  8. Write method saveBitmap that accepts a Bitmap parameter. It stores the Bitmap as a JPEG in the top-level external storage directory under the name twoface.jpg.
  9. Write method meldBitmapsAndSaveMaybe that melds the two bitmaps and saves the combined bitmap—but only if both leftBitmap and rightBitmap are valid.
  10. Write method takeLeftImage that triggers the captureUseCase to take a picture in-memory. On a successful capture, convert the captured image to a bitmap, close the image, transform it according to the rotation, and crop it to contain only the left half of the image. Update the leftBitmap field, try melding the bitmaps and saving the result, then sync the left half of the UI.
  11. Write method takeRightImage that triggers the captureUseCase to take a picture in-memory. On a successful capture, convert the captured image to a bitmap, close the image, transform it according to the rotation, and crop it to contain only the right half of the image. Update the rightBitmap field, try melding the bitmaps and saving the result, then sync the right half of the UI.
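
To give a feel for the bitmap plumbing, here is a rough sketch of how exercises 4, 6, and 7 might be approached. It makes assumptions of my own, such as the two halves sharing the same height and the rotation always being a multiple of 90 degrees, so treat it as a starting point rather than a finished solution.

import android.graphics.Bitmap
import android.graphics.Canvas
import android.graphics.Matrix
import android.view.Surface
import android.view.View

// Exercise 4: map the view's display rotation constant to a number of degrees.
fun viewToRotation(view: View): Int = when (view.display.rotation) {
  Surface.ROTATION_90 -> 90
  Surface.ROTATION_180 -> 180
  Surface.ROTATION_270 -> 270
  else -> 0
}

// Exercise 6: rotate the bitmap about its center by the given number of degrees.
fun transformBitmap(bitmap: Bitmap, degrees: Int): Bitmap {
  val matrix = Matrix()
  matrix.postRotate(degrees.toFloat(), bitmap.width * 0.5f, bitmap.height * 0.5f)
  return Bitmap.createBitmap(bitmap, 0, 0, bitmap.width, bitmap.height, matrix, true)
}

// Exercise 7: draw the two halves side by side into a new ARGB_8888 bitmap.
// Assumes both halves have the same height.
fun meldBitmaps(left: Bitmap, right: Bitmap): Bitmap {
  val melded = Bitmap.createBitmap(left.width + right.width, maxOf(left.height, right.height), Bitmap.Config.ARGB_8888)
  val canvas = Canvas(melded)
  canvas.drawBitmap(left, 0f, 0f, null)
  canvas.drawBitmap(right, left.width.toFloat(), 0f, null)
  return melded
}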

See you next time!

Sincerely,

P.S. It’s time for a haiku! I had a hard time finding one that satisfied me, so you get three that didn’t.

It understands me
When I speak, and when I don’t
It undersits me

How to forget things
“Order Mom some flowers, please”
“Make this recurring?”

Share files with SpeechDrop
One phone speaks, the other hears
Now supports images!