teaching machines

CS 330 Lecture 36 – Hello, Ruby and Roogle II

May 3, 2013. Filed under cs330, lectures, spring 2013.

Agenda

What Does This Do?

if true
  puts "boo"
else
  puts "boo" / 3
end

def double x
  x + x
end

What kind of values can you pass to double? How would you write this function in Java? C? Haskell? C++?

Dynamic vs. Duck

With dynamic typing, the interpreter goes through this thought process:

  1. Ah, I’ve got a value V here. The coder is trying to do operation O to it.
  2. Before I run this, I need to check that V-like things support operation O.
  3. V, what’s your type?
  4. T? Okay.
  5. T, do you support operation O?
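The type-first thought process above can be sketched in Ruby itself. This is only an illustration, not what the interpreter literally does: the method name double_checking_type is made up for this example, and it uses + as the operation O.

```ruby
# Hypothetical helper that mimics the type-first check by hand.
def double_checking_type(v)
  type = v.class                    # "V, what's your type?"
  unless type.method_defined?(:+)   # "T, do you support operation O?"
    raise TypeError, "#{type} does not support +"
  end
  v + v
end

puts double_checking_type(5)
puts double_checking_type("foo")

begin
  double_checking_type(nil)
rescue TypeError => e
  puts "Rejected: #{e.message}"
end
```

The question is put to the type T, not to the value V. NilClass defines no + method, so nil is turned away before the operation is attempted.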

With duck typing, the setup is a bit different:

  1. Ah, I’ve got a value V here. The coder is trying to do operation O to it.
  2. Before I run this, I need to check that V supports operation O.
  3. V, do you have an O method?
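Here is a minimal sketch of that value-first question, using Ruby's respond_to?. The name double_if_it_quacks and the odd object are made up for this example, and + again plays the role of operation O.

```ruby
# Hypothetical helper: ask the value itself, never its type.
def double_if_it_quacks(v)
  unless v.respond_to?(:+)          # "V, do you have an O method?"
    raise ArgumentError, "#{v.inspect} has no + method"
  end
  v + v
end

# A plain Object whose class knows nothing about +. Only this one
# instance gets a + method.
odd = Object.new
odd.define_singleton_method(:+) { |other| "quack quack" }

puts Object.method_defined?(:+)     # the class says no
puts odd.respond_to?(:+)            # the value says yes
puts double_if_it_quacks(odd)
puts double_if_it_quacks([1]).inspect
```

The odd object passes even though nothing about its class suggests it should. All that matters is whether the value can answer the message.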

In the words of Alex Martelli:

In other words, don’t check whether it IS-a duck: check whether it QUACKS-like-a duck, WALKS-like-a duck, etc, etc, depending on exactly what subset of duck-like behaviour you need to play your language-games with.

Code

wdtd1.rb

#!/usr/bin/env ruby

if !true
  puts "boo"
else
  puts "boo" / 3
end
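A small sketch of what the listing above is getting at: Ruby only complains about "boo" / 3 if that line actually runs. The method run_branch is a made-up wrapper for this example, so both branches can be tried from one script.

```ruby
# Hypothetical wrapper around the if/else from wdtd1.rb.
def run_branch(flag)
  if flag
    "boo"
  else
    "boo" / 3   # String has no / method, but nobody checks until we get here
  end
end

puts run_branch(true)

begin
  run_branch(false)
rescue NoMethodError => e
  puts "Failed at run time with #{e.class}"
end
```

The file loads and the first call succeeds. The error only shows up when the bad branch is executed.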

wdtd2.rb

#!/usr/bin/env ruby

def double x
  x + x
end

puts double 5
puts double "foo"
puts double 5.6
puts double [1]

roogle_crawl.rb

#!/usr/bin/env ruby

require 'net/http'
require 'set'
require 'yaml'

$nlevels = 1

def get_links(prefix, html)
  links = html.scan(/(?<=href=").*?(?=")/)
  links.map do |link|
    link.downcase!
    if link !~ /^http/
      link = prefix + '/' + link
    else
      link = link.gsub(/^https?:\/\//, '')
    end
  end
end

# Extract words from HTML. Lowercase 'em too. Omit punctuation.
def get_words(html)
  html =~ /<body.*?>(.*)<\/body>/m
  body = $1
  return [] if !body
  body.gsub!(/<.*?>/, ' ')
  body.gsub!(/&.*?;/, ' ')
  body.gsub!(/'s/, 's')
  body.gsub!(/\W/, ' ')
  words = body.scan(/\w+/).map do |word|
    word.downcase
  end

  words
end

# A search engine maps query words to URLs.
$word_to_urls = Hash.new

def index_words(url, html)
  # Index all them thar words.
  words = get_words(html)
  words.each do |word|
    if not $word_to_urls.include? word
      $word_to_urls[word] = Set.new
    end
    $word_to_urls[word].add(url)
  end
end

def visit_links(prefix, html, level)
  # Recurse on all them thar links.
  links = get_links(prefix, html)
  links.each do |link|
    crawl(link, level)
  end
end

$visited = Set.new

# Add all of url's words to our index. Any words found in the url's HTML are
# associated with this URL in our index. Any links found in the url's HTML
# are recursively added to our index.
def crawl(url, level)
  puts 'Indexing ' + url

  if $visited.include? url
    puts 'Already done.'
    return
  end
  $visited.add url

  url =~ /^(.*?)(\/.*)?$/
  host, page = $1, $2
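  # Relative links resolve against the directory part of the url, so drop
  # the last path segment. A url with no slash is left unchanged.
  prefix = url.sub(/\/[^\/]*$/, '')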

  page = '/' if !page

  begin
    Net::HTTP.start(host, 80) do |http|
      response = http.get(page)
      return if response.content_type !~ /^text\//
      html = response.body
      index_words(url, html)
      visit_links(prefix, html, level + 1) if level < $nlevels
    end
  rescue
    puts url + ' failed. Skipping.'
  end
end

crawl('www.cs.uwec.edu/index.html', 0)

# Inspect our index.
$word_to_urls.each do |word, urls|
  puts "#{word} -> #{urls.to_a.join(' ')}"
end

File.open('index2.yaml', 'w') do |file|
  YAML::dump($word_to_urls, file)
end
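The index built by the crawler is a hash from each word to the set of URLs that contain it. Here is a minimal sketch of that structure on its own, with no network involved. The pages hash and its example.edu URLs are made up for illustration, and the default block on the Hash is an alternative to the explicit include? check in index_words, not what the crawler above does.

```ruby
require 'set'

# Hypothetical stand-ins for crawled pages: URL => words already extracted.
pages = {
  'example.edu/a.html' => %w[ruby is fun],
  'example.edu/b.html' => %w[ruby has blocks]
}

# The default block creates an empty Set the first time a word is seen.
index = Hash.new { |hash, word| hash[word] = Set.new }

pages.each do |url, words|
  words.each { |word| index[word].add(url) }
end

puts index['ruby'].to_a.join(' ')
puts index['blocks'].to_a.join(' ')
puts index.key?('python')
```

One caution with the default block: looking up an unknown word with [] quietly adds an empty entry, so a search should test with key? or include? first.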

roogle_search.rb

#!/usr/bin/env ruby

require 'yaml'
require 'set'

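# Newer Rubies (Psych 4 and later) refuse to load Set objects with YAML::load,
# so prefer unsafe_load when it exists. The index is a file we wrote ourselves.
text = File.read(ARGV[0])
$word_to_urls = YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(text) : YAML::load(text)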

print '? '
STDIN.each_line do |word|
  word.downcase!
  word.chomp!
  if $word_to_urls.include? word
    $word_to_urls[word].each do |url|
      puts "http://#{url}"
    end
  else
    puts "No lo encuentro."
  end
  print '? '
end

Haiku

Midgets trip giants
Then we say short is better
Till our cat gets treed