CS 330 Lecture 36 – Hello, Ruby and Roogle II

May 3, 2013 by Chris Johnson. Filed under cs330, lectures, spring 2013.

Agenda

what ?s
what does this do?
Ruby’s classification
dynamic typing and duck typing
finishing Roogle

What Does This Do?

if true
  puts "boo"
else
  puts "boo" / 3
end

def double x
  x + x
end

What kind of values can you pass to double? How would you write this function in Java? C? Haskell? C++?

Dynamic vs. Duck

With dynamic typing, the interpreter goes through this thought process:

Ah, I’ve got a value V here. The coder is trying to do operation O to it.
Before I run this, I need to check that V-like things support operation O.
V, what’s your type?
T? Okay.
T, do you support operation O?
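
To make that conversation concrete, here is a rough sketch that acts it out by hand in Ruby. The helper name supports? is made up for illustration, and a real interpreter does this lookup internally rather than through reflection calls like these.

```ruby
# Hypothetical helper: ask the value for its type, then ask the type
# whether its instances have the operation.
def supports?(value, operation)
  type = value.class                 # "V, what's your type?"
  type.method_defined?(operation)    # "T, do you support operation O?"
end

puts supports?(5, :+)       # true: numbers define +
puts supports?("boo", :+)   # true: strings define +
puts supports?("boo", :/)   # false: strings do not define /

# Nothing is checked until the operation is actually attempted.
begin
  "boo" / 3
rescue NoMethodError => e
  puts "Caught: #{e.class}"
end
```

The last few lines show why a branch that never runs never complains: the question only gets asked when the interpreter reaches the operation.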

With duck typing, the setup is a bit different:

Ah, I’ve got a value V here. The coder is trying to do operation O to it.
Before I run this, I need to check that V supports operation O.
V, do you have an O method?

In the words of Alex Martelli:

In other words, don’t check whether it IS-a duck: check whether it QUACKS-like-a duck, WALKS-like-a duck, etc, etc, depending on exactly what subset of duck-like behaviour you need to play your language-games with.
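
A small sketch of the duck-typing question in Ruby might look like the following. The classes Duck and Robot and the method try_quack are invented for this example; the only real API used is respond_to?, which asks the value itself whether it has a method.

```ruby
# Two unrelated (hypothetical) classes that happen to share a method.
class Duck
  def quack
    "Quack!"
  end
end

class Robot
  def quack
    "Beep. Quack. Beep."
  end
end

# Ask the value whether it can quack; never ask what class it is.
def try_quack(thing)
  if thing.respond_to?(:quack)
    thing.quack
  else
    "#{thing.inspect} can't quack."
  end
end

puts try_quack(Duck.new)
puts try_quack(Robot.new)
puts try_quack(42)
```

Duck and Robot share no superclass besides Object, yet both are acceptable because each one answers yes to "do you have a quack method?"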

Code

wdtd1.rb

#!/usr/bin/env ruby

if !true
  puts "boo"
else
  puts "boo" / 3
end

wdtd2.rb

#!/usr/bin/env ruby

def double x
  x + x
end

puts double 5
puts double "foo"
puts double 5.6
puts double [1]

roogle_crawl.rb

#!/usr/bin/env ruby

require 'net/http'
require 'set'
require 'yaml'

$nlevels = 1

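# Extract link targets from HTML. Lowercase 'em too. Relative links get the
# prefix prepended; absolute links lose their http:// or https:// scheme.
def get_links(prefix, html)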
  links = html.scan(/(?<=href=").*?(?=")/)
  links.map do |link|
    link.downcase!
    if link !~ /^http/
      link = prefix + '/' + link
    else
      link = link.gsub(/^https?:\/\//, '')
    end
  end
end

# Extract words from HTML. Lowercase 'em too. Omit punctuation.
def get_words(html)
  html =~ /<body.*?>(.*)<\/body>/m
  body = $1
  return [] if !body
  body.gsub!(/<.*?>/, ' ')
  body.gsub!(/&.*?;/, ' ')
  body.gsub!(/'s/, 's')
  body.gsub!(/\W/, ' ')
  words = body.scan(/\w+/).map do |word|
    word.downcase
  end

  words
end

# A search engine maps query words to URLs.
$word_to_urls = Hash.new

def index_words(url, html)
  # Index all them thar words.
  words = get_words(html)
  words.each do |word|
    if not $word_to_urls.include? word
      $word_to_urls[word] = Set.new
    end
    $word_to_urls[word].add(url)
  end
end

def visit_links(prefix, html, level)
  # Recurse on all them thar links.
  links = get_links(prefix, html)
  links.each do |link|
    crawl(link, level)
  end
end

$visited = Set.new

# Add all of url's words to our index. Any words found in the url's HTML are
# associated with this URL in our index. Any links found in the url's HTML
# are recursively added to our index.
def crawl(url, level)
  puts 'Indexing ' + url

  if $visited.include? url
    puts 'Already done.'
    return
  end
  $visited.add url

  url =~ /^(.*?)(\/.*)?$/
  host, page = $1, $2
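  # The prefix is everything before the last slash (the page's directory),
  # or the whole URL if it has no slash.
  url =~ /^(.*)\//
  prefix = $1 || url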

  page = '/' if !page

  begin
    Net::HTTP.start(host, 80) do |http|
      response = http.get(page)
      return if response.content_type !~ /^text\//
      html = response.body
      index_words(url, html)
      visit_links(prefix, html, level + 1) if level < $nlevels
    end
  rescue
    puts url + ' failed. Skipping.'
  end
end

crawl('www.cs.uwec.edu/index.html', 0)

# Inspect our index.
$word_to_urls.each do |word, urls|
  puts "#{word} -> #{urls.to_a.join(' ')}"
end

File.open('index2.yaml', 'w') do |file|
  YAML.dump($word_to_urls, file)
end
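
If you want to poke at the word-to-URLs idea without hitting the network, a stripped-down sketch like this one may help. The helper build_index and the example.edu pages are made up for illustration; the crawler above uses a global hash and an explicit include? check instead of a default block.

```ruby
require 'set'

# Hypothetical helper: build a word => Set-of-URLs index from a
# url => text hash, standing in for pages the crawler downloaded.
def build_index(pages)
  # Each unseen word starts out with an empty set of URLs.
  index = Hash.new { |hash, word| hash[word] = Set.new }
  pages.each do |url, text|
    text.downcase.scan(/\w+/).each { |word| index[word].add(url) }
  end
  index
end

# Made-up pages, not real URLs.
pages = {
  'example.edu/a.html' => 'Ruby is fun',
  'example.edu/b.html' => 'Ruby has duck typing'
}

index = build_index(pages)
puts index['ruby'].to_a.join(' ')
puts index['duck'].to_a.join(' ')
```

One design note: with a default block, merely looking up a missing word adds an empty entry to the hash, so a search script should test with include? first, as roogle_search.rb does.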

roogle_search.rb

#!/usr/bin/env ruby

require 'yaml'
require 'set'

$word_to_urls = YAML::load(File.read(ARGV[0]))

print '? '
STDIN.each_line do |word|
  word.downcase!
  word.chomp!
  if $word_to_urls.include? word
    $word_to_urls[word].each do |url|
      puts "http://#{url}"
    end
  else
    puts "No lo encuentro."
  end
  print '? '
end

Haiku

Midgets trip giants
Then we say short is better
Till our cat gets treed