CS 330 Lecture 36 – Hello, Ruby and Roogle II
Agenda
- what questions do you have?
- what does this do?
- Ruby’s classification
- dynamic typing and duck typing
- finishing Roogle
What Does This Do?
if true
  puts "boo"
else
  puts "boo" / 3
end
def double x
  x + x
end
What kind of values can you pass to double? How would you write this function in Java? C? Haskell? C++?
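In Ruby, the short answer is: any value whose class has a + method that accepts a second value of the same kind. Here is a small sketch, assuming a made-up list of sample values, that calls double on each one and reports which ones fail. The failure is a NoMethodError raised only at the moment + is actually called, not an error reported before the program runs.

```ruby
def double(x)
  x + x
end

# Hypothetical sample values: some classes define +, some do not.
samples = [5, "foo", 5.6, [1], nil, { a: 1 }, :sym]

samples.each do |value|
  begin
    puts "#{value.inspect} -> #{double(value).inspect}"
  rescue NoMethodError
    # nil, Hash, and Symbol have no + method, so the call fails here, at run time.
    puts "#{value.inspect} -> NoMethodError (no + method)"
  end
end
```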
Dynamic vs. Duck
With dynamic typing, the interpreter goes through this thought process:
- Ah, I’ve got a value V here. The coder is trying to do operation O to it.
- Before I run this, I need to check that V-like things support operation O.
- V, what’s your type?
- T? Okay.
- T, do you support operation O?
With duck typing, the setup is a bit different:
- Ah, I’ve got a value V here. The coder is trying to do operation O to it.
- Before I run this, I need to check that V supports operation O.
- V, do you have an O method?
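The two questions can be mimicked with Ruby's reflection methods. The helper names type_supports? and value_supports? below are hypothetical, and Ruby's interpreter does not literally call them; they only make the difference concrete. Asking the class (method_defined?) and asking the object itself (respond_to?) disagree when one particular object has gained a method its class lacks.

```ruby
# Dynamic-typing question: "V, what's your type? T, do you support O?"
def type_supports?(value, op)
  value.class.method_defined?(op)
end

# Duck-typing question: "V, do you have an O method?"
def value_supports?(value, op)
  value.respond_to?(op)
end

# A single object that can quack, although its class (Object) cannot.
quacker = Object.new
def quacker.quack
  "Quack!"
end

puts type_supports?(quacker, :quack)   # prints false
puts value_supports?(quacker, :quack)  # prints true
puts quacker.quack if value_supports?(quacker, :quack)
```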
In the words of Alex Martelli:
In other words, don’t check whether it IS-a duck: check whether it QUACKS-like-a duck, WALKS-like-a duck, etc, etc, depending on exactly what subset of duck-like behaviour you need to play your language-games with.
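A sketch of the quote in action: the hypothetical Point class below is neither a number nor a string, yet double accepts it because it has the one behavior double needs, a + method. No shared interface or superclass is declared anywhere.

```ruby
def double(x)
  x + x
end

# Hypothetical 2-D point: it only needs to "quack" the way double uses it.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Component-wise addition; this is all double asks of its argument.
  def +(other)
    Point.new(x + other.x, y + other.y)
  end

  def to_s
    "(#{x}, #{y})"
  end
end

puts double(Point.new(1, 2))   # prints (2, 4)
```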
Code
wdtd1.rb
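#!/usr/bin/env ruby
if !true
  puts "boo"
else
  # This branch does run, and String has no / method, so Ruby raises
  # NoMethodError here, at run time.
  puts "boo" / 3
end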
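To see both outcomes of wdtd1.rb without editing it, here is a hypothetical method, risky, that wraps the same if/else. Ruby accepts the whole method definition without complaint; the bad division only fails when the else branch is actually taken.

```ruby
# Hypothetical method: the else branch is only checked if it runs.
def risky(flag)
  if flag
    "boo"
  else
    "boo" / 3 # String has no / method
  end
end

puts risky(true) # prints boo

begin
  risky(false)
rescue NoMethodError => e
  # e.name is the missing method's name, the symbol :/
  puts "NoMethodError for method #{e.name}"
end
```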
wdtd2.rb
#!/usr/bin/env ruby
def double x
  x + x
end
puts double 5       # 10
puts double "foo"   # foofoo
puts double 5.6     # 11.2
puts double [1]     # [1, 1], which puts prints one element per line
roogle_crawl.rb
#!/usr/bin/env ruby
require 'net/http'
require 'set'
require 'yaml'
# How many links deep to follow from the starting page.
$nlevels = 1
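# Pull every href="..." target out of the HTML. Relative links are prefixed
# with the current page's directory; absolute links lose their http(s)://.
def get_links(prefix, html)
  links = html.scan(/(?<=href=").*?(?=")/)
  links.map do |link|
    link.downcase!
    # The value of this if/else becomes the link's entry in the mapped array.
    if link !~ /^http/
      prefix + '/' + link
    else
      link.gsub(/^https?:\/\//, '')
    end
  end
end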
# Extract words from HTML. Lowercase 'em too. Omit punctuation.
def get_words(html)
  html =~ /<body.*?>(.*)<\/body>/m
  body = $1
  return [] if !body
  body.gsub!(/<.*?>/, ' ')   # drop tags
  body.gsub!(/&.*?;/, ' ')   # drop entities such as &amp;
  body.gsub!(/'s/, 's')      # "cat's" becomes "cats"
  body.gsub!(/\W/, ' ')      # punctuation becomes spaces
  words = body.scan(/\w+/).map do |word|
    word.downcase
  end
  words
end
# A search engine maps query words to URLs.
$word_to_urls = Hash.new
def index_words(url, html)
  # Index all them thar words.
  words = get_words(html)
  words.each do |word|
    if not $word_to_urls.include? word
      $word_to_urls[word] = Set.new
    end
    $word_to_urls[word].add(url)
  end
end
def visit_links(prefix, html, level)
  # Recurse on all them thar links.
  links = get_links(prefix, html)
  links.each do |link|
    crawl(link, level)
  end
end
$visited = Set.new
# Add all of url's words to our index. Any words found in the url's HTML are
# associated with this URL in our index. Any links found in the url's HTML
# are recursively added to our index, down to $nlevels links deep.
def crawl(url, level)
  puts 'Indexing ' + url
  if $visited.include? url
    puts 'Already done.'
    return
  end
  $visited.add url
  # Split the URL into a host ("www.cs.uwec.edu") and a page ("/index.html").
  url =~ /^(.*?)(\/.*)?$/
  host, page = $1, $2
  page = '/' if !page
  # Relative links resolve against the URL's directory: everything before
  # the last slash, or the whole URL if it has no slash.
  prefix = url.include?('/') ? url[0...url.rindex('/')] : url
  begin
    Net::HTTP.start(host, 80) do |http|
      response = http.get(page)
      # Only index text responses (HTML, plain text), not images or PDFs.
      return if response.content_type !~ /^text\//
      html = response.body
      index_words(url, html)
      visit_links(prefix, html, level + 1) if level < $nlevels
    end
  rescue
    puts url + ' failed. Skipping.'
  end
end
crawl('www.cs.uwec.edu/index.html', 0)
# Inspect our index.
$word_to_urls.each do |word, urls|
  puts "#{word} -> #{urls.to_a.join(' ')}"
end
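# Save the index. Sets are stored as plain arrays so roogle_search.rb can
# read the file back with YAML.load, which in Ruby 3.1 and later (Psych 4)
# refuses to rebuild arbitrary classes such as Set.
File.open('index2.yaml', 'w') do |file|
  YAML.dump($word_to_urls.transform_values(&:to_a), file)
end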
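The host/page split and the relative-link rule in roogle_crawl.rb can be checked in isolation. The sketch below is a hypothetical restatement (split_url and resolve are made-up names) that uses String#partition and String#rindex instead of regular expressions, and, unlike get_links, it does not lowercase links. The prefix is the URL's directory, so a link like about.html found on www.cs.uwec.edu/index.html resolves to www.cs.uwec.edu/about.html.

```ruby
# Hypothetical helper: split a scheme-less URL into host, page, and directory prefix.
def split_url(url)
  host, slash, rest = url.partition('/')
  page = slash.empty? ? '/' : '/' + rest
  prefix = url.include?('/') ? url[0...url.rindex('/')] : url
  [host, page, prefix]
end

# Hypothetical helper: relative links get the prefix; absolute ones lose http(s)://.
def resolve(prefix, link)
  link =~ /^http/ ? link.sub(%r{^https?://}, '') : "#{prefix}/#{link}"
end

host, page, prefix = split_url('www.cs.uwec.edu/index.html')
p([host, page, prefix])              # ["www.cs.uwec.edu", "/index.html", "www.cs.uwec.edu"]
p(resolve(prefix, 'about.html'))     # "www.cs.uwec.edu/about.html"
p(resolve(prefix, 'http://example.com/x'))  # "example.com/x"
```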
roogle_search.rb
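#!/usr/bin/env ruby
# Usage: roogle_search.rb index2.yaml
require 'yaml'
require 'set'
$word_to_urls = YAML::load(File.read(ARGV[0]))
# Show each prompt immediately, even though it has no trailing newline.
$stdout.sync = true
print '? '
STDIN.each_line do |word|
  word.downcase!
  word.chomp!
  if $word_to_urls.include? word
    $word_to_urls[word].each do |url|
      puts "http://#{url}"
    end
  else
    puts "No lo encuentro."  # Spanish for "I can't find it."
  end
  print '? '
end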
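To try the index and the lookup without the network or a YAML file, here is a minimal offline sketch. The two pages and their URLs are made up, and words_in is a condensed, hypothetical version of get_words (it skips the 's rewrite). The lookup loop mirrors roogle_search.rb but reads from a fixed list of queries instead of STDIN.

```ruby
require 'set'

# Hypothetical pages standing in for what the crawler downloads.
pages = {
  'example.edu/a.html' => '<html><body><p>Ruby is fun.</p></body></html>',
  'example.edu/b.html' => '<html><body>Fun with <b>Haskell</b>!</body></html>'
}

# Condensed get_words: keep the body, drop tags and entities, split into lowercase words.
def words_in(html)
  body = html[/<body.*?>(.*)<\/body>/m, 1]
  return [] unless body
  body.gsub(/<.*?>/, ' ').gsub(/&.*?;/, ' ').scan(/\w+/).map(&:downcase)
end

# Inverted index: each word maps to the set of URLs whose body contains it.
index = {}
pages.each do |url, html|
  words_in(html).each do |word|
    (index[word] ||= Set.new) << url
  end
end

# Same lookup as roogle_search.rb, over a fixed list of queries.
['FUN', 'haskell', 'java'].each do |query|
  word = query.downcase
  if index.include?(word)
    index[word].each { |url| puts "#{word}: http://#{url}" }
  else
    puts "#{word}: No lo encuentro."  # Spanish for "I can't find it."
  end
end
```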
Haiku
Midgets trip giants
Then we say short is better
Till our cat gets treed