teaching machines

Getting a List of Forks of a Bitbucket Repository

December 18, 2013. Filed under code, public.

A friend of mine handles the distribution and submission of homework assignments in an elegant manner:

  1. He creates one source repository on Bitbucket for an assignment, committing to it any scaffolding code that he provides to students.
  2. Each student forks that repository and makes the fork private, so only the student and my friend can see it.
  3. The students work work work.
  4. My friend clones each student’s repository at the homework deadline. Whatever was last pushed gets graded.

I think this idea is great. I’d been using a submission directory on our students’ file server. This method had its deficiencies: students worked in one directory but submitted to another, causing synchronization confusion; students’ code was locked up in university storage behind a VPN; and reliably offering read and write access to the share for both Windows and Linux was as likely to happen as us getting a new building.

This next semester I will be adopting my friend’s approach. The only thing I saw missing was a way to get a list of all the students’ repositories to make checkouts quick and painless. The following Ruby script was my solution. Given a Bitbucket user and the name of a repository owned by that user, it collects the names of all the repository’s forks using Bitbucket’s API and assembles them in a simple CSV format. The script uses basic HTTP authentication to gain access to the user’s repository metadata.

#!/usr/bin/env ruby

# ---------------------------------------------------------------------------- 
# FILE:   bbforks                                                              
# AUTHOR: Chris Johnson                                                        
# DATE:   Dec 17 2013                                                          
#                                                                              
# A script for amassing the names of all forks of a Bitbucket repository and
# exporting them in CSV format.
# ---------------------------------------------------------------------------- 

require 'net/http'
require 'net/https'
require 'json'

if ARGV.length != 2
  STDERR.puts "Usage: #{$0} user repo"
  exit 1
end

user = ARGV[0]
repo_slug = ARGV[1]
nresults_per_page = 100
page_link = "https://bitbucket.org/api/2.0/repositories/#{user}/#{repo_slug}/forks?pagelen=#{nresults_per_page}"

# Get the password of the owner of the original repository. Do not echo
# password characters to the console.
`stty -echo`
STDERR.print 'Password: '
password = $stdin.gets.chomp
`stty echo`
STDERR.puts ''

forks = []

# Grab the pages of fork listings from Bitbucket. A single request may not grab
# all forks, as Bitbucket paginates the result. A single page has the following
# form:
#
#   {
#     values: [
#       {
#         full_name: 'forking-user/forked-repo-name',
#         is_private: <true-or-false>,
#         owner: {
#           display_name: 'Firstname Lastname',
#           ...
#         },
#         ...
#       },
#       <fork2>,
#       <fork3>,
#       ...
#     ],
#     next: <link-to-next-page-of-forks>,
#     ...
#   }
#
# If there is no next page, key next will not appear.
#
# Currently, we retain only the path to each repo and the owners' names.
while page_link
  uri = URI page_link
  # Verify the server's certificate; the password travels over this connection.
  Net::HTTP.start(uri.host, uri.port, :use_ssl => true, :verify_mode => OpenSSL::SSL::VERIFY_PEER) do |http|
    request = Net::HTTP::Get.new uri.request_uri
    request.basic_auth(user, password)
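    response = http.request request
    abort "Request failed: #{response.code} #{response.message}" unless response.is_a?(Net::HTTPSuccess)

    body = JSON.parse response.body
    # JSON.parse yields string keys, so the next-page link is under 'next'.
    page_link = body['next']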

    body['values'].each do |fork|
      STDERR.puts "#{fork['full_name']} is not private!" if !fork['is_private']
      forks << {repo: fork['full_name'], owner: fork['owner']['display_name']}
    end
  end
end

# Emit the listing of forks in CSV.
puts "owner,repository"
forks.sort_by {|fork| fork[:repo]}.each {|fork| puts "#{fork[:owner]},#{fork[:repo]}"}
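
With the CSV in hand, the checkouts themselves could be scripted along the following lines. This is only a sketch under assumptions the post does not state: that the forks are Git repositories reachable over SSH at git@bitbucket.org, and that each clone should land in a directory named after the forking account. The helper name clone_commands and the submissions directory are hypothetical, and the sample row reuses the placeholder names from the script's comments. The sketch prints the commands rather than running them.

```ruby
require 'csv'

# Hypothetical helper: turn the owner,repository CSV emitted by the script
# into one git clone command per fork. Assumes Git over SSH; a Mercurial
# fork or an HTTPS checkout would need a different command.
def clone_commands(csv_text, dest_root = 'submissions')
  CSV.parse(csv_text, headers: true).map do |row|
    full_name = row['repository']        # 'forking-user/forked-repo-name'
    account = full_name.split('/').first # the account that owns the fork
    ['git', 'clone', "git@bitbucket.org:#{full_name}.git", File.join(dest_root, account)]
  end
end

# Placeholder listing in the same shape the script prints.
sample = <<~LISTING
  owner,repository
  Firstname Lastname,forking-user/forked-repo-name
LISTING

# Print each command; swap puts for system(*command) to actually clone.
clone_commands(sample).each { |command| puts command.join(' ') }
```

One caveat: the script writes display names unquoted, so a name containing a comma would shift the columns when parsed this way. Emitting the rows with Ruby's csv library instead of string interpolation would avoid that.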