teaching machines

QuickHN, a quicker Hacker News

December 20, 2011 by . Filed under cs491 mobile, fall 2011, postmortems.

The application I built is named “QuickHN”. It’s a from-scratch rebuild of my homework3 assignment, times ten.

I am happy to say that it works as well as I hoped, though there are always more features I could add.

My app pulls information from news.ycombinator.com (Hacker News) and displays it succinctly. That doesn’t sound too hard, but I didn’t want the HTML scraping, parsing, and storage happening on the phone, and I wanted a LOT of data. My original plan was to scrape 3,000 links from the site every 10 minutes and store them in a database somewhere (exactly where, I wasn’t sure yet). I had the idea of using the CS345TEAM5 database from my database class, and Professor Morrison approved it; the only downside is that my data will be wiped at the end of the semester.

 

So, I learned a bit of PHP for this project (I’d used a little for a DB assignment, and a little more here). While testing my big scraping script, I got blocked from the website. It turns out the site blocks any IP that hits it too often; maybe they thought I was DoSing them. So I scaled my plan back to 300 links every hour to lighten the load, and spread the scraping out to one page a minute for 10 minutes.

The site still blocked me. I thought I would have to abandon all hope and choose a different website, but when I scrape just 5 pages from their site every hour, they never block me. My data isn’t as ambitious as I originally hoped, but it’s probably better not to transfer that much data to a dinky phone anyway, so hey, it works out.
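The schedule that finally worked comes down to a capped, throttled crawl: a fixed page limit plus a pause between requests. Here is a minimal Java sketch of that loop. It assumes the one-page-a-minute spacing carried over to the 5-page version (the post doesn’t say), and the PoliteCrawler class, its Sleeper interface, and the fetch/nextPage functions are hypothetical stand-ins, not anything from the actual PHP script. Injecting the sleeper lets a test record the pauses instead of actually waiting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of a throttled crawl: fetch at most maxPages pages,
// pausing delayMillis between requests so the site never sees a burst from one IP.
class PoliteCrawler {
    /** Abstraction over the pause, so a test can record delays instead of waiting. */
    interface Sleeper {
        void sleep(long millis);
    }

    /** A Sleeper that really waits, preserving the thread's interrupt status. */
    static Sleeper realSleeper() {
        return millis -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    private final int maxPages;
    private final long delayMillis;
    private final Sleeper sleeper;

    PoliteCrawler(int maxPages, long delayMillis, Sleeper sleeper) {
        this.maxPages = maxPages;
        this.delayMillis = delayMillis;
        this.sleeper = sleeper;
    }

    /**
     * Fetches startPath, then follows whatever "next page" path nextPage finds
     * in each page's HTML (null means there is none), stopping after maxPages pages.
     */
    List<String> crawl(String startPath,
                       Function<String, String> fetch,
                       Function<String, String> nextPage) {
        List<String> pages = new ArrayList<>();
        String path = startPath;
        while (path != null && pages.size() < maxPages) {
            if (!pages.isEmpty()) {
                sleeper.sleep(delayMillis); // space the requests out
                if (Thread.currentThread().isInterrupted()) {
                    break; // woken early: give up on this run
                }
            }
            String html = fetch.apply(path);
            pages.add(html);
            path = nextPage.apply(html);
        }
        return pages;
    }
}
```

For real use it would be constructed as new PoliteCrawler(5, 60_000L, PoliteCrawler.realSleeper()) and given an HTTP fetch plus a function that finds the next-page link in the HTML.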

So my app connects to an on-campus web server I still had access to from the DB homework and calls a PHP script I wrote, which queries the database and returns the data in JSON format. Awesome!
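On the phone side, that request is ordinary HTTP. Here is a minimal Java sketch of the client, assuming the script’s URL is simply passed in (the post doesn’t give it) and using hypothetical class and method names. The one detail worth copying is decoding the body explicitly as UTF-8 instead of relying on a platform default charset, given the encoding trouble that followed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical phone-side client for the PHP script that returns JSON.
class JsonClient {
    /** Reads a whole stream and decodes it as UTF-8. */
    static String readUtf8(InputStream in) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            // Decode once at the end, so a multi-byte character that straddles
            // two reads is still decoded correctly.
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** GETs url (the address of the PHP script) and returns the response body. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        try (InputStream in = conn.getInputStream()) {
            return readUtf8(in);
        } finally {
            conn.disconnect();
        }
    }
}
```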

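Then I noticed that the database was storing some crazy values: Euro signs where they shouldn’t be (i.e., anywhere) and other weird characters. At the time, I concluded that Hacker News used UTF-16 encoding and that the PHP function I used for scraping was translating everything to UTF-8, which broke some characters. That diagnosis was probably off, though: file_get_contents returns the page’s bytes untouched, and garbage like â€™ is the classic signature of UTF-8 text being read as Windows-1252 (or Latin-1) somewhere along the way; a database connection left at a Latin-1 default is a common culprit. Either way, I dropped the database a few times while it held little data and changed my script to find and replace the offending characters before insertion. As I spotted more of them in my app later on, I would just search the DB for them and replace them with a query.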
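Rather than growing a list of find-and-replace pairs, this particular kind of garbage can usually be undone mechanically: encode the garbled text back to bytes as Windows-1252, then decode those bytes as UTF-8. Here is a minimal Java sketch, assuming the damage is exactly one round of UTF-8-read-as-Windows-1252 (text that was mangled twice, or that lost bytes on the way, needs more than this). The Mojibake class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: undo one round of "UTF-8 bytes decoded as Windows-1252" mojibake.
class Mojibake {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    static String repair(String text) {
        CharsetEncoder encoder = CP1252.newEncoder();
        // Text containing characters Windows-1252 can't represent can't be the
        // product of this particular mix-up, so leave it alone.
        if (!encoder.canEncode(text)) {
            return text;
        }
        byte[] originalBytes = text.getBytes(CP1252);
        String candidate = new String(originalBytes, StandardCharsets.UTF_8);
        // Bytes that aren't valid UTF-8 decode to the replacement character
        // U+FFFD; then the text was most likely fine to begin with.
        return candidate.indexOf('\uFFFD') >= 0 ? text : candidate;
    }
}
```

The guards make it safe to run over text that was never damaged: a correctly stored “café” comes back unchanged, because re-decoding its bytes as UTF-8 fails and the original is kept.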

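…and lastly, I noticed that when the PHP script calls json_encode, it also seems to mess up some characters. By default, json_encode writes every non-ASCII character as a \uXXXX escape (é becomes \u00e9), which looks cryptic but is turned back into the original character by the JSON parser on the phone. So I suspect that if I hadn’t touched the database, the weird characters would have come out fine on the other end, and I originally wasted my time ‘fixing’ everything.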
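To see why those escapes are harmless, here is roughly what a JSON parser does with the contents of a string. This is a minimal Java sketch of the string-unescaping rules in the JSON specification (RFC 8259), written for illustration; it is not the code of Android’s org.json parser, which the app would simply use directly.

```java
// Sketch of JSON string unescaping (RFC 8259, section 7): turns the text between
// a JSON string's quotes back into the characters it stands for.
class JsonStrings {
    static String unescape(String body) {
        StringBuilder out = new StringBuilder(body.length());
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (i + 1 >= body.length()) {
                throw new IllegalArgumentException("dangling backslash");
            }
            char esc = body.charAt(++i);
            switch (esc) {
                case '"':  out.append('"');  break;
                case '\\': out.append('\\'); break;
                case '/':  out.append('/');  break;
                case 'b':  out.append('\b'); break;
                case 'f':  out.append('\f'); break;
                case 'n':  out.append('\n'); break;
                case 'r':  out.append('\r'); break;
                case 't':  out.append('\t'); break;
                case 'u': {
                    // Four hex digits name one UTF-16 code unit. Characters beyond
                    // the BMP arrive as two escapes (a surrogate pair); appending
                    // both units in order reassembles them.
                    if (i + 4 >= body.length()) {
                        throw new IllegalArgumentException("truncated escape");
                    }
                    int code = 0;
                    for (int k = 1; k <= 4; k++) {
                        code = code * 16 + hexValue(body.charAt(i + k));
                    }
                    out.append((char) code);
                    i += 4;
                    break;
                }
                default:
                    throw new IllegalArgumentException("invalid escape: " + esc);
            }
        }
        return out.toString();
    }

    private static int hexValue(char ch) {
        if (ch >= '0' && ch <= '9') return ch - '0';
        if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
        if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
        throw new IllegalArgumentException("invalid hex digit: " + ch);
    }
}
```

Feeding it json_encode’s rendering of café, caf\u00e9, gives back café exactly.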

I started the app intending to get a graph in there somehow, and I found some Android graphing APIs online. I chose afreechart.jar (http://code.google.com/p/afreechart/) because it had the best-looking graphs of the free packages.

There is little to no documentation for it, and figuring out how to get even a single graph working was ridiculous. I eventually found some sample code and stripped it down further and further until it was just a few lines that put 3 points into a graph… and I learned. Awesome, again!

Back to my app: I now have an activity that lists a bunch of links, and a second activity, launched from it, that displays a lot of information about a link along with a graph of its ebb and flow on Hacker News. That’s most of what I wanted.

There are a few features I didn’t implement and am no longer sure I want (they might be distracting): tracking which links are new, tracking which links changed the most in points since the last check, and adding more graph features.

The graph features are worth looking into; the other two, not so much.

The last thing I added, because I thought it would be cool, is the ability to search for old links. This, again, calls a PHP script I wrote.
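On the app side, a search boils down to building a request URL with the user’s query safely encoded. Here is a minimal Java sketch; the base URL and the q parameter name are hypothetical, since the post doesn’t show the search script’s interface.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: build the GET URL for a search request.
class SearchRequest {
    // Hypothetical address of the search script; the real one isn't given in the post.
    static final String SEARCH_URL = "http://example.com/quickhn/search.php";

    /** Trims the user's query and percent-encodes it (spaces become '+'). */
    static String buildUrl(String query) {
        String encoded = URLEncoder.encode(query.trim(), StandardCharsets.UTF_8);
        return SEARCH_URL + "?q=" + encoded;
    }
}
```

If the PHP script reads the term from $_GET, PHP decodes the + back into a space and %26 back into &, so a query like “rust & go” arrives intact.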

I built this for both Android 4.0, which was the only Android SDK platform I had when I started, and Android 2.3.3, so that Chris could test the code on an actual device, because I love this app and it’s freaking cool. It looks amazingly better on Android 4.0, though.

My app requires an Internet connection, so I check for one in every activity except the home one; the user isn’t going to get Internet data without the Internet. I also handle orientation changes: most activities allow rotation and keep their data, but the activity that holds the graph doesn’t allow rotation (the graph wouldn’t look good).

Also, I designed the icon myself, in low, medium, and high resolutions, and it looks really cool.

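Everybody seems to be posting code, but nothing on the Java side struck me as especially complicated… so here’s part of the PHP scraping script that gave me trouble for 2 weeks. It fetches up to 5 pages, treats the last link it finds on each page as the link to the next page, and pulls out each story’s link, title, domain, and score: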

 

```php
// Excerpt: $baseurl, $link, $searchForURL, $searchForComhead, $i and $k are
// set up earlier in the script, which isn't shown here.
while ($i < 5 && $i > -1) {
	$startParse = 0;
	$endParse = 0;
	$nextStartParse = -1;
	echo "scraping...";
	$source = file_get_contents($baseurl . $link);
	if ($source === false) {
		break; // the request failed (or we were blocked); stop this run
	}
	echo " parsing...\n";

	while ($nextStartParse !== false) {
		// Find the next story link; strpos() returns false once there are none left.
		$found = strpos($source, $searchForURL, $startParse);
		if ($found === false) {
			break;
		}
		$startParse = $found + strlen($searchForURL);
		$endParse = strpos($source, '"', $startParse);
		$link = substr($source, $startParse, $endParse - $startParse);

		// The last link on a page is treated as the link to the next page.
		$nextStartParse = strpos($source, $searchForURL, $startParse + 1);
		if ($nextStartParse === false) {
			if ($link !== '' && $link[0] !== '/' && substr($link, 0, 4) !== 'http') {
				$link = '/' . $link;
			} else if (substr($link, 0, 4) === 'http') {
				$i = -100; // an absolute URL is a story, not a next page: stop paging
			}
			echo $link;
			break;
		}

		$k++;
		echo " link number " . $k . " ";

		// Make relative "item..." links absolute.
		if (substr($link, 0, 4) === 'item') {
			$link = $baseurl . '/' . $link;
		}

		// The title is the text between the end of the <a ...> tag and the next '<'.
		$startParse = strpos($source, '>', $endParse + 1) + 1;
		$endParse = strpos($source, '<', $startParse);
		$title = substr($source, $startParse, $endParse - $startParse);
		// Patch up mis-encoded punctuation before insertion.
		$title = str_replace('“', '"', $title);
		$title = str_replace('’', '\'', $title);
		$title = str_replace('â€', '"', $title);
		$title = str_replace('\'¦', '...', $title);
		$title = str_replace('"¦', '...', $title);
		$title = str_replace('"“', '-', $title);
		$title = str_replace('', '', $title); // no-op as written: the search string is empty
		$title = str_replace('"˜', '\'', $title);
		$title = str_replace('"”', '-', $title);

		// The abbreviated domain shown next to the title, if this story has one.
		$abvUrl = null;
		$comhead = strpos($source, $searchForComhead, $startParse);
		if ($comhead !== false && $comhead < $nextStartParse) {
			$domainStart = $comhead + strlen($searchForComhead);
			$domainEnd = strpos($source, ')', $domainStart);
			$abvUrl = substr($source, $domainStart, $domainEnd - $domainStart);
		}

		// The score: the digits right after the tag containing id=score_.
		$hopefulUps = ''; // may stay empty: not every item has a score
		$scorePos = strpos($source, 'id=score_', $startParse);
		if ($scorePos !== false && $scorePos < $nextStartParse) {
			$startParse = strpos($source, '>', $scorePos + 1) + 1;
			$endParse = $startParse;
			while ($endParse < strlen($source) && ctype_digit($source[$endParse])) {
				$endParse++;
			}
			$hopefulUps = substr($source, $startParse, $endParse - $startParse);
		}
		$ups = null; // set only after $hopefulUps is checked for stray characters

		// ... (rest of the script not shown)
```
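The core technique in that script is cutting a value out of the HTML between a start marker and an end marker. Much of the trouble with it comes from the not-found case: a failed strpos() returns false, and in PHP false + strlen($marker) quietly evaluates to strlen($marker) instead of signaling an error. Here is a minimal Java sketch of the same idea that makes the missing case explicit; the Between class is hypothetical, and the markers in the test are illustrative, not Hacker News’s actual markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Sketch of marker-based extraction, like the script's strpos()/substr() pairs,
// with "marker not found" handled explicitly instead of via arithmetic on false.
class Between {
    /** Text after the first `start` at or beyond `from`, up to the next `end`. */
    static Optional<String> extract(String source, String start, String end, int from) {
        int s = source.indexOf(start, from);
        if (s < 0) {
            return Optional.empty(); // no such marker: report "no value", not a bogus offset
        }
        s += start.length();
        int e = source.indexOf(end, s);
        return e < 0 ? Optional.empty() : Optional.of(source.substring(s, e));
    }

    /** Every value between `start` and `end`, scanning left to right. */
    static List<String> extractAll(String source, String start, String end) {
        if (start.isEmpty() || end.isEmpty()) {
            throw new IllegalArgumentException("markers must be non-empty");
        }
        List<String> values = new ArrayList<>();
        int from = 0;
        while (true) {
            int s = source.indexOf(start, from);
            if (s < 0) {
                break;
            }
            s += start.length();
            int e = source.indexOf(end, s);
            if (e < 0) {
                break;
            }
            values.add(source.substring(s, e));
            from = e + end.length();
        }
        return values;
    }
}
```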

I had so much trouble getting that working completely, but it’s awesome now. A scheduler on my computer runs it every hour, which means my computer has to stay on nonstop.

One glaring thing I could clean up is how often the app hits my database. Right now it queries the database every time the user searches for links, even though the links only update once an hour… that should be fixed. Otherwise, I tried to code this as securely as I could.
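Since the data only changes once an hour, one straightforward fix would be to cache each search’s results on the phone and reuse them until they are older than the refresh interval. Here is a minimal Java sketch of such a time-limited cache, assuming an in-memory map is enough; the TtlCache name is hypothetical, and the clock is injectable so the expiry logic can be tested.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Sketch: remember each key's result for ttlMillis before asking the server again.
class TtlCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long storedAt;

        Entry(T value, long storedAt) {
            this.value = value;
            this.storedAt = storedAt;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached value if it is still fresh; otherwise loads and caches a new one. */
    synchronized V get(K key, Function<? super K, ? extends V> loader) {
        long now = clock.getAsLong();
        Entry<V> cached = entries.get(key);
        if (cached != null && now - cached.storedAt < ttlMillis) {
            return cached.value;
        }
        V fresh = loader.apply(key);
        entries.put(key, new Entry<>(fresh, now));
        return fresh;
    }
}
```

Keyed by the query string with a one-hour TTL (new TtlCache<>(3_600_000L, System::currentTimeMillis)), repeated searches within the hour would never touch the database. Entries are never evicted, which is fine for a handful of searches but not for unbounded use.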

And finally, one cool ‘trick’: my Search activity launches my normal link-displaying activity and reuses all of its code. That saved me from writing an entire new, complicated ListActivity to do roughly the same thing. It’s less of a trick, though, and more a matter of foresight.

Was fun, yo.