Using spareroom efficiently

May 5, 2011 at 1:24 am

I find searching sites like Gumtree, Craigslist or SpareRoom irritating: the main problem is that I end up wasting time clicking through results I’ve seen before and generally browsing around. This is partly down to my own ineffectualness, but it’s also to do with the layout of the sites. Things become particularly annoying if you visit a site frequently.

What I really want is a means to consider each listing once and only once. Some sites might be quite good at this – giving you e-mail updates and the like – but I couldn’t easily see a way of making that work here.

Dirty hacks to the rescue:

I wrote the following scraper, which performs a search for me and prints out a list of URLs:

# Write-once code - do not pass judgment
import urllib
from lxml.etree import HTML

URL = "http://www.spareroom.co.uk/flatshare/search.pl?searchtype=advanced&flatshare_type=offered&location_type=area&search=&miles_from_max=0&showme_rooms=Y&showme_buddyup_properties=Y&min_rent=&max_rent=&per=pw&rooms_for=&no_of_rooms=&available_search=N&day_avail=&mon_avail=&year_avail=&min_age_req=&max_age_req=&min_beds=&max_beds=&keyword=&nmsq_mode=%21nmsq_mode%21&action=search&templateoveride=&x=149&y=13"

def mangle_urls(urls):
    # Strip everything after the first '&' so the same listing always
    # yields the same URL, then de-duplicate.
    return list(set(url.split('&', 1)[0] for url in urls))

stream = urllib.urlopen(URL)
new_url = stream.url  # the search redirects; keep the final URL for paging
page = stream.read()
tree = HTML(page)
# Listing links are the anchors whose text contains "More info"
matches = mangle_urls(tree.xpath('//a[contains(text(), "More info")]/@href'))
matches_so_far = set(matches)
for offset in range(10, 300, 10):
    # Page through the results ten at a time
    url = new_url + '&offset=%d' % offset
    page = urllib.urlopen(url).read()
    tree = HTML(page)
    new_matches = mangle_urls(tree.xpath('//a[contains(text(), "More info")]/@href'))
    new_matches = set(new_matches) - matches_so_far
    matches_so_far |= new_matches
    matches += list(new_matches)
# Prefix relative links with the site root; absolute (off-site) links are dropped
matches = ['http://www.spareroom.co.uk' + url
           for url in matches if not url.startswith('http')]
print '\n'.join(matches)

I then saved the output to a file called ‘new_rooms’. With vim and some macro magic, I went through the links one at a time, considering each one, possibly sending a reply, and then moving the link to an ‘already_considered’ file. With a little command-line magic, the next time I generated a list of links I could remove the ones I’d already considered.
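The command-line part is a one-liner with grep (the file names ‘new_rooms’ and ‘already_considered’ are just the ones I used; `-f` reads the saved links as patterns from a file, `-F` treats them as literal strings rather than regexes, `-x` requires whole-line matches, and `-v` inverts the match):

```shell
# Keep only the links that don't appear in already_considered
grep -vxF -f already_considered new_rooms > unseen_rooms
```

Since `mangle_urls` canonicalises every link, identical listings always produce byte-identical lines, which is what makes the literal whole-line match safe.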

I’m not quite sure this justified the time invested, but it comes pretty close. If the script still works when you read this post, it may well be worth your time using it (since you don’t have to write it).

Kind regards,
Anon
