Opinions on shortened URLs are a dime a dozen these days, but the basic facts are:
- They’re awfully convenient for passing around (and this was true even before Twitter came about)
- They are, by nature, short-lived (either the services or the URLs)
- You should never rely on their being around later on
So basically you have no excuse for not being able to handle them. I decided to mess around with the concept a few weeks back to see how simply I could make the whole thing work, and came up with a couple of useful Python classes that I can share with the world:
Creating short URLs
The trouble with creating short URLs is that there are entirely too many shortening services, and far too many variations on their APIs. In fact, nearly all of them suffer from "not invented here" syndrome and try to "enhance" their APIs with a lot of stuff you don't (ever) need, wrapping their results in JSON or XML.
Me, I refuse to put up with that kind of crap.
So I poked around a bit, found the simplest services to work against and created the following class, which will try all its known services in turn until it gives you a working URL:
import urllib, urlparse, httplib

BITLY_AUTH = 'login=foo&apiKey=bar'

class URLShortener:

    # each service maps to the GET path of its plain-text API
    services = {
        'api.bit.ly': "/shorten?version=2.0.1&%s&format=text&longUrl=" % BITLY_AUTH,
        'api.tr.im': '/api/trim_simple?url=',
        'tinyurl.com': '/api-create.php?url=',
        'is.gd': '/api.php?longurl='
    }

    def query(self, url):
        """ Try each known service in turn until one returns a working short URL """
        for shortener in self.services.keys():
            c = httplib.HTTPConnection(shortener)
            c.request("GET", self.services[shortener] + urllib.quote(url))
            r = c.getresponse()
            shorturl = r.read().strip()
            # a plain-text API returns just the short URL; anything else is an error
            if ("Error" not in shorturl) and shorturl.startswith("http://"):
                return shorturl
        raise IOError
Yes, the error handling is naïve (any network exceptions ought to be caught upstream from this), but it has worked fine so far.
Expanding short URLs
This is the really fun bit, because it is not immediately obvious whether a short URL will actually be immediately useful: there are plenty of times when you'll actually be redirected to something else, and while fooling around with the Google Reader API (something I'll eventually write about later), I found that also applied (in spades) to Feedburner links and whatnot.
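Expanding works because these services answer with an HTTP redirect, so all you really need is the Location response header. Here's a self-contained illustration in modern Python 3 (where httplib became http.client), using a throwaway local server in place of a real shortening service:

```python
import http.client
import http.server
import threading

# a throwaway local handler that redirects everything,
# standing in for a real shortening service
class _Redirector(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(301)
        self.send_header("Location", "http://example.com/expanded")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet

server = http.server.HTTPServer(("127.0.0.1", 0), _Redirector)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
conn.request("GET", "/abc123")
location = conn.getresponse().getheader("Location")
print(location)  # the URL the "shortener" redirects to
server.shutdown()
```

If the response carries no Location header, there was no redirect and the URL was already the real thing, which is exactly the case the class below has to handle.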
So I decided to build some smarts into the process and have it not only resolve some known hosts twice, but also turn it into a link checker of sorts that learns which hosts were actually redirecting to other places:
import urlparse, httplib

class URLExpander:

    def __init__(self):
        # known shortening services
        self.shorteners = ['tr.im', 'is.gd', 'tinyurl.com', 'bit.ly', 'snipurl.com',
                           'cli.gs', 'feedproxy.google.com', 'feeds.arstechnica.com']
        # services that redirect to yet another short URL
        self.twofers = [u'\u272Adf.ws']
        # hosts we have already checked (instance state, so that
        # pickling the object preserves what it has learned)
        self.learned = []

    def resolve(self, url, components):
        """ Try to resolve a single URL by following its redirect """
        c = httplib.HTTPConnection(components.netloc)
        c.request("GET", components.path)
        r = c.getresponse()
        l = r.getheader('Location')
        if l is None:
            return url  # it might be impossible to resolve, so best leave it as is
        return l

    def query(self, url, recurse=True):
        """ Resolve a URL """
        components = urlparse.urlparse(url)
        # check services that redirect twice first
        if (components.netloc in self.twofers) and recurse:
            return self.query(self.resolve(url, components), False)
        # then check known shortening services
        if components.netloc in self.shorteners:
            return self.resolve(url, components)
        # if we haven't seen this host before, ping it, just in case
        if components.netloc not in self.learned:
            ping = self.resolve(url, components)
            if ping != url:
                self.shorteners.append(components.netloc)
            self.learned.append(components.netloc)
            return ping
        # the original URL was OK
        return url
This one’s a bit more convoluted but has turned out to be very useful indeed, and you can simply pickle the whole object to preserve its learned hosts.
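A quick sketch of that persistence, using a trimmed-down stand-in class (note that the learned hosts need to live on the instance rather than on the class, since pickle only carries instance attributes along):

```python
import pickle

# a trimmed-down stand-in with the same learned-hosts attribute
class URLExpander:
    def __init__(self):
        self.shorteners = ['tr.im', 'is.gd', 'tinyurl.com', 'bit.ly']
        self.learned = []

expander = URLExpander()
expander.learned.append('feedproxy.google.com')  # pretend we learned this

# round-trip through pickle: the learned hosts survive
blob = pickle.dumps(expander)
restored = pickle.loads(blob)
print(restored.learned)  # ['feedproxy.google.com']
```

In practice you'd dump blob to a file when your script exits and load it on startup, so the expander gets smarter across runs.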