Background: I am using urllib.urlretrieve, rather than any other function in the urllib* modules, because it supports a hook function (see reporthook below), which I use to display a textual progress bar. This is Python >= 2.6.

>>> urllib.urlretrieve(url[, filename[, reporthook[, data]]])

However, urlretrieve is so dumb that it leaves no way to detect the status of the HTTP request (e.g., was it 404 or 200?).

>>> fn, h = urllib.urlretrieve('http://google.com/foo/bar')
>>> h.items() 
[('date', 'Thu, 20 Aug 2009 20:07:40 GMT'),
 ('expires', '-1'),
 ('content-type', 'text/html; charset=ISO-8859-1'),
 ('server', 'gws'),
 ('cache-control', 'private, max-age=0')]
>>> h.status
''
>>>

What is the best known way to download a remote HTTP file with hook-like support (to show a progress bar) and decent HTTP error handling?

Check out urllib.urlretrieve’s complete code:

def urlretrieve(url, filename=None, reporthook=None, data=None):
  global _urlopener
  if not _urlopener:
    _urlopener = FancyURLopener()
  return _urlopener.retrieve(url, filename, reporthook, data)

In other words, you can use urllib.FancyURLopener (it’s part of the public urllib API). You can override http_error_default to detect 404s:

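import urllib

class MyURLopener(urllib.FancyURLopener):
  def http_error_default(self, url, fp, errcode, errmsg, headers):
    # Handle errors the way you'd like to. Here we raise instead of
    # returning the error page as if it were the requested file.
    fp.close()
    raise IOError('http error', errcode, errmsg, headers)

fn, h = MyURLopener().retrieve(url, reporthook=my_report_hook)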
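The my_report_hook passed to retrieve above is never defined in this answer. As a rough sketch only, a hook producing a textual progress bar might look like the following. The names format_progress and width are made up for illustration; the three hook arguments (block count, block size, total size) are what urlretrieve passes to reporthook, with total size being -1 when the server sends no Content-Length.

```python
import sys

def format_progress(block_count, block_size, total_size, width=40):
    """Return a one-line textual progress bar (hypothetical helper)."""
    if total_size <= 0:
        # No Content-Length was sent, so only the byte count is known.
        return "%d bytes" % (block_count * block_size)
    # The last block is usually partial, so cap the byte count at the total.
    done = min(block_count * block_size, total_size)
    filled = width * done // total_size
    percent = 100 * done // total_size
    return "[%s%s] %3d%%" % ("#" * filled, "-" * (width - filled), percent)

def my_report_hook(block_count, block_size, total_size):
    """Hook with the argument order that urlretrieve uses for reporthook."""
    # "\r" returns to the start of the line so the bar redraws in place.
    sys.stdout.write("\r" + format_progress(block_count, block_size, total_size))
    sys.stdout.flush()
```

Keeping the formatting in its own function makes the bar easy to check without downloading anything.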

You should use:

import urllib2

try:
    resp = urllib2.urlopen("http://www.google.com/this-gives-a-404/")
except urllib2.HTTPError as e:
    # HTTPError doubles as a response object (code, msg, read()).
    # Other URLErrors, such as connection failures, still propagate.
    resp = e

print "Gave", resp.code, resp.msg
print "=" * 80
print resp.read(80)
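The snippet above gives you the status code but no progress hook. One way to get both, sketched here under assumptions, is to read the response in chunks yourself and call a hook after each chunk. copy_with_progress and its parameters are hypothetical names, not part of urllib2; the hook is called with the same three arguments urlretrieve uses, and an in-memory buffer stands in for the response so the sketch runs without a network.

```python
import io

def copy_with_progress(src, dst, hook=None, total_size=-1, chunk_size=8192):
    """Copy src to dst in chunks, calling hook(block_count, chunk_size, total_size)."""
    block_count = 0
    if hook:
        # Initial call with zero blocks, mirroring what urlretrieve does.
        hook(block_count, chunk_size, total_size)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break  # an empty read means end of stream
        dst.write(chunk)
        block_count += 1
        if hook:
            hook(block_count, chunk_size, total_size)
    return block_count

# Stand-in for the urlopen response: 20000 bytes held in memory.
calls = []
src = io.BytesIO(b"x" * 20000)
dst = io.BytesIO()
blocks = copy_with_progress(src, dst, hook=lambda *args: calls.append(args),
                            total_size=20000)
```

With a real response you would check resp.code first, then pass resp as src, a file opened in 'wb' mode as dst, and the Content-Length header (if present) converted to int as total_size.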

Edit: The rationale here is that unless you expect the exceptional state, it is an exception for it to happen, and you probably didn’t even think about it. So instead of letting your code continue to run after an unsuccessful request, the default behavior is, quite sensibly, to stop its execution.

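The URL opener object’s “retrieve” method supports the reporthook and raises an exception on 404. With a plain urllib.URLopener the exception is an IOError; FancyURLopener returns the error page instead.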

http://docs.python.org/library/urllib.html#url-opener-objects