feedfinder

Ultra-liberal feed finder

http://diveintomark.org/projects/feed_finder/

Usage: getFeeds(uri) - returns list of feeds associated with this address

Example:
  >>> import feedfinder
  >>> feedfinder.getFeeds('http://diveintomark.org/')
  ['http://diveintomark.org/xml/atom.xml']
  >>> feedfinder.getFeeds('macnn.com')
  ['http://www.macnn.com/macnn.rdf']

Can also be used from the command line. Feeds are returned one per line:
  $ python feedfinder.py diveintomark.org
  http://diveintomark.org/xml/atom.xml

How it works:
  0. At every step, feeds are minimally verified to make sure they are really feeds.
  1. If the URI points to a feed, it is simply returned; otherwise the page is downloaded and the real fun begins.
  2. Feeds pointed to by LINK tags in the header of the page (autodiscovery)
  3. <A> links to feeds on the same server ending in ".rss", ".rdf", ".xml", or ".atom"
  4. <A> links to feeds on the same server containing "rss", "rdf", "xml", or "atom"
  5. <A> links to feeds on external servers ending in ".rss", ".rdf", ".xml", or ".atom"
  6. <A> links to feeds on external servers containing "rss", "rdf", "xml", or "atom"
  7. As a last ditch effort, we search Syndic8 for feeds matching the URI
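
The link-based steps in the list above (autodiscovery LINK tags first, then same-server <A> links, then external <A> links) can be illustrated with a short sketch. This is not the module's own code: it uses the Python 3 standard library, and the names CandidateParser, rank_candidates, FEED_TYPES and FEED_WORDS are hypothetical, as is the exact set of MIME types accepted for autodiscovery. It only orders candidate URLs; it does not download or verify them, and it does not query Syndic8.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed MIME types for autodiscovery; the real module may accept others.
FEED_TYPES = {"application/rss+xml", "application/atom+xml", "text/xml"}
FEED_WORDS = ("rss", "rdf", "xml", "atom")


class CandidateParser(HTMLParser):
    """Collect autodiscovery <link> hrefs and plain <a> hrefs separately."""

    def __init__(self, baseuri):
        super().__init__()
        self.baseuri = baseuri
        self.link_feeds = []
        self.a_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        url = urljoin(self.baseuri, href)  # resolve relative links
        if tag == "link":
            rels = (attrs.get("rel") or "").lower().split()
            mime = (attrs.get("type") or "").lower()
            if "alternate" in rels and mime in FEED_TYPES:
                self.link_feeds.append(url)
        elif tag == "a":
            self.a_links.append(url)


def ends_like_feed(url):
    """True if the URL ends in .rss, .rdf, .xml or .atom."""
    return url.lower().endswith(tuple("." + w for w in FEED_WORDS))


def mentions_feed(url):
    """True if the URL contains rss, rdf, xml or atom anywhere."""
    return any(w in url.lower() for w in FEED_WORDS)


def is_local(url, baseuri):
    """True if the URL is on the same server as the base URI."""
    return urlparse(url).netloc.lower() == urlparse(baseuri).netloc.lower()


def rank_candidates(html, baseuri):
    """Return candidate feed URLs, most trustworthy tier first, no duplicates."""
    parser = CandidateParser(baseuri)
    parser.feed(html)
    local = [u for u in parser.a_links if is_local(u, baseuri)]
    external = [u for u in parser.a_links if not is_local(u, baseuri)]
    tiers = [
        parser.link_feeds,
        [u for u in local if ends_like_feed(u)],
        [u for u in local if mentions_feed(u)],
        [u for u in external if ends_like_feed(u)],
        [u for u in external if mentions_feed(u)],
    ]
    ordered = []
    for tier in tiers:
        for url in tier:
            if url not in ordered:
                ordered.append(url)
    return ordered
```

In a complete finder, each tier would be verified in turn and the search would stop at the first tier that yields real feeds; the sketch leaves that policy to the caller.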

Attributes

a __credits__

Abe Fettig for a patch to sort Syndic8 feeds by popularity
Also Jason Diamond, Brian Lalor for bug reporting and patches

a __history__

1.1 - MAP - 2003/02/20 - added support for Robot Exclusion Standard.  Will
fetch /robots.txt once per domain and verify that URLs are allowed to be
downloaded.  Identifies itself as
  rssfinder/<version> Python-urllib/<version> +http://diveintomark.org/projects/rss_finder/
1.2 - MAP - 2004-01-09 - added Atom support, changed name, relicensed,
  don't query Syndic8 by default (pass querySyndic8=1 to getFeeds to do it anyway)

Functions

f getLinks(data, baseuri) ...

f getALinks(data, baseuri) ...

f getLocalLinks(links, baseuri) ...

f isFeedLink(link) ...

f isFeed(uri) ...

f sortFeeds(feed1Info, feed2Info) ...

f getFeeds(uri, querySyndic8=0) ...

f test() ...

Classes

C RobotFileParserFixed(...) ...

patched version of RobotFileParser, integrating fixes from Python 2.3a2 and bug 690214

This class contains 2 members.

C URLGatekeeper(...) ...

a class to track robots.txt rules across multiple servers

This class contains 3 members.
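
As a rough illustration of tracking robots.txt rules across multiple servers, the sketch below keeps one parser per scheme and host and fetches /robots.txt only the first time a server is seen. It is not URLGatekeeper itself: RobotsCache, its methods, and the user-agent string are hypothetical, it relies on the Python 3 standard library's urllib.robotparser, and treating an unreachable robots.txt as "no restrictions" is an assumption made for the example.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


class RobotsCache:
    """Hypothetical helper: one RobotFileParser per scheme and host."""

    def __init__(self, user_agent="example-feedfinder"):
        self.user_agent = user_agent
        self._rules = {}  # "scheme://host" -> RobotFileParser

    def _domain(self, url):
        parts = urlparse(url)
        return f"{parts.scheme}://{parts.netloc}"

    def load_rules(self, domain, robots_lines):
        """Install rules for a domain from already-fetched robots.txt lines."""
        parser = RobotFileParser()
        parser.parse(robots_lines)
        self._rules[domain] = parser

    def can_fetch(self, url):
        """Check the URL against its server's rules, fetching them once."""
        domain = self._domain(url)
        if domain not in self._rules:
            parser = RobotFileParser(domain + "/robots.txt")
            try:
                parser.read()  # network access, once per domain
            except OSError:
                # Assumption: an unreachable robots.txt means no restrictions.
                parser.parse([])
            self._rules[domain] = parser
        return self._rules[domain].can_fetch(self.user_agent, url)
```

A caller would consult can_fetch before every download, so each server's rules are requested at most once per run.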

C BaseParser(...) ...

This class contains 4 members.

C LinkParser(...) ...

This class contains 6 members.

C ALinkParser(...) ...

This class contains 5 members.

See the source for more information.