feedfinder
Ultra-liberal feed finder
The feedfinder module is accessible via the fmtorrent.master.aggregator module.
http://diveintomark.org/projects/feed_finder/
Usage: getFeeds(uri) - returns list of feeds associated with this address
Example:
  >>> import feedfinder
  >>> feedfinder.getFeeds('http://diveintomark.org/')
  ['http://diveintomark.org/xml/atom.xml']
  >>> feedfinder.getFeeds('macnn.com')
  ['http://www.macnn.com/macnn.rdf']
It can also be used from the command line. Feeds are returned one per line:
  $ python feedfinder.py diveintomark.org
  http://diveintomark.org/xml/atom.xml
How it works:
0. At every step, feeds are minimally verified to make sure they are really feeds.
1. If the URI points to a feed, it is simply returned; otherwise the page is downloaded and the real fun begins.
- Feeds pointed to by LINK tags in the header of the page (autodiscovery)
- <A> links to feeds on the same server ending in ".rss", ".rdf", ".xml", or ".atom"
- <A> links to feeds on the same server containing "rss", "rdf", "xml", or "atom"
- <A> links to feeds on external servers ending in ".rss", ".rdf", ".xml", or ".atom"
- <A> links to feeds on external servers containing "rss", "rdf", "xml", or "atom"
- As a last ditch effort, we search Syndic8 for feeds matching the URI
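The <A> link heuristics listed above can be sketched as a small ranking function. This is an illustration only, not the module's actual code: the name rank_links, the host comparison used to decide "same server", and the assumption that the candidates are tried top to bottom are all hypothetical. It is written for Python 3 and skips the verification, autodiscovery, and Syndic8 steps.

```python
from urllib.parse import urljoin, urlparse

FEED_SUFFIXES = (".rss", ".rdf", ".xml", ".atom")
FEED_WORDS = ("rss", "rdf", "xml", "atom")


def rank_links(page_uri, hrefs):
    """Order candidate <A> hrefs (hypothetical helper, not part of feedfinder).

    Buckets, in order: same-server links ending in a feed suffix,
    same-server links containing a feed word, external links ending in
    a feed suffix, external links containing a feed word.
    """
    # Assumption: "same server" means the host part of the URL matches.
    page_host = urlparse(page_uri).netloc.lower()
    buckets = ([], [], [], [])
    for href in hrefs:
        absolute = urljoin(page_uri, href)  # resolve relative links
        lowered = absolute.lower()
        is_local = urlparse(absolute).netloc.lower() == page_host
        if lowered.endswith(FEED_SUFFIXES):
            index = 0 if is_local else 2
        elif any(word in lowered for word in FEED_WORDS):
            index = 1 if is_local else 3
        else:
            continue  # does not look like a feed link at all
        if absolute not in buckets[index]:
            buckets[index].append(absolute)
    # Flatten the buckets, keeping their priority order.
    return [uri for bucket in buckets for uri in bucket]
```

Each URI returned by such a function would still need the minimal verification from step 0 before being reported as a feed.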
Attributes
a __credits__
'Abe Fettig for a patch to sort Syndic8 feeds by popularity\nAlso Jason Diamond, Brian Lalor for bug reporting and patches'
a __history__
"\n1.1 - MAP - 2003/02/20 - added support for Robot Exclusion Standard. Will\nfetch /robots.txt once per domain and verify that URLs are allowed to be\ndownloaded. Identifies itself as\n rssfinder/<version> Python-urllib/<version> +http://diveintomark.org/projects/rss_finder/\n1.2 - MAP - 2004-01-09 - added Atom support, changed name, relicensed,\n don't query Syndic8 by default (pass querySyndic8=1 to getFeeds to do it anyway)\n"
Functions
Classes
C RobotFileParserFixed(...) ...
patched version of RobotFileParser, integrating fixes from Python 2.3a2 and bug 690214
This class contains 2 members.
C URLGatekeeper(...) ...
a class to track robots.txt rules across multiple servers
This class contains 3 members.
C BaseParser(...) ...
This class contains 4 members.
C LinkParser(...) ...
This class contains 6 members.
C ALinkParser(...) ...
This class contains 5 members.
See the source for more information.
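As a rough illustration of what "track robots.txt rules across multiple servers" might look like, here is a minimal Python 3 sketch built on the standard urllib.robotparser module. It is not the URLGatekeeper implementation: the class name RobotsCache, its methods, and the injected fetch_robots_txt callable are hypothetical, chosen so the example runs without network access.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


class RobotsCache:
    """Hypothetical per-server robots.txt tracker (not feedfinder's class)."""

    def __init__(self, user_agent, fetch_robots_txt):
        self.user_agent = user_agent
        # Callable taking a robots.txt URL and returning its text.
        self.fetch_robots_txt = fetch_robots_txt
        self.parsers = {}  # "scheme://host" -> RobotFileParser

    def _parser_for(self, url):
        parts = urlparse(url)
        domain = parts.scheme + "://" + parts.netloc
        if domain not in self.parsers:
            # Fetch and parse robots.txt only once per domain.
            parser = RobotFileParser()
            text = self.fetch_robots_txt(domain + "/robots.txt")
            parser.parse(text.splitlines())
            self.parsers[domain] = parser
        return self.parsers[domain]

    def can_fetch(self, url):
        """Return True if the cached rules allow this user agent to fetch url."""
        return self._parser_for(url).can_fetch(self.user_agent, url)
```

A caller would construct one RobotsCache, pass it a function that downloads the robots.txt text, and call can_fetch(url) before each download.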