Finding the source of a feed

By Glendon Solsberry on 10 Jul 2013

I've been concerned for a while that Feedburner was going away; with Reader's disappearance, Feedburner seems like the next obvious choice to be put out to pasture. I needed to make sure that the feeds I publish aren't using Feedburner (I fixed that a while back), and also that the feeds I'm reading that currently go through Feedburner point to the right location.

I've been a paying Newsblur fan for a couple of years now (ever since the Reader redesign). It handles sync perfectly, but the way the system is designed, it can't work as a backend to any of the other clients out there. Still, I wanted to make sure I wasn't missing something awesome. rss2email caught my eye after I read about everyone's syncing issues.

rss2email keeps a local list of my subscriptions, so it's easy enough to get them:

 
r2e list

But I needed to find out whether those feed URLs pointed to the location that the site owner actually advertises. So, I wrote the following script:

 
r2e list | grep feedb | awk {'print '} | xargs -I{} python -c "from bs4 import BeautifulSoup; import feedparser; import urllib2; import pickle; url = '{}'; feed = feedparser.parse( url ); content = urllib2.urlopen(feed['channel']['link']).read(); soup = BeautifulSoup(content);
for link in soup.find_all('link', rel='alternate'):
  if url != link.get('href'):
    print url, link.get('href')
"

Note the newline after soup = BeautifulSoup(content);. It's necessary because Python doesn't allow a compound statement such as for to follow a semicolon on the same line; the loop has to start on a line of its own.
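
For anyone curious about the newline, here is a small sketch that checks the rule without involving a shell at all. It uses Python's built-in compile() on two made-up snippets (the variable names and the compiles() helper are mine, purely for illustration): one puts a for loop after a semicolon, the other starts the loop on its own line.

```python
# Two hypothetical snippets: the only difference is ";" versus a newline
# before the "for" loop.
one_line = "total = 0; for n in (1, 2, 3): total += n"
two_lines = "total = 0\nfor n in (1, 2, 3): total += n"


def compiles(source):
    """Return True if Python accepts the source, False on a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


print("after a semicolon:", compiles(one_line))
print("on its own line:  ", compiles(two_lines))
```

The first snippet is rejected and the second is accepted, which is the same reason the one-liner above needs a real line break inside the quoted string.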

Now, this script is designed specifically to work with data coming from rss2email, but it can easily be adapted to any list of RSS URLs.
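
If you'd rather not depend on rss2email, BeautifulSoup, or feedparser, the comparison step can be sketched with just the Python 3 standard library. This is a rough sketch rather than a drop-in replacement for the one-liner: the class and function names are my own, it assumes you have already fetched the site's HTML (for example with urllib.request.urlopen), and unlike the script above it only considers link tags whose type is RSS or Atom.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AlternateLinkParser(HTMLParser):
    """Collect href values of <link rel="alternate"> tags that advertise a feed."""

    # Assumption: only RSS and Atom types count as feeds.
    FEED_TYPES = ("application/rss+xml", "application/atom+xml")

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        kind = (attr.get("type") or "").lower()
        if "alternate" in rel and kind in self.FEED_TYPES and attr.get("href"):
            self.hrefs.append(attr["href"])


def advertised_feeds(html, base_url):
    """Return absolute feed URLs advertised in the page's link tags."""
    parser = AlternateLinkParser()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.hrefs]


def mismatches(subscribed_url, html, base_url):
    """Return advertised feed URLs that differ from the one you subscribe to."""
    return [u for u in advertised_feeds(html, base_url) if u != subscribed_url]
```

Calling mismatches() with a Feedburner URL, the page's HTML, and the page's own address returns whatever feed URLs the site advertises instead; an empty list means the subscription already matches.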

dp.cx blog