New interface

Now that the first-pass sweep is finished, we have a new interface for Find That Feed. The feed URL is hyperlinked from the white-on-orange XML button that appears next to each search result. If we were able to find the HTML URL for the feed, it is hyperlinked from the feed title. If not, the feed title is left as plain text.
There will be a fair number of non-hyperlinked feed titles. The sweep had a roughly 12% failure rate.

  Total: 22204 feeds
  Errors: 2719 feeds

Here’s a breakdown with the top error sources:

  404 Not Found: 616 feeds
  XML not well-formed: 335 feeds
  Network timeout: 294 feeds
  Got HTML not XML: 246 feeds
  Connection refused: 126 feeds
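
For anyone curious how a sweep might sort failures into buckets like these, here is a minimal sketch. It is not the code that produced the numbers above. The function names (`http_fetch`, `classify`), the 30-second timeout, and the HTML-sniffing heuristic are all assumptions made for illustration; only the bucket labels come from the breakdown.

```python
import socket
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET


def http_fetch(url, timeout=30):
    """Default fetcher (hypothetical): return the response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def classify(url, fetch=http_fetch):
    """Fetch url and return 'OK' or a short error label."""
    try:
        body = fetch(url)
    except urllib.error.HTTPError as err:
        return "404 Not Found" if err.code == 404 else f"HTTP {err.code}"
    except urllib.error.URLError as err:
        # urlopen wraps connection-level failures; the cause is in .reason
        if isinstance(err.reason, socket.timeout):
            return "Network timeout"
        if isinstance(err.reason, ConnectionRefusedError):
            return "Connection refused"
        return "Other network error"
    except socket.timeout:
        # A timeout while reading the body can be raised unwrapped
        return "Network timeout"
    except ConnectionRefusedError:
        return "Connection refused"

    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Tag-soup HTML usually fails to parse as XML, so sniff for it
        head = body[:512].lower()
        if b"<html" in head or b"<!doctype html" in head:
            return "Got HTML not XML"
        return "XML not well-formed"
    # Well-formed XHTML parses cleanly but is still not a feed
    if root.tag.lower().endswith("html"):
        return "Got HTML not XML"
    return "OK"
```

Passing the fetcher in as a parameter keeps the classification logic testable without touching the network; running `collections.Counter(classify(u) for u in urls)` over a list of feed URLs would then give a tally in the same shape as the one above.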

8 thoughts on “New interface”

  1. Here’s an obvious idea: if two feeds point at the same website, can we group them together? This could solve the problem that Dave proposed rssHints for, but without changes to existing RSS.

    Or am I ignoring something that would make this infeasible?

  2. Using the URL as a basis (as I understand Shimon is proposing) should be done carefully, I think, since sometimes feeds might differ only in the final name and yet point to different categories within the same website/weblog, and so might be subsets, or maybe not feeds of the same weblog at all. That aside, it should be doable. Getting the LINK item from the CHANNEL should clear up those ambiguities, but it implies downloading the actual feed to do the check; the OPML alone wouldn’t have enough information. Just my 2c :)

  3. Thanks for the comments, Diego. I’m doing exactly what you describe: downloading the actual feeds to do the check. And getting an education in real-world feed parsing in the process.

  4. I’m curious to know how the name of the feed is obtained – a search on “raw”, the title of my blog, doesn’t find it, yet a search on “danny” does. The title is given in the markup both on the blog and the feed itself.

    Re. feed parsing – such fun!

  5. Hi Danny,

    The name of the feed is taken from the “text” field of the first SYO SDK OPML file I run across that contains the feed URL. Presumably the outcome is a snapshot of what the feed name was at some point in history.

  6. Right, I can get a more up-to-date title in my initial sweep. Not much extra coding hopefully.

    The nagging question is whether to go back and periodically refresh *all* feed data, not just the latest from SYO. Sounds worthwhile, yet somehow not very exciting to code :->

  7. Thanks Andrew – I’m still a little puzzled (was the feed ever called that?) but it makes sense.

    I’m somewhat biased, not liking the way Dave’s approached this (and loathing OPML!), but I think as a general principle it would probably be best for a system to be as loosely connected to some other centralised service as possible. So some form of periodic *direct* update would probably be a good idea (though you’re right, it does sound dull as dishwater codewise 😉).
    If you could autodiscover (*cough* scrape *cough*) the blogrolls of the targets at the same time, all the better.
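
The grouping idea discussed in the comments (read the LINK and TITLE from each downloaded feed's CHANNEL, then bucket feeds by site) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code behind Find That Feed: the names `channel_info`, `site_key` and `group_by_site` are made up here, and the sketch only handles plain, non-namespaced RSS 0.9x/2.0 layouts, so RSS 1.0 (RDF) and Atom feeds would need their own handling.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from urllib.parse import urlsplit


def channel_info(feed_xml):
    """Return (title, link) from an RSS <channel>, or (None, None)."""
    try:
        root = ET.fromstring(feed_xml)
    except ET.ParseError:
        return None, None
    title = (root.findtext("channel/title") or "").strip() or None
    link = (root.findtext("channel/link") or "").strip() or None
    return title, link


def site_key(link):
    """Normalise a site URL so trivial variants compare equal."""
    parts = urlsplit(link)
    return f"{parts.scheme}://{parts.netloc.lower()}{parts.path.rstrip('/')}"


def group_by_site(feeds):
    """Map each site key to the feed URLs whose channel links to it.

    feeds is a dict of feed URL -> already-downloaded feed XML.
    """
    groups = defaultdict(list)
    for feed_url, feed_xml in feeds.items():
        _title, link = channel_info(feed_xml)
        if link:  # feeds without a usable link stay ungrouped
            groups[site_key(link)].append(feed_url)
    return dict(groups)
```

As Diego's caveat suggests, category feeds that report the same channel link would land in one group under this scheme, so a shared key means "same site", not "same content". The title returned by `channel_info` is the feed's current one, which is what a periodic direct refresh would pick up.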
