Find That Feed update

The code that updates Find That Feed nightly from Share Your OPML is nice and stable. I get an email like this every day that tells me how it went:

Subject: SYO: sweep successful

When Started: 2004-02-04 03:30:05
When Finished: 2004-02-04 03:30:25
Feeds: 23607
Subscribers: 678
Subscriptions: 74357

Given the time, I’d like to periodically refresh the channel titles and, what the hell, make the descriptions searchable as well, by pulling both directly from the feeds, as described in the comments on this post.
I’ve also got some suggestions for new kinds of reports sitting in my Inbox. I haven’t forgotten them, just have a lot going on.
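
Pulling a channel's title and description straight from a feed could look something like the sketch below. It is not the code Find That Feed runs. It assumes an RSS 2.0 style document with a top-level `channel` element, and the function name `channel_info` is made up for illustration. Fetching the feed over the network, and handling Atom or RDF feeds, is left out.

```python
import xml.etree.ElementTree as ET


def channel_info(feed_xml):
    """Return (title, description) for an RSS 2.0 style feed, or None.

    Hypothetical helper: assumes <rss><channel>...</channel></rss> layout.
    """
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    if channel is None:
        # Not the layout this sketch understands (e.g. Atom or RDF).
        return None
    # findtext looks only at direct children, so item titles are ignored.
    title = (channel.findtext("title") or "").strip()
    description = (channel.findtext("description") or "").strip()
    return title, description
```

The returned pair could then be stored alongside the feed URL, so that the title is refreshed and the description becomes searchable.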

2 thoughts on “Find That Feed update”

  1. Yes, the canonical title of a channel can be pulled from the title data in the channel itself (call this the “author’s title”). However, I’m finding that the author’s title is often nondescript. An alternative approach is to maintain a list of _readers’_ (users’) titles, pulled from each reader’s OPML. A reader’s titles are presumably set initially, by the reader’s aggregator, to the author’s title. It appears that some readers then modify the title, making it more descriptive. The reader’s modified title is reported by their aggregator when they upload their feed list to SYO. My approach when manipulating and viewing the SYO SDK data is to select the two most frequently occurring readers’ titles and display the title in reports as “Top Title (Secondary Title)”.

    For example, using the SYO SDK data, here are the titles for the Scripting News feed, grouped and ordered by count:

    count title
    1 Scripting News 1
    1 htt_://
    9 Dave Winer
    382 Scripting News

    Nine readers share a title other than the author’s. I report the title as:
    Scripting News (Dave Winer)

    Another example, using Jon Udell’s feed:

    count title
    1 htt_://
    1 John Udell
    1 jon udell
    1 udell
    15 Jon Udell
    191 Jon’s Radio

    -> Jon’s Radio (Jon Udell)

    Lastly, an example using the Boing Boing feed:

    count title
    1 bOing bOing
    1 boingboing
    1 Boing Boing: A Directory of Wonderful Things
    1 Boingboing
    1 htt_://
    1 net.boingboing
    2 BoingBoing
    41 Boing Boing
    274 Boing Boing Blog

    In cases like this, the second title adds no extra information: it is a substring of the first. My code checks for that and leaves the secondary title out of the report, so this feed is shown simply as “Boing Boing Blog”.
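
The title rule described in the comment above (take the two most common reader titles, and drop the second when it is a substring of the first) can be sketched roughly as follows. This is a guess at the logic, not the commenter's actual code. The function name `display_title`, the input shape (a mapping of title to count), and the alphabetical tie-break are all assumptions made for the example.

```python
def display_title(title_counts):
    """Build a report title from a {reader_title: count} mapping.

    Hypothetical helper illustrating the "Top Title (Secondary Title)" rule.
    """
    if not title_counts:
        return ""
    # Most frequent first; ties broken alphabetically (an assumption).
    ranked = sorted(title_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    top = ranked[0][0]
    if len(ranked) < 2:
        return top
    second = ranked[1][0]
    # Skip the secondary title when it is contained in the top title.
    if second in top:
        return top
    return f"{top} ({second})"


scripting = {"Scripting News 1": 1, "htt_://": 1,
             "Dave Winer": 9, "Scripting News": 382}
boing = {"BoingBoing": 2, "Boing Boing": 41, "Boing Boing Blog": 274}

print(display_title(scripting))  # Scripting News (Dave Winer)
print(display_title(boing))      # Boing Boing Blog
```

The substring test here is case-sensitive. Whether the original code compares case-insensitively is not stated in the comment.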

