About that “true” offline RSS reader that I pitched in my last post, I’ll have you know that I made a minimally functioning system based on that outline.
These are the primary challenges/unknowns that I assessed from the outset:
1. Manipulating relative URLs of supporting files
2. Parsing HTML in Python
3. Searching and replacing within the HTML file
4. Downloaded .js files that include other .js files
For #1, Python’s urlparse library works wonders. For #2 and #3, look no further than Python’s HTMLParser module. This blog post helped me greatly. I have chosen not to address #4 at this time: I’m not downloading any JavaScript files right now, and the CSS and supporting images are mostly adequate.
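To make the first three items concrete, here is a minimal sketch of how they can fit together; it is not the code from my reader. It uses the Python 3 module names (`urllib.parse` and `html.parser`, the successors of `urlparse` and `HTMLParser`). The class name `AssetRewriter`, the `rewrite` helper, and the `asset-N` local file names are all made up for illustration, and it assumes every `<link href>` is worth saving, which is a simplification.

```python
import html
from html.parser import HTMLParser
from urllib.parse import urljoin


class AssetRewriter(HTMLParser):
    """Re-emit a page while pointing image and <link> URLs at local names.

    Hypothetical helper: the names and the local naming scheme are assumptions.
    """

    # (tag, attribute) pairs that reference supporting files.
    ASSET_ATTRS = {("img", "src"), ("link", "href")}

    def __init__(self, base_url):
        # Keep entities such as &amp; intact so they can be re-emitted as-is.
        super().__init__(convert_charrefs=False)
        self.base_url = base_url
        self.assets = {}   # absolute URL -> local file name
        self.output = []   # pieces of the rewritten page

    def _local_name(self, absolute_url):
        if absolute_url not in self.assets:
            self.assets[absolute_url] = "asset-%d" % len(self.assets)
        return self.assets[absolute_url]

    def _emit_tag(self, tag, attrs, closer):
        parts = [tag]
        for name, value in attrs:
            if value is None:          # bare attribute, e.g. <input disabled>
                parts.append(name)
                continue
            if (tag, name) in self.ASSET_ATTRS:
                # Resolve the relative URL against the page it came from.
                value = self._local_name(urljoin(self.base_url, value))
            parts.append('%s="%s"' % (name, html.escape(value, quote=True)))
        self.output.append("<" + " ".join(parts) + closer)

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, " />")

    def handle_endtag(self, tag):
        self.output.append("</%s>" % tag)

    def handle_data(self, data):
        self.output.append(data)

    def handle_entityref(self, name):
        self.output.append("&%s;" % name)

    def handle_charref(self, name):
        self.output.append("&#%s;" % name)

    def handle_comment(self, data):
        self.output.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.output.append("<!%s>" % decl)


def rewrite(page_url, markup):
    """Return (rewritten_html, {absolute_url: local_name})."""
    parser = AssetRewriter(page_url)
    parser.feed(markup)
    parser.close()
    return "".join(parser.output), parser.assets
```

Calling `rewrite("http://example.com/blog/post.html", page_source)` hands back the modified page plus a dictionary of absolute URLs that still need to be downloaded and saved under the local names.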
Further, I never had to build an XML parser by hand. Whenever a task felt like it was going to be too much work, such as parsing the XML feeds with Python’s low-level XML modules, a little searching revealed that all the hard work was already done. Parsing the RSS files, for instance, was rendered trivial by FeedParser.
Brief TODO list, for my own reference:
- Index the database tables in a sane manner
- Deal with exceptions thrown by malformed HTML
- Update the post table to indicate that a post has been “read” when it is accessed
- Implement HTTP redirection (since some RSS feeds apparently do that)
- Implement cache control so that the browser will properly refresh feed lists
- Add a stylesheet that will allow the server to control the appearance of links depending on whether or not the posts have been read
- Take into account non-ASCII encoding (really need to train myself to do this from the get-go)
- Forge user agent and referrer strings in HTTP requests, for good measure
- Slap some kind of UI prettiness on top of the whole affair; I’m thinking an accordion widget containing tables might work well, and I think there are a number of JavaScript libraries that could make that happen
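The indexing and “read” items from that list could look roughly like the following. This is only a sketch under assumptions: it uses SQLite via Python’s `sqlite3` module, and the `post` table layout (`feed_id`, `url`, `title`, `is_read`) and the function names are hypothetical, not my actual schema.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the (assumed) post table and its index."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS post (
               id INTEGER PRIMARY KEY,
               feed_id INTEGER NOT NULL,
               url TEXT NOT NULL UNIQUE,
               title TEXT,
               is_read INTEGER NOT NULL DEFAULT 0
           )"""
    )
    # Index the columns used when listing a feed's unread posts.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_post_feed_read ON post (feed_id, is_read)"
    )
    return conn


def mark_read(conn, post_id):
    """Flag a post as read; returns True if a row was updated."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("UPDATE post SET is_read = 1 WHERE id = ?", (post_id,))
    return cur.rowcount == 1


def unread_titles(conn, feed_id):
    """Return the titles of a feed's unread posts, oldest first."""
    rows = conn.execute(
        "SELECT title FROM post WHERE feed_id = ? AND is_read = 0 ORDER BY id",
        (feed_id,),
    )
    return [title for (title,) in rows]
```

The server would call something like `mark_read` from whichever handler serves a stored post, so the feed list can style read and unread links differently.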
Once I get that far, I’ll probably put some code out there. Based on what I have read, I’m not the only person who is looking for a solution like this.
I eventually released this software. Find it on GitHub.