{"id":2342,"date":"2010-03-29T08:19:56","date_gmt":"2010-03-29T16:19:56","guid":{"rendered":"http:\/\/multimedia.cx\/eggs\/?p=2342"},"modified":"2012-04-22T22:26:52","modified_gmt":"2012-04-23T05:26:52","slug":"my-own-offline-rss-reader-part-2","status":"publish","type":"post","link":"https:\/\/multimedia.cx\/eggs\/my-own-offline-rss-reader-part-2\/","title":{"rendered":"My Own Offline RSS Reader (Part 2)"},"content":{"rendered":"<p>About that <a href=\"http:\/\/multimedia.cx\/eggs\/my-own-offline-rss-reader\/\">&#8220;true&#8221; offline RSS reader that I pitched in my last post<\/a>, I&#8217;ll have you know that I made a minimally functioning system based on that outline.<\/p>\n<p>These are the primary challenges\/unknowns that I assessed from the outset:<\/p>\n<ol>\n<li>Manipulating relative URLs of supporting files<\/li>\n<li>Parsing HTML in Python<\/li>\n<li>Searching and replacing within the HTML file<\/li>\n<li>Downloaded .js files that include other .js files<\/li>\n<\/ol>\n<p>For #1, Python&#8217;s <a href=\"http:\/\/docs.python.org\/library\/urlparse.html\">urlparse library<\/a> works wonders. For #2 and #3, look no farther than Python&#8217;s <a href=\"http:\/\/docs.python.org\/library\/htmlparser.html\">HTMLParser module<\/a>. <a href=\"http:\/\/unethicalblogger.com\/node\/180\">This blog post<\/a> helped me greatly. I have chosen not to address #4 at this time. I&#8217;m not downloading any JavaScript files right now; the CSS and supporting images are mostly adequate.<\/p>\n<p>Further, it turned out not to be necessary to manually build an XML parser. Whenever I encountered a task that felt like it was going to be too much work &#8212; like manually parsing the XML feeds using Python&#8217;s low-level XML systems &#8212; a little searching revealed that all the hard work was already done. 
In the case of parsing the RSS files, the task was rendered trivial thanks to <a href=\"http:\/\/feedparser.org\/\">FeedParser<\/a>.<\/p>\n<p>Brief TODO list, for my own reference:<\/p>\n<ul>\n<li>Index the database tables in a sane manner<\/li>\n<li>Deal with exceptions thrown by malformed HTML<\/li>\n<li>Update the post table to indicate that a post has been &#8220;read&#8221; when it is accessed<\/li>\n<li>Implement HTTP redirection (since some RSS feeds apparently do that)<\/li>\n<li>Implement cache control so that the browser will properly refresh feed lists<\/li>\n<li>Add a stylesheet that will allow the server to control the appearance of links depending on whether or not the posts have been read<\/li>\n<li>Take into account non-ASCII encoding (really need to train myself to do this from the get-go)<\/li>\n<li>Forge user agent and referrer strings in HTTP requests, for good measure<\/li>\n<li>Slap some kind of UI prettiness on top of the whole affair; I&#8217;m thinking an accordion widget containing tables might work well, and I think there are a number of JavaScript libraries that could make that happen<\/li>\n<\/ul>\n<p>Once I get that far, I&#8217;ll probably put some code out there. Based on what I have read, I&#8217;m not the only person who is looking for a solution like this.<\/p>\n<p><strong>I eventually released this software. 
<a href=\"https:\/\/github.com\/multimediamike\/GhettoRSS\">Find it on Github.<\/a><\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>I got my true offline RSS reader idea working<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[55],"tags":[],"class_list":["post-2342","post","type-post","status-publish","format-standard","hentry","category-python"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/2342","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/comments?post=2342"}],"version-history":[{"count":9,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/2342\/revisions"}],"predecessor-version":[{"id":3800,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/posts\/2342\/revisions\/3800"}],"wp:attachment":[{"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/media?parent=2342"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/categories?post=2342"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/multimedia.cx\/eggs\/wp-json\/wp\/v2\/tags?post=2342"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}