User:Mike-ZeleaCom/Difference feeder

The difference feed is a Web service providing an aggregate newsfeed of consensus-making discussions as they happen. A harvesting service crawls or subscribes to various discussion media that are compatible with difference bridging (mailing lists, Web forums, chat networks, microblogs and so forth), detects discussions that are focused on concrete differences of position, and collects a summary of the relevant messages into a single, aggregate newsfeed to which its clients may subscribe. The present page is a temporary scratch pad for hashing out the design of the harvester and is not necessarily up to date. The final design might be documented directly in the Java source code as it takes shape.

Here's an example of a message from a mailing list source. If the harvester had been subscribed to that source, then it would have received the message, parsed it and detected the embedded difference URL http://obsidian.reluk.ca:8080/v/w/Diff?b=3860&a=3891. If that URL points to the harvester's local difference bridge, and if the poster is one of the drafters named in the diff (Mike-ZeleaCom or ThomasvonderElbe GmxDe), then the message would have been accepted as relevant and a summary of it incorporated in the feed.

                   Discussion
  Client            Source
   * \                / *
      \              /
       \            /
   - - - - - - - - - - - - - - - - - - - - - -
  |      \        /                           |
          \      /
  |   1..* \    / 1..*                        |
          Difference ------------ Difference
  |        harvester     1     1..*   bridge     |
              \ 1
  |            \                              |
                \
  |              \ 1                          |
               Pollwiki
  |                                      site |
   - - - - - - - - - - - - - - - - - - - - - -

Each harvester service is located at a site (server cluster) anchored by a single pollwiki (bottom of diagram). It reads some of its config from the wiki, and it generates relative URLs to the wiki's poll, position and user pages as part of the feed content. It also works in conjunction with the site's difference bridge (right). It is only through detecting links to the bridge that it discovers relevant messages. The harvester may also be a client of the bridge's own discovery services. (Currently we're prototyping only a single difference bridge. That won't change till we implement free-range drafting across multiple media, in addition to MediaWiki. Till then, the harvester prototype can assume a single bridge.)

Functions

Discovery of message sources

   - none at first
       - we keep a page somewhere in the pollwiki
         that lists our dev-test sources (lists, forums, etc)
   - maybe later the harvester can discover new sources
     by detecting when they are added to that page
   - later still, the discovery might be largely automated with the help
     of the difference bridge
       - the bridge can provide its own feed of reverse URLs [1]
         (e.g. to list archives) where people are clicking on its diff links
       - the harvester can then attempt to trace back to the original sources (e.g. lists)

Harvesting of message sources

   - manual for now
       - so the admin has to do the work of configuring each mailing list etc.
   - we'll improve this later, teaching the harvester how to crawl/subscribe on its own
     to the various different media

Structure

Input from message sources

   - a web archive and a corresponding web-harvest crawling script for each source
       - we provide some of the scripts

Intermediate piping

   - scans crawled/incoming messages for diff URLs
       - only URLs to the local difference bridge
       - only where the message is sent by a user whose draft is referenced
         in the diff
           - so ID of message poster must equal or correlate
             to drafter ID (email address) obtained
             by forward tracing of the diff URL
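
The filter described above can be sketched roughly as follows. This is only an illustration, not the harvester's actual code: the class `DiffLinkFilter`, the interface `DrafterLookup` (a stand-in for forward tracing of the diff URL through the bridge) and the bridge URL prefix passed to the constructor are all hypothetical. Correlation of IDs is reduced here to a case-insensitive comparison of email addresses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the relevance filter; no name here comes from the real source. */
final class DiffLinkFilter {

    /** Stand-in for forward tracing: maps a diff URL to the drafters' email addresses. */
    interface DrafterLookup {
        Set<String> draftersOf(String diffUrl);
    }

    // Loose URL matcher: http(s):// followed by a run of non-space, non-quote characters.
    private static final Pattern URL = Pattern.compile("https?://[^\\s<>\"]+");

    private final String bridgePrefix; // assumed prefix of the local bridge's diff URLs

    DiffLinkFilter(String bridgePrefix) {
        this.bridgePrefix = bridgePrefix;
    }

    /** Returns the URLs in the message body that point to the local difference bridge. */
    List<String> localDiffUrls(String body) {
        List<String> found = new ArrayList<>();
        Matcher m = URL.matcher(body);
        while (m.find()) {
            // Drop sentence punctuation that the loose matcher may have swallowed.
            String url = m.group().replaceAll("[.,;:!?)]+$", "");
            if (url.startsWith(bridgePrefix)) {
                found.add(url);
            }
        }
        return found;
    }

    /** Relevant only if the poster is a drafter of some locally bridged diff in the body. */
    boolean isRelevant(String posterEmail, String body, DrafterLookup lookup) {
        for (String url : localDiffUrls(body)) {
            for (String drafter : lookup.draftersOf(url)) {
                if (drafter.equalsIgnoreCase(posterEmail)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

A caller would construct the filter once per site, for instance with the prefix `http://obsidian.reluk.ca:8080/v/w/Diff?`, and pass each incoming message through `isRelevant` before summarizing it.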

Storage of feed

A summarized version of each message is stored. It contains the related poll and the minimal necessary information, such as the difference URL and the URL of the post.
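
As a rough illustration of what one stored entry might hold, here is a minimal sketch. The class name `FeedEntry`, its field names and the idea of summarizing by truncating the message text are assumptions made for the example; the actual storage format is not decided on this page.

```java
/** Hypothetical shape of one stored feed entry; all names are illustrative only. */
final class FeedEntry {
    final String poll;          // the related poll
    final String differenceUrl; // diff URL detected in the message
    final String postUrl;       // URL of the original post
    final String excerpt;       // shortened message text (one assumed form of "summary")

    FeedEntry(String poll, String differenceUrl, String postUrl, String body, int maxChars) {
        this.poll = poll;
        this.differenceUrl = differenceUrl;
        this.postUrl = postUrl;
        this.excerpt = excerptOf(body, maxChars);
    }

    /** Collapses whitespace and cuts the text to at most maxChars characters plus "...". */
    static String excerptOf(String body, int maxChars) {
        String flat = body.trim().replaceAll("\\s+", " ");
        return flat.length() <= maxChars ? flat : flat.substring(0, maxChars) + "...";
    }
}
```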

Output of metadata

   - pollwiki URL
       - so clients can construct absolute URLs into the pollwiki
   - difference bridge URL
       - ditto
       - or put this in the 'diff' part of the feed?
         to support multiple bridges in future
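
To show how a client might combine the pollwiki URL from the metadata with a relative URL from the feed, here is a small sketch using `java.net.URI`. The class `PollwikiLinks` and the base URL `http://example.org/w` are made up for the example. One caveat worth noting: a relative reference whose first segment contains a colon, such as a `User:` page, must be written with a leading `./` (RFC 3986, section 4.2), otherwise the part before the colon is parsed as a URI scheme. The sketch assumes the feed emits references in that form.

```java
import java.net.URI;

/** Hypothetical client-side helper; the names are not taken from any existing client. */
final class PollwikiLinks {
    private final URI base;

    PollwikiLinks(String pollwikiUrl) {
        // Ensure a trailing slash; without it, resolve() would replace the last path segment.
        this.base = URI.create(pollwikiUrl.endsWith("/") ? pollwikiUrl : pollwikiUrl + "/");
    }

    /** Resolves a relative URL from the feed against the pollwiki base URL. */
    String absolute(String relative) {
        return base.resolve(relative).toString();
    }

    public static void main(String[] args) {
        PollwikiLinks links = new PollwikiLinks("http://example.org/w");
        System.out.println(links.absolute("Some_poll"));
        System.out.println(links.absolute("./User:Example"));
    }
}
```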

Output of feed

The feed is read from storage and assembled in response to each request.

   - request format
       - HTTP with parameters
   - result types
       - feeds
   - result formats
       [ JSONP
           - allows cross-origin requests despite browser's "same origin policy" restrictions
               / http://code.google.com/webtoolkit/doc/latest/DevGuideCodingBasicsJSON.html
             / - allowed anyway in Firefox 3.5:
             /   https://developer.mozilla.org/En/HTTP_access_control
             / - see also Cross-Origin Resource Sharing
             /   http://www.w3.org/TR/cors/
             // but JSONP is more standard
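
For reference, a JSONP response is just the JSON payload wrapped in a call to a function that the client names in a request parameter. A minimal server-side sketch follows. The class `Jsonp`, the method `wrap` and the callback name in the usage note are hypothetical. The callback is validated because it is echoed back into executable script, so an unchecked value would let a caller inject arbitrary JavaScript.

```java
import java.util.regex.Pattern;

/** Hypothetical helper that wraps a JSON payload for a JSONP response. */
final class Jsonp {
    // Accept only plain or dotted JavaScript identifiers as callback names.
    private static final Pattern CALLBACK = Pattern.compile(
        "[A-Za-z_$][A-Za-z0-9_$]*(\\.[A-Za-z_$][A-Za-z0-9_$]*)*");

    /** Returns the script body "callback(json);" or throws if the callback name is unsafe. */
    static String wrap(String callback, String json) {
        if (callback == null || !CALLBACK.matcher(callback).matches()) {
            throw new IllegalArgumentException("unsafe JSONP callback name");
        }
        return callback + "(" + json + ");";
    }
}
```

For a request carrying a callback parameter of `handleFeed`, the response body would be `Jsonp.wrap("handleFeed", json)`, served with a JavaScript content type.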

Tasks

   - integrate with clients other than Crossforum
       [ Atom (or RSS, whichever is best)
           / extended as necessary
           - for compatibility with general newsreader clients
   - make the client feed configurable and scalable for a large depot of messages