--- Log opened Sat Aug 13 00:00:20 2011
04:40 < conseo> hmm, at least you make some progress. i had to think about how to do the scraping and i think i have a plan now
04:41 < conseo> mcallan: the web-harvest based scraper works as expected, but i still have to bring it into the right shape
05:50 < conseo> mcallan: where do you get your drawings for UML in inkscape from? do you have some template file? we could share that then to keep things in a similar look (i plan to diagram the harvester stuff next)
05:54 < mcallan> conseo: here's what i use c, nothing fancy: http://zelea.com/_/uml.svg
05:56 < mcallan> mostly i just copy from old diagrams though.  you'll see all the svg source posted with my pngs
05:58 < mcallan> btw, if u don't know, you have to open one inkscape doc from another (using File|Open or whatever) or you can't copy/paste
06:13 < conseo> ok, thx
07:27 < conseo> you have to fine tune the arrows for every length in svg?
07:27 < conseo> i am not sure which diagram type to use to visualize the concept of the difference feed
07:27 < conseo> (uml diagram type)
07:29 < conseo> mcallan: my idea is to expose the diffservlet for message drops as well. so you can simply call the url with the necessary parameters and it will check the message for valid diffs and store it then, so we don't need any bindings and you can write your scraper in any way you like
07:30 < conseo> i have collected some thoughts during the last week, but most of it became obsolete after some more research
07:36 < mcallan> conseo: the check-and-store thing that scrapers call for each URL, it's exposed as a web api?  or java api?
07:36 < conseo> i will try to summarize my current status before doing an official diagram with inkscape
07:38 < conseo> atm. i bind in votorola in the classpath and can therefore use the current api from BeanShell. But this is unnecessary imo and there are many potential ways to gather "DiffMessages", either by scraping or by parsing or by getting from the db, ... even a bash script is possible if you have stuff stored in text files
07:38 < conseo> a web api would make that accessible from anywhere, right?
07:39 < mcallan> about diags: yes, it's good to postpone diagrams, because they are hard to keep in sync with code.  myself i never trust them, and always look at api docs.  (i only doc the top level of the arch)
07:39 < conseo> ok
07:39 < conseo> well, web api just means some url you can call with the respective post parameters for the diff message
07:41 < mcallan> i was only worried that a web api would be too slow for such a workhorse routine.  if it's java api, no problem ofc
07:42 < mcallan> i thought "web api", when you said "expose the diffservlet for message drops"
07:43 < conseo> i just parse any possible message and drop it there, no matter how it has been generated. the whole sense of it would be to have an interface that can be called from any scraping environment including javascript, ruby, xslt or anything else besides web-harvest
07:44 < mcallan> right, i was only asking whether that iface was web, or java
07:45 < mcallan> web is not going to work, if it's called once per message - could be millions of messages
07:45 < mcallan> (unless i misunderstand)
07:46 < conseo> mcallan: something like http://u.zelea.com:8080/v/w/xfDiff?action=check&title=[MG] new diff parser in place&author=4consensus@web.de&diffURLs=http://blabla/v/w/D?a=1234&b=3456,http://blabla...&url=http://pipermail.archive.com/votorola/2011-November/1333.html&...
07:46 < conseo> with proper encoding ofc
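A minimal sketch of the "proper encoding" step for a URL like the one above, assuming the servlet accepts ordinary form-encoded parameters. The class name `DiffDropUrl` and the method `build` are hypothetical; the parameter names and the base URL are taken from conseo's example, and nothing here is confirmed to be the actual xfDiff interface.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of votorola.
class DiffDropUrl {

    /** Appends form-encoded name=value pairs to the base URL as a query string. */
    static String build(String base, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            // URLEncoder applies application/x-www-form-urlencoded rules:
            // spaces become '+', reserved characters become %XX escapes.
            query.add(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return base + "?" + query;
    }

    public static void main(String[] args) {
        // Parameter names copied from the example in the log; values are placeholders.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "check");
        params.put("title", "[MG] new diff parser in place");
        params.put("author", "4consensus@web.de");
        params.put("diffURLs", "http://blabla/v/w/D?a=1234&b=3456");
        System.out.println(build("http://u.zelea.com:8080/v/w/xfDiff", params));
    }
}
```

Encoding each value separately matters here because the diff URLs themselves contain `?` and `&`, which would otherwise be read as part of the outer query string.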
07:48 < conseo> how is that not going to work? if we only send valid candidates (ones that really contain a diff-url), i don't see a single provider getting too many messages at once
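The "only send valid candidates" idea could be sketched as a pre-filter on the scraper side, so that only messages containing a diff URL are dropped to the servlet. The pattern below is an assumption modelled on the example `http://blabla/v/w/D?a=1234&b=3456`; the real diff URL format may differ, and `DiffUrlFilter` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-filter, not part of votorola.
class DiffUrlFilter {

    // Assumed shape of a diff URL: .../w/D?a=<number>&b=<number>
    private static final Pattern DIFF_URL =
        Pattern.compile("https?://[^\\s,]+/w/D\\?a=\\d+&b=\\d+");

    /** Returns every substring of the message body that looks like a diff URL. */
    static List<String> findDiffUrls(String messageBody) {
        List<String> found = new ArrayList<>();
        Matcher m = DIFF_URL.matcher(messageBody);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    /** A message is worth sending only if it contains at least one diff URL. */
    static boolean isCandidate(String messageBody) {
        return !findDiffUrls(messageBody).isEmpty();
    }
}
```

A scraper would call `isCandidate` on each message and skip the web call for everything else, which bears on mcallan's concern about one request per message.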
07:52 < conseo> i still feel pretty inexperienced, so please enlighten me :-). I felt uncomfortable with any other single solution i came up with last week (either using web-harvest, env-js (ecmascript fake-browser environment), xslt or some home-brewed java-*script env)
07:53 < conseo> i don't know what admins will use and we won't write the scrapers ourselves, so i would like to leave it to them as far as possible. web-harvest is a good default entry point for people not experienced with scraping, but it is also a bit bloated and not necessarily the best solution everywhere
07:53 < mcallan> i may simply misunderstand, do you have skype - just for 15 minutes?
07:54 < conseo> yep
07:55 < mcallan> ok, i'll ring you in a few mins...
07:55 < conseo> ok
08:00 < mcallan> ur not online yet...
08:00 < mcallan> u ring me
08:05 < conseo> hmm, skype shows online buddies but not u
08:05 < conseo> restarting skype...
08:13 < conseo> ok, it was my sound setting, skype had a problem with it
08:14 < conseo> mcallan: but u r no longer online
09:38 < mcallan> http://zelea.com/w/Category:Forum
--- Log closed Sun Aug 14 00:00:36 2011