--- Log opened Sat Aug 13 00:00:20 2011
04:40 < conseo> hmm, at least you make some progress. i had to think about how to do the scraping and i think i have a plan now
04:41 < conseo> mcallan: the web-harvest based scraper works as expected, but i still have to bring it into the right shape
05:50 < conseo> mcallan: where do you get your drawings for UML in inkscape from? do you have some template file? we could share that then to keep things in a similar look (i plan to diagram the harvester stuff next)
05:54 < mcallan> conseo: here's what i use c, nothing fancy: http://zelea.com/_/uml.svg
05:56 < mcallan> mostly i just copy from old diagrams though. you'll see all the svg source posted with my pngs
05:58 < mcallan> btw, if u don't know, you have to open one inkscape doc from another (using File|Open or whatever) or you can't copy/paste
06:13 < conseo> ok, thx
07:27 < conseo> you have to fine tune the arrows for every length in svg?
07:27 < conseo> i am not sure which diagram type to use to visualize the concept of the difference feed
07:27 < conseo> (uml diagram type)
07:29 < conseo> mcallan: my idea is to expose the diffservlet for message drops as well. so you can simply call the url with the necessary parameters and it will check the message for valid diffs and store it then, so we don't need any bindings and you can write your scraper in any way you like
07:30 < conseo> i have collected some thoughts during the last week, but most of it became obsolete after some more research
07:36 < mcallan> conseo: the check-and-store thing that scrapers call for each URL, it's exposed as a web api? or java api?
07:36 < conseo> i will try to summarize my current status before doing an official diagram with inkscape
07:38 < conseo> atm. i bind in votorola in the classpath and can therefore use the current api from BeanShell. But this is unnecessary imo and there are many potential ways to gather "DiffMessages", either by scraping or by parsing or by getting from the db, ... even a bash script is possible if you have stuff stored in text files
07:38 < conseo> a web api would make that accessible from anywhere, right?
07:39 < mcallan> about diags: yes, it's good to postpone diagrams, because they are hard to keep in sync with code. myself i never trust them, and always look at api docs. (i only doc the top level of the arch)
07:39 < conseo> ok
07:39 < conseo> well, by web api i just mean some url you can call with the respective post parameters for the diff message
07:41 < mcallan> i was only worried that a web api would be too slow for such a workhorse routine. if it's java api, no problem ofc
07:42 < mcallan> i thought "web api", when you said "expose the diffservlet for message drops"
07:43 < conseo> i just parse any possible message and drop it there, no matter how it has been generated. the whole point of it would be to have an interface that can be called from any scraping environment including javascript, ruby, xslt or anything else besides web-harvest
07:44 < mcallan> right, i was only asking whether that iface was web, or java
07:45 < mcallan> web is not going to work, if it's called once per message - could be millions of messages
07:45 < mcallan> (unless i misunderstand)
07:46 < conseo> mcallan: something like http://u.zelea.com:8080/v/w/xfDiff?action=check&title=[MG] new diff parser in place&author=4consensus@web.de&diffURLs=http://blabla/v/w/D?a=1234&b=3456,http://blabla...&url=http://pipermail.archive.com/votorola/2011-November/1333.html&...
07:46 < conseo> with proper encoding ofc
07:48 < conseo> how is that not going to work? if we only send valid candidates (ones that really contain a diff-url), i don't see a single provider getting too many messages at once
07:52 < conseo> i still feel pretty inexperienced, so please enlighten me :-). I felt uncomfortable with any other single solution i came up with last week (either using web-harvest, env-js (ecmascript fake-browser environment), xslt or some home-brewed java-*script env)
07:53 < conseo> i don't know what admins will use and we won't write the scrapers ourselves, so i would like to leave it to them as far as possible. web-harvest is a good default entry point for people not experienced with scraping, but it is also a bit bloated and not necessarily the best solution everywhere
07:53 < mcallan> i may simply misunderstand, do you have skype - just for 15 minutes?
07:54 < conseo> yep
07:55 < mcallan> ok, i'll ring you in a few mins...
07:55 < conseo> ok
08:00 < mcallan> ur not online yet...
08:00 < mcallan> u ring me
08:05 < conseo> hmm, skype shows online buddies but not u
08:05 < conseo> restarting skype...
08:13 < conseo> ok, was my sound setting, skype had a problem with
08:14 < conseo> mcallan: but u r no more online
09:38 < mcallan> http://zelea.com/w/Category:Forum
--- Log closed Sun Aug 14 00:00:36 2011
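The 07:46 message sketches the proposed drop URL for the xfDiff check action "with proper encoding ofc". Below is a minimal Java sketch of that encoding. It assumes the parameter names from the example (action, title, author, diffURLs, url) and a comma-separated diffURLs value. The class and method names are hypothetical, and the xfDiff check action is only a proposal in this log, not an existing Votorola API. It requires Java 10+ for the Charset overload of URLEncoder.encode.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical sketch: builds a "check" request URL for the proposed xfDiff
 * message-drop servlet. Parameter names follow the example in the log; the
 * servlet interface itself is an assumption.
 */
public final class DiffDropUrl {

    private DiffDropUrl() {}

    /** Encodes one query component as application/x-www-form-urlencoded UTF-8. */
    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    /** Appends the encoded parameters to the base URL, preserving map order. */
    static String build(String base, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(enc(e.getKey()) + "=" + enc(e.getValue()));
        }
        return base + "?" + query;
    }

    public static void main(String[] args) {
        // Each diff URL carries its own '?' and '&', so the joined value
        // must be encoded or the servlet would split it as outer parameters.
        List<String> diffUrls = List.of("http://blabla/v/w/D?a=1234&b=3456");

        Map<String, String> p = new LinkedHashMap<>();
        p.put("action", "check");
        p.put("title", "[MG] new diff parser in place");
        p.put("author", "4consensus@web.de");
        p.put("diffURLs", String.join(",", diffUrls)); // assumed comma separator
        p.put("url", "http://pipermail.archive.com/votorola/2011-November/1333.html");

        System.out.println(build("http://u.zelea.com:8080/v/w/xfDiff", p));
    }
}
```

Encoding each value separately keeps the embedded diff URLs intact as single parameters. It does not address the per-request overhead mcallan raised at 07:45: each dropped message would still cost one HTTP call.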