--- Log opened Thu Mar 22 00:00:43 2012
14:29 < mcallan> conseo: i saw some of your crawls hit the list.  how goes the battle?
14:40 < conseo> mcallan: had a weird bug, but i have now found a mashup of the httpcore-nio examples which actually posts a HTTP GET asynchronously
14:41 < conseo> it took me the last week (with some interruptions) to get the idea of how to build the reactor pattern with httpcore, i hope i can progress faster now
14:48 < mcallan> i've had some learning to do myself.  mine is a smaller job though and it looks like i'm on the home stretch.  if it tests okay today, then i'll clean up the wiki pages and release it.
14:53 < conseo> mcallan: ok. what have you done?
15:03 < mcallan> http://zelea.com/project/votorola/_/javadoc/votorola/s/gwt/stage/Stage.html
15:05 < mcallan> what i'm testing now isn't documented there: the relay of state across active links.  so if you navigate from the bridge (say) to a draft, then any track (or other component like the footings) may optionally initialize the stage such that it automatically selects the same difference as shown on the referring page (bridge or whatever).  that diff is then shown on all tracks etc.  sort of like here: http://zelea.com/w/User:Test-bg-ZeleaCom/G/p/sandbox#4176-4168
15:06 < mcallan> but it is done without a fragment (#), because that won't work when we have dozens of state variables, aside from diff key
15:12 < conseo> hmm, so you basically make the javascript environment stateful over all pages?... i need to have a look
15:12 < conseo> i only remember sessionid from very old php times (was php 4 i think... :-) )
15:13 < mcallan> yes, so it's like the stage was part of the browser, and not a single page (but it's all done on the client side, where possible - no storage on server)
15:15 < conseo> ok, sounds nifty, i need to dive in a bit to understand what you do
15:15 < mcallan> i will post a demo, hopefully tonight or tomorrow
15:19 < conseo> ok
15:31 < conseo> mcallan: hmm WP_D looks interesting, where does setCacheable(boolean) refer to?
16:19 < mcallan> http://zelea.com/project/votorola/_/javadoc/votorola/a/web/wic/VPageHTML.html#setCacheable(boolean)
16:19 < mcallan> the bridge is a stateless page, so it's cacheable
16:20 < mcallan> (most of our wicket pages are cacheable)
17:05 < conseo> ok. what is the concept. stateless on the client? do you use html5 storage functionality?
17:10 < mcallan> yes, session store to persist state of single page for back and forth nav.  but cookie for passing state across links
17:11 < conseo> so you can access the session store with the help of the cookie beyond cross-domain restrictions?
17:12 < conseo> (i mean same-origin policy)
17:12 < conseo> mcallan: it schedules and runs:-)
17:13 < mcallan> no, session store and cookie are used for 2 different purposes.  what session store is used for is documented here: http://zelea.com/project/votorola/_/javadoc/votorola/s/gwt/stage/Stage.html
17:13 < mcallan> cookie store is for relaying state *between* pages in a link relation, and i will doc that today
17:15 < mcallan> neither will work in all cases, and we'll have to resort to server side storage.  i will code that for next release, and we'll add the stage to metagov's wiki
17:15 < mcallan> (that will require server side store, i think)
17:15 < conseo> ok
17:16 < mcallan> all of this complexity is hidden from tracks and other props on stage.  they just have to obey the api
17:16 < conseo> mcallan: will you remove the old navigation from the pollwiki (the nav on the right side)?
17:16 < mcallan> yes, tonight
17:17 < conseo> huuh, tonight is the night :-D
17:17 < mcallan> gonna be alright :-)
17:18 < conseo> i have to figure out how to properly scatter future crawls so they don't lump
17:18 < mcallan> you mean how to ensure the minimum gap (5 s or whatever) is respected?
17:19 < conseo> well, it doesn't have to be respected absolutely, only on average
17:20 < mcallan> you sure?
17:20 < mcallan> methought that wuz whole purpose of scheduling
17:21 < conseo> the purpose is to be gentle, but we can do two crawls in a second and then wait for 10 or more imo
17:21 < conseo> we will have bursts anyway, when we get a DiffKick event
17:21 < mcallan> you sure that's ok?
17:21 < conseo> because waiting 5 secs for 2-3 calls will break your 10 sec maximum delay
17:22 < conseo> don't know
17:22 < conseo> i don't know if even 5 secs is ok
17:22 < conseo> this depends on the admin
17:22 < mcallan> i think we'll get our asses blacklisted if we don't obey robots.txt
17:22 < conseo> i would even do it slower in the background and only do bursts when absolutely necessary
17:23 < mcallan> bursts should never be necessary
17:23 < conseo> they are, at least two requests are necessary for pipermail
17:23 < conseo> or we can't predict how long it takes...
17:25 < conseo> well, can you restrict that in robots.txt? looking...
17:25 < mcallan> this is why i wanted to see your algorithm for a single scheduled job: what it does 1, 2, 3 - because it *can* be scheduled to avoid hitting server in ways that would disobey its robots.txt
17:26 < conseo> ok, i will show you. the work now was necessary to understand how an asynchronous http reactor pattern works. i should be able to document it now
17:26 < conseo> ok
17:26 < conseo> the parameter is called "Crawl-delay"
17:28 < mcallan> hmmm it looks non-standard.  i think if you don't hit more than once a second, nobody will complain. later you can obey special robots.txt directives
17:28 < conseo> yes, non-standard. ok
17:28 < mcallan> but you are right, and very short bursts (2 or 3 req's) should be ok
17:30 < mcallan> so this will greatly simplify the structure of your jobs.  but when you know you have 2-3 fetches per job, you should probably give that harvester a 2.5 second min gap, right?
17:31 < mcallan> or better yet, since each job must probably schedule the next job, it will calculate the gap based on how many fetches it made
17:32 < conseo> well, you gave me 10 seconds, so i can divide that in any way i like, right :-D
17:32 < conseo> i can also decide, when running the jobs, to reschedule them later if a job for this second and this url already exists
17:32 < conseo> (hostname)
17:33 < mcallan> (but 10 is prob too extreme, that's one thing.  go with 1 s)
17:33 < conseo> i have a stepping of 1s
17:33 < conseo> and then the delay is set in steps
17:34 < mcallan> yes, forget what i said about one job scheduling another...
17:34 < conseo> they do
17:35 < mcallan> in any case, if you need to schedule a job, schedule it for lastJob+gap
17:35 < conseo> at least atm. i will try to present the design to you and then you can criticize it in any way you see necessary
17:36 < mcallan> (but *before* scheduling, you must set lastJob+=gap) ok, again i need to see 1,2,3 for a single job.  that is all
17:36 < conseo> yes
17:37 < conseo> will the gap be configurable by the wiki or should i assume a default value of let's say 10s (for background, not burst)
17:38 < mcallan> average of a fetch per second should be okay, i imagine
17:39 < mcallan> no need to config, in future obey robots.txt, that's all
17:40 < conseo> one fetch per second? ok
17:40 < mcallan> sure, that's gentle enough
17:50 < conseo> ok, running a test that way on metagov now
17:52 < conseo> i will document it tonight :-) and post to the list
17:52 < mcallan> ok
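The gap rule discussed above (schedule each job for lastJob+gap per hostname, advancing lastJob before the job is actually scheduled, and letting a job with 2-3 fetches take up correspondingly more room) could be sketched roughly as follows. This is only an illustration under assumptions: the class GapScheduler and its method reserve are hypothetical names and are not taken from the Votorola harvester code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of Votorola): spaces jobs per host by a minimum gap. */
class GapScheduler {
    private final long gapMillis;
    // Per host: the time slot claimed by the last fetch of the most recently reserved job.
    private final Map<String, Long> lastJob = new HashMap<>();

    GapScheduler(long gapMillis) {
        this.gapMillis = gapMillis;
    }

    /**
     * Reserves a start time for a job that will make the given number of fetches.
     * lastJob is advanced here, before the caller schedules anything, so two
     * callers can never be handed the same slot.
     */
    synchronized long reserve(String host, long nowMillis, int fetches) {
        if (fetches < 1) {
            throw new IllegalArgumentException("fetches must be >= 1");
        }
        Long last = lastJob.get(host);
        long start = (last == null) ? nowMillis : Math.max(nowMillis, last + gapMillis);
        // A short burst of n fetches claims n slots, so the average stays at one fetch per gap.
        lastJob.put(host, start + gapMillis * (fetches - 1));
        return start;
    }

    public static void main(String[] args) {
        GapScheduler scheduler = new GapScheduler(1000L); // assumed gap of 1 s
        long now = 0L;
        System.out.println(scheduler.reserve("example.org", now, 1)); // prints 0
        System.out.println(scheduler.reserve("example.org", now, 3)); // prints 1000
        System.out.println(scheduler.reserve("example.org", now, 1)); // prints 4000
    }
}
```

The returned start time would then be handed to whatever executor runs the jobs; a different hostname gets its own independent sequence of slots.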
20:52 < conseo> mcallan: i have the javadocs in my example pipermail harvester. is this enough? it creates three types of jobs; you can directly read PipermailHarvester to see how they trigger each other to sequentially crawl the archive backwards.
20:53 < conseo> i can also commit this for you
20:59 < mcallan> not sure till i see.  more likely too much, rather than too little ;-)
21:01 < mcallan> (crawling backwards is a good idea, no need to wait for the newest)
21:02 < conseo> yes, with irc and pipermail we can simply crawl backwards (although post ids are not completely consistent, due to date distortions)
21:39 < conseo> mcallan: i am commenting all system.out.printlns out now. i would like to commit then. javadoc is uptodate and code is running, although i get no messages, because:
21:39 < conseo> votorola.g.MediaWiki$IDException: No such page revision(s): 2962 ...
21:40 < conseo> i have to work on the Cache still though, i likely have to update something
21:40 < conseo> votorola.a.position.PointerRevision$MalformedPageException: not a proper draft pointer: rev 4026
21:43 < mcallan> conseo: i think there are one or two malformed URLs in the metagov list, from a temporary version of the bridge.  maybe that's what you hit?
21:44 < mcallan> i think it was over a year ago, maybe two years
21:44 < conseo> maybe, but the cache is not too critical to document and demo the scheduler. i will have a look at it tomorrow
21:44 < conseo> ok
21:46 < mcallan> sure, as long as nothing is broken for any of the code we actually run on the server, then it's ok to commit if need be.  i guess your server is still down, and you can't post stuff
21:48 < conseo> mcallan: my server is up and you can pull from there. i don't know what caused the trouble though, still have to check two pci cards
21:49 < conseo> i can push to you directly if you prefer that.
21:50 < conseo> i can also expose just the javadocs, but then pulling and looking at both the docs and the example code is more reasonable imo
21:51 < conseo> is that ok for you or am i causing you more work than necessary? the PipermailHarvester code is really straightforward imo
21:56 < mcallan> you know, i just want 1,2,3 design.  if i have to parse code to read it, then i can do that to certain extent.  it's probably too much work for you to serve the code and repos the way i do, e.g. here http://zelea.com/project/votorola/
21:56 < mcallan> and here: http://zelea.com/var/db/repo/
21:57 < mcallan> i will do whatever you tell me.  no need to ask always, just say here it is!
21:58 < conseo> ok. here it is. gn8 :-)
21:58 < mcallan> wait a minute... where?
21:58 < conseo> i'll try to set something similar up, i have a new domain, but it takes some time, because i have dynip and i nee
21:59 < conseo> sry
21:59 < conseo> it is here: sftp://mike@whiletaker.homeip.net//opt/WORK
22:00 < mcallan> don't do extra work for nothing, i don't mind looking on obsidian for your stuff, or whatever
22:00 < conseo> s/sftp/ssh/
22:00 < conseo> it ran there, but atm. the server is debian stable with java 1.6 and i haven't updated it yet
22:00 < mcallan> i seem to need a password
22:01 < conseo> oh, it is port 2222
22:01 < conseo> (as it has been before my server failure)
22:01 < conseo> same place
22:01 < mcallan> ah, i missed your subst there, and put ftp in my browser :-)
22:02 < mcallan> sure, i will pull and look.  you are crashing for the night?
22:02 < conseo> sooner or later, yes. it is already late again :-)
22:03 < conseo> although i am quite happy that it finally works. it gave me some headaches :-)
22:04 < conseo> i have watched a lot of the documentaries from thoughtmaybe and they have all been very good. it was time well spent, thx!
22:06 < mcallan> welcome.  i've been watching some too, which is unusual for me.  what classes did you say i should look at, or were you gonna post a brief note about that?
22:07 < conseo> nobody except you cares for it atm. just look at PipermailHarvester, from there you automatically get linked to HarvestJob and HarvestRunner as well as HarvestHistory which cover all the necessary functionality
22:08 < conseo> (i hope) i still have to move HarvestHistory directly in HarvestJob, I think now, but i have left that for now
22:08 < conseo> (i mean the usage of HarvestHistory)
22:08 < conseo> i can post though, once we agree that this is reasonable
22:09 < conseo> which ones did you watch?
22:11 < mcallan> many of curtis's.  you are crashing right?  i can look at the code after supper, and we can hook up again in the am.  i am still trying to finish my own work
22:12 < mcallan> or i can look now if you are waiting
22:14 < conseo> nope, i'll go to bed now. we can discuss it whenever you find the time to have a look
22:14 < conseo> tomorrow sounds fine
22:14 < mcallan> ok, that will be in about an hour
22:14 < conseo> yes the ones from curtis are impressive. i wish we would have watched these at school
22:15 < conseo> ok, but i am wasted. i'd still prefer to go to bed now, if you don't mind
22:16 < mcallan> i meant i will have time to look at it in one hour, and then we talk later
22:16 < mcallan> go to sleep... :-)
22:16 < mcallan> i like this one, next best: http://thoughtmaybe.com/video/the-power-of-nightmares
22:27 < conseo> ok, that one is still on my todo. this is one of my favourites so far: http://thoughtmaybe.com/video/the-century-of-the-self
22:27 < conseo> gn8 and cu tomorrow :-)
22:27 < mcallan> yes, i liked that one too.  n8 c
--- Log closed Fri Mar 23 00:00:59 2012