--- Log opened Wed Mar 07 00:00:46 2012
06:24 < conseo> mcallan: the command line interface could be a special detector, right? we could than manually trigger events during runtime
06:24 < conseo> s/than/then
06:30 < conseo> since all actions implemented by harvesters are triggered by events, this would nicely hook in
06:36 < mcallan> conseo: except there's no actual need to communicate with the running daemon, and it's hard to do so
06:36 < mcallan> all you need to communicate with is cache (= database and filebase)
06:37 < conseo> mcallan: we cannot run the same harvester in several instances, this will make scheduling a nightmare
06:37 < mcallan> scheduling is unaffected
06:37 < mcallan> oh...
06:37 < mcallan> you mean to avoid hitting the archive too often?
06:38 < conseo> yes
06:39 < conseo> also you said that we would command each harvester in a similiar way, yet we cannot assume a common interface
06:39 < conseo> the event interface is already there... but maybe it is a problem
06:40 < mcallan> scheduling is a minor edge case, and will be 50X easier to solve than a full blown, general purpose inter-process communication interface
06:40 < conseo> mcallan: we could simply extend the irc detector for commands
06:40 < conseo> hmmm
06:43 < conseo> voharvest clear -> same like executing a sql command and should be necessary only in rare cases (more likely you need to update the archive to some new url or so)
06:43 < conseo> voharvest detect -> they run anyway once they are configured. you cannot start and stop them all the time imo
06:44 < conseo> voharvest harvest FORUM -> hmm, possible, it would just mean to trigger a crawl. this is the interesting one imo
06:45 < mcallan> why would a command (harvest or clear) have to stop detectors? i think you are complicating it
06:45 < conseo> the detect command implies that, right?
06:46 < mcallan> detect *is* the daemon launcher
06:47 < mcallan> that's all it does is run detectors, and ofc that implies loading kicker and harvesters - but the purpose is to detect
06:47 < conseo> ok, we could add a command shell to the daemon, so you would run it in a "screen" session
06:48 < mcallan> oh man, it's not needed. bash is your shell. just code the command and run it from bash :-)
06:48 < conseo> so "voharvest detect" is meant to start the daemon and return?
06:48 < conseo> but bash is a seperate process
06:50 < conseo> pizza arrived, miami :-)
06:50 < mcallan> at bash prompt, you call "voharvest harvest metagov"
06:51 < mcallan> well, we can talk later...
06:51 < mcallan> bon appetite
06:55 < conseo> and then all running harvesters fetch an event for metagov and the pipermail one triggers a crawl? because the harvesters run all the time, right?
06:57 < mcallan> if there's a kick from a pipermail forum, it only goes to pipermail harvester (unless candidate is on multiple mailing lists, which is unlikely) right?
07:00 < mcallan> sorry, you were asking about "voharvest harvest metagov"...
07:00 < mcallan> it creates a pipermail harvester, and kicks it ...
07:03 < mcallan> or whatever it takes to get it to update from metagov
07:04 < mcallan> earlier you asked about daemon (voharvest detect), it will not return. our daemons have to be backgrounded if you want your prompt back
07:04 < mcallan> (using bash or whatever)
07:05 < mcallan> but i wouldn't code a special shell, because then you can't script the commands, or cron them, or anything
07:09 < mcallan> conseo: and the commands are not for admin use (aside from daemon "detect") at least not yet. they just for test purposes. so best to do the simplest thing
07:19 < mcallan> it's late for me, been up all night. i'm signing off shortly
07:23 < conseo> mcallan: ok
07:23 < conseo> mcallan: well, we can have different daemon modes for tests, this would be no problem. just calling different instances at the same time for admin is a problematic imo
07:24 < conseo> mcallan: mind you, we can also have a table for the jobs and run an instance to insert the jobs in the table
07:24 < conseo> we have to serialize them anyway
07:24 < conseo> (as we already talked about)
07:25 < conseo> but this is not part of the current initial design sketch, right?
07:29 < mcallan> serializing the markers in the archive? or coding the command line? i think both are needed
07:30 < conseo> not only the markers but also the jobs (runnables). they have to be saved for restart. if we store them in the db, then we can insert jobs from a different instance
07:30 < mcallan> why store jobs?
07:30 < conseo> well, this depends on the markers as well, i have to hammer that out for the pipermail reference harvester
07:32 < conseo> because markers might not be enough. you can see them as the state of the job. hmm... i can't wrap my head around it now, we talk later/tomorrow...
07:32 < conseo> have a good rest
07:32 < mcallan> for each forum, i guess i would store 2 things: marker, and last access time
07:32 < conseo> yes
07:33 < conseo> we could manipulate these markers from the command line (the markers are states of the jobs, so it depends if the markers are sufficient for all jobs)
07:35 < mcallan> you think it's hard to code a command, but it's easy. just create harvester, call a method on it, and exit
07:36 < conseo> but then several crawls run in parallel
07:36 < mcallan> possibly (though unlikely) but it does not matter
07:37 < conseo> well, if you call this command from cron or do some other manual scripting it can be
07:37 < conseo> if it is supposed to be an official feature
07:38 < mcallan> how can it be a problem?
07:38 < conseo> we can risk it, if you think it is not a big issue
07:38 < mcallan> i see no risk. do you?
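A rough illustration of the idea raised at 07:32 (store a marker and a last access time for each forum): if every instance consults the same stored access time before crawling, parallel invocations from bash or cron would not hit an archive more often than intended. This is only a sketch under stated assumptions. The table name `forum_state`, the functions `open_state` and `try_claim`, the 60 second interval, and the use of SQLite are all hypothetical and are not taken from the project.

```python
import sqlite3
import time

MIN_INTERVAL_S = 60.0  # assumed minimum gap between accesses to one forum


def open_state(path=":memory:"):
    """Open the shared state store (hypothetical schema: marker + last access)."""
    # isolation_level=None puts the connection in autocommit mode, so the
    # transactions below are controlled explicitly with BEGIN/COMMIT.
    db = sqlite3.connect(path, isolation_level=None)
    db.execute(
        "CREATE TABLE IF NOT EXISTS forum_state ("
        "forum TEXT PRIMARY KEY, marker TEXT, last_access REAL NOT NULL)"
    )
    return db


def try_claim(db, forum, now=None, min_interval=MIN_INTERVAL_S):
    """Return True and record the access time if the forum may be crawled now."""
    now = time.time() if now is None else now
    # BEGIN IMMEDIATE takes the write lock up front, so two processes sharing
    # one database file cannot both pass the interval check at the same time.
    db.execute("BEGIN IMMEDIATE")
    try:
        row = db.execute(
            "SELECT last_access FROM forum_state WHERE forum = ?", (forum,)
        ).fetchone()
        if row is not None and now - row[0] < min_interval:
            db.execute("ROLLBACK")
            return False  # accessed too recently; caller should skip the crawl
        if row is None:
            db.execute(
                "INSERT INTO forum_state (forum, marker, last_access) "
                "VALUES (?, NULL, ?)",
                (forum, now),
            )
        else:
            db.execute(
                "UPDATE forum_state SET last_access = ? WHERE forum = ?",
                (now, forum),
            )
        db.execute("COMMIT")
        return True
    except Exception:
        db.execute("ROLLBACK")
        raise
```

A one-shot command would call `try_claim(db, "metagov")` before crawling and simply exit if it returns False; the daemon would do the same, so neither needs to talk to the other directly.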
07:39 < conseo> having several instances running defeats the timing settings
07:39 < conseo> except we do it through the same db storage
07:40 < conseo> but it is not a heavy issue, we can still find a solution once it is a problem
07:41 < mcallan> if it's coded right, and the access time is stored, there is no possibility of hitting the archive too often
07:41 < conseo> ok, so we have an atomic central storage for the state (markers/jobs/...)
07:41 < conseo> markers for now
07:42 < conseo> this is not documented yet, so i wasn't sure about the spec
07:42 < conseo> if this is supposed to be used by all harvesters, then we need an official api
07:42 < conseo> right??
07:43 < conseo> s/??/?
07:44 < mcallan> i think each programmer is free... but they will probably decide to share the same table/directory or whatever
07:45 < mcallan> so yes, it'll need to be doc'd at that point
07:47 < conseo> mcallan: which point? now or later?
07:47 < conseo> mcallan: where should i document the command line interface? in the javadocs?
07:48 < mcallan> later is ok, because it's not in the arch diagram
07:49 < conseo> good
07:51 < mcallan> command line is doc'd in the manual. there was a special xf manual, but this really is not xf/stage specific. i think it belongs in manual along with vocount etc.
07:52 < conseo> ok
07:52 < conseo> something open still or are we done with docs now?
07:52 < mcallan> nah, i'm working on footings. you can change stuff
07:54 < conseo> what do you mean? i meant my 5 point agenda. can i start implementing the pipermail harvester or is something open still?
07:55 < mcallan> i wouldn't code till everything is doc'd
07:55 < conseo> ok
07:56 < conseo> so we talk later, i'll reply to the list
07:56 < mcallan> right, do 3b and 4. then we review everything and see if it looks ok
07:56 < conseo> ok
07:56 < conseo> is daylight already there? :-)
07:57 < conseo> it has to be
07:57 < mcallan> yes, 8am. and i'm off
07:57 < conseo> ok, gn8 and thx for your time
07:58 < mcallan> welcome, thanks for yers too! cu soon
08:06 < conseo> cu
18:57 < mcallan> conseo: don't forget to put DiffKey or Forum name in kick event. you must have at least one of those
18:58 < mcallan> let me know when, and we'll look over docs for problems
--- Log closed Thu Mar 08 00:00:02 2012
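
The 18:57 reminder (a kick event must carry a DiffKey or a forum name, at least one of them) can be pictured as a constructor-time check. The sketch below is an assumption for illustration only: the class `KickEvent` and its fields `diff_key` and `forum` are hypothetical names, both values are modelled as plain strings, and nothing here reflects the project's actual event classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KickEvent:
    """Hypothetical kick event; at least one of the two fields must be set."""

    diff_key: Optional[str] = None  # stands in for the DiffKey mentioned in the log
    forum: Optional[str] = None     # forum name, e.g. "metagov"

    def __post_init__(self):
        # Reject events that identify neither a diff nor a forum, since a
        # harvester would have nothing to decide what to update from.
        if not self.diff_key and not self.forum:
            raise ValueError("kick event needs a diff key or a forum name")
```

With this check, `KickEvent(forum="metagov")` is accepted, while `KickEvent()` raises `ValueError` before the event can reach any harvester.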