--- Log opened Wed Mar 07 00:00:46 2012
06:24 < conseo> mcallan: the command line interface could be a special detector, right? we could than manually trigger events during runtime
06:24 < conseo> s/than/then
06:30 < conseo> since all actions implemented by harvesters are triggered by events, this would nicely hook in
06:36 < mcallan> conseo: except there's no actual need to communicate with the running daemon, and it's hard to do so
06:36 < mcallan> all you need to communicate with is cache (= database and filebase)
06:37 < conseo> mcallan: we cannot run the same harvester in several instances, this will make scheduling a nightmare
06:37 < mcallan> scheduling is unaffected
06:37 < mcallan> oh...
06:37 < mcallan> you mean to avoid hitting the archive too often?
06:38 < conseo> yes
06:39 < conseo> also you said that we would command each harvester in a similar way, yet we cannot assume a common interface
06:39 < conseo> the event interface is already there... but maybe it is a problem
06:40 < mcallan> scheduling is a minor edge case, and will be 50X easier to solve than a full blown, general purpose inter-process communication interface
06:40 < conseo> mcallan: we could simply extend the irc detector for commands
06:40 < conseo> hmmm
06:43 < conseo> voharvest clear -> same as executing an sql command, and should be necessary only in rare cases (more likely you need to update the archive to some new url or so)
06:43 < conseo> voharvest detect -> they run anyway once they are configured. you cannot start and stop them all the time imo
06:44 < conseo> voharvest harvest FORUM -> hmm, possible, it would just mean to trigger a crawl. this is the interesting one imo
06:45 < mcallan> why would a command (harvest or clear) have to stop detectors?  i think you are complicating it
06:45 < conseo> the detect command implies that, right?
06:46 < mcallan> detect *is* the daemon launcher
06:47 < mcallan> all it does is run detectors, and ofc that implies loading kicker and harvesters - but the purpose is to detect
06:47 < conseo> ok, we could add a command shell to the daemon, so you would run it in a "screen" session
06:48 < mcallan> oh man, it's not needed.  bash is your shell.  just code the command and run it from bash :-)
06:48 < conseo> so "voharvest detect" is meant to start the daemon and return?
06:48 < conseo> but bash is a separate process
06:50 < conseo> pizza arrived, miami :-)
06:50 < mcallan> at bash prompt, you call "voharvest harvest metagov"
06:51 < mcallan> well, we can talk later...
06:51 < mcallan> bon appétit
06:55 < conseo> and then all running harvesters fetch an event for metagov and the pipermail one triggers a crawl? because the harvesters run all the time, right?
06:57 < mcallan> if there's a kick from a pipermail forum, it only goes to pipermail harvester (unless candidate is on multiple mailing lists, which is unlikely) right?
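The routing rule mcallan describes (a kick from a pipermail forum goes only to the pipermail harvester) can be sketched as a lookup keyed by archive type. This is a minimal illustration under stated assumptions, not the project's kicker: the names `RoutedKick`, `KickHandler` and `KickRouter`, and the use of a plain string such as "pipermail" as the routing key, are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical kick: names the archive type and the forum it concerns. */
final class RoutedKick {
    final String archiveType; // assumed routing key, e.g. "pipermail"
    final String forumName;

    RoutedKick(String archiveType, String forumName) {
        this.archiveType = archiveType;
        this.forumName = forumName;
    }
}

/** Hypothetical callback implemented by each harvester. */
interface KickHandler {
    void handle(RoutedKick kick);
}

/** Delivers each kick only to the harvesters registered for its archive type. */
final class KickRouter {
    private final Map<String, List<KickHandler>> byType = new HashMap<>();

    void register(String archiveType, KickHandler handler) {
        byType.computeIfAbsent(archiveType, k -> new ArrayList<>()).add(handler);
    }

    /** Returns how many handlers received the kick (0 if none handle that archive type). */
    int route(RoutedKick kick) {
        List<KickHandler> handlers = byType.getOrDefault(kick.archiveType, Collections.emptyList());
        for (KickHandler h : handlers) {
            h.handle(kick);
        }
        return handlers.size();
    }
}
```

A kick for "metagov" tagged "pipermail" then reaches only the handler registered under "pipermail"; handlers for other archive types never see it.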
07:00 < mcallan> sorry, you were asking about "voharvest harvest metagov"...
07:00 < mcallan> it creates a pipermail harvester, and kicks it ...
07:03 < mcallan> or whatever it takes to get it to update from metagov
07:04 < mcallan> earlier you asked about daemon (voharvest detect), it will not return.  our daemons have to be backgrounded if you want your prompt back
07:04 < mcallan> (using bash or whatever)
07:05 < mcallan> but i wouldn't code a special shell, because then you can't script the commands, or cron them, or anything
07:09 < mcallan> conseo: and the commands are not for admin use (aside from daemon "detect") at least not yet.  they're just for test purposes.  so best to do the simplest thing
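mcallan's proposal is that each command is an ordinary one-shot process run from bash: "voharvest detect" launches the daemon and does not return, while "voharvest harvest FORUM" creates a harvester, kicks it, and exits. A minimal sketch of that dispatch, assuming hypothetical `OneShotHarvester` and `HarvesterFactory` interfaces (not the real harvester API) and returning a process exit status instead of calling `System.exit` directly:

```java
/** Hypothetical: a harvester that can be told to update once from its archive. */
interface OneShotHarvester {
    void harvestNow();
}

/** Hypothetical: builds the right harvester (e.g. pipermail) for a named forum. */
interface HarvesterFactory {
    OneShotHarvester forForum(String forumName);
}

/** Dispatches one command per process invocation, so commands can be scripted or run from cron. */
final class VoharvestCli {
    private final HarvesterFactory factory;
    private final Runnable daemon; // hypothetical: runs the detectors; blocks until shutdown

    VoharvestCli(HarvesterFactory factory, Runnable daemon) {
        this.factory = factory;
        this.daemon = daemon;
    }

    /** Runs a single command and returns an exit status (0 = success, 2 = usage error). */
    int run(String... args) {
        if (args.length == 0) return usage();
        switch (args[0]) {
            case "detect":
                daemon.run(); // in practice does not return; background it from the shell if needed
                return 0;
            case "harvest":
                if (args.length != 2) return usage();
                factory.forForum(args[1]).harvestNow(); // create harvester, kick it, then exit
                return 0;
            default:
                return usage();
        }
    }

    private static int usage() {
        System.err.println("usage: voharvest detect | voharvest harvest FORUM");
        return 2;
    }
}
```

A `main` method would just call `System.exit(new VoharvestCli(...).run(args))` with real implementations plugged in. Keeping each command a separate process is what lets bash, scripts and cron drive it, which is the reason given above for not writing a special shell.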
07:19 < mcallan> it's late for me, been up all night. i'm signing off shortly
07:23 < conseo> mcallan: ok
07:23 < conseo> mcallan: well, we can have different daemon modes for tests, this would be no problem. just calling different instances at the same time for admin is problematic imo
07:24 < conseo> mcallan: mind you, we can also have a table for the jobs and run an instance to insert the jobs in the table
07:24 < conseo> we have to serialize them anyway
07:24 < conseo> (as we already talked about)
07:25 < conseo> but this is not part of the current initial design sketch, right?
07:29 < mcallan> serializing the markers in the archive?  or coding the command line?  i think both are needed
07:30 < conseo> not only the markers but also the jobs (runnables). they have to be saved for restart. if we store them in the db, then we can insert jobs from a different instance
07:30 < mcallan> why store jobs?
07:30 < conseo> well, this depends on the markers as well, i have to hammer that out for the pipermail reference harvester
07:32 < conseo> because markers might not be enough. you can see them as the state of the job. hmm... i can't wrap my head around it now, we'll talk later/tomorrow...
07:32 < conseo> have a good rest
07:32 < mcallan> for each forum, i guess i would store 2 things: marker, and last access time
07:32 < conseo> yes
07:33 < conseo> we could manipulate these markers from the command line (the markers are states of the jobs, so it depends if the markers are sufficient for all jobs)
07:35 < mcallan> you think it's hard to code a command, but it's easy.  just create harvester, call a method on it, and exit
07:36 < conseo> but then several crawls run in parallel
07:36 < mcallan> possibly (though unlikely) but it does not matter
07:37 < conseo> well, if you call this command from cron or do some other manual scripting it can be
07:37 < conseo> if it is supposed to be an official feature
07:38 < mcallan> how can it be a problem?
07:38 < conseo> we can risk it, if you think it is not a big issue
07:38 < mcallan> i see no risk.  do you?
07:39 < conseo> having several instances running defeats the timing settings
07:39 < conseo> unless we do it through the same db storage
07:40 < conseo> but it is not a heavy issue, we can still find a solution once it is a problem
07:41 < mcallan> if it's coded right, and the access time is stored, there is no possibility of hitting the archive too often
07:41 < conseo> ok, so we have an atomic central storage for the state (markers/jobs/...)
07:41 < conseo> markers for now
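The agreement here is that each forum keeps two things (its marker and its last access time) in one shared store, updated atomically, so that the daemon, a test command and a cron job cannot between them hit an archive too often. A minimal sketch, assuming an in-memory `ConcurrentHashMap` as a stand-in for the shared db table and hypothetical names `ForumState` and `ForumStateStore`; across processes the same check-and-claim would presumably be a single conditional SQL update (e.g. one that sets the access time only where the stored one is old enough), which is an assumption, not a documented design:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical per-forum harvest state: the marker plus the last access time (epoch millis). */
final class ForumState {
    final String marker;          // how far the harvest got; null if never harvested
    final long lastAccessMillis;  // when the archive was last accessed

    ForumState(String marker, long lastAccessMillis) {
        this.marker = marker;
        this.lastAccessMillis = lastAccessMillis;
    }
}

/** In-memory stand-in for a shared state table; compute() makes each update atomic per forum. */
final class ForumStateStore {
    private final ConcurrentMap<String, ForumState> states = new ConcurrentHashMap<>();
    private final long minIntervalMillis;

    ForumStateStore(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    /**
     * Claims permission to access the forum's archive at nowMillis. Succeeds, and records the
     * access, only if no access was recorded in the last minIntervalMillis, so two concurrent
     * callers cannot both win the same interval.
     */
    boolean tryClaimAccess(String forum, long nowMillis) {
        boolean[] claimed = {false};
        states.compute(forum, (key, old) -> {
            if (old != null && nowMillis - old.lastAccessMillis < minIntervalMillis) {
                return old; // too soon: leave the state unchanged
            }
            claimed[0] = true;
            return new ForumState(old == null ? null : old.marker, nowMillis);
        });
        return claimed[0];
    }

    /** Saves the new marker after a crawl, keeping the recorded access time. */
    void saveMarker(String forum, String marker) {
        states.compute(forum, (key, old) ->
                new ForumState(marker, old == null ? 0L : old.lastAccessMillis));
    }

    String marker(String forum) {
        ForumState s = states.get(forum);
        return s == null ? null : s.marker;
    }
}
```

A harvester (whether in the daemon or started from the command line) would call `tryClaimAccess` before crawling and skip the crawl when it returns false, which is the sense in which storing the access time rules out hitting the archive too often.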
07:42 < conseo> this is not documented yet, so i wasn't sure about the spec
07:42 < conseo> if this is supposed to be used by all harvesters, then we need an official api
07:42 < conseo> right??
07:43 < conseo> s/??/?
07:44 < mcallan> i think each programmer is free... but they will probably decide to share the same table/directory or whatever
07:45 < mcallan> so yes, it'll need to be doc'd at that point
07:47 < conseo> mcallan: which point? now or later?
07:47 < conseo> mcallan: where should i document the command line interface? in the javadocs?
07:48 < mcallan> later is ok, because it's not in the arch diagram
07:49 < conseo> good
07:51 < mcallan> command line is doc'd in the manual.  there was a special xf manual, but this really is not xf/stage specific.  i think it belongs in manual along with vocount etc.
07:52 < conseo> ok
07:52 < conseo> something open still or are we done with docs now?
07:52 < mcallan> nah, i'm working on footings.  you can change stuff
07:54 < conseo> what do you mean? i meant my 5 point agenda. can i start implementing the pipermail harvester or is something open still?
07:55 < mcallan> i wouldn't code till everything is doc'd
07:55 < conseo> ok
07:56 < conseo> so we'll talk later, i'll reply to the list
07:56 < mcallan> right, do 3b and 4.  then we review everything and see if it looks ok
07:56 < conseo> ok
07:56 < conseo> is daylight already there? :-)
07:57 < conseo> it has to be
07:57 < mcallan> yes, 8am.  and i'm off
07:57 < conseo> ok, gn8 and thx for your time
07:58 < mcallan> welcome, thanks for yers too!  cu soon
08:06 < conseo> cu
18:57 < mcallan> conseo: don't forget to put DiffKey or Forum name in kick event.  you must have at least one of those
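The rule that a kick event must carry a DiffKey or a forum name (at least one) can be enforced where the event is constructed, so an incomplete kick fails early instead of reaching a harvester. A minimal sketch with a hypothetical `KickEvent` class; the real DiffKey is presumably its own type, and modelling it as a nullable `String` here is an assumption for illustration:

```java
/** Hypothetical kick event: at least one of diffKey or forumName must be present. */
final class KickEvent {
    private final String diffKey;   // assumed String stand-in for DiffKey; may be null
    private final String forumName; // may be null

    KickEvent(String diffKey, String forumName) {
        if (isBlank(diffKey) && isBlank(forumName)) {
            throw new IllegalArgumentException("kick event needs a DiffKey or a forum name");
        }
        this.diffKey = diffKey;
        this.forumName = forumName;
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    String diffKey() {
        return diffKey;
    }

    String forumName() {
        return forumName;
    }
}
```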
18:58 < mcallan> let me know when, and we'll look over docs for problems
--- Log closed Thu Mar 08 00:00:02 2012