incubator-droids-dev mailing list archives

From Jeremy Arnold <jer...@possiblyfaulty.com>
Subject Droids as a standalone service
Date Wed, 02 Mar 2011 18:21:06 GMT
Hello droids developers,

I wanted to talk a little bit about my use case and hopefully get an
idea of how far away the code base is from it, as well as where I
could put in personal time to help get it there.

I'm interested in having a standalone searching/crawling service so I
can have one application hosting the searching and indexing used by many
different custom apps. I would like to have the option to use Solr or
elasticsearch for indexing/searching. I've been working with
elasticsearch for the past few days and am growing very fond of the
flexibility it provides. I'm also a sucker for JSON over HTTP
services. I need to be able to start crawls for both a filesystem and
webpages from within my custom webapp, or have the crawls run at
scheduled times. Those crawls then need to be indexed. I also need to
have the ability to integrate my own content handlers so I can specify
how certain pieces of content are indexed (e.g., custom PDF metadata).
I also need to be able to easily add, update, and remove items in an
index from within my custom app.

How far is the codebase from being usable in the scenario described
above?

I've spent a lot of time over the past 3 days looking at the droids
code base. It looks really promising but I'm not sure where it really
stands overall. I know the elasticsearch piece doesn't exist, and I
would love to put together that contribution if it seems like an
acceptable counterpart to the existing droids-solr module. I would
also like to take on the task of bringing a lot more consistency to
the code base (e.g., commenting, code consistency). I'm just a bit
concerned about taking on such a large task and submitting it as a
patch. I also see a few places where testing would be beneficial and
do not mind attacking that as well; I just don't want to waste time
testing things that may be going away in the future.

I'd like to start a discussion about my particular use case and where
it would be most beneficial for me to get involved, or if there is a
project out there that is better suited for my use case. I've
evaluated Nutch but I can't say it's been the best experience and it
doesn't quite fit into what I am trying to do. It does look like Nutch
2 will fit my use case well, but its timeline does not.


Thanks,
Jeremy
