lucene-solr-user mailing list archives

From Otis Gospodnetic <otis.gospodne...@gmail.com>
Subject Re: Multi-threaded post.jar?
Date Sat, 02 Mar 2013 04:19:07 GMT
Hi,

Sure, lots of things could be done with creative curl usage... but there
is still something to be said for having an ecosystem of nice,
devops-friendly tools...

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Wed, Feb 27, 2013 at 8:01 AM, Upayavira <uv@odoko.co.uk> wrote:

> I took the cheap and cheerful approach, and created another class that
> wraps SimplePostTool. It makes lots of assumptions, such as that the
> shell will already have expanded any globs/wildcards, and just assigns
> various arguments to the various threads. It is good enough for what I
> need.
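A minimal sketch of that kind of wrapper, assuming the shell has already expanded any globs: the file arguments are dealt round-robin into one group per thread, and each thread runs a post action on its own group. `ParallelPostWrapper`, `partition` and `runAll` are hypothetical names, and the `Consumer` handed to `runAll` is a stand-in for whatever actually posts the files (for example a call into SimplePostTool). This is not Upayavira's actual class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: statically split already-expanded file arguments
// across N threads and run one post action per thread.
class ParallelPostWrapper {

    /** Assigns args[i] to group (i % threads), i.e. round-robin. */
    static List<List<String>> partition(List<String> args, int threads) {
        List<List<String>> groups = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            groups.add(new ArrayList<>());
        }
        for (int i = 0; i < args.size(); i++) {
            groups.get(i % threads).add(args.get(i));
        }
        return groups;
    }

    /**
     * Starts one thread per non-empty group and waits for all of them.
     * postAction is an assumed stand-in for the code that really posts files.
     */
    static void runAll(List<String> args, int threads, Consumer<List<String>> postAction)
            throws InterruptedException {
        List<Thread> workers = new ArrayList<>();
        for (List<String> group : partition(args, threads)) {
            if (group.isEmpty()) {
                continue; // more threads than files: skip idle groups
            }
            Thread worker = new Thread(() -> postAction.accept(group));
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
    }

    public static void main(String[] argv) throws InterruptedException {
        // e.g. `java ParallelPostWrapper *.xml`; here each thread just prints its share.
        runAll(List.of(argv), 4, group ->
                System.out.println(Thread.currentThread().getName() + " -> " + group));
    }
}
```

A static split like this is the simplest option, but if file sizes vary a lot one thread can finish well before the others; a shared work queue (such as a `java.util.concurrent.ExecutorService`) balances the load better.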
>
> The idea of a shell is an interesting one. But is there stuff we
> couldn't achieve without creative use of 'curl'?
>
> Upayavira
>
> On Tue, Feb 26, 2013, at 04:34 AM, Otis Gospodnetic wrote:
> > Upayavira, ever did this?
> >
> > Ha, look at my email from 20 days ago and this:
> > https://github.com/javanna/elasticshell
> >
> > Otis
> >
> > On Wed, Feb 6, 2013 at 2:38 PM, Otis Gospodnetic
> > <otis.gospodnetic@gmail.com> wrote:
> >
> > > Btw, wouldn't this be a chance to create a Solr CLI tool, much like
> > > es2unix? Maybe with a shell? I'm offline now, but I recently came across
> > > a Java lib that makes this easy... jclam jsomething ...
> > >
> > > Otis
> > > On Feb 6, 2013 8:48 AM, "Jan Høydahl" <jan.asf@cominvent.com> wrote:
> > >
> > >> By dependencies I meant external jar dependencies. Perhaps extensions
> > >> could have deps while leaving the "core" compilable without them?
> > >>
> > >> --
> > >> Jan Høydahl, search solution architect
> > >> Cominvent AS - www.cominvent.com
> > >> Solr Training - www.solrtraining.com
> > >>
> > >> On 5 Feb 2013, at 17:10, Upayavira <uv@odoko.co.uk> wrote:
> > >>
> > >> > By dependencies, do you mean other java classes? I was thinking of
> > >> > splitting it out into a few classes, each of which is clearer in its
> > >> > purpose.
> > >> >
> > >> > Upayavira
> > >> >
> > >> > On Tue, Feb 5, 2013, at 02:26 PM, Jan Høydahl wrote:
> > >> >> Wiki page exists already: http://wiki.apache.org/solr/post.jar
> > >> >>
> > >> >> I'm happy to consider a refactoring, especially if it makes it SIMPLER
> > >> >> to read and interact with and doesn't add a ton of mandatory
> > >> >> dependencies. It should probably still be possible to say something like
> > >> >>
> > >> >>  javac org/apache/solr/util/SimplePostTool.java
> > >> >>  java -cp . org.apache.solr.util.SimplePostTool -h
> > >> >>
> > >> >> That's just how I've been thinking so far, though. If other committers
> > >> >> are happy with abandoning the simplicity and instead creating a
> > >> >> best-practices-based, feature-rich tool with dependencies, then I'll
> > >> >> not object.
> > >> >>
> > >> >> On 5 Feb 2013, at 05:22, Upayavira <uv@odoko.co.uk> wrote:
> > >> >>
> > >> >>> Thx Jan,
> > >> >>>
> > >> >>> All I know is I've got a data set of 500k documents, Solr formatted,
> > >> >>> and I want it to be as easy as possible to get them into Solr. I also
> > >> >>> want to be able to show the benefit of multithreading. The outcome
> > >> >>> would really be "make sure your code uses multiple threads to push to
> > >> >>> Solr" rather than "use post.jar in production". I see post.jar as a
> > >> >>> demonstration tool, rather than anything else, and am considering
> > >> >>> adding another feature to enhance that.
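A rough sketch of "multiple threads pushing to Solr" that stays dependency-free by using only JDK classes (`ExecutorService` and `HttpURLConnection`), so it would still compile with plain `javac`. It assumes each file already holds Solr's `<add><doc>…</doc></add>` XML and that the update URL is something like `http://localhost:8983/solr/update` (adjust to your setup). `ThreadedXmlPoster`, `postFile` and `postAll` are hypothetical names, not part of post.jar, and unlike post.jar (which commits by default) this sketch sends no commit afterwards.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: post Solr-XML files from a fixed-size thread pool
// using only JDK classes. No commit is sent; do that separately.
class ThreadedXmlPoster {

    /** POSTs one file's bytes to updateUrl and returns the HTTP status code. */
    static int postFile(URL updateUrl, Path file) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) updateUrl.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/xml; charset=utf-8");
        try (OutputStream out = conn.getOutputStream()) {
            Files.copy(file, out);
        }
        int status = conn.getResponseCode();
        // Drain the response body so the underlying connection can be reused.
        try (InputStream in = status < 400 ? conn.getInputStream() : conn.getErrorStream()) {
            if (in != null) {
                in.readAllBytes();
            }
        }
        return status;
    }

    /** Posts every file using a pool of `threads` workers; returns the number of 2xx replies. */
    static int postAll(URL updateUrl, List<Path> files, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (Path file : files) {
                // Each file is one task; idle workers pick up the next queued file.
                results.add(pool.submit(() -> postFile(updateUrl, file)));
            }
            int ok = 0;
            for (Future<Integer> result : results) {
                int status = result.get(); // rethrows a task's IOException wrapped in ExecutionException
                if (status >= 200 && status < 300) {
                    ok++;
                }
            }
            return ok;
        } finally {
            pool.shutdown();
        }
    }
}
```

Usage would be along the lines of `ThreadedXmlPoster.postAll(URI.create("http://localhost:8983/solr/update").toURL(), files, 4)`; a pool of 4 simply mirrors the four concurrent post.jar instances mentioned in this thread and is not a tuned recommendation.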
> > >> >>>
> > >> >>> However, I did stall once I started looking at the SimplePostTool
> > >> >>> class, because it is losing its connection with the term 'Simple'.
> > >> >>> Adding multithreading, however useful, correct, whatever, would
> > >> >>> completely push it over the edge. Thus, I think the proper approach is
> > >> >>> to refactor the tool into a number of classes, and only then think
> > >> >>> about adding multithreading as a completely separate affair. I'm more
> > >> >>> than happy to have a go at that refactoring, especially if you're
> > >> >>> prepared to review it.
> > >> >>>
> > >> >>> I guess the other thing that is much needed is a wiki page that
> > >> >>> details the features of the tool, and also explains that its role is
> > >> >>> educational, rather than anything else.
> > >> >>>
> > >> >>> Upayavira
> > >> >>>
> > >> >>> On Mon, Feb 4, 2013, at 09:10 PM, Jan Høydahl wrote:
> > >> >>>> Hi,
> > >> >>>>
> > >> >>>> Hmm, the tool is already getting bloated for a one-class, no-deps
> > >> >>>> tool :) I guess it would also be useful to have real-life code
> > >> >>>> examples using SolrJ and other libs (such as a robots.txt lib,
> > >> >>>> commons-cli, etc.), but whether that should be an extension of
> > >> >>>> SimplePostTool or a totally new tool from scratch is something to
> > >> >>>> discuss. Please bring on your ideas of how you plan to extend it,
> > >> >>>> perhaps even simplifying the code in the process?
> > >> >>>>
> > >> >>>> On 3 Feb 2013, at 17:19, Upayavira <uv@odoko.co.uk> wrote:
> > >> >>>>
> > >> >>>>> I have a scenario in which I need to post 500,000 documents to Solr
> > >> >>>>> as a test. I have these documents in XML files, already formatted in
> > >> >>>>> Solr's XML format.
> > >> >>>>>
> > >> >>>>> Posting them to Solr using post.jar takes 1m55s. With a bit of bash
> > >> >>>>> jiggery-pokery, I was able to get this down to 1m08s by running four
> > >> >>>>> concurrent post.jar instances, which strikes me as a significant
> > >> >>>>> improvement.
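For scale, here is the plain arithmetic on the two reported timings (no new measurement), writing T_1 = 1m55s = 115 s for a single post.jar run and T_4 = 1m08s = 68 s for four concurrent instances, with S the speedup and E the parallel efficiency:

```latex
\begin{aligned}
S &= \frac{T_{1}}{T_{4}} = \frac{115\,\mathrm{s}}{68\,\mathrm{s}} \approx 1.69, \qquad
E = \frac{S}{4} \approx 0.42, \\
\text{throughput}_{1} &= \frac{500\,000\ \text{docs}}{115\,\mathrm{s}} \approx 4350\ \text{docs/s}, \qquad
\text{throughput}_{4} = \frac{500\,000\ \text{docs}}{68\,\mathrm{s}} \approx 7350\ \text{docs/s}.
\end{aligned}
```

An efficiency of roughly 0.42 means the four instances gave well under a linear gain; these two numbers alone don't say whether the client, the network, or Solr itself was the limiting factor.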
> > >> >>>>>
> > >> >>>>> I'm considering adding multithreaded capabilities to post.jar, but
> > >> >>>>> before I go to that effort, I wanted to see if anyone else would
> > >> >>>>> consider it a useful feature. Given that the SimplePostTool is
> > >> >>>>> becoming far from simple, I wanted to see whether the feature is
> > >> >>>>> likely to be accepted before I put in the effort. Also, I would need
> > >> >>>>> to consider which parts of the tool to add it to. Currently I only
> > >> >>>>> want it for posting XML docs, but the tool also has crawling
> > >> >>>>> capabilities.
> > >> >>>>>
> > >> >>>>> Thoughts?
> > >> >>>>>
> > >> >>>>> Upayavira
> > >> >>>>
> > >> >>
> > >>
> > >>
>
