uima-dev mailing list archives

From Tommaso Teofili <tommaso.teof...@gmail.com>
Subject Re: Solrcas questions
Date Fri, 04 Feb 2011 16:48:52 GMT
Hi Jörn,
good points :)

2011/2/3 Jörn Kottmann <kottmann@gmail.com>

> Hi,
>
> I have two questions about the Solrcas project in the sandbox.
>
> I reviewed the code. Is there any special reason that SolrCASConsumer
> extends JCasAnnotator_ImplBase? It looks like it is not using any
> JCas features and could extend CasAnnotator_ImplBase instead.
> But maybe I am mistaken.
>

Thanks for pointing that out. I'm not at my computer, so I can't take a
deeper look right now, but I will as soon as I can.


>
> In line 132 the SolrServer.add method is called inside the AE's process
> method.
> Does this method already transmit the document over the network into Solr?
> Or does this happen in the line after, where SolrServer.commit is called?
> I am asking because if we could use auto commit, the Solr server process
> might be able to group multiple documents into one commit, and then
> we would not need to call SolrServer.commit for every document. The commit
> behavior could be configurable.
>
> In case add does not transmit the document synchronously to the Solr
> server, the process method can return but the CAS might still cause an
> error in a future call to the process method, which I do not want
> because it makes the error handling complicated.


The autocommit option could be supported by adding a parameter to the
CASConsumer descriptor. Alternatively, the setting could be derived by
fetching solrconfig.xml (via a Solr REST call) and behaving accordingly.
Regarding the add method, I think it is synchronous, as is commit, but I
need to check the Solr code to be sure.
Regarding the embedded Solr server, I think it is worth offering this
option for non-testing scenarios too, since it avoids a lot of network
overhead; one could take advantage of it when the two instances (Solr and
UIMA) are on the same machine.
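
To make the descriptor-parameter idea concrete, here is a minimal sketch of
a consumer that skips the per-document commit when auto commit is enabled.
It is not the Solrcas code: IndexClient is a hypothetical stand-in for the
two SolrServer calls discussed above (add and commit), and the autoCommit
flag is an assumed parameter name, not an existing one.

```java
public class CommitPolicySketch {

    /** Hypothetical stand-in for the SolrServer add/commit calls. */
    interface IndexClient {
        void add(String document);
        void commit();
    }

    /** Consumer whose commit behavior depends on an assumed "autoCommit" flag. */
    static class Consumer {
        private final IndexClient client;
        private final boolean autoCommit;

        Consumer(IndexClient client, boolean autoCommit) {
            this.client = client;
            this.autoCommit = autoCommit;
        }

        /** Adds one document; commits explicitly only when auto commit is off. */
        void process(String document) {
            client.add(document);
            if (!autoCommit) {
                client.commit();
            }
        }
    }

    /** In-memory client that only counts calls, so the sketch runs without Solr. */
    static class RecordingClient implements IndexClient {
        int adds;
        int commits;

        @Override
        public void add(String document) {
            adds++;
        }

        @Override
        public void commit() {
            commits++;
        }
    }

    public static void main(String[] args) {
        RecordingClient explicit = new RecordingClient();
        Consumer perDocument = new Consumer(explicit, false);
        perDocument.process("doc-1");
        perDocument.process("doc-2");
        System.out.println("explicit commits: " + explicit.commits);

        RecordingClient delegated = new RecordingClient();
        Consumer serverSide = new Consumer(delegated, true);
        serverSide.process("doc-1");
        serverSide.process("doc-2");
        System.out.println("delegated commits: " + delegated.commits);
    }
}
```

With the flag off, every document is followed by a commit, which matches the
behavior described in the question; with it on, committing is left to the
server's own auto commit settings.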


2011/2/4 Jörn Kottmann <kottmann@gmail.com>

> And some more.
>
> This is defined as dependency in the pom:
> <dependency>
> <groupId>org.apache.uima</groupId>
> <artifactId>uimaj-component-test-util</artifactId>
> <version>2.3.1</version>
> </dependency>
>
> Shouldn't that be a test dependency only ?
>

I forgot to add the <scope>test</scope> tag for uimaj-component-test-util;
I need to fix it.


> Otherwise it indicates that the jar plus its dependencies should be
> on the classpath when deployed.
>
> I also wonder which of all these jars I really need to run SolrJ. It looks
> like most are needed for the embedded Solr server.
> Does it make sense to use the embedded Solr server for anything
> other than testing?
>
> Should we declare createServer as protected? Then people can override
> it to create whatever kind of Solr server they want, or maybe
> tune/customize the HTTP parameters.
>

This sounds like a good way to allow customization, so I am +1 for that
change.
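
As a rough illustration of what a protected createServer would allow, here
is a sketch with a base consumer and a subclass that swaps in a different
server. The types are hypothetical placeholders (IndexClient, BaseConsumer,
EmbeddedConsumer) and the createServer signature is assumed; the actual
method in SolrCASConsumer may differ.

```java
public class CreateServerSketch {

    /** Hypothetical stand-in for whatever server type createServer returns. */
    interface IndexClient {
        String describe();
    }

    /** Base consumer exposing server creation as a protected factory method. */
    static class BaseConsumer {

        /** Default: a client for a remote server at the given URL. */
        protected IndexClient createServer(String url) {
            return () -> "remote:" + url;
        }

        /** Initialization goes through createServer, so overrides take effect. */
        String initialize(String url) {
            return createServer(url).describe();
        }
    }

    /** Subclass replacing the default with an embedded-style server. */
    static class EmbeddedConsumer extends BaseConsumer {
        @Override
        protected IndexClient createServer(String url) {
            return () -> "embedded";
        }
    }

    public static void main(String[] args) {
        String url = "http://localhost:8983/solr"; // example URL, an assumption
        System.out.println(new BaseConsumer().initialize(url));
        System.out.println(new EmbeddedConsumer().initialize(url));
    }
}
```

The same override point could be used to tune HTTP parameters before
returning the server, without touching the rest of the consumer.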


>
> Thanks,
> Jörn
>
