uima-dev mailing list archives

From Jörn Kottmann <kottm...@gmail.com>
Subject Re: Solrcas questions
Date Wed, 09 Feb 2011 15:12:48 GMT
On 2/9/11 3:47 PM, Tommaso Teofili wrote:
> Regarding the asserts in the initialize() method, they can be safely removed, as they were
> put there mainly for debugging purposes. However, the initialization of the Consumer would fail
> if such params are null or badly defined, as you can see inside the createServer(type,path)
> and FieldMappingReader.getConf(path) methods.

Let's open a Jira issue for this one.
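
As a rough illustration of what explicit checks in place of the asserts could look like, here is a minimal sketch. The helper class, the method name and the parameter names are hypothetical and not taken from Solrcas; in a real UIMA component the failure would more likely be reported through the framework's initialization exception than through IllegalArgumentException.

```java
import java.util.Map;

/** Hypothetical helper, not part of Solrcas: validates configuration parameters up front. */
final class ParamChecks {
    private ParamChecks() {}

    /** Returns the trimmed string value of a parameter, or throws with a clear message. */
    static String requireNonBlank(Map<String, Object> params, String name) {
        Object value = params.get(name);
        // Reject missing values, non-string values and blank strings alike.
        if (!(value instanceof String) || ((String) value).trim().isEmpty()) {
            throw new IllegalArgumentException(
                "Missing or empty configuration parameter: " + name);
        }
        return ((String) value).trim();
    }
}
```

Unlike an assert, such a check is also active when the JVM runs without -ea, and the error names the offending parameter.
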
> The cas element in the mapping file is optional, and I thought it was useful for
> tracking the CAS which delivered the information. In the sample file it gets mapped to an id
> field, but that doesn't mean it MUST be unique. It is optional, and maybe the toString()
> method isn't the best way to store the CAS information, but I still think it makes sense
> not to lose such information.

I believe that in the vast majority of cases it is not unique. People can
have an FS in the CAS which contains a unique id, and that
can easily be mapped to an id field in Solr. The current implementation
can do that already. I also believe the
toString() value is not at all helpful for debugging anything. You might want to
log debug information into the CAS.
If you wish to keep that in Solr, it would be possible to simply map
these FSes.

> I agree with the need to switch to the CAS API
Then let's open a Jira issue for it.
> I also agree regarding enhancing the exception handling for debugging errors. If a
> commit fails, I think that should be handled the same way as a failed add(); otherwise a
> commit policy parameter would have to be created (e.g. a cache of previously added documents,
> to try to re-send them), but I think that's out of the scope of a basic Solrcas implementation
> and more related to how Solr handles commit errors.
> I'd introduce the already discussed autocommit configuration parameter (boolean) to indicate
> whether Solrcas should also send a commit to the SolrServer. It may also make sense to add a
> third value for this param, called 'destroy', that would trigger the commit only in the destroy()
> method, even though in that case any errors during the commit could not be recovered.

When there is no unique id, a document will be added to Solr again
if the commit failed the first time. I am not sure what the
best way to handle these errors is. In some cases you might just want to
ignore them, in others you might want to retry. I also wonder whether
autocommit isn't the best option when a massive number of
documents is streamed to Solr from multiple
UIMA pipelines. Do you have any experience here?
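
To make the options a bit more concrete, here is a minimal sketch of a three-valued autocommit setting together with a simple retry loop around the commit. All names (CommitMode, CommitRetry, commitWithRetry) are hypothetical and not part of Solrcas; the commit itself is passed in as a Callable, so the sketch does not depend on SolrJ.

```java
import java.util.Locale;
import java.util.concurrent.Callable;

/** Hypothetical three-valued replacement for a boolean autocommit parameter. */
enum CommitMode {
    ALWAYS, NEVER, ON_DESTROY;

    /** Maps the configuration strings "true", "false" and "destroy" to a mode. */
    static CommitMode parse(String value) {
        switch (value.trim().toLowerCase(Locale.ROOT)) {
            case "true":    return ALWAYS;
            case "false":   return NEVER;
            case "destroy": return ON_DESTROY;
            default:
                throw new IllegalArgumentException("Unknown autocommit value: " + value);
        }
    }
}

/** Hypothetical retry helper: tries the commit up to maxAttempts times. */
final class CommitRetry {
    private CommitRetry() {}

    /** Returns true as soon as one attempt succeeds, false if every attempt failed. */
    static boolean commitWithRetry(Callable<Void> commit, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                commit.call();
                return true;
            } catch (Exception e) {
                // Swallowed on purpose in this sketch; real code would log the failure.
            }
        }
        return false;
    }
}
```

Retrying only the commit, rather than re-sending the documents, avoids the duplicate problem when there is no unique id. Whether the caller should ignore a false result or fail the pipeline is exactly the open question above.
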
> Regarding the EmbeddedSolrServer, I agree that it's generally not a top option in production,
> but I am now working on a Solr project where network latency has a significant impact (Solr
> being the best solution anyway), and I'd get a considerable advantage if I could query it without
> HTTP requests that way. However, since the main way to query Solr is via REST calls, I have
> no objections to removing it.
>
Sounds good, let's use it for testing only. We also need to enhance the
test. We should add a document and then retrieve
it to see that it is in Solr as expected.

Do you want to open the Jira issues yourself?

Jörn

