lucene-solr-user mailing list archives

From Noble Paul നോബിള്‍ नोब्ळ् <noble.p...@corp.aol.com>
Subject Re: DataImportHandler Questions-Load data in parallel and temp tables
Date Mon, 16 Nov 2009 14:17:36 GMT
On Mon, Nov 16, 2009 at 6:25 PM, amitj <amitj@ieee.org> wrote:
>
> Is there also a way we can include some kind of annotation on the schema
> field and send the data retrieved for that field to an external application.
> We have a requirement where we require some data fields (out of the fields
> for an entity defined in data-config.xml) to act as entities for entity
> extraction and auto complete purposes and we are using some external
> application.
No, this is not possible in Solr at present.
>
>
> Noble Paul നോബിള്‍  नोब्ळ् wrote:
>>
>> Writing to a remote Solr through SolrJ is in the cards. I may even
>> take it up after the 1.4 release. For now, your best bet is to subclass
>> SolrWriter and override the corresponding methods for
>> add/delete.
>>
>>>> 2009/4/27 Amit Nithian <anithian@gmail.com>:
>>>> > All,
>>>> > I have a few questions regarding the data import handler. We have some
>>>> > pretty gnarly SQL queries to load our indices and our current loader
>>>> > implementation is extremely fragile. I am looking to migrate over to the
>>>> > DIH; however, I am looking to use SolrJ + EmbeddedSolr + some custom stuff
>>>> > to remotely load the indices so that my index loader and main search engine
>>>> > are separated.
>>>> > Currently, unless I am missing something, the data gathering from the entity
>>>> > and the data processing (i.e. conversion to a Solr Document) are done
>>>> > sequentially, and I was looking to make this execute in parallel so that I
>>>> > can have multiple threads processing different parts of the resultset and
>>>> > loading documents into Solr. Secondly, I need to create temporary tables to
>>>> > store results of a few queries and use them later for inner joins, and I was
>>>> > wondering how best to go about this.
>>>> >
>>>> > I am thinking of adding support in DIH for the following:
>>>> > 1) Temporary tables (maybe call them temporary entities)? Specific only to
>>>> > SQL, though, unless it can be generalized to other sources.
>>>> > 2) Parallel support
>>>> >  - Including some mechanism to get the number of records (whether it be
>>>> > a count or MAX(custom_id)-MIN(custom_id))
>>>> > 3) Support in DIH or Solr to post documents to a remote index (i.e. create a
>>>> > new UpdateHandler instead of DirectUpdateHandler2).
>>>> >
>>>> > If any of these exist or anyone else is working on this (OR you have better
>>>> > suggestions), please let me know.
>>>> >
>>>> > Thanks!
>>>> > Amit
>>>> >
>> --
>> --Noble Paul
>>
>>
>
> --
> View this message in context: http://old.nabble.com/DataImportHandler-Questions-Load-data-in-parallel-and-temp-tables-tp23266396p26371403.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
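
On the parallel-loading idea quoted above (using MIN(custom_id) and MAX(custom_id) to size the work), here is a minimal sketch of how an id range could be cut into contiguous chunks, one per worker thread. This is not part of DIH; the class name IdRangePartitioner and the method split are hypothetical and only illustrate the partitioning step, assuming ids are numeric and reasonably dense.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a DIH or Solr class.
final class IdRangePartitioner {

    /**
     * Splits the inclusive range [minId, maxId] into at most `parts`
     * contiguous, non-overlapping inclusive ranges of near-equal size.
     * Each element of the result is a two-element array {start, end}.
     */
    static List<long[]> split(long minId, long maxId, int parts) {
        if (parts <= 0) {
            throw new IllegalArgumentException("parts must be positive");
        }
        if (maxId < minId) {
            throw new IllegalArgumentException("maxId must be >= minId");
        }
        long total = maxId - minId + 1;   // number of ids in the range
        long base = total / parts;        // minimum size of each chunk
        long remainder = total % parts;   // first `remainder` chunks get one extra id

        List<long[]> ranges = new ArrayList<>();
        long start = minId;
        for (int i = 0; i < parts; i++) {
            long size = base + (i < remainder ? 1 : 0);
            if (size == 0) {
                break;                    // more parts requested than ids available
            }
            ranges.add(new long[] {start, start + size - 1});
            start += size;
        }
        return ranges;
    }
}
```

Each {start, end} pair could then bound one worker's query, for example with a WHERE custom_id BETWEEN ? AND ? clause. Sparse or skewed ids would give uneven chunks, so a row count per chunk may be a better basis in that case.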



-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com
