lucene-solr-user mailing list archives

From Shawn Heisey <s...@elyograg.org>
Subject Re: DIH primary key
Date Sun, 04 Sep 2011 19:44:28 GMT
On 9/4/2011 12:16 PM, Kissue Kissue wrote:
> I was reading about DIH on this Wiki link:
> http://wiki.apache.org/solr/DataImportHandler#A_shorter_data-config
> The following was said about entity primary key: "is *optional* and only
> needed when using delta-imports". Does this mean that the primary key is
> mandatory for delta imports? I am asking because I am going to be importing
> from a view with no primary key.

I believe what it means is that you have to specify a field to be the 
primary key, and that it must be present in the results of all three 
queries that you define - query, deltaQuery and deltaImportQuery.  In my 
case, query and deltaImportQuery are identical, and deltaQuery is 
"SELECT 1 AS did".  The only thing that deltaQuery does is tell the DIH 
that there is something to do for a delta-import, which it then carries 
out using deltaImportQuery.  I keep track of which documents are new 
outside of Solr and pass the values for the query in via the dataimport 
URL.
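
As a rough sketch of the setup described above, an entity along these 
lines would do it.  Only pk="did" and the "SELECT 1 AS did" deltaQuery 
come from my actual setup as described here; the data source settings, 
the view name doc_view, the other column names, and the request 
parameters minDid and maxDid are made up for illustration.

```xml
<dataConfig>
  <!-- Hypothetical JDBC settings; substitute your own driver, URL and login. -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/exampledb"
              user="solr"
              password="secret"/>
  <document>
    <!-- pk names a column returned by all three queries below.
         deltaQuery only signals that there is work to do;
         deltaImportQuery (identical to query) fetches the rows.
         doc_view, tag_id, title, minDid and maxDid are assumed names. -->
    <entity name="doc"
            pk="did"
            query="SELECT did, tag_id, title FROM doc_view
                   WHERE did &gt; ${dataimporter.request.minDid}
                   AND did &lt;= ${dataimporter.request.maxDid}"
            deltaQuery="SELECT 1 AS did"
            deltaImportQuery="SELECT did, tag_id, title FROM doc_view
                   WHERE did &gt; ${dataimporter.request.minDid}
                   AND did &lt;= ${dataimporter.request.maxDid}"/>
  </document>
</dataConfig>
```

With a config like that, the values would be supplied on the request, 
for example /dataimport?command=delta-import&minDid=1000&maxDid=2000 
(the handler path depends on how it is registered in solrconfig.xml).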

As you might surmise, did is the primary key in my dataimport config 
file.  I couldn't say what would happen if your query results have 
duplicate values in the primary key field.  In my case, did actually is 
the primary key in the database, but I don't think that's required.  
I use different fields for primary key and uniqueKey.  This allows us a 
little extra flexibility in the index.

Hopefully you do still have a field that is unique (even if it's not a 
primary key) that you can use as the primary key in your config file.  
It's a good idea to have such a thing available to serve as the 
uniqueKey in schema.xml, for automatic overwrites (delete and reinsert) 
of documents that change.
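
For illustration, the uniqueKey side in schema.xml might look something 
like the fragment below.  The field name tag_id is a made-up example, 
and it assumes a fieldType called "string" is defined in your schema, 
as it is in the example schema that ships with Solr.

```xml
<!-- Inside <fields>: the field that will act as uniqueKey.
     tag_id is a hypothetical name; use whatever unique column you have. -->
<field name="tag_id" type="string" indexed="true" stored="true" required="true"/>

<!-- After </fields>: adding a document whose tag_id already exists
     in the index replaces the old document. -->
<uniqueKey>tag_id</uniqueKey>
```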

Thanks,
Shawn

