lucene-solr-user mailing list archives

From David Kramer <David.Kra...@shoebuy.com>
Subject Re: Solr querying nested documents with ChildDocTransformerFactory, get “Parent query yields document which is not matched by parents filter”
Date Mon, 06 Feb 2017 23:10:35 GMT
For closure, I’ve solved the problem! Solr was not using my schema.xml at all. I had to change
solrconfig.xml to include <schemaFactory class="ClassicIndexSchemaFactory"/> and
comment out the schema adding processor.

My schema still didn’t work right, so I took the managed-schema, renamed it, and changed
its uniqueKey to uuid, and everything worked!

Thanks for your time and help.
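
For anyone landing on this thread with the same doubt about which schema is in effect, here is a minimal sketch of one way to ask a running Solr which field it treats as the uniqueKey. It assumes the read-only Schema API endpoint `/schema/uniquekey` is reachable on the core; the base URL and the core name `catalog` are hypothetical placeholders, not values from this thread.

```python
import json
import urllib.request


def unique_key_url(base_url, core):
    """Build the Schema API URL that reports the active uniqueKey."""
    return f"{base_url.rstrip('/')}/{core}/schema/uniquekey?wt=json"


def parse_unique_key(body):
    """Extract the uniqueKey field name from a Schema API JSON response."""
    return json.loads(body)["uniqueKey"]


def fetch_unique_key(base_url, core):
    """Ask a running Solr which field it actually uses as uniqueKey."""
    with urllib.request.urlopen(unique_key_url(base_url, core)) as resp:
        return parse_unique_key(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical host and core name; adjust to your installation.
    print(fetch_unique_key("http://localhost:8983/solr", "catalog"))
```

If this prints `id` while schema.xml says `uuid`, that would suggest Solr is reading a different schema file than the one being edited.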


On 2/2/17, 4:35 PM, "David Kramer" <David.Kramer@shoebuy.com> wrote:

    Yes, think of the starving orphan records…
    
    Ours is an eCommerce system, selling mostly shoes. We have three levels of nested objects
    representing what we sell:
    - Product: Mostly title and description.
    - Item: A specific color and some other attributes, including price. Products have 1 or
      more Items, Items belong to one Product.
    - SKU: A specific size and SKU ID. Items have 1 or more SKUs, SKUs belong to one Item.
    [PRODUCT  [ITEM  [SKU] [SKU] [SKU]] [ITEM [SKU]] ]
    
    Products, Items, and SKUs all have ID numbers. One Product will never have the same ID
    as another Product, but it’s possible for a Product to have the same ID as an Item or a
    SKU, and that is the problem. So the program that creates the import file adds a new field
    called uuid, which is a P, I, or S (for Product, Item, or SKU) followed by the ID. We did
    it this way because my understanding is that Solr can’t implement a compound unique key.
    The uuid is unique across all documents, not just all documents of the same docType.
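
The uuid scheme described above can be sketched roughly as follows. This is an illustration only, not the actual import program: the function names `make_uuid` and `add_uuids` are hypothetical, and the sketch assumes each document carries `id` and `docType` fields and nests children under `_childDocuments_`.

```python
def make_uuid(doc_type, doc_id):
    """Prefix the numeric id with the first letter of the docType, e.g. P739063."""
    return f"{doc_type[0].upper()}{doc_id}"


def add_uuids(doc):
    """Return a copy of doc with a uuid set on it and on every nested child."""
    out = dict(doc)
    out["uuid"] = make_uuid(doc["docType"], doc["id"])
    if "_childDocuments_" in doc:
        out["_childDocuments_"] = [add_uuids(c) for c in doc["_childDocuments_"]]
    return out


if __name__ == "__main__":
    product = {
        "id": 739063,
        "docType": "Product",
        "_childDocuments_": [{"id": 1537378, "docType": "Item"}],
    }
    print(add_uuids(product))
```

Because the prefix comes from the docType, this stays unique only while no two docTypes start with the same letter.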
    
    So, to test whether Solr would complain if the UUID of a document I was inserting was
    not unique, I grabbed the first few products from the full import file and changed the
    IDs so they are not duplicates of the real data, but left the UUIDs alone, so they are
    duplicates of the real data, which was already loaded.
    
    My expectation was that when I loaded the data I would get some error saying that UUID
    was already used. YOUR expectation is that the record would be overwritten. What actually
    happened is that the new documents got added with their duplicate UUIDs, which is the worst
    possible case. This is why I think it’s not respecting my uniqueKey setting in schema.xml.
    
    Does that make more sense? I hope you can help me understand this discrepancy. Thanks
    for your efforts so far.
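
Since Solr did not report the duplicates in the test described above, one option is to look for them on the client side before importing. The sketch below is a hypothetical helper, not part of Solr: it flattens nested documents and lists any uuid that occurs more than once across the documents it is given, assuming children sit under `_childDocuments_` and every document has a `uuid` field.

```python
from collections import Counter


def flatten(doc):
    """Yield a document followed by all of its nested children, depth first."""
    yield doc
    for child in doc.get("_childDocuments_", []):
        yield from flatten(child)


def duplicate_uuids(docs):
    """Return the sorted list of uuid values that occur more than once."""
    counts = Counter(d["uuid"] for top in docs for d in flatten(top))
    return sorted(u for u, n in counts.items() if n > 1)


if __name__ == "__main__":
    loaded = [{"id": 739063, "uuid": "P739063"}]
    incoming = [{"id": 999001, "uuid": "P739063"}]  # new id, reused uuid
    print(duplicate_uuids(loaded + incoming))
```

This only compares import files with each other; it says nothing about what is already in the index.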
    
    On 2/2/17, 3:13 PM, "Mikhail Khludnev" <mkhl@apache.org> wrote:
    
        David,
        I don't fully follow how the IDs are assigned, but beware that repeating a
        uniqueKey value deletes the former occurrence. In a block join index this
        corrupts the block structure: the parent can be deleted, leaving its children
        orphans (.. so touching, I'm sorry). First, just make sure that the number of
        deleted docs is 0.
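
As a rough sketch of how the deleted-document check could be scripted rather than read off the admin screen: the code below assumes the Luke request handler is enabled at `/admin/luke` and that its JSON response has an `index` section with `numDocs` and `maxDoc`. The base URL and core name are hypothetical.

```python
import json
import urllib.request


def deleted_doc_count(luke_body):
    """Compute deleted docs as maxDoc - numDocs from a Luke handler response."""
    index = json.loads(luke_body)["index"]
    return index["maxDoc"] - index["numDocs"]


def fetch_deleted_doc_count(base_url, core):
    """Query a running Solr core and return its number of deleted documents."""
    url = f"{base_url.rstrip('/')}/{core}/admin/luke?numTerms=0&wt=json"
    with urllib.request.urlopen(url) as resp:
        return deleted_doc_count(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical host and core name; adjust to your installation.
    print(fetch_deleted_doc_count("http://localhost:8983/solr", "catalog"))
```

A result other than 0 right after a clean import would hint that some documents overwrote earlier ones with the same uniqueKey.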
        
        On Thu, Feb 2, 2017 at 6:20 PM, David Kramer <David.Kramer@shoebuy.com>
        wrote:
        
        > Thanks for responding, Mikhail. There are no deleted documents. Since
        > I’m fairly new to Solr, one of the things I’ve been paranoid about is that I
        > have no way of validating my schema.xml, or of knowing whether Solr is even
        > using it (I have evidence it’s not, more below). So for each test, I’ve wiped
        > out the index, recreated it, and reimported.
        >
        > Back to whether my schema.xml is being used, I mentioned that I had to
        > come up with a compound UUID field of the first character of the docType
        > plus the ID, and we put “<uniqueKey>uuid</uniqueKey>” (was id) in our
        > schema.xml.  Then I deleted and recreated the index and restarted Solr.  In
        > order to verify it was working, I created an import file that had unique
        > IDs but UUIDs which were duplicates of existing records, and it imported
        > the new records even though the UUIDs existed in the database already.  I’m
        > not sure if Solr should have produced an error or not. I’ll research that,
        > but I mention that here in case it’s relevant.
        >
        > Thanks.
        >
        > On 2/2/17, 6:10 AM, "Mikhail Khludnev" <mkhl@apache.org> wrote:
        >
        >     David,
        >
        >     Can you make sure your index doesn't have deleted docs? This can be
        >     seen in the Solr Admin UI.
        >     And can you merge the index to avoid having them in it?
        >
        >     On Thu, Feb 2, 2017 at 12:29 AM, David Kramer <David.Kramer@shoebuy.com>
        >     wrote:
        >
        >     >
        >     >
        >     > Some background:
        >     > - The data involved is catalog data, with three nested objects:
        >     >   Products, Items, and Skus, in that order. We have a docType field
        >     >   on each record as a differentiator.
        >     > - The "id" field in our data is unique within datatype, but not
        >     >   across datatypes. In the program that generates the Solr import
        >     >   file, we added a "uuid" field that is the id prefixed by the first
        >     >   letter of the docType, like P12345. That makes the uuid field
        >     >   unique, and we have that as the uniqueKey in our schema.xml.
        >     > - We are trying to retrieve the parent Product and all children
        >     >   documents. As such, we are using the ChildDocTransformerFactory
        >     >   ([child...]) to retrieve the children along with the parent. We
        >     >   have not yet solved the problem of getting SKUs within Items as
        >     >   nested documents in the results, and we will have to figure that
        >     >   out at some point, but for now we get them flattened.
        >     > - We are building out the proof of concept for this. This is all
        >     >   new work, so we are free to change a lot.
        >     > - This is Solr 6.0.0, and we are importing in JSON format, if that
        >     >   matters.
        >     > - I submitted this question to StackOverflow
        >     >   <http://stackoverflow.com/questions/41969353/solr-querying-nested-documents-with-childdoctransformerfactory-get-parent-quer>
        >     >   but haven’t gotten any answers yet.
        >     >
        >     >
        >     > Our data looks like this (I've removed some fields for simplicity):
        >     >
        >     > {
        >     >   "id": 739063,
        >     >   "docType": "Product",
        >     >   "uuid": "P739063",
        >     >   "_childDocuments_": [
        >     >     {
        >     >       "id": 1537378,
        >     >       "price": 25.45,
        >     >       "color": "Blush",
        >     >       "docType": "Item",
        >     >       "productId": 739063,
        >     >       "uuid": "I1537378",
        >     >       "_childDocuments_": [
        >     >         {
        >     >           "id": 12799578,
        >     >           "size": "10",
        >     >           "width": "W",
        >     >           "docType": "Sku",
        >     >           "itemId": 1537378,
        >     >           "uuid": "S12799578"
        >     >         }
        >     >       ]
        >     >     }
        >     >   ]
        >     > }
        >     >
        >     >
        >     >
        >     > The query to fetch all Products and their children nested inside
        >     > them is q=docType:Product&fl=title,id,docType,[child
        >     > parentFilter=docType:Product]. When I run that query, all is well,
        >     > and it returns the first 10 rows. However, if I fetch more rows by
        >     > adding, say, &rows=500, we get the error "Parent query yields
        >     > document which is not matched by parents filter, docID=XXX".
        >     >
        >     > When we first saw that error, we discovered our id field was not
        >     > unique across document types, so we added the uuid field as
        >     > mentioned above, which is. We also set it as the uniqueKey in our
        >     > schema.xml file, wiped the core, recreated it, and restarted Solr
        >     > just to make sure it was in effect. We have double checked and are
        >     > sure that the uuid fields are unique.
        >     >
        >     >
        >     >
        >     > In all the search results for that error that I've found, the OP did
        >     > not have a field that could differentiate the different document
        >     > types, but as you see we do. Since both the query and the
        >     > parentFilter are searching for docType:Product I don't see how
        >     > either could possibly return anything but parents. We've also tried
        >     > adding childFilter=docType:Item and childFilter=docType:Sku but that
        >     > did not help. I also tried using title:* for the filter since only
        >     > products have titles.
        >     >
        >     >
        >     >
        >     > Is there anything else we can try?
        >     >
        >     > Any explanation of this?
        >     >
        >     > Is it possible that it's not using uuid as the unique identifier even
        >     > though it's specified in the schema.xml, and would that even cause
        >     > this?
        >     >
        >     > Thanks.
        >     >
        >     >
        >     >
        >
        >
        >     --
        >     Sincerely yours
        >     Mikhail Khludnev
        >
        >
        >
        
        
        -- 
        Sincerely yours
        Mikhail Khludnev
        
    
    
