lucene-dev mailing list archives

From "Bogdan (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SOLR-6700) ChildDocTransformer doesn't return correct children after updating and optimising Solr index
Date Tue, 04 Nov 2014 09:38:33 GMT
Bogdan created SOLR-6700:
----------------------------

             Summary: ChildDocTransformer doesn't return correct children after updating and optimising Solr index
                 Key: SOLR-6700
                 URL: https://issues.apache.org/jira/browse/SOLR-6700
             Project: Solr
          Issue Type: Bug
            Reporter: Bogdan
            Priority: Blocker
             Fix For: 4.10.3, 5.0


I have an index with nested documents. 
{code:title=schema.xml snippet|borderStyle=solid}
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="entityType" type="int" indexed="true" stored="true" required="true"/>
<field name="pName" type="string" indexed="true" stored="true"/>
<field name="cAlbum" type="string" indexed="true" stored="true"/>
<field name="cSong" type="string" indexed="true" stored="true"/>
<field name="_root_" type="string" indexed="true" stored="true"/>
<field name="_version_" type="long" indexed="true" stored="true"/>
{code}

Afterwards I add the following documents:
{code}
<add>
  <doc>
    <field name="id">1</field>
    <field name="pName">Test Artist 1</field>
    <field name="entityType">1</field>
    <doc>
        <field name="id">11</field>
        <field name="cAlbum">Test Album 1</field>
        <field name="cSong">Test Song 1</field>
        <field name="entityType">2</field>
    </doc>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="pName">Test Artist 2</field>
    <field name="entityType">1</field>
    <doc>
        <field name="id">22</field>
        <field name="cAlbum">Test Album 2</field>
        <field name="cSong">Test Song 2</field>
        <field name="entityType">2</field>
    </doc>
  </doc>
</add>
{code}

After performing the following query 
{quote}
http://localhost:8983/solr/collection1/select?q=%7B!parent+which%3DentityType%3A1%7D&fl=*%2Cscore%2C%5Bchild+parentFilter%3DentityType%3A1%5D&wt=json&indent=true
{quote}
I get the correct result (each child's _root_ field matches the id of its parent):
{code:title=query response|borderStyle=solid}
{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "fl":"*,score,[child parentFilter=entityType:1]",
      "indent":"true",
      "q":"{!parent which=entityType:1}",
      "wt":"json"}},
  "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
      {
        "id":"1",
        "pName":"Test Artist 1",
        "entityType":1,
        "_version_":1483832661048819712,
        "_root_":"1",
        "score":1.0,
        "_childDocuments_":[
        {
          "id":"11",
          "cAlbum":"Test Album 1",
          "cSong":"Test Song 1",
          "entityType":2,
          "_root_":"1"}]},
      {
        "id":"2",
        "pName":"Test Artist 2",
        "entityType":1,
        "_version_":1483832661050916864,
        "_root_":"2",
        "score":1.0,
        "_childDocuments_":[
        {
          "id":"22",
          "cAlbum":"Test Album 2",
          "cSong":"Test Song 2",
          "entityType":2,
          "_root_":"2"}]}]
  }}
{code}
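
For readers who want to check what the URL-encoded request above actually asks for, here is a minimal sketch using Python's standard `urllib.parse`. The variable names are illustrative and not part of the report. The decoded values should agree with the `params` section echoed in the response header.

```python
from urllib.parse import parse_qs, urlsplit

# The request URL exactly as given in the report, split for readability.
url = (
    "http://localhost:8983/solr/collection1/select"
    "?q=%7B!parent+which%3DentityType%3A1%7D"
    "&fl=*%2Cscore%2C%5Bchild+parentFilter%3DentityType%3A1%5D"
    "&wt=json&indent=true"
)

# parse_qs decodes percent-escapes and turns '+' into a space.
# Each parameter occurs once here, so keep only the first value.
params = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}

for key in sorted(params):
    print(f"{key} = {params[key]}")
```

The `q` parameter is the block join parent query and the `fl` parameter carries the `[child ...]` transformer, both filtered on `entityType:1`.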

Next, I update one document with an atomic update:
{code:title=update doc|borderStyle=solid}
<add>
<doc>
<field name="id">1</field>
<field name="pName" update="set">INIT</field>
</doc>
</add>
{code}

Re-running the same query still returns the right result (identical to the previous one,
but with the pName field updated).

The problem appears only after performing an optimize.
The same query then yields the following result:
{code}
{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "fl":"*,score,[child parentFilter=entityType:1]",
      "indent":"true",
      "q":"{!parent which=entityType:1}",
      "wt":"json"}},
  "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
      {
        "id":"2",
        "pName":"Test Artist 2",
        "entityType":1,
        "_version_":1483832661050916864,
        "_root_":"2",
        "score":1.0,
        "_childDocuments_":[
        {
          "id":"11",
          "cAlbum":"Test Album 1",
          "cSong":"Test Song 1",
          "entityType":2,
          "_root_":"1"},
        {
          "id":"22",
          "cAlbum":"Test Album 2",
          "cSong":"Test Song 2",
          "entityType":2,
          "_root_":"2"}]},
      {
        "id":"1",
        "pName":"INIT",
        "entityType":1,
        "_root_":"1",
        "_version_":1483832916867809280,
        "score":1.0}]
  }}
{code}

As can be seen, the document with id:2 now contains the child with id:11, which belongs to the
document with id:1, and the document with id:1 has no children at all.
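
The mismatch can also be detected mechanically by comparing each child's `_root_` with the id of the parent it was returned under. The sketch below is only an illustration: `find_misattributed_children` is a hypothetical helper, the data is a trimmed copy of the `docs` array from the post-optimize response, and it assumes single-level nesting where a child's `_root_` holds its parent's id, as in this report.

```python
def find_misattributed_children(docs):
    """Return (parent_id, child_id, child_root) for every child whose
    _root_ value differs from the id of the parent it is nested under."""
    mismatches = []
    for parent in docs:
        for child in parent.get("_childDocuments_", []):
            if child.get("_root_") != parent["id"]:
                mismatches.append((parent["id"], child["id"], child.get("_root_")))
    return mismatches


# Trimmed copy of the "docs" array from the post-optimize response.
docs_after_optimize = [
    {"id": "2", "_root_": "2", "_childDocuments_": [
        {"id": "11", "_root_": "1"},
        {"id": "22", "_root_": "2"},
    ]},
    {"id": "1", "_root_": "1"},
]

# Reports that child 11 (root 1) is attached to parent 2.
print(find_misattributed_children(docs_after_optimize))
```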

The only reference to this behaviour that I have found on the web is http://blog.griddynamics.com/2013/09/solr-block-join-support.html, which says:
{quote}
Let me show you one unlucky example. Let’s remove parent and left children in the index.
<update><delete><query>id:10</query></delete><commit/></update>
 
At first, It seems like everything still works. Children 11 and 12 are left in the index,
but ToParentBlockJoinQuery somehow detects it and q={!parent which='type_s:parent'}+COLOR_s:Red
+SIZE_s:XL  correctly returns parent 30. However after <optimize/> is executed, deleted
parent document is purged from the index and all of the sudden children 11 and 12 start to
be considered as if they belong to parent 20! The same query q={!parent which='type_s:parent'}+COLOR_s:Red
+SIZE_s:XL now returns 20 and 30 which is wrong! I’m afraid there are few other similar
cases of wrong behavior. As a reliable workaround I suggest to send explicit deletes by query
with implicit field _root_. I hope this caveat will be fixed in future.
{quote}
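
The effect described in the quote can be reproduced with a toy model. This is not Solr's code. It assumes that children are located purely by position (the run of non-parent documents directly before a parent) and that an optimize simply drops deleted documents while keeping the order of the rest. All function names are hypothetical. The model does not try to explain why the results are still correct before the optimize.

```python
def doc(doc_id, root, is_parent, deleted=False):
    return {"id": doc_id, "root": root, "is_parent": is_parent, "deleted": deleted}


def children_by_position(index, parent_id):
    """Toy lookup: live non-parent docs that sit directly before the live parent."""
    pos = next(i for i, d in enumerate(index)
               if d["id"] == parent_id and d["is_parent"] and not d["deleted"])
    children = []
    i = pos - 1
    while i >= 0 and not index[i]["is_parent"]:
        if not index[i]["deleted"]:
            children.append(index[i]["id"])
        i -= 1
    return children[::-1]


def optimize(index):
    """Toy optimize: purge deleted docs, keep the remaining order."""
    return [d for d in index if not d["deleted"]]


def delete_by_root(index, root):
    """Toy equivalent of a delete-by-query on _root_:<root>."""
    return [dict(d, deleted=True) if d["root"] == root else d for d in index]


# Two blocks as indexed in the report: children come before their parent.
initial = [doc("11", "1", False), doc("1", "1", True),
           doc("22", "2", False), doc("2", "2", True)]

# Atomic update of id:1, modelled as: old parent marked deleted, new copy appended.
after_update = [dict(d, deleted=True) if d["id"] == "1" else d for d in initial]
after_update.append(doc("1", "1", True))

optimized = optimize(after_update)
print(children_by_position(optimized, "2"))  # child 11 is now adjacent to parent 2
print(children_by_position(optimized, "1"))  # the re-added parent has no block

# Workaround from the quote: delete the whole block by _root_, then re-add it.
fixed = optimize(delete_by_root(initial, "1")
                 + [doc("11", "1", False), doc("1", "1", True)])
print(children_by_position(fixed, "2"), children_by_position(fixed, "1"))
```

Under these assumptions the model gives the same parent/child assignment as the post-optimize response in this report. In Solr's XML update format the workaround corresponds to sending `<delete><query>_root_:1</query></delete>` and then re-adding the parent together with its children as one block.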


