lucene-java-user mailing list archives

From Duke DAI <duke.dai....@gmail.com>
Subject Re: problem found with DiskDocValuesFormat
Date Mon, 21 Oct 2013 10:28:55 GMT
Hi guys,

It seems I have the same problem with Lucene45DocValuesFormat, but no problem
with MemoryDocValuesFormat. The problem I encountered with Lucene 4.4 is with
DiskDocValuesFormat, not with Lucene42DocValuesFormat.

I dug into it a little and found the superficial cause. In SegmentCoreReaders,
there is a ThreadLocal variable, docValuesLocal. Its purpose is to avoid
building the data structures repeatedly within a query thread. But what if
the query thread comes from a thread pool and is reused for different queries?
I removed docValuesLocal and built a lucene-core.jar, and it works with my
multi-threaded (thread pool) test cases.
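
To illustrate the concern in general terms (this is not a reproduction of the
Lucene issue), here is a minimal sketch showing that a ThreadLocal value set
by one task is still visible to the next task run on the same pooled thread.
The class name, the CACHE field, and the "query" tasks are hypothetical and
only stand in for something like docValuesLocal; the code uses only the JDK
(Java 8 or later).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PooledThreadLocalDemo {
    // Hypothetical per-thread cache; an assumption for illustration only.
    private static final ThreadLocal<List<String>> CACHE =
        ThreadLocal.withInitial(ArrayList::new);

    // Runs two unrelated "queries" on a single-thread pool and returns
    // a copy of what the second query finds in the thread-local cache.
    public static List<String> runTwoQueries() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            // First query: leaves state behind in the worker thread.
            Callable<Boolean> first = () -> CACHE.get().add("state from query 1");
            pool.submit(first).get();

            // Second query: same worker thread, so it sees the old state.
            Callable<List<String>> second = () -> new ArrayList<>(CACHE.get());
            return pool.submit(second).get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runTwoQueries()); // prints [state from query 1]
    }
}
```

Whether this kind of reuse is actually what goes wrong inside
SegmentCoreReaders is not established here; the sketch only shows why a
per-thread cache needs care when threads outlive a single query.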

Do you have any ideas about this? Is this enough information?


Thanks,
Duke


Best regards,
Duke
If not now, when? If not me, who?


On Tue, Aug 13, 2013 at 4:54 PM, Duke DAI <duke.dai.007@gmail.com> wrote:

> Hi experts,
>
> I'm upgrading to Lucene 4.4 and trying to use DocValues instead of stored
> fields for performance reasons. Because the index size is unknown (it
> depends on the customer), I will use DiskDocValuesFormat, especially for
> some binary fields. So I wrote my customized Codec:
>
>       final Codec codec = new Lucene42Codec() {
>
>         private final Lucene42DocValuesFormat memoryDVFormat =
>             new Lucene42DocValuesFormat();
>         private final DiskDocValuesFormat diskDVFormat =
>             new DiskDocValuesFormat();
>
>         // Use the disk-based format for the three large fields,
>         // and the default in-memory format for everything else.
>         @Override
>         public DocValuesFormat getDocValuesFormatForField(String field) {
>           if (LucenePluginConstants.INDEX_STORED_RETURNABLE_FIELD.equals(field)
>               || LucenePluginConstants.PAYLOAD_FIELD_NAME.equals(field)
>               || LucenePluginConstants.INDEX_NODE_ID_DOC_VALUE.equals(field)) {
>             return diskDVFormat;
>           } else {
>             return memoryDVFormat;
>           }
>         }
>       };
>       iwc.setCodec(codec);
>
> Here the field LucenePluginConstants.INDEX_NODE_ID_DOC_VALUE is a numeric
> field of type long. The others are binary.
>
> Then I consume the DocValues as in the pseudo-code below:
>     nodeIDDocValuesSource =
>             MultiDocValues.getNumericValues(searcher.getIndexReader(),
>                 LucenePluginConstants.INDEX_NODE_ID_DOC_VALUE);
>
>    ......
>    long nodeId= nodeIDDocValuesSource.get(scoreDoc.doc);
>
> I'm sure the nodeId I get back is wrong: it is checked by upper-layer
> logic and treated as data corruption.
>
>
> But if I change to memoryDVFormat for the long type field, then everything
> is OK.
>
> Also, to support upgrading legacy data, I keep two index formats, DV or
> stored field, controlled by version. If I use stored fields, everything is OK.
> So I guess there is a bug in DiskDocValuesFormat with the numeric data type.
> Is it related to byte-aligned numeric compression?
> Or did I not use DiskDocValuesFormat correctly? It seems to take no other
> parameters.
>
> I'm sorry that I have no pure Lucene test case yet. I hope someone can shed
> some light on this.
>
>
>
>
> Best regards,
> Duke
> If not now, when? If not me, who?
>
