lucene-dev mailing list archives

From "Adrien Grand (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-6322) IndexSearcher.doc(int docID, Set<String> fieldsToLoad) is slower in Lucene 4.9 when compared to Lucene 2.9
Date Mon, 02 May 2016 08:40:12 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-6322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15266229#comment-15266229
] 

Adrien Grand commented on LUCENE-6322:
--------------------------------------

You could write a codec that uses a different stored fields format, but that codec would not
be covered by backward compatibility. On every upgrade you would have to move back to the default
codec first and then switch to your custom codec again.

> IndexSearcher.doc(int docID, Set<String> fieldsToLoad) is slower in Lucene 4.9 when compared
to Lucene 2.9
> --------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-6322
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6322
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/codecs
>    Affects Versions: 4.9
>         Environment: Windows, JDK 7/8
>            Reporter: Sekhar
>             Fix For: 4.10.5
>
>
> We use the IndexSearcher.doc(int docID, Set<String> fieldsToLoad) method to load a document with
only selected stored fields. When the fields we leave out hold more than
500KB of data, this call is slower in Lucene 4.9 than in Lucene 2.9.
> I debugged this method in Lucene 4.9 and found that CompressingStoredFieldsReader#visitDocument(int
docID, StoredFieldVisitor visitor) spends extra time loading file content and decompressing
it in 16KB chunks, even for fields it only has to skip. The degradation is noticeable when a document's
fields exceed 1MB and we call this method in a loop over more than 1000 such documents.
> Lucene 2.9 had no compression, so skipping a field was just a file seek that moved
the pointer to the next stored field. For example, see how Lucene3xStoredFieldsReader#skipField()
skips a field in the Lucene 2.9 format, which is much faster than what Lucene
4.9 does.
> CompressingStoredFieldsReader should have a way to know a field's compressed
length in the file and simply seek past it, instead of loading the content
from the file and decompressing it in 16KB chunks just to skip the field.
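
To make the request above concrete, here is a minimal sketch of skipping by length prefix. It is
not Lucene code: the class LengthPrefixedFields, its byte layout, and the encode/load methods are
hypothetical, and it does not model compression at all. It only shows that when each value is
preceded by its length, an unwanted field can be skipped by moving the read position rather than
by reading the value.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; this is NOT Lucene's stored fields format. */
final class LengthPrefixedFields {

    /** Encodes each field as: nameLength (int), name, valueLength (int), value. */
    static byte[] encode(Map<String, byte[]> fields) {
        int size = 0;
        for (Map.Entry<String, byte[]> e : fields.entrySet()) {
            size += 4 + e.getKey().getBytes(StandardCharsets.UTF_8).length
                  + 4 + e.getValue().length;
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        for (Map.Entry<String, byte[]> e : fields.entrySet()) {
            byte[] name = e.getKey().getBytes(StandardCharsets.UTF_8);
            out.putInt(name.length).put(name);
            out.putInt(e.getValue().length).put(e.getValue());
        }
        return out.array();
    }

    /** Returns only the requested fields; all other values are skipped, not read. */
    static Map<String, byte[]> load(byte[] data, Set<String> fieldsToLoad) {
        ByteBuffer in = ByteBuffer.wrap(data);
        Map<String, byte[]> result = new LinkedHashMap<>();
        while (in.hasRemaining()) {
            byte[] name = new byte[in.getInt()];
            in.get(name);
            int valueLength = in.getInt();
            String fieldName = new String(name, StandardCharsets.UTF_8);
            if (fieldsToLoad.contains(fieldName)) {
                byte[] value = new byte[valueLength];
                in.get(value);
                result.put(fieldName, value);
            } else {
                // The "seek": advance past the value without copying it.
                in.position(in.position() + valueLength);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, byte[]> doc = new LinkedHashMap<>();
        doc.put("title", "hello".getBytes(StandardCharsets.UTF_8));
        doc.put("body", new byte[1_000_000]); // large field we do not want to load
        doc.put("id", "42".getBytes(StandardCharsets.UTF_8));

        Set<String> wanted = new HashSet<>(Arrays.asList("title", "id"));
        Map<String, byte[]> loaded = load(encode(doc), wanted);
        System.out.println(loaded.keySet());
    }
}
```

In a compressed format the same idea would need the compressed length of each field (or block) to
be recorded somewhere, which is what the last paragraph of the description asks for.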



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

