lucene-dev mailing list archives

From "Roberto Cornacchia (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-7171) IndexableField changes its IndexableFieldType when the index is re-opened for reading
Date Mon, 04 Apr 2016 16:24:25 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-7171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15224436#comment-15224436 ]

Roberto Cornacchia commented on LUCENE-7171:
--------------------------------------------

I've been pointed to this bit of documentation for {{IndexReader.document(int docID)}}:
{quote}
NOTE: only the content of a field is returned, if that field was stored during indexing. Metadata
like boost, omitNorm, IndexOptions, tokenized, etc., are not preserved.
{quote}

This explains what I've reported. But I find it hard not to consider this a design flaw. 

If I take the retrieved document and store it into a new index, I would expect this document
to be the same as the one stored in the first index. It doesn't matter where it's stored.
Those properties are defined for the fields of that document, not for a particular index.

However, if I now try to retrieve that same document from the second index (by an exact match
on its isbn), it won't be found, because {{isbn}} has been tokenized on re-indexing. This is surely
not intended, is it?
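
Since the retrieved document does not carry its original field types, one way to copy documents between indexes would be to keep the schema on the application side and consult it when re-adding each field. The following is only a minimal sketch of that idea in plain Java, not Lucene API: the class {{FieldSchemaSketch}}, the {{Kind}} enum, the methods {{kindOf}} and {{planReindex}}, and the {{title}} field are all hypothetical names introduced for illustration. In a real copy loop, {{EXACT}} would correspond to re-creating the field as a {{StringField}} and {{ANALYZED}} to a {{TextField}}.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical application-side schema. Stored documents read back from an
 * index do not carry their original field types, so the application keeps
 * that knowledge itself.
 */
final class FieldSchemaSketch {

    /** How a field should be re-indexed when a stored document is copied. */
    enum Kind {
        EXACT,    // index the whole value as a single term (as StringField does)
        ANALYZED  // run the value through the analyzer (as TextField does)
    }

    // Assumption: the application knows its own field names and their kinds.
    // "isbn" comes from the report; "title" is an invented example field.
    private static final Map<String, Kind> SCHEMA = new LinkedHashMap<>();
    static {
        SCHEMA.put("isbn", Kind.EXACT);
        SCHEMA.put("title", Kind.ANALYZED);
    }

    /** Looks up the kind of a field, failing loudly for unknown fields. */
    static Kind kindOf(String fieldName) {
        Kind kind = SCHEMA.get(fieldName);
        if (kind == null) {
            throw new IllegalArgumentException("No schema entry for field: " + fieldName);
        }
        return kind;
    }

    /**
     * Given the stored name/value pairs of a retrieved document, decides how
     * each field should be re-created before it is added to another index.
     */
    static Map<String, Kind> planReindex(Map<String, String> storedFields) {
        Map<String, Kind> plan = new LinkedHashMap<>();
        for (String name : storedFields.keySet()) {
            plan.put(name, kindOf(name));
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<String, String> stored = new LinkedHashMap<>();
        stored.put("isbn", "9900333X");
        // Print the re-indexing decision for each stored field.
        planReindex(stored).forEach((name, kind) -> System.out.println(name + " -> " + kind));
    }
}
```

Throwing on an unknown field is a deliberate choice in this sketch: silently falling back to a tokenized field is exactly the behaviour that makes the exact match on {{isbn}} fail.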


> IndexableField changes its IndexableFieldType when the index is re-opened for reading
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-7171
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7171
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 5.5
>            Reporter: Roberto Cornacchia
>
> This code:
> {code}
> /* Store one document into an index */
> Directory index = new RAMDirectory();
> IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
> IndexWriter w = new IndexWriter(index, config);
> Document d1 = new Document();
> d1.add(new StringField("isbn", "9900333X", Field.Store.YES));
> w.addDocument(d1);
> w.commit();
> w.close();
> /* inspect IndexableFieldType */
> IndexableField f1 = d1.getField("isbn");
> System.err.println("FieldType for " + f1.stringValue() + " : " + f1.fieldType());
> /* retrieve all documents and inspect IndexableFieldType */
> IndexSearcher s = new IndexSearcher(DirectoryReader.open(index));
> TopDocs td = s.search(new MatchAllDocsQuery(), 1);
> for (ScoreDoc sd : td.scoreDocs) {
>     Document d2 = s.doc(sd.doc);
>     IndexableField f2 = d2.getField("isbn");
>     System.err.println("FieldType for " + f2.stringValue() + " : " + f2.fieldType());
> }
> {code}
> Produces:
> {code}
> FieldType for 9900333X : stored,indexed,omitNorms,indexOptions=DOCS
> FieldType for 9900333X : stored,indexed,tokenized,omitNorms,indexOptions=DOCS
> {code}
> The {{StringField}} field {{isbn}} is not tokenized, as correctly reported by the first output, which happens right after closing the writer.
> However, it becomes tokenized when the index is re-opened with a new reader.




