lucene-dev mailing list archives

From "Allison, Timothy B." <talli...@mitre.org>
Subject RE: [jira] [Commented] (SOLR-8017) solr.PointType can't deal with coordination in format like (0.9504547, 1.0, 1.0890503)
Date Mon, 02 May 2016 12:08:35 GMT
>> so that means that using tika metadata indexing with schemaless mode
>> is, well, useless?
Yes. 

> I know of nobody using "schemaless" for production for the simple reason that it makes
> the best guess it can based on the _first_ time it sees a particular field. There's absolutely
> no way to guarantee that that doc is representative of all docs.
> And if you want to really get weird, some programs allow custom attributes.

Agreed. It makes no sense to go schemaless with Tika's metadata.
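
To make the first-seen problem concrete, here is a minimal toy sketch in Python. It is not Solr's actual schemaless implementation; the `FirstSeenSchema` class, the type names, and the field names `page_count` and `last_edited` are all assumptions made up for illustration. It only shows how a type locked in from the first document can reject later documents whose values look different.

```python
from datetime import datetime

# Toy parsers, tried in this order when guessing a type (illustrative only).
PARSERS = {
    "long": int,
    "double": float,
    "date": lambda s: datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ"),
    "string": str,
}
GUESS_ORDER = ("long", "double", "date", "string")


def fits(value: str, type_name: str) -> bool:
    """Return True if the string value parses as the given toy type."""
    try:
        PARSERS[type_name](value)
        return True
    except ValueError:
        return False


def guess_type(value: str) -> str:
    """Pick the first toy type that accepts the value; 'string' always fits."""
    for name in GUESS_ORDER:
        if fits(value, name):
            return name
    return "string"


class FirstSeenSchema:
    """Hypothetical schema that locks each field's type on first sight."""

    def __init__(self):
        self.types = {}

    def index(self, doc: dict) -> list:
        """Index a doc; return the fields whose values do not fit the locked type."""
        rejected = []
        for field, value in doc.items():
            locked = self.types.setdefault(field, guess_type(value))
            if not fits(value, locked):
                rejected.append(field)
        return rejected


schema = FirstSeenSchema()
first = schema.index({"page_count": "12", "last_edited": "2016-05-02T12:08:35Z"})
second = schema.index({"page_count": "unknown", "last_edited": "May 2, 2016"})
print(first, second)
```

The first document is accepted and fixes `page_count` as a number and `last_edited` as a date; the second document, which is just as legitimate, is rejected on both fields.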

>In the Tika case you've also got the problem that there's no universal metadata definition.
What's "author" >in one type of doc might be "editor" in another. Or "most_recent_edit"
might be "last_edited" and even if >these are dates the format won't necessarily be the
same.

We do try to normalize across file formats to Dublin Core when possible -- dc:creator, dc:created.
We also try to normalize date formats for those metadata items that we know are dates (dc:created,
etc.).  If you find issues with normalization or can recommend areas for improvement, please
let us know!
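
As a rough illustration of what this kind of normalization means, here is a small Python sketch. It is not Tika's code or API; the alias table, the source key names such as `creation_date`, and the list of accepted date formats are assumptions chosen for the example. Only the target keys dc:creator and dc:created come from the message above.

```python
from datetime import datetime

# Hypothetical alias table: format-specific keys mapped to Dublin Core keys.
ALIASES = {
    "author": "dc:creator",
    "editor": "dc:creator",
    "creation_date": "dc:created",
    "date_created": "dc:created",
}

# Keys whose values are known to be dates and should be normalized.
DATE_KEYS = {"dc:created"}

# Assumed input date formats, tried in order.
DATE_FORMATS = ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d", "%d/%m/%Y")


def normalize_date(raw: str):
    """Return the date as an ISO 8601 UTC-style string, or None if unparseable."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).strftime("%Y-%m-%dT%H:%M:%SZ")
        except ValueError:
            continue
    return None


def normalize(metadata: dict) -> dict:
    """Map known aliases to Dublin Core keys and normalize known date values."""
    out = {}
    for key, value in metadata.items():
        target = ALIASES.get(key.lower(), key)
        if target in DATE_KEYS:
            # Keep the raw value when no assumed format matches.
            value = normalize_date(value) or value
        out[target] = value
    return out


a = normalize({"Author": "A. Writer", "creation_date": "02/05/2016"})
b = normalize({"editor": "B. Editor", "date_created": "2016-05-02"})
print(a)
print(b)
```

With keys and dates normalized this way, both documents can be indexed against one explicitly defined schema (for example, a string field for dc:creator and a date field for dc:created) rather than relying on guessed types.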


