lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-11916) new SortableTextField using docValues built from the original string input
Date Fri, 26 Jan 2018 23:09:00 GMT

     [ https://issues.apache.org/jira/browse/SOLR-11916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-11916:
----------------------------
    Description: 
I propose adding a new SortableTextField subclass that would functionally work the same as
TextField except:
 * {{docValues="true|false"}} could be configured, with the default being "true"
 * The docValues would contain the original input values (just like StrField) for sorting
(or faceting)
 ** By default, to protect users from excessively large docValues, only the first 1024 characters
of each field value would be used – but this could be overridden with configuration.

----
Consider the following sample configuration:
{code:java}
<field name="title" type="text_sortable"
       indexed="true" docValues="true" stored="true" multiValued="false"/>
<fieldType name="text_sortable" class="solr.SortableTextField">
  <analyzer type="index">
   ...
  </analyzer>
  <analyzer type="query">
   ...
  </analyzer>
</fieldType>
{code}
Given a document with a title of "Solr In Action"

Users could:
 * Search for individual (indexed) terms in the "title" field: {{q=title:solr}}
 * Sort documents by title ( {{sort=title asc}} ) such that this document's sort value would
be "Solr In Action"
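
As a rough illustration of the two requests above, the following Python sketch builds the query string for a search on an indexed term combined with a sort on the same field. The {{build_params}} helper is hypothetical, and only the {{q}} and {{sort}} parameters come from the description; the host, collection, and request handler a real request would target are deliberately left out.

```python
from urllib.parse import urlencode


def build_params(field: str, term: str, direction: str = "asc") -> str:
    """Return a URL-encoded query string that searches and sorts on one field.

    Hypothetical helper for illustration only; it is not part of Solr.
    """
    params = {
        "q": f"{field}:{term}",          # search an individual indexed term
        "sort": f"{field} {direction}",  # sort on the very same field name
    }
    return urlencode(params)


query_string = build_params("title", "solr")
print(query_string)
```

The point of the sketch is that a single field name serves both parameters; no alternate string field appears anywhere in the request.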

If another document had a "title" value that was longer than 1024 characters, then the docValues
would be built using only the first 1024 characters of the value (unless the user modified
the configuration).
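
The truncation rule can be modeled in a few lines of Python. This is only a sketch of the behavior described above, not the code in the patch: {{doc_value_for}} and {{max_chars}} are hypothetical names, and Python slices count Unicode code points, which may differ from how the Java implementation counts characters.

```python
DEFAULT_MAX_CHARS = 1024  # default limit stated in the description


def doc_value_for(original: str, max_chars: int = DEFAULT_MAX_CHARS) -> str:
    """Return the prefix of the original input that would back the docValues.

    Hypothetical model: values at or under the limit pass through unchanged,
    longer values are cut to their first max_chars characters.
    """
    if max_chars < 0:
        raise ValueError("max_chars must be non-negative")
    return original[:max_chars]


short_title = "Solr In Action"
long_title = "x" * 5000

print(doc_value_for(short_title))      # short values are kept whole
print(len(doc_value_for(long_title)))  # long values are cut at the limit
```

Only the sort (or facet) value is affected in this model; the full text would still be analyzed and indexed for searching.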

This would be functionally equivalent to the following existing configuration, including
the on-disk index segments, with two differences. First, the on-disk DocValues would refer
directly to the "title" field, reducing the total number of "field infos" in the index (which
has a small impact on segment housekeeping and merge times). Second, end users would not need
to sort on an alternate "title_string" field name, because the original "title" field name
would always be used directly.
{code:java}
<field name="title" type="text"
       indexed="true" docValues="true" stored="true" multiValued="false"/>
<field name="title_string" type="string"
       indexed="false" docValues="true" stored="false" multiValued="false"/>
<copyField source="title" dest="title_string" maxCharsForDocValues="1024" />
{code}

  was:
I propose adding a new SortableTextField subclass that would functionally work the same as
TextField except:
* {{docValues="true|false"}} could be configured, with the default being "true"
* The docValues would contain the original input values (just like StrField) for sorting (or
faceting)
** By default, to protect users from excessively large docValues, only the first 1024 characters of each
field value would be used -- but this could be overridden with configuration.

----

Consider the following sample configuration:

{code}
<field name="title" type="text_sortable"
       indexed="true" docValues="true" stored="true" multiValued="false"/>
<fieldType name="text_sortable" class="solr.SortableTextField">
  <analyzer type="index">
   ...
  </analyzer>
  <analyzer type="query">
   ...
  </analyzer>
</fieldType>
{code}

Given a document with a title of "Solr In Action"

Users could:
* Search for individual (indexed) terms in the "title" field: {{q=title:solr}}
* Sort documents by title ( {{sort=title asc}} ) such that this document's sort value would
be "Solr In Action"

If another document had a "title" value that was longer than 1024 characters, then the docValues
would be built using only the first 1024 characters of the value (unless the user modified
the configuration).

This would be functionally equivalent to the following existing configuration, including
the on-disk index segments, with two differences. First, the on-disk DocValues would refer
directly to the "title" field, reducing the total number of "field infos" in the index (which
has a small impact on segment housekeeping and merge times). Second, end users would not need
to sort on an alternate "title_string" field name, because the original "title" field name
would always be used directly.

{code}
<field name="title" type="text"
       indexed="true" docValues="true" stored="true" multiValued="false"/>
<field name="title_string" type="string"
       indexed="false" docValues="true" stored="false" multiValued="false"/>
<copyField source="title" dest="title_string" maxChars="1024" />
{code}



NOTE: I edited the issue description to update the example configuration from using {{maxChars="1024"}}
to {{maxCharsForDocValues="1024"}} ... I forgot when creating this Jira that I had made
that option a bit more verbose in the patch to avoid any risk that people might assume it limited
the number of characters being *indexed*

> new SortableTextField using docValues built from the original string input
> --------------------------------------------------------------------------
>
>                 Key: SOLR-11916
>                 URL: https://issues.apache.org/jira/browse/SOLR-11916
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>            Priority: Major
>         Attachments: SOLR-11916.patch
>
>
> I propose adding a new SortableTextField subclass that would functionally work the same as TextField except:
>  * {{docValues="true|false"}} could be configured, with the default being "true"
>  * The docValues would contain the original input values (just like StrField) for sorting (or faceting)
>  ** By default, to protect users from excessively large docValues, only the first 1024 characters of each field value would be used – but this could be overridden with configuration.
> ----
> Consider the following sample configuration:
> {code:java}
> <field name="title" type="text_sortable"
>        indexed="true" docValues="true" stored="true" multiValued="false"/>
> <fieldType name="text_sortable" class="solr.SortableTextField">
>   <analyzer type="index">
>    ...
>   </analyzer>
>   <analyzer type="query">
>    ...
>   </analyzer>
> </fieldType>
> {code}
> Given a document with a title of "Solr In Action"
> Users could:
>  * Search for individual (indexed) terms in the "title" field: {{q=title:solr}}
>  * Sort documents by title ( {{sort=title asc}} ) such that this document's sort value would be "Solr In Action"
> If another document had a "title" value that was longer than 1024 characters, then the docValues would be built using only the first 1024 characters of the value (unless the user modified the configuration).
> This would be functionally equivalent to the following existing configuration, including the on-disk index segments, with two differences. First, the on-disk DocValues would refer directly to the "title" field, reducing the total number of "field infos" in the index (which has a small impact on segment housekeeping and merge times). Second, end users would not need to sort on an alternate "title_string" field name, because the original "title" field name would always be used directly.
> {code:java}
> <field name="title" type="text"
>        indexed="true" docValues="true" stored="true" multiValued="false"/>
> <field name="title_string" type="string"
>        indexed="false" docValues="true" stored="false" multiValued="false"/>
> <copyField source="title" dest="title_string" maxCharsForDocValues="1024" />
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

