lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Schema Change: Int -> String
Date Thu, 06 Jun 2013 12:05:45 GMT
1. Generally, any schema change requires a full reindex. Sure, a lot of 
times you can squeak by, but with Solr and Lucene there are no guarantees. 
If it works for you, great. If not, don't complain - just reindex. And even 
if it does work for the current release, there is no guarantee that a 
similar change in a future release won't require a reindex.

2. Make up your mind whether a field is a number or a string, and stick with 
that import format.

General rule: clean up your data before you send it to Solr. But... you can 
do some amount of cleanup using update processors, including white space 
trimming and limited regex editing. You can also develop custom update 
processors, as well as write in scripting languages such as JavaScript. For 
example, you could parse a string of numbers and then send them to other 
fields.
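
As a rough illustration of the "clean up your data before you send it to 
Solr" route, here is a minimal Python sketch that splits a space-separated 
string of user ids into a list of integers on the client side. The function 
names, the "id" and "user_id" field names, and the assumption that user_id 
is a multivalued integer field are all hypothetical, not taken from the 
poster's schema.

```python
def split_user_ids(raw):
    """Split a whitespace-separated string of ids into a list of ints."""
    ids = []
    for token in raw.split():  # split() with no args also trims stray whitespace
        if not (token.isascii() and token.isdigit()):
            raise ValueError(f"not a numeric user id: {token!r}")
        ids.append(int(token))
    return ids


def build_doc(record_id, raw_user_ids):
    """Build a document dict; 'user_id' is assumed to be a multivalued int field."""
    return {"id": record_id, "user_id": split_user_ids(raw_user_ids)}


if __name__ == "__main__":
    print(build_doc("rec-1", " 17  42\t1003 "))
```

Rejecting non-numeric tokens up front keeps bad values out of the index 
rather than letting them surface later as query surprises.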

3. Too hard to say from the way you have described it. Show us some sample 
input.

In general, TextField is for text, not numbers. If you intend to query data 
as numbers, don't use TextField.
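
To see why the distinction matters, here is a small Python sketch of how 
the same ids compare as text versus as numbers. It only demonstrates 
general string versus integer ordering; it is not a model of how Solr 
stores or compares values internally, and the sample ids are made up.

```python
ids_as_text = ["9", "10", "100", "25"]
ids_as_numbers = [int(s) for s in ids_as_text]

# Strings compare character by character, so "100" sorts before "25".
text_order = sorted(ids_as_text)
number_order = sorted(ids_as_numbers)

# A "range" of 10..25 means different things for text and for numbers.
in_text_range = [s for s in ids_as_text if "10" <= s <= "25"]
in_number_range = [n for n in ids_as_numbers if 10 <= n <= 25]

if __name__ == "__main__":
    print(text_order)       # ['10', '100', '25', '9']
    print(number_order)     # [9, 10, 25, 100]
    print(in_text_range)    # ['10', '100', '25']
    print(in_number_range)  # [10, 25]
```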

-- Jack Krupansky

-----Original Message----- 
From: TwoFirst TwoLast
Sent: Thursday, June 06, 2013 1:25 AM
To: solr-user@lucene.apache.org
Subject: Schema Change: Int -> String

1) If I change one field's type in my schema, will that cause problems with
the index or searching?  My data is pulled in chunks off of a mysql server
so one field in the currently indexed data is simply an "int" type field in
solr.  I would like to change this to a string moving forward, but still
expect to search across the int/string field.  Will this be ok?

2) My motivation for #1 is that I have thousands of records that are
exactly the same in mysql aside from a user_id column.  Prior to inserting
into mysql I am thinking that I can concatenate the user_ids together into
a space separated string and let solr just parse the string.  So the
database and my data import handler would change a bit.

3) If #2 is an appropriate approach, will a solr.TextField with
a solr.WhitespaceTokenizerFactory be an ok way to approach this?  This does
produce words where I would expect integers. I tried using a
solr.TrieIntField with the solr.WhitespaceTokenizerFactory, but it throws
an error.

Finally I need to make sure that exact matches will be performed on
user_ids in the string when searching.

Much appreciated! 

