lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Schema Change: Int -> String (i am the original poster, new email address)
Date Fri, 07 Jun 2013 14:03:07 GMT
Right, a search for "442" would not match "1442".
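
To illustrate why, here is a rough Python sketch of whitespace-only tokenization. It is a toy model and not Solr code: the helper names (whitespace_tokens, matches, search) are made up for this example, and it assumes the field's analyzer does nothing beyond splitting on whitespace, as in the fieldType quoted below.

```python
# Toy model of a whitespace-tokenized text field (not Solr's implementation).
# Assumption: the analyzer only splits on whitespace and applies no other filters.

def whitespace_tokens(text):
    """Split a field value into tokens on runs of whitespace."""
    return text.split()

def matches(term, field_value):
    """A plain term query matches only when the term equals a whole token."""
    return term in whitespace_tokens(field_value)

# The three sample values from the question, keyed by their list number.
docs = {1: "20 1442 35", 2: "20 442 35", 3: "20 1442"}

def search(term):
    """Return the ids of the documents whose user_ids field contains the term."""
    return sorted(doc_id for doc_id, value in docs.items() if matches(term, value))

print(search("1442"))  # [1, 3]
print(search("442"))   # [2]
```

Each id is compared as a whole token, so "442" is never found inside "1442".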

-- Jack Krupansky

-----Original Message----- 
From: z z
Sent: Friday, June 07, 2013 2:18 AM
To: solr-user@lucene.apache.org
Subject: Re: Schema Change: Int -> String (i am the original poster, new email address)

Maybe if I were to say that the column "user_id" will become "user_ids"
that would clarify things?

user_id:2002+AND+created:[${from}+TO+${until}]+data:"more"

becomes

user_id*s*:2002+AND+created:[${from}+TO+${until}]+data:"more"

where I want 2002 to be an exact positive match on one of the user_ids
embedded in the TEXT ... not string :)  If I am totally off or making no
sense, feedback is very welcome.  I am just seeing lots of similar data
going into my db and it feels like Solr should be able to handle this.

I just want to know if transforming the data like that will still allow
exact searches against a user_id.  My language from a Solr guru's point of
view is probably *very* poorly phrased ... "exact" and TEXT might not go
hand in hand.

Is the TEXT "20 1442 35" parsed as "20" "1442" "35" so that a search
against it for "1442" will yield "exact" results?  A search against "442"
won't match, right?

1. "20 1442 35"
2. "20 442 35"
3. "20 1442"

user_ids:1442 -> yields #1 & #3 always?
user_ids:442 -> yields only #2 always?

My lack of understanding about what solr does when it indexes is shining
through :)


On Fri, Jun 7, 2013 at 1:43 PM, z z <zenlok.testing7@gmail.com> wrote:

> My language might be a bit off (I am saying "string" when I probably mean
> "text" in the context of solr), but I'm pretty sure that my story is
> unwavering ;)
>
> `id` int(11) NOT NULL AUTO_INCREMENT
> `created` int(10)
> `data` varbinary(255)
> `user_id` int(11)
>
> So, imagine that we have 1000 entries come in where "data" above is
> exactly the same for all 1000 entries, but user_id is different (id and
> created being different is irrelevant).  I am thinking that prior to
> inserting into mysql, I should be able to concatenate the user_ids together
> with whitespace and then insert them into something like:
>
> `id` int(11) NOT NULL AUTO_INCREMENT
> `created` int(10)
> `data` varbinary(255)
> `user_id` blob
>
> Then on solr's end it will treat the user_id as Text and parse it (I want
> to say tokenize, but maybe my language is incorrect here?).
>
> Then when I search
>
> user_id:2002+AND+created:[${from}+TO+${until}]+data:"more"
>
> I want to be sure that if I look for user_id "2002", I will get data that
> only has a value "2002" in the user_id column and that a separate user with
> id "20" cannot accidentally pull data for user_id "2002" as a result of a
> fuzzy (my language ok?) match of 20 against (20)02.
>
> Current schema definition:
>
>  <field name="user_id" type="int" indexed="true" stored="true"/>
>
> New schema definition:
>
>     <field name="user_id" type="user_id_string" indexed="true" stored="true"/>
> ...
>     <fieldType name="user_id_string" class="solr.TextField" positionIncrementGap="100">
>       <analyzer>
>         <tokenizer class="solr.WhitespaceTokenizerFactory" maxTokenLength="120"/>
>       </analyzer>
>     </fieldType>
>
> 

