lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Schema Change: Int -> String (i am the original poster, new email address)
Date Fri, 07 Jun 2013 04:23:47 GMT
To be clear, one normally doesn't do queries on portions of an "ID" - 
usually it is one integrated string.

Further, string (StrField) fields are definitely NOT tokenized in Solr.

Your story keeps changing, which is why I have to keep hedging my answers.

At least with your latest story, your user_id should be a text/TextField so 
that it will be tokenized. A query for "2002" will match only on complete 
tokens, not on parts of tokens. If you want to match exactly on the full 
user_id, use a quoted phrase containing the full user_id.
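
As a rough illustration of the whole-token behavior (plain Python, not Solr code, and assuming the field's analyzer does nothing more than split on whitespace; the function names are made up for this sketch):

```python
def tokenize(value):
    """Split a field value on whitespace, roughly what a whitespace tokenizer does."""
    return value.split()


def term_matches(field_value, term):
    """A single-term query matches only if the term equals a complete token."""
    return term in tokenize(field_value)


def phrase_matches(field_value, phrase):
    """A quoted phrase matches only if its tokens appear adjacent and in order."""
    tokens = tokenize(field_value)
    wanted = tokenize(phrase)
    n = len(wanted)
    if n == 0:
        return False
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))


user_ids = "2002 15000 45"
print(term_matches(user_ids, "2002"))        # complete token
print(term_matches(user_ids, "20"))          # only a prefix of a token
print(phrase_matches(user_ids, "15000 45"))  # adjacent tokens, in order
```

A real analyzer chain may also lowercase, strip punctuation, or split on other characters, so treat this only as a model of the matching rule, not of any particular field type.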

But... I still have to hedge, because you refer to "a string of concatenated 
user id values". You seem to have two distinct definitions for user id.

So, until you disclose all of your requirements and your data model, 
including a clarification about user id vs. "a string of concatenated user 
id values", I can't answer your question definitively, other than "Maybe, 
depending on what you really mean by user id."

-- Jack Krupansky

-----Original Message----- 
From: z z
Sent: Friday, June 07, 2013 12:11 AM
To: solr-user@lucene.apache.org
Subject: Re: Schema Change: Int -> String (i am the original poster, new 
email address)

The unique key is an auto-incremented int in the db.  Sorry for having
given the impression that user_id is the unique key per document.  This is
a table of events that are happening as users interact with our system.
It just so happens that we were inserting individual records for each user
before we even began to think about using something like Solr.  Now,
however, it seems to me that we should be able to ask questions like "give
me all records for user "2002" that have this string value "more" in data2,
across this time stamp range [ .... ].  Several simultaneously inserted
rows into the db are exactly the same aside from the user_ids.  I just want
to know beforehand if I can still maintain exact matches for a user if the
user_id becomes a string of concatenated user id values.

From what you are saying it sounds like the "user_id_str" is really all I
need.  It is tokenized and allows for partial searches.  I just want to
make sure that "2002 15000 45" when tokenized doesn't allow "20" to
partially match the token "2002".

On Fri, Jun 7, 2013 at 12:57 PM, Jack Krupansky 
<jack@basetechnology.com> wrote:

> In that case, you will need to keep two copies of the user ID, one which
> is a single, complete string, and one which is a tokenized field
> text/TextField so that you can do a keyword search against it. Use the
> string/StrField as the main copy and then use a <copyField> directive in
> the schema to copy from the main copy to the other copy.
>
> So, maybe "user_id" is the full unique key - you would have to specify
> the full exact key to query against it, or use wildcards for partial
> matches - and "user" or "user_id_str" would be the tokenized text version
> that would allow a simple search by partial value, such as "2002".
>
> Even so, I'm still not convinced that you have given us your complete
> requirements. Is the user_id in fact the unique key for the documents?
>
> 

