lucene-java-user mailing list archives

From Ian Lea <>
Subject Re: Indexing product keys with and without spaces in them
Date Tue, 03 Jan 2012 10:06:28 GMT
When indexing you could normalise them down to a standard format
without spaces or hyphens, but searching is much harder if you really
can't identify possible product ids within user queries.  Perhaps make
triplets of consecutive query words, joined without spaces or hyphens?
"CRX USB-2.0 16GB" ==> CRXUSB2.016GB, but also "some random words" ==>
somerandomwords.  The latter wouldn't match anything; the former would,
if it were a valid id.
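A minimal, stdlib-only Java sketch of the normalisation and the query-side triplets, assuming the query is split on whitespace and that everything is case-folded to upper case (both assumptions, not from the message above). `ProductKeyNormalizer`, `normalize` and `candidates` are hypothetical names; in a real setup the normalised form would come from the analyzer of a product-key field, and each candidate would be looked up as an exact term in that field.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: normalises product keys and builds concatenated query candidates. */
final class ProductKeyNormalizer {

    private ProductKeyNormalizer() {}

    /** Strips whitespace and hyphens and upper-cases (assumed case folding). */
    static String normalize(String s) {
        return s.replaceAll("[\\s-]+", "").toUpperCase(Locale.ROOT);
    }

    /**
     * Joins every run of 1..maxWords consecutive query words into one normalised
     * candidate, so "CRX USB-2.0 16GB" yields CRX, CRXUSB2.0, CRXUSB2.016GB, USB2.0, ...
     */
    static Set<String> candidates(String query, int maxWords) {
        Set<String> out = new LinkedHashSet<>();
        if (query.trim().isEmpty()) {
            return out; // nothing to combine
        }
        List<String> words = Arrays.asList(query.trim().split("\\s+"));
        for (int start = 0; start < words.size(); start++) {
            StringBuilder joined = new StringBuilder();
            // Grow the run one word at a time, emitting each prefix as a candidate.
            for (int n = 0; n < maxWords && start + n < words.size(); n++) {
                joined.append(words.get(start + n));
                out.add(normalize(joined.toString()));
            }
        }
        return out;
    }
}
```

With `maxWords = 3`, "CRX USB-2.0 16GB" produces CRXUSB2.016GB among its candidates, which equals the normalised form of the key CRXUSB2.0-16GB; "some random words" produces SOMERANDOMWORDS, which simply finds nothing. The price is up to (query words × maxWords) extra terms per query.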

Some form of synonym analysis/injection at index time would be better, if
you can do it: CRXUSB2.016GB ==> "CRX USB2.0 16GB", indexed alongside
the base value.
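A minimal sketch of the index-time injection, assuming the shop can maintain a hypothetical map from each normalised key to its known spaced spellings, and that document text has already been split on whitespace. `ProductKeySynonyms` and `expand` are illustrative names, not a Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical index-time expander: keeps each token and injects known variants of product keys. */
final class ProductKeySynonyms {

    // Assumed catalogue data: normalised key -> spaced spellings, e.g. "CRX USB2.0 16GB".
    private final Map<String, List<String>> variantsByKey;

    ProductKeySynonyms(Map<String, List<String>> variantsByKey) {
        this.variantsByKey = variantsByKey;
    }

    /** Same normalisation as at query time: no whitespace or hyphens, upper case. */
    static String normalize(String s) {
        return s.replaceAll("[\\s-]+", "").toUpperCase(Locale.ROOT);
    }

    /** Returns the original tokens plus, for each recognised key, its normalised form and variant words. */
    List<String> expand(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token); // always keep the base value
            String key = normalize(token);
            List<String> variants = variantsByKey.get(key);
            if (variants == null) {
                continue; // not a known product key
            }
            if (!key.equals(token)) {
                out.add(key); // normalised form, so "CRXUSB2.016GB" queries match
            }
            for (String variant : variants) {
                // Split each spaced spelling so "CRX USB2.0 16GB" queries match word by word.
                for (String part : variant.split("\\s+")) {
                    out.add(part);
                }
            }
        }
        return out;
    }
}
```

For example, the tokens `Stick`, `CRXUSB2.0-16GB` with the map entry CRXUSB2.016GB ==> "CRX USB2.0 16GB" expand to Stick, CRXUSB2.0-16GB, CRXUSB2.016GB, CRX, USB2.0, 16GB. Inside Lucene this would more naturally live in a token filter, which would usually give injected terms the same position as the original token so phrase queries behave; this flat list ignores positions.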

If you can neither have a dedicated product id search field nor
standardise the product ids, this is going to be hard.


On Tue, Jan 3, 2012 at 8:44 AM, Christoph Kaser <> wrote:
> Hello,
> we use lucene as search engine in an online shop. The products in this shop
> often contain product keys like CRXUSB2.0-16GB.
> We would like our customers to be able to find products by entering their
> key. The problem is that product keys sometimes contain spaces or dashes and
> customers sometimes don't enter these whitespaces correctly. On the other
> hand, some customers enter whitespaces where there are none. Is there an
> analyzer or some other method that allows us to find the product if the user
> enters things like:
> - "CRX USB2.0 16GB"
> - "CRXUSB2.016GB"
> - "CRX USB-2.0 16GB"
> ...
> The problem is that the product keys don't all have a common format and are
> contained in the normal text, so we don't have an easy way to treat them
> differently from the rest of the text.
> Any help would be great!
> Best regards,
> Christoph
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
