lucene-dev mailing list archives

From Mirko Mancin <mirko.man...@t-frutta.it>
Subject Re: Problem with NGram
Date Wed, 01 Apr 2015 15:07:10 GMT
It doesn't work with more than one word! :-(

If I search for "jakartd apache lucene"~10, it does NOT find "jakarta apache lucene".

But if I search for "jakarte apache lucene"~10, it DOES find "jakarta apache lucene".

Why?!

Mirko Mancin

Software Developer

Ubiq srl
stradello Conrad Marca-Relli, 9
43122 Parma (PR)
t. +39 0521 781601
cell. +39 346 4137577
Follow us on LinkedIn: https://www.linkedin.com/company/ubiq-srl

This email and any files transmitted with it are confidential and intended solely for the
use of the individual or entity to whom they are addressed. If you have received this email
in error please notify the system manager. This message contains confidential information
and is intended only for the individual named. If you are not the named addressee you should
not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail
if you have received this e-mail by mistake and delete this e-mail from your system. If you
are not the intended recipient you are notified that disclosing, copying, distributing or
taking any action in reliance on the contents of this information is strictly prohibited.

From: Mostafa Gomaa <mostafa.gomaa89@gmail.com>
Reply-To: "dev@lucene.apache.org" <dev@lucene.apache.org>
Date: Wednesday, 1 April 2015 15:54
To: "dev@lucene.apache.org" <dev@lucene.apache.org>
Subject: Re: Problem with NGram

Hello Mirko,

Try using fuzzy queries. You can do that by adding a tilde at the end of the term you're searching
for, like PRIN3ER~. A fuzzy query uses edit distance to find similar terms. You can also
specify the maximum number of edits by adding a number after the tilde; for example, PRIN3ER~2 will
match terms up to two edits away. Hope this helps.
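
To make the edit-distance idea concrete, here is a minimal Python sketch. It is only an illustration of plain Levenshtein distance, not Lucene's actual implementation, and the names edit_distance and fuzzy_matches are made up for this example. It also assumes the terms are compared as whole, lower-cased words, which ignores whatever the field's analyzer does to them at index time.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def fuzzy_matches(query: str, terms: list[str], max_edits: int = 2) -> list[str]:
    """Hypothetical helper: keep the terms within max_edits of the query."""
    q = query.lower()
    return [t for t in terms if edit_distance(q, t.lower()) <= max_edits]


if __name__ == "__main__":
    for typo in ("PRIN3ER", "PRIN3R"):
        print(typo, "->", edit_distance(typo, "PRINTER"), "edit(s) from PRINTER")
    print(fuzzy_matches("PRIN3ER", ["PRINTER", "BROTHER", "SAMSUNG"]))
```

Under these assumptions PRIN3ER is one edit away from PRINTER, and PRIN3R (the spelling from the original question) is two edits away, so both fall inside a two-edit limit.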

Regards,

Mostafa Gomaa.

On Wed, Apr 1, 2015 at 2:37 PM, Mirko Mancin <mirko.mancin@t-frutta.it> wrote:
Hi,

I have a problem with n-grams. I am trying to find the word "PRINTER".

I have these fields:


<field name="bestExternalDescriptionStandard" type="text_general" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/>
<field name="bestExternalDescriptionGram" type="text_ngram" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/>

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Italian"/>
  </analyzer>
</fieldType>

<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="4"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Italian"/>
  </analyzer>
</fieldType>
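
As a rough illustration of what a 2-to-4 character n-gram analysis produces, the Python sketch below lists the grams of a single word and the grams two spellings share. It is an approximation under stated assumptions: char_ngrams is a hypothetical helper, it lower-cases the input first, it works on one word rather than a whole field value, and it skips the Snowball filter entirely, so it does not reproduce Solr's exact token stream.

```python
def char_ngrams(text: str, min_size: int = 2, max_size: int = 4) -> set[str]:
    """Hypothetical helper: all character n-grams of text with lengths
    from min_size to max_size inclusive (mirrors minGramSize/maxGramSize)."""
    text = text.lower()
    return {
        text[i:i + n]
        for n in range(min_size, max_size + 1)
        for i in range(len(text) - n + 1)
    }


if __name__ == "__main__":
    indexed = char_ngrams("PRINTER")
    queried = char_ngrams("PRIN3R")
    print("shared grams:    ", sorted(indexed & queried))
    print("only in PRIN3R:  ", sorted(queried - indexed))
```

The two spellings share only the grams that come before the typo; every gram touching the "3" is absent from the indexed word. Whether that partial overlap is enough for a hit depends on how the query parser combines the grams (for example, whether all of them are required), which this sketch does not model.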



A search for "PRINTER" correctly finds:

"BROTHER PRINTER", "SAMSUNG PRINTER", etc.

But if I search for "PRIN3R" (with an error in the string), Solr does not return anything!

How can I do this? How should I set up my schema.xml to find documents with a certain similarity?

Thanks


Mirko Mancin


