lucene-solr-user mailing list archives

From Gagandeep singh <gagan.g...@gmail.com>
Subject Re: MoreLikeThis - Odd results - what am I doing wrong?
Date Mon, 01 Apr 2013 04:11:55 GMT
Try &debugQuery=true&mlt=true and look at the scores for the MLT query itself,
not a sample query. You can use Amazon EC2 to bring up your Solr instance; you
should be able to get a micro instance on the free trial.
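A minimal sketch of the suggested request, assuming Python 3 and the host, core name (`cars`) and parameters from the URLs later in this thread. The helper name `mlt_debug_url` is hypothetical. It only builds the URL; open it in a browser or fetch it with any HTTP client and read the debug section of the response.

```python
from urllib.parse import urlencode

# Host, port and core name ("cars") are taken from the URLs in this thread.
SELECT_URL = "http://localhost:8983/solr/cars/select/"


def mlt_debug_url(doc_id, mlt_fl, mlt_qf):
    """Build a MoreLikeThis request for one document with debug output on."""
    params = {
        "q": f"id:{doc_id}",      # the source document for MLT
        "mlt": "true",            # enable the MoreLikeThis component
        "fl": "text",
        "mlt.fl": mlt_fl,         # fields MLT draws interesting terms from
        "mlt.qf": mlt_qf,         # per-field boosts for those fields
        "debugQuery": "true",     # include score explanations in the response
    }
    # urlencode percent-encodes ':', ',' and '^' and turns spaces into '+'
    return SELECT_URL + "?" + urlencode(params)


url = mlt_debug_url(2, "simi,text", "simi^10 text^.01")
print(url)
```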


On Mon, Apr 1, 2013 at 5:10 AM, dc tech <dctech1000@gmail.com> wrote:

> I did try the raw query against the *simi* field, and it seems to return
> results in the expected order.
> For instance, the Acura MDX has (large, SUV, 4WD, Luxury) in the simi field.
> Running a query with those words against the simi field returns the
> expected models (X5, Audi Q5, etc.), and the subsequent documents have
> decreasing relevance. So the basic query mechanism seems to be fine.
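For comparison, a sketch of the raw field query described here, again assuming Python 3, the `cars` core on localhost, and the default OR operator between terms. `field_query_url` is a hypothetical helper; asking for `score` in `fl` and setting `debugQuery=true` makes it easier to line these scores up against the MLT results.

```python
from urllib.parse import urlencode

SELECT_URL = "http://localhost:8983/solr/cars/select/"


def field_query_url(field, terms, rows=10):
    """Query one field with a bag of terms, combined by the default operator."""
    params = {
        "q": f"{field}:({' '.join(terms)})",  # e.g. simi:(large SUV 4WD Luxury)
        "fl": "id,make,model,score",          # return the score for comparison
        "rows": rows,
        "debugQuery": "true",
    }
    return SELECT_URL + "?" + urlencode(params)


url = field_query_url("simi", ["large", "SUV", "4WD", "Luxury"])
print(url)
```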
>
> The issue just seems to be with the MoreLikeThis component and handler.
> I can post the index on a public Solr instance; any suggestions for
> hosting?
>
>
> On Sun, Mar 31, 2013 at 1:54 PM, Gagandeep singh <gagan.goku@gmail.com> wrote:
>
> > If you can bring up your Solr setup on a public machine, then I'm sure a
> > lot of debugging can be done. Without that, I think what you should look
> > at is the tf-idf scores of terms like "camry". Usually idf is the deciding
> > factor in which results show at the top (tf should be 1 for your data).
> > Enable &debugQuery=true and look at the explain section to see how the
> > score is being calculated.
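To see why idf tends to decide the ranking, here is a sketch of the tf and idf factors used by Lucene 4.x's default similarity (`DefaultSimilarity`): tf = sqrt(freq) and idf = 1 + ln(numDocs / (docFreq + 1)). The 181-document count comes from the sample data described in the original post; the document frequencies below are hypothetical. In the explain output idf appears in both the query weight and the field weight, so its effect on a term's score is roughly squared.

```python
import math

NUM_DOCS = 181  # rows in the sample car data set


def tf(freq):
    """Lucene 4.x DefaultSimilarity term frequency factor: sqrt(freq)."""
    return math.sqrt(freq)


def idf(doc_freq, num_docs=NUM_DOCS):
    """Lucene 4.x DefaultSimilarity inverse document frequency: 1 + ln(N / (df + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical document frequencies: a rare model name vs. a common body type.
for term, df in [("camry", 2), ("sedan", 60)]:
    w = idf(df)
    print(f"{term:>6}  df={df:3d}  idf={w:.3f}  idf^2={w * w:.3f}")
```

With tf fixed at 1 (a term occurs once per field), the rarer term's larger idf is what pushes its matches to the top.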
> >
> > You should try giving different boosts to class, type, drive, and size to
> > control the results.
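One way to experiment with per-field boosts is to generate the `mlt.fl` and `mlt.qf` values from a single mapping so the two lists stay in sync. This is a sketch with made-up boost values; `FIELD_BOOSTS` and `mlt_field_params` are hypothetical names, and useful weights would come from comparing the explain output across runs.

```python
# Hypothetical boosts: tune them by comparing the debugQuery explain output.
FIELD_BOOSTS = {"class": 4.0, "type": 3.0, "drive": 1.0, "size": 2.0}


def mlt_field_params(boosts):
    """Derive matching mlt.fl and mlt.qf values from one field -> boost mapping."""
    fl = ",".join(boosts)  # dicts keep insertion order (Python 3.7+)
    qf = " ".join(f"{field}^{boost:g}" for field, boost in boosts.items())
    return {"mlt.fl": fl, "mlt.qf": qf}


params = mlt_field_params(FIELD_BOOSTS)
print(params)
```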
> >
> >
> > On Sun, Mar 31, 2013 at 8:52 PM, dc tech <dctech1000@gmail.com> wrote:
> >
> >> I am running some experiments with MoreLikeThis, and the results seem
> >> rather odd. I must be doing something wrong, but I just cannot figure out
> >> what. Basically, the similarity results are decent, but not great.
> >>
> >> *Issue 1: Quality*
> >> Toyota Camry: MLT finds the Altima (good), but the next result is the
> >> Camry Hybrid, whereas it should have found the Accord.
> >> I have normalized the data into a simi field that contains only the
> >> attributes I care about.
> >> Without the simi field, I could not get mlt.qf boosts to work well enough
> >> to return good results.
> >>
> >> *Issue 2*
> >> Some fields do not work at all. For instance, text and simi together (in
> >> mlt.fl) work, whereas simi alone does not.
> >> So there is some behavior here that I am just not understanding.
> >>
> >> I would be grateful for your guidance!
> >>
> >>
> >> Here is the setup:
> >> *1. SOLR Version*
> >> solr-spec 4.2.0.2013.03.06.22.32.13
> >> solr-impl 4.2.0 1453694   rmuir - 2013-03-06 22:32:13
> >> lucene-spec 4.2.0
> >> lucene-impl 4.2.0 1453694 -  rmuir - 2013-03-06 22:25:29
> >>
> >> *2. Machine Information*
> >> Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM (1.6.0_23 19.0-b09)
> >> Windows 7 Home, 64-bit, with 4 GB RAM
> >>
> >> *3. Sample Data*
> >> I created this 'dummy' car data, the idea being that it would be
> >> sufficient and simple enough to generate similarity results and to
> >> understand how similarity works.
> >> There are 181 rows in the data set (I have attached it in CSV format for
> >> reference).
> >>
> >> *4. SCHEMA*
> >> *Field Definitions*
> >>    <field name="id" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
> >>    <field name="make" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
> >>    <field name="model" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
> >>    <field name="class" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
> >>    <field name="type" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
> >>    <field name="drive" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
> >>    <field name="comment" type="text_general" indexed="true" stored="true" termVectors="true" multiValued="true"/>
> >>    <field name="size" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
> >> *Copy Fields*
> >> <copyField source="make" dest="make_en" /> <!-- Search -->
> >> <copyField source="model" dest="model_en" /> <!-- Search -->
> >> <copyField source="class" dest="class_en" /> <!-- Search -->
> >> <copyField source="type" dest="type_en" /> <!-- Search -->
> >> <copyField source="drive" dest="drive_en" /> <!-- Search -->
> >> <copyField source="comment" dest="comment_en" /> <!-- Search -->
> >> <copyField source="size" dest="size_en" /> <!-- Search -->
> >> <copyField source="id" dest="text" /> <!-- Glob -->
> >> <copyField source="make" dest="text" /> <!-- Glob -->
> >> <copyField source="model" dest="text" /> <!-- Glob -->
> >> <copyField source="class" dest="text" /> <!-- Glob -->
> >> <copyField source="type" dest="text" /> <!-- Glob -->
> >> <copyField source="drive" dest="text" /> <!-- Glob -->
> >> <copyField source="comment" dest="text" /> <!-- Glob -->
> >> <copyField source="size" dest="text" /> <!-- Glob -->
> >> <copyField source="size" dest="text" /> <!-- Glob -->
> >> <copyField source="class" dest="simi_en" /> <!-- similarity -->
> >> <copyField source="type" dest="simi_en" /> <!-- similarity -->
> >> <copyField source="drive" dest="simi_en" /> <!-- similarity -->
> >> <copyField source="size" dest="simi_en" /> <!-- similarity -->
> >>
> >> Note that the "simi" field ends up with values made of class, type,
> >> drive, and size, such as:
> >> - Luxury SUV 4WD Large
> >> - Standard Sedan Front Family
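A small sketch of what the similarity field receives: each copyField adds its source value to the destination, so a row contributes its class, type, drive and size values, and `make` never reaches it. The row below is hypothetical, modeled on the Acura MDX example; the sketch just calls the destination the similarity field, while the copyField lines in the schema name it `simi_en`.

```python
# copyField sources for the similarity field, as listed in the schema.
SIMI_SOURCES = ["class", "type", "drive", "size"]


def simi_values(row):
    """Mimic copyField: collect each non-empty source value into the destination."""
    return [row[src] for src in SIMI_SOURCES if row.get(src)]


# Hypothetical row modeled on the Acura MDX example in this thread.
mdx = {"make": "Acura", "model": "MDX", "class": "Luxury",
       "type": "SUV", "drive": "4WD", "size": "Large"}
print(" ".join(simi_values(mdx)))  # Luxury SUV 4WD Large
```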
> >>
> >>
> >> *5. MLT Setup*
> >> a. mlt.fl = *text*, mlt.qf = *text*: works, but the results are obviously
> >> not good (make is not a good similarity indicator).
> >>
> >> http://localhost:8983/solr/cars/select/?q=id:2&mlt=true&fl=text&mlt.fl=text&mlt.qf=text
> >>
> >> b. mlt.fl = *simi*, mlt.qf = *simi*: does not work at all (0 results).
> >>
> >> http://localhost:8983/solr/cars/select/?q=id:2&mlt=true&fl=text&mlt.fl=simi&mlt.qf=simi
> >>
> >> c. mlt.fl = *simi,text*, mlt.qf = *simi^10 text^.1*: works, with decent
> >> results in most cases.
> >>
> >> http://localhost:8983/solr/cars/select/?q=id:2&mlt=true&fl=text&mlt.fl=simi,text&mlt.qf=simi^10%20text^.01
> >> This works for finding cars similar to the Acura MDX (Luxury SUV 4WD
> >> Large), but for the Toyota Camry it ranks hybrid family cars (Prius) ahead
> >> of the Honda.
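The `mlt.qf` values above use the DisMax-style `field^boost` syntax. A minimal parser sketch (hypothetical helper `parse_qf`, assuming whitespace-separated entries and a boost of 1.0 when no `^` is given) makes the weights explicit: with `simi^10 text^.01`, simi carries a boost 1000 times that of text.

```python
def parse_qf(qf):
    """Parse a DisMax-style 'field^boost field^boost' string into {field: boost}."""
    boosts = {}
    for token in qf.split():
        field, sep, boost = token.partition("^")
        boosts[field] = float(boost) if sep else 1.0  # no '^' -> assume boost 1.0
    return boosts


print(parse_qf("simi^10 text^.01"))  # {'simi': 10.0, 'text': 0.01}
```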
