lucene-solr-user mailing list archives

From DC tech <dctech1...@gmail.com>
Subject Re: MoreLikeThis - Odd results - what am I doing wrong?
Date Tue, 02 Apr 2013 06:02:07 GMT
OK, so I have my SOLR instance running on AWS.
Any suggestions on how to share the link safely? Right now, the whole SOLR instance is completely
open.



Gagandeep singh <gagan.goku@gmail.com> wrote:

>Add &debugQuery=true&mlt=true and look at the scores for the MLT query, not a
>sample query. You can use Amazon EC2 to bring up your Solr; you should be
>able to get a micro instance on the free trial.
>
>
>On Mon, Apr 1, 2013 at 5:10 AM, dc tech <dctech1000@gmail.com> wrote:
>
>> I did try the raw query against the *simi* field, and it seems to return
>> results in the expected order.
>> For instance, Acura MDX has (Large, SUV, 4WD, Luxury) in the simi field.
>> Running a query with those words against the simi field returns the
>> expected models (X5, Audi Q5, etc.), and the subsequent documents have
>> decreasing relevance. So the basic query mechanism seems to be fine.
>>
>> The issue just seems to be with the MoreLikeThis component and handler.
>> I can post the index on a public SOLR instance. Any suggestions (including
>> where to host it)?
>>
>>
>> On Sun, Mar 31, 2013 at 1:54 PM, Gagandeep singh <gagan.goku@gmail.com> wrote:
>>
>> > If you can bring up your Solr setup on a public machine, then I'm sure a lot
>> > of debugging can be done. Without that, I think what you should look at is
>> > the tf-idf scores of terms like "camry" etc. Usually idf is the deciding
>> > factor in which results show up at the top (tf should be 1 for your data).
>> > Enable &debugQuery=true and look at the explain section to see how the
>> > score is being calculated.
>> >
>> > You should try giving different boosts to class, type, drive and size to
>> > control the results.
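To see why idf tends to decide the ranking, here is a minimal Python sketch of the tf and idf factors as I understand Lucene 4.x's DefaultSimilarity (the default in Solr 4.2): tf is the square root of the term count, so a term that occurs once gets tf = 1, and idf = 1 + ln(numDocs / (docFreq + 1)). It is a simplification that ignores fieldNorm, queryNorm and coord, and the document frequencies for "camry" and "sedan" are made up for illustration; the real values appear in the explain section.

```python
import math

# Simplified model of the Lucene 4.x DefaultSimilarity factors shown in the
# debugQuery "explain" output. fieldNorm, queryNorm and coord are ignored.

def tf(freq: float) -> float:
    """Term-frequency factor: square root of the term's count in the field."""
    return math.sqrt(freq)

def idf(doc_freq: int, num_docs: int) -> float:
    """Inverse document frequency: the rarer the term, the larger the value."""
    return math.log(num_docs / (doc_freq + 1)) + 1.0

def term_score(freq: float, doc_freq: int, num_docs: int, boost: float = 1.0) -> float:
    """Rough per-term contribution: idf enters both the query weight and the
    field weight, so it appears squared; a field boost multiplies it."""
    return boost * tf(freq) * idf(doc_freq, num_docs) ** 2

NUM_DOCS = 181  # size of the sample car data set

# Hypothetical document frequencies, for illustration only.
for term, df in [("camry", 2), ("sedan", 60)]:
    print(term, round(idf(df, NUM_DOCS), 3), round(term_score(1, df, NUM_DOCS), 3))
```

Because idf enters the score squared, a term found in 2 of the 181 cars outweighs one found in 60 by a wide margin, which is why per-field boosts on class, type, drive and size may be needed to steer the ranking.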
>> >
>> >
>> > On Sun, Mar 31, 2013 at 8:52 PM, dc tech <dctech1000@gmail.com> wrote:
>> >
>> >> I am running some experiments with MoreLikeThis, and the results seem
>> >> rather odd. I must be doing something wrong but just cannot figure out what.
>> >> Basically, the similarity results are decent, but not great.
>> >>
>> >> *Issue 1 = Quality*
>> >> Toyota Camry: finds Altima (good), but the next one is Camry Hybrid,
>> >> whereas it should have found Accord.
>> >> I have normalized the data into a simi field which has only the
>> >> attributes that I care about.
>> >> Without the simi field, I could not get mlt.qf boosts to work well enough
>> >> to return results.
>> >>
>> >> *Issue 2*
>> >> Some fields do not work at all. For instance, text+simi (in mlt.fl) works,
>> >> whereas just simi does not.
>> >> So there is some weirdness that I am just not understanding.
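One possible explanation for Issue 2, offered as a hedged sketch rather than a diagnosis: as I read the Lucene 4.x MoreLikeThis code, term counts from all fields listed in mlt.fl are pooled into one map, and a term is kept only if its pooled count reaches mlt.mintf (Solr default 2) and its document frequency reaches mlt.mindf (default 5). Each simi value occurs once per car, so with mlt.fl=simi nothing may pass; with simi,text the copyField rules put the same words into text as well, roughly doubling the counts. The Python below models that selection; the tokens and document frequencies are hypothetical.

```python
import math
from collections import Counter

# Solr's documented defaults for mlt.mintf and mlt.mindf.
MIN_TF = 2
MIN_DF = 5

def idf(doc_freq, num_docs):
    return math.log(num_docs / (doc_freq + 1)) + 1.0

def interesting_terms(source_doc, mlt_fl, doc_freq, num_docs, min_tf=MIN_TF, min_df=MIN_DF):
    """Simplified model of Lucene 4.x MoreLikeThis term selection.

    source_doc maps field name -> list of analyzed tokens of the source document;
    doc_freq maps (field, term) -> number of documents containing the term.
    """
    # Counts from every field in mlt.fl are pooled into one map keyed by term.
    pooled = Counter()
    for field in mlt_fl:
        pooled.update(source_doc.get(field, []))

    picked = []
    for term, term_freq in pooled.items():
        if term_freq < min_tf:
            continue  # too infrequent in the source document
        # Query the term in the field where its document frequency is highest
        # (the first field wins ties).
        top_field, df = max(((f, doc_freq.get((f, term), 0)) for f in mlt_fl),
                            key=lambda pair: pair[1])
        if df == 0 or df < min_df:
            continue  # absent from the index or too rare across it
        picked.append((term, top_field, term_freq * idf(df, num_docs)))
    return sorted(picked, key=lambda item: item[2], reverse=True)

# Hypothetical analyzed tokens for the Acura MDX; size is copied into text twice.
acura = {
    "simi": ["luxury", "suv", "4wd", "large"],
    "text": ["acura", "mdx", "luxury", "suv", "4wd", "large", "large"],
}
# Hypothetical document frequencies over the 181 cars.
dfs = {(f, t): 20 for f in ("simi", "text") for t in ("luxury", "suv", "4wd", "large")}
dfs.update({("text", "acura"): 5, ("text", "mdx"): 1})

print(interesting_terms(acura, ["simi"], dfs, 181))          # every pooled count is 1
print(interesting_terms(acura, ["simi", "text"], dfs, 181))  # counts of 2 or more survive
```

If this is the cause, adding mlt.mintf=1 (and perhaps a lower mlt.mindf) to the simi-only request should make it return results; that is an untested assumption.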
>> >>
>> >> I would be grateful for your guidance!
>> >>
>> >>
>> >> Here is the setup:
>> >> *1. SOLR Version*
>> >> solr-spec 4.2.0.2013.03.06.22.32.13
>> >> solr-impl 4.2.0 1453694   rmuir - 2013-03-06 22:32:13
>> >> lucene-spec 4.2.0
>> >> lucene-impl 4.2.0 1453694 -  rmuir - 2013-03-06 22:25:29
>> >>
>> >> *2. Machine Information*
>> >> Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM (1.6.0_23 19.0-b09)
>> >> Windows 7 Home 64 Bit with 4 GB RAM
>> >>
>> >> *3. Sample Data*
>> >> I created this 'dummy' data of cars, the idea being that it would be
>> >> sufficient and simple enough to generate similarity and understand how it
>> >> works.
>> >> There are 181 rows in the data set (I have attached it for reference in
>> >> CSV format).
>> >>
>> >> *4. SCHEMA*
>> >> *Field Definitions*
>> >>    <field name="id" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>> >>    <field name="make" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>> >>    <field name="model" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>> >>    <field name="class" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>> >>    <field name="type" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>> >>    <field name="drive" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>> >>    <field name="comment" type="text_general" indexed="true" stored="true" termVectors="true" multiValued="true"/>
>> >>    <field name="size" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
>> >>
>> >> *Copy Fields*
>> >> <copyField source="make" dest="make_en" /> <!-- Search -->
>> >> <copyField source="model" dest="model_en" /> <!-- Search -->
>> >> <copyField source="class" dest="class_en" /> <!-- Search -->
>> >> <copyField source="type" dest="type_en" /> <!-- Search -->
>> >> <copyField source="drive" dest="drive_en" /> <!-- Search -->
>> >> <copyField source="comment" dest="comment_en" /> <!-- Search -->
>> >> <copyField source="size" dest="size_en" /> <!-- Search -->
>> >> <copyField source="id" dest="text" /> <!-- Glob -->
>> >> <copyField source="make" dest="text" /> <!-- Glob -->
>> >> <copyField source="model" dest="text" /> <!-- Glob -->
>> >> <copyField source="class" dest="text" /> <!-- Glob -->
>> >> <copyField source="type" dest="text" /> <!-- Glob -->
>> >> <copyField source="drive" dest="text" /> <!-- Glob -->
>> >> <copyField source="comment" dest="text" /> <!-- Glob -->
>> >> <copyField source="size" dest="text" /> <!-- Glob -->
>> >> <copyField source="size" dest="text" /> <!-- Glob -->
>> >> <copyField source="class" dest="simi_en" /> <!-- similarity -->
>> >> <copyField source="type" dest="simi_en" /> <!-- similarity -->
>> >> <copyField source="drive" dest="simi_en" /> <!-- similarity -->
>> >> <copyField source="size" dest="simi_en" /> <!-- similarity -->
>> >>
>> >> Note that the "simi" field ends up with values made of class, type, drive
>> >> and size, for example:
>> >> - Luxury SUV 4WD Large
>> >> - Standard Sedan Front Family
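To make the note concrete, here is a small Python sketch (the record and the function name are hypothetical) of what the four simi copyField rules contribute for the Acura MDX. copyField copies the raw source value before analysis, so a destination fed by several sources needs to be multiValued, and make is not among the sources. The value order shown is an assumption, not something Solr guarantees.

```python
# Hypothetical model of the four simi_en copyField rules: each source value is
# copied as-is (before analysis) into the destination, which therefore has to
# be multiValued. The value order used here is assumed, not guaranteed.
SIMI_SOURCES = ["class", "type", "drive", "size"]

def simi_values(record):
    """Return the values a record would contribute to the simi field."""
    return [record[field] for field in SIMI_SOURCES if field in record]

acura_mdx = {"make": "Acura", "model": "MDX", "class": "Luxury",
             "type": "SUV", "drive": "4WD", "size": "Large"}
print(" ".join(simi_values(acura_mdx)))  # make and model are not copied
```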
>> >>
>> >>
>> >> *5. MLT Setup*
>> >> a. mlt.fl = *text*, mlt.qf = *text*: works, but the results are obviously
>> >> not good (make is not a good similarity indicator).
>> >>
>> >> http://localhost:8983/solr/cars/select/?q=id:2&mlt=true&fl=text&mlt.fl=text&mlt.qf=text
>> >>
>> >> b. mlt.fl = *simi*, mlt.qf = *simi*: does not work at all (0 results).
>> >>
>> >> http://localhost:8983/solr/cars/select/?q=id:2&mlt=true&fl=text&mlt.fl=simi&mlt.qf=simi
>> >>
>> >> c. mlt.fl = *simi,text*, mlt.qf = *simi^10 text^.1*: works, with decent
>> >> results in most cases.
>> >>
>> >> http://localhost:8983/solr/cars/select/?q=id:2&mlt=true&fl=text&mlt.fl=simi,text&mlt.qf=simi^10%20text^.01
>> >>
>> >> It works for finding cars similar to the Acura MDX (Luxury SUV 4WD Large),
>> >> but for the Toyota Camry it finds hybrid family cars (Prius) ahead of the Honda.
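Hand-encoding the mlt.qf value is error-prone, since it contains ^ characters and a space. A minimal Python sketch (the helper name mlt_url is hypothetical; host, core and parameters are taken from the URLs above) that rebuilds variants a to c with urllib.parse.urlencode and can append debugQuery=true, as suggested earlier in the thread, to inspect the scores:

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/cars/select/"

def mlt_url(doc_id, mlt_fl, mlt_qf=None, debug=False, extra=None):
    """Build an MLT request URL; urlencode escapes ':', ',', '^' and spaces."""
    params = {"q": f"id:{doc_id}", "mlt": "true", "fl": "text", "mlt.fl": mlt_fl}
    if mlt_qf is not None:
        params["mlt.qf"] = mlt_qf
    if debug:
        params["debugQuery"] = "true"  # adds the explain section to the response
    if extra:
        params.update(extra)  # e.g. {"mlt.mintf": "1"} to experiment
    return BASE + "?" + urlencode(params)

variant_a = mlt_url(2, "text", "text")
variant_b = mlt_url(2, "simi", "simi")
variant_c = mlt_url(2, "simi,text", "simi^10 text^.01", debug=True)
print(variant_c)
```

Building the query string from a dictionary keeps each parameter on one logical line, so a mail client cannot split the boost syntax the way it did in variant c.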
>> >
>>