mahout-user mailing list archives

From Terry Blankers <te...@amritanet.com>
Subject Re: lucene2seq error: field does not exist in the index
Date Sat, 19 Apr 2014 02:34:02 GMT
Hi Frank,

In working with a small test index, if I change the 'body' field to 
indexed it indeed does work as expected. It would be great if lucene2seq 
could be fixed to read un-indexed stored fields as per design as I need 
to query various corpora where I don't have control over the schema. Is 
there anything else I can do at this point?

Thanks,

Terry
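
For readers following the thread, a toy model of the two symptoms reported below may help. This is only a sketch under stated assumptions: ToyField, ToyIndex and export_field are hypothetical names invented for illustration, not Lucene or Mahout classes, and the check inside export_field only mimics the behaviour Frank suspects (validating the field against indexed data and then reading stored values). It is not taken from the lucene2seq source.

```python
from dataclasses import dataclass, field


@dataclass
class ToyField:
    """One field of one document in a toy index (hypothetical, not a Lucene class)."""
    value: str
    indexed: bool  # searchable: the field contributes terms to the inverted index
    stored: bool   # retrievable: the original value is kept with the document


@dataclass
class ToyIndex:
    docs: list = field(default_factory=list)  # each doc is a dict {name: ToyField}

    def indexed_field_names(self):
        # The only fields visible to a check that looks at searchable terms.
        return {n for d in self.docs for n, f in d.items() if f.indexed}

    def stored_value(self, doc, name):
        # What loading a document's stored fields would return, or None.
        f = doc.get(name)
        return f.value if f is not None and f.stored else None


def export_field(index, name):
    """Hypothetical exporter: validates against indexed fields, then reads stored values."""
    if name not in index.indexed_field_names():
        raise ValueError(f"Field '{name}' does not exist in the index")
    return [index.stored_value(d, name) or "" for d in index.docs]


# Mirrors the schema quoted in this thread:
# 'body' is stored but not indexed, 'content' is indexed but not stored.
index = ToyIndex([{
    "body": ToyField("diabetes study text", indexed=False, stored=True),
    "content": ToyField("diabetes study text", indexed=True, stored=False),
}])

try:
    export_field(index, "body")
except ValueError as err:
    body_result = str(err)  # the stored-only field fails the existence check

content_result = export_field(index, "content")  # passes the check, nothing stored
```

In this model, asking for 'body' raises the "does not exist" error even though its value is stored, while asking for 'content' passes the check but yields empty values, which matches the two outcomes described in the original report.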


On 4/16/14, 1:52 PM, Frank Scholten wrote:
> Hi Terry,
>
> What happens when you make the 'body' field indexed in your schema?
>
> LuceneIndexHelper checks the field using an IndexSearcher so it might be
> that the field has to be indexed as well as being stored, which would be a
> bug because lucene2seq is designed to load stored fields.
>
> Cheers,
>
> Frank
>
>
> On Fri, Apr 11, 2014 at 5:33 AM, Terry Blankers <terry@amritanet.com> wrote:
>
>> Hi All, I'm very new to trying to use lucene2seq so I'm not sure if it's
>> just user error, but I'm experiencing some unexpected behavior when running
>> lucene2seq against my solr index (4.7.1). I've tried using both 0.9 and the
>> trunk build of mahout. (And BTW, I have been able to successfully run the
>> Reuters example as a test baseline.)
>>
>>
>> Here's the command I'm running:
>>
>>     $MAHOUT_HOME/bin/mahout lucene2seq \
>>       -i /home/ec2-user/solr/solr-data/solrindex/index \
>>       -o solr/sequence -id key_sha1hex -f body \
>>       -xm sequential -q topics:diabetes -n 500
>>
>>
>> Excerpts from my solr schema:
>>
>> <field name="content" type="text" stored="false" indexed="true" multiValued="true"/>
>> <field name="body" type="string" stored="true" indexed="false"/>
>>
>> <!-- Use the indexed/un-stored "content" field for searching -->
>> <copyField source="body" dest="content" />
>> <!-- field for the QueryParser to use when an explicit fieldname is absent -->
>> <defaultSearchField>content</defaultSearchField>
>>
>>
>>
>> When I use SolrAdmin and specify fl=body the search handler returns the
>> 'body' field with data as expected. Yet I get the following error when
>> running lucene2seq and specify '-f body':
>>
>>     IllegalArgumentException: Field 'body' does not exist in the index
>>
>>
>>
>> And if I specify '-f content', lucene2seq runs without errors or warnings,
>> but seqdumper output shows no values for any key:
>>
>>     Key class: class org.apache.hadoop.io.Text Value Class: class org.apache.hadoop.io.Text
>>     Key: 96C4C76CF9D7449C724CA77CB8F650EAFD33E31C: Value:
>>     Key: D6842B81B8D09733B50BEDB4767C2A5C49E43B20: Value:
>>     Key: 61CB95FEE2C6BF0AC6E8A1F7738338CA36F42264: Value:
>>     Key: 0F9903B72A7C9F0373A5171403B3AAEB291B16E1: Value:
>>
>>
>> Can anyone give me any suggestions as to how to track down what might be
>> happening here?
>>
>> Many thanks,
>>
>> Terry