mahout-user mailing list archives

From Suneel Marthi <suneel_mar...@yahoo.com>
Subject Re: lucene2seq error: field does not exist in the index
Date Sat, 19 Apr 2014 02:56:19 GMT
Please file a jira for this. Thanks again.

> On Apr 18, 2014, at 10:34 PM, Terry Blankers <terry@amritanet.com> wrote:
> 
> Hi Frank,
> 
> In working with a small test index, if I change the 'body' field to indexed, it does
> indeed work as expected. It would be great if lucene2seq could be fixed to read un-indexed
> stored fields, as per its design, since I need to query various corpora where I don't have
> control over the schema. Is there anything else I can do at this point?
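> 
> A minimal sketch of that workaround, assuming the 'body' definition from the schema
> excerpt quoted further down in this thread, with only the indexed attribute flipped to
> true (the index has to be rebuilt for the change to take effect):
> 
> ```xml
> <!-- Assumption: same field as in the quoted schema; still stored, now also indexed,
>      so the field check Frank describes should be able to find it -->
> <field name="body" type="string" stored="true" indexed="true"/>
> ```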
> 
> Thanks,
> 
> Terry
> 
> 
>> On 4/16/14, 1:52 PM, Frank Scholten wrote:
>> Hi Terry,
>> 
>> What happens when you make the 'body' field indexed in your schema?
>> 
>> LuceneIndexHelper checks the field using an IndexSearcher so it might be
>> that the field has to be indexed as well as being stored, which would be a
>> bug because lucene2seq is designed to load stored fields.
>> 
>> Cheers,
>> 
>> Frank
>> 
>> 
>>> On Fri, Apr 11, 2014 at 5:33 AM, Terry Blankers <terry@amritanet.com> wrote:
>>> 
>>> Hi All, I'm very new to lucene2seq, so I'm not sure if it's just user
>>> error, but I'm seeing some unexpected behavior when running lucene2seq
>>> against my Solr index (4.7.1). I've tried both 0.9 and the trunk build
>>> of Mahout. (And BTW, I have been able to successfully run the Reuters
>>> example as a test baseline.)
>>> 
>>> 
>>> Here's the command I'm running:
>>> 
>>>    $MAHOUT_HOME/bin/mahout lucene2seq \
>>>      -i /home/ec2-user/solr/solr-data/solrindex/index -o solr/sequence \
>>>      -id key_sha1hex -f body -xm sequential -q topics:diabetes -n 500
>>> 
>>> 
>>> Excerpts from my solr schema:
>>> 
>>> <field name="content" type="text" stored="false" indexed="true" multiValued="true"/>
>>> <field name="body" type="string" stored="true" indexed="false"/>
>>> 
>>> <!-- Use the indexed/un-stored "content" field for searching -->
>>> <copyField source="body" dest="content" />
>>> <!-- field for the QueryParser to use when an explicit fieldname is absent -->
>>> <defaultSearchField>content</defaultSearchField>
>>> 
>>> 
>>> 
>>> When I use SolrAdmin and specify fl=body, the search handler returns the
>>> 'body' field with data as expected. Yet I get the following error when
>>> running lucene2seq with '-f body':
>>> 
>>>    IllegalArgumentException: Field 'body' does not exist in the index
>>> 
>>> 
>>> 
>>> And if I specify '-f content', lucene2seq runs without errors or warnings,
>>> but seqdumper output shows no values for any key:
>>> 
>>>    Key class: class org.apache.hadoop.io.Text Value Class: class org.apache.hadoop.io.Text
>>>    Key: 96C4C76CF9D7449C724CA77CB8F650EAFD33E31C: Value:
>>>    Key: D6842B81B8D09733B50BEDB4767C2A5C49E43B20: Value:
>>>    Key: 61CB95FEE2C6BF0AC6E8A1F7738338CA36F42264: Value:
>>>    Key: 0F9903B72A7C9F0373A5171403B3AAEB291B16E1: Value:
>>> 
>>> 
>>> Can anyone give me any suggestions as to how to track down what might be
>>> happening here?
>>> 
>>> Many thanks,
>>> 
>>> Terry
> 
