mahout-user mailing list archives

From Terry Blankers <te...@amritanet.com>
Subject Re: lucene2seq error: field does not exist in the index
Date Fri, 18 Apr 2014 20:18:51 GMT

No problem Suneel, I've been traveling & unavailable myself until now.

>
> On 4/13/14, 6:12 PM, Suneel Marthi wrote:
>> Apologies for the delayed response Terry.
>>
>> Mahout's presently at Lucene 4.6.1 (both 0.9 and trunk).  The 
>> practice so far has been to upgrade to the latest Lucene version 
>> right before a planned release.
>>
>> Not sure what has changed in Solr/Lucene 4.7.1.
>>
>> You could try a few things:
>> a) Is your index spread across multiple shards? 

No, I'm using a fairly simple installation with no sharding.

>> b) Upgrade Mahout locally to Lucene 4.7.1 and run ur tests again and 
>> see if that works.

Actually I'm using Solr 4.2.1 and I did build Mahout locally from trunk 
about 2 weeks ago against Hadoop 2.3. I've tested against my local build 
and against 0.9 binaries. Sorry for the confusion.

>> c) It could possibly be a bug in lucene2seq and we may not have 
>> adequate test coverage, could u create a unit test to reproduce this 
>> scenario?
>>
>> Would it be possible for you to share a sample index along with the Solr 
>> Schema for testing?

What's the best way to share my sample index and the schema? As an 
attachment to this email, post it inline as xml, or some other way? Am 
assuming something like 2 or 3 docs in xml format would be sufficient 
for a test?

Regards,

Terry

>>
>>
>>
>>
>> On Thursday, April 10, 2014 11:34 PM, Terry Blankers 
>> <terry@amritanet.com> wrote:
>> Hi All, I'm very new to trying to use lucene2seq so I'm not sure if
>> it's just user error, but I'm experiencing some unexpected behavior when
>> running lucene2seq against my solr index (4.7.1). I've tried using both
>> 0.9 and the trunk build of mahout. (And BTW, I have been able to
>> successfully run Reuters example as a test baseline.)
>>
>>
>> Here's the command I'm running:
>>
>>      $MAHOUT_HOME/bin/mahout lucene2seq -i
>>      /home/ec2-user/solr/solr-data/solrindex/index -o solr/sequence -id
>>      key_sha1hex -f body -xm sequential -q topics:diabetes -n 500
>>
>>
>> Excerpts from my solr schema:
>>
>> <field name="content" type="text" stored="false" indexed="true" multiValued="true"/>
>>
>> <field name="body" type="string" stored="true" indexed="false"/>
>>
>> <!-- Use the indexed/un-stored "content" field for searching -->
>> <copyField source="body" dest="content" />
>> <!-- field for the QueryParser to use when an explicit fieldname is absent -->
>> <defaultSearchField>content</defaultSearchField>
>>
>>
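
Looking at the excerpt again, neither field is both stored and indexed: 'body' is stored only and 'content' is indexed only. Whether lucene2seq needs both properties on the '-f' field is an assumption on my part, not something confirmed in this thread, but it seemed worth making the flags explicit. A minimal Python sketch that reads them out of the two field definitions (the <fields> wrapper and the "false" default for a missing attribute are hypothetical, added only so the snippet parses on its own):

```python
import xml.etree.ElementTree as ET

# Field definitions copied from the schema excerpt, wrapped in a
# hypothetical <fields> root so they parse as a single XML document.
SCHEMA_EXCERPT = """
<fields>
  <field name="content" type="text" stored="false" indexed="true" multiValued="true"/>
  <field name="body" type="string" stored="true" indexed="false"/>
</fields>
"""


def field_flags(schema_xml):
    """Return {field name: (stored, indexed)} for each <field> element.

    Assumption: a missing attribute is treated as "false". In a real Solr
    schema the default would come from the field type instead.
    """
    flags = {}
    for field in ET.fromstring(schema_xml).iter("field"):
        stored = field.get("stored", "false") == "true"
        indexed = field.get("indexed", "false") == "true"
        flags[field.get("name")] = (stored, indexed)
    return flags


if __name__ == "__main__":
    for name, (stored, indexed) in field_flags(SCHEMA_EXCERPT).items():
        print(f"{name}: stored={stored} indexed={indexed}")
```
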
>>
>> When I use SolrAdmin and specify fl=body the search handler returns the
>> 'body' field with data as expected. Yet I get the following error when
>> running lucene2seq and specify '-f body':
>>
>>      IllegalArgumentException: Field 'body' does not exist in the index
>>
>>
>>
>> And if I specify '-f content', lucene2seq runs without errors or
>> warnings, but seqdumper output shows no values for any key:
>>
>>      Key class: class org.apache.hadoop.io.Text Value Class: class org.apache.hadoop.io.Text
>>      Key: 96C4C76CF9D7449C724CA77CB8F650EAFD33E31C: Value:
>>      Key: D6842B81B8D09733B50BEDB4767C2A5C49E43B20: Value:
>>      Key: 61CB95FEE2C6BF0AC6E8A1F7738338CA36F42264: Value:
>>      Key: 0F9903B72A7C9F0373A5171403B3AAEB291B16E1: Value:
>>
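
To check whether every record is empty or only some, the seqdumper text output can be scanned for keys with nothing after "Value:". A rough Python sketch, assuming the output has been saved as text and that each record sits on one line in the "Key: ...: Value: ..." form shown in the dump (the function name and the sample string are hypothetical):

```python
import re

# Matches record lines of the form "Key: <key>: Value: <value>".
# The "Key class: ..." header line does not start with "Key: ", so it is skipped.
LINE_RE = re.compile(r"^Key: (?P<key>.*?): Value:(?P<value>.*)$")


def empty_value_keys(dump_text):
    """Return the keys whose value is empty or whitespace-only."""
    keys = []
    for line in dump_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and not match.group("value").strip():
            keys.append(match.group("key"))
    return keys


# Two records copied from the seqdumper output, plus its header line.
SAMPLE = """\
Key class: class org.apache.hadoop.io.Text Value Class: class org.apache.hadoop.io.Text
Key: 96C4C76CF9D7449C724CA77CB8F650EAFD33E31C: Value:
Key: D6842B81B8D09733B50BEDB4767C2A5C49E43B20: Value:
"""

if __name__ == "__main__":
    for key in empty_value_keys(SAMPLE):
        print("empty value for key", key)
```

If the list that comes back has as many entries as there are records, the field is yielding no text at all rather than failing on a few documents.
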
>>
>> Can anyone give me any suggestions as to how to track down what might be
>> happening here?
>>
>> Many thanks,
>>
>> Terry
>

