lucene-solr-user mailing list archives

From Nasseam Elkarra <nass...@bodukai.com>
Subject Re: Importing CSV file slow/crashes
Date Wed, 07 Oct 2009 01:54:52 GMT
Hello Yonik,

Thank you for looking into this. Your question about whether I'm using
stock Solr pointed me in the right direction. I am in fact using a patched
version of Solr to get hierarchical facet support
(http://issues.apache.org/jira/browse/SOLR-64). I took out the 4
hierarchical facet fields from the schema and the import was back to its
normal time of less than a minute. This same configuration worked fine
with the 5/1 patched build.

Here is the field definition:
<fieldType name="hierarchy" class="solr.HierarchicalFacetField"  
omitNorms="true" positionIncrementGap="0" indexed="true"  
stored="false" delimiter="/" />

<!-- fields -->
<field name="category" type="hierarchy" indexed="true" stored="true"  
multiValued="true"/>
<field name="category_seo" type="hierarchy" indexed="true"  
stored="true" multiValued="true"/>

<!-- facet fields -->
<field name="category_facet" type="hierarchy" indexed="true"  
stored="false" multiValued="true"/>
<field name="category_seo_facet" type="hierarchy" indexed="true"  
stored="false" multiValued="true"/>

<copyField source="category" dest="category_facet"/>
<copyField source="category_seo" dest="category_seo_facet"/>

CSV file snippet:
category,category_seo
"T-Shirt Mens/Crew Neck/","t-shirt-mens/crew-neck/"
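
For illustration only, here is a minimal Python sketch of how a value like the one above could expand into one entry per hierarchy level when split on the "/" delimiter. This is an assumption about the general technique, not the actual behavior of the SOLR-64 patch or of solr.HierarchicalFacetField; the function name expand_path is hypothetical.

```python
def expand_path(value, delimiter="/"):
    """Return one prefix per hierarchy level of a delimited path.

    Assumption: empty segments (such as the one produced by a trailing
    delimiter) are ignored. This is illustrative, not the patch's logic.
    """
    parts = [p for p in value.split(delimiter) if p]
    return [delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]


# Example with the values from the CSV snippet.
for raw in ("T-Shirt Mens/Crew Neck/", "t-shirt-mens/crew-neck/"):
    print(raw, "->", expand_path(raw))
```

If the field type does something along these lines, each row yields several terms per field, and the two copyField rules would repeat that work for the facet fields.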

Thanks again!
Nasseam

On Oct 6, 2009, at 3:22 PM, Yonik Seeley wrote:

> On Tue, Oct 6, 2009 at 1:06 PM, Nasseam Elkarra  
> <nasseam@bodukai.com> wrote:
>> I had a dev build of 1.4 from 5/1/2009 and importing a 20K row took less
>> than a minute. Updating to the latest as of yesterday, the import is really
>> slow and I had to cancel it after a half hour. This prevented me from
>> upgrading a few months ago as well.
>
> I haven't had any success at replicating this problem.
>
> I just tried a 100K row CSV file, consisting of an id and a few text
> fields.  The total size of the file is 79MB.
>
> On trunk (today): 22 seconds to index, another 5-7 seconds to commit
> 5/1 version: 28 seconds to index, another 8 seconds to commit
>
> Then I modified the 5/1 schema to closer match the trunk schema
> (removing defaults, copyfields that could slow things down).
> Modified 5/1 version: 25 seconds to index, another 8 seconds to commit
>
> I only did 2 runs with trunk and 2 with one from 5/1, so the accuracy
> is probably low... but good enough to see there wasn't a problem in
> this test.
>
> We really need more info to help reproduce this.
> Are you using stock solr?  Do you have any custom plugins, analyzers,
> token filters, etc?
>
> You're going to need to provide something so others can reproduce this.
>
> -Yonik
> http://www.lucidimagination.com
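
A timing comparison like the one in the quoted message could be scripted roughly as follows. This is a sketch under assumptions: it posts the file to Solr's CSV update handler at /update/csv, and the base URL, file name, and function names are hypothetical placeholders rather than anything from this thread.

```python
import time
import urllib.parse
import urllib.request


def csv_update_url(base_url, commit=False):
    """Build the URL of the CSV update handler, with an explicit commit flag."""
    query = urllib.parse.urlencode({"commit": "true" if commit else "false"})
    return f"{base_url.rstrip('/')}/update/csv?{query}"


def timed_csv_import(base_url, csv_path, commit=False):
    """POST a CSV file to Solr and return the elapsed wall-clock seconds."""
    with open(csv_path, "rb") as f:
        body = f.read()
    request = urllib.request.Request(
        csv_update_url(base_url, commit),
        data=body,
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        response.read()  # wait for Solr to finish handling the request
    return time.perf_counter() - start


# Hypothetical usage (requires a running Solr instance):
# seconds = timed_csv_import("http://localhost:8983/solr", "products.csv")
# print(f"indexed in {seconds:.1f}s")
```

Running it once against each build, with and without the hierarchy fields in the schema, should show whether the slowdown follows the patched field type.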

