lucene-solr-user mailing list archives

From "Dyer, James" <James.D...@ingramcontent.com>
Subject RE: DIH nested entities don't work
Date Fri, 09 Nov 2012 15:34:12 GMT
Here are things I would try:
- You need to package the patch from SOLR-2943 in your jar as well as SOLR-2613 (to get the
class DIHCachePersistCacheProperties)

- You need to specify "cacheImpl", not "persistCacheImpl"

- You are correct to use "persistCacheName" & "persistCacheBaseDir", contrary to the test
case, in which these parameters are extraneous and out of date.

- I wouldn't cache the parent entity, just the child.

- Don't specify persistCachePartitionNumber unless you're actually trying to partition your
caches (I wouldn't try this at first).

What will happen is that it loops through the parent's result set, document by document.
On the first iteration, it notes that the child entity's cache hasn't been initialized
and builds a cache for it. Then, on each iteration, it pulls the child rows out of the
cache while looping through the parent's result set.
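
A minimal sketch of how the suggestions above might look when applied to the configuration quoted below. It is untested and assumes the patched jar accepts these attribute names as written; the entity names, queries, cache class, and cache directory are copied from the quoted message, and the berkleyInternal* attributes are left out only to keep the sketch short.

```xml
<document>
    <!-- Parent entity: not cached, no cache attributes at all. -->
    <entity name="END_FRG_PRODUCTS_VW"
            processor="SqlEntityProcessor"
            query="select PDT_ID, SEARCH_TITLE from END_FRG_PRODUCTS_VW">
        <!-- Child entity: cached, using "cacheImpl" rather than "persistCacheImpl".
             persistCachePartitionNumber is omitted because the cache is not partitioned. -->
        <entity name="END_FRG_FEATURES_VW"
                processor="SqlEntityProcessor"
                cacheImpl="org.apache.solr.handler.dataimport.BerkleyBackedCache"
                persistCacheName="FEATURE"
                persistCacheBaseDir="d:\cacheloc"
                cacheKey="PDT_ID"
                cacheLookup="END_FRG_PRODUCTS_VW.PDT_ID"
                query="select * from END_FRG_FEATURES_VW"/>
    </entity>
</document>
```

With a layout like this, the child query should run once to fill the cache, and each parent row is then matched against it through cacheKey and cacheLookup.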

Hopefully this will work better for you.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: mroosendaal [mailto:mroosendaal@yahoo.com] 
Sent: Friday, November 09, 2012 12:39 AM
To: solr-user@lucene.apache.org
Subject: RE: DIH nested entities don't work

Hi James,

What I did:
* built a jar from the patch
* downloaded the BDB library
* added them to my classpath
* downloaded a nightly 4.1 Solr build
* created a db config according to:
http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestEphemeralCache.java

Although I got things working, I stopped the process after 2 hours of
indexing. For that amount of data Endeca took 1h15. After looking at some
of the tests in the patch, I configured the data-config.xml as follows:
<document>
    <entity name="END_FRG_PRODUCTS_VW"
            processor="SqlEntityProcessor"
            persistCacheImpl="org.apache.solr.handler.dataimport.BerkleyBackedCache"
            persistCacheName="END_FRG_PRODUCTS_VW"
            persistCachePartitionNumber="0"
            persistCacheBaseDir="d:\cacheloc"
            berkleyInternalCacheSize="1000000"
            berkleyInternalShared="true"
            query="select PDT_ID, SEARCH_TITLE from END_FRG_PRODUCTS_VW">
        <entity name="END_FRG_FEATURES_VW"
                processor="SqlEntityProcessor"
                persistCacheImpl="org.apache.solr.handler.dataimport.BerkleyBackedCache"
                persistCacheName="FEATURE"
                cacheKey="PDT_ID"
                cacheLookup="END_FRG_PRODUCTS_VW.PDT_ID"
                berkleyInternalCacheSize="1000000"
                berkleyInternalShared="true"
                persistCacheBaseDir="d:\cacheloc"
                query="select * from END_FRG_FEATURES_VW"/>
    </entity>
</document>

The behaviour was different this time (snapshot from the indexing after
8 minutes: Requests: 2899, Fetched: 28974398, Skipped: 0, Processed: 2258),
but it was still slow, and the parameter 'persistCacheBaseDir' has no
effect. With the previous configuration it had made only 2 requests and
hadn't processed anything after 2 hours.

Hope you can help me.

Thanks,
Maarten






