lucene-solr-user mailing list archives

From Jonathan Rochkind <>
Subject Re: Solr: Images, Docs and Binary data
Date Wed, 06 Apr 2011 18:28:46 GMT
I put binary data in an ordinary Solr stored field; it doesn't need any 
special schema.

I have run into trouble making sure the data is not corrupted on the way 
in during indexing. Whether that happens depends on exactly what form of 
communication is being used to index (SolrJ, SolrJ with EmbeddedSolr, 
DIH, etc.), as well as on settings in the container (e.g. Jetty or 
Tomcat) used to house Solr. But I think it's possible to get it working 
no matter what the path, and if you run into trouble someone may be able 
to help you.
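
One common way to keep bytes intact across text-based transports is to 
base64-encode them before indexing and decode them after retrieval. The 
sketch below is only an illustration of that idea, not necessarily what 
I do in my own setup; the field name `payload_b64` and the helper names 
are made up for the example, and the JSON round trip merely stands in 
for whatever path (SolrJ, DIH, HTTP) the document takes to Solr.

```python
import base64
import json


def build_solr_doc(doc_id: str, payload: bytes) -> dict:
    """Build a document dict whose binary payload is base64 text.

    The field name "payload_b64" is hypothetical; use whatever stored
    field your schema defines.
    """
    return {
        "id": doc_id,
        "payload_b64": base64.b64encode(payload).decode("ascii"),
    }


def extract_payload(doc: dict) -> bytes:
    """Recover the original bytes from a retrieved document."""
    return base64.b64decode(doc["payload_b64"])


# Every possible byte value, to catch encoding problems early.
raw = bytes(range(256))

# Simulate the trip over a text transport: serialize, then parse again.
wire = json.dumps([build_solr_doc("img-1", raw)])
restored = extract_payload(json.loads(wire)[0])

print(restored == raw)
```

Running a round-trip check like this against the real indexing path 
(index a known payload, fetch it back, compare) is a cheap way to find 
out whether a given client or container setting is mangling the data.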

My binary data is not very large though (generally under 1 meg).

However, in general, _indexing_ large data should be fine, although it 
will create a larger index, which can require more RAM, be slower, etc. 
But that's generally just a function of the total size of the index, or 
really the total number of unique terms; it doesn't matter whether the 
docs they come from are big or small.

_Storing_ large fields can sometimes be a problem: Lucene/Solr are 
really optimized as an index, not a key/value store.  Some people choose 
to _store_ their large objects in some external store (RDBMS, NoSQL 
key/value, whatever), and have the client application look up the 
objects themselves by primary key/unique ID, after the pk/uids 
themselves are retrieved from Solr. Use Solr for what it's good at, 
indexing, and use something else that is good at storing for storing 
large objects.  But other people sometimes store large objects directly 
in Solr without problems; it can depend on the exact nature of your 
index and use.
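
A minimal sketch of the external-store pattern, under some loud 
assumptions: an in-memory SQLite table stands in for the external store, 
and `search_ids` is a hypothetical stand-in for a Solr query that returns 
only unique IDs. None of these names come from Solr itself.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    """Create the external blob store (in-memory SQLite for the demo)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE blobs (id TEXT PRIMARY KEY, data BLOB NOT NULL)"
    )
    return conn


def put_blob(conn: sqlite3.Connection, doc_id: str, data: bytes) -> None:
    """Store one large object under the same ID that Solr indexes."""
    conn.execute(
        "INSERT OR REPLACE INTO blobs (id, data) VALUES (?, ?)",
        (doc_id, data),
    )
    conn.commit()


def fetch_blobs(conn: sqlite3.Connection, ids: list) -> dict:
    """Look up the objects for the IDs that the search returned."""
    if not ids:
        return {}
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, data FROM blobs WHERE id IN ({placeholders})",
        list(ids),
    )
    return {row[0]: bytes(row[1]) for row in rows}


# Hypothetical stand-in for the index: searchable text keyed by ID.
FAKE_INDEX = {"doc-1": "sunset beach photo", "doc-2": "quarterly report"}


def search_ids(term: str) -> list:
    """Pretend search: return only the IDs of matching documents."""
    return [doc_id for doc_id, text in FAKE_INDEX.items() if term in text]


store = make_store()
put_blob(store, "doc-1", b"\x89PNG...image bytes...")
put_blob(store, "doc-2", b"%PDF...report bytes...")

hits = search_ids("photo")          # step 1: the index returns IDs only
objects = fetch_blobs(store, hits)  # step 2: the store returns the bytes
print(sorted(objects))
```

The point of the split is that the index stays small, and the large 
objects are only read for the handful of hits the client actually needs.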

On 4/6/2011 2:09 PM, Ezequiel Calderara wrote:
> Another question that may be easier to answer: how can I store binary
> data? Any example schema?
> 2011/4/6 Ezequiel Calderara<>
>> Hello everyone, I need to know if someone has used Solr for indexing and
>> storing images (up to 16MB) or binary docs.
>> How does Solr behave with this type of docs? How does it affect performance?
>> Thanks Everyone
>> --
>> ______
>> Ezequiel.
>> Http://
