lucene-solr-user mailing list archives

From Erik Hatcher <>
Subject Re: Can Apache Solr Handle TeraByte Large Data
Date Tue, 04 Aug 2015 12:03:37 GMT
If your data consists only of an id (the full filename) and a filename field (indexed, tokenized),
40M of those documents will fit comfortably into a single shard, provided there is enough RAM to operate.

I know SolrJ is tossed out there a lot as a/the way to index - but if you’ve got a directory
tree of files and want to index _just_ the file names, then a shell script that generates a
CSV could be easy and clean.  It’s trivial to `bin/post -c <your collection> data.csv`
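A minimal sketch of that approach: walk a directory tree, write one CSV row per file, then hand the CSV to the post tool. The root directory, collection name, and field names here are assumptions, not anything from the thread; paths containing commas or quotes would need CSV escaping, which this sketch skips.

```shell
#!/bin/sh
# Sketch: emit a CSV of full paths (id) and base filenames, then post it.
# ROOT and COLLECTION are hypothetical values - adjust for your setup.
ROOT=/data/docs
COLLECTION=files

echo "id,filename" > data.csv
find "$ROOT" -type f 2>/dev/null | while IFS= read -r path; do
  printf '%s,%s\n' "$path" "$(basename "$path")" >> data.csv
done

# Index the CSV if the Solr post tool is available in the current directory.
if [ -x bin/post ]; then
  bin/post -c "$COLLECTION" data.csv
fi
```

Because the full path is the id, re-running the script over the same tree updates existing documents rather than duplicating them.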

Erik Hatcher, Senior Solutions Architect <>

> On Aug 4, 2015, at 5:51 AM, Mugeesh Husain <> wrote:
> Thanks @Alexandre, Erickson, and Hatcher.
> I will generate an MD5 id from the filename using Java.
> I can do that nicely with SolrJ because I am a Java developer. Apart from
> that, the question remains that the data is too large; I think it will break into
> multiple shards (cores).
> With multi-core indexing, how can I detect duplicate ids while reindexing
> the whole set (using SolrJ), and
> how can I check that one core contains a given amount of data and another the rest?
> I have decided to do it with SolrJ because I don't have a good
> understanding of DIH for the kind of operation my requirement needs. I
> googled but was unable to find a DIH example that I could apply to my
> problem.
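The MD5-of-filename idea in the quoted message can be sketched in shell as well (the path below is a made-up example). Hashing the full path gives a deterministic id, so reindexing the same file overwrites the existing document under Solr's uniqueKey rather than creating a duplicate.

```shell
# Sketch: derive a deterministic document id by hashing the full path.
# "path" is a hypothetical example; md5sum is the GNU coreutils tool.
path=/data/docs/report.pdf
id=$(printf '%s' "$path" | md5sum | cut -d' ' -f1)
echo "$id"
```

The same hash-of-path scheme works regardless of which shard the document lands on, since document routing is itself keyed on the id.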
