lucene-solr-user mailing list archives

From Konstantin Gribov <gros...@gmail.com>
Subject Re: Can Apache Solr Handle TeraByte Large Data
Date Mon, 03 Aug 2015 22:15:21 GMT
Upayavira, a manual commit isn't good advice, especially with small batches
or single documents, is it? The recommendations I see are mostly to use
autoCommit+autoSoftCommit instead of manual commits.

On Tue, 4 Aug 2015 at 1:00, Upayavira <uv@odoko.co.uk> wrote:

> SolrJ is just a "SolrClient". In pseudocode, you say:
>
> // SolrClient is abstract; HttpSolrClient is its HTTP implementation in SolrJ 5.x
> SolrClient client =
>     new HttpSolrClient("http://localhost:8983/solr/whatever");
>
> List<SolrInputDocument> docs = new ArrayList<>();
> SolrInputDocument doc = new SolrInputDocument();
> doc.addField("id", "abc123");
> doc.addField("some-text-field", "I like it when the sun shines");
> docs.add(doc);
> client.add(docs);
> client.commit();
>
> (warning, the above is typed from memory)
>
> So, the question is simply how many documents do you add to docs before
> you do client.add(docs);
>
> And how often (if at all) do you call client.commit().
>
> So when you are told "Use SolrJ", really, you are being told to write
> some Java code that happens to use the SolrJ client library for Solr.
>
> Upayavira
>
>
> On Mon, Aug 3, 2015, at 10:01 PM, Alexandre Rafalovitch wrote:
> > Well,
> >
> > If it is just file names, I'd probably use SolrJ client, maybe with
> > Java 8. Read file names, split the name into parts with regular
> > expressions, stuff parts into different field names and send to Solr.
> > Java 8 has FileSystem walkers, etc to make it easier.
> >
> > You could do it with DIH, but it would be with nested entities and the
> > inner entity would probably try to parse the file. So, a lot of wasted
> > effort if you just care about the file names.
> >
> > Or, I would just do a directory listing in the operating system and
> > use regular expressions to turn it into a CSV file, which I would then
> > import into Solr directly.
> >
> > In all of these cases, the question would be which field is the ID of
> > the record to ensure no duplicates.
> >
> > Regards,
> >    Alex.
> >
> > ----
> > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> > http://www.solr-start.com/
> >
> >
> > On 3 August 2015 at 15:34, Mugeesh Husain <mugeesh@gmail.com> wrote:
> > > @Alexandre No, I don't need the content of the files. Let me restate
> > > my requirement.
> > >
> > > I have 40 million files stored in a file system, with file names
> > > like ARIA_SSN10_0007_LOCATION_0000129.pdf
> > >
> > > I only split the values out of the file name, and these values are
> > > what I have to index.
> > >
> > > I am interested in indexing these values into Solr, not the file
> > > contents.
> > >
> > > I have tested DIH against a file system and it works fine, but I
> > > don't know how to plug my own code into DIH: if my code extracts
> > > some values, how can I index them using DIH?
> > >
> > > If I use DIH, how do I perform the split operation and get the
> > > values from it?
> > >
> > >
> > >
> > >
> > >
> > > --
> > > View this message in context:
> > > http://lucene.472066.n3.nabble.com/Can-Apache-Solr-Handle-TeraByte-Large-Data-tp3656484p4220552.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
Best regards,
Konstantin Gribov
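
Editorial illustration (not code from any poster in this thread): the
"split the file name into fields, then send documents in batches" approach
discussed above might be sketched roughly as follows. It uses only the Java
standard library. The field names (id, part_0, part_1, ..., extension), the
class name and the batch size are assumptions made up for this example; the
thread does not say what each part of the file name means. The actual SolrJ
call is left as a comment, since it depends on the SolrJ version in use.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FileNameFields {
    // Assumed layout: underscore-separated tokens followed by an extension,
    // e.g. ARIA_SSN10_0007_LOCATION_0000129.pdf
    private static final Pattern NAME = Pattern.compile("^(.+)\\.([A-Za-z0-9]+)$");

    /** Splits one file name into a field map (hypothetical field names). */
    static Map<String, String> parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected file name: " + fileName);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        // The full file name serves as the unique key, so re-indexing the
        // same file overwrites the earlier document instead of duplicating it.
        fields.put("id", fileName);
        String[] parts = m.group(1).split("_");
        for (int i = 0; i < parts.length; i++) {
            fields.put("part_" + i, parts[i]);
        }
        fields.put("extension", m.group(2));
        return fields;
    }

    /** Parses all names and groups the results into batches of at most batchSize. */
    static List<List<Map<String, String>>> batches(List<String> fileNames, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<Map<String, String>>> result = new ArrayList<>();
        List<Map<String, String>> current = new ArrayList<>();
        for (String name : fileNames) {
            current.add(parse(name));
            if (current.size() == batchSize) {
                // In a real indexer, this is where one batch would be
                // converted to SolrInputDocuments and passed to client.add(...).
                result.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            result.add(current); // final, possibly smaller batch
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("ARIA_SSN10_0007_LOCATION_0000129.pdf"));
    }
}
```

With autoCommit/autoSoftCommit configured on the Solr side, as suggested at
the top of this message, such a loop would not need to call commit() itself.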
