lucene-solr-user mailing list archives

From "Bruno Mannina" <bmann...@free.fr>
Subject RE: Is Solr can do that ?
Date Mon, 24 Jun 2019 09:06:00 GMT
Hello Erick,

Well, I do not know Tika yet; I will of course study it.

Thanks for the info concerning solrj and Tika.

Bruno

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com]
Sent: Friday, June 21, 2019 19:10
To: solr-user@lucene.apache.org
Subject: Re: Is Solr can do that ?

What Sam said. 

Here's something to get you started on how and why it's better to run Tika yourself
(for example from a SolrJ client) rather than shipping the docs to Solr and having
ExtractingRequestHandler do the extraction on Solr: https://lucidworks.com/2012/02/14/indexing-with-solrj/
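
As a rough illustration of that idea (extract the text on the client, then send only plain text to Solr), here is a minimal Python sketch. It is not the SolrJ code from the linked article. The `extract_text` callable is a hypothetical stand-in for whatever Tika integration you choose, the field names `id` and `text` are assumptions about your schema, and the collection name in the URL is made up. The sketch only builds the JSON update requests; it does not send them.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterable, Iterator, List


def chunks(items: Iterable, size: int) -> Iterator[List]:
    """Yield successive lists of at most `size` items."""
    batch: List = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def make_docs(paths: Iterable[str],
              extract_text: Callable[[str], str]) -> List[Dict[str, str]]:
    """Turn file paths into plain-text documents on the client side.

    `extract_text` is a placeholder for a Tika call; `id` and `text`
    are assumed field names, so adapt them to your own schema.
    """
    return [{"id": path, "text": extract_text(path)} for path in paths]


def build_update_request(update_url: str,
                         docs: List[Dict[str, str]]) -> urllib.request.Request:
    """Build (but do not send) a JSON POST for Solr's update handler."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        update_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical URL and file names, for illustration only.
    url = "http://localhost:8983/solr/mycollection/update"
    files = ["a.pdf", "b.docx", "c.txt"]
    for batch in chunks(files, 2):
        docs = make_docs(batch, lambda p: "extracted text of " + p)
        request = build_update_request(url, docs)
        print(request.get_method(), request.full_url, len(request.data), "bytes")
```

Sending the documents in batches keeps the number of HTTP round trips down, and a file that fails to parse then fails in the client process instead of on the Solr node.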

Best,
Erick

> On Jun 21, 2019, at 9:56 AM, Samuel Kasimalla <skasimalla@gmail.com> wrote:
> 
> Hi Bruno,
> 
> Assuming you meant 30 TB, the first step is to use the Tika parser to
> convert the rich documents into plain text.
> 
> We need the number of documents. The unofficial word on the street is
> about 50 million documents per shard. Of course, a lot of parameters are
> involved in this; it's a simple question, but the answer is not so simple :).
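
To make that rule of thumb concrete, here is a small Python sketch of the arithmetic. The 50 million figure is the unofficial number quoted above, not a Solr limit, and the 100 KB average file size is purely a hypothetical placeholder, since the thread never gives the real document count.

```python
def estimate_doc_count(total_bytes: int, avg_doc_bytes: int) -> int:
    """Rough document count from corpus size and an assumed average file size."""
    if total_bytes < 0 or avg_doc_bytes <= 0:
        raise ValueError("sizes must be positive")
    return total_bytes // avg_doc_bytes


def estimate_shards(total_docs: int, docs_per_shard: int = 50_000_000) -> int:
    """Shards needed if each shard holds at most `docs_per_shard` documents.

    The default is the informal rule of thumb from the thread; real
    capacity depends on document size, schema, queries, and hardware.
    """
    if total_docs < 0 or docs_per_shard <= 0:
        raise ValueError("counts must be positive")
    # Integer ceiling division, with a minimum of one shard.
    return max(1, -(-total_docs // docs_per_shard))


if __name__ == "__main__":
    corpus_bytes = 30 * 10**12   # 30 TB, as stated in the question
    avg_doc_bytes = 100_000      # hypothetical average of 100 KB per file
    docs = estimate_doc_count(corpus_bytes, avg_doc_bytes)
    print(docs, "documents ->", estimate_shards(docs), "shards")
```

With those assumed numbers the estimate is 300 million documents and 6 shards. A different average file size changes the result proportionally, which is why the document count is the first thing to find out.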
> 
> Hope this helps.
> 
> Thanks
> Sam
> https://www.linkedin.com/in/skasimalla/
> 
> On Fri, Jun 21, 2019 at 12:49 PM Matheo Software Info <info@matheo-software.com> wrote:
> 
>> Dear Solr User,
>>
>> My question is very simple :) I would like to know if Solr can process
>> around 30 TB of data (PDF, text, Word, etc.)?
>>
>> What is the best way to index this much data? Several servers?
>> Several shards? Something else?
>>
>> Many thanks for your information,
>>
>> Best regards,
>> Bruno Mannina
>> www.matheo-software.com
>> www.patent-pulse.com
>> Tel. +33 0 970 738 743
>> Mob. +33 0 634 421 817

