lucene-solr-user mailing list archives

From "Bruno Mannina" <bmann...@free.fr>
Subject RE: Is Solr can do that ?
Date Mon, 24 Jun 2019 08:58:08 GMT
Hello Sam,

First, thanks for your answer.

I don't know the number of documents yet; I only know that they will be text, PDF, Word, XLS,
etc.
I will try to get more information about the number of documents.

I don't know Tika; I will look into it.

Thanks,
Bruno


-----Original Message-----
From: Samuel Kasimalla [mailto:skasimalla@gmail.com]
Sent: Friday, June 21, 2019 18:56
To: solr-user@lucene.apache.org
Subject: Re: Is Solr can do that ?

Hi Bruno,

Assuming you meant 30 TB, the first step is to use the Tika parser to convert the rich documents
into plain text.
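Here is a minimal sketch of that first step. It assumes a Tika Server instance is running locally on its default port 9998. The URL and the helper names `build_tika_request` / `extract_text` are illustrative and do not come from this thread. Each rich document is sent with PUT to Tika Server's `/tika` endpoint with an `Accept: text/plain` header, and the extracted plain text comes back in the response body:

```python
import urllib.request

# Assumption: a Tika Server is running locally on its default port (9998).
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(payload: bytes, url: str = TIKA_URL) -> urllib.request.Request:
    """Build a PUT request asking Tika Server to return the document as plain text."""
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={"Accept": "text/plain"},
    )


def extract_text(path: str, url: str = TIKA_URL, timeout: float = 60.0) -> str:
    """Send one rich document (PDF, Word, XLS, ...) to Tika and return its text.

    Assumes the server answers in UTF-8; adjust the decoding if yours does not.
    """
    with open(path, "rb") as f:
        request = build_tika_request(f.read(), url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

The resulting plain text can then be indexed into Solr like any other text field. At tens of terabytes you would likely run several Tika instances in parallel rather than a single loop, but that is outside the scope of this sketch.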

We need the number of documents. The unofficial rule of thumb is about
50 million documents per shard, though of course many parameters are involved. It's a
simple question, but the answer is not so simple :).
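The rule of thumb above can be turned into a rough shard count once the document count is known. The sketch below treats the 50-million figure as a ceiling per shard and rounds up. It then builds a SolrCloud Collections API `CREATE` URL using the real `numShards` and `replicationFactor` parameters. The base URL, the collection name `documents`, and the helper names are hypothetical. A replication factor of 1 is chosen only for illustration. Real shard sizing depends on document size, schema, query load and hardware, so treat the result as a starting point for testing:

```python
from urllib.parse import urlencode

# Unofficial rule of thumb quoted in this thread, not a hard Solr limit.
DOCS_PER_SHARD = 50_000_000


def estimate_shards(num_docs: int, docs_per_shard: int = DOCS_PER_SHARD) -> int:
    """Smallest shard count (at least 1) keeping each shard at or under docs_per_shard."""
    if num_docs < 0 or docs_per_shard <= 0:
        raise ValueError("num_docs must be >= 0 and docs_per_shard > 0")
    # Integer ceiling division, avoiding floating-point rounding on large counts.
    return max(1, -(-num_docs // docs_per_shard))


def collections_create_url(base_url: str, name: str, num_shards: int,
                           replication_factor: int = 1) -> str:
    """Build a Collections API CREATE URL (hypothetical base URL and collection name)."""
    query = urlencode({
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    })
    return f"{base_url}/admin/collections?{query}"
```

For example, a hypothetical 120 million documents gives `estimate_shards(120_000_000) == 3`. Those shards can then be spread across several servers in a SolrCloud cluster.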

Hope this helps.

Thanks
Sam
https://www.linkedin.com/in/skasimalla/

On Fri, Jun 21, 2019 at 12:49 PM Matheo Software Info <info@matheo-software.com> wrote:

> Dear Solr User,
>
> My question is very simple :) I would like to know if Solr can process
> around 30 To (terabytes) of data (PDF, text, Word, etc.).
>
> What is the best way to index this much data? Several servers?
> Several shards? Something else?
>
> Many thanks for your information,
>
> Best Regards,
> Bruno Mannina
> www.matheo-software.com
> www.patent-pulse.com
> Tel. +33 0 970 738 743
> Mob. +33 0 634 421 817

