drill-user mailing list archives

From Nicolas Paris <nipari...@gmail.com>
Subject Re: DRILL 1.4 - newline in strings not supported
Date Mon, 01 Feb 2016 15:26:59 GMT
Hello Abdel,

I am creating Parquet files from those CSV files (CREATE TABLE syntax).
Basically, I have a text column, with a maximum of 50k characters,
containing newlines (the texts are extracted from PDFs). I have
several million tuples of text. I am subsetting the texts that contain
certain patterns (LIKE '%foo%' or a regex; sadly, I haven't found any mention
of regex in the documentation, i.e. an equivalent of the PostgreSQL "~" operator).
I usually use PostgreSQL or MonetDB to mine the texts, but I am
benchmarking/studying Apache Drill too.

Thanks,


2016-02-01 15:54 GMT+01:00 Abdel Hakim Deneche <adeneche@maprtech.com>:

> Hey Nicolas,
>
> What kind of queries are you running on your CSV file?
>
> On Sun, Jan 31, 2016 at 12:14 PM, Nicolas Paris <niparisco@gmail.com>
> wrote:
>
> > Hello,
> >
> > I am trying to import a CSV file containing large texts. They contain the
> > newline character "\n".
> > Apache Drill complains about that. There is a JIRA issue open about it:
> >
> >
> https://www.google.fr/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&uact=8&ved=0ahUKEwjUscyr7tTKAhXBVhoKHf0CAjYQFggpMAE&url=http%3A%2F%2Fmail-archives.apache.org%2Fmod_mbox%2Fdrill-dev%2F201505.mbox%2F%253CJIRA.12832322.1432356299000.15684.1432356317225%40Atlassian.JIRA%253E&usg=AFQjCNHEwAdEpCBmS1QeuLhdfL8SIdTx6Q&sig2=4EM_xXq2QWd8kmC3LT2-Wg
> >
> > Is there a workaround (other than removing \n from the texts)?
> >
> > Thanks in advance
> >
>
>
>
> --
>
> Abdelhakim Deneche
>
> Software Engineer
>
>   <http://www.mapr.com/>
>
>
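For readers of this thread, one possible preprocessing step is sketched below in Python. It does not answer the request for a workaround that leaves the newlines in place: it rewrites the CSV so that every record sits on a single physical line, replacing each embedded line break with a reversible placeholder (the two characters backslash and n) instead of deleting it. The function name `flatten_newlines` and the placeholder are hypothetical choices for illustration, not anything provided by Drill, and the sketch assumes the texts do not already contain that two-character sequence.

```python
import csv
import io


def flatten_newlines(src, dst, placeholder="\\n"):
    """Copy CSV rows from src to dst, replacing line breaks inside fields.

    src and dst are text file objects; open real files with newline=""
    so the csv module sees embedded line breaks unchanged.
    Returns the number of rows written (header included).
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    count = 0
    for row in reader:
        writer.writerow([
            # Handle Windows, old Mac and Unix line endings, in that order.
            field.replace("\r\n", placeholder)
                 .replace("\r", placeholder)
                 .replace("\n", placeholder)
            for field in row
        ])
        count += 1
    return count


# Small in-memory usage example (hypothetical data).
sample = 'id,text\n1,"first line\nsecond line"\n2,plain\n'
out = io.StringIO()
rows = flatten_newlines(io.StringIO(sample), out)
print(rows)
print(out.getvalue())
```

Because the placeholder is kept rather than dropped, the original line breaks can be restored later with a plain string replacement once the data has been loaded, provided the assumption about the placeholder holds.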
