drill-user mailing list archives

From Khurram Faraaz <kfar...@maprtech.com>
Subject Re: Last Column showing blank in csv file
Date Sat, 03 Dec 2016 12:12:10 GMT
I am not sure this will work (I haven't tried it on Windows 10), but
setting lineDelimiter to '\r\n' should help in your case. Can you please
try it? For details, see DRILL-3149
<https://issues.apache.org/jira/browse/DRILL-3149>.

Example:

select * from table(dfs.`my_table`(type => 'text', lineDelimiter => '\r\n'))
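
To illustrate why the line delimiter may matter here, below is a minimal
Python sketch (not Drill code). It assumes a reader that splits records on
'\n' only; the `parse` helper and the sample data are hypothetical. Under
that assumption the '\r' stays attached to the last field of every line,
including the header, so the last column would be named 'class\r' rather
than 'class'.

```python
# Sketch: how CRLF-terminated CSV text looks to a reader that splits
# records on "\n" only, versus one that splits on "\r\n".
# `parse` is a hypothetical helper, not part of Drill.

data = (
    "sepalen,sepalwidth,patelen,patelwidth,class\r\n"
    "5.1,3.5,1.4,Iris-setosa,0.2\r\n"
)


def parse(text, line_delimiter):
    """Split text into records, then each record into comma-separated fields."""
    records = [rec for rec in text.split(line_delimiter) if rec]
    return [rec.split(",") for rec in records]


lf_rows = parse(data, "\n")        # delimiter does not match the file
crlf_rows = parse(data, "\r\n")    # delimiter matches the file

# repr() makes the otherwise invisible carriage return visible.
print(repr(lf_rows[0][-1]))    # header name with a trailing '\r'
print(repr(crlf_rows[0][-1]))  # clean header name
```

This would also be consistent with the observation quoted below that adding
a trailing comma to the header makes the query work: the stray '\r' then
lands in an extra, unnamed column instead of in the name 'class'.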

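If you would rather convert the file, as suggested in the quoted messages
below, and dos2unix is not at hand, a rough Python equivalent could look
like the following. This is only a sketch: the function names are made up
for illustration, and it reads the whole file into memory, so it suits
small files like this one.

```python
from pathlib import Path


def crlf_to_lf(raw: bytes) -> bytes:
    """Replace every CRLF pair with a single LF; other bytes are untouched."""
    return raw.replace(b"\r\n", b"\n")


def convert_file(src: str, dst: str) -> int:
    """Write an LF-only copy of src to dst; return how many CRLF pairs were found."""
    raw = Path(src).read_bytes()
    Path(dst).write_bytes(crlf_to_lf(raw))
    return raw.count(b"\r\n")
```

For example, convert_file("tmp.csv", "tmp_unix.csv") (hypothetical file
names) would leave the original untouched and return a non-zero count if
the file had Windows line endings.
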

On Sat, Dec 3, 2016 at 2:28 AM, Kunal Khatua <kkhatua@maprtech.com> wrote:

> There is a dos2unix utility for Linux that allows you to substitute the
> multichar newline with the single char newline.
>
> For Windows, you can use either a similar util on SourceForge or the
> CygUtils (part of the Cygwin shell, I believe) to achieve the same
> conversion.
>
> In the meantime, as Abhishek suggested, please file a JIRA for this so
> that the issue can be tracked and fixed in a future release.
>
> Thanks
> Kunal
>
> On Wed 30-Nov-2016 8:24:00 PM, Abhishek Girish <abhishek.girish@gmail.com>
> wrote:
> Well the only workaround I got to work is this: I saved the file in
> UNIX/OS_X Format and executed the query successfully (with no blanks in the
> class column).
>
> Some text editors such as Notepad++ let you do this directly on Windows. Or
> you could use a utility like iconv or dos2unix for larger files like Leon
> suggested.
>
> Also, please file an enhancement JIRA for supporting a new 'newline'
> attribute (which takes multi-byte values) for text formats. This will be
> helpful for files on Windows, which use CRLF (\r\n) as the newline sequence.
>
> On Wed, Nov 30, 2016 at 7:56 PM, Abhishek Girish
> wrote:
>
> > Okay, I could reproduce the issue on Windows.
> >
> > 0: jdbc:drill:zk=local> select A.sepalen, A.sepalwidth, A.patelen,
> > A.patelwidth, A.class from dfs.`/drill/tmp.csv` as A;
> > +----------+-------------+----------+--------------+--------+
> > | sepalen  | sepalwidth  | patelen  | patelwidth   | class  |
> > +----------+-------------+----------+--------------+--------+
> > | 5.1      | 3.5         | 1.4      | Iris-setosa  |        |
> > | 4.9      | 3           | 1.4      | Iris-setosa  |        |
> > | 4.7      | 3.2         | 1.3      | Iris-setosa  |        |
> > | 4.6      | 3.1         | 1.5      | Iris-setosa  |        |
> > | 5        | 3.6         | 1.4      | Iris-setosa  |        |
> > | 5.4      | 3.9         | 1.7      | Iris-setosa  |        |
> > | 4.6      | 3.4         | 1.4      | Iris-setosa  |        |
> > | 5        | 3.4         | 1.5      | Iris-setosa  |        |
> > | 4.4      | 2.9         | 1.4      | Iris-setosa  |        |
> > | 4.9      | 3.1         | 1.5      | Iris-setosa  |        |
> > | 5.4      | 3.7         | 1.5      | Iris-setosa  |        |
> > | 4.8      | 3.4         | 1.6      | Iris-setosa  |        |
> > +----------+-------------+----------+--------------+--------+
> > 12 rows selected (0.277 seconds)
> >
> > I'll get back if and once I have a workaround / solution.
> >
> > On Wed, Nov 30, 2016 at 7:36 PM, Abhishek Girish
> > <abhishek.girish@gmail.com> wrote:
> >
> >> I forgot to mention that the other query works too.
> >>
> >> > select A.sepalen, A.sepalwidth, A.patelen, A.patelwidth, A.class
> >> > from dfs.tmp.`tmp.csv` as A;
> >> +----------+-------------+----------+--------------+--------+
> >> | sepalen  | sepalwidth  | patelen  | patelwidth   | class  |
> >> +----------+-------------+----------+--------------+--------+
> >> | 5.1      | 3.5         | 1.4      | Iris-setosa  | 0.2    |
> >> | 4.9      | 3           | 1.4      | Iris-setosa  | 0.2    |
> >> | 4.7      | 3.2         | 1.3      | Iris-setosa  | 0.2    |
> >> | 4.6      | 3.1         | 1.5      | Iris-setosa  | 0.2    |
> >> | 5        | 3.6         | 1.4      | Iris-setosa  | 0.2    |
> >> | 5.4      | 3.9         | 1.7      | Iris-setosa  | 0.4    |
> >> | 4.6      | 3.4         | 1.4      | Iris-setosa  | 0.3    |
> >> | 5        | 3.4         | 1.5      | Iris-setosa  | 0.2    |
> >> | 4.4      | 2.9         | 1.4      | Iris-setosa  | 0.2    |
> >> | 4.9      | 3.1         | 1.5      | Iris-setosa  | 0.1    |
> >> | 5.4      | 3.7         | 1.5      | Iris-setosa  | 0.2    |
> >> | 4.8      | 3.4         | 1.6      | Iris-setosa  | 0.2    |
> >> +----------+-------------+----------+--------------+--------+
> >> 12 rows selected (0.31 seconds)
> >>
> >>
> >> On Wed, Nov 30, 2016 at 8:15 AM, Leon Clayton
> >> wrote:
> >>
> >>> Are we sure there are no hidden characters at the end of one or more
> >>> lines? Just thinking.
> >>>
> >>> Try running dos2unix on the file to check this theory.
> >>>
> >>>
> >>> > On 30 Nov 2016, at 10:45, Sanjiv Kumar wrote:
> >>> >
> >>> > Hello,
> >>> > Yes, you are right: select * from `tmp.csv` works fine, but if I
> >>> > select individual columns, the last column's data shows as blank.
> >>> > Run this query:
> >>> > select A.`sepalen`, A.`sepalwidth`, A.`patelen`, A.`patelwidth`,
> >>> > A.class from dfs.tmp.`copydata.csv` as A;
> >>> >
> >>> > If you run this query, the last column's data will be blank.
> >>> >
> >>> > On Wed, Nov 30, 2016 at 11:15 AM, Sanjiv Kumar
> >>> wrote:
> >>> >
> >>> >> I am using the latest version, 1.8, on the Windows 10 operating system.
> >>> >>
> >>> >> On Tue, Nov 29, 2016 at 11:40 AM, Sanjiv Kumar
> >>> >> wrote:
> >>> >>
> >>> >>> I already pasted the CSV file. Just copy it and save it as CSV. I am
> >>> >>> also attaching the CSV file. The query is the same as I mentioned above.
> >>> >>>
> >>> >>> select A.`sepalen`, A.`sepalwidth`, A.`patelen`, A.`patelwidth`,
> >>> >>> A.class from dfs.tmp.`copydata.csv` as A;
> >>> >>>
> >>> >>> On Mon, Nov 28, 2016 at 6:30 PM, Sanjiv Kumar
> >>> >>> wrote:
> >>> >>>
> >>> >>>> Yes, it is working, but what if I use:
> >>> >>>> select A.`sepalen`, A.`sepalwidth`, A.`patelen`, A.`patelwidth`,
> >>> >>>> A.class from dfs.tmp.`copydata.csv` as A;
> >>> >>>>
> >>> >>>> Why is my last column's data showing as blank?
> >>> >>>> One more thing. See my CSV file:
> >>> >>>>
> >>> >>>> sepalen,sepalwidth,patelen,patelwidth,class
> >>> >>>> 5.1,3.5,1.4,Iris-setosa,0.2
> >>> >>>> 4.9,3,1.4,Iris-setosa,0.2
> >>> >>>> 4.7,3.2,1.3,Iris-setosa,0.2
> >>> >>>> 4.6,3.1,1.5,Iris-setosa,0.2
> >>> >>>> 5,3.6,1.4,Iris-setosa,0.2
> >>> >>>> 5.4,3.9,1.7,Iris-setosa,0.4
> >>> >>>> 4.6,3.4,1.4,Iris-setosa,0.3
> >>> >>>> 5,3.4,1.5,Iris-setosa,0.2
> >>> >>>> 4.4,2.9,1.4,Iris-setosa,0.2
> >>> >>>> 4.9,3.1,1.5,Iris-setosa,0.1
> >>> >>>> 5.4,3.7,1.5,Iris-setosa,0.2
> >>> >>>> 4.8,3.4,1.6,Iris-setosa,0.2
> >>> >>>>
> >>> >>>> This is my previous file.
> >>> >>>> Now, if I add a comma after class, I get this new file:
> >>> >>>>
> >>> >>>> sepalen,sepalwidth,patelen,patelwidth,class,
> >>> >>>> 5.1,3.5,1.4,Iris-setosa,0.2
> >>> >>>> 4.9,3,1.4,Iris-setosa,0.2
> >>> >>>> 4.7,3.2,1.3,Iris-setosa,0.2
> >>> >>>> 4.6,3.1,1.5,Iris-setosa,0.2
> >>> >>>> 5,3.6,1.4,Iris-setosa,0.2
> >>> >>>> 5.4,3.9,1.7,Iris-setosa,0.4
> >>> >>>> 4.6,3.4,1.4,Iris-setosa,0.3
> >>> >>>> 5,3.4,1.5,Iris-setosa,0.2
> >>> >>>> 4.4,2.9,1.4,Iris-setosa,0.2
> >>> >>>> 4.9,3.1,1.5,Iris-setosa,0.1
> >>> >>>> 5.4,3.7,1.5,Iris-setosa,0.2
> >>> >>>> 4.8,3.4,1.6,Iris-setosa,0.2
> >>> >>>>
> >>> >>>>
> >>> >>>> And run this query:
> >>> >>>> select A.`sepalen`, A.`sepalwidth`, A.`patelen`, A.`patelwidth`,
> >>> >>>> A.class from dfs.tmp.`copydata.csv` as A;
> >>> >>>>
> >>> >>>> Then the output shows fine, but if the comma is not there, the
> >>> >>>> last column's data shows as blank.
> >>> >>>>
> >>> >>>> Is this a bug?
> >>> >>>> ......................
> >>> >>>> Thanks & Regards
> >>> >>>> *Sanjiv Kumar*
> >>> >>>>
