sqoop-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jarek Cecho" <jar...@apache.org>
Subject Re: Review Request 28139: Support Time and List Type in CSV IDF
Date Wed, 19 Nov 2014 17:03:01 GMT


> On Nov. 19, 2014, 1 a.m., Jarek Cecho wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java,
lines 21-62
> > <https://reviews.apache.org/r/28139/diff/1/?file=766590#file766590line21>
> >
> >     Nit: Seems as unncecessary change in ordering.
> 
> Veena Basavaraj wrote:
>     I used the IDE ordering, I am not sure if there a standard for this.

You're correct that we don't have standard for import ordering, which is not a big deal in
my mind. We however don't like unnecessary changes such as changing import order just for
the sake of changing the order.


> On Nov. 19, 2014, 1 a.m., Jarek Cecho wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java,
lines 78-79
> > <https://reviews.apache.org/r/28139/diff/1/?file=766590#file766590line78>
> >
> >     Can you plase document how is array expected to be coded up in the IDF? It seems
that we don't have this in the code yet, so I would simply update the wiki:
> >     
> >     https://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+Intermediate+representation#Sqoop2Intermediaterepresentation-Intermediateformatrepresentationproposal
> >     
> >     (We can move the IDF proposal from wiki to code in subsequent JIRA)
> 
> Veena Basavaraj wrote:
>     I have a ticket for that. I would rather keep this in one place
>     https://issues.apache.org/jira/browse/SQOOP-1706

Ack.


> On Nov. 19, 2014, 1 a.m., Jarek Cecho wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java,
line 99
> > <https://reviews.apache.org/r/28139/diff/1/?file=766590#file766590line99>
> >
> >     I'm wondering if this one really needs to be public?
> 
> Veena Basavaraj wrote:
>     it is default access so we can use it in unit tests, there is no need to be public
yes

I see.


> On Nov. 19, 2014, 1 a.m., Jarek Cecho wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java,
line 102
> > <https://reviews.apache.org/r/28139/diff/1/?file=766590#file766590line102>
> >
> >     Based on the intermediate format proposal the correct Time format should be:
HH:MM:DD[.ZZZZZZ][+/-XX]
> >     
> >     https://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+Intermediate+representation#Sqoop2Intermediaterepresentation-Intermediateformatrepresentationproposal
> >     
> >     Can we add the missing parts?
> 
> Veena Basavaraj wrote:
>     that is support this as Time?
>     
>     so what are some of the examples you would like to be tested.?
>     
>     would be easy for me to add those to test cases
> 
> Veena Basavaraj wrote:
>     I am going to fix this is a follow up ticket, might be worth discussiin timestamp
as well.

I'm fine with finishing the work in subsequent ticket.


> On Nov. 19, 2014, 1 a.m., Jarek Cecho wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java,
line 321
> > <https://reviews.apache.org/r/28139/diff/1/?file=766590#file766590line321>
> >
> >     Why space as a separator for nested data types when we are already using comma
as a separator?
> 
> Veena Basavaraj wrote:
>     this is for the elements inside the array. We use comma between the 2 columns.
>     
>     
>     Another alternative would be encode this as JSON string, which I might try as Abe
suggested below.
>     
>     ['A' 'B'], 'text'

I don't see a reason why we need to have two different separators depending whether we are
in nested type or not, why not simply have: ["A","B"],"C" for type "Array(String),String"?
I would assume that this will become much easier for nested types.

It seems that PostgreSQL is the only database that supports Arrays from the dbs that I've
explored back in SQOOP-515, so I'm wondering how does pg_dump serializes the array? I would
suggest to follow that so that we can limit the number of transformations when reading from
pg_dump in the future. Another project worth exploring would be Hive to see how arrays are
serialized.

The goal here should be to enable the direct tools (mysqldump, pg_dump) to be equivalent with
the CSV IDF with as few changes as possible to avoid unnecessary CPU cycles when using those
utilities.


- Jarek


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28139/#review62081
-----------------------------------------------------------


On Nov. 19, 2014, 6:58 a.m., Veena Basavaraj wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28139/
> -----------------------------------------------------------
> 
> (Updated Nov. 19, 2014, 6:58 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-1749
>     https://issues.apache.org/jira/browse/SQOOP-1749
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> see JIRA
> 
> Supporting Nested arrays is going to be dody logic
> 
> some of the assumptions here in parsing need more docs. Esp use of special characters
to parse nested arrays.
> 
> Welcome feedback if I can improve this more.
> 
> I am sure mores tests can be added.
> 
> 
> NOTE: WS will be addressed once the patch is reviewed for functionality and this needs
a rebase since I have other patches pending review that overlap.
> 
> 
> Diffs
> -----
> 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java
39a01c1b5b290903a6c93dbb8427af525515e2b2 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormat.java
5ef6fc64ec695fbdf2f9b6acec1eb0235fd675be 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/IntermediateDataFormatError.java
4d41679d0b4b9e04d05eef4dd5280f201bbe58e2 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/TestCSVIntermediateDataFormat.java
fcf6c3c15c6558cf21a0d82c78a51903d8fb371c 
> 
> Diff: https://reviews.apache.org/r/28139/diff/
> 
> 
> Testing
> -------
> 
> yes unit tests added. 
> 
> 
> File Attachments
> ----------------
> 
> SQOOP-1749-v2.patch
>   https://reviews.apache.org/media/uploaded/files/2014/11/19/cfdfc0ed-57c9-40d3-959b-f482f1600c03__SQOOP-1749-v2.patch
> 
> 
> Thanks,
> 
> Veena Basavaraj
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message