sqoop-dev mailing list archives

From "Jarek Jarcec Cecho (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-1811) IDF API changes
Date Mon, 01 Dec 2014 18:33:12 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230198#comment-14230198
] 

Jarek Jarcec Cecho commented on SQOOP-1811:
-------------------------------------------

My apologies for the confusion [~vybs]. The IDF implementation is required to provide the methods
{{getSqoopCSVData()}} and {{setSqoopCSVData()}} to work with the CSV-ish format. However, Sqoop
doesn't have to call these methods in every case, and hence the conversion should be "lazy" (performed
on demand). E.g. there might be a scenario where Sqoop, when working with Connectors 1 and 2, decides
to use the {{getData()}} and {{setData()}} methods (both connectors use the same IDF
implementation, so no conversion is required), while for Connectors
1 and 3 it decides to use {{getSqoopCSVData()}} and {{setSqoopCSVData()}} (because text happens
to be the best format to move data around). The first case should impose no CPU penalty
for converting into the CSV-ish format, because in that case the CSV-ish format is not used.
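
To make the "lazy" requirement concrete, here is a minimal sketch of what I have in mind. It is not the actual Sqoop code: the class names {{LazyIdf}} and {{ArrayIdf}} and the {{csvConversions}} counter are hypothetical, and the comma-joined text is a simplification that ignores the real CSV-ish quoting and escaping rules. The point is only that the CSV conversion runs when the CSV accessor is called, and not otherwise.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical stand-in for the IDF base class; only the four accessor
// names come from the discussion, everything else is illustrative.
abstract class LazyIdf<T> {
    protected T data;

    // Native accessors: no conversion of any kind happens here.
    public T getData() { return data; }
    public void setData(T data) { this.data = data; }

    // CSV accessors are abstract: each implementation converts on demand.
    public abstract String getSqoopCSVData();
    public abstract void setSqoopCSVData(String csv);
}

// Hypothetical implementation whose native format is an Object[] row.
class ArrayIdf extends LazyIdf<Object[]> {
    int csvConversions = 0; // counts how often a CSV conversion actually ran

    @Override
    public String getSqoopCSVData() {
        csvConversions++;
        // Simplified encoding: plain comma join, no quoting or escaping.
        return Arrays.stream(data)
                .map(String::valueOf)
                .collect(Collectors.joining(","));
    }

    @Override
    public void setSqoopCSVData(String csv) {
        csvConversions++;
        // Simplified decoding: every field comes back as a String.
        data = csv.split(",");
    }
}
```

With this shape, a transfer that only uses {{setData()}} and {{getData()}} leaves {{csvConversions}} at zero, which is the "no CPU penalty" case described above.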

{quote}
1. We have agreed to rename the API to getSqoopCSVData() and setSqoopCSVData(). I propose
making this final and moving the text field to the base class (IDF). My understanding was
that every IDF implementation will have to provide this string.
{quote}

I think that the suggestion to rename {{getTextData()}} and {{setTextData()}} to {{getSqoopCSVData()}}
and {{setSqoopCSVData()}} is a good idea and I support it. I still don't see any value
in defining either of those methods as final and/or requiring a {{text}} field in the
{{IntermediateDataFormat}} base class.

{quote}
2. getData() and getObjectData() even though I am not sure why both will be needed in all
cases. So in the case of AvroIDF, why would we need both?
getData() and setData() would get and set an Avro object that represents the row
{quote}

I don't think that all methods will be used during all data transfers. That is also why we
are defining those methods as abstract and why the implementation should be "lazy". I believe
that your example is correct. To sum it up for AvroIDF:

* {{getData()}} will return Avro object representing the row
* {{getObjectData()}} will convert the internal Avro object to return "plain" Java objects
* {{getSqoopCSVData()}} will convert the internal Avro object to return CSV-ish text
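
The three accessors above can be sketched roughly as follows. This is only an illustration under stated assumptions: a {{LinkedHashMap}} stands in for the Avro record so that the example has no external dependencies, the class name {{RecordIdfSketch}} is hypothetical, and the CSV text is a plain comma join without the quoting and escaping that the real CSV-ish format needs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch, not the Sqoop AvroIDF: a LinkedHashMap (field name
// to value, in field order) plays the role of the internal Avro record.
class RecordIdfSketch {
    private Map<String, Object> record = new LinkedHashMap<>();

    // Native view: hands back the internal record itself, no conversion.
    public Map<String, Object> getData() { return record; }
    public void setData(Map<String, Object> record) { this.record = record; }

    // Object view: converted on demand into "plain" Java objects.
    public Object[] getObjectData() {
        return record.values().toArray();
    }

    // Text view: converted on demand into (simplified) CSV-ish text.
    public String getSqoopCSVData() {
        StringJoiner joiner = new StringJoiner(",");
        for (Object value : record.values()) {
            joiner.add(String.valueOf(value));
        }
        return joiner.toString();
    }
}
```

Each conversion happens only inside the accessor that needs it, so a transfer that stays on {{getData()}} never pays for the other two.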


> IDF API changes
> ---------------
>
>                 Key: SQOOP-1811
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1811
>             Project: Sqoop
>          Issue Type: Sub-task
>          Components: sqoop2-framework
>            Reporter: Veena Basavaraj
>             Fix For: 1.99.5
>
>
> 1. Update the Javadocs for the IDF APIs.
> 2. Make getTextData final and call it getCSV and setCSV, so it is obvious that we want to enforce the CSV format.
> The following code can move to the base class IntermediateDataFormat and be made final, so there is no way to override it and we can enforce that all implementations return String instead of the generic T.
> {code}
> // hold the string in IDF base class
>  private final String text;
>  
>   public final String getCSVTextData() {
>     return text;
>   }
>  
>   public final void setCSVTextData(String text) {
>     this.text = text;
>   }
> {code}
> There is code in the CSVIDF implementation that has the rules for CSV parsing; it can be pulled out into CSV utils so that the connectors can use it.
> The T in CSV happens to be String, which is just a coincidence. If I write a new IDF implementation, T can be a custom object that encapsulates the whole row.
> Third, getData and setData can have custom implementations, so they can be overridden to return the generic type T.
> Correction :
> {code}
> // hold the string in IDF base class; the field is not final
>  private String text;
>  
>   public final String getCSVTextData() {
>     return text;
>   }
>  
>   public final void setCSVTextData(String text) {
>     this.text = text;
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
