sqoop-dev mailing list archives

From Eric Lin <eric....@cloudera.com>
Subject Re: Review Request 57551: SQOOP-2272 - Import decimal columns from mysql to hive 0.14
Date Tue, 28 Mar 2017 04:35:51 GMT


> On March 20, 2017, 4:07 p.m., Attila Szabo wrote:
> > Hey Eric,
> > 
> > At first glance I do understand your motivation here, and I have also attached some comments/requests around the implementation.
> > 
> > However, I'm a bit concerned about the big picture as well, because I feel that in its current form this would be a breaking change.
> > 
> > E.g. so far we have created tables in Hive with DOUBLE columns, but after applying the change all of these imports and tables would contain DECIMAL values. Wouldn't this change confuse end users? (For me it would be very unexpected, especially after a version upgrade!)
> > 
> > Questions on my side:
> > Am I right that your change would only affect Sqoop imports when the import ends up creating a new Hive table?
> > Wouldn't this cause problems on the end user side (e.g. in big incremental data ingest scenarios with many tables)?
> > What will happen with jobs inserting data into an already existing Hive table? Have you tested that scenario as well?
> > Although your change seems to be much more type/Hive agnostic, shouldn't we introduce a fallback scenario for the users (e.g. turning off the Decimal->Decimal mapping with an option)? Or should we just advise explicit type mapping for the users in this case?
> > 
> > Thanks for your answers,
> > Attila

Hi Attila,

Thanks for your feedback. To answer your questions:

> Am I right that your change would only affect Sqoop imports when the import ends up creating a new Hive table?
Yes, it only takes effect when creating a new table; append mode will not be affected.

> Wouldn't this cause problems on the end user side (e.g. in big incremental data ingest scenarios with many tables)?
I am not sure I follow your example here; could you give me a bit more detail on your use case?

> What will happen with jobs inserting data into an already existing Hive table? Have you tested that scenario as well?
I have tested this manually: the CREATE TABLE IF NOT EXISTS statement will be ignored, and the pre-existing table will not be changed. It is followed by a LOAD DATA INPATH command, so nothing about the table definition will change.
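
For illustration, this is roughly the statement sequence in that scenario (a sketch; the table name and HDFS path are made up):

    -- No-op when `orders` already exists, so the existing definition wins:
    CREATE TABLE IF NOT EXISTS `orders` ( ... );
    -- The data is then loaded into the existing table unchanged:
    LOAD DATA INPATH 'hdfs:///tmp/sqoop/orders' INTO TABLE `orders`;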

> Although your change seems to be much more type/Hive agnostic, shouldn't we introduce a fallback scenario for the users (e.g. turning off the Decimal->Decimal mapping with an option)? Or should we just advise explicit type mapping for the users in this case?
I would think that using --map-column-hive should be sufficient; correct me if I am wrong here.
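
For example, a user who wants to keep the old behaviour for a given column could override the mapping explicitly (a sketch; the connection string, table, and column names are made up):

    sqoop import --connect jdbc:mysql://db.example.com/shop --table orders \
      --hive-import --map-column-hive unit_price=DOUBLE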

Thanks


> On March 20, 2017, 4:07 p.m., Attila Szabo wrote:
> > src/java/org/apache/sqoop/hive/TableDefWriter.java
> > Lines 126-130 (patched)
> > <https://reviews.apache.org/r/57551/diff/4/?file=1668178#file1668178line127>
> >
> >     Why do we add "null" values to a typed Integer list here?
> >     It looks a bit weird at first glance, and could be the source of an unexpected NPE.
> >     
> >     Could you please provide some more information here?

Hi Attila, the function connManager.getColumnInfoForQuery returns the column info as an ArrayList of three Integer values (typeId, precision, and scale). As precision and scale do not apply to non-decimal column types, putting a value of "0" there would be misleading and make it hard for any code that needs to decide whether precision or scale should be included. So I think putting a NULL value makes sense, so that we can check for it accordingly. But if you think there is a better way, please let me know and I am happy to change it.
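
To make the distinction concrete (a sketch; values made up): a DECIMAL(10,2) column yields the list [Types.DECIMAL, 10, 2], while an INT column yields [Types.INTEGER, null, null]. A consumer can then test for applicability directly:

    // With a 0 placeholder instead, "no precision" would be
    // indistinguishable from a genuine precision of 0.
    Integer precision = info.get(1);
    if (precision != null) {
      // precision genuinely applies to this column's SQL type
    }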


> On March 20, 2017, 4:07 p.m., Attila Szabo wrote:
> > src/java/org/apache/sqoop/hive/TableDefWriter.java
> > Lines 142-145 (patched)
> > <https://reviews.apache.org/r/57551/diff/4/?file=1668178#file1668178line143>
> >
> >     We should not depend on null references.
> >     
> >     Please rather use an empty list and check whether it is still empty after the retrieval of the columnInfo.

The function connManager.getColumnInfoForQuery can return a NULL value, so I think we should check for that. I actually added a "columnTypes.size() == 0" check as well; however, it breaks quite a few test cases in the TestTableDefWriter class, so I will not add the empty-list check.
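
For clarity, keeping only the null guard would look roughly like this (a sketch; the exception message is made up):

    // Guard only against a null return; an empty map must remain legal,
    // because several TestTableDefWriter cases depend on it.
    Map<String, List<Integer>> columnTypes =
        connManager.getColumnInfoForQuery(options.getSqlQuery());
    if (columnTypes == null) {
      throw new IOException("Could not fetch column info for query: "
          + options.getSqlQuery());
    }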


> On March 20, 2017, 4:07 p.m., Attila Szabo wrote:
> > src/java/org/apache/sqoop/hive/TableDefWriter.java
> > Lines 192-200 (patched)
> > <https://reviews.apache.org/r/57551/diff/4/?file=1668178#file1668178line193>
> >
> >     I'm quite confused here as well regarding the null values (especially because if we put null values into positions 1 and 2 in the code before, then this will just assign null to variables which already hold null values).
> >     
> >     Could you please provide more highlight here?

There are two possible ways to get the columnTypes; see the code below:

    if (externalColTypes != null) {
      columnTypes = new SqlTypeMap<String, List<Integer>>();
      // Use pre-defined column types.
      for(String key : externalColTypes.keySet()) {
        List<Integer> info = new ArrayList<Integer>();
        info.add(externalColTypes.get(key));
        // Precision and scale are unknown for externally supplied types,
        // so positions 1 and 2 stay null as "not applicable" placeholders.
        info.add(null);
        info.add(null);
        columnTypes.put(key, info);
      }

    } else {
      // Get these from the database.
      if (null != inputTableName) {
        columnTypes = connManager.getColumnInfo(inputTableName);
      } else {
        columnTypes = connManager.getColumnInfoForQuery(options.getSqlQuery());
      }
    }
    
So if we hit the "if" branch, precision and scale will be NULL, but if we hit the "else" branch, the connection manager returns real precision and scale values. The code you commented on simply retrieves the precision and scale values so that they can be checked later.
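
The later code can then branch on those nulls when emitting the Hive column type, roughly like this (a sketch, not the exact patched code):

    Integer colType   = colInfo.get(0);
    Integer precision = colInfo.get(1);  // null on the externalColTypes path
    Integer scale     = colInfo.get(2);  // null on the externalColTypes path
    String hiveType;
    if (colType == Types.DECIMAL && precision != null && scale != null) {
      hiveType = "DECIMAL(" + precision + ", " + scale + ")";
    } else {
      hiveType = HiveTypes.toHiveType(colType);  // existing default mapping
    }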


> On March 20, 2017, 4:07 p.m., Attila Szabo wrote:
> > src/java/org/apache/sqoop/hive/TableDefWriter.java
> > Lines 210-221 (patched)
> > <https://reviews.apache.org/r/57551/diff/4/?file=1668178#file1668178line211>
> >
> >     Although this is a good-to-know / nice catch, we do need at least some log statements here (e.g. at info/debug level). We should not do anything like this silently!
> >     
> >     Please introduce logging statement(s) with the required level of information!

I will add the LOG statement in the latest patch. Thanks for the suggestion; it totally makes sense.
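
Something along these lines (a sketch; the message text and variable names are made up, assuming the class's existing commons-logging LOG):

    // Surface the silent adjustment so users can see why the Hive column
    // type differs from the source column's declared precision/scale.
    LOG.info("Column " + colName + ": precision/scale (" + precision + ", "
        + scale + ") adjusted to fit Hive's DECIMAL limits.");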


- Eric


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57551/#review169453
-----------------------------------------------------------


On March 28, 2017, 4:35 a.m., Eric Lin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/57551/
> -----------------------------------------------------------
> 
> (Updated March 28, 2017, 4:35 a.m.)
> 
> 
> Review request for Sqoop, Attila Szabo and Szabolcs Vasas.
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> Currently Sqoop converts DECIMAL from the RDBMS into DOUBLE in Hive, which is not correct, as users will lose precision. Since Hive has supported DECIMAL for a long time, we should support DECIMAL-to-DECIMAL conversion from Sqoop to Hive.
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/hive/HiveTypes.java ad00535 
>   src/java/org/apache/sqoop/hive/TableDefWriter.java 32fcca3 
>   src/test/com/cloudera/sqoop/hive/TestHiveImport.java 33e0cc4 
>   src/test/com/cloudera/sqoop/hive/TestTableDefWriter.java dbf0dde 
>   testdata/hive/scripts/decimalImport.q PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/57551/diff/5/
> 
> 
> Testing
> -------
> 
> Test case + manual test
> 
> 
> Thanks,
> 
> Eric Lin
> 
>

