sqoop-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Fero Szabo via Review Board <nore...@reviews.apache.org>
Subject Re: Review Request 65607: SQOOP-2976 Flag to expand decimal values to fit AVRO schema
Date Tue, 13 Feb 2018 15:54:23 GMT


> On Feb. 13, 2018, 11:39 a.m., Szabolcs Vasas wrote:
> > src/java/org/apache/sqoop/avro/AvroUtil.java
> > Lines 88 (patched)
> > <https://reviews.apache.org/r/65607/diff/2/?file=1957645#file1957645line88>
> >
> >     I think this if statement is redundant since if schemaContainingScale is null
then the method will return null anyway.

Nope, this is there because we are looking for the first non-null value returned by getDecimalSchema.
Here is an example schema where this comes into play:
{"type" : "union", "value" : "{\"type\" : \"null\"}, {\"type\" : \"null\", \"precision\" :
10, \"scale\" : 5}"}

If we would remove the if statement, then getDecimalSchema would return null.

Maybe, I should have added 
```
else {
  continue;
}
```
for verbosity?


> On Feb. 13, 2018, 11:39 a.m., Szabolcs Vasas wrote:
> > src/test/org/apache/sqoop/manager/oracle/OracleAvroPaddingImportTest.java
> > Lines 101 (patched)
> > <https://reviews.apache.org/r/65607/diff/2/?file=1957649#file1957649line101>
> >
> >     Do we really need this file? It seems we do not use dates in this test case.

Nope, removed.


> On Feb. 13, 2018, 11:39 a.m., Szabolcs Vasas wrote:
> > src/test/org/apache/sqoop/manager/sqlserver/SQLServerAvroPaddingImportTest.java
> > Lines 93 (patched)
> > <https://reviews.apache.org/r/65607/diff/2/?file=1957651#file1957651line93>
> >
> >     A similar logic is already present in org.apache.sqoop.testutil.BaseSqoopTestCase
can we somehow reuse that logic?
> >     For example it already has a ConnManager field which can be initialized in org.apache.sqoop.testutil.BaseSqoopTestCase#setUp

Thanks for pointing this one out. In the end I figured out how to reuse the table creation
and insertion logic from BaseSqoopTestCase as well.


> On Feb. 13, 2018, 11:39 a.m., Szabolcs Vasas wrote:
> > src/test/org/apache/sqoop/testutil/ArgumentArrayBuilder.java
> > Lines 102 (patched)
> > <https://reviews.apache.org/r/65607/diff/2/?file=1957652#file1957652line102>
> >
> >     I think this logic handling the tool options could be also moved to ArgumentUtils
since it already contains similar stuff.
> >     After that this method could be simplified, the notEmpty checks could also be
removed.
> >     
> >     We could also think about moving all the logic from ArgumentUtils to the builder
since it would be a better practice to use the builder instead of the static methods.

I think the latter makes more sense, so I've moved the functions from ArgumentUtils to the
builder and also replaced every call of ArgumentUtil's functions to use the new builder.


- Fero


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65607/#review197269
-----------------------------------------------------------


On Feb. 12, 2018, 2:31 p.m., Fero Szabo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65607/
> -----------------------------------------------------------
> 
> (Updated Feb. 12, 2018, 2:31 p.m.)
> 
> 
> Review request for Sqoop, Boglarka Egyed and Szabolcs Vasas.
> 
> 
> Bugs: SQOOP-2976
>     https://issues.apache.org/jira/browse/SQOOP-2976
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> **Summary:**
> Certain databases, such as SQL Server and Postgres are storing decimal values padded
with 0s, should the user insert them with less digits than the given scale. 
> 
> Other databases however, such as Oracle and HSQLDB store these numbers without trailing
0s. Then, when the JDBC driver returns these as BigDecimals, they won't match the scale in
the avro schema.
> 
> Take the following SQL commands for an example: 
> ```
> create table salary (id int, amount number (10,5));
> insert into salary (id, amount) values (1, 10.5);
> insert into salary (id, amount) values (2, 10.50);
> select * from salary;
> ```
> Records in an Oracle database:
> 1	10.5
> 2	10.5
> 
> Records in SQL Server (using decimal instead of number in the create statement):
> 1	10.50000
> 2	10.50000
> 
> **Solition:**
> The fix is simply checking the scale of the returned BigDecimals against what's in the
avro schema and recreates the objects in case of a mismatch. I've introduced a new property
to enable this new feature, so existing behavior is not affected. 
> 
> **Concerns: **
> - trimmings can happen silently, should we rather raise an exception? Enabling trimming
adds a new feature, but it also adds the possibility silently lose scale while import. The
latter could be mitigated by a thorough documentation.
> - The flags current name () doesn't really match the behavior, should I change it to
something else? (avro.decimal_scale_harmonization.enable)
> - How / where to document this new flag?
> 
> 
> **Other notable changes:**
> - Introduced ArgumentArrayBuilder that reuses the existing Argument class and introduces
a useful builder pattern for creating commandline arguments for tests.
> - Slightly modified BaseSqoopTest to fit my needs. *(However, further refactoring would
be required in this class to enable better reuse. For example: the current implementation
can't be used with SQL Server, because one also needs to specify the schema besides the tablename
in the create and insert statements. There are also code duplications.)*
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/avro/AvroUtil.java 1aae8df2 
>   src/java/org/apache/sqoop/config/ConfigurationConstants.java 7a19a62c 
>   src/java/org/apache/sqoop/mapreduce/AvroImportMapper.java a5e5bf5a 
>   src/test/org/apache/sqoop/manager/hsqldb/TestHsqldbAvroPadding.java PRE-CREATION 
>   src/test/org/apache/sqoop/manager/oracle/OracleAvroPaddingImportTest.java PRE-CREATION

>   src/test/org/apache/sqoop/manager/sqlserver/MSSQLTestUtils.java 2220b7d5 
>   src/test/org/apache/sqoop/manager/sqlserver/SQLServerAvroPaddingImportTest.java PRE-CREATION

>   src/test/org/apache/sqoop/testutil/ArgumentArrayBuilder.java PRE-CREATION 
>   src/test/org/apache/sqoop/testutil/AvroTestUtils.java PRE-CREATION 
>   src/test/org/apache/sqoop/testutil/BaseSqoopTestCase.java 588f439c 
> 
> 
> Diff: https://reviews.apache.org/r/65607/diff/2/
> 
> 
> Testing
> -------
> 
> See the 3 new test classes (HSQLDB, Oracle, SQL Server).
> 
> 
> Thanks,
> 
> Fero Szabo
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message