sqoop-user mailing list archives

From "Sambit Tripathy (RBEI/PJ-NBS)" <Sambit.Tripa...@in.bosch.com>
Subject RE: Complex free form queries
Date Fri, 19 Sep 2014 00:05:41 GMT

Sqoop calculates the split boundaries by running this query:

select min(split_by_col), max(split_by_col) from table;

The min and max are determined by how the database sorts the split column, and string sorting
can differ from numeric sorting.
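
To illustrate the point about string versus numeric sorting, here is a small Python sketch. The values are made up, and Python's lexicographic comparison only stands in for a database's character collation, which may behave differently.

```python
# Hypothetical values of a split column that is stored as text.
values = ["9", "10", "100", "25"]

# Lexicographic (string) comparison, roughly what a character column gives.
string_min, string_max = min(values), max(values)

# Numeric comparison of the same values.
numbers = [int(v) for v in values]
numeric_min, numeric_max = min(numbers), max(numbers)

print("string min/max: ", string_min, string_max)
print("numeric min/max:", numeric_min, numeric_max)
```

With a string comparison the boundaries come out as "10" and "9", so ranges built from them would not cover the numeric range 9 to 100.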

After retrieving the min and max values of the column, the split size is calculated:
split_size = (max - min) / no_of_mappers
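
A minimal Python sketch of that formula, assuming a numeric split column. The function names split_ranges and to_conditions are hypothetical, and Sqoop's own splitter classes handle types and rounding in their own way, so treat this only as an illustration of the arithmetic.

```python
def split_ranges(lo, hi, num_mappers):
    """Cut [lo, hi] into num_mappers contiguous ranges of equal width."""
    split_size = (hi - lo) / num_mappers
    bounds = [lo + i * split_size for i in range(num_mappers)] + [hi]
    return list(zip(bounds[:-1], bounds[1:]))


def to_conditions(col, ranges):
    """Render one range predicate per mapper; only the last range includes its upper bound."""
    conditions = []
    for i, (start, end) in enumerate(ranges):
        op = "<=" if i == len(ranges) - 1 else "<"
        conditions.append(f"{col} >= {start} AND {col} {op} {end}")
    return conditions


# Hypothetical boundaries: min = 0, max = 100, 4 mappers.
ranges = split_ranges(0, 100, 4)
conditions = to_conditions("split_by_col", ranges)
for cond in conditions:
    print(cond)
```

Each mapper would then read only the rows matching its own predicate.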

From: pratik khadloya [mailto:tispratik@gmail.com]
Sent: Thursday, September 18, 2014 4:12 PM
To: user@sqoop.apache.org
Subject: Re: Complex free form queries

Thanks Venkat. Do you know of any example of "a complex query with a split by column that
can generate incorrect data in each of the mappers"?
I haven't yet understood the corner case in which Sqoop will not work. If we know about it,
we can avoid that pitfall and also tell others precisely how not to fall into it.

Thanks & Regards,

On Thu, Sep 18, 2014 at 3:43 PM, Venkat Ranganathan <vranganathan@hortonworks.com> wrote:
There are a few scenarios where we warn against inconsistencies: using a character column
as the split-by column, and using complex queries with a split-by column, which can
potentially make the mappers return data other than what is intended.

If you use the -m 1 option, you don't have the inconsistency issues.
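
One way such incorrect per-mapper data could arise is sketched below in Python with an in-memory SQLite table. This is an assumption for illustration, not something stated in this thread: it supposes that each mapper's range predicate is substituted textually for the $CONDITIONS placeholder of a free-form query, and that the query has an unparenthesized OR. The table, its rows, and the two ranges are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 0), (2, 0, 2), (3, 1, 0), (4, 0, 2)],
)

# Two hypothetical mapper ranges on the split column "id".
splits = ["id >= 1 AND id < 3", "id >= 3 AND id <= 4"]


def run(template):
    """Run the query once per split and collect the ids returned by all mappers."""
    ids = []
    for cond in splits:
        query = template.replace("$CONDITIONS", cond)
        ids += [row[0] for row in conn.execute(query)]
    return sorted(ids)


# AND binds tighter than OR, so "a = 1" escapes the range restriction.
unparenthesized = run("SELECT id FROM t WHERE a = 1 OR b = 2 AND $CONDITIONS")
# Parenthesizing the OR keeps every mapper inside its own range.
parenthesized = run("SELECT id FROM t WHERE (a = 1 OR b = 2) AND $CONDITIONS")

print(unparenthesized)
print(parenthesized)
```

In the unparenthesized form, rows with a = 1 are returned by both mappers and so are imported twice; with parentheses each row is returned exactly once.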


On Thu, Sep 18, 2014 at 2:40 PM, pratik khadloya <tispratik@gmail.com> wrote:
I am not facing any problem. I am checking what the reservations are against supporting
complex joins with OR conditions.
I would like to know when it could create a problem, and whether the problem would be solvable
by using a "view" or by limiting the number of mappers to just 1.
I would also like to know whether the problem, if any, is due to the parallelism that comes
with increasing the number of mappers.


On Thu, Sep 18, 2014 at 1:23 PM, Sambit Tripathy (RBEI/PJ-NBS) <Sambit.Tripathy@in.bosch.com> wrote:

Are you facing a problem or trying to make a recommendation?


From: pratik khadloya [mailto:tispratik@gmail.com]
Sent: Thursday, September 18, 2014 1:09 PM
To: user@sqoop.apache.org
Subject: Complex free form queries

The sqoop docs say:

The facility of using free-form query in the current version of Sqoop is limited to simple
queries where there are no ambiguous projections and no OR conditions in the WHERE clause.
Use of complex queries such as queries that have sub-queries or joins leading to ambiguous
projections can lead to unexpected results.

Does anyone know why such a case is not supported, and whether it can be avoided by:

a) Using only 1 mapper
b) Creating a view out of the complex query

I have tested a Hive textfile import for a very complex query and verified the data, and it
seems to be correct. I compared the number of words, number of lines and file sizes of the
dump from MySQL against the text file imported onto HDFS by Sqoop.
My query does have OR conditions. I have attached an obfuscated version of the query, and
that screenprint is still only 1/2 of the complete query.

Any info on this will be helpful.


