Thanks Venkat. Do you know of an example of "a complex query with a split-by column that can generate incorrect data in each of the mappers"?
I haven't yet understood the corner case in which Sqoop will not work. If we know what it is, we can avoid that pitfall and also tell others precisely how to stay out of it.

Thanks & Regards,
Pratik

On Thu, Sep 18, 2014 at 3:43 PM, Venkat Ranganathan <vranganathan@hortonworks.com> wrote:
There are a few scenarios where we warn against inconsistencies: using a character column as the split-by column, and using complex queries with a split-by column, which can potentially make each mapper read different data than what is intended.
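
One way the second scenario can arise is sketched below. This is a minimal illustration, not Sqoop's actual code: it assumes the `$CONDITIONS` token in a free-form query is replaced textually by a per-mapper range predicate on the split-by column, and it uses an in-memory SQLite table in place of the real source database. The `orders` table, its rows, and the two split ranges are all hypothetical. Because AND binds tighter than OR, an unparenthesized OR lets rows escape the range predicate, so more than one mapper returns them.

```python
import sqlite3

# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "open", "us"), (2, "closed", "eu"), (3, "open", "eu"), (4, "closed", "us")],
)

# Free-form query with an unparenthesized OR in front of $CONDITIONS.
UNSAFE = "SELECT id FROM orders WHERE status = 'open' OR region = 'eu' AND $CONDITIONS"
# Same query with the user's filter wrapped in parentheses.
SAFE = "SELECT id FROM orders WHERE (status = 'open' OR region = 'eu') AND $CONDITIONS"

# Assumed range predicates for two mappers splitting on id (values 1..4).
SPLITS = ["(id >= 1 AND id < 3)", "(id >= 3 AND id <= 4)"]


def run_mappers(template, splits):
    """Run the query once per split and return all ids, sorted."""
    rows = []
    for cond in splits:
        sql = template.replace("$CONDITIONS", cond)
        rows.extend(r[0] for r in conn.execute(sql))
    return sorted(rows)


# What the query returns when run once, without any splitting.
expected = sorted(
    r[0]
    for r in conn.execute("SELECT id FROM orders WHERE status = 'open' OR region = 'eu'")
)

unsafe_result = run_mappers(UNSAFE, SPLITS)
safe_result = run_mappers(SAFE, SPLITS)

print("expected:", expected)            # [1, 2, 3]
print("unsafe import:", unsafe_result)  # [1, 1, 2, 3, 3]
print("safe import:", safe_result)      # [1, 2, 3]
```

In this sketch the rows with `status = 'open'` satisfy the WHERE clause in both mappers regardless of the id range, which is why ids 1 and 3 are imported twice.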

If you use the -m 1 option, you don't have the inconsistency issues.

Venkat

On Thu, Sep 18, 2014 at 2:40 PM, pratik khadloya <tispratik@gmail.com> wrote:
I am not facing any problem. I am checking to see what the reservations are against supporting complex joins with OR conditions.
I would like to know when it could create a problem, and whether the problem would be solvable by using a "view" or by limiting the number of mappers to just 1.
I would also like to know whether the problem, if any, is due to the parallelism that comes with increasing the number of mappers.

~Pratik

On Thu, Sep 18, 2014 at 1:23 PM, Sambit Tripathy (RBEI/PJ-NBS) <Sambit.Tripathy@in.bosch.com> wrote:

Pratik,

Are you facing a problem or trying to make a recommendation?

Regards,
Sambit.

 

 

From: pratik khadloya [mailto:tispratik@gmail.com]
Sent: Thursday, September 18, 2014 1:09 PM
To: user@sqoop.apache.org
Subject: Complex free form queries

 

The sqoop docs say:

 

The facility of using free-form query in the current version of Sqoop is limited to simple queries where there are no ambiguous projections and no OR conditions in the WHERE clause. Use of complex queries such as queries that have sub-queries or joins leading to ambiguous projections can lead to unexpected results.

 

Does anyone know why such a case is not supported, and whether the problem can be avoided by:

a) Using only 1 mapper, or
b) Creating a view out of the complex query
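
Both options can be tried out in a small sketch. It is not Sqoop itself: it assumes `$CONDITIONS` is substituted textually into the query, that a single mapper receives a trivially true predicate such as `(1 = 1)`, and it uses an in-memory SQLite database with a hypothetical `orders` table and a hypothetical view named `open_or_eu`. Under those assumptions, either option keeps an unparenthesized OR from interfering with the split predicate.

```python
import sqlite3

# Hypothetical table, rows, and view, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "open", "us"), (2, "closed", "eu"), (3, "open", "eu"), (4, "closed", "us")],
)
# Option (b): hide the complex filter, including its OR, behind a view.
conn.execute(
    "CREATE VIEW open_or_eu AS "
    "SELECT id, status, region FROM orders WHERE status = 'open' OR region = 'eu'"
)


def run_mappers(template, splits):
    """Run the query once per split predicate and return all ids, sorted."""
    rows = []
    for cond in splits:
        sql = template.replace("$CONDITIONS", cond)
        rows.extend(r[0] for r in conn.execute(sql))
    return sorted(rows)


# Option (a): one mapper, assumed to get a predicate that is always true.
raw_query = "SELECT id FROM orders WHERE status = 'open' OR region = 'eu' AND $CONDITIONS"
single_mapper = run_mappers(raw_query, ["(1 = 1)"])

# Option (b): two mappers, but the split predicate now filters the whole view.
view_query = "SELECT id FROM open_or_eu WHERE $CONDITIONS"
two_mappers_on_view = run_mappers(
    view_query, ["(id >= 1 AND id < 3)", "(id >= 3 AND id <= 4)"]
)

print("single mapper:", single_mapper)              # [1, 2, 3]
print("two mappers on view:", two_mappers_on_view)  # [1, 2, 3]
```

Wrapping the original filter in parentheses inside the free-form query has the same effect as the view here, since the range predicate then applies to the whole filter rather than to its last term.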

 

I have tested a Hive textfile import for a very complex query and verified the data, and it seems to be correct. I compared the word counts, line counts and file sizes of the dump from MySQL against the text file imported onto HDFS by Sqoop.
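
A caveat on this kind of check, as a hedged sketch: matching line counts, word counts and sizes are necessary but not sufficient, because two different sets of rows can share all three. Comparing a digest of the sorted lines also catches that case while ignoring row order, which may differ between a single dump and several mapper output files. The helper name `summarize` and the sample rows below are hypothetical.

```python
import hashlib


def summarize(lines):
    """Return (line count, word count, byte size, order-independent digest)."""
    lines = list(lines)
    n_words = sum(len(line.split()) for line in lines)
    # Count one extra byte per line for the newline terminator.
    n_bytes = sum(len(line.encode("utf-8")) + 1 for line in lines)
    digest = hashlib.sha256()
    for line in sorted(lines):  # sorting makes the digest ignore row order
        digest.update(line.encode("utf-8") + b"\n")
    return len(lines), n_words, n_bytes, digest.hexdigest()


# Hypothetical sample data standing in for the two files.
mysql_dump = ["1,open,us", "2,closed,eu", "3,open,eu"]
hdfs_import = ["3,open,eu", "1,open,us", "2,closed,eu"]   # same rows, other order
wrong_import = ["1,open,eu", "2,closed,us", "3,open,eu"]  # same counts, other rows

print(summarize(mysql_dump) == summarize(hdfs_import))            # True
print(summarize(mysql_dump)[:3] == summarize(wrong_import)[:3])   # True
print(summarize(mysql_dump)[3] == summarize(wrong_import)[3])     # False
```

For real files, the same function can be fed the lines of the MySQL dump and the concatenated lines of the imported part files.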

My query does have OR conditions. I have attached an obfuscated version of the query; the screenshot shows only about half of the complete query.

 

Any info on this will be helpful.

 

Thanks,

Pratik



