If it's a risk and you can't take any chances, why are you testing in that environment?

Why not set up a VM with a test database and a VM with a pseudo-cluster, load a subset of your data, and experiment in a development environment so that you know for sure? Even if someone here guarantees you an answer, you cannot be certain that their versions of Sqoop, Hadoop, etc. behave identically to yours. If the data you are working with has value, you should find a safe way to experiment rather than trust your valuable data to mailing-list answers.

Now, in answer to your question:

According to my peer (I am not the Sqoop person where I work), if your incremental split is on a column that has increasing values, you can safely split on that, but a column whose value is always the same is a bad choice for incremental splitting. He uses a datetime column, I believe, and the import then runs from the last imported datetime value up to the current max. I am not sure if that helps your case, but I hope you find it useful.
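
As a rough illustration of that pattern, here is a minimal shell sketch that only prints a Sqoop command instead of running it, so it is safe to try anywhere. The function name, the JDBC URL, and the target directory are placeholders I made up, and I am assuming your Sqoop version accepts these options together with a free-form query, so please verify it in a test environment first.

```shell
#!/bin/sh
# Dry-run sketch: prints the sqoop command rather than executing it.
# The JDBC URL and target directory below are hypothetical placeholders.
build_incremental_import() {
    last_value=$1   # highest check-column value that was already imported
    split_col=$2    # column used only to divide the work among parallel tasks
    printf '%s ' \
        sqoop import \
        --connect "jdbc:mysql://dbhost/testdb" \
        --query "'SELECT empno, name, date, loc FROM Employee WHERE \$CONDITIONS'" \
        --check-column date \
        --incremental append \
        --last-value "$last_value" \
        --split-by "$split_col" \
        --target-dir /tmp/employee_test
    printf '\n'
}

# Import everything newer than the last imported date, splitting on empno.
build_incremental_import 2013-05-01 empno
```

The point of the sketch is that the check column (date) and the split column (empno) are passed separately, so the last imported value can advance on each run while the split column stays whatever divides the rows most evenly.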

Devin Suiter
Jr. Data Solutions Software Engineer
100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
Google Voice: 412-256-8556 | www.rdx.com


On Mon, Dec 30, 2013 at 2:27 PM, yogesh kumar <yogeshsqoop@gmail.com> wrote:
Thanks Chalcy, I got your point; let me try a simple test for it. But the situation here is that for the incremental import I have to change the column for --split-by.

It's kind of a risk and I cannot take a chance. I just want to be sure that it will not affect the Hive table and the data in it after the incremental import. My incremental import will directly pull data and put it where my old sqooped data resides.

I would like suggestions from the Sqoop champions.
Please help me out.





On Tue, Dec 31, 2013 at 12:30 AM, Chalcy <chalcy@gmail.com> wrote:
I have not tried this, but I believe you can change the split-by column as you wish. --split-by is used to split the job into parallel tasks, while --check-column and --last-value are used for incremental import.

I do not know the exact scenario, but if empno gives a better split, you can still use it for the incremental import instead of changing the split-by field.
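
To picture why the two settings do not interfere, here is a small conceptual sketch in shell. It is not Sqoop's actual code, and the function name and the even range arithmetic are my own assumptions; it only prints the kind of per-task filter you would expect when the incremental condition is on date and the split is on empno.

```shell
#!/bin/sh
# Conceptual illustration only, not Sqoop's actual implementation.
# Prints one WHERE clause per parallel task: the incremental filter is the
# same in every clause, while the split range on empno differs.
print_split_conditions() {
    min=$1; max=$2; tasks=$3; last_value=$4
    # Ceiling of (max - min + 1) / tasks, using integer arithmetic.
    step=$(( (max - min + tasks) / tasks ))
    lo=$min
    while [ "$lo" -le "$max" ]; do
        hi=$(( lo + step ))
        echo "date > '$last_value' AND empno >= $lo AND empno < $hi"
        lo=$hi
    done
}

# empno values 1..100 divided among 4 tasks, importing rows after 2013-05-01.
print_split_conditions 1 100 4 2013-05-01
```

Every printed clause carries the same date condition, so changing the split column only changes how the new rows are divided up, not which rows qualify.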

I would suggest you do a very simple test to find out.

Hope this helps,
Chalcy 


On Mon, Dec 30, 2013 at 1:18 PM, yogesh kumar <yogeshsqoop@gmail.com> wrote:
Hello all,
 
I have done a first Sqoop import for a particular table, say table Employee:
 
sqoop import -libjars .....
--query "select empno, name, date, loc from Employee where \$CONDITIONS ..  "
--split-by empno
--fields-terminated-by ',' 
.
.
.
.
 
I have created an external table in Hive.
 
Now I want to pull data on a daily basis using an incremental pull. Can I specify a different column for --split-by?
 
like
 
sqoop import -libjars .....
--query "select empno, name, date, loc from Employee where \$CONDITIONS ..  "
--check-column date
--incremental append
--last-value 2013-05-01
--split-by date
--split-by empno
 
 
Can I change the --split-by column in an incremental Sqoop import? If not, how should I do it?
 
Please suggest.