sqoop-user mailing list archives

From Srinivas Surasani <vas...@gmail.com>
Subject Re: sqoop exporting from hdfs to postgresql, using --staging-table option
Date Sat, 28 Jan 2012 17:58:18 GMT
Abhijeet --

1) By default it loads 1000 records at a time, and as you mentioned earlier
this can be tweaked using "sqoop.export.records.per.statement".
2) I just ran select count(*) on the Teradata table and saw 1000 records
inserted in one go.
3) I believe setting the number of mappers > 1 works only for non-parallel
databases.

-- Srinivas

On Sat, Jan 28, 2012 at 12:47 PM, abhijeet gaikwad
<abygaikwad17@gmail.com> wrote:

> Hi Srinivas,
> Export with multiple mappers is allowed in SQOOP. I have exported data
> into Sql Server as well as MySql using multiple mappers.
>
> Regarding the issue you are facing, I have a few questions:
> 1. How did you set the batch size of 1000 that you mentioned?
> 2. Can you share detailed Sqoop logs?
> 3. The deadlock seems to be raised by Teradata, so there is a good chance
> that a Teradata admin will be able to resolve it.
>
> Thanks,
> Abhijeet Gaikwad
> On 28 Jan 2012 22:16, "Srinivas Surasani" <vasajb@gmail.com> wrote:
>
>> Hi Abhijeet --,
>>
>> Thanks for the information. I have one more question: is the export
>> always done with one mapper? (The table runs into deadlocks if the
>> number of mappers is set to more than one.)
>> Also, FYI: I have observed that the default number of rows inserted is 1000.
>>
>> Thanks,
>> -- Srinivas
>>
>> On Sat, Jan 28, 2012 at 10:12 AM, abhijeet gaikwad
>> <abygaikwad17@gmail.com> wrote:
>>
>>> Hi Srinivas,
>>> Haven't played with the Teradata connector, but in general there are two
>>> properties that drive insertions (Sqoop export) into a table, namely:
>>>
>>> 1. "sqoop.export.records.per.statement": the number of records/rows
>>> inserted by a single INSERT statement. The default value is 100.
>>> 2. "sqoop.export.statements.per.transaction": the number of INSERT
>>> statements issued before a commit is fired (you could call this the
>>> batch size). The default value is 100.
>>>
>>> You can use Hadoop's -D argument to specify these properties on the
>>> command line, e.g. -Dsqoop.export.statements.per.transaction=50
>>>
>>> NOTE: Make sure you place this argument (-D) before any of the Sqoop
>>> tool-specific arguments. See the Sqoop User Guide for more details.
>>>
>>> Thanks,
>>> Abhijeet Gaikwad
>>>
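As a concrete sketch of the flag placement described above (all connection
values are placeholders, not taken from this thread), and of how the two
properties multiply into an effective batch size:

```shell
# Hedged sketch: the -D property flags must come immediately after the
# tool name ("sqoop export"), before any tool-specific arguments, e.g.:
#
#   sqoop export \
#     -Dsqoop.export.records.per.statement=100 \
#     -Dsqoop.export.statements.per.transaction=50 \
#     --connect jdbc:teradata://dbhost/database=mydb \
#     --username myuser --password mypass \
#     --table my_table --export-dir /user/me/data
#
# The two properties multiply: the number of rows written between commits
# is records-per-statement times statements-per-transaction.
records_per_statement=100
statements_per_transaction=50
echo $((records_per_statement * statements_per_transaction))   # 5000
```

So with both properties at their defaults of 100, up to 10000 rows can be
rolled back if a transaction fails mid-export.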
>>> On 1/26/12, Srinivas Surasani <vasajb@gmail.com> wrote:
>>> > Kathleen,
>>> >
>>> > Any update on the request below?
>>> >
>>> > Hi All,
>>> >
>>> > I'm working on Hadoop CDH3 U0 and  Sqoop CDH3 U2.
>>> >
>>> > I'm trying to export CSV files from HDFS to Teradata. It works well
>>> > with the number of mappers set to 1 (batch loading 1000 records at a
>>> > time), but when I increase the number of mappers to more than one I get
>>> > the following error. Also, is it possible to configure the batch size
>>> > at export time (from the command line)?
>>> >
>>> >
>>> > sqoop export --verbose --driver com.teradata.jdbc.TeraDriver \
>>> >   --connect jdbc:teradata://xxxx/database=xxxx --username xxxxx \
>>> >   --password xxxxx --table xxxx --export-dir /user/surasani/10minutes.txt \
>>> >   --fields-terminated-by '|' -m 4 --batch
>>> >
>>> > 12/01/24 16:17:21 INFO mapred.JobClient:  map 3% reduce 0%
>>> > 12/01/24 16:17:48 INFO mapred.JobClient: Task Id :
>>> > attempt_201112211106_68553_m_000001_2, Status : FAILED
>>> > *java.io.IOException: java.sql.BatchUpdateException: [Teradata Database]
>>> > [TeraJDBC 13.00.00.07] [Error 2631] [SQLState 40001] Transaction ABORTed
>>> > due to DeadLock*.
>>> >
>>> > Srinivas --
>>> >
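One hedged workaround sketch for the Error 2631 deadlock above (not a fix
confirmed anywhere in this thread, and all connection values are
placeholders): either drop back to a single mapper, or keep multiple
mappers but shrink each transaction so concurrent sessions hold their
locks for less time:

```shell
# Assumption: the deadlocks come from concurrent mapper transactions
# contending on the target table; neither option below is confirmed by
# this thread. All connection values are placeholders.

# Option (a): single mapper, so there are no concurrent export transactions.
sqoop export --verbose --driver com.teradata.jdbc.TeraDriver \
  --connect jdbc:teradata://dbhost/database=mydb \
  --username myuser --password mypass \
  --table my_table --export-dir /user/me/data \
  -m 1 --batch

# Option (b): keep parallelism but commit after every statement, keeping
# each transaction (and its locks) as short as possible.
sqoop export -Dsqoop.export.statements.per.transaction=1 \
  --verbose --driver com.teradata.jdbc.TeraDriver \
  --connect jdbc:teradata://dbhost/database=mydb \
  --username myuser --password mypass \
  --table my_table --export-dir /user/me/data \
  -m 4 --batch
```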
>>> > On Wed, Jan 25, 2012 at 8:01 PM, Kathleen Ting
>>> > <kathleen@cloudera.com> wrote:
>>> >
>>> >> Srinivas, as it happens, the Cloudera Connector for Teradata supports
>>> >> staging tables. It is freely available here:
>>> >> https://ccp.cloudera.com/display/con/Cloudera+Connector+for+Teradata+Download
>>> >>
>>> >> Regards, Kathleen
>>> >>
>>> >> On Wed, Jan 25, 2012 at 3:36 PM, Srinivas Surasani
>>> >> <vasajb@gmail.com> wrote:
>>> >>
>>> >>> Hi Kathleen,
>>> >>>
>>> >>> Same issue with Teradata.
>>> >>>
>>> >>>
>>> >>> Srinivas --
>>> >>>
>>> >>>
>>> >>> On Mon, Jan 23, 2012 at 8:26 PM, Kathleen Ting
>>> >>> <kathleen@cloudera.com> wrote:
>>> >>>
>>> >>>> Hi Weihua -
>>> >>>>
>>> >>>> Unfortunately, the generic JDBC manager does not support staging.
>>> >>>> As a result, I've filed
>>> >>>> https://issues.apache.org/jira/browse/SQOOP-431 on your behalf.
>>> >>>>
>>> >>>> Regards, Kathleen
>>> >>>>
>>> >>>>
>>> >>>> On Mon, Jan 23, 2012 at 3:10 PM, Weihua Zhu
>>> >>>> <wzhu@adconion.com> wrote:
>>> >>>>
>>> >>>>> Hi Guys,
>>> >>>>>
>>> >>>>>   Good afternoon!
>>> >>>>>   I have a question. I was trying to use Sqoop to export from
>>> >>>>> HDFS to PostgreSQL, using the --staging-table option for
>>> >>>>> transactional reasons, but it gives me the error below.
>>> >>>>>   I am wondering whether the staging table is supported by
>>> >>>>> GenericJdbcManager? If not, what kind of manager should I use?
>>> >>>>>   Thanks very much!
>>> >>>>>
>>> >>>>>  -Weihua
>>> >>>>>
>>> >>>>> error message:
>>> >>>>>
>>> >>>>> 12/01/23 15:00:39 ERROR tool.ExportTool: Error during export: The
>>> >>>>> active connection manager (org.apache.sqoop.manager.GenericJdbcManager)
>>> >>>>> does not support staging of data for export. Please retry without
>>> >>>>> specifying the --staging-table option.
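For reference, a hedged sketch of what a staging-table export looks like
once a connection manager that supports staging is in play (the generic
JDBC manager, which Sqoop selects when --driver is given, does not; see
SQOOP-431). All table names and connection values below are placeholders:

```shell
# Hedged sketch, not a verified command for this environment: rows are
# first loaded into the staging table and only moved to the destination
# table after all mappers succeed, so a failed export never leaves the
# destination half-written. All names here are placeholders.
sqoop export \
  --connect jdbc:postgresql://dbhost/mydb \
  --username myuser --password mypass \
  --table target_table \
  --staging-table target_table_stage \
  --clear-staging-table \
  --export-dir /user/me/data
```

--clear-staging-table empties any leftover rows in the staging table
before the export starts.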
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>
>>> >>>
>>> >>
>>> >
>>>
>>
>>
