sqoop-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Markus Kemper <mar...@cloudera.com>
Subject Re: Getting upper bound in --incremental mode
Date Thu, 20 Jul 2017 22:41:17 GMT
I am not aware of a method to derive the --last-value from the target RDBMS
table.  You could use Sqoop eval to select the value and pass it to
--last-value however that is not very elegant.


Markus Kemper
Customer Operations Engineer
[image: www.cloudera.com] <http://www.cloudera.com>


On Thu, Jul 20, 2017 at 6:30 PM, Jagrut Sharma <jagrutsharma@gmail.com>
wrote:

> Hi Markus - I'm using Sqoop v1, but with a custom metastore (not the one
> that Sqoop provides). My original question was on deriving the --last-value
> based on table column values (and not time of job execution).
>
> Thanks.
> --
> Jagrut
>
> On Thu, Jul 20, 2017 at 2:48 PM, Markus Kemper <markus@cloudera.com>
> wrote:
>
>> Hey Jagrut,
>>
>> Are you using the Sqoop1 Metastore job tool (assuming yes)?
>> Are you wanting to override the current stored --last-value when
>> executing the Sqoop job?
>>
>>
>>
>> Markus Kemper
>> Customer Operations Engineer
>> [image: www.cloudera.com] <http://www.cloudera.com>
>>
>>
>> On Thu, Jul 20, 2017 at 5:16 PM, Jagrut Sharma <jagrutsharma@gmail.com>
>> wrote:
>>
>>> Hi Markus - The question was that --incremental with --lastmodified
>>> option always takes the current time as the upper bound, and this gets
>>> stored as the --last-value for the next run.
>>>
>>> In certain cases, it is desirable that the upper bound should come from
>>> the actual column values, and that should get set for the --last-value for
>>> next run.
>>> -
>>> Jagrut
>>>
>>>
>>>
>>> On Wed, Jul 19, 2017 at 2:56 PM, Markus Kemper <markus@cloudera.com>
>>> wrote:
>>>
>>>> Hey Jagrut,
>>>>
>>>> Can you elaborate more about the problem you are facing and what you
>>>> mean
>>>> by (Is this possible to set while running sqoop?).
>>>>
>>>>
>>>> Markus Kemper
>>>> Customer Operations Engineer
>>>> [image: www.cloudera.com] <http://www.cloudera.com>
>>>>
>>>>
>>>>
>>>> On Wed, Jul 19, 2017 at 5:43 PM, Jagrut Sharma <jagrutsharma@gmail.com>
>>>> wrote:
>>>>
>>>> > Hi Tony - I was under the assumption that append mode will not work
>>>> for
>>>> > timestamp column. But I gave it a try after your reply, and it works.
>>>> And
>>>> > it gets the upper bound from the database itself. Thanks.
>>>> >
>>>> > --
>>>> > Jagrut
>>>> >
>>>> > On Wed, Jul 19, 2017 at 12:18 PM, Tony Foerster <tony@phdata.io>
>>>> wrote:
>>>> >
>>>> >> Does `--incremental append` work for you?
>>>> >>
>>>> >> > You should specify append mode when importing a table where
new
>>>> rows
>>>> >> are continually being added with increasing row id values
>>>> >>
>>>> >> Tony
>>>> >>
>>>> >> > On Jul 19, 2017, at 2:02 PM, Jagrut Sharma <jagrutsharma@gmail.com
>>>> >
>>>> >> wrote:
>>>> >> >
>>>> >> > Hi all - For --incremental mode with 'lastmodified' option,
Sqoop
>>>> (v
>>>> >> 1.4.2)
>>>> >> > generates a query like:
>>>> >> > WHERE column >= last_modified_time and column < current_time
>>>> >> >
>>>> >> > The --last-value is set to the current_time and gets used for
the
>>>> next
>>>> >> run.
>>>> >> >
>>>> >> > Here, the upper bound is always set to the current_time. In
some
>>>> cases,
>>>> >> > this upper bound is required to be taken from the database
table
>>>> column
>>>> >> > itself. So, the query is required of the form:
>>>> >> > WHERE column >= last_modified_time and column <
>>>> >> max_time_in_db_table_column
>>>> >> >
>>>> >> > And the --last-value for next run needs to be set as
>>>> >> > the max_time_in_db_table_column (and not the current_time).
>>>> >> >
>>>> >> > Is this possible to set while running sqoop?  If no, is there
any
>>>> >> > workaround suggested for this?
>>>> >> >
>>>> >> > Thanks a lot.
>>>> >> > --
>>>> >> > Jagrut
>>>> >>
>>>> >>
>>>> >
>>>>
>>>
>>>
>>>
>>> --
>>> Jagrut
>>>
>>>
>>
>
>
> --
> Jagrut
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message