sqoop-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-1273) Multiple append jobs can easily end up sharing directories
Date Wed, 29 Jan 2014 20:08:10 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885735#comment-13885735 ]

Hudson commented on SQOOP-1273:
-------------------------------

SUCCESS: Integrated in Sqoop-ant-jdk-1.6-hadoop200 #879 (See [https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop200/879/])
SQOOP-1273: Multiple append jobs can easily end up sharing directories (venkat: https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=ad12695b59e7f0af09e27da8dca08e9a2be9b6a2)
* src/java/org/apache/sqoop/util/AppendUtils.java
* src/java/org/apache/sqoop/tool/ImportTool.java
* src/test/com/cloudera/sqoop/TestAppendUtils.java


> Multiple append jobs can easily end up sharing directories
> ----------------------------------------------------------
>
>                 Key: SQOOP-1273
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1273
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.4
>            Reporter: Jarek Jarcec Cecho
>            Assignee: Jarek Jarcec Cecho
>             Fix For: 1.4.5
>
>         Attachments: SQOOP-1273.patch
>
>
> I've noticed at multiple user deployments that when running Sqoop in append mode ({{--append}}),
> two separate jobs can end up using the same temporary directory. This is a disaster, as those
> jobs will then start interfering with each other and may even cause data loss.
> Currently we are using the following code to generate the temporary directory ([AppendUtils.java|https://github.com/apache/sqoop/blob/trunk/src/java/org/apache/sqoop/util/AppendUtils.java#L269]):
> {code}
>   public static Path getTempAppendDir(String tableName) {
>     String timeId = DATE_FORM.format(new Date(System.currentTimeMillis()));
>     String tempDir = TEMP_IMPORT_ROOT + Path.SEPARATOR + timeId + tableName;
>     return new Path(tempDir);
>   }
> {code}
> There are three different parts that we are currently using to generate the temporary directory:
> * {{TEMP_IMPORT_ROOT}}: Constant. It can be changed by the user if needed, but as we do not
> have this documented, most users are using the default constant value.
> * {{timeId}}: Current time with millisecond precision.
> * {{tableName}}: Name of the transferred table, or {{null}} for a query-based ({{--query}}) import.
> The problem mainly surfaces in {{--query}} based imports, where 2 out of the 3 parts
> are constant, so two Sqoop jobs that get started at the same time (within the same millisecond)
> end up with the same directory.
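
To illustrate the collision described above, here is a minimal, self-contained Java sketch. It uses plain strings instead of Hadoop's Path, and the value of TEMP_IMPORT_ROOT and the date pattern are assumptions made for illustration, not taken from the issue. The uuidBasedDir method is a hypothetical alternative showing one way to remove the dependence on the clock; it is not necessarily what the attached SQOOP-1273.patch does.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.UUID;

public class TempAppendDirSketch {
  // Assumed values for illustration; the real constants live in AppendUtils.java.
  static final String TEMP_IMPORT_ROOT = "_sqoop";
  static final String SEPARATOR = "/";
  static final String DATE_PATTERN = "ddHHmmssSSS";

  // Mirrors the quoted scheme (root + timestamp + table name), but takes the
  // clock value as a parameter so that a collision can be reproduced on demand.
  public static String timeBasedDir(long nowMillis, String tableName) {
    String timeId = new SimpleDateFormat(DATE_PATTERN).format(new Date(nowMillis));
    // A null tableName is concatenated as the literal string "null".
    return TEMP_IMPORT_ROOT + SEPARATOR + timeId + tableName;
  }

  // Hypothetical alternative: a random UUID replaces the timestamp, so the
  // result no longer depends on when the job was started.
  public static String uuidBasedDir(String tableName) {
    String uuid = UUID.randomUUID().toString().replace("-", "");
    return TEMP_IMPORT_ROOT + SEPARATOR + uuid + "_" + tableName;
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    // Two --query imports (tableName == null) started in the same millisecond.
    String first = timeBasedDir(now, null);
    String second = timeBasedDir(now, null);
    System.out.println("time-based dirs equal: " + first.equals(second));
    System.out.println("uuid-based dirs equal: " + uuidBasedDir(null).equals(uuidBasedDir(null)));
  }
}
```

With the time-based scheme, two query-based imports that read the same clock value produce identical strings, because nothing else in the name varies. A random component avoids that without requiring any coordination between jobs.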



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
