sqoop-dev mailing list archives

From "Aryan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-1390) Import data to HDFS as a set of Parquet files
Date Mon, 03 Nov 2014 20:31:34 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195040#comment-14195040 ]
Aryan commented on SQOOP-1390:

Hi Pratik and Qian, thanks for the input.

I am trying to build the above project from git using the ant build.xml, but the build fails
with the following errors:

    [exec] 'git' is not recognized as an internal or external command,
    [exec] operable program or batch file.
     [get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
     [get] To: C:\Users\310185009\git\mynewsqoop2Copy1\lib\ivy-2.3.0.jar
     [get] Error getting http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar to C:\Users\310185009\git\mynewsqoop2Copy1\lib\ivy-2.3.0.jar

1. Do I need to modify build.xml?
2. Can you tell me the steps to build with either Ant or Maven (or any other way)? I have
tried every combination I can think of and the build still fails.
3. When I clone the repository and import it into the Package Explorer as a "general project",
the Java libraries do not get added to the project, so I first create a new Java project and
then point it at the repository folder before building. Is there a better way to build?
4. Can you provide the final, complete set of built jars so that I can test Sqoop import in
the meantime?
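The "'git' is not recognized" error above means git is not on the Windows PATH, which breaks the build targets that shell out to git. A quick pre-flight check before re-running ant might look like the sketch below (POSIX shell for illustration only; on Windows cmd the equivalent check is "where git", and the tool names are just the ones this build needs):

```shell
# Report whether a required build tool is resolvable on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: MISSING from PATH"
  fi
}

# The failing build needs at least these two on PATH:
check_tool git
check_tool ant
```

Once both report "found", re-run ant from the repository root. Note the second failure is separate: the [get] task also needs direct HTTP access to repo2.maven.org (or a proxy configured for ant) to download the ivy jar.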

Thanks in advance :)

> Import data to HDFS as a set of Parquet files
> ---------------------------------------------
>                 Key: SQOOP-1390
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1390
>             Project: Sqoop
>          Issue Type: Sub-task
>          Components: tools
>            Reporter: Qian Xu
>            Assignee: Qian Xu
>             Fix For: 1.4.6
>         Attachments: SQOOP-1390.patch
> Parquet files keep data in contiguous chunks by column, so appending new records to a
> dataset requires rewriting substantial portions of an existing file, or buffering records
> to create a new file.
> The JIRA proposes adding the ability to import an individual table from an RDBMS into
> HDFS as a set of Parquet files. We will also provide a command-line interface with a new
> argument, {{--as-parquetfile}}.
> Example invocation: 
> {{sqoop import --connect JDBC_URI --table TABLE --as-parquetfile --target-dir /path/to/files}}
> The major items are listed as follows:
> * Implement ParquetImportMapper
> * Hook up the ParquetOutputFormat and ParquetImportMapper in the import job.
> * Be able to support import from scratch or in append mode
> Note that, as Parquet is a columnar storage format, it does not make sense to write to
> it directly from record-based tools. We would therefore consider using the Kite SDK to
> simplify the handling of Parquet-specific details.

This message was sent by Atlassian JIRA
