sqoop-dev mailing list archives

From Jason Altekruse <altekruseja...@gmail.com>
Subject Re: [parquet-dev] Will parquet support ETL tools like Hadoop Sqoop
Date Tue, 01 Oct 2013 01:09:32 GMT
DB Tsai,

I do not have experience with Sqoop, but the process looks fairly
straightforward. As far as I can see, Sqoop can only write imported data
as delimited text or SequenceFiles (
http://sqoop.apache.org/docs/1.4.4/SqoopUserGuide.html#_file_formats).
That said, both of these formats are readable by Hive and Pig. If you do
not mind a two-pass conversion, you can use Sqoop to get your data into
HDFS in either format, then use Hive or Pig to read it and rewrite it as
Parquet. Depending on your cluster setup and use case, I would look at the
encodings and compression codecs Parquet offers, since you have to choose
them when you write the files. In most cases compression will save you
time when reading the data.
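
A rough sketch of the two-pass conversion described above, written as a
small Python helper that only builds the commands and does not run them.
The JDBC URL, table names, column list and HDFS path are hypothetical
placeholders. The Hive half assumes a Hive build where STORED AS PARQUET
is available (older versions need the Parquet SerDe classes spelled out
instead) and uses the parquet.compression property to pick the codec.
Treat it as a starting point under those assumptions, not a tested recipe.

```python
import shlex


def sqoop_import_command(jdbc_url, table, target_dir, delimiter=","):
    """Pass 1: build a `sqoop import` command that lands delimited text in HDFS."""
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--as-textfile",
        "--fields-terminated-by", delimiter,
    ]
    # Quote each argument so the command can be pasted into a shell.
    return " ".join(shlex.quote(a) for a in args)


def hive_conversion_script(columns, text_dir, staging_table, parquet_table,
                           delimiter=",", compression="SNAPPY"):
    """Pass 2: build HiveQL that reads the text files and rewrites them as Parquet."""
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    statements = [
        # External table over the files Sqoop wrote in pass 1.
        f"CREATE EXTERNAL TABLE {staging_table} ({cols}) "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}' "
        f"STORED AS TEXTFILE LOCATION '{text_dir}'",
        # The compression codec has to be chosen before the Parquet files are written.
        f"SET parquet.compression={compression}",
        # Assumption: a Hive version that understands STORED AS PARQUET.
        f"CREATE TABLE {parquet_table} ({cols}) STORED AS PARQUET",
        f"INSERT OVERWRITE TABLE {parquet_table} SELECT * FROM {staging_table}",
    ]
    return ";\n".join(statements) + ";"


if __name__ == "__main__":
    # Hypothetical database, table and path names, for illustration only.
    print(sqoop_import_command(
        "jdbc:mysql://db.example.com/sales", "orders", "/data/orders_text"))
    print(hive_conversion_script(
        [("id", "INT"), ("total", "DOUBLE")],
        "/data/orders_text", "orders_text", "orders_parquet"))
```

The delimiter passed to Sqoop has to match the one in the staging table
definition, which is why both helpers take it as a parameter.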

Regards,
Jason Altekruse




On Mon, Sep 30, 2013 at 3:33 PM, Marcel Kornacker <marcel@cloudera.com> wrote:

> Cross-posting to Sqoop dev list.
>
> On Mon, Sep 30, 2013 at 12:03 PM, DB Tsai <dbtsai@dbtsai.com> wrote:
> > Hi parquet developers,
> >
> > Is there any way to use ETL tools like hadoop sqoop with parquet
> > format? If not, how do users dump the data from database to hdfs to do
> > further analysis now?
> >
> > Thanks.
> >
> > Sincerely,
> >
> > DB Tsai
> > -----------------------------------
> > Web: http://www.dbtsai.com
> >
> > --
> > http://parquet.github.com/
> > ---
> > You received this message because you are subscribed to the Google
> > Groups "Parquet" group.
> > To post to this group, send email to parquet-dev@googlegroups.com.
