sqoop-user mailing list archives

From Srivathsan Srinivas <periya.d...@gmail.com>
Subject denormalize data during import
Date Sun, 21 Aug 2011 06:06:06 GMT
Hi,
   I am looking for a way to denormalize my tables while importing from
MySQL to HDFS/Hive. What is the best approach? Is there a way to add columns
dynamically, using Sqoop or some other means?

For example, an RDBMS might store student names and addresses in normalized
form, in separate tables. What I want from the import is a single row for
each student, with one column per address for that student (addr_1, addr_2,
etc.). I can write UDFs for this, but I do not know how to attach them to
Sqoop. I understand that Sqoop can run SQL queries (such as a join) as part
of the import, but a join will produce more rows (one per address). I want
a single row with multiple columns.
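
To make the target shape concrete, here is a minimal Python sketch of the
pivot I have in mind. It is not Sqoop code: the function name denormalize,
the max_addrs cap, and the sample rows are all made up for illustration, and
it assumes the input is the (student, address) output of a join.

```python
from collections import defaultdict


def denormalize(rows, max_addrs=3):
    """Pivot (student, address) pairs into one wide row per student.

    Hypothetical helper: max_addrs fixes the number of addr_N columns,
    since a table needs a fixed schema. Extra addresses are dropped.
    """
    grouped = defaultdict(list)
    for student, address in rows:
        grouped[student].append(address)

    wide = []
    for student, addresses in grouped.items():
        # Pad with None so every row has the same addr_1..addr_N columns,
        # then truncate to the cap.
        padded = (addresses + [None] * max_addrs)[:max_addrs]
        record = {"student": student}
        for i, addr in enumerate(padded, start=1):
            record[f"addr_{i}"] = addr
        wide.append(record)
    return wide


# Made-up sample rows, shaped like the result of a student/address join.
joined = [
    ("alice", "12 Oak St"),
    ("alice", "9 Elm Ave"),
    ("bob", "3 Pine Rd"),
]
result = denormalize(joined, max_addrs=2)
```

The join gives three rows here; the pivot gives two, one per student, with
missing addresses left empty.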

Should I run UDFs from Sqoop or from Oozie in my workflow and denormalize the
data on the fly? I have not tried this. Is there a suggested approach? I am
also beginning to look at Solr's DataImportHandler; perhaps that could solve
this.

Suggestions are appreciated.

Thanks,
PD.
