spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: importing data into hdfs/spark using Informatica ETL tool
Date Wed, 09 Nov 2016 21:26:55 GMT
You have basically covered the options already. That said, Informatica has several ways to
extract from (or store to) an RDBMS. If a native connector is not available, you need to go
through JDBC, as you have described.
Alternatively (but only if it is worth the effort) you can schedule fetching of the files with
Oozie and use it to convert the CSV into ORC, Parquet, etc.
If this is a common use case in the company, you can extend Informatica with Java classes that,
for instance, convert the data directly into Parquet or ORC. That takes some effort, though.
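
For what it is worth, the daily-directory ingest from the quoted message below could be sketched roughly as follows. This is only a minimal dry-run sketch, not something taken from the thread: the local export root, the HDFS landing root, the YYYY-MM-DD directory naming and the function name plan_ingest are all assumptions made for illustration. It only builds the `hdfs dfs -mkdir -p` and `hdfs dfs -put -f` command lines for one day's CSV files; a cron job or an Oozie shell action would be what actually runs them.

```python
from datetime import date
from pathlib import Path, PurePosixPath
import shlex

# Hypothetical locations; neither path comes from the thread.
LOCAL_ROOT = Path("/data/informatica/export")
HDFS_ROOT = PurePosixPath("/landing/rdbms")


def plan_ingest(local_root, hdfs_root, day):
    """Build the hdfs commands that copy one day's CSV files into HDFS.

    Assumes one local directory per day named YYYY-MM-DD (an assumption,
    not something Informatica guarantees). Returns an empty list when the
    directory is missing or holds no CSV files.
    """
    day_dir = day.isoformat()                      # e.g. "2016-11-09"
    src = Path(local_root) / day_dir               # local flat-file directory
    dst = PurePosixPath(hdfs_root) / day_dir       # matching HDFS directory
    if not src.is_dir():
        return []
    files = sorted(str(p) for p in src.glob("*.csv"))
    if not files:
        return []
    return [
        # Create the daily target directory (no error if it already exists).
        ["hdfs", "dfs", "-mkdir", "-p", str(dst)],
        # Copy the files; -f overwrites copies left by an earlier run.
        ["hdfs", "dfs", "-put", "-f", *files, str(dst)],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in plan_ingest(LOCAL_ROOT, HDFS_ROOT, date.today()):
        print(shlex.join(cmd))
```

To execute rather than print, each command list could be handed to subprocess.run(cmd, check=True) on the edge node. Listing the files explicitly avoids relying on glob expansion, and an empty plan means there is nothing to load for that day.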

> On 9 Nov 2016, at 14:56, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
> 
> Hi,
> 
> I am exploring a flexible way of importing multiple RDBMS tables into HDFS using the Informatica installation that the customer already has.
> 
> I don't want to use connectivity tools from Informatica to Hive etc.
> 
> So this is what I have in mind
> 
> 1. If possible, get the table data out using Informatica and use the Informatica UI to convert the RDBMS data into some form of CSV or TSV file. (Can Informatica do it? I guess yes.)
> 2. Put the flat files on an edge node where HDFS can see them.
> 3. Assuming that Informatica can create a directory daily, periodically run a cron job that ingests the data from those directories into equivalent daily directories in HDFS.
> 4. Once the data is in HDFS, one can use Spark CSV, Hive, etc. to query it.
> 
> The problem I have is to see if someone has done such a thing before. Specifically, can Informatica create target flat files in normal directories?
> 
> Any other generic alternative?
> 
> Thanks
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
> 
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
>  
