spark-user mailing list archives

From Mich Talebzadeh <>
Subject importing data into hdfs/spark using Informatica ETL tool
Date Wed, 09 Nov 2016 13:56:37 GMT

I am exploring options for importing into HDFS multiple RDBMS tables that a
customer maintains, using Informatica.

I don't want to use Informatica's connectivity tools to Hive etc.

So this is what I have in mind:

   1. If possible, extract the table data using Informatica and use the
   Informatica UI to convert the RDBMS data into some form of CSV or TSV file
   (can Informatica do this? I assume yes).
   2. Put the flat files on an edge node where HDFS can see them.
   3. Assuming that Informatica can create a directory daily, periodically
   run a cron job that ingests the data from those directories into
   equivalent daily directories in HDFS.
   4. Once the data is in HDFS, one can query it with Spark CSV, Hive, etc.
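As a sketch of steps 2–4, assuming Informatica lands its CSV extracts in a flat
directory on the edge node: the script below (all paths and the
`ingest_date=` naming convention are hypothetical) builds the daily target
directory name and stages the files. Here the local filesystem stands in for
HDFS; in production the copy would be replaced by `hdfs dfs -mkdir -p` and
`hdfs dfs -put`, driven from cron.

```python
import shutil
from datetime import date
from pathlib import Path


def daily_ingest(landing_dir, warehouse_dir, run_date=None):
    """Stage CSV extracts from the edge landing directory into a
    date-partitioned target directory (local stand-in for HDFS).

    In production the mkdir/copy below would be, e.g.:
        hdfs dfs -mkdir -p <warehouse>/ingest_date=<yyyy-mm-dd>
        hdfs dfs -put <file> <warehouse>/ingest_date=<yyyy-mm-dd>/
    """
    run_date = run_date or date.today()
    # Daily directory convention, e.g. .../ingest_date=2016-11-09
    target = Path(warehouse_dir) / "ingest_date={}".format(run_date.isoformat())
    target.mkdir(parents=True, exist_ok=True)
    for csv_file in sorted(Path(landing_dir).glob("*.csv")):
        shutil.copy2(csv_file, target / csv_file.name)
    return target
```

For step 4, once the daily directories are in HDFS, the data can be read with
Spark's CSV support, e.g.
`spark.read.option("header", "true").csv("hdfs:///.../ingest_date=2016-11-09")`
in Spark 2.x (or the spark-csv package in 1.x), or exposed as an external Hive
table partitioned on the date.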

What I would like to know is whether someone has done such a thing before.
Specifically, can Informatica create target flat files in normal directories?

Are there any other generic alternatives?


Dr Mich Talebzadeh

LinkedIn
Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.
