spark-user mailing list archives

From Tapan Upadhyay <>
Subject Re: migration from Teradata to Spark SQL
Date Wed, 04 May 2016 15:19:12 GMT
Thank you, everyone, for the guidance.

*Jorn*, our motivation is to move the bulk of our ad hoc queries to Hadoop so
that we have enough bandwidth on our database for important batch queries.

For implementing a lambda architecture, is it possible to get real-time
updates from Teradata for any insert/update/delete, e.g. from the DB logs?

*Deepak*, should we query the data from Cassandra using Spark? How would that
differ in performance from storing the data in Hive tables (Parquet) and
querying them with Spark? If there is not much performance gain, why add one
more layer of processing?

*Mich*, we plan to sync the data using hourly/end-of-day Sqoop jobs; we have
not yet decided how frequently, as it will depend on user requirements. If
users need real-time data, we will need to think of an alternative. How are
you doing the same for Sybase? How do you sync in real time?
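For reference, an hourly incremental pull of this shape might look like the following Sqoop invocation; the host, database, table, column names, and credentials are all placeholders, and `--incremental lastmodified` assumes the source table carries an update timestamp column.

```shell
# Hypothetical incremental Sqoop import from Teradata into a Parquet
# staging directory; all identifiers below are examples only.
sqoop import \
  --connect jdbc:teradata://td-host/DATABASE=sales \
  --driver com.teradata.jdbc.TeraDriver \
  --username etl_user -P \
  --table ORDERS \
  --incremental lastmodified \
  --check-column last_upd_ts \
  --last-value "2016-05-04 00:00:00" \
  --target-dir /user/etl/orders_staging \
  --as-parquetfile \
  --num-mappers 4
```

Sqoop records the new `--last-value` after each run (or manages it for you if the command is saved as a Sqoop job), which is what makes the hourly cadence incremental rather than a full re-copy.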

Thank you!!

Tapan Upadhyay
+1 973 652 8757

On Wed, May 4, 2016 at 4:33 AM, Alonso Isidoro Roman <> wrote:

> I agree with Deepak, and I would try saving the data in both Parquet and
> Avro format. If you can, measure the performance of each and choose the
> better one; it will probably be Parquet, but you should verify that yourself.
> Alonso Isidoro Roman.
> My favorite quotes (today):
> "If debugging is the process of removing software bugs, then programming
> must be the process of putting them in..."
>   - Edsger Dijkstra
> "If you pay peanuts you get monkeys"
> 2016-05-04 9:22 GMT+02:00 Jörn Franke <>:
>> Look at lambda architecture.
>> What is the motivation of your migration?
>> On 04 May 2016, at 03:29, Tapan Upadhyay <> wrote:
>> Hi,
>> We are planning to move our ad hoc queries from Teradata to Spark. We have
>> a huge volume of queries during the day. What is the best way to go about it?
>> 1) Read data directly from the Teradata DB using Spark JDBC.
>> 2) Import data into Hive tables stored as Parquet via end-of-day Sqoop jobs,
>> then run queries on the Hive tables using Spark SQL or the Spark Hive context.
>> Are there any other ways we could do this better or more efficiently?
>> Please guide.
>> Regards,
>> Tapan
