spark-user mailing list archives

From Petar Zecevic <petar.zece...@gmail.com>
Subject Re: Is spark suitable for real time query
Date Tue, 28 Jul 2015 15:11:36 GMT

You can try out a few tricks employed by folks at Lynx Analytics... 
Daniel Darabos gave some details at Spark Summit:
https://www.youtube.com/watch?v=zt1LdVj76LU&index=13&list=PL-x35fyliRwhP52fwDqULJLOnqnrN5nDs
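
I haven't checked exactly which tricks the talk covers, but one pattern that usually helps with repeated queries like the ones quoted below is to keep a single long-lived SQLContext and cache the small tables once, so later queries read from memory instead of re-fetching over JDBC every time. A rough sketch against the Spark 1.x DataFrame API, reusing the sqlContext and the t11/t12/t21/t22 DataFrames from the quoted code:

    // Rough sketch (Spark 1.x DataFrame API); reuses sqlContext and the
    // t11/t12/t21/t22 DataFrames from the quoted code below.
    DataFrame t1 = t11.unionAll(t12);
    DataFrame t2 = t21.unionAll(t22);
    t1.cache();
    t2.cache();
    t1.registerTempTable("t1");
    t2.registerTempTable("t2");
    t1.count();   // force both caches to be populated up front
    t2.count();

    // Later queries against "t1" and "t2" should then skip the JDBC round trip.
    sqlContext.sql("select txt from t1 join t2 on t1.id = t2.id").show();

Whether that is one of the tricks from the talk I can't say, but when the data fits in memory it tends to remove most of the per-query cost.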


On 22.7.2015. 17:00, Louis Hust wrote:
> My code like below:
>             Map<String, String> t11opt = new HashMap<String, String>();
>             t11opt.put("url", DB_URL);
>             t11opt.put("dbtable", "t11");
>             DataFrame t11 = sqlContext.load("jdbc", t11opt);
>             t11.registerTempTable("t11");
>
>             .......the same for t12, t21, t22
>
>
>             DataFrame t1 = t11.unionAll(t12);
>             t1.registerTempTable("t1");
>             DataFrame t2 = t21.unionAll(t22);
>             t2.registerTempTable("t2");
>             for (int i = 0; i < 10; i ++) {
>                 System.out.println(new Date(System.currentTimeMillis()));
>                 DataFrame crossjoin = sqlContext.sql("select txt from t1 join t2 on t1.id = t2.id");
>                 crossjoin.show();
>                 System.out.println(new Date(System.currentTimeMillis()));
>             }
>
> Here t11, t12, t21 and t22 are all DataFrames loaded over JDBC from a
> MySQL database running on the same machine as the Spark job.
>
> But each loop iteration takes about 3 seconds, and I do not understand
> why it costs so much time.
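
A quick way to narrow down where those ~3 seconds go (a rough diagnostic sketch, not from the original message; it assumes the sqlContext and the "t1"/"t2" temp tables registered in the code above) is to build the DataFrame once, print its physical plan, and time only the action:

    // Rough diagnostic sketch (assumes the sqlContext and the "t1"/"t2"
    // temp tables registered in the quoted code above).
    DataFrame joined = sqlContext.sql("select txt from t1 join t2 on t1.id = t2.id");
    joined.explain(true);   // the plan shows scans of the JDBC relations unless they are cached

    long start = System.currentTimeMillis();
    joined.show();          // time the action itself
    long end = System.currentTimeMillis();
    System.out.println("show() took " + (end - start) + " ms");

If the plan still contains JDBC scans, each iteration is going back to MySQL, which would account for much of the per-loop time.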
>
>
>
>
> 2015-07-22 19:52 GMT+08:00 Robin East <robin.east@xense.co.uk>:
>
>     Here’s an example using spark-shell on my laptop:
>
>     sc.textFile("LICENSE").filter(_ contains "Spark").count
>
>     This takes less than a second the first time I run it and is
>     instantaneous on every subsequent run.
>
>     What code are you running?
>
>
>>     On 22 Jul 2015, at 12:34, Louis Hust <louis.hust@gmail.com> wrote:
>>
>>     I ran a simple test using Spark in standalone mode (not on a cluster)
>>     and found that a simple action takes a few seconds, even though the
>>     data size is small, just a few rows.
>>     So does each Spark job spend some time on initialization or preparation
>>     work, no matter what the job is?
>>     I mean, does the basic framework of a Spark job cost seconds?
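
One way to put a number on that baseline (a hypothetical micro-test, not something from the thread) is to time a tiny action on a context that is already running, so that JVM and context start-up are excluded and only the per-job scheduling overhead is measured:

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    // Hypothetical micro-test: time a trivial action on a warm context.
    public class OverheadTest {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("overhead-test");
            JavaSparkContext sc = new JavaSparkContext(conf);

            long start = System.currentTimeMillis();
            long n = sc.parallelize(Arrays.asList(1, 2, 3, 4)).count();  // tiny job
            long end = System.currentTimeMillis();
            System.out.println("count=" + n + " took " + (end - start) + " ms");

            sc.stop();
        }
    }

On a warm local context a job like this usually finishes in well under a second, so if the JDBC query still takes seconds, the time is more likely going into the query itself than into fixed per-job overhead.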
>>
>>     2015-07-22 19:17 GMT+08:00 Robin East <robin.east@xense.co.uk>:
>>
>>         Real-time is, of course, relative, but you’ve mentioned the
>>         microsecond level. Spark is designed to process large amounts
>>         of data in a distributed fashion. No distributed system I
>>         know of could give any kind of guarantee at the microsecond
>>         level.
>>
>>         Robin
>>
>>         > On 22 Jul 2015, at 11:14, Louis Hust <louis.hust@gmail.com> wrote:
>>         >
>>         > Hi, all
>>         >
>>         > I am using a Spark jar in standalone mode to fetch data from
>>         > different MySQL instances and run some actions on it, but I
>>         > found that the time taken is at the level of seconds.
>>         >
>>         > So I want to know whether a Spark job is suitable for real-time
>>         > queries that need microsecond latency.
>>
>>
>
>

