spark-user mailing list archives

From Muthu Jayakumar <bablo...@gmail.com>
Subject Re: Fast write datastore...
Date Wed, 15 Mar 2017 14:28:30 GMT
Hello Vincent,

Cassandra may not fit the bill if I need to define my partitioning and other
indexes upfront. Is that right?

Hello Richard,

Let me re-evaluate Apache Ignite. I evaluated it three months ago, and at
that time its connector to Apache Spark did not support Spark 2.0.

Another drastic thought might be to repartition the result down to 1
partition (while being cautious not to run into heap issues if the result is
too large to fit into an executor) and write to a relational database like
MySQL / PostgreSQL. But I believe I can do the same using ElasticSearch too.
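
As a rough illustration of the single-partition idea above, here is a minimal PySpark-style sketch. The function name `write_small_result`, the `max_rows` guard, and the connection parameters are hypothetical and not from this thread. The row-count check is only a crude stand-in for the heap concern, since a row count does not bound memory use precisely.

```python
def write_small_result(df, jdbc_url, table, user, password, max_rows=1_000_000):
    """Write an aggregated Spark DataFrame to a relational table via one task.

    `df` is assumed to be a pyspark.sql.DataFrame, and the JDBC driver for the
    target database is assumed to be on the Spark classpath.
    """
    # Crude guard: refuse to funnel a large result through a single executor.
    row_count = df.count()
    if row_count > max_rows:
        raise ValueError(
            "result has %d rows, above the max_rows=%d guard" % (row_count, max_rows)
        )

    # coalesce(1) merges the partitions without a full shuffle, so a single
    # task (and a single database connection) performs the write.
    (
        df.coalesce(1)
        .write.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", table)
        .option("user", user)
        .option("password", password)
        .mode("append")
        .save()
    )
    return row_count
```

A call might look like `write_small_result(agg_df, "jdbc:postgresql://dbhost:5432/reports", "daily_agg", "spark", "secret")`, where every value is a placeholder. One caveat: a drastic `coalesce(1)` can also pull the upstream aggregation into fewer tasks, so `repartition(1)` may be the better choice if the aggregation itself should stay parallel.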

A slightly overkill solution might be Spark to Kafka to ElasticSearch?
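
For the Spark-to-Kafka leg of such a pipeline, a minimal sketch follows. It assumes a Spark version whose Kafka integration supports batch writes (as far as I know this arrived in Spark 2.2, so after this message was written) and the `spark-sql-kafka-0-10` package on the classpath. The function name, broker address, and topic are hypothetical, and the Kafka-to-ElasticSearch leg is not shown.

```python
def publish_to_kafka(df, bootstrap_servers, topic):
    """Publish each row of a Spark DataFrame to a Kafka topic as a JSON string.

    `df` is assumed to be a pyspark.sql.DataFrame.
    """
    # The Kafka sink expects a 'value' column (string or binary), so each
    # row is serialized into a single JSON document.
    payload = df.selectExpr("to_json(struct(*)) AS value")

    (
        payload.write.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("topic", topic)
        .save()
    )
```

Kafka would then act as a buffer between the many Spark executors and ElasticSearch, so the indexing rate can be controlled on the consumer side rather than by the number of executors.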

More thoughts welcome please.

Thanks,
Muthu

On Wed, Mar 15, 2017 at 4:53 AM, Richard Siebeling <rsiebeling@gmail.com>
wrote:

> Maybe Apache Ignite fits your requirements.
>
> On 15 March 2017 at 08:44, vincent gromakowski <
> vincent.gromakowski@gmail.com> wrote:
>
>> Hi
>> If the queries are static and the filters are on the same columns,
>> Cassandra is a good option.
>>
>> On 15 March 2017 at 7:04 AM, "muthu" <babloo80@gmail.com> wrote:
>>
>> Hello there,
>>
>> I have one or more Parquet files to read and run some aggregate queries on
>> using Spark DataFrames. I would like to find a reasonably fast datastore
>> that lets me write the results for subsequent (simpler) queries.
>> I tried using ElasticSearch to store the query results via the
>> ElasticSearch Hadoop connector, but I run into connector write issues
>> when there are too many Spark executors for ElasticSearch to handle.
>> In terms of schema, though, it seems a great fit, since ElasticSearch has
>> the smarts to discover the schema. In terms of querying, I can perform
>> simple filters and sorts using ElasticSearch, and for more complex
>> aggregations the Spark DataFrame can come back to the rescue :).
>> Could you please advise on other possible datastores I could use?
>>
>> Thanks,
>> Muthu
>
