spark-user mailing list archives

From Shiva Ramagopal <tr.s...@gmail.com>
Subject Re: Fast write datastore...
Date Wed, 15 Mar 2017 14:50:26 GMT
Cassandra is probably a good choice if you are mainly looking for a
datastore that supports fast writes. You can ingest the data into a table
and define one or more materialized views on top of it to support your
queries. Since you mention that your queries are going to be simple, you can
define the primary key of each materialized view according to how you want
to query the data.
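As a minimal sketch of that layout, the Python below only builds the CQL strings for one base table and one materialized view; it does not connect to a cluster. The keyspace, table, view, and column names are hypothetical and are not taken from this thread.

```python
# Sketch only: every keyspace, table, view, and column name here is hypothetical.


def base_table_cql(keyspace, table, columns, primary_key):
    """Build a CREATE TABLE statement; `columns` maps column name -> CQL type."""
    cols = ", ".join(f"{name} {cql_type}" for name, cql_type in columns.items())
    pk = ", ".join(primary_key)
    return f"CREATE TABLE {keyspace}.{table} ({cols}, PRIMARY KEY ({pk}));"


def materialized_view_cql(keyspace, view, table, base_primary_key, query_column):
    """Build a materialized view that can be queried by `query_column`.

    The view's primary key has to include every primary key column of the
    base table, and each view key column is restricted with IS NOT NULL.
    """
    # Put the column we want to filter on first so it becomes the partition key.
    view_key = [query_column] + [c for c in base_primary_key if c != query_column]
    not_null = " AND ".join(f"{c} IS NOT NULL" for c in view_key)
    return (
        f"CREATE MATERIALIZED VIEW {keyspace}.{view} AS "
        f"SELECT * FROM {keyspace}.{table} "
        f"WHERE {not_null} "
        f"PRIMARY KEY ({', '.join(view_key)});"
    )


columns = {"run_id": "uuid", "customer": "text", "total": "double"}
table_stmt = base_table_cql("results", "aggregates", columns, ["run_id"])
view_stmt = materialized_view_cql(
    "results", "aggregates_by_customer", "aggregates", ["run_id"], "customer"
)
print(table_stmt)
print(view_stmt)
```

Writes go to the base table only; the view is then the thing to query when filtering on the hypothetical `customer` column.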

Thanks,
Shiva



On Wed, Mar 15, 2017 at 7:58 PM, Muthu Jayakumar <babloo80@gmail.com> wrote:

> Hello Vincent,
>
> Cassandra may not fit the bill if I need to define my partition key and
> other indexes upfront. Is this right?
>
> Hello Richard,
>
> Let me evaluate Apache Ignite. I did evaluate it 3 months ago, and back
> then the connector to Apache Spark did not support Spark 2.0.
>
> Another drastic thought may be to repartition the result to a single
> partition (while being cautious not to run into heap issues if the result
> is too large to fit into one executor) and write to a relational database
> like MySQL / PostgreSQL. But I believe I can do the same using
> ElasticSearch too.
>
> A slightly overkill solution may be Spark to Kafka to ElasticSearch?
>
> More thoughts welcome please.
>
> Thanks,
> Muthu
>
> On Wed, Mar 15, 2017 at 4:53 AM, Richard Siebeling <rsiebeling@gmail.com>
> wrote:
>
>> Maybe Apache Ignite fits your requirements.
>>
>> On 15 March 2017 at 08:44, vincent gromakowski <
>> vincent.gromakowski@gmail.com> wrote:
>>
>>> Hi
>>> If the queries are static and the filters are on the same columns,
>>> Cassandra is a good option.
>>>
>>> On 15 March 2017 at 7:04 AM, "muthu" <babloo80@gmail.com> wrote:
>>>
>>> Hello there,
>>>
>>> I have one or more Parquet files to read and perform some aggregate
>>> queries on using Spark DataFrame. I would like to find a reasonably fast
>>> datastore that allows me to write the results for subsequent (simpler)
>>> queries.
>>> I did attempt to write the query results to ElasticSearch using the
>>> ElasticSearch Hadoop connector, but I am running into connector write
>>> issues if there are too many Spark executors for ElasticSearch to handle.
>>> In the schema sense, though, this seems a great fit, as ElasticSearch has
>>> smarts in place to discover the schema. Also, in the query sense, I can
>>> perform simple filters and sorts using ElasticSearch, and for more complex
>>> aggregates, Spark DataFrame can come back to the rescue :).
>>> Please advise on other possible datastores I could use.
>>>
>>> Thanks,
>>> Muthu
>>>
>>>
>>>
>>>
>>>
>>>
>>
>
