storm-user mailing list archives

From Irek Khasyanov <qua...@gmail.com>
Subject Re: storm-rdbms consume data from kafka spout fast enough?
Date Wed, 10 Dec 2014 19:30:28 GMT
I can't help with the Storm UI issue; the problem could be many things.

>Again back to batch mode, when you do the batch copy, my assumption is:
>accumulate tuples in a byte array[], copy/multi-insert into the DB, clear the
>array and reload ...... Is that the way, or is there an existing API I can use?
You can use a LinkedBlockingQueue<Tuple> and store the tuples themselves, not bytes.
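
For illustration, a writer bolt could drain the queue into a batch roughly like
this (a minimal, self-contained sketch: plain Strings stand in for Storm's
Tuple type, and the batch size is a made-up number):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

// Micro-batching sketch. In a real Storm bolt the queue would hold
// backtype.storm.tuple.Tuple objects fed from execute(); Strings stand in
// here so the example compiles on its own. BATCH_SIZE is illustrative only.
public class BatchBuffer {
    static final int BATCH_SIZE = 5;
    final LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();

    // Called once per incoming tuple: enqueue, and flush when full.
    List<String> add(String tuple) {
        queue.offer(tuple);
        if (queue.size() >= BATCH_SIZE) {
            List<String> batch = new ArrayList<>();
            queue.drainTo(batch);   // atomically empties the queue
            return batch;           // caller multi-inserts this, then acks
        }
        return null;                // not enough tuples accumulated yet
    }

    public static void main(String[] args) {
        BatchBuffer buf = new BatchBuffer();
        List<String> batch = null;
        for (int i = 0; i < 5; i++) {
            batch = buf.add("row-" + i);
        }
        System.out.println(batch.size()); // prints 5
    }
}
```

In a real bolt you would also flush on a timer, so a slow trickle of tuples is
not held forever waiting for the batch to fill.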

Good examples are here: https://github.com/hmsonline/storm-cassandra and
http://hortonworks.com/blog/apache-storm-design-pattern-micro-batching/
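
On the database side, the "multi-insert" for a batch can be a single INSERT
statement with many VALUES groups, executed through a PreparedStatement. A
sketch of building such a statement (the table and column names here are made
up for illustration):

```java
public class MultiInsert {
    // Build "INSERT INTO events (payload) VALUES (?),(?),..." for one batch.
    // In a real bolt you would prepare this once per batch size and bind each
    // tuple's value with PreparedStatement.setString(...) before executing.
    static String buildSql(int batchSize) {
        StringBuilder sb = new StringBuilder("INSERT INTO events (payload) VALUES ");
        for (int i = 0; i < batchSize; i++) {
            sb.append(i == 0 ? "(?)" : ",(?)");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildSql(3));
        // prints: INSERT INTO events (payload) VALUES (?),(?),(?)
    }
}
```

For very large batches into PostgreSQL, COPY is usually faster still, but a
multi-row INSERT is already a big improvement over one row per statement.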

On 10 December 2014 at 03:13, Sa Li <sa.in.vanc@gmail.com> wrote:

> Hi, Irek
>
> What you have done is exactly what I want. I was running my topology in
> LocalCluster mode, but now I submitted it to the Storm cluster:
>
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in
> [jar:file:/etc/apache-storm-0.9.3/lib/logback-classic-1.0.13.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
> [jar:file:/home/stuser/backup/pof.analytics.messaging/kafka-storm-ingress/target/kafka-storm-ingress-0.0.1-SNAPSHOT-jar-with-dependencies.jar!/org/slf4j/impl/Static
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
> SLF4J: Actual binding is of type
> [ch.qos.logback.classic.util.ContextSelectorStaticBinder]
> DB connected .....
> 531  [main] INFO  backtype.storm.StormSubmitter - Jar not uploaded to
> master yet. Submitting jar...
> 542  [main] INFO  backtype.storm.StormSubmitter - Uploading topology jar
> target/kafka-storm-ingress-0.0.1-SNAPSHOT-jar-with-dependencies.jar to
> assigned location: /app/storm/nimbus/inbox/s
> r
> 739  [main] INFO  backtype.storm.StormSubmitter - Successfully uploaded
> topology jar to assigned location:
> /app/storm/nimbus/inbox/stormjar-f3b2a8bd-0d16-4ba5-9d94-51b3ecf53e5b.jar
> 740  [main] INFO  backtype.storm.StormSubmitter - Submitting topology 2 in
> distributed mode with conf
> {"topology.max.task.parallelism":5,"nimbus.host":"10.100.70.128","topology.workers":2,
>
> ":6627,"storm.zookeeper.servers":["10.100.70.128"],"topology.trident.batch.emit.interval.millis":2000}
> 842  [main] INFO  backtype.storm.StormSubmitter - Finished submitting
> topology: 2
>
>
> but I find nothing shown in the UI; this is one issue. Again back to batch
> mode: when you do the batch copy, my assumption is to accumulate tuples in
> a byte array[], copy/multi-insert into the DB, clear the array and reload
> ...... Is that the way, or is there an existing API I can use?
>
> thanks
>
> Alec
>
> On Tue, Dec 9, 2014 at 2:04 PM, Irek Khasyanov <quard8@gmail.com> wrote:
>
>> >Do I need to make bulk copy?
>>
>> It depends. If your topology fails, the Kafka spout will start reading from
>> the last known offset, and you may have too much data to write; inserting
>> one row at a time can be a bottleneck.
>>
>> You can actually test it: stop the topology, write around 10000 +/- messages
>> to Kafka, and start the topology again. In the Storm UI you will see the
>> capacity for the writer bolt; if it is colored red and over 1.0, that is
>> your bottleneck.
>>
>> We have a Kafka to HP Vertica stream. Vertica doesn't like single-row
>> inserts, so we added batches of 10K rows. With 4 workers everything looks
>> great.
>>
>>
>>
>> On 10 December 2014 at 00:34, Sa Li <sa.in.vanc@gmail.com> wrote:
>>
>>> Hello, all
>>>
>>> I have a question here. As I posted in several threads before, I am using
>>> storm-rdbms to write into a PostgreSQL DB; the data is collected from a
>>> KafkaSpout, and it works. Since it inserts into the DB once per tuple (one
>>> row per insert operation), I am concerned whether this type of consuming
>>> is fast enough, or whether it will add too much overhead.
>>>
>>> Do I need to do a bulk copy?
>>>
>>>
>>> thanks
>>>
>>>
>>> Alec
>>>
>>
>>
>>
>> --
>> With best regards, Irek Khasyanov.
>>
>
>


-- 
With best regards, Irek Khasyanov.
