storm-user mailing list archives

From "Daniela S" <daniela_4...@gmx.at>
Subject Aw: Re: Re: Re: Pull from Redis
Date Thu, 02 Jun 2016 08:37:19 GMT
<html><head></head><body><div style="font-family: Verdana;font-size:
12.0px;"><div>
<div>Assuming I use a Lua script in Redis to calculate my values, how can
I ensure that this step is repeated every minute? And how can I ensure that Storm
always receives the latest values from the last minute from Redis? Do I need a Redis spout,
or do I have to use a bolt with a tick tuple or a time window?</div>

<div>&nbsp;</div>

<div>Thank you in advance!</div>

<div>&nbsp;
<div name="quote" style="margin:10px 5px 5px 10px; padding: 10px 0 10px 10px; border-left:2px
solid #C3D9E5; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
<div style="margin:0 0 10px 0;"><b>Sent:</b>&nbsp;Tuesday, 31
May 2016 at 01:11<br/>
<b>From:</b>&nbsp;&quot;Yuri Kostin&quot; &lt;kostine@gmail.com&gt;<br/>
<b>To:</b>&nbsp;user@storm.apache.org<br/>
<b>Subject:</b>&nbsp;Re: Re: Re: Pull from Redis</div>

<div name="quoted-content">
<div>In theory, if you could store the profile together with the program start time, you could
calculate the value you need in whatever database you are storing it in. In Redis you can do it
with a Lua script, for example. My guess is that there might be a point when iteration and
calculation take longer than a minute, so the result is out of date as soon as it finishes.
Eliminating the profile lookup during calculation might help with the processing time. If the
minute values are predictable, e.g. if minute 1 is 2 and minute 2 is 4, then minute 60 is 120,
you only need to store a &ldquo;formula&rdquo; or pattern for this calculation and not an array of
all possible values. If multiple programs share this formula, you could preload the formulas into
a lookup and use a &ldquo;type&rdquo; to look them up, etc. It&rsquo;s a tricky one.<br/>
If you use a Redis list, it would be more difficult to remove an item after you receive the end
time. Essentially you would need to keep track of both active and recently ended programs,
check during calculation, and either remove the program from the list or leave it there. This could
be done by creating an empty Redis key &ldquo;program_id_ended&rdquo; or something
similar; then you can check whether it exists while you are iterating, remove both the
value and this ended-flag key, and keep going.<br/>
You can also create a structure in Storm (a hash, etc.), populate it with your programs and
profiles, and calculate this sum on the system tick tuple. I don&rsquo;t know what kind of
performance and memory requirements you will get if you store millions of items, but you should
be able to scale it across many servers. The durability of this approach is also not the same as
Redis: if the topology goes down, this store will have to be rebuilt from somewhere.<br/>
This is a pretty simple map/reduce process; I am just not sure Redis is the best tool for the job.
Maybe multiple Redis servers could share the load of key iteration if it becomes the bottleneck.
I would try Redis with either a million items in a list or a million keys, then use Lua to do
your calculation and return its sum, and store the profiles in the same JSON payload. This would
be a benchmark for the &ldquo;ideal&rdquo; situation using Redis. It should be fairly easy to
populate a test Redis db with some data using a script in any language.<br/>
&nbsp;
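<div>The ended-flag idea above can be sketched roughly as follows. This is only an illustration in Python: a plain list and set stand in for the Redis list and the empty flag keys, and the names sweep_and_sum, active, ended_flags and value_of are hypothetical, not part of Redis or Storm.</div>

```python
def sweep_and_sum(active, ended_flags, value_of):
    """Drop programs that carry an ended flag and sum the values of the rest.

    active:      list of program ids (stands in for a Redis list)
    ended_flags: set of flag keys such as "X_ended" (stands in for empty Redis keys)
    value_of:    callable returning the current value for a program id
    """
    still_running = []
    total = 0
    for program_id in active:
        flag = f"{program_id}_ended"
        if flag in ended_flags:
            # Remove the flag together with the list entry, then keep going.
            ended_flags.discard(flag)
            continue
        still_running.append(program_id)
        total += value_of(program_id)
    active[:] = still_running  # the list now holds only running programs
    return total


# Hypothetical data: program Y ended since the last pass.
active = ["X", "Y", "Z"]
ended_flags = {"Y_ended"}
values = {"X": 11, "Y": 4, "Z": 20}

total = sweep_and_sum(active, ended_flags, values.get)
print(total, active)
```

<div>Against a real Redis instance each membership check and removal would be a separate command, so the cost of the pass grows with the number of entries.</div>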
<div>
<blockquote>
<div>On May 30, 2016, at 4:24 PM, Daniela S &lt;<a href="daniela_4444@gmx.at"
target="_parent">daniela_4444@gmx.at</a>&gt; wrote:</div>
&nbsp;

<div>
<div>
<div style="font-family: Verdana;font-size: 12.0px;">
<div>
<div>No problem, I am so glad that you help me! Thank you!</div>

<div>&nbsp;</div>

<div>No, unfortunately this is not possible. I only receive events containing the program,
the timestamp and the command &quot;start&quot; or &quot;end&quot;. I could
join the &quot;profile&quot; with each event, but I am not sure whether this makes sense,
as I still have to repeat my calculation every minute and store my active programs somewhere.
Otherwise I do not know which programs have already ended.</div>

<div>This differs by program, but I would say about 100 to 150&nbsp;minute
values per program. There could be millions of programs running at the same time.</div>

<div>&nbsp;</div>

<div>Regards,</div>

<div>Daniela</div>

<div>&nbsp;
<div style="margin: 10.0px 5.0px 5.0px 10.0px;padding: 10.0px 0 10.0px 10.0px;border-left:
2.0px solid rgb(195,217,229);">
<div style="margin: 0 0 10.0px 0;"><b>Sent:</b>&nbsp;Monday, 30
May 2016 at 23:17<br/>
<b>From:</b>&nbsp;&quot;Yuri Kostin&quot; &lt;<a href="kostine@gmail.com"
target="_parent">kostine@gmail.com</a>&gt;<br/>
<b>To:</b>&nbsp;<a href="user@storm.apache.org" target="_parent">user@storm.apache.org</a><br/>
<b>Subject:</b>&nbsp;Re: Re: Re: Re: Pull from Redis</div>

<div>
<div>Is it possible to store these values with the original JSON payload? How many minute
values are there? How many programs could be running at the same time?
<div>I am sorry about all the questions; there are just so many ways this can be approached,
and every detail could make a difference.<br/>
&nbsp;
<div>
<blockquote>
<div>On May 30, 2016, at 4:14 PM, Daniela S &lt;<a>daniela_4444@gmx.at</a>&gt;
wrote:</div>
&nbsp;

<div>
<div>
<div style="font-family: Verdana;font-size: 12.0px;">
<div>
<div>No, unfortunately not. Each program has its own &quot;profile&quot; with
different values each minute.</div>

<div>&nbsp;
<div style="margin: 10.0px 5.0px 5.0px 10.0px;padding: 10.0px 0 10.0px 10.0px;border-left:
2.0px solid rgb(195,217,229);">
<div style="margin: 0 0 10.0px 0;"><b>Sent:</b>&nbsp;Monday, 30
May 2016 at 23:04<br/>
<b>From:</b>&nbsp;&quot;Yuri Kostine&quot; &lt;<a>kostine@gmail.com</a>&gt;<br/>
<b>To:</b>&nbsp;<a>user@storm.apache.org</a><br/>
<b>Subject:</b>&nbsp;Re: Aw: Re: Re: Re: Pull from Redis</div>

<div>
<div>
<div>&nbsp;</div>

<div>Makes sense. Do all programs have the same value at minute 3?</div>

<div><br/>
On May 30, 2016, at 3:55 PM, Daniela S &lt;<a>daniela_4444@gmx.at</a>&gt;
wrote:<br/>
&nbsp;</div>

<blockquote>
<div>
<div style="font-family: Verdana;font-size: 12.0px;">
<div>
<div>I will try to explain in a little more detail:</div>

<div>&nbsp;</div>

<div>For example, I receive a start event for program X. When program X is finished, I
will receive an end message for program X. As long as I have not received an end message for
a program, I assume that it is running and it should be stored in Redis.</div>

<div>Let&#39;s assume that program X has been started and I have not received an end
message yet. So I have to pull it from Redis and calculate how far along the program is at the
moment (current time - start time). With this value, let&#39;s assume it is minute 3,
I have to look up which value corresponds to minute 3. This is the value I need
for my sum.&nbsp;</div>

<div>I have to do this for every started program, and I have to rebuild the sum
every minute, because every program changes its&nbsp;value each minute as long as it has not
ended.</div>
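<div>A minimal sketch of this calculation, in Python with plain dictionaries standing in for Redis and the profile store. The names current_sum, active_starts and profiles are hypothetical, and two details are assumptions not stated in the thread: minutes are counted from zero, and a program that outlives its profile keeps its last value.</div>

```python
from datetime import datetime, timedelta


def current_sum(active_starts, profiles, now):
    """Sum the profile value each running program has reached at time `now`.

    active_starts: program id -> start time (programs with no end event yet)
    profiles:      program id -> list of per-minute values (index 0 = first minute)
    """
    total = 0
    for program_id, started in active_starts.items():
        # Elapsed whole minutes: current time - start time.
        minute = int((now - started).total_seconds() // 60)
        values = profiles[program_id]
        # Assumption: clamp to the profile range instead of failing.
        minute = max(0, min(minute, len(values) - 1))
        total += values[minute]
    return total


# Hypothetical data: X started 3 minutes ago, Y started 90 seconds ago.
now = datetime(2016, 5, 30, 12, 0, 0)
active_starts = {
    "X": now - timedelta(minutes=3),
    "Y": now - timedelta(minutes=1, seconds=30),
}
profiles = {
    "X": [5, 7, 9, 11, 13],
    "Y": [2, 4, 6],
}

total = current_sum(active_starts, profiles, now)
print(total)
```

<div>The loop visits every active program on each pass, so with millions of programs the run time of one pass is the number to measure against the one-minute budget.</div>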

<div>&nbsp;</div>

<div>Thank you and regards,</div>

<div>Daniela</div>

<div>&nbsp;
<div style="margin: 10.0px 5.0px 5.0px 10.0px;padding: 10.0px 0 10.0px 10.0px;border-left:
2.0px solid rgb(195,217,229);">
<div style="margin: 0 0 10.0px 0;"><b>Sent:</b>&nbsp;Monday, 30
May 2016 at 22:34<br/>
<b>From:</b>&nbsp;&quot;Yuri Kostine&quot; &lt;<a>kostine@gmail.com</a>&gt;<br/>
<b>To:</b>&nbsp;<a>user@storm.apache.org</a><br/>
<b>Subject:</b>&nbsp;Re: Aw: Re: Re: Pull from Redis</div>

<div>
<div>
<div>&nbsp;</div>

<div>Is the sum the amount of time all current programs have been running? How does
Storm/Redis know when a program is done and needs to be removed? For example, you get a
JSON payload with a start time and no end time. You push that into a Redis key or list. One minute
elapses (no other events have been written), and you look at that JSON and calculate the time in
seconds, etc. (time now - start time). Let&#39;s say it&#39;s 120; then you take 120 and do what
with it? And if there are 10 events, each returning 120, will that be 1200 &gt; calculation,
or do you have to calculate each event by itself and then sum the results because each event gets
its own unique multiplier?</div>

<div><br/>
On May 30, 2016, at 2:35 PM, Daniela S &lt;<a>daniela_4444@gmx.at</a>&gt;
wrote:<br/>
&nbsp;</div>

<blockquote>
<div>
<div style="font-family: Verdana;font-size: 12.0px;">
<div>Thank you for your support! I will try to explain what I would like to do:</div>

<div>&nbsp;</div>

<div>I am receiving JSON strings from Kafka. These JSON strings contain start and end
events of programs. I would like to use Redis as a cache to store all programs that have
started but not yet ended. As soon as a program has ended, it should be deleted from Redis.
I would like to build a sum over all programs stored in Redis, but I need another value to
build the sum. To get this value I have to calculate the difference between the current time
and the timestamp of each event stored in Redis. With this calculated value I would like to
look up the value I need to build the sum. This must be done for each stored entry, and it
should be&nbsp;repeated every time an entry has been added to or removed from Redis, or
otherwise every minute.</div>

<div>&nbsp;</div>

<div>How should such problems be solved within Storm? I thought about a kind of cache
like Redis.</div>

<div>&nbsp;</div>

<div>Thank you in advance.</div>

<div>&nbsp;</div>

<div>Regards,</div>

<div>Daniela</div>

<div>&nbsp;
<div style="margin: 10.0px 5.0px 5.0px 10.0px;padding: 10.0px 0 10.0px 10.0px;border-left:
2.0px solid rgb(195,217,229);">
<div style="margin: 0 0 10.0px 0;"><b>Sent:</b>&nbsp;Monday, 30
May 2016 at 21:16<br/>
<b>From:</b>&nbsp;&quot;Yuri Kostine&quot; &lt;<a>kostine@gmail.com</a>&gt;<br/>
<b>To:</b>&nbsp;<a>user@storm.apache.org</a><br/>
<b>Subject:</b>&nbsp;Re: Aw: Re: Pull from Redis</div>

<div>
<div>
<div>&nbsp;</div>

<div>It depends on the definition of slow and on the data stored, of course; my guess is that a
few million keys might take a minute? Pure guess. Redis is a key-value store: you give it a
key and you can perform an operation on its value. Iterating over all keys is the slowest
operation in Redis. I think it will also block all other operations while it is executing.
I know this is a Storm group and not a Redis group, and I am not sure there is a Storm solution
if Redis is your partial data storage. It&#39;s not a relational database, so it&#39;s not great
at joins, aggregations, etc. Just my 2c.<br/>
Time series aggregations in Redis are done with one key per interval. For example, a
2016-06-01 1:30pm event would execute a counter increment on the 2016 key, then 2016-06,
2016-06-01, etc., down to your smallest interval. Then to pull the count for
a day you would get one key only, 2016-06-01. This approach is fast because all operations are
key-value based, accessing only one key at a time.<br/>
Is there no way to pull the data you need before you store that key in Redis? You can use Redis
as your queue and process it once a minute with a topology, then create a new time-based queue
key and keep going. You would store your data a bit differently, though: instead of many keys,
you would have one key with an array of values. You keep pushing into it based on a timestamp;
then, when the interval lapses, you process it with Storm and pop those values out one at a time.
Look up the data you need, keep an aggregate, and keep going until the queue is empty.&nbsp;</div>
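<div>The one-key-per-interval counters described above could look roughly like this. It is a sketch in Python in which a Counter stands in for Redis (each increment would be one INCR on one key); the key format and the names interval_keys and record_event are assumptions made for illustration.</div>

```python
from collections import Counter
from datetime import datetime

# Stands in for Redis: each key would be the target of one INCR command.
counters = Counter()


def interval_keys(ts):
    """Keys for one event timestamp, from the coarsest to the finest interval."""
    return [
        ts.strftime("%Y"),
        ts.strftime("%Y-%m"),
        ts.strftime("%Y-%m-%d"),
        ts.strftime("%Y-%m-%d:%H"),
        ts.strftime("%Y-%m-%d:%H:%M"),
    ]


def record_event(ts):
    """Increment the counter of every interval that contains the event."""
    for key in interval_keys(ts):
        counters[key] += 1


# Hypothetical events: two on 2016-06-01, one on 2016-06-02.
record_event(datetime(2016, 6, 1, 13, 30))
record_event(datetime(2016, 6, 1, 13, 31))
record_event(datetime(2016, 6, 2, 9, 0))

# Reading the count for one day touches a single key, with no iteration.
day_count = counters["2016-06-01"]
print(day_count)
```

<div>Writes cost one increment per interval level, which is the price paid for making every read a single key lookup.</div>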

<div><br/>
On May 30, 2016, at 1:17 PM, Daniela S &lt;<a>daniela_4444@gmx.at</a>&gt;
wrote:<br/>
&nbsp;</div>

<blockquote>
<div>
<div style="font-family: Verdana;font-size: 12.0px;">
<div>
<div>I have to pull the entries and add a specific value to every entry. This value
is stored in another database, and therefore I would like to do the join, based on some conditions,&nbsp;in
Storm.&nbsp;I need this value to build the sum, as the entries do not contain any information
for the sum.&nbsp;</div>

<div>&nbsp;</div>

<div>What would be very few keys?&nbsp;</div>

<div>&nbsp;</div>

<div>Thank you and regards,</div>

<div>Daniela</div>

<div>&nbsp;
<div style="margin: 10.0px 5.0px 5.0px 10.0px;padding: 10.0px 0 10.0px 10.0px;border-left:
2.0px solid rgb(195,217,229);">
<div style="margin: 0 0 10.0px 0;"><b>Sent:</b>&nbsp;Monday, 30
May 2016 at 20:11<br/>
<b>From:</b>&nbsp;&quot;Yuri Kostine&quot; &lt;<a>kostine@gmail.com</a>&gt;<br/>
<b>To:</b>&nbsp;<a>user@storm.apache.org</a><br/>
<b>Subject:</b>&nbsp;Re: Pull from Redis</div>

<div>
<div>
<div>&nbsp;</div>

<div>Do you pull entries only to sum them up? Why not keep a running total in redis
in a time stamped key by minute? Generally speaking redis is not great for pulling all keys
unless there are very few keys.&nbsp;</div>

<div><br/>
On May 30, 2016, at 12:49 PM, Daniela S &lt;<a>daniela_4444@gmx.at</a>&gt;
wrote:<br/>
&nbsp;</div>

<blockquote>
<div>
<div style="font-family: Verdana;font-size: 12.0px;">
<div>Hi</div>

<div>&nbsp;</div>

<div>I have a topology that stores entries in Redis. Now I would like to pull all entries
from Redis every minute or as soon as a value has changed. How can I do that? Can I add another
bolt to my topology for this task or do I have to use a spout or even a new topology? I would
like to build a sum over all entries every minute. Do you have any advice for that?</div>

<div>&nbsp;</div>

<div>Thank you in advance.</div>

<div>&nbsp;</div>

<div>Regards,</div>

<div>Daniela</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</div></div></body></html>
