samza-dev mailing list archives

From Shekar Tippur <ctip...@gmail.com>
Subject Re: Samza as a Caching layer
Date Tue, 02 Sep 2014 22:24:04 GMT
Chris,

In the current state, I just want Samza to connect to 127.0.0.1.

I have set YARN_HOME to

$ echo $YARN_HOME

/home/ctippur/hello-samza/deploy/yarn

I still don't see anything on the Hadoop console.

Also, I see this during startup:


SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".

SLF4J: Defaulting to no-operation (NOP) logger implementation

SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".

SLF4J: Defaulting to no-operation (NOP) logger implementation

SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
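(This SLF4J warning usually means no SLF4J binding jar is on the classpath, so all logging is routed to a no-op logger. Assuming the job is built with Maven and logs through Log4j, a minimal sketch of the fix is to add the slf4j-log4j12 binding; the version below is illustrative and should match the slf4j-api already in use.)

```xml
<!-- Hypothetical pom.xml fragment: binds SLF4J to Log4j so log
     statements reach the configured appenders instead of the NOP
     logger. Match the version to your slf4j-api version. -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-log4j12</artifactId>
  <version>1.6.2</version>
</dependency>
```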

- Shekar


On Tue, Sep 2, 2014 at 2:20 PM, Chris Riccomini <
criccomini@linkedin.com.invalid> wrote:

> Hey Shekar,
>
> Ah ha. In that case, do you expect your SamzaContainer to try to connect
> to the RM at 127.0.0.1, or do you expect it to try to connect to some
> remote RM? If you expect it to try and connect to a remote RM, it's not
> doing that. Perhaps because YARN_HOME isn't set.
>
> If you go to your RM's web interface, how many active nodes do you see
> listed?
>
> Cheers,
> Chris
>
> On 9/2/14 2:17 PM, "Shekar Tippur" <ctippur@gmail.com> wrote:
>
> >Chris ..
> >
> >I am using a RHEL server to host all the components (YARN, Kafka, Samza).
> >I don't have ACLs open to Wikipedia.
> >I am following:
> >
> >http://samza.incubator.apache.org/learn/tutorials/latest/run-hello-samza-without-internet.html
> >
> >- Shekar
> >
> >
> >
> >On Tue, Sep 2, 2014 at 2:13 PM, Chris Riccomini <
> >criccomini@linkedin.com.invalid> wrote:
> >
> >> Hey Shekar,
> >>
> >> Can you try changing that to:
> >>
> >>   http://127.0.0.1:8088/cluster
> >>
> >>
> >> And see if you can connect?
> >>
> >> Cheers,
> >> Chris
> >>
> >> On 9/2/14 1:21 PM, "Shekar Tippur" <ctippur@gmail.com> wrote:
> >>
> >> >Other observation is ..
> >> >
> >> >http://10.132.62.185:8088/cluster shows that no applications are running.
> >> >
> >> >- Shekar
> >> >
> >> >
> >> >
> >> >
> >> >On Tue, Sep 2, 2014 at 1:15 PM, Shekar Tippur <ctippur@gmail.com> wrote:
> >> >
> >> >> YARN seems to be running:
> >> >>
> >> >> yarn      5462  0.0  2.0 1641296 161404 ?      Sl   Jun20  95:26
> >> >> /usr/java/jdk1.6.0_31/bin/java -Dproc_resourcemanager -Xmx1000m
> >> >> -Dhadoop.log.dir=/var/log/hadoop-yarn
> >> >>-Dyarn.log.dir=/var/log/hadoop-yarn
> >> >> -Dhadoop.log.file=yarn-yarn-resourcemanager-pppdc9prd2y2.log
> >> >> -Dyarn.log.file=yarn-yarn-resourcemanager-pppdc9prd2y2.log
> >> >> -Dyarn.home.dir=/usr/lib/hadoop-yarn
> >> >>-Dhadoop.home.dir=/usr/lib/hadoop-yarn
> >> >> -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA
> >> >> -Djava.library.path=/usr/lib/hadoop/lib/native -classpath
> >> >> /etc/hadoop/conf:/etc/hadoop/conf:/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-yarn/lib/*:/etc/hadoop/conf/rm-config/log4j.properties
> >> >> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
> >> >>
> >> >> I can tail the Kafka topic as well:
> >> >>
> >> >> deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic wikipedia-raw
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> On Tue, Sep 2, 2014 at 1:04 PM, Chris Riccomini <
> >> >> criccomini@linkedin.com.invalid> wrote:
> >> >>
> >> >>> Hey Shekar,
> >> >>>
> >> >>> It looks like your job is hanging trying to connect to the RM on your
> >> >>> localhost. I thought that you said that your job was running in local
> >> >>> mode. If so, it should be using the LocalJobFactory. If not, and you
> >> >>> intend to run on YARN, is your YARN RM up and running on localhost?
> >> >>>
> >> >>> Cheers,
> >> >>> Chris
> >> >>>
> >> >>> On 9/2/14 12:22 PM, "Shekar Tippur" <ctippur@gmail.com> wrote:
> >> >>>
> >> >>> >Chris ..
> >> >>> >
> >> >>> >$ cat ./deploy/samza/undefined-samza-container-name.log
> >> >>> >
> >> >>> >2014-09-02 11:17:58 JobRunner [INFO] job factory: org.apache.samza.job.yarn.YarnJobFactory
> >> >>> >
> >> >>> >2014-09-02 11:17:59 ClientHelper [INFO] trying to connect to RM 127.0.0.1:8032
> >> >>> >
> >> >>> >2014-09-02 11:17:59 NativeCodeLoader [WARN] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> >> >>> >
> >> >>> >2014-09-02 11:17:59 RMProxy [INFO] Connecting to ResourceManager at /127.0.0.1:8032
> >> >>> >
> >> >>> >
> >> >>> >and Log4j ..
> >> >>> >
> >> >>> ><!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
> >> >>> >
> >> >>> ><log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
> >> >>> >
> >> >>> >  <appender name="RollingAppender"
> >> >>> >class="org.apache.log4j.DailyRollingFileAppender">
> >> >>> >
> >> >>> >     <param name="File" value="${samza.log.dir}/${samza.container.name}.log" />
> >> >>> >
> >> >>> >     <param name="DatePattern" value="'.'yyyy-MM-dd" />
> >> >>> >
> >> >>> >     <layout class="org.apache.log4j.PatternLayout">
> >> >>> >
> >> >>> >      <param name="ConversionPattern" value="%d{yyyy-MM-dd HH:mm:ss} %c{1} [%p] %m%n" />
> >> >>> >
> >> >>> >     </layout>
> >> >>> >
> >> >>> >  </appender>
> >> >>> >
> >> >>> >  <root>
> >> >>> >
> >> >>> >    <priority value="info" />
> >> >>> >
> >> >>> >    <appender-ref ref="RollingAppender"/>
> >> >>> >
> >> >>> >  </root>
> >> >>> >
> >> >>> ></log4j:configuration>
> >> >>> >
> >> >>> >
> >> >>> >On Tue, Sep 2, 2014 at 12:02 PM, Chris Riccomini <
> >> >>> >criccomini@linkedin.com.invalid> wrote:
> >> >>> >
> >> >>> >> Hey Shekar,
> >> >>> >>
> >> >>> >> Can you attach your log files? I'm wondering if it's a mis-configured
> >> >>> >> log4j.xml (or missing slf4j-log4j jar), which is leading to nearly
> >> >>> >> empty log files. Also, I'm wondering if the job starts fully. Anything
> >> >>> >> you can attach would be helpful.
> >> >>> >>
> >> >>> >> Cheers,
> >> >>> >> Chris
> >> >>> >>
> >> >>> >> On 9/2/14 11:43 AM, "Shekar Tippur" <ctippur@gmail.com> wrote:
> >> >>> >>
> >> >>> >> >I am running in local mode.
> >> >>> >> >
> >> >>> >> >S
> >> >>> >> >On Sep 2, 2014 11:42 AM, "Yan Fang" <yanfang724@gmail.com> wrote:
> >> >>> >> >
> >> >>> >> >> Hi Shekar.
> >> >>> >> >>
> >> >>> >> >> Are you running the job in local mode or yarn mode? If yarn mode,
> >> >>> >> >> the log is in the yarn's container log.
> >> >>> >> >>
> >> >>> >> >> Thanks,
> >> >>> >> >>
> >> >>> >> >> Fang, Yan
> >> >>> >> >> yanfang724@gmail.com
> >> >>> >> >> +1 (206) 849-4108
> >> >>> >> >>
> >> >>> >> >>
> >> >>> >> >> > On Tue, Sep 2, 2014 at 11:36 AM, Shekar Tippur <ctippur@gmail.com> wrote:
> >> >>> >> >>
> >> >>> >> >> > Chris,
> >> >>> >> >> >
> >> >>> >> >> > Got some time to play around a bit more.
> >> >>> >> >> > I tried to edit
> >> >>> >> >> >
> >> >>> >> >> >
> >> >>> >> >> > samza-wikipedia/src/main/java/samza/examples/wikipedia/task/WikipediaFeedStreamTask.java
> >> >>> >> >> > to add a logger info statement to tap the incoming message. I don't
> >> >>> >> >> > see the messages being printed to the log file.
> >> >>> >> >> >
> >> >>> >> >> > Is this the right place to start?
> >> >>> >> >> >
> >> >>> >> >> > public class WikipediaFeedStreamTask implements StreamTask {
> >> >>> >> >> >
> >> >>> >> >> >   private static final SystemStream OUTPUT_STREAM = new
> >> >>> >> >> >       SystemStream("kafka", "wikipedia-raw");
> >> >>> >> >> >
> >> >>> >> >> >   private static final Logger log =
> >> >>> >> >> >       LoggerFactory.getLogger(WikipediaFeedStreamTask.class);
> >> >>> >> >> >
> >> >>> >> >> >   @Override
> >> >>> >> >> >   public void process(IncomingMessageEnvelope envelope,
> >> >>> >> >> >       MessageCollector collector, TaskCoordinator coordinator) {
> >> >>> >> >> >     Map<String, Object> outgoingMap =
> >> >>> >> >> >         WikipediaFeedEvent.toMap((WikipediaFeedEvent) envelope.getMessage());
> >> >>> >> >> >     log.info(envelope.getMessage().toString());
> >> >>> >> >> >     collector.send(new OutgoingMessageEnvelope(OUTPUT_STREAM, outgoingMap));
> >> >>> >> >> >   }
> >> >>> >> >> > }
> >> >>> >> >> >
> >> >>> >> >> >
> >> >>> >> >> > On Mon, Aug 25, 2014 at 9:01 AM, Chris Riccomini <
> >> >>> >> >> > criccomini@linkedin.com.invalid> wrote:
> >> >>> >> >> >
> >> >>> >> >> > > Hey Shekar,
> >> >>> >> >> > >
> >> >>> >> >> > > Your thought process is on the right track. It's probably best to
> >> >>> >> >> > > start with hello-samza, and modify it to get what you want. To start
> >> >>> >> >> > > with, you'll want to:
> >> >>> >> >> > >
> >> >>> >> >> > > 1. Write a simple StreamTask that just does something silly like
> >> >>> >> >> > > just print messages that it receives.
> >> >>> >> >> > > 2. Write a configuration for the job that consumes from just the
> >> >>> >> >> > > stream (alerts from different sources).
> >> >>> >> >> > > 3. Run this to make sure you've got it working.
> >> >>> >> >> > > 4. Now add your table join. This can be either a change-data capture
> >> >>> >> >> > > (CDC) stream, or via a remote DB call.
> >> >>> >> >> > >
> >> >>> >> >> > > That should get you to a point where you've got your job up and
> >> >>> >> >> > > running. From there, you could create your own Maven project, and
> >> >>> >> >> > > migrate your code over accordingly.
> >> >>> >> >> > >
> >> >>> >> >> > > Cheers,
> >> >>> >> >> > > Chris
> >> >>> >> >> > >
> >> >>> >> >> > > On 8/24/14 1:42 AM, "Shekar Tippur" <ctippur@gmail.com> wrote:
> >> >>> >> >> > >
> >> >>> >> >> > > >Chris,
> >> >>> >> >> > > >
> >> >>> >> >> > > >I have gone through the documentation and decided that the option
> >> >>> >> >> > > >that is most suitable for me is stream-table.
> >> >>> >> >> > > >
> >> >>> >> >> > > >I see the following things:
> >> >>> >> >> > > >
> >> >>> >> >> > > >1. Point samza to a table (database)
> >> >>> >> >> > > >2. Point Samza to a stream - Alert stream from different sources
> >> >>> >> >> > > >3. Join key like a hostname
> >> >>> >> >> > > >
> >> >>> >> >> > > >I have Hello Samza working. To extend that to do what my needs
> >> >>> >> >> > > >are, I am not sure where to start (needs more code changes OR
> >> >>> >> >> > > >configuration changes OR both)?
> >> >>> >> >> > > >
> >> >>> >> >> > > >I have gone through
> >> >>> >> >> > > >http://samza.incubator.apache.org/learn/documentation/latest/api/overview.html
> >> >>> >> >> > > >
> >> >>> >> >> > > >Is my thought process on the right track? Can you please point me
> >> >>> >> >> > > >in the right direction?
> >> >>> >> >> > > >
> >> >>> >> >> > > >- Shekar
> >> >>> >> >> > > >
> >> >>> >> >> > > >
> >> >>> >> >> > > >
> >> >>> >> >> > > >On Thu, Aug 21, 2014 at 1:08 PM, Shekar Tippur <ctippur@gmail.com>
> >> >>> >> >> > > >wrote:
> >> >>> >> >> > > >
> >> >>> >> >> > > >> Chris,
> >> >>> >> >> > > >>
> >> >>> >> >> > > >> This is a perfectly good answer. I will start poking more into
> >> >>> >> >> > > >> option #4.
> >> >>> >> >> > > >>
> >> >>> >> >> > > >> - Shekar
> >> >>> >> >> > > >>
> >> >>> >> >> > > >>
> >> >>> >> >> > > >> On Thu, Aug 21, 2014 at 1:05 PM, Chris Riccomini <
> >> >>> >> >> > > >> criccomini@linkedin.com.invalid> wrote:
> >> >>> >> >> > > >>
> >> >>> >> >> > > >>> Hey Shekar,
> >> >>> >> >> > > >>>
> >> >>> >> >> > > >>> Your two options are really (3) or (4), then. You can either run
> >> >>> >> >> > > >>> some external DB that holds the data set, and you can query it
> >> >>> >> >> > > >>> from a StreamTask, or you can use Samza's state store feature to
> >> >>> >> >> > > >>> push data into a stream that you can then store in a partitioned
> >> >>> >> >> > > >>> key-value store along with your StreamTasks. There is some
> >> >>> >> >> > > >>> documentation here about the state store approach:
> >> >>> >> >> > > >>>
> >> >>> >> >> > > >>>
> >> >>> >> >> > > >>>
> >> >>> >> >> > > >>>
> >> >>> >> >> > > >>> http://samza.incubator.apache.org/learn/documentation/0.7.0/container/state-management.html
> >> >>> >> >> > > >>>
> >> >>> >> >> > > >>>
> >> >>> >> >> > > >>> (4) is going to require more up-front effort from you, since
> >> >>> >> >> > > >>> you'll have to understand how Kafka's partitioning model works,
> >> >>> >> >> > > >>> and set up some pipeline to push the updates for your state. In
> >> >>> >> >> > > >>> the long run, I believe it's the better approach, though. Local
> >> >>> >> >> > > >>> lookups on a key-value store should be faster than doing remote
> >> >>> >> >> > > >>> RPC calls to a DB for every message.
> >> >>> >> >> > > >>>
> >> >>> >> >> > > >>> I'm sorry I can't give you a more definitive answer. It's really
> >> >>> >> >> > > >>> about trade-offs.
> >> >>> >> >> > > >>>
> >> >>> >> >> > > >>> Cheers,
> >> >>> >> >> > > >>> Chris
> >> >>> >> >> > > >>>
> >> >>> >> >> > > >>> On 8/21/14 12:22 PM, "Shekar Tippur" <ctippur@gmail.com> wrote:
> >> >>> >> >> > > >>>
> >> >>> >> >> > > >>> >Chris,
> >> >>> >> >> > > >>> >
> >> >>> >> >> > > >>> >A big thanks for a swift response. The data set is huge and the
> >> >>> >> >> > > >>> >frequency is bursty.
> >> >>> >> >> > > >>> >What do you suggest?
> >> >>> >> >> > > >>> >
> >> >>> >> >> > > >>> >- Shekar
> >> >>> >> >> > > >>> >
> >> >>> >> >> > > >>> >
> >> >>> >> >> > > >>> >On Thu, Aug 21, 2014 at 11:05 AM, Chris Riccomini <
> >> >>> >> >> > > >>> >criccomini@linkedin.com.invalid> wrote:
> >> >>> >> >> > > >>> >
> >> >>> >> >> > > >>> >> Hey Shekar,
> >> >>> >> >> > > >>> >>
> >> >>> >> >> > > >>> >> This is feasible, and you are on the right thought process.
> >> >>> >> >> > > >>> >>
> >> >>> >> >> > > >>> >> For the sake of discussion, I'm going to pretend that you have
> >> >>> >> >> > > >>> >> a Kafka topic called "PageViewEvent", which has just the IP
> >> >>> >> >> > > >>> >> address that was used to view a page. These messages will be
> >> >>> >> >> > > >>> >> logged every time a page view happens. I'm also going to
> >> >>> >> >> > > >>> >> pretend that you have some state called "IPGeo" (e.g. the
> >> >>> >> >> > > >>> >> MaxMind data set). In this example, we'll want to join the
> >> >>> >> >> > > >>> >> long/lat geo information from IPGeo to the PageViewEvent, and
> >> >>> >> >> > > >>> >> send it to a new topic: PageViewEventsWithGeo.
> >> >>> >> >> > > >>> >>
> >> >>> >> >> > > >>> >> You have several options on how to implement this example.
> >> >>> >> >> > > >>> >>
> >> >>> >> >> > > >>> >> 1. If your joining data set (IPGeo) is relatively small and
> >> >>> >> >> > > >>> >> changes infrequently, you can just pack it up in your jar or
> >> >>> >> >> > > >>> >> .tgz file, and open it up in every StreamTask.
> >> >>> >> >> > > >>> >> 2. If your data set is small, but changes somewhat frequently,
> >> >>> >> >> > > >>> >> you can throw the data set on some HTTP/HDFS/S3 server
> >> >>> >> >> > > >>> >> somewhere, and have your StreamTask refresh it periodically by
> >> >>> >> >> > > >>> >> re-downloading it.
> >> >>> >> >> > > >>> >> 3. You can do remote RPC calls for the IPGeo data on every
> >> >>> >> >> > > >>> >> page view event by querying some remote service or DB (e.g.
> >> >>> >> >> > > >>> >> Cassandra).
> >> >>> >> >> > > >>> >> 4. You can use Samza's state feature to set your IPGeo data as
> >> >>> >> >> > > >>> >> a series of messages to a log-compacted Kafka topic
> >> >>> >> >> > > >>> >> (https://cwiki.apache.org/confluence/display/KAFKA/Log+Compaction),
> >> >>> >> >> > > >>> >> and configure your Samza job to read this topic as a bootstrap
> >> >>> >> >> > > >>> >> stream
> >> >>> >> >> > > >>> >> (http://samza.incubator.apache.org/learn/documentation/0.7.0/container/streams.html).
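(For option 4, the bootstrap stream and the backing key-value store are declared in the job's .properties file. A rough sketch follows, with invented stream and store names, loosely following the Samza 0.7 streams and state-management docs; treat every key as something to verify against your Samza version.)

```properties
# Hypothetical job config sketch (stream and store names invented).
# Read the repartitioned page views plus the IPGeo topic; mark IPGeo
# as a bootstrap stream so it is fully read before processing starts.
task.inputs=kafka.PageViewEventRepartitioned,kafka.ipgeo
systems.kafka.streams.ipgeo.samza.bootstrap=true
systems.kafka.streams.ipgeo.samza.offset.default=oldest

# Local key-value store to hold the IPGeo data, keyed by IP address.
stores.ipgeo.factory=org.apache.samza.storage.kv.KeyValueStorageEngineFactory
stores.ipgeo.key.serde=string
stores.ipgeo.msg.serde=json
```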
> >> >>> >> >> > > >>> >>
> >> >>> >> >> > > >>> >> For (4), you'd have to partition the IPGeo state topic
> >> >>> >> >> > > >>> >> according to the same key as PageViewEvent. If PageViewEvent
> >> >>> >> >> > > >>> >> were partitioned by, say, member ID, but you want your IPGeo
> >> >>> >> >> > > >>> >> state topic to be partitioned by IP address, then you'd have
> >> >>> >> >> > > >>> >> to have an upstream job that re-partitioned PageViewEvent into
> >> >>> >> >> > > >>> >> some new topic by IP address. This new topic will have to have
> >> >>> >> >> > > >>> >> the same number of partitions as the IPGeo state topic (if
> >> >>> >> >> > > >>> >> IPGeo has 8 partitions, then the new PageViewEventRepartitioned
> >> >>> >> >> > > >>> >> topic needs 8 as well). This will cause your
> >> >>> >> >> > > >>> >> PageViewEventRepartitioned topic and your IPGeo state topic to
> >> >>> >> >> > > >>> >> be aligned such that the StreamTask that gets page views for
> >> >>> >> >> > > >>> >> IP address X will also have the IPGeo information for IP
> >> >>> >> >> > > >>> >> address X.
> >> >>> >> >> > > >>> >>
> >> >>> >> >> > > >>> >> Which strategy you pick is really up to you. :) (4) is the
> >> >>> >> >> > > >>> >> most complicated, but also the most flexible, and most
> >> >>> >> >> > > >>> >> operationally sound. (1) is the easiest if it fits your needs.
> >> >>> >> >> > > >>> >>
> >> >>> >> >> > > >>> >> Cheers,
> >> >>> >> >> > > >>> >> Chris
> >> >>> >> >> > > >>> >>
> >> >>> >> >> > > >>> >> On 8/21/14 10:15 AM, "Shekar Tippur" <ctippur@gmail.com> wrote:
> >> >>> >> >> > > >>> >>
> >> >>> >> >> > > >>> >> >Hello,
> >> >>> >> >> > > >>> >> >
> >> >>> >> >> > > >>> >> >I am new to Samza. I have just installed Hello Samza and got
> >> >>> >> >> > > >>> >> >it working.
> >> >>> >> >> > > >>> >> >
> >> >>> >> >> > > >>> >> >Here is the use case for which I am trying to use Samza:
> >> >>> >> >> > > >>> >> >
> >> >>> >> >> > > >>> >> >
> >> >>> >> >> > > >>> >> >1. Cache the contextual information which contains more
> >> >>> >> >> > > >>> >> >information about the hostname or IP address using
> >> >>> >> >> > > >>> >> >Samza/Yarn/Kafka.
> >> >>> >> >> > > >>> >> >2. Collect alert and metric events which contain either
> >> >>> >> >> > > >>> >> >hostname or IP address.
> >> >>> >> >> > > >>> >> >3. Append contextual information to the alert and metric and
> >> >>> >> >> > > >>> >> >insert into a Kafka queue from which other subscribers read
> >> >>> >> >> > > >>> >> >off of.
> >> >>> >> >> > > >>> >> >
> >> >>> >> >> > > >>> >> >Can you please shed some light on:
> >> >>> >> >> > > >>> >> >
> >> >>> >> >> > > >>> >> >1. Is this feasible?
> >> >>> >> >> > > >>> >> >2. Am I on the right thought process?
> >> >>> >> >> > > >>> >> >3. How do I start?
> >> >>> >> >> > > >>> >> >
> >> >>> >> >> > > >>> >> >I now have 1 & 2 of them working disparately. I need to
> >> >>> >> >> > > >>> >> >integrate them.
> >> >>> >> >> > > >>> >> >
> >> >>> >> >> > > >>> >> >Appreciate any input.
> >> >>> >> >> > > >>> >> >
> >> >>> >> >> > > >>> >> >- Shekar
> >> >>> >> >> > > >>> >>
> >> >>> >> >> > > >>> >>
