samza-dev mailing list archives

From Yan Fang <yanfang...@gmail.com>
Subject Re: Samza as a Caching layer
Date Thu, 04 Sep 2014 18:08:06 GMT
Aha, I see. Hi Shekar,

I think I know the problem. It comes from a confusing part of the
documentation. You may want to:

1. Run deploy/samza/bin/run-job.sh
--config-factory=org.apache.samza.config.factories.PropertiesConfigFactory
--config-path=file://$PWD/deploy/samza/config/wikipedia-parser.properties
(NOT wikipedia-feed.properties)

2. Modify WikipediaParserStreamTask.java (NOT WikipediaFeedStreamTask.java)

What WikipediaFeedStreamTask.java does is fetch the wiki feed from the
external service and send it to Kafka. When you ran
bin/produce-wikipedia-raw-data.sh, you already finished that step. So your
next step is to parse the feed, i.e. (1) above (the remaining steps are here
<http://samza.incubator.apache.org/startup/hello-samza/latest/#generate-wikipedia-statistics>
).
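For reference, the parsing step in (1) boils down to splitting each raw feed
line into structured fields. Below is a minimal, self-contained sketch of
that idea in plain Java; the field separator and key names are made up for
illustration and are not the actual hello-samza parser logic:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a "parse" step: split a raw feed line of
// the form "key=value|key=value" into a map of named fields.
public class FeedLineParser {
  public static Map<String, String> parse(String rawLine) {
    Map<String, String> fields = new HashMap<>();
    for (String pair : rawLine.split("\\|")) {
      String[] kv = pair.split("=", 2);
      if (kv.length == 2) {
        fields.put(kv[0].trim(), kv[1].trim());
      }
    }
    return fields;
  }

  public static void main(String[] args) {
    Map<String, String> fields = parse("title=Main Page|user=jdoe|bytes=42");
    System.out.println(fields.get("user")); // prints "jdoe"
  }
}
```

In the real job, a parser StreamTask would do this inside process() and send
the resulting map to an output stream instead of printing it.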

Sorry for the confusion. In
http://samza.incubator.apache.org/learn/tutorials/latest/run-hello-samza-without-internet.html#use-local-data-to-run-hello-samza
, it says "instead of running"... :(  We will fix this in the documentation.

Hope this helps.

Cheers,

Fang, Yan
yanfang724@gmail.com
+1 (206) 849-4108


On Wed, Sep 3, 2014 at 3:20 PM, Shekar Tippur <ctippur@gmail.com> wrote:

> Yan,
>
> I do see these jar files
>
> [ctippur@pppdc9prd2y2 incubator-samza-hello-samza]$ ls -ltr
> deploy/samza/lib/slf4j-*
>
> -rw------- 1 ctippur domain users 25689 Aug 18 10:45
> deploy/samza/lib/slf4j-api-1.6.2.jar
>
> -rw------- 1 ctippur domain users  9752 Aug 18 10:48
> deploy/samza/lib/slf4j-log4j12-1.6.2.jar
>
>
> - Shekar
>
>
> On Wed, Sep 3, 2014 at 3:13 PM, Yan Fang <yanfang724@gmail.com> wrote:
>
> > Hi Shekar,
> >
> > Since you are running in local mode, you must have the libs in a
> > directory such as deploy/samza/lib. Can you see slf4j-log4j12-1.6.2.jar
> > there? (not the tar.gz). I ask because sometimes people forget to run
> > tar -xvf after they add dependencies and recompile, while local mode
> > reads deploy/samza instead of the tar.gz file. Thank you.
> >
> > Cheers,
> >
> > Fang, Yan
> > yanfang724@gmail.com
> > +1 (206) 849-4108
> >
> >
> > On Wed, Sep 3, 2014 at 2:59 PM, Shekar Tippur <ctippur@gmail.com> wrote:
> >
> > > I do see this
> > >
> > > lib/slf4j-log4j12-1.6.2.jar
> > >
> > > In the tgz file. Also, this message seems more like noise to me.
> > >
> > > I am not sure if we are getting sidetracked on this.
> > >
> > > In the file WikipediaFeedStreamTask.java, I changed log.info to
> > > System.out.println, thinking that this would print the message to the
> > > console.
> > >
> > > Is this the right place to trap the incoming message?
> > >
> > >
> > > $ cat samza-wikipedia/src/main/java/samza/examples/wikipedia/task/WikipediaFeedStreamTask.java
> > >
> > > /*
> > >  * Licensed to the Apache Software Foundation (ASF) under one
> > >  * or more contributor license agreements.  See the NOTICE file
> > >  * distributed with this work for additional information
> > >  * regarding copyright ownership.  The ASF licenses this file
> > >  * to you under the Apache License, Version 2.0 (the
> > >  * "License"); you may not use this file except in compliance
> > >  * with the License.  You may obtain a copy of the License at
> > >  *
> > >  *   http://www.apache.org/licenses/LICENSE-2.0
> > >  *
> > >  * Unless required by applicable law or agreed to in writing,
> > >  * software distributed under the License is distributed on an
> > >  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
> > >  * KIND, either express or implied.  See the License for the
> > >  * specific language governing permissions and limitations
> > >  * under the License.
> > >  */
> > >
> > > package samza.examples.wikipedia.task;
> > >
> > > import java.util.Map;
> > > import org.apache.samza.system.IncomingMessageEnvelope;
> > > import org.apache.samza.system.OutgoingMessageEnvelope;
> > > import org.apache.samza.system.SystemStream;
> > > import org.apache.samza.task.MessageCollector;
> > > import org.apache.samza.task.StreamTask;
> > > import org.apache.samza.task.TaskCoordinator;
> > > import samza.examples.wikipedia.system.WikipediaFeed.WikipediaFeedEvent;
> > >
> > > /**
> > >  * This task is very simple. All it does is take messages that it
> > >  * receives, and sends them to a Kafka topic called wikipedia-raw.
> > >  */
> > > public class WikipediaFeedStreamTask implements StreamTask {
> > >   private static final SystemStream OUTPUT_STREAM =
> > >       new SystemStream("kafka", "wikipedia-raw");
> > >
> > >   @Override
> > >   public void process(IncomingMessageEnvelope envelope,
> > >       MessageCollector collector, TaskCoordinator coordinator) {
> > >     Map<String, Object> outgoingMap =
> > >         WikipediaFeedEvent.toMap((WikipediaFeedEvent) envelope.getMessage());
> > >     System.out.println(envelope.getMessage().toString());
> > >     collector.send(new OutgoingMessageEnvelope(OUTPUT_STREAM, outgoingMap));
> > >   }
> > > }
> > >
> > >
> > >
> > >
> > > On Wed, Sep 3, 2014 at 2:48 PM, Chris Riccomini <
> > > criccomini@linkedin.com.invalid> wrote:
> > >
> > > > Hey Shekar,
> > > >
> > > > Sorry that you're having such a tough time with this. I'll keep
> trying
> > to
> > > > help as best I can.
> > > >
> > > > Do you see slf4j-log4j12 in your job's .tgz file?
> > > >
> > > > Cheers,
> > > > Chris
> > > >
> > > > On 9/3/14 2:46 PM, "Shekar Tippur" <ctippur@gmail.com> wrote:
> > > >
> > > > >Chris,
> > > > >
> > > > >I have added the below dependencies and I still get the same message.
> > > > >
> > > > >      <dependency>
> > > > >        <groupId>org.slf4j</groupId>
> > > > >        <artifactId>slf4j-api</artifactId>
> > > > >        <version>1.7.7</version>
> > > > >      </dependency>
> > > > >      <dependency>
> > > > >          <groupId>org.slf4j</groupId>
> > > > >          <artifactId>slf4j-log4j12</artifactId>
> > > > >          <version>1.5.6</version>
> > > > >      </dependency>
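One thing worth flagging in the snippet above: the slf4j-api version (1.7.7)
and the slf4j-log4j12 version (1.5.6) do not match, and mismatched versions
of the two slf4j artifacts are a common cause of binding failures. A sketch
of the aligned form (the exact version number is illustrative):

```xml
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-api</artifactId>
  <version>1.7.7</version>
</dependency>
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-log4j12</artifactId>
  <version>1.7.7</version>
</dependency>
```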
> > > > >
> > > > >
> > > > >On Wed, Sep 3, 2014 at 9:01 AM, Chris Riccomini <
> > > > >criccomini@linkedin.com.invalid> wrote:
> > > > >
> > > > >> Hey Shekar,
> > > > >>
> > > > >> The SLF4J stuff is saying that you don't have an slf4j binding on
> > your
> > > > >> classpath. Try adding slf4j-log4j as a runtime dependency on your
> > > > >>project.
> > > > >>
> > > > >> Cheers,
> > > > >> Chris
> > > > >>
> > > > >> On 9/2/14 3:24 PM, "Shekar Tippur" <ctippur@gmail.com> wrote:
> > > > >>
> > > > >> >Chris ,
> > > > >> >
> > > > >> >In the current state, I just want Samza to connect to 127.0.0.1.
> > > > >> >
> > > > >> >I have set YARN_HOME to
> > > > >> >
> > > > >> >$ echo $YARN_HOME
> > > > >> >
> > > > >> >/home/ctippur/hello-samza/deploy/yarn
> > > > >> >
> > > > >> >I still don't see anything on the Hadoop console.
> > > > >> >
> > > > >> > Also, I see this during startup
> > > > >> >
> > > > >> >
> > > > >> >SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> > > > >> >
> > > > >> >SLF4J: Defaulting to no-operation (NOP) logger implementation
> > > > >> >
> > > > >> >SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder
> for
> > > > >>further
> > > > >> >details.
> > > > >> >
> > > > >> >SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> > > > >> >
> > > > >> >SLF4J: Defaulting to no-operation (NOP) logger implementation
> > > > >> >
> > > > >> >SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder
> for
> > > > >>further
> > > > >> >details.
> > > > >> >
> > > > >> >- Shekar
> > > > >> >
> > > > >> >
> > > > >> >On Tue, Sep 2, 2014 at 2:20 PM, Chris Riccomini <
> > > > >> >criccomini@linkedin.com.invalid> wrote:
> > > > >> >
> > > > >> >> Hey Shekar,
> > > > >> >>
> > > > >> >> Ah ha. In that case, do you expect your SamzaContainer to try
> to
> > > > >>connect
> > > > >> >> to the RM at 127.0.0.1, or do you expect it to try to connect
> to
> > > some
> > > > >> >> remote RM? If you expect it to try and connect to a remote RM,
> > it's
> > > > >>not
> > > > >> >> doing that. Perhaps because YARN_HOME isn't set.
> > > > >> >>
> > > > >> >> If you go to your RM's web interface, how many active nodes do
> > you
> > > > >>see
> > > > >> >> listed?
> > > > >> >>
> > > > >> >> Cheers,
> > > > >> >> Chris
> > > > >> >>
> > > > >> >> On 9/2/14 2:17 PM, "Shekar Tippur" <ctippur@gmail.com> wrote:
> > > > >> >>
> > > > >> >> >Chris ..
> > > > >> >> >
> > > > >> >> >I am using a RHEL server to host all the components (YARN,
> > > > >> >> >Kafka, Samza). I don't have ACLs open to wikipedia.
> > > > >> >> >I am following ..
> > > > >> >> >
> > > > >> >> >http://samza.incubator.apache.org/learn/tutorials/latest/run-hello-samza-without-internet.html
> > > > >> >> >
> > > > >> >> >- Shekar
> > > > >> >> >
> > > > >> >> >
> > > > >> >> >
> > > > >> >> >On Tue, Sep 2, 2014 at 2:13 PM, Chris Riccomini <
> > > > >> >> >criccomini@linkedin.com.invalid> wrote:
> > > > >> >> >
> > > > >> >> >> Hey Shekar,
> > > > >> >> >>
> > > > >> >> >> Can you try changing that to:
> > > > >> >> >>
> > > > >> >> >>   http://127.0.0.1:8088/cluster
> > > > >> >> >>
> > > > >> >> >>
> > > > >> >> >> And see if you can connect?
> > > > >> >> >>
> > > > >> >> >> Cheers,
> > > > >> >> >> Chris
> > > > >> >> >>
> > > > >> >> >> On 9/2/14 1:21 PM, "Shekar Tippur" <ctippur@gmail.com>
> wrote:
> > > > >> >> >>
> > > > >> >> >> >Other observation is ..
> > > > >> >> >> >
> > > > >> >> >> >http://10.132.62.185:8088/cluster shows that no
> > > > >> >> >> >applications are running.
> > > > >> >> >> >
> > > > >> >> >> >- Shekar
> > > > >> >> >> >
> > > > >> >> >> >
> > > > >> >> >> >
> > > > >> >> >> >
> > > > >> >> >> >On Tue, Sep 2, 2014 at 1:15 PM, Shekar Tippur <
> > > ctippur@gmail.com
> > > > >
> > > > >> >> >>wrote:
> > > > >> >> >> >
> > > > >> >> >> >> YARN seems to be running ..
> > > > >> >> >> >>
> > > > >> >> >> >> yarn      5462  0.0  2.0 1641296 161404 ?      Sl   Jun20  95:26
> > > > >> >> >> >> /usr/java/jdk1.6.0_31/bin/java -Dproc_resourcemanager -Xmx1000m
> > > > >> >> >> >> -Dhadoop.log.dir=/var/log/hadoop-yarn -Dyarn.log.dir=/var/log/hadoop-yarn
> > > > >> >> >> >> -Dhadoop.log.file=yarn-yarn-resourcemanager-pppdc9prd2y2.log
> > > > >> >> >> >> -Dyarn.log.file=yarn-yarn-resourcemanager-pppdc9prd2y2.log
> > > > >> >> >> >> -Dyarn.home.dir=/usr/lib/hadoop-yarn -Dhadoop.home.dir=/usr/lib/hadoop-yarn
> > > > >> >> >> >> -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA
> > > > >> >> >> >> -Djava.library.path=/usr/lib/hadoop/lib/native -classpath
> > > > >> >> >> >> /etc/hadoop/conf:/etc/hadoop/conf:/etc/hadoop/conf:/usr/lib/hadoop/lib/*:
> > > > >> >> >> >> /usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:
> > > > >> >> >> >> /usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:
> > > > >> >> >> >> /usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*:
> > > > >> >> >> >> /usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-yarn/lib/*:
> > > > >> >> >> >> /etc/hadoop/conf/rm-config/log4j.properties
> > > > >> >> >> >> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
> > > > >> >> >> >>
> > > > >> >> >> >> I can tail the Kafka topic as well ..
> > > > >> >> >> >>
> > > > >> >> >> >> deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic wikipedia-raw
> > > > >> >> >> >>
> > > > >> >> >> >>
> > > > >> >> >> >>
> > > > >> >> >> >>
> > > > >> >> >> >> On Tue, Sep 2, 2014 at 1:04 PM, Chris Riccomini <
> > > > >> >> >> >> criccomini@linkedin.com.invalid> wrote:
> > > > >> >> >> >>
> > > > >> >> >> >>> Hey Shekar,
> > > > >> >> >> >>>
> > > > >> >> >> >>> It looks like your job is hanging trying to connect to
> the
> > > RM
> > > > >>on
> > > > >> >> >>your
> > > > >> >> >> >>> localhost. I thought that you said that your job was
> > running
> > > > >>in
> > > > >> >> >>local
> > > > >> >> >> >>> mode. If so, it should be using the LocalJobFactory. If
> > not,
> > > > >>and
> > > > >> >>you
> > > > >> >> >> >>> intend to run on YARN, is your YARN RM up and running on
> > > > >> >>localhost?
> > > > >> >> >> >>>
> > > > >> >> >> >>> Cheers,
> > > > >> >> >> >>> Chris
> > > > >> >> >> >>>
> > > > >> >> >> >>> On 9/2/14 12:22 PM, "Shekar Tippur" <ctippur@gmail.com>
> > > > wrote:
> > > > >> >> >> >>>
> > > > >> >> >> >>> >Chris ..
> > > > >> >> >> >>> >
> > > > >> >> >> >>> >$ cat ./deploy/samza/undefined-samza-container-name.log
> > > > >> >> >> >>> >
> > > > >> >> >> >>> >2014-09-02 11:17:58 JobRunner [INFO] job factory:
> > > > >> >> >> >>> >org.apache.samza.job.yarn.YarnJobFactory
> > > > >> >> >> >>> >
> > > > >> >> >> >>> >2014-09-02 11:17:59 ClientHelper [INFO] trying to connect
> > > > >> >> >> >>> >to RM 127.0.0.1:8032
> > > > >> >> >> >>> >
> > > > >> >> >> >>> >2014-09-02 11:17:59 NativeCodeLoader [WARN] Unable to load
> > > > >> >> >> >>> >native-hadoop library for your platform... using
> > > > >> >> >> >>> >builtin-java classes where applicable
> > > > >> >> >> >>> >
> > > > >> >> >> >>> >2014-09-02 11:17:59 RMProxy [INFO] Connecting to
> > > > >> >> >> >>> >ResourceManager at /127.0.0.1:8032
> > > > >> >> >> >>> >
> > > > >> >> >> >>> >
> > > > >> >> >> >>> >and Log4j ..
> > > > >> >> >> >>> >
> > > > >> >> >> >>> ><!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
> > > > >> >> >> >>> >
> > > > >> >> >> >>> ><log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
> > > > >> >> >> >>> >  <appender name="RollingAppender"
> > > > >> >> >> >>> >      class="org.apache.log4j.DailyRollingFileAppender">
> > > > >> >> >> >>> >    <param name="File" value="${samza.log.dir}/${samza.container.name}.log" />
> > > > >> >> >> >>> >    <param name="DatePattern" value="'.'yyyy-MM-dd" />
> > > > >> >> >> >>> >    <layout class="org.apache.log4j.PatternLayout">
> > > > >> >> >> >>> >      <param name="ConversionPattern" value="%d{yyyy-MM-dd HH:mm:ss} %c{1} [%p] %m%n" />
> > > > >> >> >> >>> >    </layout>
> > > > >> >> >> >>> >  </appender>
> > > > >> >> >> >>> >  <root>
> > > > >> >> >> >>> >    <priority value="info" />
> > > > >> >> >> >>> >    <appender-ref ref="RollingAppender"/>
> > > > >> >> >> >>> >  </root>
> > > > >> >> >> >>> ></log4j:configuration>
> > > > >> >> >> >>> >
> > > > >> >> >> >>> >
> > > > >> >> >> >>> >On Tue, Sep 2, 2014 at 12:02 PM, Chris Riccomini <
> > > > >> >> >> >>> >criccomini@linkedin.com.invalid> wrote:
> > > > >> >> >> >>> >
> > > > >> >> >> >>> >> Hey Shekar,
> > > > >> >> >> >>> >>
> > > > >> >> >> >>> >> Can you attach your log files? I'm wondering if it's
> a
> > > > >> >> >> >>>mis-configured
> > > > >> >> >> >>> >> log4j.xml (or missing slf4j-log4j jar), which is
> > leading
> > > to
> > > > >> >> >>nearly
> > > > >> >> >> >>> empty
> > > > >> >> >> >>> >> log files. Also, I'm wondering if the job starts
> fully.
> > > > >> >>Anything
> > > > >> >> >>you
> > > > >> >> >> >>> can
> > > > >> >> >> >>> >> attach would be helpful.
> > > > >> >> >> >>> >>
> > > > >> >> >> >>> >> Cheers,
> > > > >> >> >> >>> >> Chris
> > > > >> >> >> >>> >>
> > > > >> >> >> >>> >> On 9/2/14 11:43 AM, "Shekar Tippur" <
> ctippur@gmail.com
> > >
> > > > >> wrote:
> > > > >> >> >> >>> >>
> > > > >> >> >> >>> >> >I am running in local mode.
> > > > >> >> >> >>> >> >
> > > > >> >> >> >>> >> >S
> > > > >> >> >> >>> >> >On Sep 2, 2014 11:42 AM, "Yan Fang" <
> > > yanfang724@gmail.com
> > > > >
> > > > >> >> >>wrote:
> > > > >> >> >> >>> >> >
> > > > >> >> >> >>> >> >> Hi Shekar.
> > > > >> >> >> >>> >> >>
> > > > >> >> >> >>> >> >> Are you running the job in local mode or YARN mode?
> > > > >> >> >> >>> >> >> If YARN mode, the log is in YARN's container log.
> > > > >> >> >> >>> >> >>
> > > > >> >> >> >>> >> >> Thanks,
> > > > >> >> >> >>> >> >>
> > > > >> >> >> >>> >> >> Fang, Yan
> > > > >> >> >> >>> >> >> yanfang724@gmail.com
> > > > >> >> >> >>> >> >> +1 (206) 849-4108
> > > > >> >> >> >>> >> >>
> > > > >> >> >> >>> >> >>
> > > > >> >> >> >>> >> >> On Tue, Sep 2, 2014 at 11:36 AM, Shekar Tippur
> > > > >> >> >> >>><ctippur@gmail.com>
> > > > >> >> >> >>> >> >>wrote:
> > > > >> >> >> >>> >> >>
> > > > >> >> >> >>> >> >> > Chris,
> > > > >> >> >> >>> >> >> >
> > > > >> >> >> >>> >> >> > Got some time to play around a bit more.
> > > > >> >> >> >>> >> >> > I tried to edit
> > > > >> >> >> >>> >> >> > samza-wikipedia/src/main/java/samza/examples/wikipedia/task/WikipediaFeedStreamTask.java
> > > > >> >> >> >>> >> >> > to add a logger info statement to tap the incoming
> > > > >> >> >> >>> >> >> > message. I don't see the messages being printed to
> > > > >> >> >> >>> >> >> > the log file.
> > > > >> >> >> >>> >> >> >
> > > > >> >> >> >>> >> >> > Is this the right place to start?
> > > > >> >> >> >>> >> >> >
> > > > >> >> >> >>> >> >> > public class WikipediaFeedStreamTask implements StreamTask {
> > > > >> >> >> >>> >> >> >
> > > > >> >> >> >>> >> >> >   private static final SystemStream OUTPUT_STREAM =
> > > > >> >> >> >>> >> >> >       new SystemStream("kafka", "wikipedia-raw");
> > > > >> >> >> >>> >> >> >
> > > > >> >> >> >>> >> >> >   private static final Logger log =
> > > > >> >> >> >>> >> >> >       LoggerFactory.getLogger(WikipediaFeedStreamTask.class);
> > > > >> >> >> >>> >> >> >
> > > > >> >> >> >>> >> >> >   @Override
> > > > >> >> >> >>> >> >> >   public void process(IncomingMessageEnvelope envelope,
> > > > >> >> >> >>> >> >> >       MessageCollector collector, TaskCoordinator coordinator) {
> > > > >> >> >> >>> >> >> >     Map<String, Object> outgoingMap =
> > > > >> >> >> >>> >> >> >         WikipediaFeedEvent.toMap((WikipediaFeedEvent) envelope.getMessage());
> > > > >> >> >> >>> >> >> >     log.info(envelope.getMessage().toString());
> > > > >> >> >> >>> >> >> >     collector.send(new OutgoingMessageEnvelope(OUTPUT_STREAM, outgoingMap));
> > > > >> >> >> >>> >> >> >   }
> > > > >> >> >> >>> >> >> > }
> > > > >> >> >> >>> >> >> >
> > > > >> >> >> >>> >> >> >
> > > > >> >> >> >>> >> >> > On Mon, Aug 25, 2014 at 9:01 AM, Chris
> Riccomini <
> > > > >> >> >> >>> >> >> > criccomini@linkedin.com.invalid> wrote:
> > > > >> >> >> >>> >> >> >
> > > > >> >> >> >>> >> >> > > Hey Shekar,
> > > > >> >> >> >>> >> >> > >
> > > > >> >> >> >>> >> >> > > Your thought process is on the right track.
> It's
> > > > >> >>probably
> > > > >> >> >> >>>best
> > > > >> >> >> >>> to
> > > > >> >> >> >>> >> >>start
> > > > >> >> >> >>> >> >> > > with hello-samza, and modify it to get what
> you
> > > > >>want.
> > > > >> >>To
> > > > >> >> >> >>>start
> > > > >> >> >> >>> >>with,
> > > > >> >> >> >>> >> >> > > you'll want to:
> > > > >> >> >> >>> >> >> > >
> > > > >> >> >> >>> >> >> > > 1. Write a simple StreamTask that just does
> > > > >>something
> > > > >> >> >>silly
> > > > >> >> >> >>>like
> > > > >> >> >> >>> >> >>just
> > > > >> >> >> >>> >> >> > > print messages that it receives.
> > > > >> >> >> >>> >> >> > > 2. Write a configuration for the job that
> > consumes
> > > > >>from
> > > > >> >> >>just
> > > > >> >> >> >>>the
> > > > >> >> >> >>> >> >>stream
> > > > >> >> >> >>> >> >> > > (alerts from different sources).
> > > > >> >> >> >>> >> >> > > 3. Run this to make sure you've got it
> working.
> > > > >> >> >> >>> >> >> > > 4. Now add your table join. This can be
> either a
> > > > >> >> >>change-data
> > > > >> >> >> >>> >>capture
> > > > >> >> >> >>> >> >> > (CDC)
> > > > >> >> >> >>> >> >> > > stream, or via a remote DB call.
> > > > >> >> >> >>> >> >> > >
> > > > >> >> >> >>> >> >> > > That should get you to a point where you've
> got
> > > your
> > > > >> >>job
> > > > >> >> >>up
> > > > >> >> >> >>>and
> > > > >> >> >> >>> >> >> running.
> > > > >> >> >> >>> >> >> > > From there, you could create your own Maven
> > > project,
> > > > >> >>and
> > > > >> >> >> >>>migrate
> > > > >> >> >> >>> >> >>your
> > > > >> >> >> >>> >> >> > code
> > > > >> >> >> >>> >> >> > > over accordingly.
> > > > >> >> >> >>> >> >> > >
> > > > >> >> >> >>> >> >> > > Cheers,
> > > > >> >> >> >>> >> >> > > Chris
> > > > >> >> >> >>> >> >> > >
> > > > >> >> >> >>> >> >> > > On 8/24/14 1:42 AM, "Shekar Tippur"
> > > > >><ctippur@gmail.com
> > > > >> >
> > > > >> >> >> >>>wrote:
> > > > >> >> >> >>> >> >> > >
> > > > >> >> >> >>> >> >> > > >Chris,
> > > > >> >> >> >>> >> >> > > >
> > > > >> >> >> >>> >> >> > > >I have gone through the documentation and
> > > > >> >> >> >>> >> >> > > >decided that the option most suitable for me
> > > > >> >> >> >>> >> >> > > >is stream-table.
> > > > >> >> >> >>> >> >> > > >
> > > > >> >> >> >>> >> >> > > >I see the following things:
> > > > >> >> >> >>> >> >> > > >
> > > > >> >> >> >>> >> >> > > >1. Point samza to a table (database)
> > > > >> >> >> >>> >> >> > > >2. Point Samza to a stream - Alert stream
> from
> > > > >> >>different
> > > > >> >> >> >>> sources
> > > > >> >> >> >>> >> >> > > >3. Join key like a hostname
> > > > >> >> >> >>> >> >> > > >
> > > > >> >> >> >>> >> >> > > >I have Hello Samza working. To extend it to do
> > > > >> >> >> >>> >> >> > > >what I need, I am not sure where to start (does
> > > > >> >> >> >>> >> >> > > >it need more code changes, configuration
> > > > >> >> >> >>> >> >> > > >changes, or both)?
> > > > >> >> >> >>> >> >> > > >
> > > > >> >> >> >>> >> >> > > >I have gone through
> > > > >> >> >> >>> >> >> > > >http://samza.incubator.apache.org/learn/documentation/latest/api/overview.html
> > > > >> >> >> >>> >> >> > > >
> > > > >> >> >> >>> >> >> > > >Is my thought process on the right track? Can
> > you
> > > > >> >>please
> > > > >> >> >> >>>point
> > > > >> >> >> >>> >>me
> > > > >> >> >> >>> >> >>to
> > > > >> >> >> >>> >> >> the
> > > > >> >> >> >>> >> >> > > >right direction?
> > > > >> >> >> >>> >> >> > > >
> > > > >> >> >> >>> >> >> > > >- Shekar
> > > > >> >> >> >>> >> >> > > >
> > > > >> >> >> >>> >> >> > > >
> > > > >> >> >> >>> >> >> > > >
> > > > >> >> >> >>> >> >> > > >On Thu, Aug 21, 2014 at 1:08 PM, Shekar
> Tippur
> > > > >> >> >> >>> >><ctippur@gmail.com>
> > > > >> >> >> >>> >> >> > wrote:
> > > > >> >> >> >>> >> >> > > >
> > > > >> >> >> >>> >> >> > > >> Chris,
> > > > >> >> >> >>> >> >> > > >>
> > > > >> >> >> >>> >> >> > > >> This is perfectly good answer. I will start
> > > > >>poking
> > > > >> >>more
> > > > >> >> >> >>>into
> > > > >> >> >> >>> >> >>option
> > > > >> >> >> >>> >> >> > #4.
> > > > >> >> >> >>> >> >> > > >>
> > > > >> >> >> >>> >> >> > > >> - Shekar
> > > > >> >> >> >>> >> >> > > >>
> > > > >> >> >> >>> >> >> > > >>
> > > > >> >> >> >>> >> >> > > >> On Thu, Aug 21, 2014 at 1:05 PM, Chris
> > > Riccomini
> > > > >><
> > > > >> >> >> >>> >> >> > > >> criccomini@linkedin.com.invalid> wrote:
> > > > >> >> >> >>> >> >> > > >>
> > > > >> >> >> >>> >> >> > > >>> Hey Shekar,
> > > > >> >> >> >>> >> >> > > >>>
> > > > >> >> >> >>> >> >> > > >>> Your two options are really (3) or (4),
> > then.
> > > > >>You
> > > > >> >>can
> > > > >> >> >> >>>either
> > > > >> >> >> >>> >>run
> > > > >> >> >> >>> >> >> some
> > > > >> >> >> >>> >> >> > > >>> external DB that holds the data set, and
> you
> > > can
> > > > >> >> >>query it
> > > > >> >> >> >>> >>from a
> > > > >> >> >> >>> >> >> > > >>> StreamTask, or you can use Samza's state
> > store
> > > > >> >> >>feature to
> > > > >> >> >> >>> >>push
> > > > >> >> >> >>> >> >>data
> > > > >> >> >> >>> >> >> > > >>>into a
> > > > >> >> >> >>> >> >> > > >>> stream that you can then store in a
> > > partitioned
> > > > >> >> >>key-value
> > > > >> >> >> >>> >>store
> > > > >> >> >> >>> >> >> along
> > > > >> >> >> >>> >> >> > > >>>with
> > > > >> >> >> >>> >> >> > > >>> your StreamTasks. There is some
> > documentation
> > > > >>here
> > > > >> >> >>about
> > > > >> >> >> >>>the
> > > > >> >> >> >>> >> >>state
> > > > >> >> >> >>> >> >> > > >>>store
> > > > >> >> >> >>> >> >> > > >>> approach:
> > > > >> >> >> >>> >> >> > > >>>
> > > > >> >> >> >>> >> >> > > >>>
> > > > >> >> >> >>> >> >> > > >>>
> > > > >> >> >> >>> >> >> > > >>>
> > > > >> >> >> >>> >> >> > > >>>http://samza.incubator.apache.org/learn/documentation/0.7.0/container/state-management.html
> > > > >> >> >> >>> >> >> > > >>>
> > > > >> >> >> >>> >> >> > > >>>
> > > > >> >> >> >>> >> >> > > >>> (4) is going to require more up front
> effort
> > > > >>from
> > > > >> >>you,
> > > > >> >> >> >>>since
> > > > >> >> >> >>> >> >>you'll
> > > > >> >> >> >>> >> >> > > >>>have
> > > > >> >> >> >>> >> >> > > >>> to understand how Kafka's partitioning
> model
> > > > >>works,
> > > > >> >> >>and
> > > > >> >> >> >>> setup
> > > > >> >> >> >>> >> >>some
> > > > >> >> >> >>> >> >> > > >>> pipeline to push the updates for your
> state.
> > > In
> > > > >>the
> > > > >> >> >>long
> > > > >> >> >> >>> >>run, I
> > > > >> >> >> >>> >> >> > believe
> > > > >> >> >> >>> >> >> > > >>> it's the better approach, though. Local
> > > lookups
> > > > >>on
> > > > >> >>a
> > > > >> >> >> >>> >>key-value
> > > > >> >> >> >>> >> >> store
> > > > >> >> >> >>> >> >> > > >>> should be faster than doing remote RPC
> calls
> > > to
> > > > >>a
> > > > >> >>DB
> > > > >> >> >>for
> > > > >> >> >> >>> >>every
> > > > >> >> >> >>> >> >> > message.
> > > > >> >> >> >>> >> >> > > >>>
> > > > >> >> >> >>> >> >> > > >>> I'm sorry I can't give you a more
> definitive
> > > > >> >>answer.
> > > > >> >> >>It's
> > > > >> >> >> >>> >>really
> > > > >> >> >> >>> >> >> > about
> > > > >> >> >> >>> >> >> > > >>> trade-offs.
> > > > >> >> >> >>> >> >> > > >>>
> > > > >> >> >> >>> >> >> > > >>> Cheers,
> > > > >> >> >> >>> >> >> > > >>> Chris
> > > > >> >> >> >>> >> >> > > >>>
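The trade-off Chris describes can be illustrated without any Samza machinery:
the state-store approach amounts to keeping a local key-value map, applying
updates from a changelog stream, and doing a local lookup per message instead
of a remote DB call. A self-contained toy sketch in plain Java; all class,
method, and data names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of a stream-table join: a local key-value "store"
// is kept up to date from a changelog of geo data, and each page-view
// event is enriched with a local lookup.
public class LocalJoinSketch {
  private final Map<String, String> ipGeoStore = new HashMap<>();

  // Apply one update from the (hypothetical) IPGeo changelog stream.
  public void onGeoUpdate(String ip, String location) {
    ipGeoStore.put(ip, location);
  }

  // Enrich one page-view event via a local store lookup.
  public String onPageView(String ip) {
    return ip + " -> " + ipGeoStore.getOrDefault(ip, "unknown");
  }

  public static void main(String[] args) {
    LocalJoinSketch task = new LocalJoinSketch();
    task.onGeoUpdate("10.0.0.1", "Seattle, US");
    System.out.println(task.onPageView("10.0.0.1")); // 10.0.0.1 -> Seattle, US
    System.out.println(task.onPageView("10.0.0.2")); // 10.0.0.2 -> unknown
  }
}
```

In Samza, the HashMap would be replaced by a partitioned key-value store fed
from a Kafka topic, so the lookup stays local to each container.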
> > > > >> >> >> >>> >> >> > > >>> On 8/21/14 12:22 PM, "Shekar Tippur"
> > > > >> >> >><ctippur@gmail.com>
> > > > >> >> >> >>> >>wrote:
> > > > >> >> >> >>> >> >> > > >>>
> > > > >> >> >> >>> >> >> > > >>> >Chris,
> > > > >> >> >> >>> >> >> > > >>> >
> > > > >> >> >> >>> >> >> > > >>> >A big thanks for a swift response. The
> data
> > > > >>set is
> > > > >> >> >>huge
> > > > >> >> >> >>>and
> > > > >> >> >> >>> >>the
> > > > >> >> >> >>> >> >> > > >>>frequency
> > > > >> >> >> >>> >> >> > > >>> >is in burst.
> > > > >> >> >> >>> >> >> > > >>> >What do you suggest?
> > > > >> >> >> >>> >> >> > > >>> >
> > > > >> >> >> >>> >> >> > > >>> >- Shekar
> > > > >> >> >> >>> >> >> > > >>> >
> > > > >> >> >> >>> >> >> > > >>> >
> > > > >> >> >> >>> >> >> > > >>> >On Thu, Aug 21, 2014 at 11:05 AM, Chris
> > > > >>Riccomini
> <criccomini@linkedin.com.invalid> wrote:
>
> Hey Shekar,
>
> This is feasible, and you are on the right thought process.
>
> For the sake of discussion, I'm going to pretend that you have a Kafka
> topic called "PageViewEvent", which has just the IP address that was used
> to view a page. These messages will be logged every time a page view
> happens. I'm also going to pretend that you have some state called "IPGeo"
> (e.g. the MaxMind data set). In this example, we'll want to join the
> long/lat geo information from IPGeo to the PageViewEvent, and send it to a
> new topic: PageViewEventsWithGeo.
>
> You have several options on how to implement this example:
>
> 1. If your joining data set (IPGeo) is relatively small and changes
> infrequently, you can just pack it up in your jar or .tgz file, and open
> it in every StreamTask.
> 2. If your data set is small, but changes somewhat frequently, you can
> throw the data set on some HTTP/HDFS/S3 server somewhere, and have your
> StreamTask refresh it periodically by re-downloading it.
> 3. You can do remote RPC calls for the IPGeo data on every page view
> event by querying some remote service or DB (e.g. Cassandra).
> 4. You can use Samza's state feature to send your IPGeo data as a series
> of messages to a log-compacted Kafka topic
> (https://cwiki.apache.org/confluence/display/KAFKA/Log+Compaction), and
> configure your Samza job to read this topic as a bootstrap stream
> (http://samza.incubator.apache.org/learn/documentation/0.7.0/container/streams.html).
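The join logic behind option (4) can be sketched as follows. This is a simplified, self-contained illustration, not the thread author's code: the class and method names (`PageViewGeoJoin`, `processGeoUpdate`, `processPageView`) are hypothetical, and a plain `Map` stands in for the Samza `KeyValueStore` a real `StreamTask` would obtain from its `TaskContext`.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of option (4): the log-compacted "IPGeo" topic is bootstrapped
// into a local store, and each page view is joined against it. A HashMap
// stands in for Samza's KeyValueStore so the sketch is self-contained.
public class PageViewGeoJoin {
    // Stand-in for a changelog-backed KeyValueStore<String, String>.
    private final Map<String, String> ipGeoStore = new HashMap<>();

    // Called for each message from the bootstrapped IPGeo stream.
    public void processGeoUpdate(String ip, String geoJson) {
        ipGeoStore.put(ip, geoJson);
    }

    // Called for each message from PageViewEventRepartitioned; returns the
    // enriched event that would be sent to PageViewEventsWithGeo.
    public String processPageView(String ip) {
        String geo = ipGeoStore.get(ip);
        return "{\"ip\":\"" + ip + "\",\"geo\":" + (geo == null ? "null" : geo) + "}";
    }

    public static void main(String[] args) {
        PageViewGeoJoin task = new PageViewGeoJoin();
        task.processGeoUpdate("10.0.0.1", "{\"lat\":37.4,\"lon\":-122.1}");
        // prints {"ip":"10.0.0.1","geo":{"lat":37.4,"lon":-122.1}}
        System.out.println(task.processPageView("10.0.0.1"));
    }
}
```

In a real job, both update and lookup would happen inside one `process()` call, dispatching on which input stream the incoming envelope came from.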
>
> For (4), you'd have to partition the IPGeo state topic according to the
> same key as PageViewEvent. If PageViewEvent were partitioned by, say,
> member ID, but you want your IPGeo state topic to be partitioned by IP
> address, then you'd have to have an upstream job that re-partitioned
> PageViewEvent into some new topic by IP address. This new topic will have
> to have the same number of partitions as the IPGeo state topic (if IPGeo
> has 8 partitions, then the new PageViewEventRepartitioned topic needs 8
> as well). This will cause your PageViewEventRepartitioned topic and your
> IPGeo state topic to be aligned such that the StreamTask that gets page
> views for IP address X will also have the IPGeo information for IP
> address X.
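For reference, the bootstrap-and-store arrangement described here might look something like the following 0.7-style job config. The property names follow Samza's configuration documentation, but treat this as a hedged sketch rather than a drop-in file; the task class name is hypothetical.

```properties
# Hypothetical task class implementing the join.
task.class=samza.examples.PageViewGeoJoinTask

# Both inputs must be co-partitioned by IP address, same partition count.
task.inputs=kafka.PageViewEventRepartitioned,kafka.IPGeo

# Read the log-compacted IPGeo topic to its head before processing page views.
systems.kafka.streams.IPGeo.samza.bootstrap=true

# Local key-value store holding the IPGeo join data.
stores.ip-geo.factory=org.apache.samza.storage.kv.KeyValueStorageEngineFactory
stores.ip-geo.key.serde=string
stores.ip-geo.msg.serde=string
```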
>
> Which strategy you pick is really up to you. :) (4) is the most
> complicated, but also the most flexible, and most operationally sound.
> (1) is the easiest if it fits your needs.
>
> Cheers,
> Chris
>
> On 8/21/14 10:15 AM, "Shekar Tippur" <ctippur@gmail.com> wrote:
>
> > Hello,
> >
> > I am new to Samza. I have just installed Hello Samza and got it working.
> >
> > Here is the use case for which I am trying to use Samza:
> >
> > 1. Cache the contextual information, which contains more information
> > about the hostname or IP address, using Samza/Yarn/Kafka
> > 2. Collect alert and metric events which contain either a hostname or
> > an IP address
> > 3. Append contextual information to the alert and metric events and
> > insert them into a Kafka queue that other subscribers read off of.
> >
> > Can you please shed some light on:
> >
> > 1. Is this feasible?
> > 2. Am I on the right thought process?
> > 3. How do I start?
> >
> > I now have 1 & 2 working disparately. I need to integrate them.
> >
> > Appreciate any input.
> >
> > - Shekar
