spark-user mailing list archives

From Andy Konwinski <andykonwin...@gmail.com>
Subject Fwd: Spark Streaming: Flume stream not found
Date Thu, 22 May 2014 16:46:04 GMT
I'm forwarding this email, which contains a question from a Spark user,
Adrien (CC'd), who has been unable to get any emails through to the Apache
mailing lists.

Please reply-all when responding to include Adrien. See below for his
question.
---------- Forwarded message ----------
From: "Adrien Legrand" <legrand.ax@gmail.com>
Date: May 22, 2014 1:06 AM
Subject: Re: Post validation
To: "Andy Konwinski" <andykonwinski@gmail.com>
Cc:

Thanks, that would be nice! Here is the question:

Title: Spark Streaming: Flume stream not found

Message:
Hi everyone,

I am currently trying to process a Flume (Avro) stream with Spark Streaming
on a YARN cluster. Everything works when I launch my code locally.
To do so, I use the following args:

master = "local"
host, port = the machine and port that Flume sends the stream to (I
triple-checked that the Flume and Spark settings match).

    val ssc = new StreamingContext(master, "FlumeEventCount", batchInterval,
      System.getenv("SPARK_HOME"), StreamingContext.jarOfClass(this.getClass))

    val stream = FlumeUtils.createStream(ssc, host, port.toInt)

But when I launch the same job on the cluster (replacing "local" with
"yarn-standalone"), the jar is launched (I can see the print statements I
added for debugging), but the expected output from the data processing
appears only about 1 run out of every 5 to 10.
Here is the complete command line:

$SPARK_HOME/bin/spark-class org.apache.spark.deploy.yarn.Client \
  --jar /home/www/loganalysis-1.0-SNAPSHOT-jar-with-dependencies.jar \
  --class com.loganalysis.Computation \
  --args yarn-standalone --args receiver.priv.fr --args 9999 \
  --num-workers 6 --master-memory 4g --worker-memory 2g \
  --worker-cores 1

For no apparent reason, the processing only sometimes runs. My first guess
was that, since I use one big jar with all dependencies in it, the other
machines don't have those dependencies and thus can't do the processing.
That's why I tried adding my executed jar with --addJars, but the result was
the same.
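
The way the three --args values reach the code above can be sketched as
follows. This is only an illustration under assumptions: the values are
taken to arrive positionally as master, host and port, and the names
JobArgs and parse are hypothetical, not part of the actual job.

```scala
// Hypothetical helper: validates the three positional values passed via --args.
// JobArgs and parse are illustrative names, not taken from the original job.
final case class JobArgs(master: String, host: String, port: Int)

object JobArgs {
  // Returns Right(JobArgs) on success, or Left(message) describing the problem.
  def parse(args: Array[String]): Either[String, JobArgs] = args match {
    case Array(master, host, portStr) =>
      scala.util.Try(portStr.toInt).toOption match {
        case Some(port) if port > 0 && port <= 65535 =>
          Right(JobArgs(master, host, port))
        case _ =>
          Left(s"invalid port: $portStr")
      }
    case other =>
      Left(s"expected <master> <host> <port>, got ${other.length} argument(s)")
  }
}
```

Calling JobArgs.parse(args) at the top of main and stopping on a Left would
make a wrong argument count or a malformed port visible immediately, before
the StreamingContext is created.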


2014-05-22 7:18 GMT+02:00 Andy Konwinski <andykonwinski@gmail.com>:

> One other idea occurred to me. You can try subscribing to the nabble
> mailing lists and send your emails to those. They will relay emails to the
> apache lists. Not sure if it'll help or not.
>
> I'm really sorry to hear you're having problems with the lists.
>
> If you forward me your questions I can relay them to the mailing list for
> you.
>
>
>
> On Wed, May 21, 2014 at 1:56 AM, Adrien Legrand <legrand.ax@gmail.com> wrote:
>
>> Hello Andy,
>> Thank you for your answer.
>> I used the mailing list's form to submit both subjects. Could the
>> problem you mentioned still be the cause?
>> I also check my mail directly in Gmail; I don't use a mail client
>> like Outlook.
>> The only other possible cause I can think of is my company's proxy,
>> but that seems really unlikely...
>>
>> Thank you,
>>
>> Adrien
>>
>>
>>
>> 2014-05-20 18:36 GMT+02:00 Andy Konwinski <andykonwinski@gmail.com>:
>>
>>> Hi Adrien,
>>>
>>> I'm not positive, but I suspect something about how you set up your email
>>> client (or forwarding or something) may be triggering spam filters.
>>>
>>> In my gmail a note pops up in the email from you that says: "this
>>> message may not have been sent by legrand.ax@gmail.com".
>>>
>>> Other than that I'm not sure what else might be going wrong. You can try
>>> pinging the apache infra people for more help too.
>>> On May 20, 2014 2:42 AM, <legrand.ax@gmail.com> wrote:
>>>
>>>> Hello Andy,
>>>> I posted 2 different topics (the important one is "Spark streaming:
>>>> flume stream not found") on the spark user mailing list, but neither of
>>>> them was accepted.
>>>> I registered before posting each subject and I don't think I made
>>>> any mistakes.
>>>> After posting the last (and important) subject, I received the
>>>> following mail:
>>>>
>>>> Hi. This is the qmail-send program at apache.org <http://apache.org/>.
>>>> I'm afraid I wasn't able to deliver your message to the following
>>>> addresses.
>>>> This is a permanent error; I've given up. Sorry it didn't work out.
>>>>
>>>> Did I do something wrong ?
>>>>
>>>> Thank you,
>>>>
>>>> Adrien
>>>>
>>>> _____________________________________
>>>> Sent from http://apache-spark-user-list.1001560.n3.nabble.com
>>>>
>>>>
>>
>
