samza-dev mailing list archives

From Rick Mangi <r...@chartbeat.com>
Subject Problems upgrading Job
Date Thu, 12 Nov 2015 16:48:17 GMT
Hi,

I’m trying to migrate our Samza jobs to the 0.10.0 snapshot (built against the latest). Everything
works fine when running locally (although I had to make some changes to the local grid’s Kafka,
since the checkpointing seems to require replication_factor > 1), but when I deploy against
my production YARN cluster I get these errors:

[yarnmaster01] out: 2015-11-12 10:40:53 ZkClient [INFO] zookeeper state changed (SyncConnected)
[yarnmaster01] out: 2015-11-12 10:40:53 ZkEventThread [INFO] Terminate ZkClient event thread.
[yarnmaster01] out: 2015-11-12 10:40:53 ZooKeeper [INFO] Session: 0x250233cdf57f2fa closed
[yarnmaster01] out: 2015-11-12 10:40:53 ClientCnxn [INFO] EventThread shut down
[yarnmaster01] out: 2015-11-12 10:40:53 KafkaSystemAdmin [INFO] Coordinator stream __samza_coordinator_metrics-reporter_1 already exists.
[yarnmaster01] out: 2015-11-12 10:40:53 JobRunner [INFO] Storing config in coordinator stream.
[yarnmaster01] out: 2015-11-12 10:40:53 CoordinatorStreamSystemProducer [INFO] Starting coordinator stream producer.
[yarnmaster01] out: 2015-11-12 10:40:53 KafkaSystemProducer [INFO] Creating a new producer for system mykafka.
[yarnmaster01] out: 2015-11-12 10:40:53 ProducerConfig [INFO] ProducerConfig values:
[yarnmaster01] out: 	value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
[yarnmaster01] out: 	key.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
[yarnmaster01] out: 	block.on.buffer.full = true
[yarnmaster01] out: 	retry.backoff.ms = 100
[yarnmaster01] out: 	buffer.memory = 33554432
[yarnmaster01] out: 	batch.size = 16384
[yarnmaster01] out: 	metrics.sample.window.ms = 30000
[yarnmaster01] out: 	metadata.max.age.ms = 300000
[yarnmaster01] out: 	receive.buffer.bytes = 32768
[yarnmaster01] out: 	timeout.ms = 30000
[yarnmaster01] out: 	max.in.flight.requests.per.connection = 1
[yarnmaster01] out: 	bootstrap.servers = [devstream01.chartbeat.net:9092]
[yarnmaster01] out: 	metric.reporters = []
[yarnmaster01] out: 	client.id = samza_producer-metrics_reporter-1-1447342853273-4
[yarnmaster01] out: 	compression.type = none
[yarnmaster01] out: 	retries = 2147483647
[yarnmaster01] out: 	max.request.size = 1048576
[yarnmaster01] out: 	send.buffer.bytes = 131072
[yarnmaster01] out: 	acks = 1
[yarnmaster01] out: 	reconnect.backoff.ms = 10
[yarnmaster01] out: 	linger.ms = 0
[yarnmaster01] out: 	metrics.num.samples = 2
[yarnmaster01] out: 	metadata.fetch.timeout.ms = 60000
[yarnmaster01] out:
[yarnmaster01] out: 2015-11-12 10:40:53 ProducerConfig [WARN] The configuration batch.num.messages = null was supplied but isn't a known config.
[yarnmaster01] out: 2015-11-12 10:40:53 ProducerConfig [WARN] The configuration producer.type = null was supplied but isn't a known config.
[yarnmaster01] out: Exception in thread "main" org.apache.samza.SamzaException: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
[yarnmaster01] out: 	at org.apache.samza.coordinator.stream.CoordinatorStreamSystemProducer.send(CoordinatorStreamSystemProducer.java:115)
[yarnmaster01] out: 	at org.apache.samza.coordinator.stream.CoordinatorStreamSystemProducer.writeConfig(CoordinatorStreamSystemProducer.java:132)
[yarnmaster01] out: 	at org.apache.samza.job.JobRunner.run(JobRunner.scala:85)
[yarnmaster01] out: 	at org.apache.samza.job.JobRunner$.main(JobRunner.scala:43)
[yarnmaster01] out: 	at org.apache.samza.job.JobRunner.main(JobRunner.scala)
[yarnmaster01] out: Caused by: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
[yarnmaster01] out:


Warning: run() received nonzero return code 1 while executing './bin/run-job.sh -config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/conf/metrics_reporter.properties'!


This looks similar to https://issues.apache.org/jira/browse/SAMZA-560 but I’m not using
a StreamAppender in log4j.

Any ideas? My first thought is that I might have to delete the existing checkpoint topics,
but that would mean we can’t migrate completely until the 0.10.0 release unless we want to
run snapshot code in production.
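
A metadata timeout like the one in the log above can occur when the machine running
run-job.sh cannot reach the broker listed in bootstrap.servers. As a rough first check,
here is a minimal Python sketch (not part of Samza or Kafka; the function names are
hypothetical) that tests plain TCP reachability of each host:port pair taken from that
setting. It assumes the value is formatted the way ProducerConfig prints it.

```python
import socket
from typing import Dict, List, Tuple


def parse_bootstrap_servers(value: str) -> List[Tuple[str, int]]:
    """Parse a bootstrap.servers value such as "[host1:9092, host2:9092]".

    Accepts the bracketed form printed in the ProducerConfig log as well as
    a plain comma-separated list.
    """
    cleaned = value.strip().strip("[]")
    servers = []
    for entry in cleaned.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the last colon so the port is always the final component.
        host, _, port = entry.rpartition(":")
        servers.append((host, int(port)))
    return servers


def broker_reachable(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def check_brokers(value: str, timeout_s: float = 5.0) -> Dict[str, bool]:
    """Map each "host:port" in a bootstrap.servers value to its reachability."""
    return {
        "%s:%d" % (host, port): broker_reachable(host, port, timeout_s)
        for host, port in parse_bootstrap_servers(value)
    }
```

Run on the host that launches the job, a call such as
check_brokers("[devstream01.chartbeat.net:9092]") returns a dict of reachability
flags. A True result only shows that the port accepts connections; it does not prove
that the broker answers metadata requests for the coordinator stream topic, so treat
it as a way to rule out basic network problems rather than as a full diagnosis.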

Thanks!

Rick


