pulsar-users mailing list archives

From "Apache Pulsar Slack" <apache.pulsar.sl...@gmail.com>
Subject Slack digest for #general - 2019-04-26
Date Fri, 26 Apr 2019 09:11:04 GMT
2019-04-25 09:11:20 UTC - Romain Castagnet: Hi, did you try "namespaces create" without a cluster,
and then "namespaces set-clusters TENANT/NS -c cluster1,cluster2"?
----
2019-04-25 09:14:15 UTC - stefan: hi, yes this is what i did
----
2019-04-25 09:16:04 UTC - stefan: i tried both ways
----
2019-04-25 09:23:24 UTC - Romain Castagnet: hum strange
----
2019-04-25 09:32:28 UTC - Matti-Pekka Laaksonen: Today I noticed one of our client applications
had died, seemingly due to a lost connection. The last log message is:
{"timestamp":"2019-04-25T06:33:19.900Z","level":"WARN","thread":"pulsar-client-io-1-1","logger":"org.apache.pulsar.client.impl.ClientCnx","message":"[10.223.2.164/10.223.2.164:6650]
Got exception NativeIoException : syscall:read(..) failed: Connection reset by peer","context":"default"}
----
2019-04-25 09:32:38 UTC - Matti-Pekka Laaksonen: This leads me to <https://github.com/apache/pulsar/blob/branch-2.2/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ClientCnx.java#L219>
----
2019-04-25 09:37:15 UTC - Matti-Pekka Laaksonen: I don't quite understand this case. Normally
when the Pulsar connection is lost we catch the exception, close down the application gracefully,
and the orchestration service restarts the container after a delay. In this case, however,
there is no error or caught exception, simply a WARN level log message. I'm not familiar
with the execution path of the ClientCnx; should the connection die after the state is set
to State.Failed?
----
2019-04-25 10:15:13 UTC - Yuvaraj Loganathan: Because the client will retry and establish
the connection :thinking_face:
----
2019-04-25 10:35:49 UTC - songxinlei: @songxinlei has joined the channel
----
2019-04-25 11:39:58 UTC - Matti-Pekka Laaksonen: Hmm, it might be that the Pulsar client was
able to reconnect, but the non-Pulsar parts of the client failed. I'll look into it
----
2019-04-25 12:28:59 UTC - Chris Bartholomew: I wanted to let everyone know that I've built
a service based on Pulsar. You can see it here: <https://kafkaesque.io> I am really
hoping it helps people get started with Pulsar, testing their client code, etc. A basic account
is free and includes an integrated dashboard for admin and monitoring of topics, namespaces,
clusters, geo-replication. Would love it if everyone could try it out and give me some feedback.
Thanks.
+1 : Sijie Guo, Ezequiel Lovelle, Guy Feldman, Karthik Ramasamy, DT, Ruud Kamphuis
----
2019-04-25 16:00:14 UTC - Grant Wu: What’s the current status of the Docker/k8s runtime
for PFs?
----
2019-04-25 16:07:45 UTC - Sijie Guo: @Grant Wu k8s runtime is supported since 2.3.0. The documentation
is still missing though :disappointed:
----
2019-04-25 16:08:29 UTC - Grant Wu: :disappointed:
----
2019-04-25 17:43:08 UTC - Devin G. Bost: We increased parallelism for a high-traffic Pulsar
function (from 3 to 5), but the data shows that the new function instances aren't getting
any traffic. How would we figure out why these instances aren't getting any of the load?
----
2019-04-25 17:44:13 UTC - Matteo Merli: Can you share the topics stats for the topic these
are consuming from?

`pulsar-admin topics stats $TOPIC`
----
2019-04-25 17:54:10 UTC - Thor Sigurjonsson: Devin is getting the topic stats ready...
----
2019-04-25 17:55:59 UTC - Thor Sigurjonsson: I guess to add a little color to the conversation:
we noticed that the metrics in grafana showed higher latency on the 0.999 quantile and wanted
to see if we could bring that down. When we deployed parallelism 5 (from 3), we noticed the 2 new
function instances share hosts with 2 "older ones", and there are no metrics being shown in
grafana for those and 0 metrics from the pulsar-admin functions stats call.
----
2019-04-25 17:56:34 UTC - Devin G. Bost: I noticed that the instance with instance_id: "2"
is missing from the list of subscriptions.
----
2019-04-25 17:56:49 UTC - Thor Sigurjonsson: those .999 quantile ones are around 100-125ms.
----
2019-04-25 17:57:23 UTC - Matteo Merli: It seems all 5 consumers (1 per function instance)
are consuming at ~32 msg/s
----
2019-04-25 17:59:03 UTC - Matteo Merli: In the stats JSON, you have the `msgRateOut` for each
consumer and the overall for the subscription
----
2019-04-25 18:02:36 UTC - Thor Sigurjonsson: when I do pulsar-admin functions status on that
function I get this for instance 2:
```{
    "instanceId" : 2,
    "status" : {
      "running" : true,
      "error" : "",
      "numRestarts" : 0,
      "numReceived" : 0,
      "numSuccessfullyProcessed" : 0,
      "numUserExceptions" : 0,
      "latestUserExceptions" : [ ],
      "numSystemExceptions" : 0,
      "latestSystemExceptions" : [ ],
      "averageLatency" : 0.0,
      "lastInvocationTime" : 0,
      "workerId" : "REDACTED-8080"
    }
}```
----
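A status dump like the one above can be scanned for instances that claim to be running but have never received a message. The following is only a minimal sketch: it assumes the full `pulsar-admin functions status` output has been saved as JSON with a top-level `instances` list whose entries look like the excerpt above. The `instances` key, the `find_idle_instances` helper name, and the sample numbers for instance 0 are assumptions made for illustration, not details confirmed in this thread.

```python
import json


def find_idle_instances(status_doc):
    """Return ids of instances that report running=true but numReceived=0."""
    idle = []
    for inst in status_doc.get("instances", []):
        st = inst.get("status", {})
        if st.get("running") and st.get("numReceived", 0) == 0:
            idle.append(inst["instanceId"])
    return idle


# Made-up sample: instance 0 is busy, instance 2 mirrors the excerpt above.
sample_status = json.loads("""
{
  "instances": [
    {"instanceId": 0, "status": {"running": true, "numReceived": 1200}},
    {"instanceId": 2, "status": {"running": true, "numReceived": 0}}
  ]
}
""")

print(find_idle_instances(sample_status))  # prints [2]
```

An instance flagged this way is not necessarily broken (it may simply have just started), so the result is a hint for where to look in the logs rather than a diagnosis.
----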
2019-04-25 18:04:36 UTC - Devin G. Bost: I also only count 4 consumers.
----
2019-04-25 18:05:32 UTC - Thor Sigurjonsson: Also instance 3 and instance 2 are on the same
host, and instance 2 shows no metrics from prometheus-grafana and instance 4 has ~100ms .999
quantile latency (and no data showing for instance 2). It's roughly twice what other functions
report... Made me guess maybe they were being rolled up for the host or something...
----
2019-04-25 18:07:45 UTC - Devin G. Bost: In the JSON output from `pulsar-admin topics stats
$TOPIC`, I only see consumers with these instance_id values: 0, 3, 1, 4. (2 is missing.)
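----
The manual comparison described above (expected instance ids versus the ids that actually show up as consumers) can be scripted. This is a rough Python sketch, assuming the output of `pulsar-admin topics stats $TOPIC` has been loaded into a dict in which each subscription has a `consumers` list and each consumer carries its `instance_id` under a `metadata` map. That layout, the subscription name, and the `missing_instance_ids` helper are assumptions for illustration; the sample values are made up.

```python
def missing_instance_ids(topic_stats, parallelism):
    """Return the instance ids in range(parallelism) that have no consumer."""
    seen = set()
    for sub in topic_stats.get("subscriptions", {}).values():
        for consumer in sub.get("consumers", []):
            instance_id = consumer.get("metadata", {}).get("instance_id")
            if instance_id is not None:
                seen.add(int(instance_id))
    return sorted(set(range(parallelism)) - seen)


# Made-up stats mirroring the situation in the thread: ids 0, 3, 1, 4 present.
sample_stats = {
    "subscriptions": {
        "hypothetical-subscription": {
            "consumers": [
                {"msgRateOut": 32.0, "metadata": {"instance_id": "0"}},
                {"msgRateOut": 32.0, "metadata": {"instance_id": "3"}},
                {"msgRateOut": 32.0, "metadata": {"instance_id": "1"}},
                {"msgRateOut": 32.0, "metadata": {"instance_id": "4"}},
            ]
        }
    }
}

print(missing_instance_ids(sample_stats, 5))  # prints [2]
```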
----
2019-04-25 18:12:51 UTC - David Kjerrumgaard: Are there any errors in the log for instance
2?
----
2019-04-25 18:16:07 UTC - Thor Sigurjonsson: Full contents of log from instance 2 at about
the time of the parallelism update.
----
2019-04-25 18:20:33 UTC - Ruud Kamphuis: Interesting! Question: why does the name include
kafka?
----
2019-04-25 18:21:18 UTC - Jerry Peng: @Thor Sigurjonsson there are no more logs for instance-2?
If so it seems to be getting stuck.
----
2019-04-25 18:21:37 UTC - Jerry Peng: @Thor Sigurjonsson do you guys have function state enabled?
----
2019-04-25 18:22:38 UTC - Thor Sigurjonsson: Hmm, we do have a log topic set...
----
2019-04-25 18:23:01 UTC - Thor Sigurjonsson: working to see about function state being enabled..
----
2019-04-25 18:25:06 UTC - Thor Sigurjonsson: would that be `stateStorageServiceUrl` in functions_worker.yml?
(It's commented out).
----
2019-04-25 18:25:19 UTC - Jerry Peng: yes and gotcha
----
2019-04-25 18:25:47 UTC - Jerry Peng: you guys are running functions via Thread Runtime?
----
2019-04-25 18:26:09 UTC - Thor Sigurjonsson: Yes
----
2019-04-25 18:26:59 UTC - Thor Sigurjonsson: (kerberos jvm params plumbing the kafka connector
made us go there for now)
----
2019-04-25 18:29:25 UTC - Jerry Peng: Give me a second to investigate
+1 : Thor Sigurjonsson, Devin G. Bost
----
2019-04-25 18:33:31 UTC - Thor Sigurjonsson: That's like OpenOffice calling their thing Microsofty.
:slightly_smiling_face:
----
2019-04-25 18:38:23 UTC - Thor Sigurjonsson: Sorry about being flippant. :slightly_smiling_face:
I get that there is a good search marketing angle to get streaming customers.. I'll give it
a spin this week and see if I can give useful feedback.
----
2019-04-25 18:48:02 UTC - Ruud Kamphuis: Yeah, to me the name is only gonna confuse people.
People looking for pulsar will see the name and think: nope, this is not what I want.
People that want hosted kafka find this and think: nope, this is not what I want. Just my 2
cents
----
2019-04-25 18:48:26 UTC - Ruud Kamphuis: Great to have a hosted pulsar offering tho! :raised_hands:
----
2019-04-25 18:53:50 UTC - Chethan UK: @Chethan UK has joined the channel
----
2019-04-25 19:04:50 UTC - Jerry Peng: @Devin G. Bost @Thor Sigurjonsson I have reproduced
the issue.  There was a PR that went in earlier this year that might be causing race
conditions when using the same pulsar client to create consumers, as is the case for running
functions via ThreadRuntime.  I am looking for a fix for the issue.

  In the meantime, do you guys want to try running with process runtime? We have added the
ability to add runtime flags so that your kerberos configs can get passed in.  The functionality
is not in an official release, so you guys can either 1) try to build your own pulsar release
from master or 2) try a streamlio pulsar release that contains the functionality, since
we create releases more often than apache does.
+1 : Thor Sigurjonsson
----
2019-04-25 19:05:21 UTC - Devin G. Bost: > I have reproduced the issue.  There was
a PR that went in earlier this year that might be causing race conditions when using
the same pulsar client to create consumers, as is the case for running functions via ThreadRuntime.
 I am looking for a fix for the issue.

Very impressive.
----
2019-04-25 19:06:51 UTC - Jerry Peng: Thanks! Can’t take all the credit.  @Matteo Merli
also helped
+1 : Devin G. Bost
----
2019-04-25 19:13:11 UTC - Devin G. Bost: Is there a temporary workaround?

We did have some concerns about the memory utilization that might be associated with the process
runtime (with parallelization). Would we increase our memory requirements if we parallelized
with the process runtime instead of parallelizing with the threading runtime?
----
2019-04-25 19:15:46 UTC - Thor Sigurjonsson: I think there are a few things that go into the
decision for us: 1) bug fixes we need, 2) what runtime to "settle on" (and in which cluster
maybe) 3) functions support for publishing properties and then the timing and how we roll
out the prod env right now being used. We can roll faster in lower environment, but it would
be good to pick a release soon that gets the most bang for the buck.
----
2019-04-25 19:16:22 UTC - Thor Sigurjonsson: This parallelism issue is not critical just yet,
but it is part of 1) above I think.
----
2019-04-25 19:16:57 UTC - Thor Sigurjonsson: I guess we should consider the streamlio build
also going forward.
----
2019-04-25 19:17:51 UTC - Thor Sigurjonsson: I'm guessing much of what we'd need would be
in 2.3.2 (I may be wrong).
----
2019-04-25 19:20:08 UTC - Thor Sigurjonsson: Would we be getting those from here <https://hub.docker.com/r/streamlio/pulsar/tags>
? if we were rolling with docker?
----
2019-04-25 19:27:29 UTC - Jerry Peng: @Thor Sigurjonsson yes, but it currently does not have
the latest image.  We are actually in the process of doing another release.  A new image
should be up in the next half an hour
+1 : Thor Sigurjonsson
----
2019-04-25 19:28:22 UTC - Jerry Peng: @Devin G. Bost I am not sure of a temporary workaround
at this moment, but this issue doesn’t happen every time.  It is a race condition.  I was
only able to reproduce it once out of the many times I have tried.
----
2019-04-25 19:34:46 UTC - Chethan UK: Has anyone used MongoDB Source connector?
----
2019-04-25 19:45:57 UTC - David Kjerrumgaard: Not yet, are you having issues?
----
2019-04-25 19:47:22 UTC - Chethan UK: <https://pulsar.apache.org/docs/en/io-cdc/>

is there a good tutorial on MongoDB *Source*?
----
2019-04-25 19:47:57 UTC - Ali Ahmed: <https://github.com/bbonnin/pulsar-io-mongo>
----
2019-04-25 19:48:11 UTC - Chethan UK: It's a sink, I want a source
----
2019-04-25 19:48:49 UTC - Ali Ahmed: sorry the source is just debezium you can try debezium
docs
----
2019-04-25 19:54:51 UTC - Chethan UK: Where is the helm chart <https://pulsar.apache.org/docs/en/deploy-kubernetes/#deploying-pulsar-components-helm>
?
----
2019-04-25 19:59:16 UTC - David Kjerrumgaard: @Chethan UK It is bundled with the code. If
you clone the pulsar repo, then go to /apache/pulsar/deployment/kubernetes/helm/pulsar
----
2019-04-25 19:59:52 UTC - Devin G. Bost: Gotcha.
----
2019-04-25 20:32:34 UTC - Jerry Peng: @Devin G. Bost @Thor Sigurjonsson a new docker release
is now available:
<https://hub.docker.com/r/streamlio/pulsa>
----
2019-04-25 20:32:46 UTC - Jerry Peng: sorry: <https://hub.docker.com/r/streamlio/pulsar/tags>
----
2019-04-25 20:43:03 UTC - Devin G. Bost: Thanks!
----
2019-04-25 22:28:54 UTC - Steven Le Roux: Hi, I've deployed a local instance of pulsar, but
with separated components (zk, bk)
----
2019-04-25 22:29:44 UTC - Steven Le Roux: Bk seems ok so far (bk shell listbookies is listing
bookies), they're registered into zk properly under /ledgers/available
----
2019-04-25 22:30:16 UTC - Steven Le Roux: but when starting pulsar, it connects to zk, then
:
22:03:43.753 [main] ERROR org.apache.bookkeeper.client.BookieWatcherImpl - Failed to get bookie
list :
----
2019-04-25 22:30:55 UTC - Steven Le Roux: I can't find where to configure the ledger zk path,
but anyway, it defaults to /ledgers which should be fine :
----
2019-04-25 22:30:56 UTC - Steven Le Roux: 22:03:43.583 [main] INFO  org.apache.bookkeeper.meta.zk.ZKMetadataDriverBase
- Initialize zookeeper metadata driver with external zookeeper client : ledgersRootPath =
/ledgers.
----
2019-04-25 22:31:13 UTC - Steven Le Roux: any idea what I'm missing ?
----
2019-04-25 22:31:53 UTC - Matteo Merli: What’s the `zookeeperServers` settings in `broker.conf`?
----
2019-04-25 22:33:24 UTC - Steven Le Roux: zookeeperServers=10.0.0.2:2181/pulsar-local
----
2019-04-25 22:33:54 UTC - Steven Le Roux: I've reduced to one for testing but there are three
of them
----
2019-04-25 22:33:55 UTC - Matteo Merli: I see, you’re using a chroot for ZK
----
2019-04-25 22:34:05 UTC - Steven Le Roux: yes
----
2019-04-25 22:34:14 UTC - Matteo Merli: is BK also using the same chroot?
----
2019-04-25 22:34:36 UTC - Steven Le Roux: also, I'm trying to chroot zk so that I can co-locate
local zk and global zk for testing purposes
----
2019-04-25 22:35:25 UTC - Steven Le Roux: ok from what you're saying, pulsar is expecting
to read /ledgers at /pulsar-local/ledgers then ?
----
2019-04-25 22:35:31 UTC - Matteo Merli: You can co-locate them without needing the chroot
----
2019-04-25 22:35:47 UTC - Matteo Merli: the “global” zk is only using `/admin/` prefix
----
2019-04-25 22:36:02 UTC - Steven Le Roux: ok perfect
----
2019-04-25 22:36:23 UTC - Matteo Merli: > ok from what you’re saying, pulsar is expecting
to read /ledgers at /pulsar-local/ledgers then ?

Yes, both Pulsar and BK should share the same chroot
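----
The mismatch discussed above comes from how a ZooKeeper connect string with a chroot suffix shifts every path the client sees. The sketch below is plain Python string handling, not Pulsar or BookKeeper code; the helper names are hypothetical, and the bookie-side connect string without a chroot is an assumption about the original setup, inferred from the bookies being registered under /ledgers/available.

```python
def split_chroot(connect_string):
    """Split 'host1:2181,host2:2181/chroot' into (hosts, chroot)."""
    hosts, _, path = connect_string.partition("/")
    path = path.strip("/")
    chroot = "/" + path if path else ""
    return hosts, chroot


def effective_path(connect_string, znode_path):
    """Absolute znode path a client actually touches, chroot included."""
    return split_chroot(connect_string)[1] + znode_path


def same_chroot(a, b):
    """True if both connect strings resolve paths under the same chroot."""
    return split_chroot(a)[1] == split_chroot(b)[1]


broker_zk = "10.0.0.2:2181/pulsar-local"  # zookeeperServers from the thread
bookie_zk = "10.0.0.2:2181"               # assumed: bookies started without chroot

print(effective_path(broker_zk, "/ledgers"))  # prints /pulsar-local/ledgers
print(effective_path(bookie_zk, "/ledgers"))  # prints /ledgers
print(same_chroot(broker_zk, bookie_zk))      # prints False
```

With differing chroots the broker looks for bookies under a path the bookies never wrote to, which is consistent with the "Failed to get bookie list" error reported earlier.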
----
2019-04-25 22:36:36 UTC - Steven Le Roux: ok that's why
----
2019-04-25 22:36:39 UTC - Steven Le Roux: thx, testing ;:)
----
2019-04-25 22:36:42 UTC - Matteo Merli: :slightly_smiling_face:
----
2019-04-25 22:43:08 UTC - Steven Le Roux: Far better :wink: thx @Matteo Merli!
+1 : Matteo Merli
----
2019-04-26 00:10:48 UTC - Grant Wu: @Sijie Guo did you figure anything out about <https://github.com/apache/bookkeeper/issues/1970>
?
----
2019-04-26 01:05:37 UTC - Jerry Peng: @Grant Wu I talked with a few users that saw this problem,
they all had errors in their bookies when this error was occurring.  There weren’t enough
non-faulty bookies in the cluster, and that is what was causing this exception.
----
2019-04-26 01:06:58 UTC - Grant Wu: Interesting
----
2019-04-26 01:09:37 UTC - durga: Ok. Thanks @Matteo Merli
----
2019-04-26 02:26:48 UTC - Sijie Guo: @Grant Wu I was looking into that issue before but I
didn’t get to the root cause yet. It is still on my backlog. Even if, as @Jerry Peng said, there
weren’t enough non-faulty bookies in the cluster, bookkeeper should handle that. The
ArrayIndexOutOfBoundsException doesn’t sound right to me.

but anyway I will look into it whenever I have time.
----