pulsar-users mailing list archives

From "Apache Pulsar Slack" <apache.pulsar.sl...@gmail.com>
Subject Slack digest for #general - 2018-12-12
Date Wed, 12 Dec 2018 09:11:04 GMT
2018-12-11 10:14:41 UTC - Maarten Tielemans: Morning all, currently hitting a DNS resolve issue
when I attempt to connect to either of my two brokers. Keep in mind, zookeeper and the bookie
are running on the same nodes.

I did perform the write tests on the bookies successfully.

10:11:12.321 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - Adding 1 publishers
on topic <persistent://public/default/persistent-60>
10:11:12.465 [pulsar-client-io-2-2] INFO  org.apache.pulsar.client.impl.ConnectionPool - [[id:
0x61c9eaa6, L:/ - R:localhost/]] Connected to server
10:11:12.877 [pulsar-client-io-2-2] INFO  org.apache.pulsar.client.impl.ConnectionPool - [[id:
0xd9fb817a, L:/ - R:localhost/]] Connected to server
10:11:13.367 [pulsar-client-io-2-2] WARN  org.apache.pulsar.client.impl.ConnectionPool - Failed
to open connection to ip-10-0-3-183.kanto.indigo:6650 : io.netty.resolver.dns.DnsNameResolverContext$SearchDomainUnknownHostException:
Search domain query failed. Original hostname: 'ip-10-0-3-183.kanto.indigo' failed to resolve
'ip-10-0-3-183.kanto.indigo.kanto.indigo' after 6 queries
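The failure above can be checked outside Pulsar by asking the JVM for the same name. This is a minimal sketch using only `java.net.InetAddress`; the class and method names are illustrative, and the default hostname is the one from the log. It goes through the JVM's default resolver rather than the Netty DNS resolver seen in the stack trace, so the two may not always agree.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class ResolveCheck {
    /** Returns the resolved address as text, or null if the name does not resolve. */
    static String resolve(String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Hostname taken from the log above; pass another name as the first argument.
        String host = args.length > 0 ? args[0] : "ip-10-0-3-183.kanto.indigo";
        String address = resolve(host);
        System.out.println(address == null
                ? host + " does not resolve"
                : host + " -> " + address);
    }
}
```

Running it on the client machine before and after adding a hosts-file entry or DNS record shows whether the change took effect for that host.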
2018-12-11 12:51:40 UTC - Maarten Tielemans: Going through my config trying to resolve the
above. When setting the cluster metadata,

`--web-service-url	The web service URL for the cluster, plus a port. This URL should be a
standard DNS name.`

What is the recommended way to set this up? I tried with a TCP and an HTTP load balancer in
front of the two pulsar nodes
2018-12-11 14:26:21 UTC - Sijie Guo: yes +1
2018-12-11 15:32:07 UTC - Julien Plissonneau Duquène: Thanks for the information. Not sure
I will be able to attend that one, but I'll try for sure.
2018-12-11 15:40:07 UTC - Mike Card: Has anyone else had a problem with message truncation
under very heavy load? We have been running a test with Pulsar 2.2.0 in which we have a web
app running on 3 EC2 instances (1 in each of us-east-1a, 1b, and 1c) which has a REST API
that can be called which will result in a message being written into a partitioned Pulsar
topic. We have a load test that results in calls to that REST API at a rate of about 15 KHz
in aggregate across all 3 availability zones. We are seeing messages truncated at 64 bytes
into the partitioned topic using both synchronous and asynchronous producer send calls. Under
light load this does not happen. Just wondered if any of you had seen anything similar and
identified a cause or work-around.
2018-12-11 15:48:37 UTC - David Kjerrumgaard: @Maarten Tielemans The quickest way to resolve
your issue is to edit the /etc/hosts file on the host that is trying to resolve the host name
"ip-10-0-3-183.kanto.indigo" to include that host/IP mapping.  If you are using a load balancer
in front of the broker, make sure that your forwarding rules use the IP address and not the
hostname, otherwise you will have to manually add cname records into the DNS server you are
2018-12-11 15:51:21 UTC - David Kjerrumgaard: @Mike Card How are you determining that the
message size is 64 bytes? Is that the size of the messages returned from the consumer?
2018-12-11 15:54:56 UTC - Mike Card: Yes. My downstream consumer tasks log a buffer underflow
exception for a 64 byte message (should be larger) and if I restart the web app it will die
again when it restarts and begins reading messages out of the “input” topic. I do not
observe this behavior under light load. 

The code in question runs fine with Kafka, putting Pulsar in place of Kafka in this application
seemed easy but I’m wondering if there is something I have mis-configured or what to cause
2018-12-11 16:14:25 UTC - Grant Wu: Are message IDs incrementing on a per topic basis?
2018-12-11 16:16:20 UTC - Mike Card: wouldn't message IDs increment on a per-message basis?
2018-12-11 16:17:11 UTC - Mike Card: (oh if it matters producers are using round-robin to
determine partition to write messages to next)
2018-12-11 16:18:33 UTC - Grant Wu: I meant more like
2018-12-11 16:18:58 UTC - Grant Wu: if the first message in a topic has message ID n, does
the second message have message ID n+1
2018-12-11 16:19:15 UTC - Grant Wu: Sorry, this isn’t related to your previous issue :sweat_smile:
2018-12-11 16:19:24 UTC - Grant Wu: For what it’s worth, I doubt this is the case - I just
wanted to check
2018-12-11 16:20:37 UTC - David Kjerrumgaard: @Mike Card It would be useful if you could share
the code/configuration you are using for testing, as Pulsar handles 100s of millions of messages
in Production at Yahoo, so it does perform well under heavy load scenarios.
2018-12-11 16:22:52 UTC - David Kjerrumgaard: @Mike Card  So the data flow is  load generator
--- REST ----> web app ----> pulsar client ---> Partitioned topic?
2018-12-11 16:23:09 UTC - David Kjerrumgaard: Is it possible that the truncation occurs in
the web application?
2018-12-11 16:23:09 UTC - Mike Card: yes
2018-12-11 16:24:07 UTC - David Kjerrumgaard: Does the web app use a Java client or a web
socket to communicate to the Pulsar Broker?
2018-12-11 16:26:41 UTC - Mike Card: It uses the Pulsar Java client
2018-12-11 16:36:49 UTC - David Kjerrumgaard: Just as a sanity check, can you add a line to
the web app that checks the size of the message before it publishes it?  That will help us
isolate the issue.
2018-12-11 16:37:10 UTC - Mike Card: Yes I can
+1 : David Kjerrumgaard
2018-12-11 16:37:40 UTC - David Kjerrumgaard: Let me know what you find
2018-12-11 16:38:12 UTC - Mike Card: OK
2018-12-11 17:49:04 UTC - Shalin: Can you set python version for the pulsar functions to run
in or point to the python file to use?
2018-12-11 18:58:08 UTC - Ryan Samo: Hey guys, is there a way to easily reset a Pulsar cluster?
Like wipe the metadata back to when the cluster was first created?
2018-12-11 19:54:27 UTC - David Kjerrumgaard: @Ryan Samo You may be able to re-initialize
the cluster metadata, but be cautious.  <http://pulsar.apache.org/docs/latest/admin-api/clusters/#Initializeclustermetadata-v8mupg>
2018-12-11 19:55:34 UTC - David Kjerrumgaard: It basically wipes out the entries in ZK, but
leaves the data on the bookies
2018-12-11 19:58:44 UTC - Ryan Samo: Ok thanks, one more question. Where does Pulsar keep
track of the cert to role mapping? Like if I have a cert named client1 and grant consume to
client1, is that in a bookie or zookeeper?
2018-12-11 19:59:52 UTC - David Kjerrumgaard: @Ryan Samo Look at this on how to clear out
the data on the bookies as well (if you are interested)  <https://bookkeeper.apache.org/docs/latest/admin/bookies/#formatting>
2018-12-11 20:00:19 UTC - Sanjeev Kulkarni: @Shalin that is not yet possible. Python functions
are run by invoking the python command that's present on the system. Both python2 and python3
are supported.
2018-12-11 20:02:09 UTC - Shalin: Gotcha. Thanks . :thumbsup:
2018-12-11 20:08:49 UTC - Ryan Samo: Thanks @David Kjerrumgaard !
2018-12-11 20:21:14 UTC - David Kjerrumgaard: @Ryan Samo Are you asking where the role mapping
is physically stored?
2018-12-11 20:22:21 UTC - Tobias Gustafsson: Is there any way to get the last message id of
a topic? I want to be able to, later, use this to position a Reader instance to get messages
from that ID and forward. I know that I can instantiate the Reader with `pulsar.LatestMessage`
but that does not help since I want to know the actual message ID.
2018-12-11 20:23:26 UTC - Ryan Samo: @David Kjerrumgaard yeah I was just wondering how the
roles are stored in case we do a reset
2018-12-11 20:25:03 UTC - David Kjerrumgaard: @Ryan Samo I am not sure, but I believe they
are part of the cluster metadata kept on ZK. So you will have to re-execute the pulsar-admin
commands to create those mappings...AFAIK.
2018-12-11 20:25:25 UTC - Ryan Samo: Cool thanks 
2018-12-11 20:26:00 UTC - David Kjerrumgaard: for the proxy user ONLY, they are kept in the
broker.conf, <http://pulsar.apache.org/docs/en/security-authorization.html#proxy-roles>
 But I don't think that is what you are looking for
2018-12-11 20:28:36 UTC - Ryan Samo: Nah I was looking to see how Pulsar stores references
to the roles, on bookie or zookeeper, etc
2018-12-11 20:31:31 UTC - Tobias Gustafsson: One more question: Is the REST API response content
documented anywhere? <https://pulsar.apache.org/docs/latest/reference/RestApi/> only
seems to contain the URL and response codes. Perhaps I'm missing something?
2018-12-11 20:31:35 UTC - David Kjerrumgaard: @Tobias Why not use that reader and then call
the readNext() method and then call getMessageId() on the returned message?  It is definitely
a hack, but it SHOULD work
2018-12-11 20:33:43 UTC - David Kjerrumgaard: @Tobias We have that for the admin REST API,
if that helps.... <https://pulsar.apache.org/en/admin-rest-api/>
2018-12-11 20:33:50 UTC - Tobias Gustafsson: @David Kjerrumgaard What should I use as initial
message ID for the reader then? If I use `LatestMessage` readNext() may not return
2018-12-11 20:36:10 UTC - Tobias Gustafsson: anything as long as there are no more messages
produced on that topic
2018-12-11 20:36:12 UTC - David Kjerrumgaard: readNext(int timeout, TimeUnit unit) would at
least prevent a hanging process
2018-12-11 20:36:23 UTC - David Kjerrumgaard: but I see what you are saying
2018-12-11 20:38:35 UTC - Tobias Gustafsson: As a backup option, is there any way to get the
message id of a message I've produced? That way I could at least keep track of the last message
ID successfully committed even though I would have to do it outside of Pulsar.
2018-12-11 20:40:13 UTC - Tobias Gustafsson: @David Kjerrumgaard Thanks for the docs link,
that was what I was searching for!
2018-12-11 20:40:17 UTC - David Kjerrumgaard: @Tobias Yes, every call to send(T message) returns
a MessageId
2018-12-11 20:40:24 UTC - David Kjerrumgaard: <http://pulsar.apache.org/api/client/org/apache/pulsar/client/api/Producer.html#send-T->
2018-12-11 20:42:03 UTC - David Kjerrumgaard: If you are sending messages asynchronously,
the completable future returns the messageId  <http://pulsar.apache.org/api/client/org/apache/pulsar/client/api/Producer.html#sendAsync-org.apache.pulsar.client.api.Message->
2018-12-11 20:42:15 UTC - Tobias Gustafsson: I'm using the Go client, it seems to be missing
that functionality from what I can tell.
2018-12-11 20:42:40 UTC - David Kjerrumgaard: And you will ONLY get a messageId IF the message
has been committed to disk
2018-12-11 20:42:47 UTC - David Kjerrumgaard: Ah
2018-12-11 20:43:59 UTC - David Kjerrumgaard: @Tobias Can you please file a feature request
for these methods to be added?
2018-12-11 20:43:59 UTC - David Kjerrumgaard: <https://github.com/apache/pulsar/issues>
2018-12-11 20:44:31 UTC - David Kjerrumgaard: That way you can track the progress, and will
be notified when they are ready
2018-12-11 20:45:25 UTC - Tobias Gustafsson: OK, thanks!
2018-12-11 21:00:41 UTC - Cristian: Is there a roadmap somewhere?
2018-12-11 22:14:30 UTC - Dave Southwell: Can anyone give me some guidance on how to use Basic
Auth with Pulsar?  I saw in some slack history and in github that there is support for it,
but there isn't any documentation that I've found.
2018-12-11 23:14:27 UTC - Mike Card: Hey @David Kjerrumgaard I ran this test and the producer
says the original object size is 64 bytes but the consumer won't deserialize it, still gets
a buffer underflow. This is going to take more investigation, I am beginning to believe something
about the Pulsar byte array serializer is different from the Kafka ByteBuffer serializer
2018-12-11 23:15:56 UTC - Mike Card: Fundamentally it seems to me they should be the same,
but something in the serde process is making the consumer unhappy under heavy load conditions.
I am starting to think this is related to using multiple tasks somehow, that is going to be
my next test
2018-12-11 23:17:03 UTC - David Kjerrumgaard: which SerDe are you using?
2018-12-11 23:20:20 UTC - Mike Card: We are using our own serde to turn an object into a byte
array , the call to publish looks like this:

_log.info("DefaultDatabus.onUpdateIntent: publishing message " +
ref.toString() + ", size in bytes == " + UpdateRefSerializer.toByteBuffer(ref).array().length);
-> {});
2018-12-11 23:21:38 UTC - Mike Card: 
2018-12-11 23:21:51 UTC - Mike Card: This is the custom serializer
2018-12-11 23:26:51 UTC - Mike Card: It works fine with an identical app setup using Kafka
in lieu of Pulsar, I think something changes with the serialization though somehow.
2018-12-11 23:27:26 UTC - Mike Card: I was looking at the statics in that serde but I don't
*think* they are the culprit since they don't cause problems with Kafka.
2018-12-11 23:27:51 UTC - Mike Card: (I did not write that serde so)
2018-12-11 23:36:05 UTC - David Kjerrumgaard: I will get with the engineering team to see
if they have any ideas.
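One thing that may be worth ruling out in the logging line above (an assumption; the thread does not identify a cause): `ByteBuffer.array()` returns the whole backing array regardless of the buffer's position and limit, so `array().length` reports the backing array's size rather than the number of bytes actually written. The sketch below uses only `java.nio` to show the difference; `toExactBytes` is a hypothetical helper, not part of the serializer discussed here or of the Pulsar client.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ByteBufferSizes {
    /** Copies only the readable bytes (position..limit) without moving the caller's buffer. */
    static byte[] toExactBytes(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate(); // independent position/limit, shared content
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(16);
        buffer.put("hello".getBytes(StandardCharsets.UTF_8));
        buffer.flip(); // switch from writing to reading: limit = 5, position = 0

        System.out.println("backing array length: " + buffer.array().length);     // 16
        System.out.println("readable bytes:       " + buffer.remaining());        // 5
        System.out.println("exact copy length:    " + toExactBytes(buffer).length); // 5
    }
}
```

If the two sizes differ for the serialized object, a consumer that expects exactly the readable bytes would be handed a payload of a different length than the producer intended, which is one way a deserializer could end up with a buffer underflow or trailing garbage.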
2018-12-12 00:22:39 UTC - Ali Ahmed: @Dave Southwell are you looking for basic auth with pulsar
client or HTTP access?
2018-12-12 00:23:54 UTC - Dave Southwell: Hi @Ali Ahmed I'm looking for basic auth over http.
2018-12-12 00:26:47 UTC - Matteo Merli: @Dave Southwell I’d say that at this point the best
option is to look at the unit test contained in that PR for how to use the basic auth
2018-12-12 00:30:33 UTC - Matteo Merli: Essentially, you’d have to pass an `.htpasswd` file
(created with regular HTTP server tools) to the broker.

On the client, when you pass the authentication, you pass `org.apache.pulsar.client.impl.auth.AuthenticationBasic`
and the auth param string will be something like : `userId:MY_USER,password:MY_PASSWORD`
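For plain HTTP access, standard Basic authentication (RFC 7617) sends the credentials as an `Authorization: Basic <base64(user:password)>` header. Assuming the broker's HTTP endpoint follows that convention, a minimal JDK-only sketch of building the header value looks like this; the class and method names are illustrative and this does not use the Pulsar client API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BasicAuthHeader {
    /** Builds the value of an HTTP "Authorization" header for Basic authentication. */
    static String headerValue(String userId, String password) {
        String credentials = userId + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Placeholder credentials, matching the placeholders used in the message above.
        System.out.println("Authorization: " + headerValue("MY_USER", "MY_PASSWORD"));
    }
}
```

The user ID cannot contain a colon, since the first colon separates it from the password when the server decodes the header.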
2018-12-12 00:31:51 UTC - Dave Southwell: Ok.  I'd gleaned that .htpasswd was what was used
to pass in the username and hashed password for valid users.  I assume the usernames should
then match roles in Pulsar as well.
2018-12-12 00:32:23 UTC - Matteo Merli: Yes, that will be the same authorization part that
is common to all the authentication plugins
+1 : Dave Southwell
2018-12-12 00:33:29 UTC - Matteo Merli: In any case, for next release 2.3 we’ve added a
better way to perform simple authentication based on JWT tokens: <http://pulsar.apache.org/docs/en/next/security-token-client/>
2018-12-12 00:34:28 UTC - Dave Southwell: Looks good!
2018-12-12 04:58:20 UTC - VendyLuo: @VendyLuo has joined the channel