pulsar-users mailing list archives

From "Apache Pulsar Slack" <apache.pulsar.sl...@gmail.com>
Subject Slack digest for #general - 2018-12-11
Date Tue, 11 Dec 2018 09:11:03 GMT
2018-12-10 09:43:59 UTC - Maarten Tielemans: morning all. Looking at "scaling" my single-node
Pulsar setup to a multi-node setup, but most documentation I find immediately talks about
6 nodes/VMs. Would it currently be possible to deploy Pulsar on 2 i3.xlarge nodes (and
have zookeeper, bookkeeper and pulsar run on each)?
----
2018-12-10 10:03:56 UTC - Sijie Guo: @Maarten Tielemans: I just discussed this with @richardliu
above: you can start by deploying Pulsar to one node (or a small number of nodes) and expand later.
I made some changes to the deployment documentation. <https://github.com/apache/pulsar/pull/3152/files>
----
2018-12-10 10:24:30 UTC - Maarten Tielemans: Thanks @Sijie Guo
----
2018-12-10 10:41:59 UTC - Christophe Bornet: OK. So the fallback can only be done manually
?
----
2018-12-10 11:01:37 UTC - David Tinker: What happens if `Consumer.acknowledgeAsync()` (Java
client) fails? Does it get retried or something? Or should I handle that myself somehow?
----
2018-12-10 11:04:11 UTC - David Tinker: It would be nice if there was more Apache Pulsar on
Stackoverflow. I could go post my question there if you like? Would probably be good for the
project if that was the preferred way to ask questions, then post link in this channel.
+1 : jia zhai, Sijie Guo
----
2018-12-10 11:06:56 UTC - jia zhai: @David Tinker You may need to handle that yourself. There
is currently no ack status kept.
----
2018-12-10 11:08:21 UTC - David Tinker: Tx. Should I retry a few times and toss my consumer
and re-connect if that doesn't work?
----
2018-12-10 11:12:39 UTC - jia zhai: Usually, if a message is not acked successfully, it will
get redelivered to the consumer after the ack timeout
----
2018-12-10 11:14:07 UTC - Sijie Guo: I think Stackoverflow is also preferred. That would also
be generally good for sharing the knowledge. People from the community are monitoring Stackoverflow
as well
----
2018-12-10 11:14:34 UTC - David Tinker: Ok. So it is probably sufficient to just consider
async acked messages to be acked immediately as they will be re-delivered later in any case.
I am counting "messages in flight" for flow control purposes.
----
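The counting approach David describes can be sketched as a small wrapper. This is a hypothetical sketch: the class and method names below are ours, not Pulsar's; the only Pulsar-specific assumption is that `acknowledgeAsync()` returns a `CompletableFuture<Void>`, as it does in the Java client.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: count a message as "in flight" until its ack future
// settles, successfully or not. Treating a failed ack as settled is safe
// here because, per the discussion above, an unacked message is redelivered
// after the ack timeout anyway.
public class InFlightTracker {
    private final AtomicInteger inFlight = new AtomicInteger();

    public CompletableFuture<Void> track(CompletableFuture<Void> ackFuture) {
        inFlight.incrementAndGet();
        // handle() runs on both success and failure
        return ackFuture.handle((v, ex) -> {
            inFlight.decrementAndGet();
            return null;
        });
    }

    public int inFlight() {
        return inFlight.get();
    }

    public static void main(String[] args) {
        InFlightTracker tracker = new InFlightTracker();
        CompletableFuture<Void> ack = new CompletableFuture<>();
        tracker.track(ack);
        System.out.println(tracker.inFlight()); // 1 while the ack is pending
        ack.completeExceptionally(new RuntimeException("ack failed"));
        System.out.println(tracker.inFlight()); // back to 0 even on failure
    }
}
```

In a real consumer loop the future passed to `track()` would be the one returned by `consumer.acknowledgeAsync(msg)`.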
2018-12-10 11:16:36 UTC - Ivan Kelly: I'd use the debezium connector rather than rolling my
own solution for bringing data from mysql to pulsar. I think it's available in master now,
so will be in a release in the next month or so
----
2018-12-10 11:17:00 UTC - David Tinker: <https://stackoverflow.com/questions/53704514/how-should-apache-pulsar-consumer-acknowledgeasync-failure-be-handled>
----
2018-12-10 12:20:23 UTC - Maarten Tielemans: If you were to use multiple bookkeepers, how
many bookkeepers would need to ack a produced message before a consumer would receive it?
----
2018-12-10 12:24:56 UTC - Sijie Guo: @Maarten Tielemans :

it is configurable.

you can configure it at `conf/broker.conf`:

```
# Number of bookies to use when creating a ledger
managedLedgerDefaultEnsembleSize=<replicas>

# Number of copies to store for each message
managedLedgerDefaultWriteQuorum=<replicas>

# Number of guaranteed copies (acks to wait before write is complete)
managedLedgerDefaultAckQuorum=<replicas>
```

you can configure the replication settings per namespace via `bin/pulsar-admin namespaces
set-persistence`
----
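As a worked illustration of how the three settings relate: the ensemble must be at least as large as the write quorum, which must be at least as large as the ack quorum, and a write completes once the ack quorum of bookies has confirmed it. The helper below is ours, not a Pulsar API; it only encodes that constraint.

```java
// Hypothetical helper (not part of Pulsar) encoding the constraint between
// the three broker.conf settings above:
//   ensemble >= write quorum >= ack quorum >= 1
public class QuorumCheck {
    public static boolean validQuorums(int ensemble, int writeQuorum, int ackQuorum) {
        return ensemble >= writeQuorum && writeQuorum >= ackQuorum && ackQuorum >= 1;
    }

    public static void main(String[] args) {
        // e.g. write each entry to 3 bookies, wait for 2 acks
        System.out.println(validQuorums(3, 3, 2)); // true
        System.out.println(validQuorums(2, 3, 2)); // false: ensemble < write quorum
    }
}
```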
2018-12-10 13:24:05 UTC - Ezequiel Lovelle: @Ezequiel Lovelle has joined the channel
----
2018-12-10 13:33:12 UTC - Christophe Bornet: The `unload` command does indeed work. @Matteo
Merli Can you give more info on what this command does internally ? Is it safe to execute
periodically, eg. in a cron to ensure an automatic fallback after some time ?
----
2018-12-10 14:31:13 UTC - Samuel Sun: <https://builds.apache.org/job/pulsar_precommit_java8/5237/console>
----
2018-12-10 14:31:45 UTC - Samuel Sun: can I rerun this jenkins job ? could fail due to other
reasons, not pr itself.
----
2018-12-10 14:32:06 UTC - Samuel Sun: <https://github.com/apache/pulsar/pull/3151>
----
2018-12-10 14:41:49 UTC - Matteo Merli: You can comment `run java8 tests` to have Jenkins
start again
----
2018-12-10 14:42:14 UTC - Matteo Merli: On the PR itself 
----
2018-12-10 14:42:53 UTC - Samuel Sun: sure
----
2018-12-10 14:43:25 UTC - Samuel Sun: nice
----
2018-12-10 14:44:41 UTC - Maarten Tielemans: Following the deploy-on-bare-metal guide (<https://pulsar.apache.org/docs/en/deploy-bare-metal/>),
with the change that I try to run zookeeper, bookkeeper and pulsar on the same node.

I started two nodes/instances of zookeeper and initialised the cluster metadata.
However, when I try to start bookkeeper I receive the following error
----
2018-12-10 14:44:50 UTC - Maarten Tielemans: 
----
2018-12-10 14:48:12 UTC - Matteo Merli: Unload will trigger the current brokers to do a graceful
close of the topics and then release the ownership. The topic will be automatically reassigned
to a new broker based on load and current constraints.

The only downside of it is the latency blip perceived by clients during the failover
----
2018-12-10 14:53:21 UTC - Christophe Bornet: So what is your recommendation? Should we monitor
for failover and, when primary brokers come back to life, ask for an unload? Or maybe do an
automatic unload each time we detect a broker making an inactive-to-active transition?
----
2018-12-10 14:53:59 UTC - Christophe Bornet: Shouldn't Pulsar ideally do it by itself?
----
2018-12-10 15:04:08 UTC - Grégory Guichard: Hi, is there a limit on concurrent connections
to a Pulsar broker? My broker doesn't accept new connections after 10,000
----
2018-12-10 15:23:44 UTC - Rohit Rajan: @Rohit Rajan has joined the channel
----
2018-12-10 16:18:07 UTC - Mike Card: @Matteo Merli Have you guys ever run a test like this
on pulsar, i.e. two parallel producers calling the synchronous send() API as fast as possible,
both publishing to the same partitioned topic (in my test there were 48 partitions) which
is being consumed downstream by 2 tasks each running a shared subscription to consume the
topic, each using synchronous receives?
----
2018-12-10 17:08:01 UTC - Matteo Merli: @Grégory Guichard There is no artificial limit. Have
you checked the OS file-descriptors limit for the process?
----
2018-12-10 17:14:52 UTC - Matteo Merli: What is your use case for isolation exactly? The case
for primary/secondary was a bit complicated to begin with. In general, at Yahoo we have a few
namespaces isolated to a subset of brokers (set as the “primary”), with fallback to the
general pool (secondary set to “.*“) in case all the brokers from the primary were unavailable
----
2018-12-10 17:16:50 UTC - Christophe Bornet: I'm testing rack-aware placement and seeing unexpected
behavior. I publish on a topic with E=2, Qw=2, Qa=2 (standard config). I have 2 bookies on rack
"eu" and 2 on rack "us". When I start publishing, one bookie from each rack gets used. When I
stopped a bookie from "eu", I expected it would be replaced by another bookie from the
same rack, but a bookie from "us" got used instead. Am I missing something?
----
2018-12-10 17:17:57 UTC - Matteo Merli: Can you check in the broker logs if the rack info
has been picked up correctly?
----
2018-12-10 17:18:34 UTC - Matteo Merli: It should print something there when it gets the bookies
racks
----
2018-12-10 17:23:28 UTC - Christophe Bornet: ```
bin/pulsar-admin --admin-url <http://pulsar1-eu:8080> bookies racks-placement
{
  "default" : {
    "bk1-eu" : {
      "rack" : "eu"
    },
    "bk2-eu" : {
      "rack" : "eu"
    },
    "bk1-us" : {
      "rack" : "us"
    },
    "bk2-us" : {
      "rack" : "us"
    },
    "pulsar1-eu" : {
      "rack" : "eu"
    },
    "pulsar2-eu" : {
      "rack" : "eu"
    }
  }
}
```
----
2018-12-10 17:23:37 UTC - Christophe Bornet: Is that the info ?
----
2018-12-10 17:24:14 UTC - Matteo Merli: In broker logs, it should print info regarding the
racks for each bookie
----
2018-12-10 17:25:16 UTC - Christophe Bornet: OK. That's DEBUG info?
----
2018-12-10 17:25:20 UTC - Matteo Merli: In any case, I think the bookie address should include
the port as well: `bk1-eu:3181`

And make sure `bk1-eu` is the same address advertised by bookies
----
2018-12-10 17:25:56 UTC - Christophe Bornet: oh ! probably !
----
2018-12-10 17:26:28 UTC - Christophe Bornet: I'm using docker images to run the cluster
----
2018-12-10 17:26:35 UTC - Christophe Bornet: with docker compose
----
2018-12-10 17:26:59 UTC - Matteo Merli: I see, in any case try adding the port as well
----
2018-12-10 17:27:13 UTC - Christophe Bornet: how do I activate the logs for the broker rack?
----
2018-12-10 17:28:12 UTC - Matteo Merli: It would be automatically printed as info logs
----
2018-12-10 17:29:07 UTC - Matteo Merli: when a broker discovers bookies there will be one line
about that. If it refers to the “/default” rack … it means it has not picked up the configured
rack info
----
2018-12-10 17:29:16 UTC - Matteo Merli: Take a look at this unit test:  <https://github.com/apache/pulsar/blob/master/pulsar-zookeeper-utils/src/test/java/org/apache/pulsar/zookeeper/ZkBookieRackAffinityMappingTest.java#L72>
----
2018-12-10 17:32:49 UTC - Christophe Bornet: Indeed port is missing
----
2018-12-10 17:33:18 UTC - Christophe Bornet: I have very few logs in docker logs INFO
----
2018-12-10 17:33:57 UTC - Christophe Bornet: ```
[conf/broker.conf] Applying config clusterName = test
[conf/broker.conf] Applying config configurationStoreServers = zk:2181
[conf/broker.conf] Applying config zookeeperServers = zk
2018-12-09 22:59:53,519 CRIT Supervisor running as root (no user in config file)
2018-12-09 22:59:53,519 INFO Included extra file "/etc/supervisord/conf.d/bookie.conf" during
parsing
2018-12-09 22:59:53,519 INFO Included extra file "/etc/supervisord/conf.d/broker.conf" during
parsing
2018-12-09 22:59:53,519 INFO Included extra file "/etc/supervisord/conf.d/functions_worker.conf"
during parsing
2018-12-09 22:59:53,519 INFO Included extra file "/etc/supervisord/conf.d/global-zk.conf"
during parsing
2018-12-09 22:59:53,519 INFO Included extra file "/etc/supervisord/conf.d/local-zk.conf" during
parsing
2018-12-09 22:59:53,519 INFO Included extra file "/etc/supervisord/conf.d/presto_worker.conf"
during parsing
2018-12-09 22:59:53,519 INFO Included extra file "/etc/supervisord/conf.d/proxy.conf" during
parsing
2018-12-09 22:59:53,526 INFO RPC interface 'supervisor' initialized
2018-12-09 22:59:53,527 CRIT Server 'unix_http_server' running without any HTTP authentication
checking
2018-12-09 22:59:53,527 INFO supervisord started with pid 1
2018-12-09 22:59:54,529 INFO spawned: 'broker' with pid 17
2018-12-09 22:59:56,520 INFO success: broker entered RUNNING state, process has stayed up
for > than 1 seconds (startsecs)

```
----
2018-12-10 17:34:21 UTC - Christophe Bornet: That's all I got from a broker spawned yesterday...
----
2018-12-10 17:34:22 UTC - Maarten Tielemans: @Sijie Guo In case you run both zookeeper and
a bookie on the same node, the Prometheus ports conflict. You may want to highlight that in the
documentation
----
2018-12-10 17:40:30 UTC - Matteo Merli: It should have printed a lot more info logs. Are you
starting with supervisord ?
----
2018-12-10 17:40:45 UTC - Matteo Merli: That would collect logs under /var/logs/supervisor/…
----
2018-12-10 17:42:49 UTC - Christophe Bornet: I'm starting this way
```
  pulsar1-eu:
    hostname: pulsar1-eu
    image: apachepulsar/pulsar-test-latest-version:latest
    command: bin/run-broker.sh
    environment:
      clusterName: test
      zookeeperServers: zk
      configurationStoreServers: zk:2181
    networks:
      pulsar:
        ipv4_address: 172.22.0.16
```
----
2018-12-10 17:43:36 UTC - Christophe Bornet: Yes it seems to be started by supervisord
----
2018-12-10 17:43:44 UTC - Matteo Merli: Oh I see. This is the “test” image that we use
for integration tests. I would recommend using the official image `apachepulsar/pulsar`
----
2018-12-10 17:45:04 UTC - Matteo Merli: with command:
```
bin/apply-config-from-env.py conf/broker.conf &&
bin/apply-config-from-env.py conf/pulsar_env.sh &&
bin/pulsar broker
```
----
2018-12-10 17:48:25 UTC - Ryan Samo: Hey guys,
I am trying to get the websocket proxy to work with authentication and authorization but I
keep running into trouble. I have a root CA and certs in place that work today via the Java
client through a Pulsar proxy to the brokers, all working with no problem. I granted consume and
produce to the client cert CN as well as the proxy cert CN. If I start a WebSocket proxy
and attempt to use the same proxy cert, I get the following error:

failed to get Partitioned metadata : Valid Proxy Client role should be provided for getPartitionMetadataRequest

Can you please guide me to what might be wrong since the grants and certs are the same but
only the websockets proxy has an issue?

Thanks!
----
2018-12-10 17:48:37 UTC - Christophe Bornet: OK will do. Shouldn't I add `bin/watch-znode.py
-z $zookeeperServers -p /initialized-$clusterName -w` before ?
----
2018-12-10 17:51:35 UTC - Matteo Merli: That I think was to wait for that z-node to be created
when creating a new cluster
----
2018-12-10 17:51:55 UTC - Matteo Merli: Not sure it’s strictly required in general deployment
----
2018-12-10 17:52:12 UTC - Ryan Samo: Oh, and the logs show that the client cert is being
seen by the brokers, which is good, but they do say the originalPrincipal is null.
----
2018-12-10 17:57:20 UTC - Christophe Bornet: Yes. My docker-compose is for tests only
----
2018-12-10 17:58:21 UTC - Christophe Bornet: ```
17:38:22.527 [pulsar-ordered-OrderedExecutor-0-0-EventThread] INFO  org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping
- Reloading the bookie rack affinity mapping cache.
17:38:22.554 [zk-cache-callback-executor-OrderedExecutor-3-0] WARN  org.apache.pulsar.zookeeper.ZooKeeperDataCache
- Reloading ZooKeeperDataCache failed at path: /bookies
java.lang.RuntimeException: java.net.UnknownHostException: bk1-eu
	at org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping$2.lambda$1(ZkBookieRackAffinityMapping.java:128)
~[org.apache.pulsar-pulsar-zookeeper-utils-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
	at java.util.LinkedHashMap.forEach(LinkedHashMap.java:684) ~[?:1.8.0_181]
	at org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping$2.lambda$0(ZkBookieRackAffinityMapping.java:123)
~[org.apache.pulsar-pulsar-zookeeper-utils-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
	at java.util.TreeMap.forEach(TreeMap.java:1005) ~[?:1.8.0_181]
	at org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping$2.deserialize(ZkBookieRackAffinityMapping.java:122)
~[org.apache.pulsar-pulsar-zookeeper-utils-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
	at org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping$2.deserialize(ZkBookieRackAffinityMapping.java:1)
~[org.apache.pulsar-pulsar-zookeeper-utils-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
	at org.apache.pulsar.zookeeper.ZooKeeperCache.lambda$9(ZooKeeperCache.java:325) ~[org.apache.pulsar-pulsar-zookeeper-utils-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
	at org.apache.bookkeeper.zookeeper.ZooKeeperClient$19$1.processResult(ZooKeeperClient.java:994)
~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:572) ~[org.apache.pulsar-pulsar-broker-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:508) ~[org.apache.pulsar-pulsar-broker-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
Caused by: java.net.UnknownHostException: bk1-eu
	at org.apache.bookkeeper.net.BookieSocketAddress.<init>(BookieSocketAddress.java:55)
~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2]
	at org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping$2.lambda$1(ZkBookieRackAffinityMapping.java:125)
~[org.apache.pulsar-pulsar-zookeeper-utils-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
	... 9 more
```
So the wrong entries make the ZkBookieRackAffinityMapping fail completely. Maybe they could
just be ignored?
----
2018-12-10 17:58:38 UTC - Christophe Bornet: I'll remove them for now
----
2018-12-10 18:05:43 UTC - Christophe Bornet: What is the use of `hostname` in racks-info?
Do I have to put it, or is it guessed from the bookie name if not present?
----
2018-12-10 18:06:27 UTC - Matteo Merli: The hostname is there for info purposes. It can be
helpful if bookies are advertising just IP addresses
----
2018-12-10 18:07:13 UTC - Matteo Merli: Regarding the DNS error: the problem is that BookieSocketAddress
creates an InetSocketAddress in the constructor and fails the DNS resolution
----
2018-12-10 18:10:46 UTC - Christophe Bornet: I've removed the wrong entries and put the ones
with the ports. Now I see the policy correctly loaded.
But in the logs I still don't see which bookies are effectively selected
----
2018-12-10 18:13:56 UTC - Christophe Bornet: ```
18:00:50.892 [zk-cache-callback-executor-OrderedExecutor-3-0] INFO  org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping
- Bookie rack info updated to {default={bk2-eu:3181=BookieInfo(rack=eu, hostname=null), bk1-eu:3181=BookieInfo(rack=eu,
hostname=null), bk2-us:3181=BookieInfo(rack=us, hostname=null), bk1-us:3181=BookieInfo(rack=us,
hostname=null)}}. Notifying rackaware policy.
18:06:02.040 [pulsar-web-358] INFO  org.eclipse.jetty.server.RequestLog - 172.22.0.1 - - [10/Dec/2018:18:06:02
+0000] "GET /admin/v2/persistent/public/default/test9003/partitions HTTP/1.1" 200 16 "-" "Pulsar-Java-v2.3.0-SNAPSHOT"
1
18:06:02.147 [pulsar-1-5] INFO  org.eclipse.jetty.server.RequestLog - 172.22.0.1 - - [10/Dec/2018:18:06:02
+0000] "GET /lookup/v2/topic/persistent/public/default/test9003 HTTP/1.1" 307 0 "-" "Pulsar-Java-v2.3.0-SNAPSHOT"
20
18:06:02.185 [pulsar-1-4] INFO  org.apache.pulsar.broker.namespace.OwnershipCache - Trying
to acquire ownership of public/default/0x40000000_0x50000000
18:06:02.203 [pulsar-ordered-OrderedExecutor-0-0-EventThread] INFO  org.apache.pulsar.broker.namespace.OwnershipCache
- Successfully acquired ownership of /namespace/public/default/0x40000000_0x50000000
18:06:02.204 [pulsar-ordered-OrderedExecutor-0-0-EventThread] INFO  org.eclipse.jetty.server.RequestLog
- 172.22.0.1 - - [10/Dec/2018:18:06:02 +0000] "GET /lookup/v2/topic/persistent/public/default/test9003?authoritative=true
HTTP/1.1" 200 166 "-" "Pulsar-Java-v2.3.0-SNAPSHOT" 27
18:06:02.206 [pulsar-1-15] INFO  org.apache.pulsar.broker.PulsarService - Loading all topics
on bundle: public/default/0x40000000_0x50000000
18:06:02.209 [pulsar-ordered-OrderedExecutor-5-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl
- Opening managed ledger public/default/persistent/test9003
18:06:02.227 [pulsar-ordered-OrderedExecutor-0-0-EventThread] INFO  org.apache.bookkeeper.client.LedgerCreateOp
- Ensemble: [172.22.0.14:3181, 172.22.0.15:3181] for ledger: 1057
18:06:02.229 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl
- [public/default/persistent/test9003] Created ledger 1057
18:06:02.232 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.service.persistent.DispatchRateLimiter
- [<persistent://public/default/test9003>] [null] setting message-dispatch-rate DispatchRate{dispatchThrottlingRateInMsg=0,
dispatchThrottlingRateInByte=0, ratePeriodInSecond=1}
18:06:02.232 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.service.persistent.DispatchRateLimiter
- [<persistent://public/default/test9003>] [null] configured message-dispatch rate at
broker DispatchRate{dispatchThrottlingRateInMsg=0, dispatchThrottlingRateInByte=0, ratePeriodInSecond=1}
18:06:02.232 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.service.BrokerService
- Created topic <persistent://public/default/test9003> - dedup is disabled
18:06:02.240 [bookkeeper-ml-workers-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.PulsarService
- Loaded 1 topics on public/default/0x40000000_0x50000000 -- time taken: 0.032 seconds
18:06:02.400 [pulsar-io-21-6] INFO  org.apache.pulsar.broker.service.ServerCnx - New connection
from /172.22.0.1:39696
18:06:02.422 [pulsar-io-21-6] INFO  org.apache.pulsar.broker.service.ServerCnx - [/172.22.0.1:39696][<persistent://public/default/test9003>]
Creating producer. producerId=0
18:06:02.424 [ForkJoinPool.commonPool-worker-2] INFO  org.apache.pulsar.broker.service.ServerCnx
- [/172.22.0.1:39696] Created new producer: Producer{topic=PersistentTopic{topic=<persistent://public/default/test9003>},
client=/172.22.0.1:39696, producerName=test-4-174, producerId=0}
```
----
2018-12-10 18:19:32 UTC - Christophe Bornet: My use case is for a multi-region cluster where
I want to avoid cross-dc bandwidth in "normal" mode with fallback to producing/consuming on
the other dc in case of failure of a dc
----
2018-12-10 18:20:47 UTC - Christophe Bornet: Note that if a dc fails, production/consumption
on that dc is probably also OOO but at least no data is lost thanks to sync replication of
bookies on the other dc
----
2018-12-10 18:21:44 UTC - Christophe Bornet: There could also be use cases where we would
shutdown brokers on one region for maintenance purpose and have the traffic go to the other
region for some time
----
2018-12-10 18:22:11 UTC - Christophe Bornet: And at the end of the maintenance we would like
to have the traffic go back to the local brokers
----
2018-12-10 18:28:34 UTC - Matteo Merli: Got it. In all these cases I think it would be preferable
to have manually triggered failover, if this is meant for these special conditions (eg:
failback into one DC or planned maintenance)
----
2018-12-10 18:31:03 UTC - David Kjerrumgaard: @Ryan Samo This is the code block that is throwing
the exception you are seeing. If the originalPrincipal is indeed null, then the call to invalidOriginalPrincipal
will return true, and cause the exception to be thrown.
----
2018-12-10 18:33:47 UTC - David Kjerrumgaard: @Ryan Samo FYI.....This is the logic for the
invalidOriginalPrincipal method
----
2018-12-10 18:35:44 UTC - Ryan Samo: Yeah I saw the code block and also the invalid block
but I guess I’m confused as to what the “originalPrincipal” really was and why it would
ever be null? It’s like the client cert makes it to the brokers but not the proxy cert?
----
2018-12-10 18:37:41 UTC - David Kjerrumgaard: @Ryan Samo Can you step through the above code
block in a debugger? That would help us identify the issue
----
2018-12-10 18:38:48 UTC - David Kjerrumgaard: Do you see a log message similar to the following?
 log.info("[{}] Client successfully authenticated with {} role {}
and originalPrincipal {}", remoteAddress, authMethod, authRole, originalPrincipal);
----
2018-12-10 18:38:56 UTC - Ryan Samo: Sure, let me give it a shot and maybe it’ll stand out
to me. 
----
2018-12-10 18:39:34 UTC - Ryan Samo: Yup I sure did, it said it was ok, the client cert gave
that message
----
2018-12-10 18:40:06 UTC - David Kjerrumgaard: but the value for originalPrincipal was `null`
, correct?
----
2018-12-10 18:40:50 UTC - Ryan Samo: Let me take another look
----
2018-12-10 18:44:58 UTC - Christophe Bornet: Yes, probably.
----
2018-12-10 18:45:10 UTC - Ryan Samo: “Client successfully authenticated with tls role websocket
and originalPrincipal null”
----
2018-12-10 18:45:25 UTC - Ryan Samo: That’s in the broker
----
2018-12-10 18:47:22 UTC - Ryan Samo: The websocket proxy shows “Authenticated WebSocket
client devmclient1 on topic <persistent://testtenant/ns1/testtopic> “
----
2018-12-10 18:48:07 UTC - Ryan Samo: My certs are “websocket” and “devmclient1”
----
2018-12-10 18:50:10 UTC - David Kjerrumgaard: and when you connect with the java client, which
cert do you use? Maybe it is worth trying to connect with the “websocket” cert via the java
client just to see if that works?
----
2018-12-10 18:51:13 UTC - David Kjerrumgaard: just to rule out the cert as the issue, and
isolate it to the websocket proxy
----
2018-12-10 18:52:07 UTC - Ryan Samo: Gotcha, ok let me try that
----
2018-12-10 19:09:54 UTC - Ryan Samo: Ok, so on my java client path I use a Pulsar proxy with
the cert named proxy and the client cert is named devmclient1. If I use those together it
works fine. If I swap the client devmclient1 cert to the WebSocket cert I get 

Client successfully authenticated with tls role proxy and originalPrincipal websocket
14:05:20.536 [pulsar-io-21-15] WARN  org.apache.pulsar.broker.service.ServerCnx - [/] Valid
Proxy Client role should be provided for lookup  with role proxy and proxyClientAuthRole websocket
on topic
----
2018-12-10 19:10:30 UTC - Ryan Samo: Also if I try to use the proxy cert on the client I get
the same error
----
2018-12-10 19:10:36 UTC - Ryan Samo: If 
----
2018-12-10 19:20:14 UTC - David Kjerrumgaard: @Ryan Samo So there appears to be an issue with
the webSocket cert.
----
2018-12-10 19:20:55 UTC - Ryan Samo: Ok, let me generate a new cert and try it once more
----
2018-12-10 19:21:00 UTC - Ryan Samo: Thanks!
----
2018-12-10 19:21:08 UTC - David Kjerrumgaard: no problem.....good luck!!
----
2018-12-10 20:42:47 UTC - Ben Devore: @Ben Devore has joined the channel
----
2018-12-10 20:45:17 UTC - Thor Sigurjonsson: @Thor Sigurjonsson has joined the channel
----
2018-12-10 21:06:59 UTC - Christophe Bornet: I'm still not seeing bookie selection info in
logs. Any hint ?
----
2018-12-10 21:08:52 UTC - Emma Pollum: I'm running into issues creating a function in Pulsar.
When I try to create it, I get `Function worker service is not done initializing. Please try
again in a little while.`
----
2018-12-10 21:09:05 UTC - Emma Pollum: My pulsar cluster has been running for a few days though....
----
2018-12-10 21:27:50 UTC - Emma Pollum: Is there a separate pulsar-functions service that needs
to be launched?
----
2018-12-10 21:38:55 UTC - Mike Card: @Matteo Merli Oh and I had the message routing mode on
the producers set to round robin and message batching set to true as well
----
2018-12-10 21:56:40 UTC - David Kjerrumgaard: @Emma Pollum No, there isn't a separate service
that needs to be launched. Can you scan your Pulsar broker log files for any errors / entries
related to the function worker service?
----
2018-12-10 22:00:50 UTC - Emma Pollum: I think I found the issue, it looks like you need to
set up the bookkeeper conf file to enable the function worker
+1 : David Kjerrumgaard
----
2018-12-10 22:01:04 UTC - Emma Pollum: <https://pulsar.apache.org/docs/fr/deploy-bare-metal/#enabling-pulsar-functions-optional>
----
2018-12-10 22:09:31 UTC - Matteo Merli: > 18:00:50.892 [zk-cache-callback-executor-OrderedExecutor-3-0]
INFO  org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping - Bookie rack info updated to
{default={bk2-eu:3181=BookieInfo(rack=eu, hostname=null), bk1-eu:3181=BookieInfo(rack=eu,
hostname=null), bk2-us:3181=BookieInfo(rack=us, hostname=null), bk1-us:3181=BookieInfo(rack=us,
hostname=null)}}. Notifying rackaware policy.

That’s a start. I think there should be some other message at some point, though I don’t
remember the exact format.

Does the rack-aware policy work now after you kill one of the bookies? Before, without the
:3181 for sure it wasn’t being picked up
----
2018-12-10 22:40:07 UTC - Christophe Bornet: It still doesn't work because `ZkBookieRackAffinityMapping.getRack()`
gets called with an address in the form `bk1-eu` without the port and `racksWithHost` has
keys with the port
----
2018-12-10 22:46:11 UTC - Emma Pollum: What is the best way to get a list of all the bookies
in the cluster?
----
2018-12-10 22:56:06 UTC - Christophe Bornet: It seems that `getZkBookieRackMappingCache`
tried to update `racksWithHost` with the correct keys, but the reference used by `getRack`
is still the old one.
----
2018-12-10 23:01:23 UTC - David Kjerrumgaard: Are you running in a K8s environment?
----
2018-12-10 23:13:51 UTC - Matteo Merli: `bookkeeper shell listbookies -readwrite`
----
2018-12-10 23:20:00 UTC - Christophe Bornet: There should be a workaround for the `racksWithHost`
but it doesn't seem to work for me : <https://github.com/apache/pulsar/blob/v2.2.0/pulsar-zookeeper-utils/src/main/java/org/apache/pulsar/zookeeper/ZkBookieRackAffinityMapping.java#L118>
----
2018-12-10 23:50:21 UTC - Christophe Bornet: @Matteo Merli it works if I return `racksWithHost`
instead of `racks` at <https://github.com/apache/pulsar/blob/v2.2.0/pulsar-zookeeper-utils/src/main/java/org/apache/pulsar/zookeeper/ZkBookieRackAffinityMapping.java#L134>
. I think it's a bug. Should I do a PR?
----
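The mismatch Christophe describes can be illustrated with a small sketch. The names and structure below are ours, not the actual `ZkBookieRackAffinityMapping` code: the point is only that a map keyed by `host:port` entries such as `bk1-eu:3181` misses an exact lookup made with the bare hostname, unless the host part is compared separately.

```java
import java.util.Map;

// Hypothetical illustration (not Pulsar code) of the host-vs-host:port
// mismatch: an exact get() on "bk1-eu" misses the "bk1-eu:3181" key, so a
// host-only comparison is needed as a fallback.
public class RackLookup {
    public static String rackFor(Map<String, String> racks, String bookie) {
        String direct = racks.get(bookie);
        if (direct != null) return direct;
        String host = bookie.split(":")[0];
        for (Map.Entry<String, String> e : racks.entrySet()) {
            if (e.getKey().split(":")[0].equals(host)) {
                return e.getValue();
            }
        }
        return "/default"; // unmapped bookies land in the default rack
    }

    public static void main(String[] args) {
        Map<String, String> racks = Map.of("bk1-eu:3181", "eu", "bk1-us:3181", "us");
        System.out.println(rackFor(racks, "bk1-eu")); // "eu" despite the missing port
        System.out.println(rackFor(racks, "bk3-eu")); // "/default": unknown bookie
    }
}
```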
2018-12-11 00:41:02 UTC - Mike Card: @Matteo Merli repeated my test using the asynchronous
send API, I would guess the synchronous send API is doing exactly what I am doing here:

retryEventProducer.sendAsync(newRetryRefBuffer.array()).thenAccept(msgId -> {});
----
2018-12-11 00:42:15 UTC - Mike Card: I still get the same 64-byte message truncation I was
seeing before. If send() is just calling the sendAsync() API then perhaps there is a problem
queuing messages in the send queue under very high (say 15 kHz) write rates
----
2018-12-11 00:44:21 UTC - Mike Card: @Matteo Merli when I switched to the asynchronous send
API I set block if queue full to true on all the producers.
----
2018-12-11 01:46:16 UTC - Harry Rickards: @Harry Rickards has joined the channel
----
2018-12-11 06:58:15 UTC - Cristian: @Cristian has joined the channel
----
2018-12-11 07:03:03 UTC - Cristian: Hello people!

I'm trying to understand Pulsar's schema registry and see how it compares with the one
that Confluent developed for Kafka. I don't see in the docs whether Pulsar supports configuring
evolution compatibility modes for Avro schemas (this is what I mean: <https://docs.confluent.io/current/avro.html#avro-backward-compatibility>)
----
2018-12-11 07:13:17 UTC - Sijie Guo: @Cristian I think the evolution compatibility modes will
be supported in the upcoming 2.3.0 release. They are not yet supported in 2.2.0.
----
2018-12-11 07:44:47 UTC - Ivan Kelly: @Sijie Guo we need to document that. There's very little
documentation on actually using schemas
+1 : jia zhai
----
2018-12-11 09:10:16 UTC - 陈琳: @陈琳 has joined the channel
----