pulsar-users mailing list archives

From "Apache Pulsar Slack" <apache.pulsar.sl...@gmail.com>
Subject Slack digest for #general - 2019-01-08
Date Tue, 08 Jan 2019 09:11:02 GMT
2019-01-07 09:11:04 UTC - Sijie Guo: Or check your bookie nodes to see if the bookies are running
or not 
----
2019-01-07 09:20:55 UTC - bossbaby: <https://gist.github.com/tuan6956/cf05fc21fa733b6ef92ce86923b56dde>
----
2019-01-07 09:21:12 UTC - bossbaby: please check help me
----
2019-01-07 09:28:49 UTC - Ali Ahmed: you only have one bookie it seems
----
2019-01-07 09:30:00 UTC - bossbaby: I found the error and fixed it.
It runs successfully now.
+1 : bossbaby
----
2019-01-07 09:30:06 UTC - Ali Ahmed: ok
+1 : bossbaby
----
2019-01-07 09:31:39 UTC - bossbaby: The documentation says: "If you deploy Pulsar in a one-node
cluster, you should update the replication settings in conf/broker.conf to 1".
But the default is 2, so I fixed it and ran again.
----
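A sketch of what that edit might look like, assuming the "replication settings" in question are
the three `managedLedgerDefault*` keys. The thread does not name the keys, so verify them against
your own conf/broker.conf before relying on this.

```properties
# conf/broker.conf, single-bookie deployment (key names are an assumption, see above)
managedLedgerDefaultEnsembleSize=1
managedLedgerDefaultWriteQuorum=1
managedLedgerDefaultAckQuorum=1
```
----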
2019-01-07 09:31:44 UTC - bossbaby: Thanks you all bro
----
2019-01-07 09:32:19 UTC - Ali Ahmed: if you need a one node cluster just use standalone mode
----
2019-01-07 09:32:28 UTC - Ali Ahmed: it will setup everything correctly
+1 : bossbaby
----
2019-01-07 09:55:00 UTC - Yuvaraj Loganathan: Right now we are thinking of one topic per customer
under a namespace, with topic names like `customer-data-<customer-id>`. The consumer
will consume using the pattern subscription `customer-data-*`. Let's say two topics match
the subscription, `customer-data-1` and `customer-data-2`. For every message I call an external
service. The external service may throttle, let's say for `customer-data-1`. When the external
service throttles, I would like to stop consuming messages from `customer-data-1` for
some time and continue with the `customer-data-2` data, which is not throttled. In the Pulsar
client, if I don’t acknowledge messages for the `customer-data-1` topic and keep acknowledging
for the `customer-data-2` topic, will I get data for `customer-data-2` without getting blocked?
----
2019-01-07 09:57:49 UTC - bossbaby: I don't know why the tutorial "To run Pulsar on bare metal"
needs 3 ZooKeeper servers on 3 separate VMs. I think 1 zk + 1 bk + 1 broker in 1 VM is enough.
----
2019-01-07 11:35:36 UTC - Yuvaraj Loganathan: @Sijie Guo <https://github.com/apache/pulsar/issues/3317>
Any help would be highly appreciated. Our dev pipeline is blocked because of this. We are not
able to compile either.
----
2019-01-07 12:28:52 UTC - Yifan: @Yuvaraj Loganathan which version of python are you using.
3.6 doesn’t work
----
2019-01-07 12:35:10 UTC - Yuvaraj Loganathan: @Yifan Yes it is 3.6 :face_palm: Let me check
with 3.7
----
2019-01-07 12:38:26 UTC - Yuvaraj Loganathan: It works awesome! Thanks @Yifan
+1 : Yifan, Sijie Guo
----
2019-01-07 12:39:46 UTC - Yuvaraj Loganathan: Closed the issue.
----
2019-01-07 15:53:15 UTC - Grant Wu: @Sijie Guo B u m p.
----
2019-01-07 15:53:38 UTC - Grant Wu: Wait, why doesn’t the client work with Python 3.6?
----
2019-01-07 15:57:04 UTC - Grant Wu: Because Zookeeper is designed to run in a cluster/multi-server
setup to provide a voting quorum.  See <https://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_zkMulitServerSetup>
----
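A minimal sketch of the quorum arithmetic behind this answer, assuming a simple strict-majority
rule. The function names are hypothetical and not part of ZooKeeper. It shows why a single server
tolerates no failures while three servers tolerate one.

```python
def quorum_size(servers: int) -> int:
    """Smallest strict majority of an ensemble with `servers` members."""
    return servers // 2 + 1


def tolerated_failures(servers: int) -> int:
    """How many servers can be down while a majority is still reachable."""
    return servers - quorum_size(servers)


# Print ensemble size, quorum size, and tolerated failures for a few sizes.
for n in (1, 2, 3, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

Note that under this rule two servers tolerate no more failures than one, which is why odd
ensemble sizes are the usual choice.
----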
2019-01-07 15:57:21 UTC - Grant Wu: If you want to run a standalone setup for development
purposes, `pulsar standalone` probably suffices for your need?
+1 : Matteo Merli
----
2019-01-07 16:09:35 UTC - Matteo Merli: For MacOS we only publish the wheel files for 2.7
and 3.7.  These are the versions of python coming with either macOS or homebrew 
----
2019-01-07 16:10:33 UTC - Matteo Merli: It would be nice to have an environment with all the
combinations, to use when doing releases 
----
2019-01-07 16:12:22 UTC - Grant Wu: Oh, so this doesn’t apply to Linux, okay
----
2019-01-07 16:26:31 UTC - Matteo Merli: Yes, in Linux we build in Docker containers and have
all combinations 
----
2019-01-07 16:30:37 UTC - Grant Wu: or @Matteo Merli do you think you could look into this?
:confused:
----
2019-01-07 16:37:14 UTC - Matteo Merli: Passing buck to @Jerry Peng ;)
----
2019-01-07 16:44:53 UTC - Romain Castagnet: Hi. When I activate SSL connections on the brokers,
I get this warning before an SSL handshake error: "org.apache.pulsar.broker.service.ServerCnx
- [/XX.XX.XX.XX:41818] Got exception TooLongFrameException : Adjusted frame length exceeds
5242880: 369295620 - discarded". Yesterday morning this error disappeared and it seemed to
work. Since I tried to activate authentication, the error has appeared again. I don't understand
why. Have you had a similar problem?
----
2019-01-07 16:53:08 UTC - Chris Miller: Is there any reason why ConsumerImpl.hasMessageAvailable()
is not part of the Consumer interface?
----
2019-01-07 16:56:20 UTC - Matteo Merli: No technical reason, it’s more about semantics. Consumer
is the API to use a managed subscription, where the server knows and controls which messages
you’re consuming. In general, applications don’t need to know when they’ve caught
up with the publishers
----
2019-01-07 16:57:50 UTC - Matteo Merli: On the contrary, the Reader is completely unmanaged.
A common use case is to create a reader to do a scan on the topic, starting from a given point
and up to “now”
----
2019-01-07 16:58:08 UTC - Chris Miller: I see, thanks. Maybe I'm looking at things the wrong
way then. I'd like to have a consumer that can read some history up until the most recent
message. Sounds like I need a Reader instead
----
2019-01-07 16:59:43 UTC - Chris Miller: I asked some related questions on Friday about this,
wondering when you might use Consumer.seek() vs Reader, and why Reader wasn't a super-interface
of Consumer
----
2019-01-07 17:00:59 UTC - Chris Miller: I don't suppose there's a "best practices" doc somewhere
detailing these sort of common patterns?
----
2019-01-07 17:02:38 UTC - Chris Miller: One thing that's missing from both Consumer and Reader
is seeking to a timestamp. The admin API has that via resetCursor(). I guess it's not an efficient
operation and therefore not so suitable for client use?
----
2019-01-07 17:06:30 UTC - Grant Wu: I’ve actually asked about this before and I think it
was stated that it was a reasonable thing to ask for
+1 : Chris Miller
----
2019-01-07 17:14:48 UTC - Grant Wu: I think it may have been lost due to the history limit
:disappointed:
----
2019-01-07 17:16:54 UTC - Chris Miller: History limit?
----
2019-01-07 17:20:22 UTC - Grant Wu: Yes, Slack limits free plans to 10k messages
----
2019-01-07 17:25:46 UTC - Grant Wu: There are archives sent to the mailing list but I don't
know how to search that
----
2019-01-07 17:50:06 UTC - Chris Miller: Oh, haha sorry I thought you were referring to some
sort of history limit in Pulsar :slightly_smiling_face:
----
2019-01-07 18:02:22 UTC - Evan Nelson: @Evan Nelson has joined the channel
----
2019-01-07 18:58:30 UTC - Jerry Peng: @Grant Wu ok let me investigate
pray : Grant Wu
----
2019-01-07 21:57:57 UTC - Jerry Peng: @Grant Wu before we can get the topic name in python
functions we need to complete this first:
<https://github.com/apache/pulsar/issues/3322>
since there is currently no way to get the topic name from a message using the C++/Python
API
----
2019-01-07 21:59:47 UTC - Grant Wu: oh no :disappointed:
----
2019-01-07 21:59:52 UTC - Grant Wu: Okay, good to know
----
2019-01-07 22:23:32 UTC - Jerry Peng: Though this should be pretty easy to add
----
2019-01-07 23:01:47 UTC - Emma Pollum: What IP does pulsar use for geo replication? Does it
utilize the service url of the cluster to replicate to, or something else?
----
2019-01-07 23:12:23 UTC - Matteo Merli: It will use the ServiceURL for the other cluster as
specified in the “clusters” metadata
----
2019-01-07 23:12:50 UTC - Matteo Merli: eg. the metadata you specify with `initialize-cluster-metadata`
command
----
2019-01-07 23:14:07 UTC - Emma Pollum: Thank you!
----
2019-01-08 02:22:59 UTC - Kevin DiVincenzo: @Kevin DiVincenzo has joined the channel
----
2019-01-08 02:27:27 UTC - Kevin DiVincenzo: Hi - I have a question regarding namespaces/topics.

For my use-case, I want to create topics like: `persistent://{tenant}/{namespace}/{topic}/someId`.
There is no problem creating these topics from the Java client, using `pulsar-perf produce
...`, etc.

The problem is that all of the admin functionality doesn't seem to work when you nest the
topics one layer deeper than just `persistent://{tenant}/{namespace}/{topic}`. `{namespace}/{topic}`
doesn't appear to be a valid namespace (expected), but if I do `pulsar-admin topics list {tenant}/{namespace}`,
I get back an empty list.
----
2019-01-08 02:28:14 UTC - Kevin DiVincenzo: For what it's worth - this use case is for event-sourcing
/ integrating with Akka Persistence.
----
2019-01-08 02:30:03 UTC - bossbaby: So, will Pulsar use 1 of the 3 for backup, or use all of them?
----
2019-01-08 02:33:18 UTC - Kevin DiVincenzo: So is this not possible / not supported?
----
2019-01-08 02:37:05 UTC - Sijie Guo: @Kevin DiVincenzo I think there is something related to
encoding and decoding “/”.
----
2019-01-08 02:37:30 UTC - Sijie Guo: I would recommend if possible trying to avoid using “/”
for now
----
2019-01-08 02:37:37 UTC - Sijie Guo: but this is a bug we definitely need to fix
----
2019-01-08 02:37:38 UTC - bossbaby: I deployed 3 nodes in 1 cluster, but I have a question:
2 of the 3 nodes will act as backups and store copies of the data to handle failures, is that right?
----
2019-01-08 02:38:23 UTC - Kevin DiVincenzo: @Sijie Guo Ahh - so with the Admin API not being
able to encode/decode `/` properly in the namespace name?
----
2019-01-08 02:38:41 UTC - bossbaby: i dont know, i should deploy 1 cluster 3 broker or 3 cluster
to 2 cluster add in 1 cluster
----
2019-01-08 02:42:56 UTC - Sijie Guo: @Kevin DiVincenzo:

> so with the Admin API not being able to encode/decode `/` properly in the namespace
name?

It should already encode and decode “/”. However, “/” is used to distinguish namespace,
tenant and topic, as well as to distinguish the v1 topic format from the v2 topic format. So
there might be something in the REST server that doesn’t handle encoding properly. (Feel free
to create a GitHub issue for that.)

So I strongly recommend avoiding “/” in topic names for now, until we have identified
the issue and fixed it properly
----
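One possible reading of why the nested name goes missing from `topics list`, sketched under the
assumption that a v2 name has three path segments (tenant/namespace/topic) and a v1 name has four
(tenant/cluster/namespace/topic). The thread does not spell out the two formats, and `parse_topic`
is a hypothetical helper, not Pulsar's parser.

```python
def parse_topic(name: str) -> dict:
    """Split a topic name by segment count (illustrative only, not Pulsar's code)."""
    domain, rest = name.split("://", 1)
    # Keep at most four segments; anything extra stays in the last one.
    parts = rest.split("/", 3)
    if len(parts) == 3:
        tenant, namespace, topic = parts
        return {"format": "v2", "domain": domain, "tenant": tenant,
                "cluster": None, "namespace": namespace, "topic": topic}
    if len(parts) == 4:
        tenant, cluster, namespace, topic = parts
        return {"format": "v1", "domain": domain, "tenant": tenant,
                "cluster": cluster, "namespace": namespace, "topic": topic}
    raise ValueError(f"cannot parse topic name: {name!r}")


flat = parse_topic("persistent://my-tenant/my-ns/assets")
nested = parse_topic("persistent://my-tenant/my-ns/assets/42")
print(flat["format"], flat["namespace"])      # three segments
print(nested["format"], nested["namespace"])  # four segments
```

Under this assumption the nested name is read as a v1 name whose namespace is `assets`, so it
would not show up when listing `my-tenant/my-ns`.
----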
2019-01-08 02:44:56 UTC - Kevin DiVincenzo: @Sijie Guo - Before I go down the path of using
some other delimiter (e.g. `-`), is it safe to assume that there currently isn't a better
way to represent this _{eventlog_name}_ *{delimiter}* _{actual pulsar topic}_  relationship
within pulsar currently?
----
2019-01-08 02:45:45 UTC - Sijie Guo: can you use `{eventlog_name}` as a namespace?
----
2019-01-08 02:45:48 UTC - Kevin DiVincenzo: I'm planning on using the multi-topic subscription
to aggregate all of the child topics into the event log FWIW
----
2019-01-08 02:47:12 UTC - Kevin DiVincenzo: Well each entity in the aggregate root (e.g. event-log)
has its own _persistenceId_ (artifact of the Akka Persistence system). Each entity needs to
be able to traverse the topic (by sequenceId) for various purposes.
----
2019-01-08 02:47:46 UTC - Kevin DiVincenzo: So you might have 5 assets in some event-log called
"asset", each with their own unique persistenceId
----
2019-01-08 02:49:39 UTC - Kevin DiVincenzo: If some other service wanted to read the whole
log of assets (vs. a single asset), I was just using the `.topicsPattern(...)` method on the
client with `persistent://tenant/namespace/assets/.*` as the pattern.
----
2019-01-08 02:49:58 UTC - Kevin DiVincenzo: All of this is actually already working in my
little demo (before I build it out to a proper SDK).
----
2019-01-08 02:50:28 UTC - Kevin DiVincenzo: It was just the namespace navigation / admin topics
list stuff that had me stumped.
----
2019-01-08 02:52:20 UTC - Kevin DiVincenzo: @Sijie Guo - I guess before I go further down
the rabbit hole with this, are there any current limitations for the `MultiTopicsConsumerImpl`?
----
2019-01-08 02:52:30 UTC - Kevin DiVincenzo: E.g. problems with reading from thousands of topics?
----
2019-01-08 02:54:36 UTC - Kevin DiVincenzo: You have the `property` field on the message builder,
so I was also thinking of tagging messages with their `persistenceId` - the downside is to
see the history for just a single `persistenceId`, you'd have to traverse the topic (e.g.
with the Reader interface), filter only those messages for that `persistenceId`, then create
some sort of mapping into a logical sequenceId for only those messages.
----
2019-01-08 02:54:59 UTC - Kevin DiVincenzo: Straight forward to do I guess, but I was trying
to avoid it if possible / not necessary.
----
2019-01-08 02:55:04 UTC - Sijie Guo: ah:

1) I would suggest you use other delimiters, such as “-” or “_”. So in your use case,
your regex will be “persistent://tenant/namespace/assets_.*”.
2)
> problems with reading from thousands of topics?

There shouldn’t be problems reading from thousands of topics, but the number of topics will
be bounded by the resources of your client machine, such as memory.
----
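A quick way to sanity-check the suggested pattern before wiring it into a client. This sketch uses
Python's `re` as a stand-in for the regex engine the client uses and assumes the pattern must match
the whole topic name. The topic names are made up for illustration.

```python
import re

# Pattern from the suggestion above, with "_" as the delimiter.
PATTERN = re.compile(r"persistent://tenant/namespace/assets_.*")

# Hypothetical topic names: two that belong to the event log, two that do not.
topics = [
    "persistent://tenant/namespace/assets_1",
    "persistent://tenant/namespace/assets_42",
    "persistent://tenant/namespace/orders_1",
    "persistent://tenant/other/assets_1",
]

# fullmatch requires the pattern to cover the entire name.
matched = [t for t in topics if PATTERN.fullmatch(t)]
print(matched)
```
----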
2019-01-08 02:55:38 UTC - Sijie Guo: it depends on your use case
----
2019-01-08 02:56:02 UTC - Kevin DiVincenzo: ^^ - perfect thanks. I'm assuming that bounding
the client receive buffer to something ~reasonable~ small like `10` should fix #2?
----
2019-01-08 02:57:11 UTC - bossbaby: I deployed 3 nodes in 1 cluster, but I have a question:
2 of the 3 nodes will act as backups and store copies of the data to handle failures, is that right?
----
2019-01-08 02:58:36 UTC - Kevin DiVincenzo: IOW - is that receive buffer per individual topic
(assuming yes based on your response) or is it shared between all topics?
----
2019-01-08 02:58:38 UTC - bossbaby: in my node is 1 bookkeeper( 3 bookkeeper - 3 cluster)
----
2019-01-08 03:07:12 UTC - Kevin DiVincenzo: Actually never mind - everything seems to be working
fine, up to 1,000 topics. I guess if the number of topics in an event-log ever needs to exceed
that number, we'll just use multiple readers.
----
2019-01-08 03:07:17 UTC - Kevin DiVincenzo: Thanks for your help @Sijie Guo
ok_hand : Sijie Guo
----
2019-01-08 03:07:50 UTC - Sijie Guo: yes. it is per topic.
----
2019-01-08 03:08:09 UTC - Sijie Guo: I think there is a setting for the total receiver buffer
as well
----
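Since the receive buffer is per topic, the worst case grows with the number of topics. A rough
sketch of that arithmetic follows. The function name is hypothetical, and the queue size of 1000
is an assumed default for comparison that is not stated in the thread.

```python
def worst_case_buffered_messages(topics: int, receiver_queue_size: int) -> int:
    """Upper bound on messages held client-side when every per-topic queue is full."""
    return topics * receiver_queue_size


# Compare a small buffer (10, as proposed above) with an assumed default of 1000.
for queue_size in (10, 1000):
    print(queue_size, worst_case_buffered_messages(1000, queue_size))
```

Multiplying the result by a typical message size gives a rough memory bound for the client.
----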
2019-01-08 03:08:57 UTC - Kevin DiVincenzo: Actually - one more question for sanity's sake
if you don't mind.
----
2019-01-08 03:09:10 UTC - Chris Chapman: @Chris Chapman has joined the channel
----
2019-01-08 03:11:23 UTC - Kevin DiVincenzo: From testing, it seems like with the default message
retention policy and backlog policy, messages are actually *not ever* deleted from the topic.
I'm able to later on start a consumer (with `.subscriptionInitialPosition(SubscriptionInitialPosition.Earliest)`)
and read the entire history of all messages sent to this topic (this is what I want). Is this
the actual/correct behavior?
----
2019-01-08 03:11:39 UTC - Sijie Guo: >  2 in 3 node will node backup and store copy
data to handle failure, it right?

bookkeeper has replication to handle failures
----
2019-01-08 03:13:40 UTC - Sijie Guo: >  it seems like with the default message retention
policy and backlog policy

I think the default message retention policy is to delete messages after all subscriptions
have consumed them.
If the messages are not deleted, there might be some subscriptions not acknowledging the messages.
I would recommend using “pulsar-admin topics stats <topic>” to see if
there are any subscriptions in that topic not acknowledging the messages.
----
2019-01-08 03:14:21 UTC - Kevin DiVincenzo: Is there a way to tell pulsar "don't delete messages
ever" then?
----
2019-01-08 03:15:11 UTC - Kevin DiVincenzo: (was planning on using the awesome Bookkeeper
tiered storage feature)
----
2019-01-08 03:15:21 UTC - bossbaby: So with 2 bookkeepers in 2 VMs, it will replicate?
----
2019-01-08 03:15:46 UTC - Sijie Guo: yes
----
2019-01-08 03:18:06 UTC - Sijie Guo: @Kevin DiVincenzo - currently you can configure the retention
policy (by setting retention time to -1) to keep the data forever.  it is per namespace basis.
<http://pulsar.apache.org/docs/en/cookbooks-retention-expiry/>
----
2019-01-08 03:19:56 UTC - Kevin DiVincenzo: i.e. `pulsar-admin namespaces set-retention my-tenant/my-ns \
  --size -1 \
  --time -1`
+1 : Sijie Guo
----
2019-01-08 03:20:28 UTC - Sijie Guo: > messages are actually *not ever* deleted from
the topic.

Actually, I take my previous comment back. It is probably related to how Pulsar garbage collects
data. Pulsar garbage collects data by segments, so at least one segment is kept even if
all your consumers have consumed the messages. That might explain why you receive all the data
after restarting from `earliest`.

Anyway, in general, you can use “topics stats” and “topics stats-internal” to see
more details about the topic
----
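A toy model of segment-based garbage collection as described above. It assumes the newest segment
is always kept and that an older segment can go only once every message in it has been
acknowledged. `Segment` and `collectable` are hypothetical names, and this is not Pulsar's
implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    first: int  # id of the first message in the segment
    last: int   # id of the last message in the segment


def collectable(segments: List[Segment], slowest_ack: int) -> List[Segment]:
    """Return the segments that may be deleted.

    `slowest_ack` is the highest message id acknowledged by the slowest subscription.
    The newest segment is never returned, even if it is fully acknowledged.
    """
    if not segments:
        return []
    closed = segments[:-1]  # everything except the newest segment
    return [s for s in closed if s.last <= slowest_ack]


segments = [Segment(0, 99), Segment(100, 199), Segment(200, 250)]
print(collectable(segments, slowest_ack=250))  # everything consumed
print(collectable(segments, slowest_ack=150))  # slowest subscription is mid-way
```

In the first call all messages are consumed, yet messages 200 to 250 survive because they sit in
the newest segment, which matches the behaviour observed when restarting from `earliest`.
----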
2019-01-08 03:20:34 UTC - Sijie Guo: yes
----
2019-01-08 03:20:49 UTC - Kevin DiVincenzo: Yup - that's what I needed. Thanks again.
----
2019-01-08 03:22:40 UTC - Sijie Guo: cool
----
2019-01-08 03:27:21 UTC - bossbaby: Great. I thought I would have to set up replication myself,
but now I will set up a broker on every node, and BookKeeper will replicate data to every
bookkeeper in the cluster.
----