tomee-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ihsan Ecemis <miece...@gmail.com>
Subject Re: CXF and WebSocket messages are merged together
Date Thu, 16 Jun 2016 07:05:48 GMT

> On Jun 16, 2016, at 2:37 AM, Romain Manni-Bucau <rmannibucau@gmail.com> wrote:
> 
> 2016-06-16 4:09 GMT+02:00 Ihsan Ecemis <miecemis@gmail.com <mailto:miecemis@gmail.com>>:
> 
>> 
>> Thanks for your answer Romain.  I would appreciate any links to inspire me
>> how to do that, especially for Twilio API calls.  I don’t know how to use
>> sessionIds to track connections.
>> 
>> 
> Idea was to use websocket ot http session, I don't know if you activated it
> (atmosphere can need session support for long polling).

We have this in our web.xml:

  <listener>
    <listener-class>org.atmosphere.cpr.SessionSupport</listener-class>
  </listener>

  <context-param>
    <param-name>org.atmosphere.cpr.sessionSupport</param-name>
    <param-value>true</param-value>
  </context-param>

as documented here: https://github.com/Atmosphere/atmosphere/wiki/Enabling-HttpSession-Support
<https://github.com/Atmosphere/atmosphere/wiki/Enabling-HttpSession-Support>

And we can get sessionId with AtmosphereResource.session(). 

But apparently since Atmosphere doesn’t notify us about the disconnect we can still write
to the underlying resource, which is not cleared when another producer starts using it.  

BTW, we are using @ManagedService (https://github.com/Atmosphere/atmosphere/wiki/Getting-Started-with-the-ManagedService-Annotation#managedservice
<https://github.com/Atmosphere/atmosphere/wiki/Getting-Started-with-the-ManagedService-Annotation#managedservice>)
which comes with HeartbeatInterceptor, so I don’t understand how the Server doesn’t detect
the disconnect for 10 minutes...


>> 
>> Here are 2 new pieces of information:
>> 
>> (1) WebSocket connections were shaky for some users and Atmosphere
>> javascript client had a failover to long-polling.
>> 
>> So yesterday, we disabled long-polling failover and since then we didn't
>> have any errors from Twilio.  Probably it is too early to claim victory but
>> I think it is an interesting observation.
>> 
>> If we don’t see any other errors, I will think that the client was
>> switching from websocket to long-polling, and then getting disconnected.
>> The server (Atmosphere?) didn’t realize this disconnect and kept on writing
>> to the same connection. And at some point that same connection was handed
>> over to the Twilio API call…  (which should not happen!  must be prevented
>> by Tomcat?  nginx?  CXF?  Atmosphere?)
>> 
>> I don’t know how the Socket Connection Pool works at low-level so this is
>> only my speculation, both from my observation there is something really
>> wrong here.
>> 
>> 
>> (2) I looked at Twilio error messages more carefully,  and each error I
>> looked so far contained WebSockets messages addressed to the same user
>> (This is very manual process, I check our logs for each task_id posted to
>> Twilio in JSON).
>> 
>> In one case there were 5 WebSockets messages addressed to the same user,
>> prepended to Twilio XML.   These 5 messages spanned 576 seconds, i.e.
>> almost 10 minutes!  (Each WebSockets messages has a  timestamp).   So we
>> kept on writing to the same connection for 10 minutes after the WebSocket
>> was disconnected   ;-(
>> 
>> 
> Can it be a bad mobile connection?

The user who had lots of disconnects the other day was working in a dorm room over WiFi. 
Not mobile, but not great connection either. 

Maybe the user was continuously connecting and disconnecting,  and Atmosphere was trying to
keep the underlying resource alive? 




>> 
>> 
>>> On Jun 15, 2016, at 7:26 PM, Romain Manni-Bucau <rmannibucau@gmail.com>
>> wrote:
>>> 
>>> Is debugging using the sessionId to track connections an option?
>>> Le 16 juin 2016 01:12, "Ihsan Ecemis" <miecemis@gmail.com> a écrit :
>>> 
>>>> 
>>>> TL;DR:  We are having this weird, non-reproducable problem in a complex
>>>> system with lots of moving parts. I posted this question to Atmosphere
>>>> mailing list but didn’t get useful answers.  I would appreciate if
>> anyone
>>>> can tell me whether there is any chance what we observe can be a bug
>> with
>>>> Tomcat/TomEE.  Any other help/tips would be greatly appreciated. Thanks
>>>> 
>>>> 
>>>> Problem:
>>>> 
>>>> We experience some weird problems with WebSockets under Atmosphere,
>>>> running under TomEE 7.0.0 — the whole stack is listed below.
>>>> 
>>>> 
>>>> One of our JAX-RS API endpoints (something like /twilio/xxxx) is called
>> by
>>>> Twilio.  Twilio expects  the following XML response:
>>>> 
>>>> <?xml version="1.0" encoding="UTF-8" standalone="yes"?><Response/>
>>>> 
>>>> That system was working fine for months.  We were receiving Text
>> Messages
>>>> with no errors from Twilio.
>>>> 
>>>> 
>>>> Then, we added WebSockets at a different endpoint (something like
>>>> /ws/yyyy).  Using our AngularJS app, our users started communicating
>> with
>>>> our backend in JSON.  A sample message our backend wrote to a WebSocket
>>>> client:
>>>> 
>>>> 
>>>> 
>> {"thread_id":"Server-2702","session_id":null,"command":"cancel_caleasy_item","data":{"text_message_id":1126,"task_id":10134},"time":1465409434004}
>>>> 
>>>> 
>>>> 
>>>> After a while, Twilio started sending us "Schema validation warning”
>>>> emails.  When I went to Twilio Dashboard,  I saw that they received the
>>>> following response to their API call:
>>>> 
>>>> 
>> {"thread_id":”Server-1732","session_id":null,"command":"cancel_caleasy_item","data":{"text_message_id”:734,"task_id”:9345},"time":
>>>> 1465409122781}<?xml version="1.0" encoding="UTF-8"
>>>> standalone="yes"?><Response/>
>>>> 
>>>> 
>>>> So Twilio received a message that was meant to be sent to a WebSocket
>>>> user, prepended to the usual Twilio response!  This looks crazy, and not
>>>> unacceptable of course.
>>>> 
>>>> This happens a few times, sometimes a dozen times a day.  We put
>> together
>>>> a test system but could not reproduce the problem. (Our test system was
>> not
>>>> exactly the same system as our production in many respects
>> unfortunately)
>>>> 
>>>> Sometimes Twilio receives more than one WebSocket message, again an
>> actual
>>>> example (redacted just a little for brevity):
>>>> 
>>>> 
>> {"thread_id":"Server-2722","session_id":null,"command":"cancel_caleasy_item","data":{"text_message_id":1966,"task_id":11112},"time":1465409571004}{"thread_id":"Server-2723","session_id":null,"command":"new_caleasy_item","data":{"text_message_id":1453,"text_message_created_at":null,"meal_type_id":6,"body":"Edamame","report_date":"2016-06-03","caleasy_media_list":[],"task_id":11114,"expiration_millis":180000},"time":1465409582010}<?xml
>>>> version="1.0" encoding="UTF-8" standalone="yes"?><Response/>
>>>> 
>>>> 
>>>> In this example, our Backend wrote 2 WebSocket messages, probably from 2
>>>> separate threads, intended for 2 separate WebSocket users, and then it
>>>> wrote the response to Twilio.  It looks like we are writing to a
>> connection
>>>> nobody is reading from, and appending more messages to it, and then the
>>>> whole thing is returned to Twilio when they make the API call…
>>>> 
>>>> 
>>>> I don’t understand how this could happening.  In nginx?  (I really doubt
>>>> nginx can have such a huge bug)   In Tomcat?   A problem between nginx
>> and
>>>> Tomcat that only affects WebSockets when regular API calls are also
>> used?
>>>> I have no idea and I don’t know where to start to investigate   ;-(
>>>> 
>>>> 
>>>> Here is our stack:
>>>> 
>>>> *) AWS Linux 4.1.10-17.31, TomEE 7.0.0, nginx as proxy server on the
>> same
>>>> AWS instance
>>>> *) SSL ends on nginx. nginx is configured for WebSockets following
>> online
>>>> instructions.
>>>> *) We serve JAX-RS API calls via Apache CXF (version 3.1.6), using
>>>> standard @Path("/....") annotations.
>>>> *) We have just one WebSocket service using Atmosphere (version 2.4.2),
>>>> defined as @ManagedService(path = "/caleasy”)
>>>> *) WebSocket service is used by our AngularJS based UI
>>>> *) No other specific setup with CXF and Atmosphere
>>>> 
>>>> In our web.xml, we configure Atmosphere as follows:
>>>> 
>>>> <servlet>
>>>>   <description>AtmosphereServlet</description>
>>>>   <servlet-name>AtmosphereServlet</servlet-name>
>>>>   <servlet-class>org.atmosphere.cpr.AtmosphereServlet</servlet-class>
>>>> </servlet>
>>>> 
>>>> <servlet-mapping>
>>>>   <servlet-name>AtmosphereServlet</servlet-name>
>>>>   <url-pattern>/ws/*</url-pattern>
>>>> </servlet-mapping>
>>>> 
>>>> <listener>
>>>>   <listener-class>org.atmosphere.cpr.SessionSupport</listener-class>
>>>> </listener>
>>>> 
>>>> 
>>>> 
>>>> PS: Our TomEE logs have the following Exceptions, but I am not sure
>>>> whether they are normal or suspicious:
>>>> 
>>>> 2016-06-09 13:52:41,832 ERROR [nio-8080-exec-9] o.a.c.JSR356Endpoint
>>>>  -
>>>> java.io.EOFException: null
>>>>       at
>>>> 
>> org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.fillReadBuffer(NioEndpoint.java:1267)
>>>>       at
>>>> 
>> org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.isReadyForRead(NioEndpoint.java:1176)
>>>>       at
>>>> 
>> org.apache.tomcat.websocket.server.WsFrameServer.onDataAvailable(WsFrameServer.java:58)
>>>>       at
>>>> 
>> org.apache.tomcat.websocket.server.WsHttpUpgradeHandler.upgradeDispatch(WsHttpUpgradeHandler.java:148)
>>>>       at
>>>> 
>> org.apache.coyote.http11.upgrade.UpgradeProcessorInternal.dispatch(UpgradeProcessorInternal.java:54)
>>>>       at
>>>> 
>> org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:53)
>>>>       at
>>>> 
>> org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:788)
>>>>       at
>>>> 
>> org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1485)
>>>>       at
>>>> 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>>       at
>>>> 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>>       at
>>>> 
>> org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
>>>>       at java.lang.Thread.run(Thread.java:745)


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message