tomee-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ihsan Ecemis <miece...@gmail.com>
Subject Re: CXF and WebSocket messages are merged together
Date Thu, 16 Jun 2016 02:09:29 GMT

Thanks for your answer Romain.  I would appreciate any links to inspire me how to do that,
especially for Twilio API calls.  I don’t know how to use sessionIds to track connections.


Here are 2 new pieces of information:

(1) WebSocket connections were shaky for some users and Atmosphere javascript client had a
failover to long-polling. 

So yesterday, we disabled long-polling failover and since then we didn't have any errors from
Twilio.  Probably it is too early to claim victory but I think it is an interesting observation.

If we don’t see any other errors, I will think that the client was switching from websocket
to long-polling, and then getting disconnected.  The server (Atmosphere?) didn’t realize
this disconnect and kept on writing to the same connection. And at some point that same connection
was handed over to the Twilio API call…  (which should not happen!  must be prevented by
Tomcat?  nginx?  CXF?  Atmosphere?) 

I don’t know how the Socket Connection Pool works at low-level so this is only my speculation,
both from my observation there is something really wrong here. 


(2) I looked at Twilio error messages more carefully,  and each error I looked so far contained
WebSockets messages addressed to the same user (This is very manual process, I check our logs
for each task_id posted to Twilio in JSON).

In one case there were 5 WebSockets messages addressed to the same user, prepended to Twilio
XML.   These 5 messages spanned 576 seconds, i.e. almost 10 minutes!  (Each WebSockets messages
has a  timestamp).   So we kept on writing to the same connection for 10 minutes after the
WebSocket was disconnected   ;-(



> On Jun 15, 2016, at 7:26 PM, Romain Manni-Bucau <rmannibucau@gmail.com> wrote:
> 
> Is debugging using the sessionId to track connections an option?
> Le 16 juin 2016 01:12, "Ihsan Ecemis" <miecemis@gmail.com> a écrit :
> 
>> 
>> TL;DR:  We are having this weird, non-reproducable problem in a complex
>> system with lots of moving parts. I posted this question to Atmosphere
>> mailing list but didn’t get useful answers.  I would appreciate if anyone
>> can tell me whether there is any chance what we observe can be a bug with
>> Tomcat/TomEE.  Any other help/tips would be greatly appreciated. Thanks
>> 
>> 
>> Problem:
>> 
>> We experience some weird problems with WebSockets under Atmosphere,
>> running under TomEE 7.0.0 — the whole stack is listed below.
>> 
>> 
>> One of our JAX-RS API endpoints (something like /twilio/xxxx) is called by
>> Twilio.  Twilio expects  the following XML response:
>> 
>> <?xml version="1.0" encoding="UTF-8" standalone="yes"?><Response/>
>> 
>> That system was working fine for months.  We were receiving Text Messages
>> with no errors from Twilio.
>> 
>> 
>> Then, we added WebSockets at a different endpoint (something like
>> /ws/yyyy).  Using our AngularJS app, our users started communicating with
>> our backend in JSON.  A sample message our backend wrote to a WebSocket
>> client:
>> 
>> 
>> {"thread_id":"Server-2702","session_id":null,"command":"cancel_caleasy_item","data":{"text_message_id":1126,"task_id":10134},"time":1465409434004}
>> 
>> 
>> 
>> After a while, Twilio started sending us "Schema validation warning”
>> emails.  When I went to Twilio Dashboard,  I saw that they received the
>> following response to their API call:
>> 
>> {"thread_id":”Server-1732","session_id":null,"command":"cancel_caleasy_item","data":{"text_message_id”:734,"task_id”:9345},"time":
>> 1465409122781}<?xml version="1.0" encoding="UTF-8"
>> standalone="yes"?><Response/>
>> 
>> 
>> So Twilio received a message that was meant to be sent to a WebSocket
>> user, prepended to the usual Twilio response!  This looks crazy, and not
>> unacceptable of course.
>> 
>> This happens a few times, sometimes a dozen times a day.  We put together
>> a test system but could not reproduce the problem. (Our test system was not
>> exactly the same system as our production in many respects unfortunately)
>> 
>> Sometimes Twilio receives more than one WebSocket message, again an actual
>> example (redacted just a little for brevity):
>> 
>> {"thread_id":"Server-2722","session_id":null,"command":"cancel_caleasy_item","data":{"text_message_id":1966,"task_id":11112},"time":1465409571004}{"thread_id":"Server-2723","session_id":null,"command":"new_caleasy_item","data":{"text_message_id":1453,"text_message_created_at":null,"meal_type_id":6,"body":"Edamame","report_date":"2016-06-03","caleasy_media_list":[],"task_id":11114,"expiration_millis":180000},"time":1465409582010}<?xml
>> version="1.0" encoding="UTF-8" standalone="yes"?><Response/>
>> 
>> 
>> In this example, our Backend wrote 2 WebSocket messages, probably from 2
>> separate threads, intended for 2 separate WebSocket users, and then it
>> wrote the response to Twilio.  It looks like we are writing to a connection
>> nobody is reading from, and appending more messages to it, and then the
>> whole thing is returned to Twilio when they make the API call…
>> 
>> 
>> I don’t understand how this could happening.  In nginx?  (I really doubt
>> nginx can have such a huge bug)   In Tomcat?   A problem between nginx and
>> Tomcat that only affects WebSockets when regular API calls are also used?
>> I have no idea and I don’t know where to start to investigate   ;-(
>> 
>> 
>> Here is our stack:
>> 
>> *) AWS Linux 4.1.10-17.31, TomEE 7.0.0, nginx as proxy server on the same
>> AWS instance
>> *) SSL ends on nginx. nginx is configured for WebSockets following online
>> instructions.
>> *) We serve JAX-RS API calls via Apache CXF (version 3.1.6), using
>> standard @Path("/....") annotations.
>> *) We have just one WebSocket service using Atmosphere (version 2.4.2),
>> defined as @ManagedService(path = "/caleasy”)
>> *) WebSocket service is used by our AngularJS based UI
>> *) No other specific setup with CXF and Atmosphere
>> 
>> In our web.xml, we configure Atmosphere as follows:
>> 
>>  <servlet>
>>    <description>AtmosphereServlet</description>
>>    <servlet-name>AtmosphereServlet</servlet-name>
>>    <servlet-class>org.atmosphere.cpr.AtmosphereServlet</servlet-class>
>>  </servlet>
>> 
>>  <servlet-mapping>
>>    <servlet-name>AtmosphereServlet</servlet-name>
>>    <url-pattern>/ws/*</url-pattern>
>>  </servlet-mapping>
>> 
>>  <listener>
>>    <listener-class>org.atmosphere.cpr.SessionSupport</listener-class>
>>  </listener>
>> 
>> 
>> 
>> PS: Our TomEE logs have the following Exceptions, but I am not sure
>> whether they are normal or suspicious:
>> 
>> 2016-06-09 13:52:41,832 ERROR [nio-8080-exec-9] o.a.c.JSR356Endpoint
>>   -
>> java.io.EOFException: null
>>        at
>> org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.fillReadBuffer(NioEndpoint.java:1267)
>>        at
>> org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.isReadyForRead(NioEndpoint.java:1176)
>>        at
>> org.apache.tomcat.websocket.server.WsFrameServer.onDataAvailable(WsFrameServer.java:58)
>>        at
>> org.apache.tomcat.websocket.server.WsHttpUpgradeHandler.upgradeDispatch(WsHttpUpgradeHandler.java:148)
>>        at
>> org.apache.coyote.http11.upgrade.UpgradeProcessorInternal.dispatch(UpgradeProcessorInternal.java:54)
>>        at
>> org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:53)
>>        at
>> org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:788)
>>        at
>> org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1485)
>>        at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>        at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>        at
>> org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
>>        at java.lang.Thread.run(Thread.java:745)
>> 
>> 
>> 
>> 


Mime
View raw message