-[NSURLSessionWebSocketTask cancelWithCloseCode:reason:] reliability issue

Hello,

For our iOS app, we're switching WebSocket client implementation from one using Poco project to one based on NSURLSessionWebSocketTask. We are observing one regression which prevents us from shipping this solution though:

5-7% of calls to -[NSURLSessionWebSocketTask cancelWithCloseCode:reason:] fail to deliver given close code to server and 1006 close code (AbnormalClosure) is used instead.

Our test consist of following steps:
  1. Create WebSocket using NSURLSessionWebSocketTask

  2. Send "Hello" message

  3. Close WebSocket with 1011 close code

  4. Destroy NSURLSessionWebSocketTask instance

  5. Check whether server received 1011 close code

  • We do not perform those steps line by line, but rather use delays (~100-200ms) between each step.

  • We perform such test 1000 times and observe 5-7% failure rate consistently.

  • Distribution of failures is random.

  • We did perform test on both iOS simulator and iOS devices with no observable differences.

  • We did perform test on multiple server implementations (tornado- and node.js based ones) running on local machine (same as Xcode) and remote one

  • We always receive "Hello" message on server

  • 1011 close code is arbitrarily chosen for testing, but results are same for different ones, too.



We did setup mitmproxy and for failed tests see following log:

Code Block
Error in WebSocket connection to 20.16.12.131:8080: WebSocket connection closed unexpectedly by client: TcpDisconnect(None)



We did enable CFNetwork diagnostic using CFNETWORK_DIAGNOSTICS and found one difference between success and failure cases. For successful cases we see log which looks like this:

Code Block
default 16:27:15.831193+0200 Playground-ObjC tcp_close [C7.1:2] TCP Packets:
snd 0.000s seq 1092086878:1092086879 ack 0 win 65535 len 0 [SEC]
rcv 0.004s seq 3101059099:3101059100 ack 1092086879 win 65535 len 0 [S.]
snd 0.000s seq 1092086879:1092086879 ack 3101059100 win 4117 len 0 [.]
snd 0.002s seq 1092086879:1092087279 ack 3101059100 win 4117 len 400 [P.]
rcv 0.001s seq 3101059100:3101059100 ack 1092086879 win 2058 len 0 [.]
rcv 0.000s seq 3101059100:3101059100 ack 1092087279 win 2052 len 0 [.]
rcv 0.015s seq 3101059100:3101059229 ack 1092087279 win 2052 len 129 [P.] ECT0
snd 0.000s seq 1092087279:1092087279 ack 3101059229 win 4113 len 0 [.]
snd 0.003s seq 1092087279:1092087289 ack 3101059229 win 4113 len 10 [P.]
rcv 0.002s seq 3101059229:3101059229 ack 1092087289 win 2052 len 0 [.]
snd 0.017s seq 1092087289:1092087297 ack 3101059229 win 4113 len 8 [P.]
snd 0.001s seq 1092087297:1092087298 ack 3101059229 win 4113 len 0 [F.]
rcv 0.000s seq 3101059229:3101059229 ack 1092087297 win 2052 len 0 [.]
rcv 0.001s seq 3101059229:3101059233 ack 1092087297 win 2052 len 4 [P.] ECT0
Last packet 0ms ago.


So it's something related to "tcp_close" and seems to log whole TCP packets that were exchanged during connection. For failed cases such log is not present.

One additional log we see potentially interesting (but were not able to confirm its meaning or relation to failed cases) comes from runningboardd:

Code Block
default 16:27:15.830447+0200 runningboardd Invalidating assertion 33-1364-23682 (target:[application<org.example.app>:1364]) from originator [application<org.example.app>:1364]


So that's where we are right now and we have no idea where to dig next. We wonder if anyone else have seen such problem or can point us to any direction we could try next.

Thanks much,
Damian & Maciek
Post not yet marked as solved Up vote post of kurak Down vote post of kurak
935 views

Replies

Just to clarify, 1006 close code is generated on server side, not by iOS client. Our understanding is that due to tcp connection being closed "abnormally" on client side, server uses 1006.

Came here after looking for most of the day. I have the exact same problem. In my case it reproducible 100% of the times in CI (example here) and 0% of the times on my local machine.

Sorry that doesn't help you much but I feel less alone now :).

In situations like my first question is always “Can you reproduce this with NWConnection?” Given this:

In my case it reproducible 100% of the times in CI

that seems like a useful diagnostic, because if the problem happens with URLSession and not with NWConnection then that’s strong evidence that you’re looking at a URLSession bug.

Share and Enjoy

Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"