The Bug
So, I've been hunting this bug for almost five years. It's tough to reproduce because it's highly random and only shows up under heavy-traffic parallel testing, so I haven't managed to set up a minimal reproduction yet. But it boils down to this: when running system specs in parallel, SOMETIMES your Cuprite/Ferrum browser setup will crash with the notorious DeadBrowserError, and your entire spec suite is likely to fail along with it.
After some tracking, I found that the bug happens because there is an attempt to write into a closed websocket. That websocket, in turn, was closed because of "One or more reserved bits are on: reserved1 = 1, reserved2 = 0, reserved3 = 0". However, that's still not the root cause. The underlying websocket driver closes the connection because it can't parse the opcode, and the bytes it is trying to parse don't come from the websocket exchange proper; they come from this little snippet of code in Ferrum:
ferrum/lib/ferrum/client/web_socket.rb (lines 95 to 106 at 180f292):

```ruby
def start
  @thread = Utils::Thread.spawn do
    loop do
      data = @sock.readpartial(512)
      break unless data
      @driver.parse(data)
    end
  rescue EOFError, Errno::ECONNRESET, Errno::EPIPE, IOError # rubocop:disable Lint/ShadowedException
    @messages.close
  end
end
```
It would appear that the specific packet leading to that dreaded DeadBrowserError is a timeout response:

```
HTTP/1.1 408 Request Timeout\r\nContent-Type: text/plain; charset=UTF-8\r\nContent-Encoding: UTF-8\r\nAccept-Ranges: bytes\r\nConnection: keep-alive\r\n\r\n\r\nRequest has timed out
```
So instead of a properly formed websocket frame, we get a plaintext HTTP 408. websocket-driver can't parse it, so it closes the connection, and the next attempt to send anything through that websocket raises a DeadBrowserError.
I'm really unsure what should happen in a situation like this. Is this even valid behavior on Chrome's part? Should we handle it in our specs by catching and retrying? One of the more annoying aspects is that it completely taints the whole Capybara setup: you can't Capybara.reset! your way out of it, and your entire spec suite is bricked, which is pretty brutal for a suite that runs for 20+ minutes.
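For what it's worth, the reserved-bits error lines up with the very first byte of that plaintext response: "H" is 0x48, and read as a websocket frame header it has RSV1 set. A tiny sketch of that arithmetic (plain Ruby, nothing Ferrum-specific, just to show where "reserved1 = 1" comes from):

```ruby
# First byte of the plaintext 408 response, interpreted as a websocket frame header.
byte = "HTTP/1.1 408 Request Timeout".bytes.first # "H" => 0x48

fin    = (byte >> 7) & 1 # => 0
rsv1   = (byte >> 6) & 1 # => 1, i.e. "reserved1 = 1" from the error message
rsv2   = (byte >> 5) & 1 # => 0
rsv3   = (byte >> 4) & 1 # => 0
opcode = byte & 0x0f     # => 0x8

puts format("FIN=%d RSV1=%d RSV2=%d RSV3=%d opcode=0x%x", fin, rsv1, rsv2, rsv3, opcode)
```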
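In the meantime, the only band-aid I've been considering is restarting the browser when a spec dies this way, so at least the rest of the suite survives. A rough sketch, assuming the Cuprite driver exposes the underlying Ferrum browser via driver.browser and that Ferrum::Browser#restart behaves sensibly against a remote browserless instance (I haven't verified either in this setup):

```ruby
RSpec.configure do |config|
  config.after(:each, type: :system) do |example|
    # If this spec died with a dead browser, try to bring up a fresh one so the
    # remaining specs in this process are not bricked as well.
    if example.exception.is_a?(Ferrum::DeadBrowserError)
      Capybara.current_session.driver.browser.restart
    end
  end
end
```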
Let me know if you have any ideas!
Environment
It's a remote Chrome configuration using the browserless/chromium:latest Docker image and ferrum 0.17.1.
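The driver registration looks roughly like this (hostname, port, and options are illustrative, not the exact production config):

```ruby
require "capybara/cuprite"

Capybara.register_driver(:cuprite_remote) do |app|
  Capybara::Cuprite::Driver.new(
    app,
    url: "http://chrome:3333",  # the browserless/chromium endpoint (illustrative)
    window_size: [1280, 800]
  )
end

Capybara.javascript_driver = :cuprite_remote
```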
Chrome /json/version response:
```json
{
  "Browser": "Chrome/136.0.7103.25",
  "Protocol-Version": "1.3",
  "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/136.0.0.0 Safari/537.36",
  "V8-Version": "13.6.233.4",
  "WebKit-Version": "537.36 (@97d495678dc307bfe6d6475901104e262ec7a487)",
  "webSocketDebuggerUrl": "ws://chrome:3333",
  "Debugger-Version": "97d495678dc307bfe6d6475901104e262ec7a487"
}
```