
DeadBrowserError when remote Chrome session returns HTTP 408 Request Timed Out #538

@markiz

Description


The Bug

So, I've been hunting this bug for almost five years. It is tough to reproduce because it's very random and only shows up under heavy-traffic parallel testing, so I couldn't set up a minimal reproduction yet. But it boils down to this: when running system specs in parallel, SOMETIMES your Cuprite/Ferrum browser setup will crash with the notorious DeadBrowserError, and from that point on your entire spec suite is likely to fail as well.

After some tracking, I found out that the bug happens because there is an attempt to write into a closed websocket. That websocket, in turn, is closed with "One or more reserved bits are on: reserved1 = 1, reserved2 = 0, reserved3 = 0". However, that's still not the root cause. The underlying websocket driver closes the connection because it can't parse the opcode, and the bytes it chokes on are not from the websocket exchange proper; they come in through this little snippet of code in Ferrum:

def start
  @thread = Utils::Thread.spawn do
    loop do
      data = @sock.readpartial(512)
      break unless data

      @driver.parse(data)
    end
  rescue EOFError, Errno::ECONNRESET, Errno::EPIPE, IOError # rubocop:disable Lint/ShadowedException
    @messages.close
  end
end
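
For context on why that rescue never fires: as far as I can tell, websocket-driver doesn't raise when it hits bytes it can't frame; it emits an :error event, shuts the connection down, and stops processing input, which is why Ferrum only notices once it tries to write again. A tiny sketch of what subscribing to that event could look like (the on(:error) callback is websocket-driver's documented API, but hooking it up inside Ferrum like this is just my illustration, not current behavior):

@driver.on(:error) do |error|
  # error is a WebSocket::Driver::ProtocolError, e.g.
  # "One or more reserved bits are on: reserved1 = 1, reserved2 = 0, reserved3 = 0"
  warn "CDP websocket protocol error: #{error.message}"
end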

It would appear that the specific payload leading to that dreaded DeadBrowserError is a timeout response:

HTTP/1.1 408 Request Timeout\r\nContent-Type: text/plain; charset=UTF-8\r\nContent-Encoding: UTF-8\r\nAccept-Ranges: bytes\r\nConnection: keep-alive\r\n\r\n\r\nRequest has timed out

So instead of a properly framed websocket message, we get a plaintext HTTP 408. The websocket-driver can't parse it, so it closes itself, and the next time we try to send anything through that websocket we get a DeadBrowserError.
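
The error message even matches the first byte of that response: in a websocket frame the top bits of the first byte are FIN/RSV1/RSV2/RSV3 and the low nibble is the opcode, and ASCII "H" (0x48) happens to decode with only RSV1 set. A quick sanity check in plain Ruby (nothing Ferrum-specific, just the bit twiddling):

first_byte = "HTTP/1.1 408 Request Timeout".bytes.first # => 0x48 ("H")

fin    = (first_byte >> 7) & 1 # => 0
rsv1   = (first_byte >> 6) & 1 # => 1  -> "reserved1 = 1"
rsv2   = (first_byte >> 5) & 1 # => 0  -> "reserved2 = 0"
rsv3   = (first_byte >> 4) & 1 # => 0  -> "reserved3 = 0"
opcode = first_byte & 0x0f     # => 8, which would be a close frame

So the driver trips over the reserved-bit check before it even gets to the bogus opcode, which lines up with the close reason above.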

I'm really unsure what should happen in a situation like this. Is it even valid behavior on Chrome's part? Should we handle this in our specs by catching and retrying? One of the more annoying aspects is that it completely taints the whole Capybara setup: you can't Capybara.reset! your way out of it, so your entire spec suite is bricked, which is kinda brutal for a suite that runs for 20+ minutes.
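
For what it's worth, the best damage control I've come up with so far is an RSpec hook that force-quits the poisoned browser and drops Capybara's cached session so the rest of the suite survives. Very much a sketch, not a fix: driver.browser and browser.quit are Cuprite/Ferrum public API, but session_pool is Capybara internals and may change between versions:

# Damage-control sketch, not a real fix. Assumes Cuprite exposes the Ferrum
# browser via driver.browser; session_pool is a private Capybara detail.
RSpec.configure do |config|
  config.after(:each, type: :system) do |example|
    next unless example.exception.is_a?(Ferrum::DeadBrowserError)

    begin
      # Force-quit the browser whose websocket is already dead.
      Capybara.current_session.driver.browser.quit
    rescue Ferrum::DeadBrowserError, IOError
      # Expected: the socket is gone, there's nothing left to close cleanly.
    end

    # Drop the cached session so the next example boots a fresh browser
    # instead of inheriting the poisoned connection (private API, assumption).
    Capybara.send(:session_pool).clear
  end
end

That still loses the example that hit the timeout, and I haven't verified it against browserless, but it should at least keep one bad packet from failing a 20-minute run wholesale.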

Let me know if you have any ideas!

Environment

It's a remote Chrome configuration using the browserless/chromium:latest Docker image and ferrum 0.17.1.

Chrome /json/version response:

{
  "Browser": "Chrome/136.0.7103.25",
  "Protocol-Version": "1.3",
  "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/136.0.0.0 Safari/537.36",
  "V8-Version": "13.6.233.4",
  "WebKit-Version": "537.36 (@97d495678dc307bfe6d6475901104e262ec7a487)",
  "webSocketDebuggerUrl": "ws://chrome:3333",
  "Debugger-Version": "97d495678dc307bfe6d6475901104e262ec7a487"
}
