Skip to content

Conversation

@addaleax
Copy link
Collaborator

This adds new features to the mongodb-runner package, both for programmatic and CLI usage, in order to let drivers adopt this as their main MongoDB cluster management tool.

CLI improvements:

  • Support --config <file> to provide a config file
  • Let users trigger verbose debug mode through the CLI
  • Let users specify config options via environment variables

Core features:

  • Custom arguments for the individual replset members, individual shards, or individual mongoses (and multiple mongoses, through that)
  • Support for enabling authentication on clusters as part of their startup

Related: mongodb-labs/drivers-evergreen-tools#701

async addAuthIfNeeded(): Promise<void> {
if (!this.users?.length) return;
// Sleep to give time for a possible replset election to settle.
await sleep(1000);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there something that could tell us a replset election is done earlier? Seems like a place where things could add up timewise.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is taken straight from @blink1073's commits, I've asked myself the same question but he can probably answer more easily. We do wait for a primary to be self-reporting as the winner of the election before, so I'm not entirely sure that this is necessary

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite the wait for the self-report, I was getting a notPrimary error about 50% of the time before I added the sleep.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another wrinkle: if a keyFile is used, the initial metadata insert fails with the error below. In my branch I deferred the buildInfo check if there was a keyFile. The cluster start still works in the end with the failure.

DEBUG:   mongodb-runner failed to get buildInfo, treating as closed server MongoServerError: Authentication failed.
    at Connection.sendCommand (/Users/steve.silvester/workspace/drivers-evergreen-tools/.evergreen/orchestration/node_modules/mongodb/lib/cmap/connection.js:306:27)
    at process.processTicksAndRejections (node:internal/process/task_queues:103:5)
    at async Connection.command (/Users/steve.silvester/workspace/drivers-evergreen-tools/.evergreen/orchestration/node_modules/mongodb/lib/cmap/connection.js:334:26)
    at async executeScram (/Users/steve.silvester/workspace/drivers-evergreen-tools/.evergreen/orchestration/node_modules/mongodb/lib/cmap/auth/scram.js:79:22)
    at async ScramSHA256.auth (/Users/steve.silvester/workspace/drivers-evergreen-tools/.evergreen/orchestration/node_modules/mongodb/lib/cmap/auth/scram.js:39:16)
    at async performInitialHandshake (/Users/steve.silvester/workspace/drivers-evergreen-tools/.evergreen/orchestration/node_modules/mongodb/lib/cmap/connect.js:104:13)
    at async connect (/Users/steve.silvester/workspace/drivers-evergreen-tools/.evergreen/orchestration/node_modules/mongodb/lib/cmap/connect.js:24:9) {
  errorLabelSet: Set(2) { 'HandshakeError', 'ResetPool' },
...

isClosed(): boolean {
return this.servers.length === 0 && this.shards.length === 0;
// eslint-disable-next-line @typescript-eslint/no-unused-vars
for (const _ of this.children()) return true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be returning false if there are children? Not sure I follow the logic here, maybe worth a comment? I might be missing something.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This matches the existing logic – if all servers have been removed from the cluster, than it's closed. But I'm happy to add a comment here

Copy link
Member

@Anemy Anemy Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be

Suggested change
for (const _ of this.children()) return true;
for (const _ of this.children()) return false;

Then? Wouldn't return true mean there are servers or shards? Thanks for adding a comment! Apologies if I'm totally missing this.

type: 'string',
describe: 'Configure OIDC authentication on the server',
})
.config()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha I can't believe I missed this option. 😅

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I verified that it passes options through that aren't defined in the cli options, so we don't need to add all of the new options to the cli.

rsMembers: RSMemberOptions[];
};

export type ShardedOptions = {
Copy link

@blink1073 blink1073 Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way the shards are defined in our config files is more like a list of rsMembers, so I think we need to add:

| {
    shards?: never;
    shardArgs?: RSMemberOptions[][];
}
```

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, will do


export type MongoClusterOptions = Pick<
MongoServerOptions,
'logDir' | 'tmpDir' | 'args' | 'binDir' | 'docker'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to pick defaultConnectionOptions here as well to handle the TLS options.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't sound quite right – can you describe the use case here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How else will internal client returned by withClient be able to use TLS?

Copy link

@blink1073 blink1073 Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We pass these options on to mongo-orchestration to use in its connection string for internal clients:

ssl = true
tlsCertificateKeyFile = clientCertFile
tlsAllowInvalidCertificates = true

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@blink1073 Sooo ... I've pushed updates here, and yes, this will let you override flags for the internal client, but I'd generally like to keep usage of this feature minimal.

What I've done here is a bit more complex: By default, when mongodb-runner detects TLS usage, it creates a TLS client key and self-signed certificate on the fly, and adds those to the server's CA file. That's a bit more in line with the philosophy of "one toggle does one thing" that we try to stick to here in the package's interface, i.e. you will generally just need to specify --tlsCAFile on the command line and the rest of it will be handled for you.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay that sounds reasonable to me, thank you! To keep this PR scoped, I think we should stop here feature-wise. Some ideas for follow up:

  • Better handling of keyFile and its interaction with the internal metadata.
  • Allow args to be passed as part of the config file and appended to the command line args
  • Add support for unix sockets
  • Write to stderr for all except the connection string, so a calling process can capture it directly.

I'll test out these new changes and get back to you!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, for keyfiles I think we'll want to do something similar to TLS – intercept those and use it for auth, if possible.

But yeah, let's just try to get this work here over the line 🙂

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything is working now except for the shards config option

@addaleax addaleax requested a review from blink1073 November 24, 2025 22:59
describe: 'Configure OIDC authentication on the server',
})
.config()
.env('MONGODB_RUNNER')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since "shards" can be an object now, I think shards needs a custom coerce method. Right now if it is an object it gets converted to NaN.

Copy link

@blink1073 blink1073 Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I'm testing this using the cli instead of programmatically, to align more closely with mongotune and to avoid adding a node project to the orchestration folder (to minimize overall change).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants