Duplicated heavy operation in TransportClusterHealthAction.executeHealth #88303

@AlexanderGunnarssonMW

Description

We are running a large-scale Elasticsearch 7 cluster. While profiling the master node, we observed that about 50% of its CPU time is spent in TransportClusterHealthAction.executeHealth. Half of that time is spent in the validateRequest method and the other half in getResponse. Both methods perform the same heavy operation: each builds a ClusterHealthResponse, first to validate it and then again to return it. For our workload, removing the duplicate build would save about 25% of the master node's overall CPU time and presumably make cluster health requests respond roughly twice as fast.

I have already written the small code change needed to optimize this (just one new method, validateRequestAndGetResponse, with a nullable return value). Would you like me to submit it as a PR?
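
To illustrate the idea, here is a minimal standalone sketch. It is not the actual Elasticsearch code: the class HealthSketch, the Health type, buildHealth, and the integer stand-ins for cluster state and wait conditions are all hypothetical, and the assumption that a null return signals "wait conditions not yet met" is only my reading of "nullable". It only shows how validating and returning in one call avoids building the response twice.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of the pattern described in this issue.
class HealthSketch {
    // Counts how often the (assumed expensive) health object is built.
    static final AtomicInteger builds = new AtomicInteger();

    // Stand-in for ClusterHealthResponse; holds how many wait conditions are met.
    static final class Health {
        final int satisfiedWaits;
        Health(int satisfiedWaits) { this.satisfiedWaits = satisfiedWaits; }
    }

    // Stand-in for the heavy computation over the cluster state.
    static Health buildHealth(int clusterState) {
        builds.incrementAndGet();
        return new Health(clusterState);
    }

    // Current pattern: validation builds a health object and discards it...
    static boolean validateRequest(int clusterState, int waitCount) {
        return buildHealth(clusterState).satisfiedWaits == waitCount;
    }

    // ...and the response path builds an identical one again.
    static Health getResponse(int clusterState) {
        return buildHealth(clusterState);
    }

    // Combined variant: build once, return the object if valid, else null.
    static Health validateRequestAndGetResponse(int clusterState, int waitCount) {
        Health health = buildHealth(clusterState);
        return health.satisfiedWaits == waitCount ? health : null;
    }

    public static void main(String[] args) {
        Health twoStep = validateRequest(3, 3) ? getResponse(3) : null;
        System.out.println("two-step builds: " + builds.getAndSet(0));

        Health combined = validateRequestAndGetResponse(3, 3);
        System.out.println("combined builds: " + builds.get());
        System.out.println("both valid: " + (twoStep != null && combined != null));
    }
}
```

With this shape, the caller checks the returned value for null instead of calling a separate boolean validation first, so the expensive build happens once per request on the successful path.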

Metadata

Assignees

No one assigned

    Labels

    :Distributed Coordination/Cluster Coordination: Cluster formation and cluster state publication, including cluster membership and fault detection.
    >enhancement
    Team:Distributed (Obsolete): Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
