Commit f7051be

Authored by ggivo, Copilot, atakavci, and tishun
[automatic failover] Implement sliding time window metrics tracker (#3521)
* abstract clock for easy testing
* Improve LockFreeSlidingWindowMetrics: fix bugs and add tests

  Bug Fixes:
  - Fix: Ensure snapshot metrics remain accurate after a full window rotation
  - Fix: events recorded exactly at bucket boundaries were miscounted
  - Enforce window size % bucket size == 0
  - Move LockFreeSlidingWindowMetricsUnitTests to correct package (io.lettuce.core.failover.metrics)
* remove unused reset methods
* extract interface for MetricsSnapshot
  - remove snapshotTime - not used & not correctly calculated
  - remove reset metrics - unused as of now
* add LockFreeSlidingWindowMetrics benchmark test
* performance tests moved to metrics package
* replace with port from resilience4j
* update copyrights
* format
* clean up javadocs
* clean up
  - fix incorrect javadoc
  - fix failing benchmark
* [automatic failover] Hide failover metrics implementation
  - CircuitBreakerMetrics, MetricsSnapshot - public
  - metrics implementation details stay inside io.lettuce.core.failover.metrics
  - Update CircuitBreaker to obtain its metrics via CircuitBreakerMetricsFactory.createLockFree()
* Apply suggestions from code review

  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* rename createLockFree -> createDefaultMetrics
* address review comments by @atakavci
  - remove CircuitBreakerMetrics, CircuitBreakerMetricsImpl
  - rename SlidingWindowMetrics -> CircuitBreakerMetrics
* format
* Enforce min-window size of 2 buckets

  The current implementation requires a window of at least 2 buckets.
  With windowSize=1, only one node is created, with next=null.
  When updateWindow() advances the window it sets HEAD to headNext, which is null for a single-node window.
  The next call to updateWindow() tries to access head.next, but head is now null, causing:
  NullPointerException: Cannot read field "next" because "head" is null
* Clean-up benchmark
  - benchmark matrix threads (1,4) window_size ("2", "30", "180")
  - performs 1_000_000 ops in simulated 5min test window
  - benchmark record events
  - benchmark record & read snapshot
* remove MetricsPerformanceTests.java
  - no reliable way to assert on performance, instead added basic benchmark test to benchmark recording/snapshot reading average times
  - gc benchmarks are available for local testing
* reset method removed
* Apply suggestion from @Copilot

  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Apply suggestion from @atakavci

  Co-authored-by: atakavci <a_takavci@yahoo.com>
* Update src/main/java/io/lettuce/core/failover/metrics/CircuitBreakerMetrics.java

  Co-authored-by: Tihomir Krasimirov Mateev <tihomir.mateev@redis.com>
* add missing license header and javadoc
* add missing license header and javadoc
* correct author for jmh failover metrics

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: atakavci <a_takavci@yahoo.com>
Co-authored-by: Tihomir Krasimirov Mateev <tihomir.mateev@redis.com>
1 parent c2890be commit f7051be
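
The commit message names two constraints on the window configuration: the window size must be an exact multiple of the bucket size, and the window must hold at least 2 buckets. The following is a minimal sketch of such a check, not the code from this commit. The class `WindowSizing`, the method `bucketCount`, and the millisecond units are all hypothetical.

```java
// Hypothetical helper: the name, signature and units are assumptions for illustration only.
final class WindowSizing {

    private WindowSizing() {
    }

    /**
     * Validates a window/bucket pair and returns how many buckets the window holds.
     */
    static int bucketCount(long windowMillis, long bucketMillis) {
        if (bucketMillis <= 0) {
            throw new IllegalArgumentException("bucket size must be positive");
        }
        // Constraint 1: the window must divide evenly into buckets.
        if (windowMillis % bucketMillis != 0) {
            throw new IllegalArgumentException("window size must be a multiple of bucket size");
        }
        long buckets = windowMillis / bucketMillis;
        // Constraint 2: a single-bucket window has no "next" bucket to rotate to.
        if (buckets < 2) {
            throw new IllegalArgumentException("window must contain at least 2 buckets");
        }
        return Math.toIntExact(buckets);
    }
}
```

Rejecting the configuration up front turns the late `NullPointerException` described above into an immediate `IllegalArgumentException` at construction time.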

23 files changed, +1336 -943 lines changed

pom.xml

Lines changed: 1 addition & 1 deletion
@@ -1296,7 +1296,7 @@
     <argument>-classpath</argument>
     <classpath />
     <argument>org.openjdk.jmh.Main</argument>
-    <argument>.*</argument>
+    <argument>.*failover.*</argument>
     <argument>-tu</argument>
     <!--
     Override time unit in benchmark results. Available

src/main/java/io/lettuce/core/failover/CircuitBreaker.java

Lines changed: 3 additions & 3 deletions
@@ -17,7 +17,7 @@
 import io.lettuce.core.RedisConnectionException;
 import io.lettuce.core.failover.api.CircuitBreakerStateListener;
 import io.lettuce.core.failover.metrics.CircuitBreakerMetrics;
-import io.lettuce.core.failover.metrics.CircuitBreakerMetricsImpl;
+import io.lettuce.core.failover.metrics.MetricsFactory;
 import io.lettuce.core.failover.metrics.MetricsSnapshot;
 
 /**
@@ -45,7 +45,7 @@ public class CircuitBreaker implements Closeable {
      * Create a circuit breaker instance.
      */
     public CircuitBreaker(CircuitBreakerConfig config) {
-        this.metrics = new CircuitBreakerMetricsImpl();
+        this.metrics = MetricsFactory.createDefaultMetrics();
         this.config = config;
         this.trackedExceptions = new HashSet<>(config.trackedExceptions);
     }
@@ -54,7 +54,7 @@ public CircuitBreaker(CircuitBreakerConfig config) {
      * Get the metrics tracked by this circuit breaker.
      * <p>
      * This is only for internal use and testing purposes.
-     *
+     *
      * @return the circuit breaker metrics
      */
     CircuitBreakerMetrics getMetrics() {
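
The change above replaces a direct `new CircuitBreakerMetricsImpl()` with a factory call, so the caller only depends on an interface. The body of `MetricsFactory` is not shown on this page, so the following is only a sketch of the general static-factory pattern. `CallMetrics`, `CallMetricsFactory`, and `SimpleCallMetrics` are hypothetical names, not classes from this commit.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical public contract; callers see only this type.
interface CallMetrics {

    void recordSuccess();

    void recordFailure();

    long successes();

    long failures();
}

// Hypothetical factory: the only place that knows the concrete class.
final class CallMetricsFactory {

    private CallMetricsFactory() {
    }

    static CallMetrics createDefaultMetrics() {
        return new SimpleCallMetrics();
    }

    // Private nested class: code outside the factory cannot name or instantiate it.
    private static final class SimpleCallMetrics implements CallMetrics {

        private final LongAdder successes = new LongAdder();

        private final LongAdder failures = new LongAdder();

        @Override
        public void recordSuccess() {
            successes.increment();
        }

        @Override
        public void recordFailure() {
            failures.increment();
        }

        @Override
        public long successes() {
            return successes.sum();
        }

        @Override
        public long failures() {
            return failures.sum();
        }
    }
}
```

With this shape the implementation can later be swapped (for example for a windowed one) without touching any caller, which matches the intent stated in the commit message of keeping implementation details inside the metrics package.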
Lines changed: 11 additions & 13 deletions
@@ -1,12 +1,15 @@
 package io.lettuce.core.failover.metrics;
 
 /**
- * Interface for circuit breaker metrics tracking successes and failures within a time-based sliding window. Thread-safe and
- * lock-free using atomic operations.
+ * Interface for sliding window metrics. Allows tracking of success and failure counts within a configurable window.
  *
  * <p>
- * This interface defines the contract for tracking metrics over a configurable time period. Old data outside the window is
- * automatically expired and cleaned up.
+ * Implementations must be:
+ * <ul>
+ * <li>Thread-safe: Safe for concurrent access from multiple threads</li>
+ * <li>Efficient: Minimal memory overhead and fast operations</li>
+ * <li>Time-based: Automatic expiration of old data outside the window</li>
+ * </ul>
  * </p>
  *
  * @author Ali Takavci
@@ -15,26 +18,21 @@
 public interface CircuitBreakerMetrics {
 
     /**
-     * Record a successful command execution. Lock-free operation.
+     * Record a successful command execution.
      */
     void recordSuccess();
 
     /**
-     * Record a failed command execution. Lock-free operation.
+     * Record a failed command execution.
      */
     void recordFailure();
 
     /**
-     * Get a snapshot of the current metrics within the time window. Use the snapshot to access success count, failure count,
-     * total count, and failure rate.
+     * Get a snapshot of the current metrics within the time window. This is a point-in-time view and does not change after
+     * being returned. Use the snapshot to access success count, failure count, total count, and failure rate.
      *
      * @return an immutable snapshot of current metrics
      */
     MetricsSnapshot getSnapshot();
 
-    /**
-     * Reset all metrics to zero.
-     */
-    void reset();
-
 }
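
To illustrate the contract in the Javadoc above (record outcomes, expire old data, return an immutable snapshot), here is a minimal bucketed sliding window. It is deliberately simpler than the commit's implementation: it uses `synchronized` rather than lock-free atomics, and it is not the Resilience4j port. `BucketedWindow`, `WindowSnapshot`, and the failure rate expressed as a fraction between 0 and 1 are assumptions made for this sketch.

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

// Hypothetical immutable snapshot; fields never change after construction.
final class WindowSnapshot {

    final long successes;

    final long failures;

    WindowSnapshot(long successes, long failures) {
        this.successes = successes;
        this.failures = failures;
    }

    long total() {
        return successes + failures;
    }

    // Assumption: rate is a fraction in [0, 1]; 0 when nothing was recorded.
    double failureRate() {
        return total() == 0 ? 0.0 : (double) failures / total();
    }
}

// Hypothetical synchronized sliding window made of fixed-duration buckets.
final class BucketedWindow {

    private static final long UNUSED = Long.MIN_VALUE;

    private final long bucketNanos;

    private final long[] epochs; // which time slice each slot currently holds

    private final long[] successes;

    private final long[] failures;

    private final LongSupplier nanoClock;

    BucketedWindow(int buckets, long bucketNanos, LongSupplier nanoClock) {
        if (buckets < 2 || bucketNanos <= 0) {
            throw new IllegalArgumentException("need at least 2 buckets of positive duration");
        }
        this.bucketNanos = bucketNanos;
        this.epochs = new long[buckets];
        this.successes = new long[buckets];
        this.failures = new long[buckets];
        this.nanoClock = nanoClock;
        Arrays.fill(epochs, UNUSED);
    }

    synchronized void recordSuccess() {
        successes[currentSlot()]++;
    }

    synchronized void recordFailure() {
        failures[currentSlot()]++;
    }

    synchronized WindowSnapshot getSnapshot() {
        long nowEpoch = Math.floorDiv(nanoClock.getAsLong(), bucketNanos);
        long s = 0;
        long f = 0;
        for (int i = 0; i < epochs.length; i++) {
            // Only count slots whose time slice is still inside the window.
            if (epochs[i] != UNUSED && nowEpoch - epochs[i] < epochs.length) {
                s += successes[i];
                f += failures[i];
            }
        }
        return new WindowSnapshot(s, f);
    }

    // Maps the current time to a slot, clearing the slot if it holds an older time slice.
    private int currentSlot() {
        long epoch = Math.floorDiv(nanoClock.getAsLong(), bucketNanos);
        int slot = (int) Math.floorMod(epoch, (long) epochs.length);
        if (epochs[slot] != epoch) {
            epochs[slot] = epoch;
            successes[slot] = 0;
            failures[slot] = 0;
        }
        return slot;
    }
}
```

`Math.floorDiv` and `Math.floorMod` are used because a nanosecond source such as `System.nanoTime()` may return negative values. Passing the time source as a `LongSupplier` keeps the sketch testable with a hand-driven clock.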

src/main/java/io/lettuce/core/failover/metrics/CircuitBreakerMetricsImpl.java

Lines changed: 0 additions & 88 deletions
This file was deleted.
Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
+/*
+ * Copyright 2011-Present, Redis Ltd. and Contributors
+ * All rights reserved.
+ *
+ * Licensed under the MIT License.
+ *
+ * This file contains contributions from third-party contributors
+ * licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * ---
+ *
+ * Ported from Resilience4j's LockFreeSlidingTimeWindowMetrics
+ * Copyright 2024 Florentin Simion and Rares Vlasceanu
+ * Licensed under the Apache License, Version 2.0
+ * https://github.com/resilience4j/resilience4j
+ *
+ * Modifications:
+ * - Ported to be compatible with Java 8: Replaced VarHandle with AtomicReference
+ * - Stripped down unused metrics: Removed duration and slow call tracking
+ */
+
+package io.lettuce.core.failover.metrics;
+
+/**
+ * Clock abstraction for obtaining the current time in nanoseconds.
+ * <p>
+ * This interface allows for testable time-dependent code by enabling injection of custom clock implementations.
+ * </p>
+ *
+ * @since 7.1
+ */
+interface Clock {
+
+    /**
+     * System clock implementation using {@link System#nanoTime()}.
+     */
+    Clock SYSTEM = System::nanoTime;
+
+    /**
+     * Get the current time in nanoseconds.
+     *
+     * @return the current time in nanoseconds
+     */
+    long monotonicTime();
+
+}
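
The Javadoc above says the `Clock` abstraction exists so that time-dependent code can be tested with an injected clock. The sketch below shows one way that could look. The `Clock` interface is copied from the diff; `ManualClock` and `ElapsedTimer` are hypothetical classes added only for illustration and are not part of this commit.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Mirrors the Clock interface shown in the diff above.
interface Clock {

    Clock SYSTEM = System::nanoTime;

    long monotonicTime();
}

// Hypothetical test double: time only moves when the test says so.
final class ManualClock implements Clock {

    private final AtomicLong nanos = new AtomicLong();

    @Override
    public long monotonicTime() {
        return nanos.get();
    }

    void advance(long amount, TimeUnit unit) {
        nanos.addAndGet(unit.toNanos(amount));
    }
}

// Hypothetical consumer: depends on Clock instead of calling System.nanoTime() directly.
final class ElapsedTimer {

    private final Clock clock;

    private final long startNanos;

    ElapsedTimer(Clock clock) {
        this.clock = clock;
        this.startNanos = clock.monotonicTime();
    }

    long elapsedMillis() {
        return TimeUnit.NANOSECONDS.toMillis(clock.monotonicTime() - startNanos);
    }
}
```

Production code would pass `Clock.SYSTEM`, while a test passes a `ManualClock` and advances it explicitly, so window rotation or expiry can be exercised without sleeping.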
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+/*
+ * Copyright 2011-Present, Redis Ltd. and Contributors
+ * All rights reserved.
+ *
+ * Licensed under the MIT License.
+ *
+ * This file contains contributions from third-party contributors
+ * licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * ---
+ *
+ * Ported from Resilience4j's LockFreeSlidingTimeWindowMetrics
+ * Copyright 2024 Florentin Simion and Rares Vlasceanu
+ * Licensed under the Apache License, Version 2.0
+ * https://github.com/resilience4j/resilience4j
+ *
+ * Modifications:
+ * - Ported to be compatible with Java 8: Replaced VarHandle with AtomicReference
+ * - Stripped down unused metrics: Removed duration and slow call tracking
+ */
+package io.lettuce.core.failover.metrics;
+
+/**
+ * Interface for measurement implementations that accumulate calls and outcomes.
+ */
+interface CumulativeMeasurement extends MeasurementData {
+
+    /**
+     * Records the outcome of a call.
+     *
+     * @param outcome the outcome of the call
+     */
+    void record(Outcome outcome);
+
+}
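
`CumulativeMeasurement` refers to `MeasurementData` and `Outcome`, neither of which appears on this page. The sketch below therefore assumes their shapes: an `Outcome` enum with `SUCCESS` and `FAILURE`, and a `MeasurementData` with two hypothetical accessors. `CountingMeasurement` is likewise a made-up implementation, shown only to illustrate how an accumulating measurement could satisfy the `record(Outcome)` contract.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Assumed shape; the real Outcome type is not shown in this diff.
enum Outcome {
    SUCCESS, FAILURE
}

// Assumed accessors; the real MeasurementData is not shown in this diff.
interface MeasurementData {

    int totalCalls();

    int failedCalls();
}

// Same declaration as in the diff above.
interface CumulativeMeasurement extends MeasurementData {

    void record(Outcome outcome);
}

// Hypothetical implementation that counts every call and, separately, every failure.
final class CountingMeasurement implements CumulativeMeasurement {

    private final AtomicInteger total = new AtomicInteger();

    private final AtomicInteger failed = new AtomicInteger();

    @Override
    public void record(Outcome outcome) {
        // Each counter is updated atomically, but the pair is not updated as one unit,
        // so a concurrent reader may briefly see total and failed out of step.
        total.incrementAndGet();
        if (outcome == Outcome.FAILURE) {
            failed.incrementAndGet();
        }
    }

    @Override
    public int totalCalls() {
        return total.get();
    }

    @Override
    public int failedCalls() {
        return failed.get();
    }
}
```

A windowed tracker could keep one such measurement per bucket and sum the buckets that are still inside the window when building a snapshot.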
