
Conversation

@vicLin8712 (Collaborator) commented Nov 19, 2025

O(1) scheduler: Complete implementation

This PR provides the complete O(1) scheduler implementation and serves as the final part of the 3-part stacked PR series.
It integrates all components introduced in the earlier patches and replaces the legacy O(n) linear scheduler with the new ready-queue–based, RR-cursor-based, bitmap-assisted O(1) design.

Features of the O(1) scheduler

  • Priority-indexed ready queues
    Each priority level maintains an independent ready queue.

  • Bitmap + De Bruijn–based highest-priority lookup
    The scheduler locates the next runnable task in constant time using priority bitmaps and De Bruijn table lookup.

  • RR cursor for fair round-robin scheduling
    Each priority queue maintains a cursor to provide O(1) fair scheduling among tasks of the same priority.

  • Full integration into the scheduler execution path
    The legacy O(n) priority scanning algorithm is completely replaced by the new O(1) logic; the iteration limit IMAX=500 is removed.

  • Idle task fully integrated into new design
    System execution starts in the idle task, which serves as the initial execution context.
    Whenever the idle task yields, control deterministically transitions to the highest-priority runnable task.
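
The bitmap lookup described above can be sketched as follows. This is a minimal illustration assuming an 8-bit bitmap where bit 0 is the highest priority; the kernel's actual table, constants, and priority convention may differ.

```c
#include <assert.h>
#include <stdint.h>

/* De Bruijn sequence B(2,3) = 0x17 maps each isolated bit to a unique
 * 3-bit window, giving the bit index without a loop or branch chain. */
static const uint8_t debruijn_lut[8] = {0, 1, 2, 4, 7, 3, 6, 5};

/* Return the index of the lowest set bit of a non-zero 8-bit ready
 * bitmap (here: the highest-priority runnable level). */
static int highest_ready_prio(uint8_t bitmap)
{
    uint8_t isolated = bitmap & (uint8_t)(-bitmap); /* keep lowest set bit */
    return debruijn_lut[(uint8_t)(isolated * 0x17u) >> 5];
}
```

The multiply shifts the De Bruijn sequence so the top three bits encode the bit position, which the small table then decodes, giving constant-time lookup.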

Unit tests

Commit: Add unit test suite for RR-cursor scheduler

make sched_cmp
make run

Approach

  • A dedicated controller task is created with priority TASK_PRIO_CRIT to orchestrate the entire test process and enforce deterministic sequencing.
  • After each state change of the test tasks, the unit test verifies both bitmap correctness and per-priority task-count consistency, ensuring alignment with the ready-queue and priority-bitmask invariants maintained by the O(1) scheduler.

Task types

  • Controller task: Coordinates the test flow, triggers all state transitions, and validates ready-queue invariants after each step.
  • Delay task: A runnable task that transitions into TASK_BLOCKED through mo_task_delay().
    Used to verify dequeue behavior and correct clearing of priority bits when a task leaves the schedulable state set.
  • Normal task: A simple infinite-loop runnable task that remains schedulable unless externally suspended or cancelled.
    Serves as the primary subject for testing state transitions and enqueue/dequeue correctness.
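
The bitmap/task-count consistency check described above can be sketched roughly as follows; `sched_state_t` and `invariants_hold()` are illustrative names for this sketch, not the kernel's actual API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PRIO_LEVELS 8

/* Hypothetical mirror of the state the unit tests inspect. */
typedef struct {
    uint8_t ready_bitmap;         /* bit i set => level i has tasks */
    int ready_count[PRIO_LEVELS]; /* tasks queued per priority level */
} sched_state_t;

/* The invariant: a priority bit is set iff its queue is non-empty. */
static bool invariants_hold(const sched_state_t *s)
{
    for (int prio = 0; prio < PRIO_LEVELS; prio++) {
        bool bit_set = (s->ready_bitmap >> prio) & 1u;
        bool has_tasks = s->ready_count[prio] > 0;
        if (bit_set != has_tasks)
            return false;
    }
    return true;
}
```

Running this check after every state transition is what turns each transition in the list below into two PASS lines (bitmap consistency and task-count consistency).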

Verified state points
The following state transitions are validated by checking both ready-queue task counts and bitmap updates after each operation:

  • Normal task state transitions

    • Creation (TASK_READY) – initial enqueue and priority bit set.
    • Priority change – priority migration updates the queue placement and the corresponding bitmap bit.
    • Suspension (TASK_READY → TASK_SUSPENDED) – dequeued from the ready queue and priority bit cleared.
    • Resumption (TASK_SUSPENDED → TASK_READY) – re-enqueued with correct priority placement.
    • Cancellation (TASK_READY → TASK_CANCELLED) – removed from ready queues and all bitmap bits fully cleared.
  • Blocked task behavior (TASK_RUNNING → TASK_BLOCKED)

    • The delay task is created and its priority is promoted to match the controller task’s priority (TASK_READY).
    • After the controller yields, the delay task becomes the running task, invokes mo_task_delay(), and transitions to TASK_BLOCKED.
    • Control returns to the controller task, and the test verifies:
      • the delay task is completely removed from the ready queue
      • its priority bit is cleared from the bitmap
      • scheduler selection falls back to the highest remaining runnable task

Results

Linmo kernel is starting...
Heap initialized, 130005992 bytes available
idle id 1: entry=80001900 stack=80004488 size=4096
task 2: entry=80000788 stack=80005508 size=4096 prio_level=4 time_slice=5
Scheduler mode: Preemptive
Starting RR-cursor based scheduler test suits...

=== Testing Bitmap and Task Count Consistency ===
task 3: entry=80000168 stack=80006634 size=4096 prio_level=4 time_slice=5
PASS: Bitmap is consistent when TASK_READY
PASS: Task count is consistent when TASK_READY
PASS: Bitmap is consistent when priority migration
PASS: Task count is consistent when priority migration
PASS: Bitmap is consistent when TASK_SUSPENDED
PASS: Task count is consistent when TASK_SUSPENDED
PASS: Bitmap is consistent when TASK_READY from TASK_SUSPENDED
PASS: Task count is consistent when TASK_READY from TASK_SUSPENDED
PASS: Bitmap is consistent when task canceled
PASS: Task count is consistent when task canceled
task 4: entry=80000178 stack=80006634 size=4096 prio_level=4 time_slice=5
PASS: Task count is consistent when task canceled
PASS: Task count is consistent when task blocked

=== Test Results ===
Tests passed: 12
Tests failed: 0
Total tests: 12
All tests PASSED!
RR-cursor based scheduler tests completed successfully.

Note

  1. The term TASK_CANCELLED in this document is used only for explanation. It is not an actual state in the task state machine, but represents the condition where a task has been removed from all scheduling structures and no longer exists in the system.
  2. The task states shown in parentheses (e.g., (TASK_READY)) refer to the state of the test tasks being created or manipulated, not the state of the controller task.

Benchmark

Commit: Add benchmarking files

python3 bench.py

Approach

  1. Spawn N=500 normal tasks to populate the scheduling domain.
    All tasks begin in the TASK_READY state, ensuring the ready queues and bitmap are fully populated.
  2. Scenario configuration (active ratio)
    For each benchmark scenario, suspend a portion of tasks to reach the desired active-ratio load:
  • 2% active
  • 4% active
  • 20% active
  • 50% active
  • 100% active
  3. Benchmark execution
    To compare the legacy O(n) scheduler with the new O(1) scheduler, a compile-time flag OLD is passed to select which scheduling algorithm is active.
    The original linear-search scheduler is preserved in task.c for baseline measurement.
    For each benchmark scenario, the scheduler is executed 20 times to obtain stable timing data.
    The average and maximum scheduling latencies are collected, and the performance improvement is computed as the ratio between the old and new scheduler times (e.g., 1.5× faster).

  4. Metrics collected
    The benchmark collects the following metrics for each scenario:

    • Mean improvement
      Average speedup factor computed as (old_latency / new_latency) across 20 runs.

    • Standard deviation of improvement
      Measures the variability of speedup across repeated runs.

    • Minimum / maximum improvement
      Best and worst observed speedup factors among the 20 runs.

    • 95% confidence interval (CI)
      Statistical confidence bounds for the mean improvement.

    • Mean scheduling latency (old / new)
      Average schedule-selection time for both the legacy O(n) scheduler and the new O(1) scheduler.

    • Maximum scheduling latency (old / new)
      Worst-case schedule-selection time observed for each scheduler.

Results

Scenario 'Minimal Active':
  mean improvement        = 2.68x faster
  std dev of improvement  = 0.34x
  min / max improvement   = 1.75x  /  3.35x
  95% CI of improvement   = [2.54x, 2.83x]
  mean old sched time     = 5616.25 us
  mean new sched time     = 2119.0 us
  max  old sched time     = 47.0 us
  max  new sched time     = 37.0 us

Scenario 'Moderate Active':
  mean improvement        = 1.80x faster
  std dev of improvement  = 0.27x
  min / max improvement   = 1.27x  /  2.51x
  95% CI of improvement   = [1.68x, 1.92x]
  mean old sched time     = 3887.6 us 
  mean new sched time     = 2179.45 us 
  max  old sched time     = 40.0 us 
  max  new sched time     = 23.0 us 

Scenario 'Heavy Active':
  mean improvement        = 1.02x faster
  std dev of improvement  = 0.08x
  min / max improvement   = 0.84x  /  1.17x
  95% CI of improvement   = [0.98x, 1.06x]
  mean old sched time     = 2150.15 us 
  mean new sched time     = 2119.1 us 
  max  old sched time     = 73.0 us 
  max  new sched time     = 33.0 us 

Scenario 'Stress Test':
  mean improvement        = 0.93x (slower than OLD)
  std dev of improvement  = 0.11x
  min / max improvement   = 0.65x  /  1.20x
  95% CI of improvement   = [0.88x, 0.98x]
  mean old sched time     = 1874.35 us 
  mean new sched time     = 2032.55 us 
  max  old sched time     = 23.0 us 
  max  new sched time     = 20.0 us 

Scenario 'Full Load Test':
  mean improvement        = 0.89x (slower than OLD)
  std dev of improvement  = 0.11x
  min / max improvement   = 0.63x  /  1.07x
  95% CI of improvement   = [0.84x, 0.94x]
  mean old sched time     = 1798.8 us 
  mean new sched time     = 2048.55 us 
  max  old sched time     = 33.0 us 
  max  new sched time     = 52.0 us

Reference

#23 - Draft discussion
#36 - Infrastructure
#37 - Task state transitions APIs
ae35c84 - unit test suite
11e9ee6 - benchmark


Summary by cubic

Complete O(1) scheduler with priority queues, bitmap selection, and RR cursors, replacing the legacy O(n) scan. Adds an idle task and updates task lifecycle to use ready queues; up to ~2.7x faster under light load.

  • New Features

    • Priority-indexed ready queues with O(1) highest-priority selection via bitmap.
    • Per-priority round-robin cursors for fair rotation without list churn.
    • Scheduler state in kcb (ready_bitmap, ready_queues[], rr_cursors[]) and a dedicated idle task as the safe fallback.
    • Intrusive ready-queue design: TCB embeds rq_node; helpers list_pushback_node() and list_remove_node() manage nodes safely.
    • Unit tests validate bitmap/queue invariants; benchmarks show strong gains at low/moderate activity.
  • Refactors

    • Tasks explicitly enqueue/dequeue on READY/RUNNING transitions (spawn, delay/block, suspend/resume, cancel, priority change).
    • Blocking paths use _sched_block_dequeue() and _sched_block_enqueue() for mutex/cond/semaphore to reinsert tasks correctly.
    • Priority changes migrate tasks between queues and yield if the running task changes its priority.
    • Startup launches into the idle task (idle_task_init) and removes the IMAX scan limit.

Written for commit 77bb7ae. Summary will update automatically on new commits.

@jserv jserv changed the title [3/3] O(1) scheduler: Complete implementation O(1) scheduler: Complete implementation Nov 19, 2025
@jserv (Contributor) commented Nov 19, 2025

Do not include numbers in pull-request titles.

@vicLin8712 force-pushed the o1-sched-lauch branch 2 times, most recently from b18ebac to 0d8c856 on November 19, 2025
This commit extends the core scheduler data structures to support
the new O(1) scheduler design.

Adds in tcb_t:

 - rq_node: embedded list node for ready-queue membership used
   during task state transitions. This avoids redundant malloc/free
   for per-enqueue/dequeue nodes by tying the node's lifetime to
   the task control block.

Adds in kcb_t:

 - ready_bitmap: 8-bit bitmap tracking which priority levels have
   runnable tasks.
 - ready_queues[]: per-priority ready queues for O(1) task
   selection.
 - rr_cursors[]: round-robin cursor per priority level to support
   fair selection within the same priority.

These additions are structural only and prepare the scheduler for
O(1) ready-queue operations; they do not change behavior yet.
Previously, list_pushback() and list_remove() were the only list APIs
available; they inserted data by allocating a new linkage node with
malloc and removed data by freeing the target node.

Now that the new data structure, rq_node, is embedded as the linkage
node for ready-queue operations, there is no need to malloc and free
a node on each operation.

This commit adds the insertion and removal list operations without
malloc and free on the linkage node.

- list_pushback_node(): append an existing node to the end of the
   list in O(n) time without allocating memory.

- list_remove_node(): remove a node from the list without freeing it.

Both helper functions run in O(n) time using a linear search and will
be used by the upcoming task enqueue/dequeue operations on the ready
queue.
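A minimal sketch of these allocation-free helpers, assuming a simple singly linked list with a sentinel head; the real kernel list layout and signatures may differ.

```c
#include <assert.h>
#include <stddef.h>

typedef struct list_node {
    struct list_node *next;
} list_node_t;

typedef struct {
    list_node_t head; /* sentinel; head.next is the first element */
} list_t;

/* Append an existing node: O(n) walk to the tail, no malloc. */
static void list_pushback_node(list_t *l, list_node_t *n)
{
    list_node_t *p = &l->head;
    while (p->next)
        p = p->next;
    p->next = n;
    n->next = NULL;
}

/* Unlink a node without freeing it: O(n) search for its predecessor. */
static void list_remove_node(list_t *l, list_node_t *n)
{
    for (list_node_t *p = &l->head; p->next; p = p->next) {
        if (p->next == n) {
            p->next = n->next;
            n->next = NULL;
            return;
        }
    }
}
```

Because the node's storage lives inside the TCB, enqueue/dequeue become pure pointer manipulation with no allocator involvement.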
Previously, `sched_enqueue_task()` only marked the task state as
TASK_READY to indicate that the task had been enqueued, because the
original scheduler selected the next task from the global list, where
all tasks are kept.

Now that the new data structure, ready_queue[], holds the runnable
tasks, the enqueue API must push the embedded linkage node, rq_node,
into the corresponding ready queue.

This commit uses the list_pushback_node() helper to enqueue the
embedded list node of the tcb into the ready queue and to set up the
cursor and bitmap of the corresponding priority queue.
Previously, sched_dequeue_task() was a no-op stub, which was sufficient
when the scheduler selected tasks from the global list. Since the new
ready_queue structure now holds the runnable tasks, a dequeue path is
required to remove tasks from the ready queue so that it always holds
only runnable tasks.

This commit adds the dequeue path to sched_dequeue_task(), using
list_remove_node() helper to remove the existing linkage node from the
corresponding ready queue and update the RR cursor and priority bitmap
accordingly.
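The bookkeeping these two commits describe can be illustrated with a simplified model that tracks only per-priority counts and the bitmap; the kernel manipulates rq_node lists and RR cursors rather than bare counters, so treat this as a sketch of the invariant, not the implementation.

```c
#include <assert.h>
#include <stdint.h>

#define PRIO_LEVELS 8

typedef struct {
    uint8_t ready_bitmap;
    int ready_count[PRIO_LEVELS];
} ready_queues_t;

/* Enqueue: a level with at least one task must have its bit set. */
static void enqueue(ready_queues_t *rq, int prio)
{
    rq->ready_count[prio]++;
    rq->ready_bitmap |= (uint8_t)(1u << prio);
}

/* Dequeue: clear the bit only when the last task leaves the level. */
static void dequeue(ready_queues_t *rq, int prio)
{
    if (--rq->ready_count[prio] == 0)
        rq->ready_bitmap &= (uint8_t)~(1u << prio);
}
```

This is exactly the invariant the unit tests probe: the bit for a priority level is set iff its queue is non-empty.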
Previously, task operation APIs such as sched_wakeup_task() only updated
the task state, which was sufficient when scheduling relied on the global
task list. With the scheduler now selecting runnable tasks from
ready_queue[] per priority level, state changes alone are insufficient.

To support the new scheduler and to prevent selection of tasks that have
already left the runnable set, explicit enqueue and dequeue paths are
required when task state transitions cross the runnable boundary:

    In ready-queue set: {TASK_RUNNING, TASK_READY}
    Not in ready-queue set: {all other states}

This commit updates task operation APIs to include queue insertion and
removal logic according to their semantics. In general, queue operations
are performed by invoking existing helper functions mo_enqueue_task()
and mo_dequeue_task().

The modified APIs include:

  - sched_wakeup_task(): avoid enqueueing a task that is already running
    by treating TASK_RUNNING as part of the runnable set complement.

  - mo_task_cancel(): dequeue TASK_READY tasks from ready_queue[] before
    cancelling, ensuring removed tasks are not scheduled again.

  - mo_task_delay(): runnable boundary transition only ("TASK_RUNNING →
    TASK_BLOCKED"), no queue insertion for non-runnable tasks.

  - mo_task_suspend(): supports both TASK_RUNNING and TASK_READY
    ("TASK_RUNNING/TASK_READY → TASK_SUSPENDED"), dequeue before suspend
    when necessary.

  - mo_task_resume(): only for suspended tasks ("TASK_SUSPENDED →
    TASK_READY"), enqueue into ready_queue[] on resume.

  - _sched_block(): runnable boundary transition only ("TASK_RUNNING →
    TASK_BLOCKED"), dequeue without memory free.
Currently, mo_mutex_lock() calls mutex_block_atomic() to mark the
running task as TASK_BLOCKED so that it won't be selected by the old
scheduler. To preserve ready-queue consistency, which requires that the
queues hold only runnable tasks, a dequeue path must be included when
mutex_block_atomic() is called.

This commit adds _sched_blocked_dequeue() helper and will be applied in
mutex_block_atomic() in the following commit.
Previously, mutex_block_atomic() only marked the running task as
TASK_BLOCKED, which was sufficient when scheduling selected tasks
by scanning the global task list.

Since the new scheduler is designed to select only runnable tasks
from ready_queue[], mutex blocking now also requires removing the
task’s rq_node from the corresponding ready queue, preventing the
scheduler from selecting a blocked (non-runnable/dequeued) task again.
Currently, there is no enqueueing API that can be invoked from other files,
especially in mutex and semaphore operations which include task state
transition from TASK_BLOCKED to TASK_READY when a held resource is released.

This change introduces the _sched_blocked_enqueue() helper, which will be
used by mutex/semaphore unblocking paths to insert the task’s existing
linkage node into the corresponding per-priority ready queue, keeping
scheduler visibility and ready-queue consistency.
This commit replaces unblocking state transitions (TASK_BLOCKED->TASK_READY)
in mutex and semaphore paths with the _sched_block_enqueue() helper to
ensure scheduler visibility and preserve ready-queue invariants.
Previously, mo_task_priority() only updated the task’s time slice and
priority level. With the new scheduler design, tasks are kept in
per-priority ready queues, so mo_task_priority() must also handle
migrating tasks between these queues.

This commit adds dequeue/enqueue logic for tasks in TASK_RUNNING or
TASK_READY state, as such tasks must reside in a ready queue and a
priority change implies ready-queue migration.

The priority fields are still updated as part of the migration path:
sched_dequeue_task() relies on the current priority, while the enqueue
operation needs the new priority. Therefore, the priority update is
performed between the dequeue and enqueue steps.

If the priority change happens while the task is running, it must yield
immediately to preserve the scheduler’s strict task-ordering policy.
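The dequeue → update priority → enqueue ordering described above can be sketched with illustrative names (the counter model stands in for the real per-priority queues):

```c
#include <assert.h>
#include <stdint.h>

#define PRIO_LEVELS 8

typedef struct {
    uint8_t ready_bitmap;
    int ready_count[PRIO_LEVELS];
} ready_queues_t;

typedef struct {
    int prio;
} tcb_t;

static void rq_del(ready_queues_t *rq, int p)
{
    if (--rq->ready_count[p] == 0)
        rq->ready_bitmap &= (uint8_t)~(1u << p);
}

static void rq_add(ready_queues_t *rq, int p)
{
    rq->ready_count[p]++;
    rq->ready_bitmap |= (uint8_t)(1u << p);
}

/* Dequeue under the old priority, update the field, re-enqueue under
 * the new one. The ordering matters: the dequeue step keys off the
 * task's current priority, while the enqueue needs the new one. */
static void change_priority(ready_queues_t *rq, tcb_t *task, int new_prio)
{
    rq_del(rq, task->prio);
    task->prio = new_prio;
    rq_add(rq, task->prio);
}
```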
This commit refactors mo_task_spawn() to align with the new O(1) scheduler
design. The task control block (tcb_t) embeds its list node during task
creation.

The enqueue operation is moved inside a critical section to guarantee
consistent enqueuing process during task creation.

The “first task assignment” logic is removed because the first task is
now the system idle task, as introduced in a previous commit.
Previously, the scheduler performed an O(N) scan of the global task list
(kcb->tasks) to locate the next TASK_READY task. This resulted in
non-deterministic selection latency and unstable round-robin rotation
under heavy load or frequent task state transitions.

This change introduces a strict O(1) scheduler based on per-priority
ready queues and round-robin (RR) cursors. Each priority level maintains
its own ready queue and cursor, enabling constant-time selection of the
next runnable task while preserving fairness within the same priority.
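A toy model of this selection path, assuming level 0 is the highest priority; the bounded linear bit scan stands in for the De Bruijn lookup, and the fixed array ring stands in for the rq_node list advanced by the RR cursor.

```c
#include <assert.h>
#include <stdint.h>

#define PRIO_LEVELS 8
#define MAX_PER_LEVEL 4

typedef struct {
    uint8_t ready_bitmap;
    int tasks[PRIO_LEVELS][MAX_PER_LEVEL]; /* task ids per level */
    int count[PRIO_LEVELS];                /* tasks at each level */
    int cursor[PRIO_LEVELS];               /* next slot to run */
} sched_t;

/* Pick the next task: highest non-empty priority level, then rotate
 * that level's round-robin cursor. Assumes the bitmap invariant holds
 * (bit set iff count > 0). */
static int pick_next(sched_t *s)
{
    if (!s->ready_bitmap)
        return -1; /* nothing runnable: fall back to the idle task */
    int prio = 0;
    while (!((s->ready_bitmap >> prio) & 1u))
        prio++; /* <= 8 iterations; a De Bruijn lookup makes this O(1) */
    int id = s->tasks[prio][s->cursor[prio]];
    s->cursor[prio] = (s->cursor[prio] + 1) % s->count[prio];
    return id;
}
```

Selection cost no longer depends on the total task count, only on the fixed number of priority levels, which is what removes the need for the IMAX scan limit.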
@vicLin8712 force-pushed the o1-sched-lauch branch 2 times, most recently from 7e130fd to 164ad8d on November 29, 2025
