Completely Fair Scheduler: Linux CFS Algorithm Deep Dive

August 27, 2025

The Linux Completely Fair Scheduler (CFS) revolutionized process scheduling in the Linux kernel when it was introduced in version 2.6.23. Unlike traditional schedulers that rely on complex priority calculations and time slice allocations, CFS implements an elegant approach based on the concept of “fairness” – ensuring each process receives its fair share of CPU time.

What is the Completely Fair Scheduler?

The Completely Fair Scheduler is the default process scheduler for normal (non-real-time) tasks in Linux. It was designed by Ingo Molnár to address the limitations of the previous O(1) scheduler, providing better interactive performance and fairness across all processes.

Key principles of CFS:

  • Every process should receive its fair, weight-proportional share of CPU time over the long term
  • Interactive processes should respond quickly to user input
  • Scheduling decisions should be made efficiently
  • The scheduler should scale well with increasing number of processes

Core Concepts of CFS

Virtual Runtime (vruntime)

The most fundamental concept in CFS is virtual runtime. Each process tracks how much CPU time it has consumed, adjusted by its priority. The process with the lowest vruntime is considered to have received the least amount of CPU time and should run next.

// Simplified vruntime calculation (multiply before dividing in integer math)
vruntime += delta_exec * NICE_0_LOAD / se->load.weight;

Where:

  • delta_exec: Actual time the process ran
  • NICE_0_LOAD: Load weight for nice level 0
  • se->load.weight: Process weight based on nice value
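
To see the scaling in action, here is a minimal userspace sketch (plain C, not kernel code) that applies the same formula using the kernel's weight values for nice -5, 0, and 5 (3121, 1024, and 335, which appear again in Example 1 below):

#include <stdio.h>
#include <stdint.h>

#define NICE_0_LOAD 1024  /* load weight of a nice-0 task */

/* Scale actual runtime by NICE_0_LOAD / weight, as in the formula above */
static uint64_t vruntime_delta(uint64_t delta_exec_ns, uint64_t weight)
{
    return delta_exec_ns * NICE_0_LOAD / weight;
}

int main(void)
{
    /* Weights taken from the kernel's nice-to-weight table (see Example 1) */
    uint64_t slice_ns = 3000000;  /* each task runs for 3 ms of real time */

    printf("nice -5: vruntime += %llu ns\n",
           (unsigned long long)vruntime_delta(slice_ns, 3121));
    printf("nice  0: vruntime += %llu ns\n",
           (unsigned long long)vruntime_delta(slice_ns, 1024));
    printf("nice  5: vruntime += %llu ns\n",
           (unsigned long long)vruntime_delta(slice_ns, 335));
    return 0;
}

For the same 3 ms of real time, the nice 5 task's vruntime advances roughly nine times faster than the nice -5 task's, which is why it drifts to the right of the tree and runs less often.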

Red-Black Tree Data Structure

CFS uses a red-black tree (a self-balancing binary search tree) to maintain runnable processes. Processes are ordered by their vruntime values, with the leftmost node always containing the process that should run next.

Red-black tree properties:

  • O(log n) insertion and deletion
  • Self-balancing ensures consistent performance
  • Leftmost node access in O(1)
  • Maintains fairness by vruntime ordering
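
The ordering key is vruntime itself. One detail the kernel has to get right is that vruntime is an unsigned 64-bit counter, so entities are compared with a signed difference that stays correct even if the counter eventually wraps. A standalone sketch of that comparison, modeled on the kernel's entity_before() helper (the pared-down struct is just for illustration):

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/* Pared-down entity: only the field needed for ordering */
struct sched_entity {
    uint64_t vruntime;   /* virtual runtime in nanoseconds */
};

/*
 * True if a should run before b. The signed subtraction keeps the comparison
 * correct even if the unsigned vruntime counter eventually wraps around.
 */
static bool entity_before(const struct sched_entity *a,
                          const struct sched_entity *b)
{
    return (int64_t)(a->vruntime - b->vruntime) < 0;
}

int main(void)
{
    struct sched_entity slept  = { .vruntime = 1000 };
    struct sched_entity hogged = { .vruntime = 9000 };

    /* The entity with the smaller vruntime sorts to the left of the tree */
    printf("slept runs first: %s\n", entity_before(&slept, &hogged) ? "yes" : "no");
    return 0;
}

This comparison decides whether an enqueued entity descends left or right in the tree; because the leftmost node is cached separately, picking the next task never requires walking the tree.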

Time Slices and Latency

Unlike traditional schedulers with fixed time slices, CFS dynamically calculates time slices based on the target latency and number of running processes.

# CFS tuning parameters (under /proc/sys/kernel/ on older kernels;
# since v5.13 they live in /sys/kernel/debug/sched/ instead)
sched_latency_ns=6000000            # Target latency (6ms)
sched_min_granularity_ns=750000     # Minimum time slice (0.75ms)
sched_wakeup_granularity_ns=1000000 # Wakeup preemption threshold

Time slice calculation:

time_slice = sched_latency * (se->load.weight / cfs_rq->load.weight);
if (time_slice < sched_min_granularity)
    time_slice = sched_min_granularity;
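
When there are many runnable tasks, a 6 ms period cannot give everyone at least the minimum granularity, so the kernel stretches the period to roughly nr_running x sched_min_granularity before splitting it by weight. The sketch below (standalone C with the default values hard-coded, combining the pseudocode above with that period rule) is an approximation of the kernel's sched_slice()/__sched_period() logic, not a copy of it:

#include <stdio.h>
#include <stdint.h>

#define SCHED_LATENCY_NS         6000000ULL  /* 6 ms target latency   */
#define SCHED_MIN_GRANULARITY_NS  750000ULL  /* 0.75 ms minimum slice */

/* Scheduling period: stretched once tasks would otherwise get < min granularity */
static uint64_t sched_period(unsigned int nr_running)
{
    unsigned int nr_latency = SCHED_LATENCY_NS / SCHED_MIN_GRANULARITY_NS;  /* 8 */

    if (nr_running > nr_latency)
        return (uint64_t)nr_running * SCHED_MIN_GRANULARITY_NS;
    return SCHED_LATENCY_NS;
}

/* A task's slice is its weight's share of the period, with a lower bound */
static uint64_t sched_slice(uint64_t weight, uint64_t total_weight,
                            unsigned int nr_running)
{
    uint64_t slice = sched_period(nr_running) * weight / total_weight;

    if (slice < SCHED_MIN_GRANULARITY_NS)
        slice = SCHED_MIN_GRANULARITY_NS;
    return slice;
}

int main(void)
{
    /* Three tasks with the nice -5 / 0 / 5 weights used in Example 1 below */
    uint64_t total = 3121 + 1024 + 335;

    printf("nice -5 slice: %llu ns\n", (unsigned long long)sched_slice(3121, total, 3));
    printf("nice  0 slice: %llu ns\n", (unsigned long long)sched_slice(1024, total, 3));
    printf("nice  5 slice: %llu ns\n", (unsigned long long)sched_slice(335,  total, 3));
    return 0;
}

With Example 1's weights, the highest-priority task receives by far the largest slice, while the nice 5 task falls to the 0.75 ms floor.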

CFS Algorithm Workflow

Process Selection Algorithm

The heart of CFS lies in its simple yet effective process selection:

// Simplified: pick the leftmost (lowest-vruntime) entity in the tree
static struct sched_entity *pick_next_entity(struct cfs_rq *cfs_rq)
{
    // __pick_first_entity() returns the cached leftmost node, so this is O(1)
    struct sched_entity *se = __pick_first_entity(cfs_rq);
    if (se) {
        // Update wait-time statistics
        update_stats_wait_end(cfs_rq, se);
        // The running entity is kept out of the tree and re-inserted
        // (with its updated vruntime) when it is preempted or yields
        __dequeue_entity(cfs_rq, se);
    }
    return se;
}

Practical Examples

Example 1: Basic CFS Behavior

Let’s trace how CFS handles three processes with different nice values:

# Create test processes with different priorities
nice -n 0 ./cpu_intensive_task &   # Normal priority
nice -n 5 ./cpu_intensive_task &   # Lower priority  
nice -n -5 ./cpu_intensive_task &  # Higher priority

Process weights (based on nice values):

  • Nice -5: weight = 3121 (higher weight = more CPU time)
  • Nice 0: weight = 1024 (baseline)
  • Nice 5: weight = 335 (lower weight = less CPU time)

Time     Process (Nice)    vruntime Before    Time Slice    vruntime After
0ms      Process A (-5)    0                  4.1ms         1.34ms
4.1ms    Process B (0)     0                  2.4ms         2.4ms
6.5ms    Process C (5)     0                  1.2ms         3.66ms
7.7ms    Process A (-5)    1.34ms             4.1ms         2.68ms

Example 2: Interactive Process Handling

CFS provides excellent responsiveness for interactive processes through sleeper fairness:

// Simplified: choose a starting vruntime for a newly created or newly woken entity
static void place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
{
    u64 vruntime = cfs_rq->min_vruntime;

    // Sleeper fairness: a task that just woke up gets a bounded vruntime
    // credit (about one scheduling latency; halved when GENTLE_FAIR_SLEEPERS
    // is enabled) so it can preempt quickly, but it cannot bank unlimited
    // credit while sleeping
    if (!initial)
        vruntime -= sysctl_sched_latency;

    // Never move an entity's vruntime backwards
    se->vruntime = max_vruntime(se->vruntime, vruntime);
}

Example 3: Monitoring CFS in Action

You can observe CFS behavior using various tools:

# View scheduler statistics
cat /proc/sched_debug

# Monitor process scheduling with perf
perf sched record -a sleep 10
perf sched latency

# Check individual process statistics
cat /proc/PID/sched

Sample output from /proc/PID/sched:

se.exec_start                                :      12345.678901
se.vruntime                                  :           123.456
se.sum_exec_runtime                          :          1234.567
nr_switches                                  :               145
nr_voluntary_switches                        :               120
nr_involuntary_switches                      :                25
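
These files are plain text, so the same numbers are easy to read programmatically (note that /proc/PID/sched is only available when the kernel is built with CONFIG_SCHED_DEBUG). A minimal C reader that prints the vruntime-related lines for the current process:

#include <stdio.h>
#include <string.h>

/* Print the vruntime- and runtime-related lines of /proc/self/sched */
int main(void)
{
    char line[256];
    FILE *f = fopen("/proc/self/sched", "r");

    if (!f) {
        perror("fopen /proc/self/sched");
        return 1;
    }
    while (fgets(line, sizeof(line), f)) {
        if (strstr(line, "vruntime") || strstr(line, "exec_runtime"))
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}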

CFS vs Other Scheduling Algorithms

Algorithm        Time Complexity    Fairness     Interactive Response    Scalability
Round Robin      O(1)               Poor         Poor                    Good
Priority Queue   O(1)               Poor         Good                    Good
CFS              O(log n)           Excellent    Excellent               Very Good

Advanced CFS Features

Group Scheduling

CFS supports hierarchical scheduling through control groups (cgroups), allowing fair scheduling between groups of processes:

# Create CPU cgroup
mkdir /sys/fs/cgroup/cpu/high_priority
echo 2048 > /sys/fs/cgroup/cpu/high_priority/cpu.shares

# Assign process to group
echo $PID > /sys/fs/cgroup/cpu/high_priority/cgroup.procs

Load Balancing

On multi-core systems, CFS performs load balancing to distribute processes across CPU cores:

// Simplified: a CPU that has run out of runnable tasks pulls work from the
// busiest runqueue; the scheduler tick also rebalances periodically
if (this_rq->nr_running == 0) {
    // Migrate a task from the busiest CPU's runqueue to this one
    pull_task(busiest_rq, this_rq, target_task);
}
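
Load balancing happens entirely inside the kernel, but its effect is visible from userspace: an unpinned CPU-bound thread is occasionally migrated to another core. The small sketch below uses sched_getcpu(3) (a glibc call, hence _GNU_SOURCE) to report each migration while it spins:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Spin for a while and report every time the scheduler moves us to a new CPU */
int main(void)
{
    int last = sched_getcpu();
    printf("started on CPU %d\n", last);

    for (unsigned long i = 0; i < 2000000000UL; i++) {
        if ((i & 0xFFFFF) == 0) {            /* check occasionally, not every loop */
            int cpu = sched_getcpu();
            if (cpu != last) {
                printf("migrated: CPU %d -> %d\n", last, cpu);
                last = cpu;
            }
        }
    }
    return 0;
}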

Tuning CFS Performance

Key Parameters

# Target latency - lower values improve responsiveness
echo 4000000 > /proc/sys/kernel/sched_latency_ns

# Minimum granularity - prevents thrashing
echo 500000 > /proc/sys/kernel/sched_min_granularity_ns  

# Migration cost - affects load balancing
echo 250000 > /proc/sys/kernel/sched_migration_cost_ns

Workload-Specific Tuning

Interactive workloads:

  • Decrease sched_latency_ns for better responsiveness
  • Decrease sched_wakeup_granularity_ns so waking interactive tasks can preempt the running task sooner

Batch workloads:

  • Increase sched_latency_ns to reduce context switching overhead
  • Increase sched_min_granularity_ns so each task runs longer before being preempted

Common CFS Issues and Solutions

High Context Switch Rate

Problem: Too many processes competing for CPU

Solution: Increase minimum granularity or use CPU affinity

# Check context switch rate
vmstat 1

# Set CPU affinity to reduce migrations
taskset -c 0,1 ./my_process
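
The same pinning can be done from inside a program with sched_setaffinity(2). This sketch restricts the calling process to CPUs 0 and 1, mirroring the taskset command above:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(0, &set);   /* allow CPU 0 */
    CPU_SET(1, &set);   /* allow CPU 1 */

    /* pid 0 means "the calling process" */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    /* ... run the workload here; the scheduler keeps it on CPUs 0-1 ... */
    return 0;
}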

Poor Interactive Performance

Problem: Batch processes affecting interactive response

Solution: Use nice values or control groups

# Lower priority for batch jobs
nice -n 19 ./batch_job

# Higher priority for interactive apps
nice -n -10 ./interactive_app
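
A program can also adjust its own nice value at run time with setpriority(2); as with the nice command, raising priority (a negative nice value) requires root or CAP_SYS_NICE. A minimal sketch that drops itself to nice 19:

#include <sys/resource.h>
#include <stdio.h>
#include <errno.h>

int main(void)
{
    /* Lower this process's priority to nice 19, as a batch job would */
    if (setpriority(PRIO_PROCESS, 0, 19) == -1) {
        perror("setpriority");
        return 1;
    }

    errno = 0;                                    /* -1 is a valid nice value, */
    int nice_now = getpriority(PRIO_PROCESS, 0);  /* so check errno instead    */
    if (nice_now == -1 && errno != 0) {
        perror("getpriority");
        return 1;
    }
    printf("running at nice %d\n", nice_now);

    /* ... batch work here ... */
    return 0;
}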

CFS Implementation Details

Virtual Runtime Calculation

The precise vruntime calculation accounts for different process weights:

static void update_curr(struct cfs_rq *cfs_rq)
{
    struct sched_entity *curr = cfs_rq->curr;
    u64 now = rq_clock_task(rq_of(cfs_rq));
    u64 delta_exec = now - curr->exec_start;   // real time run since the last update

    if (unlikely((s64)delta_exec <= 0))
        return;

    curr->exec_start = now;
    curr->sum_exec_runtime += delta_exec;      // total real CPU time consumed
    // Scale by NICE_0_LOAD / weight: heavier (higher-priority) entities
    // accumulate vruntime more slowly
    curr->vruntime += calc_delta_fair(delta_exec, curr);
    // Keep the runqueue's min_vruntime floor moving forward
    update_min_vruntime(cfs_rq);
}

Tree Operations

Efficient red-black tree operations maintain O(log n) complexity:

// Enqueue an entity (process) into the tree
static void enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
    update_curr(cfs_rq);                 // charge the current task before touching the tree
    place_entity(cfs_rq, se, 0);         // pick a starting vruntime (sleeper fairness)
    account_entity_enqueue(cfs_rq, se);  // update runqueue load accounting
    __enqueue_entity(cfs_rq, se);        // insert into the red-black tree, keyed by vruntime
    se->on_rq = 1;                       // mark the entity as runnable
}

The Completely Fair Scheduler represents a significant advancement in process scheduling, providing excellent fairness and interactive performance while maintaining good scalability. Its elegant design based on virtual runtime and red-black trees has made it the foundation of Linux process scheduling for over a decade, continuously evolving to meet the demands of modern computing workloads.

Understanding CFS helps system administrators optimize Linux systems and developers write more efficient applications that work harmoniously with the kernel’s scheduling decisions.