Python Concurrency

Photo by Agathe on Unsplash
In Python, Process, Thread, and Async are three different concurrency mechanisms, but their behavior and applicable scenarios are often confused. In particular, when using FastAPI, a lack of understanding of its execution model can easily lead to code that blocks the whole system. This article starts from these fundamental concepts and explains how each mechanism behaves in Python, both in principle and in practical applications.

Process

A process is an OS-level execution unit. When a program starts, the system creates a process and allocates an independent memory space and execution environment. Each process contains a Python interpreter and at least one thread (the main thread), and is scheduled by the operating system.

A key characteristic of processes is memory isolation. Different processes cannot directly share data, so they are completely independent in execution. This improves system stability, but also means that data must be exchanged through explicit mechanisms such as queues or pipes, rather than direct variable access.

In Python, new processes can be created using the multiprocessing module. When a new process is created, it runs independently from the main program and has its own PID, indicating that it is a separate execution unit.

from multiprocessing import Process
import os

def worker():
    print(f"Worker PID: {os.getpid()}")

if __name__ == "__main__":
    print(f"Main PID: {os.getpid()}")

    p = Process(target=worker)
    p.start()
    p.join()


# Output
Main PID: 1633
Worker PID: 1641

Since each process runs its own Python interpreter, it also has its own GIL. Multiple processes can therefore execute Python code at the same time on different CPU cores without contending for a single lock. This makes processes particularly suitable for CPU-bound tasks: on a multi-core machine, they achieve true parallelism.

from multiprocessing import Process

def compute(seq):
    for i in range(10**7):
        print(f"Process {seq} is running, iteration {i}")

if __name__ == "__main__":
    processes = []

    for seq in range(4):
        p = Process(target=compute, args=(seq,))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()


# Output
Process 3 is running, iteration 308566Process 2 is running, iteration 307463
Process 2 is running, iteration 307464

Process 3 is running, iteration 308567
Process 2 is running, iteration 307465Process 1 is running, iteration 309200

Process 2 is running, iteration 307466Process 1 is running, iteration 309201


Process 1 is running, iteration 309202Process 0 is running, iteration 306115
Process 3 is running, iteration 308568

Process 0 is running, iteration 306116Process 3 is running, iteration 308569

...

When data needs to be transferred between processes, IPC (inter-process communication) mechanisms such as queues must be used. While this approach is safe, it introduces additional overhead due to data transfer and serialization.

from multiprocessing import Process, Queue

def worker(q):
    q.put("hello")

if __name__ == "__main__":
    q = Queue()

    p = Process(target=worker, args=(q,))
    p.start()

    print(q.get())
    p.join()


# Output
hello
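Pipes, the other IPC mechanism mentioned earlier, work in a similar way. Here is a minimal sketch with multiprocessing.Pipe(), which returns two connected Connection objects that can both send and receive by default. The function names echo_worker and round_trip are hypothetical.

```python
from multiprocessing import Pipe, Process


def echo_worker(conn):
    # Runs in the child: receive one message, reply with it upper-cased.
    msg = conn.recv()
    conn.send(msg.upper())
    conn.close()


def round_trip(text):
    # Pipe() returns two connected ends of a duplex channel.
    parent_conn, child_conn = Pipe()
    p = Process(target=echo_worker, args=(child_conn,))
    p.start()
    parent_conn.send(text)      # the object is pickled on the way out
    reply = parent_conn.recv()  # blocks until the child answers
    p.join()
    return reply


if __name__ == "__main__":
    print(round_trip("hello"))
```

As with the queue, every message is serialized, so pipes suit small messages rather than large shared data.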

Overall, processes are suitable for independent and computation-intensive tasks, especially when leveraging multiple CPU cores. However, due to their higher creation cost and less convenient data sharing, they are typically used for coarse-grained workloads rather than frequent, fine-grained operations.
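For coarse-grained CPU-bound work, concurrent.futures.ProcessPoolExecutor is a common alternative to creating one Process per task: it reuses a fixed set of worker processes and returns results directly. A minimal sketch; sum_of_squares and the chunking scheme are illustrative assumptions, not a prescribed pattern.

```python
from concurrent.futures import ProcessPoolExecutor


def sum_of_squares(bounds):
    # Pure-Python CPU work; each call runs in a worker process with its own GIL.
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n, chunks=4):
    # Split [0, n) into a few large chunks so per-task overhead stays small.
    step = n // chunks
    bounds = [
        (k * step, n if k == chunks - 1 else (k + 1) * step)
        for k in range(chunks)
    ]
    with ProcessPoolExecutor(max_workers=chunks) as pool:
        # map() ships each tuple to a worker and yields results in input order.
        return sum(pool.map(sum_of_squares, bounds))


if __name__ == "__main__":
    n = 2_000_000
    print(parallel_sum_of_squares(n) == sum_of_squares((0, n)))
```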

Thread and GIL

A thread is an execution unit within a process. Threads within the same process share memory and resources, allowing direct access to shared data structures and variables. This makes data exchange straightforward, but also requires careful handling of synchronization to avoid issues such as race conditions.

In Python, each process starts with a main thread, and additional threads can be created using threading.Thread(). These threads are scheduled by the operating system and can be assigned to different CPU cores at the system level.

In practice, threads run concurrently: when one thread waits in a call such as time.sleep(), it gives up execution, and other threads can run in the meantime.

import threading
import time

def worker(name):
    print(f"{name} start")
    time.sleep(2)
    print(f"{name} end")

t1 = threading.Thread(target=worker, args=("A",))
t2 = threading.Thread(target=worker, args=("B",))

t1.start()
t2.start()

t1.join()
t2.join()

# Output
A start
B start
B end
A end

In CPython, however, thread behavior is affected by the GIL (Global Interpreter Lock). The GIL ensures that only one thread per process can execute Python bytecode at a time. This limitation is not imposed by the CPU, but by the design of the Python interpreter.

As a result, for CPU-bound tasks written in pure Python, multiple threads cannot execute simultaneously. Instead, they take turns acquiring the GIL and executing. Even on a multi-core CPU, threads cannot achieve true parallel execution under this constraint.

In the following code, even if multiple threads are created, the overall execution time usually does not decrease significantly, because the computation is still constrained by the GIL.

import threading

def compute(seq):
    for i in range(10**7):
        print(f"Thread {seq} is running, iteration {i}")

threads = []
for seq in range(2):
    t = threading.Thread(target=compute, args=(seq,))
    threads.append(t)
    t.start()

for t in threads:
    t.join()


# Output
Thread 0 is running, iteration 590839
Thread 0 is running, iteration 590840
Thread 0 is running, iteration 590841
Thread 0 is running, iteration 590842
Thread 0 is running, iteration 590843
Thread 0 is running, iteration 590844
Thread 0 is running, iteration 590845
Thread 0 is running, iteration 590846
Thread 0 is running, iteration 590847
Thread 0 is running, iteration 590848
Thread 1 is running, iteration 618463
Thread 1 is running, iteration 618464

Thread 0 is running, iteration 590849
Thread 0 is running, iteration 590850
Thread 0 is running, iteration 590851
Thread 0 is running, iteration 590852
Thread 0 is running, iteration 590853
Thread 0 is running, iteration 590854
Thread 0 is running, iteration 590855
Thread 0 is running, iteration 590856
Thread 0 is running, iteration 590857Thread 1 is running, iteration 618465
Thread 1 is running, iteration 618466
Thread 1 is running, iteration 618467

However, the GIL does not completely prevent concurrency. During I/O operations such as file access or network requests, Python releases the GIL, allowing other threads to continue execution. This makes threads effective for I/O-bound workloads, where execution can overlap during waiting periods.
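The overlap is easy to measure. In this minimal sketch, time.sleep() stands in for a blocking I/O call (an assumption made for reproducibility: real file or network I/O also waits without holding the GIL, but its timing varies). Four 0.2 s waits in four threads should finish in roughly 0.2 s in total rather than 0.8 s. The names fake_io and run_overlapped are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(task_id, delay=0.2):
    # time.sleep() releases the GIL while waiting, as blocking I/O does.
    time.sleep(delay)
    return task_id


def run_overlapped(n_tasks=4, delay=0.2):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        results = list(pool.map(lambda i: fake_io(i, delay), range(n_tasks)))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = run_overlapped()
    # elapsed is close to a single delay because the waits overlap
    print(results, f"{elapsed:.2f}s")
```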

Because threads share memory, synchronization mechanisms such as locks are required when multiple threads modify shared data. Without proper synchronization, the result may be incorrect due to concurrent modifications.

import threading

counter = 0
lock = threading.Lock()

def increment():
    global counter
    for _ in range(100000):
        with lock:
            counter += 1

threads = []
for _ in range(2):
    t = threading.Thread(target=increment)
    threads.append(t)
    t.start()

for t in threads:
    t.join()

print(counter)


# Output
200000

Overall, threads are suitable for I/O-bound tasks that require shared state. They are lightweight and efficient for concurrency, but cannot provide true parallelism for CPU-bound workloads. The GIL is the key factor that determines how threads behave under different types of tasks.

Async

Async is an event-driven concurrency model designed to efficiently handle a large number of tasks within a single thread. Unlike threads, which rely on the OS for scheduling, async tasks are scheduled cooperatively by the program itself. This model is commonly referred to as cooperative multitasking.

In Python, async def defines a coroutine function. Calling it does not execute its body; instead, it returns a coroutine object. Execution only begins when the coroutine is awaited or wrapped in a task that the event loop schedules.

The following example shows that calling a coroutine function without awaiting it does not run it; it must be awaited to actually execute.

import asyncio

async def run():
    print("running")

async def main():
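    run()        # only creates a coroutine object; never runs (Python emits a RuntimeWarning)
    await run()  # actually executes the body

asyncio.run(main())


# Output
running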

An event loop is the core component of async and can be viewed as a task scheduler. When a coroutine reaches an await statement, it yields control back to the event loop, which then switches to another ready task. Once the awaited operation completes, the event loop resumes the original coroutine.

Async concurrency is achieved through this cooperative switching mechanism. Multiple tasks can be interleaved within a single thread, allowing efficient handling of I/O-bound workloads.

import asyncio

async def worker(name):
    print(f"{name} start")
    await asyncio.sleep(2)
    print(f"{name} end")

async def main():
    await asyncio.gather(
        worker("A"),
        worker("B")
    )

asyncio.run(main())


# Output
A start
B start
A end
B end

In addition to await, another common operation is asyncio.create_task(). This function wraps a coroutine into a task and schedules it on the event loop without waiting for its result.

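In this example, worker() is scheduled for execution, but main() does not wait for it to complete. When main() returns, asyncio.run() cancels any tasks that are still pending, so worker() never prints "end". Background execution like this is useful, but without proper management, tasks may be cancelled or lost, and errors may go unnoticed.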

import asyncio

async def worker():
    print("start")
    await asyncio.sleep(2)
    print("end")

async def main():
    asyncio.create_task(worker())
    print("main done")

asyncio.run(main())
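One way to manage such tasks follows the asyncio documentation's advice to keep a reference to every task returned by create_task(), because the event loop itself holds only weak references. This minimal sketch stores tasks in a set, removes them when they finish, and awaits them before main() returns. The background_tasks set and the log list are illustrative choices.

```python
import asyncio

# Strong references keep pending tasks from being garbage-collected mid-run.
background_tasks = set()


async def worker(log):
    log.append("start")
    await asyncio.sleep(0.1)
    log.append("end")


async def main():
    log = []
    task = asyncio.create_task(worker(log))
    background_tasks.add(task)
    # Drop the reference automatically once the task finishes.
    task.add_done_callback(background_tasks.discard)
    log.append("main done")
    # Wait for outstanding tasks so asyncio.run() does not cancel them on exit.
    await asyncio.gather(*background_tasks)
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

On Python 3.11 and later, asyncio.TaskGroup provides the same "wait before leaving" guarantee with less bookkeeping.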

Async is particularly effective for I/O-bound scenarios such as handling large numbers of HTTP requests or database queries. However, it is highly sensitive to blocking operations. If a blocking operation is executed within a coroutine, the entire event loop is blocked, and all tasks stop progressing.

This kind of code blocks the entire event loop, eliminating the advantages of async.

async def bad():
    for _ in range(10**8):
        pass  # blocking CPU work
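The effect of blocking can be observed with a hypothetical heartbeat() coroutine that tries to record a tick every 0.05 s. While another coroutine calls time.sleep(), the loop cannot give the heartbeat a turn; with await asyncio.sleep(), it keeps ticking. A minimal sketch; the intervals are arbitrary.

```python
import asyncio
import time


async def heartbeat(ticks, stop):
    # Records a tick each time the event loop gives it a turn.
    while not stop.is_set():
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.05)


async def measure(blocking):
    ticks, stop = [], asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    await asyncio.sleep(0)  # let the heartbeat record its first tick
    if blocking:
        time.sleep(0.5)           # holds the thread: no other task can run
    else:
        await asyncio.sleep(0.5)  # yields to the loop while waiting
    stop.set()
    await hb
    return len(ticks)


if __name__ == "__main__":
    print("blocking:", asyncio.run(measure(True)))
    print("non-blocking:", asyncio.run(measure(False)))
```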

Because async runs in a single thread, it does not provide parallelism. All tasks still execute in the same thread and only achieve concurrency through task switching. For true parallel execution, processes or other mechanisms are required.
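When CPU-bound work has to be triggered from async code, one option is to pass a ProcessPoolExecutor to loop.run_in_executor(), so the computation runs in other processes while the event loop stays responsive. A minimal sketch; cpu_heavy is a hypothetical stand-in for real work.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


def cpu_heavy(n):
    # Pure-Python loop: would block the event loop if run inside a coroutine.
    total = 0
    for i in range(n):
        total += i % 7
    return total


async def main(inputs):
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # Each call runs in a worker process; the loop only awaits the futures.
        futures = [loop.run_in_executor(pool, cpu_heavy, n) for n in inputs]
        return await asyncio.gather(*futures)


if __name__ == "__main__":
    print(asyncio.run(main([10**6, 2 * 10**6])))
```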

Overall, async is well-suited for high-concurrency, I/O-heavy systems such as web servers or API services. Its efficiency depends on keeping all operations non-blocking and maintaining control over task execution and lifecycle.

Running Async in a Thread

In some cases, it is necessary to run async code inside a separate thread. This is not because async is insufficient, but because the system may already be structured around threads, or certain logic needs to be isolated in a different execution context.

To make this work, it is important to understand that an event loop is bound to a thread. Each thread must have its own event loop in order to execute coroutines. This means a new thread cannot directly use an existing event loop and must create its own.

The simplest approach is to call asyncio.run() inside the thread. This function creates an event loop, runs the coroutine, and closes the loop automatically. This is suitable for one-time execution.

In the following example, the new thread creates its own event loop and executes async_worker() within it. The entire async lifecycle is confined to that thread, so it does not interfere with any event loop in the main thread.

import threading
import asyncio

async def async_worker():
    print("async start")
    await asyncio.sleep(1)
    print("async end")

def thread_worker():
    asyncio.run(async_worker())

t = threading.Thread(target=thread_worker)
t.start()
t.join()

If more fine-grained control over the event loop is required, such as reusing it within the same thread, it can be created manually. In this case, a new event loop must first be created and set as the default loop for the current thread, after which the coroutine can be executed.

import threading
import asyncio

async def async_worker():
    print("async start")
    await asyncio.sleep(1)
    print("async end")

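def thread_worker():
    # Create a loop for this thread and make it the thread's current loop.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        loop.run_until_complete(async_worker())
    finally:
        # Always release the loop's resources, even if the coroutine raises.
        loop.close()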

t = threading.Thread(target=thread_worker)
t.start()
t.join()

This approach makes the creation and destruction of the event loop more explicit, and it also makes it easier to schedule multiple async tasks within the same thread.
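Reusing a manually created loop could look like the following minimal sketch: the same loop handles two run_until_complete() calls, the first of which gathers two coroutines, and it is closed only once at the end. The coroutines fetch and first_batch and the results list are hypothetical placeholders for real async work.

```python
import asyncio
import threading


async def fetch(name, delay):
    await asyncio.sleep(delay)
    return f"{name} done"


async def first_batch():
    # Two coroutines interleaved on the same loop.
    return await asyncio.gather(fetch("A", 0.1), fetch("B", 0.1))


def thread_worker(results):
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        results.extend(loop.run_until_complete(first_batch()))
        # The second call reuses the same loop instead of creating a new one.
        results.append(loop.run_until_complete(fetch("C", 0.05)))
    finally:
        loop.close()


if __name__ == "__main__":
    results = []
    t = threading.Thread(target=thread_worker, args=(results,))
    t.start()
    t.join()
    print(results)
```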

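A key constraint is that event loops are not thread-safe. Once a loop is running in a thread, its tasks and most of its methods should only be used from that thread; calling them from another thread can cause race conditions and hard-to-diagnose failures. The APIs intended for cross-thread use are loop.call_soon_threadsafe() and asyncio.run_coroutine_threadsafe().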
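When code in one thread needs work done on a loop that runs in another thread, asyncio.run_coroutine_threadsafe() submits the coroutine safely and returns a concurrent.futures.Future. A minimal sketch, assuming a long-lived loop started in a background thread by a hypothetical start_background_loop() helper; add is a placeholder coroutine.

```python
import asyncio
import threading


async def add(a, b):
    await asyncio.sleep(0.05)
    return a + b


def start_background_loop():
    loop = asyncio.new_event_loop()
    # The loop runs forever in its own thread; only that thread drives it.
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    return loop, thread


if __name__ == "__main__":
    loop, thread = start_background_loop()
    # Submit from the main thread; result() blocks this thread, not the loop.
    future = asyncio.run_coroutine_threadsafe(add(2, 3), loop)
    print(future.result(timeout=1))
    # Ask the loop to stop from its own thread, then clean up.
    loop.call_soon_threadsafe(loop.stop)
    thread.join()
    loop.close()
```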

In this model, the thread acts as an execution container, while the event loop manages async tasks within that container. This combination is useful when isolation is required while still benefiting from async concurrency.

Handling Blocking Operations

In concurrent systems, a key concern is how to handle blocking operations. A blocking operation is one that occupies a thread for a significant amount of time and does not yield control, such as file I/O, network requests, database queries, or long-running computations.

In the thread model, blocking operations are acceptable. When a thread is blocked on I/O, the OS can schedule other threads to run, allowing the system to maintain concurrency. This is why threads are suitable for I/O-bound workloads.

In this example, the file read is a blocking operation, but it only blocks that thread and does not affect other threads.

import threading

def read_file():
    with open("data.txt", "r") as f:
        return f.read()

t = threading.Thread(target=read_file)
t.start()
t.join()
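Note that threading.Thread discards the target's return value, so the text returned by read_file() in the example above is lost. When results are needed, concurrent.futures.ThreadPoolExecutor returns them through futures. A minimal sketch; temporary files stand in for data.txt, and read_all is a hypothetical helper.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_file(path):
    # Blocking read; it only blocks the worker thread that runs it.
    with open(path, "r") as f:
        return f.read()


def read_all(paths, max_workers=4):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as paths.
        return list(pool.map(read_file, paths))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(3):
            path = os.path.join(tmp, f"data{i}.txt")
            with open(path, "w") as f:
                f.write(f"file {i}")
            paths.append(path)
        print(read_all(paths))
```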

In the async model, the situation is different. Since async typically runs in a single thread, all tasks are managed by a single event loop. If a blocking operation is executed within an async function, the entire event loop is blocked, preventing all other tasks from running.

This code blocks the entire event loop, so this pattern should be avoided in an async environment.

async def read_file():
    with open("data.txt", "r") as f:
        return f.read()  # blocking operation

When it is necessary to use an existing blocking function, a common approach is to offload it to a thread pool for execution. This allows the event loop to remain non-blocking while still using the original synchronous code.

In this example, blocking_read() is executed in a thread pool instead of blocking the event loop.

import asyncio

def blocking_read():
    with open("data.txt", "r") as f:
        return f.read()

async def main():
    loop = asyncio.get_running_loop()
    data = await loop.run_in_executor(None, blocking_read)
    print(data)

asyncio.run(main())
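Since Python 3.9, asyncio.to_thread() offers a shorter form of the same idea: it runs a sync function in the loop's default thread pool and accepts the function's arguments directly. A minimal sketch; temporary files replace data.txt so it runs anywhere, and read_both is a hypothetical helper.

```python
import asyncio
import os
import tempfile


def blocking_read(path):
    with open(path, "r") as f:
        return f.read()


async def read_both(path_a, path_b):
    # Both reads run in worker threads; the event loop stays free meanwhile.
    return await asyncio.gather(
        asyncio.to_thread(blocking_read, path_a),
        asyncio.to_thread(blocking_read, path_b),
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        a, b = os.path.join(tmp, "a.txt"), os.path.join(tmp, "b.txt")
        for path, text in ((a, "alpha"), (b, "beta")):
            with open(path, "w") as f:
                f.write(text)
        print(asyncio.run(read_both(a, b)))
```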

FastAPI and Sync Endpoints

In FastAPI, when an endpoint is defined using a normal def, the function is not executed directly in the event loop. Instead, FastAPI (via Starlette) delegates the sync function to a thread pool to avoid blocking the main event loop. The overall request lifecycle is still driven by Uvicorn, but the actual business logic runs in a separate thread.

This means that each synchronous endpoint call effectively consumes one worker thread from the thread pool. If the function includes blocking operations, such as file I/O or network requests, those operations only block that specific thread and do not affect the handling of other requests.

In this example, time.sleep() is a blocking operation, but since the entire function runs in the thread pool, it does not block the event loop. However, this does not mean that long-running tasks should be executed in this context.

from fastapi import FastAPI
import time

app = FastAPI()

@app.post("/job")
def create_job():
    time.sleep(5)  # blocking operation
    return {"status": "done"}

If a request takes several minutes to complete, its thread stays occupied the whole time. The thread pool is bounded (Starlette runs sync endpoints through AnyIO, whose default limit is 40 threads), so when several such requests arrive, the pool can quickly be exhausted: new requests to sync endpoints must wait for a free thread, and overall throughput drops. In addition, HTTP clients and proxies typically enforce timeouts, so long-running requests are likely to be cut off before they complete.

Therefore, even in synchronous endpoints, long-running tasks should not be executed directly. A more appropriate approach is for the endpoint to only create a job and enqueue it, while a separate worker process handles the actual execution. This design prevents threads from being held for extended periods and allows the API to respond quickly.

from fastapi import FastAPI
from uuid import uuid4

app = FastAPI()
queue = []  # in-memory placeholder; a real system needs a queue a separate worker process can read

@app.post("/job")
def create_job():
    job_id = str(uuid4())
    queue.append(job_id)
    return {"job_id": job_id}
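The consumer side of this design can be sketched with the standard library alone. This minimal sketch uses multiprocessing.Queue as a stand-in for an external job queue or broker (an assumption for illustration: it only works when producer and worker share a parent process, which a real deployment would not rely on). process_job, worker_loop, run_demo, and the None sentinel are hypothetical.

```python
import uuid
from multiprocessing import Process, Queue


def process_job(job_id):
    # Placeholder for the long-running work the endpoint must not do itself.
    return f"{job_id}: done"


def worker_loop(jobs, results):
    # Runs in its own process: pull jobs until the None sentinel arrives.
    while True:
        job_id = jobs.get()
        if job_id is None:
            break
        results.put(process_job(job_id))


def run_demo(n_jobs=3):
    jobs, results = Queue(), Queue()
    worker = Process(target=worker_loop, args=(jobs, results))
    worker.start()
    # This is the endpoint's part: enqueue job IDs and return immediately.
    job_ids = [str(uuid.uuid4()) for _ in range(n_jobs)]
    for job_id in job_ids:
        jobs.put(job_id)
    jobs.put(None)  # tell the worker to exit
    # Drain results before join() so the worker's queue buffer can flush.
    outputs = [results.get(timeout=5) for _ in job_ids]
    worker.join()
    return job_ids, outputs


if __name__ == "__main__":
    ids, outputs = run_demo()
    print(outputs)
```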

FastAPI and Async Endpoints

When an endpoint is defined using async def, the function runs directly on the event loop instead of being delegated to a thread pool. This means the function must be non-blocking; otherwise, it will affect the concurrency of the entire system.

In an async endpoint, all tasks share the same event loop. When a coroutine reaches an await, it yields control so that other tasks can continue executing. However, if a blocking operation occurs during execution, the entire event loop is blocked, causing all requests to stall.

In this example, although the code is written using async syntax, it is still effectively synchronous and blocking. When time.sleep() is executed, the event loop cannot process other requests, which severely degrades system performance.

from fastapi import FastAPI
import time

app = FastAPI()

@app.post("/job")
async def create_job():
    time.sleep(5)  # blocking call: freezes the event loop
    return {"status": "done"}

Even when using asyncio.create_task() to run a task in the background, it is not suitable for handling long-running or critical workloads. The issue is that the task’s lifecycle is still tied to the FastAPI process. Once the service restarts, these tasks are lost, and there is no state tracking or retry mechanism.

import asyncio
from fastapi import FastAPI

app = FastAPI()

async def long_task():
    await asyncio.sleep(10)

@app.post("/job")
async def create_job():
    asyncio.create_task(long_task())
    return {"status": "started"}

Therefore, in async endpoints, the principle for handling long-running tasks is the same as in synchronous endpoints: they should not be executed directly, but instead delegated to separate worker processes. This design ensures that the event loop remains lightweight and non-blocking.

Common APIs

This section summarizes commonly used APIs.

Process (multiprocessing)

  • multiprocessing.Process(): Create a new process.
  • Process.start(): Start execution.
  • Process.join(): Wait for completion.

Thread (threading)

  • threading.Thread(): Create a new thread.
  • Thread.start(): Start execution.
  • Thread.join(): Wait for completion.
  • threading.Lock(): Synchronize access to shared data.

Async (asyncio)

  • asyncio.run(): Create and run an event loop.
  • asyncio.create_task(): Schedule a coroutine without waiting.
  • asyncio.gather(): Wait for multiple coroutines.
  • asyncio.get_running_loop(): Get the current event loop.
  • loop.run_in_executor(): Offload blocking work to a thread pool.

Thread + Async

  • asyncio.run(): Create an event loop in a thread.
  • asyncio.new_event_loop(): Create a new event loop.
  • asyncio.set_event_loop(): Bind a loop to the current thread.

Conclusion

Process, Thread, and Async represent different levels of concurrency, each suited for different types of problems. The GIL is not simply a performance limitation, but a key factor that determines how threads behave. When using FastAPI, it is important to understand how the event loop works and avoid running long tasks within the request lifecycle. By delegating work to external workers and properly controlling concurrency, it is possible to build a stable and scalable system.
