A Practical Tutorial on Python Concurrent Programming
I. Concurrency Fundamentals
1. Concurrency vs. Parallelism
Definitions:
- Concurrency: tasks take turns executing (time slicing, even on a single core)
- Parallelism: tasks execute simultaneously (on multiple cores)
Concurrency vs. parallelism, illustrated:
Concurrency: [Task A] <-> [Task B] <-> [Task A] <-> [Task B] (time slicing)
Parallelism: [Task A] → running at the same time ← [Task B] (multiple cores)
Table 1. Python concurrency approaches compared
Approach | Module | Best for | Characteristics |
Multithreading | threading | I/O-bound tasks | Shared memory; limited by the GIL |
Multiprocessing | multiprocessing | CPU-bound tasks | Separate memory per process; higher overhead |
Coroutines | asyncio | High-concurrency I/O | Single-threaded async; very efficient |
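The concurrency column of the table can be made concrete with a small sketch: two threads that each sleep briefly interleave their steps on a shared timeline rather than one finishing before the other starts (the exact interleaving order is scheduler-dependent).

```python
import time
from threading import Thread

events = []

def worker(name):
    for step in range(3):
        events.append(f"{name}-{step}")  # list.append is atomic under the GIL
        time.sleep(0.05)                 # yield the CPU so the other thread runs

# Two tasks run concurrently: their steps interleave in `events`
threads = [Thread(target=worker, args=(n,)) for n in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(events)  # e.g. ['A-0', 'B-0', 'A-1', 'B-1', 'A-2', 'B-2']
```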
II. Multithreaded Programming
1. Thread Basics
Syntax:
from threading import Thread
t = Thread(target=func, args=(arg,))
t.start()
t.join()
Example:
import time
from threading import Thread

def download_file(url):
    print(f"Starting download: {url}")
    time.sleep(2)  # simulate an I/O operation
    print(f"Finished download: {url}")

# Create and start the threads
threads = []
for i in range(3):
    t = Thread(target=download_file, args=(f"https://example.com/file{i}.zip",))
    threads.append(t)
    t.start()

# Wait for all threads to finish
for t in threads:
    t.join()
print("All downloads complete")
Notes:
- Threads are well suited to I/O-bound tasks
- Because of the GIL, they are a poor fit for CPU-bound tasks
- Watch out for thread-safety issues when sharing state
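The thread-safety caveat deserves a concrete sketch. Below is a deliberately unsafe read-pause-write update: every thread reads the counter before any thread writes it back, so their increments overwrite each other (a "lost update" race).

```python
import time
from threading import Thread

counter = 0

def unsafe_increment():
    global counter
    current = counter       # read
    time.sleep(0.1)         # meanwhile, the other threads read the same value
    counter = current + 1   # write back: the earlier writes are lost

threads = [Thread(target=unsafe_increment) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # typically 1, not 5: a classic race condition
```

Section VI shows how a Lock around the read-modify-write sequence prevents this.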
III. Multiprocess Programming
1. Using the Process Class
Syntax:
from multiprocessing import Process
p = Process(target=func, args=(arg,))
p.start()
p.join()
Example:
import math
from multiprocessing import Process

def calculate_factorial(n):
    print(f"Computing {n}!")
    result = math.factorial(n)
    print(f"{n}! = {result}")

if __name__ == '__main__':
    numbers = [1000, 2000, 3000]
    processes = []
    for num in numbers:
        p = Process(target=calculate_factorial, args=(num,))
        processes.append(p)
        p.start()
    for p in processes:
        p.join()
    print("All computations complete")
The multiprocessing memory model:
Process A ── its own address space
Process B ── its own address space
Process C ── its own address space
IV. Asynchronous Programming (asyncio)
1. Coroutine Basics
Syntax:
import asyncio
async def coro():
    await some_awaitable
asyncio.run(coro())
Example:
import asyncio

async def fetch_data(url):
    print(f"Fetching {url}")
    await asyncio.sleep(2)  # simulate waiting on I/O
    print(f"Fetched {url}")
    return f"data from {url}"

async def main():
    tasks = [
        fetch_data("https://api.com/data1"),
        fetch_data("https://api.com/data2"),
        fetch_data("https://api.com/data3")
    ]
    results = await asyncio.gather(*tasks)
    print("All results:", results)

asyncio.run(main())
Table 2. Synchronous vs. asynchronous I/O
Feature | Synchronous I/O | Asynchronous I/O |
Thread usage | Blocks the thread | One thread handles many tasks |
Performance | Low (serial) | High (concurrent) |
Complexity | Simple | Requires async/await |
Best for | Simple logic | High-concurrency network requests |
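The performance row of the table can be demonstrated by timing the same five 0.1-second waits run one after another versus overlapped with asyncio.gather (a sketch; real speedups depend on how much of the workload is actually waiting):

```python
import asyncio
import time

async def io_task():
    await asyncio.sleep(0.1)  # stand-in for a network round trip

async def main():
    start = time.perf_counter()
    for _ in range(5):
        await io_task()       # serial: the waits add up (~0.5 s)
    sequential = time.perf_counter() - start

    start = time.perf_counter()
    await asyncio.gather(*(io_task() for _ in range(5)))  # overlapped (~0.1 s)
    concurrent = time.perf_counter() - start

    print(f"sequential: {sequential:.2f}s, concurrent: {concurrent:.2f}s")
    return sequential, concurrent

seq, conc = asyncio.run(main())
```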
V. Thread and Process Pools
1. ThreadPoolExecutor
Syntax:
from concurrent.futures import ThreadPoolExecutor
with ThreadPoolExecutor(max_workers=5) as executor:
    future = executor.submit(func, arg)
    result = future.result()
Example:
from concurrent.futures import ThreadPoolExecutor
import urllib.request

def download(url):
    with urllib.request.urlopen(url) as response:
        return f"{url}: {len(response.read())} bytes"

urls = [
    "https://www.python.org",
    "https://www.google.com",
    "https://www.github.com"
]

with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(download, url) for url in urls]
    for future in futures:
        print(future.result())
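Iterating over futures in submission order, as above, can stall on a slow early future. concurrent.futures.as_completed yields futures as they finish instead; a sketch using a hypothetical work function whose argument controls how long it runs:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def work(n):
    time.sleep(0.1 * n)  # larger n finishes later
    return n

with ThreadPoolExecutor(max_workers=3) as executor:
    # Submitted in the order 3, 1, 2; all three start immediately
    futures = [executor.submit(work, n) for n in (3, 1, 2)]
    finished = [f.result() for f in as_completed(futures)]
print(finished)  # completion order [1, 2, 3], not submission order [3, 1, 2]
```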
2. ProcessPoolExecutor
Syntax:
from concurrent.futures import ProcessPoolExecutor
with ProcessPoolExecutor(max_workers=4) as executor:
    future = executor.submit(func, arg)
    result = future.result()
Example:
from concurrent.futures import ProcessPoolExecutor
import math

def compute_factorial(n):
    return math.factorial(n)

if __name__ == '__main__':
    numbers = [1000, 2000, 3000, 4000]
    with ProcessPoolExecutor() as executor:
        results = executor.map(compute_factorial, numbers)
        for num, result in zip(numbers, results):
            print(f"{num}! has {len(str(result))} digits")
VI. Shared Data and Synchronization
1. Thread-Safe Operations
Table 3. Thread synchronization primitives
Primitive | Purpose | Example usage |
Lock | Mutual exclusion | with lock: |
RLock | Reentrant lock | with rlock: |
Semaphore | Limit concurrent access | semaphore.acquire() |
Queue | Thread-safe queue | queue.put() / queue.get() |
Example:
from threading import Thread, Lock
import time

class BankAccount:
    def __init__(self):
        self.balance = 100
        self.lock = Lock()

    def deposit(self, amount):
        with self.lock:  # the read-modify-write sequence runs atomically
            new_balance = self.balance + amount
            time.sleep(0.1)  # simulate processing delay
            self.balance = new_balance

account = BankAccount()
threads = []
for _ in range(10):
    t = Thread(target=account.deposit, args=(10,))
    threads.append(t)
    t.start()
for t in threads:
    t.join()
print(f"Final balance: {account.balance}")  # correct result: 200
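Table 3 also lists Queue, which often removes the need for explicit locking: the queue itself is thread-safe, so producer and consumer never touch shared state directly. A minimal producer/consumer sketch, using a None sentinel to tell the worker to stop:

```python
from queue import Queue
from threading import Thread

q = Queue()
results = []

def worker():
    while True:
        item = q.get()       # blocks until an item is available
        if item is None:     # sentinel: no more work
            break
        results.append(item * 2)

t = Thread(target=worker)
t.start()
for i in range(5):
    q.put(i)                 # produce work items
q.put(None)                  # signal shutdown
t.join()
print(results)  # [0, 2, 4, 6, 8]: one consumer processes items in FIFO order
```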
VII. Worked Examples
1. A Concurrent Web Crawler
import asyncio
import aiohttp  # third-party: pip install aiohttp

async def fetch_url(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    urls = [
        "https://www.python.org",
        "https://www.google.com",
        "https://www.github.com"
    ]
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_url(session, url) for url in urls]
        results = await asyncio.gather(*tasks)
        for url, content in zip(urls, results):
            print(f"{url}: {len(content)} characters")

asyncio.run(main())
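A real crawler should also cap how many requests are in flight at once. A sketch of that pattern with asyncio.Semaphore, using asyncio.sleep as a stand-in for the HTTP call so it runs without aiohttp or network access:

```python
import asyncio

async def fetch(sem, i):
    async with sem:               # at most 2 "requests" run concurrently
        await asyncio.sleep(0.1)  # stand-in for an HTTP request
        return i

async def main():
    sem = asyncio.Semaphore(2)    # concurrency limit shared by all tasks
    return await asyncio.gather(*(fetch(sem, i) for i in range(6)))

results = asyncio.run(main())
print(results)  # [0, 1, 2, 3, 4, 5]: gather preserves argument order
```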
2. Parallel Data Processing
from multiprocessing import Pool
import pandas as pd  # third-party: pip install pandas

def process_chunk(chunk):
    # stand-in for time-consuming per-chunk processing
    return chunk.describe()

if __name__ == '__main__':
    data = pd.DataFrame({'value': range(1000000)})
    chunks = [data[i:i+100000] for i in range(0, len(data), 100000)]
    with Pool(4) as pool:
        results = pool.map(process_chunk, chunks)
    final_result = pd.concat(results)
    print(final_result)
VIII. Practical Recommendations
- I/O-bound work: use threads or async
- CPU-bound work: use multiple processes
- Prefer message passing over shared state
- Set a sensible number of worker threads/processes
- Manage resources with connection pools
Table 4. Common concurrency problems
Problem | Solution |
Race conditions | Use Lock/RLock |
Deadlock | Avoid nested locks; use acquire timeouts |
Resource exhaustion | Use connection pools; cap concurrency |
GIL limitation | Use multiprocessing or C extensions |
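The deadlock row above mentions acquire timeouts; a sketch of that escape hatch using Lock.acquire(timeout=...), which lets a thread back off and handle the failure instead of blocking forever:

```python
import time
from threading import Lock, Thread

lock = Lock()

def hold_briefly():
    with lock:
        time.sleep(0.3)   # simulate a long critical section

t = Thread(target=hold_briefly)
t.start()
time.sleep(0.05)          # make sure the other thread holds the lock first

# Instead of blocking indefinitely, give up after 0.1 s
acquired = lock.acquire(timeout=0.1)
if acquired:
    lock.release()
t.join()
print(acquired)  # False: the lock was still held when the timeout expired
```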
Summary
Key points:
- Multithreading suits I/O-bound tasks (threading)
- Multiprocessing suits CPU-bound tasks (multiprocessing)
- Coroutines give efficient I/O concurrency (asyncio)
- Thread/process pools simplify resource management (concurrent.futures)
Choosing an approach:
I/O-bound → threads or coroutines
CPU-bound → multiple processes
High-concurrency networking → async programming
Batch computation → a process pool
A decision tree for Python concurrency:
Start → CPU-bound? → yes → multiprocessing
  ↓ no
High-concurrency I/O? → yes → asyncio
  ↓ no
threading / concurrent.futures
This Python learning log is updated regularly with new notes and tips — stay tuned!