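Playwright Basics Tutorial
This is a hands-on Python Playwright tutorial covering installation, basic usage, the sync and async APIs, page interaction, waiting, simulated login, file upload and download, scraping techniques, dealing with anti-bot measures, screenshots and video recording, and concurrency.
Playwright is a browser automation framework from Microsoft. It offers:
- Support for the three major engines: Chromium, Firefox, and WebKit
- Built-in auto-waiting (no need to sprinkle sleep calls around, as is common with Selenium)
- Native async support (async/await)
- Control of real browser engines, so pages render and behave much as they would for a real user, which helps against simple anti-bot checks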
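Install the Python package:
pip install playwright
Install the browser binaries:
playwright install
If downloads are slow (for example from mainland China), installing only the browser you need keeps the download small:
playwright install chromium
Playwright offers both a synchronous and an asynchronous API; pick whichever fits your program.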
Synchronous version:
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)  # open a visible browser window
    page = browser.new_page()
    page.goto("https://example.com")
    print(page.title())
    browser.close()
Asynchronous version:
import asyncio
from playwright.async_api import async_playwright
async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        page = await browser.new_page()
        await page.goto("https://example.com")
        print(await page.title())
        await browser.close()
asyncio.run(main())
Headless mode (the default)
browser = p.chromium.launch()
Headed mode (visible window)
browser = p.chromium.launch(headless=False)
Open the developer tools
browser = p.chromium.launch(headless=False, devtools=True)
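Open a page
page.goto("https://www.baidu.com")
Click a button
page.click("#submit")
Fill in text (clears the field first)
page.fill("#input-box", "hello world")
Type text key by key (appends to whatever is already in the field)
page.type("#q", "playwright tutorial")
Press Enter
page.keyboard.press("Enter")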
Get text
text = page.text_content(".result")
Get HTML
html = page.inner_html("body")
Get multiple elements
items = page.query_selector_all(".list-item")
for item in items:
    print(item.text_content())
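The same loop can be written with locators, which Playwright's documentation recommends over element handles. The sketch below is one possible way to do it: `collect_texts` is a hypothetical helper name, and `page` is assumed to be an already opened sync-API page.

```python
def collect_texts(page, selector):
    """Return the trimmed, non-empty text of every element matching selector."""
    # all_text_contents() reads the text of all current matches in one call.
    raw_texts = page.locator(selector).all_text_contents()
    return [text.strip() for text in raw_texts if text.strip()]
```

Calling `collect_texts(page, ".list-item")` would return a plain list of strings that can be filtered or written to a file without holding on to browser-side handles.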
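Before performing an action such as click or fill, Playwright waits by default for:
- The element to exist in the DOM
- The element to be visible
- The element to be actionable (enabled and not obscured by another element)
- The page load to finish (for goto)
In most cases there is no need to write time.sleep().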
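To illustrate, here is a minimal sketch of a form submission that relies only on auto-waiting. The function name `submit_search` is hypothetical, the selectors `#q`, `#submit`, and `.result` are placeholders borrowed from other snippets in this tutorial, and `page` is assumed to be a sync-API page.

```python
def submit_search(page, query, timeout_ms=10_000):
    """Fill a search box, submit, and return the result text without any sleep."""
    # Applies to every later action on this page; Playwright's default is 30 seconds.
    page.set_default_timeout(timeout_ms)
    page.fill("#q", query)               # waits until the input is editable
    page.click("#submit")                # waits until the button is visible and enabled
    return page.text_content(".result")  # waits until the element exists
```

If any step does not become possible within `timeout_ms`, Playwright raises its own `TimeoutError`, which is usually easier to debug than a fixed sleep that was too short.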
Wait for an element (timeout in milliseconds)
page.wait_for_selector("#login-btn", timeout=5000)
Wait for a navigation
with page.expect_navigation():
    page.click("#go-next")
Upload a file
page.set_input_files("#upload", "a.jpg")
Multiple files work too:
page.set_input_files("#upload", ["1.png", "2.png"])
Download a file
with page.expect_download() as dl:
    page.click("#download-btn")
download = dl.value  # read after the with block, once the download has started
download.save_as("output.zip")
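When the file name is not known in advance, the download's suggested name can be used instead of a hard-coded one. The following is a sketch under assumptions: `download_to_dir` is a hypothetical helper, and `page` is assumed to be a sync-API page on which clicking `click_selector` starts a download.

```python
from pathlib import Path


def download_to_dir(page, click_selector, target_dir):
    """Click a download trigger and save the file under its suggested name."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # The click must happen inside the with block so the download event is captured.
    with page.expect_download() as download_info:
        page.click(click_selector)
    download = download_info.value
    destination = target / download.suggested_filename
    download.save_as(destination)
    return destination
```

For example, `download_to_dir(page, "#download-btn", "downloads")` would return the path of the saved file inside a `downloads` folder.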
Full-page screenshot
page.screenshot(path="full.png", full_page=True)
Screenshot of a single element
page.locator("#main").screenshot(path="main.png")
Video recording
context = browser.new_context(record_video_dir="videos/")
page = context.new_page()
page.goto("https://example.com")
context.close()  # the video file is written when the context closes
Multiple pages (tabs)
page1 = browser.new_page()
page2 = browser.new_page()
page1.goto("https://google.com")
page2.goto("https://bing.com")
Run JavaScript in the page
value = page.evaluate("document.title")
print(value)
Passing an argument:
page.evaluate("""(msg) => {
    console.log(msg)
}""", "hello")
For example, simulating a Baidu login (the selectors below are taken from Baidu's login form and may change):
page.goto("https://www.baidu.com")
page.click("#s-top-loginbtn")
page.fill("#TANGRAM__PSP_11__userName", "your_username")
page.fill("#TANGRAM__PSP_11__password", "your_password")
page.click("#TANGRAM__PSP_11__submit")
Save the login state (cookies and local storage) so it can be reused
page.context.storage_state(path="auth.json")
Reuse it in a later run:
context = browser.new_context(storage_state="auth.json")
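Saving and reusing login state are two separate calls: `context.storage_state(path=...)` writes the file, and `browser.new_context(storage_state=...)` reads it. A minimal sketch that ties the two together follows; the helper names `open_context` and `save_login_state` are hypothetical, and `browser` is assumed to be a launched sync-API browser.

```python
from pathlib import Path


def open_context(browser, state_file="auth.json"):
    """Create a context, restoring saved cookies and local storage if available."""
    if Path(state_file).exists():
        return browser.new_context(storage_state=state_file)
    # First run: no saved state yet, so start with a clean context.
    return browser.new_context()


def save_login_state(context, state_file="auth.json"):
    """Write the context's cookies and local storage to state_file."""
    context.storage_state(path=state_file)
```

A typical flow would be to call `open_context`, log in only if the site still shows a login form, and then call `save_login_state` before closing the context.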
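Set a realistic User-Agent
context = browser.new_context(
    user_agent="Mozilla/5.0 ..."
)
The webdriver flag
Playwright does not reliably hide navigator.webdriver by default; an automated Chromium typically reports it as true, so check what the page actually sees rather than assuming the flag is hidden.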
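Whether a page can see the automation flag depends on the browser and launch settings, so it is worth checking rather than assuming. The sketch below shows one way to read `navigator.webdriver` and one commonly used way to override it with an init script. Both helper names are hypothetical, `context` and `page` are assumed to be sync-API objects, and overriding this one property is not enough to defeat serious bot detection.

```python
# JavaScript that redefines navigator.webdriver so it reads as undefined.
HIDE_WEBDRIVER_JS = (
    "Object.defineProperty(navigator, 'webdriver', { get: () => undefined });"
)


def hide_webdriver_flag(context):
    """Register the override for every page created in this context."""
    # Init scripts run in each new document before the page's own scripts.
    context.add_init_script(HIDE_WEBDRIVER_JS)


def read_webdriver_flag(page):
    """Return the value of navigator.webdriver as the page sees it."""
    return page.evaluate("navigator.webdriver")
```

Calling `read_webdriver_flag(page)` before and after applying `hide_webdriver_flag` (on pages opened afterwards) shows whether the override took effect in your setup.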
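Block image loading to speed up scraping
def block_images(route):
    if route.request.resource_type == "image":
        return route.abort()  # drop image requests
    return route.continue_()
page.route("**/*", block_images)
Control concurrency with async tasks
results = await asyncio.gather(task1(), task2())  # must run inside an async function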
Intercept and modify requests
def handle(route, request):
    if "ads" in request.url:
        return route.abort()  # block ad requests
    return route.continue_()  # let everything else through
page.route("**/*", handle)
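Keeping the blocking decision in a plain function makes the rule easy to test without a browser. The following sketch assumes that images, media, and fonts are not needed and that URLs containing certain keywords are unwanted; the constants and the names `should_block` and `handle_route` are illustrative, not part of Playwright.

```python
# Assumed policy for illustration: adjust both collections to your target site.
BLOCKED_RESOURCE_TYPES = {"image", "media", "font"}
BLOCKED_URL_KEYWORDS = ("ads", "analytics")


def should_block(url, resource_type):
    """Decide whether a request should be aborted."""
    if resource_type in BLOCKED_RESOURCE_TYPES:
        return True
    # Plain substring match: note that "ads" also matches words like "downloads".
    return any(keyword in url for keyword in BLOCKED_URL_KEYWORDS)


def handle_route(route):
    """Route handler: abort blocked requests, let the rest continue."""
    request = route.request
    if should_block(request.url, request.resource_type):
        route.abort()
    else:
        route.continue_()
```

It would be registered with `page.route("**/*", handle_route)`. Because the keyword check is a substring match, a stricter pattern is advisable if the site serves legitimate URLs that happen to contain those letters.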
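Capture an API response
with page.expect_response("**/api/search") as resp:
    page.click("#search")
response = resp.value  # read after the with block
print(response.json())
Work inside an iframe
frame = page.frame(name="frameName")  # returns None if no frame has that name
frame.click("#btn")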
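Since `page.frame(name=...)` returns `None` when no frame matches, calling `click` on the result directly fails with an unhelpful `AttributeError`. A small guard, sketched below with the hypothetical helper name `click_in_frame`, makes that failure explicit; `page` is assumed to be a sync-API page.

```python
def click_in_frame(page, frame_name, selector):
    """Click an element inside a named iframe, failing clearly if it is missing."""
    frame = page.frame(name=frame_name)
    if frame is None:
        raise LookupError(f"no frame named {frame_name!r} on {page.url}")
    frame.click(selector)
```

With this in place, `click_in_frame(page, "frameName", "#btn")` either performs the click or reports which frame name was not found on which URL.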
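Use the async API with one page per URL:
import asyncio
from playwright.async_api import async_playwright
urls = ["https://example.com", "https://www.baidu.com"]  # URLs to fetch
async def fetch(url, browser):
    page = await browser.new_page()
    await page.goto(url)
    title = await page.title()
    await page.close()
    return title
async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        tasks = [fetch(u, browser) for u in urls]
        print(await asyncio.gather(*tasks))  # results come back in the order of urls
        await browser.close()
asyncio.run(main())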
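Launching one page per URL all at once can overwhelm both the local machine and the target site when the list is long. One common remedy is to cap the number of pages open at the same time with an `asyncio.Semaphore`. The sketch below assumes `browser` is an async-API browser; `fetch_title`, `fetch_titles`, and the default limit of 3 are illustrative choices.

```python
import asyncio


async def fetch_title(browser, url, semaphore):
    """Open a page for url and return its title, holding one semaphore slot."""
    async with semaphore:
        page = await browser.new_page()
        try:
            await page.goto(url)
            return await page.title()
        finally:
            # Always close the page so a failed navigation does not leak a tab.
            await page.close()


async def fetch_titles(browser, urls, max_concurrency=3):
    """Fetch all titles with at most max_concurrency pages open at once."""
    semaphore = asyncio.Semaphore(max_concurrency)
    tasks = [fetch_title(browser, url, semaphore) for url in urls]
    # gather preserves the order of urls in its result list.
    return await asyncio.gather(*tasks)
```

Inside an `async with async_playwright() as p:` block this would be used as `titles = await fetch_titles(browser, urls, max_concurrency=5)`.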
Complete example: scraping the Zhihu hot list (the selectors reflect the page markup when this was written and may change):
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://www.zhihu.com/hot")
    page.wait_for_selector("section.HotItem")  # wait until the list has rendered
    items = page.query_selector_all("section.HotItem")
    for i in items:
        title = i.query_selector(".HotItem-title").inner_text()
        print(title)
    browser.close()
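| Feature | Playwright | Selenium |
|---|---|---|
| Auto-waiting | ✅ Strong | ❌ Weak |
| Multi-browser support | ✅ | ✅ |
| Sync + async | ✅ | ❌ |
| Anti-bot friendliness | ✅ Good | ❌ Easily detected |
| Modern API | ✅ | ❌ Dated |
For scraping, Playwright is generally the stronger choice.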