Playwright Basics Tutorial

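This is a hands-on Python Playwright tutorial. It covers installation, basic usage, the sync and async APIs, page operations, waiting, simulated login, file upload and download, scraping techniques, working around anti-bot measures, screenshots and video recording, and concurrency.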

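Playwright is a browser automation framework from Microsoft. It offers:

  • Support for the three major browser engines: Chromium, Firefox, and WebKit
  • Built-in auto-waiting, so you rarely need the explicit sleep calls common in Selenium scripts
  • Native async support (async/await)
  • Automation of real browser engines, so pages see behavior close to a normal browser session, which helps against simple anti-bot checks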

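1. Installing Playwright

pip install playwright

Install the browser binaries:

playwright install

To shorten the download (for example on a slow connection from mainland China), install only the browser you need:

playwright install chromium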

2. Two ways to use Playwright: sync vs. async

You can choose either:

1. Synchronous API (most common)

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://example.com")
    print(page.title())
    browser.close()

2. Asynchronous API (recommended for high concurrency)

import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        page = await browser.new_page()
        await page.goto("https://example.com")
        print(await page.title())
        await browser.close()

asyncio.run(main())

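3. Browser control: launch modes

Headless mode (default)

browser = p.chromium.launch()

Headed (visible) mode

browser = p.chromium.launch(headless=False)

Open developer tools (Chromium only; the devtools option is marked deprecated in recent Playwright releases)

browser = p.chromium.launch(headless=False, devtools=True)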
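
Switching between headless and headed runs is often easier through an environment variable than by editing code. The following is a minimal sketch, not a Playwright feature: the variable names PW_HEADED and PW_SLOW_MO and the helper names launch_options and open_browser are assumptions made up for illustration, while headless and slow_mo are real launch arguments.

```python
import os


def launch_options(env=None):
    """Build keyword arguments for chromium.launch() from environment variables.

    PW_HEADED and PW_SLOW_MO are hypothetical names chosen for this sketch.
    """
    env = os.environ if env is None else env
    headed = env.get("PW_HEADED", "0") == "1"
    options = {"headless": not headed}
    slow_mo = env.get("PW_SLOW_MO")
    if slow_mo:
        # slow_mo delays each Playwright operation by this many milliseconds
        options["slow_mo"] = float(slow_mo)
    return options


def open_browser(playwright, env=None):
    """Launch Chromium; `playwright` is the object yielded by sync_playwright()."""
    return playwright.chromium.launch(**launch_options(env))
```

Inside a with sync_playwright() as p: block, browser = open_browser(p) would then launch headless unless PW_HEADED=1 is set.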

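4. Common page operations

Navigate to a page

page.goto("https://www.baidu.com")

Click a button

page.click("#submit")

Fill in text (replaces any existing value)

page.fill("#input-box", "hello world")

Type character by character (appends to existing text)

page.type("#q", "playwright tutorial")

Press Enter

page.keyboard.press("Enter")

Get text

text = page.text_content(".result")

Get HTML

html = page.inner_html("body")

Get multiple elements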

items = page.query_selector_all(".list-item")
for item in items:
    print(item.text_content())
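
Newer Playwright code usually prefers locators over query_selector_all, because a locator re-queries the page each time it is used. A small sketch of the same loop in that style, assuming the same .list-item selector; the helper name item_texts is hypothetical.

```python
def item_texts(page, selector=".list-item"):
    """Return the stripped, non-empty text of every element matching `selector`.

    `page` is a Playwright sync-API Page. locator() and all_text_contents()
    are part of the Locator API.
    """
    raw = page.locator(selector).all_text_contents()
    return [text.strip() for text in raw if text and text.strip()]
```

Note that all_text_contents() does not wait for matches to appear, so if the list loads dynamically, wait for the first item before calling the helper.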

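5. Auto-waiting (one of Playwright's biggest advantages)

Before performing an action, Playwright waits by default for:

  • the element to be attached to the DOM
  • the element to be visible
  • the element to be actionable (stable and enabled, so it can be clicked)
  • the page load to finish, for navigations such as page.goto

In most cases there is no need to call time.sleep().

Wait for an element (timeout is in milliseconds)

page.wait_for_selector("#login-btn", timeout=5000)

Wait for a navigation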

with page.expect_navigation():
    page.click("#go-next")
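
Auto-waiting still ends in a timeout error if the condition never holds. One way to handle that is a small retry wrapper. The sketch below is an assumption-laden illustration: the name with_retries is hypothetical, and the caller passes in the exception type to catch (for the sync API that would be playwright.sync_api.TimeoutError).

```python
import time


def with_retries(action, retry_on, attempts=3, delay=0.5):
    """Call `action()` until it succeeds or `attempts` tries are used up.

    Only exceptions of type `retry_on` trigger a retry; the last one is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay)  # brief pause between tries, not a wait for the page
```

A call might look like with_retries(lambda: page.wait_for_selector("#login-btn", timeout=5000), retry_on=TimeoutError), with TimeoutError imported from playwright.sync_api.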

6. Uploading and downloading files

Upload

page.set_input_files("#upload", "a.jpg")

Multiple files are also supported:

page.set_input_files("#upload", ["1.png", "2.png"])

Download a file

with page.expect_download() as dl:
    page.click("#download-btn")

download = dl.value
download.save_as("output.zip")
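
Instead of a hard-coded name such as output.zip, the file can be saved under the name the server suggests. A minimal sketch, assuming a downloads directory and the hypothetical helper names download_target and save_download; suggested_filename and save_as are part of Playwright's Download object.

```python
from pathlib import Path


def download_target(directory, suggested_filename):
    """Return a path inside `directory` for a server-suggested file name."""
    # Keep only the final path component so names like "../x" cannot escape
    name = Path(suggested_filename.replace("\\", "/")).name
    if name in ("", ".", ".."):
        name = "download.bin"  # fallback name chosen for this sketch
    return Path(directory) / name


def save_download(download, directory="downloads"):
    """Save a Playwright Download object under its suggested file name."""
    target = download_target(directory, download.suggested_filename)
    target.parent.mkdir(parents=True, exist_ok=True)
    download.save_as(str(target))
    return target
```

With the snippet above, save_download(dl.value) would replace the last two lines.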

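7. Screenshots and video recording

Full-page screenshot

page.screenshot(path="full.png", full_page=True)

Screenshot of a specific element

page.locator("#main").screenshot(path="main.png")

Video recording

context = browser.new_context(record_video_dir="videos/")
page = context.new_page()
page.goto("https://example.com")
context.close()  # the video file is written when the context closes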

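8. Multiple tabs (multiple pages)

Pages created from the same context behave like tabs that share cookies and storage; browser.new_page() would instead put each page in its own isolated context.

context = browser.new_context()
page1 = context.new_page()
page2 = context.new_page()

page1.goto("https://google.com")
page2.goto("https://bing.com")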

9. Running JavaScript

value = page.evaluate("document.title")
print(value)

Passing an argument:

page.evaluate("""(msg) => {
    console.log(msg)
}""", "hello")

10. Simulated login example (a classic scenario)

For example, logging in to Baidu (the selectors are site-specific and may change):

page.goto("https://www.baidu.com")
page.click("#s-top-loginbtn")
page.fill("#TANGRAM__PSP_11__userName", "your_username")
page.fill("#TANGRAM__PSP_11__password", "your_password")
page.click("#TANGRAM__PSP_11__submit")

Save the login state (cookies and local storage) after logging in

page.context.storage_state(path="auth.json")

Reuse it in a later run:

context = browser.new_context(storage_state="auth.json")
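
Saving and reusing state can be combined into one step: load auth.json when it exists and is recent, otherwise log in and write it. This is only a sketch under stated assumptions: the 24-hour freshness limit is arbitrary, the helper names usable_state and context_with_login are hypothetical, and login is a function you supply that performs the login steps on a page.

```python
import os
import time


def usable_state(path, max_age_seconds=24 * 3600, now=None):
    """Return `path` if a saved state file exists and is recent enough, else None."""
    if not os.path.isfile(path):
        return None
    now = time.time() if now is None else now
    age = now - os.path.getmtime(path)
    return path if age <= max_age_seconds else None


def context_with_login(browser, login, path="auth.json"):
    """Create a context from saved state, or log in and save the state.

    `login` is a caller-supplied function that takes a page and logs in on it.
    """
    state = usable_state(path)
    if state:
        return browser.new_context(storage_state=state)
    context = browser.new_context()
    login(context.new_page())
    context.storage_state(path=path)  # write cookies and local storage to disk
    return context
```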

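11. Scraping tips (working around anti-bot measures)

Set a realistic user agent

context = browser.new_context(
    user_agent="Mozilla/5.0 ..."
)

Hide the webdriver flag

Playwright does not hide this by default: navigator.webdriver is normally true in an automated browser. An init script registered with context.add_init_script can override it, although sites may rely on other signals as well.

Block image loading to speed up scraping

# Abort image requests; let everything else through
page.route(
    "**/*",
    lambda route: route.abort()
    if route.request.resource_type == "image"
    else route.continue_(),
)

Run tasks concurrently with async

asyncio.gather runs the tasks concurrently; on its own it does not cap how many run at once.

await asyncio.gather(task1(), task2(), ...)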

12. Intercepting network requests (mocking / speeding up / filtering)

Intercept and filter requests

def handle(route, request):
    if "ads" in request.url:
        return route.abort()
    return route.continue_()

page.route("**/*", handle)
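
The same routing hook can answer a request with canned data instead of letting it reach the network, which is the mocking case named in the section title. A minimal sketch: the payload shape, the **/api/search pattern, and the helper names mock_body and install_search_mock are assumptions for illustration; route.fulfill with status, content_type, and body is part of the Route API.

```python
import json


def mock_body(items):
    """Serialize a fake search payload (the shape is made up for this sketch)."""
    return json.dumps({"total": len(items), "items": list(items)})


def install_search_mock(page, items, pattern="**/api/search"):
    """Answer requests matching `pattern` with canned JSON instead of the network."""

    def handle(route):
        route.fulfill(
            status=200,
            content_type="application/json",
            body=mock_body(items),
        )

    page.route(pattern, handle)
```

After install_search_mock(page, ["a", "b"]), any page request matching the pattern receives the canned JSON.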

Capture an API response

with page.expect_response("**/api/search") as resp:
    page.click("#search")
response = resp.value
print(response.json())

13. Handling iframes

frame = page.frame(name="frameName")
frame.click("#btn")

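14. Concurrent scraping architecture example

Using the async API with multiple pages (this is asyncio concurrency in a single thread, not multithreading):

import asyncio
from playwright.async_api import async_playwright

urls = ["https://example.com", "https://www.baidu.com"]  # placeholder URLs

async def fetch(url, browser):
    page = await browser.new_page()
    try:
        await page.goto(url)
        return await page.title()
    finally:
        await page.close()  # always release the page

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        tasks = [fetch(u, browser) for u in urls]
        print(await asyncio.gather(*tasks))
        await browser.close()

asyncio.run(main())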
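
With a long URL list, opening every page at once can exhaust memory, so it helps to cap how many fetches run at the same time. A minimal sketch using asyncio.Semaphore; the helper name gather_limited and the default limit of 3 are assumptions, not part of Playwright.

```python
import asyncio


async def gather_limited(fetch, urls, limit=3):
    """Run `fetch(url)` for every URL with at most `limit` running at once.

    Results come back in the same order as `urls`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(url):
        async with semaphore:  # wait here while `limit` fetches are active
            return await fetch(url)

    return await asyncio.gather(*(guarded(u) for u in urls))
```

In the example above, the gather call could become await gather_limited(lambda u: fetch(u, browser), urls, limit=5).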

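15. Complete example: scraping Zhihu hot-list titles

The page may require a logged-in session, and the selectors may change.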

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://www.zhihu.com/hot")

    page.wait_for_selector("section.HotItem")

    items = page.query_selector_all("section.HotItem")
    for i in items:
        title = i.query_selector(".HotItem-title").inner_text()
        print(title)

    browser.close()

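16. Playwright vs. Selenium (summary table)

Feature                  Playwright    Selenium
Auto-waiting             ✅ Strong     ❌ Weak
Multi-browser support    ✅            ✅
Sync + async             ✅            ❌ Sync only (official Python bindings)
Anti-bot friendliness    ✅ Good       ❌ Easily detected
Modern API               ✅            ❌ Dated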

For scraping work, Playwright is generally the stronger choice.