1. Installation
Pip
pip install playwright
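Conda
conda config --add channels conda-forge
conda config --add channels microsoft
conda install playwright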
The commands above only download the Playwright package. The browser binaries for Chromium, Firefox and WebKit are installed separately.
Install command:
python -m playwright install
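To check from Python whether the package is installed, here is a minimal sketch using only the standard library. The helper name playwright_version is hypothetical. It only confirms that the Python package is present; it does not check whether the browser binaries were downloaded.

```python
import importlib.util
from importlib import metadata


def playwright_version():
    """Return the installed Playwright version string, or None if it is not installed."""
    if importlib.util.find_spec("playwright") is None:
        return None
    try:
        return metadata.version("playwright")
    except metadata.PackageNotFoundError:
        # The module is importable but no distribution metadata was found
        return None


if __name__ == "__main__":
    version = playwright_version()
    if version is None:
        print("Playwright is not installed; run: pip install playwright")
    else:
        print(f"Playwright {version} is installed")
```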
2. Usage
After installation, you can import Playwright in a Python script and launch any of the three browsers (chromium, firefox and webkit).
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("http://playwright.dev")
    print(page.title())
    browser.close()
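Because the three browser types share the same API, the browser can also be chosen at runtime. This is a minimal sketch; get_title and BROWSER_NAMES are hypothetical names. It uses try/finally so the browser is closed even if goto() raises, and it imports Playwright inside the function so the argument check works even where Playwright is not installed.

```python
BROWSER_NAMES = ("chromium", "firefox", "webkit")


def get_title(url, browser_name="chromium"):
    """Open url in the chosen browser and return the page title (hypothetical helper)."""
    if browser_name not in BROWSER_NAMES:
        raise ValueError(f"unknown browser {browser_name!r}, expected one of {BROWSER_NAMES}")
    # Lazy import: the validation above does not need Playwright
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser_type = getattr(p, browser_name)  # p.chromium, p.firefox or p.webkit
        browser = browser_type.launch()
        try:
            page = browser.new_page()
            page.goto(url)
            return page.title()
        finally:
            browser.close()
```

Usage: print(get_title("http://playwright.dev", "firefox")).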
Playwright offers two variants of its API: synchronous and asynchronous. If your project uses asyncio (https://docs.python.org/3/library/asyncio.html), use the async API:
import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto("http://playwright.dev")
        print(await page.title())
        await browser.close()

asyncio.run(main())
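With the async API, several pages can be driven concurrently from one browser. This is a sketch; fetch_titles is a hypothetical coroutine. asyncio.gather runs the page tasks concurrently and returns their results in the same order as the input URLs.

```python
import asyncio


async def fetch_titles(urls):
    """Open each URL in its own page concurrently and return the titles in input order."""
    # Lazy import so this module can be imported without Playwright installed
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        try:
            async def title_of(url):
                page = await browser.new_page()
                try:
                    await page.goto(url)
                    return await page.title()
                finally:
                    await page.close()

            return await asyncio.gather(*(title_of(u) for u in urls))
        finally:
            await browser.close()
```

Usage: asyncio.run(fetch_titles(["http://playwright.dev", "http://whatsmyuseragent.org/"])).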
3. First script
In our first script, we will use WebKit to navigate to whatsmyuseragent.org and take a screenshot.
Example:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.webkit.launch()
    page = browser.new_page()
    page.goto("http://whatsmyuseragent.org/")
    page.screenshot(path="example.png")
    browser.close()
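The same script can take one screenshot per engine by looping over the three browser types. This is a sketch; screenshot_all and screenshot_path are hypothetical names. BrowserType.name gives "chromium", "firefox" or "webkit", and full_page=True captures the whole scrollable page instead of only the viewport. It assumes all three browsers were installed with python -m playwright install.

```python
def screenshot_path(browser_name):
    """File name for one engine's screenshot (hypothetical naming scheme)."""
    return f"example-{browser_name}.png"


def screenshot_all(url):
    """Take a full-page screenshot of url in Chromium, Firefox and WebKit."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        for browser_type in (p.chromium, p.firefox, p.webkit):
            browser = browser_type.launch()
            try:
                page = browser.new_page()
                page.goto(url)
                page.screenshot(path=screenshot_path(browser_type.name), full_page=True)
            finally:
                browser.close()
```

Usage: screenshot_all("http://whatsmyuseragent.org/") writes example-chromium.png, example-firefox.png and example-webkit.png.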
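By default, Playwright runs browsers in headless mode. To see the browser UI, pass headless=False when launching the browser. You can also use slow_mo to slow down execution. See the Debugging Tools section of the official documentation for more.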
browser = p.firefox.launch(headless=False, slow_mo=50)  # slow_mo: delay per operation, in milliseconds
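To switch between headless and headed runs without editing the script, the launch options can be read from the environment. This is a sketch; HEADED and SLOW_MO are hypothetical variable names chosen for this example, not variables Playwright itself reads.

```python
import os


def launch_options(env=None):
    """Build keyword arguments for BrowserType.launch() from (hypothetical) env variables."""
    env = os.environ if env is None else env
    # HEADED=1 shows the browser UI; unset, empty, "0", "false" or "no" keep it headless
    headed = env.get("HEADED", "").strip().lower() not in ("", "0", "false", "no")
    # SLOW_MO is a delay in milliseconds added to each Playwright operation
    slow_mo = int(env.get("SLOW_MO") or 0)
    return {"headless": not headed, "slow_mo": slow_mo}
```

Usage: browser = p.firefox.launch(**launch_options()), then run the script as HEADED=1 SLOW_MO=50 python first_script.py.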
4. Recording scripts
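The command-line tool can record user interactions and generate Python code. In the command below, --target selects the output language, -o the output file and -b the browser: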
python -m playwright codegen --target python -o open_baidu.py -b chromium https://www.baidu.com
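The same codegen command can also be launched from a Python helper script. This is a sketch; codegen_command and record are hypothetical helpers. sys.executable ensures that the interpreter with Playwright installed is the one that runs codegen.

```python
import subprocess
import sys


def codegen_command(url, output, browser="chromium", target="python"):
    """Return the argument list for `python -m playwright codegen`."""
    return [
        sys.executable, "-m", "playwright", "codegen",
        "--target", target,  # language of the generated code
        "-o", output,        # file the generated script is written to
        "-b", browser,       # browser used for recording
        url,
    ]


def record(url, output, browser="chromium"):
    """Start a recording session; check=True raises if codegen exits with an error."""
    subprocess.run(codegen_command(url, output, browser), check=True)
```

Usage: record("https://www.baidu.com", "open_baidu.py") should return once the recording browser is closed.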
5. Interactive mode (REPL)
>>> from playwright.sync_api import sync_playwright
>>> playwright = sync_playwright().start()
# Use playwright.chromium, playwright.firefox or playwright.webkit
# Pass headless=False to launch() to see the browser UI
>>> browser = playwright.chromium.launch()
>>> page = browser.new_page()
>>> page.goto("http://whatsmyuseragent.org/")
>>> page.screenshot(path="example.png")
>>> browser.close()
>>> playwright.stop()
The async API needs a REPL that supports top-level await, such as the asyncio REPL started with python -m asyncio:
>>> from playwright.async_api import async_playwright
>>> playwright = await async_playwright().start()
>>> browser = await playwright.chromium.launch()
>>> page = await browser.new_page()
>>> await page.goto("http://whatsmyuseragent.org/")
>>> await page.screenshot(path="example.png")
>>> await browser.close()
>>> await playwright.stop()
6. PyInstaller
You can use Playwright together with PyInstaller to create standalone executables.
# main.py
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("http://whatsmyuseragent.org/")
    page.screenshot(path="example.png")
    browser.close()
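To bundle the browser with the executable, install it with PLAYWRIGHT_BROWSERS_PATH=0, which places the browser inside the Playwright package directory so it can be bundled along with it, then build: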
bash:
PLAYWRIGHT_BROWSERS_PATH=0 playwright install chromium
pyinstaller -F main.py

PowerShell:
$env:PLAYWRIGHT_BROWSERS_PATH="0"
playwright install chromium
pyinstaller -F main.py
Note:
Bundling browsers with the executable produces a larger binary, so bundle only the browsers you actually use.
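The bundling steps above can also be scripted in Python so they run the same way from bash or PowerShell. This is a sketch, assuming PyInstaller is installed in the same environment (python -m PyInstaller is its module entry point); build_steps and build are hypothetical helpers. PLAYWRIGHT_BROWSERS_PATH=0 is passed to both commands through the environment.

```python
import os
import subprocess
import sys


def build_steps(browser="chromium", script="main.py"):
    """Return (commands, env) for bundling one browser into a one-file PyInstaller build."""
    # Copy the current environment and add the variable used in the shell commands
    env = dict(os.environ, PLAYWRIGHT_BROWSERS_PATH="0")
    commands = [
        [sys.executable, "-m", "playwright", "install", browser],
        [sys.executable, "-m", "PyInstaller", "-F", script],
    ]
    return commands, env


def build(browser="chromium", script="main.py"):
    """Run the steps in order; check=True stops at the first failing command."""
    commands, env = build_steps(browser, script)
    for cmd in commands:
        subprocess.run(cmd, env=env, check=True)
```

Usage: build("chromium", "main.py") installs only Chromium, in line with the advice to bundle just the browsers you use.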
7. Known issues
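1. time.sleep() leads to outdated state
Use page.wait_for_timeout(5000) instead of time.sleep(5). It is best not to wait for a fixed timeout at all, but sometimes it is useful for debugging. In those cases, use Playwright's wait methods rather than the time module. Playwright relies on asynchronous operations internally, and while time.sleep(5) blocks the thread, those operations cannot be processed correctly.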
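Better than any fixed pause is waiting for a concrete condition. This is a sketch; wait_for_results is a hypothetical helper, the selector is whatever element signals that the page is ready, and the optional debug pause goes through page.wait_for_timeout(), which takes milliseconds.

```python
def wait_for_results(page, selector=None, debug_pause_s=0):
    """Wait for an element, then optionally pause for debugging without time.sleep()."""
    if selector is not None:
        # Preferred: wait until a concrete element appears on the page
        page.wait_for_selector(selector)
    if debug_pause_s:
        # Debugging only: Playwright's own timer keeps its internal operations running
        page.wait_for_timeout(debug_pause_s * 1000)  # seconds -> milliseconds
```

Usage: wait_for_results(page, "#results", debug_pause_s=5), where "#results" is a placeholder selector.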
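2. Incompatible with asyncio's SelectorEventLoop on Windows
Playwright runs its driver in a subprocess, so on Windows it requires asyncio's ProactorEventLoop; the SelectorEventLoop does not support async subprocesses.
On Windows with Python 3.7, Playwright sets the default event loop to ProactorEventLoop, since that is already the default on Python 3.8+.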
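If other code in the process may have switched asyncio to the SelectorEventLoop, a script can restore the Proactor policy before starting. This is a sketch; use_proactor_on_windows is a hypothetical helper. asyncio.WindowsProactorEventLoopPolicy only exists on Windows, which is why it is accessed only behind the platform check.

```python
import asyncio
import sys


def use_proactor_on_windows():
    """Ensure asyncio uses the ProactorEventLoop on Windows; return True if the policy changed."""
    if sys.platform != "win32":
        return False  # the Windows-only policy class is not available here
    if isinstance(asyncio.get_event_loop_policy(), asyncio.WindowsProactorEventLoopPolicy):
        return False  # already the Proactor policy (the default on Python 3.8+)
    asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
    return True
```

Usage: call use_proactor_on_windows() before asyncio.run(main()). On Python 3.12+, asyncio.run() also accepts a loop_factory argument, so asyncio.run(main(), loop_factory=asyncio.ProactorEventLoop) is another option on Windows.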
3. Threading
Playwright's API is not thread-safe. If you use Playwright in a multithreaded environment, create one Playwright instance per thread. See the threading issue for more details: https://github.com/microsoft/playwright-python/issues/623.
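Following that rule, each thread can start and stop its own Playwright instance. This is a sketch using concurrent.futures; run_in_threads and scrape_title are hypothetical helpers. A pool thread may run scrape_title several times in a row, which is fine because every call creates and stops its own instance and nothing Playwright-related is shared between threads.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_threads(task, items, max_workers=4):
    """Run task(item) on a thread pool and return the results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, items))


def scrape_title(url):
    """Each call owns a separate Playwright instance, as the threading rule requires."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(url)
            return page.title()
        finally:
            browser.close()
```

Usage: run_in_threads(scrape_title, ["http://playwright.dev", "http://whatsmyuseragent.org/"]). Exceptions raised in a worker are re-raised when the results are collected.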
Official documentation: https://playwright.dev/python/docs/inspector#stepping-through-the-playwright-script