Playwright E2E testing AI skills: JavaScript London talk
Using Playwright and the playwright-explore-website skill to test real user journeys
Published: 22 April 2026
Read time: 17 min read
This post is the companion article to my JavaScript London talk, Playwright E2E testing AI skills, hosted in collaboration with NewDay.
The meetup is on Wednesday 29 April 2026, from 6:00 PM to 9:00 PM BST, at NewDay’s offices on 7 Handyside Street in King’s Cross, London. The evening also includes talks from David Whitney and Elham Khani, and you can still register on the event page if you plan to come along.
If you want the slides alongside the article, you can download the slide deck (.pptx).
The short version of the talk is this: Playwright is already a strong default for browser automation. The AI part becomes useful when it helps you explore a real site, identify the journeys that matter, and turn that exploration into better tests.
The shortest useful explanation of E2E testing is still the same: unit tests tell you whether the parts work. End-to-end tests tell you whether the product works.
If a button is hidden behind a cookie banner, a redirect breaks after login, or the browser submits the wrong payload, your unit tests can still be green. Your users will still hit the bug.
That is the gap Playwright helps close.
Why this talk is framed around AI skills
I am not interested in using AI to hide how tests work.
I am interested in using it to shorten the boring parts around test creation: exploration, note-taking, finding likely locators, spotting missing assertions, and drafting candidate cases for review.
That is a much better fit for AI than asking it to spray out a huge test suite and hoping it guessed the right behaviours.
The useful pattern is:
- use AI to explore and propose
- use Playwright to automate and verify
- use humans to decide what is worth keeping
That keeps the browser tests honest.
Where E2E fits in the testing pyramid
You do not want to test everything end to end. That is the fastest route to a slow, noisy, expensive test suite.
You want a small number of high-value E2E tests that protect the user journeys that matter most.
The rule I like is simple: use E2E tests for risk, not for coverage.
If a broken flow would hurt users, revenue, or trust, it deserves E2E coverage. If it is easy to prove with a unit or integration test, it probably does not.
Why Playwright is the right foundation
There are other good browser automation tools. Playwright is the one I would start with today for most web teams because it removes a lot of the usual friction.
It gives you:
- Chromium, Firefox, and WebKit support out of the box
- a clean TypeScript-first API
- automatic waiting for elements to become actionable
- browser contexts for isolated tests
- a built-in trace viewer for debugging failures in CI
That matters because AI suggestions are only useful if the underlying tool is deterministic enough to turn them into repeatable tests.
What the playwright-explore-website skill is
In my setup, I use a playwright-explore-website GitHub Copilot skill backed by
the Playwright MCP server.
It is a small instruction file that tells Copilot to explore a real site with Playwright, interact with a handful of important flows, document the relevant UI elements and expected outcomes, and then propose test cases based on what it found.
Its job is not to replace a Playwright test file. Its job is to make the step before test writing more grounded in the real browser.
I wrote up the full setup, the original awesome-copilot example I started from, and the local enhancements in The playwright-explore-website Copilot skill.
That write-up stays focused on the skill itself. This post is about how I would use it in a broader Playwright testing workflow.
That makes it a good fit when you are working with:
- an unfamiliar product area
- a staging site you need to smoke test quickly
- a bug report that is missing exact reproduction steps
- a flow where you want candidate locators and assertions before coding the test
A prompt like this is already specific enough to be useful:
    Use the playwright-explore-website skill on https://staging.example.com.
    Explore sign-in, password reset, and checkout.
    For each flow, document the user steps, the likely stable locators,
    the expected outcome, and a draft Playwright test case.

The value is not the raw prompt. The value is the output: a clearer map of the journey you are about to automate.
How I would use it for E2E testing
My preferred workflow is:
This is where the AI piece earns its keep.
Instead of starting from a blank file, you start with a tested path through the browser, a list of likely selectors, and a set of outcomes worth asserting. You still need to clean that up into a proper test, but the exploratory work is faster.
A good flow looks like this:
- Pick one critical journey.
- Use the skill to explore it and note what the user actually sees.
- Turn the best candidate path into a small Playwright test.
- Replace weak selectors with semantic locators or data-testid.
- Run it in CI with traces and fix the first flaky edge before adding more.
That is enough to prove the approach without bloating the suite.
What E2E tests are good at
Playwright is excellent at checking the flows where the browser is part of the problem.
Good targets for E2E tests:
- sign in, sign out, and session refresh flows
- checkout, booking, or other business-critical user journeys
- form submission paths that depend on real navigation or API responses
- cross-browser regressions
- UI issues that only show up once the page is fully assembled
Poor targets for E2E tests:
- pure business logic
- small validation rules
- isolated component states
- anything a fast unit test can already prove clearly
That distinction still matters even when AI is involved. The point is not to replace the rest of the suite. The point is to protect the seams.
Codegen and exploration are different tools
One of the easiest ways to get moving with Playwright is still to record a flow:
    pnpm create playwright@latest
    pnpm exec playwright codegen https://your-app.example

Codegen is useful for capturing raw actions quickly.
The playwright-explore-website skill does a different job. It helps you
understand the flow, identify meaningful assertions, and sketch candidate tests
before you commit to code.
That distinction matters. Codegen gives you interaction history. Exploration gives you testing intent.
You will usually want both:
- use the skill when you need to map the journey
- use codegen when you need a quick action scaffold
- rewrite the result so the test reads like a real scenario
The --ui flag
For day-to-day test development, the interactive UI mode is often more useful than either of those:
    pnpm exec playwright test --ui

It opens a live runner where you can step through each test action, inspect the DOM at any point in the run, use the built-in locator picker to find stable selectors, and re-run individual tests without restarting the suite.
The trace viewer is the right tool when you need to diagnose a CI failure after the
fact. --ui is the right tool when you are actively writing or debugging a test
locally.
Keeping tests reliable
Flaky tests are worse than missing tests.
Once a team stops trusting the suite, the suite stops being useful.
Let Playwright wait for you
Playwright automatically waits for elements to be attached, visible, stable, and ready for interaction before acting on them.
That is one of the main reasons its tests feel less fragile than older browser automation stacks.
Use selectors that survive refactoring
Prefer the most human-facing locator you can.
Good order of preference:
- getByRole
- getByLabel
- getByText
- data-testid when you need a stable testing contract
What you want to avoid is binding tests to styling details like .btn-primary
or deep CSS paths that change every time the UI gets cleaned up.
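The preference order can be written down as a small helper that turns exploration notes into a suggested locator. This is only a sketch: the ElementNotes shape and the preferredLocator name are hypothetical, and the function returns the locator call as source text rather than touching a real page.

```typescript
// Hypothetical notes recorded about an element during exploration.
interface ElementNotes {
  role?: string;           // ARIA role, e.g. "button"
  accessibleName?: string; // name exposed to assistive technology
  label?: string;          // associated form label text
  text?: string;           // visible text content
  testId?: string;         // data-testid attribute value
  cssSelector?: string;    // styling hook, e.g. ".btn-primary"
}

// Returns the Playwright locator call to prefer, as source text, following
// the order: getByRole, getByLabel, getByText, then data-testid.
function preferredLocator(notes: ElementNotes): string {
  const q = (value: string) => JSON.stringify(value);
  if (notes.role && notes.accessibleName) {
    return `page.getByRole(${q(notes.role)}, { name: ${q(notes.accessibleName)} })`;
  }
  if (notes.label) return `page.getByLabel(${q(notes.label)})`;
  if (notes.text) return `page.getByText(${q(notes.text)})`;
  if (notes.testId) return `page.getByTestId(${q(notes.testId)})`;
  if (notes.cssSelector) {
    // Styling-based selectors are the brittle fallback to avoid where possible.
    return `page.locator(${q(notes.cssSelector)}) /* brittle: prefer a semantic locator */`;
  }
  throw new Error("No usable locator information recorded for this element");
}
```

For example, preferredLocator({ label: "Email" }) produces page.getByLabel("Email"), while an element known only by its CSS class is flagged as brittle so a reviewer can decide whether to add a data-testid.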
Treat waitForTimeout as a smell
If you ever reach for this:
    await page.waitForTimeout(2000);

assume the test still is not right.
It may pass on your machine and fail in CI. It may also slow the suite down while still being unreliable.
Better choices are explicit signals:
    await page.waitForURL("**/confirmation");
    await page.waitForResponse(/api\/orders/);
    await expect(page.getByText("Order confirmed")).toBeVisible();

Make the await match the business signal
This is the distinction that trips people up.
Playwright already auto-waits for an element to become actionable before it clicks, fills, or types. Explicit awaits are for the thing that happens after the action.
That means await page.getByRole("button", { name: "Place order" }).click()
can prove the button was clickable. It does not, on its own, prove the order
was created, the redirect finished, or the confirmation UI appeared.
The right explicit wait depends on the signal that tells you the step is really done:
- wait for a URL change when the flow navigates
- wait for a response when the backend side effect matters
- wait for a loading state to disappear when the page stays put
- wait for the final visible UI state when that is what the user would notice
When the click starts the transition, tie the action and the wait together:
    await Promise.all([
      page.waitForURL("**/confirmation"),
      page.getByRole("button", { name: "Place order" }).click(),
    ]);

    await expect(page.getByText("Order confirmed")).toBeVisible();

If the page does not navigate, but the server-side effect is the important part, wait for that response first and then assert the UI:

    await Promise.all([
      page.waitForResponse(
        (response) => response.url().includes("/api/orders") && response.ok(),
      ),
      page.getByRole("button", { name: "Place order" }).click(),
    ]);

    await expect(page.getByText("Order confirmed")).toBeVisible();

That style is more honest about what the test depends on. Instead of hoping a pause is long enough, you name the signal that proves the journey completed.
Parallelism and test isolation
Playwright runs test files in parallel by default across multiple workers. Tests within a single file run in order unless you opt in to fully parallel mode.
That is one of the reasons E2E suites can be fast. It is also the most common cause of flaky-in-CI-but-green-locally failures.
Each test should set up its own state and not rely on anything another test created or left behind. Playwright isolates tests at the browser context level by default — each test gets fresh cookies, storage, and session state — but shared external state (databases, APIs, cached files) is still your responsibility.
If a test reliably passes locally but flakes in CI, shared mutable state is the first thing to check. Parallel workers interleave test execution in ways that sequential local runs never expose.
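One common way to avoid shared mutable state is to give every test its own data. The sketch below builds a unique email address from fields that Playwright's TestInfo exposes (workerIndex, retry, title). The TestInfoLike type and the uniqueEmail helper are hypothetical; the structural type is there only so the example runs without Playwright installed.

```typescript
// Structural subset of Playwright's TestInfo, so the sketch runs on its own.
interface TestInfoLike {
  workerIndex: number;
  retry: number;
  title: string;
}

// Builds an email address unique to this test, worker, and retry attempt so
// parallel tests do not fight over the same account in a shared database.
function uniqueEmail(info: TestInfoLike, domain = "example.com"): string {
  const slug = info.title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything non-alphanumeric
    .replace(/^-+|-+$/g, "");    // trim leading and trailing dashes
  return `e2e+${slug}-w${info.workerIndex}-r${info.retry}@${domain}`;
}
```

In a real test you would pass the testInfo object that Playwright supplies as the second argument of the test function. If the database persists between runs, this scheme would also need a per-run identifier, which is left out here.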
Page objects are a scaling tool
I do not start with page objects on day one.
If a suite has one or two tests, a couple of small helper functions are often enough. Page objects start paying off when multiple tests share the same screen, the same setup, or the same selectors.
The job of a page object is narrow:
- keep selectors in one place
- expose repeated user actions in product language
- reduce copy-paste when the UI changes
A good page object hides selector plumbing. It should not hide the whole test. The scenario, the assertions, and the reason the flow matters should usually stay visible in the test file.
This is the sort of thing I mean:
    import type { Page } from "@playwright/test";

    export class LoginPage {
      constructor(private readonly page: Page) {}

      // Locators live here so a UI change is fixed in one place.
      emailField() {
        return this.page.getByLabel("Email");
      }

      passwordField() {
        return this.page.getByLabel("Password");
      }

      // A repeated user action, named in product language.
      async signIn(email: string, password: string) {
        await this.emailField().fill(email);
        await this.passwordField().fill(password);
        await this.page.getByTestId("login-submit").click();
      }
    }

The test that uses it can stay focused on the actual journey:

    await loginPage.signIn("alice@example.com", "correct horse battery staple");
    await expect(page).toHaveURL(/dashboard/);

The trade-off is worth it when the same login flow appears in a few tests. It is not worth it when every page object becomes a giant wrapper around every DOM node on the screen.
My rule of thumb is simple: if two or three tests repeat the same selectors and actions, extract a small page object. Keep it focused on repeated flows, not on building a mini framework.
Fixtures handle setup and teardown
Page objects handle selector and action reuse. Fixtures handle the setup and teardown that wraps tests.
If you find yourself writing the same beforeEach block across multiple test
files — creating a page object, navigating to a starting URL, seeding some state
— that logic belongs in a fixture.
Playwright’s fixture system lets you declare what a test depends on and compose resources cleanly:
    // fixtures.ts
    import { test as base } from "@playwright/test";
    import { LoginPage } from "./pages/LoginPage";
    import { DashboardPage } from "./pages/DashboardPage";

    type Fixtures = {
      loginPage: LoginPage;
      dashboardPage: DashboardPage;
    };

    export const test = base.extend<Fixtures>({
      loginPage: async ({ page }, use) => {
        await use(new LoginPage(page));
      },
      dashboardPage: async ({ page }, use) => {
        await use(new DashboardPage(page));
      },
    });

Tests declare what they need and get it automatically:

    import { test } from "./fixtures";
    import { expect } from "@playwright/test";

    test("redirects to dashboard after login", async ({ loginPage, page }) => {
      await loginPage.signIn("alice@example.com", "correct horse battery staple");
      await expect(page).toHaveURL(/dashboard/);
    });

Teardown runs automatically after each test, even on failure. That is harder to guarantee with a plain beforeEach / afterEach pair.
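To make the teardown half concrete: in a fixture function, everything before the use call is setup and everything after it is teardown. The sketch below shows that shape with a hypothetical OrdersApi client (not part of Playwright) and a locally declared Use type, so it runs without Playwright installed.

```typescript
// Hypothetical API client for seeding data; not part of Playwright.
interface OrdersApi {
  createDraftOrder(): Promise<string>;
  deleteOrder(orderId: string): Promise<void>;
}

// Same shape as the `use` callback Playwright passes to a fixture function.
type Use<T> = (value: T) => Promise<void>;

// Builds a fixture function: code before `use` is setup, code after is teardown.
// The first parameter stays an object destructuring pattern, as Playwright expects.
function draftOrderFixture(api: OrdersApi) {
  return async ({}: object, use: Use<string>): Promise<void> => {
    const orderId = await api.createDraftOrder(); // setup
    await use(orderId);                           // the test body runs here
    await api.deleteOrder(orderId);               // teardown
  };
}
```

With Playwright installed, a function like this could be supplied as the value of a fixture in base.extend, so each test that asks for the fixture receives a fresh order ID and the order is removed afterwards.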
Reusing authentication state
Tests that need a logged-in user are one of the most common uses for fixtures. Repeating the full login flow in every test is slow and fragile.
Playwright’s storageState lets you save authenticated browser state (cookies and localStorage; session storage is not included) and restore it at the start of a test:
    // auth.setup.ts
    import { test as setup } from "@playwright/test";

    setup("authenticate", async ({ page }) => {
      await page.goto("/login");
      await page.getByLabel("Email").fill("alice@example.com");
      await page.getByLabel("Password").fill("correct horse battery staple");
      await page.getByRole("button", { name: "Sign in" }).click();
      await page.waitForURL("/dashboard");
      // Persist the signed-in state for the other projects to reuse.
      await page.context().storageState({ path: ".auth/alice.json" });
    });

    // playwright.config.ts
    import { defineConfig } from "@playwright/test";

    export default defineConfig({
      projects: [
        { name: "setup", testMatch: /auth\.setup\.ts/ },
        {
          name: "authenticated",
          use: { storageState: ".auth/alice.json" },
          dependencies: ["setup"],
        },
      ],
    });

Tests in the authenticated project start already logged in. The login flow runs once per suite, not once per test.
Add .auth/ to .gitignore — the saved state contains session tokens that must
not be committed to source control.
Other useful jobs for Playwright
The interesting part of playwright-explore-website is that it is not limited
to authoring E2E tests. But stepping back further: neither is Playwright itself.
The same browser automation API that drives your test suite is also a general tool for anything that needs a real browser.
Exploratory QA and smoke testing
Use the skill for exploratory QA on a staging or preview deployment, reproducing vague browser bugs from support tickets, or checking for console errors and visible breakage after a deploy.
A prompt like “explore this staging URL and document anything broken, slow, or visually unexpected” is often enough to surface real issues before they reach production.
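The "checking for console errors" part can also be scripted directly. Playwright's Page emits a console event (with type() and text() on the message) and a pageerror event for uncaught exceptions. The sketch below collects both. PageLike and ConsoleMessageLike are hypothetical structural stand-ins so the example runs without Playwright installed; in a real suite you would type the parameter as Playwright's Page.

```typescript
// Structural stand-ins for the parts of Playwright's Page and ConsoleMessage
// used here, so the sketch runs without Playwright installed.
interface ConsoleMessageLike {
  type(): string;
  text(): string;
}
interface PageLike {
  on(event: "console", listener: (message: ConsoleMessageLike) => void): unknown;
  on(event: "pageerror", listener: (error: Error) => void): unknown;
}

// Subscribes to browser console errors and uncaught page exceptions.
// Returns a live array that fills up as the page is exercised.
function collectBrowserErrors(page: PageLike): string[] {
  const problems: string[] = [];
  page.on("console", (message) => {
    if (message.type() === "error") problems.push(`console: ${message.text()}`);
  });
  page.on("pageerror", (error) => {
    problems.push(`pageerror: ${error.message}`);
  });
  return problems;
}
```

Call it before navigating, walk through the smoke journey, then assert that the returned array is still empty, or attach its contents to the report.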
Screenshots and visual documentation
Playwright can take full-page screenshots of any URL:
await page.goto("https://your-app.example/dashboard");await page.screenshot({ path: "dashboard.png", fullPage: true });That is useful for generating visual documentation, capturing before/after diffs for design reviews, or producing screenshots for release notes without manual screen-grabbing.
Video recording
With the Playwright Test runner, you can record video of test runs by setting the video option in the config (recordVideo is the equivalent option when you create a browser context yourself):

    import { defineConfig } from "@playwright/test";

    export default defineConfig({
      use: {
        video: "retain-on-failure",
      },
    });

With retain-on-failure, videos are only saved when a test fails, which keeps storage manageable while giving you a full replay of what happened. Set it to "on" if you want recordings for every run.
PDF generation
For pages that need to produce printable output, page.pdf() renders the page using print CSS and saves it to disk. Note that page.pdf() is only supported in Chromium:

    await page.goto("https://your-app.example/invoice/123");
    await page.pdf({ path: "invoice-123.pdf", format: "A4" });

That is a straightforward way to automate report generation or verify that print layouts render correctly, without a dedicated PDF library.
Downloading assets
Playwright can intercept and save file downloads:
    const [download] = await Promise.all([
      page.waitForEvent("download"),
      page.getByRole("button", { name: "Export CSV" }).click(),
    ]);
    await download.saveAs("export.csv");

That makes it easy to automate data exports, verify download flows in tests, or pull generated assets from a behind-authentication endpoint.
What it is not
I would not use Playwright as a substitute for proper accessibility reviews, performance profiling, or security testing. It is a browser automation tool. It is genuinely good at anything that needs a real browser — but knowing where it stops is as useful as knowing where it starts.
Running Playwright in CI and debugging failures
Running the suite in CI is the obvious part.
The more interesting part is what happens after a failure.
Playwright’s trace viewer is one of the best reasons to use it. When a test fails, you can capture a trace and inspect:
- every action
- a timeline of the test
- screenshots at each step
- console output
- network requests
The workflow is straightforward:
    pnpm exec playwright test --trace=on-first-retry
    pnpm exec playwright show-trace trace.zip

That turns CI failures from guesswork into evidence.
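The on-first-retry mode only records a trace when a failed test is retried, so it needs retries to be enabled. A config sketch that sets both might look like this; the retry count and the use of the CI environment variable are assumptions, not something the talk prescribes.

```typescript
// playwright.config.ts (sketch; retry count and CI detection are assumptions)
import { defineConfig } from "@playwright/test";

export default defineConfig({
  // Retry failed tests in CI only, so local failures stay loud and immediate.
  retries: process.env.CI ? 2 : 0,
  use: {
    // Record a trace the first time a failed test is retried.
    trace: "on-first-retry",
  },
});
```

Setting the option in config means CI does not depend on everyone remembering the command-line flag.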
When not to write an E2E test
If your E2E suite becomes the default answer to every testing question, it will become slow, brittle, and expensive to maintain.
Good reasons not to write an E2E test:
- the behaviour is already covered clearly in a unit test
- the test would take a long time to set up for very little risk reduction
- the UI state is local and easy to verify at component level
- the failure would not matter much in production
If a test takes a long time to run but almost never catches a meaningful bug, it is probably not earning its place in the suite. Remove it, or replace it with a cheaper test that gives clearer feedback.
A pragmatic place to start
If you are introducing Playwright, or the AI-assisted workflow around it, this is a sensible first week plan.
- Pick one critical user journey.
- Explore it with playwright-explore-website.
- Write one clean Playwright test from that exploration.
- Use semantic locators first, then data-testid where needed.
- Run that test in CI with traces enabled.
- Fix flakiness before adding more coverage.
That is enough to learn the tool, prove the value, and build trust in the approach.
Once that first path is stable, add the next one.
Not everything needs an end-to-end test. The paths that matter do.
If you want the official docs after this overview, start here: