Playwright E2E testing AI skills: JavaScript London talk

This post is the companion article to my JavaScript London talk, Playwright E2E testing AI skills, hosted in collaboration with NewDay.

The meetup is on Wednesday 29 April 2026, from 6:00 PM to 9:00 PM BST, at NewDay’s offices at 7 Handyside Street in King’s Cross, London. The evening also includes talks from David Whitney and Elham Khani, and you can still register on the event page if you plan to come along.

If you want the slides alongside the article, you can download the slide deck (.pptx).

The short version of the talk is this: Playwright is already a strong default for browser automation. The AI part becomes useful when it helps you explore a real site, identify the journeys that matter, and turn that exploration into better tests.

The shortest useful explanation of E2E testing is still the same: unit tests tell you whether the parts work. End-to-end tests tell you whether the product works.

If a button is hidden behind a cookie banner, a redirect breaks after login, or the browser submits the wrong payload, your unit tests can still be green. Your users will still hit the bug.

That is the gap Playwright helps close.

Why this talk is framed around AI skills

I am not interested in using AI to hide how tests work.

I am interested in using it to shorten the boring parts around test creation: exploration, note-taking, finding likely locators, spotting missing assertions, and drafting candidate cases for review.

That is a much better fit for AI than asking it to spray out a huge test suite and hoping it guessed the right behaviours.

The useful pattern is:

use AI to explore and propose
use Playwright to automate and verify
use humans to decide what is worth keeping

That keeps the browser tests honest.

Where E2E fits in the testing pyramid

You do not want to test everything end to end. That is the fastest route to a slow, noisy, expensive test suite.

You want a small number of high-value E2E tests that protect the user journeys that matter most.

The rule I like is simple: use E2E tests for risk, not for coverage.

If a broken flow would hurt users, revenue, or trust, it deserves E2E coverage. If it is easy to prove with a unit or integration test, it probably does not.

Why Playwright is the right foundation

There are other good browser automation tools. Playwright is the one I would start with today for most web teams because it removes a lot of the usual friction.

It gives you:

Chromium, Firefox, and WebKit support out of the box
a clean TypeScript-first API
automatic waiting for elements to become actionable
browser contexts for isolated tests
a built-in trace viewer for debugging failures in CI

That matters because AI suggestions are only useful if the underlying tool is deterministic enough to turn them into repeatable tests.
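As a rough illustration of the cross-browser point in that list, a config along these lines runs the same test files against all three engines. The testDir value and the project names are assumptions made for the example, not something taken from the talk.

```typescript
// playwright.config.ts (sketch; testDir and project names are assumptions)
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  testDir: "./e2e",
  projects: [
    // Each project runs the same test files against a different engine.
    { name: "chromium", use: { ...devices["Desktop Chrome"] } },
    { name: "firefox", use: { ...devices["Desktop Firefox"] } },
    { name: "webkit", use: { ...devices["Desktop Safari"] } },
  ],
});
```

Starting with a single project and adding the other two once the first journey is stable is a reasonable way to keep early CI runs short.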

What the playwright-explore-website skill is

In my setup, I use a playwright-explore-website GitHub Copilot skill backed by the Playwright MCP server.

It is a small instruction file that tells Copilot to explore a real site with Playwright, interact with a handful of important flows, document the relevant UI elements and expected outcomes, and then propose test cases based on what it found.

Its job is not to replace a Playwright test file. Its job is to make the step before test writing more grounded in the real browser.

I wrote up the full setup, the original awesome-copilot example I started from, and the local enhancements in The playwright-explore-website Copilot skill.

That write-up stays focused on the skill itself. This post is about how I would use it in a broader Playwright testing workflow.

That makes it a good fit when you are working with:

an unfamiliar product area
a staging site you need to smoke test quickly
a bug report that is missing exact reproduction steps
a flow where you want candidate locators and assertions before coding the test

A prompt like this is already specific enough to be useful:

Use the playwright-explore-website skill on https://staging.example.com.
Explore sign-in, password reset, and checkout.
For each flow, document the user steps, the likely stable locators,
the expected outcome, and a draft Playwright test case.

The value is not the raw prompt. The value is the output: a clearer map of the journey you are about to automate.

How I would use it for E2E testing

My preferred workflow is:

This is where the AI piece earns its keep.

Instead of starting from a blank file, you start with a tested path through the browser, a list of likely selectors, and a set of outcomes worth asserting. You still need to clean that up into a proper test, but the exploratory work is faster.

A good flow looks like this:

Pick one critical journey.
Use the skill to explore it and note what the user actually sees.
Turn the best candidate path into a small Playwright test.
Replace weak selectors with semantic locators or data-testid.
Run it in CI with traces and fix the first flaky edge before adding more.

That is enough to prove the approach without bloating the suite.

What E2E tests are good at

Playwright is excellent at checking the flows where the browser is part of the problem.

Good targets for E2E tests:

sign in, sign out, and session refresh flows
checkout, booking, or other business-critical user journeys
form submission paths that depend on real navigation or API responses
cross-browser regressions
UI issues that only show up once the page is fully assembled

Poor targets for E2E tests:

pure business logic
small validation rules
isolated component states
anything a fast unit test can already prove clearly

That distinction still matters even when AI is involved. The point is not to replace the rest of the suite. The point is to protect the seams.

Codegen and exploration are different tools

One of the easiest ways to get moving with Playwright is still to record a flow:

pnpm create playwright@latest
pnpm exec playwright codegen https://your-app.example

Codegen is useful for capturing raw actions quickly.

The playwright-explore-website skill does a different job. It helps you understand the flow, identify meaningful assertions, and sketch candidate tests before you commit to code.

That distinction matters. Codegen gives you interaction history. Exploration gives you testing intent.

You will usually want both:

use the skill when you need to map the journey
use codegen when you need a quick action scaffold
rewrite the result so the test reads like a real scenario

The `--ui` flag

For day-to-day test development, the interactive UI mode is often more useful than either of those:

pnpm exec playwright test --ui

It opens a live runner where you can step through each test action, inspect the DOM at any point in the run, use the built-in locator picker to find stable selectors, and re-run individual tests without restarting the suite.

The trace viewer is the right tool when you need to diagnose a CI failure after the fact. --ui is the right tool when you are actively writing or debugging a test locally.

Keeping tests reliable

Flaky tests are worse than missing tests.

Once a team stops trusting the suite, the suite stops being useful.

Let Playwright wait for you

Playwright automatically waits for elements to be attached, visible, stable, and ready for interaction before acting on them.

That is one of the main reasons its tests feel less fragile than older browser automation stacks.

Use selectors that survive refactoring

Prefer the most human-facing locator you can.

Good order of preference:

getByRole
getByLabel
getByText
data-testid when you need a stable testing contract

What you want to avoid is binding tests to styling details like .btn-primary or deep CSS paths that change every time the UI gets cleaned up.
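One way to apply that order when triaging locators suggested by an exploration run is a small helper like the one below. It is a hypothetical sketch: ElementFacts and suggestLocator are names invented for this example and are not part of Playwright. It only encodes the order of preference listed above and returns the locator call as text for a human to review.

```typescript
// Hypothetical triage helper: none of these names come from Playwright itself.
interface ElementFacts {
  role?: string; // ARIA role, e.g. "button"
  accessibleName?: string; // accessible name announced with the role
  label?: string; // text of an associated form label
  text?: string; // visible text content
  testId?: string; // value of the data-testid attribute
  css?: string; // last-resort CSS selector
}

// Returns the most human-facing locator the known facts allow.
function suggestLocator(el: ElementFacts): string {
  if (el.role && el.accessibleName) {
    return `getByRole(${JSON.stringify(el.role)}, { name: ${JSON.stringify(el.accessibleName)} })`;
  }
  if (el.label) return `getByLabel(${JSON.stringify(el.label)})`;
  if (el.text) return `getByText(${JSON.stringify(el.text)})`;
  if (el.testId) return `getByTestId(${JSON.stringify(el.testId)})`;
  if (el.css) return `locator(${JSON.stringify(el.css)}) // weak: tied to markup or styling`;
  return "no usable locator: ask for a label or a data-testid";
}

console.log(
  suggestLocator({ role: "button", accessibleName: "Place order", css: ".btn-primary" }),
);
// prints: getByRole("button", { name: "Place order" })
```

The CSS selector is still reported when nothing better exists, but it is flagged so it stands out in review.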

Treat `waitForTimeout` as a smell

If you ever reach for this:

await page.waitForTimeout(2000);

assume the test is still not right.

It may pass on your machine and fail in CI. It may also slow the suite down while still being unreliable.

Better choices are explicit signals:

await page.waitForURL("**/confirmation");
await page.waitForResponse(/api\/orders/);
await expect(page.getByText("Order confirmed")).toBeVisible();

Make the await match the business signal

This is the distinction that trips people up.

Playwright already auto-waits for an element to become actionable before it clicks, fills, or types. Explicit awaits are for the thing that happens after the action.

That means await page.getByRole("button", { name: "Place order" }).click() can prove the button was clickable. It does not, on its own, prove the order was created, the redirect finished, or the confirmation UI appeared.

The right explicit wait depends on the signal that tells you the step is really done:

wait for a URL change when the flow navigates
wait for a response when the backend side effect matters
wait for a loading state to disappear when the page stays put
wait for the final visible UI state when that is what the user would notice

When the click starts the transition, tie the action and the wait together:

await Promise.all([
  page.waitForURL("**/confirmation"),
  page.getByRole("button", { name: "Place order" }).click(),
]);

await expect(page.getByText("Order confirmed")).toBeVisible();

If the page does not navigate, but the server-side effect is the important part, wait for that response first and then assert the UI:

await Promise.all([
  page.waitForResponse(
    (response) =>
      response.url().includes("/api/orders") && response.ok(),
  ),
  page.getByRole("button", { name: "Place order" }).click(),
]);

await expect(page.getByText("Order confirmed")).toBeVisible();

That style is more honest about what the test depends on. Instead of hoping a pause is long enough, you name the signal that proves the journey completed.
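If the response check grows beyond a one-liner, it can be pulled out into a named predicate and unit tested without a browser. The sketch below assumes that creating an order is a POST to a path ending in /api/orders, which is a guess about the backend rather than anything the examples above confirm. RequestLike, ResponseLike, and isOrderCreated are hypothetical names; the interfaces only mirror the url(), ok(), request(), and method() calls that Playwright's Response and Request objects provide.

```typescript
// Structural stand-ins for the parts of Playwright's Response and Request
// that the predicate reads, so it can be exercised with plain fake objects.
interface RequestLike {
  method(): string;
}

interface ResponseLike {
  url(): string;
  ok(): boolean;
  request(): RequestLike;
}

// Assumption: an order is created by a successful POST to .../api/orders.
function isOrderCreated(response: ResponseLike): boolean {
  // Drop any query string or fragment before comparing the path.
  const [path = ""] = response.url().split(/[?#]/);
  return (
    response.request().method() === "POST" &&
    path.endsWith("/api/orders") &&
    response.ok()
  );
}
```

Inside a test, the predicate can be passed straight to page.waitForResponse(isOrderCreated). Compared with a bare includes("/api/orders"), it will not be satisfied by a GET that merely lists orders.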

Parallelism and test isolation

Playwright runs test files in parallel by default across multiple workers. Tests within a single file run in order in the same worker unless you enable fullyParallel.

That is one of the reasons E2E suites can be fast. It is also the most common cause of flaky-in-CI-but-green-locally failures.

Each test should set up its own state and not rely on anything another test created or left behind. Playwright isolates tests at the browser context level by default — each test gets fresh cookies, storage, and session state — but shared external state (databases, APIs, cached files) is still your responsibility.

If a test reliably passes locally but flakes in CI, shared mutable state is the first thing to check. Parallel workers interleave test execution in ways that sequential local runs never expose.
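A common way to keep shared external state from colliding is to give every test its own data. The helper below is a minimal sketch of that idea. uniqueEmail is a hypothetical name, and the e2e+ address format is an assumption about what your backend accepts.

```typescript
// Hypothetical helper: builds an email address that differs per worker and per call.
// workerIndex is meant to come from Playwright's testInfo.workerIndex.
function uniqueEmail(
  workerIndex: number,
  label: string,
  now: number = Date.now(),
): string {
  // Reduce the label to a lowercase, dash-separated slug.
  const slug = label
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return `e2e+${slug}-w${workerIndex}-${now}@example.com`;
}

console.log(uniqueEmail(3, "Checkout: happy path", 1700000000000));
// prints: e2e+checkout-happy-path-w3-1700000000000@example.com
```

In a test body this would be called as uniqueEmail(testInfo.workerIndex, testInfo.title), with testInfo taken from the second argument of the test callback.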

Page objects are a scaling tool

I do not start with page objects on day one.

If a suite has one or two tests, a couple of small helper functions are often enough. Page objects start paying off when multiple tests share the same screen, the same setup, or the same selectors.

The job of a page object is narrow:

keep selectors in one place
expose repeated user actions in product language
reduce copy-paste when the UI changes

A good page object hides selector plumbing. It should not hide the whole test. The scenario, the assertions, and the reason the flow matters should usually stay visible in the test file.

This is the sort of thing I mean:

import type { Page } from "@playwright/test";

export class LoginPage {
  constructor(private readonly page: Page) {}

  // Selectors live here so a UI change is a one-file fix.
  emailField() {
    return this.page.getByLabel("Email");
  }

  passwordField() {
    return this.page.getByLabel("Password");
  }

  // One user-level action in product language; assertions stay in the test.
  async signIn(email: string, password: string) {
    await this.emailField().fill(email);
    await this.passwordField().fill(password);
    await this.page.getByTestId("login-submit").click();
  }
}

The test that uses it can stay focused on the actual journey:

await loginPage.signIn("alice@example.com", "correct horse battery staple");
await expect(page).toHaveURL(/dashboard/);

The trade-off is worth it when the same login flow appears in a few tests. It is not worth it when every page object becomes a giant wrapper around every DOM node on the screen.

My rule of thumb is simple: if two or three tests repeat the same selectors and actions, extract a small page object. Keep it focused on repeated flows, not on building a mini framework.

Fixtures handle setup and teardown

Page objects handle selector and action reuse. Fixtures handle the setup and teardown that wraps tests.

If you find yourself writing the same beforeEach block across multiple test files — creating a page object, navigating to a starting URL, seeding some state — that logic belongs in a fixture.

Playwright’s fixture system lets you declare what a test depends on and compose resources cleanly:

import { test as base } from "@playwright/test";
import { LoginPage } from "./pages/LoginPage";
import { DashboardPage } from "./pages/DashboardPage";

type Fixtures = {
  loginPage: LoginPage;
  dashboardPage: DashboardPage;
};

export const test = base.extend<Fixtures>({
  loginPage: async ({ page }, use) => {
    await use(new LoginPage(page));
  },
  dashboardPage: async ({ page }, use) => {
    await use(new DashboardPage(page));
  },
});

Tests declare what they need and get it automatically:

import { test } from "./fixtures";
import { expect } from "@playwright/test";

test("redirects to dashboard after login", async ({ loginPage, page }) => {
  await loginPage.signIn("alice@example.com", "correct horse battery staple");
  await expect(page).toHaveURL(/dashboard/);
});

Teardown runs automatically after each test, even on failure. That is harder to guarantee with a plain beforeEach / afterEach pair.

Reusing authentication state

Tests that need a logged-in user are one of the most common uses for fixtures. Repeating the full login flow in every test is slow and fragile.

Playwright’s storageState lets you save authenticated browser state (cookies and localStorage, but not session storage) and restore it at the start of a test:

// auth.setup.ts (relative URLs assume baseURL is set in the config)
import { test as setup } from "@playwright/test";

setup("authenticate", async ({ page }) => {
  await page.goto("/login");
  await page.getByLabel("Email").fill("alice@example.com");
  await page.getByLabel("Password").fill("correct horse battery staple");
  await page.getByRole("button", { name: "Sign in" }).click();
  // Wait for the post-login redirect so the session exists before saving it.
  await page.waitForURL("/dashboard");
  await page.context().storageState({ path: ".auth/alice.json" });
});

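// playwright.config.ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  projects: [
    // Runs auth.setup.ts once and writes .auth/alice.json.
    { name: "setup", testMatch: /auth\.setup\.ts/ },
    {
      name: "authenticated",
      use: { storageState: ".auth/alice.json" },
      dependencies: ["setup"],
    },
  ],
});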

Tests in the authenticated project start already logged in. The login flow runs once per suite, not once per test.

Add .auth/ to .gitignore — the saved state contains session tokens that must not be committed to source control.
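When a test in the authenticated project unexpectedly lands on the login page, the saved state file is the first thing I would inspect. The sketch below checks whether a parsed state file still holds an unexpired login cookie. It assumes the app keeps its session in a cookie, and the cookie name "session" is a placeholder. hasLiveSession, StoredCookie, and StoredState are hypothetical names that cover only the fields this check reads.

```typescript
// The subset of a storageState JSON file that this check reads.
interface StoredCookie {
  name: string;
  expires: number; // Unix time in seconds; -1 marks a session cookie
}

interface StoredState {
  cookies: StoredCookie[];
}

// True when the named cookie exists and has not expired at nowSeconds.
function hasLiveSession(
  state: StoredState,
  cookieName: string,
  nowSeconds: number,
): boolean {
  return state.cookies.some(
    (cookie) =>
      cookie.name === cookieName &&
      (cookie.expires === -1 || cookie.expires > nowSeconds),
  );
}

// Example input shaped like the contents of .auth/alice.json (values are made up).
const saved: StoredState = {
  cookies: [{ name: "session", expires: 1700003600 }],
};

console.log(hasLiveSession(saved, "session", 1700000000)); // prints: true
console.log(hasLiveSession(saved, "session", 1700007200)); // prints: false
```

In practice the state would come from JSON.parse on the file contents, with the current time passed as Math.floor(Date.now() / 1000).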

Other useful jobs for Playwright

The interesting part of playwright-explore-website is that it is not limited to authoring E2E tests. But stepping back further: neither is Playwright itself.

The same browser automation API that drives your test suite is also a general tool for anything that needs a real browser.

Exploratory QA and smoke testing

Use the skill for exploratory QA on a staging or preview deployment, reproducing vague browser bugs from support tickets, or checking for console errors and visible breakage after a deploy.

A prompt like “explore this staging URL and document anything broken, slow, or visually unexpected” is often enough to surface real issues before they reach production.

Screenshots and visual documentation

Playwright can take full-page screenshots of any URL:

await page.goto("https://your-app.example/dashboard");
await page.screenshot({ path: "dashboard.png", fullPage: true });

That is useful for generating visual documentation, capturing before/after diffs for design reviews, or producing screenshots for release notes without manual screen-grabbing.

Video recording

You can record video of test runs by setting the video option under use in your Playwright config:

import { defineConfig } from "@playwright/test";

export default defineConfig({
  use: {
    video: "retain-on-failure",
  },
});

With retain-on-failure, videos are only saved when a test fails, which keeps storage manageable while giving you a full replay of what happened. Set it to "on" if you want recordings for every run.

PDF generation

For pages that need to produce printable output, page.pdf() renders the page using print CSS and saves it to disk. PDF generation is only supported in Chromium:

await page.goto("https://your-app.example/invoice/123");
await page.pdf({ path: "invoice-123.pdf", format: "A4" });

That is a straightforward way to automate report generation or verify that print layouts render correctly — without a dedicated PDF library.

Downloading assets

Playwright can intercept and save file downloads:

const [download] = await Promise.all([
  page.waitForEvent("download"),
  page.getByRole("button", { name: "Export CSV" }).click(),
]);
await download.saveAs("export.csv");

That makes it easy to automate data exports, verify download flows in tests, or pull generated assets from a behind-authentication endpoint.

What it is not

I would not use Playwright as a substitute for proper accessibility reviews, performance profiling, or security testing. It is a browser automation tool. It is genuinely good at anything that needs a real browser — but knowing where it stops is as useful as knowing where it starts.

Running Playwright in CI and debugging failures

Running the suite in CI is the obvious part.

The more interesting part is what happens after a failure.

Playwright’s trace viewer is one of the best reasons to use it. When a test fails, you can capture a trace and inspect:

every action
a timeline of the test
screenshots at each step
console output
network requests

The workflow is straightforward:

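# on-first-retry records a trace only when a failed test is retried, so retries must be enabled
pnpm exec playwright test --trace=on-first-retry
# open the trace.zip saved for the failing test (under test-results/ by default)
pnpm exec playwright show-trace trace.zip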

That turns CI failures from guesswork into evidence.
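The same trace setting can live in the config so CI does not depend on a command-line flag. This is a minimal sketch: the retry count and the reporter choice are assumptions for the example, and the CI environment variable is one that most CI providers set but is worth confirming for yours.

```typescript
// playwright.config.ts (sketch; retry count and reporter are assumptions)
import { defineConfig } from "@playwright/test";

export default defineConfig({
  // on-first-retry only records when a failed test is retried,
  // so retries must be enabled for a trace to exist.
  retries: process.env.CI ? 2 : 0,
  use: {
    trace: "on-first-retry",
  },
  // The HTML report attaches the recorded trace to each retried test.
  reporter: process.env.CI ? "html" : "list",
});
```

Uploading the report and test-results directories as CI artifacts then lets you open the trace on your own machine.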

When not to write an E2E test

If your E2E suite becomes the default answer to every testing question, it will become slow, brittle, and expensive to maintain.

Good reasons not to write an E2E test:

the behaviour is already covered clearly in a unit test
the test would take a long time to set up for very little risk reduction
the UI state is local and easy to verify at component level
the failure would not matter much in production

If a test takes a long time to run but almost never catches a meaningful bug, it is probably not earning its place in the suite. Remove it, or replace it with a cheaper test that gives clearer feedback.

A pragmatic place to start

If you are introducing Playwright, or the AI-assisted workflow around it, this is a sensible first week plan.

Pick one critical user journey.
Explore it with playwright-explore-website.
Write one clean Playwright test from that exploration.
Use semantic locators first, then data-testid where needed.
Run that test in CI with traces enabled.
Fix flakiness before adding more coverage.

That is enough to learn the tool, prove the value, and build trust in the approach.

Once that first path is stable, add the next one.

Not everything needs an end-to-end test. The paths that matter do.

If you want the official docs after this overview, start here:
