Playwright E2E testing AI skills: JavaScript London talk
Using Playwright and the playwright-explore-website skill to test real user journeys
- Published: 22 April 2026
- Read time: 12 min read
This post is the companion article to my JavaScript London talk, Playwright E2E testing AI skills, hosted in collaboration with NewDay.
The meetup is on Wednesday 29 April 2026, from 6:00 PM to 9:00 PM BST, at NewDay’s offices on 7 Handyside Street in King’s Cross, London. The evening also includes talks from David Whitney and Elham Khani, and you can still register on the event page if you plan to come along.
If you want the slides alongside the article, you can download the slide deck (.pptx).
The short version of the talk is this: Playwright is already a strong default for browser automation. The AI part becomes useful when it helps you explore a real site, identify the journeys that matter, and turn that exploration into better tests.
The shortest useful explanation of E2E testing is still the same: unit tests tell you whether the parts work. End-to-end tests tell you whether the product works.
If a button is hidden behind a cookie banner, a redirect breaks after login, or the browser submits the wrong payload, your unit tests can still be green. Your users will still hit the bug.
That is the gap Playwright helps close.
Why this talk is framed around AI skills
I am not interested in using AI to hide how tests work.
I am interested in using it to shorten the boring parts around test creation: exploration, note-taking, finding likely locators, spotting missing assertions, and drafting candidate cases for review.
That is a much better fit for AI than asking it to spray out a huge test suite and hoping it guessed the right behaviours.
The useful pattern is:
- use AI to explore and propose
- use Playwright to automate and verify
- use humans to decide what is worth keeping
That keeps the browser tests honest.
Where E2E fits in the testing pyramid
You do not want to test everything end to end. That is the fastest route to a slow, noisy, expensive test suite.
You want a small number of high-value E2E tests that protect the user journeys that matter most.
The rule I like is simple: use E2E tests for risk, not for coverage.
If a broken flow would hurt users, revenue, or trust, it deserves E2E coverage. If it is easy to prove with a unit or integration test, it probably does not.
Why Playwright is still the right foundation
There are other good browser automation tools. Playwright is the one I would start with today for most web teams because it removes a lot of the usual friction.
It gives you:
- Chromium, Firefox, and WebKit support out of the box
- a clean TypeScript-first API
- automatic waiting for elements to become actionable
- browser contexts for isolated tests
- a built-in trace viewer for debugging failures in CI
That matters because AI suggestions are only useful if the underlying tool is deterministic enough to turn them into repeatable tests.
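Most of those defaults live in configuration. As a sketch, a minimal `playwright.config.ts` that covers the three engines and captures traces in CI might look like this (the `testDir` path and project names are illustrative, not from the original setup):

```typescript
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  testDir: "./tests", // hypothetical test directory
  // Capture a trace on the first retry so CI failures arrive with evidence.
  use: { trace: "on-first-retry" },
  projects: [
    { name: "chromium", use: { ...devices["Desktop Chrome"] } },
    { name: "firefox", use: { ...devices["Desktop Firefox"] } },
    { name: "webkit", use: { ...devices["Desktop Safari"] } },
  ],
});
```

With that in place, `pnpm playwright test` runs every test against all three browser projects without any per-test wiring.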
What the playwright-explore-website skill is
In my setup, I use a playwright-explore-website GitHub Copilot skill backed by
the Playwright MCP server.
It is a small instruction file that tells Copilot to explore a real site with Playwright, interact with a handful of important flows, document the relevant UI elements and expected outcomes, and then propose test cases based on what it found.
Its job is not to replace a Playwright test file. Its job is to make the step before test writing more grounded in the real browser.
I wrote up the full setup, the original awesome-copilot example I started from, and the local enhancements in The playwright-explore-website Copilot skill.
That write-up stays focused on the skill itself. This post is about how I would use it in a broader Playwright testing workflow.
That makes it a good fit when you are working with:
- an unfamiliar product area
- a staging site you need to smoke test quickly
- a bug report that is missing exact reproduction steps
- a flow where you want candidate locators and assertions before coding the test
A prompt like this is already specific enough to be useful:
```text
Use the playwright-explore-website skill on https://staging.example.com.
Explore sign-in, password reset, and checkout.
For each flow, document the user steps, the likely stable locators,
the expected outcome, and a draft Playwright test case.
```

The value is not the raw prompt. The value is the output: a clearer map of the journey you are about to automate.
How I would use it for E2E testing
My preferred workflow is short, and it is where the AI piece earns its keep.
Instead of starting from a blank file, you start with a tested path through the browser, a list of likely selectors, and a set of outcomes worth asserting. You still need to clean that up into a proper test, but the exploratory work is faster.
A good flow looks like this:
- Pick one critical journey.
- Use the skill to explore it and note what the user actually sees.
- Turn the best candidate path into a small Playwright test.
- Replace weak selectors with semantic locators or `data-testid`.
- Run it in CI with traces and fix the first flaky edge before adding more.
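Distilled into code, a first test born from that exploration might look like this sketch. The staging URL, form labels, and heading text are assumptions you would replace with what the exploration actually found:

```typescript
import { expect, test } from "@playwright/test";

test("signed-in user lands on the dashboard", async ({ page }) => {
  // Hypothetical staging URL and labels taken from the exploration notes.
  await page.goto("https://staging.example.com/sign-in");
  await page.getByLabel("Email").fill("alice@example.com");
  await page.getByLabel("Password").fill("correct horse battery staple");

  // Tie the action to the navigation it triggers.
  await Promise.all([
    page.waitForURL("**/dashboard"),
    page.getByRole("button", { name: "Sign in" }).click(),
  ]);

  // Assert what the user would actually notice.
  await expect(page.getByRole("heading", { name: "Dashboard" })).toBeVisible();
});
```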
That is enough to prove the approach without bloating the suite.
What E2E tests are good at
Playwright is excellent at checking the flows where the browser is part of the problem.
Good targets for E2E tests:
- sign in, sign out, and session refresh flows
- checkout, booking, or other business-critical user journeys
- form submission paths that depend on real navigation or API responses
- cross-browser regressions
- UI issues that only show up once the page is fully assembled
Poor targets for E2E tests:
- pure business logic
- small validation rules
- isolated component states
- anything a fast unit test can already prove clearly
That distinction still matters even when AI is involved. The point is not to replace the rest of the suite. The point is to protect the seams.
Codegen and exploration are different tools
One of the easiest ways to get moving with Playwright is still to record a flow:
```shell
pnpm create playwright@latest
pnpm playwright codegen https://your-app.example
```

Codegen is useful for capturing raw actions quickly.
The playwright-explore-website skill does a different job. It helps you
understand the flow, identify meaningful assertions, and sketch candidate tests
before you commit to code.
That distinction matters. Codegen gives you interaction history. Exploration gives you testing intent.
You will usually want both:
- use the skill when you need to map the journey
- use codegen when you need a quick action scaffold
- rewrite the result so the test reads like a real scenario
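To make the rewrite step concrete, here is the kind of transformation I mean. The "before" lines are typical of raw codegen output; the selectors and the dashboard URL are illustrative:

```typescript
// Before: raw interaction history from codegen.
await page.locator("#email").fill("alice@example.com");
await page.locator("#password").fill("correct horse battery staple");
await page.locator("text=Sign in").click();

// After: the same steps rewritten as testing intent.
await page.getByLabel("Email").fill("alice@example.com");
await page.getByLabel("Password").fill("correct horse battery staple");
await page.getByRole("button", { name: "Sign in" }).click();
await expect(page).toHaveURL(/dashboard/);
```

The rewritten version reads like the scenario it protects, and it asserts an outcome instead of stopping at the last click.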
Keeping tests reliable
Flaky tests are worse than missing tests.
Once a team stops trusting the suite, the suite stops being useful.
Let Playwright wait for you
Playwright automatically waits for elements to be attached, visible, stable, and ready for interaction before acting on them.
That is one of the main reasons its tests feel less fragile than older browser automation stacks.
Use selectors that survive refactoring
Prefer the most human-facing locator you can.
Good order of preference:
- `getByRole`
- `getByLabel`
- `getByText`
- `data-testid` when you need a stable testing contract
What you want to avoid is binding tests to styling details like `.btn-primary` or deep CSS paths that change every time the UI gets cleaned up.
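As a quick contrast, both of these lines can find the same submit button, but only the first survives a stylesheet refactor (the button name is illustrative):

```typescript
// Resilient: tied to what the user sees.
await page.getByRole("button", { name: "Place order" }).click();

// Brittle: tied to styling that can change at any time.
await page.locator(".btn-primary > span").click();
```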
Treat waitForTimeout as a smell
If you ever reach for this:
```typescript
await page.waitForTimeout(2000);
```

assume the test is still not right.
It may pass on your machine and fail in CI. It may also slow the suite down while still being unreliable.
Better choices are explicit signals:
```typescript
await page.waitForURL("**/confirmation");
await page.waitForResponse(/api\/orders/);
await expect(page.getByText("Order confirmed")).toBeVisible();
```

Make the await match the business signal
This is the distinction that trips people up.
Playwright already auto-waits for an element to become actionable before it clicks, fills, or types. Explicit awaits are for the thing that happens after the action.
That means `await page.getByRole("button", { name: "Place order" }).click()` can prove the button was clickable. It does not, on its own, prove the order was created, the redirect finished, or the confirmation UI appeared.
The right explicit wait depends on the signal that tells you the step is really done:
- wait for a URL change when the flow navigates
- wait for a response when the backend side effect matters
- wait for a loading state to disappear when the page stays put
- wait for the final visible UI state when that is what the user would notice
When the click starts the transition, tie the action and the wait together:
```typescript
await Promise.all([
  page.waitForURL("**/confirmation"),
  page.getByRole("button", { name: "Place order" }).click(),
]);

await expect(page.getByText("Order confirmed")).toBeVisible();
```

If the page does not navigate, but the server-side effect is the important part, wait for that response first and then assert the UI:
```typescript
await Promise.all([
  page.waitForResponse(
    (response) => response.url().includes("/api/orders") && response.ok(),
  ),
  page.getByRole("button", { name: "Place order" }).click(),
]);

await expect(page.getByText("Order confirmed")).toBeVisible();
```

That style is more honest about what the test depends on. Instead of hoping a pause is long enough, you name the signal that proves the journey completed.
Page objects are a scaling tool
I do not start with page objects on day one.
If a suite has one or two tests, a couple of small helper functions are often enough. Page objects start paying off when multiple tests share the same screen, the same setup, or the same selectors.
The job of a page object is narrow:
- keep selectors in one place
- expose repeated user actions in product language
- reduce copy-paste when the UI changes
A good page object hides selector plumbing. It should not hide the whole test. The scenario, the assertions, and the reason the flow matters should usually stay visible in the test file.
This is the sort of thing I mean:
```typescript
import { Page } from "@playwright/test";

export class LoginPage {
  constructor(private readonly page: Page) {}

  emailField() {
    return this.page.getByLabel("Email");
  }

  passwordField() {
    return this.page.getByLabel("Password");
  }

  async signIn(email: string, password: string) {
    await this.emailField().fill(email);
    await this.passwordField().fill(password);
    await this.page.getByTestId("login-submit").click();
  }
}
```

The test that uses it can stay focused on the actual journey:

```typescript
await loginPage.signIn("alice@example.com", "correct horse battery staple");
await expect(page).toHaveURL(/dashboard/);
```

The trade-off is worth it when the same login flow appears in a few tests. It is not worth it when every page object becomes a giant wrapper around every DOM node on the screen.
My rule of thumb is simple: if two or three tests repeat the same selectors and actions, extract a small page object. Keep it focused on repeated flows, not on building a mini framework.
Other useful jobs for the same skill
The interesting part of playwright-explore-website is that it is not limited
to authoring E2E tests.
It is also useful for:
- exploratory QA on a staging or preview deployment
- reproducing vague browser bugs from support tickets
- documenting core flows and their expected outcomes
- checking console errors and visible breakage after a deploy
- validating navigation, forms, and content on a marketing site before release
- identifying which user journeys are worth automating next
I would not use it as a substitute for proper accessibility reviews, performance testing, or security testing. It is a browser exploration tool, not a complete quality strategy.
Running Playwright in CI and debugging failures
Running the suite in CI is the obvious part.
The more interesting part is what happens after a failure.
Playwright’s trace viewer is one of the best reasons to use it. When a test fails, you can capture a trace and inspect:
- every action
- a timeline of the test
- screenshots at each step
- console output
- network requests
The workflow is straightforward:
```shell
pnpm playwright test --trace=on-first-retry
pnpm playwright show-trace trace.zip
```

That turns CI failures from guesswork into evidence.
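For completeness, here is a hedged sketch of what the CI side might look like in GitHub Actions. The workflow name, Node version, and artifact path are assumptions to adapt, not a prescribed setup:

```yaml
name: e2e
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - run: pnpm install
      - run: pnpm playwright install --with-deps
      - run: pnpm playwright test --trace=on-first-retry
      # Keep the traces when a run fails so they can be inspected locally.
      - uses: actions/upload-artifact@v4
        if: failure()
        with:
          name: playwright-traces
          path: test-results/
```

Downloading the artifact and opening it with `pnpm playwright show-trace` gives you the same evidence locally that the CI run saw.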
When not to write an E2E test
If your E2E suite becomes the default answer to every testing question, it will become slow, brittle, and expensive to maintain.
Good reasons not to write an E2E test:
- the behaviour is already covered clearly in a unit test
- the test would take a long time to set up for very little risk reduction
- the UI state is local and easy to verify at component level
- the failure would not matter much in production
If a test takes a long time to run but almost never catches a meaningful bug, it is probably not earning its place in the suite. Remove it, or replace it with a cheaper test that gives clearer feedback.
A pragmatic place to start
If you are introducing Playwright, or the AI-assisted workflow around it, this is a sensible first week plan.
- Pick one critical user journey.
- Explore it with `playwright-explore-website`.
- Write one clean Playwright test from that exploration.
- Use semantic locators first, then `data-testid` where needed.
- Run that test in CI with traces enabled.
- Fix flakiness before adding more coverage.
That is enough to learn the tool, prove the value, and build trust in the approach.
Once that first path is stable, add the next one.
Not everything needs an end-to-end test. The paths that matter do.
If you want the official docs after this overview, start with the Playwright documentation at playwright.dev.