Puppeteer: Clicking, Typing and Waiting

Updated 2020-07-21

In this beginner-friendly tutorial you will learn how to wait for pages and elements to load as we automate a search on DuckDuckGo. By the end you will know how to interact with dynamic websites and web applications.

Project Setup

In part 1 of this series we learned how to set up a project using Puppeteer. Make sure you have created a new project and installed Puppeteer by running npm install puppeteer in a terminal from your project folder.

Your package.json file should look something like this:

{
    "name": "puppeteer-click-wait",
    "version": "1.0.0",
    "dependencies": {
        "puppeteer": "^5.0.0"
    }
}

Create a file named scraper.js in your project folder and add the code required to launch a browser and navigate to a website.

// Require the puppeteer library
const puppeteer = require('puppeteer');

// To use await we wrap our code in an async function
async function scrape() {
    // Create the browser without headless mode
    const browser = await puppeteer.launch({ headless: false });

    // Create a new page (tab) in the browser
    const page = await browser.newPage();

    // Navigate to a website
    await page.goto('https://returnstring.com');

    // CODE USING THE BROWSER HERE

    // Close the browser
    await browser.close();
}

// Run our function
scrape().catch(console.error);

How this code works was described in part 1 of this series.

Waiting with Puppeteer

A large part of making sure our browser automation code runs as expected is waiting for pages and elements to load before we try to interact with them. If we don't wait for the DOM or the element we are working with, our script may work when the page loads fast and then fail, seemingly at random, when a response is slow.

Waiting on Page Change

Using page.goto is the same as typing a URL into the browser's address bar. By default, Puppeteer waits for the browser's load event before continuing. However, in a world built on JavaScript, you will likely want to specify some wait options.

The options for page.goto allow us to pass an object with timeout and waitUntil. The timeout is how long Puppeteer will wait, in milliseconds, for the page to load before it throws an error and stops running. For waitUntil we can choose from: load, domcontentloaded, networkidle0, networkidle2.

load and domcontentloaded wait for the browser events with the same names. networkidle0 waits until there have been no network connections for at least 500 milliseconds, and networkidle2 waits until there have been no more than 2 network connections for at least 500 milliseconds. These options can be used alone as a string or together in an array of strings.

const pageLoadOptions = {
    timeout: 10000,
    waitUntil: ['domcontentloaded', 'networkidle0']
};
await page.goto('https://returnstring.com', pageLoadOptions);

This configuration would wait a maximum of 10 seconds before throwing an error. The navigation is considered successful once the browser's DOMContentLoaded event has fired and there have been no network connections for at least half a second.
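If the timeout is reached, page.goto rejects and, left unhandled, ends the script. As a rough sketch, the hypothetical helper below (gotoOrReport is not part of Puppeteer) catches that case and returns false instead. It assumes that Puppeteer's timeout errors carry the name TimeoutError, so check this against the version you have installed.

```javascript
// Hypothetical helper (not part of Puppeteer): navigate to a URL and
// report a timeout instead of letting it end the whole script.
async function gotoOrReport(page, url) {
    const pageLoadOptions = {
        timeout: 10000,
        waitUntil: ['domcontentloaded', 'networkidle0']
    };

    try {
        await page.goto(url, pageLoadOptions);
        return true;
    } catch (error) {
        // Assumption: Puppeteer's timeout errors are named 'TimeoutError'
        if (error.name === 'TimeoutError') {
            console.error(`Gave up waiting for ${url}: ${error.message}`);
            return false;
        }

        // Any other error (for example an invalid URL) is re-thrown unchanged
        throw error;
    }
}

// Usage inside scrape():
// const loaded = await gotoOrReport(page, 'https://returnstring.com');
```

Returning a boolean keeps the calling code simple: it can retry, skip the page, or close the browser cleanly when the page did not load in time.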

Waiting for Elements

It can be very useful to wait for the specific elements you want to interact with. This is especially valuable when working with JavaScript-heavy web applications where the page changes dynamically. To wait for an element to appear on the page we can use page.waitForSelector, which accepts any CSS selector.

await page.waitForSelector('article.content');

This code would wait for an <article> tag with a class of content to show up on the webpage before continuing. You could wait for anything you can describe as a CSS selector.
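page.waitForSelector also accepts an options object as its second argument. The sketch below wraps it in a hypothetical helper named waitForVisible (the name and the 5 second default are only illustrative) that uses the visible and timeout options, so the wait only finishes once the element is actually displayed.

```javascript
// Hypothetical helper: wait until an element matching the selector is
// visible on the page and return its element handle.
async function waitForVisible(page, selector, timeout = 5000) {
    // visible: true means being present in the DOM is not enough; the element
    // must also not be hidden with display: none or visibility: hidden
    return page.waitForSelector(selector, { visible: true, timeout });
}

// Usage inside scrape():
// const article = await waitForVisible(page, 'article.content');
```

If the element does not become visible within the timeout, the returned promise rejects, just like a plain page.waitForSelector call would.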

Automating DuckDuckGo

To demonstrate these concepts in action we will create a small scraper that navigates to DuckDuckGo and waits for the search box to load. DuckDuckGo is a privacy-oriented alternative to Google.

Waiting for DuckDuckGo

We will use page.goto with an option to wait for the DOM to load and then wait for an input box to appear on the page.

// Navigate to a website and wait for DOM
await page.goto('https://duckduckgo.com', {
    waitUntil: 'domcontentloaded'
});

// Wait for an input element to appear
await page.waitForSelector('input');

This should navigate to DuckDuckGo, then wait until the DOM is loaded as well as wait for an <input> element to appear. This is perhaps overkill for our specific example, but the pattern will serve you well as you automate more complex websites and applications.

Typing in the Search Field

To type in the input field we have waited for, we can use Puppeteer's page method page.type which takes a CSS selector to find the element you want to type in and a string you wish to type in the field.

Lastly, we can take a screenshot with page.screenshot, or pause the script with page.waitFor, so that we can see the results of our code.

// Type in first input element
await page.type('input', 'puppeteer is awesome');

// Takes a screenshot so you can see the result
await page.screenshot({ path: 'screenshot.jpg' });

This will attempt to type puppeteer is awesome in the first input box it finds on the page and then take a screenshot of the page named screenshot.jpg in your project folder. Taking screenshots was described in part 2 of this series.
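page.type takes an optional third argument with a delay, in milliseconds, between key presses, which can help on pages that react to every keystroke. The sketch below combines the wait and the typing in one hypothetical helper; the name typeSlowly and the 100 millisecond default are assumptions made for illustration.

```javascript
// Hypothetical helper: wait for a field to exist, then type into it
// with a short pause between key presses.
async function typeSlowly(page, selector, text, delay = 100) {
    // Make sure the field is in the DOM before typing
    await page.waitForSelector(selector);

    // delay is the time in milliseconds between each key press
    await page.type(selector, text, { delay });
}

// Usage inside scrape():
// await typeSlowly(page, 'input', 'puppeteer is awesome');
```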

The Code So Far

// Require the puppeteer library
const puppeteer = require('puppeteer');

// To use await we wrap our code in an async function
async function scrape() {
    // Create the browser without headless mode
    const browser = await puppeteer.launch({ headless: false });

    // Create a new page (tab) in the browser
    const page = await browser.newPage();

    // Navigate to a website and wait for DOM
    await page.goto('https://duckduckgo.com', {
        waitUntil: 'domcontentloaded'
    });

    // Wait for an input element to appear
    await page.waitForSelector('input');

    // Type in first input element
    await page.type('input', 'puppeteer is awesome');

    // Take a screenshot
    await page.screenshot({ path: 'screenshot.jpg' });

    // Close the browser
    await browser.close();
}

// Run our function
scrape().catch(console.error);

Clicking and Waiting

We have filled out the input field on DuckDuckGo. Now we need to click the submit button and wait for the search result page to load.

To click an element, we can use a CSS selector with page.click. We can select the first submit button on the page by using the selector input[type="submit"].

await page.click('input[type="submit"]');

To wait for the page to change after we click the submit button, we use page.waitForNavigation. If you need more control, it accepts the same timeout and waitUntil options as page.goto, which we explored earlier in this article.

For page.waitForNavigation to work as intended, it has to start listening before the click triggers the navigation. However, if we await it on its own first, the script stalls until it times out, because the click that causes the navigation never runs. If we instead await page.click first, the navigation may already have finished by the time we start waiting for it. To avoid both problems we can put the two calls in an array and pass it to Promise.all, which lets us wait for multiple actions in one go.

// Allows us to await multiple actions
await Promise.all([
    // Waits for navigation and no active network connections
    page.waitForNavigation({ waitUntil: 'networkidle0' }),
    // Clicks on first submit button
    page.click('input[type="submit"]')
]);

This instructs Puppeteer to start listening for the page change and then click the submit button. Puppeteer starts both actions and then waits for both to be done, regardless of which one finishes first.
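Since this pattern comes up every time a click loads a new page, it can be convenient to wrap it in a small function. The sketch below is one possible way to do that; clickAndWaitForNavigation is a hypothetical name, not a Puppeteer method.

```javascript
// Hypothetical helper: click an element and wait for the navigation
// that the click triggers.
async function clickAndWaitForNavigation(page, selector) {
    // The array entries are evaluated in order, so waitForNavigation
    // starts listening before the click is sent
    const [response] = await Promise.all([
        page.waitForNavigation({ waitUntil: 'networkidle0' }),
        page.click(selector)
    ]);

    // The first result is whatever waitForNavigation resolved with
    return response;
}

// Usage inside scrape():
// await clickAndWaitForNavigation(page, 'input[type="submit"]');
```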

The Code

This is the full code for our DuckDuckGo example.

// Require the puppeteer library
const puppeteer = require('puppeteer');

// To use await we wrap our code in an async function
async function scrape() {
    // Create the browser without headless mode
    const browser = await puppeteer.launch({ headless: false });

    // Create a new page (tab) in the browser
    const page = await browser.newPage();

    // Navigate to a website and wait for DOM
    await page.goto('https://duckduckgo.com', {
        waitUntil: 'domcontentloaded'
    });

    // Wait for an input element to appear
    await page.waitForSelector('input');

    // Type in first input element
    await page.type('input', 'puppeteer is awesome');

    // Allows us to await multiple actions
    await Promise.all([
        // Waits for navigation and no active network connections
        page.waitForNavigation({ waitUntil: 'networkidle0' }),

        // Clicks on first submit button
        page.click('input[type="submit"]')
    ]);

    // Take a screenshot
    await page.screenshot({ path: 'screenshot.jpg' });

    // Close the browser
    await browser.close();
}

// Run our function
scrape().catch(console.error);

Next Steps

You have successfully created a scraper that automates searching while waiting for navigation between pages, both by page.goto and by interacting with a submit button like a real user would.

Continue on to the next part of the series to learn how to find elements on the page and access their values.