Puppeteer: Better Project Setup

Learn to fortify your Puppeteer code with error handling for more stability and easier debugging in this beginner-friendly tutorial. In this final part of the series we will look at improving our setup.

In part 1 of this series we learned to get up and running with Puppeteer. For the last part of this series, it seems only fitting we now go over some improvements to our initial script. This will help us catch and display errors as well as ensure the browser always closes to avoid memory leaks.

Error Handling

Why Errors Happen

There are a lot of reasons errors can happen. Generally, they involve a mistake in the code we have written. However, when we automate a browser we often rely on connections over a network.

Network connections are unreliable. Sometimes network requests fail, sometimes a packet gets lost, sometimes everything is just slow, and none of it is under our control. To write resilient tests and scrapers, we need to keep considering how our code can fail. This is why we learned a resilient way of clicking and waiting in part 3 of this series.

Try/Catch with Await

With JavaScript's try and catch blocks we can attempt to run some code and get a chance to inspect and handle any error it throws. If the code inside a try block throws an error, control jumps to the catch block. This comes in handy for logging errors as well as recovering from them when possible.

try {
    // Some code that may fail
    await someAsyncFunction();
} catch (error) {
    // Your chance to handle the error
    console.error(error);
}

Note that this only works for async functions if you use await within the try block. If we do not await the async function, Node (or the browser) will continue executing the next line of code without waiting for the function to finish. That means we may have already left the try block by the time the function throws an error.
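A minimal sketch of the difference. Here mightFail is a stand-in for any async operation that can fail, such as a Puppeteer call over the network:

```javascript
// mightFail stands in for any async operation that can fail,
// such as a Puppeteer call over the network.
async function mightFail() {
    throw new Error('network timeout');
}

async function withAwait() {
    try {
        await mightFail(); // awaited: the error is thrown inside try
        return 'no error';
    } catch (error) {
        return `caught: ${error.message}`;
    }
}

async function withoutAwait() {
    try {
        // Not awaited: we leave the try block before the promise rejects.
        // The .catch here only silences Node's unhandled rejection crash.
        mightFail().catch(() => {});
        return 'no error';
    } catch (error) {
        return `caught: ${error.message}`; // never reached
    }
}
```

Calling withAwait() resolves to 'caught: network timeout', while withoutAwait() resolves to 'no error' even though mightFail failed.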

Catching Promises

We can use .catch on any function that returns a Promise, which includes all async functions.

Normally Node prints a lot of error messages to the terminal whenever a Promise is rejected and nothing handles the rejection; this is what we have avoided so far by using .catch(console.error). We can use .catch to make sure we handle every error an async function throws, even if we do not await the function inside a try block.

Having the .catch() when we call scrape() will allow us to process the error with a callback. A callback is a function that will be executed later, in this case with the error as an argument.

function loggingFunction(error) {
    // Our chance to handle or log the error
    console.debug('Oh no! An error happened!');
    console.error(error);
}

scrape().catch(loggingFunction);

We could use an arrow function instead of loggingFunction from the example, or simply pass in the console.error function as we have done so far.
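Side by side, the three forms look like this. The scrape function here is a stand-in that always fails, just so each .catch variant has an error to handle:

```javascript
// A stand-in for the scrape function from this series; it always
// fails so we can see each .catch variant fire.
async function scrape() {
    throw new Error('page crashed');
}

// 1. A named callback
function loggingFunction(error) {
    console.error('Oh no! An error happened!');
    console.error(error);
}
scrape().catch(loggingFunction);

// 2. An arrow function
scrape().catch((error) => console.error('Oh no!', error.message));

// 3. Passing console.error directly
scrape().catch(console.error);
```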

Any of these solutions will catch errors thrown in scrape and log them neatly to the console. We will likely want further try/catch blocks in our code if we want to recover from an error. However, having a .catch on our async function call provides a great safety net: we will catch and log any error we might not otherwise handle inside the function.

Always Close The Browser

One big pitfall of browser automation is leaving the browser open after an error happens. Every orphaned browser process wastes memory! To avoid this we can use the finally block.

Finally Block

The finally block is attached to a try block, like a catch block. The code inside the finally block always runs last, whether the code inside the try block throws an error or completes successfully.

try {
    // Some code that may fail
    await someAsyncFunction();
} catch (error) {
    // Your chance to handle the error
    console.error(error);
} finally {
    // This code will always run
    cleanUpThings();
}
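We can verify that ordering with a small runnable sketch. The run helper is hypothetical; it simply records which blocks execute, with and without an error:

```javascript
// Records the order the blocks run in, with and without an error.
async function run(shouldFail) {
    const steps = [];
    try {
        steps.push('try');
        if (shouldFail) {
            throw new Error('boom');
        }
    } catch (error) {
        steps.push('catch');
    } finally {
        steps.push('finally'); // runs in both cases
    }
    return steps;
}
```

run(true) resolves to ['try', 'catch', 'finally'] and run(false) resolves to ['try', 'finally']: the finally block runs either way.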

Closing the Browser

Using this pattern we can wrap everything after we create the browser in a try block.

We can, optionally, add a catch block to handle or log our errors.

Most importantly we should add a finally block to make sure, no matter what happens, our browser always closes.

async function scrape() {
    // Create the browser
    const browser = await puppeteer.launch({ headless: false });

    // Wrap scraping/testing code in try
    try {
        // Create a new page (tab) in the browser
        const page = await browser.newPage();

        // CODE USING PUPPETEER HERE

        // Catch and log errors
    } catch (error) {
        // Your chance to handle errors
        console.error(error);
    } finally {
        // Always close the browser
        await browser.close();
    }
}

Note that if we launched the browser inside the try block, we would not have access to it in the finally block.

If our browser cannot launch, the error will be caught by scrape().catch(), not by the catch (error) { ... } block we just added.
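We can see this layering without a real browser by swapping in a stand-in for the launch step. Here launchBrowser is a hypothetical function that fails the way puppeteer.launch might:

```javascript
// launchBrowser is a hypothetical stand-in for puppeteer.launch
// that fails the way a real launch might.
async function launchBrowser() {
    throw new Error('could not find Chrome');
}

async function scrape() {
    // Fails before we ever enter the try block below...
    const browser = await launchBrowser();
    try {
        // ...so none of this runs.
    } finally {
        await browser.close();
    }
}

// ...and the error lands in this outer .catch instead.
scrape().catch((error) => console.error('Launch failed:', error.message));
```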

The Code So Far

// Require the puppeteer library
const puppeteer = require('puppeteer');

async function scrape() {
    // Create the browser
    const browser = await puppeteer.launch({ headless: false });

    // Wrap scraping/testing code in try
    try {
        // Create a new page (tab) in the browser
        const page = await browser.newPage();

        // CODE USING PUPPETEER HERE

        // Catch and log errors
    } catch (error) {
        // Your chance to handle errors
        console.error(error);
    } finally {
        // Always close the browser
        await browser.close();
    }
}

// Run our function
scrape().catch(console.error);

Conclusion

In this article we have fortified our initial project template for Puppeteer to handle errors. If you have followed along with the entire series you have the basic skills needed to automate most of what browsers can do!

I urge you to go try out your skills on small projects that will serve you or your business.