Puppeteer: Getting Elements and Values

Updated 2020-07-21

Learn how to find elements on websites and read their values using Puppeteer and Node with this beginner-friendly tutorial. By the end, you will know how to access the values you see on a web page.

Project Setup

We learned how to set up a basic project with Puppeteer in part 1 of this series. Make sure you have created a new project and installed Puppeteer by running npm install puppeteer from a terminal in your project folder.

The package.json file should look something like this:

{
    "name": "puppeteer-elements-values",
    "version": "1.0.0",
    "dependencies": {
        "puppeteer": "^5.0.0"
    }
}

Create a file named scraper.js in your project folder and add the code required to launch a browser and navigate to a website.

// Require the puppeteer library
const puppeteer = require('puppeteer');

// To use await we wrap our code in an async function
async function scrape() {
    // Create the browser without headless mode
    const browser = await puppeteer.launch({ headless: false });

    // Create a new page (tab) in the browser
    const page = await browser.newPage();

    // Navigate to a website
    await page.goto(
        'https://returnstring.com/articles/puppeteer-elements-and-values'
    );

    // CODE USING THE BROWSER HERE

    // Close the browser
    await browser.close();
}

// Run our function
scrape().catch(console.error);

See part 1 of this series for a description of how this code works.

Notice that for this tutorial we are navigating to this article's own page with page.goto.

Getting Element Handles

Element handles are references to elements on the page, managed by Puppeteer. They come with some extra methods and different ways of accessing values, as well as some protection against the element disappearing before we are done with it.

Single Element

To get a single element we can use the $ method on the page object, as well as on most element handles. The $ method takes a CSS selector as an argument. It is roughly equivalent to JavaScript's native querySelector: it finds the first element on the page matching the selector, or returns null if nothing was found.

const h1Handle = await page.$('h1');

This looks for the first <h1> tag on the page and stores it in our constant h1Handle for later use.

Multiple Elements

To get multiple elements we can use the $$ method, which is available on the same objects as $. Using $$ is roughly equivalent to JavaScript's native querySelectorAll. This method returns an array of element handles, or an empty array if nothing was found.

const liHandles = await page.$$('li');

This finds all the <li> tags on the page and stores them in our constant liHandles as an array. We can loop over the array of element handles with any of the normal JavaScript array methods, like map or forEach, as shown below.
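
For example, here is a minimal sketch that confirms liHandles behaves like a normal array (the <li> selector is just an illustration; the page you scrape may contain any number of list items):

// Find all the <li> tags on the page
const liHandles = await page.$$('li');

// liHandles is a regular array, so length and forEach work as expected
console.log(`Found ${liHandles.length} <li> elements`);
liHandles.forEach((handle, index) => {
    console.log(`Got element handle #${index}`);
});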

Getting Values from Handles

Element handles are not elements in the normal JavaScript or HTML sense; they are part of Puppeteer. Normally you would get the value of innerHTML simply by calling element.innerHTML. With Puppeteer's element handles, we have to use the getProperty method to access the values of properties on our element.

await SOME_HANDLE.getProperty('innerHTML');

Using getProperty returns what Puppeteer calls a JS handle. To retrieve our value we need to call jsonValue() on the JS handle, because Puppeteer passes values to and from the browser using JSON.stringify and JSON.parse.

Example: H1 HTML

To demonstrate using the $ method, let us read the innerHTML of the first <h1> element we find on the page and log it to the console with console.log.

// Find the first h1 on the page
const elementHandle = await page.$('h1');

// Get the element's innerHTML as JS handle
const jsHandle = await elementHandle.getProperty('innerHTML');

// Deserialize our value from the JS handle
const plainValue = await jsHandle.jsonValue();

// Log out our h1's innerHTML
console.log(plainValue);

Add this code to your scraper.js file in place of the CODE USING THE BROWSER HERE comment. Running node scraper.js from your project folder should then log the title of this article to your terminal.

Full Code Example

// Require the puppeteer library
const puppeteer = require('puppeteer');

// To use await we wrap our code in an async function
async function scrape() {
    // Create the browser without headless mode
    const browser = await puppeteer.launch({ headless: false });

    // Create a new page (tab) in the browser
    const page = await browser.newPage();

    // Navigate to a website
    await page.goto(
        'https://returnstring.com/articles/puppeteer-elements-and-values'
    );

    // Find the first h1 on the page
    const elementHandle = await page.$('h1');

    // Get the element's innerHTML as JS handle
    const jsHandle = await elementHandle.getProperty('innerHTML');

    // Deserialize our value from the JS handle
    const plainValue = await jsHandle.jsonValue();

    // Log out our h1's innerHTML
    console.log(plainValue);

    // Close the browser
    await browser.close();
}

// Run our function
scrape().catch(console.error);
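
As an extra exercise, you can combine $$ with the getProperty and jsonValue pattern from above to read values from multiple elements. The following sketch (assuming the page contains <li> elements and using innerText as the property to read) would go inside the scrape function in place of the h1 code:

// Find all the <li> tags on the page
const liHandles = await page.$$('li');

// Read the innerText of each element handle, one at a time
for (const liHandle of liHandles) {
    // Get the element's innerText as a JS handle
    const textHandle = await liHandle.getProperty('innerText');

    // Deserialize the value from the JS handle
    const text = await textHandle.jsonValue();

    // Log out the list item's text
    console.log(text);
}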

Next Steps

You have successfully created a scraper that finds elements on a page, both a single element with $ and multiple elements with $$, and reads their values using getProperty and jsonValue.

Continue on to the next part of the series to learn how to run any JavaScript you want on the page through Puppeteer.