Deprecated: import from "@langchain/community/document_loaders/web/puppeteer" instead. This entrypoint will be removed in 0.3.0.

A document loader that scrapes web pages using Puppeteer. The class extends BaseDocumentLoader and implements the DocumentLoader interface.

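import { PuppeteerWebBaseLoader } from "@langchain/community/document_loaders/web/puppeteer";

const loader = new PuppeteerWebBaseLoader("https://exampleurl.com", {
  launchOptions: {
    headless: true,
  },
  gotoOptions: {
    waitUntil: "domcontentloaded",
  },
});
// Resolves to a Document whose pageContent is the base64-encoded screenshot.
const screenshot = await loader.screenshot();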


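Implements

  • DocumentLoader

Constructors

  • new PuppeteerWebBaseLoader(webPath: string, options?: PuppeteerWebBaseLoaderOptions)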

Properties

options: undefined | PuppeteerWebBaseLoaderOptions
webPath: string

Methods

  • load(): calls the scrape method and returns the scraped HTML content wrapped in a Document object.

    Returns Promise<Document[]>

    Promise that resolves to an array of Document objects.
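As a rough sketch of the relationship described above (the loading method awaits scrape() and wraps the returned HTML in a Document), the TypeScript below uses a local DocLike interface as a stand-in for LangChain's Document and injects a hypothetical scrape function so it runs without a browser. sketchLoad and DocLike are illustrative names, and the `source` metadata key is an assumption, not something this reference specifies.

```typescript
// Minimal stand-in for LangChain's Document (assumption: only the two
// fields used in this sketch; it is not the real class).
interface DocLike {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical sketch of load() delegating to scrape(): await the scraped
// HTML string, then wrap it in a one-element array of documents.
// The `source` metadata key is an assumption made for illustration.
async function sketchLoad(
  webPath: string,
  scrape: (url: string) => Promise<string>,
): Promise<DocLike[]> {
  const html = await scrape(webPath);
  return [{ pageContent: html, metadata: { source: webPath } }];
}
```

With the real loader, the equivalent call is simply `const docs = await loader.load();`.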

  • scrape(): calls the static _scrape method to scrape the web page at the URL stored in the webPath property.

    Returns Promise<string>

    Promise that resolves to the scraped HTML content of the web page.
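The scraping step can be pictured as the usual Puppeteer sequence: launch a browser, open a page, navigate, read the body HTML, and close the browser. The sketch below is an assumption about that flow, not the loader's actual _scrape code; BrowserLike, PageLike, and scrapeBodyHtml are hypothetical names, and evaluate is shown in its string-expression form so the example runs against fakes without a browser.

```typescript
// Hypothetical minimal interfaces standing in for Puppeteer's Page and
// Browser; they are NOT Puppeteer's real type definitions.
interface PageLike {
  goto(url: string, options?: { waitUntil?: string }): Promise<unknown>;
  // String form of evaluate: the expression is run in the page context.
  evaluate(expression: string): Promise<unknown>;
}

interface BrowserLike {
  newPage(): Promise<PageLike>;
  close(): Promise<void>;
}

// Sketch of a launch -> newPage -> goto -> evaluate -> close scrape.
// `launch` is injected so the flow can run without a real browser.
async function scrapeBodyHtml(
  launch: () => Promise<BrowserLike>,
  url: string,
): Promise<string> {
  const browser = await launch();
  try {
    const page = await browser.newPage();
    // Same waitUntil value as in the loader example above.
    await page.goto(url, { waitUntil: "domcontentloaded" });
    return String(await page.evaluate("document.body.innerHTML"));
  } finally {
    // Release the browser even if navigation or evaluation throws.
    await browser.close();
  }
}
```

Closing the browser in a finally block keeps a failed navigation from leaking a browser process, which matters when scraping many URLs in a row.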

  • screenshot(): takes a screenshot of the web page at webPath and returns it as a Document object whose pageContent property is the screenshot encoded in base64.

    Returns Promise<Document>

    A Document object containing the base64-encoded screenshot of the page.
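Because the screenshot comes back as base64 text in pageContent, getting an image file means decoding it first. The helpers below are a hypothetical sketch using Node's Buffer and fs; ScreenshotDoc is a stand-in for the returned Document, and the PNG check assumes Puppeteer's default screenshot format, which this reference does not state.

```typescript
import { writeFileSync } from "node:fs";

// Minimal stand-in for the Document returned by screenshot(); only
// pageContent is used here.
interface ScreenshotDoc {
  pageContent: string;
}

// The 8-byte signature every PNG file starts with.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Turn the base64 pageContent back into raw image bytes.
function decodeScreenshot(doc: ScreenshotDoc): Buffer {
  return Buffer.from(doc.pageContent, "base64");
}

// Cheap sanity check that the decoded bytes look like a PNG image.
function looksLikePng(bytes: Uint8Array): boolean {
  return (
    bytes.length >= PNG_SIGNATURE.length &&
    PNG_SIGNATURE.every((b, i) => bytes[i] === b)
  );
}

// Decode and write the image; returns the number of bytes written.
function saveScreenshot(doc: ScreenshotDoc, outPath: string): number {
  const bytes = decodeScreenshot(doc);
  writeFileSync(outPath, bytes);
  return bytes.length;
}
```

For example, `saveScreenshot(await loader.screenshot(), "page.png")` would write the decoded bytes to page.png in the working directory.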

  • imports(): static method that imports the Puppeteer module the loader needs. It returns a Promise that resolves to an object exposing Puppeteer's launch function.

    Returns Promise<{
        launch: ((options?: PuppeteerLaunchOptions) => Promise<Browser>);
    }>

    Promise that resolves to an object containing the imported Puppeteer modules.
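Importing Puppeteer inside a static method rather than at module load keeps it an optional dependency: the import only happens when it is actually needed. The sketch below shows that general lazy-import pattern; importOptional and its error message are hypothetical and are not the loader's own implementation.

```typescript
// Sketch of the optional-dependency pattern: import a module lazily at
// call time and rethrow a clearer error if it is not installed.
// The function name and error text are hypothetical.
async function importOptional<T>(moduleName: string): Promise<T> {
  try {
    return (await import(moduleName)) as T;
  } catch (err) {
    throw new Error(
      `Could not load "${moduleName}". Install it as a dependency, ` +
        `e.g. \`npm install ${moduleName}\`. Cause: ${String(err)}`,
    );
  }
}
```

With Puppeteer installed, `const { launch } = await importOptional<typeof import("puppeteer")>("puppeteer");` would yield an object of the same shape as the return type above.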

""