Dynamic Sitemaps in Next.js

In this post, we will see how to create dynamic sitemaps in Next.js.

What Is a Sitemap?

A sitemap is an XML file that lists the pages on your website. It tells search engines such as Google which pages are important and should be crawled and indexed.
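To make the format concrete, here is a minimal sketch that builds a sitemap document by hand from a list of URLs. The helper names `escapeXml` and `buildSitemapXml` are hypothetical and only illustrate the structure of the file; in the rest of this post the file is generated for us.

```javascript
// Hypothetical helper: escapes characters that are not allowed raw inside XML text.
function escapeXml(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical helper: builds a minimal sitemap from { loc, lastmod? } entries.
function buildSitemapXml(entries) {
  const urls = entries
    .map(({ loc, lastmod }) => {
      const lastmodTag = lastmod ? `<lastmod>${escapeXml(lastmod)}</lastmod>` : '';
      return `  <url><loc>${escapeXml(loc)}</loc>${lastmodTag}</url>`;
    })
    .join('\n');

  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    urls,
    '</urlset>',
  ].join('\n');
}

const xml = buildSitemapXml([
  { loc: 'https://example.com/', lastmod: '2024-01-01' },
  { loc: 'https://example.com/blog?page=1&sort=new' },
]);
console.log(xml);
```

Each `<url>` element holds one absolute URL in `<loc>`, optionally followed by metadata such as `<lastmod>`.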

Add a Sitemap to a Next.js Website

To add a dynamic sitemap to a Next.js website, we will use the next-sitemap package.

Installation

yarn add next-sitemap
# or
npm install next-sitemap

Create config file

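next-sitemap requires a basic config file, next-sitemap.config.js, in your project root:

// next-sitemap.config.js

const siteUrl = 'https://example.com';

/** @type {import('next-sitemap').IConfig} */
const config = {
  siteUrl,
};

// CommonJS export; use `export default config` instead if your project is an ES module
module.exports = config;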
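Hard-coding the site URL is fine for a single deployment, but the value often differs between staging and production. As a rough sketch, the URL could be read from an environment variable with a fallback. The variable name `SITE_URL` and the helper `resolveSiteUrl` are assumptions made for this example, not something next-sitemap requires.

```javascript
// Hypothetical helper: picks the site URL from an env-like object and
// normalizes it by removing any trailing slash.
function resolveSiteUrl(env, fallback = 'https://example.com') {
  const raw = (env.SITE_URL || fallback).trim();
  const url = new URL(raw); // throws if the value is not a valid absolute URL
  return url.origin + url.pathname.replace(/\/+$/, '');
}

const stagingUrl = resolveSiteUrl({ SITE_URL: 'https://staging.example.com/' });
const defaultUrl = resolveSiteUrl({});
console.log(stagingUrl);
console.log(defaultUrl);
```

In the config file, the same helper would be called with `process.env` and its result assigned to `siteUrl`.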

Build the sitemap

Add next-sitemap as the postbuild script in the scripts section of your package.json:

{
  "scripts": {
    "build": "next build",
    "postbuild": "next-sitemap"
  }
}

To build the sitemap, run npm run build or yarn build. This builds the website and then runs the postbuild script, which creates a sitemap.xml file in the public folder.

The sitemap.xml contains the list of URLs generated.

To tell Google where to find the sitemap.xml file and which URLs the crawler can access on your website, you have to add a robots.txt file.

Create a robots.txt file

To create a robots.txt file automatically, add the generateRobotsTxt option to the config file:

// next-sitemap.config.js

const siteUrl = 'https://example.com';

/** @type {import('next-sitemap').IConfig} */
const config = {
  siteUrl,
  generateRobotsTxt: true, // generates robots.txt
};

module.exports = config;

Prevent Google from indexing certain web pages

In some cases, we want to prevent Google from indexing certain pages on our website.

The first step is to exclude a certain page URL from the sitemap list.

// next-sitemap.config.js

const siteUrl = 'https://example.com';

/** @type {import('next-sitemap').IConfig} */
const config = {
  siteUrl,
  generateRobotsTxt: true,
  exclude: ['/protected-page', '/secret-page'], // exclude here
};

module.exports = config;
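The next-sitemap documentation describes exclude as a list of relative paths that may contain wildcards, for example '/private/*'. The sketch below only approximates how such patterns could be matched against a path; `wildcardToRegExp` and `isExcluded` are hypothetical helpers written for this post, not the library's own implementation, and the '/private/*' pattern is an invented example.

```javascript
// Illustrative only: converts a wildcard pattern such as '/private/*' to a RegExp.
function wildcardToRegExp(pattern) {
  // Escape regex metacharacters except '*', which acts as the wildcard.
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp(`^${escaped.replace(/\*/g, '.*')}$`);
}

// Returns true when the path matches at least one exclusion pattern.
function isExcluded(path, patterns) {
  return patterns.some((pattern) => wildcardToRegExp(pattern).test(path));
}

const patterns = ['/protected-page', '/secret-page', '/private/*'];
console.log(isExcluded('/protected-page', patterns));
console.log(isExcluded('/private/settings', patterns));
console.log(isExcluded('/blog/hello', patterns));
```

A pattern without a wildcard only matches that exact path, so '/protected-page' does not exclude '/protected-page/details' in this sketch.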

Even if you exclude certain pages from the sitemap, Google can still discover them through links. To tell crawlers to stay away from these pages, add disallow policies to the robots.txt file through robotsTxtOptions in the config file. Note that a Disallow rule blocks crawling, not indexing itself; a page that must never appear in search results also needs a noindex directive.

// next-sitemap.config.js

const siteUrl = 'https://example.com';

/** @type {import('next-sitemap').IConfig} */
const config = {
  siteUrl,
  generateRobotsTxt: true,
  exclude: ['/protected-page', '/secret-page'], // exclude here
  robotsTxtOptions: {
    policies: [
      { userAgent: '*', disallow: '/protected-page' }, // not crawled
      { userAgent: '*', disallow: '/secret-page' }, // not crawled
      { userAgent: '*', allow: '/' }, // crawl the rest of the pages
    ],
  },
};

module.exports = config;

The above configuration will generate a robots.txt file like this:

# *
User-agent: *
Disallow: /protected-page

# *
User-agent: *
Disallow: /secret-page

# *
User-agent: *
Allow: /

# Host
Host: https://example.com

# Sitemaps
Sitemap: https://example.com/sitemap.xml
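To see how each policy object maps to a block in that file, here is a small sketch that renders the same layout from plain data. `renderRobotsTxt` is a hypothetical function written for illustration; it mirrors the output shown above but is not how next-sitemap is implemented.

```javascript
// Hypothetical renderer: turns policy objects into robots.txt text.
function renderRobotsTxt({ siteUrl, policies, sitemaps }) {
  const policyBlocks = policies.map((policy) => {
    const lines = [`# ${policy.userAgent}`, `User-agent: ${policy.userAgent}`];
    // allow/disallow may each be a single path or an array of paths.
    for (const path of [].concat(policy.allow ?? [])) lines.push(`Allow: ${path}`);
    for (const path of [].concat(policy.disallow ?? [])) lines.push(`Disallow: ${path}`);
    return lines.join('\n');
  });

  const hostBlock = `# Host\nHost: ${siteUrl}`;
  const sitemapBlock = ['# Sitemaps', ...sitemaps.map((url) => `Sitemap: ${url}`)].join('\n');

  // Blocks are separated by a blank line.
  return [...policyBlocks, hostBlock, sitemapBlock].join('\n\n') + '\n';
}

const robotsTxt = renderRobotsTxt({
  siteUrl: 'https://example.com',
  policies: [
    { userAgent: '*', disallow: '/protected-page' },
    { userAgent: '*', disallow: '/secret-page' },
    { userAgent: '*', allow: '/' },
  ],
  sitemaps: ['https://example.com/sitemap.xml'],
});
console.log(robotsTxt);
```

Every entry in policies becomes its own User-agent group, which is why the generated file repeats User-agent: * three times.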

Generating dynamic/server-side sitemaps

The robots.txt above tells Google to crawl the URLs listed in the sitemap.xml file. URLs that are sourced at request time, for example from a CMS, can be served through a server-side sitemap instead.

To generate a dynamic/server-side sitemap, create a pages/server-sitemap.xml/index.tsx page and add the following content:

// pages/server-sitemap.xml/index.tsx

import type { GetServerSideProps } from 'next';
import { getServerSideSitemap, ISitemapField } from 'next-sitemap';

// Note: this call signature matches older next-sitemap releases. In v4 and later,
// the pages-directory helper is exported as getServerSideSitemapLegacy.
export const getServerSideProps: GetServerSideProps = async (ctx) => {
  // Method to source URLs from a CMS
  // const response = await fetch('https://example.com/api')
  // const fields = await response.json()

  const fields: ISitemapField[] = [
    {
      loc: 'https://example.com', // Absolute URL
      lastmod: new Date().toISOString(),
    },
    {
      loc: 'https://example.com/dynamic-path-2', // Absolute URL
      lastmod: new Date().toISOString(),
    },
  ];

  return getServerSideSitemap(ctx, fields);
};

// Default export to prevent Next.js errors
export default function Sitemap() {}

Now, Next.js serves the dynamic sitemap at http://localhost:3000/server-sitemap.xml.
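In a real project, the fields array would usually be derived from CMS data rather than written by hand. The following is a minimal sketch of that mapping step. The record shape (`slug`, `updatedAt`), the /blog/ path prefix, and the helper `toSitemapFields` are all assumptions made for this example.

```javascript
// Hypothetical CMS records; the `slug` and `updatedAt` field names are assumptions.
const posts = [
  { slug: 'hello-world', updatedAt: '2024-01-15T10:00:00.000Z' },
  { slug: 'dynamic sitemaps', updatedAt: '2024-02-01T08:30:00.000Z' },
];

// Maps CMS records to { loc, lastmod } objects with absolute URLs.
function toSitemapFields(siteUrl, records) {
  return records.map((record) => ({
    // encodeURIComponent keeps spaces and other unsafe characters out of the URL.
    loc: new URL(`/blog/${encodeURIComponent(record.slug)}`, siteUrl).toString(),
    lastmod: new Date(record.updatedAt).toISOString(),
  }));
}

const sitemapFields = toSitemapFields('https://example.com', posts);
console.log(sitemapFields);
```

The resulting array has the same loc and lastmod keys as the hard-coded fields in the page above, so it could be passed to the sitemap helper in their place.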

List the dynamic sitemap page in robotsTxtOptions.additionalSitemaps and exclude its path from the static sitemap list:

// next-sitemap.config.js

const siteUrl = 'https://example.com';

/** @type {import('next-sitemap').IConfig} */
const config = {
  siteUrl,
  generateRobotsTxt: true,
  exclude: ['/server-sitemap.xml'], // exclude here
  robotsTxtOptions: {
    additionalSitemaps: [
      `${siteUrl}/server-sitemap.xml`, // add here
    ],
  },
};

// CommonJS export; use `export default config` instead if your project is an ES module
module.exports = config;

In this way, next-sitemap will manage the sitemaps for all your static pages, and your dynamic sitemap will be listed in robots.txt.

Exclude generated files from the git commit

The last thing to do is to exclude the generated files from Git commits, because new sitemap.xml and robots.txt files will be generated on the server on each build.

# .gitignore

/public/sitemap.xml
/public/robots.txt