Generating thumbnails for PDFs using Ghostscript and GraphicsMagick Lambda layers in Node.js

01 Apr 2023 / by Ashith VL & Srilekha in NodeJS

Let's dive into why PDF thumbnails are important!

Generating thumbnails for PDFs is essential to many businesses and organizations. Whether they are used for marketing or simply to give a visual preview of the content, thumbnails make documents much easier to browse. But generating PDF thumbnails can be a time-consuming and resource-intensive process. Fortunately, the work can be offloaded to AWS Lambda, where a Node.js function generates the thumbnails on demand so your applications can simply use the results.

There are various npm packages available for generating PDF thumbnails, but the thumbnails they produce are often blurry. To get sharp, good-quality thumbnails, this post uses Lambda layers for Ghostscript, ImageMagick and GraphicsMagick instead.

How are AWS Lambda and S3 used for thumbnail generation?

Thumbnails are a great way to quickly view the contents of a PDF file without having to open it completely. Generating thumbnails for PDF files using Node.js on AWS Lambda can be a great way to optimize the process of viewing PDFs. In this blog, we’ll discuss the process of setting up a Node.js application on AWS Lambda to generate thumbnails for PDFs.

First, set up the AWS Lambda function that will generate the thumbnails. Create an IAM role with permission to read the PDFs from the S3 bucket and to write the generated thumbnails back to it, then create the Lambda function with that IAM role as its execution role. Next, install the packages the function depends on using the Node Package Manager (npm). The code in this post uses the gm package and version 2 of the AWS SDK (aws-sdk); Node.js 18 and later runtimes include only v3 of the SDK, so on those runtimes aws-sdk must be installed and bundled with the function.
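As a rough illustration of the permissions involved, the sketch below builds an IAM policy document that lets the function read any object in the bucket and write only under the thumb/ prefix used by the handler later in this post. The helper thumbnailPolicy and the bucket name my-pdf-bucket are hypothetical, and CloudWatch Logs access would typically come from the AWS-managed AWSLambdaBasicExecutionRole policy attached alongside it.

```javascript
// Sketch: a least-privilege policy for the thumbnail function's execution role.
// Assumptions (hypothetical): PDFs and thumbnails share one bucket, and
// thumbnails are written under the 'thumb/' prefix.
function thumbnailPolicy(bucketName) {
  return {
    Version: '2012-10-17',
    Statement: [
      {
        // read the uploaded PDFs
        Effect: 'Allow',
        Action: ['s3:GetObject'],
        Resource: [`arn:aws:s3:::${bucketName}/*`]
      },
      {
        // write thumbnails, but only under thumb/
        Effect: 'Allow',
        Action: ['s3:PutObject'],
        Resource: [`arn:aws:s3:::${bucketName}/thumb/*`]
      }
    ]
  };
}

// Print the JSON document, e.g. to paste into the IAM console
console.log(JSON.stringify(thumbnailPolicy('my-pdf-bucket'), null, 2));
```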

Once the dependencies are installed, write the code for the Lambda function. The code should fetch the S3 object, read the PDF, and generate a thumbnail; it can also be extended to generate several thumbnails of different sizes. Finally, set up an Amazon S3 trigger so that the Lambda function is invoked whenever a new PDF is uploaded to the bucket and generates the thumbnail for it. The setup is simple, and once it is in place, thumbnails are generated for any number of PDFs without manual work.
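For the multiple-sizes variant, one way to plan the outputs is sketched below. fitWithin computes the dimensions of an image scaled to fit inside a box while keeping its aspect ratio, which is roughly what gm's resize(200, 200) does (the actual binaries may round slightly differently). The function names, the size list and the thumb/<size>/ key layout are assumptions for illustration only.

```javascript
// Sketch: plan several thumbnail sizes for one rendered PDF page.
// fitWithin, planThumbnails and the key layout are hypothetical.
const path = require('path');

// Scale (width, height) to fit inside (maxWidth, maxHeight), keeping aspect ratio
function fitWithin(width, height, maxWidth, maxHeight) {
  // the limiting side decides the scale factor
  const scale = Math.min(maxWidth / width, maxHeight / height);
  return {
    width: Math.max(1, Math.round(width * scale)),
    height: Math.max(1, Math.round(height * scale))
  };
}

// One entry per requested square size: target S3 key plus output dimensions
function planThumbnails(srcKey, pageWidth, pageHeight, sizes) {
  const baseName = path.basename(srcKey);
  return sizes.map((size) => ({
    key: `thumb/${size}/${baseName}.thumb.png`,
    ...fitWithin(pageWidth, pageHeight, size, size)
  }));
}

// An A4 page rendered at 150 dpi is about 1240 x 1754 pixels
console.log(planThumbnails('reports/q1.pdf', 1240, 1754, [100, 200, 400]));
```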

Architecture Diagram

[Figure: architecture diagram]

Why Ghostscript, GraphicsMagick and ImageMagick Lambda layers?

Ghostscript, GraphicsMagick and ImageMagick are powerful tools for manipulating images and PDFs. On AWS Lambda they can be combined to generate PDF thumbnails: Ghostscript renders a PDF page into a bitmap format such as PNG, and GraphicsMagick or ImageMagick then resizes that bitmap into a good-quality thumbnail. This makes them well suited to processing a large number of PDFs quickly and efficiently.
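To make the Ghostscript step concrete, here is a sketch of the argument list for rendering a single page to PNG, using the same flags as the handler later in this post. The idea is to render at a comfortable resolution (150 dpi here) and only then downscale with GraphicsMagick or ImageMagick, which tends to give sharper thumbnails than rendering directly at thumbnail size. buildGhostscriptArgs and its option names are hypothetical.

```javascript
// Sketch: Ghostscript arguments to rasterise one PDF page to PNG.
// buildGhostscriptArgs and its options object are hypothetical.
function buildGhostscriptArgs({ input, output, page = 1, dpi = 150 }) {
  return [
    '-dNOPAUSE',             // do not pause between pages
    '-dBATCH',               // exit once the input has been processed
    '-dSAFER',               // restrict file operations by the document
    '-q',                    // suppress informational messages
    '-sDEVICE=pngalpha',     // PNG output with an alpha channel
    `-dFirstPage=${page}`,   // render only the requested page
    `-dLastPage=${page}`,
    `-r${dpi}`,              // render resolution in dots per inch
    '-dTextAlphaBits=4',     // anti-alias text
    '-dGraphicsAlphaBits=4', // anti-alias line art
    `-sOutputFile=${output}`,
    input
  ];
}

const args = buildGhostscriptArgs({ input: '/tmp/doc.pdf', output: '/tmp/doc.png' });
console.log(['gs', ...args].join(' '));
```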

The Ghostscript, GraphicsMagick and ImageMagick Lambda layers can be added in two ways:

  1. Using binary files

     a. Create a Lambda layer for Ghostscript
        1. Download the Ghostscript binaries from the official website
        2. Create a zip archive of the Ghostscript binary files
        3. Create a new layer using the AWS Lambda console
        4. Upload the zip archive as the layer content and configure the layer as needed

     b. Create a Lambda layer for GraphicsMagick
        1. Download the GraphicsMagick binaries from the official website
        2. Create a zip archive of the GraphicsMagick binary files
        3. Create a new layer using the AWS Lambda console
        4. Upload the zip archive as the layer content and configure the layer as needed

     c. Create a Lambda layer for ImageMagick
        1. Download the ImageMagick binaries from the official website
        2. Create a zip archive of the ImageMagick binary files
        3. Create a new layer using the AWS Lambda console
        4. Upload the zip archive as the layer content and configure the layer as needed

     The binaries must be built for Linux on the same architecture as the function (x86_64 or arm64). Lambda extracts layer contents into /opt, with /opt/bin on the PATH and /opt/lib on the library path, so placing the executables in a bin/ folder (and any shared libraries in lib/) inside the zip lets the function call them by name, for example gs.

  2. Using an ARN
     1. Sign in to the AWS Management Console
     2. Go to the Lambda service page and open the thumbnail function
     3. In the "Layers" section of the function page, click "Add a layer"
     4. Select the "Specify an ARN" option
     5. Enter the ARN of the Ghostscript layer and click "Add"
     6. Repeat steps 3 to 5 for the GraphicsMagick and ImageMagick layers
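Whichever way the layers are added, it can help to confirm at runtime that the binaries are reachable before calling them. The sketch below, a hypothetical findOnPath helper, searches the PATH entries for an executable; on Lambda that includes /opt/bin, where executables from a layer's bin/ folder end up. The command names gs, gm and convert assume Ghostscript, GraphicsMagick and ImageMagick 6; an ImageMagick 7 layer may expose magick instead.

```javascript
// Sketch: check that layer executables are visible on PATH (hypothetical helper).
const fs = require('fs');
const path = require('path');

// Return the full path of an executable named `command`, or null if not found
function findOnPath(command, envPath = process.env.PATH || '') {
  for (const dir of envPath.split(path.delimiter)) {
    if (!dir) continue; // skip empty PATH entries
    const candidate = path.join(dir, command);
    try {
      fs.accessSync(candidate, fs.constants.X_OK); // exists and is executable
      return candidate;
    } catch {
      // not in this directory; keep looking
    }
  }
  return null;
}

// e.g. log this at cold start to catch a missing layer early
for (const cmd of ['gs', 'gm', 'convert']) {
  console.log(cmd, '->', findOnPath(cmd) || 'not found');
}
```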

How are PDF thumbnails generated with these Lambda layers in Node.js and uploaded to S3?

  • Create a Lambda function with Node.js code to generate thumbnails from PDF files.
  • Use Ghostscript to convert the PDF file into an image file, such as a JPEG or PNG.
  • Use GraphicsMagick or ImageMagick to resize the image file to the desired size.
  • Upload the image file to an Amazon S3 bucket using the AWS SDK for Node.js.

Code

const AWS = require('aws-sdk'); // AWS SDK for JavaScript v2
// imageMagick: true makes gm call ImageMagick; drop subClass() to use GraphicsMagick
const gm = require('gm').subClass({ imageMagick: true });
const fs = require('fs');
const path = require('path');
const { spawn } = require('child_process');

const s3 = new AWS.S3();

// run Ghostscript and wait for it to exit
function runGhostscript(args) {
  return new Promise((resolve, reject) => {
    const proc = spawn('gs', args);
    proc.on('error', reject); // e.g. 'gs' not found because the layer is missing
    proc.on('close', (code) => {
      if (code === 0) {
        resolve();
      } else {
        reject(new Error(`Ghostscript exited with code ${code}`));
      }
    });
  });
}

exports.handler = async (event) => {
  const record = event.Records[0];
  const bucketName = record.s3.bucket.name;
  // object keys in S3 events are URL-encoded, with spaces as '+'
  const srcKey = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));
  // keys can contain '/', so use only the base name for local files
  const fileName = path.basename(srcKey);
  const pdfPath = `/tmp/${fileName}`;
  const pngPath = `/tmp/${fileName}.png`;
  const thumbPath = `/tmp/${fileName}.thumb.png`;

  // download file from S3 into /tmp
  const pdf = await s3.getObject({ Bucket: bucketName, Key: srcKey }).promise();
  fs.writeFileSync(pdfPath, pdf.Body);

  // convert the first page of the pdf to png
  await runGhostscript([
    '-dNOPAUSE',
    '-dBATCH',
    '-dSAFER',
    '-q',
    '-sDEVICE=pngalpha',
    '-dFirstPage=1',
    '-dLastPage=1',
    '-r150',
    '-dTextAlphaBits=4',
    '-dGraphicsAlphaBits=4',
    '-dMaxStripSize=8192',
    `-sOutputFile=${pngPath}`,
    pdfPath
  ]);

  // resize to fit within 200x200 (aspect ratio is preserved)
  await new Promise((resolve, reject) => {
    gm(pngPath)
      .resize(200, 200)
      .write(thumbPath, (err) => (err ? reject(err) : resolve()));
  });

  // upload the thumb to S3
  return s3.upload({
    Bucket: bucketName,
    Key: `thumb/${fileName}.thumb.png`,
    Body: fs.createReadStream(thumbPath),
    ContentType: 'image/png'
  }).promise();
};

The handler above can be invoked directly, for example from a test event, with an S3-style event that contains the bucket name and object key of a PDF, and it will generate a thumbnail for that PDF. To generate thumbnails automatically for every PDF uploaded to a particular S3 bucket, without invoking the function manually, use an S3 trigger instead.
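When invoking the handler by hand (for example from a Lambda console test event), it needs an event shaped like an S3 notification. The sketch below builds a minimal one containing only the fields the handler reads, and repeats the handler's key-decoding step. The encoding here only approximates the URL encoding S3 applies to keys in notifications; makeS3Event, sourceKeyFromEvent and the bucket and key names are hypothetical.

```javascript
// Sketch: a minimal S3 "ObjectCreated" event for manual testing (hypothetical helpers).
function makeS3Event(bucketName, key) {
  // approximate S3's key encoding: encode each path segment, spaces as '+'
  const encodedKey = key
    .split('/')
    .map((segment) => encodeURIComponent(segment).replace(/%20/g, '+'))
    .join('/');
  return {
    Records: [
      {
        eventSource: 'aws:s3',
        eventName: 'ObjectCreated:Put',
        s3: {
          bucket: { name: bucketName },
          object: { key: encodedKey }
        }
      }
    ]
  };
}

// Same decoding step the handler performs
function sourceKeyFromEvent(event) {
  const rawKey = event.Records[0].s3.object.key;
  return decodeURIComponent(rawKey.replace(/\+/g, ' '));
}

const event = makeS3Event('my-pdf-bucket', 'reports/Q1 summary.pdf');
console.log(JSON.stringify(event));
console.log(sourceKeyFromEvent(event)); // reports/Q1 summary.pdf
```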

To have S3 invoke the Lambda function for every object creation event involving a PDF file in a bucket, perform the following steps:

  • Sign in to the AWS Management Console and open the Lambda console.
  • Create a new Lambda function and select a Node.js runtime, or open the thumbnail function created earlier.
  • Configure the function as needed.
  • In the "Function overview" section, click "Add trigger" and choose "S3" as the source.
  • Select the bucket that should trigger the function and set the event type to "All object create events".
  • Set the Suffix to .pdf (and optionally a Prefix) so the function runs only for PDF files. Because the thumbnails are written as .png files, they do not trigger the function again.
  • Click "Add" to save the trigger.
  • Test the function to make sure it works as expected.
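The same trigger can also be set up programmatically. The sketch below only builds the parameters for putBucketNotificationConfiguration in the AWS SDK for JavaScript v2; pdfTriggerConfig is a hypothetical helper and the function ARN is a placeholder. Note that this call replaces the bucket's entire notification configuration, and unlike the console it does not grant S3 permission to invoke the function, so a resource-based permission (for example via the Lambda AddPermission API) has to exist first.

```javascript
// Sketch: S3 -> Lambda trigger for .pdf uploads as notification parameters.
// pdfTriggerConfig is hypothetical; the ARN below is a placeholder.
function pdfTriggerConfig(bucketName, functionArn) {
  return {
    Bucket: bucketName,
    NotificationConfiguration: {
      LambdaFunctionConfigurations: [
        {
          LambdaFunctionArn: functionArn,
          Events: ['s3:ObjectCreated:*'], // all object-creation events
          Filter: {
            Key: {
              // only keys ending in .pdf, so .png thumbnails do not re-trigger
              FilterRules: [{ Name: 'suffix', Value: '.pdf' }]
            }
          }
        }
      ]
    }
  };
}

const params = pdfTriggerConfig(
  'my-pdf-bucket',
  'arn:aws:lambda:us-east-1:123456789012:function:pdf-thumbnail'
);
console.log(JSON.stringify(params, null, 2));
// With the SDK: await new AWS.S3().putBucketNotificationConfiguration(params).promise();
```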

Advantages of our implementation of PDF thumbnail generation over other PDF thumbnail libraries available for Node.js:

  1. Ghostscript renders PDF pages at a chosen resolution and ImageMagick or GraphicsMagick then downscale the result, which gives sharper and more consistent thumbnails than the blurry output of other Node.js libraries, with more control over each step.
  2. GraphicsMagick and ImageMagick read and write many formats, such as TIFF, JPEG and PNG, so the same setup can produce thumbnails in whichever format an application needs.
  3. Packaged as Lambda layers, the binaries are kept separate from the function code, so they can be shared across functions and updated independently, while Lambda scales thumbnail generation with the number of uploads.
  4. They offer a wide range of options and settings, such as resolution, anti-aliasing and output size, so the generated thumbnails can be tailored to specific needs.

Conclusion

The combination of Ghostscript, GraphicsMagick and ImageMagick Lambda layers, Node.js and S3 provides a powerful and efficient way to generate and store PDF thumbnails. Because Lambda scales the number of concurrent function instances with the rate of uploads, the system can handle high volumes of PDFs. With it, businesses can easily manage their PDFs and quickly preview them when needed.