Node.js child process: How to launch external programs

Your computer gives you access to many complex applications that can do various tasks, like text processing, editing images, compressing files, etc. You can launch these programs within Node.js thanks to its child_process module, which can launch a GUI or a command-line program in a separate child process. Once the program finishes running, it returns the output to the Node.js process that launched it.

In this tutorial, we will launch external programs in Node.js using the child_process module.

Jump ahead:

Prerequisites
Why use a child process?
Setting up the directory
Launching an external program and capturing output
Offloading CPU-bound tasks to a child process
Streaming large output from an external program
Chaining external applications
Running shell commands using exec()
Shell injection attacks and how to prevent them

Prerequisites

To follow and understand this tutorial, you will need:

Node.js v ≥ 16 installed
A good understanding of Node.js streams
A basic understanding of the event loop

The tutorial was tested on a Unix-like system. If you are using Windows, some commands (such as ls, cat, and grep) won’t work, and you will need to find alternatives.

Why use a child process?

When you write a program and execute it with the node command in your terminal, the program becomes a process. A process is an abstraction of a running program that the operating system manages.

A Node.js process has its own memory and a single main thread that is used to execute JavaScript code. Since the code is executed in a single thread, if a task is CPU-bound and time intensive, it can block the event loop. This happens because the task runs continuously in the JavaScript main thread and prevents other code from executing.
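To make the blocking effect concrete, here is a minimal sketch (not part of the original tutorial code) in which a timer scheduled for 10 ms cannot fire until a synchronous loop has finished. The busyWait() helper and the 200 ms duration are hypothetical, chosen only for illustration.

```javascript
// Sketch: a timer that should fire after 10 ms is delayed by synchronous work.
const start = Date.now();

setTimeout(() => {
  // This callback can only run once the synchronous code below has returned.
  console.log(`timer fired after ${Date.now() - start} ms`);
}, 10);

// Hypothetical stand-in for a CPU-bound task: spin for roughly `ms` milliseconds.
function busyWait(ms) {
  const end = Date.now() + ms;
  let iterations = 0;
  while (Date.now() < end) {
    iterations++;
  }
  return iterations;
}

busyWait(200);
console.log("busy loop finished");
```

Running this should print the "busy loop finished" line first, and the timer message should report a delay close to the length of the busy loop rather than 10 ms.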

To get around this, you can use a child process. A child process is a process created by another process (the parent). Child processes have their advantages:

Run external programs on your system
Offload CPU-bound blocking tasks to a separate process to avoid blocking the main thread

We will look at how to run external programs, as well as offloading blocking tasks to a child process, but first, let’s set up the directory for this tutorial.

Setting up the directory

In this section, we will create the directory where the programs we will write in this tutorial will reside.

To create the directory, open your terminal and enter the following command:

mkdir cp_programs

Move into the directory:

cd cp_programs

Once inside the directory, create the package.json file:

npm init -y

Now that we have the directory, we will launch an external program within Node.js next.

Launching an external program and capturing output

In this section, we will run an external program in Node.js and capture the output so that it can be used in Node.js. To do this, we will use the execFile() method of the child_process module, which runs any program and returns the output.

Before running the external program, let’s look at the program we want to run using Node.js. In your terminal, run the following command:

ls -l
// output
total 4
-rw-rw-r-- 1 stanley stanley 225 Dec 20 06:41 package.json

The ls program lists all files and sub-directories in the current directory; this is the default program on most Unix-like systems. Instead of rewriting the program’s functionality in Node.js, we can just invoke the program externally in Node.js and capture its output.

Now that we’ve identified the program we want to run, create and open the listDir.js file in the text editor and enter the following:

const util = require("node:util");
const execFile = util.promisify(require("node:child_process").execFile);

async function lsDir() {
  const { stdout, stderr } = await execFile("ls", ["-l"]);
  console.log(`External Program's output:\n ${stdout}`);
}
lsDir();

In the first line, we import the util package, which provides helpful utility functions. In the second line, we use the util.promisify() method to make the execFile() method use the promise API.

Next, we define the lsDir() function, which runs an external program and logs its output. In the function, we invoke the execFile() method to run the ls command-line program in a separate child process. The method takes two arguments: the program name and an array of the program’s command-line arguments. ls is the program name and -l is an option that makes ls print a detailed listing of each file.

After calling the execFile() method, we await the promise it returns and destructure the resolved object into the following variables:

[stdout](https://blog.logrocket.com/using-stdout-stdin-stderr-node-js/): This will contain the output returned from the external program
stderr: This will contain anything the external program wrote to its standard error stream

If Node.js has trouble executing the program, or the program exits with a non-zero exit code, the promise is rejected with an error instead of resolving.

From there, we log the output in the console and call the lsDir() function.

Before we run the program, let’s make our program log any errors it encounters in the console:

const util = require("node:util");
const execFile = util.promisify(require("node:child_process").execFile);

async function lsDir() {
  // add the try...catch block and the stderr check
  try {
    const { stdout, stderr } = await execFile("ls", ["-l"]);
    if (stderr) {
      console.error(stderr);
      return;
    }
    console.log(`External Program's output:\n ${stdout}`);
  } catch (error) {
    // the promise rejects when the program cannot be run or exits with an error
    console.error(error);
  }
}
lsDir();

In the preceding code, we wrap the call in a try...catch block to log the error of a rejected promise, and we check whether the stderr variable contains any output and log it in the console.

Once you are finished adding the code, save the file. In the terminal, run the program using the node command:

node listDir.js

Upon running the program, the output will look as follows:

External Program's output:
 total 8
-rw-rw-r-- 1 stanley stanley 384 Dec 20 06:51 listDir.js
-rw-rw-r-- 1 stanley stanley 225 Dec 20 06:41 package.json

The program shows the detailed list of files in the directory that the ls program returned when it ran.
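It can help to see what a failure looks like. The following is a small sketch, not part of the tutorial's files, showing that the promisified execFile() reports failures by rejecting its promise. The describeFailure() helper and the program name no-such-program-xyz are hypothetical; process.execPath (the path of the running node binary) is used here only so the example does not depend on a particular external program.

```javascript
const util = require("node:util");
const execFile = util.promisify(require("node:child_process").execFile);

// Hypothetical helper: run a program and summarize how it failed, if it did.
async function describeFailure(program, args = []) {
  try {
    const { stdout } = await execFile(program, args);
    return { ok: true, stdout };
  } catch (error) {
    // error.code is a string such as "ENOENT" when the program cannot be started,
    // or the numeric exit code when the program ran and exited with an error.
    return { ok: false, code: error.code, stderr: error.stderr };
  }
}

// A program that does not exist on the system.
describeFailure("no-such-program-xyz").then((result) => console.log(result));

// A program that starts but exits with code 3.
describeFailure(process.execPath, ["-e", "process.exit(3)"]).then((result) =>
  console.log(result)
);
```

Checking error.code in the catch block is one way to distinguish "the program could not be launched" from "the program ran and reported a problem".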

Now that we can run an external program and capture its output, we will offload blocking tasks into a child process to make them non-blocking in the next section.

Offloading CPU-bound tasks to a child process

In this section, we will create a program that has a blocking CPU-bound task and offload it to a child process to prevent the CPU-intensive task from blocking the main thread. A CPU-bound or CPU-intensive task involves a piece of code that takes hold of the CPU until completion, such as mathematical calculations, image and video processing, encryption, etc.

To offload a CPU-bound task, we will move the CPU-intensive code into a separate program, then use the fork() method to invoke the program in a child process. The fork() method allows the parent and child processes to communicate through messages. So once the child process finishes executing, it will send a message back to the parent containing the data.

To have an idea of how a CPU-bound task can block the main thread, we will first create a program that has a blocking CPU-bound task and make it non-blocking later.

In your text editor, create blockingTask.js and add this code:

function cpuIntensive() {
  console.log("blocking task starts");
  let total = 0;
  for (let i = 0; i < 30_000_000_000; i++) {
    total += i;
  }
  console.log("blocking task finishes");
  return total;
}

console.log(`Calculated value: ${cpuIntensive()}`);

Here, we create the cpuIntensive() function that runs a CPU-bound task. The function contains a loop that iterates 30 billion times and adds the loop counter to the total variable during each iteration. After that, it returns the total variable. This task will take a while to finish.

To see how this task can be blocking, let’s add some non-blocking code to the blockingTask.js file:

function cpuIntensive() {
  console.log("blocking task starts");
  let total = 0;
  for (let i = 0; i < 30_000_000_000; i++) {
    total += i;
  }
  console.log("blocking task finishes");
  return total;
}

console.log(`Calculated value: ${cpuIntensive()}`);

// add the following non-blocking code
const js_keywords = ["let", "const", "for"];
console.log("The following are JavaScript Reserved keywords: ");
for (const keyword of js_keywords) {
  console.log(keyword);
}

In the last five lines, we add a small loop that iterates three times. This task won’t take long to finish in comparison to the loop that iterates 30 billion times.

Save your file, then run the program:

node blockingTask.js

When you run the program, you will get the following output first:

// output
blocking task starts

After that, we have to wait for a long time to get the rest of the output:

// output
blocking task starts
blocking task finishes
Calculated value: 449999999970159100000
The following are JavaScript Reserved keywords: 
let
const
for

As you can see, the CPU-bound task blocks the main thread and prevents non-blocking tasks from executing. It would be much better to have the non-blocking tasks running at the same time as the blocking tasks for a good user experience.

To do this, we will offload the CPU-intensive loop into another file, and use the fork() method to create a child process, freeing the main thread.

Create and open the cpuBound.js file, then add the following code:

function cpuIntensive() {
  console.log("blocking task starts");
  let total = 0;
  for (let i = 0; i < 30_000_000_000; i++) {
    total += i;
  }
  console.log("blocking task finishes");
  return total;
}

// send a message to the parent process.
process.send(cpuIntensive());

The cpuIntensive() function is the same function we defined in the blockingTask.js file. What’s new here is the process.send() method. The method sends a message containing the value that the cpuIntensive() function returns.

In the blockingTask.js file, remove the cpuIntensive() function and the console.log() call that invokes it, then add the following code:

// add the following code
const { fork } = require("node:child_process");

const childProcess = fork(__dirname + "/cpuBound.js");

childProcess.on("message", (message) => {
  console.log(`Calculated value: ${message}`);
});

// code that is non-blocking
const js_keywords = ["let", "const", "for"];
console.log("The following are JavaScript Reserved keywords: ");
for (const keyword of js_keywords) {
  console.log(keyword);
}

In the first line, we import the fork() method from the child_process module. We then invoke the fork() method with the path to the Node.js program that should run in a child process. After that, we use the on() method to listen for the message event, which fires when the child process sends a message. Once the message is received, we log it into the console.

Let’s run the blockingTask.js file again:

node blockingTask.js

The output will now match the following:

// output
The following are JavaScript Reserved keywords: 
let
const
for
blocking task starts

You will now see that the non-blocking loop logged the reserved words in the js_keywords array into the console. Earlier in this section, this loop didn’t run until the CPU-bound task was finished.

After a while, we see the output from the CPU-bound task:

// output
...
blocking task finishes
Calculated value: 449999999970159100000

Even though the CPU-intensive function was running, it did not affect the main thread. All the non-blocking code in the parent process was able to execute.
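The same message channel also works in the other direction: the parent can send input to the child with send(). Below is a hedged sketch of that idea wrapped in a promise. The sumInChild() helper, the { limit } and { total } message shapes, and the temporary file name are all assumptions made for illustration. The child script is written to a temporary file only so that the example fits in a single file; in a real project it would be a regular module like cpuBound.js.

```javascript
const { fork } = require("node:child_process");
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Assumption for this sketch: the child script lives in a temporary file.
const childPath = path.join(os.tmpdir(), `sumChild-${process.pid}.js`);
fs.writeFileSync(
  childPath,
  `process.on("message", ({ limit }) => {
    let total = 0;
    for (let i = 0; i < limit; i++) total += i;
    process.send({ total });
  });`
);

// Hypothetical helper: run the loop in a child process and resolve with the result.
function sumInChild(limit) {
  return new Promise((resolve, reject) => {
    const child = fork(childPath);
    child.once("message", (message) => {
      resolve(message.total);
      child.disconnect(); // close the IPC channel so the child can exit
    });
    child.once("error", reject);
    child.once("exit", (code) => {
      // Has no effect if the promise was already resolved.
      if (code !== 0) reject(new Error(`child exited with code ${code}`));
    });
    child.send({ limit }); // pass the input to the child as a message
  });
}

sumInChild(1_000_000).then((total) => console.log(`Calculated value: ${total}`));
```

Returning a promise lets the parent await the result like any other asynchronous operation, while the exit and error listeners keep a crashed child from leaving the promise pending forever.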

Now that we can offload CPU-bound tasks to another process to avoid blocking, we will read large files next.

Streaming large output from an external program

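So far, we’ve run an external program and captured its output in Node.js using the execFile() method. But if the external program produces a lot of output, for example because it reads a large file, this approach runs into trouble. The execFile() method collects the whole output in an in-memory buffer before passing it to your code, and the size of that buffer is capped by the maxBuffer option (1 MiB by default in current Node.js versions). If the output exceeds the limit, the child process is terminated and the output is truncated.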
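The buffer limit can be observed directly. The sketch below, which is not part of the tutorial's files, runs a child that prints 5,000 characters and passes a deliberately tiny maxBuffer option to execFile(). The runWithLimit() helper is hypothetical, and process.execPath is used as the external program only to keep the example self-contained.

```javascript
const util = require("node:util");
const execFile = util.promisify(require("node:child_process").execFile);

// Hypothetical helper: run a child that prints 5,000 characters while
// allowing at most `maxBuffer` bytes of buffered output.
async function runWithLimit(maxBuffer) {
  try {
    const { stdout } = await execFile(
      process.execPath,
      ["-e", "process.stdout.write('x'.repeat(5000))"],
      { maxBuffer }
    );
    return { ok: true, length: stdout.length };
  } catch (error) {
    // The promise rejects when the output does not fit in the buffer.
    return { ok: false, code: error.code };
  }
}

runWithLimit(100).then((result) => console.log("limit 100:", result));
runWithLimit(1024 * 1024).then((result) => console.log("limit 1 MiB:", result));
```

Raising maxBuffer makes the error go away, but the whole output still sits in memory, which is why streaming is the better fit for large outputs.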

To avoid this, we will use the spawn() method, which exposes the external program’s output as a stream and delivers it to the Node.js program in smaller chunks. This reduces the amount of memory we use because the program will read the smaller chunks of data as they come in, without keeping all the data in a buffer.

In this section, we’ll write a program that uses the spawn() method to read a large file. We will use the words file in the /usr/share/dict directory, which is available in most Unix-like systems. If you don’t have the file, you can use any large text file of your choice or you can download the sample one here.

In the terminal, copy the dictionary file into the project’s directory:

cp /usr/share/dict/words .

Let’s add an extension to the file:

mv words words.txt

Now read the file using the cat command:

cat words.txt

The command will log an output that looks like the following (I’ve omitted some output for brevity):

// output
...
zucchini
zucchini's
zucchinis
zwieback
zwieback's
zygote
zygote's
zygotes

Let’s now run the command using the child_process module in Node.js. Create and open readLargeFileStreams.js and enter the code below:

const { spawn } = require("node:child_process");
const cat = spawn("cat", ["words.txt"]);

cat.on("error", (error) => {
  console.error(`error: ${error.message}`);
});
cat.stdout.pipe(process.stdout);
cat.stderr.pipe(process.stderr);
cat.on("close", (code) => {
  console.log(`child process exited with code ${code}`);
});

In the first line, we import the spawn() method. In the second line, we call the spawn() method to run the cat program in a child process to read the words.txt file. Since spawn() uses the stream API, we attach a listener for the error event, which fires, for example, when Node.js cannot start the program, and log the error in the console. After that, we use the stdout.pipe() method to pipe the output from the cat program to process.stdout, where the chunks of data received will be logged.

Next, we use the stderr.pipe() method to send error messages from the cat program to process.stderr, where they will be logged in the console. Finally, we listen to the close event to log a message in the console.

Save and run the file:

node readLargeFileStreams.js

The output will be shown in the console:

...
zoos
zorch
zucchini
zucchini's
zucchinis
zwieback
zwieback's
zygote
zygote's
zygotes
child process exited with code 0

The whole output wasn’t buffered up; instead, the program received the output in chunks and logged them in the console.
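To see the chunks themselves, you can listen for the data event on the child's stdout instead of piping it. The following is a sketch under stated assumptions: countChunks() is a hypothetical helper, and the number and size of chunks depend on the operating system and the program, so no particular count should be expected.

```javascript
const { spawn } = require("node:child_process");

// Hypothetical helper: count how many chunks and bytes a program writes to stdout.
function countChunks(command, args) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args);
    let chunks = 0;
    let bytes = 0;

    // Each chunk is a Buffer holding one piece of the program's output.
    child.stdout.on("data", (chunk) => {
      chunks += 1;
      bytes += chunk.length;
    });
    child.on("error", reject);
    child.on("close", (code) => resolve({ code, chunks, bytes }));
  });
}

countChunks("cat", ["words.txt"])
  .then(({ code, chunks, bytes }) => {
    console.log(`exit code ${code}: received ${bytes} bytes in ${chunks} chunks`);
  })
  .catch((error) => console.error(`error: ${error.message}`));
```

Only the counters are kept in memory here; each chunk can be garbage-collected as soon as its handler returns.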

You can now read large files without using too much memory using the spawn() method. In the next section, we will chain external applications.

Chaining external applications

Most programs are designed to do one thing very well. For example, the cat program reads files, and the grep program searches for text. You can chain these programs together to achieve a particular task.

Using the words.txt file, you can read the file using cat, then chain grep to search for words that contain “zip”:

cat words.txt | grep zip

When the cat command reads the words.txt file, its output is passed to the grep command as the input. grep then filters the input to show only words that contain the word “zip”.

You can recreate this behavior in Node.js using the pipe() and the spawn() method.

First, create and open the chainingPrograms.js file, then add the following code:

const { spawn } = require("node:child_process");
const cat = spawn("cat", ["words.txt"]);
const grep = spawn("grep", ["zip"]);

cat.stdout.pipe(grep.stdin);
grep.stdout.pipe(process.stdout);

cat.on("error", (error) => {
  console.error(`error: ${error.message}`);
});
grep.on("error", (error) => {
  console.error(`error: ${error.message}`);
});

In the first three lines, we import spawn(), and then use it to run the cat and the grep commands. The cat command reads the words.txt file, and the grep command searches for words that contain the word “zip”. To pass the cat command’s output to grep, you use the stdout.pipe() method, which accepts the writable stream that should receive cat’s output as its input, which is grep.stdin here.

Next, you call grep.stdout.pipe() and pass it process.stdout to log the output in the console. The last six lines listen for the error event on the cat and grep instances and log the error message in the console.

Once you are finished, save the file, and then run the chainingPrograms.js file using Node:

node chainingPrograms.js

Your output will look similar to the following:

// output
marzipan
marzipan's
unzip
unzipped
...
zippiest
zipping
zippy
zip's
zips

You will notice the output only shows words that contain the word “zip”. This confirms that the chaining of the programs works.
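As a variation, the same chain can be written with the pipeline() function from the node:stream module, which forwards data like pipe() and also reports stream errors through a callback. This is a sketch, not the tutorial's implementation: grepFile() is a hypothetical helper, and it collects grep's output into a string only to make the result easy to inspect. For large outputs, piping grep.stdout to process.stdout as shown above avoids that buffering.

```javascript
const { spawn } = require("node:child_process");
const { pipeline } = require("node:stream");

// Hypothetical helper: roughly equivalent to `cat <file> | grep <pattern>`.
function grepFile(file, pattern) {
  return new Promise((resolve, reject) => {
    const cat = spawn("cat", [file]);
    const grep = spawn("grep", [pattern]);
    let output = "";

    cat.on("error", reject);
    grep.on("error", reject);

    // Connect cat's output to grep's input and surface any stream error.
    pipeline(cat.stdout, grep.stdin, (error) => {
      if (error) reject(error);
    });

    grep.stdout.setEncoding("utf8");
    grep.stdout.on("data", (chunk) => {
      output += chunk;
    });
    // grep exits with code 1 when nothing matched.
    grep.on("close", (code) => resolve({ code, output }));
  });
}

grepFile("words.txt", "zip")
  .then(({ output }) => process.stdout.write(output))
  .catch((error) => console.error(`error: ${error.message}`));
```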

Running shell commands using exec()

One method we haven’t looked at so far is the exec() method. This method creates a shell and runs any command you pass it; you can even chain the commands and pass them as arguments, something that you can’t do with execFile() because it doesn’t create a shell.

Take the following example:

cat words.txt|nl|grep zip

The cat command reads the words.txt file, which is then passed to the nl command that adds line numbers to the whole file. After that, we use grep to search and return words that contain “zip”.

With what we’ve covered so far, you can chain this command using the spawn() and the pipe() methods in Node.js as demonstrated in the previous section.

With the exec() method, you can pass the chained command and it will be executed in a shell.

To do that, create and open the filterDictionary.js file and enter the following code:

const util = require("node:util");
const exec = util.promisify(require("node:child_process").exec);

async function filterDictionary() {
  try {
    const { stdout, stderr } = await exec("cat words.txt|nl|grep zip");
    if (stderr) {
      console.error(stderr);
      return;
    }
    console.log(`External Program's output:\n ${stdout}`);
  } catch (error) {
    // the promise rejects if the command cannot run or exits with a non-zero code
    console.error(error);
  }
}

filterDictionary();

First, we import the exec() method into the program. We then define the filterDictionary() function to run an external program in a child process. In the function, we invoke the exec() method with the chained command as the argument. After that, we check and log any errors encountered.

Run the program as follows:

node filterDictionary.js

The output will look as follows:

External Program's output:
  64930 marzipan
 64931  marzipan's
 99883  unzip
 99884  unzipped
...
104280  zipping
104281  zippy
104282  zip's
104283  zips

As you can see, the output shows the line numbers, as well as the words containing “zip”, which proves that the exec() method runs the chained commands successfully without any issue.
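The difference between the two methods is easiest to see side by side. In this sketch (an illustration, assuming the echo and tr programs are available, as they are on typical Unix-like systems), exec() lets the shell interpret the | character, while execFile() passes it to echo as an ordinary argument. The compare() function is a hypothetical name.

```javascript
const util = require("node:util");
const childProcess = require("node:child_process");
const exec = util.promisify(childProcess.exec);
const execFile = util.promisify(childProcess.execFile);

async function compare() {
  // exec(): a shell parses the string, so "|" pipes echo's output into tr.
  const viaShell = await exec("echo hello | tr 'a-z' 'A-Z'");

  // execFile(): no shell is involved, so "|" and "tr" are plain arguments to echo.
  const noShell = await execFile("echo", ["hello", "|", "tr"]);

  return { viaShell: viaShell.stdout, noShell: noShell.stdout };
}

compare().then(({ viaShell, noShell }) => {
  console.log(`exec():     ${viaShell.trim()}`);
  console.log(`execFile(): ${noShell.trim()}`);
});
```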

Shell injection attacks and how to prevent them

You have now learned how to use the exec() method, which gives you access to the shell and allows you to run any command. While accessing the shell is helpful, it can sometimes be dangerous. This is due to shell injection attacks, where an attacker can append harmful commands to the exec() method input, which can destroy the host’s computer.

To understand how this attack can happen, we will create a program that illustrates this.

First, download the prompt-sync package using npm:

npm install prompt-sync

We will use the package to get input from the user.

Next, create and open the listDirExec.js file and add the following contents:

const util = require("node:util");
const exec = util.promisify(require("node:child_process").exec);
const prompt = require("prompt-sync")({ sigint: true });
const dirname = prompt("Enter the directory you want to list");

In the first two lines, we import and promisify the exec() method. Then, we import the prompt-sync package and use the prompt() method to get input from the user, which is the directory where the user wants the program to list the contents.

In the listDirExec.js file, add the following code to list directory contents:

const util = require("node:util");
const exec = util.promisify(require("node:child_process").exec);
const prompt = require("prompt-sync")({ sigint: true });
const dirname = prompt("Enter the directory you want to list");

// add the following
async function listDir() {
  try {
    const { stdout, stderr } = await exec(`ls -l ${dirname}`);
    if (stderr) {
      console.error(stderr);
      return;
    }
    console.log(`External Program's output:\n ${stdout}`);
  } catch (error) {
    // the promise rejects if the command cannot run or exits with a non-zero code
    console.error(error);
  }
}
listDir();

In the preceding code, we define the listDir() function to list the directory contents. In the function, we invoke the exec() method, which runs the ls -l command together with the input the user has passed. If the user enters /home/stanley/cp_programs, the command run will be ls -l /home/stanley/cp_programs.

Since we are getting input from the user, someone with malicious intent can append another command to do damage. This can be done by adding a semicolon as follows:

ls -l; free -h

When you run the command in the terminal, it will list the directory contents, and then check the memory usage as follows:

total 1008
-rw-rw-r-- 1 stanley stanley    388 Dec 20 07:04 blockingTask.js
-rw-rw-r-- 1 stanley stanley    347 Dec 20 07:31 chainingPrograms.js
...
-rw-rw-r-- 1 stanley stanley    408 Dec 20 07:14 readLargeFile.js
-rw-rw-r-- 1 stanley stanley    323 Dec 20 07:16 readLargeFileStreams.js
-rw-r--r-- 1 stanley stanley 985084 Dec 20 07:12 words.txt
               total        used        free      shared  buff/cache   available
Mem:           7.6Gi       4.1Gi       124Mi       554Mi       3.4Gi       2.7Gi
Swap:          1.9Gi       137Mi       1.7Gi

Now that we know commands can be appended this way, run the program as follows:

node listDirExec.js

We will be prompted to enter a directory name. Our application expects the user to enter their chosen directory:

/home/stanley/cp_programs

When the program runs, the output will show the directory contents:

Enter the directory you want to list/home/stanley/cp_programs
External Program's output:
 total 1008
-rw-rw-r-- 1 stanley stanley    388 Dec 20 07:04 blockingTask.js
-rw-rw-r-- 1 stanley stanley    347 Dec 20 07:31 chainingPrograms.js
-rw-rw-r-- 1 stanley stanley    278 Dec 20 07:03 cpuBound.js
-rw-rw-r-- 1 stanley stanley    410 Dec 20 07:36 filterDictionary.js
...
-rw-rw-r-- 1 stanley stanley    323 Dec 20 07:16 readLargeFileStreams.js
-rw-r--r-- 1 stanley stanley 985084 Dec 20 07:12 words.txt

An attacker may have different plans and append another command. Let’s try that by running the program once more:

node listDirExec.js

When prompted, enter the following:

/home/stanley/cp_programs;free -h;df -h

After running the command, the output will look as follows:

Enter the directory you want to list/home/stanley/cp_programs;free -h;df -h
External Program's output:
 total 1008
-rw-rw-r-- 1 stanley stanley    388 Dec 20 07:04 blockingTask.js
...
-rw-rw-r-- 1 stanley stanley    408 Dec 20 07:14 readLargeFile.js
-rw-rw-r-- 1 stanley stanley    323 Dec 20 07:16 readLargeFileStreams.js
-rw-r--r-- 1 stanley stanley 985084 Dec 20 07:12 words.txt
               total        used        free      shared  buff/cache   available
Mem:           7.6Gi       4.1Gi       201Mi       533Mi       3.3Gi       2.7Gi
Swap:          1.9Gi       139Mi       1.7Gi
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           784M  1.9M  782M   1% /run
...
tmpfs           3.9G  147M  3.7G   4% /dev/shm
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
/dev/sda4       196M   30M  167M  15% /boot/efi

The output shows the directory contents, the system memory usage, and the file system disk usage.

While the free -h and df -h commands we added aren’t harmful, they are not what our program expects as input. The program expects only the directory path, but we have been able to manipulate the program to do a different task than intended. An attacker can use this loophole to spy on system information or even destroy the computer system.

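To protect yourself from these attacks, you need to sanitize the user input. It is also recommended to use execFile() in place of exec(): because execFile() does not create a shell, characters such as ; and | in an argument are passed to the program as literal text instead of being interpreted as shell syntax.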
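As a rough sketch of that recommendation, the directory listing can be rewritten with execFile() so that the user's input is only ever a single argument to ls. The listDirSafely() name is hypothetical, and the -- argument is a common convention that tells ls to treat everything after it as a path rather than an option. This is one possible mitigation, not a complete input-validation strategy.

```javascript
const util = require("node:util");
const execFile = util.promisify(require("node:child_process").execFile);

// Hypothetical helper: list a directory without ever involving a shell.
async function listDirSafely(dirname) {
  try {
    // dirname is passed as one argument; ";" and "|" have no special meaning here.
    const { stdout } = await execFile("ls", ["-l", "--", dirname]);
    return stdout;
  } catch (error) {
    return `ls failed: ${error.stderr || error.message}`;
  }
}

// The injected commands are treated as part of a (nonexistent) path name.
listDirSafely("/home/stanley/cp_programs;free -h;df -h").then((result) =>
  console.log(result)
);
```

With this version, the input from the attack above makes ls look for a directory literally named with the semicolons in it, so the appended commands are never executed.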

Conclusion

In this tutorial, we used the child_process module to launch external programs from Node.js. We began by using the execFile() method to run an external program and capture its output. Then we used the fork() method to create a child process to offload blocking CPU-bound tasks. After that, we read large files in Node.js without using too much memory using the spawn() method. Following that, we chained multiple external programs in Node.js. We then used the exec() method to execute commands in a shell. Finally, we learned about shell injection attacks and how to prevent them.

You should now be comfortable using the Node.js child_process module in your projects. To learn more about the module, visit the documentation. To take your learning further, you can learn about the execa library, which is a wrapper around the child_process module.

The post Node.js child process: How to launch external programs appeared first on LogRocket Blog.

from LogRocket Blog https://ift.tt/XNEFQCG