Your computer gives you access to many complex applications that can do various tasks, like text processing, editing images, compressing files, etc. You can launch these programs within Node.js thanks to its child_process module, which can launch a GUI or a command-line program in a separate child process. Once the program finishes running, it returns the output to the Node.js process that launched it.
In this tutorial, we will launch external programs in Node.js using the child_process module.
Jump ahead:
- Prerequisites
- Why use a child process?
- Setting up the directory
- Launching an external program and capturing output
- Offloading CPU-bound tasks to a child process
- Streaming large output from an external program
- Chaining external applications
- Running shell commands using exec()
- Shell injection attacks and how to prevent them
Prerequisites
To follow and understand this tutorial, you will need:
- Node.js v16 or later installed
- A good understanding of Node.js streams
- A basic understanding of the event loop
The tutorial was tested on a Unix-like system. If you are using Windows, some commands won’t work and you will need to find alternatives.
Why use a child process?
When you write a program and execute it with the node command in your terminal, the program becomes a process. A process is an abstraction of a running program that the operating system manages.
A Node.js process has its own memory and a single main thread that is used to execute JavaScript code. Since the code is executed in a single thread, if a task is CPU-bound and time intensive, it can block the event loop. This happens because the task runs continuously in the JavaScript main thread and prevents other code from executing.
To get around this, you can use a child process. A child process is a process created by another process (the parent). Child processes let you:
- Run external programs on your system
- Offload CPU-bound blocking tasks to a separate process to avoid blocking the main thread
We will look at how to run external programs, as well as how to offload blocking tasks to a child process, but first, let’s set up the directory for this tutorial.
Setting up the directory
In this section, we will create the directory where the programs we will write in this tutorial will reside.
To create the directory, open your terminal and enter the following command:
mkdir cp_programs
Move into the directory:
cd cp_programs
Once inside the directory, create the package.json file:
npm init -y
Now that we have the directory, we will launch an external program within Node.js next.
Launching an external program and capturing output
In this section, we will run an external program in Node.js and capture the output so that it can be used in Node.js. To do this, we will use the execFile() method of the child_process module, which runs any program and returns the output.
Before running the external program, let’s look at the program we want to run using Node.js. In your terminal, run the following command:
ls -l
// output
total 4
-rw-rw-r-- 1 stanley stanley 225 Dec 20 06:41 package.json
The ls program lists all files and sub-directories in the current directory; it ships by default on most Unix-like systems. Instead of rewriting the program’s functionality in Node.js, we can just invoke the program externally in Node.js and capture its output.
Now that we’ve identified the program we want to run, create and open the listDir.js file in the text editor and enter the following:
const util = require("node:util");
const execFile = util.promisify(require("node:child_process").execFile);

async function lsDir() {
  const { error, stdout, stderr } = await execFile("ls", ["-l"]);
  console.log(`External Program's output:\n ${stdout}`);
}

lsDir();
In the first line, we import the util module, which provides helpful utility functions. In the second line, we use the util.promisify() method to make the execFile() method use the promise API.
Next, we define the lsDir() function, which runs an external program and logs its output. In the function, we invoke the execFile() method to run the ls command-line program in a separate child process. The method takes two arguments: the program name and an array of the program’s command-line arguments. ls is the program name and -l is an option that makes ls print a detailed listing for each file.
After calling the execFile() method, we destructure the object returned by the method into the following variables:
- error: In the callback form of execFile(), this is set when Node.js has trouble executing your program. The promisified version used here reports such failures by rejecting the promise instead, so this variable is always undefined
- [stdout](https://blog.logrocket.com/using-stdout-stdin-stderr-node-js/): This will contain the output returned from the external program
- stderr: This will contain anything the external program writes to its standard error stream, such as error messages that have nothing to do with Node.js
From there, we log the output in the console and call the lsDir() function.
Before we run the program, let’s make our program log any errors it encounters in the console. Because the promisified execFile() rejects when the program cannot be started or exits with a non-zero code, we catch those failures with try...catch:
const util = require("node:util");
const execFile = util.promisify(require("node:child_process").execFile);

async function lsDir() {
  // add the following code
  try {
    const { stdout, stderr } = await execFile("ls", ["-l"]);
    if (stderr) {
      console.error(stderr);
      return;
    }
    console.log(`External Program's output:\n ${stdout}`);
  } catch (error) {
    // the program could not be started or exited with a non-zero code
    console.error(error);
  }
}

lsDir();
In the preceding code, we catch any error raised while running the program, check whether the stderr variable contains anything, and log the results in the console.
Once you are finished adding the code, save the file. In the terminal, run the program using the node command:
node listDir.js
Upon running the program, the output will look as follows:
External Program's output:
 total 8
-rw-rw-r-- 1 stanley stanley 384 Dec 20 06:51 listDir.js
-rw-rw-r-- 1 stanley stanley 225 Dec 20 06:41 package.json
The program shows the detailed list of files in the directory that the ls program returned when it ran.
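When a promisified execFile() call fails, the rejected error carries details about the failure. The sketch below, built around a hypothetical tryRun() helper, shows how those details could be inspected; the path passed to ls is a made-up one chosen so that the command fails.

```javascript
const util = require("node:util");
const execFile = util.promisify(require("node:child_process").execFile);

// Hypothetical helper: never throws, reports success or failure as an object.
async function tryRun(file, args) {
  try {
    const { stdout, stderr } = await execFile(file, args);
    return { ok: true, stdout, stderr };
  } catch (error) {
    // error.code is the program's exit code, or a string such as "ENOENT"
    // when the program could not be found. error.stderr holds whatever the
    // program wrote to standard error before failing.
    return { ok: false, code: error.code, stderr: error.stderr };
  }
}

// Made-up path, used only to trigger a failure.
tryRun("ls", ["-l", "/path/that/does/not/exist"]).then((result) => {
  console.log(result);
});
```

Returning a plain object keeps the calling code free of try...catch, at the cost of having to check the ok flag on every call.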
Now that we can run an external program and capture its output, we will offload blocking tasks into a child process to make them non-blocking in the next section.
Offloading CPU-bound tasks to a child process
In this section, we will create a program that has a blocking CPU-bound task and offload it to a child process to prevent the CPU-intensive task from blocking the main thread. A CPU-bound or CPU-intensive task involves a piece of code that takes hold of the CPU until completion, such as mathematical calculations, image and video processing, encryption, etc.
To offload a CPU-bound task, we will move the CPU-intensive code into a separate program, then use the fork() method to invoke the program in a child process. The fork() method allows the parent and child processes to communicate through messages. So once the child process finishes executing, it will send a message back to the parent containing the data.
To have an idea of how a CPU-bound task can block the main thread, we will first create a program that has a blocking CPU-bound task and make it non-blocking later.
In your text editor, create blockingTask.js and add this code:
function cpuIntensive() {
  console.log("blocking task starts");
  let total = 0;
  for (let i = 0; i < 30_000_000_000; i++) {
    total += i;
  }
  console.log("blocking task finishes");
  return total;
}

console.log(`Calculated value: ${cpuIntensive()}`);
Here, we create the cpuIntensive() function that runs a CPU-bound task. The function contains a loop that iterates 30 billion times and increments the total variable during each iteration. After that, it returns the total variable. This task will take a while to finish.
To see how this task can be blocking, let’s add some non-blocking code to the blockingTask.js file:
function cpuIntensive() {
  console.log("blocking task starts");
  let total = 0;
  for (let i = 0; i < 30_000_000_000; i++) {
    total += i;
  }
  console.log("blocking task finishes");
  return total;
}

console.log(`Calculated value: ${cpuIntensive()}`);

// add the following non-blocking code
const js_keywords = ["let", "const", "for"];
console.log("The following are JavaScript Reserved keywords: ");
for (const keyword of js_keywords) {
  console.log(keyword);
}
In the last five lines, we add a small loop that iterates three times. This task won’t take long to finish in comparison to the loop that iterates 30 billion times.
Save your file, then run the program:
node blockingTask.js
When you run the program, you will get the following output first:
// output
blocking task starts
After that, you have to wait a long time to get the rest of the output:
// output
blocking task starts
blocking task finishes
Calculated value: 449999999970159100000
The following are JavaScript Reserved keywords:
let
const
for
As you can see, the CPU-bound task blocks the main thread and prevents non-blocking tasks from executing. It would be much better to have the non-blocking tasks running at the same time as the blocking tasks for a good user experience.
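The blocking effect can also be measured directly. The following sketch schedules a 0 ms timer and then keeps the main thread busy; the timer callback cannot run until the synchronous work is done. Both busyWait() and measureTimerDelay() are hypothetical helpers written for this illustration, and the 200 ms figure is arbitrary.

```javascript
// Hypothetical helper: keeps the main thread busy for roughly `ms` milliseconds.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin: nothing else can run on the main thread during this loop
  }
}

// Schedules a 0 ms timer, blocks the thread, and reports how late the timer fired.
function measureTimerDelay(blockMs, done) {
  const scheduledAt = Date.now();
  setTimeout(() => done(Date.now() - scheduledAt), 0);
  busyWait(blockMs);
}

measureTimerDelay(200, (delay) => {
  console.log(`A 0 ms timer fired after ${delay} ms`);
});
```

The reported delay is at least as long as the blocking work, which is the same effect that held back the keyword loop above.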
To do this, we will offload the CPU-intensive loop into another file, and use the fork() method to create a child process, freeing the main thread.
Create and open the cpuBound.js file, then add the following code:
function cpuIntensive() {
  console.log("blocking task starts");
  let total = 0;
  for (let i = 0; i < 30_000_000_000; i++) {
    total += i;
  }
  console.log("blocking task finishes");
  return total;
}

// send a message to the parent process.
process.send(cpuIntensive());
The cpuIntensive() function is the same function we defined in the blockingTask.js file. What’s new here is the process.send() method. The method sends the parent process a message containing the value that the cpuIntensive() function returns.
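One detail worth knowing: process.send() only exists when the process was started with an IPC channel, which is what fork() sets up. Running the child file directly with the node command would therefore fail at that call. A small guard, sketched here with a hypothetical report() helper, lets the same file work in both situations.

```javascript
// Hypothetical helper: sends the value to the parent when an IPC channel
// exists, and falls back to printing it otherwise.
function report(value) {
  if (typeof process.send === "function") {
    process.send(value); // started through fork(): message the parent
    return "ipc";
  }
  console.log(`Calculated value: ${value}`); // started directly with `node`
  return "stdout";
}

report(2 + 2);
```

The return value is only there to make it visible which path was taken.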
In the blockingTask.js file, remove the cpuIntensive() function along with the console.log() call that invoked it, and add the following code:
// add the following code
const { fork } = require("node:child_process");

const childProcess = fork(__dirname + "/cpuBound.js");

childProcess.on("message", (message) => {
  console.log(`Calculated value: ${message}`);
});

// code that is non-blocking
const js_keywords = ["let", "const", "for"];
console.log("The following are JavaScript Reserved keywords: ");
for (const keyword of js_keywords) {
  console.log(keyword);
}
In the first line, we import the fork() method from the child_process module. We then invoke the fork() method with the path to the Node.js program that should run in a child process. After that, we attach the on() method to listen to the messages sent from the child process. Once the message is received, we log it into the console.
Let’s run the blockingTask.js file again:
node blockingTask.js
The output will now match the following:
// output
The following are JavaScript Reserved keywords:
let
const
for
blocking task starts
You will now see that the non-blocking loop logged the reserved words in the js_keywords array into the console. Earlier in this section, this loop didn’t run until the CPU-bound task was finished.
After a while, we see the output from the CPU-bound task:
// output
...
blocking task finishes
Calculated value: 449999999970159100000
Even though the CPU-intensive function was running, it did not affect the main thread. All the non-blocking code in the parent process was able to execute.
Now that we can offload CPU-bound tasks to another process to avoid blocking, we will stream large output from an external program next.
Streaming large output from an external program
So far, we’ve run an external program and captured its output in Node.js using the execFile() method. But if the external program produces a lot of output, for example because it reads a large file, it can lead to memory errors. This happens because the execFile() method stores the whole output in a buffer before passing it to your program. The buffer is capped by the maxBuffer option (1 MiB by default), and the child process is terminated if its output exceeds that limit.
To avoid using too much memory, we will need to use the spawn() method, which delivers the external program’s output as a stream of smaller chunks. This reduces the amount of memory we use because the program will read the smaller chunks of data as they come in, without keeping all the data in a buffer.
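To make the buffering limit concrete, the sketch below lowers execFile()'s maxBuffer option and checks whether a program's output still fits. The outputFits() helper is hypothetical, and the 1024-byte limit is deliberately tiny so that the failure is easy to trigger; it assumes a words.txt file exists in the current directory.

```javascript
const util = require("node:util");
const execFile = util.promisify(require("node:child_process").execFile);

// Hypothetical helper: resolves to true if the program's output fits within
// `maxBuffer` bytes, and to false if Node.js stopped it for exceeding the limit.
async function outputFits(file, args, maxBuffer) {
  try {
    await execFile(file, args, { maxBuffer });
    return true;
  } catch (error) {
    if (error.code === "ERR_CHILD_PROCESS_STDIO_MAXBUFFER") {
      return false;
    }
    throw error; // some other failure, for example a missing file
  }
}

outputFits("cat", ["words.txt"], 1024)
  .then((fits) => console.log(`Output fits in 1024 bytes: ${fits}`))
  .catch((error) => console.error(`error: ${error.message}`));
```

Raising maxBuffer postpones the problem but still holds the whole output in memory, which is why streaming is the better fit for large output.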
In this section, we’ll write a program that uses the spawn() method to read a large file. We will use the words file in the /usr/share/dict directory, which is available in most Unix-like systems. If you don’t have the file, you can use any large text file of your choice or you can download the sample one here.
In the terminal, copy the dictionary file into the project’s directory:
cp /usr/share/dict/words .
Let’s add an extension to the file:
mv words words.txt
Now read the file using the cat command:
cat words.txt
The command will log an output that looks like the following (I’ve omitted some output for brevity):
// output
...
zucchini
zucchini's
zucchinis
zwieback
zwieback's
zygote
zygote's
zygotes
Let’s now run the command using the child_process module in Node.js. Create and open readLargeFileStreams.js and enter the code below:
const { spawn } = require("node:child_process");

const cat = spawn("cat", ["words.txt"]);

cat.on("error", (error) => {
  console.error(`error: ${error.message}`);
});

cat.stdout.pipe(process.stdout);
cat.stderr.pipe(process.stderr);

cat.on("close", (code) => {
  console.log(`child process exited with code ${code}`);
});
In the first line, we import the spawn() method. In the second line, we call the spawn() method to run the cat program in a child process to read the words.txt file. The object that spawn() returns emits events and exposes the program’s output as streams, so we attach a listener for the error event, which fires when Node.js cannot start the program, and log the error in the console. After that, we use the stdout.pipe() method to pipe the output from the cat program to process.stdout, where the chunks of data received will be logged.
Next, we use the stderr.pipe() method to send error messages from the cat program to process.stderr, where they will be logged in the console. Finally, we listen to the close event, which fires once the program has exited and its streams have closed, to log the exit code in the console.
Save and run the file:
node readLargeFileStreams.js
The output will be shown in the console:
...
zoos
zorch
zucchini
zucchini's
zucchinis
zwieback
zwieback's
zygote
zygote's
zygotes
child process exited with code 0
The whole output wasn’t buffered up; instead, the program received the output in chunks and logged them in the console.
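To see the chunks themselves rather than just their combined text, you can listen for data events instead of piping. This is a sketch built around a hypothetical countOutput() helper that counts the chunks and bytes a program produces; it assumes words.txt is in the current directory.

```javascript
const { spawn } = require("node:child_process");

// Hypothetical helper: runs a program and counts the chunks and bytes it
// writes to stdout, without ever holding the full output in memory.
function countOutput(command, args) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args);
    let chunks = 0;
    let bytes = 0;

    child.stdout.on("data", (chunk) => {
      chunks += 1;
      bytes += chunk.length; // chunk is a Buffer, so length is in bytes
    });
    child.on("error", reject); // the program could not be started
    child.on("close", (code) => resolve({ code, chunks, bytes }));
  });
}

countOutput("cat", ["words.txt"])
  .then(({ code, chunks, bytes }) => {
    console.log(`exit code ${code}: received ${bytes} bytes in ${chunks} chunks`);
  })
  .catch((error) => console.error(`error: ${error.message}`));
```

The number of chunks depends on the operating system's pipe buffering, so it can vary between runs and machines.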
You can now read large files without using too much memory using the spawn() method. In the next section, we will chain external applications.
Chaining external applications
Most programs are designed to do one thing very well. For example, the cat program reads files, and the grep program searches for text. You can chain these programs together to achieve a particular task.
Using the words.txt file, you can read the file using cat, then chain grep to search for words that contain “zip”:
cat words.txt | grep zip
When the cat command reads the words.txt file, its output is passed to the grep command as the input. grep then filters the input to show only words that contain the word “zip”.
You can recreate this behavior in Node.js using the pipe() and the spawn() method.
First, create and open the chainingPrograms.js file, then add the following code:
const { spawn } = require("node:child_process");
const cat = spawn("cat", ["words.txt"]);
const grep = spawn("grep", ["zip"]);

cat.stdout.pipe(grep.stdin);
grep.stdout.pipe(process.stdout);

cat.on("error", (error) => {
  console.error(`error: ${error.message}`);
});
grep.on("error", (error) => {
  console.error(`error: ${error.message}`);
});
In the first three lines, we import spawn(), and then use it to run the cat and the grep commands. The cat command reads the words.txt file, and the grep command searches for words that contain “zip”. To pass the cat command’s output to grep, we call cat.stdout.pipe() with grep.stdin, the stream that feeds grep its input.
Next, we call grep.stdout.pipe() and pass it process.stdout to log the output in the console. The last six lines attach error listeners to the cat and grep instances and log the error message in the console if either program fails to start.
Once you are finished, save the file, and then run the chainingPrograms.js file using Node:
node chainingPrograms.js
Your output will look similar to the following:
// output
marzipan
marzipan's
unzip
unzipped
...
zippiest
zipping
zippy
zip's
zips
You will notice the output only shows words that contain the word “zip”. This confirms that the chaining of the programs works.
Running shell commands using exec()
One method we haven’t looked at so far is the exec() method. This method creates a shell and runs any command you pass it; you can even chain commands with shell operators such as | and pass the whole chain as a single string, something that you can’t do with execFile() because it doesn’t create a shell by default.
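The difference a shell makes can be seen by giving both methods the same text. This is a small sketch that assumes a Unix-like system where echo is available; the compare() function is a hypothetical name for this illustration.

```javascript
const util = require("node:util");
const childProcess = require("node:child_process");

const exec = util.promisify(childProcess.exec);
const execFile = util.promisify(childProcess.execFile);

// Hypothetical helper: runs the same text through exec() and execFile().
async function compare() {
  // exec(): the shell splits the string at ";" and runs two commands.
  const viaShell = await exec("echo one; echo two");
  // execFile(): no shell, so the whole string is one literal argument to echo.
  const direct = await execFile("echo", ["one; echo two"]);
  return { viaShell: viaShell.stdout, direct: direct.stdout };
}

compare().then(({ viaShell, direct }) => {
  console.log(JSON.stringify(viaShell)); // "one\ntwo\n"
  console.log(JSON.stringify(direct)); // "one; echo two\n"
});
```

With exec() the semicolon is syntax; with execFile() it is just another character in an argument.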
Take the following example:
cat words.txt|nl|grep zip
The cat command reads the words.txt file, which is then passed to the nl command that adds line numbers to the whole file. After that, we use grep to search and return words that contain “zip”.
With what we’ve covered so far, you can chain this command using the spawn() and the pipe() methods in Node.js as demonstrated in the previous section.
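As a rough sketch of that approach, the three-program chain could be wired up with a small loop. The pipeCommands() helper below is hypothetical: it takes a list of commands, each written as an array of the program name followed by its arguments, and connects each program's stdout to the next program's stdin.

```javascript
const { spawn } = require("node:child_process");

// Hypothetical helper: starts every command and pipes each one into the next.
// Returns the last child process so the caller can read the final output.
function pipeCommands(commands) {
  const children = commands.map(([file, ...args]) => spawn(file, args));

  children.forEach((child, index) => {
    child.on("error", (error) => {
      console.error(`error: ${error.message}`);
    });
    const next = children[index + 1];
    if (next) {
      child.stdout.pipe(next.stdin);
    }
  });

  return children[children.length - 1];
}

const last = pipeCommands([["cat", "words.txt"], ["nl"], ["grep", "zip"]]);
last.stdout.pipe(process.stdout);
```

This keeps the output streaming between programs without involving a shell, though it takes more code than a single command string.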
With the exec() method, you can pass the chained command and it will be executed in a shell.
To do that, create and open the filterDictionary.js file and enter the following code:
const util = require("node:util");
const exec = util.promisify(require("node:child_process").exec);

async function filterDictionary() {
  try {
    const { stdout, stderr } = await exec("cat words.txt|nl|grep zip");
    if (stderr) {
      console.error(stderr);
      return;
    }
    console.log(`External Program's output:\n ${stdout}`);
  } catch (error) {
    // exec() rejects if the command cannot run or exits with a non-zero code
    console.error(error);
  }
}

filterDictionary();
First, we import the exec() method into the program. We then define the filterDictionary() function to run an external program in a child process. In the function, we invoke the exec() method with the chained command as the argument. After that, we check and log any errors encountered.
Run the program as follows:
node filterDictionary.js
The output will look as follows:
External Program's output:
 64930 marzipan
 64931 marzipan's
 99883 unzip
 99884 unzipped
...
104280 zipping
104281 zippy
104282 zip's
104283 zips
As you can see, the output shows the line numbers, as well as the words containing “zip”, which confirms that the exec() method runs the chained commands successfully.
Shell injection attacks and how to prevent them
You have now learned how to use the exec() method, which gives you access to the shell and allows you to run any command. While accessing the shell is helpful, it can also be dangerous. This is due to shell injection attacks, where an attacker appends harmful commands to the input passed to the exec() method, which can damage the host system.
To understand how this attack can happen, we will create a program that illustrates this.
First, download the prompt-sync package using npm:
npm install prompt-sync
We will use the package to get input from the user.
Next, create and open the listDirExec.js file and add the following contents:
const util = require("node:util");
const exec = util.promisify(require("node:child_process").exec);
const prompt = require("prompt-sync")({ sigint: true });

const dirname = prompt("Enter the directory you want to list");
In the first two lines, we import and promisify the exec() method. Then, we import the prompt-sync package and use the prompt() method to get input from the user, which is the directory where the user wants the program to list the contents.
In the listDirExec.js file, add the following code to list directory contents:
const util = require("node:util");
const exec = util.promisify(require("node:child_process").exec);
const prompt = require("prompt-sync")({ sigint: true });

const dirname = prompt("Enter the directory you want to list");

// add the following
async function listDir() {
  try {
    const { stdout, stderr } = await exec(`ls -l ${dirname}`);
    if (stderr) {
      console.error(stderr);
      return;
    }
    console.log(`External Program's output:\n ${stdout}`);
  } catch (error) {
    // exec() rejects if the command cannot run or exits with a non-zero code
    console.error(error);
  }
}

listDir();
In the preceding code, we define the listDir() function to list the directory contents. In the function, we invoke the exec() method, which runs the ls -l command together with the input the user has passed. If the user enters /home/stanley/cp_programs, the command run will be ls -l /home/stanley/cp_programs.
Since we are getting input from the user, someone with malicious intent can append another command to do damage. This can be done by adding a semicolon as follows:
ls -l; free -h
When you run the command in the terminal, it will list the directory contents, and then check the memory usage as follows:
total 1008
-rw-rw-r-- 1 stanley stanley    388 Dec 20 07:04 blockingTask.js
-rw-rw-r-- 1 stanley stanley    347 Dec 20 07:31 chainingPrograms.js
...
-rw-rw-r-- 1 stanley stanley    408 Dec 20 07:14 readLargeFile.js
-rw-rw-r-- 1 stanley stanley    323 Dec 20 07:16 readLargeFileStreams.js
-rw-r--r-- 1 stanley stanley 985084 Dec 20 07:12 words.txt
              total        used        free      shared  buff/cache   available
Mem:          7.6Gi       4.1Gi       124Mi       554Mi       3.4Gi       2.7Gi
Swap:         1.9Gi       137Mi       1.7Gi
Now that we know commands can be appended, run the program as follows:
node listDirExec.js
We will be prompted to enter a directory name. Our application expects the user to enter their chosen directory:
/home/stanley/cp_programs
When the program runs, the output will show the directory contents:
Enter the directory you want to list/home/stanley/cp_programs
External Program's output:
 total 1008
-rw-rw-r-- 1 stanley stanley    388 Dec 20 07:04 blockingTask.js
-rw-rw-r-- 1 stanley stanley    347 Dec 20 07:31 chainingPrograms.js
-rw-rw-r-- 1 stanley stanley    278 Dec 20 07:03 cpuBound.js
-rw-rw-r-- 1 stanley stanley    410 Dec 20 07:36 filterDictionary.js
...
-rw-rw-r-- 1 stanley stanley    323 Dec 20 07:16 readLargeFileStreams.js
-rw-r--r-- 1 stanley stanley 985084 Dec 20 07:12 words.txt
An attacker may have different plans and append another command. Let’s try that by running the program once more:
node listDirExec.js
When prompted, enter the following:
/home/stanley/cp_programs;free -h;df -h
After running the command, the output will look as follows:
Enter the directory you want to list/home/stanley/cp_programs;free -h;df -h
External Program's output:
 total 1008
-rw-rw-r-- 1 stanley stanley    388 Dec 20 07:04 blockingTask.js
...
-rw-rw-r-- 1 stanley stanley    408 Dec 20 07:14 readLargeFile.js
-rw-rw-r-- 1 stanley stanley    323 Dec 20 07:16 readLargeFileStreams.js
-rw-r--r-- 1 stanley stanley 985084 Dec 20 07:12 words.txt
              total        used        free      shared  buff/cache   available
Mem:          7.6Gi       4.1Gi       201Mi       533Mi       3.3Gi       2.7Gi
Swap:         1.9Gi       139Mi       1.7Gi
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           784M  1.9M  782M   1% /run
...
tmpfs           3.9G  147M  3.7G   4% /dev/shm
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
/dev/sda4       196M   30M  167M  15% /boot/efi
The output shows the directory contents, the system memory usage, and the file system disk usage.
While the free -h and df -h commands we added aren’t harmful, they are not what our program expects as input. The program expects only the directory path, but we have been able to manipulate it into doing a different task than intended. An attacker can use this loophole to gather system information or even destroy data on the system.
To protect yourself from these attacks, you need to sanitize the user input. It is also recommended to use the execFile() method in place of exec(): because execFile() does not start a shell by default, characters such as ; are passed to the program as part of a literal argument instead of being interpreted as command separators.
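As a sketch of that recommendation, the directory listing could be rewritten around execFile() so the user's input is only ever passed as a single argument. The listDirSafely() name is hypothetical, and the example assumes a Unix-like system with ls; the input string is the malicious one used earlier in this section.

```javascript
const util = require("node:util");
const execFile = util.promisify(require("node:child_process").execFile);

// Hypothetical helper: lists a directory without involving a shell.
async function listDirSafely(dirname) {
  // "--" marks the end of options, so input that starts with "-" is still
  // treated as a path. The input reaches ls as one literal argument.
  const { stdout } = await execFile("ls", ["-l", "--", dirname]);
  return stdout;
}

listDirSafely("/home/stanley/cp_programs;free -h;df -h")
  .then((stdout) => console.log(`External Program's output:\n${stdout}`))
  .catch((error) => console.error(`ls failed:\n${error.stderr}`));
```

Here ls looks for a directory literally named "/home/stanley/cp_programs;free -h;df -h", fails to find it, and the appended commands never run. Validating the input as well, for example by checking that the path exists and is a directory, is still worthwhile.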
Conclusion
In this tutorial, we used the child_process module to launch external programs from Node.js. We began by using the execFile() method to run an external program and capture its output. Then we used the fork() method to create a child process to offload blocking CPU-bound tasks. After that, we read large files in Node.js without using too much memory using the spawn() method. Following that, we chained multiple external programs in Node.js. We then used the exec() method to execute commands in a shell. Finally, we learned about shell injection attacks and how to prevent them.
You should now be comfortable using the Node.js child_process module in your projects. To learn more about the module, visit the documentation. To take your learning further, you can learn about the execa library, which is a wrapper around the child_process module.
The post Node.js child process: How to launch external programs appeared first on LogRocket Blog.