Complete your work in the hw5 folder. Remember to pull, add, commit, and push. You need to work on
three files only: execmd.c, workon.c, and runpipeline.c. Do NOT add other files.
Exercise 1. (100 points) runpipeline
A pipeline is a sequence of external programs chained together to perform a task. The standard output of
stage i of the pipeline is fed to the standard input of stage i + 1. In shells like bash, stages are separated
by "|". For example, the following pipeline that contains 7 stages counts the number of occurrences of each
word in a text file.1 The output of the last stage is redirected into file counts.txt. whitman.txt is in the
solutions repo under sol-hw5. You can also replace it with other text files, for example, your C source code.
cat whitman.txt | tr -s [:space:] '\n' | tr -d [:punct:] | tr A-Z a-z | sort | uniq -c | \
sort -nr > counts.txt
The seven stages do the following. The command in each stage prints its result to stdout and the output
of the last command is redirected to counts.txt.
1. Send the contents of the file whitman.txt to stdout
2. Replace every sequence of consecutive whitespace characters in stdin with a single line-feed
3. Delete all punctuation characters from stdin and send remaining characters to stdout
4. Replace uppercase letters in stdin with lowercase letters
5. Sort the lines from stdin alphabetically
6. Collapse adjacent matching lines to a single copy preceded by the number of copies
7. Sort the lines from stdin in reverse numerical order
In this problem, you will complete the functions in runpipeline.c so the program can start a pipeline
with the programs specified at the command line. To avoid interference with the shell, pipeline stages are
separated with "--" instead of "|". To run the above bash pipeline with runpipeline, you would run the
following command in bash, and the resulting counts.txt should be the same. Join the two lines when you
try the command.
./runpipeline cat whitman.txt -- tr -s [:space:] '\n' -- tr -d [:punct:] -- tr A-Z a-z -- \
sort -- uniq -c -- sort -nr > counts.txt
In runpipeline.c, the commands for all stages are already stored in an array of Program structures,
which are defined as follows.
typedef struct program_tag {
    char **args;      // array of pointers to arguments
    int num_args;     // number of arguments
    int pid;          // process ID of this program
    int fd_in;        // pipe fd for stdin
    int fd_out;       // pipe fd for stdout
} Program;
[1] The “\” at the end of the first line allows the pipeline commands to continue on the next line; you can also
just join the two lines when you try the pipeline in bash.
args[0] is the command and args is the array of arguments to be passed to an execv* function. num_args
is the number of arguments in args. pid is the process ID of the child process for this command. If fd_in
is non-negative, the file descriptor will be used as stdin for the command. If fd_out is non-negative, it will
be used as stdout for the command.
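As a rough illustration of how these fields might be used, the sketch below wires fd_in and fd_out onto
stdin and stdout with dup2() in the child process right before an execv* call. The helper redirect_stdio()
is hypothetical and not part of the provided skeleton; only the Program fields come from runpipeline.c.

/* Minimal sketch, not the required design: apply fd_in/fd_out in the child
 * process right before calling an execv* function. redirect_stdio() is a
 * hypothetical helper; only the Program fields come from the skeleton. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void redirect_stdio(const Program *p)   /* Program as defined above */
{
    if (p->fd_in >= 0) {                       /* negative: leave stdin alone  */
        if (dup2(p->fd_in, STDIN_FILENO) < 0) { perror("dup2 stdin"); exit(1); }
        close(p->fd_in);                       /* fd 0 now refers to the pipe  */
    }
    if (p->fd_out >= 0) {                      /* negative: leave stdout alone */
        if (dup2(p->fd_out, STDOUT_FILENO) < 0) { perror("dup2 stdout"); exit(1); }
        close(p->fd_out);
    }
    /* the subsequent execv*(p->args[0], p->args) inherits fds 0, 1, and 2 */
}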
Note that runpipeline does not redirect the input or output for the pipeline itself. If needed, the
redirection can be set on runpipeline by the shell. Then the first command in the pipeline can have
redirected stdin and the last one can have redirected stdout.
Your program will start commands in start_program(), one command at a time. You can create pipes in
that function as each command is started, if a needed pipe has not been created yet, or you can create
all pipes for all stages in the function prepare_pipes().
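If you take the second approach, a sketch of prepare_pipes() could look like the following. It assumes the
function receives the array of Program structures and its length; the actual signature in the skeleton may
differ, and the error handling here is only illustrative.

/* Sketch only: create one pipe between every pair of adjacent stages and
 * record the ends in the Program array. The signature is assumed, not the
 * skeleton's; adapt it to the declarations in runpipeline.c. */
#include <stdio.h>
#include <unistd.h>

int prepare_pipes(Program *progs, int num_progs)
{
    progs[0].fd_in = -1;                        /* first stage keeps its stdin  */
    progs[num_progs - 1].fd_out = -1;           /* last stage keeps its stdout  */
    for (int i = 0; i + 1 < num_progs; i++) {   /* pipe between stage i and i+1 */
        int fds[2];
        if (pipe(fds) < 0) {
            perror("pipe");
            return -1;
        }
        progs[i].fd_out = fds[1];               /* write end: stage i's stdout  */
        progs[i + 1].fd_in = fds[0];            /* read end: stage i+1's stdin  */
    }
    return 0;
}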
Your code should close unused pipe FDs in each process, should not leave zombies behind, and should
not have memory leaks.
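One way to meet these requirements, sketched under the assumption that the parent keeps the Program
array, is to close the parent's copies of a stage's pipe ends after forking that stage and to waitpid() for
every child at the end. The helper names below are hypothetical, not part of the skeleton.

/* Sketch only: parent-side cleanup. After the child has been forked (and has
 * dup2()'d its pipe ends), the parent closes its own copies so downstream
 * stages see EOF; after all stages are started, every child is reaped so no
 * zombies remain. */
#include <sys/wait.h>
#include <unistd.h>

static void close_parent_ends(Program *p)
{
    if (p->fd_in >= 0)  { close(p->fd_in);  p->fd_in  = -1; }
    if (p->fd_out >= 0) { close(p->fd_out); p->fd_out = -1; }
}

static void wait_for_all(Program *progs, int num_progs)
{
    for (int i = 0; i < num_progs; i++) {
        int status;
        if (progs[i].pid > 0)
            waitpid(progs[i].pid, &status, 0);  /* reap each stage exactly once */
    }
}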
You can use the lsof command to check the open files of processes. For example,
# list open files by PIDs
$ lsof -p 3753,3754,3755,3756,3757
$ lsof -p 3753
# list open files for processes whose name starts with runp, cat, wc, or tr
$ lsof -c runp -c cat -c wc -c tr
# use the -u option to specify a user name; -a combines the conditions with 'and'
$ lsof -c runp -u netid -a
The rows that you should pay attention to are the ones that have a number in the FD column. Most of
the commands have only three open files: 0, 1, and 2. Some commands, for example tee, may have additional
open files.
Dealing with many pipeline stages may look scary at the beginning. However, if you start with two stages,
then move on to three stages, four stages, and more, you will find that seven stages are about the same as
three stages. You can test pipelines with various numbers of stages, as shown in the examples below. Note
that you can use tee to examine the data stream at intermediate stages. To check whether your code produces
the correct result and behaves correctly, run the same pipeline in bash (using "|" instead of "--" to connect
the stages).