4.3 Testing and Saving Output

In our previous discussion of the pattern space, you saw that sed:

Makes a copy of the input line.
Modifies that copy in the pattern space.
Outputs the copy to standard output.

What this means is that sed has a built-in safeguard so that you don't make changes to the original file. Thus, the following command line:

$ sed -f sedscr testfile

does not change testfile. It sends all lines to standard output (typically the screen), both the lines that were modified and the lines that are unchanged. You have to capture this output in a new file if you want to save it.

$ sed -f sedscr testfile > newfile

The redirection symbol ">" directs the output from sed to the file newfile. Don't redirect the output from the command back to the input file, or you will overwrite the input file. The shell truncates the output file when it sets up the redirection, before sed even gets a chance to read it, effectively destroying your data.
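The danger described above can be reproduced safely in a scratch directory. The following is a minimal sketch; the file names demo.txt and demo.new and the substitution s/old/new/ are invented for this illustration and do not come from the text.

```shell
# Work in a scratch directory so that no real file is at risk.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'old line one\nold line two\n' > demo.txt

# WRONG: the shell empties demo.txt before sed starts reading it,
# so sed sees an empty input file and writes nothing.
sed 's/old/new/' demo.txt > demo.txt
if [ -s demo.txt ]; then emptied=no; else emptied=yes; fi
echo "input file emptied by the redirect: $emptied"

# Safer: recreate the input, then send the output to a different file.
printf 'old line one\nold line two\n' > demo.txt
sed 's/old/new/' demo.txt > demo.new
cat demo.new
```

The first sed command leaves demo.txt with no content at all, which is why the output always goes to a separate file first.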

One important reason to redirect the output to a file is to verify your results. You can examine the contents of newfile and compare it to testfile. If you want to be very methodical about checking your results (and you should be), use the diff program to point out the differences between the two files.

$ diff testfile newfile

This command will display lines that are unique to testfile preceded by a "<" and lines unique to newfile preceded by a ">". When you have verified your results, make a backup copy of the original input file and then use the mv command to overwrite the original with the new version. Be sure that the editing script is working properly before abandoning the original version.
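The whole sequence of editing, checking, backing up, and replacing might look like the sketch below. It runs in a scratch directory; the contents of testfile and sedscr are made up for the example, and the backup name testfile.bak is only one possible convention.

```shell
# Set up a scratch directory with a sample input file and sed script.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'alpha\nbeta\n' > testfile
printf 's/beta/gamma/\n' > sedscr

# 1. Save the edited output in a new file.
sed -f sedscr testfile > newfile

# 2. Review the changes. diff exits with status 1 when the files
#    differ, which is expected here, so the status is ignored.
diff testfile newfile || true

# 3. Once satisfied, back up the original and install the new version.
cp testfile testfile.bak
mv newfile testfile
```

After the last step, testfile holds the edited text and testfile.bak holds the original, so the edit can still be reversed by hand.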

Because these steps are repeated so frequently, you will find it helpful to put them into a shell script. While we can't go into much depth about the workings of shell scripts, these scripts are fairly simple to understand and use. Writing a shell script involves using a text editor to enter one or more command lines in a file, saving the file and then using the chmod command to make the file executable. The name of the file is the name of the command, and it can be entered at the system prompt. If you are unfamiliar with shell scripts, follow the shell scripts presented in this book as recipes in which you make your own substitutions.
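As a rough illustration of those steps, the sketch below creates a one-line shell script and runs it. The script name hello is hypothetical, and a here-document stands in for the text editor so that the example is self-contained.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Step 1: enter the command lines in a file (normally done in an editor).
cat > hello <<'EOF'
#!/bin/sh
echo "hello from a shell script"
EOF

# Step 2: make the file executable.
chmod +x hello

# Step 3: run it by name. The ./ prefix is needed here because the
# scratch directory is not assumed to be in the search path.
./hello
```

If the script is placed in a directory listed in your PATH, it can be run by typing its name alone.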

The following two shell scripts are useful for testing sed scripts and then making the changes permanently in a file. They are particularly useful when the same script needs to be run on multiple files.

4.3.1 testsed

The shell script testsed automates the process of saving the output of sed in a temporary file. It expects to find the script file, sedscr, in the current directory and applies these instructions to the input file named on the command line. The output is placed in a temporary file.

for x
do
	sed -f sedscr "$x" > "tmp.$x"
done

At least one filename must be specified on the command line. For each file named, the script saves the output in a temporary file whose name is the original filename with the prefix "tmp." added. You can examine the temporary file to determine if your edits were made correctly. If you approve of the results, you can use mv to overwrite the original file with the temporary file.

You might also incorporate the diff command into the shell script. (Add diff $x tmp.$x after the sed command.)
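A version of testsed with the diff step included might look like the following sketch. To keep the example runnable on its own, the loop is wrapped in a shell function named testsed_diff (a name invented here), and the sample files ch01 and sedscr are created in a scratch directory. In a real script file, the body would be just the for loop.

```shell
# Hypothetical function wrapper around the testsed loop.
# With no "in" list, "for x" loops over the function's arguments.
testsed_diff() {
	for x
	do
		sed -f sedscr "$x" > "tmp.$x"
		# Show what changed; diff exits with status 1 when files differ.
		diff "$x" "tmp.$x"
	done
}

# Demonstration with made-up sample data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 's/cat/dog/\n' > sedscr
printf 'the cat sat\nno match here\n' > ch01

testsed_diff ch01
```

The original file ch01 is left untouched; only tmp.ch01 contains the edited text.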

If you find that your script did not produce the results you expected, remember that the easiest "fix" is usually to perfect the editing script and run it again on the original input file. Don't write a new script to "undo" or improve upon changes made in the temporary file.

4.3.2 runsed

The shell script runsed was developed to make changes to an input file permanently. In other words, it is used in cases when you would want the input file and the output file to be the same. Like testsed, it creates a temporary file, but then it takes the next step: copying the file over the original.

#! /bin/sh

for x
do
   printf 'editing %s: ' "$x"
   if test "$x" = sedscr; then
      echo "not editing sedscript!"
   elif test -s "$x"; then               # input exists and is not empty
      sed -f sedscr "$x" > "/tmp/$x$$"
      if test -s "/tmp/$x$$"             # sed produced some output
      then
         if cmp -s "$x" "/tmp/$x$$"      # output identical to input
         then
            printf 'file not changed: '
         else
            mv "$x" "$x.bak"  # save original, just in case
            cp "/tmp/$x$$" "$x"
         fi
         echo "done"
      else
         printf 'Sed produced an empty file'
         echo " - check your sedscript."
      fi
      rm -f "/tmp/$x$$"
   else
      echo "original file is empty."
   fi
done
echo "all done"

To use runsed, create a sed script named sedscr in the directory where you want to make the edits. Supply the name or names of the files to edit on the command line. Shell metacharacters can be used to specify a set of files.

$ runsed ch0?

runsed simply invokes sed -f sedscr on the named files, one at a time, and redirects the output to a temporary file. runsed then tests this temporary file to make sure that output was produced before copying it over the original.

The muscle of this shell script (line 9) is essentially the same as testsed. The additional lines are intended to test for unsuccessful runs, for instance, when no output is produced. Before overwriting the original, the script checks whether an empty output file was produced and compares the two files to see if changes were actually made.
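The two checks that runsed relies on, test -s and cmp -s, can be tried in isolation. The sketch below uses made-up file names (same1, same2, other, empty) in a scratch directory and records each result in a variable.

```shell
# Create sample files in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'one\n' > same1
printf 'one\n' > same2
printf 'two\n' > other
: > empty                     # a file with no content

# cmp -s prints nothing; it succeeds only if the files are identical.
if cmp -s same1 same2; then identical=yes; else identical=no; fi
if cmp -s same1 other; then changed=no; else changed=yes; fi

# test -s succeeds only if the file exists and is not empty.
if test -s empty; then has_output=yes; else has_output=no; fi

echo "same1 and same2 identical: $identical"
echo "same1 and other differ:    $changed"
echo "empty file has content:    $has_output"
```

Both commands report their result only through the exit status, which is what makes them convenient inside if statements.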

However, runsed does not protect you from imperfect editing scripts. You should use testsed first to verify your changes before actually making them permanent with runsed.

