Stream Editor (sed)


sed has rocked my world.

sed (stream editor) isn’t an interactive text editor. Instead, it is used to filter text, i.e., it takes text input, performs some operation (or set of operations) on it, and outputs the modified text. sed is typically used for extracting part of a file using pattern matching or substituting multiple occurrences of a string within a file.

Basic syntax

sed ' [RANGE] COMMANDS ' [INPUTFILE]

If no INPUTFILE is specified, sed filters the contents of standard input.

Important commands:

  • s substitute.
  • q quit command, exit without processing any more commands or input.
  • d delete command, delete the pattern space, and start the next cycle.
  • a append command.
  • i insert command.
  • e execute command, run the command found in the pattern space in the shell and replace the pattern space with its output (GNU specific).
  • -n command line switch, disables auto-print: the pattern space is printed only when explicitly requested, e.g. with the p command.
  • -i command line switch. sed will never destructively overwrite a file's contents, unless the -i option is used. It also supports auto-creating a backup file like so -i.bak.
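
To see how -n, p, d and q behave, here is a small sketch run against inline input. The three sample lines are made up for illustration; nothing here touches a real file.

```shell
# Sample input, invented for this example.
input='alpha
beta
gamma'

# Without -n every line is auto-printed, so p prints the matching line a second time.
with_autoprint=$(printf '%s\n' "$input" | sed '/beta/p')

# With -n auto-print is off, so only lines explicitly printed by p appear.
only_matches=$(printf '%s\n' "$input" | sed -n '/beta/p')

# d deletes the matching line; 1q quits after the first line has been processed.
without_beta=$(printf '%s\n' "$input" | sed '/beta/d')
first_line=$(printf '%s\n' "$input" | sed '1q')

printf '%s\n' "$with_autoprint" '---' "$only_matches" '---' "$without_beta" '---' "$first_line"
```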

Simple unit testing:

echo "getFoo_Bar" | sed 's@^\(.\{7\}\)\(.\)\(.*\)$@\L\1\L\2\3@'
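
The same idea can be wrapped in a tiny helper so that several expressions can be checked in one go. This is only a sketch: check is a hypothetical function name, not something sed provides, and the second call assumes GNU sed because \L is a GNU extension.

```shell
# check EXPECTED SCRIPT INPUT
# Runs INPUT through sed SCRIPT and compares the result with EXPECTED.
# "check" is a hypothetical helper name, not part of sed.
check() {
  expected=$1
  script=$2
  input=$3
  actual=$(printf '%s\n' "$input" | sed "$script")
  if [ "$actual" = "$expected" ]; then
    echo "ok:   $script"
  else
    echo "FAIL: $script (expected '$expected', got '$actual')"
    return 1
  fi
}

# A portable substitution.
check 'hello world' 's/W/w/' 'hello World'

# The expression from above; \L lower-cases the rest of the replacement (GNU sed).
check 'getfoo_bar' 's@^\(.\{7\}\)\(.\)\(.*\)$@\L\1\L\2\3@' 'getFoo_Bar'
```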

I found the official GNU documentation to be the most useful resource.

How sed works

sed maintains two data buffers: the active pattern space, and the auxiliary hold space. Both are initially empty.

sed operates by performing the following cycle on each line of input: first, sed reads one line from the input stream, removes any trailing newline, and places it in the pattern space. Then commands are executed; each command can have an address associated to it: addresses are a kind of condition code, and a command is only executed if the condition is verified before the command is to be executed.

When the end of the script is reached, unless the -n option is in use, the contents of pattern space are printed out to the output stream, adding back the trailing newline if it was removed. Then the next cycle starts for the next input line.

Unless special commands (like D) are used, the pattern space is deleted between two cycles. The hold space, on the other hand, keeps its data between cycles (see commands h, H, x, g, G to move data between both buffers).
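
A classic way to watch the two buffers interact is to reverse the order of the input lines, much like tac does. The one-liner below is a common idiom rather than anything specific to this article, and the three input lines are invented.

```shell
# Reverse the order of lines using the hold space.
#   1!G  on every line except the first, append the hold space to the pattern space
#   h    copy the pattern space (current line plus everything seen so far) to the hold space
#   $p   on the last line, print the accumulated, reversed text
reversed=$(printf 'one\ntwo\nthree\n' | sed -n '1!G;h;$p')

printf '%s\n' "$reversed"
```

Because the hold space survives between cycles, each new line ends up in front of all the lines read before it.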

Substitution

Sample file ntp.conf:

driftfile  /var/lib/ntp/ntp.draft
statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
server 0.fedora.pool.ntp.org
server 1.fedora.pool.ntp.org
server 2.fedora.pool.ntp.org
server 3.fedora.pool.ntp.org
server ntp.fedora.org

First cool tip, nl is the boss for quickly line numbering a file:

nl ntp.conf

 1  driftfile  /var/lib/ntp/ntp.draft
 2  statistics loopstats peerstats clockstats
 3  filegen loopstats file loopstats type day enable
 4  server 0.fedora.pool.ntp.org
 5  server 1.fedora.pool.ntp.org
 6  server 2.fedora.pool.ntp.org
 7  server 3.fedora.pool.ntp.org
 8  server ntp.fedora.org

So I want to indent all lines beginning with server:

sed ' 4,8 s/^/    /g' ntp.conf

Results in:

driftfile  /var/lib/ntp/ntp.draft
statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
    server 0.fedora.pool.ntp.org
    server 1.fedora.pool.ntp.org
    server 2.fedora.pool.ntp.org
    server 3.fedora.pool.ntp.org
    server ntp.fedora.org

If I just want to see the affected lines (note the -n command line switch, which suppresses the automatic printing of every line, and the p flag, which prints only the lines where a substitution was made):

sed -n ' 4,8 s/^/    / p' ntp.conf

Results in:

    server 0.fedora.pool.ntp.org
    server 1.fedora.pool.ntp.org
    server 2.fedora.pool.ntp.org
    server 3.fedora.pool.ntp.org
    server ntp.fedora.org
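
Line numbers break as soon as the file changes, so the same lines can also be selected with a regular expression address. A minimal sketch, assuming the sample ntp.conf from above is recreated in a temporary directory:

```shell
# Recreate the sample file in a throwaway directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/ntp.conf" <<'EOF'
driftfile  /var/lib/ntp/ntp.draft
statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
server 0.fedora.pool.ntp.org
server 1.fedora.pool.ntp.org
server 2.fedora.pool.ntp.org
server 3.fedora.pool.ntp.org
server ntp.fedora.org
EOF

# Address by pattern (/^server/) instead of by line range (4,8).
indented=$(sed -n '/^server/ s/^/    /p' "$tmpdir/ntp.conf")
printf '%s\n' "$indented"

rm -rf "$tmpdir"
```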

Here’s another nice substitution example:

sed -n ' /^ben/ s@/bin/bash@/bin/sh@ p ' /etc/passwd

This beautiful little command finds all entries starting with ben in /etc/passwd and replaces /bin/bash with /bin/sh, and the p flag prints the result to stdout. Notice how the delimiter can be changed, in this case to @. Handy when the pattern contains forward slashes, since / is the default delimiter.
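
To try this without touching the real /etc/passwd, the sketch below runs the same substitution against a few made-up passwd-style entries and spells it with three different delimiters. The account lines are hypothetical.

```shell
# Hypothetical passwd-style entries, invented for this example.
passwd_sample='root:x:0:0:root:/root:/bin/bash
ben:x:1000:1000:Ben:/home/ben:/bin/bash
alice:x:1001:1001:Alice:/home/alice:/bin/zsh'

# The same substitution written with @, with |, and with escaped slashes.
with_at=$(printf '%s\n' "$passwd_sample" | sed -n '/^ben/ s@/bin/bash@/bin/sh@p')
with_pipe=$(printf '%s\n' "$passwd_sample" | sed -n '/^ben/ s|/bin/bash|/bin/sh|p')
with_slash=$(printf '%s\n' "$passwd_sample" | sed -n '/^ben/ s/\/bin\/bash/\/bin\/sh/p')

printf '%s\n' "$with_at" "$with_pipe" "$with_slash"
```

All three print the same line; the escaped-slash version just shows why a different delimiter is easier on the eyes.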

Append, Insert and Delete

Delete all lines that start with server 3 (\s matches any whitespace character; it is a GNU extension):

sed ' /^server\s3.fedora/ d' ntp.conf

Results in:

driftfile  /var/lib/ntp/ntp.draft
statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
server 0.fedora.pool.ntp.org
server 1.fedora.pool.ntp.org
server 2.fedora.pool.ntp.org
server ntp.fedora.org

Append the line server ntp.kernel.org after any line that starts with server 0:

sed ' /^server\s0/ a server ntp.kernel.org' ntp.conf

Results in:

driftfile  /var/lib/ntp/ntp.draft
statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
server 0.fedora.pool.ntp.org
server ntp.kernel.org
server 1.fedora.pool.ntp.org
server 2.fedora.pool.ntp.org
server 3.fedora.pool.ntp.org
server ntp.fedora.org

And insert, which has basically the same semantics as append, except that the text goes on the line before, not after:

sed ' /^server\s0/ i server ntp.kernel.org' ntp.conf

Results in:

driftfile  /var/lib/ntp/ntp.draft
statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
server ntp.kernel.org
server 0.fedora.pool.ntp.org
server 1.fedora.pool.ntp.org
server 2.fedora.pool.ntp.org
server 3.fedora.pool.ntp.org
server ntp.fedora.org
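
The one-line a text and i text forms used above are GNU extensions. The sketch below shows the more portable spelling, where a\ or i\ is followed by a newline and then the text. The two-line sample input is trimmed down from ntp.conf for brevity, and a literal space stands in for \s.

```shell
# Two lines taken from the sample ntp.conf.
sample='server 0.fedora.pool.ntp.org
server 1.fedora.pool.ntp.org'

# a\ + newline + text: append the text after each matching line.
appended=$(printf '%s\n' "$sample" | sed '/^server 0/a\
server ntp.kernel.org')

# i\ + newline + text: insert the text before each matching line.
inserted=$(printf '%s\n' "$sample" | sed '/^server 0/i\
server ntp.kernel.org')

printf '%s\n' "$appended" '---' "$inserted"
```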

Multiple expressions

sed supports blocks:

sed ' {
  /^server 0/ i server ntp.kernel.org
  /^server\s[0-9]\.fedora/ d
} ' ntp.conf

Or, if you are dealing with a large script, put the commands in a sed script file for inclusion and reuse:

ntp.sed

/^server 0/ i server ntp.kernel.org
/^server\s[0-9]\.fedora/ d

To include it use the -f switch like so:

sed -f ntp.sed /etc/ntp.conf

Once you’re ready to roll, plug in the -i switch to update the target file:

sudo sed -i.bak -f ntp.sed /etc/ntp.conf
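
Before pointing -i at anything under /etc, it is worth rehearsing on a throwaway copy. The sketch below assumes a shortened ntp.conf and a one-command script, both written to a temporary directory; the file contents are trimmed from the examples above.

```shell
workdir=$(mktemp -d)

# A throwaway copy of the config, so nothing under /etc is touched.
cat > "$workdir/ntp.conf" <<'EOF'
driftfile  /var/lib/ntp/ntp.draft
server 0.fedora.pool.ntp.org
server 1.fedora.pool.ntp.org
server ntp.fedora.org
EOF

# A script file; lines starting with # are comments.
cat > "$workdir/ntp.sed" <<'EOF'
# drop the numbered fedora pool servers
/^server [0-9]\.fedora/d
EOF

# Preview first (no -i), then edit in place, keeping the original as ntp.conf.bak.
sed -f "$workdir/ntp.sed" "$workdir/ntp.conf"
sed -i.bak -f "$workdir/ntp.sed" "$workdir/ntp.conf"

# diff exits non-zero when the files differ, which is expected here.
diff "$workdir/ntp.conf.bak" "$workdir/ntp.conf" || true

# Clean up when done:
# rm -rf "$workdir"
```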

Remote with ssh

Scripting sed to run on remote servers is a piece of cake, thanks to the ssh -t switch, which allocates a TTY so that a sudo password can be provided. This is a neat way of spraying out updates consistently across a farm of servers. Check this out (note that the script /tmp/ntp.sed must be copied to the remote file system before running):

scp ntp.sed ben@10.3.1.200:/tmp/
ssh -t ben@10.3.1.200 sudo sed -i.bak -f /tmp/ntp.sed /etc/ntp.conf
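
For more than one server, the two commands can be wrapped in a loop. This is a rough sketch: the host list is hypothetical (only 10.3.1.200 appears above), run is a made-up helper name, and by default it only prints what it would do so the loop can be rehearsed safely.

```shell
# Hypothetical host list; replace with your own servers.
hosts="ben@10.3.1.200 ben@10.3.1.201"

# With DRY_RUN=1 (the default here) commands are printed instead of executed.
DRY_RUN=${DRY_RUN:-1}

# run is a hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

for host in $hosts; do
  run scp ntp.sed "$host:/tmp/"
  run ssh -t "$host" sudo sed -i.bak -f /tmp/ntp.sed /etc/ntp.conf
done
```

Setting DRY_RUN=0 in the environment makes the loop execute the commands for real.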

Substitution grouping

Substitution groups allow for more advanced targeting and transformation of text.

Let's break down an example.

gsed 's/\([^,]*\),\([^,]*\)/\U\1,\L\2/' heros.txt

heros.txt:

Ritchie,Dennis,410909
Thompson,Kenneth,430204
Carmack,John,700820
Torvalds,Linus,610114
Stallman,Richard,550921
Pike,Rob,560212

Using the substitution command s, the pattern is \([^,]*\),\([^,]*\) with the parentheses escaped, i.e. ([^,]*),([^,]*) in extended syntax. That is, a first capture group of everything up to a comma, a literal comma, then a second capture group of everything up to the next comma. The replacement is \U\1,\L\2: \U signals that upper-case conversion should be applied to \1 (the text matched by the first capture group), and \L applies lower-case conversion to \2 (the second capture group). Both are GNU extensions; see the s command documentation for more. In a nutshell: upper-case the first field, lower-case the second, and leave the rest of the line untouched.

Result:

RITCHIE,dennis,410909
THOMPSON,kenneth,430204
CARMACK,john,700820
TORVALDS,linus,610114
STALLMAN,richard,550921
PIKE,rob,560212
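
Capture groups can also be reordered, not just re-cased. The sketch below swaps the first two fields of one line from heros.txt, once with the default basic syntax and once with -E (extended syntax, accepted by both GNU and BSD sed), where the parentheses need no backslashes.

```shell
line='Ritchie,Dennis,410909'

# Basic regular expressions (the default): groups are written \( \).
swapped_bre=$(printf '%s\n' "$line" | sed 's/\([^,]*\),\([^,]*\)/\2 \1/')

# Extended regular expressions (-E): groups are written ( ) without backslashes.
swapped_ere=$(printf '%s\n' "$line" | sed -E 's/([^,]*),([^,]*)/\2 \1/')

printf '%s\n' "$swapped_bre" "$swapped_ere"
```

Both print Dennis Ritchie,410909; the third field is outside the match and passes through unchanged.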

Numerical grouping

The following prettify_big_numbers.sed first converts all commas (,) to colons (:), and then jams a comma in between the second and third capture groups, delimiting the last 3 digits.

s/,/:/g
s/\(^\|[^0-9.]\)\([0-9]\+\)\([0-9]\{3\}\)/\1\2,\3/g

Using it becomes super simple.

$ echo "0.01 0.08 0.07 2/338 584288" | gsed -f prettify_big_numbers.sed 
0.01 0.08 0.07 2/338 584,288
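
As written, the script inserts only one comma per number, so 1234567 would come out as 1234,567. One possible extension, sketched here under the assumption of GNU sed (for \| and \+), is to repeat the substitution with a label and the t command until nothing is left to change. The function name group_thousands and the label again are made up for this example.

```shell
# group_thousands is a hypothetical wrapper around the looping variant of the script.
group_thousands() {
  sed -e 's/,/:/g' \
      -e ':again' \
      -e 's/\(^\|[^0-9.]\)\([0-9]\+\)\([0-9]\{3\}\)/\1\2,\3/' \
      -e 't again'
}

small=$(echo "0.01 0.08 0.07 2/338 584288" | group_thousands)
large=$(echo "1234567" | group_thousands)

printf '%s\n' "$small" "$large"
```

Each pass peels three more digits off the right-hand end of the remaining run, and t branches back to the label for as long as a substitution succeeded.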

Executing Commands

The GNU version of sed sports the nifty e command.

This command allows one to pipe input from a shell command into pattern space. Used as a flag on the s command, it works like this: if a substitution was made, the command that is found in pattern space is executed and pattern space is replaced with its output.

files.txt

/etc/hosts
/etc/services

Some simple examples. First let's tack ls -l to the front of each of the above files listed in files.txt, and execute the resulting command with e, replacing the pattern space with whatever output it produces.

$ gsed ' s/^/ls -l /e ' files.txt
-rw-r--r--  1 root  wheel  4858 22 Apr  2013 /etc/hosts
-rw-r--r--  1 root  wheel  677972 10 Sep  2014 /etc/services

Changing the command to something else (e.g. stat) is easy:

$ gsed ' s/^/stat /e ' files.txt
16777220 2003065 -rw-r--r-- 1 root wheel 0 4858 "Sep 13 19:50:40 2015" "Apr 22 22:30:52 2013" "Apr 22 22:30:52 2013" "Apr 22 22:30:52 2013" 4096 16 0 /etc/hosts
16777220 10405184 -rw-r--r-- 1 root wheel 0 677972 "Sep 13 19:50:15 2015" "Sep 10 06:47:34 2014" "Oct 18 15:57:39 2014" "Sep 10 06:47:34 2014" 4096 480 0x20 /etc/services
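
The ls and stat output above depends on the machine, so here is a variant with a predictable result: prefixing each path with basename. This sketch assumes GNU sed is installed as sed (gsed on macOS). Since every input line is handed to the shell, the e flag should only be used on input you trust.

```shell
# Requires GNU sed: the e flag runs the pattern space as a shell command
# and replaces the pattern space with that command's output.
names=$(printf '/etc/hosts\n/etc/services\n' | sed 's/^/basename /e')

printf '%s\n' "$names"
```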

sed with Vim

vim supports very similar syntax to sed. For example, indenting lines 5 to 30:

:5,30s/^/    /

Or target lines 30 to the end of document:

:30,$ s/^/  /

To apply to all lines within a document, use the % range:

:%s/^/    /

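To apply to all lines that match a pattern, use the :g (global) command; unlike in sed, a bare /pattern/ address in vim selects only the next matching line:

:g/^windows/s/^windows/linux/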

Commands

We’ve discovered only the tip of the sed iceberg.

Source: the official GNU sed Manual

Zero address commands

  • :label: Label for b and t commands.
  • #comment: The comment extends until the next newline (or the end of a -e script fragment).
  • }: The closing bracket of a { } block.

Zero or One address commands

  • =: Print the current line number.
  • a \ text: Append text, which has each embedded newline preceded by a backslash.
  • i \ text: Insert text, which has each embedded newline preceded by a backslash.
  • q [exit-code]: Immediately quit the sed script without processing any more input, except that if auto-print is not disabled the current pattern space will be printed. The exit code argument is a GNU extension.
  • Q [exit-code]: Immediately quit the sed script without processing any more input. This is a GNU extension.
  • r filename: Append text read from filename.
  • R filename: Append a line read from filename. Each invocation of the command reads a line from the file. This is a GNU extension.
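
Two of these make handy stand-ins for other tools. A short sketch on four invented input lines: 2q behaves like head -n 2, and $= (print the line number of the last line) behaves like wc -l.

```shell
# Four sample lines, invented for this example.
input=$(printf 'a\nb\nc\nd\n')

# q: quit after line 2; auto-print still emits the lines seen so far.
first_two=$(printf '%s\n' "$input" | sed '2q')

# =: with -n and the $ address, print only the number of the last line.
line_count=$(printf '%s\n' "$input" | sed -n '$=')

printf '%s\n' "$first_two" '---' "$line_count"
```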

Commands which accept address ranges

  • {: Begin a block of commands (end with a }).
  • b label: Branch to label; if label is omitted, branch to end of script.
  • c \ text: Replace the selected lines with text, which has each embedded newline preceded by a backslash.
  • d: Delete pattern space. Start next cycle.
  • D: If pattern space contains no newline, start a normal new cycle as if the d command was issued. Otherwise, delete text in the pattern space up to the first newline, and restart cycle with the resultant pattern space, without reading a new line of input.
  • h H: Copy/append pattern space to hold space.
  • g G: Copy/append hold space to pattern space.
  • l: List out the current line in a “visually unambiguous” form.
  • l width: List out the current line in a “visually unambiguous” form, breaking it at width characters. This is a GNU extension.
  • n N: Read/append the next line of input into the pattern space.
  • p: Print the current pattern space.
  • P: Print up to the first embedded newline of the current pattern space.
  • s/regexp/replacement/: Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement. The replacement may contain the special character & to refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.
  • t label: If a s/// has done a successful substitution since the last input line was read and since the last t or T command, then branch to label; if label is omitted, branch to end of script.
  • T label: If no s/// has done a successful substitution since the last input line was read and since the last t or T command, then branch to label; if label is omitted, branch to end of script. This is a GNU extension.
  • w filename: Write the current pattern space to filename.
  • W filename: Write the first line of the current pattern space to filename. This is a GNU extension.
  • x: Exchange the contents of the hold and pattern spaces.
  • y/source/dest/: Transliterate the characters in the pattern space which appear in source to the corresponding character in dest.
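
A couple of the commands from this list in action, as a small sketch on made-up input: y for character-by-character transliteration, and N to pull the next line into the pattern space so that pairs of lines can be joined.

```shell
# y: map each character in the first set to the character at the same
# position in the second set (here: lower-case vowels to upper-case).
shouted=$(printf 'stream editor\n' | sed 'y/aeiou/AEIOU/')

# N: append the next input line to the pattern space (separated by a newline),
# then replace that embedded newline with a comma to join the pair.
paired=$(printf '1\n2\n3\n4\n' | sed 'N;s/\n/,/')

printf '%s\n' "$shouted" "$paired"
```

Unlike s, the y command takes no regular expression; both sets must contain the same number of characters.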