Step by step with Sed
The stream editor (Sed) is an incredibly powerful tool. Generally, Sed allows you to process a stream of lines individually.
A common use case for Sed is to find a pattern (i.e. a string or regular expression) and replace each match with another string.
For simple use cases, Bash can perform pattern substitution. An example of pattern substitution is below:
test='abcdefghi'
# simple pattern substitution in Bash
echo ${test/cde/zzz}
abzzzfghi
# also can just remove a pattern
echo ${test/efghi}
abcd
# removing two patterns takes two steps in Bash
# (nested expansion such as ${${test/abc}/ghi} is a zsh feature and fails in Bash)
tmp=${test/abc}
echo ${tmp/ghi}
def
# similarly, ${variable:offset:length} is referred to as substring expansion (the offset counts from 0).
echo ${test:2:3}
cde
# Note: similar syntax in vim for pattern substitution
:%s/old/new/gc
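Bash can also replace every occurrence of a pattern rather than only the first, by doubling the first slash. The snippet below is a minimal sketch that assumes it is run by Bash itself (not a plain POSIX sh); the variable names are only for illustration.

```shell
#!/usr/bin/env bash
# hypothetical sample value
path='a-b-c-d'
# a single slash replaces only the first match
first_only=${path/-/_}
# a double slash replaces every match
every_match=${path//-/_}
echo "$first_only"     # a_b-c-d
echo "$every_match"    # a_b_c_d
```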
However, for more advanced pattern substitution, Sed is the go-to tool. The full instructions for Sed are here:
https://www.gnu.org/software/sed/manual/sed.html
Some functions of Sed include:
- Append text after a line.
- Insert text before a line.
- Branch.
- Delete.
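As a rough illustration of the append, insert and delete functions listed above, here is a small sketch. It assumes GNU Sed, because the one-line forms of the a and i commands are GNU extensions, and the sample text is made up. Branching is left out to keep things simple.

```shell
# Assumption: GNU Sed (one-line 'a text' and 'i text' forms).
# append text after line 2
appended=$(seq 3 | sed '2a after line 2')
# insert text before line 2
inserted=$(seq 3 | sed '2i before line 2')
# delete line 2
deleted=$(seq 3 | sed '2d')
printf '%s\n' "$appended"   # 1, 2, after line 2, 3 (one per line)
printf '%s\n' "$inserted"   # 1, before line 2, 2, 3 (one per line)
printf '%s\n' "$deleted"    # 1, 3 (one per line)
```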
The stream can be from either the standard input, or from a file.
Due to the complexity of Sed it’s always better to start simple, then build from there.
Sed command to print and filter
We can start by creating a simple list as follows.
# Create a list.
# Effectively a multi-line stream.
{ echo '1 2 3 4 5'; echo '0 9 8 7 6'; }
1 2 3 4 5
0 9 8 7 6
Then we can just display all the lines of a stream with the Sed print command.
# pipe the whole stream into Sed,
# and allow Sed to print the stream
{ echo '1 2 3 4 5'; echo '0 9 8 7 6'; }|sed -n 'p'
1 2 3 4 5
0 9 8 7 6
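The -n option matters here. By default Sed prints every line once it has been processed, so without -n the p command shows each line twice. A quick sketch to compare the two (the variable names are only for illustration):

```shell
# without -n, Sed auto-prints each line and p prints it a second time
doubled=$(seq 2 | sed 'p')
# with -n, only the explicit p produces output
single=$(seq 2 | sed -n 'p')
printf '%s\n' "$doubled"   # 1, 1, 2, 2 (one per line)
printf '%s\n' "$single"    # 1, 2 (one per line)
```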
However, it would be more useful to be able to apply a filter to a stream of lines.
Using Sed, we have access to regular expression syntax such as:
- ^ : matches the null string at the beginning of the pattern space, i.e. the beginning of the line.
- $ : matches the null string at the end of the pattern space, i.e. the end of the line.
- * : matches a sequence of zero or more instances of matches for the preceding regular expression.
- . : matches any single character, including newline.
- .* : matches every string (including the empty string). For example, ^main.*(.*) matches a string starting with ‘main’, followed by an opening and closing parenthesis.
- \(.*\) : captures any sequence of characters as a group. In a basic regular expression the parentheses must be escaped; unescaped ( and ) match literal parentheses.
- \(.*\) \(.*\) : captures two groups that are separated by a space, e.g. two words.
There are a lot more regular expressions possible, but for simple use cases, the above can be enough.
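To get a feel for the anchors and the ^main.*(.*) example, the sketch below filters three made-up lines. The sample lines and variable names are assumptions for illustration only.

```shell
# three hypothetical input lines
lines=$(printf '%s\n' 'main(argc, argv)' 'int main' 'domain()')
# lines starting with 'main' followed by an opening and closing parenthesis
starts_with_main=$(printf '%s\n' "$lines" | sed -n '/^main.*(.*)/p')
# lines ending with a closing parenthesis
ends_with_paren=$(printf '%s\n' "$lines" | sed -n '/)$/p')
printf '%s\n' "$starts_with_main"   # main(argc, argv)
printf '%s\n' "$ends_with_paren"    # main(argc, argv) and domain() (one per line)
```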
Using regular expressions, we can also filter lines of a stream with the Sed print command.
# just select the first line
{ echo '1 2 3 4 5'; echo '0 9 8 7 6'; }|sed -n '1p'
1 2 3 4 5
# select a line that ends with 5
{ echo '1 2 3 4 5'; echo '0 9 8 7 6'; }|sed -n '/5$/p'
1 2 3 4 5
# select a line that begins with 0
{ echo '1 2 3 4 5'; echo '0 9 8 7 6'; }|sed -n '/^0/p'
0 9 8 7 6
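Besides a single line number or a regular expression, the print command also accepts a range of lines, and $ as an address means the last line. A small sketch using seq as the input stream:

```shell
# print lines 2 to 4 only
middle=$(seq 5 | sed -n '2,4p')
# print only the last line ($ as an address means the last line)
last=$(seq 5 | sed -n '$p')
printf '%s\n' "$middle"   # 2, 3, 4 (one per line)
printf '%s\n' "$last"     # 5
```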
Sed substitution command
Finding a pattern and replacing it is a common use case in Sed.
The s command (as in substitute) is probably the most important in sed and has a lot of different options.
The basic substitute command is comprised of three elements, but can be extended with an optional fourth element.
s/regexp/replacement/flag
The elements are:
- s = substitution command
- regexp = regular expression to match
- replacement = the text that replaces the part of the line matched by the regular expression.
- flag = substitution flag (optional).
The components are separated by a delimiter character, e.g. /.
Delimiter characters
It’s common to see the ‘/’ character in the Sed documentation used as the delimiter.
# define a stream
echo '1 2 3 4 5'
1 2 3 4 5
# use the / character as a delimiter
echo '1 2 3 4 5' |sed 's/3/match/'
1 2 match 4 5
But you are not forced to use the ‘/’ character for a delimiter.
The / characters may be uniformly replaced by any other single character within any given s command. POSIX excludes only the backslash and the newline.
Choose any character for the delimiter. Just make sure it’s specified 3 times in the command.
Consequently, any of the following Sed substitution commands work with the chosen delimiter.
# choose the _ character as a delimiter
echo '1 2 3 4 5' |sed 's_3_match_'
1 2 match 4 5
# choose the ^ character as a delimiter
echo '1 2 3 4 5' |sed 's^3^match^'
1 2 match 4 5
# choose the \ character as a delimiter (not portable: POSIX excludes the backslash)
echo '1 2 3 4 5' |sed 's\3\match\'
1 2 match 4 5
# choose the | character as a delimiter
echo '1 2 3 4 5' |sed 's|3|match|'
1 2 match 4 5
Choose any delimiter you want, and ideally choose a delimiter that isn’t in the stream.
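A typical case where another delimiter helps is a file path, because every / in the pattern would otherwise need a backslash. The sketch below shows both forms on a made-up path:

```shell
# with / as the delimiter, every slash in the path must be escaped
escaped=$(echo '/usr/local/bin' | sed 's/\/usr\/local/\/opt/')
# with | as the delimiter, the same command is much easier to read
readable=$(echo '/usr/local/bin' | sed 's|/usr/local|/opt|')
echo "$escaped"    # /opt/bin
echo "$readable"   # /opt/bin
```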
Shell environment variables
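You can also use shell variables in the Sed substitution command, provided the command is wrapped in double quotes so that the shell expands them.
# single quotes: the command is passed to Sed literally
echo '1 2 3 4 5' |sed 's/3/match/'
1 2 match 4 5
# double quotes: the shell expands $a before Sed runs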
a=match
echo '1 2 3 4 5' |sed "s/3/$a/"
1 2 match 4 5
# a simple word replacement
echo 'hello' |sed 's/hello/goodbye/'
goodbye
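The choice of quotes decides whether the shell or Sed sees the variable. The sketch below compares the two, and also shows a variable whose value contains a / character, where picking a different delimiter avoids a clash. The variable names and values are made up for illustration.

```shell
a=match
# single quotes: $a reaches Sed unexpanded and is used as literal text
literal=$(echo '1 2 3 4 5' | sed 's/3/$a/')
# double quotes: the shell replaces $a with its value first
expanded=$(echo '1 2 3 4 5' | sed "s/3/$a/")
# the value contains /, so use | as the delimiter
dir=/opt/app
with_path=$(echo 'home=DIR' | sed "s|DIR|$dir|")
echo "$literal"     # 1 2 $a 4 5
echo "$expanded"    # 1 2 match 4 5
echo "$with_path"   # home=/opt/app
```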
Pattern matching
If there are no matches in a stream, then the unchanged stream is still returned.
# if there are no matches, then the unchanged stream is still returned
echo '1 2 3 4 5' |sed 's|0|match|'
1 2 3 4 5
# if there are no matches, then the unchanged stream is still returned
seq 5 |sed 's|0|match|'
1
2
3
4
5
If there is a match, then only the first match on the line is replaced.
# Only replace the first match of the line
echo '1 2 3 3 3 3 3 3 3 3 4 5' |sed 's|3|match|'
1 2 match 3 3 3 3 3 3 3 4 5
In the above stream, the number 3 is present multiple times. However the first 3 is matched and replaced, while the subsequent matches are ignored.
Similarly, there is the same behaviour in a multi-line stream.
# create a list. Effectively a multi-line stream.
{ echo '1 2 3 3 3 3 3 3 3 3 4 5'
echo '1 2 3 3 3 3 3 3 3 3 4 5'; }
1 2 3 3 3 3 3 3 3 3 4 5
1 2 3 3 3 3 3 3 3 3 4 5
If there is a match in multiple lines, then just the first match is replaced for each line.
# Only replace the first match of that line
{ echo '1 2 3 3 3 3 3 3 3 3 4 5'
echo '1 2 3 3 3 3 3 3 3 3 4 5'; }|sed 's|3|match|'
1 2 match 3 3 3 3 3 3 3 4 5
1 2 match 3 3 3 3 3 3 3 4 5
If you need to replace every match on the line, then use the ‘g’ flag as follows:
# Replace every match on that line
{ echo '1 2 3 3 3 3 3 3 3 3 4 5'
echo '1 2 3 3 3 3 3 3 3 3 4 5'; }|sed 's|3|match|g'
1 2 match match match match match match match match 4 5
1 2 match match match match match match match match 4 5
In the above stream, the number 3 is present multiple times in each line, and so is replaced every time.
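Between "first match only" and "every match" there is a third option: a number used as the flag replaces only that occurrence on the line. A short sketch with a made-up stream:

```shell
# a numeric flag replaces only the Nth match on each line (here the 2nd)
second_only=$(echo '1 2 3 3 3 4 5' | sed 's|3|match|2')
echo "$second_only"   # 1 2 3 match 3 4 5
```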
Streams and substreams
A whole line can be matched with the ‘.*’ combination regular expression, and then the match can be referenced with the ‘&’ character.
Using the ‘&’ character can be helpful to verify what the match is.
# a leading * is a literal asterisk, so nothing matches and the unprocessed line is returned anyway
echo 'export x=0' | sed 's|*|matched: &|'
export x=0
# match the full line with .*
echo 'export x=0' | sed 's|.*|matched: &|'
matched: export x=0
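The ‘&’ character is also handy for decorating a match without retyping it. The sketch below wraps matches in brackets, and shows that a backslash in front of & gives a literal ampersand. The inputs are made up for illustration.

```shell
# wrap every whole line in square brackets
wrapped=$(seq 3 | sed 's|.*|[&]|')
# wrap only the first digit on the line
digit=$(echo 'x=0' | sed 's|[0-9]|<&>|')
# \& is a literal ampersand, not the match
ampersand=$(echo 'a and b' | sed 's|and|\&|')
printf '%s\n' "$wrapped"   # [1], [2], [3] (one per line)
echo "$digit"              # x=<0>
echo "$ampersand"          # a & b
```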
A regular expression can also be used to divide a match into substreams (capture groups).
Mark each substream with the ‘(‘ and ‘)’ characters, and refer to it in the replacement as \1, \2, and so on.
In basic regular expressions, the ( and ) characters must be escaped with a backslash to act as grouping characters.
# In the regular expression match each line with \(.*\)
# Also use \1 to show the first (and only) substream
echo 'export x=0'| \
sed 's|\(.*\)|matched: \1|'
matched: export x=0
# In the regular expression match each substream in the line with \(...\)
# Also use \1 and \2 to show the substreams
# Note: \n and \t in the replacement are GNU Sed extensions
echo 'export x=0'| \
sed 's|\(export.*\)\(x=.*\)|matched: &\nsub1: \1\tsub2: \2|'
matched: export x=0
sub1: export sub2: x=0
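Because each substream gets its own number, the replacement can reorder or relabel the pieces. The sketch below swaps two words and splits a name=value pair. The inputs are made up for illustration.

```shell
# swap two words that are separated by a space
swapped=$(echo 'hello world' | sed 's|\(.*\) \(.*\)|\2 \1|')
# pull the name and the value out of an export statement
parsed=$(echo 'export x=0' | sed 's|export \(.*\)=\(.*\)|name: \1, value: \2|')
echo "$swapped"   # world hello
echo "$parsed"    # name: x, value: 0
```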
Extended regular expressions
Extended regular expressions can be clearer to read because they usually need fewer backslashes than basic regular expressions. Enable them with the -E option (GNU Sed also accepts -r).
# with regular expressions
# note: the use of escape characters
echo 'export x=0'| \
sed 's|\(export.*\)\(x=.*\)|matched: &\nsub1: \1\tsub2: \2|'
matched: export x=0
sub1: export sub2: x=0
# with extended regular expressions
# note: fewer escape characters for higher readability
echo 'export x=0'| \
sed -r 's|(export.*)(x=.*)|matched: &\nsub1: \1\tsub2: \2|'
matched: export x=0
sub1: export sub2: x=0
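Extended regular expressions also give unescaped access to operators such as + (one or more), ? (zero or one) and | (alternation). A small sketch follows. Note that it uses / as the delimiter, since | now has a meaning inside the pattern.

```shell
# + is one or more, | is alternation
runs=$(echo 'aaa bbb' | sed -E 's/a+|b+/X/g')
# ? makes the preceding character optional
spelling=$(echo 'color colour' | sed -E 's/colou?r/C/g')
echo "$runs"       # X X
echo "$spelling"   # C C
```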
Multiple commands
So far we’ve just been applying a single command to a stream of lines.
# replace the 5, with 'match'
seq 10 |sed 's|5|match|'
1
2
3
4
match
6
7
8
9
10
But we can also apply multiple commands to a stream of lines.
# match with multiple commands
seq 10 |sed -r 's|3|& match|
s|6|& match|
s|10|& match|
'
1
2
3 match
4
5
6 match
7
8
9
10 match
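Multiple commands do not have to be written on separate lines. As a sketch, the same idea can be expressed with repeated -e options, or with semicolons between the commands. The commands run in order, so a later command sees the result of an earlier one.

```shell
# one -e option per command
with_e=$(seq 4 | sed -e 's|1|& one|' -e 's|3|& three|')
# commands separated by a semicolon
with_semicolon=$(seq 4 | sed 's|1|& one|; s|3|& three|')
# the second command works on the output of the first
chained=$(echo 'a' | sed 's|a|b|; s|b|c|')
printf '%s\n' "$with_e"           # 1 one, 2, 3 three, 4 (one per line)
printf '%s\n' "$with_semicolon"   # same output as above
echo "$chained"                   # c
```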
Input and output files
More likely, you’ll be processing lines that reside in a file rather than lines piped in from another command.
# process an input file, and show the stream on standard out
# Note: the input file is not modified.
sed -r 's|wally|world|' test.sql
Now execute with an input file and redirect standard output to a file.
sed -r 's|wally|world|' test.sql > testoutput.sql
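If the goal is to change the file itself rather than write a second file, Sed also has the -i option for editing in place. The sketch below is an illustration only: it works in a temporary directory with a made-up test.sql, and uses -i.bak so that the original is kept as a backup.

```shell
# work in a temporary directory so no real file is touched
workdir=$(mktemp -d)
printf 'hello wally\n' > "$workdir/test.sql"
# -i.bak edits test.sql in place and keeps the original as test.sql.bak
sed -i.bak 's|wally|world|' "$workdir/test.sql"
edited=$(cat "$workdir/test.sql")
backup=$(cat "$workdir/test.sql.bak")
echo "$edited"   # hello world
echo "$backup"   # hello wally
rm -r "$workdir"
```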
A good use case for Sed is to convert Windows line endings, a carriage return followed by a line feed (i.e. \r\n), to the Linux line feed (i.e. \n). The following command strips the carriage return from the end of each line:
sed -r 's|\r$||' testwindowstext.input.sh > testwindowstext.output.sh
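To check the conversion without a real Windows file, the sketch below builds a two-line stream with \r\n endings using printf. It assumes GNU Sed, which understands \r in the regular expression. Piping the result through od -c is one way to confirm that no \r characters remain.

```shell
# Assumption: GNU Sed (\r in the pattern is a GNU extension).
# build a stream with Windows line endings and strip the carriage returns
cleaned=$(printf 'line1\r\nline2\r\n' | sed 's|\r$||')
# inspect the bytes; there should be no \r left
printf '%s\n' "$cleaned" | od -c
```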
In the world of Sed, it’s always better to start simple, then build up from there.
Additional reference