Parsing a List of Lines in Bash
Parsing newline-delimited data records in bash is simple, if
you have this odd redirect up your sleeve.
Working on my current shell-script project, a scheduling utility driven
by the BSD calendar, I found myself needing to parse some input files
linewise. See, I had been reading in the event data files (one for each
record), translating newlines to tildes, and
cutting the resultant data string on tildes (since cut
doesn't like cutting on newlines, it would seem) to obtain my data fields.
However, this added up to almost half a second of runtime per record. I mean,
I didn't expect bash to be the world's fastest string parser, but
sometimes enough is quite simply enough.
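For reference, the slow approach described above might have looked roughly like the following sketch. It is a reconstruction rather than the original code: the record contents, the field order, and the variable names are all invented for illustration.

```shell
# Hypothetical record file: one field per line (contents invented for illustration).
record_file=$(mktemp)
printf '%s\n' 'Dentist' '2004-03-01' 'Main Street' > "$record_file"

# Flatten the record: every newline becomes a tilde.
flat=$(tr '\n' '~' < "$record_file")

# Pull out one field per cut invocation; each call starts a new process.
title=$(printf '%s' "$flat" | cut -d '~' -f 1)
when=$(printf '%s' "$flat" | cut -d '~' -f 2)
where=$(printf '%s' "$flat" | cut -d '~' -f 3)

echo "$title on $when at $where"
rm -f "$record_file"
```

Every field costs a pipeline and a fresh cut process, which is plausibly where the per-record time went.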
Okay, let me put in the code here so people don't lose themselves in the article, and I'll explain in a moment.
#!/bin/sh
#
# This shell script echoes individual lines from the file specified
#
# usage: . <scriptname> [file to parse]
#
# IFS= keeps leading and trailing whitespace; -r keeps backslashes literal.
while IFS= read -r line; do
    printf '%s\n' "$line"
    echo
done < "$1"
The magic here is in that last line: done < "$1"
Because the shell splits the output of a command substitution into words,
for line in $(cat $1); do . . . ; done won't work. You'd end
up executing the loop once for every whitespace-separated word, whether the
separator is a space, tab, or newline. What we need is some way to ensure that
each line is passed as a distinct entity through the loop.
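The difference is easy to see by counting iterations. This is a minimal sketch; the sample file and the counter names are made up for the demonstration.

```shell
# Sample input with spaces inside the lines (contents invented for illustration).
sample=$(mktemp)
printf '%s\n' 'first line' 'second line' > "$sample"

# for + command substitution: the shell splits on every space, tab, and newline.
for_count=0
for word in $(cat "$sample"); do
    for_count=$((for_count + 1))
done

# while + read with the loop's stdin redirected: one iteration per line.
read_count=0
while IFS= read -r line; do
    read_count=$((read_count + 1))
done < "$sample"

echo "for loop iterations: $for_count"
echo "read loop iterations: $read_count"
rm -f "$sample"
```

Two lines of two words each give four passes through the for loop but only two through the read loop.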
That's what read is here for. read is a shell
built-in (in bash, and in any other POSIX-style shell)
that takes a single line of STDIN and assigns it to
the variable named as its argument, like so:
usage: read varname
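What lets read drive a while loop is its exit status: zero when it got a line, nonzero once it runs into end-of-file. The sketch below feeds it exactly one line to show both cases. The IFS= prefix and the -r flag are additions not used in the text above; without them read trims surrounding whitespace and treats backslashes as escapes.

```shell
# Feed read exactly one line, then let it hit end-of-file.
{
    IFS= read -r first;  first_status=$?
    IFS= read -r second; second_status=$?
} <<'EOF'
  indented \t line
EOF

# printf is used instead of echo so the backslash is printed as-is.
printf 'first=[%s] status=%s\n' "$first" "$first_status"
printf 'second=[%s] status=%s\n' "$second" "$second_status"
```

The first read succeeds and keeps the line byte for byte; the second finds nothing left and reports failure, which is the signal that ends a while read loop.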
But in a complex script, it can be difficult to track down where the
interpreter believes STDIN, STDOUT, and
STDERR are in the code path. In this case, if you try piping the
file in, like so:
cat $1 | while read line; do . . . ; done
or
while cat $1 read line; do . . . ; done
or even using a standard shell redirect, as:
while read line < $1; do . . . ; done
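you'll be in for trouble. The pipe does deliver the lines, but bash runs the
loop in a subshell, so any variable set inside it vanishes when the loop ends.
The second form never calls read at all: cat treats read and line as two more
file names. The third reopens the file on every pass, so read returns the
first line forever. It turns out that STDIN for read can be set after the loop
controlled by it, simply by redirecting the STDIN of the
entire loop to the desired file. Every read in the loop then shares one open
file and picks up where the last one left off.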
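A short experiment makes the behavior of these variants visible. It is only a sketch: the sample file and counters are invented, the endless variant is cut off after three passes, and the pipe result assumes bash's default settings (the lastpipe option off).

```shell
sample=$(mktemp)
printf '%s\n' 'alpha' 'beta' 'gamma' > "$sample"

# Redirect on read itself: the file is reopened on every pass, so read
# keeps returning the first line. The counter stops the endless loop.
passes=0
seen=''
while IFS= read -r line < "$sample"; do
    seen="$seen$line "
    passes=$((passes + 1))
    [ "$passes" -ge 3 ] && break
done

# Pipe into the loop: the lines arrive, but by default bash runs the loop
# in a subshell, so the count is gone once the loop ends.
piped=0
cat "$sample" | while IFS= read -r line; do piped=$((piped + 1)); done

# Redirect on the whole loop: one open file, one line per pass, and the
# variables survive.
total=0
while IFS= read -r line; do total=$((total + 1)); done < "$sample"

printf 'seen=%s piped=%s total=%s\n' "$seen" "$piped" "$total"
rm -f "$sample"
```

Only the last form both walks the file line by line and leaves its results available to the rest of the script.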
No, please don't ask me why! I don't think anyone knows why anything is
the way it is in bash. There are fundamental programmatic reasons
why it is necessary to sacrifice a goat at midnight to get your script to
run properly.
Oh, by the way: skipping all the utility invocations I had been using before cut my parser runtime by nearly two-thirds . . . and I only had to hack at it for two hours!
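The same redirect works on a plain group of commands, which is one way a record with one field per line could be read using built-ins alone. This is a hedged sketch, not the actual parser: the record layout, its contents, and the variable names are assumptions made up for the example.

```shell
# Hypothetical record file: one field per line (contents invented for illustration).
record_file=$(mktemp)
printf '%s\n' 'Dentist' '2004-03-01' 'Main Street' > "$record_file"

# One redirect feeds all three reads; no tr, no cut, no extra processes.
{
    IFS= read -r title
    IFS= read -r when
    IFS= read -r where
} < "$record_file"

echo "$title on $when at $where"
rm -f "$record_file"
```

Because read is a built-in, nothing here starts another process, which fits the kind of saving described above.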

