Parsing newline-delimited data records in
bash is simple, if
you have this odd redirect up your sleeve.
Working on my current shell-script project, a scheduling utility driven
by the BSD
calendar, I found myself needing to parse some input files
linewise. See, I had been reading in the event data files (one for each
translating newlines to tildes, and
cutting the resultant data string on tildes (since
cut doesn't like cutting on newlines, it would seem) to obtain my data fields.
However, this added up to almost half a second of runtime per record. I mean,
I didn't expect
bash to be the world's fastest string parser, but
sometimes enough is quite simply enough.
Okay, let me put in the code here so people don't lose themselves in the article, and I'll explain in a moment.
    #!/bin/sh
    #
    # This shell script echoes individual lines from the file specified
    #
    # usage: . <scriptname> [file to parse]
    #
    while read line; do
        echo $line
        echo
    done < $1
The magic here is in that last line:
done < $1
Because of the odd mechanics of shell substitution and token parsing,
for line in $(cat $1); do . . . ; done won't work. You'd end
up executing the loop whenever you hit whitespace, whether it be space, tab,
or newline. What we need is some way to ensure that each line is passed as
a distinct entity through the loop.
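To see the difference concretely, here is a minimal sketch that counts loop iterations both ways. The function names and the two-line sample file are made up for illustration; they are not part of my scheduling script.

```shell
#!/bin/sh
# Hypothetical sample data: two records, each containing a space.
sample=$(mktemp)
printf 'first record\nsecond record\n' > "$sample"

# count_with_for: loop over the output of cat, which the shell
# splits on every space, tab, and newline.
count_with_for() {
    n=0
    for line in $(cat "$1"); do
        n=$((n + 1))
    done
    echo "$n"
}

# count_with_read: loop once per line, feeding the whole loop from the file.
count_with_read() {
    n=0
    while read line; do
        n=$((n + 1))
    done < "$1"
    echo "$n"
}

echo "for loop iterations:   $(count_with_for "$sample")"
echo "while read iterations: $(count_with_read "$sample")"

rm -f "$sample"
```

On this sample the for loop should run four times (once per word) and the while read loop twice (once per line).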
That is what
read is here for.
read is a shell builtin (in
bash, anyway . . . I can't speak for
other shells) that takes a single line of
STDIN and stores it in
the variable named as its argument, like so:
usage: read varname
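As a small illustration of that usage, the sketch below wraps read in two throwaway functions (the names are mine, purely for the example). The second variant uses IFS= and the -r option, which as far as I can tell keep leading whitespace and backslashes intact; plain read trims the former and interprets the latter.

```shell
#!/bin/sh
# first_line: read one line from STDIN into the variable "line" and print it.
# Plain read strips leading/trailing whitespace and treats backslashes
# as escape characters.
first_line() {
    read line
    printf '%s\n' "$line"
}

# first_line_raw: same idea, but IFS= preserves surrounding whitespace
# and -r keeps backslashes literal.
first_line_raw() {
    IFS= read -r line
    printf '%s\n' "$line"
}

# Only the first line is consumed; the rest of the input is left unread.
printf 'alpha beta\ngamma\n' | first_line

# The indented line with a backslash comes through unchanged.
printf '  a\\b\n' | first_line_raw
```

If your data fields can contain backslashes or meaningful indentation, the raw variant is probably the safer default.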
But in a complex script, it can be difficult to track down where
STDIN, STDOUT, and STDERR are in the code path. In this case, if you try piping the
file in, like so:
cat $1 | while read line; do . . . ; done
while cat $1 | read line; do . . . ; done
or even using a standard shell redirect, as:
while read line < $1; do . . . ; done
you'll be in for some highly unpredictable output. It turns out that
the input for read can be supplied after the loop
controlled by it, simply by redirecting the
STDIN of the
entire loop to the desired file.
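For the curious, here is a rough sketch of what seems to be going on, using made-up function names and a three-line sample file. When the redirect sits on read itself, the file appears to be reopened on every pass, so read keeps getting the first line; when the redirect sits on the whole loop, the file is opened once and each read picks up where the last one stopped. The first function caps itself at three passes, since it would otherwise never end.

```shell
#!/bin/sh
# Hypothetical sample data for the demonstration.
data=$(mktemp)
printf 'one\ntwo\nthree\n' > "$data"

# reopen_each_time: the redirect belongs to read, so the file is opened
# afresh on every iteration and read always sees the first line.
# A counter stops the loop after three passes.
reopen_each_time() {
    i=0
    while read line < "$1"; do
        printf '%s\n' "$line"
        i=$((i + 1))
        if [ "$i" -ge 3 ]; then
            break
        fi
    done
}

# open_once: the redirect belongs to the loop, so the file is opened once
# and every read continues from the current position in it.
open_once() {
    while read line; do
        printf '%s\n' "$line"
    done < "$1"
}

echo "redirect on read:"
reopen_each_time "$data"
echo "redirect on the loop:"
open_once "$data"

rm -f "$data"
```

The first loop should print "one" three times and the second should print all three lines in order. Piping into the loop does deliver every line too, but in bash the loop then runs in a subshell, so any variables it sets are gone once the loop finishes.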
No, please don't ask me why! I don't think anyone knows why anything is
the way it is in
bash. There are fundamental programmatic reasons
why it is necessary to sacrifice a goat at midnight to get your script to work.
Oh, by the way. Skipping all the utility invocations I had been using before cut my parser runtime by nearly two-thirds . . . and I only had to hack at it for two hours!