dp.cx blog


Filed under linux, and applications

pv, or pipe viewer, has quickly become one of the most useful applications for me. Many times I've started a long-running task, only to wonder afterwards how much longer it was going to take. This is where pv comes into play.

pv gives you a progress bar for your task, based on either the number of bytes it's seen, or the number of newline characters that have come through. It can show you an ETA based on the information it's seen. It can even show you the current rate of throughput, and/or the average rate of throughput.

Using pv is really straightforward. Let's say you were executing a command that loops over some data and prints a line whenever it finishes a block. If you pipe that command to pv -l, you'll get a line counter and an activity bar showing you that something is happening.

 
# for i in {1..21}; do date -d "`date +%m`/$i/`date +%Y`"; sleep 1; done | pv -l > /dev/null
21 0:00:21 [ 986m/s] [                          <=>                                   ]
 

Of course, that's useful to show that data is still flowing, but not so useful if a client calls wanting to know when their data will be ready. So, you start adding some parameters, like -e, which enables ETA calculations. But to use ETA calculations, pv needs to know how much data you're expecting to go through, so you need to add a -s with your expected number of lines.

 
# for i in {1..21}; do date -d "`date +%m`/$i/`date +%Y`"; sleep 1; done | pv -les 21 > /dev/null
ETA 0:00:19
 

"But what happened to my progress bar!?" pv only shows the display options that you pass, if you pass any. If you don't pass any, it defaults to showing you the number of entries, the elapsed time, the rate, and a bar showing data flow. If you want the progress bar back, you need a -p.

 
# for i in {1..21}; do date -d "`date +%m`/$i/`date +%Y`"; sleep 1; done | pv -leps 21 > /dev/null
[==========================================>                            ] 61% ETA 0:00:08
[=====================================================================>] 100%
 
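When the data comes from a file instead of a loop, the line count for -s doesn't have to be guessed, since wc -l can supply it. Here is a minimal sketch of that idea. The function name lines_with_eta and the demo file are my own inventions for illustration, and the helper simply falls back to cat if pv isn't installed:

```shell
# Hypothetical helper: stream FILE through pv with a progress bar and ETA.
# usage: lines_with_eta FILE
lines_with_eta() {
    file=$1
    # Read from stdin so wc prints only the number; tr strips any padding
    total=$(wc -l < "$file" | tr -d ' ')
    if command -v pv >/dev/null 2>&1; then
        # -l: count lines, -p: progress bar, -e: ETA, -s: expected line count
        pv -leps "$total" < "$file"
    else
        # pv not installed: pass the data through unchanged
        cat "$file"
    fi
}

# Demo on a throwaway 21-line file
demo=$(mktemp)
seq 1 21 > "$demo"
processed=$(lines_with_eta "$demo" | wc -l | tr -d ' ')
rm -f "$demo"
echo "lines passed through: $processed"
```

In real use you would pipe the function's output into whatever consumes the lines. The progress display goes to stderr, so it doesn't mix with the data.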

pv is really useful for watching bytes being transferred as well. The man page gives the example of:

 
# (tar cf - . \
| pv -n -s $(du -sb . | awk '{print $1}') \
| gzip -9 > out.tgz) 2>&1 \
| dialog --gauge 'Progress' 7 70
 

This creates a .tgz archive of the current directory and uses pv to output a numeric progress log, which is then passed to dialog to display a pretty ncurses-based progress gauge. If we modify that command to

 
# (tar cf - . | pv -pterW -s $(du -sb . | awk '{print $1}') | gzip -9 > out.tgz)
 

We get the following output:

 
0:00:04 [4.49MB/s] [=====================>                     ] 53% ETA 0:00:03
 

This shows us that the command has been running for 4 seconds, transferring 4.49MB/s. We're 53% complete, and we have about 3 seconds to go.
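If you do this often, the pipeline can be wrapped in a small function. The following is only a sketch under a few assumptions: the name archive_with_progress is hypothetical, du -sb requires GNU du, and the tar stream is slightly larger than what du reports (tar adds headers and padding), so the percentage is an estimate rather than an exact figure.

```shell
# Hypothetical helper: archive DIR into OUT with a pv progress bar.
# usage: archive_with_progress DIR OUT.tgz
archive_with_progress() {
    dir=$1
    out=$2
    # GNU du: apparent size of DIR in bytes is the first field of the output
    size=$(du -sb "$dir" | awk '{print $1}')
    if command -v pv >/dev/null 2>&1; then
        # -p: bar, -t: timer, -e: ETA, -r: rate, -s: expected byte count
        tar cf - -C "$dir" . | pv -pter -s "$size" | gzip -9 > "$out"
    else
        # pv not installed: build the archive without a progress display
        tar cf - -C "$dir" . | gzip -9 > "$out"
    fi
}

# Demo on a throwaway directory
src=$(mktemp -d)
printf 'hello\n' > "$src/a.txt"
out=$(mktemp)
archive_with_progress "$src" "$out"
listing=$(tar tzf "$out")
rm -rf "$src" "$out"
echo "$listing"
```

Keep the output file outside the directory being archived, otherwise tar will try to read the archive it is writing.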

Commands that may run for several days (using the same command above, against /) will have an output that looks like:

 
0:00:10 [  20MB/s] [>                                       ]  0% ETA 3689:57:22
 
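An ETA counted in thousands of hours is hard to read at a glance. As a small illustration, here is a sketch that converts an H:MM:SS string like the one above into days and hours. The function name eta_to_days is made up for this example and is not part of pv:

```shell
# Hypothetical helper: convert an H:MM:SS string into days, hours, minutes, seconds.
# usage: eta_to_days 3689:57:22
eta_to_days() {
    # Split the argument on ':' into hours, minutes, and seconds
    IFS=: read -r h m s <<EOF
$1
EOF
    # Only the hour field needs arithmetic; minutes and seconds pass through
    echo "$((h / 24))d $((h % 24))h ${m}m ${s}s"
}

eta_readable=$(eta_to_days 3689:57:22)
echo "$eta_readable"   # prints: 153d 17h 57m 22s
```

So the run against / above would take roughly five months at that rate.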

pv has become an amazingly useful utility in my toolbelt. I'll post others as I think of them.