
Save a Unix manpage as plain text

Manpage is short for manual page, the help documentation that ships with Unix and Linux. There is a man page for just about every Unix command and utility. To view a man page from the command line, enter man followed by the name of the command. For example,

man man
man ls
man find

would bring up the man page for the "man", "ls" and "find" commands respectively.

Man pages are written in the roff typesetting language and rendered for the terminal by nroff. nroff's output adds control characters to mark text as bold or underlined, and some terminals display that text in colour. You can save a man page to a text file with a simple redirect. The following writes the man page for "man" to the file man.txt:

man man > man.txt

The only problem is that the file still contains a lot of formatting debris: backspace characters (shown as ^H) and repeated characters. The first few lines of the resulting file look like this:

man(1)                                                                   man(1)

N^HNA^HAM^HME^HE
      man - format and display the on-line manual pages
      manpath - determine user's search path for man pages

S^HSY^HYN^HNO^HOP^HPS^HSI^HIS^HS
      m^Hma^Han^Hn   [-^H-a^Hac^Hcd^Hdf^HfF^HFh^Hhk^HkK^HKt^Htw^HwW^HW]
      [-^H--^H-p^Hpa^Hat^Hth^Hh]   [-^H-m^Hm _^Hs_^Hy_^Hs_^Ht_^He_^Hm] [-^H$
      [-^H-M^HM _^Hp_^Ha_^Ht_^Hh_^Hl_^Hi_^Hs_^Ht] [-^H-P^HP  
      _^Hp_^Ha_^Hg_^He_^Hr] [-^H-S^HS
      _^Hs_^He_^Hc_^Ht_^Hi_^Ho_^Hn_^H__^Hl_^Hi_^Hs_^Ht$

D^HDE^HES^HSC^HCR^HRI^HIP^HPT^HTI^HIO^HON^HN
      m^Hma^Han^Hn formats and displays the on-line manual pages.   If you
      specify _^Hs_^He_^Hc_^H- _^Ht_^Hi_^Ho_^Hn,   m^Hma^Han^Hn   only looks
      in that section of the manual.   _^Hn_^Ha_^Hm_^He is normally the name
      of the manual page, which is typically the name of a   command, function
      or   file.   However,   if   _^Hn_^Ha_^Hm_^He contains a slash (/^H/) then
      m^Hma^Han^Hn interprets it as a file specification, so that you can do
      m^Hma^Han^Hn   .^H./^H/f^Hfo^Hoo^Ho.^H.5^H5 or even m^Hma^Han^Hn
      /^H/c^Hcd^Hd/^H/f^Hfo^Hoo^Ho/^H/b^Hba^Har^Hr
      .^H.1^H1.^H.g^Hgz^Hz.
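The ^H sequences are how nroff encodes formatting for a terminal. Bold text is printed as each character, a backspace, then the same character again. Underlined text is printed as an underscore, a backspace, then the character. On a terminal or in a pager such as less, the second character lands on top of the first. In a file, though, the pairs remain. The sketch below rebuilds two fragments from the sample by hand with printf and makes the backspaces visible with cat -v. The strings are chosen only for illustration.

```shell
#!/bin/sh
# Bold: character, backspace, same character (compare "NAME" above).
printf 'N\bNA\bAM\bME\bE\n' | cat -v      # prints N^HNA^HAM^HME^HE

# Underline: underscore, backspace, character (compare "sec-" above).
printf '_\bs_\be_\bc\n' | cat -v          # prints _^Hs_^He_^Hc
```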

The fix is to pipe the output through col -b. With -b, col drops the backspaces and prints only the last character written to each column position. The following command saves the man page for "man" to a file called man.txt:

man man | col -b > man.txt

The same lines as in the example above now come out as clean plain text:

man(1)                                                                   man(1)

NAME
      man - format and display the on-line manual pages
      manpath - determine user's search path for man pages

SYNOPSIS
      man   [-acdfFhkKtwW]   [--path]   [-m system] [-p string] [-C config_file]
      [-M pathlist] [-P pager] [-S section_list] [section] name ...

DESCRIPTION
      man formats and displays the on-line manual pages.   If you specify sec-
      tion,   man   only looks in that section of the manual.   name is normally
      the name of the manual page, which is typically the name of a   command,
      function,   or   file.   However,   if   name contains a slash (/) then man
      interprets it as a file specification, so that you can do   man   ./foo.5
      or even man /cd/foo/bar.1.gz.
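col is not installed on every minimal system. If it is missing, a short sed filter can stand in for col -b in the simple case. This assumes the page contains only the backspace overstrikes shown above. The filter deletes every "any character followed by a backspace" pair, so only the last character in each column remains. strip_overstrike is a hypothetical helper name. Unlike col, it does not handle reverse or half-line feeds, so treat it as a rough sketch rather than a replacement.

```shell
#!/bin/sh
# strip_overstrike (hypothetical helper): rough sed-based stand-in for `col -b`.
# Deletes each "any character + backspace" pair, so "N^HN" becomes "N"
# (bold) and "_^Hs" becomes "s" (underline).
strip_overstrike() {
    # $(printf '\b') inserts a literal backspace into the sed pattern.
    sed "s/.$(printf '\b')//g"
}
```

Usage would then be man man | strip_overstrike > man.txt.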