catdoc

Section: User Commands (1)
Updated: Version 0.35
 

NAME

catdoc - reads an MS-Word file and writes its content as plain text to standard output

SYNOPSIS

catdoc [-aswth] files...

wordview [file]  

DESCRIPTION

catdoc behaves much like cat(1), but it reads an MS-Word file and produces human-readable text on standard output. Optionally, it can use latex(1) escape sequences for characters that have special meaning in LaTeX. It also makes some effort to recognize MS-Word tables, although it never tries to write correct headers for the LaTeX tabular environment.

catdoc can be invoked as a filter if you supply "-" instead of a file name, but this is probably of little use. Filter mode may be removed in future versions, because properly parsing a Word file (fast saves, footnotes) requires seekable input.
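
As a rough illustration of filter mode, the sketch below pipes the bytes of a Word file to `catdoc -` from Python. It assumes a catdoc binary is installed and on PATH; the helper names `filter_argv` and `doc_bytes_to_text` are made up for this example and are not part of catdoc.

```python
import subprocess


def filter_argv():
    """Command line for filter mode: "-" stands in for the file name."""
    return ["catdoc", "-"]


def doc_bytes_to_text(data):
    """Feed raw Word-file bytes to catdoc on stdin and return its stdout.

    Assumes catdoc is on PATH. The result is returned as bytes; decoding
    is left to the caller.
    """
    result = subprocess.run(
        filter_argv(),
        input=data,
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Since standard input is not seekable, documents that need real parsing (fast saves, footnotes) are probably better passed by file name.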

wordview is a Tcl/Tk script for viewing Word files under X.

OPTIONS

-a
converts non-standard printable characters into a readable form (default). Separates table columns with TAB.
-t
converts all printable characters that have special meaning for latex(1) into the appropriate control sequences. Separates table columns with &.
-w
disables word wrapping. By default, catdoc output is split into lines no longer than 72 characters, and paragraphs are separated by a blank line. With this option, each paragraph is one long line.
-s
exits with a non-zero exit code, producing no output, if the MS-Word signature is not found before the first printable paragraph.
-h
displays a brief usage message and exits.
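
To make the effect of -w concrete, here is a small Python approximation of the two output layouts described above. It is only a sketch built on the standard textwrap module, not catdoc's actual wrapping algorithm; `format_paragraphs` is a hypothetical name, and the exact paragraph separator used with -w is an assumption.

```python
import textwrap


def format_paragraphs(paragraphs, wrap=True, width=72):
    """Lay out paragraphs roughly the way the -w description suggests.

    wrap=True  : lines no longer than `width`, blank line between paragraphs.
    wrap=False : each paragraph on one long line (as with -w). Joining the
                 paragraphs with a single newline is an assumption.
    """
    if wrap:
        wrapped = [textwrap.fill(p, width=width) for p in paragraphs]
        return "\n\n".join(wrapped) + "\n"
    return "\n".join(paragraphs) + "\n"


sample = ["word " * 40, "A short second paragraph."]
print(format_paragraphs(sample))              # default layout
print(format_paragraphs(sample, wrap=False))  # layout with -w
```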

Each option affects only the files specified after it on the command line.
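
Because options are positional, the order of arguments matters. The sketch below builds a catdoc command line from (options, files) groups so that each group's options come immediately before its files. `build_argv` and the file names are hypothetical and serve only as illustration.

```python
def build_argv(groups):
    """Build a catdoc argument list from (options, files) pairs.

    Each pair's options are placed immediately before its files, since
    an option only affects files that come after it on the command line.
    """
    argv = ["catdoc"]
    for options, files in groups:
        argv.extend(options)
        argv.extend(files)
    return argv


# -w precedes both files; -t comes after memo.doc, so it affects
# only report.doc.
argv = build_argv([
    (["-w"], ["memo.doc"]),
    (["-t"], ["report.doc"]),
])
print(" ".join(argv))  # catdoc -w memo.doc -t report.doc
```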
   

BUGS

Can produce garbage if the file contains embedded illustrations. Doesn't handle fast saves properly. Prints footnotes as separate paragraphs at the end of the file instead of producing correct LaTeX commands.

 

AUTHOR

V.B.Wagner <vitus@fe.msk.su>


 

