Index of /stego/

 Name                   Last modified     Size  Description

 Parent Directory       15-Aug-96 13:58      -  
 stego.shar.gz          27-Dec-94 17:55    12K

STEGO(1)                 USER COMMANDS                   STEGO(1)

NAME
     stego - encode binary file as text

SYNOPSIS
     stego -d|-e
          [ -w outdict ] [ -f dictfile ] [ -l linelen ]
          [ -t textfile ] [ -u ] [ -v ] [ infile [ outfile ] ]

DESCRIPTION
     Steganography is the art and science  of  secret  communica-
     tion,  encompassing  everything  from  invisible  ink to the
     latest techniques of spread  spectrum  broadcasting.   While
     cryptography  focuses  on  preventing  any  but the intended
     recipients' begin able to read the  content  of  a  message,
     steganography  attempts  to  hide  the very existence of the
     message, usually by disguising it in another,  more  innocu-
     ous, form.

     Steganography has its place even if cryptographic tools  are
     generally  accessible  and  deemed to provide adequate secu-
     rity.  Often, one wishes not only to protect  messages  from
     interception,  but  also  to  obscure  the  fact that one is
     exchanging encrypted messages at all.  Simply using  encryp-
     tion  creates,  in  the  minds of nosy people, a presumption
     that you're doing something wrong.  Why raise suspicion?

     One of the very best ways to hide an encrypted message is to
     embed it as low-level noise in an image or sound file: tools
     exist to accomplish this.  This technique has the  disadvan-
     tage,  however,  of  requiring  large files in order to hide
     substantial  data  without   making   the   message-carrying
     ``noise''  too obvious.  Lossy compression cannot be used to
     reduce the file size,  as  it  would  destroy  the  embedded
     information.

     stego, short for Steganosaurus, provides a compromise  which
     transforms  any  binary  file  into nonsense text based on a
     dictionary either given explicitly or built on the fly  from
     a  source  document.   Its  name,  which  recalls the large,
     heavily-armoured dinosaur Stegosaurus,  was  chosen  because
     this  is  a  large, slow moving form of steganography which,
     nonetheless, is armoured and robust.  The output of stego is
     nonsense,  but  statistically resembles text in the language
     of the dictionary supplied.  Although a  human  reader  will
     instantly  recognise  it  as gibberish, statistical sampling
     employed by eavesdroppers to detect encrypted  messages  may
     consider  it  to be unremarkable, especially if a relatively
     small amount of such text appears within a larger  document.
     A human snoop would have to read the entire document just to
     discover that it contained some curious passages.

     stego makes no attempt, on its own, to prevent your  message
     from  being  read.   It  is the equivalent of a book code as
     large as the number of unique words in the dictionary; tech-
     niques  exist  to  break book codes even without obtaining a
     copy  of  the  code.   Cryptographic  security   should   be
     delegated  to  a  package  intended for that purpose such as
     pgp.  stego can then be applied  to  the  encrypted  output,
     transforming  it into seemingly innocuous text for transmis-
     sion.  Text created by stego uses  only  characters  in  the
     source  dictionary or document (plus standard ASCII punctua-
     tion symbols), so it can be  sent  through  media,  such  as
     electronic  mail,  which cannot transmit binary information.
     Unlike files encoded with uuencode or pgp's ``ASCII armour''
     facilities,  the result doesn't scream ``suspicious'' at the
     (very) first glance.

OPTIONS
     -d          Decode: the input, previously created  by  stego
                 using the same dictionary, is decoded to recover
                 the original input file.

     -e          Encode: converts the input into an  output  text
                 file  using  the specified (or default) diction-
                 ary.

     -f dictfile The specified dictfile is used as the dictionary
                 to  encode  the file.  The dictionary is assumed
                 to be a text file with one word per  line,  con-
                 taining  no  extraneous  white  space, duplicate
                 words, or punctuation within the words.  To ver-
                 ify that the given dictionary meets these condi-
                 tions, use the -v option.  The default  diction-
                 ary is the system's spelling checker dictionary,
                 if any.

     -l len      Output lines will be broken so as not to  exceed
                 len  characters,  if  possible.   len may be any
                 number up to 4096 characters per line,  allowing
                 generation  of  output compatible with word pro-
                 cessors which write each paragraph as  a  single
                 long  line.   The  default  is 65 characters per
                 line.

     -t textfile The named textfile is used to build the diction-
                 ary  used  to  encode  or decode the input file.
                 The textfile is scanned and words, consisting of
                 an ISO 8859/1 alphabetic character followed by a
                 sequence of  ISO  alphanumeric  characters,  are
                 extracted.   Duplicate  words  are automatically
                 discarded to  prevent  errors  in  encoding  and
                 decoding.

     -u          Print how-to-call information.

     -v          Verbose diagnostic messages are sent to standard
                 error   chronicling   the  momentous  events  in
                 stego's processing of the file, and extra verif-
                 ications  of  the  correctness of the dictionary
                 are made.

     -w outdict  The dictionary, usually built from a  text  file
                 specified  with  the  -t option, is written into
                 the file outdict in a form suitable  for  subse-
                 quent  use  as a dictionary supplied with the -f
                 option: all duplicate words and words containing
                 punctuation  characters  are  deleted,  and each
                 word appears  by  itself  on  a  separate  line.
                 Preprocessing a text file into dictionary format
                 allows it to be loaded much faster in subsequent
                 runs of stego.

APPLICATION NOTES
     The efficiency of encoding a file as words depends upon  the
     size  of  the  dictionary used and the average length of the
     words in the dictionary.   When  used  in  conjunction  with
     existing  compression  and  encryption  tools, the resulting
     growth in file size is usually acceptable.  For  example,  a
     random  extract of electronic mail 32768 bytes in length was
     chosen as a test sample.  Compression  with  gzip  compacted
     the  file  to  14623  bytes.   It  was  then  encrypted  for
     transmission to a single recipient with pgp, which  resulted
     in a 14797 byte file.  (Even though pgp has its own compres-
     sion, smaller files usually result from initial  compression
     with  gzip.   In  this case, pgp alone would have produced a
     file of 15194 bytes.)

     Using a 25144 word  spelling  dictionary,  stego  translated
     this  file  into  a  71727 byte text file.  Thus, the growth
     from the original text file into the encrypted  and  encoded
     output  was  a  factor of 2.2.  However, re-compressing this
     output file with gzip, a perfectly unsuspicious  act  (since
     anybody  can  uncompress  such  a file), reduces it to 37290
     bytes, a growth of only 14% compared to  the  original  text
     file.

     The  ability  to  recognise  gibberish  in  text  is  highly
     language  dependent.   Using a dictionary in a language dif-
     ferent than the mother tongue of the suspected  eavesdropper
     may  better  disguise  a message, especially if you and your
     correspondent have a credible reason to communicate in  that
     language.  For example, if you don't speak French, try using
     the electronic text of Jules Verne's  ``De  la  Terre  a  la
     Lune'' available from:

    ftp://ftp.fourmilab.ch/pub/kelvin/etexts/DeLaTerreALaLune.txt

     as  your -t dictionary file (be sure to strip the header and
     trailer from this file, leaving  only  the  French  language
     body text).

FILES
     If no infile is specified or infile is a single ``-'', stego
     reads  from  standard input; if no outfile is given, or out-
     file is a single ``-'', output is sent to  standard  output.
     The input and output are processed strictly serially; conse-
     quently stego may be used in pipelines.

BUGS
     Dictionaries (default or specified with the -f  option)  are
     not  checked for duplicate words or words containing forbid-
     den punctuation characters unless the -v option is  present.
     It's  a  good  idea  to use the -v option when testing a new
     dictionary.

     The default dictionary is the system spelling  checker  dic-
     tionary.   This  dictionary  is not standard across all sys-
     tems.   Users  exchanging  information   between   different
     machines  and/or  operating system versions should make sure
     they're using  identical  dictionary  files  to  encode  and
     decode messages.

     A comprehensive spelling checker  dictionary  is  less  than
     ideal  for  low-profile  steganography.   For  example, most
     spelling dictionaries include words such  as  ``plutonium,''
     ``assassinate,''  ``heroin,''  and ``CIA,'' which might tend
     to draw  unwanted  attention  toward  your  document.   It's
     better  to  use  a dictionary drawn from a text that doesn't
     contain such trigger words.

     The output would be less obviously gibberish if  based  upon
     template sentence structures filled in by dictionaries which
     specify parts of speech.  This  would,  of  course,  require
     different  templates  for  each  natural  language  and dic-
     tionaries containing part-of-speech information.   It  would
     also  preclude  using  simple word lists or documents as the
     dictionary.

     stego accepts 8-bit  ISO  8859/1  characters  in  dictionary
     files  and,  if  they are present, emits them in its encoded
     output.  If the medium used to transmit the output of  stego
     cannot  correctly  deliver  such data, the recipient will be
     unable to reconstruct the original message.  To  avoid  this
     problem, either encode the data before transmission or use a
     dictionary which  contains  only  characters  which  can  be
     transmitted without loss.

SEE ALSO
     gzip(1), pgp(1), iso_8859_1(7), uuencode(1)

AUTHOR
          John Walker
          kelvin@fourmilab.ch
          http://www.fourmilab.ch/

     This software is in the public domain.  Permission  to  use,
     copy,  modify, and distribute this software and its documen-
     tation for any purpose and without fee  is  hereby  granted,
     without  any  conditions  or restrictions.  This software is
     provided ``as is'' without express or implied warranty.