- Newsgroups: sci.crypt
- Path: sparky!uunet!pipex!warwick!pavo.csi.cam.ac.uk!rja14
- From: rja14@cl.cam.ac.uk (Ross Anderson)
- Subject: Re: Automatic lang. determination of titles/subj. lines?
- Message-ID: <1993Jan21.113751.17113@infodev.cam.ac.uk>
- Sender: news@infodev.cam.ac.uk (USENET news)
- Nntp-Posting-Host: ely.cl.cam.ac.uk
- Organization: U of Cambridge Computer Lab, UK
- References: <1993Jan20.163448.17017@daimi.aau.dk>
- Date: Thu, 21 Jan 1993 11:37:51 GMT
- Lines: 22
-
- In article <1993Jan20.163448.17017@daimi.aau.dk>, lhp@daimi.aau.dk
- (Lasse Hillerøe Petersen) writes:
-
- > Rather than reinventing the wheel, I'd like to know whether someone knows
- > of a program for the automatic determination of the language of short
- > sentences, titles or subject lines.
-
- An effective solution to this problem was found by a chap called Trevor Coates
- who ran a translation agency in London in about 1980. The trick is to look at
- the short words (up to three letters). Each language has a unique set of these
- and the decision process is extremely fast - three or four words are enough to
- specify the language uniquely.
-
- Trevor's system was not even implemented on a computer, but on a single sheet
- of paper. That's how fast and simple it is.
-
- I don't have a copy of the sheet of paper but I expect you could reinvent it
- without too much trouble.
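-
- As a rough idea of what such a reinvention might look like, here is a
- minimal Python sketch of the short-word trick. The word lists, the
- names SHORT_WORDS and guess_language, and the simple count-the-hits
- scoring are all assumptions made for illustration; they are not the
- contents of Trevor's sheet, and a real table would need far more care.

```python
import re

# Illustrative lists of short words (three letters or fewer) per language.
# ASSUMPTION: these lists are made up for this sketch; they are not the
# contents of Trevor Coates's sheet of paper.
SHORT_WORDS = {
    "english": {"the", "of", "and", "to", "is", "it", "for", "on", "was", "by"},
    "french": {"le", "la", "les", "de", "des", "et", "un", "une", "du", "est"},
    "german": {"der", "die", "das", "und", "ist", "ein", "von", "zu", "mit", "im"},
    "danish": {"og", "er", "en", "et", "af", "til", "det", "den", "på", "jeg"},
    "spanish": {"el", "la", "los", "las", "de", "y", "en", "un", "una", "es"},
}

MAX_LEN = 3  # "short" means up to three letters, as in the post


def guess_language(text):
    """Return the best-matching language name, or None if undecided."""
    # Keep only alphabetic words of at most MAX_LEN letters.
    words = [
        w for w in re.findall(r"[^\W\d_]+", text.lower())
        if len(w) <= MAX_LEN
    ]
    # Score each language by how many short words appear in its list.
    scores = {
        lang: sum(w in vocab for w in words)
        for lang, vocab in SHORT_WORDS.items()
    }
    best = max(scores.values())
    winners = [lang for lang, score in scores.items() if score == best]
    # No hits at all, or a tie between languages: refuse to guess.
    if best == 0 or len(winners) != 1:
        return None
    return winners[0]


if __name__ == "__main__":
    for title in (
        "The history of the city and its people",
        "Le chat et la souris du roi",
        "Die Geschichte der Stadt und das Land",
    ):
        print(title, "->", guess_language(title))
```

- Some short words (such as "de", "la" or "en") turn up in several
- languages, so this sketch counts hits per language rather than
- stopping at the first match, and declines to answer on a tie.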
-
- Hope this helps
-
- Ross
-