NetNews Usenet Archive 1993 #3

Text File | 1993-01-21 | 1.3 KB | 35 lines

Newsgroups: sci.crypt
Path: sparky!uunet!pipex!warwick!pavo.csi.cam.ac.uk!rja14
From: rja14@cl.cam.ac.uk (Ross Anderson)
Subject: Re: Automatic lang. determination of titles/subj. lines?
Message-ID: <1993Jan21.113751.17113@infodev.cam.ac.uk>
Sender: news@infodev.cam.ac.uk (USENET news)
Nntp-Posting-Host: ely.cl.cam.ac.uk
Organization: U of Cambridge Computer Lab, UK
References: <1993Jan20.163448.17017@daimi.aau.dk>
Date: Thu, 21 Jan 1993 11:37:51 GMT
Lines: 22

In article <1993Jan20.163448.17017@daimi.aau.dk>, lhp@daimi.aau.dk
(Lasse Hiller|e Petersen) writes:

> Rather than reinventing the wheel, I'd like to know whether someone knows
> of a program for the automatic determination of the language of short
> sentences, titles or subject lines.

An effective solution to this problem was found by a chap called Trevor
Coates who ran a translation agency in London in about 1980.

The trick is to look at the short words (up to three letters). Each language
has a unique set of these and the decision process is extremely fast - three
or four words are enough to specify the language uniquely.

Trevor's system was not even implemented on a computer, but on a single sheet
of paper. That's how fast and simple it is.

I don't have a copy of the sheet of paper but I expect you could reinvent it
without too much trouble.

Hope this helps

Ross
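
For anyone who wants to try the short-word trick on a computer, here is a
minimal sketch in Python. It is not a reconstruction of Trevor's sheet: the
word lists below are hand-picked assumptions for illustration, and the names
SHORT_WORDS and guess_language are made up for this example. It keeps only
words of up to three letters, counts how many appear in each language's list,
and declines to answer when there is no clear winner.

```python
import re

# Illustrative short-word lists (assumption: hand-picked for this sketch,
# NOT the contents of the original sheet of paper).
SHORT_WORDS = {
    "English": {"the", "of", "and", "to", "in", "is", "it", "for", "on", "a"},
    "French": {"le", "la", "les", "de", "des", "et", "un", "une", "du", "est"},
    "German": {"der", "die", "das", "und", "ist", "ein", "zu", "von", "mit", "im"},
    "Danish": {"og", "af", "er", "det", "en", "et", "til", "på", "den", "jeg"},
}


def guess_language(text, max_len=3):
    """Guess the language of a short text from its short words.

    Returns a language name, or None if no short word matched or
    two languages tied for the best score.
    """
    # Split into runs of letters and keep only the short words.
    words = [w for w in re.findall(r"[^\W\d_]+", text.lower())
             if len(w) <= max_len]

    # Score each language by how many short words appear in its list.
    scores = {lang: sum(w in vocab for w in words)
              for lang, vocab in SHORT_WORDS.items()}

    best = max(scores.values())
    winners = [lang for lang, score in scores.items() if score == best]
    if best == 0 or len(winners) != 1:
        return None
    return winners[0]


if __name__ == "__main__":
    for title in ("The history of the art of war",
                  "Le tour du monde et la mer",
                  "Die Kunst der Fuge und das Leben"):
        print(title, "->", guess_language(title))
```

Some short words occur in more than one language ("et" is both French and
Danish, for instance), which is why the sketch counts matches rather than
stopping at the first hit. A real table would need longer lists and probably
some weighting by how distinctive each word is.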