UKC_SIDX.ARJ
~~~~~~~~~~~~
Two Special Surname/Soundex indexes to the 2% Census Sample
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This archive contains:-
UKC_SIDX.TXT    The file you are now reading.
UKC_NI0.TXT  )  Two name indexes, derived from the original name
UKC_SDX.TXT  )  index to the 2% Census Sample.
These files have been constructed specially for use with the program
XTRACT, written by Ron MacRae and Rosemary Lockie, to help with
extraction of households with specified surnames from the UK 2% Census
Sample files, UKC_ccc.ARJ. Our program will look up the surname(s) you
specify for your search and generate the search request automatically,
selecting the UKC_ccc.ARJ files for the counties in which the surname
occurs.
Both files contain a list of surnames found in the various county files,
and are derived from the original name index, UKC_NIDX.TXT. UKC_NI0.TXT
is a straight alphabetical surname listing. UKC_SDX.TXT has the soundex
code for the surname added, and is sorted in order of soundex code.
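This file does not say which Soundex variant was used to build UKC_SDX.TXT. As an illustration only, here is a minimal Python sketch of the common American Soundex rules, which also give a 4-character code. The function name soundex is hypothetical. How the real index treats multi-word surnames or embedded question marks is not stated, so this sketch simply ignores non-letters.

```python
def soundex(surname: str) -> str:
    """Return a 4-character American Soundex code (assumed variant)."""
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit

    # Keep letters only: spaces and question marks are skipped (assumption).
    letters = [ch for ch in surname.upper() if ch.isalpha()]
    if not letters:
        return ""

    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        if ch not in "HW":
            # Vowels reset the previous code; H and W leave it unchanged.
            prev = digit
    # Pad with zeros, then truncate to exactly 4 characters.
    return (result + "000")[:4]


print(soundex("BROWN"))     # B650
print(soundex("SINCLAIR"))  # S524
print(soundex("MCKELLAR"))  # M246
```

Sorting on this code groups similar-sounding spellings together, which is the point of keeping a second index in soundex order.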
UKC_NI0.TXT began as a straight copy of UKC_NIDX. However, for ease of
use within XTRACT, and to keep the overall size of the index to a
minimum, the following changes were made.
1. All counties for one surname have been combined onto the one line,
separated by commas. The county trigraphs have been replaced with
dinomes, 01 to 92 to represent the UK counties.
2. Trailing question marks on surnames have been ignored, so that
entries for BROWN and BROWN? or BROWN?? have been combined together
in the resultant index.
N.B. Question marks elsewhere in the surnames have been retained.
3. Some of the entries in the original surname index have been split,
if there appears to be more than one choice of surname. So for
instance, two entries have been made for "SINCLAIR OR MCKELLAR",
"SINCLAIR" and "MCKELLAR" (found in BUT5101.TXT). However, "DE LA
MOTTE" (DOR5106.TXT) and "VAN DEN HONERT" (WAR5117.TXT) and similar
have been retained as single names (in these two examples because the
first word is less than 4 characters, although the overall
algorithm used for splitting is rather more complicated than that).
Together, these changes have resulted in a saving of about 29% in the
size of the overall straight name index file:- 553,680 bytes, compared
with 783,438 bytes in the original. UKC_SDX in its raw state adds a
further 733,290 bytes (229,625 bytes compressed).
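Changes 1 and 2 in the list above can be sketched in Python as follows. This is not the code used to build the index: the function name build_index and the sample pairs are hypothetical, and sorting the dinomes within a line is an assumption. Change 3 (splitting alternative surnames) is left out because its full rules are not given here.

```python
from collections import defaultdict


def build_index(pairs):
    """Merge (surname, dinome) pairs into one line per surname."""
    merged = defaultdict(set)
    for surname, dinome in pairs:
        # Change 2: ignore trailing question marks only, so that
        # BROWN, BROWN? and BROWN?? share one entry. Question marks
        # elsewhere in the surname are retained.
        merged[surname.rstrip("?")].add(dinome)
    # Change 1: all counties for a surname on one line, comma separated,
    # with a tab between the surname and the list of dinomes.
    return [name + "\t" + ",".join(sorted(codes))
            for name, codes in sorted(merged.items())]


# Made-up sample input, for illustration only.
sample = [("BROWN", "24"), ("BROWN?", "81"), ("BROWN??", "11"), ("BR?WN", "24")]
index_lines = build_index(sample)
for line in index_lines:
    print(line)
```

With the sample above, the three BROWN variants collapse to a single line listing dinomes 11, 24 and 81, while BR?WN keeps its own entry.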
The format of the two files is as follows:-
UKC_NI0.TXT format surname{tab}dd,dd,dd...
UKC_SDX.TXT format sndx{sp}surname{tab}dd,dd,dd...
In UKC_SDX, a single space separates the soundex code (always 4
characters) from the surname. In both files a {tab} character (ASCII
value 09) separates the surname (variable length) from the list of
dinomes. Either index may be imported into a database file if desired.
If so, you will need to know that the maximum length of line is 236
characters, and the maximum length of surname contained within the 236
characters is 19.
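For readers working outside XTRACT, the two line formats can be read with a few lines of Python. This is a sketch based only on the layout described above. The function names parse_ni0 and parse_sdx are hypothetical, and the sample lines are made up rather than taken from the index files.

```python
def parse_ni0(line):
    """Split 'surname{tab}dd,dd,dd...' into (surname, [dinomes])."""
    surname, _, dinomes = line.rstrip("\r\n").partition("\t")
    return surname, (dinomes.split(",") if dinomes else [])


def parse_sdx(line):
    """Split 'sndx{sp}surname{tab}dd,dd,dd...' into (code, surname, [dinomes])."""
    line = line.rstrip("\r\n")
    # The soundex code is always 4 characters, followed by one space.
    code, rest = line[:4], line[5:]
    surname, dinomes = parse_ni0(rest)
    return code, surname, dinomes


print(parse_ni0("BROWN\t11,24,81"))
print(parse_sdx("D453 DE LA MOTTE\t24"))
```

Slicing on the fixed 4-character code, rather than splitting on spaces, keeps multi-word surnames such as DE LA MOTTE intact.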
The way to do this would be to create a database with the following
structure:-
Soundex     5   May be reduced to 4 after importing; 5 characters
                allows for the space on import.
Data      236   Surname, and list of county dinomes.
Surname    19   To be filled in after import.
Please note that if you wish to separate the surname out as a separate
field, you can do so with the following dBase command, or similar in your
own database language:-
replace all surname with left(data,at(chr(9),data)-1)
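If you do not have dBase to hand, the same import can be sketched with Python's built-in sqlite3 module. The table name sdx, its column names and the function load_sdx are hypothetical choices for this example, and the sample lines are made up. The partition on the tab character does the same job as the left(data,at(chr(9),data)-1) expression above.

```python
import sqlite3


def load_sdx(lines, conn):
    """Load UKC_SDX-style lines into a three-column table; return row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sdx "
        "(soundex TEXT, surname TEXT, counties TEXT)"
    )
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        # 4-character soundex code, one space, then surname{tab}dinomes.
        code, rest = line[:4], line[5:]
        surname, _, counties = rest.partition("\t")
        rows.append((code, surname, counties))
    conn.executemany("INSERT INTO sdx VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# Illustration with an in-memory database and made-up lines.
conn = sqlite3.connect(":memory:")
count = load_sdx(["B650 BROWN\t11,24,81", "M246 MCKELLAR\t11"], conn)
rows = conn.execute(
    "SELECT surname, counties FROM sdx WHERE soundex = ?", ("B650",)
).fetchall()
print(count, rows)
```

To load the real file you would pass an open file object for UKC_SDX.TXT in place of the sample list.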
A table of the counties, and the dinomes chosen follows:-
01 ABD Aberdeen          47 LKS Lanarkshire
02 AGY Anglesey          48 LAN Lancashire
03 ARL Argyll            49 LEC Leicestershire
04 AYR Ayrshire          50 LIN Lincolnshire
05 BAN Banff             51 LLS Linlithgow
06 BDF Bedfordshire      52 MER Merioneth
07 BRK Berkshire         53 MDX Middlesex
08 BEW Berwick           54 MLN Midlothian
09 BRE Brecknockshire    55 MON Monmouth
10 BKM Buckingham        56 MGY Montgomery
11 BUT Bute              57 MOR Moray
12 CAI Caithness         58 NAI Nairn
13 CAM Cambridgeshire    59 NFK Norfolk
14 CGN Cardiganshire     60 NTH Northamptonshire
15 CMN Carmarthenshire   61 NBL Northumberland
16 CAE Carnarvonshire    62 NTT Nottinghamshire
17 CHS Cheshire          63 ORK Orkney
18 CLK Clackmannan       64 OXF Oxfordshire
19 CON Cornwall          65 PEE Peebles
20 CUL Cumberland        66 PEM Pembroke
21 DEN Denbighshire      67 PER Perthshire
22 DBY Derbyshire        68 RAD Radnor
23 DEV Devon             69 RFW Renfrew
24 DOR Dorset            70 ROC Ross
25 DNB Dumbartonshire    71 ROX Roxburgh
26 DFS Dumfries          72 SEL Selkirk
27 DUR Durham            73 SAL Shropshire
28 EDN Edinburgh         74 SOM Somerset
29 ELG Elgin             75 STS Staffordshire
30 ESS Essex             76 STI Stirling
31 FIF Fife              77 SFK Suffolk
32 FLN Flint             78 SRY Surrey
33 ANS Forfar (Angus)    79 SSX Sussex
34 GLA Glamorgan         80 SUT Sutherland
35 GLS Gloucestershire   81 WAR Warwickshire
36 HAD Haddingtonshire   82 WES Westmorland
37 HAM Hampshire         83 WIG Wigtown
38 HEF Hereford          84 WIL Wiltshire
39 HRT Hertfordshire     85 WOR Worcestershire
40 HUN Huntingdon        86 ERY Yorkshire East Riding
41 INV Inverness         87 NRY Yorkshire North Riding
42 IOW Isle of Wight     88 WRY Yorkshire West Riding
43 KEN Kent              89 YKS Yorkshire County
44 KCD Kincardine        90 ZET Shetland
45 KRS Kinross           91 ANT Antrim
46 KKD Kirkcudbright     92 RUT Rutland
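As an illustration of how the dinomes might be turned back into archive names, here is a short Python sketch. It assumes that the "ccc" in UKC_ccc.ARJ stands for the county trigraph shown in the table, which this file implies but does not state outright. The dictionary holds only three rows copied from the table and would need extending with the rest. The name archives_for is hypothetical.

```python
# Excerpt only: three rows from the table above. Add the remaining
# dinome/trigraph pairs before using this for real searches.
DINOME_TO_TRIGRAPH = {
    "11": "BUT",
    "24": "DOR",
    "81": "WAR",
}


def archives_for(dinomes):
    """Map county dinomes to archive names (assumed UKC_ccc.ARJ pattern)."""
    names = []
    for dinome in dinomes:
        trigraph = DINOME_TO_TRIGRAPH.get(dinome)
        if trigraph is None:
            raise KeyError("dinome not in lookup table: " + dinome)
        names.append("UKC_" + trigraph + ".ARJ")
    return names


print(archives_for(["11", "24", "81"]))
```

An index line such as a surname followed by 11,24,81 would therefore point at the Bute, Dorset and Warwickshire archives under this assumption.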
This information has been prepared by Rosemary Lockie, 2:253/188 in
FidoNet, 2nd September 1993.