- Comments: Gated by NETNEWS@AUVM.AMERICAN.EDU
- Path: sparky!uunet!paladin.american.edu!auvm!UNC.BITNET!UPHILG
- Message-ID: <SAS-L%92121718133832@VTVM2.CC.VT.EDU>
- Newsgroups: bit.listserv.sas-l
- Date: Thu, 17 Dec 1992 18:15:00 EST
- Reply-To: "Philip Gallagher,(919)966-1065" <UPHILG@UNC.BITNET>
- Sender: "SAS(r) Discussion" <SAS-L@UGA.BITNET>
- From: "Philip Gallagher,(919)966-1065" <UPHILG@UNC.BITNET>
- Subject: Re: ID Numbers and confidentiality
- Comments: To: DENNIS G FISHER <AFDGF@ALASKA.BITNET>
- Lines: 67
-
- Dennis Fisher asked about concealing the identity of persons
- in a dataset from a grad student who needs to work with the
- data:
-
- Dennis,
- I can think of several structural situations where this might not
- work, but also many situations where it would. How about this:
-
- DATA DISCREET(DROP=FIRSTNAM LASTNAM LABEL='File for student')
-      TELL_ALL(LABEL='File for Dennis, who knows all')
-      LINKFILE(KEEP=ID FIRSTNAM LASTNAM LABEL='Emergency file')
-      ;
-   INFILE ...;
-   INPUT month 1-2 day 3-4 year 5-6
-         firstnam $ 35-42 lastnam $ 43-54
-         ;
-   ID = _N_;   /* record number: one ID per input line */
-   OUTPUT DISCREET TELL_ALL LINKFILE;
- RUN;
-
- If there were many files you would have to go through the
- nuisance process of creating the LINKFILE with the first dataset
- and then merging it onto subsequent datasets by FIRSTNAM LASTNAM
- to get the ID onto them, etc., but it is a feasible thing in lots
- of situations.
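
The merge described above might look roughly like the following. This is only a sketch: NEWFILE and NEWSAFE are hypothetical names for a later dataset and its name-free copy, and it assumes LINKFILE holds exactly one row per distinct FIRSTNAM/LASTNAM pair (if a name repeats in LINKFILE, the duplicates would have to be removed first).

```sas
/* Sketch only. NEWFILE is a hypothetical later dataset that    */
/* carries FIRSTNAM and LASTNAM but has no ID yet.              */
/* Assumes LINKFILE has one row per distinct name pair.         */
PROC SORT DATA=LINKFILE; BY LASTNAM FIRSTNAM; RUN;
PROC SORT DATA=NEWFILE;  BY LASTNAM FIRSTNAM; RUN;

DATA NEWSAFE(DROP=FIRSTNAM LASTNAM LABEL='File for student');
  MERGE NEWFILE(IN=INNEW) LINKFILE;
  BY LASTNAM FIRSTNAM;
  IF INNEW;      /* keep only records that came from NEWFILE */
RUN;
```

Names in NEWFILE that never occurred in the first dataset come through with a missing ID, so they would still need numbers of their own.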
-
- If it were a situation in which having obviously sequential
- numbers would reveal the names to a true snoop, you could
- sort the original dataset by some set of variables before
- assigning values to ID.
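
One way to do the sort, as a sketch only: RAWDATA and SHUFFLED are hypothetical names, with RAWDATA standing for the records as read by the INPUT statement above but before any ID has been assigned. The BY variables are an arbitrary choice; anything unrelated to the names would do.

```sas
/* Sketch only. RAWDATA is a hypothetical dataset holding the   */
/* records as read in, names included, with no ID assigned yet. */
PROC SORT DATA=RAWDATA OUT=SHUFFLED;
  BY DAY MONTH YEAR;     /* any order unrelated to the names */
RUN;

DATA DISCREET(DROP=FIRSTNAM LASTNAM)
     TELL_ALL
     LINKFILE(KEEP=ID FIRSTNAM LASTNAM);
  SET SHUFFLED;
  ID = _N_;              /* numbering now follows the sorted order */
  OUTPUT DISCREET TELL_ALL LINKFILE;
RUN;
```
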
- If that were not enough, you could make the IDs look much
- more mysterious by abandoning the _N_ technique and
- substituting something a bit less obvious, like:
-
- RETAIN ID 0;
- ID = ID + 1 + INT(10*RANUNI(1234567) );  /* step is 1 to 10, never 0, so no two IDs are equal */
-
- This also would be monotonically increasing and would undoubtedly
- not fool the CIA or NASA or NSA or the British codebreakers, but
- the average grad student with lots of work to do might not have
- time to fuss with it. Of course, if I really suspected that the
- student would try to beat the system, I would ditch the student and
- find someone else. The best guarantee of confidentiality is
- Integrity - without that you know there will always be someone
- clever enough to break any code you devise. Isn't there?
-
- Phil Gallagher
-
-
-
-
- > I have a situation with a grad student in which I have access to
- > a dataset that includes confidential names that she cannot have
- > access to. I want to create a unique identifier so that
- > every time the same name shows up in the dataset, that person
- > gets the same identifier. We do not have date of birth or
- > any other numbers that we could use. The input statement looks something like
- > Input month 1-2 day 3-4 year 5-6 firstnam $ 35-42 lastnam $ 43-54 ;
- > We are using VMS and can use either 6.06 or 6.07.
- > If I could create a unique identifier, then I could just output
- > another raw dataset that has the unique identifier and the other
- > variables that she needs without the names, transfer that dataset
- > to her account, and everything would be fine.
- > Thanks in advance for your help.
- > Dennis Fisher
- > Center for Alcohol and Addiction Studies
- > University of Alaska Anchorage
-