How to create a unique identifier for tracking?

How to create a new Unique Identifier Number without Clashes

  • Help Creating A New Unique Identifier. I have two files of sales data for housing parcels. Each parcel has a UIN (a 14-digit string), but the UINs aren't actually unique in my data because some houses have been sold multiple times. I have the date of each sale (month, day, and year), and I want to combine the date and the parcel number into a new, genuinely unique identifier based on the last date sold, so that I can join the two separate datasets. I've tried mixing the parcel numbers and the sale date in multiple ways (multiplying, adding, squaring, etc.) to create a new UIN, but I keep creating new conflicting duplicates. I can see this is a mathematical consequence of the number of data points I am creating. How do I do this?

  • Answer:

    Do you need the new number to be a 14-digit integer? If not, just concatenate the (old) UIN and the date-month-time and be done with it. If so, then a hash will have a vanishingly small number of collisions compared to anything you are likely to come up with yourself.

stratastar at Ask.Metafilter.Com
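The concatenation approach from the answer can be sketched in Python as follows. This is only an illustration under stated assumptions: the field names (`uin`, `year`, `month`, `day`, `price`, `buyer`) and all sample values are hypothetical and do not come from the asker's files.

```python
def make_key(uin: str, year: int, month: int, day: int) -> str:
    """Concatenate the parcel UIN with the sale date as UIN_YYYYMMDD."""
    # Zero-padding matters: without it, 2010-1-11 and 2010-11-1
    # would both be written as "2010111".
    return f"{uin}_{year:04d}{month:02d}{day:02d}"

# Hypothetical rows from the two files; the same parcel was sold twice.
file_a = [
    {"uin": "12345678901234", "year": 2008, "month": 4, "day": 21, "price": 150000},
    {"uin": "12345678901234", "year": 2010, "month": 4, "day": 21, "price": 175000},
]
file_b = [
    {"uin": "12345678901234", "year": 2010, "month": 4, "day": 21, "buyer": "Smith"},
]

# Index one file by the composite key, then look up rows from the other.
index_a = {make_key(r["uin"], r["year"], r["month"], r["day"]): r for r in file_a}
joined = []
for row in file_b:
    key = make_key(row["uin"], row["year"], row["month"], row["day"])
    if key in index_a:
        joined.append({**index_a[key], **row, "key": key})

print(joined)
```

Keeping the UIN and the date as text, rather than combining them arithmetically, is what avoids the accidental duplicates that adding or multiplying the numbers produces.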

Other answers

Strictly speaking a hash isn't guaranteed to be unique but will be almost all of the time. A Universally Unique IDentifier (UUID), on the other hand, is unique for all practical purposes. As a bonus you don't need the original data to make the UUID, so it's probably computationally more efficient to generate. So in OS X or Ubuntu from the terminal you can issue the command uuidgen, which will give you a UUID:

    $ uuidgen
    9FD86CE2-B48E-4C07-99C8-31E6DD887122
    $ uuidgen
    079C0FB7-AD15-42C3-B04A-C4081543E9A5

And so on. If you're unfortunate enough to be stuck in Windows land there's some info at http://msdn.microsoft.com/en-us/library/aa373930%28VS.85%29.aspx that may be useful, but I'm damned if I understand it.

singingfish
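For anyone without uuidgen, Python's standard `uuid` module offers a rough equivalent. One caveat is worth showing in the sketch: a random UUID generated separately for each file cannot be used to match rows across files, whereas a name-based (version 5) UUID is derived from its input, so the same parcel-and-date string yields the same UUID every time. The namespace name `parcels.example.org` and the sample key are hypothetical choices made for this illustration.

```python
import uuid

# Random (version 4) UUID: no input data needed, different on every call.
random_id = uuid.uuid4()

# Name-based (version 5) UUID: derived from a namespace plus a string,
# so the same input always yields the same UUID.
# NAMESPACE is an arbitrary, hypothetical choice for this sketch.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "parcels.example.org")
id_from_file_a = uuid.uuid5(NAMESPACE, "12345678901234_20100421")
id_from_file_b = uuid.uuid5(NAMESPACE, "12345678901234_20100421")

print(random_id)
print(id_from_file_a == id_from_file_b)  # True: same input, same UUID
```

Note that version 5 UUIDs are computed with SHA-1 internally, so they behave like the hash suggestions elsewhere in this thread.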

Look into making a SHA-1 (http://en.wikipedia.org/wiki/SHA-1) or other hash function (http://en.wikipedia.org/wiki/Hash_function) of your identifiers.

Blazecock Pileon

For example, if you're on a Mac OS X or Linux workstation, you can open up a terminal and use OpenSSL's SHA1 hash function. Here's what you get for an example ID:

    $ cat > test.txt
    ParcelNumber0001_SaleDate042110
    ^D
    $ cat test.txt | openssl dgst -sha1
    9ee72b1d2a4076d3c64132ef49b79be35b89eb39

Here's the second example ID:

    $ cat > test2.txt
    ParcelNumber0002_SaleDate042110
    ^D
    $ cat test2.txt | openssl dgst -sha1
    73770cfc5f0f7c0ab8b03cd459d3ec22a0858b1d

That ^D character is a Control-D keypress.

Blazecock Pileon
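The same idea can be sketched with Python's standard `hashlib` module, which may be handy if OpenSSL isn't available. The helper name `sha1_id` is made up for this example. The digests it prints need not match the terminal output above, since a file typed in with cat typically ends in a newline that this sketch does not hash.

```python
import hashlib

def sha1_id(parcel_number: str, sale_date: str) -> str:
    """Return the 40-character hex SHA-1 digest of 'parcel_date'."""
    text = f"{parcel_number}_{sale_date}"
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

first = sha1_id("ParcelNumber0001", "SaleDate042110")
second = sha1_id("ParcelNumber0002", "SaleDate042110")

print(first)
print(second)
```

Because the digest depends only on the input string, running this over both files gives matching IDs for matching parcel-and-date pairs.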

As has been pointed out, you want a hash. I'm not sure what you're using these values for, but FYI - it's generally advisable to make unique identifiers completely independent of the data. You never know if your requirements might change and it becomes inconvenient to have your identifier tied to what it's identifying. I can't think of an example now, but if programmers could be that clairvoyant we wouldn't have any problems.

wonnage

Ack, my Ubuntu VM just got corrupted. I'll try out the hash library in R.

stratastar

zippy, the concatenate idea came to me after I posted, and I did it in Excel. wonnage, I / we use the parcel number as the ID because it is matched in a spatial GIS database. I needed a new ID to join data from two different time points of that parcel ID. Thanks guys for the help.

stratastar

Yeah, hashes can collide. Very unlikely. I was about to suggest concatenation. If a hash works, so will concatenation, it will just be longer.

delmoi
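To put a rough number on "very unlikely", the standard birthday-bound approximation can be evaluated in a few lines of Python. This is a back-of-the-envelope sketch that assumes IDs are spread uniformly at random over the available space; the record count of one million is a hypothetical figure, not the asker's.

```python
import math

def collision_probability(n_records: int, space_size: float) -> float:
    """Birthday-bound approximation: 1 - exp(-n(n-1) / (2 * space))."""
    exponent = -n_records * (n_records - 1) / (2.0 * space_size)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)

n = 1_000_000  # hypothetical number of sales records

# 14-digit integer IDs: 10**14 possible values.
p_14_digits = collision_probability(n, 10**14)

# SHA-1 digests: 2**160 possible values.
p_sha1 = collision_probability(n, 2**160)

print(p_14_digits)  # about 0.005, i.e. roughly a 0.5% chance of any clash
print(p_sha1)       # far below 1e-30
```

Under these assumptions, squeezing a hash into 14 digits leaves a small but real chance of a clash, while a full SHA-1 digest or a plain concatenation does not have that problem in practice.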

Here's a complicating question: do you need to be able to work backwards to the original parcel number from the new unique ID? If so, secure hashes are exactly the wrong solution, since the whole point of a secure hash is to obscure the original data. Then again, I suppose this might be desirable, depending on your needs. If it were up to me and I didn't have a need to obscure the underlying data, I'd just concatenate them as parcelnumber_YYYYMMDD.

adamrice
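The parcelnumber_YYYYMMDD format is reversible, which a short Python sketch can demonstrate. The function names `build_id` and `parse_id` and the sample parcel number are hypothetical.

```python
from datetime import date
from typing import Tuple

def build_id(parcel_number: str, sale_date: date) -> str:
    """Build an ID of the form parcelnumber_YYYYMMDD."""
    return f"{parcel_number}_{sale_date:%Y%m%d}"

def parse_id(new_id: str) -> Tuple[str, date]:
    """Recover the parcel number and sale date from an ID."""
    # Split on the last underscore, in case the parcel number contains one.
    parcel_number, stamp = new_id.rsplit("_", 1)
    sale_date = date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:8]))
    return parcel_number, sale_date

new_id = build_id("12345678901234", date(2010, 4, 21))
print(new_id)            # 12345678901234_20100421
print(parse_id(new_id))
```

A side benefit of putting the year first is that IDs for the same parcel sort in chronological order as plain text.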

+1 to adamrice's advice. We really need to know what you're planning to do with this before offering you any further advice. Hashes are indeed the wrong solution for creating unique identifiers. There can be overlaps, however unlikely they might be. If you legitimately have two identical rows (which could hypothetically happen if a house changes hands twice in one day), you're essentially up a creek, as I don't think you have enough data to differentiate between legitimate duplicates and duplicates that occur as a result of overlap between the two files. Hashing/concatenation won't help you here. I'm also a bit confused about the difference between these two data files. Do they contain the same fields? It sounds like you should really be using a proper database to manage this data (Access is fine; Excel is not). If you're using SQL, you can query two separate tables and grab a list of unique items using the UNION keyword (http://www.w3schools.com/sql/sql_union.asp); this is especially helpful if you have other fields in your data file that can help establish "uniqueness". Also, once you have your data in a database, you can create a field in your table to be used as an automatically-assigned unique identifier.

schmod
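The database route can be tried without installing anything, using the SQLite engine bundled with Python. The sketch below is only an illustration: the table names (`sales_a`, `sales_b`, `sales`), the columns, and every row are hypothetical, and it assumes both files share the same fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema: one table per source file, same columns in each.
for table in ("sales_a", "sales_b"):
    cur.execute(f"CREATE TABLE {table} (uin TEXT, sale_date TEXT, price INTEGER)")

cur.executemany("INSERT INTO sales_a VALUES (?, ?, ?)", [
    ("12345678901234", "2008-04-21", 150000),
    ("12345678901234", "2010-04-21", 175000),
])
cur.executemany("INSERT INTO sales_b VALUES (?, ?, ?)", [
    ("12345678901234", "2010-04-21", 175000),  # also present in sales_a
    ("98765432109876", "2009-07-01", 210000),
])

# Combined table with an automatically assigned integer identifier.
cur.execute("""
    CREATE TABLE sales (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        uin TEXT, sale_date TEXT, price INTEGER
    )
""")

# UNION (unlike UNION ALL) drops rows that are identical in every column.
cur.execute("""
    INSERT INTO sales (uin, sale_date, price)
    SELECT uin, sale_date, price FROM sales_a
    UNION
    SELECT uin, sale_date, price FROM sales_b
""")
conn.commit()

rows = cur.execute(
    "SELECT id, uin, sale_date, price FROM sales ORDER BY id"
).fetchall()
for row in rows:
    print(row)
```

As the comment above warns, UNION cannot tell a legitimate repeat sale with identical values from a row that merely appears in both files; both collapse into one row.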
