How could I reduce the length of the code?

Needed: Simple Algorithm to Reduce Length of Concatenated Numeric Values ($10)

  • Here's the problem I need to solve: I need to create a unique key in an Oracle DB from the following source data:

    field 1: char(2), sample data: 67
    field 2: char(6), sample data: 000019
    field 3: char(6), sample data: 000182

    Here are the constraints:

    There may be duplicate values in any field, but the combination of field1, field2 and field3 will always be unique.

    The key length must be 12 characters or less, and every character in the key must be a numeric value [0123456789].

    All positions in all 3 fields may contain data - I can't simply truncate the first position of field2 and field3 and concatenate all 3 to arrive at a 12 character (or less) unique key.

    The challenge seems to be generating the key from the source fields by a numerical calculation that yields a value that is unique and cannot be produced from a different set of source fields.

    Here is an example: field 1: 12, field 2: 000019, field 3: 000182. Simple multiplication (12 x 19 x 182) yields 41496 - which fits the length and numerical character constraints, but could be produced by the same values in a different order: for example, 12 x 182 x 19 also = 41496.

    I suspect there is an algorithm or other calculation (crypto?) that will produce the key as desired. No code required in the answer, just the calculation or a link to the calculation. Thanks and good luck! -Rick

  • Answer:

    Hi, Rick: Thanks for inviting me to post an Answer.

    What you would like to do is pair up every possible combination of values for the three fields (field 1: two digits, field 2: six digits, field 3: six digits) with a uniquely corresponding twelve digit string. The crux of why this is impossible is that there are more possibilities in the first set (10 to the 14th power) than in the second (10 to the 12th power). The "pigeonhole" principle is an often used conceptual tool (theorem) in mathematics that expresses this notion:

    [Pigeonhole principle -- Wikipedia] http://en.wikipedia.org/wiki/Pigeonhole_principle

    "The pigeonhole principle states that if n pigeons are put into m pigeonholes, and if n > m, then at least one pigeonhole must contain more than one pigeon. Another way of stating this would be that m holes can hold at most m objects with one object to a hole; adding another object will force you to reuse one of the holes."

    As a practical matter, problems such as the one you ask about do arise in computer science, and a design decision must be made about how to proceed. One approach that may address your needs is "hashing" with "collision avoidance".

    [hashing -- Whatis.com] http://whatis.techtarget.com/definition/0,289893,sid9_gci212230,00.html

    The word "hash" is used somewhat loosely in various aspects of programming, often in connection with storing a set of things (but also in cryptography). Here we will discuss a "hash" function that takes the three field values as arguments and returns a twelve digit "hash".

    Even though no 1-1 function can be extended to all 10 to the 14th possible inputs, there may be practical value in an approach that defines a unique value for all inputs that will actually arise in an application. In this scenario you anticipate having far fewer than 10 to the 12th records to store, but you cannot tolerate the risk of assigning the same 12 digit "hash" to more than one record (a "collision" of hash values).
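    The counting argument above can be checked with a few lines of arithmetic. This is only a sketch; the names FIELD_WIDTHS, KEY_WIDTH and count_slots are made up for illustration and do not come from the original question.

```python
# Hypothetical names; the widths are taken from the question (2 + 6 + 6 digits in, 12 digits out).
FIELD_WIDTHS = (2, 6, 6)
KEY_WIDTH = 12


def count_slots(field_widths=FIELD_WIDTHS, key_width=KEY_WIDTH):
    """Return (possible inputs, possible keys, minimum inputs sharing the fullest key)."""
    inputs = 10 ** sum(field_widths)   # every digit position may hold 0-9
    keys = 10 ** key_width             # all-numeric keys of the allowed width
    # Pigeonhole: some key must receive at least ceil(inputs / keys) inputs.
    worst_case = -(-inputs // keys)
    return inputs, keys, worst_case


if __name__ == "__main__":
    inputs, keys, worst_case = count_slots()
    print(f"{inputs} possible inputs, {keys} possible keys")
    print(f"any fixed formula sends at least {worst_case} inputs to some key")
```

    With the widths from the question, any formula that covers all possible inputs must send at least 100 of them to the same key, which is why the rest of the answer relies on a lookup table rather than a pure calculation.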
    We will assume that the hash value (twelve digit string) is stored in the same table as the three original input strings, and possibly in other locations as well (the hash value is slightly more compact than the combined storage of the three original fields). We will call this table the hash table; it provides a lookup to "reverse" the assignment performed by the hash function.

    The main idea is to have a fairly simple computation that gives a trial hash value, then check the table to see if that value has been used already. If it has, apply a "shift" of one sort or another until an unused value is obtained.

    Note that this assumes far fewer than 10 to the 12th power values need to be managed in this fashion. The practical reason is that you don't want a huge "overhead" in shifting values around to find an open "parking space", so to speak. Keep in mind the "birthday problem" as a model of how the chances of a collision can increase dramatically even though a relatively small fraction of the possible hash values are assigned.

    [Ask Dr. Math: The Birthday Problem] http://mathforum.org/dr.math/faq/faq.birthdayprob.html

    Earlier I suggested a simple approach for "calculating" the hash value by throwing away the two leading digits of fields 2 and 3. It doesn't really matter how the leftover digits are glued together. If you are doing it in SQL, a natural option would be to use substring( ) to drop those leading digits and string concatenation to put the remnants together.

    The ideal "shift" function for avoiding collisions is one that jumps around "wildly", is easy to compute, and yet manages to visit (eventually) every potential value (looking for the open slot). It's not easy in general to satisfy all three criteria, and when the density of collisions is low (due to the paucity of records involved) and speed is important, the first two criteria are of greatest importance.
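    The trial-then-shift idea can be sketched in Python as follows. Several things here are assumptions rather than part of the answer: "the two leading digits" is read as one leading digit from each of fields 2 and 3 (so that 2 + 5 + 5 = 12 digits remain), the shift is the simplest possible placeholder (add 1, wrapping around) rather than one that jumps around "wildly", and the names HashTable, trial_key and shift are hypothetical. In a real system the two dictionaries would be a database table with a unique constraint on the key column.

```python
KEY_DIGITS = 12
KEY_SPACE = 10 ** KEY_DIGITS


def trial_key(field1: str, field2: str, field3: str) -> int:
    """First guess: drop one leading digit from fields 2 and 3, then concatenate."""
    return int(field1 + field2[1:] + field3[1:])


def shift(key: int) -> int:
    """Placeholder shift (an assumption): move to the next slot, wrapping around."""
    return (key + 1) % KEY_SPACE


class HashTable:
    """In-memory stand-in for the lookup table described above."""

    def __init__(self):
        self.fields_by_key = {}   # int key -> (field1, field2, field3)
        self.key_by_fields = {}   # (field1, field2, field3) -> 12 digit string

    def assign(self, field1: str, field2: str, field3: str) -> str:
        """Return the key for these fields, allocating an unused one if needed."""
        fields = (field1, field2, field3)
        if fields in self.key_by_fields:
            return self.key_by_fields[fields]
        key = trial_key(*fields)
        # Collision: keep shifting until an open slot is found.
        # (Assumes the table is far from full, as the answer does.)
        while key in self.fields_by_key:
            key = shift(key)
        self.fields_by_key[key] = fields
        self.key_by_fields[fields] = f"{key:0{KEY_DIGITS}d}"
        return self.key_by_fields[fields]

    def lookup(self, key: str):
        """Reverse the assignment: 12 digit key -> original fields."""
        return self.fields_by_key[int(key)]


if __name__ == "__main__":
    table = HashTable()
    print(table.assign("67", "000019", "000182"))  # trial key is free
    print(table.assign("67", "100019", "000182"))  # same trial key, gets shifted
```

    The second call collides with the first because the two records differ only in the dropped digit, so it receives the next free value. The table, not the formula, is what guarantees uniqueness.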
    A mathematical treatment of such problems might be to pick the largest prime below 10 to the 12th, and do all the computations modulo that prime. Supposing that the value of all zeroes can be excluded from the hash values by one means or another, a "shift" function with the nice property of visiting all the possible slots could consist of multiplication by a "primitive root" of the prime modulus, which we might choose to have six digits or so to give plenty of "jumping around" of the shifted values.

    If a practical algorithm of this sort would be helpful to you, I'd be happy to elaborate upon seeing a Request for Clarification from you. Let me know if you are intending to do the coding in SQL or in a more powerful language.

    best wishes, mathtalk-ga
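    The multiplicative shift can be tried out at toy scale. The sketch below is an illustration under stated assumptions, not the answerer's algorithm: it uses the largest prime below 10 to the 4th instead of 10 to the 12th so that the full cycle can be checked quickly, it finds the prime and a primitive root by plain trial division, and all function names are hypothetical.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for the sizes used in this sketch."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def largest_prime_below(limit: int) -> int:
    """Largest prime strictly less than limit."""
    if limit <= 2:
        raise ValueError("no prime below limit")
    candidate = limit - 1
    while not is_prime(candidate):
        candidate -= 1
    return candidate


def prime_factors(n: int) -> set:
    """Distinct prime factors of n, by trial division."""
    factors = set()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors


def is_primitive_root(g: int, p: int) -> bool:
    """g generates all nonzero residues mod prime p iff g^((p-1)/q) != 1 for each prime q | p-1."""
    return all(pow(g, (p - 1) // q, p) != 1 for q in prime_factors(p - 1))


def first_primitive_root(p: int, start: int = 2) -> int:
    """Smallest primitive root of p that is >= start."""
    for g in range(start, p):
        if is_primitive_root(g, p):
            return g
    raise ValueError("no primitive root at or above start")


def shift(key: int, g: int, p: int) -> int:
    """Multiplicative shift: nonzero keys stay nonzero and eventually visit every slot."""
    return (key * g) % p


if __name__ == "__main__":
    p = largest_prime_below(10 ** 4)      # toy stand-in for 10 to the 12th
    g = first_primitive_root(p, start=100)
    key, seen = 1, set()
    for _ in range(p - 1):
        seen.add(key)
        key = shift(key, g, p)
    print(p, g, len(seen) == p - 1)       # True: every nonzero slot was visited
```

    Calling largest_prime_below(10 ** 12) and first_primitive_root with a six-digit start should work with the same functions, only more slowly. The value 0 is never reached from a nonzero key, which matches the requirement that the all-zeroes value be excluded.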

rick94404-ga at Google Answers
