How to read large data set at hourly interval?

C++ hashing table in data structure

  • QUESTION: Hello..Zlatko,, I have some question to ask you regarding my project question. This is the question : Write a program to perform hash table using two different hashing functions and two different resolution methods to solve the collision when happened. The record to be inserted in the hashing table, must be contains of at least 4 fields. Can you please help me on how i should start. Actually, I understand the hashing function but I got stuck on how i should start to code this program. The question ask to choose any methods of hashing and i intend to choose the simplest one first, which is the direct method. Then, i should have at least 4 field of records. Is it means that i will going to make an output file like the .txt in order to save the records? Or just by make it in array? And also, I will going to have a dynamic array or just a static array? Can you please guide me steps by steps what i should make first to code this program because it really make me stuck.. Your kindness is much appreciated..Thank you so much.. ANSWER: Hello wale89. I would like some more time to think about some parts of your question but I can give you some ideas now. The problem states that the record must have at least 4 fields. I think this has nothing to do with input or output files. The data for the records could come from a file, but it's not necessary. Certainly there is no need for an output file. Each record is a struct or a class that is stored in the hash table. For example, lets have a hash table that holds instances of type MyRecord. The record could be declared like this: struct MyRecord { int field1; int field2; int field3; char field4[32]; }; In this example, I chose a record with 3 integers and a character array. The problem does not say what the fields should be, and it does not say how many fields are used in the hash function, so I'm not really sure why at least 4 fields were requested. The hash table itself is an array of buckets. You can put the hash table into a class, with methods for adding and finding instances of MyRecord. At first its good to code the class specifically for your record type, but later you can use templates to make a generic hash table. For example: class MyHashTable { public: MyHashTable(int bucketCount); bool add(const MyRecord* record); MyRecord* find(const MyRecord& key) const; private: // hashValue calculates an index into the bucket array int hashValue(const *MyRecord record) const; // A record holder used for chaining struct RecordHolder { RecordHolder() { pRecord = NULL; pNext = NULL; } MyRecord* pRecord; struct RecordHolder* pNext; }; RecordHolder *mBucketArray; int mBucketCount; }; I like to start all my member variables with an 'm' character, but that's just my habit. Here, I'm thinking about having a bucket array of type RecordHolder. The RecordHolder has a pointer to a MyRecord type for holding data, and a pointer to another RecordHolder. This type of record holder would be used when chaining is used to resolve collisions. The constructor would look something like this: MyHashTable::MyHashTable(int bucketCount) { mBucketCount = bucketCount; mBucketArray = new MyRecordHolder[bucketCount]; } If you are allowed to use any part of the standard template library, then you can make your coding easier by using a std::list for chaining instead of making your own linked list. You should ask your instructor if that is allowed. Now you need to implement the other three methods in your hash table class. Does this help you to get started ? I will think about your problem a little more, and add maybe add to this answer. If you have more questions, just ask. ---------- FOLLOW-UP ---------- QUESTION: Hi..Zlatko, Thanks so much for your kindness to help me.. I'm a begginers in Data Structure, therefore i really needs your help. Just want to know, therefore we will going to have a hash table that implement what method? So far in the class, I have learned the theory of 6 methods which is Direct, Subtraction, Modulo-Division,Digit Extraction and Midsquare Rotation. In the class my lecturer just explain it by thoery that's why i having difficulty in order to implement it. Starting from scratch.. So far after reading your code, it make understand little bit..what i should do..The idea to have 3 int fields and 1 char fields is a good idea for me..my lecturer just gived the question and ask us to make it using our own way, therefore i thought, dont have to use template library.. because i'm not really well on it..unless you have the strong to teach me.. So, you said that i will have to implement the other three methods in my hash table class. Is it like the find, insert and delete function? Or like the one you have created.. bool add(const MyRecord* record); MyRecord* find(const MyRecord& key) const; Why did you use the bool type for the add function? Can you explain it.. Therefore the program will behaves something like this..? struct MyRecord { int field1; int field2; int field3; char field4[32]; }; class MyHashTable { public: MyHashTable(int bucketCount); bool add(const MyRecord* record); MyRecord* find(const MyRecord& key) const; private: // hashValue calculates an index into the bucket array int hashValue(const *MyRecord record) const; // A record holder used for chaining struct RecordHolder { RecordHolder() { pRecord = NULL; pNext = NULL; } MyRecord* pRecord; struct RecordHolder* pNext; }; RecordHolder *mBucketArray; int mBucketCount; }; MyHashTable::MyHashTable(int bucketCount) { mBucketCount = bucketCount; //can you explain this mBucketArray = new MyRecordHolder[bucketCount]; //can you explain this } Thanks so much for your kindness.. ANSWER: Hello wale. Before I go on to your new questions, I want to continue with what I started yesterday. I want to talk about how you would organize your program in a more professional way. You see yesterday that I had set up a hashValue function inside the hash table class to calculate the hash on a MyRecord. That is not really the best place to put the hash calculation code. If you put calculation code into MyHashTable, the hash table will forever be tied to MyRecord. It will not be useful for storing any other types of records. That is a poor program design. It might make more sense to put the calculation code into MyRecord, since the calculation code will need access to the data in the record. However, that would not be a poor design too because MyRecord should not be aware that it is being stored on a hash table. After all, if MyRecord is going to have functions to help it be stored on a hash table, why not functions to help it be stored on a linked list, or a tree as well? The other issue is that you need to implement more than one way of calculating a hash. One way of doing this is this: int hashCalculation(MyRecord) { if (usingFunction_1) { calculate hash using some method } else if (usingFunction_2) { calculate hash using some other method } } This method has many disadvantages which I don't want to talk about right now. The correct way to organize the program is to take what is changing, encapsulate it into a class, and have users of the class call it through an interface. In this case, it is the hash calculation that is changing. Don't worry, I will give you a complete sample program at the end. First we define an interface to the hash function like this: class HashCalculator { public: virtual int calculateHash(const MyRecord* record) const = 0; }; Hash calculations will be done only through this interface. Do you understand the idea of virtual functions ? Then you implement different hash functions, each in its own class, like this: class HashFunction_1 : public HashCalculator { public: int calculateHash(const MyRecord* record) const { cout }; Now, you need to pass a hash calculator object to the constructor of MyHashTable, so the constructor changes to look like this: class MyHashTable { public: MyHashTable(int bucketCount, const HashCalculator& calc); Finally, when the MyHashTable needs to calculate a hash value, it calls the calculator like this int hashValue = myCalculator.caclculateHash(record); You see that the calculation code is not in the MyHashTable class so the class is not tied to the contents of the records it is storing. What I have described to you is called the strategy pattern, and it is used when you want different algorithms to be available without tying the code to any one particular algorithm. You should notice that when adding a second hash calculation method, no existing code has to be changed. Only new code has to be written. This a big advantage. You can read more about the strategy pattern at http://en.wikipedia.org/wiki/Strategy_pattern Ok, as I promised here is a sample program for you to experiment with. The things that you need to do are shown with comments like // TODO // START OF CODE using namespace std; struct MyRecord { int field1; int field2; int field3; char field4[32]; }; /*The HashCalculator is the interface to many different calculation functions Notice that the calculator is specific to MyRecord. class HashCalculator { public: virtual int calculateHash(const MyRecord* record) const = 0; }; //Here is one hash calculation function for MyRecord class HashFunction_1 : public HashCalculator { public: int calculateHash(const MyRecord* record) const { // TODO implement calculation on the record cout }; /*The HashCalculator has no data, so we can have one instance of it and share it among all hash tables storing MyRecord HashFunction_1 hf1; // Here is another hash calculation function for MyRecord class HashFunction_2 : public HashCalculator { public: virtual int calculateHash(const MyRecord* record) const { cout }; // Again, we need just 1 instance of the calculator class HashFunction_2 hf2; // Here is the hash table class MyHashTable { public: // The hash table class has a reference to a hash calculator. // The calculator is set when the the hash table is created, and // cannot be changed for the life of the hash table MyHashTable(int bucketCount, const HashCalculator& calc) : mHashCalculator(calc) { // Store the bucket count into a variable to be used later. mBucketCount = bucketCount; // Allocate memory for an array of buckets. // The buckets will be of type MyRecordHoder // MyRecordHolder is a struct that points to a MyRecord // and a "next" pointer to point to another MyRecordHolder. // The next pointer is used to make a linked list for chaining. mBucketArray = new MyRecordHolder[bucketCount]; } bool add(const MyRecord* record) { int hashValue = mHashCalculator.calculateHash(record); // TODO insert into bucket array based on hash value // TODO resolve collisions as needed return true; // TODO return false if record could not be added } MyRecord* find(const MyRecord& key) { // TODO return pointer to record or return NULL if not found } private: // Hash calculation section const HashCalculator& mHashCalculator; // Bucket array section struct MyRecordHolder { MyRecordHolder() { pRecord = NULL; pNext = NULL; } MyRecord* pRecord; struct MyRecordHolder* pNext; }; MyRecordHolder *mBucketArray; int mBucketCount; }; /* Sample main creating 2 hash tables, each with its own type of hash calculator. int main(void) { MyHashTable ht1(99, hf1); MyHashTable ht2(99, hf2); ht1.add(new MyRecord); ht2.add(new MyRecord); return 0; } // END OF CODE That was a lot of information for you, and I've only spoken about organizing your program. Please read it slowly and ask questions about what you don't understand. You still need to know about hash functions and collision resolution. Your assignment says you need 2 different collision resolution methods. The collision resolution is probably the biggest part of the hash table. It will dictate how the table will be set up. For example, I set up the table in my sample code to use chaining. To use another type of collision resolution, you will probably need another type of hash table class. I don't think we can use the strategy pattern to encapsulate the collision resolution algorithm. For now, just work on one collision method. When I started explaining things, I used pretty simple names, like MyRecord. That's OK for an explanation, but it would be better to use a more realistic example. Perhaps a student record, with a name, address, telephone number, student ID, etc. Instead of a hash table named MyHashTable, you could have a class ChainingHashTable, or a class LinearProbingHashTable. Again, I ask you to be patient. I need a little more time to refresh my memory about the different hashing functions you spoke about. It has been a long time since I was a student. ---------- FOLLOW-UP ---------- QUESTION: Hi my dear Zlatko.. Again, thanks so much for your much help that you have gaved to me.. I really appreciate it.. I will deeply immersed in the code that you gave me but it will take time for me. Actualy, now i'm in the examination week..on these coming tuesday i have Statistic paper and also coming saturday will going to have another paper. Therefore, I have to delay some time for this project in order to concentrate on it. I really hope you can understand..I will give the follow up question after immersed in the code.This project will be due on the 8th April.. Therefore, I hope i will going to have enough time to complete it.. May God bless you..for all your kindness..Thank you, Best Regards, wale89 Hello, I have some question to ask you regarding this question. This is the question : Write a program to perform hash table using two different hashing functions and two different resolution methods to solve the collision when happened. The record to be inserted in the hashing table, must be contains of at least 4 fields. Can you please help me on how i should start. Actually, I understand the hashing function but I got stuck on how i should start to code this program. The question ask to choose any methods of hashing and i intend to choose the simplest one first, which is the direct method. Then, i should have at least 4 field of records. Is it means that i will going to make an output file like the .txt in order to save the records? Or just by make it in array? And also, I will going to have a dynamic array or just a static array? Can you please guide me steps by steps what i should make first to code this program because it really make me stuck.. Your kindness is much appreciated..Thank you so much..

  • Answer:

    Hello Wale. I wrote a sample hash program in the hope that it will be useful to you. The hash table uses chaining to resolve collisions, and the program has 2 hash functions using the strategy pattern I wrote about earlier. The program compiles with the latest Microsoft compiler and with the GNU compiler. I don't know how experienced you are in C++. Perhaps my talk of interfaces, and virtual functions has made things more confusing for you. Anyway, I am showing you the code as it would be written in C++. If you get stuck on your assignment, ask me for help and I will work with whatever design you are using. Good luck on your exams. zlatko.c.help at gmail.com using namespace std; struct StudentRecord { // Some convenience constructors StudentRecord(std::string name) { mName = name; } StudentRecord(std::string name, int id, std::string telephone, std::string email) { mName = name; mId = id; mTelephone = telephone; mEmail = email; } std::string mName; int mId; std::string mTelephone; std::string mEmail; /* Two student record are equal if they share the same student name (even though that is unrealistic in the real world) */ bool operator==(const StudentRecord& other) const { return this->mName.compare(other.mName) == 0; } }; /* A convenience function to print out a record ostream& operator out } /* The HashCalculator is the interface to many different calculation functions Notice that the calculator is specific to StudentRecord. class HashCalculator { public: virtual unsigned int calculateHash(const StudentRecord& record) const = 0; }; /* Here is one hash calculation function for StudentRecord. It uses just the name field This was in the first edition of the K&R book. It is also found at http://www.cse.yorku.ca/~oz/hash.html class VeryBadHashFunction : public HashCalculator { public: unsigned int calculateHash(const StudentRecord& record) const { // cout }; /* The HashCalculator has no data, so we can have one instance of it and share it among all hash tables storing StudentRecord VeryBadHashFunction veryBadHashFunction; /* Here is another hash calculation function for StudentRecord. It also uses just the name field It is also found at http://www.cse.yorku.ca/~oz/hash.html class Djb2HashFunction : public HashCalculator { public: virtual unsigned int calculateHash(const StudentRecord& record) const { // cout }; // Again, we need just 1 instance of the calculator class Djb2HashFunction djb2HashFunction; // Here is a hash table that uses chaining to resolve collisions. class ChainingHashTable { public: /* The hash table class has a reference to a hash calculator. The calculator is set when the the hash table is created, and cannot be changed for the life of the hash table */ ChainingHashTable(int bucketCount, const HashCalculator& calc) : mHashCalculator(calc) { // create the array of buckets mBucketCount = bucketCount; mBucketArray = new RecordHolder[bucketCount]; } bool add(StudentRecord* record) { // get the hash value using the calculator that was given to the constructor unsigned int hashValue = mHashCalculator.calculateHash(*record); // make the hash value fit into the bucket array hashValue = hashValue % mBucketCount; // Check for collisions if (mBucketArray[hashValue].pRecord == NULL) { // No collisions, put the record on the record holder mBucketArray[hashValue].pRecord = record; } else { // There is a collision. // Create a new record holder RecordHolder* new_rh = new RecordHolder; // put the record on the new record holder new_rh->pRecord = record; // put the new record holder on the end of the list RecordHolder* last_rh = findLastRecordHolder(hashValue); last_rh->pNext = new_rh; } // Would return false if for some reason the add could not be done return true; } StudentRecord* find(const StudentRecord& key) { // get the hash value using the calculator that was given to the constructor unsigned int hashValue = mHashCalculator.calculateHash(key); // make the hash value fit into the bucket array hashValue = hashValue % mBucketCount; if (mBucketArray[hashValue].pRecord == NULL) { // There is no record for that hash value return NULL; } /* At this point, there is a record for the hash value. We go along the chain to find the record that matches the key. */ RecordHolder* pRh = &mBucketArray[hashValue]; while(pRh != NULL) { if (*(pRh->pRecord) == key) { // The desired record has been found, return it. return pRh->pRecord; } pRh = pRh->pNext; } // At this point, the desired record does not exist in the hash table return NULL; } private: // Hash calculation section const HashCalculator& mHashCalculator; /******************************** // Bucket array section ********************************/ struct RecordHolder { RecordHolder() { pRecord = NULL; pNext = NULL; } StudentRecord* pRecord; struct RecordHolder* pNext; }; /* findLastRecordHolder finds the last record in a chain at the given bucket array index */ RecordHolder *findLastRecordHolder(unsigned int ix) { RecordHolder *last = &mBucketArray[ix]; while(last->pNext != NULL) last = last->pNext; return last; } RecordHolder *mBucketArray; // the bucket array int mBucketCount; // the size of the bucket array }; /* Sample main creating 2 hash tables, each with its own type of hash calculator. int main(void) { // Create 2 hash tables, each with a different hash function ChainingHashTable ht1(10, djb2HashFunction); ChainingHashTable ht2(10, veryBadHashFunction); // For this test, we use only ht2 ht1.add(new StudentRecord("Wale", 1, "555-1000", "[email protected]")); ht1.add(new StudentRecord("Zlatko", 2, "555-1001", "[email protected]")); ht1.add(new StudentRecord("Stephen", 3, "555-1002", "[email protected]")); cout StudentRecord key("Stephen"); StudentRecord *found = ht1.find(key); if (found) { cout cout key.mName = "Wale"; found = ht1.find(key); if (found) { cout cout key.mName = "XXX"; found = ht1.find(key); if (found) { cout return 0; } > Then, i should have at least 4 field of records. > Is it means that i will going to make an output file like the .txt in order to save the records? > Or just by make it in array? To start with, just have a struct or class with four member variables to represent a record of four fields. Select one of them as the 'key' and write a hash function for that. The hash function would have to take into account the size of the array for the hash table. > And also, I will going to have a dynamic array or just a static array? Start with a simple static array; complete the rest of the code and make sure that it works correctly. You can easily convert it to use a dynamic array, if it is required later. The most commonly used hash table size is a prime number. For more information and code fragments, see: http://www.eternallyconfuzzled.com/tuts/datastructures/jsw_tut_hashtable.aspx

Miningco.com Visit the source

Was this solution helpful to you?

Related Q & A:

Just Added Q & A:

Find solution

For every problem there is a solution! Proved by Solucija.

  • Got an issue and looking for advice?

  • Ask Solucija to search every corner of the Web for help.

  • Get workable solutions and helpful tips in a moment.

Just ask Solucija about an issue you face and immediately get a list of ready solutions, answers and tips from other Internet users. We always provide the most suitable and complete answer to your question at the top, along with a few good alternatives below.