Profile photo for Robin Verlangen

I have recently developed a comparable system, so I won't be able to disclose all of the details. However, I think it all boils down to these key points:
- do everything you have to do at the right moment (for example: if you need unique users per day in a weekly report, when do you count them? Do you need geo lookups in real time?)
- pick state-of-the-art, proven technologies that are designed to scale almost limitlessly
- make sure there's no single point of failure: not only for high availability, but also for high throughput
- make sure your team is experienced in writing this kind of application. Even if you pick the right technology stack, you can still fail if your team misuses it.

EDIT: it might also be interesting to read this article about a very large RTB platform: API Rearchitecture Series - The Juicy Details

Profile photo for Dave Voorhis

The only thing you can say about big tech companies — and, for that matter, small tech companies — is that no two are alike.

Furthermore, no two projects within a given company are necessarily alike.

With that in mind, my experience of database design in companies — big, small, tech, non-tech — is that it’s utterly inconsistent, and varies from project to project.

Sometimes it’s done by DBAs.

Sometimes it’s done by Data Architects / Data Analysts or other specialists in data management.

Sometimes it’s almost entirely ad hoc, done by developers on a project as and when they need storage. (Developer A: “I need to add a column to the flergle table to handle zplots.” Developer B: “Dude, I think that violates 3NF.” Developer C: “Who is Trey Inif? Is he the overseas delivery manager?”)

Sometimes it’s some combination of the above, either before development or via review afterward.

Sometimes it’s something else entirely. (“Get Simon in Accounting to look at it… He’s really good with spreadsheets, and databases are just spreadsheets.”)

Etc.

Profile photo for Assistant
Assistant

Designing a database for an ad server involves creating a schema that efficiently tracks key metrics such as clicks, impressions, and conversions. Below is a typical design and an outline for a cron job to aggregate data by hour.

Database Design

1. Tables Overview

You might consider the following tables:

  • Ads
      • ad_id (Primary Key)
      • ad_content (Text or URL)
      • created_at (Timestamp)
      • updated_at (Timestamp)
  • Campaigns
      • campaign_id (Primary Key)
      • campaign_name (String)
      • start_date (Date)
      • end_date (Date)
      • budget (Decimal)
  • Impressions
      • impression_id (Primary Key)
      • ad_id (Foreign Key)
      • timestamp (Timestamp)
      • user_id (Optional, Foreign Key for tracking users)
      • device (String)
  • Clicks
      • click_id (Primary Key)
      • ad_id (Foreign Key)
      • timestamp (Timestamp)
      • user_id (Optional, Foreign Key)
      • device (String)
  • Conversions
      • conversion_id (Primary Key)
      • ad_id (Foreign Key)
      • timestamp (Timestamp)
      • user_id (Optional, Foreign Key)
      • value (Decimal, for tracking conversion value)

2. Relationships

  • Each ad can belong to one or more campaigns.
  • Each impression, click, and conversion is associated with a specific ad.
  • You can optionally track users and devices for more granular data.

Aggregation Strategy

To aggregate data by hour, you can create a separate table to store aggregated metrics or run queries on the existing tables. Here’s how you can approach it:

1. Aggregated Metrics Table

You might create an hourly_metrics table:

  • hourly_metrics
      • metric_id (Primary Key)
      • ad_id (Foreign Key)
      • hour (Timestamp)
      • impressions_count (Integer)
      • clicks_count (Integer)
      • conversions_count (Integer)
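
If you are using SQLite, as in the cron example below, the aggregation table sketched above might be created along these lines. This is a minimal sketch, not part of the original design: the UNIQUE constraint on (ad_id, hour) is an assumption that lets the hourly job be re-run without inserting duplicate rows.

import sqlite3

# Hypothetical DDL for the hourly_metrics table described above.
HOURLY_METRICS_DDL = """
CREATE TABLE IF NOT EXISTS hourly_metrics (
    metric_id         INTEGER PRIMARY KEY AUTOINCREMENT,
    ad_id             INTEGER NOT NULL REFERENCES Ads(ad_id),
    hour              TIMESTAMP NOT NULL,
    impressions_count INTEGER NOT NULL DEFAULT 0,
    clicks_count      INTEGER NOT NULL DEFAULT 0,
    conversions_count INTEGER NOT NULL DEFAULT 0,
    UNIQUE (ad_id, hour)  -- assumption: one row per ad per hour
);
"""

if __name__ == "__main__":
    conn = sqlite3.connect("ads_database.db")  # same example database as the script below
    conn.executescript(HOURLY_METRICS_DDL)
    conn.close()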

2. Cron Job Implementation

You can use a cron job to run an aggregation script every hour. Below is an example in Python using SQLite (swap in your own database driver as needed):

import sqlite3
from datetime import datetime, timedelta

def aggregate_metrics():
    conn = sqlite3.connect('ads_database.db')  # Replace with your database connection
    cursor = conn.cursor()

    # Get the current hour and the previous hour
    current_hour = datetime.now().replace(minute=0, second=0, microsecond=0)
    previous_hour = current_hour - timedelta(hours=1)

    # Aggregate data from the Impressions, Clicks, and Conversions tables for the
    # hour that just ended (previous_hour <= timestamp < current_hour)
    cursor.execute('''
        INSERT INTO hourly_metrics (ad_id, hour, impressions_count, clicks_count, conversions_count)
        SELECT
            Ads.ad_id,
            ? AS hour,
            COUNT(DISTINCT impression_id) AS impressions_count,
            COUNT(DISTINCT click_id) AS clicks_count,
            COUNT(DISTINCT conversion_id) AS conversions_count
        FROM
            Ads
            LEFT JOIN Impressions ON Ads.ad_id = Impressions.ad_id
                AND Impressions.timestamp >= ? AND Impressions.timestamp < ?
            LEFT JOIN Clicks ON Ads.ad_id = Clicks.ad_id
                AND Clicks.timestamp >= ? AND Clicks.timestamp < ?
            LEFT JOIN Conversions ON Ads.ad_id = Conversions.ad_id
                AND Conversions.timestamp >= ? AND Conversions.timestamp < ?
        GROUP BY Ads.ad_id
    ''', (previous_hour,
          previous_hour, current_hour,
          previous_hour, current_hour,
          previous_hour, current_hour))

    conn.commit()
    conn.close()

# Schedule this function to run every hour (e.g., via the cron entry below)
if __name__ == '__main__':
    aggregate_metrics()

Cron Job Setup

To set up the cron job, you would edit your crontab file by running crontab -e and adding a line like this:

0 * * * * /usr/bin/python3 /path/to/your/script.py

This runs the script at the start of every hour.

Conclusion

This design allows you to efficiently track and aggregate ad performance metrics. The cron job ensures that your data is regularly updated, enabling real-time analytics and reporting. Adjust the database schema and script based on your specific requirements and the database system you are using.

Profile photo for Benjamin Ross

The other answers all involve database CRUD transactions, which, for tight time-sensitive operations like serving ads, create a bottleneck that cannot be tolerated when competing for ad space for billions of customers at any given instant.

The best solution is for each ad server to already have a cache of total click counts available for every ad it serves at bid-time.

One approach is to maintain an in-memory local count on each ad server, and increment that count in real-time as the ad is served. This local count necessitates sharing count updates between hosts that serve the same ads. With thousands of ad-servers, cross-communication between hosts won’t scale. An aggregator system can be put in place to receive local counts from the ad servers, aggregate a global count, and make this global count queryable.

The Back End

To receive the local counts from the ad servers, any type of synchronous data transfer from ad server to ad aggregator would result in many 503 errors and the like. Instead, a system like Apache Kafka can receive, fan out, and stream this huge amount of data to all the aggregator hosts.

The API

Each ad server queries the API every so often to re-sync its local count with the newly updated global count. Ad clicks happen at a very high rate across all ads, but remain low per ad. The re-sync interval can be tuned to trade off between highly accurate counts and keeping the load on the aggregator hosts low.
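
As a rough illustration of that re-sync loop, here is a minimal sketch of the ad-server side. The aggregator endpoint, its JSON response shape, and the 30-second interval are assumptions for the example, not details from the original answer.

import threading
import time

import requests  # assumed HTTP client for talking to the aggregator API

AGGREGATOR_URL = "http://ad-aggregator.internal/counts"  # hypothetical endpoint

class LocalClickCounter:
    """Per-host click counts that drift between periodic global re-syncs."""

    def __init__(self, resync_seconds=30):
        self._lock = threading.Lock()
        self._local = {}    # ad_id -> clicks observed locally since the last sync
        self._global = {}   # ad_id -> last known global count from the aggregator
        self._resync_seconds = resync_seconds

    def record_click(self, ad_id):
        # Called on the serving path; purely in-memory, no database write.
        with self._lock:
            self._local[ad_id] = self._local.get(ad_id, 0) + 1

    def count_for(self, ad_id):
        # Bid-time read: global baseline plus whatever we have seen locally.
        with self._lock:
            return self._global.get(ad_id, 0) + self._local.get(ad_id, 0)

    def resync_forever(self):
        # Background loop: pull the aggregator's global counts and reset local drift.
        while True:
            fresh = requests.get(AGGREGATOR_URL, timeout=5).json()  # assumed shape: {ad_id: count}
            with self._lock:
                self._global = dict(fresh)
                self._local.clear()
            time.sleep(self._resync_seconds)

counter = LocalClickCounter()
threading.Thread(target=counter.resync_forever, daemon=True).start()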

Black Friday Problem

During times of high traffic a system like this will need to scale. Increasing the number of ad servers will increase the load on the aggregator hosts, and increasing the number of aggregator hosts will increase the fan-out load on the Kafka cluster. With all these moving pieces, there are a lot of places this system can break at the most inopportune times. Companies like Google, Amazon and Facebook spend billions collecting information about what their users like, what websites they visit, who their friends are, and what they buy (and spend even more keeping this information secure). This is all to support their targeted ads business, which brings that investment back 100-fold. Smart decisions must be made to meet scale demand with highly accurate click-counts; outages in the ads business result in extremely high loss of revenue and angry CEOs.

Profile photo for David Lewis

Designing real-time distributed counters for ad clicks involves ensuring that the system can handle a high volume of data with minimal latency and high availability. Here's how you can approach this using the keyword TBP (which we'll interpret as Topics, Brokers, and Partitions, often related to distributed messaging systems like Kafka):

1. **Topics:**

- Create dedicated Kafka topics for ad click events. Each event represents a click on an ad and includes metadata such as the ad ID, user ID, timestamp, etc.

- Design the topic schema to ensure that it can handle the necessary attributes for tracking and counting clicks.

2. **Brokers:**

- Deploy multiple Kafka brokers to handle the load of click events. The brokers are responsible for receiving, storing, and transmitting the click events.

- Ensure brokers are well-distributed across different servers or data centers to provide fault tolerance and high availability.

3. **Partitions:**

- Partition the Kafka topics to distribute the load across multiple brokers. Partitions allow parallel processing of click events, enhancing throughput and reducing latency.

- Use a partitioning key, such as the ad ID, to ensure that all events related to a specific ad are sent to the same partition. This helps in maintaining the order of events for each ad and simplifies counting.

4. **Producers:**

- Implement producers that capture ad click events in real-time from various sources (e.g., websites, mobile apps) and send them to the Kafka topics.

- Optimize producers for high throughput and low latency to ensure that click events are ingested into the system without delay.
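
A minimal producer sketch tying points 3 and 4 together, using kafka-python as an example client; the topic name, broker addresses, and event fields are assumptions for illustration.

import json
from kafka import KafkaProducer  # kafka-python, used here as an example client

# Keying each click event by ad_id routes all events for one ad to the same
# partition, preserving per-ad ordering.
producer = KafkaProducer(
    bootstrap_servers=["kafka-broker-1:9092", "kafka-broker-2:9092"],
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_click(ad_id, user_id, timestamp_ms):
    event = {"ad_id": ad_id, "user_id": user_id, "timestamp": timestamp_ms}
    producer.send("ad-clicks", key=ad_id, value=event)

publish_click("ad-123", "user-456", 1700000000000)
producer.flush()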

5. **Consumers:**

- Develop consumers that read from the Kafka topics and aggregate the click counts. Consumers can use stream processing frameworks like Apache Flink, Apache Spark Streaming, or Kafka Streams.

- Consumers can maintain in-memory state or use external storage systems (e.g., Redis, Cassandra) to store intermediate counts and ensure persistence.
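
And a matching consumer sketch, again with kafka-python; here the counts live in a plain in-memory dict, whereas a production consumer would checkpoint its state to an external store such as Redis or Cassandra, as noted above.

import json
from collections import defaultdict

from kafka import KafkaConsumer  # kafka-python, used here as an example client

consumer = KafkaConsumer(
    "ad-clicks",                               # assumed topic name
    bootstrap_servers=["kafka-broker-1:9092"],
    group_id="click-counters",                 # consumer group for parallelism
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

# Running per-ad click counts; purely in-memory for the sake of the sketch.
click_counts = defaultdict(int)

for message in consumer:
    event = message.value
    click_counts[event["ad_id"]] += 1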

6. **State Management:**

- Use stateful processing to maintain the counts of ad clicks in real-time. Techniques such as windowing (e.g., tumbling windows, sliding windows) can be employed to aggregate counts over specific periods.

- Implement mechanisms for state checkpointing and recovery to handle failures and ensure data consistency.
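
For tumbling windows specifically, the bucketing logic is simple enough to sketch without a full stream-processing framework; the one-minute window size and the field names below are assumptions.

from collections import defaultdict

WINDOW_MS = 60_000  # one-minute tumbling windows (assumed size)

# (ad_id, window_start_ms) -> clicks observed in that window
windowed_counts = defaultdict(int)

def add_click(ad_id, timestamp_ms):
    # Each event falls into exactly one window: the one whose start is the
    # event timestamp rounded down to a window boundary.
    window_start = (timestamp_ms // WINDOW_MS) * WINDOW_MS
    windowed_counts[(ad_id, window_start)] += 1

add_click("ad-123", 1_700_000_000_000)
add_click("ad-123", 1_700_000_030_000)  # lands in the same one-minute window
add_click("ad-123", 1_700_000_061_000)  # lands in the next window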

7. **Scalability and Fault Tolerance:**

- Ensure the system is horizontally scalable by adding more brokers, partitions, and consumers as needed.

- Implement redundancy and replication at the Kafka broker level to handle broker failures. Consumers should be designed to handle reprocessing from the last known offset in case of failures.

8. **Monitoring and Alerting:**

- Set up monitoring for Kafka brokers, topics, and consumers. Tools like Prometheus, Grafana, and Kafka's own metrics can be used to track performance, latency, and throughput.

- Configure alerting to detect anomalies such as sudden drops in click counts or consumer lag, enabling quick response to potential issues.

By leveraging Kafka's Topics, Brokers, and Partitions (TBP), you can design a robust and scalable real-time distributed counter system for ad clicks that ensures high availability, low latency, and fault tolerance.

Profile photo for Peter Ode

Examples of really good database designs? That’s a very broad question. A more specific question might be in order. For accounting and operations? For a CMS (Content Management System)? For an EMR (Electronic Medical Records) system? Multi-tenant, such as a cloud based service where a single database might host many customers, each customer having their own WebCatalog and WebStore?

Technology limitations — A database design is also limited by the database technology used: RDBMS (Relational Database Management System), ODBMS (Object Database Management System), Hierarchical Db, Network Db, NoSQL in many flavors (key-value stores, document stores, search engines, graph db, and others).

RDBMS — Most developers are familiar with SQL (Structured Query Language) capable databases, most are RDBMS but SQL can also be used with several other types. SQL is a database specific language for performing queries, data inserts and updates. Most RDBMS databases also enable changing the shape of the database via SQL.

ODBMS — I’ll answer the question by referencing the king of database technologies: Object Database Management Systems (ODBMS) that are ACID compliant. ACID (atomicity, consistency, isolation, durability) is a computer science term describing a set of features for database transactions intended to guarantee data validity despite errors, power failures and other issues.

Most of the well known RDBMS systems, such as MySQL, Oracle, SQL Server, are ACID compliant. Most of the NoSQL databases are not. For critical business systems, ACID is mandatory.

Object databases are more flexible and performant — Object databases enable much more sophisticated and performant systems than what’s possible with Relational databases, especially as the database schema complexity increases. Also, developer productivity is greatly enhanced when an ODBMS is used.

For RDBMS designs, relationships are represented by joins between tables. Most common relationships are one-to-many, then many-to-many. For example, one Customer can have many SalesOrders. Typically such joins involve key field data that is maintained in three places: (1) parent table, (2) child table, and (3) in the index for the field to speed access to child records. In the example, the CustomerNumber would be in three places. Joins are expensive (in terms of computing resources), so relational designs attempt to minimize such relationships — often limiting the design in many ways. Because of such limitations, business app developers usually first try to design an optimal relational schema, then write the app code that reads/writes to this database.

Why is Object Persistence Faster? — In an ODBMS, your database schema is represented by your class hierarchy. An ODBMS adds database behavior to your classes so each instance of an object can save itself to persistent storage (your disk drive) without first having to write code that translates an object into rows and columns (required by an RDBMS). ODBMS systems integrate with your programming language with database aware Arrays, Collections, Dictionaries… The programmer writes code as if he had unlimited RAM memory. Objects are saved to disk as objects. If an object is not in memory, the ODBMS automatically de-references a pointer and directly brings that object into RAM memory. In a RDBMS multiple disk reads, first in the index, then in the table, are required to bring the database row into memory. Then the RDBMS programmer has to write code to reassemble the object from the flat row data.

Imagine if you had to disassemble your car on your driveway before storing it in the garage. And, reassemble your car in the morning before driving to work. That’s what you’re forced to do with relational databases. With an object database, all that is automatic and database performance is orders of magnitude faster.

Real-World Object Design Example — To answer your question, I’ll reference a multi-tenant eCommerce platform that my company built back in the late 1990’s — this system is still in operation today. Tenants include different types of online stores and cloud services — all using the same object database. The platform also had tenants with electronic medical record apps. The programming language / IDE is IBM Visual Age Smalltalk, a highly productive development system shown to be 3x more productive than C#, Python, PHP, Java or JavaScript (on Node).

Here are some unique characteristics of the system, made possible because of the ODBMS (rather than a RDBMS):

<> Instead of a Customer table and Vendor table, we have a LegalEntity object with direct references to a Customer object and Vendor object and ServiceProvider, Contact, Coach, Player, Physician, Patient and others. Since object oriented relationships are practically free, when compared to the RDBMS joins, we’re free to make such database schema designs (actually implemented in our class hierarchy).

One LegalEntity can be a Company, Customer, Vendor… as needed. If we need a new type, say a Customer that can rent a car, we just add a class that might be called RentalCustomer (maybe subclassed from Customer).

<> We use the same design pattern for Product. We have a generic Product object with attributes such as Code, Name, Cost, Images (a Collection of images), Price, AlternativePrices (a Collection of alternate prices for different types of customers, common in wholesale applications)… Then we have objects that can be directly referenced (with a one-to-one relationship) to provide specialized product data and behaviors such as: SerialNumberedProduct (use for items that must track serial numbers), RentalProduct, HourlyService, Subscription, and others.

Products are easy to extend, we can re-use most of our existing code, performance is stellar.
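
A rough, language-neutral sketch of the role pattern described above, written here in Python purely for illustration (the original system is Smalltalk on an ODBMS, and the class and attribute names below are made up, not taken from it):

class LegalEntity:
    def __init__(self, name):
        self.name = name
        self.roles = {}                 # role name -> role object, direct references

    def add_role(self, role):
        self.roles[type(role).__name__] = role

class Customer:
    def __init__(self, credit_limit=0):
        self.credit_limit = credit_limit
        self.sales_orders = []          # direct references, no join table needed

class Vendor:
    def __init__(self, payment_terms="net 30"):
        self.payment_terms = payment_terms

class RentalCustomer(Customer):
    # Adding a new kind of customer is just a subclass, as described above.
    def __init__(self, credit_limit=0, license_no=None):
        super().__init__(credit_limit)
        self.license_no = license_no

# One legal entity can play several roles at once.
acme = LegalEntity("Acme Corp")
acme.add_role(Customer(credit_limit=50_000))
acme.add_role(Vendor(payment_terms="net 45"))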

The designer/programmer still needs to be concerned with database normalization and other best practices, but ODBMS systems have far fewer limitations for your database design — compared to RDBMS design considerations.

Some ODBMS platforms — Although many ODBMS systems support multiple object-oriented languages, typically Smalltalk, Java, C++, our favorite is Smalltalk. IBM Smalltalk has been spun off to Instantiations - VAST Platform and there’s VisualWorks (VisualWorks® Overview) — both used by Fortune 1000 Enterprises. A great open-source Smalltalk is Pharo (Phar.org) with several object database libraries available. I’ve used the ACID compliant OmniBase.

One of the very best ODBMS systems is Gemstone/S (Home) which supports Smalltalk and Java.

Here’s a link to some code to make a database connection; save an object instance; find an object; indexing; and garbage collection of unused instances in the database. sebastianconcept/Aggregate

Profile photo for Greg Kemnitz

My (short) experience at big tech seems to indicate that developers themselves do most of the database design.

Sometimes it’s good, sometimes it isn’t, and often the db is small enough that it doesn’t really matter (even in seemingly big companies that you’d normally think of as having huge dataworlds; not every database at “Big Tech” is measured in exabytes).

And yes, even at “Big Tech” I often find people doing stuff like fetching down most of a table and doing what amounts to joins in application code, often because developers “forget” that database query languages can do qualifying and filtering far better than your app can do in a “for loop”.
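
A minimal illustration of that point, using sqlite3 and made-up table and column names:

import sqlite3

conn = sqlite3.connect("example.db")  # made-up database and schema
cursor = conn.cursor()

# The anti-pattern: fetch the whole table, then filter in application code.
all_orders = cursor.execute("SELECT customer_id, total FROM orders").fetchall()
big_orders_app = [(cust, total) for (cust, total) in all_orders if total > 1000]

# Letting the database qualify and filter instead: far less data is moved,
# and the query planner can use an index on total.
big_orders_sql = cursor.execute(
    "SELECT customer_id, total FROM orders WHERE total > ?", (1000,)
).fetchall()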

Most people doing relational database design know enough to avoid normalization issues. Where I see problems is in a sort of misplaced cleverness: people who get excessively “cute” and think that a gross but “correct” query is a good thing to run dozens of times on a busy production system. You often have to work with them to simplify their query or break it apart to use temporary tables, etc.

I’ve worked with some developers doing schema design, and have had good experience in getting them to learn good methodologies. I don’t typically encounter database schemas until they’ve been deployed and are already sorta broken to the point where they’re causing production issues…

Profile photo for Bruce A McIntyre

The first step in designing a relational database is to understand what the desired output is going to be: what reports, queries, printouts, and forms are going to be required.
Then you need to understand how the different data elements of these outputs relate to each other.
Then you need to understand the structure of the data: which elements are dependent on others, what pieces are unique, and what pieces are related or dependent on others.
Then you need to figure out where this data is going to come from. Is it all new? Is some already somewhere else? Do users need to enter it?
Then you can look at defining "records" or tuples of data, and deciding which fields will be used to access each record (defining indexes).
Then you need to define the preferred size of each data element: how many characters for text, how many digits, logical data, raw or unstructured data.

Now you have the information needed to define the database. Create an initial database and fill it with test data so you can define the inputs and outputs (you may need lookups, data integrity checks, required data elements and optional data elements.)

Test your database to see if it meets the requirements as specified in the first steps. Then fix what doesn't work as expected.

Now if you are trying to create a different sort of database, one that is not relational, then there would be a very different path to get there.

Profile photo for Greg Kemnitz

This database isn't going to be big enough to "matter" much in needing weird performance hacks, so a straightforward ER-diagram-style normalized schema is fine.

Nowadays, any DB where tables aren't above O(10^7 recs) on a reasonably configured db server that isn't overloaded with extra apps doesn't need any weird performance hacks beyond just doing indexing correctly. If you get to 10^10 recs or more, you do, but unless every human on the planet is one of your contractors, this won't happen :)

You've got a reasonably normalized schema there - I'd probably go with Surbhi Chadha's suggestion on the id changes.

Profile photo for Gaëtan Gates Perrault

There are many DBs
To start, any "ad network" or "ad platform" is likely going to involve several different databases at several different parts of the pipeline. Think of the basic ad-serving pipeline; each of the following steps could easily represent a different type of DB:

  1. Identify ad requester (publisher) and load their data.
  2. Identify user requesting ad (cookies, uids) and load their data. (what have they seen? do they have demographics? etc.)
  3. Pass this data off to the optimization system and get a list of recommended ads to display.
  4. Load the data for the recommended ads and filter out ads as required (seen too often, blacklisted from site, out of budget, etc.)
  5. Render the ad with a pixel and write out the cookie.
  6. Process the pixel for the impression and eventually the click.
  7. Run fraud tracking on all of this stuff.
  8. Get real-time stats for your internal team.
  9. Get roll-up stats for your publishers and advertisers and support staff.


MongoDB is great for some of these, like #8 and possibly #1, #2, and #4. Fraud tracking (7), optimization (3), and roll-up stats (9) will all need some form of Map/Reduce system (like Hadoop; Mongo is insufficient here). To display roll-up stats, you probably want an SQL database, which makes it easy to slice data for basic reports.

The engine for #6 (pixels) is probably just a series of flat files; take a look at Google's Protocol Buffers for some ideas on how this data can be passed between servers.

So what part are you doing?
It's not 100% clear what part of this you are trying to do. Are you purely just farming ads around? Are you a network of networks? How tight are your timelines?

If I had to build an Ad Network with "hundreds of billions of reads and writes per day", I would start looking at what Google and Facebook are doing. Frankly, they may be the only people doing "ads" at that level. The internet has about a billion users, so if you show 100 billion ads per day, you're showing 100 ads to every user of the internet every day.

There is almost no Quora answer I can give you that can convey the complexity of making that happen.

Profile photo for Ben Darfler

I would recommend taking a look at CRDTs (Page on Psu, Page on Hal). The quick gist is that CRDTs (Conflict-free Replicated Data Types) are mathematically shown to converge to the correct state when implemented on top of an eventually consistent database such as Riak, Cassandra, Voldemort, Dynamo, etc.

Specifically, they can be used to create distributed counters (among other data structures), an overview of which can be found at Playing with Riak and CRDTs - Counters. For a counter, the rough idea is that instead of keeping one value for the counter (where conflicting increments cannot be handled), you keep one count per node in your database. The real count is the sum of all the per-node counts, and you can easily reconcile merge conflicts by just taking the max of each per-node count.
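
A minimal sketch of that idea, a grow-only counter (G-counter), in Python; the node names and the API are illustrative only:

class GCounter:
    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}              # node_id -> that node's local count

    def increment(self, amount=1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self):
        # The real count is the sum of all per-node counts.
        return sum(self.counts.values())

    def merge(self, other):
        # Conflicts resolve by taking the max of each per-node count.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

# Two replicas counting clicks independently, then converging after a merge.
a, b = GCounter("node-a"), GCounter("node-b")
a.increment(); a.increment()
b.increment()
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 3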

With CRDTs as your base you can move on to other optimizations such as batching counter updates like Twitter does (Rainbird: Realtime Analytics at Twitter (Strata 2011)).

Though I personally like the mathematical underpinnings of CRDTs you could also go a completely different way and follow Facebook's example (High Scalability - High Scalability - Facebook's New Realtime Analytics System: HBase to Process 20 Billion Events Per Day)

Profile photo for Jennie Hoch

Designing a database is in fact fairly easy, but there are a few rules to stick to. It is important to know what these rules are, but more importantly is to know why these rules exist, otherwise you will tend to make mistakes!

A good database design starts with a list of the data that you want to include in your database and what you want to be able to do with the database later on. This can all be written in your own language, without any SQL. In this stage you must try not to think in tables or columns, but just think: "What do I need to know?" Don't take this too lightly, because if you find out later that you forgot something, usually you need to start all over. Adding things to your database is mostly a lot of work.

It helps produce database systems

  1. that meet the requirements of the users, and
  2. that have high performance.

The main objectives of database design are to produce logical and physical design models of the proposed database system.

The logical model concentrates on the data requirements and the data to be stored independent of physical considerations. It does not concern itself with how the data will be stored or where it will be stored physically.

The physical data design model involves translating the logical design of the database onto physical media using hardware resources and software systems such as database management systems (DBMS).

For more details, check https://www.dbdesigner.net

Profile photo for Michael Hausenblas

Let's step back a bit. Following the polyglot persistence mantra, a single (NoSQL) database won't fit the bill, given your requirements.

I recommend looking into Nathan Marz's lambda architecture, see the Big Data book (chapter 1 for free), and the slide deck A real time architecture using Hadoop and Storm as well as An example “lambda architecture” for real-time analysis of hashtags for examples.

Once you appreciate the power and flexibility of it, the choice of the databases used should be easier.

Well, there are entire books and people study majors covering this, but let’s do a quick and dirty step by step:

  1. Understand the problem you're trying to solve with your data. For example, let's say you need to create a database for a pharmacy's inventory stock: what are the questions your database must answer? Items? Grouping? Customers? Providers? Users? Employees? Expiration dates?
  2. You should always start with the outputs your database must support: for example, what reports is the software that consumes this database going to need? What validations must be enforced? What business rules should be in place?
  3. Do try to learn what Database Normalization is, in order to avoid duplicate data.
  4. Think ahead of the possible errors users will make, AND THEY WILL.
  5. Do try to create keys that are relevant to each table, think ahead of the possibilities, for example, a customers table in U.S. may contain a social security number as a key, but what would happen if a few dozen customers are foreigners? then you can’t use that column as primary key.
  6. Not NULL and Null columns are important, do not let a column go null if it is important to fill.
  7. When possible, use GUIDs instead of autonumber columns. The reason is that if you have several databases in different geographic locations that can easily become disconnected, autonumber keys will give you a big problem with sharding or replication (see the sketch after this list).
  8. Do create foreign keys; nothing bites harder in the arse than unprotected child records.
  9. If you want security on your database, create triggers and stored procedures to encapsulate data CRUD operations. Then you can prevent developers from getting direct access to your base tables, and this will prevent garbage on your tables and save both you and the developers a lot of headaches down the road.
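
The sketch promised in point 7, using Python's uuid module with sqlite3; the table and column names are made up for the pharmacy example:

import sqlite3
import uuid

conn = sqlite3.connect("pharmacy.db")  # made-up example database
conn.execute("""
    CREATE TABLE IF NOT EXISTS customers (
        customer_id TEXT PRIMARY KEY,   -- GUID, safe to generate on any node
        name        TEXT NOT NULL
    )
""")

# Two disconnected sites can both insert rows without ever colliding on the key,
# which is exactly the problem autonumber columns run into under sharding or replication.
conn.execute(
    "INSERT INTO customers (customer_id, name) VALUES (?, ?)",
    (str(uuid.uuid4()), "Jane Doe"),
)
conn.commit()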

Hope this helps.

We use Aerospike at Adfonic, as one of the other posters mentioned and I would highly recommend it for this use case. We use it primarily as a key-value store, which is one particular part of the NoSQL world.

There's an article about our general work with big data here:
Adfonic processes 50,000 mobile ads per second with big data architecture

50,000 is a lot, and we're growing quickly -- in a desktop advertising environment, you very quickly can get into hundreds of thousands per second (of course, typically this will be distributed amongst various data center locations).

100% uptime is a key requirement to consider. That means hot upgradability, completely reliable failover, all the rest. In RTB, time is money.

Profile photo for Mike West

The top data person in every company on earth right now is the DBA.

Data architect is just a DBA that knows schema architecture and more importantly how to install a database for maximum CPU, Memory and IO performance.

Let me say that again in a different way. The location of the data and log files is far more important than how the schema is designed.

Data architects often think they know more than front line DBA but that’s never the case.

Schema means how the tables are laid out. A famous architecture that's rarely followed is the third normal form. After 30 years, I've worked on about 10 databases that were designed correctly outside of Microsoft.

They are some of the most technically astute people in every company I’ve worked at.

The data engine is often vaunted as the most complicated piece of software on earth.

After working at several big tech companies I can say without hesitation some of the most technical people I’ve ever seen were the database administrators.

Profile photo for Jayaraman Sampathkumar

This is a classic orders table. A better format for the department-items columns would be like this:

ORDERS
Order_Id
Order_Date
Department_Id
Order_Status

ORDER_LINE_ITEMS
Order_Id
Item_Id
Count
Item_Unit_of_Measure

Why do you need unit of measure? For example, let's say you order 2 bundles of A4 printing paper. If the table has |A4-paper|2|, does that mean 2 sheets, 2 dozen, or 2 bundles? A well-designed table might have |A4-paper|2|bundle|.

Profile photo for Jason Dusek
--- Things that are unlikely to change even once in a person's life, de facto
--- identifying information.
CREATE TABLE person (
    nic text PRIMARY KEY,
    name text,
    date_of_birth timestamptz,
    sex text CHECK (sex IN ('m', 'f'))
);

--- Additional information about a person.
CREATE TABLE personal_information (
    nic text PRIMARY KEY REFERENCES person
        ON DELETE CASCADE ON UPDATE CASCADE
        DEFERRABLE INITIALLY DEFERRED,
    religion text,
    domicile text,
    male_guardian text,
    married boolean,
    address text,
    email text,
    computer_lit boolean
);

CREATE TABLE qualification (
    nic text NOT NULL REFERENCES person
        ON DELETE CASCADE ON UPDATE CASCADE
        DEFERRABLE INITIALLY DEFERRED,
    s_no text NOT NULL,
    qualification text NOT NULL,
    institution text,
    grade text,
    year date,
    PRIMARY KEY (nic, s_no, qualification)
);

CREATE TABLE training (
    nic text NOT NULL REFERENCES person
        ON DELETE CASCADE ON UPDATE CASCADE
        DEFERRABLE INITIALLY DEFERRED,
    s_no text NOT NULL,
    course text NOT NULL,
    institution text,
    country text,
    starting date,
    ending date,
    PRIMARY KEY (nic, s_no, course)
);
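
One thing worth illustrating in this schema is the DEFERRABLE INITIALLY DEFERRED foreign keys: inside a single transaction you can insert rows in whatever order is convenient, and the constraints are only checked at commit. A minimal sketch, assuming PostgreSQL with psycopg2 and a made-up NIC value:

import psycopg2  # assumed driver; the timestamptz/DEFERRABLE syntax above is PostgreSQL

conn = psycopg2.connect("dbname=people")  # hypothetical connection string
with conn, conn.cursor() as cur:
    # The child row goes in first; the deferred foreign key to person is not
    # checked until the transaction commits.
    cur.execute(
        "INSERT INTO qualification (nic, s_no, qualification, institution, year) "
        "VALUES (%s, %s, %s, %s, %s)",
        ("00000-0000000-0", "1", "BSc Computer Science", "Example University", "2010-01-01"),
    )
    cur.execute(
        "INSERT INTO person (nic, name, date_of_birth, sex) VALUES (%s, %s, %s, %s)",
        ("00000-0000000-0", "Jane Doe", "1985-05-01", "f"),
    )
# Leaving the `with conn` block commits the transaction (or rolls it back on error).
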
Profile photo for Clive Thomas

This is one of the most complex problems in current-day programming.

The majority of designers never learn how to do this well.

Theoretically, you simply lay out the data in tables and then perform normalization.

This, however, doesn't even begin to describe the problem.

Firstly, you need to know what data you are going to need to store and retrieve. At the start of the project (when you are designing the database schema at first) you typically don't know enough about the data requirements to be able to specify all the data items in the right way.

Secondly, you have to select the right level of normalization. Going 3rd normal form can complicate programming, sometimes to a degree which is not convenient. Stopping at 2nd normal form can sometimes result in data redundancy and duplication, with other complications down the line.

So, in practice, what you often have to do is do a first-stab schema layout, using the best knowledge of the data requirement you have at the time, and very often only go to about 2nd normal form, because you are probably going to have to revise it anyway.

Then, as the project progresses and your understanding of the data requirement improves, you iteratively redesign the schema again and again, taking some tables to 3rd normal form as you see the requirement for this, perhaps leaving others at 2nd or somewhere in between because that works out to be the best compromise, and the whole time improving the match between the schema design and the real-world problem until you have a solution which is reasonably close to optimal.

One important issue is not to become emotionally attached to any part of the schema layout at any time. Even the cutest ideas can later prove to be non-optimal, and you have to be prepared to abandon them for a better solution at any time. This is one of the hardest aspects of schema design, and one of the most common failings with most designers.

Very few designers do this well. Most database schemas are quite horrifyingly bad.

Profile photo for Christopher Smith

Cassandra has native support for Spark, so there is not necessarily a need for Hadoop. There are plenty of other systems targeting this space (Aerospike, Druid, and Couchbase come to mind immediately). Some of the BaaS solutions out there from cloud providers (like AWS Cognito) can provide a limited, but perhaps effective enough, solution for targeting. There are also packaged solutions out there like MetaMarkets, AppNexus, etc…

You can, however, decide to forgo offline learning entirely and do the work while processing your data in real time with online algorithms (using frameworks like Storm or Spark Streaming for complex routing).

You can also offload almost all the work onto the client (particularly for browser- or app-focused systems). Basically, just have a server-side piece for learning/identifying useful features and let all the state & learning be stored on the client. This has the advantage of minimizing infrastructure investment.

In general, we're well past the point where handling the load & analysis requires forging new technologies. There are lots of off-the-shelf components that can get the job done for you.

Profile photo for Andrea Tani

Thank you for the A2A

What you are looking for is a many-to-many relationship between the Department table and the Item table, which can be accomplished by the Department-Item table, as you correctly assumed.

I'd add a boolean field to it in order to determine whether or not a relationship (defined by a record) is valid; since you cannot change a record, you should be able to mark it as invalid and write a new one with the same department-item key (DepartmentID and ItemNo).

Without trying to be too complex, I think you designed this relation in the correct way, but keep in mind that more fields can be added to the relationship table (like valid-from and valid-to date fields to define a time scope); these will be business logic decisions.

I hope this helps

Profile photo for T.S. Lim

Below are my suggestions. Table and their columns.

Departments

  • ID
  • Name


Items

  • ID
  • Name


Orders

  • ID
  • Date


OrderItems

  • ID
  • Order ID
  • Department ID
  • Item ID
  • Quantity


(Table names suggested are in accordance with Ruby on Rails conventions)

Basically, your original solution is just missing the Order table.

Profile photo for Finnbogi Ragnar Ragnarsson

The most obvious thing is three tables. One for the personal information with that table linked to a qualifications table and a Trainings/Courses table.
The latter have an obvious many-to-one relationship to the first.

That said, there could be other related tables, for information you want to be coded/quantifiable, such as religion, post code, country, schools, etc.

Profile photo for Surbhi Chadha

1) Use scheme_id instead of scheme name in tbl_tender.
2) Remove scheme_id from tbl_work_order; you can access this field via tender_id (as scheme_id is present in tbl_tender).
3) Remove tender_id from the constructor_details table (again, you can access tender_id from work_order_id).

Profile photo for Quora User

Hmm, well, to start with, there are different objects and 1-to-many relations. So a single table will not do.
Simplified, you would have at least a Person table with some PersonID.
Then you might have a Qualification table with a link to person.PersonID.
Similarly, you would have a Training table, again with a link to person.PersonID.
Then you might detect a number of simplifications for input. For instance, there are only a couple of hundred countries or so; it might be useful to put those in a table and make the input a selection list. There might be a limited list of Institutes for trainings as well, unless you include homeschooling as certified.
I am somewhat puzzled by the inclusion of Religion. Besides, there are Date of Birth and Father's Name; such information, as well as Address, should be password protected.
The real question to ask would be: what do you aim to do with the data? Because that makes a lot of difference. An example: if you plan to phone the institute to check whether the provided information is valid, then it follows that you must be sure about the correct name, and you should have additional fields in your Institute table, which might include contact persons, email, or contact numbers.
If you plan to have some reports done, then it makes a lot of difference what items become fields.
Enjoy

Profile photo for Finnbogi Ragnar Ragnarsson

I doubt your final version will look like this.

There is possibly going to be financial data, and all kinds of complications thereof, that need to be stored in the database. Also, the schemes can incur additional costs from unforeseen circumstances, or the contractor may not keep his end of the bargain. Be prepared to be asked to store that data as well, along with fines, etc.

Resist solving those issues by adding columns to existing tables when the data really should be in additional tables.

Surbhi Chadha has pointed out three errors, but everything else looks reasonable.

Profile photo for Quora User

Get a report from the external adserver.
Or, you serve their iframe ad tag via your adserver & piggyback a conversion tracker.
There is no other way to track.

Profile photo for Andreth Salazar

You can use the Event Scheduler in MySQL.

23.4 Using the Event Scheduler

It has pretty good support for running SQL statements on a recurring interval or at a fixed date and time.
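
As a sketch of what that can look like, assuming the mysql-connector-python driver, a table like the hourly_metrics example earlier in this thread, and a rolling last-hour rollup (all of which are assumptions, not part of the original answer):

import mysql.connector  # assumed driver (mysql-connector-python)

conn = mysql.connector.connect(
    host="localhost", user="ads", password="secret", database="ads"  # hypothetical credentials
)
cur = conn.cursor()

# The Event Scheduler must be enabled (requires sufficient privileges).
cur.execute("SET GLOBAL event_scheduler = ON")

# A recurring event that rolls up the last hour of clicks once per hour.
cur.execute("""
    CREATE EVENT IF NOT EXISTS hourly_click_rollup
    ON SCHEDULE EVERY 1 HOUR
    DO
      INSERT INTO hourly_metrics (ad_id, `hour`, clicks_count)
      SELECT ad_id, NOW(), COUNT(*)
      FROM Clicks
      WHERE `timestamp` >= NOW() - INTERVAL 1 HOUR
      GROUP BY ad_id
""")
conn.commit()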

Profile photo for Alan McClanaghan

I was involved in the development of an ad-serving platform where an Apache plugin/module was developed for serving the ads and the logs were then crunched in real time for metrics (Hadoop/Pig). Excellent use of existing technology.

Profile photo for Vaibhav M Kite

You can use the curl utility to listen to that port while dumping the output to a file or whatever destination you want, and then simply put the whole command in cron with whatever execution frequency you want.
