How to design an LDAP schema?

A/B Testing: Is a star schema database design useful for tech companies that continuously experiment?

  • Do companies that are leaders in continuous experimentation and analysis of user microbehavior use star schema (a la Kimball or enterprise data warehouse)? If not, what kind of database architecture do they use? The companies I'm thinking of are ones like Airbnb, LinkedIn, Twitter, and Facebook. Each of these companies pride themselves on the ability to run dozens of experiments simultaneously and measure user's micro-behaviors down to the click. They boast of being able to link that clickstream data to user histories and other outcomes. Without rigorously enforcing a dimensional data model across the company that has conformance, tracking of history, etc., I don't understand how it would be possible to organize the data.

  • Answer:

    Yes, these companies use a dimensional model in some way for reporting purposes.  The dimensional model is a significantly easier model for business analysts and decision makers to understand and it plays very nice with existing reporting tools. Since I did consulting work at Facebook doing almost exactly what this question is asking, I'll explain how this worked based on my personal experience. The biggest incorrect assumption made in the question is that these companies choose only one technology in which to do their analysis.  IE, they only use a relational database and a dimensional model to store and analyze all their data and thus the kind of click level analysis must be done in that form.  This is simply untrue. When it comes to running experiments and analyzing click level details, this analysis is generally done in Hadoop & Hive.  Hadoop  The reasons are that scraping log data and running complex analysis on it requires very custom programming.  Data scientists will write all kinds of crazy code to analyze this stuff.  I call this kind of analysis exploratory, in that the company is attempting to know an unknown.  Once the company then knows, they will generally create a repeatable and measurable process to continue to validate.  This is where your relational data warehouse and star schema come into play.  Now that we know what specific data points we want from Hadoop/Hive, we can transform and extract it and put it into a "reportable" format on a platform much more conducive to traditional analytics (allowing users to slice-n-dice the data if you will). There's also cases where we simply want to measure the impact of making changes to an existing process.  If we are already collecting and measuring data through a dimensional model, it actually becomes very easy to perform A/B testing.  For example, if I had a subject area in my warehouse which measures page impressions of a particular page (every day measure how many hits the page gets) and we perform some a/b testing on a design change with the intent of increasing hits on a page, it would be very easy to simply measure, through the star schema and reporting tools, what effects any a/b testing had on the impressions of that page. The biggest take-away here is that these companies don't stick to a single technology, process, or model to solve their analytics problems.

Chris Schrader at Quora Visit the source

Was this solution helpful to you?

Related Q & A:

Just Added Q & A:

Find solution

For every problem there is a solution! Proved by Solucija.

  • Got an issue and looking for advice?

  • Ask Solucija to search every corner of the Web for help.

  • Get workable solutions and helpful tips in a moment.

Just ask Solucija about an issue you face and immediately get a list of ready solutions, answers and tips from other Internet users. We always provide the most suitable and complete answer to your question at the top, along with a few good alternatives below.