
For real-time ad personalization, is the heart of the system a fast database doing queries with block storage, or a "big data" analytics engine connected to file-based storage?

  • Apologies in advance if asking this as an either/or is frustrating. My interest is in which sort of storage is more applicable to real-time ad personalization. Databases tend to connect to storage at the block level, while many large compute server farms exchange data with their nodes as files, via a file system. For real-time ad personalization, which is more common and important?

  • Answer:

    In almost any case you will have to use a combination of multiple datastores. Depending on your needs, there are many tradeoffs that can, and should, be made. There is no one-size-fits-all solution.

    There are very fast datastores that run entirely in RAM with persistence to disk. Use cases: every single record must be retrievable with very low (typically sub-millisecond) latency. For example: Redis.

    On the other hand, you have batch processing systems which are good at large block files (sequential throughput in GB/second can easily exceed that of a full RAM database serving random requests). Use cases: non-real-time analytics, data science training, etc. For example: Hadoop/MapReduce/YARN.

    Then there are hybrid systems that run both in RAM and on disk (which can be SSD). This is very convenient, as you can achieve low-latency requests on hot data and still have a lot of warm data available on disk. Use cases: real-time data without strict latency requirements. For example: Cassandra.

    So you will probably end up using a few different systems, which can then feed each other: for example, the data collected in real time in Redis / Cassandra can be used in a long-running job in Hadoop, which then writes its results back into one of those real-time datastores.
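The feedback loop between stores can be sketched in a few lines. This is only an illustration under stated assumptions: a plain dict stands in for the low-latency store (the role Redis plays above), a list stands in for the event log a batch system would scan, and the names `record_event`, `run_batch_job`, and `lookup_profile` are hypothetical, not part of any real library.

```python
from collections import Counter, defaultdict

# Assumption: a dict stands in for the in-RAM key-value store (Redis-like role).
hot_profiles = {}
# Assumption: a list stands in for the append-only log a batch system would read.
event_log = []


def record_event(user_id, category):
    """Real-time path: append a raw event as cheaply as possible."""
    event_log.append((user_id, category))


def run_batch_job():
    """Batch path: scan the whole log sequentially and aggregate per user.

    The result is written back into the hot store so that the serving path
    only ever needs a single key lookup.
    """
    counts = defaultdict(Counter)
    for user_id, category in event_log:
        counts[user_id][category] += 1
    for user_id, per_category in counts.items():
        top_category, _ = per_category.most_common(1)[0]
        hot_profiles[user_id] = {"top_category": top_category}


def lookup_profile(user_id):
    """Serving path: one key lookup, no scanning or aggregation."""
    return hot_profiles.get(user_id)


if __name__ == "__main__":
    record_event("u1", "sports")
    record_event("u1", "sports")
    record_event("u1", "travel")
    run_batch_job()
    print(lookup_profile("u1"))
```

The point of the split is that the expensive aggregation runs off the request path, while the request path touches only precomputed data.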

Robin Verlangen at Quora


Other answers

The answer is: both, and neither on its own. Personalizing an ad is a complex process that uses several subsystems. Each of these subsystems serves a specific purpose, and it is likely that some will use traditional databases, others will use "big data engines" (Hadoop, Storm, etc.), and real-time processes will use an in-memory cache (local structures, Memcached, Redis, etc.).

Ad personalization has two main components:

 - Discovering users' interests and behaviors
 - Deciding if one of your ads is interesting for a user when a bid request comes in

Let's consider these two components separately.

1) Collecting data on users

The data comes from different sources: pixels, data providers, first-party data from your clients, etc. You store it in traditional databases that hold everything you know about each user (the key is the user).

2) Bidding

When a bid request comes in from an exchange, you have two priorities:

 - Getting the data that you collected for that user: a simple query to any traditional row-based database is enough, as long as it is correctly cached.
 - Running your bidding model to decide whether, and how much, you should bid for that user and impression. Your model can be trained via "big data engines", and the final model's data is stored in files.

"Big data" engines usually do not run in real time when serving a request.
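The bidding step described above (a cached user lookup followed by a model loaded from a file) might look roughly like the sketch below. Everything here is an assumption made for illustration: `USER_DB` is a dict standing in for the row-based user database, `MODEL_JSON` stands in for the model file an offline training job would produce, the logistic scoring is just one possible model form, and the function names are hypothetical.

```python
import json
import math
from functools import lru_cache

# Assumption: a dict stands in for the row-based user database (key = user).
USER_DB = {
    "u1": {"sports": 1.0, "travel": 0.0},
    "u2": {"sports": 0.0, "travel": 1.0},
}

# Assumption: this string stands in for a model file written by an offline
# training job; the serving process only reads it.
MODEL_JSON = '{"bias": -1.0, "weights": {"sports": 2.0, "travel": 0.5}}'
MODEL = json.loads(MODEL_JSON)


@lru_cache(maxsize=10_000)
def get_user_features(user_id):
    """Cached lookup; repeat requests for a user skip the database."""
    features = USER_DB.get(user_id, {})
    # Return an immutable tuple so the cached value cannot be mutated.
    return tuple(sorted(features.items()))


def predict_click_probability(user_id):
    """Score the user with a simple logistic model (illustrative only)."""
    z = MODEL["bias"]
    for name, value in get_user_features(user_id):
        z += MODEL["weights"].get(name, 0.0) * value
    return 1.0 / (1.0 + math.exp(-z))


def decide_bid(user_id, max_bid=2.0, threshold=0.5):
    """Bid in proportion to the predicted probability, or 0.0 to pass."""
    p = predict_click_probability(user_id)
    return round(max_bid * p, 4) if p >= threshold else 0.0


if __name__ == "__main__":
    for uid in ("u1", "u2", "unknown"):
        print(uid, decide_bid(uid))
```

Note that nothing in the request path trains or scans anything: the heavy work happened offline, and serving is a cache hit plus a few multiplications.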

Anonymous
