Does having many databases affect MySQL performance?

Relational Databases: Would it be premature optimization to have larger tables vs lots of small tables in MySQL?

  • I feel stupid for asking this, but I'm wondering how having LOTS of small tables (with complex relationships) would affect performance and memory constraints vs a large, less neatly organized table.

  • Answer:

This is a great question (not stupid at all!) -- and (un)fortunately one with which I have first-hand real-world experience.  To wit: In the mid-1990s, my small company had become the provider of a centerpiece application in a major automation/technology delivery project for a major East Coast city's corrections/prison department.  Ours was a comprehensive suite of corrections management applications, backed by a large, fairly complex and well-designed (through careful thought, backed by practical experience to date) relational database, all being deployed at that customer site on some major dedicated "iron" (big, redundant and over-spec'd servers).  Not to be coy about it, the servers were then-state-of-the-art DEC AlphaServers with "giant resources" (multiple processors, RAM, multiple IO disk channels, a big disk farm with ample disk space, network bandwidth to spare, etc.), the VMS (OpenVMS) operating system with full VMS Clustering support, Oracle Rdb (relational database) and... of course... our own applications and middleware systems software.

Just to give you a rough sense of business scale, the initial project delivery was valued in the low tens of millions of dollars, including hardware, software, licenses, customizations, PM services, external interfaces, training, deployment, documentation and other related services.  The overall system would handle the data records of over a hundred thousand inmates, of whom several thousand would be incarcerated at any point in time, with over a thousand interactive users (line corrections officers, security personnel, staff specialists and management) interacting with live data 24x7.
We (my small company) had deployed smaller-scale versions of this entire solution several times before, and in particular, our database schema (design, inter-relationship and deployment) of the several dozen tables which comprised the corrections relational database was considered to be a "practically normalized by-hand design," which we had proven for performance, reliability and maintainability over and over by that point.  It was a "mature" design, and had been "tested and tempered in the real world."  Note that our relational database design was very far from "everything in one big table"... but it was also equally far from the "everything in its own small table... lots of tables" approach.  That design encompassed around 60 or so physical tables, and probably at least 30 or more views.  Table indexes were well defined, based on actual query metrics and experience, and primary-foreign key relationships and other constraints were meticulously designed.

As mentioned, our product, including company and personnel, was the "application centerpiece" -- we were the reason that the whole delivery project existed.  Unfortunately, that did not mean that we were in technical or business control of the overall project itself (probably fortunate, as it was a highly politically-charged project, not just a big technology deal), so although we had a large say in technical deployment decisions, ours was not the final word.  That "final say" usually went to the primary contractor and its Project Manager (PM), members of a project delivery team from Digital Equipment Corporation, pandering to keep this major government customer happy.

Clearly, with this much money and politics riding on the project, there was a long period of software customization, delivery tasks, documentation, training and testing to get through before we ever got to the big "Go Live" day.  Unfortunately, this pre-delivery period afforded just enough opportunity for some technological mischief.
A couple of months before the "Go Live" day, the prime contractor's team was augmented to include a couple of DEC-internal technology "big shots": two guys with well-respected though inflated reputations for "helping" and "troubleshooting" high-profile deliveries around the nation and even internationally.  Of course, one of their first tasks on this project was to "review everything" and to "make recommendations" based on their supposed expertise to "improve stuff."

So, they duly took a shot at our application's relational database... and immediately pronounced it lacking.  "It's not canonically normalized," one of them determined sagely.  "It's pretty good, but it'd be much better if every logical data grouping was stored in its own data table, especially in 'these specific areas' where you guys" [meaning us hick-rubes who designed and built it] "have mixed in some data redundancies."  "Yes," agreed the other expert, with a smile on his face: "But we can fix it."  [Insert long, drawn-out and bloody politico-technical battles here... Unsurprisingly, we lost.]

Over the next few weeks, our "experts" cranked out a pile of Perl scripts which effectively rewrote our entire database schema, breaking up our roughly 60 physical tables into "canonical" relational atoms of well over 400 tables, each assigned to and stored in its own physical database area.  This was destined to become "the Production Database."

Of course, the two experts were well pleased with themselves, and the PM and the rest of the prime contractor staff took it as a given that their experts knew best and couldn't be wrong.  Light continuity testing showed them that "everything still worked," and therefore all would be well.  We, on the other hand, waited for the boot to drop.  As you can probably guess from the story so far, all seemed fine to both the primary contractor and the expectant customer (over our "dead bodies")... But you'll never guess (or will you?)
what happened on "Go Live" day!...  After months of prep-work, and enormous political expectations by city officials and users, "Go Live" was actually scheduled for an evening shift, which seemed prudent: we'd "ease into" our typical daily workload over the graveyard shift, rather than just jumping in on a busy morning.

So, after "turning it all on" for that evening's users, things seemed to operate okay, at least for the first few hours... Actually, not much real work was being done at first, and most folks were simply admiring their shiny new application screens.  For us, it had already been a long day, so we retired to hotel rooms for a night's sleep.

Only to be awoken at 2:30am by frantic and irrationally demanding phone calls from the PM: "Everything's frozen! The whole application's down!  The sys-admin says all VMS resources are used up, and the operating system itself is dying on both servers!  What did you guys do??!!! Why is it broken... You've got to fix this instantly!!!"  Yes, PMs really do shout and scream like this.

Long story short: after days of egg-on-face delivery-team excuses, and lots of hard work from us "rubes" (btw, the "experts" had already moved on to go "help" another project), we were able to demonstrate and prove that the "canonical normalization" of our originally-working-just-fine database schema into lots-and-lots-of-little-tables was dragging these twin-redundant, huge and over-built AlphaServer/VMS systems to their virtual knees.
Because those two arrogant "experts" would not take the time to look at actual application-data query, insert and update patterns and metrics, insisting instead that the academic "canonical" design approach was a one-size-fits-all guarantee, they conveniently overlooked all the practical operational relational database implementation issues, like:

  • Global buffers and other system-wide resources and parameters, all of which directly affect user access performance

  • Area, table and row locking mechanisms

  • Index interactions, including locking and updates

  • Primary and foreign key constraints (and others)

  • Performance distinctions between b-tree and hashed indices

  • Data caching mechanisms

  • Quiet-point locks, especially for live database backups

  • Application-specific data access (select, insert, update, delete) metrics

  • Per-process (user) resources and quotas, which directly affect both database access and overall system tunability

  • ...and much, much more

In short, going live with their "canonically normalized" schema had simply swamped and consumed nearly all available multi-user and system resources: RAM/memory, buffers, CPU, disk IO, paging/swapping, and more.  This "help from experts" conclusively demonstrated the stupidity of the "hey, let's normalize everything into cute and tiny little tables" approach to relational-db design... (and, just for comparison, with a real-world application of this practical size, any attempt to design it all as "just one big logical table" would wreck on the shores of needless complexity, if it were possible at all).  Needless to say, over the next few weeks, our combined team managed to save the day, fixing and patching system performance characteristics as much as possible to permit the overall system to meet performance expectations and requirements.
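To make the resource math concrete, here is a minimal sketch (using Python's built-in sqlite3 and invented table/column names, not the actual Rdb schema from the story) of why fragmenting one logical entity into per-attribute tables multiplies the work per read: every extra table means another join, another set of index lookups, and another set of locks and buffers, for the exact same logical answer.

```python
# Illustrative sketch only: compares one logical "inmate record" read
# against a pragmatic schema (one table) vs an over-fragmented "canonical"
# schema (every attribute group in its own tiny table).
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Pragmatic schema: related attributes kept together in one table.
cur.execute("""CREATE TABLE inmate (
    id INTEGER PRIMARY KEY, name TEXT, cell_block TEXT, status TEXT)""")

# Over-fragmented schema: each attribute group split into its own table.
cur.execute("CREATE TABLE c_inmate (id INTEGER PRIMARY KEY)")
cur.execute("CREATE TABLE c_name (inmate_id INTEGER, name TEXT)")
cur.execute("CREATE TABLE c_cell (inmate_id INTEGER, cell_block TEXT)")
cur.execute("CREATE TABLE c_status (inmate_id INTEGER, status TEXT)")

cur.execute("INSERT INTO inmate VALUES (1, 'Doe', 'B', 'active')")
cur.execute("INSERT INTO c_inmate VALUES (1)")
cur.execute("INSERT INTO c_name VALUES (1, 'Doe')")
cur.execute("INSERT INTO c_cell VALUES (1, 'B')")
cur.execute("INSERT INTO c_status VALUES (1, 'active')")

# One logical read: the pragmatic schema touches a single table...
pragmatic = cur.execute(
    "SELECT name, cell_block, status FROM inmate WHERE id = 1").fetchone()

# ...while the fragmented schema needs a join (plus locks, buffers and
# index lookups) for every attribute group it was split into.
fragmented = cur.execute("""
    SELECT n.name, c.cell_block, s.status
    FROM c_inmate i
    JOIN c_name   n ON n.inmate_id = i.id
    JOIN c_cell   c ON c.inmate_id = i.id
    JOIN c_status s ON s.inmate_id = i.id
    WHERE i.id = 1""").fetchone()

print(pragmatic == fragmented)  # same answer; four tables instead of one
```

Multiply that per-read overhead by 400+ tables and a thousand concurrent users, and the buffer, lock and memory exhaustion described above follows naturally.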
Over the course of the project (months and years to come), our small company (mostly two of us principal partners) would do the work necessary to undo the "improvements" of the two high-falutin' experts, ultimately re-migrating the database schema back to what had originally been designed as "reasonable and optimal table layouts."  It took a couple of years, but we did ultimately restore our application and database to "normal operation," all without the acknowledgement or help of the original prime contractor staff or PM.

Moral of this tale?  Relational database design is never a simple-minded choice between the "normalize-everything" and "normalize-nothing" extremes -- real-world database designs have to serve real-world application queries, and the balance between "normalize or not" has to be based on actual knowledge of how the system's queries work and inter-relate.  Don't know what those relationships are?  Go build working prototypes, measure and characterize them.  It ain't easy -- and there's no cookbook approach -- but it's how usable real-world systems are built.
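"Build working prototypes, measure and characterize them" can be as simple as timing a representative query against a candidate schema before committing to it. A minimal sketch, again using sqlite3 with an invented stand-in table and query:

```python
# Illustrative sketch: characterize a representative access pattern on a
# prototype schema before committing to a design. Table and query are
# invented stand-ins, not the schema from the story.
import sqlite3
import timeit

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(10_000)])

def representative_query():
    """One query your real users would run constantly."""
    return conn.execute("SELECT v FROM t WHERE id = 4242").fetchone()

# Measure: seconds per 1,000 executions of the pattern.
elapsed = timeit.timeit(representative_query, number=1_000)
print(f"{elapsed:.4f}s per 1000 lookups")
```

Run the same harness against each candidate layout (fragmented vs consolidated), with realistic data volumes and concurrency, and let the numbers make the normalization decision.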

Lorin Ricker at Quora
