Overview

Every time a product owner says “We should pull in XYZ data to the mobile app” or “we need a new app to address this healthcare fail” an engineer needs to make two crucial decisions:

  1. Where and how should the new code acquire data the company currently has?
  2. Where and how should the new code record the data that will be newly created?

Sadly, the most expedient answer to both questions is “just use one of our existing databases”. The temptation to do so is high when an engineer need only add a database migration or write a query for a familiar database. The alternative might involve working with the organization’s infrastructure team to plan out changes to the operational footprint, then potentially making updates to the developer laptop setup.

Decisions expedient for today aren’t necessarily the best decisions for Rally’s long term delivery velocity. We recognized database reuse and sharing was fairly common at Rally, so we tried to stop to the practice in Spring 2017. We were concerned the company’s development speed and agility would eventually grind to a halt.

The downsides of sharing databases have been written about by many distinguished engineers. Part one of this article will talk about variations of this anti-pattern. Part two will then provide suggestions on how to avoid this practice using reliable messaging.

Examples used through the rest of this article will resemble one of Rally’s most common user scenarios: completing a mission and getting rewarded. Some missions reward users with Rally coins, other missions reward users with a credit to the user’s health savings account (HSA). From a Rally user’s point of view, completing a mission and getting rewarded is a single transaction, but under the hood it’s a highly-coupled transaction that spans several services and potentially invokes service calls to third-party services.

Variations of “Sharing Our Existing Database” anti-pattern

Use one single database, but use schemas to segment logical data partitions

RDBMS products commonly provide two different features to partition data: schemas and databases (the notable exception is MySQL which simply conflates the two concepts. A developer or DBA can create multiple schemas per database and apply schema level security policy to users. A developer or DBA can use multiple schemas in a database when customizing the table or object access per database.

An optimistic developer armed with these features might think he or she can take advantage of database schemas to avoid applications sharing databases. Given a lot of willpower and constant vigilance it’s entirely possible. However, the temptation will still be too great, and in a large engineering organization it could be difficult to constantly maintain discipline. If done correctly, then, the difference between using different schemas versus using different databases will be negligible and the latter will be simpler to maintain:

  • Separation of concerns will be violated: When the database is first designed and implemented, engineers will be more diligent and maintain the separation of concerns between schemas. But time pressures or gaps in the original data models will incentivize the engineers to just add permissions to query across schemas, and most likely not even clean up after themselves (i.e., move tables to newly appropriate schemas).
  • Tool support: Not all database tooling, libraries, and frameworks are 100% schema-aware or account for database schemas in their API design.
  • Database access patterns: The two different services have different database access patterns where at higher scale they might benefit from different database hardware configurations – Challenges has a OLTP access pattern whereas Rewards is more of an OLAP workload.

Example of multiple schemas: A hypothetical Rally database for the “challenges” and “rewards” services.

Example of multiple schemas: A hypothetical Rally database for the “challenges” and “rewards” services.

There are a few limited scenarios in which this might be a reasonable thing to do:

  1. License optimization: The organization is limited on database hardware, logical databases, or needs to optimize a licensing scenario or pricing scenario. For instance, Amazon Web Service’s managed SQL Server service (RDS) has a hard limit of 30 databases per database server. Using database schemas is a way to stretch hardware and licenses a bit further.
  2. Temporary stepping stone: Database schemas can be employed as a migration strategy where the eventual goal is for databases to no longer be shared between services.
  3. Mechanism for multi-tenancy: In some highly regulated on-premises systems, using a schema-per-user or schema-per-customer might be a way to enforce strict security policies while keeping the database more manageable than using completely separate databases per customer.

Using the “distributed transaction” feature of some databases

Enterprise-grade RDBMS products such as PostgreSQL and MS SQL Server provide functionality to commit a transaction across databases. They can do so either natively (PostgreSQL has dblink and MS SQL has cross database queries), or by the use of an external transaction coordinator. Developers and architects tend to reach for these solutions when they want to ease the transition to a service oriented architecture without rethinking the way that the organization’s software should be architected (e.g. “I’ll just refactor this component of our software into its own service and break out the tables it uses into its own database and we’re doing microservices now!”).

There are severe downsides of leaning on database distributed transactions.

  1. Uptime and availability: If one database or coordinator service is unavailable, then the entire business process will get wedged and fail, which means the uptime of the entire system is calculated from the combined probabilities of each subcomponent’s uptime.
  2. Debuggability and recovery: real world failures with distributed transactions can be difficult to debug and laborious to ensure that no data were lost.
  3. Increased infrastructure costs: Database and platform vendors limit the combinations of versions between the operating system, runtime, and database software (sometimes down to the individual patch level) to ensure these combinations are tested and supportable – which makes upgrading database software a far more complicated dance involving potential downtime.
  4. System throughput: The distributed transaction protocol requires a lot of back-and-forth communication between all the parties involved in the transaction (database A, database B, and the client requesting the transaction), and if all parties are not on the same local network, transaction throughput will be greatly diminished.

Example: Committing data across databases using the functionality provided by the database.  Notice prefixing our schemas and tables with another qualifier -- the database name.

Example: Committing data across databases using the functionality provided by the database. Notice prefixing our schemas and tables with another qualifier – the database name.

There are a few limited scenarios where this might be a strategy to employ:

  1. Database refactoring: Database distributed transactions can be used as a tool to implement database refactoring techniques. If so, databases should be located on the same local network to minimize the aforementioned downsides.
  2. Unchangeable database: When an application must transact with the legacy database that cannot be changed (e.g. too risky to do so, vendor-owned database) a developer or architect might simply have no other choice.

Spreading a transaction out across more than one synchronous API call

Example: If the second API call fails in a way where the payout isn’t recorded, then a very unhappy user will be wondering where his or her HSA deposit or Rally coins are.

Example: If the second API call fails in a way where the payout isn’t recorded, then a very unhappy user will be wondering where his or her HSA deposit or Rally coins are.

There’s nothing architecturally different between this scenario and the prior scenario. In both, the cross-database distributed transactions scenario and synchronous endpoint API call scenario, the transactions are vulnerable to the Two Generals’ Problem. They will both experience a logical data inconsistency if anything but the first part fails, and the total system uptime will still be negatively impacted by a failure in a single service or database.

The important idea to point out is that this scenario is more common when committing data to external data sources. It is common for modern web applications to integrate with third-party APIs via synchronous HTTP API calls. When calling a third-party API to modify a third-party’s records then the third-party API should be considered one part of a distributed transaction.

Rally might need to tell a third-party payment provider to post money in a user’s health savings account when a user completes a particular mission. There is no chance that a company like that will allow Rally a direct connection to their payment database for the purpose of enlisting it in a distributed database transaction! Rally’s software will have to account for periodic failures and errors with the third-party API.

What to do about distributed transactions?

Part two will dive into modeling and architecting our way out of these problems using reliable messaging.

Links in this article