Wednesday, 4 November 2015

UPDATE is evil...

The UPDATE keyword broke RDBMS's (Relational Database Management Systems).

That's a somewhat crazy assertion to make. Or is it?

The thought occurred to me a little earlier & I'm having a tough time trying to think of a good reason for this particular keyword to exist in modern software engineering. As a construct, it may be the root of all database evils.

I'm thinking of this from the perspective of moderate to huge applications/platforms. Smaller apps would be immune to evils of UPDATE. Although I'm picking on SQL Server in this post, the concept could probably apply to any row-based RDBMS which supports the UPDATE construct.

Databases are great. What they're great at is the storage and retrieval of information. What they're not great at is logic.Of course you *can* introduce logic in a database through the use of stored procedures, etc. but should you? My response is "No, you should not.

UPDATE as a construct gives rise to the possibility of introducing logic to a database. As a result, we've had to introduce more & more constructs to be able to handle the fallout from having this ability, handy though it may be.

Think about it, once we had the ability to update one of more records, we found new & inventive, although ultimately destructive ways of manipulating the data at our disposal. This lead to us grouping statements together in reusable blocks which could be called at will - the stored procedure.

Stored procedures are great for exposing only the data you want exposed, isolating the source of the data from the consumer of it. A code contract of sorts. With stored procedures it became possible to alter the underlying schema of data without breaking a dependency on the data. Combine a stored procedure with UPDATE and you have the means of introducing logic branches in the database.

There's a few reasons off the top of my head why this is bad in today's world. I'm sure there a many more, but these spring to mind immediately.

  • Logic trapped in databases introduces a limitation to scale as your data can no longer be distributed across multiple hosts/geographic locations. Yes, there may be solutions to the shared hosts/location problem so that your database(s) appear to be a single instance to your procedures but this is, essentially, a workaround to the problem that your logic is in the wrong place.

    The shared/hosts location problem is one in it's own right, with it's own solutions. Your logic should not add to this problem.
  • Having logic trapped in the database means your logic can't be shared quite as easily between applications without introducing more dependency on the database, invariably leading to brittle software. 
  • Security - introducing logic in the database means that the data it acts upon can't easily be encrypted. Logic depends on conditions, conditions depend on data. While the query engine may not care whether the data is encrypted or not, the author of the procedure almost certainly will. 
If we didn't have the power to update records, introducing logic would be significantly more challenging. Sharing data between multiple hosts/locations becomes easier (as we don't have to worry about how that affects logic). It would promote moving logic into libraries which are more easily shared (not to mention unit-tested) and encrypting/decrypting the data falls then to the application (a separate discussion). In this age of frequent security breaches, keeping data encrypted may be safer for everyone.

As if this wasn't enough, it became apparent that simply updating a record was not enough. Preserving the history of the update, or the value of the field before the update, became necessary to maintain an audit trail. Enter, the humble trigger. While useful, it opens the door to additional logic branches or crazy workflows which aren't easy to diagnose or debug.

I propose we use only the CRD (Create, Retrieve, Delete) constructs in RDBMS's. Not having UPDATE forces a re-think in how we design data dependant software, possibly with some very positive side effects. Of course, it could backfire entirely... I haven't tried it yet but I'm giving it some serious thought.

I might go into more detail on this subject in the future. It's late at the moment, I'm tired & want to publish this :) While procedures & triggers are really only the most visible but there are other areas as well. Still, I hope this has helped you question along the same lines I have.