Wednesday 2 November 2016

Drapper - An Introduction

I released Drapper about two months ago and, aside from the (rather dull) wiki that I put up for at least some documentation, haven't really written much about it.

As the author, I think it goes without saying that my opinion is completely, utterly and unashamedly biased.

It should also go without saying that I firmly believe that Drapper solves pretty much every single data access need I've ever come across. I've yet to find a case that Drapper couldn't support.

So why Drapper?

I'll start with the name. After many frustrating years of using home-rolled DALs and ORMs I came across the awesome micro ORM Dapper (if you're new to Dapper, go check it out!). Dapper was what I'd been looking for - no hoop jumping, no disgusting auto-generated SQL, no weird syntax or semantics for simple operations, no lots-of-things-I-hated-about-various-flavoured-ORM.

Dapper is simple, intuitive and targets a variety of different databases without blinking. And on top of all that goodness - it's FAST. Win!

It has, in my humblest of opinions, a drawback. It's a minor one, but one that tainted an otherwise very happy picture - mixing SQL with C# (or whichever .NET language you like most).

I don't like these two languages sharing the same space. It feels messy to me. Kludgy. I'm sure it has some benefits and there's more than one way to keep them separated from each other without going to great lengths. Static class perhaps. List of consts maybe. Resource file, even. Or perhaps something else.

I chose Something Else.

And thus was born, from a simple desire to separate SQL from C#, a wrapper for Dapper - Drapper.

I'm sure there could have been a more original name (this one certainly doesn't come up in search results! :)), but this was the name I gave it (and I checked with Marc Gravell if it was okay to use it).

That's all it is?

Yes and no.

Yes, in that Drapper is a framework which successfully separates SQL from C# & uses Dapper under the hood.

No, in that it turns out that Drapper has some really nifty features built in that overcome some of the limitations you might find in using only Dapper. It also turns out that using Drapper helps to promote better data layer design simply by using it as is. More on those benefits later (or likely in a separate post).

Okay... So what is it then?

Drapper is control. Control over your execution. Control over your mapping. Control over your database. Control over your code, your tests, your design, your choices. Control was a central design tenet in developing Drapper. Not only should you have control over all of these things, but you should have the power to use or abandon using Drapper without impacting your code (too) adversely.

Tell me more

By and large, a great many applications will implement a CRUD repository of some sort. Some will take a more generalized (and in my view, incorrect) approach. Others will be more specialized, with a repository per type to be persisted/retrieved. I prefer, and advocate for, specialization as this has more benefit for your code. The trade-off is writing more code, but in return you get code that is more flexible and more maintainable.

Within a specialized repository, one might have a method for each of the basic CRUD operations, possibly a method for lists of data (possibly paginated for front ends), methods which return object graphs, others which don't. Some repositories might not expose one or more CRUD operations depending on the application needs. And of course, all of this needs to be tested.

That can run into quite a bit of code! It's a daunting prospect, even to seasoned developers. But what if you can get that level of specialization with very little code? That's where Drapper comes in.

One liners

At its heart, Drapper isn't much more than an abstraction of what Dapper does for you already. There's a single interface called IDbCommander which exposes overloads of two methods: Execute and Query.

Execute is used for state changing operations (Create/Update/Delete) whereas Query is used to retrieve data.

Pretty simple, huh? It gets better - with Drapper, the majority of CRUD operations can be written in a single line of code. Need to persist a simple POCO or primitive type? One line of code.
_commander.Execute(model);

Need to retrieve them later? One line of code.
_commander.Query<T>();


Need something a little more complex? An object graph perhaps? No problem - supply a Func<> to the Query method and build up the object graph any way you see fit. Drapper supports a Func<> with up to 16 inputs. You can define the Func<> in a separate class (recommended) and still have your specialized secret sauce on a single line of code!

_commander.Query(Map.MyComplexType);

And the mapping function defined elsewhere -

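internal static class Map
{
    // Static and non-private so that it can be referenced as Map.MyComplexType.
    // Takes a TypeA and a TypeB, attaches the TypeB to the TypeA and returns the TypeA.
    // Func<> lives in the System namespace.
    internal static readonly Func<TypeA, TypeB, TypeA> MyComplexType = (typeA, typeB) =>
    {
        typeA.SomeProperty = typeB;
        return typeA;
    };
}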


Yes, I'll admit that this is something of a cheat as you've written more than one line of code. Still, the intent is clear - keeping your specialized repository code specialized and simple.

But... But... How does it know what SQL to execute?

It's really quite simple & based in large part on the Single Responsibility Principle. The idea is that a CRUD method within a repository will, more often than not, correspond to a single bit of SQL to be executed. For instance, a Create method might call an insert statement, a Delete method a delete statement, and so on.

Taken a step further, we can infer that each method on a repository class corresponds to a SQL statement and knowing what that method is should determine what the corresponding SQL statement is. The repository method itself shouldn't know anything about that SQL statement. Whether it's T-SQL, PL/SQL, a simple statement or a stored procedure is not the concern of the method.

Taken another small step further, we can infer that the fully qualified name of a method could be used to uniquely identify a SQL statement to be executed (we'll talk about overloaded methods shortly). So assuming we had a method called Retrieve on a repository called ContactRepository in a namespace belonging to the fictitious HappyCustomer CRM app we could hypothetically represent that as HappyCustomer.Contacts.ContactRepository.Retrieve & expect that this method would retrieve a single contact from our database.

Drapper leverages this concept by having each overload of both Execute and Query on the IDbCommander use the [CallerMemberName] attribute as an optional argument. Having this salient bit of info gives us the name of the method being called at no cost. To get the full type name, we can (optionally) supply the type name using the typeof construct (e.g. typeof(ContactRepository)) or allow Drapper to use a bit of reflection to determine the full type name.
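
To make the [CallerMemberName] idea a little more concrete, here's a minimal sketch. It is not Drapper's implementation - SketchCommander, its Query method and the key format are hypothetical names made up for illustration. The only real piece is the [CallerMemberName] attribute itself, which lives in System.Runtime.CompilerServices.

```csharp
using System;
using System.Runtime.CompilerServices;

namespace HappyCustomer.Contacts
{
    // Hypothetical stand-in for a commander. It doesn't touch a database;
    // it only builds the key that would identify a SQL statement.
    public class SketchCommander
    {
        // The compiler fills in 'method' with the name of the calling member
        // whenever the caller doesn't pass a value for it.
        public string Query(Type type, [CallerMemberName] string method = "")
        {
            return type.FullName + "." + method;
        }
    }

    public class ContactRepository
    {
        private readonly SketchCommander _commander = new SketchCommander();

        // Builds "HappyCustomer.Contacts.ContactRepository.Retrieve"
        // without the method ever spelling out its own name.
        public string Retrieve()
        {
            return _commander.Query(typeof(ContactRepository));
        }
    }
}
```

Because the argument is optional, a caller can still pass an explicit name when the default isn't what it wants.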

Command Settings and Readers

Now we have all of the info needed to uniquely identify our Retrieve method and thus, our SQL. Drapper uses a CommandSetting configuration object to represent the SQL to be executed and it should come as no surprise that we use the method name to identify the CommandSetting. Each one holds enough information to control the execution of a SQL statement - whether it's a text statement, a stored procedure, uses a particular transaction locking mechanism or has a specific timeout value, etc.

One or more CommandSetting objects belong to a type as a collection, stored in pretty much any configuration store you'd like to use. JSON and XML config stores are supported out of the box. A CommandSetting is returned from an ICommandReader.

I'll go more into the specifics of how the SQL is retrieved in another post. For the moment, it's enough to know that using the fully qualified name of a method in conjunction with an ICommandReader will return a CommandSetting. The ICommandReader is a dependency of the IDbCommander so you never have to interact with ICommandReader or a CommandSetting directly.
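
Until that post is written, here's a rough sketch of the lookup idea only. Everything in it is an assumption for illustration: SketchCommandSetting, ISketchCommandReader and InMemoryCommandReader are invented names, and the real CommandSetting and ICommandReader in Drapper may well look different. The sketch simply keys settings by the fully qualified method name, using a dictionary where Drapper would use a JSON or XML store.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical shape of a setting: just enough to control one execution.
public class SketchCommandSetting
{
    public string CommandText { get; set; }
    public CommandType CommandType { get; set; } = CommandType.Text;
    public int CommandTimeout { get; set; } = 30;
}

// Hypothetical reader contract: type + method name in, setting out.
public interface ISketchCommandReader
{
    SketchCommandSetting GetCommand(Type type, string method);
}

// A dictionary stands in for the configuration store.
public class InMemoryCommandReader : ISketchCommandReader
{
    private readonly Dictionary<string, SketchCommandSetting> _settings;

    public InMemoryCommandReader(Dictionary<string, SketchCommandSetting> settings)
    {
        _settings = settings;
    }

    public SketchCommandSetting GetCommand(Type type, string method)
    {
        // The fully qualified method name is the unique key.
        var key = type.FullName + "." + method;
        if (_settings.TryGetValue(key, out var setting))
        {
            return setting;
        }
        throw new KeyNotFoundException("No command setting found for " + key);
    }
}
```

The point of the design is that the repository method never sees any of this - it only ever talks to the commander.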

A word on overloads...

Overloads are supported and I'm pretty sure it's easy to see how - as the [CallerMemberName] is optional, you can supply a name to identify a specific CommandSetting. Revisiting our fictitious ContactRepository & assuming we had overloads of the Retrieve method - one to retrieve by an int id & another to retrieve by email address as a string - we could simply do something like this:

public Contact Retrieve(int id)
{
    // uses the "default" retrieve statement which expects an int argument
    return _commander.Query<Contact>(new { id }).SingleOrDefault();
}


public Contact Retrieve(string email)
{
    // uses the "named" retrieve statement which expects a string argument
    return _commander.Query<Contact>(
        new { email },
        method: "RetrieveByEmail").SingleOrDefault();
}



This allows you to maintain a nice, clean interface to your repository without exposing the internals or expecting the calling code to know anything more about your repository than absolutely necessary.

That's a wrap! For now...

This post grew way, way longer than intended. I will write follow-up posts which go into greater detail but it's really, really late at the moment and I'd like to get this post published :)

Friday 18 March 2016

Better Unit Test Structures

I have issues with the conventional structuring of unit tests in .NET. 

The prevailing convention follows the idea of having (at least) 1 test class for each class to test, with at least 1 test method per method in the class under test. Some folks get very particular about how test methods are named, claiming that there's a naming "standard". The reality is, there is no such standard. There's convention. And sadly, the most common convention is a poor one. 

In this post I'll attempt to show you a different - better - way of structuring your unit tests, along with the reasons why you should change. I'll be using the example of a simple CRUD (Create, Retrieve, Update, Delete) repository. You can download the code for this blog entry on GitHub.


The Example

In this example we have a simple POCO model with a couple of fields. Some of the fields are decorated with data annotations for validation. 


POCO model

The POCO object interacts with a simple CRUD repository which accepts a generic type. Some fairly straightforward stuff.  


Simple CRUD repository
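
For a rough idea of what such a model and repository might look like, here's a sketch. It is not the code from the GitHub sample: the Widget and InMemoryRepository names, the specific annotations and the in-memory store are all assumptions made for illustration. Only the Id, Name and Quantity fields are taken from this post.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Hypothetical POCO; the annotations are guesses at sensible validation rules.
public class Widget
{
    [Range(1, int.MaxValue)]
    public int Id { get; set; }

    [Required]
    [StringLength(50)]
    public string Name { get; set; }

    [Range(0, 1000)]
    public int Quantity { get; set; }
}

public interface IRepository<T>
{
    void Create(T model);
    T Retrieve(int id);
    void Update(T model);
    void Delete(int id);
}

// A dictionary stands in for the database so the sketch runs on its own.
public class InMemoryRepository<T> : IRepository<T> where T : class
{
    private readonly Dictionary<int, T> _store = new Dictionary<int, T>();
    private readonly Func<T, int> _getId;

    public InMemoryRepository(Func<T, int> getId)
    {
        _getId = getId;
    }

    public void Create(T model)
    {
        Validate(model);
        _store.Add(_getId(model), model);
    }

    public T Retrieve(int id)
    {
        return _store.TryGetValue(id, out var model) ? model : null;
    }

    public void Update(T model)
    {
        Validate(model);
        _store[_getId(model)] = model;
    }

    public void Delete(int id)
    {
        _store.Remove(id);
    }

    // Throws ValidationException if any data annotation on the model fails.
    private static void Validate(T model)
    {
        if (model == null) throw new ArgumentNullException(nameof(model));
        Validator.ValidateObject(model, new ValidationContext(model), validateAllProperties: true);
    }
}
```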


Even though this is a simple example, it's a pragmatic one. All professional .NET developers have seen/written/maintained a CRUD repository in some form or other. 


Current Convention

Possibly the most popular convention when it comes to naming tests in .NET is
 
<MethodName>_<Scenario>_<Expectation>

The idea being that when you're testing your class, you prefix your test methods with the name of the method you're testing, followed by the feature/scenario you're testing, followed by what the expected result/state is. 

Sounds about right, yeah? Seems legit? The resulting test class for the repository could probably end up looking something like this. 


Handful of methods from the RepositoryTests test class
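
As a stand-in for that test class, here's a sketch of the shape it tends to take. The Repository below is a deliberately stripped-down, hypothetical one (not the one from the GitHub sample), and the test framework attributes (e.g. [Test] or [Fact]) are left out so that the sketch compiles without any extra packages.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, minimal repository so the sketch stands on its own.
public class Repository
{
    private readonly Dictionary<int, string> _store = new Dictionary<int, string>();

    public void Create(int id, string name)
    {
        if (string.IsNullOrWhiteSpace(name))
        {
            throw new ArgumentException("Name is required.", nameof(name));
        }
        _store.Add(id, name);
    }

    public string Retrieve(int id)
    {
        return _store.TryGetValue(id, out var name) ? name : null;
    }
}

// One test class for the whole repository.
// Every test name repeats the name of the method under test as a prefix.
public class RepositoryTests
{
    public void Create_ValidModel_WillSaveRecord()
    {
        var repository = new Repository();
        repository.Create(1, "Widget");
        Check(repository.Retrieve(1) == "Widget", "Record was not saved.");
    }

    public void Create_EmptyName_ThrowsArgumentException()
    {
        var repository = new Repository();
        try
        {
            repository.Create(1, "");
        }
        catch (ArgumentException)
        {
            return; // the expected outcome
        }
        throw new InvalidOperationException("Expected an ArgumentException.");
    }

    public void Retrieve_UnknownId_ReturnsNull()
    {
        var repository = new Repository();
        Check(repository.Retrieve(42) == null, "Expected null for an unknown id.");
    }

    // Tiny assertion helper in place of a framework's Assert class.
    private static void Check(bool condition, string message)
    {
        if (!condition) throw new InvalidOperationException(message);
    }
}
```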


While this is a handy convention it comes with some problems (in no particular order): 


  • It assumes - even promotes - that you have one test class for the entire class under test. This can lead to gaps in test coverage, especially when it comes to large or complex classes. Not cool at all. 
  • It becomes easy to hide code smell under tons of tests in a single test class. It looks like there's good test coverage of the class but it's not that easy to see whether you've covered all possible/realistic scenarios. It's just too much to absorb at once.
  • Related to the point above, it doesn't intuitively reveal which tests you should be writing. Again, it's because there's too much to absorb at once. Having a single test class means you need mental discipline to focus on writing a test for the method you're interested in, as in the back of your mind you may actually be thinking about how to write tests for the entire class. Writing tests for the entire class usually means thinking about every execution path in that class, exceptions thrown within/by the method or by dependencies in the class, etc. This runs counter to the point of unit tests - we write tests to test methods, the unit in unit tests! When you're thinking about testing the class, even if it is just in the back of your head, your focus is in the wrong place. 
  • It gets unwieldy pretty quickly - even our simple repository example is already over 200 lines of code in a single test class and we haven't really written anything more than some trivial tests! Can you imagine how huge this test class could grow in real world, production quality software?
  • Prefixing each test with the method name, while necessary when using this structuring technique, is a pain in the ass. It's repetitive, boring and feels like redundancy. Who wants to write boring, repetitive code? Nobody. 
  • And finally - the underscores - they hurt readability rather than improve it. In .NET, it's not recommended to use underscores in the names of namespaces, classes or methods. This means that reading test names is unlike reading any other code. That's a mental break and gear shift you don't really need, however minor. Especially when you're in the zone. 

It almost seems that when it comes to unit tests, the usual care and attention to detail goes out the window. Why? It's not different code. Why should we treat it differently when it comes to the names of methods or the size of classes? 

All in all, this usual convention can get messy. Messy is bad. Messy is annoying. Messy doesn't inspire confidence or make you want to write tests. Messy is anti-TDD and leads to tests being added as an afterthought. Bad. Bad. Bad. Tests should come first - without writing tests first, you may end up writing code which is impossible to test. Or re-writing code just so that it passes the test - a dangerous road to lost productivity, lost time and shortcut hacks. I know. I've been that guy. 

Writing tests should feel natural. It should be fun. You should be writing tests not only because it's the right thing to do, but because you actually want to. 


The Alternative

I picked up on this technique from a Phil Haack blog entry a few years ago and have been using it ever since. I've refined it a little since, making it even easier/cleaner/more beneficial. 

If I could distill the technique into a single sentence, the basic gist of it would be to have a test class per method.

I've taken it one step further and don't use nested classes or any inheritance within my test code. I have a separate test file per method, grouped in a folder named after the class under test. 


Test class per method FTW

This comes with a number of benefits:

  • Grouping in a folder by the name of the class under test helps keep namespaces sane. 
  • It's easy to navigate to the set of tests you're most interested in.
  • It makes it easier to add new tests to the method(s) you're interested in - simply append a new test to the end of the file. No need to navigate through a huge test class or nested classes. No need to try to find where your Create_xxx test methods end and your Update_xxx test methods begin.
  • When you open an individual test class it makes it easier to see which tests you still need to write - have you covered all input parameters? Have you considered exceptions thrown by the method? etc. etc.
  • It helps keep the size of the individual test classes more manageable.
  • It helps you focus your thoughts on how to test a particular method. Your focus is on testing the method, not the class. This is a subtle, positive reinforcement of unit testing/TDD.
  • Common setup code/utility methods/helpers/etc. can (and should) be made available through separate classes within your testing namespace instead of creating inheritance structures. This promotes code reuse and keeping stuff DRY.
  • You can easily group related tests together in subfolders of your test class folder. For instance, you may want to have integration or performance tests to complement your unit tests. Grouping them this way keeps your tests neat, clean and low maintenance. Deciding where to keep new types of tests becomes intuitive. 

Intuitively support multiple test categories


  • Splitting our tests into individual files/classes also shows up where we may have gaps in our test coverage. For instance when we take another look at our Create tests, it's easy to notice that although we're testing for validation failures from the repository on the Name field of our model, we don't have any test coverage for our Id or Quantity fields. Gap spotted and simple to correct - append more tests. 

Spot gaps easily


The Naming Situation...

Having a test class per method makes the MethodName prefix of test methods instantly redundant. Score! Less repetitive code to write. That still leaves us with the <Scenario>_<Expectation> portion. These are still useful so we'll kinda keep 'em but that underscore has to go.

My rationale is based - in small part - on the idea that we've already moved away from conventional unit test structures. So I'm kinda thinking we needn't remain bound to the same naming either. The real reason is because I find the underscore in test names to be really, really annoying. 

A quick search on unit test naming will spit out a few links, including this post from StackOverflow on Unit test naming best practices, Roy Osherove's Naming standards of unit tests (also linked to from the SO post), and DZone's 7 Popular Unit Test Naming Conventions. A cursory glance at these posts shows that the underscore is way popular in unit test names.

My preference is to use the "Feature to be tested" convention. It complements the folder/file structure and gives you the option of having succinct test names which still convey their intent. For instance our Create_ValidModel_WillSaveRecord method could now be named much more succinctly as WillSaveValidRecord - it's already in our Create test class. Looking at it from a higher level you could read this with a pseudo qualified name as "Repository.Create.WillSaveValidRecord" which is pretty much exactly what you would expect the happy path of your code to do. 
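
Here's a sketch of how that structure and naming might look in code. As before, this is an illustration under assumptions rather than the code from the GitHub sample: the Repository is a minimal hypothetical one, the Sketch namespaces and the RepositoryTests folder name are made up, and test framework attributes (e.g. [Test] or [Fact]) are omitted so that it compiles without extra packages. Each test class would live in its own file.

```csharp
using System;
using System.Collections.Generic;

namespace Sketch
{
    // Hypothetical, minimal repository so the sketch stands on its own.
    public class Repository
    {
        private readonly Dictionary<int, string> _store = new Dictionary<int, string>();

        public void Create(int id, string name)
        {
            if (string.IsNullOrWhiteSpace(name))
            {
                throw new ArgumentException("Name is required.", nameof(name));
            }
            _store.Add(id, name);
        }

        public string Retrieve(int id)
        {
            return _store.TryGetValue(id, out var name) ? name : null;
        }
    }
}

namespace Sketch.Tests
{
    // Shared helper in its own class instead of a test base class.
    internal static class Check
    {
        internal static void That(bool condition, string message)
        {
            if (!condition) throw new InvalidOperationException(message);
        }
    }
}

namespace Sketch.Tests.RepositoryTests
{
    // File: RepositoryTests/Create.cs - only tests for Repository.Create.
    public class Create
    {
        public void WillSaveValidRecord()
        {
            var repository = new Repository();
            repository.Create(1, "Widget");
            Check.That(repository.Retrieve(1) == "Widget", "Record was not saved.");
        }

        public void EmptyNameThrowsArgumentException()
        {
            var repository = new Repository();
            try
            {
                repository.Create(1, "");
            }
            catch (ArgumentException)
            {
                return; // the expected outcome
            }
            throw new InvalidOperationException("Expected an ArgumentException.");
        }
    }

    // File: RepositoryTests/Retrieve.cs - only tests for Repository.Retrieve.
    public class Retrieve
    {
        public void UnknownIdReturnsNull()
        {
            var repository = new Repository();
            Check.That(repository.Retrieve(42) == null, "Expected null for an unknown id.");
        }
    }
}
```

The folder is called RepositoryTests rather than Repository in this sketch purely to stop the namespace from clashing with the Repository class name.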

Also, if you use NCrunch (and you absolutely should) grouping your tests by Namespace, Fixture, Test in the Tests window shows that short, succinct names actually improve the readability of your tests without sacrificing their intent.


NCrunch Tests Window

It becomes even more valuable when you have a solution with thousands of tests where you might only want feedback on failing tests. You have instant information on which area of the code is failing, exactly which tests are failing and why they're failing. 

Although that's more of a feature of NCrunch, the way you structure your tests complements this feedback loop very, very well. This reduces the time you spend in code/compile/debug cycles, making you more productive in less time. I'll hopefully get around to publishing a post on why NCrunch is awesome. 


Failure feedback



Conclusion

While there may be some who will read this and consider it patently wrong, the point of unit testing/TDD is not to enforce strict rules; it's to support the writing of high quality software rapidly. This structuring technique works very, very well and provides many more benefits than more vanilla/traditional conventions.

Conventions, while great for establishing some commonality, should not be confused with rules or standards. The world of software engineering is constantly evolving. How we go about developing our software needs to evolve with it.