Wednesday 2 November 2016

Drapper - An Introduction

I released Drapper about two months ago and, aside from the (rather dull) wiki that I put up for at least some documentation, haven't really written much about it.

As the author, I think it goes without saying that my opinion is completely, utterly and unashamedly biased.

It should also go without saying that I firmly believe that Drapper solves pretty much every single data access need I've ever come across. I've yet to find a case that Drapper couldn't support.

So why Drapper?

I'll start with the name. After many frustrating years of using home-rolled DALs and ORMs I came across the awesome micro ORM Dapper (if you're new to Dapper, go check it out!). Dapper was what I'd been looking for - no hoop jumping, no disgusting auto-generated SQL, no weird syntax or semantics for simple operations, no lots-of-things-I-hated-about-various-flavoured-ORM.

Dapper is simple, intuitive and targets a variety of different databases without blinking. And on top of all that goodness - it's FAST. Win!

It has, in my humblest of opinions, a drawback. It's a minor one, but one that tainted an otherwise very happy picture - mixing SQL with C# (or whichever .NET language you like most).

I don't like these two languages sharing the same space. It feels messy to me. Kludgy. I'm sure it has some benefits and there's more than one way to keep them separated from each other without going to great lengths. Static class perhaps. List of consts maybe. Resource file, even. Or perhaps something else.

I chose Something Else.

And thus was born, from a simple desire to separate SQL from C#, a wrapper for Dapper - Drapper.

I'm sure there could have been a more original name (this one certainly doesn't come up in search results! :)), but this was the name I gave it (and I checked with Marc Gravell if it was okay to use it).

That's all it is?

Yes and no.

Yes, in that Drapper is a framework which successfully separates SQL from C# & uses Dapper under the hood.

No, in that it turns out that Drapper has some really nifty features built in that overcome some of the limitations you might find in using only Dapper. It also turns out that using Drapper helps to promote better data layer design simply by using it as is. More on those benefits later (or likely in a separate post).

Okay... So what is it then?

Drapper is control. Control over your execution. Control over your mapping. Control over your database. Control over your code, your tests, your design, your choices. Control was a central design tenet in developing Drapper. Not only should you have control over all of these things, but you should have the power to use or abandon using Drapper without impacting your code (too) adversely.

Tell me more

By and large, a great many applications will implement a CRUD repository of some sort. Some will take a more generalized (and in my view, incorrect) approach. Others will be more specialized, with a repository per type to be persisted/retrieved. I prefer, and advocate for, specialization as it keeps each repository focused and easier to change. The trade-off is writing more code, but it's a trade for more flexible, more maintainable code.

Within a specialized repository, one might have methods for each of the basic CRUD operations, possibly a method for lists of data (possibly paginated for front ends), methods which return object graphs, others which don't. Some repositories might not expose one or more CRUD operations depending on the application needs. And of course, all of this needs to be tested.

That can run into quite a bit of code! It's a daunting prospect, even to seasoned developers. But what if you could get that level of specialization with very little code? That's where Drapper comes in.

One liners

At its heart, Drapper isn't much more than an abstraction of what Dapper does for you already. There's a single interface called IDbCommander which exposes overloads of two methods: Execute and Query.

Execute is used for state changing operations (Create/Update/Delete) whereas Query is used to retrieve data.

Pretty simple, huh? It gets better - with Drapper, the majority of CRUD operations can be written in a single line of code. Need to persist a simple POCO or primitive type? One line of code.
_commander.Execute(model);

Need to retrieve them later? One line of code.
_commander.Query<T>();


Need something a little more complex? An object graph perhaps? No problem - supply a Func<> to the Query method and build up the object graph any way you see fit. Drapper supports mapping functions with up to 16 inputs. You can define the mapping function in a separate class (recommended) and still have your specialized secret sauce on a single line of code!

_commander.Query(Map.MyComplexType);

And the mapping function defined elsewhere -

internal class Map
{
    internal static Func<TypeA, TypeB, TypeA> MyComplexType = (typeA, typeB) =>
    {
        typeA.SomeProperty = typeB;
        return typeA;
    };
}


Yes, I'll admit that that is something of a cheat as you've written more than one line of code. Still, the intent is clear - keeping your specialized repository code specialized and simple.

But... But... How does it know what SQL to execute?

It's really quite simple & based in large part on the Single Responsibility Principle. The idea is that a CRUD method within a repository will, more often than not, correspond to a single bit of SQL to be executed. For instance, a Create method might call an insert statement, a Delete method a delete statement, and so on.

Taken a step further, we can infer that each method on a repository class corresponds to a SQL statement and knowing what that method is should determine what the corresponding SQL statement is. The repository method itself shouldn't know anything about that SQL statement. Whether it's T-SQL, PL/SQL, a simple statement or a stored procedure is not the concern of the method.

Taken another small step further, we can infer that the fully qualified name of a method could be used to uniquely identify a SQL statement to be executed (we'll talk about overloaded methods shortly). So assuming we had a method called Retrieve on a repository called ContactRepository in a namespace belonging to the fictitious HappyCustomer CRM app, we could hypothetically represent that as HappyCustomer.Contacts.ContactRepository.Retrieve & expect that this method would retrieve a single contact from our database.

Drapper leverages this concept by having each overload of both Execute and Query on the IDbCommander accept an optional argument decorated with the [CallerMemberName] attribute. Having this salient bit of info gives us the name of the method being called at no cost. To get the full type name, we can (optionally) supply the type name using the typeof construct (e.g. typeof(ContactRepository)) or allow Drapper to use a bit of reflection to determine the full type name.
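As a rough, hedged sketch (not Drapper's actual contract - the signatures, return types and parameter names here are my approximation), the overloads are shaped something like this:

using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Illustrative approximation only - the real IDbCommander exposes many more
// overloads. The key idea is the optional [CallerMemberName] argument, which
// captures the calling method's name for free, plus an optional type used to
// build the fully qualified identifier.
public interface IDbCommander
{
    // state-changing operations (Create/Update/Delete)
    bool Execute<T>(T model, Type type = null, [CallerMemberName] string method = null);

    // data retrieval
    IEnumerable<T> Query<T>(object parameters = null, Type type = null, [CallerMemberName] string method = null);
}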

Command Settings and Readers

Now we have all of the info needed to uniquely identify our Retrieve method and thus our SQL. Drapper uses a CommandSetting configuration object to represent the SQL to be executed and it should come as no surprise that we use the method name to identify the CommandSetting. Each one holds enough information to control the execution of a SQL statement - whether it's a text statement or a stored procedure, whether it uses a particular transaction locking mechanism or has a specific timeout value, etc.

One or more CommandSetting objects belong to a collection for the type, stored in pretty much any configuration store you'd like to use. JSON and XML config stores are supported out of the box. A CommandSetting is returned from an ICommandReader.
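To make that a little more concrete, here's a purely hypothetical sketch of the kind of information a CommandSetting carries - the class and property names are my guesses, not Drapper's actual contract:

using System.Data;

// Hypothetical shape only - shown to illustrate that everything needed to
// execute the SQL lives in configuration, keyed by the fully qualified
// method name, rather than in the repository code itself.
public class CommandSettingSketch
{
    public string CommandText { get; set; }            // SQL text or stored procedure name
    public CommandType CommandType { get; set; }        // Text or StoredProcedure
    public int CommandTimeout { get; set; }             // timeout in seconds
    public IsolationLevel IsolationLevel { get; set; }  // transaction locking behaviour
}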

I'll go more into the specifics of how the SQL is retrieved in another post. For the moment, it's enough to know that using the fully qualified name of a method in conjunction with an ICommandReader will return a CommandSetting. The ICommandReader is a dependency of the IDbCommander so you never have to interact with ICommandReader or a CommandSetting directly.

A word on overloads...

Overloads are supported and I'm pretty sure it's easy to see how - as the [CallerMemberName] is optional, you can supply a name to identify a specific CommandSetting. Revisiting our fictitious ContactRepository & assuming we had overloads of the Retrieve method - one to retrieve by an int id & another to retrieve by email address as a string - we could simply do something like

public Contact Retrieve(int id)
{
    // uses the "default" retrieve statement which expects an int argument
    return _commander.Query<Contact>(new { id }).SingleOrDefault();
}

public Contact Retrieve(string email)
{
    // uses the "named" retrieve statement which expects a string argument
    return _commander.Query<Contact>(
        new { email },
        method: "RetrieveByEmail").SingleOrDefault();
}



This allows you to maintain a nice, clean interface to your repository without exposing its internals or expecting the calling code to know anything more about your repository than absolutely necessary.

That's a wrap! For now...

This post grew way, way longer than intended. I will write follow-up posts which go into greater detail but it's really, really late at the moment and I'd like to get this post published :)

Friday 18 March 2016

Better Unit Test Structures

I have issues with the conventional structuring of unit tests in .NET. 

The prevailing convention follows the idea of having (at least) 1 test class for each class to test, with at least 1 test method per method in the class under test. Some folks get very particular about how test methods are named, claiming that there's a naming "standard". The reality is, there is no such standard. There's convention. And sadly, the most common convention is a poor one. 

In this post I'll attempt to show you a different - better - way of structuring your unit tests, along with the benefits of why you should change. I'll be using the example of a simple CRUD (Create, Retrieve, Update, Delete) repository. You can download the code for this blog entry on GitHub.


The Example

In this example we have a simple POCO model with a couple of fields. Some of the fields are decorated with data annotations for validation. 


POCO model
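
The screenshot showed something along these lines - a minimal sketch, with the field names taken from the tests discussed later and the class name and exact annotations assumed:

using System.ComponentModel.DataAnnotations;

// Sketch of the POCO from the screenshot. Id, Name and Quantity appear in
// the tests later in the post; the class name and annotations are assumptions.
public class Product
{
    public int Id { get; set; }

    [Required]
    [StringLength(50)]
    public string Name { get; set; }

    public int Quantity { get; set; }
}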

The POCO object interacts with a simple CRUD repository which accepts a generic type. Some fairly straightforward stuff.  


Simple CRUD repository
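
Again as a sketch (the screenshot is the source of truth; the member signatures here are assumptions), the repository looks roughly like this:

// Rough sketch of the generic CRUD repository - method names follow the
// Create/Retrieve/Update/Delete convention used throughout the post.
public interface IRepository<T> where T : class
{
    T Create(T model);
    T Retrieve(int id);
    T Update(T model);
    bool Delete(int id);
}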


Even though this is a simple example, it's a pragmatic one. All professional .NET developers have seen/written/maintained a CRUD repository in some form or other. 


Current Convention

Possibly the most popular convention when it comes to naming tests in .NET is
 
<MethodName>_<Scenario>_<Expectation>

The idea being that when you're testing your class, you prefix your test methods with the name of the method you're testing, followed by the feature/scenario you're testing, followed by what the expected result/state is. 

Sounds about right, yeah? Seems legit? The resulting test class for the repository could probably end up looking something like this. 


Handful of methods from the RepositoryTests test class
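
In sketch form (only Create_ValidModel_WillSaveRecord is taken from the post; the other method names and the MSTest attributes are illustrative):

using Microsoft.VisualStudio.TestTools.UnitTesting;

// The conventional single-test-class layout - every method of the class
// under test is covered from one ever-growing file.
[TestClass]
public class RepositoryTests
{
    [TestMethod]
    public void Create_ValidModel_WillSaveRecord() { /* ... */ }

    [TestMethod]
    public void Create_InvalidName_WillThrowValidationException() { /* ... */ }

    [TestMethod]
    public void Retrieve_ExistingId_WillReturnRecord() { /* ... */ }

    [TestMethod]
    public void Update_ValidModel_WillSaveChanges() { /* ... */ }

    [TestMethod]
    public void Delete_ExistingId_WillRemoveRecord() { /* ... */ }
}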


While this is a handy convention, it comes with some problems (in no particular order):


  • It assumes - even promotes the idea - that you have one test class for the entire class under test. This can lead to gaps in test coverage, especially when it comes to large or complex classes. Not cool at all.
  • It becomes easy to hide code smell under tons of tests in a single test class. It looks like there's good test coverage of the class but it's not that easy to see whether you've covered all possible/realistic scenarios. It's just too much to absorb at once.
  • Related to the point above, it doesn't intuitively reveal which tests you should be writing. Again, there's too much to absorb at once. Having a single test class means you need mental discipline to focus on writing a test for the method you're interested in, because in the back of your mind you may actually be thinking about how to write tests for the entire class. Writing tests for the entire class usually means thinking about every execution path in that class, exceptions thrown within/by the method or by dependencies in the class, etc. This is counter-intuitive to the point of unit tests - we write tests to test methods, the unit in unit tests! When you're thinking about testing the class, even if it is just in the back of your head, your focus is in the wrong place.
  • It gets unwieldy pretty quickly - even our simple repository example is already over 200 lines of code in a single test class and we haven't really written anything more than some trivial tests! Can you imagine how huge this test class could grow in real world, production quality software?
  • Prefixing each test with the method name, while necessary when using this structuring technique, is a pain in the ass. It's repetitive, boring and feels redundant. Who wants to write boring, repetitive code? Nobody.
  • And finally - the underscores - they break readability rather than improve it. In .NET, naming guidelines recommend against underscores in namespace, class and method names. This means that reading test names is unlike reading any other code. That's a mental break and gear shift you don't really need, however minor - especially when you're in the zone.
It almost seems that when it comes to unit tests, the usual care and attention to detail goes out the window. Why? It's not different code. Why should we treat it differently when it comes to the names of methods or the size of classes? 

All in all, this usual convention can get messy. Messy is bad. Messy is annoying. Messy doesn't inspire confidence or make you want to write tests. Messy is anti-TDD and leads to tests being added as an afterthought. Bad. Bad. Bad. Tests should come first - without writing tests first, you may end up writing code which is impossible to test. Or re-writing code just so that it passes the test - a dangerous road to lost productivity, lost time and shortcut hacks. I know. I've been that guy. 

Writing tests should feel natural. It should be fun. You should be writing tests not only because it's the right thing to do, but because you actually want to. 


The Alternative

I picked up on this technique from a Phil Haack blog entry a few years ago and have been using it ever since. I've refined it a little along the way, making it even easier/cleaner/more beneficial.

If I could distill the technique into a single sentence, the basic gist would be: have a test class per method.

I've taken it one step further and don't use nested classes or any inheritance within my test code. I have a separate test file per method, grouped in a folder named after the class under test. 


Test class per method FTW

This comes with a number of benefits:

  • Grouping in a folder by the name of the class under test helps keep namespaces sane. 
  • It's easy to navigate to the set of tests you're most interested in.
  • It makes it easier to add new tests to the method(s) you're interested in - simply append a new test to the end of the file. No need to navigate through a huge test class or nested classes. No need to try to find where your Create_xxx test methods end and your Update_xxx test methods begin.
  • When you open an individual test class it makes it easier to see which tests you still need to write - have you covered all input parameters? Have you considered exceptions thrown by the method? etc. etc.
  • It helps keep the size of the individual test classes more manageable.
  • It helps you focus your thoughts on how to test a particular method. Your focus is on testing the method, not the class. This is a subtle, positive reinforcement of unit testing/TDD.
  • Common setup code/utility methods/helpers/etc. can (and should) be made available through separate classes within your testing namespace instead of creating inheritance structures. This promotes code reuse and keeping stuff DRY.
  • You can easily group related tests together in subfolders of your test class folder. For instance, you may want to have integration or performance tests to complement your unit tests. Grouping them this way keeps your tests neat, clean and low maintenance. Adding new test types and where to keep them becomes intuitive.

Intuitively support multiple test categories


  • Splitting our tests into individual files/classes also shows up where we may have gaps in our test coverage. For instance, when we take another look at our Create tests, it's easy to notice that although we're testing for validation failures from the repository on the Name field of our model, we don't have any test coverage for our Id or Quantity fields. Gap spotted and simple to correct - append more tests.
Spot gaps easily


The Naming Situation...

Having a test class per method makes the MethodName prefix of test methods instantly redundant. Score! Less repetitive code to write. That still leaves us with the <Scenario>_<Expectation> portion. These are still useful so we'll kinda keep 'em, but that underscore has to go.

My rationale is based - in small part - on the idea that we've already moved away from conventional unit test structures. So I'm kinda thinking we needn't remain bound to the same naming either. The real reason is because I find the underscore in test names to be really, really annoying. 

A quick search on unit test naming will spit out a few links, including this post from StackOverflow on Unit test naming best practices, Roy Osherove's Naming standards of unit tests (also linked to from the SO post), and DZone's 7 Popular Unit Test Naming Conventions. A cursory glance at these posts shows that the underscore is way popular in unit test names.

My preference is to use the "Feature to be tested" convention. It complements the folder/file structure and gives you the option of having succinct test names which still convey their intent. For instance, our Create_ValidModel_WillSaveRecord method could now be named much more succinctly as WillSaveValidRecord - it's already in our Create test class. Looking at it from a higher level you could read this with a pseudo qualified name as "Repository.Create.WillSaveValidRecord" which is pretty much exactly what you would expect the happy path of your code to do.
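
As a sketch (the namespace and test attributes are illustrative, not lifted from the repository on GitHub), the Create test class ends up looking like this:

using Microsoft.VisualStudio.TestTools.UnitTesting;

// Test-class-per-method layout: this file lives in a Repository folder and
// covers only the Create method, so the pseudo-qualified name reads as
// Repository.Create.WillSaveValidRecord.
namespace Tests.Repository
{
    [TestClass]
    public class Create
    {
        [TestMethod]
        public void WillSaveValidRecord()
        {
            // arrange/act/assert against the Create method only
        }
    }
}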

Also, if you use NCrunch (and you absolutely should), grouping your tests by Namespace, Fixture, Test in the Tests window shows that short, succinct names actually improve the readability of your tests without sacrificing their intent.


NCrunch Tests Window
It becomes even more valuable when you have a solution with thousands of tests where you might only want feedback on failing tests. You have instant information on which area of the code is failing, exactly which tests are failing and why they're failing. 

Although that's more of a feature of NCrunch, the way you structure your tests complements this feedback loop very, very well. This reduces the time you spend in code/compile/debug cycles, making you more productive. I'll hopefully get around to publishing a post on why NCrunch is awesome.


Failure feedback



Conclusion

While there may be some who will read this and consider it patently wrong, the point of unit testing/TDD is not to enforce strict rules; it's to support the rapid writing of high-quality software. This structuring technique works very, very well and provides many more benefits than more vanilla/traditional conventions.

Conventions, while great for establishing some commonality, should not be confused for rules or standards. The world of software engineering is constantly evolving. How we go about developing our software needs to evolve with it. 






Wednesday 4 November 2015

UPDATE is evil...

The UPDATE keyword broke RDBMSs (Relational Database Management Systems).

That's a somewhat crazy assertion to make. Or is it?

The thought occurred to me a little earlier & I'm having a tough time trying to think of a good reason for this particular keyword to exist in modern software engineering. As a construct, it may be the root of all database evils.

I'm thinking of this from the perspective of moderate to huge applications/platforms. Smaller apps would be immune to the evils of UPDATE. Although I'm picking on SQL Server in this post, the concept could probably apply to any row-based RDBMS which supports the UPDATE construct.

Databases are great. What they're great at is the storage and retrieval of information. What they're not great at is logic. Of course you *can* introduce logic in a database through the use of stored procedures, etc., but should you? My response is "No, you should not."

UPDATE as a construct gives rise to the possibility of introducing logic to a database. As a result, we've had to introduce more & more constructs to be able to handle the fallout from having this ability, handy though it may be.

Think about it: once we had the ability to update one or more records, we found new & inventive, although ultimately destructive, ways of manipulating the data at our disposal. This led to us grouping statements together in reusable blocks which could be called at will - the stored procedure.

Stored procedures are great for exposing only the data you want exposed, isolating the source of the data from the consumer of it. A code contract of sorts. With stored procedures it became possible to alter the underlying schema of data without breaking a dependency on the data. Combine a stored procedure with UPDATE and you have the means of introducing logic branches in the database.

There are a few reasons off the top of my head why this is bad in today's world. I'm sure there are many more, but these spring to mind immediately.

  • Logic trapped in databases introduces a limitation to scale as your data can no longer be distributed across multiple hosts/geographic locations. Yes, there may be solutions to the shared hosts/location problem so that your database(s) appear to be a single instance to your procedures but this is, essentially, a workaround to the problem that your logic is in the wrong place.

    The shared hosts/location problem is one in its own right, with its own solutions. Your logic should not add to this problem.
  • Having logic trapped in the database means your logic can't be shared quite as easily between applications without introducing more dependency on the database, invariably leading to brittle software. 
  • Security - introducing logic in the database means that the data it acts upon can't easily be encrypted. Logic depends on conditions, conditions depend on data. While the query engine may not care whether the data is encrypted or not, the author of the procedure almost certainly will. 
If we didn't have the power to update records, introducing logic would be significantly more challenging. Sharing data between multiple hosts/locations becomes easier (as we don't have to worry about how that affects logic). It would promote moving logic into libraries which are more easily shared (not to mention unit-tested) and encrypting/decrypting the data falls then to the application (a separate discussion). In this age of frequent security breaches, keeping data encrypted may be safer for everyone.

As if this wasn't enough, it became apparent that simply updating a record was not enough. Preserving the history of the update, or the value of the field before the update, became necessary to maintain an audit trail. Enter, the humble trigger. While useful, it opens the door to additional logic branches or crazy workflows which aren't easy to diagnose or debug.

I propose we use only the CRD (Create, Retrieve, Delete) constructs in RDBMSs. Not having UPDATE forces a re-think in how we design data-dependent software, possibly with some very positive side effects. Of course, it could backfire entirely... I haven't tried it yet but I'm giving it some serious thought.

I might go into more detail on this subject in the future. It's late at the moment, I'm tired & want to publish this :) Procedures & triggers are really only the most visible areas, but there are others as well. Still, I hope this has helped you question along the same lines I have.

Thursday 8 October 2015

A Reply To A Rugby Troll


This was in response to a comment on this article.

Something which our American friends may not quite understand is that rugby - like NFL - is a "generational" sport. You don't become great at a sport overnight.

A Quick History Lesson
Even though rugby was introduced in the mid 19th century, it dropped in popularity and only experienced a resurgence in the late 1960's/early 1970's (or thereabouts according to some sources). The governing body for rugby in the US, now called USA Rugby, was formed in 1975.

Compare that to South Africa where rugby has been played since around 1860, with rugby union being played from around 1875. The governing body for rugby in South Africa, now called the South African Rugby Union (SARU), was formed in 1889. That's nearly a century apart.

Generational Sport
Disclaimer: Although I have very little context of life in the US (I'm South African, living in Ireland) I presume what I am about to describe holds true for NFL as well.

Kids in SA play rugby from a very young age. Although I'm not sure if it's still the case, it was not uncommon for rugby to be a compulsory sport at school. If your parents like rugby it's almost certain that you will play as well. And if your father/grandfather/etc. played for the Springboks then it's pretty much a given that you will play. A number of our current Springboks had fathers/grandfathers who were Springboks - Schalk Burger, Ruan Pienaar, Cobus Reinach (although not selected for the RWC) are just a few that spring to mind.

If you're identified as having talent - even at a very young age (under 9/10) - you will usually get a chance to try out for one of the junior academies of one of the larger franchises at provincial level. This offers you the chance to be coached at a much higher standard than you'd get at school (where it's usually a teacher or two with a keen interest in rugby but no real coaching experience or training).

Kids that continue to perform will remain in the academy & continue getting access to better quality coaching than they would at school. Some kids may earn a scholarship to one of the top rugby schools in the country. This way, they get coaching at school and at the academy.

From there onwards, they'll make their senior debut at a domestic tournament & the rest is pretty self-explanatory.

Of course, mileage may vary, but this is a generalized, grossly over-simplified gist.

The Point
The point is that kids in SA have rugby drilled into them from the womb. We watch it with our parents, we play it at school, play it with our friends, etc. We learn the intricacies of the sport and understanding it becomes second nature for those who play (& for most of the supporters as well).

Despite its brutality, rugby is actually a very technical and tactical sport. It's not just about speed or size or stamina - these attributes are, of course, desirable in a rugby player but what's more important is situational/spatial awareness in the context of what's happening in front of and around you on the field. Split second decisions matter. You could be the biggest, fastest guy on the planet but if you don't have an innate understanding of how to read a game of rugby to be able to make the right decision in that split second you will not succeed in the sport.

Rugby is a developing sport in the US; it's a religion in SA. According to Mike Tolkin (head coach for Team USA), it's the fastest growing team sport in the US. When Americans lose a rugby match, they chalk it up to experience and hope to improve in the next match. The vast majority of the American public wouldn't even know that they'd played.

Completely different when the Springboks lose. When we lost to Japan earlier in the tournament it was considered a travesty - a national catastrophe! It made headlines throughout the country and dominated media coverage. It was spoken about at length & prodded from every possible angle.

Conclusion
The USA may be a Tier 2 nation at the moment but that's not to say they will remain that way. To be truly competitive at the elite level will take generations. The good news is that YOU can do something about it - support your national team - win or lose. Be proud that they qualified to compete in the tournament so that they could get the opportunity to play against two-time world champions.

Find out about and support your domestic league. Tell your friends about the sport, tell your kids, go to matches & support your local teams. It's not your athletes which will grow your team to elite level - it's the everyday American. It's you.

Tuesday 17 December 2013

The Dusty Shelf

I need to catch up on my writing. Shameful when I look at the lack of content here versus what I have in draft.

Monday 7 May 2012

Language and culture

By necessity, culture becomes irrelevant when trying to share knowledge. Nowhere is this more true than on the internet. If one considers oneself to be a reasonable human being, open to discussion and the dissemination of ideas, then one must accept that not everyone you share your ideas with will speak the same language. Not everyone will have the same implicit bias or frame of reference.



As mentioned earlier, there is a nuance to language - what we communicate is not only in the words we speak or write, but in the concepts which we refer to and project. Not only does the use of language tend to reveal a fair amount about the individual, but so does one's reaction to the use of language. And lest we forget, we miss the all-important subtlety of body language when we converse online.


The internet has been - and will continue to be - blamed as party to the destruction of certain cultures. While this is essentially true, it has also been a phenomenal transport for knowledge. I, for one, wouldn't have a clue where I'd be without Wikipedia.


As ever, I overstate my point with shameless digression and pedantry.




I wrote this as a comment on Facebook. As such, the context might be lost, but I think the content is still relevant and worth sharing more publicly.

SoundWave, Active Sonar And Alternative Uses

I recently read this article on SoundWave, a Kinect-like system which uses the Doppler Effect for determining user gestures through an ultrasonic frequency emitted by your laptop's stock microphone/speaker setup.


In essence, it's active sonar.


Now, while the article does note that having the keyboard in such close proximity makes the technology kinda superfluous, it occurs to me that gesture input is not the only use case for this software. It could, in fact, possibly be the least useful. That's not to detract from the excellent work done by the researchers and developers of this technology in any way at all. I simply think there may be alternative uses for it. Outside of the military that is, where sonar/radar have been used extensively since around World War 1.


At first I considered using Kinect in conjunction with SoundWave - would it potentially improve the accuracy and user experience of Kinect? Possibly. Probably not. I also considered how one might use this for improving accessibility (could it?). Then it occurred to me - maybe we're looking at this from the wrong direction? Instead of using the SoundWave/Doppler shift as a means to provide input (me --> device), what if we used it as a means of keeping us aware of our ambient surroundings (device --> me)?


Consider how one could incorporate this with a HUD or Project Glass. While it could prove to be more of a distraction than a help, I can't help but think there is an alternative use case for this. Perhaps as a means of alerting us to people invading our personal space? Not likely to be helpful on a crowded bus or street.


I'm almost certainly barking up the wrong tree but I just can't shake the feeling that we're missing something important here. Not in what it currently is, but in what it could be.