The Never Ending Journey: 2011

Rather than using svn.exe through Process.Start SharpSvn library offers a complete API mapping of the command line tool.

It can be downloaded from the website or fetched via NuGet with: Install-Package SharpSvn.x86 or Install-Package SharpSvn.x64

The API may seem a bit weird at first but that’s because it wraps a C++ library under the hood.

As of April 2011 there is no official documentation of the API, so the best source is still trying to make svn.exe do what you want and then translate that in C#.

So far, the other source of information I currently use is the discussion forums

One other thing don't really like is the footprint of this library, 64Mo for the NuGet package and 21Mo in bin. Its huge!! I hope that future version will be a bit more slim than now.

Overall it's still a lot better than using svn.exe directly.

Working in god classes with monster methods is no fun. It usually involves guessing where I should put the new code exactly. Then it takes a lot of debugging to find out why the change doesn’t work as expected. After that, it takes more time to make sure we did not break anything else around that change. This is definitely not fun at all and it shouldn’t be the way we work. We have control of the code and we should not tolerate a situation like that. We can fix this!

The Single Responsibility Principle or SRP state that

A class should have only one reason to change, only one responsibility.

This means that when I want to add a new feature I don’t want to care about things I’m not changing. For example, below in the class diagram we have the ReportingService class responsible to load and process data, then populate and show a report. This class will be modified every time I need to change my database schema, the report engine behaviour and the layout of the report itself. This mean that it's violating the Single Responsibility Principle, it's doing too much.

If we split the class into separated concerns, we will end up with classes that will need to change for only one reason. It's still not perfect but that's a start.

So, why is this so important again?

Well, first of all reuse. In the diagram above we see that the DAL concern (Data Access Layer) is now encapsulated into a Repository class. This means that we can reuse the Repository code to load the same data but not only for creating reports.

Second, by creating new classes we are forced to think of a name for them. Good naming of classes and methods will help improve comprehension by people who are not familiar with this part of the code base. Another side effect is that we expose hidden concepts from inside our original class like the DAL (now the Repository class).

Third, we are now free to evolve each concern more independently from one another than before. Also, it will be easier to write unit tests for such smaller and simple classes. We reduced the overall complexity of the code and we will be more confident that we are not breaking the application everything we make some changes.

So, should we always break all the large classes into smaller ones until we have only a few methods in each of them?

Not exactly, we need to be careful not to break the OOP basic notion of encapsulation. So where should we draw the line, where should we split our classes? Unfortunately, there is no easy answer, rather we need to understand what the class responsibilities are.

Responsibility Driven Design

One technique we could use comes from the 80s as an OO design practices called RDD for Responsibility Driven Design. The goal was to identify the Object Role Stereotype of classes to better understand their responsibilities. Let's take a look at those six roles.

Information Holder: Knows things and provides information. May make calculations from the data that it holds.

Structurer: Knows the relationships between other objects.

Controller: Controls and directs the actions of other objects. Decides what other objects should do.

Coordinator: Reacts to events and relays the events to other objects.

Service Provider: Does a service for other objects upon request.

Interfacer: Objects that provide a means to communicate with other parts of the system, external systems or infrastructure, or end users.

The second part of this technique is to use CRC Cards which are index card put on a board while designing the systems.

Class name
Responsibilities	Collaborators
- responsibility 1 - responsibility 1 …	- collaborator 1 - collaborator 2 …

CRC stands for Class - Responsibilities - Collaborator. You put all the responsibilities on the left side and all the collaborators (the classes you talk to) on the right side. When putting all the cards of a system or sub-system next to each other on a wall you get a really great big picture of what’s going on.

Back to SRP, our ultimate goal is to take the secondary responsibilities on the left side and transform them into new collaborators on the right. In the end we should have only one responsibility (the principal one) on the left hand side. So, now in the previous example BetterReportingService is only a Controller for its collaborators. Repository is an Interfacer and ReportEngine is a Service Provider. ReportBuilder acts as an Information Holder and a Structurer so we may consider separating the concerns one more time.

Dealing with a large code base

It’s not always easy to look at an aging code base and spot SRP violations. Sometimes we may think a class only have one responsibility when reading the code. The separation opportunities will rarely jump at us.

Fortunately there is one technique that could help us with that. I'm talking about code metrics. Tools like NDepend and to some extent Visual Studio can use reflection technology to look at the code and calculate metrics to help us see the big picture but with a different set of eyes. It can help us find things we could hardly see ourselves.

The LCOM (Lack of Cohesion of Methods) is a metric which indicate whether all the methods and fields of a class are strongly connected together. For that, there should be a connection between all members of the class including fields.

The diagram below shows that the group composed of methods A and B and field x are connected together but they have no connection to the rest of the methods and field. The LCOM4 metric gives us a value of 2 for this class because 2 distinct groups exist.

Now, a class with a LCOM4 value one 1 will look more like this

So, does this mean that the class in diagram 1 have 2 responsibilities and the one in diagram 2 only 1? Not necessarily. Some types of classes like pure domain objects may have only properties exposing the attributes of the class without been connected. We need other indicators to asses that classes may possess multiple responsibilities.

The Cyclomatic Complexity metric gives us the number of execution paths in a method. Code constructs like if, for, foreach, switch and while all generate at least one extra path of execution and contribute to code complexity. A class with high cyclomatic complexity value means that it is actually doing something more than exposing attributes.

So, we should look for classes with a high value of Lack of Cohesion of Methods and high Cyclomatic Complexity. Usually if you know your code base a little bit you should not be really surprised by the top result of that list. They are the ones you need to change for every new feature, the ones where bugs are found periodically and the ones who does not have unit tests around them.

I hope now that you understand what the advantages of more little classes are over a few big ones. That in the long run, it is worth our time to think about the OO design of the system to reduce complexity, responsibility coupling and improve the testability of our code.

The Never Ending Journey

Thursday, June 30, 2011

Resharper 6 is out!

Saturday, May 7, 2011

SharpSvn

Thursday, April 7, 2011

Single Responsibility Principle

Responsibility Driven Design

Dealing with a large code base

Sunday, February 20, 2011

SOLID principles at .Net Montreal Community