
Beware NuGet’s Filename Encoding!


The other day, I was troubleshooting some issues that had occurred on a deployment of some code to a test server.  Large parts of the application were simply not working after deployment, however, the (apparently) same set of code and files worked just fine on my local development machine.

After much digging, the problem was finally discovered as being how NuGet handles and packages files with non-standard characters in the filename.

It seems that NuGet will URL-encode (percent-encode) certain characters within filenames, such as spaces, @ symbols etc.  This is usually not a problem as NuGet itself will correctly decode the filenames again when extracting (or installing) the package, so for example, a file named:

READ ME.txt

within your solution will be encoded inside the .nupkg file as:

READ%20ME.txt

And once installed / extracted again using NuGet, it will get its original filename back.  However, there’s a big caveat around this.  We’re told that NuGet’s nupkg files are “just zip files” and that simply renaming the file to have a .zip extension rather than a .nupkg extension allows the file to be opened using 7-Zip or any other zip archive tool.  This is all fine, except that if you extract the contents of a nupkg file using an archiving utility like 7-Zip, any encoded filenames will retain their encoding and will not be renamed back to their original, correct filenames!

It turns out that my deployment included some manual steps, one of which was the manual extraction of a nupkg file using 7-Zip.  It also turns out that my nupkg package contained files with @ symbols and spaces in some of the filenames.  These files were critical to the functioning of the application, and when they were manually extracted from the package, their filenames were left in an encoded format, meaning the application could not load them as it was looking for the files under their correct (non-encoded) filenames.
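If you do find yourself in this situation, one quick workaround is to URL-decode the extracted filenames yourself.  The following is a minimal sketch of my own (it’s not part of NuGet or 7-Zip) that uses .NET’s Uri.UnescapeDataString to rename any percent-encoded files in an extracted folder:

using System;
using System.IO;

class FixNuGetFilenames
{
    static void Main(string[] args)
    {
        // Folder into which the .nupkg contents were manually extracted.
        var extractedFolder = args.Length > 0 ? args[0] : ".";

        foreach (var path in Directory.GetFiles(extractedFolder, "*", SearchOption.AllDirectories))
        {
            var encodedName = Path.GetFileName(path);

            // Decode percent-encoded characters (e.g. %20 -> space, %40 -> @).
            var decodedName = Uri.UnescapeDataString(encodedName);

            if (decodedName != encodedName)
            {
                Console.WriteLine($"Renaming '{encodedName}' to '{decodedName}'");
                File.Move(path, Path.Combine(Path.GetDirectoryName(path), decodedName));
            }
        }
    }
}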


Setting up Jenkins on Windows with Git, Mercurial and SSH


This guide will detail the steps required to correctly set up and configure Jenkins on Windows using both Git and Mercurial as the version control tools, and using SSH with both in order to authenticate with repositories hosted on the BitBucket service.

  • Download and install Jenkins, Git & Mercurial to their default locations. Ensure you get the 64-bit versions of all of these tools.
  • First, we need to create an SSH key pair, using OpenSSH which comes bundled with Git, that will allow Git to communicate with Bitbucket via SSH.
  • Next, we’ll configure OpenSSH (which is used by Git), so follow the steps under the section “Set Up SSH for Git” from here: https://confluence.atlassian.com/bitbucket/set-up-ssh-for-git-728138079.html
  • This primarily involves creating a new SSH keypair from a Git Bash shell, using ssh-keygen, ensuring the resulting keys are stored in your user’s “home” directory (which on Windows is usually C:\Users\xxxxx\ - where xxxxx is your logged-on Windows username) within a .ssh directory, and that you have a config file within that same folder that tells Git/SSH which key to use for a specific host, for example:
    Host bitbucket.org
    IdentityFile ~/.ssh/<privatekeyfile>
  • Next, we need to configure Mercurial. Since Mercurial is a more “windows-y” tool, by default it wants to use PuTTY (and its related tools, Plink and Pageant); however, we’re going to tell Mercurial to use OpenSSH instead. Normally, you would edit a mercurial.ini file inside the installation folder of TortoiseHg (usually C:\Program Files\TortoiseHg\), however, this didn’t seem to work for me, as Hg insisted on pulling the required config from a different file which constantly overrode anything I had set in the mercurial.ini file! The file that is likely to need to be edited is C:\Program Files\TortoiseHg\hgrc.d\Paths.rc. Within this file, you’ll need to add or amend the [ui] section to configure ssh:
    [ui]
    ssh = ssh -2 -C -x
  • Note that the above assumes that the path to the SSH executable (which, since it’s installed with Git, is usually C:\Program Files\Git\usr\bin) is included in the PATH environment variable for the SYSTEM (not for a specific user). If it isn’t, you’ll need to add it. Open Control Panel, go to “System”, click “Advanced System Settings” in the left-hand menu, then click the “Environment Variables” button on the resulting dialog. Remember to edit the System variables, not the ones for your user.
  • Jenkins, when installed on Windows, is by default configured to run as a Windows Service. The service is configured to run under the Local System account. As a result of this, Mercurial will invoke OpenSSH in this context, and so OpenSSH will now look to the Local System account’s home directory for its SSH keys (Git seems perfectly happy looking for the keys in the user’s home folder). So, take the entire .ssh folder from the user’s home folder (the same folder as used earlier when creating the SSH keys initially) and copy it to the Local System account’s home folder. Where is this? Well, it’s not within the C:\Users\ area, not even as a hidden/system folder, oh no, that would be too logical. So, Windows, in its infinite wisdom, decides to place the Local System account’s home folder here: C:\Windows\System32\config\systemprofile\
  • After this, you should be able to start Jenkins and add some build jobs. When creating the jobs, you’ll set the relevant Source Code Management to either Git or Mercurial, and you’ll specify an ssh:// protocol address for the Repository URL. Note that you do NOT need to specify anything within the Credentials section here (i.e. leave it set to its default value of --none--), as when Jenkins “shells out” to the relevant SCM tool, that tool will be given an ssh:// address to connect to and that tool’s own configuration determines how it accesses ssh:// URLs. It’s that configuration that will provide the necessary credentials to connect to the remote repository. Some example repository URLs are shown below.
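For reference, the Bitbucket repository URLs I’m referring to look something like the following (the account and repository names here are obviously just placeholders):

ssh://git@bitbucket.org/youraccount/yourrepo.git (for a Git repository)
ssh://hg@bitbucket.org/youraccount/yourrepo (for a Mercurial repository)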

DDD 11 In Review


This past Saturday 3rd September 2016, the 11th DDD (DeveloperDeveloperDeveloper) conference was held at Microsoft’s UK HQ in Reading.  Although I’ve been to a number of DDD events in recent years, this was my first time at the original DDD event (aka Developer Day aka DDD Reading) which spawned all of the other localised DDD events.

After travelling the evening before and staying overnight in a hotel in Swindon, I set off bright and early to make the 1 hour drive to Reading.  After arriving and checking in, collecting my badge along the way, it was time to grab a coffee and one of the hearty breakfast butties supplied.  Coffee and sausage sandwich consumed, it was time to familiarise myself with the layout of the rooms.  There were 4 parallel tracks of talks, and there had also been a room change from the printed agendas that we received upon checking in.  After finding the new rooms and consulting my agenda sheet, it was time for me to head off to the first talk of the day.  This was Gary Short’s “How to make your bookie cry”.

With a promise of showing us all how to make money on betting exchanges and also how to “beat the bookie”, Gary’s talk was an interesting proposition and commanded a full room of attendees.  Gary’s session is all about machine learning and how data science can help us do many things, including making predictions on horse races in an attempt to beat the bookie.  Gary starts by giving the fundamental steps of machine learning – Predict – Measure – Analyze – Adjust.  But, we start with measure as we need some data to set us off on our way.

Gary states that bookie odds in the UK are expressed as fractions and that this hides the inherent probabilities of each horse winning in a given race.  Bookies will ultimately make a profit on a given race as the implied probabilities of all of the horses add up to more than 1!  So, we can beat the bookie if we build a better data model.  We do this with data.  We can purchase horse racing data, which means we’re already at a loss given the cost of the data, or we can screen scrape it from a sports website, such as BBC Sport.  Gary shows us a demo of some Python code used to scrape the data from the BBC website.  He states that Python is one of two “standard” languages used within Data Science, the other language being R.  After scraping a sufficiently sized dataset over a number of days, we can analyze that data by building a Logistic Regression Model.  Gary shows how to use the R language to achieve this, ultimately giving us a percentage likelihood of a given horse winning a new race based upon its past results, its weight and the jockey riding it.
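As a quick worked illustration of that margin (my own numbers, not Gary’s): fractional odds of a/b imply a probability of b/(a+b), so a two-horse race priced at evens (1/1) and 4/6 gives:

P(horse 1) = 1 / (1 + 1) = 0.50
P(horse 2) = 6 / (4 + 6) = 0.60
Total implied “probability” = 1.10

That extra 0.10 is the bookie’s built-in margin (the “overround”).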

Gary next explains a very important consideration within Data Science known as The Turkey Paradox.  You’re a turkey on a farm, and you have to decide whether today is the day you get fed or the day you go to market.  If your data model only has the data points of being fed at 9am for the last 500 days, you’ll never be able to predict if today is the day you go to market - as it’s never happened before.  There is a solution to this - it’s called Active Learning or Human in the Loop learning.   But.  It turns out humans are not very good at making decisions.

Gary next explains the differences between System 1 and System 2 thinking.  System 2 thinking is deliberate - you think first and then consciously make the action.  System 1 is reflexive - when you put your hand on a hot plate, you pull it away without even thinking.  It uses less of the brain.  System 1 is our “lizard brain” from the days when we were cavemen.  And it takes precedence over System 2.  Gary talks about the types of System 1 thinking.  There’s Cognitive Dissonance – holding onto a belief in the face of mounting contrary evidence.  Another is bait-and-switch – substituting a less favourable option after being “baited” with a more favourable one, and yet another type is the “halo effect” – beautiful things are believed to be desirable.  We need to ensure that, when using human-in-the-loop additions to our data model, we don’t fall foul of these problems.

Next, we explore Bayes’ theorem - a theorem describing how the conditional probability of each of a set of possible causes for a given observed outcome can be computed from knowledge of the probability of each cause and the conditional probability of the outcome given each cause.  Gary uses this theorem over our horse racing data model to demonstrate Bayesian inference, using prior probabilities to predict future ones.  This is using the raw scraped data, with no human-in-the-loop additions, but we can add our own additions which become prior probabilities and can be used to compute further probabilities using Bayes’ theorem.
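For reference, the theorem itself is the familiar formula (writing H for a hypothesis/cause and D for the observed data/outcome):

P(H | D) = P(D | H) * P(H) / P(D)

In other words, a prior belief P(H) is updated by how likely the observed data would be if that belief were true.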

Gary concludes that, once we’ve acquired, trained and analyzed our data model, we can beat the bookie if our odds are shorter than the bookie’s.  Another way is not to beat the bookie at all!  We can make money simply by beating other gamblers.  We can do this using betting exchanges - backing and laying bets and getting other gamblers to bet against your prediction of the outcome of an event.  Finally, you can also profit from “trading arbitrage” – whereby the clever placing of bets when two different bookies have the same event outcome at two different odds can produce a profit from the difference between those odds.
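Again using my own illustrative numbers rather than Gary’s: if bookie A offers decimal odds of 2.10 on one outcome and bookie B offers 2.10 on the opposite outcome, the implied probabilities sum to 1/2.10 + 1/2.10 ≈ 0.95, which is less than 1, so an arbitrage exists.  Staking £100 with each bookie costs £200 in total but guarantees a return of £100 x 2.10 = £210 whichever way the event goes - a £10 profit.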

After a short coffee break, it was onto the second session of the day, which was Ali Kheyrollahi’s “Microservice Architecture at ASOS”.  Ali first explains the background of the ASOS company where he works.  They’re a Top 35 online retailer, within the Top 10 of online fashion retailers, they have a £1.5 billion turnover and, for their IT, they process around 10000 requests per second.  Ali states that ASOS is at its core a technology company, and it’s through this that they succeed with IT – you’ve got to be a great tech business, not just a great tech function.  Tech drives the agenda and doesn’t chase the rest of the business.

Ali asks “Why Microservices?” and states that it’s really about scaling the people within the business, not just the tech solution.  Through decoupling the entire solution, you decentralise decision making.  Core services can be built in their own tech stack by largely independent teams.  It allows fast and frequent releases and deployments of separate services.  You reduce the complexity of each service although, Ali does admit, you increase the complexity of the overall solution.

The best way to achieve all of this is through committed people. Ali shows a slide which mentions the German army’s “Auftragstaktik”, which is a method of commanding in which the commander gives subordinate leaders a specific mission, a timescale of achievement and the forces required to meet the goal; however, the individual leaders are free to engage their own subordinates’ services as they see fit.  It’s about telling them how to think, not what to think.  He also shares a quote from “The Little Prince” that embodies this thinking: “If you wish to build a ship, do not divide the men into teams and send them to the forest to cut wood. Instead, teach them to long for the vast and endless sea.”  If you wish to succeed with IT, and Microservices in particular, you have to embrace this culture.  Ali states that a “triangle” of domain modelling, people and a good operating model really all equals successful architecture.

Ali hands over to his colleague Dave Green who talks about how ASOS, like many companies, started with a legacy monolithic system.  And like most others, they had to work with this system as it stood – they couldn’t just throw it out and start over again; it was, after all, handling nearly £1 billion in transactions per year.  However, despite fixing some of the worst performance problems of the monolithic system, they ultimately concluded that it would be easier and cheaper to build a new system than to fix the old one.  Dave explains how they have a 2 tier IT system within the company – there’s the enterprise domain and the digital domain.  The enterprise domain is primarily focused on buying off-the-shelf software to run the Finance, HR and other aspects of the business.  They’re project-led.  Then there’s the digital domain, much more agile, product-led and focused on building solutions rather than buying them.

Ali states how ASOS is a strategic partner with Microsoft and is heavily invested in cloud technology, specifically Microsoft’s Azure platform.  He suggests that ASOS may well be the largest single Azure user this side of the Atlantic ocean!  He talks about the general tech stack, which is C#, using TeamCity for building and Octopus Deploy for deployment.  There’s also lots of other tech used, however, and other teams are making use of Scala, R, and other languages where appropriate.  The database stack is primarily SQL Server, but they also use Redis and MongoDB.

Ali talks about one of the most important parts of building a distributed microservice based solution – the LMA stack – that’s Logging, Monitoring and Alerting.  All microservices are built to adhere to some core principles.  All queries and commands use an HTTP API, but there are no message brokers or ESB-style pseudo microservices.  They exist outside of the services, but never inside.  For the logging, Ali states how logging is inherent within all parts of every service; however, they do most logging and instrumentation whenever there is any kind of I/O – network, file system or database reads and writes.  As part of their logging infrastructure, they use Woodpecker, which is a queue and topic monitoring solution for Azure Service Bus.

All of the logs and Woodpecker output are fed into a log collector and processor.  They don’t use LogStash for this, which is a popular component, but instead use ConveyorBelt.  This plays better with Azure and some of the Azure-specific implementation and storage of certain log data.  Both LogStash and ConveyorBelt, however, have the same purpose – to quickly collect and push log data to ElasticSearch.  From here, they use the popular Kibana product to visualise that data.  So rather than an ELK stack (ElasticSearch, LogStash, Kibana), it’s an ECK stack (ElasticSearch, ConveyorBelt, Kibana).

Ali concludes his talk by discussing lessons learnt.  He says, if you’re in the cloud - build for failure as the cloud is a jungle!  Network latency and failures add up so it’s important to understand and optimize the time from the user to the data.  With regard to operating in the cloud in general, ignore the hype - trust no one.  Test, measure, adopt/drop, monitor and engage with your provider.  It’s difficult to manage platform costs, so get automation and monitoring of the cloud infrastructure in place to prevent developers creating erroneous VMs that they forget to switch off!  Finally, distributed computing is hard, and geo-distribution is even harder.  Expect to roll up your sleeves. Maturity in areas can be low and is changing rapidly.

After Ali’s talk there was another coffee break in the communal area before we all headed off to the 3rd session of the day.  For me, this was Mark Rendle’s “Somewhere over the Windows”.  Mark’s talk revolved around .NET Core and its ability to run cross-platform.  He opened by suggesting that, being the rebel he is, he thought he’d come to Microsoft UK HQ and give a talk about how to move away from Windows and onto a Linux OS!

Mark starts by saying that Windows is great, but a lot of the intrinsic parts of Windows that we use as developers, such as IIS and .NET, are far too deeply tied to a specific version of Windows.  Mark gives the example that IIS has only just received support for HTTP/2, but that it’s only the version of IIS contained within the not-yet-released Windows Server 2016 that’ll support it.  He says that, unfortunately, Windows gets stuck in a rut for around 4 years at a time, and every 4 years Microsoft’s eco-system has to try to catch up with everybody else with a new version of Windows.

.NET Core will help us as developers to break away from this getting stuck in a rut.  .NET Core runs on Windows, Linux and Mac OSX.  It’s self-contained so that you can simply ship a folder containing your application’s files and the .NET Core runtime files, and it’ll all “just work”.  Mark mentions ASP.NET Core, which actually started the whole “core” thing at Microsoft; they then decided to go for it with everything else.  ASP.NET Core is a ground-up rewrite, merges MVC and Web API into a unified whole and has its own built-in web server, Kestrel, which is incredibly fast.  Mark says how his own laptop has now been running Linux Mint for the last 1.5 years and how he’s been able to continue being a “.NET developer” despite not using Windows as his main, daily OS.

Mark talks about how, in this brave new world, we’re going to have to get used to the CLI – Command Line Interface.  Although some graphical tooling exists, the slimmed down .NET Core will take us back to the days of developing and creating our projects and files from the CLI.  Mark says he uses Guake as his terminal of choice on his Linux Mint install.  Mark talks about Yeoman - the scaffolding engine used to bootstrap ASP.NET Core projects.  It’s a node package, and Mark admits that pretty much all web development these days, irrespective of platform, is dependent on node and its npm package manager.  Even Microsoft’s own TypeScript is a node package.  Mark shows creating a new ASP.NET Core application using Yeoman.  The Yeoman script creates the files/folders, runs a dotnet restore command to restore NuGet packages then runs a bower restore to restore front-end (i.e. JavaScript) packages from Bower.

Mark says that tooling was previously an issue with developing on Linux, but it’s now better.  There’s Visual Studio 2015 Update 3 for Windows only, but there’s also Project Rider and Xamarin Studio, which can run on Linux and in which .NET Core code can be developed.  For general editors, there’s VS Code, Atom, Sublime Text 3, Vim or Emacs! VS Code and Atom are both based on Electron.

Mark moves on to discuss logging in an application.  In .NET Core it’s a first class citizen as the framework contains a LoggerFactory.  It’ll write to STDOUT and STDERR and therefore it works equally well on Windows and Linux. This is an improvement over the previous types of logging we could achieve, which would often result in writing to Windows-only log stores (for example, the Windows Event Log).
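As a rough illustration of what that looks like (the exact registration API has shifted between .NET Core versions, so treat this as a sketch rather than gospel), console logging via Microsoft.Extensions.Logging might look like this:

using Microsoft.Extensions.Logging;

class Program
{
    static void Main()
    {
        // Create a logger factory that writes to the console (STDOUT),
        // which behaves identically on Windows and Linux.
        using var loggerFactory = LoggerFactory.Create(builder => builder.AddConsole());
        var logger = loggerFactory.CreateLogger<Program>();

        logger.LogInformation("Application starting up");
        logger.LogError("Something went wrong");
    }
}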

Next, Mark moves on to discuss Docker.  He says that the ability to run your .NET Core apps on a lightweight and fast web server such as NGINX, inside a Docker container, is one of the killer reasons to move to and embrace the Linux platform as a .NET developer.  Mark first gives the background of “what is Docker?”  Containers are like small, lightweight VMs (Virtual Machines). The processes within them run on the host OS, but they’re isolated from other processes in other containers.  Docker containers use a “layered” file system.  What this means is that Docker “images”, which are the blueprints for a container instance, can be layered on top of each other.  So, we can get NGINX as a Docker image - which can be a “base” image - upon which you can “layer” additional images of your own, so your web application can be a subsequent layered image, which together form a single running instance of a Docker container, and you get a nice preconfigured NGINX instance from the base container for free!  Microsoft even provide a “base” image for ASP.NET Core which is based upon Debian 8. Mark suggests using jwilder/nginx-proxy as the base NGINX image.  Mark talks about how IIS is the de-facto standard web server for Windows, but nowadays, NGINX is the de-facto standard for Linux.  We need to use NGINX as Kestrel (the default web server for ASP.NET Core) is not a production web server and isn’t “hardened”.  NGINX is a production web server, hardened for this purpose.

To prevent baking configuration settings into the Docker image (say, database connections) we can use Docker Compose.  This allows us to pass in various environment settings at the time when we run the Docker container.  It uses YAML.  It also allows you to easily specify the various command line arguments that you might otherwise need to pass to Docker when running an image (i.e. -p 5000:5000 - which binds port 5000 in the Docker container to port 5000 on the host).
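A minimal docker-compose.yml along these lines might look something like the following (the image name and connection string are purely illustrative, and the double-underscore syntax is just the ASP.NET Core convention for hierarchical configuration keys in environment variables):

version: '2'
services:
  web:
    image: mycompany/myaspnetcoreapp    # hypothetical application image
    ports:
      - "5000:5000"                     # equivalent to docker run -p 5000:5000
    environment:
      - ConnectionStrings__Default=Server=db;Database=myapp;User Id=app;Password=example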

Mark then shows us a demo of getting an ELK stack (ElasticSearch, LogStash & Kibana) up and running.  The ASP.NET Core application can simply write its logs to its console, which on Linux, is STDOUT.  There is then a LogStash input processor, called Gelf, that will grab anything written to STDOUT, process it and store it within LogStash.  This is then immediately visible to Kibana for visualisation!

Mark concludes that, ultimately, the main benefits of the “new way” with .NET and ASP.NET Core are the same as the fundamental benefits of the whole Linux/Unix philosophy that has been around for years.  Compose your applications (and even your OS) out of many small programs that are designed to do only one thing and to do it well.

After Mark’s session, which slightly overran, it was time for lunch.  Lunch at DDD 11 was superb.  I opted for the chicken salad rather than a sandwich, and very delicious (and filling) it was too, with a large portion of chicken contained within.  This was accompanied by some nice crisps, a chocolate bar, an apple and some flavoured water to wash it all down with!

I ate my lunch on the steps just outside the building, however, the imminently approaching rain soon started to fall and it put a stop to the idea of staying outside in the fresh air for very long!  That didn’t matter too much as not long after we’d managed to eat our food we were told that the ubiquitous “grok talks” would be starting in one of the conference rooms very soon.

I finished off my lunch and headed towards the conference room where the grok talks were being held.   I was slightly late arriving to the room, and by the time I had arrived all available seating was taken, with only standing room left!  I’d missed the first of the grok talks, given by Rik Hepworth about Azure Resource Templates; however, I’d seen a more complete talk given by Rik about the same subject at DDD North the previous year.   Unfortunately, I also missed most of the following grok talk by Andrew Fryer which discussed Power BI, a previously stand-alone product that is now hosted within Azure.

I did catch the remaining two grok talks, the first of which was Liam Westley’s “What is the point of Microsoft?”  Liam’s talk is about how Microsoft is a very different company today to what it was only a few short years ago.  He starts by talking about how far Microsoft has come in recent years, and how many beliefs today are complete reversals of previously held positions – one major example of this is Microsoft’s attitude towards open source software.  Steve Ballmer, the previous Microsoft CEO, famously stated that Linux was a “cancer”; however, the current Microsoft is embracing Linux on both Azure and for its .NET development tools.  Liam states that Microsoft’s future is very much in the cloud, and that they’re investing heavily in Azure.  Liam shows some slides which acknowledge that Amazon has the largest share of the public cloud market (over 50%) whilst Azure only currently has around 9%, but that this figure is growing all the time.  He also talks about how Office 365 is a big driver for Microsoft’s cloud and that we should just accept that Office has “won” (i.e. better than LibreOffice, OpenOffice etc.).  Liam wraps up his quick talk with something rather odd – a slide that shows a book about creating cat craft from cat hair!

The final grok talk was by Ben Hall, who introduced us very briefly to an interesting website that he’s created called Katacoda.  The website is an interactive learning platform and aims to help developers learn all about new and interesting technologies from right within their browser!  It allows developers to test out and play with a variety of new technologies (such as Docker, Kubernetes, Git, CoreOS, CI/CD with Jenkins etc.) right inside an interactive, browser-based CLI!  He says it’s completely free and that they’re improving the number of “labs” being offered all the time.

After the grok talks, there was a little more time to grab some refreshments prior to the first session of the afternoon, and penultimate session of the day, João “Jota” Pedro Martins’ “Azure Service Fabric and the Actor Model”.  Jota’s session is all about Azure Service Fabric, what it is and how it can help you with distributed applications in the cloud.  Azure Service Fabric is a PaaS v2 (Platform As A Service) which supports both stateful and stateless services using the Actor Model.  It’s a platform for applications that are “born in the cloud”.  So what is the Actor Model?  Well, it’s a model of concurrent computation that treats “actors” – which are distinct, independent units of code – as the fundamental, core primitives of an application.  An application is composed of numerous actors, and these actors communicate with each other via messages rather than method calls.  Azure Service Fabric is built into Azure, but it’s also downloadable for free and can be used not only within Microsoft’s Azure cloud, but also inside the clouds of other providers too, such as Amazon’s AWS.  Azure Service Fabric is battle hardened, and has Microsoft’s long-standing “Project Orleans” at its core.

The “fabric” part of the name is effectively the “cluster” of nodes that run as part of the Service Fabric framework; this is usually based upon a minimum configuration of 1 primary node with at least 2 secondary nodes, but can be configured in numerous other ways.  The application’s “actors” run inside these nodes and communicate with each other via message passing.  Nodes are grouped into replica sets and will balance load between themselves and fail over from one node to another if a node becomes unresponsive, taking “votes” upon who the primary node will be when required.  Your microservices within Service Fabric can be any executable process that you can run, such as an ASP.NET website, a C# class library, a NodeJS application or even a Java application running inside a JVM.  Currently Azure Service Fabric doesn’t support Linux, but support for that is being developed.

Your microservices can be stateless or stateful.  Stateless services are simpler as there’s no state to store, so messages consumed by the service are self-contained.  Stateful services can store state inside of Service Fabric itself, and Service Fabric will take care of making sure that the state data stored is replicated across nodes, ensuring availability in the event of a node failure.  Service Fabric clusters can be upgraded with zero downtime; you can have part of the cluster responding to messages from a previous version of your microservice whilst other parts of the cluster, those that have already had the microservices upgraded to a new version, can process messages from your new microservice versions.  You can create a simple 5 node cluster on your own local development machine by downloading Azure Service Fabric using the Microsoft Web Platform Installer.

Jota shows us a quick demo, creating a Service Fabric solution within Visual Studio.  It has 2 projects within the solution: one is the actual project for your service and the other project is effectively metadata to help Service Fabric know how to instantiate and control your service (i.e. how many nodes within the cluster etc.).  Service Fabric exposes a Reliable Services API and built on top of this is a Reliable Actors API.  It’s by implementing the interfaces from the Reliable Actors API that we create our own reliable services.  Actors operate in an asynchronous and single-threaded way.  Actors effectively act as singletons. Requests to an actor are serialized and processed one after the other and the runtime platform manages the lifetime and lifecycle of the individual actors.  Because of this, the whole system must expect that messages can be received by actors in a non-deterministic order.

Actors can implement timers (i.e. perform some action every X seconds), but “normal” timers will die if the actor on a specific node dies and has to fail over to another node.  You can use an IActorReminder type reminder, which effectively allows the same timer-based action processing but will survive and continue to work if an actor has to fail over to another node.  Jota reminds us that the Actor Model isn’t always appropriate to all circumstances and types of application development; for example, if you have some deep, long-running logic processing that must remain in memory with lots of data and state, it’s probably not suited to the Actor Model, but if your processing can be broken down into smaller, granular chunks which can handle and process the messages sent to them in any arbitrary order, and you want to maximize easy scalability of your application, then actors are a great model.  Remember, though, that since actors communicate via messages – which are passed over the network – you will have to contend with some latency.

Service Fabric contains an ActorProxy class.  The ActorProxy will retry failed sent messages, but there are no “at-least-once” delivery guarantees - if you wish to ensure this, you’ll need to ensure your actors are idempotent and can receive the same message multiple times.  It’s also important to remember that concurrency is only turn-based; actors process messages one at a time in the order they receive them, which may not be the order they were sent in.  Jota talks about the built-in StateManager class of Service Fabric, which is how Service Fabric deals with persisting state for stateful services.  The StateManager has GetStateAsync and SetStateAsync methods which allow stateful actors to persist any arbitrary state (so long as it’s serializable).  One interesting observation of this is that the state is only persisted when the method that calls SetStateAsync has finished running. The state is not persisted immediately upon calling the SetStateAsync method!
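To give a flavour of what this looks like in code – and this is only a rough sketch based on my own understanding of the Reliable Actors API, not anything Jota showed – a stateful actor using the StateManager might look something like this:

using System.Threading.Tasks;
using Microsoft.ServiceFabric.Actors;
using Microsoft.ServiceFabric.Actors.Runtime;

// The actor's contract - clients talk to the actor via this interface (through an ActorProxy).
public interface IShoppingCartActor : IActor
{
    Task AddItemAsync(string productId, int quantity);
    Task<int> GetItemCountAsync(string productId);
}

// Stateful actor implementation. State is persisted and replicated by Service Fabric.
[StatePersistence(StatePersistence.Persisted)]
internal class ShoppingCartActor : Actor, IShoppingCartActor
{
    public ShoppingCartActor(ActorService actorService, ActorId actorId)
        : base(actorService, actorId)
    {
    }

    public async Task AddItemAsync(string productId, int quantity)
    {
        // Read any existing quantity for this product (0 if we've never seen it before).
        var current = await StateManager.GetOrAddStateAsync(productId, 0);

        // Note: the new state is only guaranteed to be persisted once this method completes.
        await StateManager.SetStateAsync(productId, current + quantity);
    }

    public Task<int> GetItemCountAsync(string productId)
    {
        return StateManager.GetOrAddStateAsync(productId, 0);
    }
}

A client would then obtain a proxy to a particular actor instance (identified by an ActorId) via the ActorProxy class and simply call the interface methods, with the message passing handled by the runtime.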

Finally, Jota wraps up his talk with a brief summary.  He mentions how Service Fabric actors have behaviour and (optionally) state, are run in a performant, enterprise-ready, scalable environment and are especially suited to web session state, shopping carts or any other scenario with independent objects that have their own lifetime, state and behaviour.  He does say that existing applications would probably need significant re-architecture to take advantage of Service Fabric, and that the Service Fabric API has some niggles which could be improved.

After João’s session, there was time for one final quick refreshment break, which included a table full of various crisps, fruit and chocolate which had been left over from the excess lunches earlier in the afternoon, as well as a lovely selection of various individually-wrapped biscuits!

Before long it was time for the final session of the day; this was Joseph Woodward’s “Building Rich Client Applications with AngularJS2”.

Joe’s talk first takes us through the differences between AngularJS 1 and 2.  He states that, when AngularJS 1 was first developed back in 2010, there wasn’t even any such thing as NodeJS!  AngularJS 1 was great for its time, but did have its share of problems.  It was written before ECMAScript 6/2015 was a de-facto standard in client-side scripting, therefore it couldn’t benefit from classes, modules, promises or web components.  Eventually, though, the world changed and with both the introduction and ratification of ECMAScript 6 and the introduction of NodeJS, client side development was pushed forward massively.  We now had module loaders, and a component-driven approach to client-side web development, manifested by frameworks such as Facebook’s React that started to push the idea of uni-directional data flow.

Joe mentions how, with the advent of Angular2, its entire architecture is now component based.  It’s simpler too, so the controllers, scopes and directives of Angular1 are all now replaced with Components in Angular2, and the Services and Factories of Angular1 are now just Services in Angular2.  It is much more modular and has first class support for mobile, the desktop and the web, being built on top of the ECMAScript 6 standard.

Joe mentions how Angular2 is written in Microsoft’s TypeScript language, a superset of JavaScript, that adds better type support and other benefits often found in more strongly-typed languages, such as interfaces.  He states that, since Angular2 itself is written in TypeScript, it’s best to write your own applications, which target Angular2, in TypeScript too.  Doing this allows for static analysis of your code (thus enforcing types etc.) as well as elimination of dead code via tree shaking which becomes a huge help when writing larger-scale applications.

Joe examines the Controller model used in Angular1 and talks about how controllers could communicate arbitrarily with pretty much any other controller within your application.  As your application grows larger, this becomes problematic as it becomes more difficult to reason about how events are flowing through your application.  This is especially true when trying to find the source of code that performs UI updates as these events are often cascaded through numerous layers of controllers.  In Angular2, however, this becomes much simpler as the component structure is that of a tree.  The tree is evaluated starting at the top and flowing down through the tree in a predictable manner.

In Angular2, Services take the place of the Services and Factories of Angular1 and Joe states how they’re really just JavaScript classes decorated with some additional attributes.  Joe further discusses how the very latest Release Candidate version of Angular2, RC6, has introduced the @NgModule directive.  NgModules allow you to build your application by acting as a container for a collection of services and components.  These are grouped together to form the module, from which your application can be built as a collection of one or more modules.  Joe talks about how components in Angular2 can be “nested”, allowing one parent component to contain the definition of further child components.  Data can flow between the parent and child components and this is all encapsulated from other components “outside”.

Next, Joe shows us some demos using a simple Angular2 application which displays a web page with a textbox and a number of other labels/boxes that are updated with the content of the textbox when that content changes.  The code is very simple for such a simple app, however, it shows how clearly defined and structured an Angular2 application can be.  Joe then changes the value of how many labels are created on the webpage to 10000 just to see how Angular2 copes with updating 10000 elements.  Although there’s some lag, as would be expected when performing this many independent updates, the performance isn’t too bad at all.

Finally, Joe talks about the future of Angular2.  The Angular team are going to improve static analysis and ensure that only used and accessible code is included within the final minified JavaScript file.  There’ll be better tooling to allow generation of much of the “plumbing” around creating an Angular2 application, as well as improvements around building and testing Angular2 applications.  Joe explains that this is a clear message that Angular2 is not just a framework, but a complete platform.  Although some developers were upset when Angular2 totally “changed the game” with no clear upgrade path from Angular1, leaving a lot of Angular1 developers feeling left out, Google insist that Angular2 is developed in such a way that it can evolve incrementally over time as web technologies evolve, so there shouldn’t be the same kind of wholesale “break from the past” re-development in the future of Angular as a platform.  Indeed, Google themselves are re-writing their AdWords product (an important product generating significant revenue for Google) using their own Dart language and using Angular2 as the platform.  And with that, Joe’s session came to an end.  He was so impressed with the size of his audience, though, that he insisted on taking a photo of us all, just to prove to his wife that he was talking to a big crowd!

After this final session of the day it was time for all the attendees to gather in the communal area for the customary “closing ceremony”.  This involved big thanks to all of the sponsors of the event as well as a prize draw for numerous goodies.  Unfortunately, I didn’t win anything in the prize draws, but I’d had a brilliant time at my first DDD in Reading.  Here’s hoping that they continue the “original” DDDs well into the future.


UPDATE: Kevin O’Shaughnessy has also written a blog post reviewing his experience at DDD 11, which is an excellent read.  Apart from the session by Mark Rendle, Kevin attended entirely different sessions to me, so his review is well worth a read to get a fuller picture of the entire DDD event.

DDD North 2016 In Review



On Saturday, 1st October 2016 at the University of Leeds, the 6th annual DDD North event was held.  After a great event last year at the University of Sunderland in the North East, this year’s event was held in Leeds, as it is now customary for the event to alternate between the two locations each year.

After arriving and collecting my badge, it was a short walk to the communal area for some tea and coffee to start the day.  Unfortunately, there were no bacon butties or Danish pastries this time around, but I’d had a hearty breakfast before setting off on the journey to Leeds anyway.

The first session of the day was Pete Smith’s “The Three Problems with Software Development”.   Pete starts by talking about Conway’s Game of Life and how this game is similar to how software development often works, producing complex behaviours from simple building blocks.  Pete says how his talk will examine some “heuristics” for software development, a sort of “series of steps” for software development best practice.

Firstly, we look at the three problems themselves.  Problem Number 1 is about dividing and breaking down software components.  Pete tells us that this isn’t just code or software components themselves, but can also relate to people and teams and how they are “broken down”.  Problem Number 2 is how to choose effective tools, processes and approaches to your software development and Problem Number 3 is effective communication.

Looking at problem number 1 in more detail, Pete talks about “reasons for change”.  He says that we should always endeavour to keep things together that need to change together.  He shows an example of two simple web pages of lists of teachers and of students.  The ASP.NET MVC view’s mark-up for both of these views is almost identical.  As developers we’d be very tempted to abstract this into a single MVC view and only alter, using variables, the parts that differ between teachers and students; however, Pete suggests that this is not a good approach.  Fundamentally, teachers and students are not the same thing, so even if the MVC views are almost identical and we have some amount of repetition, it’s good to keep them separate – for example, if we need to add specific abilities to only one of the types of teachers or students, having two separate views makes that much easier.

Next we look at how we can best identify reasons for change.  We should look at what parts of an application get deployed together, and we should also look at the domain and the terminology used – are two different domain entities referred to by the same name?  Or are there two different names for the same entity?  We should consider the “ripple effect” of change – when something changes, what else has to change?  Finally, the main thing to examine is logic vs intent.  Logic is the code and behaviour and can (and should) be refactored and reused; however, intent should never be reused or refactored (in the previous example, the teachers and students were “intents” as they represent two entirely different things within the domain).

In looking at Problem Number 2 in more detail, Pete says that we should promote good change practices.  We should reduce coupling at all layers in the application and the entire software development process, but don’t over-abstract.  For this, we need to have strong test coverage of the software itself.  Not necessarily 100% test coverage, but a good suite of robust tests.  Pete says that in large organisations we should try to align teams with the reasons for change; however, in smaller organisations, this isn’t something that you’d need to worry about too much as the team will be much smaller anyway.

Next, Pete makes the strong suggestion that MVC controllers that do very little - something generally considered to be a good thing - are “considered harmful”!  What he really means is that blanket advice is considered harmful – controllers should, generally, do as little as they need to, but they can be larger if there’s good reason for it.  When we’re making choices, it’s important not to be dogmatic.  Don’t forget about the trade-offs and don’t get taken in by the “new shiny” things.  Most importantly, when receiving advice, always remember the context of the advice.  Use the right tool for the job and always read differing viewpoints for any subject to gain a more rounded understanding of the problem.  Do test the limits of the approaches you take, learn from your mistakes and always focus on providing value.

In examining Problem Number 3, Pete talks about communication and how it’s often impaired due to the choice of language we use in software development.  He talks about using the same names and terminology for essentially different things.  For example, in the context of ASP.NET MVC, we have the notion of a “controller”, however, Angular also has the notion of a “controller” and they’re not the same thing.  Pete also states how terminology like “serverless architecture” is a misnomer as it’s not serverless and how “devops”, “agile” etc. mean different things to different people!  We should always say what we mean and mean what we say! 

Pete talks about how code is communication.  Code is read far more often than it’s written, so therefore code should be optimized for reading.  Pete looks at some forms of communication and states that things like face-to-face communication, pair programming and even perhaps instant messaging are often the better forms of communication rather than things like once-a-day stand-ups and email.  This is because the best forms of communication offer instant feedback.  To improve our code communication, we should eliminate implicit knowledge – such as not refactoring those teacher and student views into one view.  New programmers would expect to be able to find something like a TeacherList.cshtml file within the solution.  Doing this helps to improve discovery, enabling new people to a codebase to get up to speed more quickly.  Finally, Pete repeats his important point of focusing on refactoring the “logic” of the application and not the “intent”.

Most importantly, the best thing we can do to communicate better is to simply listen.  By listening more intently, we ensure that we have the correct information that we need and we can learn from the knowledge and experience of others.

After Pete’s talk it was time to head back to the communal area for more refreshments.  Tea, coffee, water and cans of coke were all available.  After suitable further watering, it was time to head back to the conference rooms for the next session.  This one was John Stovin’s “Thinking Functionally”.

John’s talk was held in one of the smaller rooms and was also one of the rooms located farthest away from the communal area.  After the short walk to the room, I made it there with only a few seconds to spare prior to the start of the talk, and it was standing room only in the room!

John starts his talk by mentioning how the leap from OO (Object-Oriented) programming to functional programming is similar to the leap from procedural programming to OO itself.  It’s a big paradigm shift!  John mentions how most of today’s imperative (non-functional) languages are designed to closely mimic the way the computer itself processes machine code in the “von Neumann” style.  That is to say that programs are really just a big series of steps with conditions and branches along the way.  Functional programming helps in the attempt to break free from this by expressing programs as pure functions – a series of functions, similar to mathematical functions, that take an input and produce an output.

John mentions how, when writing functional programs, it’s important to try your best to keep your functions “pure”.  This means that the function should have no side-effects.  For example a function that writes something to the console is not pure, since the side-effect is the output on the console window.  John states that even throwing an exception from a function is a side-effect in itself!


We should also endeavour to always keep our data immutable.  This means that we never try to assign a new value to a variable once it has already been initialized with a value – it’s a single assignment.  Write once but read many.  This helps us to reason about our data better as it improves readability and guarantees thread-safety of the data.  To change data in a functional program, we should perform an atomic “copy-and-modify” operation which creates a copy of the data,  but with our own changes applied.

In F#, most variables are immutable by default, and F# forces you to use a qualifier keyword, mutable, in order to make a variable mutable.  In C#, however, we’re not so lucky.  We can “fake” this, though, by wrapping our data in a type (class) – e.g. a Money type – and only accepting values in the type’s constructor, ensuring all properties are either read-only or at least have a private setter.  Class methods that perform some operation on the data should return a whole new instance of the type.
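A quick sketch of that idea in C# (my own example, along the lines John described, rather than code from the talk):

using System;

public sealed class Money
{
    public decimal Amount { get; }
    public string Currency { get; }

    public Money(decimal amount, string currency)
    {
        Amount = amount;
        Currency = currency;
    }

    // Operations never mutate the existing instance; they return a brand new one ("copy-and-modify").
    public Money Add(Money other)
    {
        if (other.Currency != Currency)
            throw new InvalidOperationException("Cannot add amounts in different currencies.");

        return new Money(Amount + other.Amount, Currency);
    }
}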

We move on to examine how functional programming eradicates nulls.  Variables have to be assigned a value at declaration, and due to not being able to reassign values thanks to immutability, we can’t create a null reference.  We’re stuck with nulls in C#, but we can alleviate that somewhat via the use of such techniques as the Null Object Pattern, or even the use of an Option<T> type.  John continues, saying that types are fundamental to F#.  It has real tuples and records – which are “multiplicative” types and are effectively aggregates of other existing types, created by “multiplying” those existing types together – and also discriminated unions, which are “additive” types created by “summing” other existing types together.  For example, the “multiplicative” types aggregate or combine other types – a tuple can contain two (or more) other types which are (e.g.) string and int – whereas a discriminated union, as an “additive” type, acts as the sum total of all of its constituent types, so a discriminated union of an int and a boolean can represent all of the possible values of an int AND all of the possible values of a boolean.
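An Option<T> in C# can be as simple as the following sketch (libraries such as LanguageExt provide much more complete implementations, but this shows the idea):

using System;

public readonly struct Option<T>
{
    private readonly T _value;
    public bool HasValue { get; }

    private Option(T value)
    {
        _value = value;
        HasValue = true;
    }

    public static Option<T> Some(T value) => new Option<T>(value);
    public static Option<T> None => default;

    // Callers must say what happens in both cases - there's no null to forget about.
    public TResult Match<TResult>(Func<T, TResult> some, Func<TResult> none)
        => HasValue ? some(_value) : none();
}

So instead of returning null from, say, a hypothetical FindUser method, you’d return Option<User>.None and the caller is forced to handle the “no user” case explicitly via Match.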

John continues with how far too much C# code is written using granular primitive types and that in F#, we’re encouraged to make all of our code based on types.  So, for example, a monetary amount shouldn’t be written as simply a variable of type decimal or float, but should be wrapped in a strong Money type, which can enforce certain constraints around how that type is used.  This is possible in C# and is something we should all try to do more of.  John then shows us some F# code declaring an F# discriminated union:

type Shape =
| Rectangle of float * float
| Circle of float

He states how this is similar to the inheritance we know in C#, but it’s not quite the same.  It’s more like set theory for types!

John continues by discussing pattern matching.  He says how this is much richer in F# than the kind-of equivalent if() or switch() statements in C#, as pattern matching can match based upon the general “shape” of the type.  We’re told how functional programming also favours recursion over loops.  F#’s compiler supports tail recursion, where the compiler can re-write the function to pass additional parameters on a recursive call and therefore negate the need to continually add accumulated values to the stack, which helps to prevent stack overflow problems.   Loops are problematic in functional programming as we need a variable for the loop counter, which is traditionally re-assigned with every iteration of the loop – something that we can’t do in F# due to variable immutability.

We continue by looking at lists and sequences.  These are very well used data structures in functional programming.  Lists are recursive structures: a list is either an empty list or a “head” element with another list attached to it.  We iterate over the list by taking the “head” element with each pass – kind of like popping values off a stack.  Next we look at higher-order functions.  These are simply functions that take another function as a parameter, so for example, virtually all of the LINQ extension methods found in C# are higher-order functions (i.e. .Where(), .Select() etc.) as these functions take a lambda function as a parameter.  F# has List and Seq, and the built-in functions for working with these are primarily filter and map.  These are also higher-order functions.  filter takes a predicate to filter a list and map takes a function that transforms each list element from one type to another.
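In C# terms (my own trivial example rather than one from the talk), this is just the familiar pattern of passing functions around as values:

using System;
using System.Linq;

class HigherOrderDemo
{
    static void Main()
    {
        var numbers = new[] { 1, 2, 3, 4, 5, 6 };

        // Where and Select are higher-order functions: each takes another function as a parameter.
        var evensSquared = numbers
            .Where(n => n % 2 == 0)   // predicate: int -> bool
            .Select(n => n * n);      // projection: int -> int

        Console.WriteLine(string.Join(", ", evensSquared)); // prints: 4, 16, 36
    }
}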

John goes on to mention Reactive Extensions for C# which is a library for composing asynchronous and event-based programs using observable sequences and LINQ-style query operators.  These operators are also higher-order functions and are very “functional” in their architecture.  The Reactive Extensions (Rx) allow composability over events and are great for both UI code and processing data streams.

John then moves on to discuss Railway-oriented programming.  This is a concept whereby functions accept and return a Result<TSuccess, TFailure> type which “contains” either a success value or a failure.  Functions are then composable based upon the types returned, and the execution path through the code can be modified based upon the outcome of prior functions.
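A bare-bones version of that idea in C# (heavily simplified, and my own sketch rather than anything John showed) might look like this:

using System;

public class Result<TSuccess, TFailure>
{
    public bool IsSuccess { get; }
    public TSuccess Value { get; }
    public TFailure Error { get; }

    private Result(bool isSuccess, TSuccess value, TFailure error)
    {
        IsSuccess = isSuccess;
        Value = value;
        Error = error;
    }

    public static Result<TSuccess, TFailure> Ok(TSuccess value) =>
        new Result<TSuccess, TFailure>(true, value, default);

    public static Result<TSuccess, TFailure> Fail(TFailure error) =>
        new Result<TSuccess, TFailure>(false, default, error);

    // "Bind" composes functions along the success track; failures bypass all subsequent steps.
    public Result<TNext, TFailure> Bind<TNext>(Func<TSuccess, Result<TNext, TFailure>> next) =>
        IsSuccess ? next(Value) : Result<TNext, TFailure>.Fail(Error);
}

A pipeline such as ValidateInput(request).Bind(SaveToDatabase).Bind(SendEmail) (all hypothetical functions) then only runs each later step if the earlier ones succeeded, with any failure carried through to the end of the “railway”.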

Using such techniques as Railway-oriented programming, along with the other inherent features of F#, such as a lack of null values and immutability means that frequently programs are far easier to reason about in F# than the equivalent program written in C#.  This is especially true for multi-threaded programs.

Finally, John recaps by stating that functional languages give a level of abstraction above the von Neumann architecture of the underlying machine.  This is perhaps one of the major reasons that FP is gaining ground in recent years, as machines are now powerful enough to allow this (previously, old-school LISP programs – LISP being one of the very first functional languages, originally designed back in 1958 - often used purpose-built machines to run LISP sufficiently well).  John recommends a few resources for further reading – one is the F# for Fun and Profit website.

After John’s session, it was time for a further break and additional refreshments.  Since John’s session had been in a small room, and one which was farthest away from the communal area where the refreshments were, and given that my next session was in this very same conference room, I decided that I’d stay where I was and await the next session, which was Matteo Emili’s “I Read The Phoenix Project And I Loved It. Now What?”


Matteo’s session was all about introducing a “devops” culture into somewhere that doesn’t yet have such a culture.  The Phoenix Project is a development “novel” which tells a story of doing just such a thing.  Matteo starts by mentioning The Phoenix Project book and how it’s a great book.  I must concur that the book is very good, having read it myself only a few weeks before attending DDD North.  Matteo then asks that, if we’d read the book and would like to implement its ideas in our own places of work, we should be very careful.  It’s not so simple, and you can’t change an entire company overnight, but you can start to make small steps towards the end goal.

There are three critical concepts that cause failure and a breakdown in an effective devops culture.  They are bottlenecks, lack of communication and boundaries between departments.  In order to start with the introduction of a devops culture, you need to start “out-of-band”.  This means you’ll need to do something yourself, without the backing of your team, in order to prove a specific hypothesis.  Only when you’re sure something will work should you then introduce the idea to the team.

Starting with bottlenecks, the best way to eliminate them is to automate everything that can be automated.  This reduces human error, is entirely repeatable, and importantly frees up time and people for other, more important, tasks.  Matteo reminds us that we can’t change what we can’t measure, and in the loop of “build-measure-learn”, the most important aspect is measure.  We measure by gathering metrics on our automations and our processes using logging and telemetry, and it’s only from these metrics that we will know whether we’re really heading in the right direction and what is really “broken” or needs improvement.  We should gather insights from our users as well by utilising such tools and software as Google Analytics, New Relic, Splunk & HockeyApp, for example.  Doing this leads to evidence-based management, allowing you to use real world numbers to drive change.


Matteo explains that resource utilisation is key.  Don’t bring a whole new change management process out of the blue.  Use small changes that generate big wins and this is frequently done “out-of-band”.  One simple thing that can be done to help break down boundaries between areas of the company is a company-wide “stand up”.  Do this once a week, and limit it to 1-2 minutes per functional area.  This greatly improves communication and helps areas understand each other better.  The implementation of automation and the eradication of boundaries form the basis of the road to continuous delivery. 

We should ensure that our applications are properly packaged to allow such automation.  MSDeploy is one such tool to help enable this.  It’s an old technology, having first been released around 2003, but it’s seeing a modern resurgence as it can be heavily utilised with Azure.  Use an infrastructure-as-code approach.  Virtual Machines, Servers, Network topology etc. should all be scripted and version controlled.  This allows automation.  This is fairly easy to achieve with cloud-based infrastructure in Azure by using Azure ARM, or by using AWS CloudFormation with Amazon Web Services.  Some options for achieving the same thing with on-premise infrastructure are Chef, Puppet or even PowerShell Desired State Configuration.  Databases are often neglected with regard to DevOps scenarios, however, by using version control, performing small, incremental changes to database infrastructure and using packages (such as SQL Server’s DACPAC files), Database Lifecycle Management can be brought into a DevOps/continuous delivery environment.

This brings us to testing.  We should use test suites to ensure our scripts and automation are correct, and we must remember the golden rule: if something is going to fail, it must fail fast.  Automated and manual testing should be used to ensure this.  Accountability is important, so tests are critical to the product, and remediation (recovery from failure) should be something that is also automated.

Matteo summarises: start by changing people first, then change the processes, and the tools will follow.  Remember, automation, automation, automation!  Finally, tackle the broader technical side and blend individual competencies to the real world requirements of the teams and the overall business.

After Matteo’s session, it was time for lunch.  All of the attendees reconvened in the communal area where we were treated to a selection of sandwiches and packets of crisps.  After selecting my lunch, I found a vacant spot in the corner of the rather small communal area (which easily filled to capacity once all of the different sessions had finished and all of the conference’s attendees descended on the same space) to eat it.  Since the lunch break was 1.5 hours and I’d eaten my lunch within the first 20 minutes, I decided to step outside to grab some fresh air.  It was at this point I remembered a rather excellent little pub just 2 minutes’ walk down the road from the university venue hosting the conference.  Well, never one to pass up the opportunity of a nice pint of real ale, I headed off down the road to The Pack Horse.

Once inside, I treated myself to a lovely pint of Laguna Seca from a local brewery, Burley Street Brewhouse, and settled down in the quiet pub to enjoy my pint and reflect on the morning’s sessions.  During the lunch break, there are usually some grok talks being held, which are 10-15 minute long “lightning” talks that attendees can watch whilst they enjoy their lunch.  Since DDD North was held very close to the previous DDD Reading event (only a matter of a few weeks apart), and since the organisers were largely the same for both events, I had heard that the grok talks would be largely the same as those that had taken place, and which I’d already seen, at DDD Reading only a matter of weeks prior.  Due to this, I decided the pub was a more attractive option over the lunch time break!

After slowly drinking and savouring my pint, it was time to head back to the university’s mechanical engineering department and to the afternoon sessions of DDD North 2016.

The afternoon’s first session was, luckily, in one of the “main” lecture halls of the venue, so I didn’t have too far to travel to take my seat for Bart Read’s “How To Speed Up .NET & SQL Server Apps”.

Bart’s session is all about performance: performance of our application’s code and performance of the databases that underlie our application.  Bart starts by introducing himself and states that, amongst other things, he was previously an employee of Red Gate, who make quite a number of SQL Server tools, so paying close attention to performance monitoring is something that Bart has done for much of his career.

He states that we need to start with measurement.  Without this, we can’t possibly know where issues are occurring within our application.  Surprisingly, Bart does say that when starting to measure a database-driven application, many of the worst areas are not within the code itself, and are almost always down in the database layer.  This may be from an errant query or a general lack of helpful database additions (better indexes etc.).

Bart mentions the tools that he himself uses as part of his general “toolbox” for performance analysis of an application stack.  ANTS Memory Profiler from Red Gate will help analyse memory consumption issues.  dotMemory from JetBrains is another good choice in the same area.  ANTS Performance Profiler from Red Gate will help analyse the performance of .NET code and monitor its CPU consumption.  Again, JetBrains have dotTrace in the same space.  There’s also the lesser known .NET Memory Profiler which is a good option.  For network monitoring, Bart uses Wireshark.  For general testing tools, Bart recommends BlazeMeter (for load testing) and Neustar.

Bart also stresses the importance of the ongoing usage of production monitoring tools.  Services such as New Relic, AppDynamics etc. can provide ongoing metrics for your running application when it’s live in production and are invaluable to understand exactly how your application is behaving in a production environment.

Bart shares a very handy tip regarding usage of SQL Server Management Studio for general debugging of SQL Server queries.  He states that we should always UNCHECK the SET ARITHABORT option inside SSMS’s options menu.  Doing this prevents SQL Server from aborting any queries that perform arithmetic overflows or divide-by-zero operations, meaning that your query will continue to run, giving you a much clearer picture of what the query is actually doing (and how long it takes to run).

From here, Bart shares with us four different real-world performance scenarios that he has been involved in, how he went about diagnosing the performance issues and how he fixed them.

The first scenario was helping a client’s customer support team who were struggling as it was taking them 40 seconds to retrieve one of their customer’s details from their system when on a support phone call.  The architecture of the application was an ASP.NET MVC web application in C#, using NHibernate to talk to 2 different SQL Server instances - one server was a primary and the other, a linked server.

Bart started by using ANTS Performance Profiler on the web layer and was able to highlight “hotspots” of slow running code, precisely in the area where the application was calling out to the database.  From here, Bart could see that one of the SQL queries was taking 9 seconds to complete.  After capturing the exact SQL statement that was being sent to the database, it was time to fire up SSMS and use SQL Server Profiler in order to run that SQL statement and gain further insight into why it was taking so long to run.

After some analysis, Bart discovered that there was a database View on the primary SQL Server that was pulling data from a table on the linked server.  Further, there was no filtering on the data pulled from the linked server, only filtering on the final result set after multiple tables of data had been combined.  This meant that the entire table’s data from the linked server was being pulled across the network to the primary server before any filtering was applied, even though not all of the data was required (the filtering discarded most of it).  To resolve the problem, Bart added a simple WHERE clause to the data that was being selected from the linked server’s table and the execution time of the query went from 9 seconds to only 100 milliseconds!

Bart moves on to tell us about the second scenario.   This one had a very similar application architecture as the first scenario, but the problem here was a creeping increase in memory usage of the application over time.  As the memory increased, so the performance of the application decreased and this was due to the .NET garbage collector having to examine more and more memory in order to determine which objects to garbage collect.  This examination of memory takes time.  For this scenario, Bart used ANTS Memory Profiler to show specific objects that were leaking memory.  After some analysis, he found it was down to a DI (dependency injection) container (in this case, Windsor) having an incorrect lifecycle setting for objects that it created and thus these objects were not cleaned up as efficiently as they should have been.  The resolution was to simply configure the DI container to correctly dispose of unneeded objects and the excessive memory consumption disappeared.
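The talk didn't show the actual container configuration, but as a hedged sketch of the kind of fix involved, a Castle Windsor registration's lifestyle determines how long the container holds on to the objects it creates; the IReportBuilder service below is a hypothetical stand-in:

// A minimal sketch, assuming Castle Windsor's fluent registration API.
// IReportBuilder/ReportBuilder are hypothetical; the point is that the chosen
// lifestyle determines when Windsor releases the objects it creates.
using Castle.MicroKernel.Registration;
using Castle.Windsor;

public interface IReportBuilder { }
public class ReportBuilder : IReportBuilder { }

public static class ContainerBootstrapper
{
    public static IWindsorContainer Build()
    {
        var container = new WindsorContainer();
        container.Register(
            Component.For<IReportBuilder>()
                     .ImplementedBy<ReportBuilder>()
                     .LifestylePerWebRequest()); // released at the end of each request, rather than tracked indefinitely
        return container;
    }
}

With an inappropriate lifestyle, created objects can hang around far longer than intended, which matches the steadily growing memory usage described above.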

From here, we move onto the third scenario.  This was a multi-tenanted application where each customer had their own database.  It was an ASP.NET Web application but used a custom ADO layer written in C++ to access the database.  Bart spares us the details, but tells us that the problem was ultimately down to locking, blocking and deadlocking in the database.  Bart uses this to remind us of the various concurrency levels in SQL Server.  There’s object level concurrency and row level concurrency, and when many people are trying to read a row that’s concurrently being written to, deadlocks can occur.  There are many different solutions available for this and one such solution is to use a READ COMMITTED SNAPSHOT isolation level on the database.  This uses TempDB to help “scale” the demands against the database, so it’s important that the TempDB is stored on a fast storage medium (a fast SSD drive for example).  The best solution is a more disciplined ordering of object access and this is usually implemented with a Unit Of Work pattern, but Bart tells us that this is difficult to achieve with SQL Server.

Finally, Bart tells us all about scenario number four.  The fundamental problem with this scenario was networking, and more specifically it was down to network latency that was killing the application’s performance.  The application architecture here was not a problem as the application was using Virtual Machines running on VMWare’s vSphere with lots and lots of CPU and Memory to spare.  The SQL Server was running on bare metal to ensure performance of the database layer.  Bart noticed that the problem manifested itself when certain queries were run.  Most of the time, the query would complete in less than 100ms, but occasionally spikes of 500-600ms could be seen when running the exact same query.  To diagnose this issue, Bart used Wireshark on both ends of the network, that is to say on the application server where the query originated and on the database server where the data was stored.  However, as soon as Wireshark was attached to the network, the performance problem disappeared!

This ultimately turned out to be an incorrect setting on the virtual NIC, as Bart could see that the SQL Server was sending results back to the client in only 1ms, however, it took a full 500ms for the results to arrive when measured from the client (application) side of the network link.  It was disabling the “receive side coalescing” setting that fixed the problem.  Wireshark itself temporarily disables this setting, hence the problem disappearing when Wireshark was attached.

Bart finally tells us that whilst he’s mostly a server-side performance guy, he’s made some general observations about dealing with client-side performance problems.  These are generally down to size of payload, chattiness of the client-side code, garbage collection in JavaScript code and the execution speed of JavaScript code.  He also reminds us that most performance problems in database-driven applications are usually found at the database layer, and can often be fixed with simple things like adding more relevant indexes, adding stored procedures and utilising efficient cached execution plans.

After Bart’s session, it was time for a final refreshment break before the final session of the day.  For me, the final session was Gary McClean Hall’s “DDD: the God That Failed”.

Gary starts his session by acknowledging that the title is a little clickbait-ish as his talk started life as a blog post he had previously written.  His talk is all about Domain Driven Design (DDD) and how he implemented DDD when he was working within the games industry.  Gary mentions that he’s the author of the book, “Adaptive Code via C#”, and that when he was working in the games industry, he had worked on the Championship Manager 2008 game.

Gary’s usage of DDD in game development started when there was a split between two companies involved in the Championship Manager series of games.  In the fallout of the split, one company kept the rights to the name, and the other company kept the codebase!  Gary was with the company that had the name but no code and they needed to re-create the game, which had previously been many years in development, in a very compressed timescale of only 12 months.

Gary starts with a definition of DDD.  It is modelling for complicated domains.  Gary is keen to stress the word “complicated”.  Therefore, we need to be able to identify what exactly is a complicated domain.  In order to help with this, it’s often best to create a “DDD Maturity Model” for the domain in which we’re working.  This is a series of topics which can be further expanded upon with the specifics for that topic (if any) within our domain.  The topics are:

  • The Domain
  • Domain Entity Behaviour
  • Decoupled Domain
  • Aggregate Roots
  • Domain Events
  • CQRS
  • Bounded Contexts
  • Polyglotism

By examining the topics in the list above and determining the details for those topics within our own domain, we can evaluate our domain and its relative complexity and thus its suitability to be modelled using DDD.

Gary continues by showing us a typical structure of a Visual Studio solution that purports to follow the Domain Driven Design pattern.  He states that he sees many such solutions configured this way, but it’s not really DDD and usually represents a very anaemic domain.  Anaemic domain models are collections of classes that are usually nothing more than properties with getters and setters, but little to no behaviour.  This type of model is considered an anti-pattern as they offer very low cohesion and high coupling.

If you’re working with such a domain model, you can start to fix things.  Looking for areas of the domain that can benefit from better types rather than using primitive types is a good start.  A classic example of this is a class to represent money.  Having a “money” class allows better control over the scale of the values you’re dealing with and can also encompass currency information as well.  This is preferable to simply passing values around the domain as decimals or ints.
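As an illustration of the kind of “better type” being described (a sketch under my own assumptions, not code from the talk), a small Money value type might look something like this:

// A minimal sketch of a Money value type; the currency handling and equality
// rules here are illustrative assumptions rather than prescriptions.
using System;

public sealed class Money : IEquatable<Money>
{
    public decimal Amount { get; private set; }
    public string CurrencyCode { get; private set; }

    public Money(decimal amount, string currencyCode)
    {
        if (string.IsNullOrWhiteSpace(currencyCode))
            throw new ArgumentException("A currency code is required.", "currencyCode");
        Amount = amount;
        CurrencyCode = currencyCode;
    }

    public Money Add(Money other)
    {
        // Guard against silently mixing currencies, something a bare decimal can't do.
        if (other.CurrencyCode != CurrencyCode)
            throw new InvalidOperationException("Cannot add amounts in different currencies.");
        return new Money(Amount + other.Amount, CurrencyCode);
    }

    public bool Equals(Money other)
    {
        return other != null && Amount == other.Amount && CurrencyCode == other.CurrencyCode;
    }

    public override bool Equals(object obj) { return Equals(obj as Money); }
    public override int GetHashCode() { return Amount.GetHashCode() ^ CurrencyCode.GetHashCode(); }
}

Passing a Money around the domain instead of a raw decimal keeps the currency and the amount together and gives the type somewhere to put behaviour.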

Commonly, in the type of anaemic domain model as detailed above, there are usually repositories associated with entity models within the domain, and it’s usually a single repository per entity model.  This is also considered an anti-pattern as most entities within the domain will be heavily related and thus should be persisted together in the same transaction.  Of course, the persistence of the entity data should be abstracted from the domain model itself.

Gary then touches upon an interesting subject, which is the decoupling within a DDD solution.  Our ASP.NET views have ViewModels, our domain has its Domain Models and the persistence (data) layer has its own data models.  One frequent piece of code plumbing that’s required here is extensive mapping between the various models throughout the layers of the application.  In this regard, Gary suggests reading Mark Seemann’s article, “Is layering worth the mapping?”  In this article, Mark suggests that the best way to avoid having to perform extensive mapping is to move less data around between the layers of our application.  This can sometimes be accomplished, but depending upon the nature of the application, this can be difficult to achieve.

So, looking back at the “repository-per-entity” model again, we’re reminded that it’s usually the wrong approach.  In order to determine the repositories of our domain, we need to examine the domain’s “Aggregate Roots”.  An aggregate root is the top-level object that “contains” additional other child objects within the domain.  So, for example, a class representing a Customer could be an aggregate root.  Here, the customer would have zero, one or more Order classes as children, and each Order class could have one or more OrderItems as children, with each OrderItem linking out to a Product class.  It’s also possible that the Product class could be considered an aggregate root of the domain too, as the product could be the “root” object that is retrieved within the domain, and the various order items across multiple orders for many different customers could be retrieved as part of the product’s object graph.
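To make the shape of that object graph a little more concrete, here is a rough sketch along those lines (the class and member names are illustrative assumptions based on the example above, not code from the talk), in which child objects can only be reached and modified through their aggregate root:

// A minimal sketch of a Customer aggregate root with Order and OrderItem
// children; persistence concerns are deliberately omitted.
using System;
using System.Collections.Generic;

public class Customer
{
    private readonly List<Order> _orders = new List<Order>();

    public Customer(Guid id) { Id = id; }

    public Guid Id { get; private set; }
    public IReadOnlyCollection<Order> Orders { get { return _orders; } }

    // Behaviour lives on the root; callers never add to the Orders collection directly.
    public Order PlaceOrder()
    {
        var order = new Order(Guid.NewGuid());
        _orders.Add(order);
        return order;
    }
}

public class Order
{
    private readonly List<OrderItem> _items = new List<OrderItem>();

    public Order(Guid id) { Id = id; }

    public Guid Id { get; private set; }
    public IReadOnlyCollection<OrderItem> Items { get { return _items; } }

    public void AddItem(Guid productId, int quantity)
    {
        if (quantity <= 0) throw new ArgumentOutOfRangeException("quantity");
        _items.Add(new OrderItem(productId, quantity));
    }
}

public class OrderItem
{
    public OrderItem(Guid productId, int quantity)
    {
        ProductId = productId;
        Quantity = quantity;
    }

    public Guid ProductId { get; private set; }
    public int Quantity { get; private set; }
}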

To help determine the aggregate roots within our domain, we first need to examine and determine the bounded contexts.  A bounded context is a conceptually related set of objects within the domain that will work together and make sense for some of the domain’s behaviours.  For example, the customer, order, orderitem and product classes above could be considered part of a “Sales” context within the domain.  It’s important to note that a single domain entity can exist in more than one bounded context, and it’s frequently the case that the actual objects within code that represent that domain entity can be entirely different objects and classes from one bounded context to the next.  For example, within the Sales bounded context, it’s possible that only a small subset of the product data is required, therefore the Product class within the Sales bounded context has a lot less properties/data than the Product class in a different bounded context – for example, there could be a “Catalogue” context, with the Product entity as its aggregate root, but this Product object is different from the previous one and contains significantly more properties/data.

The number of different bounded contexts you have within your domain determines the domain’s breadth.  The size of the bounded contexts (i.e. the number of related objects within it) determines the domain’s depth.  The size of a given bounded context’s depth determines the importance of that area of the domain to the user of the application.

Bounded contexts and the aggregate roots within them will need to communicate with one another in order that behaviour within the domain can be implemented.  It’s important to ensure that aggregate roots and especially bounded contexts are not coupled to each other, so communication is performed using domain events.  A domain event is an event raised by an entity within one aggregate root or bounded context and broadcast to the rest of the domain.  Other entities within other bounded contexts or aggregate roots will subscribe to the domain events that they may be interested in, in order for them to respond to actions and behaviour in other areas of the domain.  Domain events in a .NET application are frequently modelled using the built-in events and delegates functionality of the .NET framework, although there are other options available such as the Reactive Extensions library as well as specific patterns of implementation.
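As a hedged sketch of the simplest of those options - plain .NET events and delegates - raising and handling a domain event might look something like this (the OrderPlaced event and the subscribing code are hypothetical):

// A minimal sketch of a domain event using standard .NET events/delegates.
using System;

public class OrderPlacedEventArgs : EventArgs
{
    public OrderPlacedEventArgs(Guid orderId) { OrderId = orderId; }
    public Guid OrderId { get; private set; }
}

public class SalesOrder
{
    // Raised by the Sales bounded context; other contexts subscribe without
    // the two contexts referencing each other's internal types.
    public event EventHandler<OrderPlacedEventArgs> OrderPlaced;

    public Guid Id { get; private set; }

    public SalesOrder(Guid id) { Id = id; }

    public void Place()
    {
        // ... domain behaviour for placing the order would go here ...
        var handler = OrderPlaced;
        if (handler != null)
        {
            handler(this, new OrderPlacedEventArgs(Id));
        }
    }
}

// Elsewhere, e.g. in a Fulfilment bounded context:
// salesOrder.OrderPlaced += (sender, e) => fulfilment.StartPicking(e.OrderId);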

One difficult area of most applications, and somewhere where the “pure” DDD model may break down slightly, is search.  Many different applications will require the ability to search across data within the domain, and frequently search is seen as a cross-cutting concern as the result data returned can be small amounts of data from many different aggregates and bounded contexts in one amalgamated data set.  One approach that can be used to mitigate this is the CQRS – Command Query Responsibility Segregation – pattern.

Essentially, this pattern states that the models and code that we use to read data does not necessarily have to be the same models and code that we use to write data.  In fact, most of the time, these models and code should be different.  In the case of requiring a search across disparate data within the DDD-modelled domain, it’s absolutely fine to forego the strict DDD model and to create a specific “view” – this could be a database stored procedure or a database view – that retrieves the exact cross-cutting data that you need.  Doing this avoids using the DDD model to create and hydrate entire aggregate roots of object graphs (possibly across multiple different bounded contexts), which could be a very expensive operation given that most of the retrieved data wouldn’t be required.
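As a rough sketch of what such a read-side query might look like (the dbo.SearchProducts stored procedure and SearchResultDto type are hypothetical), the query code returns a flat DTO and never touches the domain’s aggregates:

// A minimal sketch of a CQRS-style read model query using plain ADO.NET.
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public class SearchResultDto
{
    public string ProductName { get; set; }
    public string CustomerName { get; set; }
}

public class ProductSearchQueries
{
    private readonly string _connectionString;

    public ProductSearchQueries(string connectionString)
    {
        _connectionString = connectionString;
    }

    public IList<SearchResultDto> Search(string searchTerm)
    {
        var results = new List<SearchResultDto>();
        using (var connection = new SqlConnection(_connectionString))
        using (var command = new SqlCommand("dbo.SearchProducts", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.Parameters.AddWithValue("@SearchTerm", searchTerm);
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    results.Add(new SearchResultDto
                    {
                        ProductName = reader.GetString(0),
                        CustomerName = reader.GetString(1)
                    });
                }
            }
        }
        return results;
    }
}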

Gary reminds us that DDD aggregates can still be painful when using a relational database as the persistence storage due to the impedance mismatch of the domain models in code and the tables within the database.  It’s worth examining Document databases or Graph databases as the persistent storage as these can often be a better choice. 

Finally, we learn that DDD is frequently not justified in applications that are largely CRUD based or for applications that make very extensive use of data queries and reports (especially with custom result sets).  Therefore, DDD is mostly appropriate for those applications that have to model a genuinely complex domain with specific and complex domain objects and behaviours and where a DDD approach can deliver real value.

After Gary’s session was over, it was time for all of the attendees to gather in the largest of the conference rooms for the final wrap-up.  There were only a few prize give-aways on this occasion, and after those were awarded to the lucky attendees who had their feedback forms drawn at random, it was time to thank the great sponsors of the event, without whom there simply wouldn’t be a DDD North.

I’d had a great time at yet another fantastic DDD event, and am already looking forward to the next one!

Multiple SSH Keys for the same host with PuTTY & Pageant

I’ve been using SSH to access my various source code repositories for quite some time now.  I’ve always used PuTTY and the related tools of Plink and Pageant in order to connect to my various online providers (mainly BitBucket and Github).  Until now, I’ve only ever needed one SSH Key per provider (or “host”), however, I recently started a new job whereby I needed to connect to two different BitBucket accounts, using two different SSH Keys.

As the two SSH Keys are connecting to the same host, it’s not possible to simply load both of the keys into Pageant and go from there as only the first key loaded will be sent to a given host.  If the account you’re trying to connect to uses the other SSH Key, Pageant will send the first (incorrect) key and your connection will fail.

The way to ensure the correct key is sent is by creating multiple “sessions” within PuTTY itself.

Here are the steps to create a “session” within PuTTY (which Plink and Pageant will honour if you’re using the correct “host” alias – see later):

  1. Start PuTTY
  2. Type in the relevant “real” host name in the Host Name field (i.e. bitbucket.org or github.com)
  3. Navigate to the Connection > SSH > Auth node in the treeview.
  4. Specify the correct private key file in the “Private Key File for Authentication” section (this is the same key that you’d load into Pageant).
  5. Navigate back to the “Session” node in the treeview.
  6. Type a “host alias” name in the “Saved Sessions” box and click Save.

You can repeat the above steps for as many different keys as you wish to add.  You can have multiple “sessions” using the same Host Name, just give each of them a different “Saved Session” name.

Once PuTTY is configured in this way, you will continue to load Pageant and load in each of the keys that you’ll want “cached”, just as you did before.

The key to making this now work is in the Remote URL that you’ll use for your repositories.

Whereas the “standard” SSH URL would look like this:

ssh://git@bitbucket.org/craigtp/stampver.git

you simply replace the actual host (in the above example, it’s bitbucket.org) with the Saved Session name (aka “host alias”) that you entered in PuTTY (in the example from the animated gif on the right, I used “bitbucket-craig”).  So your remote host URL for your source repository becomes:

ssh://git@bitbucket-craig/craigtp/stampver.git

Of course, this works for both Mercurial and Git repositories on any actual remote host.  So long as you use the host alias, Pageant and the Plink program that acts as a “bridge” between Pageant and PuTTY will use the host alias in the URL to both look up the actual host to connect to and to identify the correct private key file to send for the given alias.  This is the PuTTY/Pageant equivalent of OpenSSH’s IdentityFile, which performs the same function.

Complex custom validation with ASP.NET MVC, jQuery Unobtrusive Validation & KnockoutJS

When developing ASP.NET line of business web applications, one very common requirement is to perform validation of user input.  In a modern, responsive web application this validation is expected to be performed on the client-side, using JavaScript, as well as on the server-side using standard C# code.  ASP.NET MVC ships with jQuery as a standard library and also includes a validation library called jQuery Unobtrusive Validation (latest repository is here), which is an open-source, Microsoft specific add-on to the jQuery Validation plugin.

It's very easy to enable jQuery Unobtrusive Validation in an ASP.NET MVC 3+ application.  You simply add the following settings into your web.config:

<configuration>
	<appSettings>
		<add key="ClientValidationEnabled" value="true"/>
		<add key="UnobtrusiveJavaScriptEnabled" value="true"/>
	</appSettings>
</configuration>

Now, out of the box, jQuery unobtrusive validation allows you to simply decorate your C# model class properties with attributes from the System.ComponentModel.DataAnnotations namespace, like so:

public class MyViewModel
{
	[Required]
	[MaxLength(250)]
	public string Name { get; set; }

	[MaxLength(1000)]
	public string Description { get; set; }
}

Here, we're saying that the Name property is both required and has a maximum length of 250 characters.  The Description property isn't required, but does have a maximum length of 1000 characters if it's supplied.

jQuery Unobtrusive Validation has out-of-the-box implementations of a number of Data Annotation validation attributes, which all derive from the ValidationAttribute class.  Some of these are:

  • Required (ensures the input element is provided and not left empty)
  • MinLength (ensures the text-based input element has a minimum number of characters entered)
  • MaxLength (ensures the text-based input element has no more than a maximum number of characters entered)
  • Range (ensures that the numeric-based input element is within a certain range of values)
  • RegularExpression (ensures that the input element's value conforms to a provided regular expression)

Note that there are more attributes available within the System.ComponentModel.DataAnnotations namespace, however, not all of them have a default implementation within jQuery Unobtrusive Validation.

When the View Model is rendered out to the web page, ASP.NET MVC and jQuery Unobtrusive Validation will render HTML mark-up similar to the following:

<input  id="myField" data-val="true"
		data-val-maxlength="The field Name must not be longer than 250 characters."
		data-val-maxlength-max="250"
		data-val-required="The Name field is required."
		placeholder="Enter Name..." 
		type="text"
		value=""><span data-valmsg-for="myField" data-valmsg-replace="true"></span>

Here we can see the various data-* attributes being rendered on the input element.  The Unobtrusive Validation JavaScript looks for all elements that are decorated with data-val-* attributes and uses them to perform client-side validation, complete with showing/hiding the relevant error messages.

Now, this is all well and good when dealing with the most basic of validation requirements, however, there is frequently a need for validation to encompass far more elaborate business rules and complex logic.  Such logic often involves performing certain validation operations not just on a single granular property, but on multiple properties of the model.  It is, unfortunately, at this point that the out-of-the-box jQuery Unobtrusive Validation falls short and we must rely on custom developed validation for this.

Moreover, jQuery Unobtrusive Validation is largely geared towards rendering a web page / form that contains a set amount of input elements when first loaded.  Again, lots of line of business applications frequently have "dynamic" elements added to the form at run-time.  This is most often used when the user of the application is expected to build up a "table" of data, like so:

In the animation above, we can see that new rows are able to be added to the "Rates" table.  We can also see that there's validation on each row of the table, whereby the Owner Writer Share and Net Publisher Share text boxes, which each contain a numeric percentage value, have to add up to 100.

Such user interfaces and forms are not uncommon in a line of business application, but the validation required for this goes way beyond that which the jQuery Unobtrusive Validation offers out of the box.  Developing custom validation for such a form often requires juggling a number of frameworks or libraries and setting up and configuring things "just right" to ensure everything works as expected.  In order to make sense of the various pieces that need to fit together, we'll take a look at each of them one by one.

Creating our custom validator

The first part requires us to create our custom validator.  This requires two steps.  The first step is to create the C# class which is a custom attribute and implements the server-side validation logic, and the second step is to implement a JavaScript function which "mimics" the C# class and provides the same validation logic to the client browser.

Server-side implementation

The C# class is a custom attribute deriving from the ValidationAttribute class (for the server-side implementation) and also implementing the IClientValidatable interface (in order to "expose" this custom validation logic to the client-side validation framework).  Here's our C# class:

    using System;
    using System.Collections.Generic;
    using System.ComponentModel.DataAnnotations;
    using System.Reflection;
    using System.Web.Mvc;

    [AttributeUsage(AttributeTargets.Property)]
    public class MustAddUpToAttribute : ValidationAttribute, IClientValidatable
    {
        public string OtherPropertyName { get; private set; }
        public double Amount { get; private set; }

        public MustAddUpToAttribute(double amount, string otherPropertyName, string errorMessage) : base(errorMessage)
        {
            Amount = amount;
            OtherPropertyName = otherPropertyName;
        }

        protected override ValidationResult IsValid(object value, ValidationContext validationContext)
        {
            double thisValue;
            double otherValue;
            PropertyInfo propertyInfo;
            try
            {
                thisValue = Convert.ToDouble(value);
            }
            catch(Exception ex)
            {
                throw new InvalidOperationException("Property Type is not numeric", ex);
            }
            propertyInfo = validationContext.ObjectType.GetProperty(OtherPropertyName);
            if (propertyInfo == null)
            {
                // GetProperty returns null (rather than throwing) when the named property doesn't exist.
                throw new InvalidOperationException("Other property not found");
            }
            try
            {
                otherValue = Convert.ToDouble(propertyInfo.GetValue(validationContext.ObjectInstance, null));
            }
            catch (Exception ex)
            {
                throw new InvalidOperationException("Other property type is not numeric", ex);
            }
            var sumOfValues = thisValue + otherValue;
            return NumericHelpers.NearlyEqual(sumOfValues, Amount, double.Epsilon) ? ValidationResult.Success : new ValidationResult(this.FormatErrorMessage(validationContext.DisplayName));
        }

        public override string FormatErrorMessage(string name)
        {
            return string.Format(ErrorMessageString, name, OtherPropertyName);
        }
        
        public IEnumerable<ModelClientValidationRule> GetClientValidationRules(ModelMetadata metadata, ControllerContext context)
        {
            var rule = new ModelClientValidationRule
            {
                ErrorMessage = FormatErrorMessage(metadata.DisplayName),
                ValidationType = "mustaddupto"
            };
            rule.ValidationParameters.Add("otherpropertyname", OtherPropertyName);
            rule.ValidationParameters.Add("amount", Amount);
            yield return rule;
        }
    }

(Note that this class relies on a "NumericHelper" to determine near equality - see here for the implementation).

Also note that the .NET Framework provides an IValidatableObject interface, which isn't used here.  The IValidatableObject interface is used for model-level validation - i.e. ensuring an entire object is valid, rather than one or two properties.  Moreover, IValidatableObject can only be used server-side with no client-side equivalent (client-side model validation must be performed and managed with bespoke JavaScript functions), therefore, it's often unused.
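For completeness, here's a minimal sketch of what model-level validation with IValidatableObject looks like (the RateViewModel class and its rule are purely illustrative, not part of the solution described in this post):

// A minimal sketch of model-level validation via IValidatableObject (server-side only).
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

public class RateViewModel : IValidatableObject
{
    public float OwnerWriterShare { get; set; }
    public float NetPublisherShare { get; set; }

    public IEnumerable<ValidationResult> Validate(ValidationContext validationContext)
    {
        // A model-level rule spanning two properties; a tolerance could be used
        // instead of exact equality when dealing with floating point values.
        if (OwnerWriterShare + NetPublisherShare != 100)
        {
            yield return new ValidationResult(
                "Owner Writer Share and Net Publisher Share must add up to 100.",
                new[] { "OwnerWriterShare", "NetPublisherShare" });
        }
    }
}

As noted, though, this approach has no client-side counterpart, which is why the attribute-based approach is used here.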

Our C# class takes three constructor parameters: the amount that the calculation must add up to, the name of the "other property" that we'll get the value from to add to the value of the property decorated with the attribute, and an error message.  The custom attribute is used like this:

public class Rate
{
	[Required]
	public float OwnerWriterShare { get; set; }

	[Required]
	[MustAddUpTo(100, "OwnerWriterShare", "Owner Writer Share and Net Publisher Share must add up to 100.")]
	public float NetPublisherShare { get; set; }
}

Note how we use a string-based reference to the "OwnerWriterShare" property name to tell the MustAddUpTo attribute where to find the "other" value: the attribute takes the value of the NetPublisherShare property (the property to which it is applied), gets the value of the OwnerWriterShare property, adds the two together and determines whether the total matches the amount we require - the first parameter to the MustAddUpTo attribute.  Also note that although the validation uses both the OwnerWriterShare and NetPublisherShare properties, the attribute itself is only applied to the NetPublisherShare property.  This ensures we're not "doubling up" by effectively having the same validation on both properties, which would result in two copies of the same error message on the client.

So, this C# class gives us our server-side implementation, and includes the GetClientValidationRules method from the IClientValidatable interface.  The implementation of this method requires returning a ModelClientValidationRule object with the relevant properties set.  The ModelClientValidationRule effectively tells the jQuery Unobtrusive Validation framework which client-side implementation it should use along with the relevant parameters required to perform the validation.  We'll see the implementation shortly, but the main code line here is: 

ValidationType = "mustaddupto"

The value of ValidationType must always be lower case, and will be the name of our JavaScript function that will implement the client-side validation logic.

Client-side implementation

Having written our C# server-side class, we now must implement the same validation logic on the client-side, which means writing JavaScript.  The JavaScript function name will be mustaddupto.  Here's the JavaScript code that implements our MustAddUpTo validator:

// ================================================================================================
// MustAddUpToAttribute
// ================================================================================================
$.validator.addMethod("mustaddupto", function (value, element, params) {
    // Get the value of the element that the validation is assigned to and
    // the value of the "other" element that we're comparing against.
    var thisValue = value;
    var otherValue = $('#' + params.otherpropertyname).val();
    var amount = params.amount;

    // If either this or the other value is null or "empty" simply return true
    // as we can't perform validation.  Other validation can assert not null/emptiness.
    if (thisValue == null || thisValue.length === 0) {
        return true;
    }
    if (otherValue == null || otherValue.length === 0) {
        return true;
    }

    // Check we're dealing with numbers.
    if ($.isNumeric(thisValue) && $.isNumeric(otherValue)) {
        var thisValueNumber = parseFloat(thisValue);
        var otherValueNumber = parseFloat(otherValue);
        var sumTotal = thisValueNumber + otherValueNumber;
        return nearlyEqual(sumTotal, amount);
    }

    // If we get here, we're dealing with a data type that we don't recognise
    // and can't process, so let the validation pass.
    return true;
});

$.validator.unobtrusive.adapters.add("mustaddupto", ["otherpropertyname", "amount"], function (options) {
    options.rules["mustaddupto"] = options.params;
    options.messages["mustaddupto"] = options.message;
});


// Supporting functions
function nearlyEqual(a, b) {
    if (a == b) return true;
    var diff = Math.abs(a - b);
    if (diff < 4.94065645841247E-320) return true;
    a = Math.abs(a);
    b = Math.abs(b);
    var smallest = (b < a) ? b : a;
    return diff < smallest * 1e-12;
}
// ================================================================================================

The bulk of the function is plain old vanilla JavaScript, but let's examine the first few lines of each block of code in more detail as they're the most interesting parts.  Firstly, we use the addMethod method of the validator object that's attached to the jQuery object in order to create and add our validation method to the jQuery Validation framework.  Note that this isn't the Microsoft-specific jQuery Unobtrusive Validation framework, just the jQuery Validation framework, which is the underlying framework for Microsoft's Unobtrusive Validation framework.  We pass in the name ("mustaddupto" - this is the same name used for the ValidationType property of the ModelClientValidationRule that we expose from our server-side implementation) and then a function which takes 3 arguments: value, element and params.  The value parameter is the value of the "property" (or rather element, since this is client-side) to which the validation is attached - it's the same as the server-side model property that has the attribute on it.  The element parameter is the actual element that has the validation applied to it.  In our case, we're only interested in the value itself as the framework will take care of visually highlighting the correct web page element that fails validation, so we don't need to concern ourselves with it here.  What we DO need, though, is the "other" element whose value we'll grab in order to add the two values together.  That's passed in the params object, which is the same object as exposed by the ModelClientValidationRule from the server.  Once we've added our validation method to the jQuery Validation framework, we hook it up to the Unobtrusive Validation library too by adding an "adapter" to the unobtrusive object (this is the line that begins $.validator.unobtrusive.adapters.add) - we'll look at that shortly.  So, the JavaScript params object used in these lines:

var otherValue = $('#' + params.otherpropertyname).val();
var amount = params.amount;

has come from this section of the server-side code:

rule.ValidationParameters.Add("otherpropertyname", OtherPropertyName);
rule.ValidationParameters.Add("amount", Amount);

After having set up and registered our validation function with the jQuery Validation library, we then need to tell the jQuery Unobtrusive Validation framework all about its existence too.  We do this with this section of the JavaScript code:

$.validator.unobtrusive.adapters.add("mustaddupto", ["otherpropertyname", "amount"], function (options) {
    options.rules["mustaddupto"] = options.params;
    options.messages["mustaddupto"] = options.message;
});

The jQuery Unobtrusive Validation library has a collection of "adapters" which allow us to register (add) our jQuery Validation method to be used by the unobtrusive validation framework.  Again, we can see that the key parts of this are the name ("mustaddupto") - which is the same as the main JavaScript function name and the name exposed from the server, as well as an array of the string param names ("otherpropertyname" and "amount").

With the C# class written and the attribute applied to our Model class, and the JavaScript function written and registered with both the jQuery Validation and jQuery Unobtrusive Validation libraries, we can move on to the next step...

Adding dynamic elements to the page

In order to dynamically add new elements to a web page without requiring a full page reload, we will be required to use JavaScript or some JavaScript-based framework in order to add elements into the page DOM after the page is initially rendered.  One such framework that actually makes this kind of thing a lot easier to manage is KnockoutJS.  Although it's beyond the scope of this post to go into detail on KnockoutJS, it's sufficient to say that Knockout takes a JSON view model representing your form data and deals with the two-way binding, rendering and updating of both existing and new page elements with the underlying data.  This makes it very easy to dynamically add new elements onto a form, for example, adding new rows to a table of data.

We can expose our server generated viewmodel to the client-side JavaScript with a single line like this:

var koViewModel = ko.mapping.fromJS(@Html.Raw(JsonConvert.SerializeObject(Model)));

The code above uses the Knockout Mapping plugin to take our serialized Model and convert it into a Knockout view model, complete with the required observables to facilitate two-way data binding.  Knockout can then work by "binding" values from the underlying view model to page elements that are decorated with appropriate data-bind attributes, such as this:

<input data-bind="value: MyViewModel.OwnerWriterShare" />

The data-bind attribute tells Knockout to bind the value of the input element to the value of the OwnerWriterShare property of the JSON object, MyViewModel.  This binding is two-way, so changes to the model's value will have Knockout updating the input element, and changes to the input element will have Knockout updating the underlying model's value.

This is fine for existing page elements, but Knockout also makes it easy to bind to an array within the underlying JSON model.  This means that as new elements are added to the array, entirely new sets of page elements are created (or destroyed) as the array grows or shrinks.  So, if we have this JSON view model:

{
	"MyViewModel" : {
		"Rates" : [{
				"OwnerWriterShare" : "50",
				"NetPublisherShare" : "50"
			}, {
				"OwnerWriterShare" : "60",
				"NetPublisherShare" : "40"
			}
		]
	}
}

And the following definition of a table within our page markup:

<table>
	<tbody data-bind="foreach: MyViewModel.Rates">
		<tr>
			<td><input data-bind="value: $data.OwnerWriterShare" /></td>
			<td><input data-bind="value: $data.NetPublisherShare" /></td>
		</tr>
	</tbody>
</table>

Knockout will repeat the contents of the tbody for each element in the MyViewModel.Rates array due to the foreach: binding.  This is great for building up our additional <tr> table rows dynamically, with each element in each row bound to the correct underlying JSON property, managed by KnockoutJS, thanks to the Knockout data-bind attribute.  Allowing users to add a new row to a table is as simple as adding a small amount of JavaScript that simply adds a new element to the underlying view model array, and the Knockout foreach binding will deal with creating the necessary additional page elements.  However, we have further work to do in order to add our validation to the dynamically generated elements, and this involves specifying additional data-* attributes needed for the validation to work, and for that, we'll also require Knockout's help...

Adding validation to dynamically added elements

So, we've got our custom validation functions written and we've wired up Knockout to help generate our page elements and performing the necessary two-way data binding, but now we need to add some additional attributes to our generated markup to ensure that the required validation is added to these elements.

Unobtrusive validation data attributes

Unobtrusive validation works by decorating our elements with data-val-* attributes.  These attributes are ignored by the browser, but JavaScript can read them and the values they contain in order to do all manner of interesting things.  Looking at our rendered input element from earlier:

<input data-bind="value: MyViewModel.Name"
        data-val="true"
        data-val-maxlength="The field Name must not be longer than 250 characters."
        data-val-maxlength-max="250"
        data-val-required="The Name field is required."
        id="MyViewModel_Name"
        name="MyViewModel.Name"
        placeholder="Enter Name..."
        type="text"
        value="">

We can see that as well as the Knockout specific data-bind attribute, we also have a number of attributes that start with data-val.  It's these attributes that control the application of validation for the element by the jQuery Unobtrusive Validation library.  There's a data-val attribute that contains a boolean value to indicate whether this element is subject to validation.  Then, we have a number of other attributes starting with data-val that control the actual validation itself.  We'll have an attribute named data-val-VALIDATORNAME for each validation function that is applied to the element, and if that validation requires further parameters, these are provided by additional attributes with the parameter name appended to the attribute name.  So, for example, the data-val-maxlength attribute contains the error message to be shown when this validation fails and the data-val-maxlength-max attribute holds the value for the maximum length allowed.  We also have a data-val-required attribute that contains the error message for failure of that particular validation; however, there are no further attributes for that validation as no further parameters are required.

Now, when we're rendering known elements from a server-side view model, we can use ASP.NET MVC's Html Helpers to ensure that all of the these necessary attributes are applied to the rendered elements for us, so we will frequently see code such as this in a Razor view:

<div class="form-group">
	@Html.LabelFor(m => m.Username, new { @class = "control-label col-md-3" })
	<div class="col-md-9">
		@Html.TextBoxFor(m => m.Username, new { @class = "form-control", @placeholder = "Enter Username..." })
		@Html.ValidationMessageFor(m => m.Username, string.Empty, new { @class = "text-danger" })
	</div>
</div>

The @Html.TextBoxFor and @Html.ValidationMessageFor directives are aware of any validation attributes applied to the server-side model and will ensure that the relevant data-val attributes are applied to the rendered output (the ValidationMessageFor directive ensuring that the relevant <span> is rendered to hold a validation error message, should validation for the input element fail).

However, since we're now very firmly in the realm of generating our page elements client-side rather than from the server, with the use of KnockoutJS to control dynamically adding elements to the page, we can no longer rely on Html Helpers to write out the necessary elements with the required validation attributes.  Instead, we need to do all of this manually.

Manually adding data validation attributes

So, here's an example of the markup that we'll have to manually write in order to wire up the unobtrusive validation.  We'll examine this in more detail immediately afterwards:

<table>
	<thead>
		<tr>
			<th>Owner Writer Share</th>
			<th>Net Publisher Share</th>
		</tr>
	</thead>
	<tbody data-bind="foreach: MyViewModel.Rates">
		<tr>
			<td>
				<input type="number" min="0" max="100" data-input-type="percentage"
					   data-bind="value: $data.OwnerWriterShare,
							attr: { id: 'MyViewModel_Rates_' + $index() + '__OwnerWriterShare',
							name: 'MyViewModel.Rates[' + $index() + '].OwnerWriterShare' }"
					   data-val="true"
					   data-val-mustaddupto="Owner Writer Share and Net Publisher Share Percentages must add up to 100."
					   data-val-mustaddupto-amount="100"
					   data-val-number="The field Owner Writer Share must be a number."
					   data-val-required="The Owner Writer Share field is required."
					   value="">
			</td>
			<td>
				<input type="number" min="0" max="100" data-input-type="percentage"
					   data-bind="value: $data.NetPublisherShare,
							attr: { id: 'MyViewModel_Rates_' + $index() + '__NetPublisherShare',
							name: 'MyViewModel.Rates[' + $index() + '].NetPublisherShare',
							'data-val-mustaddupto-otherpropertyname': 'MyViewModel_Rates_' + $index() + '__OwnerWriterShare' }"
					   data-val="true"
					   data-val-mustaddupto="Owner Writer Share and Net Publisher Share Percentages must add up to 100."
					   data-val-mustaddupto-amount="100"
					   data-val-number="The field Net Publisher Share must be a number."
					   data-val-required="The Net Publisher Share field is required."
					   value="">
			</td>
		</tr>
	</tbody>
</table>

This markup represents the table that will have rows dynamically added to it at run-time by Knockout, and which contains the two text boxes for our percentages that must add up to 100.  We can see that many of the data-val-* attributes are simply written out in a hard-coded format (i.e. the data-val, data-val-mustaddupto, data-val-required etc. are all "static" values) and they won't vary from one row of dynamically added elements to the next.  This is all fairly simple so far, however, the Knockout specific data-bind attribute is quite special.

As well as containing the binding to the element's value, we also use the attr binding, which allows Knockout to render attributes on the element.  This is necessary for certain attributes, such as the data-val-mustaddupto-otherpropertyname attribute as this attribute needs to have a value which contains the exact id of the element that we need to examine as part of our validation.  Of course, since we're dealing with potentially many text boxes over many possible rows, these id values need to be dynamically generated and then referenced from the respective data-val-mustaddupto-otherpropertyname attribute.  This is achieved by having Knockout render those attributes for us, as well as rendering our id and name attributes on the element that requires validation.  If we examine the data-bind attributes for each of the <input> elements in the two <td>'s we can see that they both use a Knockout attr binding.  Each of the attr bindings renders both the id and the name of the input element:

attr: { id: 'MyViewModel_Rates_' + $index() + '__OwnerWriterShare', name: 'MyViewModel.Rates[' + $index() + '].OwnerWriterShare' }

We can see that this binding also uses a special Knockout context property, $index,  which provides the attr binding with the numeric array index of the current data item being processed within the foreach loop.  The resulting markup that's rendered to the page is something similar to this:

<tr>
	<td>
		<input type="number" min="0" max="100" data-input-type="percentage"
			   id="MyViewModel_Rates_0__OwnerWriterShare"
			   name="MyViewModel.Rates[0].OwnerWriterShare"
			   data-val="true"
			   data-val-mustaddupto="Owner Writer Share and Net Publisher Share Percentages must add up to 100."
			   data-val-mustaddupto-amount="100"
			   data-val-number="The field Owner Writer Share must be a number."
			   data-val-required="The Owner Writer Share field is required."
			   value="50">
	</td>
	<td>
		<input type="number" min="0" max="100" data-input-type="percentage"
			   id="MyViewModel_Rates_0__NetPublisherShare"
			   name="MyViewModel.Rates[0].NetPublisherShare"
			   data-val-mustaddupto-otherpropertyname="MyViewModel_Rates_0__OwnerWriterShare"
			   data-val="true"
			   data-val-mustaddupto="Owner Writer Share and Net Publisher Share Percentages must add up to 100."
			   data-val-mustaddupto-amount="100"
			   data-val-number="The field Net Publisher Share must be a number."
			   data-val-required="The Net Publisher Share field is required."
			   value="50">
	</td>
</tr>
<tr>
	<td>
		<input type="number" min="0" max="100" data-input-type="percentage"
			   id="MyViewModel_Rates_1__OwnerWriterShare"
			   name="MyViewModel.Rates[1].OwnerWriterShare"
			   data-val="true"
			   data-val-mustaddupto="Owner Writer Share and Net Publisher Share Percentages must add up to 100."
			   data-val-mustaddupto-amount="100"
			   data-val-number="The field Owner Writer Share must be a number."
			   data-val-required="The Owner Writer Share field is required."
			   value="60">
	</td>
	<td>
		<input type="number" min="0" max="100" data-input-type="percentage"
			   id="MyViewModel_Rates_1__NetPublisherShare"
			   name="MyViewModel.Rates[1].NetPublisherShare"
			   data-val-mustaddupto-otherpropertyname="MyViewModel_Rates_1__OwnerWriterShare"
			   data-val="true"
			   data-val-mustaddupto="Owner Writer Share and Net Publisher Share Percentages must add up to 100."
			   data-val-mustaddupto-amount="100"
			   data-val-number="The field Net Publisher Share must be a number."
			   data-val-required="The Net Publisher Share field is required."
			   value="40">
	</td>
</tr>

Note how the id and name attributes contain the incrementing numeric index, giving each input element a unique id and name.  Furthermore, due to this, the second <td> of each row contains the additional data-val-mustaddupto-otherpropertyname attribute which correctly references that specific row's input element, allowing validation to be wired up to each dynamically added element in each dynamically added table row.

Ensuring validation is performed for dynamically added elements

Once we have Knockout rendering our id, name and validation specific element attributes, we're up and running with our complex validation.  Almost.  There's one final thing that needs to happen in order for the validation to work.  The jQuery Unobtrusive Validation library will initialize its own internal cache of page elements that require validation when the page has first fully loaded.  Due to this, any page elements that are subsequently added to the page DOM by client-side script (such as our additional table rows, added by KnockoutJS) will not be included in the internal list of page elements that the Unobtrusive Validation will validate.  Therefore, we need a way of telling jQuery Unobtrusive Validation to re-build its internal list of page elements each time we dynamically add or remove to or from the page any elements containing validation attributes.  This can be done quite simply with a short JavaScript function as follows:

function updateValidation() {
	$("#myForm").removeData("validator");
	$("#myForm").removeData("unobtrusiveValidation");
	$.validator.unobtrusive.parse("#myForm");
}

The #myForm selector represents an id of either a form, or some containing element that encompasses all of the dynamic parts of your web page.  It's important that this function is called by your own JavaScript code whenever a page element is added or removed from the page DOM.  This is usually easy enough to do as adding a table row is usually done in response to the user clicking some kind of "Add" button, which you'll usually have wired up via a Knockout click: binding to run a JavaScript function, usually defined as part of your Knockout view model.  For example, an additional <td> on our table could contain mark up similar to the following:

<td><a href="#" title="Add" data-bind="click: addRateRow"><span>Add New Rate</span></a></td>

And the addRateRow function would be called on the anchor tag's click event, adding a new row to the underlying Knockout view model - causing Knockout to re-render the table with the additional row and associated elements - and finally calling our simple updateValidation() function to ensure both jQuery Validation and jQuery Unobtrusive Validation are aware of the newly added elements:

koViewModel.addRateRow = function () {
	this.Rates.push(new Rate());
	updateValidation();
};

And with that, our custom complex jQuery Validation / Unobtrusive Validation with KnockoutJS dynamically created page elements is complete!

KnockoutJS binding on Select elements in a ForEach loop


In an application that I was working on recently, I had the need to bind a select element (i.e. a drop-down list) with KnockoutJS.  This is not an unusual thing to need to do with Knockout, however, this particular select element was ultimately rendered by Knockout itself as it was part of a collection of data objects, and so was within a Knockout foreach binding.

I had a collection of "rates" to choose from in one part of my Knockout View Model and needed to populate another part of the same Knockout View Model with the user's selected rate.  In this case, I wasn't simply selecting the numerical index value or some singular intrinsic data type as the value of the select element, but rather, I needed the whole underlying object.

So, our view model looks something like this:

{
	"MyViewModel" : {
		"Rates" : [{
				"Name" : "Initial Rate",
				"Percentage" : "80",
				"Categories" : [{
						"Id" : "b1fae11b-ce8d-4c72-ad4f-71dc515d0f42",
						"Name" : "Mechanical"
					}, {
						"Id" : "10e01f58-e7f5-4f2c-8d35-e2b87fc43a77",
						"Name" : "Performance"
					}
				]
			}, {
				"Name" : "Extended Rate",
				"Percentage" : "60",
				"Categories" : [{
						"Id" : "10e01f58-e7f5-4f2c-8d35-e2b87fc43a77",
						"Name" : "Performance"
					}, {
						"Id" : "d79a225b-4b76-4371-910d-47a9f3f58665",
						"Name" : "Sync"
					}
				]
			}
		],
		"OverrideRates" : []
	}
}

You can see that we're going to select one of the "Rates" from the array of Rate objects, and that each Rate object doesn't have its own obvious unique identifier.  Once the user has selected a rate from the rates array, we want to copy the entire object into an OverrideRate object and store that in the OverrideRates array.  Our OverrideRate object will eventually look like this:

{
	"SelectedRate" : {
		"Name" : "Initial Rate",
		"Percentage" : "80",
		"Categories" : [{
				"Id" : "b1fae11b-ce8d-4c72-ad4f-71dc515d0f42",
				"Name" : "Mechanical"
			}, {
				"Id" : "10e01f58-e7f5-4f2c-8d35-e2b87fc43a77",
				"Name" : "Performance"
			}
		]
	},
	"OverridePercentage" : "90",
	"DateFrom" : "2017-01-01T00:00:00.000Z",
	"DateTo" : "2017-12-31T00:00:00.000Z"
}

So you can see here that the entire selected Rate object has been copied from the Rates array into the SelectedRate property.

In order to achieve this, I had markup similar to the following:

<table><tbody data-bind="foreach: MyViewModel.OverrideRates"><tr><td><div><select data-bind="options: $root.Rates,
							optionsCaption: 'Please select a rate...',
							optionsText: function(item) {
								return item.Name() + ' (' +
								item.Categories().map(function(elem){ return elem.Name() }).join(', ') + ')'},
							value: $data.SelectedRate"></select></div></td></tr></tbody></table>

The user was presented with a select element like this:

The Knockout binding for the select element shows that we're telling Knockout to get the options available for selection from the $root.Rates property, which is the collection of available rates (the $root prefix is a special binding context telling Knockout to access the rates object from the root level of the View Model), and further we're telling Knockout to use a special user-defined inline function to take certain string properties of our Rate object and use them to build up the actual text that the user will see inside the select element (i.e. "Initial Rate (Mechanical, Performance)").  Finally, we tell Knockout that the "value" of the selected option from the select element needs to be bound to the $data.SelectedRate property.  However, there was a problem.

Two way data binding and data contexts

Knockout's data binding works by having your view model's properties actually be observable functions rather than the data that you actually want to bind.  This means that your MyViewModel.Name property, which can be two-way data-bound to an input box for editing the name string, is actually a function.  The function, when invoked with no parameters (i.e. MyViewModel.Name() ) will return the underlying data - in this case the name string, whilst invoking the function with a parameter (i.e. MyViewModel.Name("Jimmy") ) will set the underlying data to the parameter value passed in.
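
As a quick illustration of this getter/setter behaviour (a minimal sketch - the view model and values here are purely illustrative, and assume the Knockout library has been loaded on the page):

var viewModel = {
	Name: ko.observable("Craig")    // wraps the underlying string value
};

console.log(viewModel.Name());      // invoked with no parameters - returns the underlying value: "Craig"
viewModel.Name("Jimmy");            // invoked with a parameter - sets the underlying value
console.log(viewModel.Name());      // "Jimmy"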

Whenever Knockout is binding page elements to underlying view model properties inside a loop, you use the special $data binding context prefix in order to access the object that is currently being processed as part of the loop.  On all other elements (for example, an input element) this works just fine, since the data that Knockout has to bind is a single value - in the case of a textbox, it's a simple string:

<tbody data-bind="foreach: MyViewModel.Rates"><tr><td data-bind="text: $data.Name"></td></tr>    </tbody>

$data gives us a reference to the current rate object within the loop, allowing us to access the Name property of the correct object.  Note that although the Name property of the object is actually a Knockout observable function, we don't need to add the parentheses at the end (i.e. $data.Name() isn't required) since Knockout's binding is clever enough to deal with that for us.

This is great for single, granular pieces of data, however, when attempting to use this binding context as part of my select element that needed not just a simple, singular value, but a whole object to be bound to it, things did not work so well.

As the user was selecting a "rate" from the select element drop-down list, I wanted Knockout to bind the entire Rate object to the SelectedRate property of the View Model.  This part of the data-bind attribute was intended to achieve that:

value: $data.SelectedRate

However, when examining the View Model with Google Chrome's debugging tools, I noticed that the view model's SelectedRate property was not being updated when the user changed the selection.  Bizarrely, I did notice that when Knockout performed other data binding, for example, when the user changed the value in a textbox that was related to the selected rate, the SelectedRate property was suddenly bound with the correct object!

This was quite strange behaviour, and actually turned out to be a bit of a red herring which was leading my debugging efforts awry.  After much Googling and trial and error, it turned out the issue was all down to observable functions and how they work with the data binding context.

When your $data context isn't

It turns out that when you use the $data binding context in a binding expression, $data is bound not to the Knockout observable function but to the value returned from invoking that function.  This means that $data.SelectedRate was simply returning the underlying values of the observables rather than the observable functions themselves (SelectedRate is also the underlying value since the call to $data only returns a value, meaning that properties "hanging off" it are also not observables).  The fix is to use a different Knockout binding context altogether, which is the $rawData binding context.

Knockout's own documentation describes this:

$rawData

This is the raw view model value in the current context. Usually this will be the same as $data, but if the view model provided to Knockout is wrapped in an observable, $data will be the unwrapped view model, and $rawData will be the observable itself.

 

(Emphasis above is mine).

So, we needed to use the $rawData binding context to ensure that data-bound page elements that are expected to be bound to a complex object are correctly bound.  Note that this is not required when data binding to a simple, intrinsic type (i.e. a string, number etc.).
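
In my case, that meant the only change needed to the select element's data-bind expression shown earlier was to the value binding itself, swapping $data for $rawData along the lines of:

value: $rawData.SelectedRate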

Interestingly, a Stack Overflow answer to a similar question talks about the $data binding context only referring to the underlying value rather than the observable function, but offers a different solution, one which didn't seem to work correctly for me.

You might think that, since usage of the $rawData binding context is fully explained in the Knockout documentation, why did I not discover and resolve this issue earlier?  Well, it turns out that Knockout has a somewhat chequered history regarding its $data binding context.

Knockout broke my data binding!

Another Stack Overflow question gives us a clue to the history of Knockout's $data binding context.

It would appear that in versions of Knockout prior to Version 3.0, the $data binding context used to function the same as the current version's $rawData binding context works - that is to say that $data used to be bound to the observable function rather than the underlying value. This was entirely due to the fact that the $rawData binding context did not exist in Knockout prior to Version 3.0 and so was considered missing functionality.

With Knockout Version 3.0, the developers introduced the $rawData binding context, however, it was somewhat buggy.  In Knockout Version 3.1, they finally fixed it so that $rawData now correctly refers to the observable function (if one exists) rather than the underlying value, whilst $data continues to "unwrap" the observable and refer to the underlying data value.

Upgrading AutoMapper


On more than one or two occasions now, I've found myself working with a codebase that uses AutoMapper to perform mapping between objects, most commonly to map data entity objects to their equivalent domain objects.

These codebases are often using a version of AutoMapper prior to version 4.2, which is the version in which AutoMapper removed its static API. I often find the need to upgrade the version of AutoMapper that's in use for any one of a number of different reasons and so I find myself having to re-write all of the AutoMapper initialisation code, as the old way of creating mappings is no longer supported.

The problem here is that, in a large enough codebase, moving from the purely static mappings used by AutoMapper prior to v4.2 to non-static mappings means those mappings need to be either manually constructed or injected into all of the places in the codebase that require them.  This is often a non-trivial feat and makes moving from static mappings to non-static mappings a much larger job than you might ideally like it to be.

The "proper" way to configure your AutoMaper mappings these days is write code similar to this:

var config = new MapperConfiguration(cfg => {
  cfg.CreateMap<SourceObject, DestinationObject>();
});

and then to pass that config object around, or inject it into the class that might need your mapping config.  Of course, if that class is short-lived, you'll need to re-create these mapping configs again and again.  Whilst this is undoubtedly the "best" approach and does not require any static objects, it's sometimes not immediately feasible in a large, legacy codebase.
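
For completeness, consuming that configuration object looks something like the following sketch (in a real application the IMapper instance would usually be registered with an IoC container rather than created inline, and SourceObject / DestinationObject are the same illustrative types as above):

var config = new MapperConfiguration(cfg => {
	cfg.CreateMap<SourceObject, DestinationObject>();
});

// Create an IMapper instance from the configuration (requires a "using AutoMapper;" directive)
// and use it to perform a mapping.
IMapper mapper = config.CreateMapper();
var destination = mapper.Map<DestinationObject>(new SourceObject());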

There is, however, a way of changing your old, static API-based mapping configurations in such a way that they continue to use a static object - albeit not the same one as was in AutoMapper < v4.2 - such that the change can be a "drop-in" replacement without having a knock-on ripple effect throughout the rest of the codebase.

The Old Way

So the old way of configuring AutoMapper mappings with the static API was something very similar to this:

public class AutomapperConfig
{
	public static void Configure()
	{
		Mapper.CreateMap<SourceObjectA, DestinationObjectA>();
		Mapper.CreateMap<SourceObjectB, DestinationObjectB>();
		Mapper.CreateMap<SourceObjectC, DestinationObjectC>();
	}
}

and the static Configure method would be called within some entry point of your program (i.e. inside the Application_Start method of the Global.asax.cs class)
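
For example, in an ASP.NET MVC application that call might look something like the following sketch (the class name and Global.asax pattern are the standard ASP.NET MVC defaults; only the AutoMapper line is relevant here):

public class MvcApplication : System.Web.HttpApplication
{
	protected void Application_Start()
	{
		// Register all of the AutoMapper mappings once, at application startup.
		AutomapperConfig.Configure();

		// ...other startup registration (routes, bundles, filters etc.) would go here...
	}
}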

The New Way

And here's the new way of performing the same thing.  A slight syntactical change to the code and it's a "drop-in" replacement.

public class AutomapperConfig
{
	public static void Configure()
	{
		Mapper.Initialize(cfg =>
		{
			cfg.CreateMap<SourceObjectA, DestinationObjectA>();
			cfg.CreateMap<SourceObjectB, DestinationObjectB>();
			cfg.CreateMap<SourceObjectC, DestinationObjectC>();
		});
	}
}

Note how we use the new Initialize method of the static Mapper object passing in a lambda that performs our mapping configurations rather than constructing a new instance of the MapperConfiguration object to pass to the relevant part of our code that needs to perform mappings.
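
Existing code scattered throughout the codebase that performs its mappings via the static API can then remain untouched; something along the lines of the following continues to work exactly as it did before the upgrade:

var destination = Mapper.Map<DestinationObjectA>(new SourceObjectA());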

And with that, we can easily upgrade from AutoMapper pre-v4.2 to a version post-v4.2 without the code changes being too onerous.


Entity Framework with MySQL - Booleans, Bits and "String was not recognized as a valid Boolean" errors


In my current role, I'm using MySQL v5.6 as the database engine for an ASP.NET MVC web application which also uses Entity Framework v6.1.3 as the ORM to connect to the backend database, doing so via the use of the MySQL .NET Connector ADO.NET driver.

It appears that there are some "interesting" issues regarding how MySQL (and the .NET Connector) handle and model boolean entity properties, modelling them as a TINYINT(1) column by default, and causing odd and spurious errors, such as the "String was not recognized as a valid Boolean" error when performing entirely innocuous Entity Framework functions within your code.

The MySQL .NET Connector contains a bug whereby database columns that are generated for any boolean property of an entity, by default, get modelled with a TINYINT(1) data type rather than the more appropriate BIT datatype.  This, in itself, isn't the bug, however the strange usage of a TINYINT(1) data type for a boolean column rather than the seemingly more appropriate BIT(1) datatype does enable the bug to subsequently rear its head.

In using a TINYINT(1) data type by default, this causes the MySQL .NET Connector to occasionally throw an exception which upon examination contains the error string, "String was not recognized as a valid Boolean".  This occurs when performing such innocuous functionality as using a number of Entity Framework Include methods when attempting to retrieve a large enough graph of objects from the database.  For example, code such as this:

var results = dbContext.SocietyMemberships
				.Include(m => m.TenantPersona)
				.Include(m => m.Society)
				.Include(m => m.IncomeTypeGroups)
				.Include(m => m.IncomeTypeGroups.Select(itg => itg.IncomeTypeGroup))
				.Include(m => m.IncludeTerritories)
				.Include(m => m.ExcludeTerritories);

return results.ToList();

Would result in the aforementioned exception being thrown.  However, take out a few of those "includes" and the query works perfectly with no exception thrown.

It appears that the spurious exception is caused by the MySQL .NET Connector's attempt to generate the necessary SQL statement required for the query, with the subsequent parsing of that SQL statement failing to parse any included TINYINT(1) columns as boolean values. In my (albeit limited) testing, it appeared that including 5 or fewer related tables did not produce the error, but including more than 5 tables would cause the error consistently.

In order to fix the issue, we simply have to ensure that the MySQL connector, via Entity Framework, will model all Boolean entity properties as a BIT(1) data type column.  This must be done explicitly, using code such as the following:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
	base.OnModelCreating(modelBuilder);
	// other initialization code here...
	modelBuilder.Properties<bool>().Configure(c => c.HasColumnType("bit"));
}

Performing this configuration in the OnModelCreating method of the DbContext object ensures that it applies to all entities, both those already existing and any new ones created in the future, so that they are modelled with a BIT(1) column data type rather than a TINYINT(1) data type.

It should be noted that the same "String was not recognized as a valid Boolean" exception can be thrown when using other Entity Framework functionality, however, I've yet to pin down exactly what code can cause the error.  I suspect that, due to the nature of the error as detailed above, there's an incredible amount of different C# code that, when translated to the MySQL SQL dialect, generates SQL of sufficient amount and complexity that the exception is thrown.

Finally, it would appear that the choice of the MySQL .NET Connector to use TINYINT(1) as the default column data type instead of BIT(1) stems from the history of the BIT data type in MySQL.  An old blog post from Baron Schwartz that documents this history and behaviour states,

MySQL has supported the BIT data type for a long time, but only as a synonym for TINYINT(1) until version 5.0.3. Once the column was created, MySQL no longer knew it had been created with BIT columns.

In version 5.0.3 a native BIT data type was introduced for MyISAM, and shortly thereafter for other storage engines as well. This type behaves very differently from TINYINT.

He goes on to indicate how this behaviour has changed over time:

The old data type behaved just like the small integer value it really was, with a range from -128 to 127. The new data type is a totally different animal. It’s not a single bit. It’s a fixed-width “bit-field value,” which can be from 1 to 64 bits wide. This means it doesn’t store a single BIT value; it’s something akin to the ENUM and SET types. The data seems to be stored as a BINARY value, even though the documentation lists it as a “numeric type,” in the same category as the other numeric types. The data isn’t treated the same as a numeric value in queries, however. Comparisons to numeric values don’t always work as expected.

This change in behavior means it’s not safe to use the BIT type in earlier versions and assume upgrades will go smoothly.

It appears to be from this chequered history that such issues converting "BIT" columns (which can potentially hold a large range of actual values) to a binary boolean have arisen.

SqlBits 2017 In Review


This past Saturday 8th April 2017, the annual SqlBits conference took place in the International Centre in Telford, Shropshire.  The event is a four day conference, with the first three days being a paid conference and the final day, the Saturday, always being a free community day.

I’d had to get up quite early for this event, setting my alarm for 5:30am to allow me to get my all-important cup of coffee before setting off for the approximately 90 mile journey to arrive at the conference for the opening time of 7:30am!

After arriving at the venue, which on this occasion had free car parking in the venue’s ample car park, I headed indoors to search for the registration booth.  It appeared that I’d arrived slightly early as I joined a small crowd that was gathered in one of the venue’s corridors just outside the entrance to a large hall where the event itself was taking place.  After a short wait, we were informed that we could enter the hall and proceed to the registration booths on the way in.

After registering and receiving my badge, conference programme and goodie bag, I headed to the catering area for another all-important cup of coffee and some breakfast.

This year, having recently become vegetarian, I knew the almost obligatory bacon butties would not be an option, so I quickly acquired a cup of coffee (the most important thing) and searched for the vegetarian breakfast option.  The choice was either a fruit smoothie or a fruit bowl.  I selected the fruit bowl and made my way to one of the many tables dotted around the venue to consume my breakfast and take a peek through the goodie bag!

After a short while, an announcement came over a venue wide loudspeaker system to tell the attendees that the first sessions would be starting in 10 minutes.  This year, there were eight tracks of talks, each one presented in one of eight separate domes scattered throughout the venue.  I quickly finished my coffee, collected my things and made my way to Dome 3 for my first session, Ust Oldfield’s “A Deep Dive Into Data Lakes”.

Ust first introduced himself as a consultant working for Adatis who provide Business Intelligence (BI) consultancy services.  He says that, as a company, they were fully conversant with standard data warehouses, but needed to move forward in order to understand the relatively recent phenomenon of data lakes on the Microsoft Azure platform.

Ust asks the audience who already knows about Data Lakes; not many of us do, so he asks if we’re familiar with Hadoop – the Apache Foundation’s distributed computing framework – with which a few more people are familiar.  Ust explains that, under the hood, Azure’s Data Lake is a combination of distributed Azure BLOB storage (and so can work with any file type or size) with Hadoop overlaid on top to provide the distributed compute capability.

Azure Data Lake (hereafter referred to as ADL) uses things called “Extents” to contain its data, which are 250MB blocks that all storage is divided into.  Ust explains that ADL uses the “lambda” architecture and allows users to perform computations and queries using a language called U-SQL, which he says is like a cross between T-SQL used in SQL Server and C#.  All of the files that are added to a data lake can be set to automatically expire and be deleted, and so ADL contains functionality to allow some automated maintenance of the data held within it.

All data warehouses and lakes in ADL go through three stages: the ingestion of raw data, an enrichment phase (where the data is verified, de-duplicated, cleaned and augmented with additional data from other sources) and finally a curation phase in which the data is presented for user consumption.  U-SQL scripts specify how the raw input data is transformed into output data, and U-SQL includes both traditional ways to select and filter raw data, similar to how SQL Server would provide such functionality, and other methods of transforming data more specific to distributed data sets, such as MapReduce.

ADL provides a dashboard within the Azure website where ADL can be accessed and scripts created and run against the ADL data, however, there is also Microsoft Visual Studio tooling available so that many of the ADL functions can be accessed through Visual Studio.  One very interesting feature is that U-SQL scripts that would normally be confined to running within Azure can be downloaded to a local machine and debugged using Microsoft Visual Studio and it’s important to note that some functionality of ADL can only be accessed via the ADL Visual Studio tooling.

When performing queries against ADL data, U-SQL scripts are split and parallelized into multiple “vertexes”, which are the discrete units of computation within ADL.  Each vertex can be independent or dependent upon a previous vertex completing its computation.  You can manage vertexes and their dependencies within ADL, but this is a piece of functionality only available within the ADL Visual Studio tooling.

Ust shows us a demo of some U-SQL queries running over some sample ADL data.  He demonstrates how, despite ADL, like almost all Azure features, being a pay-as-you-go service, reworking your queries to be longer running queries that use fewer ADLAUs (Azure Data Lake Analysis Units – the discrete single compute/billing units within ADL) can actually save you a lot of money.  This is due to how ADL charges are calculated, meaning that it’s far more expensive to use more ADLAUs than it is to use fewer, but for a longer time.

Ust shows us some tips around using “partition elimination”, which is a mechanism whereby data is pre-filtered prior to being distributed and computed upon by your standard U-SQL scripts.  Partition Elimination is best implemented with a deliberately defined file naming system (i.e. MyLogs_2017_05_01.txt, MyLogs_2017_05_16.txt etc.)  Using such a mechanism, you can filter the data files to be included within the U-SQL compute based upon partial filename matches and wildcards (i.e. you could process MyLogs for the month of April 2017 with a pre-filter such as MyLogs_2017_04_**.txt).  Ust tells us some more about the ADL data and the requirements for its storage.  He says that indexes are mandatory within ADL data, but that we can only have a single clustered index on each table.  Currently, ADL does not support non-clustered indexes however this is something that may come in the future.

Finally, Ust talks about data “skew”, which is the mechanism of how your dataset is distributed throughout the cluster for computing.  Data can be split for processing based upon a round-robin technique, which guarantees an even distribution of data across all nodes in the cluster, but does not guarantee that similar data will be kept together and processed by the same node.  This can cause a performance degradation of the compute function as potentially separate nodes must communicate much more to transfer related data when it’s on multiple nodes.  The other technique for data distribution is to split the data based upon a hash.  This guarantees that related data will be kept together on the same node – thus potentially improving the compute performance – but can now no longer guarantee that the data is evenly distributed across all the nodes in the cluster.  This means that some nodes will have significantly more work to perform than other nodes which can again impact overall compute performance.  Therefore, it’s essential that you understand the general “shape” of your raw data in order to maximise the compute performance – and thus the overall cost – of your ADL service.

After Ust’s session, we had a quick coffee break and I grabbed another cup of coffee.  There was just enough time to drink my coffee and take a very quick look around the main hall of the venue before I had to make my way back to Dome 3 for the next session.  This one was Hugo Kornelis’s “Normalization Beyond Third-Normal Form”.

Hugo starts his session by reminding us of some key concepts that we need to be aware of when performing any data modelling activity.  He talks about the “Universe Of Discourse”, which is the view of reality as defined by the data/software model; it’s not necessarily the view of actual reality.  We then look at the purpose of database normalization.  We recall that normalization is the process of organising our data into columns and tables in such a way as to reduce redundancy and improve data integrity.  Hugo points out that normalization’s purpose is not to prevent incorrect data but to prevent impossible, inconsistent or business rule violating data.  We can’t stop the user from entering false data into the name column, but we can prevent them from providing us with a non-date value for their birthdate.  Hugo also reminds us that normalization is never performed at a database level, only at a table level.  It’s perfectly possible to have a database that, across its many tables, contains multiple forms of normalization.

Next, we look at what defines normalization.  Hugo tells us that it’s based upon Functional Dependency.  This is a constraint that dictates that for every value in Column A in a relationship, there is exactly none, or one value for Column B (i.e. A –> B).  Column A can actually be a composite of multiple actual columns (i.e. {A,B} –> C) and Hugo gives the conference-specific example of a SqlBits Dome number and a chair number which can define the exact name of the attendee sitting there.  It’s possible that the composite can exist on the other side of the relationship (i.e. A –> {B,C}) however this can be reduced to two constraints of A -> B and A –> C. 

Hugo reminds us of 3rd normal form.  This is the most “popular” normal form that many people take their databases to and then stop there.  3rd normal form (3NF) states that, in a given table, every non-key column is dependent upon the key, the whole key and nothing but the key (so help me Codd!).  We can use an algorithm called “Bernstein’s Algorithm for Synthesis of a 3rd Normal Form Schema” to help us create a database schema that is guaranteed to be in 3rd normal form, so long as all of our functional dependencies are known up front.  Hugo also mentions Boyce-Codd normal form, which is based upon 3rd normal form but extends the requirement that all columns in a table, including key columns, must be dependent upon the key.  When all columns in a table are dependent upon the key, there should usually be no duplicated data within that table’s row.

Hugo proceeds by detailing something called Elementary Key normal form.  This is perhaps a little known and used type of normal form, based upon 3rd normal form but where the constraint is defined as only non-elementary columns being dependent upon the key.  So what is a non-elementary column?  Well, it’s where a functional dependency such as {A,B} –> C does not have the reverse dependencies of either C –> A or C –> B.  This can also be expressed as: for every full non-trivial functional dependency of the form A –> B, either A is a key or B is (a part of) an elementary key.  Hugo explains that, in practice, Elementary Key normal form is almost identical to 3rd normal form.

From here, Hugo takes us into the more elaborate normal forms.  We start with 4th normal form. 4th normal form, unlike the lower normal forms, is less concerned with functional dependencies, but rather with multi-valued dependencies.  These are best explained with an example.  Hugo uses a table representing the availability of experts to discuss SQL problems on given days of the week:

Day | Expert | Subject
Monday | Jim | Design
Monday | Jim | Tuning
Tuesday | Jim | Debugging
Tuesday | Fred | Design
Looking at this data, we can infer the following fact: On Monday, you can ask Jim about Design.  From this fact, we can further infer two additional facts: On Monday, you can ask Jim questions, and Jim knows about Design.  In looking at the two facts that we’ve inferred, we can see that it is not possible to work backwards and infer the first original fact merely from the two subsequent facts.  This is a violation of 4th normal form.  In order to make this data compliant with 4NF, we must separate the information regarding days of the week and subjects into different tables; each table then becomes compliant with 4th normal form:

Expert | Day
Jim | Monday
Jim | Tuesday
Fred | Tuesday

Expert | Subject
Jim | Design
Jim | Tuning
Jim | Debugging
Fred | Design

After 4NF we move on to look at 5th normal form.  5th normal form is based upon 4th normal form but extends the rules to dictate that there must be no “join dependencies” between the columns except based upon key.  A join dependency is effectively the ability to take a single table, split it into multiple tables and be able to recreate the original table by constructing a query that joins the split tables back into one.  In practice, a table being in 5NF effectively means that if a column has the same value in multiple rows, and removing that value from the table requires the removal of multiple columns, then our table is not compliant with 5th normal form. 5NF is so closely related to 4NF that it’s very rare for a table compliant with 4NF to not also be compliant with 5NF.

Expert
Jim
Fred
 
Day
Monday
Tuesday
 
Subject
Design
Tuning
Debugging

Hugo briefly touches upon 6th normal form.  He starts by stating that 6th normal form is very hard to find in practice, being far more an academic curiosity.  6NF is based upon 5NF but further constrains the join dependencies to state that no join dependencies, even those implied by key, are allowed to exist within the table.  This effectively means that there can never be any NULL values within any columns of a 6NF table.  There would be no need for NULLs as we could simply remove the entire row.  The primary reason we don’t see tables and especially entire databases that conform to 6th normal form is that 6NF largely implies that our entire data schema is modelled using a very large number of tables with each table having only a key column and a data value column.  Today’s real-world database platforms are simply not optimised to operate with such a data schema and so data normalisation to this level is rarely, if ever, performed in the real world.

Hugo next talks about Optimal Normal Form.  This is based upon 6NF but prevents the “splitting” of tables if “elementary fact types” would be split. Elementary fact types are multiple columns that would have to remain together in a single table to ensure integrity of data.  Again, optimal normal form is very rarely found in the real-world.

Finally, Hugo talks about an entirely different type of data normalization, known as Domain/Key Normal Form.  Domain Key Normal Form (DKNF) is not based upon functional dependencies like all other forms of normalization, but is instead based solely upon domain constraints and key constraints.  A domain in this context refers to the range of values that are allowed within the given column.  These are not the values allowed by the data type of the underlying column, but rather the values allowed by the business logic of the domain.  An example that violates DKNF could be shown as follows, with a school report card for students whereby the score is a value between 0 and 100, and the status is either FAIL or PASS (FAIL for scores below 50):

Student | Score | Status
James | 78 | PASS
William | 63 | PASS
David | 48 | FAIL
Timothy | 57 | PASS

From the table above, we can see that it would be possible to enter a value of FAIL in the Status column for the row containing James’ name.  The database constraints would not prevent us from doing this, however, we would be violating our business rules that state that Scores greater than or equal to 50 are a PASS status.  In order to correct this data so as not to violate DKNF, we would change it as follows by splitting into two tables:

Student | Score
James | 78
William | 63
David | 48
Timothy | 57

Status | Minimum Score | Maximum Score
FAIL | 0 | 49
PASS | 50 | 100

By splitting the data, we ensure that business logic is captured and no table data can violate the domain rules.  An interesting side-effect of complying with DKNF is that you’ll also comply with 5NF too.  The relevance of DKNF, despite being a very different form of normalization than the other forms, is that data integrity against business rules can now be expressed and enforced from the database design alone, something that has traditionally been enforced only within application code that is responsible for reading and writing data to and from the database.  It should be noted, however, that compliance with DKNF isn’t always possible and depends very much on the business domain.

After this, Hugo’s session was complete and it was time for another short coffee break.  I quickly grabbed another coffee from one of the numerous catering stands dotted throughout the venue and checked my programme for the Dome that I would need to head towards for the next session.  That was Dome 4 and Conor Cunningham’s “SQL Server vNext and SQL Azure – Upcoming Features”.


Conor’s talk was originally intended to be given by Lindsey Allen, however, a scheduling mix-up had resulted in Lindsey being unable to give the presentation.  Instead we were provided with some excellent content from Conor Cunningham, who is the Principal Software Architect for Microsoft on the SQL Server Query Processor Team.  Conor is here to tell us all about the new features that will be coming in the upcoming versions of the on-premise SQL Server product as well as SQL Azure.

Firstly, Conor tells us that both the on-premise SQL Server product and the SQL Azure product share the exact same codebase.  SQL Azure has a monthly release cadence and so is always the first product to receive new SQL Server functionality and have that functionality available to the public, whilst on-premise SQL Server currently has a release cadence of approximately 1 year and so receives the same features from SQL Azure in each of its subsequent public releases.

A big feature coming in SQL Server vNext (the official marketing title is not yet decided) is the ability to run it on Linux.  This isn’t just a version of SQL Server that’s specially built for Linux, but the exact same binaries that run the Windows version of SQL Server.  Microsoft has built an abstraction layer, known as a “PAL” (or Platform Abstraction Layer) which is used to align all operating system or platform specific code in one place and allow the rest of the codebase to stay operating system agnostic.  Moreover, SQL Server when run on Linux will effectively be SQL Server running inside a Docker container.  Previously, SQL Server has relied on Windows Server Failover Clustering (WSFC) to provide clustering capability to SQL Server, however, as part of the work required to allow SQL Server to run on Linux, this is being abstracted away to allow 3rd party cluster management software to be used.  Initially, SQL Server on Linux will support an open-source product called Pacemaker, however more cluster management product support will follow in time.

There have been big improvements within the In-Memory Tables features of SQL Server.  These improvements mean that In-Memory tables, which were previously constrained in how they operated compared to normal disk-based tables, will now operate much closer to how standard tables operate, supporting many more features including JSON support, CROSS APPLY and CASE statements, amongst others.

Another major area of improvement work within SQL Server vNext is ColumnStore indexes.  ColumnStore indexes are perhaps one of the best new features to be added to SQL Server in recent years and allow potentially significant performance enhancements for queries on tables using such an index.  ColumnStore indexes now have support for BLOB column data types and the index itself is now compressed, reducing space and storage requirements as well as improving performance.  Further, rebuilds of ColumnStore indexes will now no longer cause significant blocking of the tables upon which the index is being rebuilt, meaning that users of the database are no longer severely negatively impacted by such rebuilds.

SQL Server vNext also includes advancements in “Adaptive Query Processing”.  This is a major new area of functionality in SQL Server and will receive even more improvements in future versions of SQL Server beyond vNext.  Adaptive Query Processing is a series of algorithms that work within SQL Server’s Query Processor in order to improve query performance by analysing query plans, SQL Server data and other meta-data.  It aims to improve query performance without introducing any query degradation from incorrect query plan optimisations.  It does this by dynamically adjusting joins (i.e. switching from hash joins to merge or loop joins, or vice-versa), adjusting memory grants in order to ensure efficient allocation of memory without under or over allocating, and interleaving compilation and execution for the most complex queries in order to maximise their performance.

Another major new feature of SQL Server vNext will be support for Graph Databases.  Graph databases are highly specialised databases that store their data in graph structures, using nodes, edges and properties, allowing for semantic querying of the data.  Common applications of graph databases are for querying large graphs of data such as those found inside a social network.  Graph data and the ability to efficiently query it makes questions such as “How many friends of Person A are also friends of Person B?” and “Which friends of Person A are also friends of the friends of Person B?” very easy to answer, something a relational database would have difficulty in achieving in an efficient manner.  SQL Server vNext’s support for graph databases promises to offer full CRUD support for node and edge creation, query language extensions to allow querying of graph data as well as allowing queries to span both standard relational SQL Server data and graph data at the same time.

Conor continues his exploration of the new SQL Azure features by telling us about a new feature in SQL Azure that can automatically create indexes for table columns inferred from usage of the column within queries.  The maximum database size has also improved, now supporting databases up to 4TB in size.  There are also some long-awaited improvements to the syntax of the T-SQL query language itself.  There are new string concatenation and aggregation functions as well as a TRIM function (finally!).  New Japanese collation families have also been added, and new bulk insert operations have been added to support specific new standards such as the RFC 4180 CSV file format.

After a brief Q&A at the end of Conor’s session it was time for another refreshment/toilet break.  I decided I was all coffee’d out by this point, having had around 4 or 5 cups of coffee so far, and it still only being just past 12pm.  There was one more session before the break for lunch and so after consulting my conference programme, I headed off to Dome 2 for the somewhat light-hearted session that was Denny Cherry’s “Things You Should Never Do In SQL Server”.

Denny introduced himself first and indicated that this session was to be a bit lighter than other sessions in the day, being a look at some of things that you should not do rather than the things you should.  As such, he reminded us that everything on his slides was wrong!

He started by talking about the enforcement of data integrity in SQL Server and tells us that we really shouldn’t use things like triggers, stored procedures or even application code to enforce data integrity.  SQL Server is a fully relational database and we can leverage what SQL Server is good at by designing our schemas to provide such integrity for us.  Denny talks about a book that he reviewed as a technical editor, which was so bad that he implored the publisher to completely scrap the book.  One of the pearls of wisdom in this book, he says, was the recommendation to use 32-bit editions of SQL Server for “local offices” reserving the 64-bit edition of SQL Server only for large corporations.  Don’t do this.  We all run on 64 bit operating systems today, so where available, we should always be running 64 bit application software, too.  He states that recommendations from third party software vendors should always be questioned, too.

Next up is migrating databases.  Something seen quite frequently is the use of the “copy database wizard” to migrate databases from one server to another.  This is error prone and simply not as good an option as something like log shipping, which has been around for decades and is a very robust and mature technique for performing migrations.  Then we look at the account under which SQL Server will run.  Whilst it’s true that very old versions of SQL Server (pre-2005 versions at least) required local administrator privileges in order to run, modern versions of SQL Server do not require such special privileges at all.  SQL Server does require some additional permissions above those usually found in a standard local user account, but not many more.  Always run with the minimum permissions you need.

Next we look at SQL Profiler.  It’s a great tool for debugging issues on a SQL Server instance, but it should never really be connected to your production database.  This can negatively impact performance of the database.  It’s far better to use it against either a local server or an offline backup or staging server.  Moreover, the very latest versions of SQL Server have functionality that SQL Profiler unfortunately doesn’t support.

Denny then moves on to look at the SQL queries we write.  He says that it’s really not worth the effort to ensure that SQL is written in a cross-platform manner (i.e. ANSI SQL).  Whilst it’ll work, of course, you’re really giving up a lot of functionality and performance improvements that have been built into the platform specific dialects of SQL used on each database platform.  Using SQL that is written specifically for the platform you’re targeting will always allow you to write code in the most performant manner.  Moreover, it’s incredibly rare to actually need your SQL written in a cross-platform way, as it’s very rare to ever want to migrate your databases to an entirely new database platform.

Denny then looks at some anti-patterns with data itself.  He states that you should always use NULL where there’s an absence of data, and not values such as empty strings, minimum dates (i.e. 01/01/1990) and other “magic” values.  He also says that you should never blindly design your database schema to a specific level of normalization.  You should always consider the application that the data will support and the required performance of that data and domain design, and then design your database schema accordingly.  Next we talk about transaction logs.  Denny says he’s seen a number of people simply deleting transaction logs in order to reclaim disk space.  This is a bad idea, and if you find you really need more disk space, you should simply buy more disk space rather than severely impact your ability to recover your database from crashes by deleting the transaction log.  On the subject of transaction logs, Denny reminds us not to use RAID 5 for the disk array that will store the transaction log.  RAID 5 is not optimized for write intensive operations – which the transaction log requires – and so the performance will suffer as a result.  Also, never ever use AUTO_SHRINK to automatically reclaim disk space.  Whilst this does reclaim disk space, the negatives of doing this far outweigh the positives.

Next we look at columns and schema.  Denny reminds us to always use the correct and most appropriate data type for our data, and always be aware of the kinds of data we’ll be working with.  For example, in the US, the zip code (equivalent to postcode in the UK) is entirely numeric – it’s a 5 digit number.  An integer column might seem appropriate here, but some states (Maine) have their zip code start with two zeroes and it must always be written in 5 digit format with these leading zeroes.  Further, zip codes / post codes are not always numeric once you move beyond the USA, so it’s highly likely you’ll need to support alphabetic characters in there too.  Also, don’t assume that certain values will never change and therefore use them as a primary key.  Some developers have previously used a US Social Security Number as a primary key thinking that it’ll never change, and whilst it very rarely does, it’s not guaranteed not to.

Some developers believe that views will improve performance, however, they really don’t.  And don’t ever be tempted to use nested views as they are considered evil and incredibly difficult to debug.  Don’t require RDP access to a SQL Server in order to run queries against it, it’s not only a security risk, but it’s simply not required.  In thinking about the permissions that you can grant to users and other objects in SQL Server, always ensure that you grant the minimum amount of permissions required.  Also, don’t ever revoke permissions from the built-in database roles, such as the public role.  Denny talks about a time that an over-zealous auditor at a client insisted that certain permissions be removed from the public database role (this was a permission allowing access to the underlying Windows Registry).  When revoked this caused users within the public role to be entirely unable to log in to the SQL Server due to SQL Server’s own requirement to access the Windows Registry when a user logs in!  And with that in mind, like third-party vendors, don’t ever listen to external auditors about what permissions you should or should not assign/revoke on your SQL Server.

After Denny’s session, it was time for lunch.  All of the attendees gathered in the main hall and headed towards one of the 4 main catering points throughout the venue.  As a vegetarian, there was a rather nice option which was Butternut squash & Sage tortellini with roasted Mediterranean vegetables.  This was followed by either a Strawberry bavarois or a warm chocolate brownie with toffee sauce.  Well, I couldn’t resist the chocolate brownie, so after collecting my meal along with some freshly squeezed orange juice to wash it all down, I found an empty spot at one of the many tables and ate my lunch.

After my lunch I decided to take a stroll around the grounds of the International Centre as it had become such a lovely sunny day outside.  The morning’s sessions had been great but intensive, so this gentle stroll allowed me to clear my head, get some fresh air, enjoy the sunshine and put myself back in the right frame of mind required for the final two sessions of the day.  I made my way back inside the venue as the lunch break drew to a close and once again consulted my programme to determine the correct dome for the next session.  This time it was Dome 6 for Richard Douglas’s “Understanding the Transaction Log”.  After I’d taken my seat, the speaker announced that he wasn’t actually Richard at all as, unfortunately, Richard was suffering from food poisoning, so this talk was to be given by one of Richard’s colleagues, John Martin.

John starts by saying that the transaction log is not just all about backups.  There are numerous and varied uses for the transaction log, and it’s integral to a well-running and performant SQL Server.  John talks about the three recovery models for the transaction log.  There are “Simple”, “Full” and “Bulk” modes, and John is keen to point out that even when using Simple mode, everything is still logged to the transaction log.  There may not be as much detail as could be found if using Full or Bulk modes, but everything is still there.  One of the golden rules is to only ever have one transaction log file for each database.  You can have numerous data files, however, transaction logs should be kept within a single file.  This improves performance and makes recovery and backups much easier.  John reminds us what the various modes do.  Simple mode is effectively auto-maintenance and auto-shrinking of our log – SQL Server will take care of all of this itself.  This may or may not be a good thing depending upon your use case.  In Full mode, SQL Server will not perform any of its own maintenance or shrinking at all.  You’re responsible for doing this.  This is very often the better choice as you will know your own use cases better than SQL Server does and so can schedule such maintenance for the most convenient and appropriate times.

John reminds us of the set of operations within SQL Server that are “minimally logged”, even when operating in Full recovery mode.  Many common operations, such as “SELECT * INTO” and the creating, altering and dropping of indexes, are all minimally logged in the transaction log, and this minimal logging impacts your ability to perform a complete “point-in-time” restore of your database in the event of a crash.

After this we look at the general process by which a standard SELECT or INSERT statement is processed by SQL Server.  We can see how there are a lot of moving parts to the entire process flow and how the transaction log is central to ensuring that SQL Server is able to provide the D (Durability) from the ACID properties that we require from our database engine.  John reminds us that it’s not until our data is fully persisted to the transaction log file that SQL Server considers the data persisted and committed to the database, as it’s only after this step that the data could be recovered in the event of a server crash.

John moves on to talk about the internals of the transaction log and how SQL Server uses the space within the log file.  Transaction log files are split into multiple VLFs (Virtual Log Files).  These VLFs exist within the same single physical file on disk.  A VLF can be in either an active or inactive state, and at any given time, there’s always at least one active VLF.  SQL Server will manage the creation of new VLFs as the transaction log grows over time, but it’s possible to control some of how VLFs are created yourself.  The creation of too many VLFs within a file is thought to be a bad thing and there are various discussions about how best to manage this.  Ordinarily, you will have either 4, 8 or 16 VLFs inside the transaction log file and the VLF is sized appropriately based upon the “chunk” size (chunks are the size by which the transaction log physical file grows on disk).

We look at the makeup of a VLF and John tells us that within a VLF there’s many “log blocks”.  Log Blocks are between 512 bytes and 60KB in size, and a VLF will have as many log blocks as is required to fill the VLF space.  John states how it’s better to try to keep the log blocks at the largest possible size of 60KB as performance is better with fewer, but larger log blocks rather than with many smaller log blocks.  Inside the log block are the individual log records.  Each log record is identified by a unique Log Sequence Number (LSN).  The log sequence number is made up from the VLF sequence number, the log block number and the log record number.

John talks about how the transaction log grows over time.  First we look at log growth when using the “Simple” recovery mode.  SQL Server will periodically instigate a “checkpoint”.  This is the flushing of dirty pages of data within memory to disk, and any VLFs that have no open transactions within them are cleared down.  Note that this does not reclaim any disk space from the VLF, however.  If additional log records are added, SQL Server moves through the VLFs within the file looking for space to write the log records, primarily looking for currently inactive VLFs that have been previously cleared and reusing those.  If we reach the end of the file, we “wrap around” to the beginning of the file searching for inactive VLFs where we can write the transactions.  If no inactive VLFs are found, we must increase the size of the physical transaction log file.  This negatively impacts the server, which has to pause all activity whilst the physical file is extended on disk.  Then we look at log growth with “Full” or “Bulk” recovery mode, which is almost the same as log growth with the simple recovery mode, but instead of a checkpoint we have a transaction log backup occurring, which ensures we have much more transaction data available for a full recovery of the state of the server if required.

At this point, John talks about how we can make things go faster with SQL Server by improving transaction log performance.  We start firstly with a good I/O architecture – consider the balance of reads and writes of your data and select an appropriate RAID strategy that’s optimized for your own use case, and remember that using a bigger RAID cache is always better.  Within the architecture of your SQL Server itself, it’s best if you can determine the actual required size of your physical transaction log file at the beginning.  This is difficult to predict, but if you can get close, your server will perform better as a result.  We should be aware of such things as page splits, and particularly the “evil” variety as not all page splits are bad.  These bad splits can have serious negative performance impacts.  John also cautions against using “delayed durability” which is effectively asynchronous transaction writes.  These cause the server to consider data persisted even though it’s not yet fully written to disk.  Depending upon your application, delayed durability could be ok, but if your system must never lose a single transaction then don’t use it.  One time when it might be appropriate to temporarily enable delayed durability is during large scale purging of data from the database.  John tells us to keep an eye on our indexes.  Too many of those means that each one is a discrete data structure that must be updated and persisted to disk individually which can hurt performance.

Finally, John tells us about monitoring the SQL Server transaction log file; we can use both external tools such as PERFMON for that, as well as built-in SQL Server dynamic management functions such as sys.dm_io_virtual_file_stats.  Regularly reviewing these monitors and logs can help identify bottlenecks within the server and highlight the areas where issues may arise.

After John’s session, it was time for the final afternoon coffee break, which this time was accompanied by a rather nice selection of cakes!  After a cup of coffee and a cake or two (the scones and carrot cake were particularly nice!) it was time to consult the conference programme one last time to determine the dome for the final session of the day.

This time it was Dome 3 for Emanuele Zanchettin’s “Performance Tips For Faster SQL Queries”.

Emanuele starts his session by talking about debugging.  He reminds us that debugging database queries often starts at the application layer with developers digging through C# code, but he states that debugging sometimes also stops there too.  We need to be aware that we often need to debug down into SQL Server itself.  With that, Emanuele asks the attendees if they’d rather have a talk full of slides or a talk full of real-world demos.  The attendees unanimously vote for demos, so from here Emanuele opens up his SQL Server Management Studio tool and begins to show us some T-SQL code.

He first creates a demo database which includes at least one table of over 2 million rows.  He writes some simple SELECT queries that contain a few joined tables.  We run the queries and see that they execute in a very short space of time, however, in looking at the execution plan generated for our query, we can see that we can make the query perform even better.  So Emanuele’s first tip is to make sure you always check the execution plans generated for your queries as they can often indicate where an additional index on the source tables would greatly improve query performance.

Unfortunately, it was at this point that I had to leave Emanuele’s session in order to make an early start on my journey back home. I missed the end of Emanuele’s session as well as the final conference wrap-up and prize giving session, but I’d had a fantastic day at another incredibly well-run, well-organised and very informative SQLBits event.

As ever, I can’t wait until the next one!

DDD South West 2017 In Review


This past Saturday 6th May 2017, the seventh iteration of the DDD South West conference was held in Bristol, at the Redcliffe Sixth Form Centre.

This was the second DDD South West conference that I’d attended, having been to the previous one in 2015 (they’d had a year off in 2016), and on returning in 2017 the DDD South West team had put on another fine DDD conference.

Since the conference is in Bristol, I’d stayed over the evening before in Gloucester in order to break up the long journey down to the south west.  After an early breakfast on the Saturday morning, I made the relatively short drive from Gloucester to Bristol in order to attend the conference.

Arriving at the Redcliffe centre, we were signed in and made our way to the main hall where we were treated to teas, coffees and danish pastries.  I’d remembered these from the previous DDDSW conference, and they were excellent and very much enjoyed by the arriving conference attendees.

After a short while in which I had another cup of coffee and the remaining attendees arrived at the conference, we had the brief introduction from the organisers, running through the list of sponsors who help to fund the event and some other house-keeping information.  Once this was done it was time for us to head off to the relevant room to attend our first session.  DDDSW 7 had 4 separate tracks, however, for 2 of the event slots they also had a 5th track which was the Sponsor track – this is where one of the technical people from their primary sponsor gives a talk on some of the tech that they’re using and these are often a great way to gain insight into how other people and companies do things.

I headed off down the narrow corridor to my first session.  This was Ian Cooper’s 12-Factor Apps In .NET.

Ian starts off by introducing the 12 factors: these are 12 common best practices for building web based applications, especially in the modern cloud-first world.  There’s a website for the 12 factors that details them.  These 12 factors and the website itself were first assembled by the team behind the Heroku cloud application platform as the set of best practices that most of the most successful applications running on their platform had.

Ian states how the 12 factors cover 3 broad categories: Design, Build & Release and Management.  We first look at getting set up with a new project when you’re a new developer on a team.  Setup automation should be declarative.  It should ideally take no more than 10 minutes for a new developer to go from a clean development machine to a working build of the project.  This is frequently not the case for most teams, but we should work to reduce the friction and the time it takes to get developers up and running with the codebase.  Ian also states that a single application, although it may consist of many different individual projects and “moving parts”, should remain together in a single source code repository.  Although you’re keeping multiple distinct parts of the application together, you can still easily build and deploy the individual parts independently based upon individual .csproj files.

Next, we talk about application servers.  We should try to avoid application servers if possible.  What is an application server?  Well, it can be many things, but the most obvious one in the world of .NET web applications is Internet Information Server itself.  This is an application server and we often require this for our code to run within it.  IIS has a lot of configuration of its own and so this means that our code is then tied to, and dependent upon, that specifically configured application server.  An alternative to this is to have our code self-host, which is entirely possible with ASP.NET Web API and ASP.NET Core.  By self-hosting and running our endpoint on a specific port to differentiate it from other services that may also potentially run on the same server, we ensure our code is far more portable, and platform agnostic.
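
As a rough illustration of that self-hosting approach (a minimal sketch only, assuming an ASP.NET Core project with a conventional Startup class; the port number here is arbitrary), the application can be hosted directly on Kestrel rather than inside IIS:

using Microsoft.AspNetCore.Hosting;

public class Program
{
    public static void Main(string[] args)
    {
        // Self-host on Kestrel, listening on an explicitly chosen port,
        // rather than relying on a pre-configured IIS application server.
        var host = new WebHostBuilder()
            .UseKestrel()
            .UseUrls("http://0.0.0.0:5001")
            .UseStartup<Startup>()
            .Build();

        host.Run();
    }
}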

Ian then talks about backing services.  These are the concerns such as databases, file systems etc. that our application will inevitably need to talk to.  But we need to ensure that we treat them as the potentially ephemeral systems that they are, and therefore accept that such a system may not even have a fixed location.  Using a service discovery service, such as Consul.io, is a good way to remove our application’s dependence on a required service being in a specific place.

Ian mentions the ports and adapters architecture (aka hexagonal architecture) for organising our codebase itself.  He likes this architecture as it’s not only a very clean way to separate concerns and keep the domain at the heart of the application model, but it also works well in the context of a “12-factor” compliant application as the terminology (specifically around the use of “ports”) is similar and consistent.  We then look at performance of our overall application.  Client requests to our front-end website should be responded to in around 200-300 milliseconds.  If, as part of that request, there’s some long-running process that needs to be performed, we should offload that processing to some background task or external service, which can update the web site “out-of-band” when the processing is complete, allowing the front-end website to render the initial page very quickly.
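
A minimal sketch of that offloading idea (all the names here – the queue interface, controller and route – are hypothetical and purely for illustration; a real system would use a durable queue and a separate background worker to process it):

using System.Collections.Concurrent;
using Microsoft.AspNetCore.Mvc;

// A hypothetical in-memory work queue; in practice this would be a durable
// queue (e.g. a message broker) consumed by a background worker process.
public interface IBackgroundQueue
{
    void Enqueue(string jobPayload);
}

public class InMemoryBackgroundQueue : IBackgroundQueue
{
    private readonly ConcurrentQueue<string> _jobs = new ConcurrentQueue<string>();

    public void Enqueue(string jobPayload) => _jobs.Enqueue(jobPayload);
}

[Route("api/reports")]
public class ReportsController : Controller
{
    private readonly IBackgroundQueue _queue;

    public ReportsController(IBackgroundQueue queue)
    {
        _queue = queue;
    }

    [HttpPost]
    public IActionResult RequestReport(string reportName)
    {
        // Hand the slow work off to the background and return immediately,
        // keeping the front-end response well within the 200-300ms target.
        _queue.Enqueue(reportName);
        return StatusCode(202); // Accepted - the result arrives out-of-band
    }
}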

This leads on to talking about ensuring our services can start up very quickly, ideally in no more than a second or two, and should shut down gracefully also.  If we have slow startup, it’s usually because of the need to build complex state, so like the web front-end itself, we should offload that state building to a background task.  If our service absolutely needs that state before it can process incoming requests, we can buffer or queue the early requests that the service receives immediately after startup until our background initialization is complete.  As an aside, Ian mentions supervisord as a handy tool to keep our services and processes alive.  One great thing about keeping our startup fast and lean and our shutdown graceful is that we essentially get elastic scaling with that service!

Ian now starts to show us some demo code that demonstrates many of the 12 best practices within the “12-factors”.  He uses the ToDoBackend website as a repository of sample code that frequently follows the 12-factor best practices.  Ian specifically points out the C# ASP.NET Core implementation of ToDoBackend as it was contributed by one of his colleagues.  The entire ToDoBackend website is a great place where many backend frameworks from many different languages and platforms can be compared and contrasted.

Ian talks about the ToDoBackend ASP.NET Core implementation and how it is built to use his own libraries Brighter and Darker.  These are open-source libraries implementing the command dispatcher/processor pattern which allow building applications compliant with the CQRS principle, effectively allowing a more decomposed and decoupled application.  MediatR is a similar library that can be used for the same purposes.  The Brighter library wraps the Polly library which provides essential functionality within a highly decomposed application around improved resilience and transient fault handling, so such things as retries, the circuit breaker pattern, timeouts and other patterns are implemented allowing you to ensure your application, whilst decomposed, remains robust to transient faults and errors – such as services becoming unavailable.

Looking back at the source code, we should ensure that we explicitly isolate and declare our application’s dependencies.  This means not relying on pre-installed frameworks or other library code that already exists on a given machine.  For .NET applications, this primarily means depending on only locally installed NuGet packages and specifically avoiding referencing anything installed in the GAC.  Other dependencies, such as application configuration, and especially configuration that differs from environment to environment – i.e. database connection strings for development, QA and production databases – should be kept within the environment itself rather than existing within the source code.  We often do this with web.config transformations although it’s better if our web.config merely references external environment variables with those variables being defined and maintained elsewhere. Ian mentions HashiCorp’s Vault project and the Spring Boot projects as ways in which you can maintain environment variables and other application “secrets”.  An added bonus of using such a setup is that your application secrets are never added to your source code meaning that, if your code is open source and available on something like GitHub, you won’t be exposing sensitive information to the world!
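
A small sketch of keeping such settings in the environment rather than in source control (the variable name MYAPP_DB_CONNECTION is a hypothetical example, and this assumes the Microsoft.Extensions.Configuration packages used by ASP.NET Core):

using Microsoft.Extensions.Configuration;

public static class AppConfiguration
{
    public static string GetDatabaseConnectionString()
    {
        // Pull configuration from environment variables so that secrets such
        // as connection strings never need to live in the source repository.
        var configuration = new ConfigurationBuilder()
            .AddEnvironmentVariables()
            .Build();

        return configuration["MYAPP_DB_CONNECTION"];
    }
}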

Finally, we turn to application logging.  Ian states how we should treat all of our application’s logs as event streams.  This means we can view our logs as a stream of aggregated, time-ordered events from the application, which should flow continuously so long as the application is running.  Logs from different environments should be kept as similar as possible. If we do both of these things we can store our logs in a system such as Splunk, AWS CloudWatch Logs, LogEntries or some other log storage engine/database.  From here, we can easily manipulate and visualize our application’s behaviour from this log data using something like the ELK stack.

After a few quick Q&As, Ian’s session was over and it was time to head back to the main hall where refreshments awaited us.  After a nice break with some more coffee and a few nice biscuits, it was quickly time to head back through the corridors to the rooms where the sessions were held for the next session.  For this next session, I decided to attend the sponsors track, which was Ed Courtenay’s “Growing Configurable Software With Expression<T>”.

Ed’s session was a code-heavy session which used a single simple application based around filtering a set of geographic data against a number of filters.  The session looked at how both the configurability of the filtering and its performance could be improved, with each successive iteration refactoring the application to use an ever more “expressive” implementation.  Ed starts by asking what we define as configurable software.  We all agree that software that can change its behaviour at runtime is a good definition, and Ed says that we can also think of configurable software as a way to build software from individual building blocks too.  We’re then introduced to the data that we’ll be using within the application, which is the geographic data that’s freely available from geonames.org.

From here, we dive straight into some code.  Ed shows us the IFilter interface that exposes a single method which provides a filter function:

using System;

namespace ExpressionDemo.Common
{
    public interface IFilter
    {
        Func<IGeoDataLocation, bool> GetFilterFunction();
    }
}

The implementation of the IFilter interface is fairly simple and combines a number of actual filter functions that filter the geographic data by various properties: country, classification, minimum population etc.

public Func<IGeoDataLocation, bool> GetFilterFunction()
{
    return location => FilterByCountryCode(location) && FilterByClassification(location) && FilterByPopulation(location) && FilterByModificationDate(location);
}

Each of the actual filtering functions is initially implemented as a simple loop, iterating over the set of allowed data (i.e. the allowed country codes) and testing the currently processed geographic data row against the allowed values to determine if the data should be kept (returning true from the filter function) or discarded (returning false):

private bool FilterByCountryCode(IGeoDataLocation location)
{
	if (!_configuration.CountryCodes.Any())
		return true;
	foreach (string countryCode in _configuration.CountryCodes) {
		if (location.CountryCode.Equals(countryCode, StringComparison.InvariantCultureIgnoreCase))
			return true;
	}
	return false;
}

Ed shows us his application with all of his filters implemented in such a way and we see the application running over a reduced geographic data set of approximately 10,000 records.  We process all of the data in around 280ms, however Ed tells us that it's really a very naive implementation and that, with only a small change to the filter implementations, we can do better.  From here, we look at the same filters, this time implemented as a Func<T, bool>:

private Func<IGeoDataLocation, bool> CountryCodesExpression()
{
	IEnumerable<string> countryCodes = _configuration.CountryCodes;

	string[] enumerable = countryCodes as string[] ?? countryCodes.ToArray();

	if (enumerable.Any())
		return location => enumerable
			.Any(s => location.CountryCode.Equals(s, StringComparison.InvariantCultureIgnoreCase));

	return location => true;
}

We're doing the exact same logic, but instead of iterating over the allowed country codes list, we're being more declarative and simply returning a Func<> which performs the selection/filter for us.  All of the other filters are re-implemented this way, and Ed re-runs his application.  This time, we process the exact same data in around 250ms.  We're starting to see an improvement.  But we can do more.  Instead of returning Func<>'s we can go even further and return an Expression.

We pause looking at the code to discuss Expressions for a moment.  These are really "code-as-data".  This means that we can decompose our algorithm that performs some specific type of filtering and can then express that algorithm as a data structure or tree.  This is a really powerful way of expressing our algorithm and allows our application to simply, functionally "apply" our input data to the expression rather than having to iterate or loop over lists as we had in the very first implementation.  What this means for our algorithms is increased flexibility in the construction of the algorithm, but also increased performance.  Ed shows us the filter implementation using an Expression:

private Expression<Func<IGeoDataLocation, bool>> CountryCodesExpression()
{
	var countryCodes = _configuration.CountryCodes;

	var enumerable = countryCodes as string[] ?? countryCodes.ToArray();
	if (enumerable.Any())
		return enumerable.Select(CountryCodeExpression).OrElse();

	return location => true;
}

private static Expression<Func<IGeoDataLocation, bool>> CountryCodeExpression(string code)
{
	return location => location.CountryCode.Equals(code, StringComparison.InvariantCultureIgnoreCase);
}

Here, we've simply taken the Func<> implementation and effectively "wrapped" the Func<> in an Expression.  So long as our Func<> is simply a single expression statement, rather than a multi-statement function wrapped in curly brackets, we can trivially turn the Func<> into an Expression:

public class Foo
{
	public string Name
	{
		get { return this.GetType().Name; }
	}
}

// Func<> implementation
Func<Foo,string> foofunc = foo => foo.Name;
Console.WriteLine(foofunc(new Foo()));

// Same Func<> as an Expression
Expression<Func<Foo,string>> fooexpr = foo => foo.Name;
var func = fooexpr.Compile();
Console.WriteLine(func(new Foo()));

Once again, Ed refactors all of his filters to use Expression functions and we again run the application.  And once again, we see an improvement in performance, this time processing the data in around 240ms.  But, of course, we can do even better!

The next iteration of the application has us again using Expressions, however, instead of merely wrapping the previously defined Func<> in an Expression, this time we're going to write the Expressions from scratch.  At this point, the code does admittedly become less readable as a result of this, however, as an academic exercise in how to construct our filtering using ever lower-level algorithm implementations, it's very interesting.  Here's the country filter as a hand-rolled Expression:

private Expression CountryCodeExpression(ParameterExpression location, string code)
{
	return StringEqualsExpression(Expression.Property(location, "CountryCode"), Expression.Constant(code));
}

private Expression StringEqualsExpression(Expression expr1, Expression expr2)
{
	return Expression.Call(typeof (string).GetMethod("Equals",
		new[] { typeof (string), typeof (string), typeof (StringComparison) }), expr1, expr2,
		Expression.Constant(StringComparison.InvariantCultureIgnoreCase));
}

Here, we're calling into the Expression API and manually building up the expression tree with our algorithm broken down into operators and operands.  It's far less readable code, and it's highly unlikely you'd actually implement your algorithm in this way, however if you're optimizing for performance, you just might.  Once again, we run the application and observe the results.  It is indeed slightly faster than the previous Expression function implementation, taking around 235ms.  Not a huge improvement over the previous implementation, but an improvement nonetheless.

Finally, Ed shows us the final implementation of the filters.  This is the Filter Implementation Bridge.  The code for this is quite complex and hairy so I've not reproduced it here.  It is, however, available in Ed's Github repository which contains the entire code base for the session.

The Filter Implementation Bridge involves building up our filters as Expressions once again, but this time we go one step further and write the resulting code out to a pre-compiled assembly.  This is a separate DLL file, written to disk, which contains a library of all of our filter implementations.  Our application code is then refactored to load the filter functions from the external DLL rather than expecting to find them within the same project.  Because the assembly is pre-compiled and JIT'ed, when we invoke the assembly's functions we should see another performance improvement.  Sure enough, Ed runs the application after making the necessary changes to implement this and we do indeed see an improvement in performance, this time processing the data set in around 210ms.
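
To give a flavour of the general mechanism (this is emphatically not Ed's actual code – just a much-simplified sketch on the full .NET Framework, where an expression tree can be compiled into a static method of a dynamic assembly and then saved to disk):

using System;
using System.Linq.Expressions;
using System.Reflection;
using System.Reflection.Emit;

public static class FilterLibraryBuilder
{
    public static void Main()
    {
        // Define a dynamic assembly and module that can be saved to disk.
        var assemblyName = new AssemblyName("FilterLibrary");
        var assemblyBuilder = AppDomain.CurrentDomain.DefineDynamicAssembly(
            assemblyName, AssemblyBuilderAccess.RunAndSave);
        var moduleBuilder = assemblyBuilder.DefineDynamicModule("FilterLibrary", "FilterLibrary.dll");
        var typeBuilder = moduleBuilder.DefineType("Filters", TypeAttributes.Public);

        // Compile an expression tree into a static method on the new type.
        var methodBuilder = typeBuilder.DefineMethod(
            "IsGreatBritain", MethodAttributes.Public | MethodAttributes.Static);
        Expression<Func<string, bool>> filter =
            code => code.Equals("GB", StringComparison.InvariantCultureIgnoreCase);
        filter.CompileToMethod(methodBuilder);

        typeBuilder.CreateType();
        assemblyBuilder.Save("FilterLibrary.dll");

        // The application can later load FilterLibrary.dll and invoke the
        // pre-compiled filter methods via reflection.
    }
}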

Ed says that although we've looked at performance and the improvements in performance with each successive refactor, the same refactorings have also introduced flexibility in how our filtering is implemented and composed.  By using Expressions, we can easily change the implementation at run-time.  If we have these expressions exported into external libraries/DLLs, we could fairly trivially decide to load a different DLL containing a different function implementation.  Ed uses the example of needing to calculate VAT and the differing ways in which that calculation might need to be implemented depending upon country.  By adhering to a common interface and implementing the algorithm as a generic Expression, we can gain both flexibility and performance in our application code.

After Ed’s session it was time for another refreshment break.  We headed back through the corridors to the main hall for more teas, coffees and biscuits.  Again, with only enough time for a quick cup of coffee, it was time to trek back to the session rooms for the 3rd and final session before lunch.  This one was Sandeep Singh’s “JavaScript Services: Building Single-Page Applications With ASP.NET Core”.

Sandeep starts his talk with an overview of where Single Page Applications (SPAs) are currently.  There’s been an explosion of frameworks in the 5+ years that SPAs have been around, so there’s an overwhelming choice of JavaScript libraries to choose from for both the main framework (i.e. Angular, Aurelia, React, VueJS etc.) and for supporting JavaScript libraries and build tools.  Due to this, it can be difficult to get an initial project set up, and achieving good cohesion between the back-end server-side code and the front-end client-side code can be challenging.  JavaScriptServices is a part of the ASP.NET Core project and aims to simplify and harmonize development for ASP.NET Core developers building SPAs.  The project was started by Steve Sanderson, who was the initial creator of the KnockoutJS framework.

Sandeep talks about some of the difficulties with current SPA applications.  There’s often a slow initial site load whilst a lot of JavaScript is sent to the client browser for processing and eventual display of content, and because the application is a single page that uses hash-based routing for each page’s URL, with each page being rendered by running some JavaScript, SEO (Search Engine Optimization) can be difficult.  JavaScriptServices attempts to improve this by combining a set of SPA templates along with some SPA Services, which contain WebPack middleware, server-side pre-rendering of client-side code, as well as routing helpers and the ability to “hot swap” modules on the fly.

WebPack is the current major module bundler for JavaScript, allowing many disparate client-side asset files, such as JavaScript, images, CSS/LESS/Sass etc. to be pre-compiled/transpiled and combined in a bundle for efficient delivery to the client.  The WebPack middleware also contains tooling similar to dotnet watch, which is effectively a file system watcher and can rebuild, reload and re-render your web site at development time in response to real-time file changes.  The ability to “hot swap” entire sets of assets, which is effectively a whole part of your application, is a very handy development-time feature.  Sandeep quickly shows us a demo to explain the concept better.  We have a simple SPA page with a button, that when clicked will increment a counter by 1 and display that incrementing count on the page.  If we edit our Angular code which is called in response to the button click, we can change the “increment by 1” code to say “increment by 10” instead.  Normally, you’d have to reload your entire Angular application before this code change was available to the client browser and the built-in debugging tools there.  Now, however, using JavaScriptServices’ hot swappable functionality, we can edit the Angular code and see it in action immediately, even continuing the incrementing counter without it resetting back to its initial value of 0!

Sandeep continues by discussing another part of JavaScriptServices which is “Universal JavaScript”, also known as “Isomorphic JavaScript”.  This is JavaScript which is shared between the client and the server.  How does JavaScriptServices deal with JavaScript on the server-side?   It uses a .NET library called Microsoft.AspNetCore.NodeServices which is a managed code wrapper around NodeJS.  Using this functionality, the server can effectively run the JavaScript that is part of the client-side code, and send pre-rendered pages to the client.  This is a major factor in being able to improve the initial slow load time of traditional SPA applications.  Another amazing benefit of this functionality is the ability to run an SPA application in the browser even with JavaScript disabled!  Another part of the JavaScriptServices technology that enables this functionality is RoutingServices, which provides handling for static files, and a “fallback” route for when explicit routes aren’t declared.  This means that a request to something like http://myapp.com/orders/ would be rendered on the client-side via the client JavaScript framework code (i.e. Angular) after an AJAX request to retrieve the mark-up/data from the server, however, if client-side routing is unavailable to process such a route (in the case that JavaScript is disabled in the browser, for example) then the request is sent to the server so that the server may render the same page (perhaps requiring client-side JavaScript which is now processed on the server!) on the server before sending the result back to the client via the means of a standard HTTP(s) request/response cycle.
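
In code, the WebPack middleware and the fallback route are typically wired up in the application’s Startup class; the snippet below is a rough sketch based on the JavaScriptServices SPA templates (it assumes the Microsoft.AspNetCore.SpaServices package is referenced and that MVC is registered in ConfigureServices):

using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.SpaServices.Webpack;

public class SpaStartup
{
    public void Configure(IApplicationBuilder app, IHostingEnvironment env)
    {
        if (env.IsDevelopment())
        {
            // Serve WebPack-built assets from memory at development time and
            // hot-swap edited modules into the running page without a reload.
            app.UseWebpackDevMiddleware(new WebpackDevMiddlewareOptions
            {
                HotModuleReplacement = true
            });
        }

        app.UseStaticFiles();

        app.UseMvc(routes =>
        {
            routes.MapRoute(
                name: "default",
                template: "{controller=Home}/{action=Index}/{id?}");

            // Anything not matched above falls back to Home/Index, allowing
            // the server to render pages when client-side routing can't.
            routes.MapSpaFallbackRoute(
                name: "spa-fallback",
                defaults: new { controller = "Home", action = "Index" });
        });
    }
}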

I must admit, I didn’t quite believe this, however, Sandeep quickly set up some more demo code and showed us the application working fine with a JavaScript enabled browser, whereby each page would indeed be rendered by the JavaScript of the client-side SPA.  He then disabled JavaScript in his browser and re-loaded the “home” page of the application and was able to demonstrate still being able to navigate around the pages of the demo application just as smoothly (due to the pages having been rendered on the server and sent to the client without needing further JavaScript processing to render them).  This was truly mind blowing stuff.  Since the server-side NodeServices library wraps NodeJS on the server, we can write server-side C# code and call out to invoke functionality within any NodeJS package that might be installed.  This means that if we have a NodeJS package that, for example, renders some HTML markup and converts it to a PDF, we can generate that PDF with the Node package from C# code, sending the resulting PDF file to the client browser via a standard ASP.NET mechanism.  Sandeep wraps up his talk by sharing some links to find out more about JavaScriptServices and he also provides us with a link to his GitHub repository for the demo code he’s used in his session.
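
As a rough sketch of what calling into a Node package from C# via NodeServices might look like (the module path and the PDF scenario are hypothetical examples, and this assumes services.AddNodeServices() has been registered in the application’s ConfigureServices method):

using System.Threading.Tasks;
using Microsoft.AspNetCore.NodeServices;

public class PdfGenerator
{
    private readonly INodeServices _nodeServices;

    public PdfGenerator(INodeServices nodeServices)
    {
        _nodeServices = nodeServices;
    }

    // Invokes a Node module (./Node/generatePdf is a hypothetical script that
    // would use an npm package to turn HTML markup into a PDF), returning the
    // result to the C# caller as a byte array.
    public Task<byte[]> GeneratePdfAsync(string htmlMarkup)
    {
        return _nodeServices.InvokeAsync<byte[]>("./Node/generatePdf", htmlMarkup);
    }
}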

After Sandeep’s talk, it was time to head back to the main conference area for lunch.  The food at DDDSW is usually something special, and this year was no exception.  Being in the South West of England, it’s obligatory to have some quintessentially South West food – pasties!  A choice of steak or cheese-and-onion pasty, along with a packet of crisps, a chocolate bar and a piece of fruit, meant our lunches were quite a sizable portion of food.  After the attendees queued for the pasties and other treats, we all found a place to sit in the large communal area and ate our delicious lunch.

The lunch break at DDDSW was an hour and a half long, and usually has some lightning or grok talks taking place within one of the session rooms during that time.  Due to the sixth form centre not being the largest of venues (some of the session rooms were very full during some sessions and could get quite hot) and the weather outside turning into a rather nice day and keeping us all very warm indeed, I decided that it would be nice to take a walk around outside to grab some fresh air and also to help work off a bit of the ample lunch I’d just enjoyed.

I decided a brief walk around the grounds of the St. Mary Redcliffe church would be quite pleasant and so off I went, capturing a few photos along the way.  After walking around the grounds, I still had some time to kill so decided to pop down to the river front to wander along there.  After a few more minutes, I stumbled across a nice old fashioned pub and so decided that a cheeky half pint of ale would go down quite nice right about now!  I ordered my ale and sat down in the beer garden overlooking the river watching the world go by as I quaffed!  A lovely afternoon lunch break it was turning out to be.

After a little while longer it was time to head back to the Redcliffe sixth form centre for the afternoon’s sessions.  I made my way back up the hill, past the church and back to the sixth form centre.  I grabbed a quick coffee in the main conference hall area before heading off down the corridor to the session room for the first afternoon session.  This one was Joel Hammond-Turner’s “Service Discovery With Consul.io for the .NET Developer”.

Joel’s session is about the Consul.io software used for service discovery.  Consul.io is a distributed application that allows applications running on a given machine to query Consul for registered services and find out where those services are located, either on a local network or on the wider internet.  Consul is fully open source and cross-platform.  Joel tells us that Consul is a service registry – applications can call Consul’s API to register themselves as an available service at a known location – and it’s also service discovery – applications can run a local copy of Consul which will synchronize with other instances of Consul on other machines in the same network to ensure the registry is up-to-date, and can query Consul for a given service’s location.  Consul also includes a built-in DNS Server, a Key/Value store and a distributed event engine.  Since Consul requires all of these features itself in order to operate, they are exposed to the users of Consul for their own use.  Joel mentions how his own company, Landmark, is using Consul as part of a large migration project aimed at migrating a 30 year old C++ legacy system to a new event-driven distributed system written in C# and using ASP.NET Core.

Joel talks about distributed systems and how they’re often architected from a hardware or infrastructure perspective.  Often, due to each “layer” of the solution requiring multiple servers for failover and redundancy, you’ll need a load balancer to direct requests for that layer to a given server.  If you’ve got web, application and perhaps other layers, this often means load balancers at every layer.  This is not only potentially expensive – especially if your load balancers are separate physical hardware – it also complicates your infrastructure.  Using Consul can help to replace the majority of these load balancers since Consul itself acts as a way to discover a given service and can, when querying Consul’s built-in DNS server, randomize the order in which multiple instances of a given service are reported to a client.  In this way, requests can be routed and balanced between multiple instances of the required service.  This means that load balancing is reduced to a single load balancer at the “border” or entry point of the entire network infrastructure.

Joel proceeds by showing us some demo code.  He starts by downloading Consul.  Consul runs from the command line.  It has a handy built-in developer mode where nothing is persisted to disk, allowing for quick and easy evaluation of the software.  Joel says how Consul’s command line interface is very similar to that of the source control software, Git, where the consul command acts as an introducer to other commands allowing consul to perform some specific function (i.e. consul agent -dev starts the consul agent service running in developer mode).

The great thing with Consul is that you’ll never need to know where on your network the Consul service itself is.  It’s always running on localhost!  The idea is that you have Consul running on every machine on localhost.  This way, any application can always access Consul by looking on localhost on the specific Consul port (the HTTP API is on port 8500 by default).  Since Consul is distributed, every copy of Consul will synchronize with all others on the network ensuring that the data is shared and registered services are known by each and every running instance of Consul.  The built-in DNS Server will provide your consuming app with the ability to get the dynamic IP of the service by using a DNS name of something like [myservicename].service.consul.  If you've got multiple servers for that service (i.e. 3 different IPs that can provide the service) registered, the built-in DNS service will automagically randomize the order in which those IP addresses are returned to DNS queries - automatic load-balancing for the service!

Consuming applications, perhaps written in C#, can use some simple code to both register as a service with Consul and also to query Consul in order to discover services:

public void Configure(IApplicationBuilder app, IHostingEnvironment env, ILoggerFactory loggerFactory, IApplicationLifetime lifetime)
{
	loggerFactory.AddConsole(Configuration.GetSection("Logging"));
	loggerFactory.AddDebug();
	app.UseMvc();
	var serverAddressFeature = (IServerAddressesFeature)app.ServerFeatures.FirstOrDefault(f => f.Key == typeof(IServerAddressesFeature)).Value;
	var serverAddress = new Uri(serverAddressFeature.Addresses.First());
	// Register service with consul
	var registration = new AgentServiceRegistration()
	{
		ID = $"webapi-{serverAddress.Port}",
		Name = "webapi",
		Address = $"{serverAddress.Scheme}://{serverAddress.Host}",
		Port = serverAddress.Port,
		Tags = new[] { "Flibble", "Wotsit", "Aardvark" },
		Checks = new AgentServiceCheck[] {new AgentCheckRegistration()
		{
			HTTP = $"{serverAddress.Scheme}://{serverAddress.Host}:{serverAddress.Port}/api/health/status",
			Notes = "Checks /health/status on localhost",
			Timeout = TimeSpan.FromSeconds(3),
			Interval = TimeSpan.FromSeconds(10)
		}}
	};
	var consulClient = app.ApplicationServices.GetRequiredService<IConsulClient>();
	consulClient.Agent.ServiceDeregister(registration.ID).Wait();
	consulClient.Agent.ServiceRegister(registration).Wait();
	lifetime.ApplicationStopping.Register(() =>
	{
		consulClient.Agent.ServiceDeregister(registration.ID).Wait();
	});
}

Consul’s services can be queried based upon an exact service name or can be queried based upon a tag (registered services can assign themselves multiple arbitrary tags upon registration to aid discovery).
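
A sketch of the discovery side, using the same Consul NuGet client as the registration code above (the exact overloads may differ between versions of the client library, so treat this as illustrative rather than definitive):

using System;
using System.Linq;
using System.Threading.Tasks;
using Consul;

public static class ServiceDiscovery
{
    // Asks the local Consul agent for healthy instances of the "webapi"
    // service registered above and picks one at random for load spreading.
    public static async Task<Uri> FindWebApiAsync()
    {
        using (var consulClient = new ConsulClient())
        {
            var result = await consulClient.Health.Service("webapi", null, true);
            var instance = result.Response
                .OrderBy(_ => Guid.NewGuid())
                .FirstOrDefault();

            return instance == null
                ? null
                : new Uri($"{instance.Service.Address}:{instance.Service.Port}");
        }
    }
}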

As well as service registration and discovery, Consul also provides service monitoring.  This means that Consul itself will monitor the health of your service so that should one of the instances of a given service become unhealthy or unavailable, Consul will prevent that service’s IP Address from being served up to consuming clients when queried.

Joel now shares with us some tips for using Consul from within a .NET application.  He says to be careful of finding Consul-registered services with .NET’s built-in DNS resolver.  The reason for this is that .NET’s DNS resolver is very heavily cached, and it may serve up stale DNS data that may not be up to date with the data inside Consul’s own DNS service.  Another thing to be aware of is that Consul’s Key/Value store will always store values as byte arrays.  This can sometimes be slightly awkward if we’re mostly storing strings inside it, however, it’s trivial to write a wrapper to always convert from a byte array when querying the Key/Value store.  Finally, Joel tells us about a more advanced feature of Consul which is “watches”.  These are effectively events that Consul will fire when Consul’s own data changes.  A good use for this would be to have some code that runs in response to one of these events that can re-write the border load balancer rules, providing you with a fully automatic means of keeping your network infrastructure up-to-date and discoverable.
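
The kind of string wrapper mentioned above might look something like this (again just a sketch against the Consul NuGet client; the class and method names are my own):

using System.Text;
using System.Threading.Tasks;
using Consul;

public class ConsulKeyValueStore
{
    private readonly IConsulClient _client;

    public ConsulKeyValueStore(IConsulClient client)
    {
        _client = client;
    }

    public async Task SetAsync(string key, string value)
    {
        // The KV store only deals in byte arrays, so encode the string here.
        var pair = new KVPair(key) { Value = Encoding.UTF8.GetBytes(value) };
        await _client.KV.Put(pair);
    }

    public async Task<string> GetAsync(string key)
    {
        // Decode the stored byte array back into a string for the caller.
        var result = await _client.KV.Get(key);
        return result.Response == null
            ? null
            : Encoding.UTF8.GetString(result.Response.Value);
    }
}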

In wrapping up, Joel shares a few links to his GitHub repositories for both his demo code used during his session and the slides.

After Joel’s talk it was time for another refreshment break.  This one, being the only one of the afternoon, was accompanied by another DDD South West tradition – cream teas!  These went down a storm with the conference attendees and there were so many of them that many people were able to have seconds.  There was also some additional fruit, crisps and chocolate bars left over from lunch time, which made this particular break quite gourmand.

After a few cups of coffee which were required to wash down the excellent cream teas and other snack treats, it was time to head back down the corridor to the session rooms for the final session of the day.  This one was Naeem Sarfraz’s “Layers, Abstractions & Spaghetti Code: Revisiting the Onion Architecture”.

Naeem’s talk was a retrospective of the last 10+ years of .NET software development.  Not only a retrospective of his own career, but he posits that it’s quite probably a retrospective of the last 10 years of many people’s careers.  Naeem starts by talking about standards.  He asks us to cast our minds back to when we started new jobs earlier in our careers and how perhaps one of the first things that we had to do on starting the job was to read page after page of “standards” documents for the new team that we’d joined.  Coding standards, architecture standards etc.  How times have changed nowadays, where the focus is less on onerous documentation and more on expressive code that is easier to understand, as well as such practices as pair programming allowing new developers to get up to speed with a new codebase quickly without the need to read large volumes of documentation.

Naeem then moves on to show us some of the code that he himself wrote when he started his career around 10 years ago.  This is typical of a lot of code that many of us wrote when we first started developing, and has methods which are called in response to some UI event and which have business logic, database access and web code (redirections etc.) all combined in the same method!

Naeem says he quickly realised that this code was not great and that he needed to start separating concerns.  At this time, it seemed that N-Tier became the new buzzword and so newer code written was compliant with an N-Tier architecture.  A UI layer that is decoupled from all other layers and only talks to the Business layer which in turn only talks to the Data layer.  However, this architecture was eventually found to be lacking too.

Naeem stops and asks us about how we decompose our applications.  He says that most of the time, we end up with technical designs taking priority over everything else.  We also often find that we start to model our software at the persistence (database) layer first; this frequently ends up bleeding into the other layers and affecting their design.  This is great for us as techies, but it’s not the best approach for the business.  What we need to do is start to model our software on the business domain and leave technical designs for a later part of the modelling process.

Naeem shows us some more code from his past.  This one is slightly more separated, but still has user interface elements specifically designed so that rows of values from a grid of columns can be directly mapped to a DataTable type, allowing easy interaction with the database.  This is another example of a data-centric approach to the entire application design.

We then move on to look at another way of designing our application architecture.  Naeem tells us how he discovered the 4+1 approach, which has us examining the software we’re building by looking at it from different perspectives.  This helps to provide a better, more balanced view of what we’re seeking to achieve with its development.

The next logical step along the architecture path is that of Domain-Driven Design.  This takes the approach of placing the core business domain, described with a ubiquitous language – which is always in the language of the business domain, not the language of any technical implementation – at the very heart of the entire software model.  One popular way of using domain-driven design (DDD) is to adopt an architecture called “Onion architecture”.  This is also known as “Hexagonal architecture” or “Ports and adapters architecture”.

We look at some more code, this time code that is compliant with DDD.  However, on closer inspection, it really isn’t.  He says how this code adopts all the right names for a DDD compliant implementation, however, it’s only the DDD vernacular that’s been adopted and the code still has a very database-centric, table-per-class type of approach.  The danger here is that we can appear to be following DDD patterns, but we’re really just doing things like implementing excessive interfaces on all repository classes or implementing generic repositories without really separating and abstracting our code from the design of the underlying database.

Finally, we look at some more code, this time written by Greg Young and adopting a CQRS-based approach.  The demo code is called SimplestPossibleThing and is available on GitHub.  This code demonstrates a much cleaner and more abstracted approach to modelling the code.  By using a CQRS approach, reads and writes of data are separated, with the code implementing Commands – which are responsible for writing data – and Queries – which are responsible for reading data.  Finally, Naeem points us to a talk given by Jimmy Bogard which talks about architecting applications in “slices not layers”.  Each feature is developed in isolation from others and the feature development includes a “full stack” of development (domain/business layer, persistence/database layer and user interface layers) isolated to that feature.
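
To make the command/query split concrete, a minimal sketch might look like the following (the interface and class names here are illustrative, not taken from Greg Young’s sample):

using System;

// Commands change state and return nothing...
public interface ICommand { }

public interface ICommandHandler<TCommand> where TCommand : ICommand
{
    void Handle(TCommand command);
}

// ...while queries return data and change nothing.
public interface IQuery<TResult> { }

public interface IQueryHandler<TQuery, TResult> where TQuery : IQuery<TResult>
{
    TResult Handle(TQuery query);
}

public class DeactivateInventoryItem : ICommand
{
    public Guid ItemId { get; set; }
    public string Reason { get; set; }
}

public class GetInventoryItemDetails : IQuery<InventoryItemDetails>
{
    public Guid ItemId { get; set; }
}

// A simple read model returned by the query side.
public class InventoryItemDetails
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public int CurrentCount { get; set; }
}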

After Naeem’s session was over, it was time for all the attendees to gather back in the main conference hall for the final wrap-up by the organisers and the prize draw.  After thanking the various people involved in making the conference what it is (sponsors, volunteers, organisers etc.) it was time for the prize draw.  There were some good prizes up for grabs, but alas, I wasn’t to be a winner on this occasion.

Once again, the DDDSW conference proved to be a huge success.  I’d had a wonderful day and now just had the long drive home to look forward to.  I didn’t mind, however, as it had been well worth the trip.  Here’s hoping the organisers do it all over again next year!

DDD 12 In Review


On Saturday 10th June 2017 in sunny Reading, the 12th DeveloperDeveloperDeveloper event was held at Microsoft’s UK headquarters.  The DDD events started at the Microsoft HQ back in 2005 and after some years away and a successful revival in 2016, this year’s DDD event was another great occasion.

I’d travelled down to Reading the evening before, staying over in a B&B in order to be able to get to the Thames Valley Park venue bright and early on the Saturday morning.  I arrived at the venue and parked my car in the ample parking spaces available on Microsoft’s campus and headed into Building 3 for the conference.  I collected my badge at reception and was guided through to the main lounge area where an excellent breakfast selection awaited the attendees.  Having already had a lovely big breakfast at the B&B where I was staying, I simply decided on another cup of coffee, but the excellent breakfast selection was very well appreciated by the other attendees of the event.


I found a corner in which to drink my coffee and double-check the agenda sheets that we’d been provided with as we entered the building.   There’d been no changes since I’d printed out my own copy of the agenda a few nights earlier, so I was happy with the selections of talks that I’d chosen to attend.   It’s always tricky at the DDD events as there are usually at least 3 parallel tracks of talks and invariably there will be at least one or two timeslots where multiple talks that you’d really like to attend will overlap.

Occasionally, these talks will resurface at another DDD event (some of the talks and speakers at this DDD in Reading had given the same or similar talks in the recently held DDD South West in Bristol back in May 2017) and so, if you’re really good at scheduling, you can often catch a talk at a subsequent event that you’d missed at an earlier one!

As I finished off my coffee, more attendees turned up and the lounge area was now quite full.  It wasn’t too much longer before we were directed by the venue staff to make our way to the relevant conference rooms for our first session of the day.

Wanting to try something a little different, I’d picked Andy Pike’s Introducing Elixir: Self-Healing Applications at ZOMG scale as my first session.

Andy started his session by talking about where Elixir, the language, came from.  It’s a new language that is built on top of the Erlang virtual machine (BEAM VM) and so has its roots in Erlang.  Like Erlang, Elixir compiles down to the same bytecode that the VM ultimately runs.  Erlang was originally created at the Ericsson company in order to run their telecommunications systems.  As such, Erlang is built on top of a collection of middleware and libraries known as OTP or Open Telecom Platform.  Due to its very nature, Erlang, and thus Elixir, has very strong concurrency, fault tolerance and distributed computing at its very core.  Although Elixir is a “new” language in terms perhaps of its adoption, it’s actually been around for approximately 5 years now and is currently on version 1.4 with version 1.5 not too far off in the future.

Andy tells of how Erlang is used by WhatsApp to run its global messaging system.  He provides some numbers around this.  WhatsApp have 1.2 billion users, process around 42 billion messages per day and can manage to handle around 2 million connections on each server!   That’s some impressive performance and concurrency, and Andy is certainly right when he states that very few other platforms and languages, if any, can boast such impressive statistics.

Elixir is a functional language and its syntax is heavily inspired by Ruby.  The Elixir language was first designed by José Valim, who was a core contributor in the Ruby community.  Andy talks about the Elixir REPL, called “iex”, that ships in the Elixir installation.  He shows us some slides of simple REPL commands showing that Elixir supports all the same basic intrinsic types that you’d expect of any modern language: integers, strings, tuples and maps.  At this point, things look very similar to most other high level functional (and perhaps some not-quite-so-functional) languages, such as F# or even Python.

Andy then shows us something that appears to be an assignment operator, but is actually a match operator:

a = 1

Andy explains that this is not assigning 1 to the variable ‘a’, but is “matching” a against the constant value 1.  If ‘a’ is not yet bound, Elixir will bind the right-hand side value to the variable so that the match succeeds.  An alternative pattern is:

^a = 1

which performs the matching without the initial binding.  Andy goes on to show how this pattern matching can work to bind variables to values in a tuple.  For example, given the code:

success = { :ok, 42 }
{ :ok, result } = success

this will match the tuple bound to the variable ‘success’, matching the :ok atom and binding the value of 42 to the variable result.  We’re told how the colon in front of ok makes it an “atom”.  This is similar to a constant, where the atom’s name is its own value.

Andy shows how Elixir’s code is grouped into functions and functions can be contained within modules.  This is influenced by Ruby and how it also groups its code.  We then move on to look at lists.  These are handled in a very similar way to most other functional languages in that a list is merely seen as a “head” and a “tail”.  The “head” is the first value in the list and the “tail” is the entire rest of the list.  When processing the items in a list in Elixir, you process the head and then, perhaps recursively, call the same method passing the list “tail”.  This allows a gradual shortening of the list as the “head” is effectively removed with each pass through the list.  In order for such recursive processing to be performant, Elixir includes tail-call optimisation which allows the compiler to eliminate the necessity of maintaining state through each successive call to the method.  This is possible when the last line of code in the method is the recursive call.

Elixir also has guard clauses built right into the language.  Code such as:

def what_is(x) when is_number(x) and x > 0 do…

helps to ensure that code is more robust by only being invoked when ‘x’ is not only a number but also meets some specific condition, such as being greater than zero.  Andy states that, between the usage of such guard clauses and pattern matching, you can probably eliminate around 90-95% of all conditionals within your code (i.e. if x then y).

Elixir is very expressive in the characters it allows within function names, so functions can (and often do) have things like question marks in their name.  It’s a convention of the language that methods that return a Boolean value should end in a question mark, something shared with Ruby also, i.e. String.contains? "elixir of life", "of".  And of course, Elixir, like most other functional languages, has a pipe operator (|>) which allows the piping of the result of one function call into the input of another function call, so instead of writing:

text = "THIS IS MY STRING"
text = String.downcase(text)
text = String.reverse(text)
IO.puts text

This forces us to continually repeat the “text” variable; instead, we can write the same code like this:

text = "THIS IS MY STRING"
|> String.downcase
|> String.reverse
|> IO.puts

Andy then moves on to show us an element of the Elixir language that I found particularly intriguing: doctests.  Elixir makes function documentation a first class citizen within the language and not only does the doctest code provide documentation for the function – Elixir has a function, h, that when passed another function name as a parameter, will display the help for that function – but it also serves as a unit test for the function, too!  Here’s a sample of an Elixir function containing some doctest code:

defmodule MyString do
   @doc ~S"""
   Converts a string to uppercase.

   ## Examples
       iex> MyString.upcase("andy")
       "ANDY"
   """
   def upcase(string) do
     String.upcase(string)
   end
end

The doctest code not only shows the textual help text that is shown if the user invokes the help function for the method (i.e. “h MyString.upcase”) but the examples contained within the help text can be executed as part of a doctest unit test for the MyString module:

defmodule MyStringTest do
   use ExUnit.Case, async: true
   doctest MyString
end

The above code uses the doctest code inside the MyString module to invoke each of the provided “example” calls and assert that the output is the same as that defined within the doctest!

After taking a look into the various language features, Andy moves on to talk about the real power of Elixir, which it inherits from its Erlang heritage – processes.  It’s processes that provide Erlang, and thus Elixir, with the ability to scale massively, along with fault-tolerance and its highly distributed features.

When Elixir functions are invoked, they can effectively be “wrapped” within a process.  This involves spawning a process that contains the function.  Processes are not the same as Operating System processes, but are much more lightweight and are effectively only a C struct that contains a pointer to the function to call, some memory and a mailbox (which will hold messages sent to the function).  Processes have a Process ID (PID) and will, once spawned, continue to run until the function contained within terminates or some error or exception occurs.  Processes can communicate with other processes by passing messages to those processes.  Here’s an example of a very simple module containing a single function and how that function can be called by spawning a separate process:

defmodule HelloWorld do
     def greet do
		IO.puts "Hello World"
     end
end

HelloWorld.greet                 # This is a normal function call.
pid = spawn(HelloWorld, :greet, [])  # This spawns a process containing the function

Messages are sent to processes by invoking the “send” function, providing the PID and the parameters to send to the function:

send pid, { :greet, "Andy" }

This means that invoking functions in processes is almost as simple as invoking a local function.

Elixir uses the concept of schedulers to actually execute processes.  The Beam VM will supply one scheduler per CPU core available, giving the ability to run highly concurrently.  Elixir also uses supervisors as part of the Beam VM which can monitor processes (or even monitor other supervisors) and can kill processes if they misbehave in unexpected ways.  Supervisors can be configured with a “strategy”, which allows them to deal with errant processes in specific ways.  One common strategy is one_for_one which means that if a given process dies, a single new one is restarted in its place.

Andy then talks about the OTP heritage of Elixir and Erlang and from this there is a concept of a “GenServer”.  A GenServer is a module within Elixir that provides a consistent way to implement processes.  The Elixir documentation states:

A GenServer is a process like any other Elixir process and it can be used to keep state, execute code asynchronously and so on. The advantage of using a generic server process (GenServer) implemented using this module is that it will have a standard set of interface functions and include functionality for tracing and error reporting. It will also fit into a supervision tree.

The GenServer provides a common set of interfaces and APIs that all processes can adhere to, allowing common conventions such as the ability to stop a process, which is frequently implemented like so:

GenServer.cast(pid, :stop)

Andy then talks about “nodes”.  Nodes are separate actual machines that can run Elixir and the Beam VM and these nodes can be clustered together.  Once clustered, a node can start a process not only on its own node, but on another node entirely.  Communication between processes, irrespective of the node that the process is running on, is handled seamlessly by the Beam VM itself.  This provides Elixir solutions with great scalability, robustness and fault-tolerance.

Andy mentions how Elixir has its own package manager called “hex” which gives access to a large range of packages providing lots of great functionality.  There’s “Mix”, which is a build tool for Elixir, OTP Observer for inspection of IO, memory and CPU usage by a node, along with “ETS”, which is an in-memory key-value store, similar to Redis, just to name a few.

Andy shares some book information for those of us who may wish to know more.  He suggests “Programming Elixir” and “Programming Phoenix”, both part of the Pragmatic Programmers series and also “The Little Elixir & OTP Guidebook” from Manning.

Finally, Andy wraps up by sharing a story of a US-based Sports news website called “Bleacher Report”.  The Bleacher Report serves up around 1.5 billion pages per day and is a very popular website both in the USA and internationally.  Their entire web application was originally built on Ruby on Rails and they required approximately 150 servers in order to meet the demand for the load.  Eventually, they re-wrote their application using Elixir.  They now serve up the same load using only 5 servers.  Not only have they reduced their servers by an enormous factor, but they believe that with 5 servers, they’re actually over-provisioned as the 5 servers are able to handle the load very easily.  High praise indeed for Elixir and the BeamVM.  Andy has blogged about his talk here.

After Andy’s talk, it was time for the first coffee break.  Amazingly, there were still some breakfast sandwiches left over from earlier in the morning, which many attendees enjoyed.  Since I was still quite full from my own breakfast, I decided a quick cup of coffee was in order before heading back to the conference rooms for the next session.  This one was Sandeep Singh’s Goodbye REST; Hello GraphQL.

IMG_20170610_103642

Sandeep’s session is all about the relatively new technology of GraphQL.  It’s a query language for your API, comprising a server-side runtime for processing queries along with a client-side framework providing an in-browser IDE, called GraphiQL.  One thing that Sandeep is quick to point out is that GraphQL has nothing to do with graph databases.  It can certainly act as a query layer over the top of a graph database, but can just as easily query any kind of underlying data, from RDBMS’s through to even flat-file data.

Sandeep first talks about where we are today with API technologies.  There are many of them: XML-RPC, REST, OData etc., but they all have their pros and cons.  We explore REST in a bit more detail, as that’s a very popular modern-day architectural style of API.  REST is all about resources and working with those resources via nouns (the name of the resource) and verbs (HTTP verbs such as POST, GET etc.), and there’s also HATEOAS if your API is “fully” compliant with REST.

Sandeep talks about some of the potential drawbacks of a REST API.   There’s the problem of under-fetching.  This is seen when a given application’s page is required to show multiple resources at once, perhaps a customer along with recently purchased products and a list of recent orders.  Since this is three different resources, we would usually have to perform three distinct API calls in order to retrieve all of the data required for this page, which is not the most performant way of retrieving the data.  There’s also the problem of over-fetching.  This is where REST APIs can take a parameter to instruct them to include additional data in the response (i.e. /api/customers/1/?include=products,orders), however, this often results in more data being returned than is actually required.  We’re also exposing our REST endpoint to potential abuse as people can add arbitrary inclusions to the endpoint call.  One way to get around the problems of under- or over-fetching is to create ad-hoc custom endpoints that retrieve the exact set of data required, however, this can quickly become unmaintainable over time as the sheer number of these ad-hoc endpoints grows.

GraphQL isn’t an architectural style, but is a query language that sits on top of your existing API. This means that the client of your API now has access to a far more expressive way of querying your API’s data.  GraphQL responses are, by default, JSON but can be configured to return XML instead if required; the input queries themselves are structured in a similar way to JSON.  Here’s a quick example of a GraphQL query and the kind of data the query might return:

{
	customer {
		name,
		address
	}
}
{
	"data" : {
		"customer" : {
			"name" : "Acme Ltd.",
			"address" : ["123 High Street", "Anytown", "Anywhere"]
		}
	}
}

GraphQL works by sending the query to the server where it’s translated by the GraphQL server-side library.  From here, the query details are passed on to code that you have written on the server in order that the query can be executed.  You’ll write your own types and resolvers for this purpose.  Types provide the GraphQL queries with the types/classes that are available for querying – this means that all GraphQL queries are strongly-typed.  Resolvers tell the GraphQL framework how to turn the values provided by the query into calls against your underlying API.  GraphQL has a complete type system within it, so it supports all of the standard intrinsic types that you’d expect such as strings, integers, floats etc. but also enums, lists and unions.

Sandeep looks at the implementations of GraphQL and explains that it started life as a NodeJS package.  Its heritage is therefore in JavaScript, however, he also states that there are many implementations in numerous other languages and technologies such as Python, Ruby, PHP, .NET and many, many more.  Sandeep says how GraphQL can be very efficient: you’re no longer under- or over-fetching data, and you’re retrieving the exact fields you want from the tables that are available.  GraphQL also allows you to version your queries and types by adding deprecated attributes to fields and types that are no longer available.
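
To make the idea of server-side types and resolvers a little more concrete for .NET developers, here’s a rough sketch of what they might look like using the graphql-dotnet library; note that the Customer class and ICustomerRepository below are hypothetical stand-ins of my own, and the exact API surface differs between library versions:

using GraphQL.Types;

public class CustomerType : ObjectGraphType<Customer>
{
     public CustomerType()
     {
          // Each field registered here becomes queryable by clients.
          Field(c => c.Name);
     }
}

public class AppQuery : ObjectGraphType
{
     public AppQuery(ICustomerRepository customers)
     {
          // The resolver turns the incoming GraphQL query into a call
          // against our own underlying API / data access code.
          Field<CustomerType>(
               "customer",
               arguments: new QueryArguments(new QueryArgument<IntGraphType> { Name = "id" }),
               resolve: context => customers.GetById(context.GetArgument<int>("id")));
     }
}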

We take a brief look at the GraphiQL GUI client which is part of the GraphQL client-side library.  It displays three panes within your browser showing the schemas, the available types and fields, and a pane allowing you to type and perform ad-hoc queries.  Sandeep explains that the schema and sample tables and fields are populated within the GUI by performing introspection over the types configured in your server-side code, so changes and additions there are instantly reflected in the client.  Unlike REST, which has obvious verbs around data access, GraphQL doesn't really have these.  You need introspection over the data to know how you can use that data, however, this is a very good thing.  Sandeep states how introspection is at the very heart of GraphQL – it is effectively reflection over your API – and it’s this that leads to the ability to provide strongly-typed queries.

We’re reminded that GraphQL itself has no concept of authorisation or any other business logic as it “sits on top of” the existing API.  Such authorisation and business logic concerns should be embedded within the API or a lower layer of code.  Sandeep says that the best way to think of GraphQL is like a “thin wrapper around the business logic, not the data layer”.

GraphQL is not just for reading data!  It has full capabilities to write data too and these are known as mutations rather than queries.  The general principle remains the same and a mutation is constructed using JSON-like syntax and sent to the server for resolving to a custom method that will validate the data and persist it to the data store by invoking the relevant API endpoint.  Sandeep explains how read queries can be nested, so you can send one query to the server that contains syntax to perform two queries against two different resources.    GraphQL has a concept of "loaders".  These can batch up the actual queries to the database to prevent issues when asking for such things as all Customers including their Orders.  Doing something like this normally results in an N+1 issue whereby Orders are retrieved by issuing a separate query for each customer, resulting in degraded performance.  GraphQL loaders work by enabling the rewriting of the underlying SQL that is generated for retrieving the data so that all Orders are retrieved for all of the required Customers in a single SQL statement.  i.e. Instead of sending queries like the following to the database:

SELECT CustomerID, CustomerName FROM Customer
SELECT OrderID, OrderNumber, OrderDate FROM Order WHERE CustomerID = 1
SELECT OrderID, OrderNumber, OrderDate FROM Order WHERE CustomerID = 2
SELECT OrderID, OrderNumber, OrderDate FROM Order WHERE CustomerID = 3

We will instead send queries such as:

SELECT CustomerID, CustomerName FROM Customer
SELECT OrderID, OrderNumber, OrderDate FROM Order WHERE CustomerID IN (1,2,3)

Sandeep then looks at some of the potential downsides of using GraphQL.  Caching is somewhat problematic as you can no longer perform any caching at the network layer since each query is now completely dynamic.  Despite the benefits, there are also performance considerations if you intend to use GraphQL, as your underlying data needs to be correctly structured in order to work with GraphQL in the most efficient manner.  GraphQL loaders should also be used to ensure N+1 problems don’t become an issue.  There are security considerations too.  You shouldn’t expose anything that you don’t want to be public since everything available through the API is available to GraphQL, and you’ll need to be aware of potentially malicious queries that attempt to retrieve too much data.  One simple solution to such queries is to use a timeout.  If a given query takes longer than some arbitrarily defined timeout value, you simply kill the query, however this may not be the best approach.  Another approach taken by big websites currently using GraphQL is to whitelist all the acceptable queries.  If a query is received that isn’t in the whitelist, it doesn’t get run.  You also can’t use HTTP codes to indicate status or contextual information to the client.  Errors that occur when processing the GraphQL query are contained within the GraphQL response text itself, which is returned to the client with a 200 HTTP success code.  You’ll need to have your own strategy for exposing such data to the user in a friendly way.   Finally, Sandeep explains that, unlike other querying technologies such as OData, GraphQL has no intrinsic ability to paginate data.  All data pagination must be built into your underlying API and business layer; GraphQL will merely pass the paging data – such as page number and size – onto the API and expect the API to correctly deal with limiting the data returned.

After Sandeep’s session, it’s time for another coffee break.  I quickly grabbed another coffee from the main lounge area of the conference, this time accompanied by some rather delicious biscuits, before consulting my Agenda sheet to see which room I needed to be in for my next session.  Turned out that the next room I needed to be in was the same one I’d just left!  After finishing my coffee and biscuits I headed back to Chicago 1 for the next session and the last one before the lunch break.  This one was Dave Mateer’s Fun With Twitter Stream API, ELK Stack, RabbitMQ, Redis and High Performance SQL.

Dave’s talk is really all about performance and how to process large sets of data in the most efficient and performant manner.  Dave’s talk is going to be very demo heavy and so, to give us a large data set to work with, Dave starts by looking at Twitter, and specifically, its Stream API.  Dave explains that the full Twitter firehose, which is reserved only for Twitter’s own use, currently has 1.3 million tweets per second flowing through it.  As a consumer, you can get access to a deca-firehose which contains 1/10th of the full firehose (i.e. 130000 tweets per second) but this costs money, however, Twitter does expose a freely available Stream API although it’s limited to 50 tweets per second.  This is still quite a sizable amount of data to process.

Dave starts by showing us a demo of a simple C# console application that uses the Twitter Stream API to gather real-time tweets for the hashtag of #scotland, which are then echoed out to the console.   Our goal is to get the tweet data into a SQL Server database as quickly and as efficiently as possible.  Dave now says that to simulate a larger quantity of data, he’s going to read in a number of pre-saved text files containing tweet data that he’s previously collected.  These files represent around 6GB of raw tweet data, containing approximately 1.9 million tweets.  He then adds to the demo to start saving the tweets into SQL Server.  Dave mentions that he's using Dapper to access SQL Server and that he previously tried such things using Entity Framework, which was admittedly some time in the past, but that it was a painful experience and not very performant.   Dave likes Dapper as it's a much simpler abstraction over the database, and therefore much more performant.  It's also a lot easier to optimize your queries when using Dapper as the abstraction is much thinner and you're not hiding the implementation details too much.
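
As a flavour of what a Dapper-based insert can look like (my own minimal sketch rather than Dave's actual demo code; the Tweet class and table/column names are assumed purely for illustration):

using System;
using System.Data.SqlClient;
using Dapper;

public class Tweet
{
     public long TweetId { get; set; }
     public long UserId { get; set; }
     public DateTime CreatedAt { get; set; }
     public string Text { get; set; }
}

public class TweetWriter
{
     private readonly string _connectionString;

     public TweetWriter(string connectionString)
     {
          _connectionString = connectionString;
     }

     public void Save(Tweet tweet)
     {
          using (var connection = new SqlConnection(_connectionString))
          {
               // Dapper maps the anonymous object's properties onto the named parameters.
               connection.Execute(
                    "INSERT INTO Tweet (TweetId, UserId, CreatedAt, Text) VALUES (@TweetId, @UserId, @CreatedAt, @Text)",
                    new { tweet.TweetId, tweet.UserId, tweet.CreatedAt, tweet.Text });
          }
     }
}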

Next, Dave shows us a Kibana interface.  As well as writing to SQL Server, he's also saving the tweets to a log file using Serilog and then using Logstash to send those logs to Elasticsearch, allowing the raw log data to be viewed with Kibana (together known as the ELK stack).  Dave then shows us how easy it is to really leverage the power of tools like Kibana by creating a dashboard for all of the data.

From here, Dave begins to talk about performance and just how fast we can process the large quantity of tweet data.  The initial run of the application, which was simply reading in each tweet from the file and performing an INSERT via Dapper to insert the data to the SQL Server database, was able to process approximately 420 tweets per second.  This isn’t bad, but it’s not a great level of performance.  Dave digs out SQL Server Profiler to see where the bottlenecks are within the data access code, and this shows that there are some expensive reads – the data is normalized so that the user that a tweet belongs to is stored in a separate table and looked up when needed.  It’s decided that adding indexes on the relevant columns used for the lookup data might speed up the import process.  Sure enough, after adding the indexes and re-running the tweet import, we improve from 420 to 1600 tweets per second.  A good improvement, but not an order of magnitude improvement.  Dave wants to know if we can change the architecture of our application and go even faster.  What if we want to try to achieve a 10x increase in performance?

Dave states that, since his laptop has multiple cores, we should start by changing the application architecture to make better use of parallel processing across all of the available cores in the machine.  We set up an instance of the RabbitMQ message queue, allowing us to read the tweet data in from the files and send it to a RabbitMQ queue.  The queue is explicitly set to durable and each message is set to persistent in order to ensure we have the ability to continue where we left off in the event of a crash or server failure.  From here, we can have multiple instances of another application that pull messages off the queue, leveraging RabbitMQ’s ability to effectively isolate each of the client consumers, ensuring that the same message is not sent to more than one client.  Dave then sets up Redis.  This will be used for the "lookups" that are required when adding tweet data.  So users (and other data) are added to the DB first, then all of that data is cached in Redis, which is an in-memory key/value store and is often used for caching scenarios.  As tweets are added to the RabbitMQ message queue, the required ID/key lookups for users and other data are done using the Redis cache rather than performing a SQL Server query for the data, thus improving the performance.  Once processed and the relevant values looked up from Redis, Dave uses SQL Server Bulk Copy to get the data into SQL Server itself.  SQL Server Bulk Copy provides a significant performance benefit over using standard INSERT T-SQL statements.   For Dave’s purposes, he bulk copies the data into temporary tables within SQL Server and then, at the end of the import, runs a single T-SQL statement to copy the data from the temporary tables to the real tables.
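
To give a rough idea of the bulk copy step (again a sketch under my own assumptions about table and column names, not Dave's actual code), SqlBulkCopy can push a pre-populated DataTable into a staging table like so:

using System.Data;
using System.Data.SqlClient;

public static class TweetBulkLoader
{
     public static void BulkInsert(string connectionString, DataTable tweets)
     {
          using (var connection = new SqlConnection(connectionString))
          {
               connection.Open();
               using (var bulkCopy = new SqlBulkCopy(connection))
               {
                    // Everything is written into a staging table; a single
                    // INSERT...SELECT then moves it into the real tables at the end of the import.
                    bulkCopy.DestinationTableName = "TweetStaging";
                    bulkCopy.BatchSize = 5000;
                    bulkCopy.WriteToServer(tweets);
               }
          }
     }
}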

Having re-architected the solution in this manner, Dave then runs his import of 6GB of tweet data again.  As he’s now able to take advantage of the multiple CPU cores available, he runs 3 console applications in parallel to process all of the data.  Each console application completes its job within around 30 seconds, and whilst they’re running, Dave shows us the Redis dashboard which indicates that Redis is receiving around 800000 hits to the cache per second!  The result, ultimately, is that the application’s performance has increased from processing around 1600 tweets per second to around 20000!  An impressive improvement indeed!

Dave then looks at some potential downsides of such re-architecture and performance gains.  He shows how he’s actually getting duplicate data imported into his SQL Server and this is likely due to race conditions and concurrency issues at one or more points within the processing pipeline.  Dave then quickly shows us how he’s got around this problem with some rather ugly-looking C# code within the application (using temporary generic List and Dictionary structures to pre-process the data).  Due to the added complexity that this performance improvement brings, Dave argues that sometimes slower is actually faster: if you don't absolutely need so much raw speed, you can remove things like Redis from the architecture, slowing down the processing (albeit to a still acceptable level) but allowing a lot of simplification of the code.
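
Dave's exact clean-up code wasn't shown in detail, but the general idea of de-duplicating by key before the data hits SQL Server can be sketched roughly as follows (an illustrative version of my own, reusing the hypothetical Tweet class from the earlier sketch):

using System.Collections.Generic;

public static class TweetDeduplicator
{
     // Keeps only the first occurrence of each tweet id, discarding duplicates
     // that can arrive when multiple consumers race over the same data.
     public static List<Tweet> Deduplicate(IEnumerable<Tweet> tweets)
     {
          var seen = new HashSet<long>();
          var unique = new List<Tweet>();
          foreach (var tweet in tweets)
          {
               if (seen.Add(tweet.TweetId))
               {
                    unique.Add(tweet);
               }
          }
          return unique;
     }
}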

After Dave’s talk was over, it was time for lunch.  The attendees made their way to the main conference lounge area where we could choose between lunch bags of sandwiches (meat, fish and vegetarian options available) or a salad option (again, meat and vegetarian options).

Being a recently converted vegetarian, I opted for a lovely Greek Salad and then made my way to the outside area along with, it would seem, most of the other conference attendees.

It had turned into a glorious summer’s day in Reading by now and the attendees, along with some speakers and other conference staff, were enjoying the lovely sunshine outdoors while we ate our lunch.  We didn’t have too long to eat, though, as there were a number of Grok talks that would be taking place inside one of the major conference rooms over lunchtime.  After finishing my lunch (food and drink was not allowed in the session rooms themselves) I headed back towards Chicago 1 where the Grok Talks were to be held.

I’d managed to get there early enough in order to get a seat (these talks are usually standing room only) and after settling myself in, we waited for the first speaker.  This was to be Gary Short with a talk about Markov, Trump and Countering Radicalisation.

Gary starts by asking, “What is radicalisation?”.  It’s really just persuasion – being able to persuade people to hold an extreme view of some information.  Counter-radicalisation is being able to persuade people to hold a much less extreme belief.  This is hard to achieve, and Gary says that it’s due to such things as Cognitive Dissonance and The “Backfire” Effect (closely related to Confirmation Bias) – two subjects that he doesn’t have time to go into in a 10 minute grok talk, but he suggests we Google them later!  So, how do we process text to look for signs of radicalisation?  Well, we start with focus groups and corpora of known good text as well as data from previously de-radicalised people (answers to questions like “What helped you change?” etc.)  Gary says that Markov chains are used to process the text.  They’re a way of “flowing” through a group of words in such a way that the next word is determined based upon statistical data known about the current word and what’s likely to follow it.  Finally, Gary shows us a demo of some Markov chains in action with a C# console application that generates random tweet-length sentences based upon analysis of a corpus of text from Donald Trump.  His application is called TweetLikeTrump and is available online.
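
To make the Markov chain idea a little more concrete, here's a very small sketch of the general technique (this is not Gary's TweetLikeTrump code, just an illustration): build a table of which words follow which in the corpus, then walk that table randomly to generate new text:

using System;
using System.Collections.Generic;

public static class MarkovDemo
{
     public static string Generate(string corpus, int wordCount)
     {
          var words = corpus.Split(' ');
          var chain = new Dictionary<string, List<string>>();

          // Record, for every word in the corpus, the words that have followed it.
          for (int i = 0; i < words.Length - 1; i++)
          {
               if (!chain.TryGetValue(words[i], out var followers))
               {
                    chain[words[i]] = followers = new List<string>();
               }
               followers.Add(words[i + 1]);
          }

          // Walk the chain, picking a random follower of the current word each time.
          var random = new Random();
          var current = words[random.Next(words.Length)];
          var output = new List<string> { current };
          for (int i = 0; i < wordCount - 1; i++)
          {
               if (!chain.TryGetValue(current, out var followers))
               {
                    break;
               }
               current = followers[random.Next(followers.Count)];
               output.Add(current);
          }
          return string.Join(" ", output);
     }
}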

The next Grok talk is Ian Johnson’s Sketch Notes.  Ian starts by defining Sketch Notes.  They’re visual notes that you can create for capturing the information from things like conferences, events etc.   He states that he’s not an artist, so don’t think that you can’t get started doing Sketch Noting yourself.  Ian talks about how to get started with Sketch Notes.  He says it’s best to start by simply practising your handwriting!   Find a clear way of quickly and clearly writing down text using pen & paper and then practice this over and over.  Ian shares his own structure for a sketch note: he places the talk title and the date in the top left and right corners and the event title and the speaker name / twitter handle in the bottom corners.  He goes on to talk more about the component parts of a sketch note.   Use arrows to create a flow between ideas and text that capture the individual concepts from the talk.  Use colour and decoration to underline and underscore specific and important points – but try to do this when there’s a “lull” in the talk and you don’t have to be concentrating on the talk 100%, as it’s important to remember that the decoration is not as important as the content itself.  Ian shares a specific example.  If the speaker mentions a book, Ian would draw a little book icon and simply write the title of the book next to the icon.  He says drawing people is easy and can be done with a simple circle for the head and no more than 4 or 5 lines for the rest of the body.  Placing the person’s arms in different positions helps to indicate expression.  Finally, Ian says that if you make a mistake in the drawing (and you almost certainly will do, at least when starting out) make a feature of it!  Draw over the mistake to create some icon that might be meaningful in the entire context of the talk or create a fancy stylised bullet point and repeat that “mistake” to make it look intentional!  Ian has blogged about his talk here.

Next up is Christos Matskas’s grok talk on Becoming an awesome OSS contributor.   Christos starts by asking the audience who is using open source software?  After a little thought, virtually everyone in the room raises their hand as we’re all using open source software in one way or another.  This is usually because we'll be working on projects that are using at least one NuGet package, and NuGet packages are mostly open source.  Christos shares the start of his own journey into becoming an OSS contributor, which started with him messing up a NuGet package restore on his first day at a new contract.  This led to him documenting the fix he applied, which was eventually seen by the NuGet maintainers, and he was invited to write some documentation for them.  Christos talks about major organisations using open source software including Apple, Google, Microsoft as well as the US Department of Defence and the City of Munich to name but a few.  Getting involved in open source software yourself is a great way to give back to the community that we’re all invariably benefiting from.  It’s a great way to improve your own code and to improve your network of peers and colleagues.  It’s also great for your CV/Resume.  In the US, it’s almost mandatory that you have code on a GitHub profile.  Even in the UK and Europe, having such a profile is not mandatory, but is a very big plus to your application when you’re looking for a new job.  You can also get free software tools to help you with your coding.  Companies such as JetBrains, Redgate, and many, many others will frequently offer many of their software products for free or at a heavily discounted price for open source projects.   Finally, Christos shares a few websites that you can use to get started in contributing to open source, such as up-for-grabs.net, www.firsttimersonly.com and the twitter account @yourfirstPR.

The final grok talk is Rik Hepworth’s Lability.  Lability is a PowerShell module, available via GitHub, that uses PowerShell’s DSC (Desired State Configuration) feature to facilitate the automated provisioning of complete development and testing environments using Windows Hyper-V.  The tool extends the PowerShell DSC commands to add metadata that can be understood by the Lability tool to configure not only the virtual machines themselves (i.e. the host machine, networking etc.) but also the software that is deployed and configured on the virtual machines.  Lability can be used to automate such provisioning not only on Azure itself, but also on a local development machine.

After the grok talks were over, I had a little more time available in the lunch break to grab another quick drink before once again examining the agenda to see where to go for the first session of the afternoon, and penultimate session of the day.  This one was Ian Russell’s Strategic Domain-Driven Design.

Ian starts his talk by mentioning the “bible” for Domain Driven Design and that’s the book by Eric Evans of the same name.  Ian asks the audience who has read the book to which quite a few people raise their hands.  He then asks who has read past Chapter 14, to which a lot of people put their hands down.  Ian states that, if you’ve never read the book, the best way to get the biggest value from the book is to read the last 1/3rd of the book first and only then return to read the first 2/3rds!

So what is Domain-Driven Design?  It’s an abstraction of reality, attempting to model the real world we see and experience before us in a given business domain.  Domain-Driven Design (DDD) tries to break down the domain into what is the “core” business domain – these are the business functions that are the very reason for a business’s being – and other domains that exist to support the “core” domain.  One example could be a business that sells fidget spinners.  The actual domain logic involved with selling the spinners to customers would be the core domain, however, the same company may need to provide a dispute resolution service for unhappy customers.  The dispute resolution service, whilst required by the overall business, would not be the core domain but would be a supporting or generic domain.

DDD has a ubiquitous language.  This is the language of the business domain and not the technical domain.  Great care should be taken to not use technical terms when discussing domains with business stakeholders, and the terms within the language should be ubiquitous across all domains and across the entire business.  Having this ubiquitous language reduces the chance of ambiguity and ensures that everyone can relate to specific component parts of the domain using the same terminology.  DDD has sub-domains, too.  These are the core domain – the main business functionality, the supporting domains – which exist solely to support the core, and generic domains - such as the dispute resolution domain, which supports the core domain but is also generic enough to apply to other businesses too.  DDD has bounded contexts.  These are like sub-domains but don’t necessarily map directly to sub-domains.  They are explicit boundaries separating one area of the business from the others.  Primarily, bounded contexts can be developed in software independently from each other.  They could be written by different development teams and could even use entirely different architectures and technologies in their construction.

Ian talks about driving out the core concepts by creating a “shared kernel”.  These are the concepts that exist in the overlaps between bounded contexts.  These concepts don’t have to be shared between the bounded contexts and they may differ – the concept of an “account” might mean something different within the finance bounded context than within the shipping bounded context, for example.  Ian talks about the concept of an “anti-corruption layer” as part of each bounded context.  This allows bounded contexts to communicate with items from the shared kernel but, where those concepts do differ between contexts, the anti-corruption layer will prevent corruption from incorrect implementations of concepts being passed to it.  Ian mentions domain events next.  He says that these are something that is not within Eric Evans’ book but is often documented in other DDD literature.  Domain events are essentially just ”things that occur” within the domain.  For example, a new customer registering on the company’s website is a domain event.  Events can be created by users, by the passage of time, by external systems, or even by other domain events.
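
In code, a domain event usually ends up being nothing more than a small, immutable class named after the thing that happened; something along these lines (purely illustrative):

using System;

public class NewCustomerRegistered
{
     public NewCustomerRegistered(Guid customerId, string emailAddress, DateTime occurredAtUtc)
     {
          CustomerId = customerId;
          EmailAddress = emailAddress;
          OccurredAtUtc = occurredAtUtc;
     }

     public Guid CustomerId { get; }
     public string EmailAddress { get; }
     public DateTime OccurredAtUtc { get; }
}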

All of this is Strategic domain-driven design.  It’s the modelling and understanding of the business domain without ever letting any technological considerations interfere with the modelling and understanding.  It’s simply good architectural practice and the understanding of how different parts of the business domain interact with, and communicate with other parts of the business domain.

Ian suggests that there are three main ways in which to achieve strategic domain-driven design.  These are Behavioural Driven Design (BDD), Prototyping and Event Storming.  BDD involves writing acceptance tests for our software application using a domain-specific language within our application code that allows the tests to use the ubiquitous language of the domain, rather than the explicitly technical language of the software solution.  BDD facilitates engagement with business stakeholders in the development of the acceptance tests that form part of the software solution.  One common way to do this is to run three amigos sessions which allow developers, QA/testers and domain experts to write the BDD tests together, usually in the standard Given-When-Then style (there’s a small sketch of this style after this paragraph).   Prototyping consists of producing images and “wireframes” that give an impression of how a completed software application could look and feel.  Prototypes can be low-fidelity, just simple static images and mock-ups, but it’s better if you can produce high-fidelity prototypes, which allow varying levels of interaction with the prototype.  Tools such as Balsamiq and InVision amongst others can help with the production of high-fidelity prototypes.  Event storming is a particular format of meeting or workshop that has developers and domain experts collaborating on the production of a large paper artefact that contains numerous sticky notes of varying colours.  The sticky notes’ colours represent various artefacts within the domain such as events, commands, users, external systems and others.  Sticky notes are added, amended, removed and moved around the paper on the wall by all meeting attendees.  The resulting sticky notes tend to naturally cluster into the various bounded contexts of the domain, allowing the overall domain design to emerge.  If you run your own Event Storming session, Ian suggests starting by trying to drive out the domain events first and, for each event, attempting to first work backwards to find the cause or causes of that event, then work forwards, investigating what the event causes - perhaps further events or the requirement for user intervention etc.
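
Returning to the BDD point above, a Given-When-Then scenario written in the ubiquitous language, along with its step bindings, might look something like the following hypothetical sketch (here using the SpecFlow library, purely as an example):

using TechTalk.SpecFlow;

// The scenario itself lives in a Gherkin feature file, written with the domain experts:
//   Given a registered customer with an unresolved dispute
//   When the dispute is resolved in the customer's favour
//   Then the customer is notified of the outcome

[Binding]
public class DisputeResolutionSteps
{
     [Given("a registered customer with an unresolved dispute")]
     public void GivenARegisteredCustomerWithAnUnresolvedDispute()
     {
          // Arrange the domain state, expressed in domain language rather than technical language.
     }

     [When("the dispute is resolved in the customer's favour")]
     public void WhenTheDisputeIsResolvedInTheCustomersFavour()
     {
          // Invoke the behaviour under test.
     }

     [Then("the customer is notified of the outcome")]
     public void ThenTheCustomerIsNotifiedOfTheOutcome()
     {
          // Assert the expected outcome.
     }
}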

Ian rounds off his talk by sharing some of his own DDD best practices.  We should strive for creative collaboration between developers and domain experts at all stages of the project, and the fostering of an environment that allows exploration and experimentation in order to find the best model for the domain, which may not be the first model that is found.  Ian states that determining the explicit context boundaries is far more important than finding a perfect model, and that the focus should always be primarily on the core domain, as this is the area that will bring the greatest value to the business.

After Ian’s talk was over, it was time for another coffee break, the last of the day.  I grabbed a coffee and once again checked my agenda sheet to see where to head for the final session of the day.   This last session was to be Stuart Lang’s Async in C#, the Good, the Bad and the Ugly.

Stuart starts his session by asking the audience to wonder why he’s doing a session on Async in C# today.  It’s 2017 and Async/Await has been around for over 6 years!  Simply put, Stuart says that, whilst Async/Await code is fairly ubiquitous these days, there are still many mistakes made with some implementations and the finer points of asynchronous code are not as well understood.  We learn how Async is really an abstraction, much like an ORM tool.  If you use it in a naïve way, it’s easy to get things wrong.

Stuart mentions Async’s good parts.  We can essentially perform non-blocking waiting for background I/O operations, thus allowing our user interfaces to remain responsive.  But then come the bad parts.  It’s not always entirely clear what is asynchronous code.  If we call an Async method in someone else’s library, there’s no guarantee that what we’re really executing is asynchronous code, even if the method returns a Task<T>.  Async can sometimes lead to the need to duplicate code, for example, when your code has to provide both asynchronous and synchronous versions of each method.  Also, Async can’t be used everywhere.  It can’t be used in a class constructor or inside a lock statement.

One of the worst side-effects of poor Async implementation is deadlocks.  Stuart states that, when it comes to Async, doing it wrong is worse than not doing it at all!  Stuart shows us a simple Async method that uses some Task.Delay methods to simulate background work.  We then look at what the Roslyn compiler actually translates the code to, which is a large amount of code that effectively implements a complete state machine around the actual method’s code.

Stuart then talks about Synchronization Context.  This is an often misunderstood part of Async that allows awaited code, which can be resumed on a different thread, to synchronize with other threads.  For example, if awaited code needs to update some element on the user interface in a WinForms or WPF application, it will need to synchronize that change to the UI thread; it can’t be performed by the thread-pool thread that the awaited code would be running on.  Stuart talks about blocking on Async code, for example:

var result = MyAsyncMethod().Result;

We should try to never do this!   Doing so can cause the code to deadlock: awaited code within the Async method may be trying to resume on the same synchronization context that is already blocked by the "outer" code that is performing the .Result call on the main Async method.
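
The straightforward fix is to stay asynchronous all the way up the call chain and await the result rather than blocking on it; in library code, using ConfigureAwait(false) also reduces the risk by not capturing the synchronization context for continuations.  A general sketch (not Stuart's exact example):

using System.Threading.Tasks;

public class AsyncExamples
{
     // Preferred: stay async all the way up and await the call.
     public async Task<string> GetDataAsync()
     {
          var result = await MyAsyncMethod();
          return result;
     }

     // In library code, avoid capturing the synchronization context so that a
     // caller who does block is far less likely to deadlock.
     public async Task<string> MyAsyncMethod()
     {
          await Task.Delay(100).ConfigureAwait(false);
          return "done";
     }
}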

Stuart then shows us some sample code that runs as an ASP.NET page with the Async code being called from within the MVC controller.  He outputs the threads used by the Async code to the browser, and we then examine variations of the code to see how each awaited piece of code uses either the same or different threads.  One way of overcoming the blocking issue when using .Result at the end of an Async method call is to write code similar to:

var result = Task.Run(() => MyAsyncMethod().Result).Result;

It's messy code, but it works.  However, code like this should be heavily commented because if someone removes that outer Task.Run(), the code will start blocking again and will fail miserably.

Stuart then talks about the vs-threading library which provides a JoinableTaskFactory.   Using code such as:

jtf.Run(() => MyAsyncMethod())

ensures that awaited code resumes on the same thread that's blocked, so Stuart shows the output from his same ASP.NET demo when using the JoinableTaskFactory and all of the various awaited blocks of code can be seen to always run and resume on the same thread.

Finally, Stuart shares some of the best practices around deadlock prevention.  He asserts that the ultimate goal for prevention has to be an application that can provide Async code from the very “bottom” level of the code (i.e. that which is closest to I/O) all the way up to the “top” level, where it’s exposed to other client code or applications.

After Stuart’s talk is over, it’s time for the traditional prize/swag giving ceremony.  All of the attendees start to gather in the main conference lounge area and await the attendees from the other sessions which are finishing shortly afterwards.  Once all sessions have finished and the full set of attendees are gathered, the main organisers of the conference take time to thank the event sponsors.  There are quite a few of them and, without them, there simply wouldn’t be a DDD conference.  For this, the sponsors get a rapturous round of applause from the attendees.

After the thanks, there’s a small prize giving ceremony with many of the prizes being given away by the sponsors themselves.  Like most times, I don’t win a prize – given that there were only around half a dozen prizes, I’m not alone!

It only remained for the organisers to announce the next conference in the DDD calendar which, although it doesn’t have a specific date at the moment, will take place in October 2017.  This one is DDD North.  There’s also a surprise announcement of a DDD in Dublin to be held in November 2017, which will be a “double capacity” event, catering for around 600 – 800 conference delegates.  Now there’s something to look forward to!

So, another wonderful DDD conference was wrapped up, and there’s not long to wait until it’s time to do it all over again!

UPDATE (20th June 2017):

As with last year, Kevin O'Shaughnessy was also in attendance at DDD12 and he has blogged his own review and write-up of the various sessions that he attended.  Many of the sessions attended by Kevin were different from the sessions that I attended, so check out his blog for a fuller picture of the day's many great sessions.

 

DDD East Anglia 2017 In Review


This past Saturday, 16th September 2017, the fourth DDD East Anglia event took place in Cambridge.  DDD East Anglia is a relatively new addition to the DDD event line-up, but its fourth event sees it going from strength to strength.

I’d made the long journey to Cambridge on the Friday evening and stayed in a local B&B to be able to get to the Hills Road College where DDD East Anglia was being held on the Saturday.  I arrived bright and early, registered at the main check-in desk and then proceeded to the college’s recital room just around the corner from the main building for breakfast refreshments.

After some coffee, it was soon time to head back to the main college building and up the stairs to the room where the first session of the day would commence.  My first session was to be Joseph Woodward’s Building A Better Web API With CQRS.

Joseph starts his session by defining CQRS.  It’s an acronym standing for Command Query Responsibility Segregation.  Fundamentally, it’s a pattern for splitting the “read” models from your “write” models within a piece of software.  Joseph points out that we should beware when googling for CQRS as Google seems to think it’s a term relating to cars!

CQRS was first coined by Greg Young and it’s very closely related to a prior pattern called CQS (Command Query Separation), originally coined by Bertrand Meyer, which states that every method should either be a command which performs an action, or a query which returns data to the caller, but never both.  CQS primarily deals with such separations at a very micro level, whilst CQRS primarily deals with the separations at a broader level, usually along the seams of bounded contexts.  Commands will mutate state and will often be of a “fire and forget” nature.  They will usually return void from the method call.  Queries will return state and, since they don’t mutate any state, are idempotent and safe.  We learn that CQRS is not an architectural pattern, but is more of a programming style that simply adheres to the separation of the commands and queries.

Joseph continues by asking what’s the problem with some of our existing code that CQRS attempts to address.   We look at a typical IXService (where X is some domain entity in a typical business application):

public interface ICustomerService
{
     void MakeCustomerPreferred(int customerId);
     Customer GetCustomer(int customerId);
     CustomerSet GetCustomersWithName(string name);
     CustomerSet GetPreferredCustomers();
     void ChangeCustomerLocale(int customerId, Locale newLocale);
     void CreateCustomer(Customer customer);
     void EditCustomerDetails(CustomerDetails customerDetails);
}

The problem here is that the interface ends up growing and growing and our service methods are simply an arbitrary collection of queries, commands, actions and other functions that happen to relate to a Customer object.  At this point, Joseph shares a rather insightful quote from a developer called Rob Pike who stated:

“The bigger the interface, the weaker the abstraction”

And so with this in mind, it makes sense to split our interface into something a bit smaller.  Using CQRS, we can split out and group all of our "read" methods, which are our CQRS queries, and split out and group our "write" methods (i.e. Create/Update etc.) which are our CQRS commands.  This will simply become two interfaces in the place of one, an ICustomerReadService and an ICustomerWriteService.
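
Taking the ICustomerService example from above, the split might end up looking something like this (an illustrative sketch):

public interface ICustomerReadService
{
     Customer GetCustomer(int customerId);
     CustomerSet GetCustomersWithName(string name);
     CustomerSet GetPreferredCustomers();
}

public interface ICustomerWriteService
{
     void MakeCustomerPreferred(int customerId);
     void ChangeCustomerLocale(int customerId, Locale newLocale);
     void CreateCustomer(Customer customer);
     void EditCustomerDetails(CustomerDetails customerDetails);
}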

There are good reasons for separating our concerns along the lines of reads vs writes, too.  Quite often, since reads are idempotent, we'll utilise heavy caching to prevent us from making excessive calls to the database and ensure our application can return data in as efficient a manner as possible, whilst our write methods will always hit the database directly.  This leads on to the ability to have entirely different back-end architectures between our reads and our writes throughout the entire application.  For example, we can scale multiple read replica databases independently of the database that is the target for writes.  They could even be entirely different database platforms.

From the perspective of Web API, Joseph tells us how HTTP verbs and CQRS play very nicely together.  The HTTP verb GET is simply one of our read methods, whilst the verbs PUT, POST, DELETE etc. are all of our write concerns.  Further to this, Joseph looks at how we can often end up with MVC or WebAPI controllers that require services to be injected into them and often our controller methods end up becoming bloated from having additional concerns embedded within them, such as validation.  We then look at the command dispatcher pattern as a way of supporting our separation of reads and writes and also as a way of keeping our controller action methods lightweight.

There are two popular frameworks that implement the command dispatcher pattern in the .NET world: MediatR and Brighter.  Both frameworks allow us to define our commands using a plain old C# object (that implements specific interfaces provided by the framework) and also to define a "handler" to which the commands are dispatched for processing.  For example:

public class CreateUserCommand : IRequest
{
     public string EmailAddress { get; set; }
     // Other properties...
}

public class CreateUserCommandHandler : IAsyncRequestHandler<CreateUserCommand>
{
     private readonly IUserRepository _userRepository;
     private readonly IMapper _mapper;

     public CreateUserCommandHandler(IUserRepository userRepository, IMapper mapper)
     {
          _userRepository = userRepository;
          _mapper = mapper;
     }

     public async Task Handle(CreateUserCommand command)
     {
          // Map the incoming command onto a domain entity and persist it.
          var user = _mapper.Map<CreateUserCommand, UserEntity>(command);
          await _userRepository.CreateUser(user);
     }
}

Using the above style of defining commands and handlers along with some rudimentary configuration of the framework to allow specific commands and handlers to be connected, we can move almost all of the required logic for reading and writing out of our controllers and into independent, self-contained classes that perform a single specific action.  This enables further decoupling of the domain and business logic from the controller methods, ensuring the controller action methods remain incredibly lightweight:

public class UserController : Controller
{
     private readonly IMediator _mediator;

     public UserController(IMediator mediator)
     {
          _mediator = mediator;
     }

     [HttpPost]
     public async Task Create(CreateUserCommand user)
     {
          await _mediator.Send(user);
     }
}

Above, we can see that the Create action method has been reduced down to a single line.  All of the logic of creating the entity is contained inside the handler class and all of the required input for creating the entity is contained inside the command class.

Both the MediatR and Brighter libraries allow for request pre-processors and post-processors.  This allows defining another class, again deriving from specific interfaces/base classes within the framework, which will be invoked before the actual handler class or immediately afterwards.  Such pre-processing is often a perfect place to put cross-cutting concerns such as validation:

public class CreateUserCommandValidation : AbstractValidator<CreateUserCommand>
{
     public CreateUserCommandValidation()
     {
          RuleFor(x => x.EmailAddress).NotEmpty().WithMessage("Please specify an email address");
     }
}

The above code shows some very simple example validation, using the FluentValidation library, that can be hooked into the command dispatcher framework's request pre-processing in order to validate the command object prior to invoking the handler, and thus before the entity is saved to the database.

Again, we've got a very nice and clean separation of concerns with this approach, with each specific part of the process being encapsulated within its own class: the input parameters, the validation and the actual creation logic.

Both MediatR and Brighter have a pipeline behaviour interface (spelled IPipelineBehavior in MediatR), which allows us to write code that hooks into arbitrary places along the processing pipeline.  This lets us implement other cross-cutting concerns, such as logging, that are often required at multiple stages of the entire processing pipeline.
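
As an example of the sort of thing this enables, a logging behaviour in MediatR can be written roughly as follows (a sketch only; the exact interface signature has changed between MediatR versions):

using System;
using System.Threading;
using System.Threading.Tasks;
using MediatR;

public class LoggingBehaviour<TRequest, TResponse> : IPipelineBehavior<TRequest, TResponse>
{
     public async Task<TResponse> Handle(TRequest request, CancellationToken cancellationToken, RequestHandlerDelegate<TResponse> next)
     {
          // Runs before the handler (and any behaviours further down the pipeline)...
          Console.WriteLine($"Handling {typeof(TRequest).Name}");

          var response = await next();

          // ...and again once the handler has completed.
          Console.WriteLine($"Handled {typeof(TRequest).Name}");
          return response;
     }
}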

At this point, Joseph shares another quote with us.  This one's from Uncle Bob:

"If your architecture is based on frameworks then it cannot be based on your use cases"

From here, Joseph turns his talk to discussing how we might structure our codebases in terms of files and folders such that the separation of concerns within the business domain that the software is addressing is more clearly realised.  He talks about a relatively new style of laying out our projects called Feature Folders (aka Feature Slices).

This involves laying out our solutions so that, instead of having a single top-level "Controllers" folder as is common in almost all ASP.NET MVC web applications, we have multiple folders named such that they represent features or specific areas of functionality within our software.  We then have the requisite Controllers, Views and other folders underneath those.   This allows different areas of the software to be conceptually decoupled and kept separate from the other areas.  Whilst this is possible in ASP.NET MVC today, it's even easier with the newer ASP.NET Core framework, and a NuGet package called AddFeatureFolders already exists that enables this exact setup within ASP.NET Core.
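
Wiring the AddFeatureFolders package into an ASP.NET Core application is roughly a one-liner in Startup; the sketch below is based on the package's documented usage, so treat the exact namespace and method name as assumptions to verify against the package itself:

using Microsoft.Extensions.DependencyInjection;
using OdeToCode.AddFeatureFolders; // namespace provided by the AddFeatureFolders NuGet package (assumed)

public class Startup
{
     public void ConfigureServices(IServiceCollection services)
     {
          // Controllers and their Views live together under /Features/<FeatureName>/
          // rather than in the conventional top-level /Controllers and /Views folders.
          services.AddMvc()
                  .AddFeatureFolders();
     }
}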

Joseph wraps up his talk by suggesting that we take a look at some of his own code on GitHub for the DDD South West website (Joseph is one of the organisers for the DDD South West events) as this has been written using the CQRS pattern along with feature folders for layout.

After Joseph's talk it's time for a quick coffee break, so we head back to the recital room around the corner from the main building for some liquid refreshment.  This time also accompanied by some very welcome biscuits!

After our coffee break, it's time to head back to the main building for the next session.  This one was to be Bart Read's Client-Side Performance For Back-End Developers.

Bart's session is all about how to maximise the performance of client-side script using tools and techniques that we might employ when attempting to troubleshoot and improve the performance of our back-end, server-side code.  Bart starts by stating that he's not a client-side developer, but is more of a full stack developer.  That said, as a full stack developer, one is expected to perform at least some client-side development work from time to time.  Bart continues that in other talks concerning client-side performance, the speakers tend to focus on the page load times and page lifecycle, which, whilst interesting and important, is a more technology-centric way of looking at the problem.  Instead, Bart says that he wants to focus on RAIL, which was originally coined by Google.  This is an acronym for Response, Animation, Idle and Load and is a far more user-centric way of looking at the performance (or perhaps even just perceived performance) issue.  In order to explore this topic, Bart states that he learnt JavaScript and built his own arcade game site, Arcade.ly, which uses extensive JavaScript and other resources as part of the site.

We first look at Response.  For this we need to build a very snappy User Interface so that the user feels that the application is responding to them immediately.  Modern web applications are far more like desktop applications written using either WinForms or WPF than ever and users are very used to these desktop applications being incredibly responsive, even if lots of processing is happening in the background.  One way to get around this is to use "fake" pages.  These are pages that load very fast, usually without lots of dynamic data on them, that are shown to the user whilst asynchronous JavaScript loads the "real" page in the background.  Once fully loaded, the real page can be gracefully shown to the user.

Next, we look at Animation.  Bart reminds us that animations help to improve the user's perception of the responsiveness of your user interface.  Even if your interface is performing some processing that takes a few milliseconds to run, loading and displaying an animation that the user can view whilst that processing is going on will greatly enhance the perceived performance of the complete application.  We need to ensure that our animations always run at 60 fps (frames per second); anything less than this will cause them to look jerky and is not a good user experience.  Quite often, we need to perform some computation prior to invoking our animation and in this scenario we should ensure that the computation is ideally performed in less than 10 milliseconds.

Bart shares a helpful online utility called CanvasMark which provides benchmarking for HTML5 Canvas rendering.  This can be really useful in order to test the animations and graphics on your site and see how they perform on different platforms and browsers.

Bart then talks about using the Google Chrome Task Manager to monitor the memory usage of your page.  A lot of memory can be consumed by your page's JavaScript and other large resources.  Bart talks about his own arcade.ly site which uses 676MB of memory.  This might be acceptable on a modern day desktop machine, but it will not work so well on a more constrained mobile device.  He states that, after some analysis of the memory consumption, most of the memory was consumed by raw audio that had been decompressed from the loaded compressed audio in order to provide sound effects for the game.  By gracefully degrading the quality and size of the audio used by the site based upon the platform or device that is rendering the site, performance was vastly improved.

Another common pitfall is in how we write our JavaScript functions.  If we're going to be creating many instances of a JavaScript object, as can happen in a game with many individual items on the screen, we shouldn't attach functions directly to the JavaScript object as this creates many copies of the same function.  Instead, we should attach the function to the object prototype, creating a single copy of the function, which is then shared by all instances of the object and thus saving a lot of memory.  Bart also warns us to be careful of closures on our JavaScript objects as we may end up capturing far more than we actually need.

We now move onto Idle.   This is all about deferring work, as the main concern for our UI is to respond to the user immediately.  One approach to this is to use Web Workers to perform work at a later time.  In Bart's case, he says that he wrote his own Task Executor which creates an array of tasks and uses the built-in JavaScript setTimeout function to slowly execute each of the queued tasks.  By staggering the execution of the queued tasks, we prevent the potential for the browser to "hang" with a white screen whilst background processing is being performed, as can often happen if excessive processing is performed all at once.

Finally, we look at Load.  A key takeaway of the talk is to always use HTTP/2 if possible.  Just by switching this on alone, Bart says you'll see a 20-30% improvement in performance for free.  In order to achieve this, HTTP/2 provides us with request multiplexing, which bundles requests together meaning that the browser can send multiple requests to the server in one go.  These requests won't necessarily respond any quicker, but we do save on the latency overhead we would incur when sending each request separately.  HTTP/2 also provides server push functionality, stream priority and header compression.  It also has protocol encryption, which, whilst not an official part of the HTTP/2 specification, is currently mandated by all browsers that support the HTTP/2 protocol, effectively making encryption compulsory.  HTTP/2 is widely supported across all modern browsers on virtually all platforms, with Opera Mini being the only browser without full support, and HTTP/2 is also fully supported within most of today's programming frameworks.  For example, the .NET Framework has supported HTTP/2 since version 4.6.0.  One other significant change when using HTTP/2 is that we no longer need to "bundle" our CSS and JavaScript resources.  This also applies to "spriting" of icons as a single large image.

Bart moves on to talk about loading our CSS resources and he suggests that one very effective approach is to inline the bare minimum CSS we would require to display and render our "above the fold" content, with the rest of the CSS being loaded asynchronously.  The same applies to our JavaScript files, however, there can be an important caveat to this.  Bart explains how he loads some of his JavaScript synchronously, which itself negatively impacts performance; however, this is required to ensure that the asynchronously loaded 3rd-party JavaScript - over which you have no control - does not interfere with your own JavaScript, as the 3rd-party JavaScript is loaded at the very last moment whilst Bart's own JavaScript is loaded right up front.  We should look into using DNS Prefetch to force the browser to perform DNS lookups ahead of time for all of the domains that our site might reference for 3rd party resources.  This incurs a one-off small performance impact as the page first loads, but makes subsequent requests for 3rd party content much quicker.

Bart warns us not to get too carried away putting things in the HEAD section of our pages and instead we should focus on getting the "above the fold" content to be as small as possible; ideally it should all be under 15kb, which is roughly the amount of data that can fit within the initial TCP congestion window of a new connection.  Again, this is a performance optimization that may not have a noticeable impact on desktop browsers, but can make a huge difference on mobile devices, especially if they're using a slow connection.  We should always check the payload size of our sites and ensure that we're being as efficient as possible and not sending more data than is required.  Further to this, we should use content compression if our web server supports it.  IIS has supported content compression for a long time now, however, we should be aware of a bug that affects IIS version 8 (and possibly 8.5) which turns off compression for chunked content. This bug was fixed in IIS version 10.

If we're using libraries or frameworks in our page, ensure we only deliver the required parts.  Many of today's libraries are componentized, allowing the developer to only include the parts of the library/framework that they actually need and use.  Use Content Delivery Networks if you're serving your site to many different geographic areas, but also be aware that, if your audience is almost exclusively located in a single geographic region, using a CDN could actually slow things down.  In this case, it's better to simply serve up your site directly from a server located within that region.

Finally, Bart reiterates that it's all about latency.   It's latency that slows you down significantly, and any performance optimizations that can be done to remove or reduce latency will improve the performance, or perceived performance, of your websites.

After Bart's talk, it's time for another coffee break.  We head back to the recital room for further coffee and biscuits and after a short while, it's time for the 3rd session of the day and the last one prior to lunch.  This session is to be a Visual Note Taking Workshop delivered by Ian Johnson.

As Ian's session was an actual workshop, I didn't take too many notes but instead attempted to take my notes visually using the technique of Sketch-Noting that Ian was describing.

Ian first states that Sketch-Noting is still mostly about writing words.  He says that most of us, as developers using keyboards all day long, have pretty terrible handwriting so we simply need to practice more at it.  Ian suggests avoiding all caps words and cursive writing, using a simple font and camel cased lettering (although all caps is fine for titles and headings).  Start bigger to get the practice of forming each letter correctly, then write smaller and smaller as you get better at it.  You'll need this valuable skill since Sketch-Noting requires you to be able to write both very quickly and legibly.

At this point, I put my laptop away and stopped taking written notes in my text editor and tried to actually sketch-note the rest of Ian's talk, which gave us many more pointers and advice on how to construct our Sketch Notes.  I don't consider myself artistic in the slightest, but Ian insists that Sketch Notes don't really rely on artistic skill, but more on the skill of being able to capture the relevant information from a fast-moving talk.  I didn't have proper pens for my Sketch Note and had to rely solely on my biro, but here in all its glory is my very first attempt at a Sketch Note:

IMG_20170916_125616

After Ian's talk was over, it was time for lunch.  All the attendees reconvened in the recital room where we could help ourselves to the lunch kindly provided by the conference organizers and paid for by the sponsors.  Lunch was the usual brown bag affair consisting of a sandwich, some crisps, a chocolate bar, a piece of fruit and a can of drink.  I took the various items for my lunch and the bag and proceeded to wander just outside the recital room to a small green area with some tables.  It was at this point that the weather decided to turn against us and it started raining very heavily.  I made a hasty retreat back inside the recital room where it was warm and dry and proceeded to eat my lunch there.

There were some grok talks taking place over the lunch time, but considering the weather and the fact that the grok talks were taking place in the theatre room, which was the furthest point from the recital room, I decided against attending them and chose to remain warm and dry instead.

After lunch, it was time to head back to the main building for the next session, this one was to be Nathan Gloyn's Microservices - What I've Learned After A Year Building Systems.

Nathan first introduces himself and states that he's a contract developer.  As such, he's been involved in two different projects over the previous 12 months that have been developed using a microservices architecture.  We're first asked to consider the question of why we should use microservices at all.  In Nathan's experience so far, he says, Don't!  In qualifying that statement, Nathan states that microservices are ideal if you need only part of a system to scale, however, for the majority of applications, the benefits of adopting a microservices architecture don't outweigh the additional complexity that is introduced.

Nathan states that building a system composed of microservices requires a different way of thinking.  With more monolithic applications, we usually scale them by scaling out - i.e. we use the same monolithic codebase for the website and simply deploy it to more machines which all target the same back-end database.  Microservices don't really work like this, and need to be individually small and simple.  They may even have their own individual database just for the individual service.

Microservices are often closely related to Domain-driven Design's Bounded Contexts so it's important to correctly identify the business domain's bounded contexts and model the microservices after those.  Failure to do this runs the risk that you'll create a suite of mini-monoliths rather than true microservices.

Nathan reminds us that we are definitely going to need a messaging system for an application built with a microservice architecture.  It's simply not an option not to use one as virtually all user interactions will be performed across multiple services.  Microservices are, by their very nature, entirely distributed.  Even simple business processes can often require multiple services and co-ordination of those services.  Nathan says that it's important not to build any messaging into the UI layer as you'll end up coupling the UI to the domain logic and the microservice which is something to be avoided.  One option for a messaging system is NServiceBus, which is what Nathan is currently using, however many other options exist.   When designing the messaging within the system, it's very important to give consideration to versioning of messages and message contracts.  Building in versioning from the beginning ensures that you can deploy individual microservices independently rather than being forced to deploy large parts of the system - perhaps multiple microservices - together if they're all required to use the exact same message version.
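
To illustrate the idea of message versioning (this is my own sketch rather than anything Nathan showed, and the type names are entirely hypothetical), the simplest approach is to treat a published message contract as immutable and introduce new versions alongside the old ones:

using System;

// V1 of the contract stays exactly as it was first published.
public class OrderPlacedV1
{
    public Guid OrderId { get; set; }
    public decimal Total { get; set; }
}

// V2 adds new information without touching V1, so services that only
// understand V1 can keep running until they're updated and redeployed.
public class OrderPlacedV2
{
    public Guid OrderId { get; set; }
    public decimal Total { get; set; }
    public string CurrencyCode { get; set; }
}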

We next look at the difference between "fat" versus "thin" services.  Thin services generally only deal with data that they "own", if the thin service needs other data for processing, they must request it from the other service that owns that data.  Fat services, however, will hold on to data (albeit temporarily) that actually "belongs" to other services in order to perform their own processing.  This results in coupling between the services, however, the coupling of fat and thin services is different as fat services are coupled by data whereas thin services are coupled by service.

With microservices, cross-cutting concerns such as security and logging become even more important than ever.  We should always ensure that security is built-in to every service from the very beginning and is treated as a first class citizen of the service.  Enforcing the use of HTTPS across the board (i.e. even when running in development or test environments as well as production) helps to enforce this as a default choice.
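
As a concrete example of making HTTPS the default choice (my example rather than Nathan's, and assuming an ASP.NET MVC based service), the built-in RequireHttpsAttribute can be registered as a global filter so that every controller action demands a secure connection rather than relying on each controller to opt in:

using System.Web.Mvc;

public class FilterConfig
{
    public static void RegisterGlobalFilters(GlobalFilterCollection filters)
    {
        // Rejects (and redirects) any plain HTTP request for every action in the application.
        filters.Add(new RequireHttpsAttribute());
    }
}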

We then look at how our system's source code can be structured for a microservices based system.  It's possible to use either one source control repository or multiple and there's trade-offs against both options.  If we use a single repository, that's really beneficial during the development phase of the project, but is not so great when it comes to deployment.  On the other hand, using multiple repositories, usually separated by microservice, is great for deployment since each service can be easily integrated and deployed individually, but it's more cumbersome during the development phase of the project.

It's important to remember that each microservice can be written using its own technology stack and that each service could use an entirely different stack to the others.  This can be beneficial if you have different teams with different skill sets building the different services, but it's important to remember that you'll need to constantly monitor each of the technology stacks that you use for security vulnerabilities and other issues that may arise or be discovered over time.  Obviously, the more technology stacks you're using, the more time-consuming this will be.

It's also important to remember that even when you're building a microservices based system, you will still require shared functionality that will be used by multiple services.  This can be built into each service or can be separated out to become a microservice in its own right depending upon the nature of the shared functionality.

Nathan talks about user interfaces to microservice based systems.  These are often written using a SPA framework such as Angular or React.  They'll often go into their own repository for independent deployment, however, you should be very careful that the front-end user interface part of the system doesn't become a monolith in itself.  If the back-end is nicely separated into microservices based on the domain's bounded contexts, the front-end should be broken down similarly too.

Next we look at testing of a microservice based system.  This can often be a double-edged sword as it's fairly easy to test a single microservice with its known good (or bad) inputs and outputs, however, much of the real-world usage of the system will be interactions that span multiple services so it's important to ensure that you're also testing the user's path through multiple services.  This can be quite tricky and there's no easy way to achieve this.  It's often done using automated integration testing via the user interface, although you should also ensure you do test the underlying API separately to ensure that security can't be bypassed.

Configuration of the whole system can often be problematic with a microservice based system.  For this reason, it's usually best to use a separate configuration management system rather than trying to implement things like web.config transforms for each service.  Tools like Consul or Spring Cloud Config are very useful here.

Data management is also of critical importance.  It should be possible to change data within the system's data store without requiring a deployment.  Database migrations are a key tool in helping with this.  Nathan mentions both Entity Framework Migrations and also FluentMigrator as two good choices.  He offers a suggestion for things like column renames and suggests that instead of a migration that renames the column, create a whole new column instead.  That way, if the change has to be rolled back, you can simply remove the new column, leaving the old column (with the old name) in place.  This allows other services that may not be being deployed to continue to use the old column until they're also updated.
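
As a rough illustration of the add-a-column-rather-than-rename approach (my own sketch using FluentMigrator-style syntax, so treat the exact fluent API and the table/column names as assumptions), the migration only ever adds the new column and rolling back simply drops it again, leaving the old column in place for any services still relying on it:

using FluentMigrator;

// Hypothetical migration: rather than renaming an existing column on Customers,
// we add FullName as a brand new column alongside the old one.
[Migration(201710140001)]
public class AddCustomerFullNameColumn : Migration
{
    public override void Up()
    {
        Create.Column("FullName").OnTable("Customers").AsString(255).Nullable();
    }

    public override void Down()
    {
        // Rolling back just removes the new column; the old one was never touched.
        Delete.Column("FullName").FromTable("Customers");
    }
}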

Nathan then touches on multi-tenancy within microservice based systems and says that if you use the model of a separate database per tenant, this can lead to a huge explosion of databases if your microservices are using multiple databases for themselves.  It's usually much more manageable to have multi-tenancy by partitioning tenant data within a single database (or the database for each microservice).

Next, we look at logging and monitoring within our system.  Given the distributed nature of a microservice based system, it's important to be able to log and understand the complete user interaction even though logging is done individually by individual microservices.  To facilitate understanding the entire end-to-end interaction we can use a CorrelationID for this.  It's simply a unique identifier that travels through all services, passed along in each message and written to the logs of each microservice.  When we look back at the complete set of logs, combined from the disparate services, we can use the CorrelationID to correlate the log messages into a cohesive whole.  With regard to monitoring, it's also critically important to monitor the entire system and not just the individual services.  It's far more important to know how healthy the entire system is rather than each service, although monitoring services individually is still valuable.
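
A correlation ID needs no special infrastructure; it's just a value generated at the very start of a user interaction and then copied onto every subsequent message and log entry.  A minimal sketch (entirely my own, with hypothetical type names) might look something like this:

using System;

// Every message carries the correlation id of the user interaction it belongs to.
public class PlaceOrderCommand
{
    public Guid CorrelationId { get; set; }
    public Guid OrderId { get; set; }
}

public class OrderHandler
{
    public void Handle(PlaceOrderCommand command)
    {
        // Writing the correlation id into every log line allows the logs from
        // all of the separate services to be stitched back together later on.
        Console.WriteLine($"[{command.CorrelationId}] Handling order {command.OrderId}");

        // Any further messages published from here would copy command.CorrelationId
        // across, so the chain remains unbroken from service to service.
    }
}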

Finally, Nathan shares some details regarding custom tools.  He says that, as a developer on a microservice based system, you will end up building many custom tools.  These will be such tools as bulk data loading whereby getting the data into the database requires processing by a number of different services and cannot simply be directly loaded into the database.  He says that despite the potential downsides of working on such systems, building the custom tools can often be some of the more enjoyable parts of building the complete system.

After Nathan's talk it was time for the last coffee break of the day, after which it was time for the final day's session.  For me, this was Monitoring-First Development by Benji Weber.

Benji starts his talk by introducing himself and says that he's mostly a Java developer but that he's done some .NET and also writes JavaScript as well.  He works at an "ad-tech" company in London.  He wants to first start by talking about Extreme Programming as this is a style of programming that he uses in his current company.  We look at the various practices within Extreme Programming (aka XP) as many of these practices have been adopted within wider development styles, even for teams that don't consider themselves as using XP.  Benji says that, ultimately, all of the XP practices boil down to one key thing - Feedback.  They're all about getting better feedback, quicker.  Benji's company uses full XP with a very collaborative environment, collectively owning all the code and the entire end-to-end process from design/coding through to production, releasing small changes very frequently.

This style of working has led the teams on to the adoption of something they term Monitor-Driven Development.  This is simply the idea that monitoring of all parts of a software system is the core way to get feedback on that system, both when the system is being developed and as the system is running in production.  Therefore, all new feature development starts by asking the question, "How will we monitor this system?" and then ensuring that the ability to deeply monitor all aspects of the system being developed is a front and centre concern throughout the development cycle.

To illustrate how the company came to adopt this methodology, Benji shares three short stories with us.  The first started with an email to the development team with the subject of "URGENT".  It was from some sales people trying to demonstrate some part of the software and they were complaining that the graphs on the analytics dashboard weren't loading for them.  Benji states that this was a feature that was heavily tested in development so the error seemed strange.  After some analysis into the problem, it was discovered that data was the root cause of the issue and that the development team had underestimated the way in which the underlying data would grow due to users doing unanticipated things on the system, which the existing monitoring that the team had in place didn't highlight.  The second story involves the discovery that 90% of the traffic their software was serving from the CDN was HTTP 500 server errors!  Again, after analysis it was discovered that the problem lay in some JavaScript code that a recently released new version of Internet Explorer was interpreting differently from the old version, and that this new version caused the client-side JavaScript to continually make requests to a non-existent URL.  The third story involves a report from an irate client that the adverts being served up by the company's advert system were breaking the client's own website.  Analysis showed that this was again caused by a JavaScript issue and a line of code, self = this;, that was incorrectly written using a globally-scoped variable, thereby overwriting variables that the client's own website relied upon.  The common theme throughout all of the stories was that the behaviour of the system had changed, even though no code had changed.  Moreover, all of the problems that arose from the changed behaviour were first discovered by the system's users and not the development team.

Benji references Google's own Site Reliability Engineering book (available to read online for free) which states that 70% of the reasons behind things breaking is because you've changed something.  But this leaves a large 30% of the time where the reasons are something that's outside of your control.  So how did Benji approach improving his ability to detect and respond to issues?  He started by looking at causes vs problems and concluded that they didn't have enough coverage of the problems that occurred.  Benji tells us that it's almost impossible to get sufficient coverage of the causes since there's an almost infinite number of things that could happen that could cause the problem.

To get better coverage of the problems, they universally adopted the "5 whys" approach to determining the root issues.  This involves starting with the problem and repeatedly asking "why?" of each cause to determine the root cause.  An example might run: monitoring is hard.  Why is it hard, when we don't have the same issue using Test-Driven Development during coding?  Because TDD follows a Red - Green - Refactor cycle, so you can't really write untestable code, and so on.

So Benji decided to try to apply the Test-Driven Development principles to monitoring.  Before even writing the feature, they start by determining how the feature will be monitored, then only after determining this, they start work on writing the feature ensuring that the monitoring is not negatively impacted.  In this way, the monitoring of the feature becomes the failing unit test that the actual feature implementation must make "go green".

Benji shares an example of how this is implemented and says that the "failing test" starts with a rule defined within their chosen monitoring tool, Nagios.  This rule could be something like "ensure that more adverts are loaded than reported page views", whereby the user interface is checked for specific elements or a specific response rendering.  This test will show as a failure within the monitoring system as the feature has not yet been implemented, however, upon implementation of the feature, the monitoring test will eventually pass (go green) and there we have a correct implementation of the feature driven by the monitoring system to guide it.  Of course, this monitoring system remains in place with these tests increasing over time and becoming an early warning system should any part of the software, within any environment, start to show any failures.  This ensures that the development team are the first to know of any potential issues, rather than the users being first to know.
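
The exact rule definition depends on the monitoring tool, but the shape of such a check is very test-like.  Purely as an illustration (hypothetical names and numbers, not Benji's actual Nagios rule), the assertion at the heart of it is tiny:

using System;

public class AdvertsVersusPageViewsCheck
{
    // In reality these figures would be pulled from the monitoring system's
    // metrics; here they're simply passed in for the sake of the example.
    public bool IsHealthy(long advertsLoaded, long pageViews)
    {
        // The "failing test": until the feature works correctly, fewer adverts
        // than page views will be reported and the check stays red.
        return advertsLoaded >= pageViews;
    }

    public static void Main()
    {
        var check = new AdvertsVersusPageViewsCheck();
        Console.WriteLine(check.IsHealthy(advertsLoaded: 950, pageViews: 1000)
            ? "OK"
            : "ALERT: fewer adverts loaded than reported page views");
    }
}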

Benji says they use a pattern called the Screenplay pattern for their UI based tests.  It's an evolution of the Page Objects pattern and allows highly decoupled tests as well as bringing the SOLID principles to the tests themselves.  He also states that they make use of Feature Toggles not only when implementing new features and functionality but also when refactoring existing parts of the system.  This allows them to test new monitoring scenarios without affecting older implementations.  Benji states that it's incredibly important to follow a true Red - Green - Refactor cycle when implementing monitoring rules and that you should always see your monitoring tests failing first before trying to make them pass/go green.

Finally, Benji says that adopting a monitoring-driven development approach ultimately helps humans too.  It helps in future planning and development efforts as it builds awareness of how and what to think about when designing new systems and/or functionality.

After Benji's session was over, it was time for all the attendees to gather back in the theatre room for the final wrap-up by the organisers and the prize draw.  After thanking the various people involved in making the conference what it is (sponsors, volunteers, organisers etc.) it was time for the prize draw.  There were some good prizes up for grabs, but alas, I wasn't to be a winner on this occasion.  The DDD East Anglia 2017 event had been a huge success and it was all the more impressive given that the organisers shared the story that their original venue had pulled out only 5 weeks prior to the event!  The new venue had stepped in at the last minute and held an excellent conference which was completely seamless to the attendees.  We would never have known of the last minute panic had it not been shared with us.  Here's looking forward to the next DDD East Anglia event next year.

Analyze Solution For Code Clones Missing In Visual Studio 2017

I was watching a video on Microsoft’s Channel 9 website regarding design patterns and during the video the presenter showed a feature of Visual Studio that analysed the currently loaded solution for “code clones” – sections of code that appear to be clones or duplicates in different methods.

This seemed like a very useful feature and something that I thought I could benefit from on a number of codebases that I’m currently working with.  So off I went to explore this handy feature for myself.

The first thing to note is that, although this feature has been part of Visual Studio since Visual Studio 2012, it's only available in the Enterprise SKU of the product (or the Premium or Ultimate versions for VS 2012/2013).

That was ok, as I’m using Visual Studio 2017 Enterprise, however, upon expanding the Analyze menu option, I was greeted with a distinct absence of the Analyze Solution For Code Clones option:

cc-visualstudio

Hmm..  I was curious why the option was missing so went off doing my best googling to try to find out why.   And it turns out that there’s very little information out there on the internet regarding this feature, and even less information regarding troubleshooting why the option might not be appearing on the Analyze tools menu.

After some amount of aimlessly clicking around on the spurious search results, it suddenly dawned on me.   Perhaps it’s an “optional extra” component that you need to install?   Turns out I was right.

When I’d originally installed Visual Studio 2017, I’d selected one of the pre-configured “Workloads” to install.  In my case it was the “ASP.NET And Web Development” workload as that’s the sort of development work I do.  Turns out that, whilst this workload installs almost everything you might want or need for ASP.NET And Web Development, there’s a few crucial things missing, one of which is the “Architecture And Analysis Tools” component – which in turn includes the “Code Clone” tool:

cc-visualstudioinstaller

In order to get the Code Clone tool installed, you can either select just that tool from the “Individual components” tab of the Visual Studio installer, or you can select the “Architecture and analysis tools” optional component (as seen on the right hand side in the above picture) which will select the Code Clone, Code Map and Live Dependency Validation tools.

Once this is done, restarting Visual Studio will give you the missing “Analyze Solution for Code Clones” menu option:

cc-visualstudio2

Easy when you know how, eh?

DDD North 2017 In Review

On Saturday, 14th October 2017, the 7th annual DDD North event took place, this time at the University of Bradford.

One nice element of the DDD North conferences (as opposed to the various other DDD conferences around the UK) is that I'm able to drive to the conference on the morning of the event and drive home again after the event has finished.  This time, the journey was merely 1 hour 20 minutes by car, so I didn't have to get up too early in order to make the journey.  On the Saturday morning, after having a quick cup of coffee and some toast at home, I headed off towards Bradford for the DDD North event.

After arriving at the venue and parking my car in one of the ample free car parks available, I headed to the Richmond Building reception area where the conference attendees were gathering.  After registering my attendance and collecting my conference badge, I headed into the main foyer area to grab some coffee and breakfast.  The catering has always been particularly good at the DDD North conferences and this time round was no exception.  Being a vegetarian nowadays, I can no longer avail myself of a sausage or bacon roll, both of which were available, however on this occasion there were also veggie sausage breakfast rolls available too.  A very nice and thoughtful touch!  And delicious, too!

After some lovely breakfast and a couple of cups of coffee, it was soon time to head off to the first session of the day.  This one was to be Colin Mackay's User Story Mapping For Beginners.

Colin informs us that his talk will be very hands-on, and so he hands out some sticky notes and markers to some of the session attendees, but unfortunately, runs out of stock of them before being able to supply everyone.

Colin tells us about story mapping and shares a quote from Martin Fowler:

"Story mapping is a technique that provides the big picture that a pile of stories so often misses"

Story mapping is essentially a way of arranging our user stories, written out on sticky notes, into meaningful "groups" of stories, tasks, and sections of application or business functionality.  Colin tells us that it's a very helpful technique for driving out a "ubiquitous language" and shares an example of how he was able to understand a sales person's usage of the phrase "closing off a customer" to mean closing a sale, rather than assuming it to mean that the customer no longer had a relationship with the supplier.  Colin also states that a document cannot always tell you the whole story.  He shares a picture from his own wedding which was taken in a back alley behind the wedding venue.  He says how the picture appears unusual for a wedding photo, but the photo doesn't explain that there'd been a fire alarm in the building and all the wedding guests had to gather in the back alley at this time and so they decided to take a photo of the event!  He also tells us how User Story Mapping is great for sparking conversations and helps to improve prioritisation of software development tasks.

Colin then gets the attendees that have the sticky notes and the markers to actually write out some user story tasks based upon a person's morning routine.  He states that this is an exercise from the book User Story Mapping by Jeff Patton.  Everyone is given around 5 minutes to do this and afterwards, Colin collects the sticky notes and starts to stick them onto a whiteboard.  Whilst he's doing this, he tells us that there are 3 levels of tasks within User Story Mapping.  At the very top level, there's "Sea" level.  These are the user goals and each task within is atomic - i.e. you can't stop in the middle of it and do something else.  Next is Summary Level which is often represented by a cloud or a kite and this level shows greater context and is made up of many different user goals.  Finally, we have the Sub-functions, represented by a fish or a clam.  These are the individual tasks that go to make up a user goal.  So an example might have a user goal (sea level) of "Take a Shower" and the individual tasks could be "Turn on shower", "Set temperature", "Get in shower", "Wash body", "Shampoo hair" etc.

After an initial arrangement of sticky notes, we have our initial User Story Map for a morning routine.  Colin then says we can start to look for alternatives.  The body of the map is filled with notes representing details and individual granular tasks, there are also variations and exceptions here and we'll need to re-organise the map as new details are discovered so that the complete map makes sense.  In a software system, the map becomes the "narrative flow" and is not necessarily in a strict order as some tasks can run in parallel.  Colin suggests using additional stickers or symbols that can be added to the sticky notes to represent which teams will work on which parts of the map.

Colin says it's good to anthropomorphise the back-end systems within the overall software architecture as this helps with conversations and allows non-technical people to better understand how the component parts of the system work together.  So, instead of saying that the web server will communicate with the database server, we could say that Fred will communicate with Bob or that Luke communicates with Leia.  Giving the systems names greatly helps.

We now start to look at the map's "backbone".  These are the high level groups that many individual tasks will fit into.  So, for our morning routine map, we can group tasks such as "Turn off alarm" and "Get out of bed" as a grouping called "Waking up".  We also talk about scope creep.  Colin tells us that, traditionally, more sticky notes being added to a board even once the software has started to be built is usually referred to as scope creep, however, when using techniques such as User Story Mapping, it often just means that your understanding of the overall system that's required is getting better and more refined.

Once we've built our initial User Story Map, it's easy to move individual tasks within a group of tasks in a goal below a horizontal line which we can draw across the whiteboard.  These tasks can then represent a good minimum viable product and we simply move those tasks in a group that we deem to be more valuable, and thus required for the MVP, whilst leaving the "nice to have" tasks in the group on the other side of the line.  In doing this, it's perfectly acceptable to replace a task with a simpler task as a temporary measure, which would then be removed and replaced with the original "proper" task for work beyond MVP.  After deciding upon our MVP tasks, we can simply rinse and repeat the process, taking individual tasks from within groups and allocating them to the next version of the product whilst leaving the less valuable tasks for a future iteration.

Colin says how this process results in producing something called "now maps" as they represent what we have, or where we're at currently, whereas we'll often also produce "later maps" - these are the maps that represent some aspect of where we want to be in the future.  Now maps are usually produced when you're first trying to understand the existing business processes that will be modelled into software.  From here, you can produce Later maps showing the iterations of the software as it will be produced and delivered in the future.  Colin also mentions that we should always be questioning all of the elements of our maps, asking questions such as "Why does X happen?", "What are the pain points around this process?", "What's good about the process?" and "What would make this process better?".  It's by continually asking such questions, refining the actual tasks on the map, and continually reorganising the map that we can ultimately create great software that really adds business value.

Finally, Colin shares some additional resources where we can learn more about User Story Mapping and related processes in general.  He mentions the User Story Mapping book by Jeff Patton along with The Goal by Eli Goldratt, The Phoenix Project by Gene Kim, Kevin Behr and George Spafford and finally, Rolling Rocks Downhill by Clarke Ching.

After Colin's session is over, it's time for a quick coffee break before the next session.  The individual rooms are a little distance away from the main foyer area where the coffee is served, and I realised my next session was in the same room as I was already sat in!  Therefore, I decided I'd simply stay in my seat and await the next session.  This one was to be David Whitney's How Stuff Works...In C# - Metaprogramming 101.

David's talk is going to be all about how some of the fundamental frameworks that we use as .NET developers every day work, and how they're full of "metaprogramming".  Throughout his talk, he's going to decompose an MVC (Model View Controller) framework, a unit testing framework and an IoC (Inversion of Control) container framework to show how they work and specifically to examine how they operate on the code that we write that uses and consumes these frameworks.

To start, David explains what "MetaProgramming" is.  He shares the Wikipedia definition, which in typical Wikipedia fashion, is somewhat obtuse.  However the first statement does sum it up:

"Metaprogramming is a programming technique in which computer programs have the ability to treat programs as their data."

This simply means that meta programs are programs that operate on other source code, and Meta programming is essentially about writing code that looks at, inspects and works with your own software's source code.

David says that in C#, meta programming is mostly done by using classes within the System.Reflection namespace and making heavy use of things such as the Type class therein, which allows us to get all kinds of information about the types, methods and variables that we're going to be working with.  David shows a first trivial example of a meta program, which enumerates the list of types by using a call to the Assembly.GetTypes() method:

public class BasicReflector
{
	public Type[] ListAllTypesFromSamples()
	{
		return GetType().Assembly.GetTypes();
	}

	public MethodInfo[] ListMethodsOn<T>()
	{
		return typeof(T).GetMethods();
	}
}
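
Using it is as simple as the following (my own trivial example, intended to sit inside a console application's Main method with using System; in scope):

var reflector = new BasicReflector();

foreach (var type in reflector.ListAllTypesFromSamples())
{
	Console.WriteLine(type.FullName);
}

foreach (var method in reflector.ListMethodsOn<string>())
{
	Console.WriteLine(method.Name);
}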

He asks why you would want to do this?  Well, it's because many of the frameworks we use (MVC, Unit Testing etc.) are essentially based on this ability to perform introspection on the code that you write in order to use them.  We often make extensive use of the Type class in our code, even when we're not necessarily aware that we're doing meta-programming, but the Type class is just one part of a rich "meta-model" for performing reflection and introspection over code.  A Meta-Model is essentially a "model of your model".  The majority of the types within the System.Reflection namespace that provide this metamodel usually have names ending with "Info", so classes such as TypeInfo, MethodInfo, MemberInfo and ConstructorInfo can all be used to give us highly detailed information and data about our code.

As an example, a unit testing framework at its core is actually trivially simple.  It essentially just finds code and runs it.  It examines your code for classes decorated with a specific attribute (i.e. [TestFixture]) and invokes methods that are decorated with specific attributes (i.e. [Test]).  David says that one of his favourite coding katas is to write a basic unit testing framework in less than an hour as this is a very good exercise for "Meta Programming 101".

We look at some code for a very simple Unit Testing Framework, and there's really not a lot to it.  Of course, real world unit testing frameworks contain many more "bells-and-whistles", but the basic code shown below performs the core functionality of a simple test runner:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

namespace ConsoleApp1
{
    public class Program
    {
        public static void Main(string[] args)
        {
            var testFinder = new TestFinder(args);
            var testExecutor = new TestExecutor();
            var testReporter = new TestReporter();
            var allTests = testFinder.FindTests();
            foreach (var test in allTests)
            {
                TestResult result = testExecutor.ExecuteSafely(test);
                testReporter.Report(result);
            }
        }
    }

    public class TestFinder
    {
        private readonly Assembly _testDll;

        public TestFinder(string[] args)
        {
            var assemblyname = AssemblyName.GetAssemblyName(args[0]);
            _testDll = AppDomain.CurrentDomain.Load(assemblyname);
        }

        public List<MethodInfo> FindTests()
        {
            var fixtures = _testDll.GetTypes()
                .Where(x => x.GetCustomAttributes()
                    .Any(c => c.GetType()
                        .Name.StartsWith("TestFixture"))).ToList();
            var allMethods = fixtures.SelectMany(f => 
                f.GetMethods(BindingFlags.Public | BindingFlags.Instance));
            return allMethods.Where(x => x.GetCustomAttributes()
                .Any(m => m.GetType().Name.StartsWith("Test")))
                .ToList();
        }
    }

    public class TestExecutor
    {
        public TestResult ExecuteSafely(MethodInfo test)
        {
            try
            {
                var instance = Activator.CreateInstance(test.DeclaringType);
                test.Invoke(instance, null);
                return TestResult.Pass(test);
            }
            catch (Exception ex)
            {
                return TestResult.Fail(test, ex);
            }
        }
    }

    public class TestReporter
    {
        public void Report(TestResult result)
        {
            Console.Write(result.Exception == null ? "." : "x");
        }
    }

    public class TestResult
    {
        public Exception Exception { get; set; }

        public static TestResult Pass(MethodInfo test)
        {
            return new TestResult { Exception = null };
        }

        public static TestResult Fail(MethodInfo test, Exception ex)
        {
            return new TestResult { Exception = ex };
        }
    }
}
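
To see this runner in action, all the target test DLL needs is a couple of attributes with the right names and a fixture that uses them (my own example rather than David's); running something like ConsoleApp1.exe MyTests.dll would then print a "." for the passing test:

using System;

namespace MyTests
{
    // Any attribute whose type name starts with "TestFixture" or "Test"
    // will be picked up by the TestFinder above.
    public class TestFixtureAttribute : Attribute { }
    public class TestAttribute : Attribute { }

    [TestFixture]
    public class CalculatorTests
    {
        [Test]
        public void AddingOneAndOneGivesTwo()
        {
            var result = 1 + 1;
            if (result != 2)
            {
                throw new Exception("Maths is broken!");
            }
        }
    }
}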

David then talks about the ASP.NET MVC framework.  He says that it is a framework that, in essence, just finds and runs user code, which sounds oddly similar to a unit testing framework!  Sure, there's additional functionality within the framework, but at a basic level, the framework simply accepts a HTTP request, finds the user code for the requested URL/Route and runs that code (this is the controller action method).  Part of running that code might be the invoking of a ViewEngine (i.e. Razor) to render some HTML which is sent back to the client at the end of the action method.  Therefore, MVC is merely meta-programming which is bound to HTTP.  This is a lot like an ASP.NET HttpHandler and, in fact, the very first version of ASP.NET MVC was little more than one of these.

David asks if we know why MVC was so successful.  It was successful because of Rails.  And why was Rails successful?  Well, because it had sensible defaults.  This approach is the foundation of the often used "convention over configuration" paradigm.  This allows users of the framework to easily "fall into the pit of success" rather than the "pit of failure" and therefore makes learning and working with the framework a pleasurable experience.  David shows some more code here, which is his super simple MVC framework.  Again, it's largely based on using reflection to find and invoke appropriate user code, and is really not at all dissimilar to the Unit Testing code we looked at earlier.  We have a ProcessRequest method:

public void ProcessRequest(HttpContext context)
{
	var controller = PickController(context);
	var method = PickMethod(context, controller);
	var instance = Activator.CreateInstance(controller);
	var response = method.Invoke(instance, null);
	HttpContext.Current.Response.Write(response);
}

This is the method that orchestrates the entire HTTP request/response cycle of MVC.  And the other methods called by the ProcessRequest method use the reflective meta-programming and are very similar to what we've already seen.  Here's the PickController method, which we can see tries to find types whose names both start with a value from the route/URL and also end with "Controller".  We can also see that we use a sensible default of "HomeController" when a suitable controller can't be found:

private Type PickController(HttpContext context)
{
	var url = context.Request.Url;
	Type controller = null;
	var types = AppDomain.CurrentDomain.GetAssemblies()
					.SelectMany(x => x.GetTypes()).ToList();
	controller = types.FirstOrDefault(x => x.Name.EndsWith("Controller") && url.PathAndQuery.TrimStart('/').StartsWith(x.Name))
					?? types.Single(x => x.Name.StartsWith("HomeController"));
	return controller;
}
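
The PickMethod counterpart isn't shown, but a sketch in the same spirit (my own guess at it rather than David's actual code, again with a sensible default of "Index") might look like this:

private MethodInfo PickMethod(HttpContext context, Type controller)
{
	var path = context.Request.Url.PathAndQuery;
	var methods = controller.GetMethods(BindingFlags.Public | BindingFlags.Instance);
	// Pick the first public method whose name appears in the requested URL,
	// falling back to a default "Index" method if nothing matches.
	return methods.FirstOrDefault(m => path.Contains(m.Name))
			?? methods.Single(m => m.Name == "Index");
}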

Next, we move on to the deconstruction of an IoC Container Framework.  An IoC container framework is again a simple framework that works due to meta-programming and reflection.  At their core, IoC containers simply store a dictionary of mappings of interfaces to types, and they expose a method to register this mapping, as well as a method to create an instance of a type based on a given interface.  This creation is simply a recursive call ensuring that all objects down the object hierarchy are constructed by the IoC Container using the same logic to find each object's dependencies (if any).  David shows us his own IoC container framework on one of his slides and it's only around 70 lines of code.  It almost fits on a single screen.  Of course, this is a very basic container and doesn't have all the real-world required features such as object lifetime management and scoping, but it does work and performs the basic functionality.  I haven't shown the code here as it's very similar to the other meta-programming code we've already looked at, but there are a number of examples of simple IoC containers out there on the internet, some written in only 15 lines of code!
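
To give a flavour of what David described (this is my own minimal sketch rather than David's actual container), the core really is just a dictionary of mappings plus a recursive resolve that uses reflection to satisfy constructor parameters:

using System;
using System.Collections.Generic;
using System.Linq;

public class TinyContainer
{
    private readonly Dictionary<Type, Type> _registrations = new Dictionary<Type, Type>();

    // Store a mapping from the interface (or base type) to the concrete type.
    public void Register<TInterface, TImplementation>() where TImplementation : TInterface
    {
        _registrations[typeof(TInterface)] = typeof(TImplementation);
    }

    public TInterface Resolve<TInterface>()
    {
        return (TInterface)Resolve(typeof(TInterface));
    }

    private object Resolve(Type type)
    {
        // Look up the concrete type, or assume the requested type is already concrete.
        var concreteType = _registrations.ContainsKey(type) ? _registrations[type] : type;

        // Recursively resolve whatever the most demanding constructor needs.
        var constructor = concreteType.GetConstructors()
            .OrderByDescending(c => c.GetParameters().Length)
            .First();
        var arguments = constructor.GetParameters()
            .Select(p => Resolve(p.ParameterType))
            .ToArray();

        return constructor.Invoke(arguments);
    }
}

Usage is then just a matter of calling something like container.Register<IEmailSender, SmtpEmailSender>(); up front and container.Resolve<IEmailSender>(); wherever an instance is needed (the interface and class names here are, of course, made up for the example).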

After the demos, David talks about how we can actually use the reflection and meta-programming we've seen demonstrated in our own code as we're unlikely to re-write our MVC, Unit Testing or IoC frameworks.  Well, there's a number of ways in which such introspective code can be used.  One example is based upon some functionality for sending emails, a common enough requirement for many applications.  We look at some all too frequently found code that has to branch based upon the type of email that we're going to be sending:

public string SendEmails(string emailType)
{
	var emailMerger = new EmailMerger();
	if (emailType == "Nightly")
	{
		var nightlyExtract = new NightlyEmail();
		var templatePath = "\\Templates\\NightlyTemplate.html";
		return emailMerger.Merge(templatePath, nightlyExtract);
	}
	if (emailType == "Daily")
	{
		var dailyExtract = new DailyEmail();
		var templatePath = "\\Templates\\DailyTemplate.html";
		return emailMerger.Merge(templatePath, dailyExtract);
	}
	throw new NotImplementedException();
}

We can see we're branching conditionally based upon a string that represents the type of email we'll be processing, either a daily email or a nightly one.  However, by using reflective meta-programming, we can change the above code to something much more sophisticated:

public string SendEmails(string emailType)
{
	var strategies = new Email[] {new NightlyEmail(), new DailyEmail()};
	var selected = strategies.First(x => x.GetType().Name.StartsWith(emailType));
	var templatePath = "\\Templates\\" + selected.GetType().Name + ".html";
	return new EmailMerger().Merge(templatePath, selected);
}

Another way of using meta-programming within our own code is to perform automatic registrations for our DI/IoC Containers.  We often have hundreds or thousands of lines of manual registration, such as container.Register<IFoo, Foo>(); and we can simplify this by enumerating over all of the interfaces within our assemblies, looking for classes that implement each interface and share the same name (minus the leading "I"), and automatically registering the interface and type with the IoC container.  Of course, care must be taken here as such an approach may actually hide intent and is somewhat less explicit.  In this regard, David says that with the great power available to us via meta-programming comes great responsibility, so we should take care to only use it to "make the obvious thing work, not make the right thing totally un-obvious".
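
A sketch of such a convention-based scan (my own illustration, with the actual registration call left as a delegate since it differs between containers) might look like this:

using System;
using System.Linq;

public static class ConventionRegistration
{
    // Passes each IFoo/Foo pair it finds to whatever registration call your container uses,
    // e.g. (service, implementation) => container.Register(service, implementation).
    public static void RegisterByConvention(Action<Type, Type> register)
    {
        var allTypes = AppDomain.CurrentDomain.GetAssemblies()
            .SelectMany(a => a.GetTypes())
            .ToList();

        foreach (var serviceType in allTypes.Where(t => t.IsInterface && t.Name.StartsWith("I")))
        {
            var implementation = allTypes.FirstOrDefault(t =>
                t.IsClass && !t.IsAbstract &&
                serviceType.IsAssignableFrom(t) &&
                t.Name == serviceType.Name.Substring(1));

            if (implementation != null)
            {
                register(serviceType, implementation);
            }
        }
    }
}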

Finally, perhaps one of the best uses of meta-programming in this way is to help protect code quality.  We can do this by using meta-programming within our unit tests to enforce some convention in our code that we care about.  One great example of this is to ensure that all classes within a given namespace have a specific suffix to their name.  Here's a very simple unit test that ensures that all classes in a Factories namespace have the word "Factory" at the end of the class name:

[Test]
public void MakeSureFactoriesHaveTheRightNamingConventions()
{
	var types = AppDomain.CurrentDomain
		.GetAssemblies()
		.SelectMany(a => a.GetTypes())
		.Where(x => x.Namespace == "MyApp.Factories");

	foreach (var type in types)
	{
		Assert.That(type.Name.EndsWith("Factory"));
	}
}

After David's session was over it was time for another quick coffee break.  As I had to change rooms this time, I decided to head back to the main foyer and grab a quick cup of coffee before immediately heading off to find the room for my next session.  This session was James Murphy's A Gentle Introduction To Elm.

James starts by introducing the Elm language.  Elm calls itself a "delightful language for reliable web apps".  It's a purely functional language that transpiles to JavaScript and is a domain-specific language designed for developing web applications.  Being a purely functional language allows Elm to make a very bold claim.  No run-time exceptions!

James asks "Why use Elm?".  Well, for one thing, it's not JavaScript!  It's also functional, giving it all of the benefits of other functional languages such as immutability and pure functions with no side effects.  Also, as it's a domain-specific language, it's quite small and is therefore relatively easy to pick up and learn.  As it boasts no run-time exceptions, this means that if your Elm code compiles, it'll run and run correctly.

James talks about the Elm architecture and the basic pattern of implementation, which is Model-Update-View.  The Model is the state of your application and its data.  The Update is the mechanism by which the state is updated, and the View is how the state is represented as HTML.  It's this pattern that provides reliability and simplicity to Elm programs.  It's a popular, modern approach to front-end architecture, and the Redux JavaScript framework was directly inspired by the Elm architecture.  A number of companies are already using Elm in production, such as Pivotal, NoRedInk, Prezi and many others.

Here's a simple example Elm file showing the structure using the Model-Update-View pattern.  The pattern should be understandable even if you don't know the Elm syntax:

import Html exposing (Html, button, div, text)
import Html.Events exposing (onClick)

main =
  Html.beginnerProgram { model = 0, view = view, update = update }

type Msg = Increment | Decrement

update msg model =
  case msg of
    Increment ->
      model + 1

    Decrement ->
      model - 1

view model =
  div []
    [ button [ onClick Decrement ] [ text "-" ]
    , div [] [ text (toString model) ]
    , button [ onClick Increment ] [ text "+" ]
    ]

Note that the Elm code is generating the HTML that will be rendered by the browser.  This is very similar to the React framework and how it also performs the rendering for the actual page's markup.  This provides for a strongly-typed code representation of the HTML/web page, thus allowing far greater control and reasoning around the ultimate web page's markup.

You can get started with Elm by visiting the project's home page at elm-lang.org.  Elm can be installed either directly from the website, or via the Node Package Manager (NPM).  After installation, you'll have elm-repl - a REPL for Elm, elm-make which is the Elm compiler, elm-package - the Elm package manager and elm-reactor - the Elm development web server.  One interesting thing to note is that Elm has strong opinions about cleanliness and maintainability, so with that in mind, Elm enforces semantic versioning on all of its packages!

James shows us some sample Elm statements in the Elm REPL.  We see we can use all the standard and expected language elements, numbers, strings, defining functions etc.  We can also use partial application, pipe-lining and lists/maps, which are common constructs within functional languages.  We then look at the code for a very simple "Hello World" web page, using the Model-Update-View pattern that Elm programs follow.  James is using Visual Studio Code as his code editor here, and he informs us that there's mature tooling available to support Elm within Visual Studio Code.

We expand the "Hello World" page to allow user input via a textbox on the page, printing "Hello" and then the user's input.  Due to the continuous Model-Update-View loop, the resulting page is updated with every key press in the textbox, and this is controlled by the client-side JavaScript that has been transpiled from the Elm functions.  James shows this code running through the Elm Reactor development web server.  One very nice feature of Elm Reactor is that it contains built-in "time-travel" debugging, meaning that we can enumerate through each and every "event" that happens within our webpage.  In this case, we can see the events that populate the "Hello <user>" text character-by-character.  Of course, it's possible to only update the Hello display text when the user has finished entering their text and presses the Enter key in the textbox, however, since this involves maintaining state, we have to perform some interesting work in our Elm code to achieve it.

James shows us how Elm can respond to events from the outside world.  He writes a simple function that will respond to system tick events to show an ever updating current time display on the web page.  James shows how we can work with remote data by defining specific types (unions) that represent the data we'll be consuming and these types are then added to the Elm model that forms the state/data for the web page.  One important thing to note here is that we need to be able to not only represent the data but also the absence of any data with specific types that represent the lack of data.  This is, of course, due to Elm being a purely functional language that does not support the concept of null.

The crux of Elm's processing is taking some input (in the form of a model and a message), performing the processing and responding with both a model and a message.  Each Elm file has an "init" section that deals with the input data.  The message that is included in that data can be a function, and could be something that would access a remote endpoint to gather data from a remote source.  This newly acquired data can then be processed in the "Update" section of the processing loop, ultimately for returning as part of the View's model/message output.  James demonstrates this by showing us a very simple API that he's written implementing a simple To-Do list.  The API endpoint exposes a JSON response containing a list of to-do items.  We then see how this API endpoint can be called from the Elm code by using a custom defined message that queries the API endpoint and pulls in the various to-do items, processes them and writes that data into the Elm output model which is ultimately nicely rendered on the web page.

Elm contains a number of rich packages out-of-the-box, such as a HTTP module.  This allows us to perform HTTP requests and responses using most of the available HTTP verbs with ease:

import Json.Decode (list, string)

items : Task Error (List String)
items =
    get (list string) "http://example.com/to-do-items.json"

Or:

corsPost : Request
corsPost =
    { verb = "POST"
    , headers =
        [ ("Origin", "http://elm-lang.org")
        , ("Access-Control-Request-Method", "POST")
        , ("Access-Control-Request-Headers", "X-Custom-Header")
        ]
    , url = "http://example.com/hats"
    , body = empty
    }

It's important to note, however, that not all HTTP verbs are available out-of-the-box and some verbs, such as PATCH, will need to be manually implemented.

James wraps up his session by talking about the further eco-system around the Elm language.  He mentions that Elm has its own testing framework, ElmTest, and that you can very easily achieve a very high amount of code coverage when testing in Elm due to it being a purely functional language.  Also, adoption of Elm doesn't have to be an all-or-nothing proposition.  Since Elm transpiles to JavaScript, it can play very well with existing JavaScript applications.  This means that Elm can be adopted in a piecemeal fashion, with only small sections of a larger JavaScript application being replaced by their Elm equivalent, perhaps to ensure high code coverage or to benefit from improved robustness and reduced possibility of errors.

Finally, James talks about how to deploy Elm applications when using Elm in a real-world production application.  Most often, Elm deployment is performed using WebPack, a JavaScript module bundler.  This often takes the form of shipping a single small HTML file containing the necessary script inclusions for it to bootstrap the main application.

After James' session was over, it was time for lunch.  All the attendees made their way back to the main foyer area where a delicious lunch of a selection of sandwiches, fruit, crisps and chocolate was available to us.  As is customary at the various DDD events, there were to be a number of grok talks taking place over the lunch period.  As I'd missed the grok talks at the last few DDD events I'd attended, I decided that I'd make sure I caught a few of the talks this time around.

I missed the first few talks as the queue for lunch was quite long and it took a little while to get all the attendees served, however, after consuming my lunch in the sunny outdoors, I headed back inside to the large lecture theatre where the grok talks were being held.  I walked in just in time to catch the last minute of Phil Pursglove's talk on Azure's CosmosDB, which is Microsoft's globally distributed, multi-model database.  Unfortunately, I didn't catch much more than that, so you'll have to follow the link to find out more.

The next grok talk was Robin Minto's OWASP ZAP FTW talk.  Robin introduces us to OWASP, which is the Open Web Application Security Project and exists to help create a safer, more secure web.  Robin then mentions ZAP, which is a security testing tool produced by OWASP.  ZAP is the Zed Attack Proxy and is a vulnerability scanner and intercepting proxy to help detect vulnerabilities in your web application.  Robin shows us a demo application he's built containing deliberate flaws, Bob's Discount Diamonds.  This is running on his local machine.  He then shows us a demo of the OWASP ZAP tool and how it can intercept all of the requests and responses made between the web browser and the web server, analysing those responses for vulnerabilities and weaknesses.  Finally, Robin shows us that the OWASP ZAP software contains a handy "fuzzer" capability which allows it to replay requests using lists of known data or random data - i.e. can replay sending login requests with different usernames/passwords etc.

The next grok talk was an introduction to the GDPR by John Price.  John introduces the GDPR, which is the new EU-wide General Data Protection Regulation and effectively replaces the older Data Protection Act in the UK.  GDPR, in a nutshell, means that users of data (mostly companies who collect a person's data) need to ask permission from the data owner (the person to whom that data belongs) for the data and for what purpose they'll use that data.  Data users have to be able to prove that they have a right to use the data that they've collected.  John tells us that adherence to the GDPR in the UK is not affected by Brexit as it's already enshrined in UK law and has been since April 2016, although it's not really been enforced up to this point.  It will start to be strictly enforced from May 2018 onwards.  We're told that, unlike the previous Data Protection Act, violations of the regulations carry very heavy penalties, of up to 20 million Euros or 4% of a company's annual turnover, whichever is greater.  There will be some exceptions to the regulations, such as for the police and military, but also exceptions for private companies too, such as a mobile phone network provider giving up a person's data due to an "immediate threat to life".  Some consent can be implied, so for example, entering your car's registration number into a web site for the purposes of getting an insurance quote is implied permission to use the registration number that you've provided, but the restriction is that the data can only be used for the specific purpose for which it was supplied.  GDPR will force companies to declare if data is sent to third parties.  If this happens, the company initially taking the data and each and every third-party that receives that data have to inform the owner of the data that they are in possession of the data.  GDPR is regulated by the Information Commissioner's Office in the UK.  Finally, John says that the GDPR may make certain businesses redundant.  He gives an example industry of credit reference agencies.  Their whole business model is built on non-consensual usage of data, so it will be interesting to see how GDPR affects industries like these.

After John's talk, there was a final grok talk however, I needed a quick restroom break before the main sessions of the afternoon, so headed off for my restroom break before making my way back to the room for the first of the afternoon's sessions.  This was Matt Ellis's How To Parse A File.

Matt starts his session by stating that his talk is all about parsing files, but he immediately says, "But, don't do it!"  He tells us that it's a solved problem and that we really shouldn't be writing code to parse files by hand ourselves; we should just use one of the many excellent libraries out there instead.  Matt does discuss why you might decide you really need to parse files for yourself.  Perhaps you need better speed and efficiency, or maybe it's to reduce dependencies or to parse highly specific custom formats.  It could even be parsing of things that aren't files at all, such as HTTP headers, standard output etc.  From here, Matt mentions that he works for JetBrains and that the introduction of simply parsing a file is a good segue into talking about some of the features that can be found inside many of JetBrains' products.

Matt starts by looking at the architecture of many of JetBrains' IDEs and developer tools such as ReSharper.  They're built with a similar architecture and they all rely on a layer that they call the PSI layer.  The PSI layer is responsible for parsing, lexing and understanding the user code that the tool is working on.  Matt says that he's going to use the Unity framework to show some examples throughout his session and that he's going to attempt to build up a syntax tree for his Unity code.  We first look at a hand-rolled parser, one that attempts to understand the code by observing a single character at a time.  It's a very laborious approach and prone to error, so this is an approach to parsing that we shouldn't use.  Matt tells us that the best approach, which has been "solved" many times in the past, is to employ the services of a lexer.  This is a processor that turns the raw code into meaningful tokens based upon the words and vocabulary of the underlying code or language and gives structure to those tokens.  It's from the output of the lexer that we can more easily and robustly perform the parsing.  Lexers are another solved problem, and many lexer generators already exist for popular programming languages, such as lex, CsLex, FsLex, flex, JFlex and many more.

Lexer generators produce source code for the lexer, but it's not really human-readable code.  It's similar to how .NET language code (C# or VB.NET) is first compiled to Intermediate Language prior to being JIT'ed at runtime.  The token stream output by the lexer is read by the parser, and from there the parser can try to understand the grammar and structure of the underlying code via syntactical analysis.  This often involves the use of Regular Expressions in order to match specific tokens or sets of tokens.  This works particularly well as Regular Expressions can be translated into a state machine and, from there, into a transition table.  Parsers understand the underlying code that they're designed to work on, so for example, a parser for C# would know that in a class declaration there would be the class name, preceded by a token indicating the scope of the class (public, private etc.).  Parsing is not a completely solved problem.  It's more subjective, so although solutions exist, they're more disparate and specific to the code or language that they're used for and, therefore, they're not a generic solution.
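To make that a little more concrete, here's a rough, illustrative sketch of my own (not code from Matt's talk) of a tiny regular-expression-based lexer in C# that turns a line of source text into a stream of tokens:

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public enum TokenKind { Keyword, Identifier, Number, Symbol, Whitespace }

public record Token(TokenKind Kind, string Text);

public static class TinyLexer
{
    // Each token kind is matched by a named group; earlier alternatives win
    // when two patterns could match the same text.
    private static readonly Regex TokenPattern = new Regex(
        @"(?<Keyword>\b(class|public|private|void)\b)|" +
        @"(?<Identifier>[A-Za-z_]\w*)|" +
        @"(?<Number>\d+(\.\d+)?)|" +
        @"(?<Symbol>[{}();=+\-*/.])|" +
        @"(?<Whitespace>\s+)");

    public static IEnumerable<Token> Tokenise(string source)
    {
        foreach (Match match in TokenPattern.Matches(source))
        {
            foreach (TokenKind kind in (TokenKind[])Enum.GetValues(typeof(TokenKind)))
            {
                if (match.Groups[kind.ToString()].Success)
                {
                    yield return new Token(kind, match.Value);
                    break;
                }
            }
        }
    }
}

A real lexer for a real language has far more token kinds and far better error handling, of course, but the principle of regular expressions driving a token stream is the same.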

Matt tells us how parsing can be done either top-down or bottom-up.  Top-down parsing starts at the highest level construct of the language, for example at a namespace or class level in C#, and it then works its way down to the lower level constructs from there - through methods and the code and locally scoped variables in those methods.  Bottom-up parsing works the opposite way around, starting with the lower level constructs of the language and working back up to the class or namespace.  Bottom-up parsers can be beneficial over top-down ones as they have the ability to utilise shift-reduce algorithms to simplify code as it's being parsed.  Parsers can even be "parser combinators".  These are parsers built from other, simpler, parsers, where the input to the next parser in the chain is the output from the previous parser in the chain, closely related to recursive descent parsing.  .NET's LINQ acts in a similar way to this.  Matt tells us about FParsec, a parser combinator library for F#, along with a C# parser combinator called Sprache, which itself relies heavily on LINQ:

using System.Linq;
using Sprache;   // parser combinator library, available via NuGet

// Parses an identifier: a letter followed by letters/digits, ignoring surrounding whitespace
Parser<string> identifier =
    from leading in Parse.WhiteSpace.Many()
    from first in Parse.Letter.Once()
    from rest in Parse.LetterOrDigit.Many()
    from trailing in Parse.WhiteSpace.Many()
    select new string(first.Concat(rest).ToArray());

var id = identifier.Parse(" abc123  ");

Assert.AreEqual("abc123", id);

Matt continues by asking us to consider how parsers will deal with whitespace in a language.  This is not always as easy as it sounds, as some languages, such as F# or Python, use whitespace to give semantic meaning to their code, whilst other languages such as C# use whitespace purely for aesthetic purposes.  In dealing with whitespace, we often make use of a filtering lexer.  This is a simple lexer that specifically detects and removes whitespace prior to parsing.  The difficulty then is that, for languages where whitespace is significant, we need to re-insert the removed whitespace after parsing.  This, again, can be tricky as the parsing may alter the actual code (i.e. in the case of a refactoring), so we must again be able to understand the grammar of the language in order to re-insert whitespace into the correct places.  This is often accomplished by building something known as a Concrete Parse Tree as opposed to the more usual Abstract Syntax Tree.  Concrete Parse Trees work in a similar way to a C# Expression Tree, breaking down code into a hierarchical graph of individual code elements.
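As a rough illustration of the idea (again, my own sketch rather than anything from the talk, reusing the hypothetical Token/TokenKind types from the earlier snippet), a filtering lexer can simply wrap another token stream, stripping whitespace tokens out whilst remembering where they were so they can be restored later:

using System.Collections.Generic;

// Wraps an existing token stream and strips whitespace tokens, remembering
// their positions so they can be re-inserted after parsing/refactoring.
public class WhitespaceFilteringLexer
{
    private readonly IEnumerable<Token> _inner;
    private readonly List<(int Position, Token Token)> _removed = new List<(int Position, Token Token)>();

    public WhitespaceFilteringLexer(IEnumerable<Token> inner) => _inner = inner;

    public IEnumerable<Token> Tokens()
    {
        var position = 0;
        foreach (var token in _inner)
        {
            if (token.Kind == TokenKind.Whitespace)
                _removed.Add((position, token));   // remember it rather than lose it
            else
                yield return token;
            position++;
        }
    }

    // Whitespace recorded here can be replayed to rebuild the original layout.
    public IReadOnlyList<(int Position, Token Token)> RemovedWhitespace => _removed;
}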

Matt tells us about other jobs a lexer can do, such as disambiguating specific declarations in the language.  For example, in F#, typing 2. would represent a floating point number, whereas typing [2..0] would represent a range.  When the user is only halfway through typing, how can we know if they require a floating point number or a range?  There are also such things as comments within comments, for example: /* This /* is */ valid */  This is something that lexers can be good at, as such matching is difficult or impossible with regular expressions.
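Nested comments are a nice example of why: matching them needs a counter, which a plain regular expression can't express.  A rough sketch of my own (not from the talk):

// Finds the end of a (possibly nested) /* ... */ comment starting at 'start',
// returning the index of the final '/' or -1 if the comment is unterminated.
public static int FindCommentEnd(string source, int start)
{
    var depth = 0;
    for (var i = start; i < source.Length - 1; i++)
    {
        if (source[i] == '/' && source[i + 1] == '*') { depth++; i++; }
        else if (source[i] == '*' && source[i + 1] == '/')
        {
            depth--; i++;
            if (depth == 0) return i;
        }
    }
    return -1;
}

For the example above, /* This /* is */ valid */, the depth counter correctly treats the whole thing as one comment rather than stopping at the first */.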

The programs that use lexers and parsers can often have very different requirements, too.  Compilers using them will generally want to compile the code, and so they'll work on the basis that the program code they're lexing/parsing is assumed correct, whilst IDEs take exactly the opposite approach.  After all, most of the time whilst we're typing, our code is in an invalid state.  Programs that assume the code is in an invalid state most of the time often use techniques such as error detection and recovery.  This is, for example, to prevent your entire C# class from being highlighted as invalid within the IDE just because the closing brace character is missing from the class declaration.  They perform error detection on the missing closing brace, but halt highlighting of the error at the first "valid" block of code immediately after the matching opening brace.  This is how the Visual Studio IDE is able to highlight only the missing closing brace as invalid and not the entire file full of otherwise valid code.  In order for this to be performant, lexers in such programs will make heavy use of caching to prevent having to continually lex the entire file with every keystroke.

Finally, Matt talks about how JetBrains often need to also deal with "composable languages".  These are things like ASP.NET MVC's Razor files, which are predominantly comprised of HTML mark-up, but which can also contain "islands" of C# code.  For this, a similar approach is taken to dealing with whitespace, in that the file is lexed for both languages, HTML and C#, and the HTML is temporarily removed whilst the C# code is parsed and possibly altered.  The lexed tokens from both the C# and the preserved HTML are then re-combined after the parsing to re-create the file.

After Matt's session, there was one final break before the last session of the day.  Since there was, unfortunately, no coffee served at this final break, I made my way directly to the room for my next and final session, Joe Stead's .NET Core In The Real World.

Joe starts his talk by announcing that there's no code or demos in his talk, and that it will really just be about his own personal experience of attempting to migrate a legacy application to .NET Core.  He says that he contemplated doing a simple "Hello World" style demo for getting started with .NET Core, but that it would give a false sense of .NET Core being simple.  In the real world, and when migrating an older application, it's a bit more complicated than that.

Joe mentions the .NET Standard and reminds us that it's a different thing than .NET Core.  .NET Core does adhere to the .NET Standard and Joe tells us that .NET Standard is really just akin to Portable Class Libraries Version 2.0.

Joe introduces the project that he currently works on at his place of employment.  It's a system that started life in 2002 and was originally built with a combination of Windows Forms applications and ASP.NET Web Forms web pages, sprinkled with Microsoft AJAX JavaScript.  The system was in need of being upgraded in terms of the technologies used, and so in 2012 they migrated to KnockoutJS for the front-end websites and, in 2013, to further aid the transition to KnockoutJS, they adopted the NancyFX framework to handle the web requests.  Improvements in the system continued, and by 2014 they had started to support the Mono Framework and had moved from Microsoft SQL Server to a PostgreSQL database.  This last lot of technology adoptions was to support the growing demand for a Linux version of their application from their user base.  The adoptions didn't come without issues, however, and by late 2014 they started to experience serious segfaults in their application.  After some diagnosis, during which they never did fully get to the bottom of the root cause of the segfaults, they decided to adopt Docker in 2015 as a means of mitigating the segfault problem.  If one container started to display problems associated with segfaults, they could kill the container instance and create a new one.  By this point, it was 2015 and they decided that they'd start to look into .NET Core.  It was only in beta at this time, but they were looking for a better platform than Mono that might provide some much needed stability and consistency across operating systems.  And since they were on such a roll with changing their technology stacks, they decided to move to Angular2 on the front-end, replacing KnockoutJS, in 2016 as well!

By 2017, they'd adopted .NET Core v1.1 along with RabbitMQ and Kubernetes.  Joe states that the reason for .NET Core adoption was to move away from Mono.  By this point, they were not only targeting Mono, but a custom build of Mono that they'd had to fork in order to try to fix their segfault issues.  They needed much more flexible deployments such as the ability to package and deploy multiple versions of their application using multiple different versions of the underlying .NET platform on the same machine.  This was problematic in Mono, as it can be in the "full" .NET Framework, but one of the benefits of .NET Core is the ability to package the run-time with your application, allowing true side-by-side versions of the run-time to exist for different applications on the same machine.

Joe talks about some of the issues encountered when adopting and migrating to .NET Core.  The first issue was missing APIs.  .NET Core 1.0 and 1.1 were built against .NET Standard 1.x and so many APIs and namespaces were completely missing.  Joe also found that many NuGet packages that his solution depended upon had not yet been ported across to .NET Core.  Joe recalls that testing of the .NET Core version of the solution was a particular challenge as few other people had adopted the platform and the general response from Microsoft themselves was that "it's coming in version 2.0!".  What really helped save the day for Joe and his team was that .NET Core itself and many of the NuGet packages were open source.  This allowed them to fork many of the projects that the NuGet packages were derived from and help with transitioning them to support .NET Core.  Joe's company even employed a third party to work full time on helping to port NancyFX to .NET Core.

Joe now talks about the tooling around .NET Core in the early days of the project.  We examine how Microsoft introduced a whole new project file structure, moving away from the XML representation in the .csproj files and over to a JSON representation with project.json.  Joe explains how they had to move their build script and build tooling to the FAKE build tool as a result of the introduction of project.json.  There were also legal issues around using the .NET Core debugger assemblies in tools other than Microsoft's own IDEs, something that the JetBrains Rider IDE struggled with.  We then look at tooling in the modern world of .NET Core: project.json has gone away and the .csproj files have returned, although they're much simplified and improved.  This allows the use of MSBuild again; FAKE itself now also has native support for .NET Core.  The dotnet CLI tool has improved greatly and the legal issues around the use of the .NET Core debugging assemblies have been resolved, allowing third-party IDEs such as JetBrains Rider to use them again.

Joe also mentions how .NET Core, with the introduction of version 2.0, is now much better than the Mono Framework when it comes to targeting multiple run-times.  He also mentions issues that plagued their use of libcurl on the Mac platform when using .NET Core 1.x, but these have now been resolved in .NET Core 2.0 as it now uses the native macOS implementation rather than trying to abstract that away and use its own implementation.

Joe moves on to discuss something that's not really specific to .NET Core, but is a concern when developing code to be run on multiple platforms.  He shows us the following two lines of code:

TimeZoneInfo.FindSystemTimeZoneById("Eastern Standard Time");
TimeZoneInfo.FindSystemTimeZoneById("America/New_York");

He asks which is the "correct" one to use.   Well, it turns out that they're both correct.  And possibly incorrect!   The top line works on Windows, and only on Windows, whilst the bottom line works on Linux, and not on Windows.  It's therefore incredibly important to understand such differences when targeting multiple platforms with your application.  Joe also says how, as a result of discrepancies such as the timezone issue, the tooling can often lie.  He recalls a debugging session where one debugging window would show the value of a variable with one particular date time value, and another debugging window - in the exact same debug session - would interpret and display the same variable with an entirely different date time value.  Luckily, most of these issues are now largely resolved with the stability that's come from recent versions of .NET Core and the tooling around it.
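One defensive approach to this particular problem (my own sketch, not something Joe showed) is to try the Windows identifier first and fall back to the IANA identifier, so the same code runs on both platforms; alternatively, libraries such as the TimeZoneConverter NuGet package exist to do the mapping for you:

using System;

public static class CrossPlatformTime
{
    // Tries the Windows time zone ID first, then falls back to the IANA ID.
    // Both refer to the same zone; which one resolves depends on the OS.
    public static TimeZoneInfo GetEasternTime()
    {
        try
        {
            return TimeZoneInfo.FindSystemTimeZoneById("Eastern Standard Time"); // Windows
        }
        catch (TimeZoneNotFoundException)
        {
            return TimeZoneInfo.FindSystemTimeZoneById("America/New_York");      // Linux and macOS
        }
    }
}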

In wrapping up, Joe says that, despite the issues they encountered, moving to .NET Core was the right thing for him and his company.  He does say, though, that for other organisations such a migration may not be the right decision.  Each company, and each application, needs to be evaluated for migration to .NET Core on its own merits.  For Joe's company, the move to .NET Core allowed them to focus attention elsewhere after migration.  They've since been able to adopt Kubernetes, and they've been able to improve and refactor code to implement better testing and many more long overdue improvements.  In the last month, they've migrated again, from .NET Core 1.1 to .NET Core 2.0, which was a relatively easy task after the initial .NET Core migration - it only involved upgrading a few NuGet packages and that was it.  The move to .NET Core 2.0 also allowed them to re-instate lots of code and functionality that had been temporarily removed, thanks to the vastly increased API surface area of .NET Core 2.0 (really, .NET Standard 2.0).

After Joe's session, it was time for all the attendees to gather in the main foyer area of the university building for the final wrap-up and prize draws.  After thanks to the sponsors, the venue, and the organisers and volunteers - without whom, of course, events such as DDD simply wouldn't be able to take place - we moved on to the prize draw.  Unfortunately, I wasn't a winner, however the day had been brilliant.

Another wonderful DDD event had been and gone, but a great day was had by all.  We were told that the next DDD event was to be DDD Dublin, held sometime around March 2018.  So there's always that to look forward to.


NDepend In Review


Back in September of this year, I was contacted by Patrick Smacchia, the lead developer from the NDepend team.  If you're not familiar with NDepend, it's a static analysis tool for .NET.  Patrick had asked me if I would be interested in reviewing the software.  I don't normally write software reviews on my blog, but on this occasion, I was intrigued.  I'm a long time user of ReSharper in my day to day work, and ReSharper offers some amount of static analysis of your code whilst you write it.  I'd also been watching certain open source projects that had appeared since the introduction of the Roslyn compiler technology, such as RefactoringEssentials, Code Cracker & SonarAnalyzer.  I've been thinking a lot about static analysis of code recently, especially since I discovered Connascence and started using that as a way to reason about code.

I had previously heard of the NDepend tool, but had never used it, so when Patrick contacted me and asked if I'd like to review his static analysis tool, I was more than happy to do so.

Full Disclosure:
NDepend is a commercial tool with a trial period.  Patrick agreed to provide me with a complimentary professional license for the software if I wrote a review of the software on my blog.  I agreed on the condition that the review would be my own, honest opinion of the software, good or bad.  What follows is that personal opinion.

This is a fairly long blog post so if you want the TL;DR skip to the end and the "Conclusion" section, but I suggest that you do read this whole post to better understand my complete experience using the NDepend product.

 

Getting Started

The first thing to getting up and running with NDepend was to download the software from the NDepend website.  This was easy enough, simply by following the relevant links from the NDepend homepage.  The version of NDepend that is current as I write this is v2017.3.2.  As I already had a license, I was able to enter my license key that had been previously supplied to me via email and begin the download.  NDepend comes as a .zip archive file rather than an MSI or other installer program.  This was somewhat unusual as a lot of software is delivered via a setup installer package these days, but I'm an old-school, command line kind of guy and I liked that the program was supplied as a .zip file that you simply extract into a folder and go.  After having done that, there's a couple of ways you can launch NDepend.  There's a few executable files in the extracted folder, so it's not immediately obvious what you execute first, but NDepend does have a helpful getting started guide on their website that's easy enough to follow along with.

For usage on a development machine, you're likely to be using either the stand alone Visual NDepend IDE (VisualNDepend.exe) or, perhaps more likely, the Visual Studio plugin for interacting with NDepend (installed by running the NDepend.VisualStudioExtension.Installer.exe installer).  As well as being used on a developer's machine, NDepend can also run on your build machine, and this is most likely to be integrated into your build process via the NDepend command line tool (NDepend.Console.exe). 

NDepend also ships with an executable called NDepend.PowerTools.exe and its source code in the NDepend.PowerTools.SourceCode folder.  This is a kind of optional extra utility which is run from the command line and contains a number of predefined metrics that can be run against some .NET code.  NDepend also provides an API with which we can integrate and use the NDepend functionality from our own code.  Many of the metrics of the NDepend PowerTools are also used within the "main" NDepend tool itself, but as the source code for the PowerTools utility is supplied, we can see exactly how those metrics are calculated and exactly how the various statistics around the code that NDepend is analysing are gathered from the NDepend API.  In essence, as well as providing some handy metrics of its own, the NDepend PowerTools also serves as a kind of demonstration code for how to use the NDepend API.

 

Launching NDepend For The First Time

After installing the Visual Studio plugin and launching Visual Studio, I loaded in a solution of source code that I wanted to analyse.  I opted to initially launch NDepend via the installed Visual Studio plugin as I figured that is the place I'd be interacting with it most.  NDepend installs itself as an extra menu option in the Visual Studio menu bar, much like other extensions such as ReSharper do.  When you first view this menu on a new solution, most of the menu options are greyed out.  Initially I was slightly confused by this, but soon realised that just as you have a Visual Studio solution and one or more project files for your own code, so too does NDepend have its own project files, and you need to create one of these first before you can perform any analysis of the currently loaded solution.  NDepend helpfully allows this with one simple click on the "Attach New NDepend project to current VS Solution" menu item.  Once selected, you're asked which assemblies within your solution you wish to have analysed and then NDepend will go off and perform an initial analysis of your code.

NDepend uses both your solution's source code as well as the compiled assemblies to perform its analysis.  Most of the analysis is done against the compiled code and this allows NDepend to provide statistics and metrics of the IL (or Common Intermediate Language) that your code will compile to as well as statistics and metrics of other code areas.  NDepend uses the solution's source code during analysis in order to gather metrics relating to things not present inside the IL code such as code comments and also to gather statistics allowing NDepend to locate code elements such as types, methods and fields when these are shown as part of a query or search (see later for further details).

Once the analysis is complete, you'll get a pop-up window informing you of the completion of the initial analysis and asking you what to do next.  You'll probably also get a new tab open up in your default browser containing the NDepend Analysis Report.  It was at this point that I was initially quite confused.  Having multiple things pop up after the analysis was complete was a little startling and quite overwhelming.  One thing I'd rather have seen is for the browser-based report to not be loaded initially, but instead for there to be a button to click ("View Analysis Report" or something similar) within the pop-up window.  This way, only one "pop-up" appears, which to me is a lot less jarring.

The analysis popup allows you to view the NDepend dashboard, the interactive graph or the NDepend Code rules.  I hadn't read the online documentation, so it's probably mostly my own fault, but I'm the kind of person who just likes to dive straight in to something and try to figure things out for myself, but at this point, I was quite stuck.  I didn't quite know what to do next.  The help text on the popup indicates that the Dashboard is the best place to start, and so I selected that and was greeted with the following screen:

Now, this is an incredibly informative dashboard, with an awful lot of information contained within, but for me (and again, quite possibly my own fault for not reading the documentation more thoroughly) this was even more confusing.  There's a few things on the dashboard that seem to be obvious and make sense, such as the number of lines of code, the number of lines of comments and the percentage compared to actual code, but a lot of the other information around the debt level, quality gates and rules didn't seem to make much sense to me at all.  I decided I'd look at a few of the other options that were available in the popup, however that initial popup had now disappeared, so I had to go hunting through the NDepend menus.  Under the "Graph" menu, I found the "View Dependency Graph" option which seemed to coincide with one of the buttons I'd seen on the initial post-analysis pop-up.  Sure enough, opting to view the dependency graph showed me a nice UI of the various assemblies within my solution and how they were all related.

The Dependency Graph is a very nice way to get an overview of your solution.  It should be familiar to developers who have used the Enterprise SKU of Visual Studio and its various Architecture features, specifically the Code Map.  What's great about NDepend's Dependency Graph is the ability to alter the box size for the assemblies based upon various metrics for the assembly.  By default, it's based upon lines of code for the assembly, but can be changed to Cyclomatic Complexity, In/Out Edges, two types of coupling and overall "ranking", to name but a few.  The thickness of the arrows can also be altered based upon various metrics of the code such as namespaces, types, methods etc.  This gives the Dependency Graph view a great amount of power in seeing an overview of your solution, and specifically in seeing where complexity and potential issues with the code may lie.

An alternative view of the dependency data of your solution can be found with the dependency matrix view.  This shows assemblies by name duplicated across two axes and showing the assembly metrics where the two assemblies intersect.

The assembly matrix view is very handy to use on larger solutions that may contain dozens of different assemblies as the Dependency Graph view of such a solution can be quite busy and difficult to follow due to being overloaded with information.  Another very useful visual representation of solution and project metrics is the Code Metrics View (aka the Treemap Metrics View).  This again shows all assemblies in the solution but this time as a series of coloured boxes.  Each box is further sub-divided into more coloured boxes of various sizes based upon the chosen metric.

The entire collection of boxes can be viewed at assembly, namespace, method, type or even field level with the size of the main boxes being determined by a similar set of metrics as the Dependency Graph, albeit with more options to choose from for method level metrics.  The colour of the boxes can be determined based upon a similar set of metrics as the size and is perhaps most frequently used to determine code coverage of the chosen element.  All of which gives a huge combination of options by which to gain a 100-foot view of the overall solution.  What's most interesting here is that the entire view can be based upon a specific "filter" or "query" performed over the entire codebase.

A "filter" or "query" over the codebase?  Well, this is where something called CQLinq comes in, and it's the very heart of the NDepend product.

 

Understanding NDepend

It took me a while to overcome the initial confusion and feeling of being overwhelmed by NDepend when I first started using it, however, after some time, I started to understand how the product hangs together, and it all starts with the foundation of everything else that NDepend offers, be that dependency graphs, code metrics, quality rules or other features.  The foundation of all of the impressive metrics and data that you can get from NDepend is CQLinq.

CQLinq stands for Code Query LINQ and is at the heart of NDepend's features and functionality; it is, essentially, a domain-specific language for code analysis.  It's based upon LINQ, the Language Integrated Query functionality that's been part of the .NET Framework since version 3.5.  Any developer who has used LINQ to query in-memory objects will know just how powerful and flexible a tool it can be.  It's even more powerful when LINQ is used with a LINQ provider that can target some collection of external data, such as a database.  LINQ allows us to perform incredibly powerful filters, aggregations, projections and more, all from either a SQL-like syntax or a very intuitive fluent chaining syntax.  For example:

var totalOutstanding = Customers.Where(c => c.Type == 1).Sum(c => c.Balance);

shows a simple but powerful query allowing the developer to, in a single line of code, get the total sum of all balances for a given type of customer from an arbitrarily sized collection of customers, based upon properties of the customer objects within that collection.  This is just one example of a simple LINQ query, and far more sophisticated queries are possible without adding too much additional complexity to the code.  The data here, of course, is the collection of customers, but what if that data was your source code?   That's where CQLinq comes in.

CQLinq takes your source code and turns it into data, allowing you, the developer, to construct a Linq query that can filter, aggregate and select metrics and statistics based upon viewing your code as data.  Here's a fairly simple CQLinq query:

from a in Application.Assemblies 
where a.NbLinesOfCode >= 0 
orderby a.NbLinesOfCode descending 
select new { a, a.NbLinesOfCode }

Here we're saying: look at all the assemblies within the application whose line count is zero or more (effectively all of them), order those assemblies by their count of lines of code descending, then select a projected anonymous type containing the assembly itself and its count of lines of code.  The result is a list similar to the following:

MyApp.Admin			8554
MyApp.Core			6112
MyApp.Data			4232
MyApp.DataAccess	3095
MyApp.Services		2398

So, even with this simple query, we can see the expressive power of a simple CQLinq query and the aggregate data we can obtain from our source code.

Now, the real power of CQLinq lies in the custom objects and properties that it makes available for you to use.  In the query above we can see the first line is selecting an a from Application.Assemblies.  Application is an object provided by NDepend, and Assemblies is a collection of objects associated with the Application.  Further in the query, we can see that the variable a, that represents an assembly, has a property called NbLinesOfCode.  This custom property is also provided by NDepend and allows us to know how many lines of actual code the assembly contains.

NDepend, and CQLinq, provide a large number of custom objects (NDepend calls them custom domains) and properties out of the box giving insight into all sorts of different metrics and statistics on the source code that has been analysed by NDepend.  Custom objects/domains such as Assemblies, Types, Methods, Namespaces and custom properties such as IsClass, DepthOfInheritance, CouldBePrivate, CyclomaticComplexity, NbLinesOfCode, MethodsCalled, MethodsCallingMe, to give only the very briefest of overviews, contain the real power of CQLinq, and ultimately, NDepend.
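As a flavour of what these enable, here's a small query of my own - treat it as illustrative rather than one of NDepend's shipped rules - that combines a few of the properties mentioned above to find larger methods that could be made private, ordered by complexity:

from m in JustMyCode.Methods
where m.CouldBePrivate && m.NbLinesOfCode > 10
orderby m.CyclomaticComplexity descending
select new { m, m.NbLinesOfCode, m.CyclomaticComplexity, m.MethodsCallingMe }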

Once you come to appreciate that underneath the surface of all of the other features of NDepend - such as the large array of metrics shown on the NDepend Dashboard and the NDepend Analysis Report - is one or more CQLinq queries slicing and dicing the data of your source code, everything starts to become that bit more clear!  (Or at least, it did with me!)

 

Code Rules, Quality Gates & Metrics

Having understood that most of the power of NDepend stems from the data provided by CQLinq queries, we can revisit the various metrics on the Dashboard.  Here, we can see that amongst the most obvious metrics such as lines of code and the number of assemblies, types and namespaces etc. there are more interesting metrics regarding Quality Gates, Rules and Issues.

If we click on the numbers next to the Quality Gates that have either failed, warned or passed, NDepend opens up a Queries and Rules explorer window and a Queries and Rules Edit window.

Within the Queries and Rules Explorer, we can see a treeview of a number of different rules, grouped together by category.  These are code quality rules that ship out of the box with NDepend, and include such categories as Code Smells, Architecture, Naming Conventions and many others.  The icons next to the various category groups show us whether our analysed source code has failed (i.e. violates a rule), is considered a warning, or passes for each rule defined.  If we click on a rule category within the treeview, we see the individual rules from that category in the window to the right.  This shows us how much of our code has matched the fail/warn/pass states.  Clicking on the individual rule name will cause the Queries and Rules Edit window to show the details for the chosen rule.  Note that, by default and for me at least, the Queries and Rules Explorer window was docked along the bottom of the main Visual Studio window, whilst the Queries and Rules Edit window was docked to the right of the main Visual Studio window (as a new tab along with my Solution Explorer).  It was initially confusing as I hadn't noticed one window being updated when interacting with the other, but once you're aware of this, it becomes much easier to understand.  One other quirk of the Queries and Rules Explorer window is that there doesn't appear to be any way to search for a specific rule within the many rules contained in the various category groups, and I found myself sometimes manually expanding each of the category groups in turn in order to find a rule I'd previously been examining.  It would be great if the ability to search within this window were introduced in a future version of NDepend.

The Queries and Rules Edit window is divided into a bottom section that shows the various parts of your source code (the assembly, type or method), grouped in a treeview, that are matched by the chosen rule.  In the top section, we can see the rule name and a textual description of the rule itself.  For example, when looking at one of the Code Smells category rules, Avoid Types Too Big, we can see that the description states, "This rule matches types with more than 200 lines of code. Only lines of code in JustMyCode methods are taken account.  Types where NbLinesOfCode > 200 are extremely complex to develop and maintain."

Hang on.  NbLinesOfCode?   We've seen that before.  Sure enough, clicking the "View Source Code" button at the top of the top section of the Queries and Rules Edit window changes the top section view into something else.  A CQLinq query!

Note that if the Queries and Rules Edit window is docked to the right in its default position, the overall window can appear a little cramped.  For me, this made it a little unclear exactly what I was looking at until I undocked the window and expanded its size.

Each and every one of the more than 200 rules that ship out of the box with NDepend is based upon a CQLinq query.  What's more, each and every one of these queries can be edited.  And you can create your own category groups and rules by defining your own CQLinq queries.  Think about that for a moment, as this is really what NDepend is providing you with: lots and lots of pre-defined query and examination power right out of the box, but, more importantly, a full-featured, fluent and highly intuitive Linq-To-Code (for want of a better expression) provider that you can make full use of to perform all manner of examinations on your own source code.

Here's the pre-defined CQLinq query for determining which methods are in need of refactoring:

warnif count > 0 from m in JustMyCode.Methods where 
  m.NbLinesOfCode > 30 ||           
  m.CyclomaticComplexity > 20 ||    
  m.ILCyclomaticComplexity > 50 ||  
  m.ILNestingDepth > 5 ||           
  m.NbParameters > 5 ||             
  m.NbVariables > 8 ||              
  m.NbOverloads > 6                 

select new { m, m.NbLinesOfCode, m.NbILInstructions, m.CyclomaticComplexity, 
             m.ILCyclomaticComplexity, m.ILNestingDepth, 
             m.NbParameters, m.NbVariables, m.NbOverloads }

As can be seen above, the query makes extensive use of the built-in computed properties, all of which are very intuitively named, to gather a list of methods throughout the solution - specifically methods from "JustMyCode", meaning that NDepend helpfully filters out all methods belonging to external frameworks and libraries - that fall afoul of the rules of this query and are therefore prime candidates for refactoring.  And, of course, if you don't like that NDepend only considers methods with more than 30 lines of code to be in need of refactoring, you can always edit the query and change that value to any number that better fits you and your team.

When running NDepend from the Visual Studio plug-in, you get the ability to click on any of the results from the currently selected query/rule in the Queries and Rules Edit window and be taken directly to the type or method detected by the rule.  This is a really powerful and easy way to navigate through your code base to find all of the specific areas of code that may be in need of attention.  I did, however, notice that not all matched code elements can be easily navigated to (see below for more detail).

NDepend also has extensive tooltip windows that can pop-up giving all manner of additional information about whatever you've selected and is currently in context.  For example, hovering over results in the Query and Rules Edit window shows a large yellow tooltip containing even more information about the item you're hovering over:

The "View Source Code" button within the Queries and Edit window acts as a toggle switch between viewing the source code of the CQLinq query and the textual description of the query.  The textual description often also includes text indicating "How To Fix" any of your code that violates the rule that the query relates to.  This can be a very handy starting point for identifying the first steps to take in refactoring and improving your code.

When editing CQLinq queries, the Queries and Rules Edit window provides full intellisense whilst editing or writing query syntax, as well as even more helpful tooltip windows that give further help and documentation on the various CQLinq objects and properties used within the query.

NDepend allows individual rules to be designated as "Critical" rules and this means that any code found that falls afoul of a critical rule will result in a "failure" rather than a "warning".  These are the rules that you can define (and many are predefined for you, but you can always change them) which should not be violated and often such rules are included as part of NDepend's Quality Gates.

Quality Gates are the way that NDepend determines an overall status of whether the code rules upon which the quality gate is based have passed or failed.  A Quality Gate is something that, like the code rules themselves, can be used as-is out of the box, be modified from an existing one, or be completely defined by yourself.  Quality gates can rely on one or more code rules in order to determine if the analysed source code is considered good enough to be released to production, so whereas the individual rules themselves will return a list of types, methods or other code artifacts that have matched the query, quality gates will return a pass, warn or fail status that can be used as part of an automated build process in order to fail the build - similar to how compilation errors will fail the compilation whereas warnings won't.  Very powerful stuff.  Quality gates are defined using the same CQLinq syntax we've seen before:

// <QualityGate Name="Percentage Code Coverage" Unit="%" />
failif value < 70%
warnif value < 80%
codeBase.PercentageCoverage

Note that the first comment line is what actually defines this query as a quality gate, and the next two lines show the threshold values from the query that trigger the failure and warning statuses respectively.

As well as code rules and quality gates, NDepend contains other metrics such as a SQALE Debt Ratio and other debt ratings.  The debt here is the technical debt that may exist within the code base.  Technical debt is defined as the implied cost of rework/refactoring of parts of the solution in order to fix elements of code that are considered to be of lower quality.  The SQALE ratio is a well defined, largely industry-accepted mechanism for calculating the level of debt within a software project and is expressed as a percentage of the estimated technical debt compared to the estimated effort it would take to rewrite the code element from scratch.  NDepend uses the SQALE ratio to show a "Debt Rating", expressed as a letter from A (the least amount of debt) through to E (the most amount of debt), that is attached to the overall solution as well as to individual elements of code such as types and methods.  This debt rating can be seen on many of the tooltips, such as the one shown when hovering over Queries and Rules Edit window results, as well as on the NDepend dashboard and Analysis Report.  There are many more metrics provided by NDepend, including a very interesting Abstractness vs Instability metric that's shown on the front page of the Analysis Report.  Speaking of which...

 

Back to the beginning

So, having taken a diversion to understanding the core foundations of what makes NDepend tick, we can go back to that dashboard and analysis report that we saw at the beginning just after we created a new NDepend project and ran an analysis on our code for the very first time.

The dashboard and report contain many of the same metrics, but whilst the dashboard is a window inside Visual Studio (or the separate Visual NDepend IDE), the report is an entirely standalone HTML report and so can be sent to, and viewed by, colleagues who may not have the requisite development tools on their machines.  This also means that the Analysis Report is the perfect artifact to be produced as part of your automated build process.  Many teams will often have a dashboard of some description that shows statistics and metrics on the build process and the quality of the code that's been built.  NDepend's Analysis Report can be easily integrated into such a dashboard, or even hosted on a website, as it's a static HTML file with some linked images.

The analysis report also contains static snapshot versions of the various graphical metrics representations that can be viewed inside NDepend such as the Dependency Graph, Dependency Matrix, Treemap Metric View and another interesting graph which plots Abstractness vs Instability.

This graph is an interesting one and can give a good idea of which areas of the analysed solution are in need of some care and attention but for very different reasons.

The graph plots each assembly within the solution against a diagonal baseline that runs from the top left corner of the graph to the bottom right.  Starting at the bottom right, as we move along the x-axis towards the left we get more "unstable", and moving up the y-axis towards the top we get more "abstract".  Abstractness is the proportion of the types within the assembly that are abstract classes or interfaces rather than concrete classes; a more abstract assembly is generally easier to change and more malleable.  Instability measures how easily such change can be made without breaking other code, derived by analysing the various inter-dependencies within that code.  It's a combination of two other metrics: afferent coupling and efferent coupling.  Afferent coupling is the number of external types that depend upon the given assembly, whilst efferent coupling is the number of external types that this assembly depends upon.  We want our assemblies to remain inside the diagonal green area of the graph.  If an assembly's code is high in abstractness but has very little depending upon it, we're headed towards the "zone of uselessness".  This isn't a place we want to be, but correcting this may be reasonably easy, since code being well within this area means that it doesn't do an awful lot and so could probably be fairly easily removed.  If, however, an assembly's code is both overly rigid in its dependence on concrete types and also very stable (i.e. it has a high amount of afferent coupling, with lots of other code depending upon it), we're headed towards the "zone of pain".  This is an area that we really don't want to be in, since code being well within this area means that it's probably doing a lot - maybe even too much - and it's going to be very difficult to change it without breaking things.
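Under the hood these are the classic Robert C. Martin package metrics; a minimal sketch of the standard definitions (my own illustration, not NDepend's actual implementation) looks something like this:

using System;

public static class PackageMetrics
{
    // Instability I = Ce / (Ca + Ce): 1 means nothing depends on the assembly, 0 means lots does.
    public static double Instability(int afferentCoupling, int efferentCoupling) =>
        (double)efferentCoupling / (afferentCoupling + efferentCoupling);

    // Abstractness A = abstract types / total types in the assembly.
    public static double Abstractness(int abstractTypes, int totalTypes) =>
        (double)abstractTypes / totalTypes;

    // Distance from the "main sequence" D = |A + I - 1|: 0 sits on the ideal diagonal, while
    // values near 1 drift towards the zone of pain (0,0) or the zone of uselessness (1,1).
    public static double DistanceFromMainSequence(double abstractness, double instability) =>
        Math.Abs(abstractness + instability - 1.0);
}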

Keeping an eye on each assembly's place within the Abstractness vs Instability graph can help to ensure your code stays at an appropriate level of abstraction, without being too abstract, and also that the code remains fairly well decoupled, allowing easier modification.  Metrics such as the Abstractness vs Instability graph and the Treemap Metrics View are some of my favourite features of NDepend as they present their data in a very easily consumable format that can be viewed and understood "at a glance" whilst being based on some very complex underlying data.

 

But wait, it gets better

Similar to the Queries and Edit window that allows us to define our own code queries and rules as well as modifying existing ones, NDepend includes a very powerful Search function. 

This function uses the same kind of interface as the Queries and Rules Edit window, but allows arbitrary queries to be performed against the analysed code, with the matching results showing immediately within the window.  NDepend's search functionality can be optionally constrained to methods, types, assemblies or namespaces, but by default will search across all code elements.  Searches can be based on simple text matching, regular expressions or even full CQLinq queries.  This makes NDepend's search functionality even more powerful than something like the "Go to anything/everything" functionality that's been a part of ReSharper for a long time and is now also a part of Visual Studio itself, as we can not only search for pieces of our code based on textual matches, but can also use the highly expressive power of NDepend's CQLinq syntax, objects and properties to search for code based upon such things as method size, code lacking unit tests, or code with no callers or too many callers.  As with the Queries and Rules Edit window results, we can double-click on any result and be navigated directly to the matched line of code.  One quirk I did notice when using the search functionality is that, when searching for certain code elements such as fields or interfaces, you must click on the containing method or type that's shown as part of the search results and not the actual interface or field name itself.  Clicking on these will cause NDepend to display various errors as to why it can't navigate directly to that element.  I suspect this is due to the fact that NDepend's analysis works on the compiled intermediate language and not directly against the source code, but since this is a code search function, it would be nice if such navigation were possible.

As well as NDepend's ability to analyse the source code of your solution, it can leverage other tools' artifacts and data to improve its own analysis.  One such major piece of data that NDepend can utilise is code coverage data.  There are a number of tools that can calculate the code coverage of your solution.  This is the amount of code within your solution that is "covered" by unit tests - that is, code that is executed as part of one or more unit tests.  Code coverage is built into Visual Studio itself at the highest SKU level, Enterprise, but is also provided by a number of other tools such as JetBrains' dotCover and NCover.  NDepend can import and work with the code coverage output data of all three of the aforementioned tools, although the files from the dotCover tool have to be provided to NDepend in a special format.  Once some code coverage data is added to an NDepend project, several additional metrics relating to code coverage become available within the complete suite of NDepend metrics, such as identifying code that should have a minimum level of coverage and code whose coverage percentage should never decrease over time, as well as other interesting metrics such as the amusingly titled C.R.A.P. metric.  This is the "Change Risk Analyzer and Predictor" metric and gives methods a score based upon their level of code coverage versus their complexity, with more complex methods requiring higher code coverage.  Having the ability to integrate code coverage into NDepend's metrics suite is another very nice touch and allows the full set of metrics for your solution to be centralised in a single location.  This also helps to keep automated builds fairly simple, without too many moving parts, whilst still providing an extensive amount of data on the build.
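For reference, the generally published C.R.A.P. formula combines cyclomatic complexity and coverage as below (my own sketch of the widely quoted definition, not necessarily the exact CQLinq that NDepend ships):

using System;

public static class CrapMetric
{
    // CRAP(m) = comp(m)^2 * (1 - coverage)^3 + comp(m)
    // where comp is the method's cyclomatic complexity and coverage is a fraction from 0 to 1.
    // A fully covered method scores just its complexity; an uncovered, complex method scores very highly.
    public static double Crap(int cyclomaticComplexity, double coverageFraction) =>
        Math.Pow(cyclomaticComplexity, 2) * Math.Pow(1.0 - coverageFraction, 3)
        + cyclomaticComplexity;
}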

We've mentioned a few times how NDepend can be integrated into your automated build process.  There's nothing new in having such tools be able to be integrated with such a fundamental part of any software team's daily processes, but whereas some tools simply give you the ability to run them from a command line and then leave you to figure out the exact return values from the executable and other output artifacts and how they might be integrated, NDepend provides some very comprehensive online documentation (even with links to that documentation from the NDepend menus inside of Visual Studio) giving very precise instructions for integrating NDepend into a large number of modern Continuous Integration tools, such as Visual Studio Team Services, TeamCity, Jenkins, FinalBuilder & SonarQube.  This documentation not only includes extensive step-by-step guides to the integration, but also tutorial and walkthrough videos.  This is a very nice touch.

Finally, we've touched upon this above, but NDepend has the ability to perform its analysis in a temporal way.  This essentially means that NDepend can analyse the same solution at different times and then compare the different sets of analysis results.  This allows the production of "trends" within the analysis data and is incredibly powerful and possibly one of the most useful features of the NDepend product.  We saw earlier how the Analysis Report and the NDepend dashboard contain metrics for such things as the debt rating of the solution, but upon closer examination, we can see that the debt rating and percentage are shown as either an increase or a decrease since the last time that the solution's debt was calculated:

Having such data available to us is incredibly powerful.  We can now track these metrics over time to see if our solution is generally improving or decreasing in quality - something that is very important for any software development team that must maintain a solution over a long period of time.  NDepend allows the setting of a "baseline" set of metrics data against which new sets of recently analysed data can be compared.  This gives us a powerful ability to not only compare our most recent set of analysis results with the immediately prior set of results, which if NDepend's analysis is integrated into a continuous build process could be very recent indeed, but also to compare our most recent set of results with those of a day ago, a week or a month ago, or even this time last year.  With this we can see not only how our code changes over small time frames, but over larger units of time too.  This ability to view metrics over time frames is helpfully built into many of the interfaces and windows within NDepend, so for example, the Dashboard contains a very useful filter at the top of the window allowing us to set the time range for the charts shown within the dashboard.  NDepend includes the ability to generate a full set of metrics around trends within the solution, so we can, for example, track such things as how many "issues" identified by an earlier NDepend analysis have been fixed as well as how many new issues have been introduced, and many of the built-in rules that NDepend defines are specifically based upon changes to the code over time.  For example, there is a complete category group, "Code Smells Regression", that contains many rules starting with the name "From now...".  These specific rules will be broken if code quality for the specific attribute measured falls over time, from one analysis run to the next.  This helps to ensure that code is not only improved in quality, but stays that way.  NDepend doesn't stop there with its ability to view changes over time, and the CQLinq query language includes a large number of properties that specifically make use of the ability to examine such change in the code over time.  Objects/Domains such as Issues and IssuesInBaseline, and properties such as NewerVersion on a method, allow us to write arbitrary queries comparing parts of our code over time.  So, for example, the following query shows us which methods have increased in cyclomatic complexity:

from m in Methods
where m.NewerVersion().CyclomaticComplexity > m.OlderVersion().CyclomaticComplexity
select m

This is exceptionally powerful data to have on a software project developed and maintained by a commercial team as it allows very informed decisions to be made regarding how much effort within a given amount of work (say, an agile sprint) should be dedicated not to adding additional features and functionality but to maintenance of the existing code that's already there.  Moreover, not only do we know how much effort we should expend on code maintenance but also we know exactly where that effort should be directed to have maximum impact in improving code quality - a critical factor in the success of any software project.

 

Conclusion

So, we've looked at NDepend and examined some of its powerful static analysis features and we've seen how such a tool could be integrated into a broader software development life-cycle, but should you actually use NDepend for your static analysis needs?

NDepend is a commercial tool and its license comes in two flavours.  A Developer license is required for each developer that will use NDepend on their own development machine, and this license is currently priced at 399 euros.  If you want to use NDepend as part of your Continuous Integration process on a build machine, you'll need a separate license for that which is currently priced at 799 euros.  All licenses are for one year and can be renewed with a 40% discount on the initial purchase price.  All ongoing licenses are eligible for continued upgrades to the product.  All licenses are perpetual fallback licenses meaning that you can stop renewing and still keep using your current version, although you'll lose the ability to upgrade to newer versions of the product.

As a software tool, it's expensive but not exorbitantly so.  For any commercial software development team that cares about the quality of their code, and especially a team tasked with maintaining a software product over time, NDepend is not really expensive at all when compared to the cost of other tools that typical software development teams will invest in.  In such an environment, NDepend is well worth the money.  NDepend has the ability to fit in with most modern teams' processes and workflows and has the power to be configured and tweaked to any given team's requirements regarding what constitutes good vs bad code and the level of "quality" that the code has to adhere to before it's considered good enough for production release.  This gives the tool incredible power and flexibility.  That said, I think it's perhaps too expensive a tool for individual software developers who may wish to purchase it just for themselves.  Also, to the best of my knowledge, there's no free version of NDepend available for open source projects, as there is for other tools such as ReSharper, OzCode, TeamCity, AppVeyor and many more.  For this reason, I'd love to see another SKU or edition of the NDepend product that is more affordable for individual developers.  Perhaps an "NDepend Lite" that ships only as a Visual Studio plug-in, removes some functionality such as the ability to edit CQLinq queries and the tracking of metrics over time, and comes at a more affordable price for individual developers.

Setup of NDepend is very easy, being delivered as a simple .zip archive file.  The Visual Studio plugin is also very simple to install and use, and registering your software license is straightforward and uncomplicated, so you can be up and running with the NDepend product in a matter of minutes.  In a way, this is deceptive as once you're back in Visual Studio (or even the Visual NDepend stand-alone IDE) it can be difficult to know exactly where to start.  Again, this is largely my own fault for trying to dive straight in without referring to the documentation.  And it must be said that NDepend's documentation is exceptionally good.  Although newer versions of NDepend try to improve ease of accessibility into the software, the newly added "guide tool-tips" are not always easily discoverable until you stumble onto them, resulting in a bit of a chicken-and-egg situation.  It's hard to really criticise the product for this, though, as by its very nature, it is a highly technical and complex product.  Having done things the hard way, my recommendation would be to spend some time reading the (very good) documentation first and only then diving into the product.

Although the initial learning curve can be quite steep, usage of the product is very easy, especially once you get familiar with the various functions, menu options and windows. And if you're stuck at any point, the excellent documentation and handy tool-tips are always only a mouse click (or hover) away.  Also, NDepend is quick.  Very quick.  Once analysis of a solution is completed - using my test solution, NDepend was able to perform its analysis in less than half the time it takes to fully compile and build the solution - the complete code-base can be queried and searched in near real-time.  And as well as providing a plethora of highly informative and profoundly useful metrics on the analysed code, NDepend also contains such functionality as a general-purpose "code-search-on-steroids" that gives other tools offering similar functionality (e.g. ReSharper) a run for their money.

NDepend isn't perfect, and I did discover a few quirks with the product as I used it.  One quirk seemed to be a bit of a false positive: a built-in rule was reported as broken because it states that your own methods shouldn't be called "Dispose".  The section of code that was identified as breaking this rule did indeed have a Dispose() method, but it was necessary due to interface implementation.  In this case, it wasn't the IDisposable interface that the class was implementing, but rather the IHttpModule interface that forced me to implement the Dispose() method, yet NDepend didn't seem to take this into account and flagged it as a broken rule.  NDepend also has a rule about the names of methods not being too long; however, I was finding that my unit tests were frequently falling foul of this rule as most of them had quite long method names.  This is quite normal for unit tests, and it is often considered best practice to give unit tests such long, expressive names.  You can always decide not to include the test assemblies in the NDepend analysis, but then you may miss out on other NDepend rules that would be very helpful, such as warning when a method has too many lines of code.  It would be nice to be able to tell NDepend to omit certain files or assemblies from certain rules or categories of rules, similar to how you can tell ReSharper to ignore certain inspections via comments within your code.  You can omit code from queries at the moment; however, to achieve this you have to manually edit the CQLinq of the query, which isn't the best or simplest way.  On the plus side, NDepend have said that they will support a [SuppressWarning] attribute as well as custom views of code within queries, making ignoring certain code much easier in a future version of the product.

Another quirk I found was that navigating to certain code elements from either the search results or results of code rules in the Queries and Rules Edit window can sometimes be problematic.  One rule states "Avoid interfaces too big" and so the results shown in the window are all matched interfaces that the rule has deemed too big.  However, trying to navigate to the interface by double-clicking on the matching result gives an error with a reason of "N/A because interface", which is quite confusing and somewhat unintuitive.  It would be nice if navigation to code via double-clicking on query results was more universal, meaning that all results for all code elements would navigate to the corresponding section of source code when clicked.

One omission from NDepend's functionality that I'd love to see addressed out-of-the-box is the ability to identify "code clones" - sections of code that are duplicated in various different parts of the solution.  Whilst this isn't a "metric" as such, it's certainly one of the first ports of call that I turn to when looking to refactor and improve an existing large code base.  In long-lived code-bases, there is frequently a fair amount of duplicated code, and removing as much duplication as you possibly can not only reduces the number of lines of code (so long as readability and functionality are maintained, less code is always better than more code) but also helps in the refactoring of methods against other metrics such as cyclomatic complexity.  NDepend does include a "Search for Duplicate Code" powertool, although in my experience of comparing its output to that of the "Analyze solution for Code Clones" option in Visual Studio Enterprise, I found that it didn't detect as many instances of duplicated code.

This is almost certainly due to the difference in how the two tools work - the NDepend feature looks for methods that make many calls to a "set" of other methods, whilst the Visual Studio function will look for duplicates of the actual code itself.  The NDepend feature works in the way it does no doubt due to it examining the compiled code rather than the raw source code.  However, in the test solution that I was using, which I knew contained many sections of identically duplicated code inline within many different methods, NDepend failed to identify a lot of the duplicated code (not to mention taking significantly longer to perform the analysis - a whopping 42 minutes for a 38KLOC solution), whilst the Visual Studio feature detected them all in less than 5 minutes.  Also, as the NDepend functionality is only provided via the command-line-invoked Power Tools, it's not easy to immediately navigate to the offending lines of code by double-clicking as we can with features that are part of the Visual Studio plug-in.  The good news, though, is that, in speaking to Patrick, the NDepend lead developer, I learned that a first-class in-built function for detecting code duplicates is on the roadmap for a future version of the NDepend product.  Hopefully, this functionality shouldn't be too long in coming as NDepend is a very actively developed product, receiving frequent updates.

One other feature I'd love to see in a future version of NDepend would be some dashboard metrics or query rules around identifying connascence within a software solution. Admittedly, some of the levels of connascence are dynamic and so can't really be identified without running the actual code, however, the first five types of connascence can definitely be identified via static analysis and it'd be great if NDepend could include identification of these out-of-the-box.

Overall, I would definitely recommend NDepend if code quality is important to you.  For any commercial software team charged with maintaining a code-base over a period of time, the quality of that code-base is of critical importance to the overall success of the software and the ability and ease of continued development.  A tool like NDepend provides great power and insight into the quality of the code-base right now as well as into how that quality fluctuates over time.  NDepend's ability both to aggregate quality metrics into a single, simple debt percentage or rating and to identify the many individual, specific issues that the code-base suffers from, and how those issues are introduced or resolved over time, is a huge help for any team in knowing not only exactly when to focus effort on quality improvement but where to focus that effort.

DDD Scotland 2018 In Review

$
0
0

This past Saturday 10th February 2018, the first of a new era of DDD events in Scotland took place.  This was DDD Scotland, held at the University of the West of Scotland in Paisley just west of Glasgow.

Previous DDD events in Scotland have been the DunDDD events in Dundee, but I believe those particular events are no more.  A new team has now assembled and taken on the mantle of representing Scottish developers with a new event on the DDD calendar.

This was a long drive for me, so I set off on the Friday evening after work to travel north to Paisley.  After a long, but thankfully uneventful, journey I arrived at my accommodation for the evening.  I checked in and, being quite tired from the drive, crashed out in my room, quickly falling asleep after attempting to read for a while, despite it only being about 9pm!

The following morning I arose bright and early, gathered my things and headed off towards the University campus to hopefully get one of the limited parking spaces at the DDD Scotland event.  It was only a 10 minute drive to the campus, so I arrived early and was lucky enough to get a parking space.  After parking the car, I grabbed my bag and headed towards the entrance, helpfully guided by a friendly gentleman who had parked next to me and happened to be one of the university's IT staff.

Once inside, we quickly registered for the event by having our tickets scanned.  There were no name badges for this DDD event, which was unusual.  After registration it was time to head upstairs to the mezzanine area where tea, coffee and a rather fine selection of pastries awaited the attendees.  DDD Scotland had 4 separate tracks of talks and the 4th track was largely dedicated to lightning talks and various other community-oriented talks and events taking place in a specific community room.  This was something different from other DDD events and was an interesting approach.

After a little while, further attendees arrived and soon the mezzanine level was very busy.  The time was fast approaching for the first session, so I made my way back downstairs and headed off to the main lecture hall for my first session, Filip W.'s Interactive Development With Roslyn.

Filip starts by saying that this will be a session about looking at ways of running C# interactively without requiring full programs or a full compile / build step.  He says that many current dynamic or scripting languages have this way of writing some code and quickly running it, and that C# / .NET can learn a lot from that workflow.  We start by looking at .NET Core, which has this ability available using the "Watcher".  We can run dotnet watch run from the command line and this will cause the compiler to re-compile any files that get modified and saved within the folder that is being watched.  As well as invoking the dotnet watch command with run, we can invoke it with the test parameter instead, dotnet watch test, which will cause .NET Core to run all unit tests within the watched folder.  This is effectively a poor man's continuous testing.  Filip shows us how we can exclude certain files from the watcher process by adding <ItemGroup> entries into the csproj file.

Next, Filip talks about "Edit & Continue". It was first announced for C# back in 2004; however, Edit & Continue frequently doesn't work and the VS IDE doesn't help to identify the things that are or are not supported by the Edit & Continue functionality. The introduction of Roslyn helped greatly with Edit & Continue amongst other things. For example, prior to Roslyn, you couldn't edit lambda expressions during an Edit & Continue session, but with Roslyn you can.

Visual Studio 2017 (version 15.3) has finally implemented Edit & Continue for C# 7.0 language features.  Filip shows some sample C# code that will load in a memory stream of already compiled code, perform some small changes to that code and then send the changed stream to the Roslyn compiler to re-compile on the fly!
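
I didn't note down Filip's exact demo, but as a hedged sketch of the sort of thing he was describing (the source string, assembly name and variable names below are my own assumptions), compiling a piece of C# to an in-memory assembly with Roslyn looks roughly like this, using the Microsoft.CodeAnalysis.CSharp NuGet package:

using System;
using System.IO;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Parse some C# source - this could just as easily be an edited version of code
// we've pulled out of an existing document or assembly.
var tree = CSharpSyntaxTree.ParseText("public class Greeter { public string Greet() => \"Hello\"; }");

// Create a compilation with a reference to the core library.
var compilation = CSharpCompilation.Create(
    "InMemoryDemo",
    new[] { tree },
    new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) },
    new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

// Emit the compiled assembly to a stream rather than to disk.
using (var stream = new MemoryStream())
{
    var result = compilation.Emit(stream);
    Console.WriteLine(result.Success);
}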

From here, we move on to look at the C# REPL.  A REPL is a Read-Eval-Print-Loop.  This existed before C# introduced the Roslyn compiler platform, but it was somewhat clunky to use and had many limitations.  Since the introduction of Roslyn, C# does indeed now have a first class REPL as part of the platform which is built right into, and ships with, the Roslyn package itself, called "CSI".  CSI is the "official" C# REPL, but there's also scriptcs, OmniSharp and CS-REPL, all of which are open source.

Filip says how Roslyn actually introduced a new "mode" in which C# can be executed, specifically to facilitate running individual lines of C# code from a REPL.  This allows you to (for example) declare and initialise a variable without requiring a class and a "sub Main" method to serve as the execution context.  Roslyn also supports expressions such as #r System.IO as a way of introducing references.  Filip also states how it's the only place where valid C# can be written that uses the await keyword without a corresponding async keyword.  We're told that C# REPL compilation works by "chaining" multiple compilations together. So we can declare a variable in one line, compile it, then use that variable on the next REPL loop which is compiled separately and "chained" to the previous compilation in order to reference it.  Within Visual Studio, we have the "C# Interactive Window" which is really just the CSI terminal, but with a nice WPF front-end on top of it, providing such niceties as syntax colouring and Intellisense.
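
To give a flavour of this (my own illustrative snippet rather than anything Filip showed), each of the following lines can be submitted one at a time to csi.exe or the C# Interactive window - no class or Main method is required, and the await needs no enclosing async method:

using System;
using System.Threading.Tasks;

int x = 21;                 // one submission declares a variable...
int y = x * 2;              // ...and a later, separately compiled submission can still see it ("chaining")
await Task.Delay(100);      // await is perfectly legal here without an async method
Console.WriteLine(y);       // prints 42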

Filip shows us some code that highlights the differences between valid and legal REPL code and "normal" C# code that exists as part of a complete C# program.  There's a few surprises in there, so it's worth understanding the differences.

Filip goes on to talk about a product called Xamarin Workbooks.  This is an open source piece of software that fuses together documentation with interactive code.  It allows the writing of documentation files, usually written in a tutorial style, in Markdown format with the ability to embed some C# (or other language) code inside.  When the markdown file is rendered by the Xamarin Workbooks application, the included C# code can be compiled and executed from the application rendering the file.  It's this kind of functionality that powers many of the online sites that offer the ability to "try X" for different programming languages (e.g. Try F#, GoLang etc.).

After Filip's talk, it was time to head back to the mezzanine level for further tea and coffee refreshments as well as helping to finish off some of the delicious pastries that had amazingly been left over from the earlier morning breakfast.  After a quick refreshment, it was time for the next session which, for me, was in the same main hall that I'd previously been in and this one was Jonathan Channon's Writing Simpler ASP.NET Core.

Jonathan started by talking about SOLID.  These are the principles of Single Responsibility, Open/Closed, Liskov Substitution, Interface Segregation and Dependency Inversion.  We've probably all used these principles in guiding the code that we write, but Jonathan asks if an adherence to SOLID is actually the best approach.  He shows us a code sample with a large number of parameters for the constructor.  Of course, all of the parameters are to inject the dependencies into the class:

public class FooService
{
	public FooService(ICustomerRepository customerRepository,
					  IBarRepository barRepository,
					  ICarRepository carRepository,
					  IUserRepository userRepository,
					  IPermissionsRepository permissionsRepository,
					  ITemplateRepository templateRepository,
					  IEndpointRepository endpointRepository)
					  {
					  	// ...
					  }
}

Jonathan says how one of the codebases that he works with has many classes like this with many constructor parameters.  After profiling the application with JetBrains' dotTrace, it was found there were a number of performance issues, with the major one being the IoC framework's extensive use of reflection in order to provide those dependencies to the class.

Jonathan proceeds with his talk and mentions that it'll be rather code heavy.  He's going to show us a sample application written in a fairly typical style using the SOLID principles and then "morph" that project through a series of refactorings into something that's perhaps a bit less SOLID, but perhaps more readable and easier to reason about.  He shows us some more code for a sample application and tells us that we can get this code for ourselves from his GitHub repository.  We see how the application is initially constructed with the usual set of interface parameters to the constructors of the MVC controller classes.  This can be vastly improved by the use of a mediator pattern which can be provided by such libraries as MediatR.  Using such a pattern means that the controller class only needs a single injection of an IMediator instance. Controller action methods simply create a command instance which is handed off to the handler class, thus removing that long list of dependencies from the controller class.  So we can turn code like this:

public class MyController
{
	private IFoo foo;
	private IBar bar;
	private IBaz baz;
	private ICar car;
	private IDoo doo;
	public MyController(IFoo fooDependency,
					    IBar barDependency,
						IBaz bazDependency,
						ICar carDependency,
						IDoo dooDependency)
						{
							// ...
						}
	public IEnumerable<Foo> Get()
	{
		var foos = foo.GetFoos();
		return MapMyFoos(foos);
	}
	// Other methods that use the other dependencies.
}

Into code a bit more like this:

public class MyMediatedController
{
	private IMediator mediator;
	public MyMediatedController(IMediator mediator)
	{
		this.mediator = mediator;
	}

	public IEnumerable<Foo> Get()
	{
		var message = new GetFoosMessage();
		return this.mediator.Send(message);
	}
	// Other methods that send messages to the same Mediator.
}


public class GetFoosMessageHandler : IRequestHandler<GetFoosMessage, IEnumerable<Foo>>
{
	public IEnumerable<Foo> Handle(GetFoosMessage message)
	{
		// Code to return a collection of Foos.
		// This may use the IFoo repository and the GetFoosMessage
		// can contain any other data that the task of Getting Foos
		// might require.
	}
}

The new MyMediatedController now has only one dependency, and that's on the Mediator type.  This Mediator is responsible for sending "messages" in the shape of object instances to the class that "handles" that message.  It's the responsibility of that class to perform the task required (GetFoos in our example above), relieving the controller class of having to be loaded down with lots of different dependencies.  Instead the controller focuses on the thing that a controller is supposed to do, and that is simply to orchestrate the incoming request with the code that actually performs the task requested.  Of course, now we have a dependency on the MediatR framework, but we can remove this by "rolling our own" mediator pattern code, which is fairly simple to implement.  Jonathan mentions his own Botwin framework, which is a web framework that takes the routing techniques of the NancyFX framework and applies them directly on top of ASP.NET Core.  By using this in conjunction with a hand rolled mediator pattern, we can get code (and especially controller code) that has the same readability and succinctness but without the external dependencies (apart from Botwin, of course!).
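
Jonathan didn't show the internals of a hand-rolled mediator, but a minimal sketch (all of the names below are my own assumptions, not taken from his code) might look something like this, with handlers registered once by a bootstrapper at start-up and looked up by message type:

using System;
using System.Collections.Generic;

// The only abstraction the controllers need to know about.
public interface IMediator
{
    TResponse Send<TResponse>(object message);
}

// A deliberately simple, hand-rolled implementation: a dictionary of
// message-type-to-handler delegates, populated by the bootstrapper.
public class SimpleMediator : IMediator
{
    private readonly Dictionary<Type, Func<object, object>> handlers = new Dictionary<Type, Func<object, object>>();

    public void Register<TMessage, TResponse>(Func<TMessage, TResponse> handler)
    {
        handlers[typeof(TMessage)] = msg => handler((TMessage)msg);
    }

    public TResponse Send<TResponse>(object message)
    {
        var handler = handlers[message.GetType()];
        return (TResponse)handler(message);
    }
}

Controllers then depend only on IMediator, and the bootstrapper is the single place that knows how messages map to handlers.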

Next, Jonathan talks about the idea of removing all dependency injection.  He cites an interesting blog post by Mike Hadlow that talks about how C# code can be made more "functional" by passing class constructor dependencies into the individual methods that require that dependency.  From there, we can compose our functions and use such techniques as partial application to supply the method's dependency in advance, leaving us with a function that we can pass around and use without having to supply the dependency each time it's used, just the other data that the method will operate on.  So, instead of code like this:

public interface IFoo
{
	int DoThing(int a);
}

public class Foo : IFoo
{
	public IFoo fooDependency;
	
	public Foo(IFoo fooDepend)
	{
		fooDependency = fooDepend;
	}
	
	public int DoThing(int a)
	{
		// implementation of DoThing that makes use of the fooDependency
	}
}

We can instead write code like this:

public static int DoThing(IFoo fooDependency, int a)
{
	// some implementation that uses an IFoo.
}

var dependency = new FooDependency();  // Implements IFoo

// Closes over the dependency variable and provides a function that
// can be called by only passing the int required.
Func<int, int> DoThingWithoutDependency = x => DoThing(dependency, x);

Now, the dependency to the DoThing function is already composed by some code - this would perhaps be some single bootstrapper style class that sets up all the dependencies for the application in one central location - and the DoThingWithoutDependency function now represents the DoThing function that has had its dependency partially applied, meaning that other code that needs to call the DoThing function calls DoThingWithoutDependency instead and no longer needs to supply the dependency.  Despite the use of a static method, this code remains highly testable as the DoThingWithoutDependency function can be re-defined within a unit test, similar to how we would currently use a mock implementation of our interface but without requiring a mocking framework.  Another dependency removed!
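
As a quick illustration of that testability (my own sketch, not code from the talk, and StubFoo is a hypothetical name), a test can compose DoThing with a hand-written stub instead of the real dependency, with no mocking framework involved:

// A hand-written stub implementation of IFoo, used purely for the test.
public class StubFoo : IFoo
{
	public int DoThing(int a) => a + 1;
}

// In the test, compose the function under test with the stub rather than the real dependency.
var stub = new StubFoo();
Func<int, int> doThingUnderTest = x => DoThing(stub, x);

// The test then exercises doThingUnderTest and asserts on the result using
// whichever test framework is in use - no mocking library required.
var result = doThingUnderTest(5);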

Jonathan rounds off his talk by asking if we should still be building applications using SOLID.  Well, ultimately, that's for us to decide.  SOLID has many good ideas behind it, but perhaps it's our current way of applying SOLID within our codebases that needs to be examined.  And as Jonathan has demonstrated for us, we can still have good, readable code without excessive dependency injection that still adheres to many of the SOLID principles of single responsibility, interface segregation etc.

After Jonathan's talk it was time for a short break.  The tea and coffee were dwindling fast, but there would be more for the breaks in the afternoon's sessions.  I'd had quite a fair bit of coffee by that point, so I decided to find my next session.  The room was quite some way across the other side of the building, so I headed off to get myself ready for Robin Minto's Security In Cloud-Native.

Robin starts by introducing himself and talking about his background.  He started many years ago when BBC Micros were in most school classrooms. He and his friend used to write software to "take over" the school's network of BBC machines and display messages and play sounds.  It was from there that both Robin and his school friend became interested in security.

We start by looking at some numbers around security breaches that have happened recently.  There are currently over 10 billion data records that have been lost or stolen.  This number is always growing, especially nowadays as more and more of our lives are online and our personal information is stored in a database somewhere, so security of that information is more important now than it's ever been.  Robin then talks about "cloud-native" and asks what the definition is.  He says it's not simply "lift-and-shift" - the simple moving of virtual machines that were previously hosted on on-premise hardware but are now hosted on a "cloud" platform.  We look at the various factors stated in the 12 Factor App documentation that can help us get a clearer understanding of cloud native.  Cloud-native applications are built for the cloud first.  They run in virtual machines, or more likely containers these days, and are resilient to failures and downtime; they expect their boundaries to be exposed to attack, and so security is a first class consideration when building each and every component of a cloud-native application.  Robin makes reference to a talk by Pivotal's Sam Newman at NDC London 2018 that succinctly defines cloud-native as applications that make heavy use of DevOps, Continuous Delivery, Containers and Micro-Services.

We look at the biggest threats in cloud native, and these can be broadly expressed as Vulnerable Software, Leaked Secrets and Time.  To address the problem of vulnerable software, we must continually fix defects, bugs and other issues within our own software.  Continuous repair of our software must be part of our daily software development tasks.  We also address this through continuous repaving.  This means tearing down virtual machines, containers and other infrastructure and rebuilding them.  This allows operating systems and other infrastructure-based software and configuration to be continually rebuilt, preventing any potential malware from infecting and remaining dormant within our systems over time.

We can address the potential for secrets to be leaked by continually changing and rotating credentials and other secrets that our application relies on.  Good practices around handling and storing credentials and secrets should be part of the development team's processes, to ensure that things such as committing credentials into source code repositories don't happen.  This is important not only for public repositories but also for private ones too.  What's private today could become public tomorrow.  There are many options now for using separate credential/secrets stores (for example, HashiCorp's Vault or IdentityServer) which ensure we keep sensitive secrets out of potentially publicly accessible places.  Robin tells us how 81% of breaches are based on stolen or leaked passwords, so it's probably preferable to prevent users from selecting such insecure passwords in the first place by simply prohibiting their use.  The same applies to such data as Environment Variables.  They're potentially vulnerable stored on the specific server that might need them, so consider moving them off the server and onto the network to increase security.

Time is the factor that runs through all of this.  If we change things over time, malware and other bad actors seeking to attack our system have a much more difficult time.  Change is hostile to malware, and through repeated use of a Repair, Repave and Rotate approach, we can greatly increase the security of our application.

Robin asks if, ultimately, we can trust the cloud.  There are many companies involved in the cloud these days, but we mostly hear about the "Tier 1" players: Amazon, Microsoft and Google.  They're so big and can invest so much in their own cloud infrastructure that they're far more likely to have much better security than anything we could provide ourselves.  Robin gives us some pointers to places we can go to find tools and resources to help us secure our applications.  OWASP is a great general resource for all things security related.  OWASP have their own Zed Attack Proxy project, which is a software tool to help find vulnerabilities in your software.  There's also the Burp Suite, which can help in this regard.  There are also libraries such as Retire.js that can help to identify those external pieces of code that a long-running code base accumulates and which can and should be upgraded over time as new vulnerabilities are discovered and subsequently fixed in newer versions.

After Robin's talk it was time for lunch.  We all headed back towards the mezzanine upper floor in the main reception area to get our food.  As usual, the food was a brown bag of crisps, a chocolate bar and a piece of fruit, along with a choice of sandwich.  I was most impressed with the sandwich selection at DDD Scotland as there was a large number of options available for vegetarians and vegans (and meat eaters, too!).  Usually, there's perhaps only one non-meat option available, but here we had around 3 or 4 vegetarian options and a further 2 or 3 vegan options!  I chose my sandwich from the vegan options - a lovely falafel, houmous and red pepper tapenade - picked up a brown bag and headed off to one of the tables downstairs to enjoy my lunch.

I wasn't aware of any grok talks going on over the lunch period, and this was possibly due to the separate track of community events that ran throughout the day and concurrently with the "main" talks.  I scanned my agenda printout and realised that we actually had a large number of sessions throughout the entire day.  We'd had 3 sessions in the morning and there were another 4 in the afternoon, for a whopping total of 7 sessions in a single track.  This is far more than the usual 5 sessions available (3 before lunch and 2 afterwards) at most other DDD events and meant that each session was slightly shorter, at 45 minutes long rather than 60.

After finishing my lunch, I popped outside for some fresh air and to take a brief look around the area of Paisley where we were located.  After a fairly dull and damp start to the day, the weather had now brightened up and, although still cold, it was now a pleasant day.  I wandered around briefly and took some pictures of local architecture before heading back to the university building for the afternoon's sessions.  The first session I'd selected was Gary Fleming's APIs On The Scale Of Decades.

Gary starts with a quote.  He shares something that Chet Haase had originally said:

"API's are hard. They're basically ship now and regret later".

Gary tells us that today's APIs aren't perfect, but they're based upon a continually evolving understanding of what constitutes a "good" API.  So, what makes a modern API "good"?  They need to be both machine readable and human readable.  They need to be changeable, testable and documented.

Gary talks about an interesting thing called "affordance".  The term was first coined by the psychologist James J. Gibson, and the Merriam-Webster dictionary defines affordance as:

the qualities or properties of an object that define its possible uses or make clear how it can or should be used

Affordance can be seen as "implied documentation".  When we see a door with a handle and perhaps a "Pull" sign, we know that we can use the handle to pull the door open.  The handle and spout on a teapot indicate that we would use the handle to hold the teapot and tilt it to pour the liquid out through the spout.  This is what's known as "perceived affordance". Gary mentions the floppy disk icon that's become ubiquitous as an icon to represent the "Save" action within many pieces of software.  The strange thing is that many users of that software, who implicitly understand that the disk icon means "Save", have never seen an actual floppy disk.

It turns out that affordance is incredibly important, not only in the design of everyday things, but in the design of our APIs.  Roy Fielding was the first to talk about RESTful APIs.  These are APIs that conform to the REST style and are largely self-documenting.  These APIs espouse affordance in their design, delivering not only the data the user requested for a given API request, but also further actions that the user can take based upon the data delivered.  This could be presenting the user with a single "page" of data from a large list and giving the user direct links to navigate to the previous and next pages of data within the list.
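
As a small, hypothetical illustration of that idea (my own sketch rather than anything Gary showed, and all of the property names are assumptions), a paged response could carry its navigation controls alongside the data, so that consumers follow links rather than constructing URLs themselves:

using System.Collections.Generic;

// A hypothetical response shape: the data plus the actions (links) the consumer can take next.
public class PagedResponse<T>
{
    public IEnumerable<T> Items { get; set; }
    public string Self { get; set; }       // link to this page
    public string Previous { get; set; }   // link to the previous page, if any
    public string Next { get; set; }       // link to the next page, if any
}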

This is presenting Information and Controls.   Information + Controls = Better API.  Why is this the case?  Because action contextualises information which in turn contextualises actions.

We look at nouns and verbs and their usage and importance as part of affordance.  They're often discoverable via context, and having domain knowledge can significantly help this discovery.  We look at change.  Gary mentions the philosophical puzzle, "The Ship Of Theseus", which asks if a ship that has all of its component parts individually replaced over time is still the same ship.  There's no right or wrong answer to this, but it's an interesting thought experiment.  Gary also mentions something called biomimicry, which is where objects are modelled after a biological object to provide better attributes to the non-biological object.  Japanese Shinkansen trains (bullet trains) have their noses modelled after kingfishers to prevent sonic booms from the train when exiting tunnels.

Gary moves on to talk about testing.  We need lots of tests for our APIs, and it's having extensive tests that allows us to change our API faster and more easily.  This is important as APIs should be built for change, and change should be based upon consumer-driven contracts.  The things that people actually use and care about.  As part of that testing, we should use various techniques to ensure that expectations around the API are not based upon fixed structures.  For example, consumers shouldn't rely on the fact that your API may have a URL structure that looks similar to .../car/123.  The consumer should be using the affordance exposed from your API in order to navigate and consume your API.  To this end, you can use "fuzzers" to modify endpoints and parameters as well as data.  This breaks expectations and forces consumers to think about affordance.  Gary says that it's crucial for consumers to use domain knowledge to interact with your API, not fixed structures. It's for this reason that he dislikes JSON with Swagger etc. as an API delivery mechanism, as it's too easy for consumers to become accustomed to the structure and grow to depend upon exactly that structure.  They, therefore, don't notice when you update the API - and thus the Swagger documentation - and their consumption breaks.

Finally, Gary mentions what he believes is the best delivery mechanism for an API.  One that provides for rich human and machine readable hyperlinks and metadata and exposes affordance within its syntax.  That mechanism is HTML5!  This is a controversial option and has many of the attendees of the session scratching their heads, but in thinking about it, there's a method to the madness here.  Gary says how HTML5 has it all - affordance, tags, links, semantic markup.  Gary says how a website such as GitHub IS, in effect, an API.  We may interact with it as a human, but it's still an API and we use hypermedia links to navigate from one piece of data to another page with alternative context on the same data i.e. when looking at the content of a file in a repository, there's links to show the history of that file, navigate to other files in the same repository etc.

After Gary's session was over it was time for another short break before the next session of the afternoon.  This one was to be Joe Stead's Learning Kotlin As A C# Developer.

Joe states that Kotlin is a fairly new open source language that takes many of the best bits of existing languages from many different disciplines (i.e. object-oriented, functional etc.) with the aim of creating one overall great language.  Kotlin is a language that is written targeting the JDK and runs on top of the JVM, so is in good company with other languages such as Java, Scala, Clojure and many others.  Kotlin is a statically typed and object oriented language that also includes many influences from more functional languages.

As well as compiling to JVM byte code, Kotlin can also compile to JavaScript, and who doesn't want that ability in a modern language?   The JavaScript compilation is experimental at the moment, but does mostly work.  Because it's built on top of the familiar Java toolchain, build tools for Kotlin are largely the same as Java - either Maven or Gradle.

Joe moves on to show us some Kotlin code.  He shows how a simple function can be reduced to a simplified version of the same function, similar to how C# has expression-bodied members, so this:

fun add(a : Int, b : Int) : Int
{
    return a+b
}

Can be expressed as:

fun add(a: Int, b : Int) = a + b

In the second example, the return type is inferred, and we can see how the language differs from C#, with the data types expressed after the variable name rather than in front of it.  Also, semicolons are entirely optional.  The var keyword in Kotlin is very similar to var in C#, meaning that variables declared as var must be initialised at the time of declaration so that the variable is strongly typed.  Variables declared with var are still writable after initialisation, albeit with the same type.  Kotlin introduces another way of declaring and initialising variables with the val keyword.  val works similarly to var, strongly typing the variable to its initialisation value type, but it also makes the variable read only after initialisation.  val can also be used within a class, meaning that the parameters to a class's constructor can be declared with the val keyword and they're read only after construction of an instance of the class.  Kotlin also implicitly makes these parameters available as read only properties, thus code like the following is perfectly valid:

class Person (val FirstName : String, val LastName : String)

fun main(args: Array<String>) {
    var myPerson = Person("John", "Doe")
    println(myPerson.FirstName)
}

The class's parameters could also have been declared with var instead of val and Kotlin would provide us with both readable and writable properties.  Note also that Kotlin doesn't use the new keyword to instantiate classes, but simply calls the class name as though it were a function.  Kotlin has the same class access modifiers as C#, so classes can be private, public, protected or internal.

Kotlin has a concept of a specific form of class known as a data class.  These are intended for use by classes whose purpose is to simply hold some data (aka a bag of properties).  For these cases, it's common to want to have some default methods available on those classes, such as equals(), hashCode() etc.  Kotlin's data classes provide this exact functionality without the need for you to explicitly implement these methods on each and every class.
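
To put that into C# terms for comparison (my own illustration rather than Joe's code), a one-line Kotlin declaration such as data class Person(val firstName: String, val lastName: String) gives you, for free, roughly the boilerplate that a C# developer would otherwise write by hand:

// Roughly the boilerplate a Kotlin data class generates for you.
public class Person
{
    public string FirstName { get; }
    public string LastName { get; }

    public Person(string firstName, string lastName)
    {
        FirstName = firstName;
        LastName = lastName;
    }

    public override bool Equals(object obj) =>
        obj is Person other && FirstName == other.FirstName && LastName == other.LastName;

    public override int GetHashCode() =>
        (FirstName ?? "").GetHashCode() ^ (LastName ?? "").GetHashCode();

    public override string ToString() => $"Person(FirstName={FirstName}, LastName={LastName})";
}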

There's a new feature that may be coming in a future version of the C# language that allows interfaces to have default implementations for methods, and Kotlin has this ability built in already:

fun main(args: Array<String>) {
    var myAnimal = Animal()
    myAnimal.speak()
}

interface Dog {
    fun speak() = println("Woof")
}

class Animal : Dog { }

One caveat to this is that if you have a class that implements multiple interfaces that expose the same method, you must be explicit about which interface's method you're calling.  This is done with code such as super<Dog>.speak() or super<Cat>.speak(), and is similar to how C# has explicit implementation of an interface.
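
For comparison (again, my own illustrative sketch rather than Joe's code), the closest C# equivalent is explicit interface implementation, where the caller chooses which implementation runs via the interface reference they hold:

interface IDog { void Speak(); }
interface ICat { void Speak(); }

class Animal : IDog, ICat
{
    // Explicit implementations: each interface gets its own body.
    void IDog.Speak() => Console.WriteLine("Woof");
    void ICat.Speak() => Console.WriteLine("Meow");
}

// ((IDog)animal).Speak() prints "Woof"; ((ICat)animal).Speak() prints "Meow".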

Kotlin provides "smart casting", which means that we can use the "is" operator to determine if a variable is of a specific subtype, with the compiler then treating it as that type inside the check:

fun main(args: Array<String>) {
    var myRect = Rectangle(34.34)
    var mySquare = Square(12.12)
    println(getValue(myRect))
    println(getValue(mySquare))
}

fun getValue(shape : Shape) : Double
{
    if(shape is Square)
    {
        return shape.edge
    }
    if(shape is Rectangle)
    {
        return shape.width
    }
    return 0.toDouble()
}

interface Shape {} 

class Square(val edge : Double) : Shape {}
class Rectangle(val width : Double) : Shape {}

This can be extended to perform pattern matching, so that we can re-write the getValue function thusly:

fun getValue(shape : Shape) : Double
{
    when (shape)
    {
        is Square -> return shape.edge
        is Rectangle -> return shape.width
    }
    return 0.toDouble()
}

Kotlin includes extension methods, similar to C#; however, the syntax is different and they can be simply applied by declaring a new method with the class that will contain the extension method used as part of the method's name, i.e. fun myClass.MyExtensionMethod(a : Int) : Int.  We also have lambda expressions in Kotlin and, again, these are similar to C#.  This includes omitting the parameter for lambda functions that take only a single parameter, i.e. ints.map { it*2 }, as well as using LINQ-style expressive code, strings.filter { it.length==5 }.sortedBy { it }.map { it.toUpperCase() }.  Kotlin lambdas use a special "it" keyword that refers to the lambda's single implicit parameter, for example: var longestCityName = addresses.maxBy { it.city.length }.  Kotlin also has a concept of lambdas that can have "receivers" attached.  These are similar to extension methods that work against a specific type, but have the added ability that they can be stored in properties and passed around to other functions:

fun main(args: Array<String>) {
    println("123".represents(123))
    println(123.represents("123"))
}

// This is an extension method
fun String.represents(another: Int) = toIntOrNull() == another

// This is a Lambda with receiver
val represents: Int.(String) -> Boolean = {this == it.toIntOrNull()}

Joe tells us that there's a lot more to Kotlin and that he's really only able to scratch the surface of what Kotlin can do within his 45 minutes talk.  He provides us with a couple of books that he considers good reads if we wish to learn more about Kotlin, Kotlin In Action and Programming Kotlin.  And, of course, there's the excellent online documentation too.

After Joe's session, it was time for another refreshment fuelled break.  We made our way to the mezzanine level once again for tea, coffee and a nice selection of biscuits.  After a quick cup of coffee and some biscuits it was time for the next session in the packed afternoon schedule.  This would be the penultimate session of the day and was to be Kevin Smith's Building APIs with Azure Functions.

Kevin tells us how Azure Functions are serverless pieces of code that can operate in Microsoft's Azure cloud infrastructure.  They can be written in a multitude of different languages, but are largely tied to the Azure infrastructure and back-end that they run on.

Kevin looks at how our current APIs are written.  Granular pieces of data are often grouped together into a JSON property that represents the complete object that we're returning, and this allows us to add additional properties to the response payload such as HAL-style hypermedia links.  This makes it amenable to being a self-documenting API, and if this kind of response is returned from a single Azure Function call, we can make a disparate set of independent Azure Functions appear, to the user, to be a cohesive API.

Kevin shows us how Azure Functions can have HTTP triggers configured against them.  These allow Azure Functions, which are otherwise just simple programmatic functions, to be accessed and invoked via an HTTP request - it's this ability that allows us to build serverless APIs with Azure Functions.  We look at an extension to Visual Studio that allows us to easily build Azure Functions, called "Visual Studio Tools For Azure Functions", funnily enough.  Kevin mentions that by using this extension, you can develop Azure Functions and both run and debug those functions on a local "emulator" of the actual Azure environment.  This means that it's easy to get your function right before you ever need to worry about actually deploying it to Azure.  This is a benefit that Azure Functions has over one of its most popular competitors, AWS Lambda.  Another benefit of Azure Functions over AWS Lambda is that AWS Lambda requires you to use another AWS service, namely API Gateway, in order to expose a serverless function over HTTP.  This has an impact on cost as you're now paying for both the function itself and the API Gateway configuration to allow that function to be invoked via a HTTP request.  Azure Functions has no equivalent of AWS's API Gateway as it's not required, and so you're simply paying for the function alone.
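
As a hedged sketch of what such an HTTP-triggered function looks like in C# (the function name, route and return value here are my own assumptions, and the exact attributes vary with the Functions runtime version), the trigger is simply declared via attributes on an otherwise ordinary static method:

using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;

public static class GetWidgetFunction
{
    // The HttpTrigger attribute exposes this function at /api/widgets/{id}
    // and restricts it to GET requests.
    [FunctionName("GetWidget")]
    public static IActionResult Run(
        [HttpTrigger(AuthorizationLevel.Function, "get", Route = "widgets/{id}")] HttpRequest req,
        string id,
        ILogger log)
    {
        log.LogInformation("Fetching widget {Id}", id);
        return new OkObjectResult(new { Id = id, Name = "Example widget" });
    }
}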

As well as this local development and debugging ability, we can deploy Azure Functions from Visual Studio to the cloud just as easily as we can any other code.  There's the usual "Publish" method which is part of Visual Studio's Build menu, but there's also a new "Zip Deploy" function that will simply create a zip archive of your code and push it to Azure.

Azure Functions have the ability to generate an OpenAPI set of documentation for them built right into the platform.  OpenAPI is the new name for Swagger.  It's as simple as enabling the OpenAPI integration within the Azure portal and all of the documentation is generated for you.  We also look at how Azure Functions can support Cross-Origin Resource Sharing via a simple additional HTTP header, so long as that 3rd party origin is configured within Azure itself.

There are many different authorisation options for Azure Functions.  There's a number of "Easy Auth" options which leverage other authentication that's available within Azure such as Azure Active Directory, but you can easily use HTTP Headers or URL Query String parameters for your own hand-rolled custom authentication solution.

Kevin shares some of the existing limitations with Azure Functions.  It's currently quite difficult to debug Azure Functions that are running within Azure; also, the "hosting" of the API is controlled for you by Azure's own infrastructure, so there's little scope for alterations there.  Another, quite frustrating, limitation is that due to Azure Functions being in continual development, it's not unheard of for Microsoft to roll out new versions which introduce breaking changes.  This has affected Kevin on a number of occasions, but he states that Microsoft are usually quite quick to fix any such issues.

After Kevin's session was over it was time for a short break before the final session of the day.  This one was to be Peter Shaw's TypeScript for the C# Developer.

Peter starts by stating that his talk is about why otherwise "back-end" developers using C# should consider moving to "front-end" development.  More and more these days, we're seeing most web-based application code existing on the client-side, with the server-side (or back-end) being merely an API to support the functionality of the front-end code.  All of this front-end code is currently written in JavaScript.  Peter also mentions that many of today's "connected devices", such as home appliances, also run on software, and that this software is frequently written using JavaScript running on Node.js.

TypeScript makes front-end development work with JavaScript much more like C#.  TypeScript is a superset of JavaScript that is 100% compatible with JavaScript.  This means that any existing JavaScript code is, effectively, also TypeScript code.  This makes it incredibly easy to start migrating to TypeScript if you already have an investment in JavaScript.  TypeScript is an ECMAScript 6 transpiler and originally started as a way to provide strong typing to the very loosely typed JavaScript language.  Peter shows us how we can decorate our variables in TypeScript with type identifiers, allowing the TypeScript compiler to enforce type safety:

// person can only be a string.
var person : string;

// This is valid.
person = "John Doe";

// This will cause a compilation error.
person = [0,1,2];

TypeScript also includes union types, which are often used together with type guards and which "relax" the strictness of the typing by allowing you to specify multiple different types that a variable can be:

// person can either be a string or an array of numbers.
var person : string | number[];

// This is valid, it's a string.
person = "John Doe";

// This is also valid, it's an array of numbers.
person = [0,1,2];

// This is invalid.
person = false;

Peter tells us how the TypeScript team are working hard to help us avoid the usage of null or undefined within our TypeScript code, but it's not quite a solved problem just yet.  The current best practice does advocate reducing the use and reliance upon null and undefined, however.

TypeScript has classes and classes can have constructors.  We're shown that they are defined with the keyword of constructor().  This is not really a TypeScript feature, but is in fact an underlying ECMAScript 6 feature.  Of course, constructor parameters can be strongly typed using the TypeScript typing.  Peter tells us how, in the past, TypeScript forced you to make an explicit call to super() from the constructor of a derived class, but this is no longer required.

TypeScript has modules.  These are largely equivalent to namespaces in C# and help to organise large TypeScript programs - another major aim of using TypeScript over JavaScript.  Peter shares one "gotcha" with module names in TypeScript and that is that, unlike namespace in C# which can be aliased to a shorter name, module names in TypeScript must be referred to by their full name.  This can get unwieldy if you have very long module, class and method names as you must explicitly reference and qualify calls to the method via the complete name.  TypeScript has support for generics.  They work similarly to C#'s generics and are also defined similarly:

function identity<T>(arg: T): T {
    return arg;
}

You can also define interfaces with generic types, then create a class that implements the interface and defines the specific type:

interface MyInterface<T>{
    myValue : T;
}

class myClass implements MyInterface<string>{
    myValue = "a string value";
}

let myThing = new myClass();

// string.
console.log(typeof(myThing.myValue));

The current version of TypeScript will generate ES6 compatible JavaScript upon compilation, however, this can be modified so that ES5 compliant JavaScript is generated simply by setting a compiler flag on the TypeScript compiler.  Many of the newer features of ES6 are now implemented in Typescript such as Lambdas (aka Arrow Functions) and default parameter values.  To provide rich support for externally defined types, TypeScript makes use of definition files.  These are files that have a ".d.ts" extension and provide for rich type support as well as editing improvements (such as Intellisense for those editors that support it).  The canonical reference source for such definition files is the definitelytyped.org website, which currently contains well over 5000 files that provide definitions for the types contained in a large number of external JavaScript libraries and frameworks.  Peter tells us how TypeScript is even being adopted by other frameworks and mentions how modern Angular versions are actually written in TypeScript.

After Peter's session was over it was time for a final quick break before the conference wrap-up and prize session would be taking place in the main hall.  Due to the fact that this particular DDD had had 7 sessions throughout the day, it ran a little later than other DDDs do, so it was approximately 5:30pm by the time Peter's session was finished.  Having had a long day, and with a further 3.5+ hour drive facing me to return home that evening, I unfortunately wasn't able to stay around for the closing ceremony and decided to walk back to my car to start that long journey home.  I'd had a brilliant day at the inaugural DDD Scotland organised by the new team.  It was well worth the long journey there and back and here's hoping it will continue for many more years to come.

Beware NuGet's Filename Encoding

$
0
0
Cautions around how NuGet encodes filenames inside the .nupkg file.

Setting up Jenkins on Windows with Git, Mercurial and SSH.

$
0
0
How to correctly setup a Jenkins build server on a Windows machine with Git, Mercurial and SSH too.

DDD 11 In Review

$
0
0
My review of the DeveloperDeveloperDeveloper (aka DDD Reading) 2016 conference, held at Microsoft's UK headquarters in Reading.