Introduction
While developing PullReview (our automated code review tool for Ruby), we interact a lot with external services such as GitHub, whose APIs mostly return JSON. Convenient as that is, it led us to try several ways of handling nil values at different levels.
Sample
Let’s say we ask GitHub for information about a specific commit, and want to extract its user name and repository name, in order to be able to recreate the ‘user/repo’ identifier.
The JSON may look like:
    {
      "repository": {
        "owner": { "name": "acme" },
        "name": "dynamite",
        "date": "2013-07-21"
      }
    }
Based on this sample, we want a method to return “acme/dynamite”. Looks easy:
    def user_repo(payload)
      repository = payload['repository']['name']
      owner = payload['repository']['owner']['name']
      "#{owner}/#{repository}"
    end
Done!
Nil happens
This works great until our first call returns an empty repository. As soon as part of the structure is missing, our code fails with a dramatic:
    /home/martin/code/test/nilnil/test.rb:7:in `user_repo': undefined method `[]' for nil:NilClass (NoMethodError)
      from /home/martin/code/test/nilnil/test.rb:76:in `<main>'
Even worse, each of the elements can actually be nil, i.e., we can have a repository with no name, a repository with no owner, or with an owner with no name.
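To make the failure modes concrete, here is a small sketch that runs the naive method against a few incomplete payloads. The payloads and the `outcome` helper are made up for this illustration; only `user_repo` comes from the text above.

```ruby
require 'json'

# The naive version from above, unchanged.
def user_repo(payload)
  repository = payload['repository']['name']
  owner = payload['repository']['owner']['name']
  "#{owner}/#{repository}"
end

# Hypothetical helper for this sketch: returns the result of user_repo,
# or the exception class when the call raises NoMethodError.
def outcome(json)
  user_repo(JSON.parse(json))
rescue NoMethodError => e
  e.class
end

# Made-up payloads, each missing one piece of the structure.
samples = {
  'complete'           => '{"repository": {"owner": {"name": "acme"}, "name": "dynamite"}}',
  'no repository'      => '{}',
  'no owner'           => '{"repository": {"name": "dynamite"}}',
  'no repository name' => '{"repository": {"owner": {"name": "acme"}}}',
  'owner without name' => '{"repository": {"owner": {}, "name": "dynamite"}}'
}

samples.each { |label, json| puts "#{label}: #{outcome(json).inspect}" }
```

Only the missing repository and the missing owner raise. The other two cases are arguably worse: they quietly return "acme/" and "/dynamite", which look like identifiers but are not valid ones.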
Reference: Guard clauses
Time for some guard clauses:
    def user_repo(payload)
      return nil unless payload['repository']
      repository = payload['repository']['name']
      return nil unless repository

      return nil unless payload['repository']['owner']
      owner = payload['repository']['owner']['name']
      return nil unless owner

      "#{owner}/#{repository}"
    end
This works, but looks quite long for such a basic operation, so let’s look for alternatives.
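As a quick check of the guard clauses, the sketch below feeds the method a handful of incomplete hashes. The sample hashes are assumptions made up for this illustration.

```ruby
# The guard-clause version from above, unchanged.
def user_repo(payload)
  return nil unless payload['repository']
  repository = payload['repository']['name']
  return nil unless repository

  return nil unless payload['repository']['owner']
  owner = payload['repository']['owner']['name']
  return nil unless owner

  "#{owner}/#{repository}"
end

# Made-up payloads, each missing one piece of the structure.
incomplete = [
  {},
  { 'repository' => {} },
  { 'repository' => { 'name' => 'dynamite' } },
  { 'repository' => { 'name' => 'dynamite', 'owner' => {} } },
  { 'repository' => { 'owner' => { 'name' => 'acme' } } }
]

# Every incomplete payload now yields nil instead of an exception
# or a half-built identifier.
incomplete.each { |payload| puts user_repo(payload).inspect }
```

One case is still uncovered: if the payload itself is nil, the first guard raises NoMethodError. Alternative #1 handles it by testing `payload` first.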
Alternative #1: Chaining ands
The first option is to simply chain the checks with && conditions:
    def user_repo(payload)
      return nil unless payload && payload['repository'] && payload['repository']['name']
      return nil unless payload['repository']['owner'] && payload['repository']['owner']['name']

      repository = payload['repository']['name']
      owner = payload['repository']['owner']['name']

      "#{owner}/#{repository}"
    end
Looks a tad better, but the first line is already 93 characters long, more than what can be comfortably shown on GitHub or inside a terminal, and this is only a three-level-deep model.
While GitHub does a nice job of limiting the depth of its API (mainly by using well-defined endpoints), this is by no means a hard limit.
Alternative #2: Improving chains with try
Try is ActiveSupport’s answer to the “maybe I have a value, maybe not” problem. It lets you call any method with any set of parameters on an object that may be nil, without raising a NoMethodError when it is:
name = user.try(:name) # will return nil if user is nil
Calls to try can be chained, so we can rewrite our sample:
    def user_repo(payload)
      repository = payload['repository'].try(:[], 'name')
      owner = payload['repository'].try(:[], 'owner').try(:[], 'name')

      return nil unless owner && repository

      "#{owner}/#{repository}"
    end
Try makes the intention quite clear (we try to get a value, unsure whether there is one), but loses readability when parameters are involved, as in my case. Try requires active_support, but the gem has been nicely split so that you can import only the part you need (when outside of Rails).
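For readers without ActiveSupport at hand, the nil-receiver behaviour relied on here can be sketched in plain Ruby. `try_call` below is a hypothetical stand-in written for this illustration, not ActiveSupport's implementation; with the real gem, requiring `active_support/core_ext/object/try` should be enough to get `try` outside of Rails.

```ruby
# Hypothetical stand-in for illustration only: it covers just the
# nil-receiver case, which is all the sample needs.
def try_call(receiver, method_name, *args)
  return nil if receiver.nil?

  receiver.public_send(method_name, *args)
end

def user_repo(payload)
  repo_data = try_call(payload, :[], 'repository')
  repository = try_call(repo_data, :[], 'name')
  owner = try_call(try_call(repo_data, :[], 'owner'), :[], 'name')

  return nil unless owner && repository

  "#{owner}/#{repository}"
end

puts user_repo('repository' => { 'name' => 'dynamite', 'owner' => { 'name' => 'acme' } })
puts user_repo('repository' => {}).inspect
```

The nested calls show the same readability cost as the chained version: the `:[]` symbol and its argument get in the way of the path being read.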
Alternative #3: Improving chains with andand
andand is an implementation of the Maybe monad in Ruby. I would have preferred the word “maybe” as its name, but the result is quite nice:
    def user_repo(payload)
      repository = payload['repository'].andand['name']
      owner = payload['repository'].andand['owner'].andand['name']

      return nil unless owner && repository

      "#{owner}/#{repository}"
    end
No need for the awkward try(:[], 'owner') call, which helps the code stay clean (while retaining exactly the same structure and principle).
Alternative #4: JSON Path
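One way such a chain can stay nil-safe is a proxy object that answers nil to every message. The sketch below is a hypothetical illustration of that idea in plain Ruby; `NilProxy` and `maybe` are names invented here, and this is not andand's actual code.

```ruby
# Hypothetical proxy: a blank object that returns nil for any message,
# so indexing into a missing value does not raise.
class NilProxy < BasicObject
  def method_missing(_name, *_args)
    nil
  end
end

# Hypothetical helper: passes real values through untouched and
# swaps nil for the proxy.
def maybe(value)
  value.nil? ? NilProxy.new : value
end

def user_repo(payload)
  repo_data = maybe(payload)['repository']
  repository = maybe(repo_data)['name']
  owner = maybe(maybe(repo_data)['owner'])['name']

  return nil unless owner && repository

  "#{owner}/#{repository}"
end

puts user_repo('repository' => { 'name' => 'dynamite', 'owner' => { 'name' => 'acme' } })
puts user_repo('repository' => { 'name' => 'dynamite' }).inspect
```

Because the proxy accepts the ordinary `['owner']` syntax, the lookups read like normal hash access, which is the readability gain described above. A plain function is used here so that nothing has to be added to Object.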
This is JSON-specific, but large hashes often come from JSON documents, so it is quite interesting. JSON Path is to JSON what XPath is to XML: a query language to transform and extract data. The jsonpath gem provides an implementation in Ruby:
    require 'jsonpath'

    def user_repo(payload)
      repo_path = JsonPath.new 'repository.name'
      repository = repo_path.on(payload).first

      owner_path = JsonPath.new 'repository.owner.name'
      owner = owner_path.on(payload).first

      return nil unless owner && repository
      "#{owner}/#{repository}"
    end
No win on method length, as each path needs to be defined, then applied. Calling ‘on’ immediately after ‘new’ would not improve readability here.
I definitely miss a construct that would allow me to write something like:
payload.query('repository.name').first
That would be handy whenever I do not need to store or reuse the query. It is not possible for now, and would require monkey-patching String, which I am not comfortable doing (remember that there is no JSON object involved here: before parsing it is a String, after parsing it is a Hash).
Now, JSON Path has two main perks: it can do much more than select values by name (it is a fully featured query language) and, perhaps even more importantly, it copes nicely with increasing depth. If we had a first_name and a last_name under name, we would only need to add those strings to the paths, and the code complexity would stay exactly the same.
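The one-liner wished for above can be approximated without monkey-patching by a plain helper method. The `query` function below is a hypothetical sketch, not part of the jsonpath gem: it only follows dot-separated keys through nested hashes and supports none of the JSON Path query features.

```ruby
require 'json'

# Hypothetical helper: walks a dot-separated list of keys through nested
# hashes and returns nil as soon as a level is missing.
def query(hash, path)
  path.split('.').reduce(hash) do |node, key|
    return nil unless node.is_a?(Hash)

    node[key]
  end
end

def user_repo(payload)
  owner = query(payload, 'repository.owner.name')
  repository = query(payload, 'repository.name')

  return nil unless owner && repository

  "#{owner}/#{repository}"
end

payload = JSON.parse('{"repository": {"owner": {"name": "acme"}, "name": "dynamite"}}')
puts user_repo(payload)
puts query(payload, 'repository.owner.name.first_name').inspect
```

A deeper path only makes the string longer; the method body keeps the same shape, which is the resistance to depth described above.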
Conclusion
With all of this said, what would be a good choice? I would pick one of two options, depending on the complexity of the situation:
- andand is a clean and well-thought-out solution, and solves the generic problem quite satisfactorily.
- JSON Path is more specific, but can be really clean when documents become larger, as it is the most resistant to complexity. As a query language, it also makes it easier to build dynamic queries, should the need arise (besides, of course, being able to do much more).