Emmanuel Genard

Data Dictates Code

The code follows the shape of the data. Once code is attached to the data, the data gets hard to change. If the data is hard to change, the code is hard to change. The representation of data is a significant design decision.

Let’s say you’re tasked with writing a todo app and each todo has a status of complete or incomplete. That sounds like true/false, so you represent it as a boolean field. Easy peasy.

interface Todo {
	isComplete: boolean
}

You’re asked to separate complete from incomplete todos. Inevitably you write code like this:

if (todo.isComplete()) {
	doCompleteThings(todo);
} else {
	doIncompleteThings(todo);
}

You can implement this in ways that don’t look like conditional logic at all.

const completedTodos = getCompletedTodos()
doCompleteThings(completedTodos)
const inCompleteTodos = getIncompleteTodos()
doIncompleteThings(inCompleteTodos)

// Another option
const mapping = { true: doCompleteThings, false: doIncompleteThings}

maping[todo.isComplete](todo)

Whatever way it’s written you’re going to have if/else logic even if you don’t write if/else statements. Representing status as a boolean requires it.

Now let’s say the customers of your todo app now want a third status, “In Progress”.

What I’ve seen way too often and have been guilty of myself is adding another boolean field

interface Todo {
	description: string
	isComplete: boolean
	inProgress: boolean
}

This usually happens because the conditional checks for todo.isComplete are littered across the code base and it feels I think this is mostly a feeling and not a fact. faster to just add another boolean rather than changing the data structure to be open to any number of statuses.

After the first bug or with foresight some version of the following code will be written:

function isTodoValid(todo): boolean {
	if (todo.isComplete) && (todo.inProgress) {
		return false;
	}
	return true;
}

When not If your customers want more statuses(blocked, canceled, etc) it usually leads to more boolean fields and more complicated validation checks. It feels easier to implement new features by tacking on to the current data structure rather than redesigning. Easy usually wins. Eventually it’s not just a feeling, it’s the reality of system. There is so much code that explicitly or implicitly depends on the boolean fields that you cannot change them without redesigning the whole system.

Most of the complexity can be avoided by representing the status as an enum.

interface Todo {
	status: 'Incomplete' | 'Complete' | 'In Progress'
}

You still need to write code to do different things based status.

switch (todo.status) {
	case 'Complete':
	 doCompleteThings(todo)
	case 'Incomplete':
	  doIncompleteThings(todo)
	case 'In Progress':
	  doInProgessThings(todo)
}

However, with the enum, it is impossible for a todo to have two different statuses at the same time and adding new statuses is just adding to the enum. You would still probably have to add a new case to the switch statement anytime you add a new status but I think it much simpler than dealing with the boolean fields.

Once in use, a data structure gets built upon rather than redesigned. It’s always theoretically possible to change the data structure but practically impossible. Changing the data structure would mean changing all the code that depends on it and not shipping any new features for an unacceptable amount time. I think the best way forward is that as soon as you notice that your data structure makes it difficult to add a new feature, change the data structure to make it easy. Make the data structure smart and the code dumb.

Published: 2023-07-09

Last Edited: 2023-07-09