Why Codex?

Originally posted on dev.to

Test-driven development. Write the test. Make it fail. Write the code. Make it pass. This is a well-established process that most of us are at least aware of, and know we should be doing more of in our own work.

It struck me recently, during a long team-wide discussion about the function of a specific piece of the platform we were building, that whilst the principle is sound, the implementation of it is actually quite limiting. If we broadened the concept, it might have all sorts of beneficial effects.

Let's consider your test suite for a minute. Instead of thinking of it as a set of procedures that manipulate data to either a pass or fail state, think about it as documentation. Think about it as the single-source-of-truth specification for the software that you're building, one that allows you - or other developers on your team - to understand exactly what needs to be built. After all, in a true TDD process, the tests - the documentation - get written first.

The problem with this as a wider approach is one of readability. It's clearly readable for the person who wrote it, and most likely easily readable for other devs working on the project. But what about everyone else? What about the QA team? What about business-focused stakeholders? Product owners? What even about the sales team? How do they have confidence in the function of a product beyond their own experience with it?

The symptoms of creating a product specification like this are obvious. How many projects have you worked on where the details of functionality are buried across Jira tickets, emails, Slack threads and code comments? It can take hours to understand even the most basic requirement.

There has of course been some great work done to try and bridge this gap. Commonly known as behaviour-driven development (BDD), this approach comes with tools that wrap our "code" tests in a more human-readable language - in effect making the annotation of a test a first-class citizen. This allows stakeholders and the wider team to read the spec which the developers are working towards.
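To make the idea of wrapping "code" tests in readable language concrete, here is a minimal sketch in Python. It is not any particular BDD tool's API: the `step` decorator, the `run_scenario` function and the shopping-basket sentences are all hypothetical, invented purely for illustration. It only shows the general shape, where plain sentences are matched against patterns that developers have bound to code.

```python
import re

# Hypothetical step registry: (compiled pattern, function) pairs.
STEPS = []


def step(pattern):
    """Bind a plain-language sentence pattern to a Python function."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register


@step(r"a basket containing (\d+) items?")
def given_basket(ctx, count):
    ctx["basket"] = int(count)


@step(r"the customer removes (\d+) items?")
def remove_items(ctx, count):
    ctx["basket"] -= int(count)


@step(r"the basket contains (\d+) items?")
def check_basket(ctx, count):
    assert ctx["basket"] == int(count), f"expected {count}, got {ctx['basket']}"


def run_scenario(text):
    """Run each line of a scenario against the registered steps."""
    ctx = {}
    for line in text.strip().splitlines():
        # Drop the leading keyword so only the sentence itself is matched.
        sentence = re.sub(r"^(Given|When|Then|And)\s+", "", line.strip())
        for pattern, func in STEPS:
            match = pattern.fullmatch(sentence)
            if match:
                func(ctx, *match.groups())
                break
        else:
            raise LookupError(f"no step matches: {sentence!r}")
    return ctx


scenario = """
Given a basket containing 3 items
When the customer removes 1 item
Then the basket contains 2 items
"""
run_scenario(scenario)
```

The scenario text is something a stakeholder can read, but every sentence still needs a developer-written function behind it.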

This is good, but I firmly believe we can do better if we go further.

BDD still requires a dev to write the tests. It still adds another layer of communication between those actually using the software, and the creation of this specification. Another layer where the nuances that separate a good product from a great product can be lost.

Which brings me to the core question of all of this.

What if non-developers could write tests?

What is the evolution of X-DD? Test => Behaviour => ...Human?

What if it were possible for product owners and non-technical stakeholders to collaborate over an app's test suite, in the same way that we collaborate every day with Google Sheets or on a Trello board? All without requiring a developer to make the changes for them.

We would create something that is truly human-readable. A single source of truth for an entire software company to reference when building, testing and understanding the software that is being built.

The challenge here is how we move from the squishy, grey, uncertain world of the written word to the measurable, predictable and reliable world of software and testing.

At jbrew, we've been approaching that challenge with two key pillars. Recorded, human testing... and statistical analysis.

We figured that if the tests were written in a natural language, then they'd need to be executed in a runtime that supports that language - i.e. a real person executing that test. The issue therefore is one of consistency. We're sort of dealing with multiple "interpreters" for the same language - two people reading the same specification may not interpret it in the same way. To handle this we've been wrapping these test outputs with some statistical analysis to provide confidence intervals - essentially guidance to help stakeholders manage specifications that have a mixture of pass and fail results.
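As a rough illustration of what a confidence interval over a mixture of pass and fail results could look like, here is a minimal sketch in Python. The post does not say which statistical method is used, so the Wilson score interval below is an assumption, chosen only because it is a common way to put bounds on a pass rate from a small number of runs. The function name `wilson_interval` and the example figures are hypothetical.

```python
import math


def wilson_interval(passes, runs, z=1.96):
    """Wilson score interval for a pass rate.

    passes: number of human test runs recorded as a pass
    runs:   total number of runs of the same specification
    z:      normal quantile; 1.96 corresponds to roughly 95% confidence

    Assumption: runs are treated as independent pass/fail trials.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    if not 0 <= passes <= runs:
        raise ValueError("passes must be between 0 and runs")

    rate = passes / runs
    denom = 1 + z * z / runs
    centre = (rate + z * z / (2 * runs)) / denom
    half_width = z * math.sqrt(
        rate * (1 - rate) / runs + z * z / (4 * runs * runs)
    ) / denom

    # Clamp to [0, 1] to absorb floating-point drift at the extremes.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical example: 8 of 10 testers marked the specification as passing.
low, high = wilson_interval(8, 10)
print(f"pass rate 80%, plausible range {low:.0%} to {high:.0%}")
```

With only ten runs the range is wide, which is the kind of guidance a stakeholder might use to decide whether a specification needs more runs or clearer wording before its result can be trusted.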

As developers ourselves, we also thought it critical for devs to be able to replay individual test runs. Whilst not core to the human-driven-development concept, it meshes well with it and allows us to see exactly why a test didn't pass, instead of requesting screenshots, recordings etc. from those testing.

Recently we've taken a lot of our learning and processes and packaged them up into a specification & testing tool - https://codex.jbrew.co.uk/ - that we hope will be useful to others. It - much like the concepts that it encapsulates - is a work in progress. Your thoughts are very welcome!