When we tell people that Transdata is not built on top of an existing database, the immediate response is almost always “Why not?” If you are going to store, retrieve, and transform data, why go to the trouble of reinventing the wheel? After all, there are plenty of choices these days. Surely one of them would work, right?
The answer is twofold:
1. It presented an interesting challenge.
2. By starting from scratch, we could guarantee that we wouldn’t be held back by any of the inherent limitations of existing paradigms and that future innovation and improvement would be possible.
The first reason is largely what brought me to the project from my graduate work in artificial intelligence, clearly no small change of subject. Getting to create a completely new system in a field not known for change promised to be fascinating. (On a side note, I had worked with graph-like data structures, so this isn’t really such a big leap.) The second reason is the one that really matters to users. From the very beginning, one of the main goals for Transdata was to make the tool flexible enough that it wouldn’t hinder the tasks data workers find themselves doing day in and day out: combining disparate data sources and cleaning up messy or inconsistent data, the sorts of things that are painful (or impossible) to do in spreadsheets. Handling ragged data was a requirement from the start.
It wasn’t immediately obvious that we would start from scratch, but we quickly discovered that existing databases either don’t offer the flexibility we needed or, even worse, try to do everything. Using one would have required so much abstraction between what the user was doing and the actual data manipulation that any efficiency would be completely lost. Whenever possible, I like to write software so that what happens under the hood is as close as possible to what the user sees. In Transdata, data is stored and moved around very much as it appears in the flowchart you see in your model. Not only is that good from a UX standpoint, but it also keeps me sane.
So, was it the right call? Well, we have yet to run into any unsolvable data issues, our codebase is comparatively small and manageable, we get great performance, and we’ve never had anything break due to a change in an external database. Works for me.
-Andy