Why Visual Data Manipulation?
Posted by Andy Kerfonta

Visual Data Manipulation (VDM) differs from traditional methods such as spreadsheets or programming languages by allowing you to visually choose, configure, and combine the transformations and calculations that act upon your data.  While spreadsheets certainly have a visual component, they focus on showing the data itself rather than the operations.  Programming languages like Python, R, or MATLAB hide even more, leaving you with just the input data, the output data, and the code itself (if you are a programmer).

Simple for Beginners

Full-blown programming languages like R or Python are incredibly powerful for data work, but the critical drawback for all of them is that in order to use that power, you must be an experienced user.  That’s no problem if you have a team of programmers, but what about the actual data consumers who aren’t computer scientists?  They’re often stuck using spreadsheets because it’s just not practical to learn a programming language.

VDM splits apart the data and the operations so that you can focus on the flow of your data rather than on deciphering opaque algorithms or learning complicated syntax.  This makes designing your model much closer to drawing a simple flowchart — something programmers often do anyway prior to coding.  Not only is a flowchart natural and intuitive for most people, but it lets you build your model one piece at a time even if you aren’t certain of all the steps at the start.

Since VDM tools provide sets of nodes which each do a single, simple operation, building the steps in your flowchart only requires that you understand those simple operations.  From there, it’s just a question of connecting them together to get the result you need.  
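
To make that contrast concrete, here is a minimal Python sketch of the same idea, where each small function plays the role of one node and the pipeline is nothing more than the connections between them.  The file name, field names, and functions are hypothetical illustrations, not part of any particular VDM tool.

```python
import csv

def load_sales(path):
    # Input node: read raw records from a CSV file.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def filter_region(records, region):
    # Filter node: keep only records for one region.
    return [r for r in records if r["region"] == region]

def total_revenue(records):
    # Aggregate node: sum the revenue field.
    return sum(float(r["revenue"]) for r in records)

# "Connecting" the nodes is just passing one step's output to the next.
records = load_sales("sales.csv")
east_only = filter_region(records, "East")
print(total_revenue(east_only))
```

In a VDM tool, the same three steps would be three nodes on a canvas, with the data visible on every connection instead of hidden inside variables.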

Models Mirror Logic

The ability to build a model step by step, in the same way you think about the problem, is arguably the most important part of VDM.  It means that mistakes are less likely and easier to fix, the time required to get something done is reduced, and the learning curve is gentle.

If you’ve used spreadsheets for very long, you’re familiar with “Excel Hell.”  Workbooks become so complicated and patched together that deciphering the logic, let alone the errors, is next to impossible.  Tracing dependencies can take huge amounts of time, since the only real way to do it is to follow each affected cell back to its source.  In VDM, the connections between nodes show dependencies at a glance, and they also show the data at that specific stage in the process.  If your result is wrong, it’s a simple matter of following the data until the error is uncovered.

It has been estimated that “…88 percent of all spreadsheets have errors in them, while 50 percent of spreadsheets used by large companies have material defects” (1), and that one in five large companies has suffered financial losses due to errors in spreadsheets (2).

The Model as a Report

Once your flowchart-style model is built, it becomes both a fully functioning data process and a useful tool for explaining that process to others.  The report is intrinsic to the model.  Not only is the end result shown, but all of the steps and the results of those steps are clearly visible.

There is no substitute for traditional reporting – graphs, charts, dashboards, and the like – and most VDM tools (Transdata included) can send data to a reporting tool for presentation.  But those tools have a critical flaw: they show only the final dataset(s).  All of the steps taken to arrive at that result are lost, or at best obscured.
With a standard dashboard, you don’t get much of an explanation of where the data came from.  How did you arrive at the results shown?  What logic went into the process?  All too often, these explanations are left to a supplemental discussion or, perhaps, a PowerPoint presentation.  VDM combines the data and the explanation in one intuitive model, so that the process is the story.

 

  1. https://www.cnbc.com/id/100923538
  2. http://www.businessinsider.com/stupid-spreadsheet-mistakes-could-lead-to-corporate-disaster-2015-4
TRANSDATA UPDATES
Posted by Andy Kerfonta

This was quite a large release.  Not only are there a lot of visual and UX improvements, but we also made some substantial changes under the hood and added six new functions.  Here are the highlights:

  • User-defined node and layer colors
  • User-defined layer names
  • New layer toolbox
  • Merge speed improvement (in many cases >10x)
  • Model notes
  • Input names are automatically updated in formulas and other nodes when a node name is changed in the model
  • Improvements to date / time handling
  • Solver improvements
  • New functions:
    • MAKERECTANGULAR
    • RECORDSTOFIELDS
    • FIELDSTORECORDS
    • REMOVEDUPLICATES
    • STDDEVRECORD
    • FIELDFROMUNIXTIME

Nodes and Layers

Node and layer colors

Transdata has had layers for a long time, but they were pretty simple.  They didn’t have names (just numbers) and there was no visual indication of what layer(s) a node might be in.  Not only did we address both of those things, but there’s now a handy toolbox that appears next to your model for toggling layers, changing their names, and changing their colors (the toolbox itself can be toggled from the View or Layers menu).

We didn’t stop at improving the look and feel of just the layers though.  You can now change the color of any node type.  There is also a new default color scheme to maximize the visibility of different node types.

In the screenshot below, you can see the colors and controls in action.  Layer colors change a node’s border, while node colors change its interior.  Note that layer colors and names are saved with each model, but node colors are specific to the Transdata installation.

Behind-the-Scenes Improvements

While they might not be as eye-catching as all the colors, the under-the-hood improvements in this release are quite significant.  Merging got a huge speed improvement for the vast majority of use cases.  You can expect speedups up to 10x for some models.  Date and time handling is also vastly improved thanks to some new libraries.

Changing node names was an easy way to break other nodes since the name changes weren’t automatically propagated throughout the model.  Now, however, formulas are automatically updated for you.
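
As a rough illustration of what that propagation involves (a hypothetical sketch, not Transdata’s actual formula syntax or mechanism), renaming a node amounts to rewriting every formula that refers to it by name:

```python
import re

def rename_node(formulas, old_name, new_name):
    # Replace whole-word references to old_name with new_name in each formula.
    # Hypothetical sketch: a real implementation would parse the formula syntax.
    pattern = re.compile(r"\b%s\b" % re.escape(old_name))
    return {node: pattern.sub(new_name, expr) for node, expr in formulas.items()}

formulas = {"Margin": "Revenue - Costs", "Ratio": "Costs / Revenue"}
print(rename_node(formulas, "Costs", "Expenses"))
# {'Margin': 'Revenue - Expenses', 'Ratio': 'Expenses / Revenue'}
```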

While most of the new functions are straightforward, MAKERECTANGULAR is a bit special.  It forces any sort of ragged (i.e. non-rectangular) data to be rectangular by making sure all fields are present in every record, including duplicates.  This was already done automatically when exporting to things like CSVs that don’t handle ragged data, but now you can explicitly include it in your model.
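
For readers who think in code, here is a minimal Python sketch of that idea (an illustration of the concept only, not Transdata’s implementation, and it ignores the duplicate-field case mentioned above): every record gets every field, with gaps filled by a placeholder value.

```python
def make_rectangular(records, fill=""):
    # Collect every field name that appears in any record, preserving order.
    fields = []
    for record in records:
        for key in record:
            if key not in fields:
                fields.append(key)
    # Rebuild each record so that all fields are present.
    return [{key: record.get(key, fill) for key in fields} for record in records]

ragged = [{"id": 1, "name": "a"}, {"id": 2, "score": 9.5}]
print(make_rectangular(ragged))
# [{'id': 1, 'name': 'a', 'score': ''}, {'id': 2, 'name': '', 'score': 9.5}]
```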