Cleanup and Aggregation of QuickBooks Report Data
Posted on by Andy KerfontaCategories TRANSDATA ApplicationLeave a comment on Cleanup and Aggregation of QuickBooks Report Data

It’s almost a given that small to mid-sized businesses (SMBs) use QuickBooks to handle their accounting.  The combination of easy-to-use and power is hard to beat.

For some businesses, the built-in reports are all the information they ever need.  For others, additional analysis is needed to do things like aggregate based on item type, location, address, or customer.  One of our early users wanted to see the total amount of each product type they shipped to each state.  Nothing crazy, but not something that QuickBooks just gives you.

Doing this sort of analysis in Excel is certainly possible.  Generally, it involves some copying and pasting followed by a pivot table.  To see a great how-to for doing it with Excel, check out this blog by David Ringstrom: Analyze QuickBooks Data with Excel Pivot Tables

Let’s take a look at an example using TRANSDATA® Platform.  We’ll be working with the Profit and Loss Statement Detail report.  The Excel export you get out of QuickBooks is frequently not ideal for analysis.  There are extra columns and rows that are really there to make it look nice to a person.

Here is the QuickBooks export data as it appears:


Our first task then is to tidy up the data to make it more analysis-friendly.  In this case, we will simply get rid of the rows which contain subheadings.  This is easily done with a simple Filter Node in TRANSDATA® to check if the Type field is empty since subheading rows have no Type

Here are the results of that filter:


It looks a lot better, though we do end up with an extra column called “1”, but that’s not really an issue.  Since we’re going to aggregate this data, it won’t show up in the end result.  If you really hate it, you can always remove it (hint – Field Order Node).

Now that our data is cleaned up, let’s try a simple aggregation using an Aggregate Node.  We’ll combine records based on the Type field and sum the Amount field like so:


Here are the results showing the totals for each Type:


Much more useful already, isn’t it?


Now let’s try aggregating the data using multiple fields.  We’ll combine based unique combinations of Type, Name, and Class while still summing the Amount:


Here are the results of this new aggregation:


This is the kind of valuable business information that helps to quickly make decisions.  At a glance, we get the amount from each account.
Here is the completed model:


Just 6 nodes.  It’s available for download on our website.  Try it out with our free version and your own data, or use it as a starting point to build your own QuickBooks analysis model.

TRANSDATA® and Weather
Posted on by Andy KerfontaCategories TRANSDATA ApplicationLeave a comment on TRANSDATA® and Weather

Did you know that NOAA (the National Oceanic and Atmospheric Administration) makes ALL of their local weather data available for download?  It’s an astonishingly thorough dataset going back years and showing hundreds of stations with multiple sample intervals.

Through the National Centers for Environmental Information, you can download datasets for specific locations or the entire country with monthly, daily, and even hourly readings.  While reading up on the data online, I came across this question on Stack Exchange.  The issue is that the data has a field for WeatherType and one for HourlyPrecip, but sometimes, the first shows rain while the latter shows no precipitation.  You would think that this would be impossible.  As it turns out, this is not actually an error, but it illustrates a fairly general type of problem you may want to filter out of your data (i.e. apparent inconsistencies between variables).

I went with the hourly file for the entire country in November of 2016.  It weighs in at a modest 3,978,056 records, each with 44 fields.  If you’re following along, I’d recommend turning on caching.

In addition to the problem mentioned above, I wanted to check if we might be missing hourly reports from some of the stations.  If we are, there will be different numbers of records for each station’s unique ID (the WBAN field).

Without further ado, here’s the model:


The four output data nodes correspond to the answers we are seeking:

  • Output_1 – The number of reports per station
  • Output_2 – The station with the highest number of reports with location data merged onto the record
  • Output_3 – The average number of reports per station
  • Output_4 – Those reports which show rain but no hourly precipitation

Here is the data in Output_1, and right away we see there is an issue:


Every station has a different number of data points.  As it turns out, this is because “hourly” actually means “at least once per hour, maybe more”.  If we take a look at Output_3, we see that on average each station reports 1727 hourly readings (for November anyway).  Note that there are 720 hours in November.  Why all the extras?  A lot of the stations actually report every 15 (or fewer) minutes.  That wasn’t immediately obvious to me when I found the data.  Often, this sort of thing is assumed, and further analysis is done based on these (incorrect) assumptions.

Output_2 shows us the stations with the most reports (8053 of them):


By adding a simple filter to the original model, we can see that the station in Elkins reports every 5 minutes:


Now back to our original question: which data points show rain but no hourly precipitation?  Let’s take a look at Output_4:


We end up with 64,468 records that fit this description, and while we do know that in this case it’s not actually an error, this is the sort of error that is easy to overlook in data.  Maybe instead of rain information, the fields pertain to whether a particular object has been shipped and whether it arrived.  It’s easy to assume that these are always the same, but if that isn’t the case, incorrect and very costly business decisions can happen. -Andy

QuickBooks and Transdata
Posted on by Andy KerfontaCategories TRANSDATA ApplicationLeave a comment on QuickBooks and Transdata


Twenty years ago, I left a big company to work for myself providing consulting services to small business. The consulting company is called EmPower Resources, Inc., which provides management, financial and human resource services. The first thing I did when the company was organized was to purchase and set up QB. It is a good product that millions of companies and individuals use. It becomes even better when coupled with knowledge of bookkeeping and accounting.

In 1992 I received my MBA from James Madison University after four years of part-time study. I have worked with numerous accounting folks who did great jobs doing all of the detailed work required for corporate accounting. In 1996, I was on my own. I needed to take my MBA education and apply it to my own company. For the accounting issues, QB met the need. EmPower survives today, and has helped other companies become successful with most of them using QB.

Beyond the checking account and tax preparation value QB provides, small company owners are challenged to use all of the data (project, cost, revenue, profit, etc.) that QB accumulates. QB has numerous pre-made reports that can be generated, and users can create and memorize their own reports. The information can then be exported to xlsx or csv format. I’ll discuss issues with the xlsx format challenges in a future post. Expecting a small business owner to use all of the data in marketing, operations and financial aspects is daunting.

Using a few of the reports, I helped a two-year old company get back on track after I helped them start the company. Business seemed good, but they were short on cash. I found that they needed to raise prices as their margin had reduced after a few years of inflation. It isn’t that easy to raise prices, but we found a way to do that. I also showed them that their Accounts Receivable had started at 5 days and now was 45 days. One of the owners set collections as a much higher priority so cash could be generated.

So how does that owner use the data for maximum benefit? One way is for the owners to do the analysis themselves, and the TRANSDATA™ Platform is just what one needs for a simpler way to do analytics. Another way is to get accountants and consultants to do the analysis for them. It is costly and it also would likely be a slower iterative process.

I use the TRANSDATA™ Platform, and am always looking for new ways to “eat our own dogfood”. Please contact me at, for more information.

What is Ragged Data?
Posted on by Andy KerfontaCategories TRANSDATA ApplicationLeave a comment on What is Ragged Data?

What is ragged data (and why do we care)?
By Andy Kerfonta

Ragged data (also known as a ragged matrix, irregular data, non-rectangular data, or an irregular matrix) is nothing more than a dataset with a different number of columns on each row. Why is this useful? Let’s look at a quick scenario using some baseball data.

We’ll start with two files: Master.csv and Salaries.csv . Master.csv contains a list of all players with their personal information, one row per player. Salaries.csv contains a list of all recorded salaries, one row per salary, so each player may have many rows or none at all.

What happens if we want to merge (join) these two tables together? In traditional rectangular data, you either end up with duplicated players in a vertical table (one row per salary) or a lot of nulls in a more horizontal table (every player has the same number of salary fields – some unused). Here is what happens when Transdata is used to create ragged data:


What we end up with is something much more logical: each player gets a single record with all available salary data. We can then reference particular salary values or do operations over them all (like average or sum by record). It is even possible to do something simple yet useful such as finding the average of every player’s first salary.

So why is ragged data important? The most obvious answer is that it makes merging data a lot simpler and more intuitive. SQL JOINs are easily derailed by messy or inconsistent input data. Since ragged data by definition cannot depend on data shape, field names must be used. This means input data can be ordered differently or even be missing fields altogether without breaking merges. Data integrity is maintained.

Ragged data can also reduce or eliminate data transpositions (swapping of columns and rows). In SAS, “…it is more efficient to store your data in a vertical format and processing the data is easier in a horizontal format.” The same limitation holds true for most rectangular data software. By being able to merge and operate on ragged data efficiently, the number of steps required for even simple transformations can be significantly reduced, and with it, the time spent.