The new VSTACK and HSTACK dynamic array functions perform a kind of act of rescue for the user; they enormously simplify a time-worn data-management task that much of the time could in fact be largely avoided to begin with.
That task is the business of merging or consolidating data that’s been gathered into multiple datasets. A standard example: a freelancer records his income in a series of datasets, each one archiving one year’s worth of activity. In this stereotypical but utterly plausible scenario, our gig economist entrusts each year’s dataset to a new worksheet tab, begging the obvious analytical question: what does he do if he wants to bring all the data together, say for the purposes of tabulating aggregate earnings by client? Or alternatively, had he allocated a distinct dataset to each client, he’d be similarly challenged to compute earnings by year, or month, etc.
I’ve never been partial to Consolidate and its ungainly interface, but the larger design question is why a user would nowadays want to frame multiple datasets to begin with. By committing all the data to a solitary dataset instead and submitting them all to a pivot table, all sorts of aggregates and field breakouts become possible, and with a notable decrement in effort. Pivot tables are happy to enable year-by-year or client-by-client isolations of the data, and by doing so they stand the consolidation model on its head. After all – it’s easier to summon all the data in one place and break out the numbers from there, rather than strew the data across numerous datasets and then attempt to reassemble the numbers meaningfully, in Humpty-Dumpty style.
But that’s one person’s humble and assuredly unasked-for opinion, and my counsel notwithstanding, large numbers of users doubtless continue to practice the multi-dataset policy. And VSTACK and HSTACK have you in mind.
In effect, VSTACK and HSTACK are the new Consolidate. It allows the user, via a very simple formulaic solution, to leave multiple datasets in place while at the same time granting her license to perform a medley of analyses upon them. VSTACK and HSTACK let you have it both ways.
How VSTACK Is Written
=VSTACK(range1, range2…)
Again, Excel substitutes “array” for “range,” but either term conveys the idea. All you need to do is supply the coordinates of the ranges you wish to stack and VSTACK does the rest. But ease of execution notwithstanding, we’ll see that you do need to bring a strategy to your deployment of VSTACK and decide exactly what you want the function to do.
=VSTACK(D8:F11,J4:L8)
That worked, kind of, but something tells me you won’t be completely happy with what you see.
But the fact is that VSTACK has done exactly what we’ve told it to do. It’s stacked both of the ranges we’ve named, but since the ranges happen to contain the same header row – well, so be it.
=VSTACK(D8:F11,J5:L8)
Here, because UNIQUE acts upon records exhibiting three cells each, it eliminates one of the identical headers. (Note that while the UNIQUE remedy works for the above scenario, if some of the records are fully duplicated, all but one of them will be eliminated by UNIQUE as well. Thus, yet another alternative could be brought to bear on the problem: =VSTACK(D8:F11,DROP(J4:L8,1)). The DROP function parses a specified number of rows from a dataset – in the case of J4:J8, row 1 featuring its header – and will be discussed in Chapter 15.)
the “header” simply will surrender itself to the sort, unlike bona fide dataset headers that will be left in place.
And what this means, at least in part, is that if you want to treat your newly stacked data as a conventional dataset, you could simply run a Copy > Paste Values over the results, and you’re ready to go. Then your headers will remain in place.
But what if you want to VSTACK the ranges while at the same time reserving the right to add new records to any of the contributing ranges – and seeing to it that VSTACK will incorporate those new records on the fly?
=VSTACK(Table1, Table2)
The VSTACK formula now references two tables; and because it does, additional data entry to either table will immediately be reflected in the VSTACK output.
When the Data Aren’t There
We’ll learn one simple workaround for this dilemma in the next chapter on the new CHOOSECOLS and CHOOSEROWS functions, and an additional related discussion appears in the chapter on the EXPAND function.
Another Problem, and a Workaround
But a blank cell isn’t quite a zero. VSTACK misleadingly records Jill’s missing test score as 0, when in fact she may have simply missed the exam, and Mary’s 0 ID likewise doesn’t sound like a number she’d actually be assigned. And moreover, Jill’s errant 0 artificially drags the class average down.
You can in fact conceal any zeros scattered across a worksheet by clicking File ➤ Options ➤ Advanced ➤ Display Options for this Worksheet and ticking off Show a zero in cells that have zero value. But that decision merely obscures the zero from view, and won’t suppress its mathematical value. The cell will continue to evaluate to zero.
=IF(ISBLANK(VSTACK(Table1,Table2)),“”,VSTACK(Table1,Table2))
HSTACK: Not Just VSTACK at a Right Angle
=HSTACK(Range1, Range2…)
HSTACK aligns ranges alongside one another, and not atop them, unlike VSTACK. And as a result, HSTACK can perform a very useful data-organization service by extracting selected fields from a larger dataset, thus discarding fields that you decide are irrelevant.
=HSTACK(Country,Salesperson,Order_Amount)
(Again, you’d need to reformat the Order_Amount data suitably. Also, remember that the original field headers aren’t returned atop the results, nor do the original field range names somehow carry over to this new mini-dataset.) If you’re happy with the way the data look you could next perform a Copy > Paste Values over the data, add a header field, and proceed to treat the new output as a standard, slimmed-down dataset. In addition, recall as always that the HSTACK output is a dynamic array spill range, and so if you’ve written HSTACK to say, H6, you’d refer to the data in another formula as H6#.
Thus, we see that a bit of irony is at work here. While the two stack functions are meant to fuse independent ranges together into a larger dataset, they can also reduce the bulk of existing datasets by tossing out extraneous fields that you don’t need, stacking what remains. This capacity of HSTACK for paring larger datasets down to only the fields with which you want to work is a most worthwhile feature, in view of the fact that some datasets contain dozens of fields, many of which simply won’t have any relevance to your work with them.
And there’s more. If you redefine the original salesperson dataset as a table and continue to enter new records there, those entries will also automatically appear in the HSTACK-generated results too. Indeed – you can even write an HSTACK formula drawn from the salesperson dataset to a different workbook, and changes to the source dataset will still be reflected in the HSTACK results.
=HSTACK(Country:Order_Date)
Using HSTACK to Rearrange the Whole Dataset
And for a related option, if you wanted to edit the order in which all the above fields appear, you could write, for example,
=HSTACK(Salesperson,Country,Order_ID,Order_Date,Order_Amount)
Here HSTACK reassigns the positions of all the dataset fields – but again minus the field headers.
The INDIRECT function appropriates the text entry “art” and retools it into a reference to the range of the same name.
=name&“ ”&INDIRECT(K4)
And returned the data as text, thus mandating a round of hoop-jumping in order to disentangle the names from the scores and install the two sets of data into separate columns.
HSTACK and the One-Formula Name Split
=HSTACK(TEXTBEFORE(A1:A10,“ ”,-1),TEXTAFTER(A1:A10,“ ”,-1))
Of course, we could perform this same feat with many thousands of names, all submitting to that single formula.
Coming Up
While VSTACK and HSTACK can build larger data wholes by amalgamating discrete ranges, the next chapter describes a related pair of functions that modify the sizes of existing datasets through a kind of data shrinkage – CHOOSECOLS and CHOOSEROWS. See you soon.