© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2023
A. I. KatzUp Up and Array!https://doi.org/10.1007/978-1-4842-8966-2_13

13. VSTACK and HSTACK: What They’re About

Abbott Ira Katz1  
(1)
Edgware, UK
 

The new VSTACK and HSTACK dynamic array functions perform a kind of act of rescue for the user; they enormously simplify a time-worn data-management task that much of the time could in fact be largely avoided to begin with.

That task is the business of merging or consolidating data that’s been gathered into multiple datasets. A standard example: a freelancer records his income in a series of datasets, each one archiving one year’s worth of activity. In this stereotypical but utterly plausible scenario, our gig economist entrusts each year’s dataset to a new worksheet tab, begging the obvious analytical question: what does he do if he wants to bring all the data together, say for the purposes of tabulating aggregate earnings by client? Or alternatively, had he allocated a distinct dataset to each client, he’d be similarly challenged to compute earnings by year, or month, etc.

Those kinds of questions have often been put to Excel’s primal Consolidate feature, a click-intensive means for referring multiple datasets to this dialog box and realizing aggregate outcomes, as in Figure 13-1.

A desktop screenshot of the consolidate tab with options such as function, reference, all references, and checkboxes for the top row, left column, and create links. An o k button is on the bottom right.

Figure 13-1

Consolidate – old-school options for merging datasets

I’ve never been partial to Consolidate and its ungainly interface, but the larger design question is why a user would nowadays want to frame multiple datasets to begin with. By committing all the data to a solitary dataset instead and submitting them all to a pivot table, all sorts of aggregates and field breakouts become possible, and with a notable decrement in effort. Pivot tables are happy to enable year-by-year or client-by-client isolations of the data, and by doing so they stand the consolidation model on its head. After all – it’s easier to summon all the data in one place and break out the numbers from there, rather than strew the data across numerous datasets and then attempt to reassemble the numbers meaningfully, in Humpty-Dumpty style.

But that’s one person’s humble and assuredly unasked-for opinion, and my counsel notwithstanding, large numbers of users doubtless continue to practice the multi-dataset policy. And VSTACK and HSTACK have you in mind.

In effect, VSTACK and HSTACK are the new Consolidate. It allows the user, via a very simple formulaic solution, to leave multiple datasets in place while at the same time granting her license to perform a medley of analyses upon them. VSTACK and HSTACK let you have it both ways.

How VSTACK Is Written

VSTACK’s syntax is about as easy as it gets:
  • =VSTACK(range1, range2…)

Again, Excel substitutes “array” for “range,” but either term conveys the idea. All you need to do is supply the coordinates of the ranges you wish to stack and VSTACK does the rest. But ease of execution notwithstanding, we’ll see that you do need to bring a strategy to your deployment of VSTACK and decide exactly what you want the function to do.

Consider this simple starter example (file: VSTACK) of two diminutive datasets that have been posted to the same sheet in D8:F11 and J4:L8 for the ease of visual reference in Figure 13-2.

A table has 10 columns and 9 rows. The second, third, and fourth columns with rows 6 to 9, and the eighth, ninth, and tenth columns with rows 2 to 6 have data for name, I D, and score.

Figure 13-2

Two mini datasets in pre-stack mode

Here we could write
  • =VSTACK(D8:F11,J4:L8)

And we see the following, per Figure 13-3.

A table has 3 columns and 8 rows. The function involved is, equals v stack left parenthesis D 8 colon F 11, J 4 colon L 8 right parenthesis. The column headers are name, I D, and score. The name is highlighted.

Figure 13-3

The headers are stacked against us

That worked, kind of, but something tells me you won’t be completely happy with what you see.

But the fact is that VSTACK has done exactly what we’ve told it to do. It’s stacked both of the ranges we’ve named, but since the ranges happen to contain the same header row – well, so be it.

Thus we see that VSTACK isn’t “smart” – it won’t recognize a header row as such and grant it any special status; rather the function will simply deem every row of data as a row of the same species. And so, if you want these stacked ranges to sport but one header, you need to rewrite the formula to
  • =VSTACK(D8:F11,J5:L8)

Since VSTACK piles the ranges in the order in which they appear in the formula, and since a second header row had been posted to J4, that row needs to be written out, as we’ve done in Figure 13-4.

A table has 4 columns and 7 rows. The function involved is, equals v stack left parenthesis D 8 colon F 11, J 5 colon L 8 right parenthesis. The column headers are name, I D, and score. The name is highlighted.

Figure 13-4

One row purged, two ranges merged

Another interesting way to stack the ranges while retaining just one header row would be to write the following, as in Figure 13-5.

A table has 3 columns and 7 rows. The function involved is, equals unique left parenthesis v stack left parenthesis D 8 colon F 11, J 4 colon L 8 close all parenthesis. Column headers are name, I D, and score. The name is highlighted.

Figure 13-5

Avoiding duplication: another way to oust one header row

Here, because UNIQUE acts upon records exhibiting three cells each, it eliminates one of the identical headers. (Note that while the UNIQUE remedy works for the above scenario, if some of the records are fully duplicated, all but one of them will be eliminated by UNIQUE as well. Thus, yet another alternative could be brought to bear on the problem: =VSTACK(D8:F11,DROP(J4:L8,1)). The DROP function parses a specified number of rows from a dataset – in the case of J4:J8, row 1 featuring its header – and will be discussed in Chapter 15.)

But even if you banish one of the headers, VSTACK will continue to regard the remaining one as just another row. If you write the following, per Figure 13-6,

A table has 3 columns and 8 rows. The function involved is, sort left parenthesis v stack left parenthesis D 8 colon F 11, J 5 colon L 8 close all parentheses. Column headers are name, I D, and the score is in row 7. Amy is highlighted.

Figure 13-6

Don’t lose your header

the “header” simply will surrender itself to the sort, unlike bona fide dataset headers that will be left in place.

And what this means, at least in part, is that if you want to treat your newly stacked data as a conventional dataset, you could simply run a Copy > Paste Values over the results, and you’re ready to go. Then your headers will remain in place.

But what if you want to VSTACK the ranges while at the same time reserving the right to add new records to any of the contributing ranges – and seeing to it that VSTACK will incorporate those new records on the fly?

Not a problem. Just turn each range into a table, as you see in Figure 13-7.

A table has 2 tables on the top right with 3 columns and 5 rows and the bottom left with 3 columns and 4 rows. The column headers are name, I D, and score with drop-down buttons.

Figure 13-7

Turning the tables on the ranges

And once that deed is done, write
  • =VSTACK(Table1, Table2)

And you’ll see the following, as in Figure 13-8.

A table has 2 tables on the top right with 3 columns and 5 rows, and the center-left with 3 columns and 4 rows. The column headers are name, I D, and score. A merged list of table 1 and table 2 is on the bottom right.

Figure 13-8

VSTACK going live – any new records entered in either table will appear in the stacked result

The VSTACK formula now references two tables; and because it does, additional data entry to either table will immediately be reflected in the VSTACK output.

Note, by the way, that if I introduce a new record to Table1 – the one whose first entry features the name Ted – that record will insert itself between the name Mary – the last entry in Table1 – and the name Jack, the first record of Table2, illustrated in Figure 13-9.

A table has 2 tables on the top right with 3 columns and 5 rows, and the center-left with 3 columns and 5 rows. The column headers are name, I D, and score. A merged list of table 1 and table 2 is on the bottom right.

Figure 13-9

Middle of the pack: the new record Dan from Table1 remains among the Table1 data

That is, Dan won’t seek the next available row in the stack; instead, it continues to cast its lot with the other data of Table1. On the other hand, of course, the new records could be sorted, thus overriding their initial positions, per Figure 13-10.

A table has 3 columns and 8 rows. The function involved is, equals SORT left parenthesis v stack left parenthesis table 1, table 2 close all parenthesis. The name Amy in column 1, row 1 is highlighted.

Figure 13-10

All together now – data from both tables are sorted together

When the Data Aren’t There

And you’ve probably noted that when tables are melded by VSTACK, their header rows simply disappear, much as headers vanish from FILTER results. Moreover, if one of the tables featuring in a VSTACK numbers more columns/fields than the others, that discrepancy is inflicted upon the result, as noted by Figure 13-11.

A table has 2 tables on the top right with 3 columns and 5 rows, and the center-left with 4 columns and 5 rows. The column headers are name, I D, score, and subject. A merged list of table 1 and table 2 is on the bottom right.

Figure 13-11

More than you bargained for; an extra field in one table carries over into VSTACK’s results

We’ll learn one simple workaround for this dilemma in the next chapter on the new CHOOSECOLS and CHOOSEROWS functions, and an additional related discussion appears in the chapter on the EXPAND function.

Another Problem, and a Workaround

And there’s another subtle VSTACK complication that could pay an unwanted visit to your formula. If any data in the contributing tables happen to be missing – a state of affairs known to befall real-world spreadsheets – those vacant cells show up in a VSTACK as zeros, as in Figure 13-12.

A table has 2 tables on the top center with 3 columns and 5 rows, and the center-left with 3 columns and 5 rows. The column headers are name, I D, and score. A merged list of table 1 and table 2 is on the bottom right.

Figure 13-12

Zeroing in on blank data

But a blank cell isn’t quite a zero. VSTACK misleadingly records Jill’s missing test score as 0, when in fact she may have simply missed the exam, and Mary’s 0 ID likewise doesn’t sound like a number she’d actually be assigned. And moreover, Jill’s errant 0 artificially drags the class average down.

Note

You can in fact conceal any zeros scattered across a worksheet by clicking File ➤ Options ➤ Advanced ➤ Display Options for this Worksheet and ticking off Show a zero in cells that have zero value. But that decision merely obscures the zero from view, and won’t suppress its mathematical value. The cell will continue to evaluate to zero.

But help is available. Those pesky zeros can be shown the door with this update to the formula:
  • =IF(ISBLANK(VSTACK(Table1,Table2)),“”,VSTACK(Table1,Table2))

The formula searches the data in both tables for truly blank cells – which again Excel evaluates as zeroes. If it finds any such cells, it performs a kind of redundant substitution; it replaces the blanks with “” – the character expression for a blank, which won’t be treated as zero. As a result, the formula yields the following in Figure 13-13.

A table has 7 columns and 10 rows. The function involved is equal if left parenthesis v stack left parenthesis table 1, table 2 right parenthesis, equals 0, double quotation, v stack open parenthesis table 1, table 2 close all parentheses.

Figure 13-13

Let’s schedule a makeup exam for Jill

HSTACK: Not Just VSTACK at a Right Angle

Because datasets treat new fields in columns differently from new records in rows, the HSTACK function performs a different set of labors in the data-stacking process, even though it’s written similarly:
  • =HSTACK(Range1, Range2…)

HSTACK aligns ranges alongside one another, and not atop them, unlike VSTACK. And as a result, HSTACK can perform a very useful data-organization service by extracting selected fields from a larger dataset, thus discarding fields that you decide are irrelevant.

Here’s what that means: If we return to that familiar collection of sales data (file: HSTACK), with each field again named after its header brought back to in Figure 13-14,

A table has 5 columns and 15 rows. The column headers are country, salesperson, order date, order I D, and order amount. All the cells are filled with data.

Figure 13-14

More of the same: sales data ripe for HSTACKing

perhaps we’re only interested in the data gleaned from the Country, Salesperson, and Order_Amount fields. If so, we could write
  • =HSTACK(Country,Salesperson,Order_Amount)

And that expression would yield the following, in excerpt, per Figure 13-15.

A table has 4 columns and 15 rows. The function involved is, equals H stack left parenthesis country, salesperson, order underscore amount right parenthesis. France in column 1, row 1 is highlighted.

Figure 13-15

Dataset made to order: three fields selected from the original five

(Again, you’d need to reformat the Order_Amount data suitably. Also, remember that the original field headers aren’t returned atop the results, nor do the original field range names somehow carry over to this new mini-dataset.) If you’re happy with the way the data look you could next perform a Copy > Paste Values over the data, add a header field, and proceed to treat the new output as a standard, slimmed-down dataset. In addition, recall as always that the HSTACK output is a dynamic array spill range, and so if you’ve written HSTACK to say, H6, you’d refer to the data in another formula as H6#.

Thus, we see that a bit of irony is at work here. While the two stack functions are meant to fuse independent ranges together into a larger dataset, they can also reduce the bulk of existing datasets by tossing out extraneous fields that you don’t need, stacking what remains. This capacity of HSTACK for paring larger datasets down to only the fields with which you want to work is a most worthwhile feature, in view of the fact that some datasets contain dozens of fields, many of which simply won’t have any relevance to your work with them.

And there’s more. If you redefine the original salesperson dataset as a table and continue to enter new records there, those entries will also automatically appear in the HSTACK-generated results too. Indeed – you can even write an HSTACK formula drawn from the salesperson dataset to a different workbook, and changes to the source dataset will still be reflected in the HSTACK results.

And one more formula option: if you want to apply HSTACK to a subset of fields from the same dataset and those fields happen to be adjoining, for example, Country, Salesperson, and Order Date, you can also write
  • =HSTACK(Country:Order_Date)

Using HSTACK to Rearrange the Whole Dataset

And for a related option, if you wanted to edit the order in which all the above fields appear, you could write, for example,

  • =HSTACK(Salesperson,Country,Order_ID,Order_Date,Order_Amount)

Here HSTACK reassigns the positions of all the dataset fields – but again minus the field headers.

And for another similar but not identical example, if we’re working with our earlier test-score dataset (file: HSTACKstudent grades), each of whose fields carries the name of its header, reviewed by Figure 13-16,

A table has 7 columns and 10 rows. The column headers are name, s o c, Phil, poli sci, art, physics, and chem. All the cells are filled with data.

Figure 13-16

Exam review, this time with HSTACK

we could enter a subject name in say, K4, and write the following, as captured by Figure 13-17.

A table has 2 columns and 13 rows. The function involved is, equals H stack left parenthesis name, indirect left parenthesis K 4 close all parentheses. The name Bill in column 1, row 4 is highlighted.

Figure 13-17

One formula yielding two independent fields

The INDIRECT function appropriates the text entry “art” and retools it into a reference to the range of the same name.

HSTACK’s advantage here is its facility for returning two (or more) distinct fields with one formula, but at the same time allowing each field to occupy its own range of cells. By contrast, a string concatenation formula attempting the same thing would have collapsed the fields into one range, for example,
  • =name&“ ”&INDIRECT(K4)

And returned the data as text, thus mandating a round of hoop-jumping in order to disentangle the names from the scores and install the two sets of data into separate columns.

HSTACK and the One-Formula Name Split

Now if we make our way back to that batch of names with which we earlier put out TEXTBEFORE and TEXTAFTER through their pace (here, the file: HSTACK – separating names), we can direct HSTACK to a similar name-splitting task, but this time accomplishing the deed with a single formula. Say we want to engineer a name split such that all middle names join with the first names; if so, we can write in B1
  • =HSTACK(TEXTBEFORE(A1:A10,“ ”,-1),TEXTAFTER(A1:A10,“ ”,-1))

The first half of the formula searches for all the names preceding the last space in the cell, keyed to the negative number that inaugurates the search in a right-to-left direction; the second segment looks out for all names that follow the cell’s last space, likewise hinging its search on a negative number. HSTACK then welds both expressions into one larger formula, yielding the following, in Figure 13-18.

A table has 9 columns and 10 rows. The formula is written in the first row. The name Angelina is highlighted. First, second, and third columns are filled with names.

Figure 13-18

Plenty of names, one formula

Of course, we could perform this same feat with many thousands of names, all submitting to that single formula.

Coming Up

While VSTACK and HSTACK can build larger data wholes by amalgamating discrete ranges, the next chapter describes a related pair of functions that modify the sizes of existing datasets through a kind of data shrinkage – CHOOSECOLS and CHOOSEROWS. See you soon.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.138.37.20