Much of the time a dataset warehouses more information than you currently need. That’s why data are so often filtered, after all; and the decision to filter can cut back the data from one of two vantage points: you may need to filter the fields you need and shun the others, but at other times you may leave all the fields intact and apply the filter to selected records. Whichever route you take, the new CHOOSECOLS and CHOOSEROWS dynamic array functions will help point you in the right direction.
We’ve just learned about the VSTACK and HSTACK functions, the latter of which can pluck a subset of fields from a wider dataset and reap a new dataset from the harvest. CHOOSECOLS does something comparable, but with a few important differences that you’ll want to know about.
How CHOOSECOLS Is Written
=CHOOSECOLS(range/array,column_number1, column_number2…)
(The multiple column number selections in effect reiterate the same argument as needed.) CHOOSECOLS requires as its point of inception an existing range or array, from which you decide which fields – or columns – interest you. And those columns are referred to in straightforward numeric terms.
=CHOOSECOLS(All,1,3,5)
Again, the now-standard cautions about these spilled results need to be recalled: the output could do with some reformatting, and the source field headers are absent, befitting a spilled-range result. And again, if you reconstitute the original salesperson dataset into a table, any new data entry conducted there will automatically be shipped to the above CHOOSECOLS results; that is, the new data remain connected to the old.
Now in fact we carried out an exercise in the previous chapter that resembles the one above; there, too, we charged HSTACK with the task of separating several fields from the one dataset and repackaging them into a fresh one. But what’s different here is that CHOOSECOLS identifies the fields by their column or field numbers, while HSTACK requires a field name or set of range coordinates. And the reason why CHOOSECOLS can work with numbers – simpler references, after all – is because it works with fields belonging to the same dataset. Since CHOOSECOLS’ first argument asks for a dataset reference, it naturally knows to which number field(s) the ensuing number arguments refer. HSTACK, on the other hand, can amalgamate disconnected fields from anywhere in the workbook (or even beyond), and so must rely on specific field names or range coordinates.
=CHOOSECOLS(All,5,3,1)
=CHOOSECOLS(All,-1,-3)
=CHOOSECOLS(All,-2)
would extract the next-to-last field of a 279-field dataset, sparing you the chore of knowing to precisely request field number 278.
Keep in mind, however, that this kind of functionality can’t be likened to the significance of negative references in the TEXTBEFORE and TEXTAFTER functions; there, negative values identify different positions of the same delimiter in a cell in which it appears multiple times (e.g., spaces separating each word), and as such can return different numbers of words. Here, the negative numbers always tap a single field, as do the positive values.
Teaming with Other Functions
In any event, once you acquaint yourself with the field/column number rule (analogous to the sort index argument in the SORT function, which also requests the number of a column/field) you can combine CHOOSECOLS with other useful functions – but you have to proceed with care. This formula
=SORT(CHOOSECOLS(All,1,3,5),2)
=SORT(CHOOSECOLS(All,1,5,3),2)
=SORT(CHOOSECOLS(All,1,3,5),4)
will yield an error message – because CHOOSECOLS has identified only three columns.
=FILTER(CHOOSECOLS(All,1,3,5),Salesperson=I2)
The usual formatting issues aside, we see a curious mix and match at work: CHOOSECOLS employs its standard column-number references, while as the FILTER segment insists that we enter a field name – Salesperson – by which the output is filtered. Of course, we’re viewing a double filter, as it were – a filter of Peacock’s sales activity, filtered in turn by only three of the dataset’s fields. And of course, we’re filtering for salesperson, even though that field is nowhere to be found in the CHOOSECOLS segment of the formula.
CHOOSECOLS and Too Much Information
=CHOOSECOLS(VSTACK(Table1,Table2),1,2,3)
Now the above strategy works for a VSTACK in which you need to reduce the field count of one of the stacked datasets. If you need to head in the other direction – that is, you want to increase the field count of one of the datasets in order for it to stack properly with another dataset having more fields – that task awaits the discussion of the EXPAND function.
=FILTER(INDEX(All,SEQUENCE(ROWS(All)),{1,3,5}),Salesperson=I2)
Somehow, I suspect you’ll prefer the CHOOSECOLS alternative.
CHOOSEROWS: Record-Braking
As we indicated at the outset of the chapter, the two CHOOSE functions perform filtering operations of sorts: CHOOSECOLS excises fields from a dataset while retaining others, and CHOOSEROWS gives selected records the boot while holding on to a desired remainder.
CHOOSEROWS indeed filters records, but in a relatively coarse manner, if you need to filter all sales conducted in France, or learn which sales exceeded or equalled $2,000, then you’d turn to the FILTER function. CHOOSEROWS can’t apply detailed criteria, unlike FILTER’s sweeping capabilities; rather, it simply grabs the rows that you specify, which in turn may serve some broader purpose.
=CHOOSEROWS(range/array,row_number1, row_number2…)
=CHOOSEROWS(All,1,2,3,4,5)
(Again, you’ll be faced with the usual reformatting necessities.)
=CHOOSEROWS(All,22,23,24,25,26)
=CHOOSEROWS(All,17,9,24,31,11)
=CHOOSEROWS(All,1,2,3,4,5,6,7,8,…)
That is, do I have to key in every row reference from 1 to 100?
=CHOOSEROWS(All,SEQUENCE(100))
Our old friend SEQUENCE can supply all 100 row values. It’s a dynamic array formula, after all – with each of the SEQUENCE-spawned values from 1 to 100 choosing the dataset row bearing that number.
=CHOOSEROWS(All,-1)
will dredge the very last row in the dataset.
Where the Row Count Starts
=CHOOSEROWS(All,1,2,3,4,5)
attests to the first row in the dataset, irrespective of where the data actually start in the worksheet. 1 always signifies the row position of a dataset’s first record – here, the one immediately beneath the header row.
Unique Records in Non-adjacent Fields: Take 2
You may remember our discussion in the chapter devoted to the UNIQUE function about drawing unique records from the data in two non-adjacent fields, one that was hinged upon a fairly ornate application of the INDEX function. But CHOOSECOLS can essay the same task with a far simpler formula.
Here, the formula simply calculates every unique combination of the records drawn from the dataset’s first and third column – the ones containing Country and Salesperson. It’s so easy you’ll think you did something wrong.
Using CHOOSEROWS to Fill in the Blanks
Now you may remember a few of the exercises we described in the FILTER chapter aiming to filter a dataset rife with blank rows. Now let’s direct CHOOSEROWS to those data, and demonstrate how the function can easily eliminate all those rows and thus craft an eminently usable dataset (file: CHOOSEROWS – blank rows).
But first, we need to review the way in which CHOOSEROWS enumerates the rows with which it works. Remember that our dataset, bearing our usual default name of All and featuring 3788 rows, actually commences in row 2 (see the introductory notes in Chapter 3 under the “Field Notes: Field and Dataset Names” heading). And that row – row number 1 in the dataset, the one immediately beneath the header row – is blank. That means that the first data-bearing row in the dataset is, in the language of CHOOSEROWS, row 2.
=CHOOSEROWS(All,SEQUENCE(1894,,2,2))
How does this formula work? Its SEQUENCE element requests 1894 rows – half of the 3798 rows contributing to the dataset, and thus the number of rows containing data. It kicks off its sequence with the number 2 – again, representing the first row in the dataset featuring data – and increments each following value by a step of 2. Thus, CHOOSEROWS is presented with row values 2,4,6,8…3798, each corresponding to a row armed with data – and proceeds to choose precisely, and only, those rows.
If that looks good and you want the results to serve as your dataset for subsequent analysis, you can aim a Copy > Paste Values routine at the spill range – and remember, it is a spill range. Next, you can copy and paste the original headers above it all, and you’re ready to go – just range-name each field via Create from Selection.
Coming Next: Subjecting Rows and Columns to a Different TAKE
You’ll see that the next pair of new dynamic array functions we’ll examine bear a family resemblance to CHOOSECOLS and CHOOSEROWS, but they certainly don’t qualify as identical twins. They’re called TAKE and DROP, and you’ll see what they share in common with the two CHOOSE functions – and what they don’t.