© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2023
A. I. KatzUp Up and Array!https://doi.org/10.1007/978-1-4842-8966-2_14

14. CHOOSECOLS and CHOOSEROWS: Less Is More

Abbott Ira Katz1  
(1)
Edgware, UK
 

Much of the time a dataset warehouses more information than you currently need. That’s why data are so often filtered, after all; and the decision to filter can cut back the data from one of two vantage points: you may need to filter the fields you need and shun the others, but at other times you may leave all the fields intact and apply the filter to selected records. Whichever route you take, the new CHOOSECOLS and CHOOSEROWS dynamic array functions will help point you in the right direction.

We’ve just learned about the VSTACK and HSTACK functions, the latter of which can pluck a subset of fields from a wider dataset and reap a new dataset from the harvest. CHOOSECOLS does something comparable, but with a few important differences that you’ll want to know about.

How CHOOSECOLS Is Written

Like its VSTACK and HSTACK cousins, CHOOSECOLS is easy to write, consisting of two essential arguments:
  • =CHOOSECOLS(range/array,column_number1, column_number2…)

(The multiple column number selections in effect reiterate the same argument as needed.) CHOOSECOLS requires as its point of inception an existing range or array, from which you decide which fields – or columns – interest you. And those columns are referred to in straightforward numeric terms.

For example, if we recycle our salesperson data here (file: CHOOSECOLS), we can write in G6
  • =CHOOSECOLS(All,1,3,5)

That formula asks CHOOSECOLS to prise columns 1, 3, and 5 from the dataset – that is, the Country, Order_Date, and Order_Amount fields, as evoked by Figure 14-1.

A screenshot of an excel sheet. It has 3 columns with entries in 14 rows and the formula CHOOSE COLS at the top. The first cell in column 1 is highlighted.

Figure 14-1

Three out of five ain’t bad: CHOOSECOLS selects fields from the salesperson data.

Again, the now-standard cautions about these spilled results need to be recalled: the output could do with some reformatting, and the source field headers are absent, befitting a spilled-range result. And again, if you reconstitute the original salesperson dataset into a table, any new data entry conducted there will automatically be shipped to the above CHOOSECOLS results; that is, the new data remain connected to the old.

Now in fact we carried out an exercise in the previous chapter that resembles the one above; there, too, we charged HSTACK with the task of separating several fields from the one dataset and repackaging them into a fresh one. But what’s different here is that CHOOSECOLS identifies the fields by their column or field numbers, while HSTACK requires a field name or set of range coordinates. And the reason why CHOOSECOLS can work with numbers – simpler references, after all – is because it works with fields belonging to the same dataset. Since CHOOSECOLS’ first argument asks for a dataset reference, it naturally knows to which number field(s) the ensuing number arguments refer. HSTACK, on the other hand, can amalgamate disconnected fields from anywhere in the workbook (or even beyond), and so must rely on specific field names or range coordinates.

And as with HSTACK, nothing prevents you from resequencing fields with CHOOSECOLS, for example,
  • =CHOOSECOLS(All,5,3,1)

And just for the record, CHOOSECOLS also allows you to earmark column numbers via negative references that point right to left. Thus
  • =CHOOSECOLS(All,-1,-3)

will sift and return Order_Amount and Order_Date from the dataset, and in that order. This option might conceivably be of use should you need to reference a field near the far end of a very extensive dataset comprising hundreds of fields. Thus, for example,
  • =CHOOSECOLS(All,-2)

would extract the next-to-last field of a 279-field dataset, sparing you the chore of knowing to precisely request field number 278.

Keep in mind, however, that this kind of functionality can’t be likened to the significance of negative references in the TEXTBEFORE and TEXTAFTER functions; there, negative values identify different positions of the same delimiter in a cell in which it appears multiple times (e.g., spaces separating each word), and as such can return different numbers of words. Here, the negative numbers always tap a single field, as do the positive values.

Teaming with Other Functions

In any event, once you acquaint yourself with the field/column number rule (analogous to the sort index argument in the SORT function, which also requests the number of a column/field) you can combine CHOOSECOLS with other useful functions – but you have to proceed with care. This formula

  • =SORT(CHOOSECOLS(All,1,3,5),2)

will in fact sort the three selected fields by the second field cited in the formula, that is, Order_Date. The 2 does not refer to Salesperson, which is positioned as the second field in the entire dataset. And so
  • =SORT(CHOOSECOLS(All,1,5,3),2)

will deliver a sort driven by the Order_Amount field, which is positioned in second place in the above formula. And this expression
  • =SORT(CHOOSECOLS(All,1,3,5),4)

will yield an error message – because CHOOSECOLS has identified only three columns.

On the other hand, CHOOSECOLS will work deftly with the FILTER function, and with great flexibility. To demonstrate, write the salesperson name Peacock in I2 and write in I4
  • =FILTER(CHOOSECOLS(All,1,3,5),Salesperson=I2)

You should see the following (in excerpt) via Figure 14-2.

A screenshot of an excel sheet has 3 columns with entries in 17 rows. The second row has the formula FILTER and CHOOSE COLS. The first cell in row 3 is highlighted.

Figure 14-2

Have it your way; choose your columns and your salesperson

The usual formatting issues aside, we see a curious mix and match at work: CHOOSECOLS employs its standard column-number references, while as the FILTER segment insists that we enter a field name – Salesperson – by which the output is filtered. Of course, we’re viewing a double filter, as it were – a filter of Peacock’s sales activity, filtered in turn by only three of the dataset’s fields. And of course, we’re filtering for salesperson, even though that field is nowhere to be found in the CHOOSECOLS segment of the formula.

And what this also means is that if we want to filter the data by say, Country – one of the fields that have been designated by CHOOSECOLS – we still need to write the formula this way, as in Figure 14-3.

A screenshot of an excel sheet has 3 columns with entries in 17 rows. The second row has the formula FILTER and CHOOSE COLS. The first cell with the entry France in row 3 is highlighted.

Figure 14-3

Vive la formula; filtering for France

CHOOSECOLS and Too Much Information

In the course of our review of VSTACK in the previous chapter, we encountered a pair of tables, one of which featured an extra field that was treated by VSTACK as follows, reprised in Figure 14-4.

A screenshot of an excel sheet has 3 parts. The first part has a table with 4 columns of name, I D, score, and subject with 4 rows of data. The second part has a table with 3 columns of name, I D, and score with 4 rows of data. The third part has 4 columns with entries for 8 rows and the formula V STACK on top. The last 4 rows of the last column read hashtag N slash A.

Figure 14-4

One field too many

We stated then that CHOOSECOLS can afford a swift way out of this dilemma, by enabling us to rewrite the above formula as
  • =CHOOSECOLS(VSTACK(Table1,Table2),1,2,3)

By restricting the two stacked tables to their first three fields only, the ones that they share, CHOOSECOLS rids the stack of the superfluous fourth field, per Figure 14-5.

A screenshot of an excel sheet has 3 parts. The first part has a table with 4 columns of name, I D, score, and subject with 4 rows of data. The second part has a table with 3 columns of name, I D, and score with 4 rows of data. The third part has 3 columns with entries for 8 rows and the formula CHOOSE COLS and V STACK on top.

Figure 14-5

Three-column structure, all lined up

Now the above strategy works for a VSTACK in which you need to reduce the field count of one of the stacked datasets. If you need to head in the other direction – that is, you want to increase the field count of one of the datasets in order for it to stack properly with another dataset having more fields – that task awaits the discussion of the EXPAND function.

In sum, what CHOOSECOLS brings to the process is a far easier way to tap into only those fields that interest you. As you may recall from our FILTER discussions, a pre-CHOOSECOLS take on the above formula might look like this:
  • =FILTER(INDEX(All,SEQUENCE(ROWS(All)),{1,3,5}),Salesperson=I2)

Somehow, I suspect you’ll prefer the CHOOSECOLS alternative.

CHOOSEROWS: Record-Braking

As we indicated at the outset of the chapter, the two CHOOSE functions perform filtering operations of sorts: CHOOSECOLS excises fields from a dataset while retaining others, and CHOOSEROWS gives selected records the boot while holding on to a desired remainder.

CHOOSEROWS indeed filters records, but in a relatively coarse manner, if you need to filter all sales conducted in France, or learn which sales exceeded or equalled $2,000, then you’d turn to the FILTER function. CHOOSEROWS can’t apply detailed criteria, unlike FILTER’s sweeping capabilities; rather, it simply grabs the rows that you specify, which in turn may serve some broader purpose.

To explain: CHOOSEROWS exhibits the same basic syntax as CHOOSECOLS:
  • =CHOOSEROWS(range/array,row_number1, row_number2…)

Thus, if you call up the CHOOSEROWS practice file and enter
  • =CHOOSEROWS(All,1,2,3,4,5)

you’ll return, in Figure 14-6.

A screenshot of an excel sheet has 5 columns with entries in 5 rows and the formula CHOOSE ROWS on top. The first cell of column 1 is selected and highlighted.

Figure 14-6

Take five; CHOOSEROWS calls up the first five rows in the dataset

(Again, you’ll be faced with the usual reformatting necessities.)

CHOOSEROWS grants you enormous discretion over which rows you admit to the formula, and in any sequence. These formulas are perfectly proper:
  • =CHOOSEROWS(All,22,23,24,25,26)

  • =CHOOSEROWS(All,17,9,24,31,11)

Now for an obvious follow-on question: suppose I want to choose the first 100 rows from a larger dataset. Must I write
  • =CHOOSEROWS(All,1,2,3,4,5,6,7,8,…)

That is, do I have to key in every row reference from 1 to 100?

I’m glad you asked. The answer: well, no, you don’t. You’ll be happy to know you can write
  • =CHOOSEROWS(All,SEQUENCE(100))

Our old friend SEQUENCE can supply all 100 row values. It’s a dynamic array formula, after all – with each of the SEQUENCE-spawned values from 1 to 100 choosing the dataset row bearing that number.

And yes, as with CHOOSECOLS, you can tell CHOOSEROWS to proceed with negative values. This formula
  • =CHOOSEROWS(All,-1)

will dredge the very last row in the dataset.

Where the Row Count Starts

Another important point: all the row entries in the above formulas denote a row’s relative position in the dataset, and not its absolute row address. Thus, the number 1 in this expression
  • =CHOOSEROWS(All,1,2,3,4,5)

attests to the first row in the dataset, irrespective of where the data actually start in the worksheet. 1 always signifies the row position of a dataset’s first record – here, the one immediately beneath the header row.

Unique Records in Non-adjacent Fields: Take 2

You may remember our discussion in the chapter devoted to the UNIQUE function about drawing unique records from the data in two non-adjacent fields, one that was hinged upon a fairly ornate application of the INDEX function. But CHOOSECOLS can essay the same task with a far simpler formula.

If you open the UNIQUE records in non-adjacent fields with CHOOSECOLS practice file, you’ll reacquaint yourself with these salesperson data, in which the Country and Salesperson fields aren’t contiguous (in excerpt, Figure 14-7).

A table has 15 rows and 5 columns. The column headers are country, order date, salesperson, order I D, and order amount. The values mentioned for the rows include France, Mexico, U K, Canada, and more.

Figure 14-7

Salespersons in far-off Countries

If we again want to develop a list of all the unique combinations of countries and salespersons, but propelled this time by CHOOSECOLS, the journey exhibits far fewer bumps, as witnessed in Figure 14-8 (in excerpt).

A table has 2 columns and 11 rows with the formula UNIQUE and CHOOSE COLS at the top. The columns are titled country and salesperson. The values mentioned for the rows include France, Mexico, U K, and more.

Figure 14-8

Bye, bye INDEX – CHOOSECOLS for a smoother ride

Here, the formula simply calculates every unique combination of the records drawn from the dataset’s first and third column – the ones containing Country and Salesperson. It’s so easy you’ll think you did something wrong.

Using CHOOSEROWS to Fill in the Blanks

Now you may remember a few of the exercises we described in the FILTER chapter aiming to filter a dataset rife with blank rows. Now let’s direct CHOOSEROWS to those data, and demonstrate how the function can easily eliminate all those rows and thus craft an eminently usable dataset (file: CHOOSEROWS – blank rows).

But first, we need to review the way in which CHOOSEROWS enumerates the rows with which it works. Remember that our dataset, bearing our usual default name of All and featuring 3788 rows, actually commences in row 2 (see the introductory notes in Chapter 3 under the “Field Notes: Field and Dataset Names” heading). And that row – row number 1 in the dataset, the one immediately beneath the header row – is blank. That means that the first data-bearing row in the dataset is, in the language of CHOOSEROWS, row 2.

Moreover, we know that every other row in the dataset is empty – that is, all the odd-numbered dataset rows, inaugurated by dataset row 1. With that information in tow, we can click into Sheet 1 of the practice file and write, in B2
  • =CHOOSEROWS(All,SEQUENCE(1894,,2,2))

An expression that yields the following (in excerpt) in Figure 14-9.

A table has 13 columns with entries in 14 rows and the formula CHOOSE ROWS and SEQUENCE at the top. Some of the rows have blank entries.

Figure 14-9

Nothin’ but data: blank rows removed

How does this formula work? Its SEQUENCE element requests 1894 rows – half of the 3798 rows contributing to the dataset, and thus the number of rows containing data. It kicks off its sequence with the number 2 – again, representing the first row in the dataset featuring data – and increments each following value by a step of 2. Thus, CHOOSEROWS is presented with row values 2,4,6,8…3798, each corresponding to a row armed with data – and proceeds to choose precisely, and only, those rows.

If that looks good and you want the results to serve as your dataset for subsequent analysis, you can aim a Copy > Paste Values routine at the spill range – and remember, it is a spill range. Next, you can copy and paste the original headers above it all, and you’re ready to go – just range-name each field via Create from Selection.

Coming Next: Subjecting Rows and Columns to a Different TAKE

You’ll see that the next pair of new dynamic array functions we’ll examine bear a family resemblance to CHOOSECOLS and CHOOSEROWS, but they certainly don’t qualify as identical twins. They’re called TAKE and DROP, and you’ll see what they share in common with the two CHOOSE functions – and what they don’t.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.227.111.208