© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2023
A. I. KatzUp Up and Array!https://doi.org/10.1007/978-1-4842-8966-2_15

15. TAKE and DROP: Selecting and Rejecting the Data

Abbott Ira Katz1  
(1)
Edgware, UK
 

Similar – but Different

As with CHOOSECOLS and CHOOSEROWS, the new dynamic array TAKE and DROP functions trim datasets by temporarily removing unwanted records and keeping others. But TAKE and DROP do the trimming in ways that are less flexible than the options offered by the two CHOOSE functions – but at the same time, we’ll see that they can carry out certain tasks more efficiently.

How They’re Written

TAKE and DROP literally proceed in different directions, a distinction that needs to be explained, but they’re written in similar ways. TAKE looks like this:
  • =TAKE(range/array,rows,columns)

(The rows and columns arguments are optional, though you need to supply at least one of them.)

Considering rows for starters, TAKE earns its name by returning, or taking, a specified number of rows from the uppermost rows in the range by default. Thus if we open the TAKE and DROP practice file consisting of the same names with which we worked in the TEXBEFORE/AFTER chapter (I’ve named the range here names) and write
  • =TAKE(names,6)

you’ll see the following in Figure 15-1.

A screenshot of an Excel sheet has 2 columns. The first column has entries in 10 rows, and the second column has entries in 6 rows with the formula TAKE on top. The first entry in column 2 is highlighted.

Figure 15-1

A little off the top: TAKE returns the first six names in the range

That’s pretty self-evident, and it discloses an essential premise of TAKE’s way of working: unlike CHOOSEROWS, the 6 identifies not a row number, but the number of cells to be taken from the top of the range. While CHOOSEROWS enables you to write this expression
  • =CHOOSEROWS(names,3,1,5)

which pulls rows 3, 1, and 5 from the dataset, you can’t issue a multi-row set of references with TAKE.

By the same token, the optional columns argument specifies by default the number of columns to be taken from the left of the dataset. Thus, by way of quick demo, this formula written to our trusty salesperson dataset:
  • =TAKE(all,8,3)

would lift the first eight rows of the data, along with their first three columns, per Figure 15-2.

A screenshot of an Excel sheet has 3 columns with entries in 8 rows and the formula TAKE on top. The first cell of column 1 is highlighted.

Figure 15-2

Eight down, three across: TAKE’s take-up of row and column data

This formula
  • =TAKE(all,,3)

would conscript all the rows, and the first three columns.

By default, then, TAKE can only haul off data from consecutive rows and/or columns emanating from the top rows of a dataset, and their leftmost columns. You can’t deploy TAKE to deliver non-consecutive rows or columns, as you can with CHOOSEROWS or CHOOSECOLUMNS; nor can you take data from a starting point midway into the dataset.

But those are the defaults. As with CHOOSEROWS and CHOOSECOLUMNS, TAKE and DROP will accept negative-numbered arguments. If you write
  • =TAKE(names,-3)

you’ll take the names occupying the last three rows of the dataset, as in Figure 15-3.

A screenshot of an Excel sheet has 2 columns. The first column has entries in 10 rows, and the second column has entries in 3 rows with the formula TAKE on top. The first entry in column 2 is highlighted.

Figure 15-3

Reversing one’s field: TAKE calls up the names from the bottom of the range with its negative argument

By the same token, referencing the column argument with a negative value will initiate a column grab from the rightmost column. But here, too, TAKE gathers data from the extremes – the very bottom of the data moving upwards, and the far-right columns proceeding towards the left.

DROP: Leaving the Data Behind

The DROP function is considerably less “intuitive” than TAKE, because it forces us to think about the data we don’t want to use. Befitting its name, DROP specifies the rows and columns that it wants to exclude from its results, even as its syntax virtually replicates that of TAKE:
  • =DROP(range/array,rows,columns)

Again, the rows and columns arguments are optional, though of course you need to select at least one of them.

For example, this formula written to the practice file
  • =DROP(names,3)

will ignore the names in the first three rows, per Figure 15-4.

A screenshot of an Excel sheet has 2 columns. The first column has entries in 10 rows, and the second column has entries in 7 rows with the formula DROP on top. The first cell in column 2 is highlighted.

Figure 15-4

DROP from the top: the first three rows are disinvited

And this formula, written to the salesperson data
  • =DROP(all,,4)

will return all their rows but will drop, or eliminate, the first four fields, returning the data from column 5 only. Again, the 4 doesn’t refer to column number four, but rather the number of columns to be ignored or dropped, starting from the left of the dataset.

Got that? And yes, DROP will implement negative numbers, too, the better to deepen your confusion. This formula
  • =DROP(names,-4)

will ignore, or drop, the last four names in the range. And that means that these formulas yield equivalent results, brought together in Figure 15-5.

A screenshot of an Excel sheet has 3 columns. The first column has entries in 10 rows. The second column has entries in 6 rows with the formula DROP on top. The third column has entries in 6 rows with the formula TAKE on top.

Figure 15-5

Either way, you end up with the same names: two formulas, one result

And once you recognize where DROP and TAKE’s respective routes take you, two realizations should emerge: (1) many outcomes orchestrated by TAKE can be achieved as well by DROP, once you understand how their arguments get turned inside out, and (2) because TAKE and DROP can only process contiguous rows, you’ll very often need to sort the dataset before you can proceed – because the records that you’ll want to take or drop have to be atop one another, and also must be situated at the very top or the bottom of the range before you.

Some Real-World Uses

One job that TAKE can accomplish most efficiently is a ranking of data, for example, a list of the ten largest sales in a dataset (file: TAKE – ranking data). And while as is often the case in Excel, alternative approaches to the problem might be on hand – via say, the FILTER function – the TAKE alternative is notably slimmer. Here’s the FILTER rendition of a top-10 calculation:
  • =SORT(FILTER(All,RANK(Order_Amount,Order_Amount)<=10),5,-1)

Now compare the above to the TAKE version and the comparison in Figure 15-6.
  • =TAKE(SORT(all,5,-1),10)

A screenshot of an Excel sheet has 2 parts. The first part has 5 columns with entries in 10 rows and the formula SORT, FILTER, and RANK on top. The second part has 5 columns with entries in 10 rows and the formula TAKE and SORT on top. The first cell of column 1 is highlighted in both parts.

Figure 15-6

TAKE’s top ten tops FILTER’s top ten

Here the TAKE formula on the right sorts, via the highest-to-lowest option (the -1 planted inside the SORT formula), the fifth column of the dataset – the one containing the sales order amounts. It next goes on to take the first ten rows of the sorted output, matching FILTER’s output with considerably less keystroking. (Note that this top ten is technically problematic; if two equivalent sales figures hold the tenth position, the TAKE formula depicted above will return only one of them.)

Now what if you wanted to confine the top-ten results to say, the salesperson name and order amount? That’s an assignment TAKE can’t fulfill on its own, because salesperson and order amount are aligned in the second and fifth columns of the dataset – and TAKE can only reap adjoining fields that occupy the left or right edges of the data. As a result, we need to recruit the more agile CHOOSEROWS to help us complete the task, depicted by Figure 15-7.

A screenshot of an Excel sheet has 2 columns with data in 10 rows and the formula TAKE, SORT, and CHOOSE COLS on top. The first cell of column 1 is highlighted.

Figure 15-7

Teamwork: TAKE and CHOOSECOLS deliver top 10 sales by salesperson

CHOOSECOLS specifies that only the second (Salesperson) and fifth (Order Amount) columns/fields are to be returned.

A Unique Exercise

You can also let TAKE streamline the answer to a spreadsheet challenge that’s been described by several spreadsheet experts on the web. The task: to generate a set of ten unique random numbers between say 1 and 50. While of course the RANDARRAY function can unleash thousands or even millions of random values upon your worksheet, it can’t guarantee that none of the values will repeat. But you can write this expression (try this on a blank worksheet):
  • =TAKE(SORTBY(SEQUENCE(50),RANDARRAY(50)),10)

And achieve something like Figure 15-8.

A screenshot of an Excel sheet has data in 10 rows with the formula TAKE, SORT BY, SEQUENCE, and RAND ARRAY on top. The first row is highlighted.

Figure 15-8

Random – and unique – values, each somewhere between 1 and 50

This formula sequences a set of values 1 through 50 (of course you can select any span you wish). It sorts these by a virtual, parallel set of 50 random numbers courtesy of RANDARRAY, and when the sort is executed TAKE simply goes ahead and lops off the first ten rows from the 50 results. Because the SEQUENCE function furnishes 50 consecutive values 1 through 50, these must be unique by definition; sorting them randomly then throws those values into some unpredictable order, from which TAKE grabs the first 10.

More Ranking – but by Percentages

In addition to combing the data for an absolute top ten of values, TAKE can drum up results that yield a given percent of the records. Remaining with the TAKE – ranking data practice file – suppose that we want to serve up the top 15% of sales orders, a requirement that forces us to deal with an immediate, prior question: since the dataset comprises 798 records 15% of which amounts to 119.7, then somewhere in the formula a rounding off of that value has to be effected.

With that little complication in mind let’s enter the following in H6:
  • =TAKE(SORT(all,5,-1),ROWS(all)*0.15)

That expression should yield (in excerpt) what you see in Figure 15-9.

A screenshot of an Excel sheet has 5 columns with data in 17 rows and the formula TAKE, SORT, and ROWS on top. The first cell of column 1 is highlighted.

Figure 15-9

The top 15% of sales, more or less

If you’ve been clicking – and counting – along, you’ve observed that our formula returns 119 records, a round-down of the decimal-punctuated 119.7. As for the formula itself, it sorts the order amounts highest-to-lowest and brings in the ROWS function to supply the rows argument that counts the number of rows in the order_amount field and multiplies that figure by .15. If you want to round 119.7 upwards, you can edit the formula to read
  • =TAKE(SORT(all,5,-1),ROUNDUP(ROWS(all)*0.15,0))

That tweak will lift 119.7 to the next integer – 120 – and rank that many records.

And for a more responsive, facile formula, you could of course post the desired percent to a cell instead, say H3, and modify the percent entry there as you wish (with or without ROUNDUP):
  • =TAKE(SORT(all,5,-1),ROWS(all)*H3)

Letting It DROP: Removing Lowest Grades

Suppose you’re a teacher who’s assigned eight tests to your students across the term. Since you want to do the right thing, you decide to compute each student’s average on the basis of their six highest scores, that is, you’re prepared to discard the lowest two.

That sounds like a job made for the DROP function, and it’s surely up to the task. Open the DROP – lowest test score practice file and try this formula in cell J2, for starters:
  • =DROP(SORT(B2:I2,,,1),,2)

Again, as per the standard DROP/TAKE strategy we’re subjecting the scores in the first row – the one housing Bill’s test scores – to a sort, this time in a columnar orientation as indicated by the 1 inside the SORT parentheses. We’re then commanding DROP to oust the scores in the first two columns of the spill range, as confirmed by the final 2 in the formula. You should see something like Figure 15-10.

A screenshot of an Excel sheet has 9 columns with data in 11 rows. The data entry for row 1 titled Bill has values in 14 columns with the formula DROP and SORT on the side. The ninth value is highlighted for Bill.

Figure 15-10

Bill’s in luck: his two lowest scores never happened

Note the DROP formula spills six results across the row, having jettisoned Bill’s two-lowest 30 and 39. Now we can wrap the standard AVERAGE function around DROP, culminating in a single-cell result flashed in Figure 15-11 and formatted as you wish:

A screenshot of an Excel sheet has 9 columns with data in 1 row. The average value for the entry Bill is highlighted in the tenth column. It has the formula AVERAGE, DROP, and SORT on the side.

Figure 15-11

Bill’s average for all of eight exams: 61.75

If that all looks good, we can simply copy the revised formula down the range for each student, as in Figure 15-12.

A screenshot of an Excel sheet has 9 columns with data in 11 rows. The entry titled Bill in the first row has the average value highlighted in the tenth column. The formula for AVERAGE, DROP, and SORT is given on the side. All the rows have the average value with the formula given.

Figure 15-12

Making the grade(s): six-pack of test scores

Up Next

The next, and final, second-generation dynamic array function we’ll review is one that at first blush seems slightly curious, one that seems to offer a kind of polar opposite to TAKE and DROP. While those functions narrow the data with which the user wants to work, the function that follows – EXPAND – can make room for data that’s yet to be entered. I told you it seems curious; so just turn the page.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.16.51.157