As with CHOOSECOLS and CHOOSEROWS, the new dynamic array TAKE and DROP functions trim datasets by temporarily removing unwanted records and keeping others. But TAKE and DROP do the trimming in ways that are less flexible than the options offered by the two CHOOSE functions – but at the same time, we’ll see that they can carry out certain tasks more efficiently.
How They’re Written
TAKE and DROP literally proceed in different directions, a distinction that needs to be explained, but they’re written in similar ways. TAKE looks like this:
=TAKE(range/array,rows,columns)
(The rows and columns arguments are optional, though you need to supply at least one of them.)
Considering rows for starters, TAKE earns its name by returning, or taking, a specified number of rows from the uppermost rows in the range by default. Thus if we open the TAKE and DROP practice file consisting of the same names with which we worked in the TEXBEFORE/AFTER chapter (I’ve named the range here names) and write
That’s pretty self-evident, and it discloses an essential premise of TAKE’s way of working: unlike CHOOSEROWS, the 6 identifies not a row number, but the number of cells to be taken from the top of the range. While CHOOSEROWS enables you to write this expression
=CHOOSEROWS(names,3,1,5)
which pulls rows 3, 1, and 5 from the dataset, you can’t issue a multi-row set of references with TAKE.
By the same token, the optional columns argument specifies by default the number of columns to be taken from the left of the dataset. Thus, by way of quick demo, this formula written to our trusty salesperson dataset:
=TAKE(all,8,3)
would lift the first eight rows of the data, along with their first three columns, per Figure 15-2.
This formula
=TAKE(all,,3)
would conscript all the rows, and the first three columns.
By default, then, TAKE can only haul off data from consecutive rows and/or columns emanating from the top rows of a dataset, and their leftmost columns. You can’t deploy TAKE to deliver non-consecutive rows or columns, as you can with CHOOSEROWS or CHOOSECOLUMNS; nor can you take data from a starting point midway into the dataset.
But those are the defaults. As with CHOOSEROWS and CHOOSECOLUMNS, TAKE and DROP will accept negative-numbered arguments. If you write
=TAKE(names,-3)
you’ll take the names occupying the last three rows of the dataset, as in Figure 15-3.
By the same token, referencing the column argument with a negative value will initiate a column grab from the rightmost column. But here, too, TAKE gathers data from the extremes – the very bottom of the data moving upwards, and the far-right columns proceeding towards the left.
DROP: Leaving the Data Behind
The DROP function is considerably less “intuitive” than TAKE, because it forces us to think about the data we don’t want to use. Befitting its name, DROP specifies the rows and columns that it wants to exclude from its results, even as its syntax virtually replicates that of TAKE:
=DROP(range/array,rows,columns)
Again, the rows and columns arguments are optional, though of course you need to select at least one of them.
For example, this formula written to the practice file
=DROP(names,3)
will ignore the names in the first three rows, per Figure 15-4.
And this formula, written to the salesperson data
=DROP(all,,4)
will return all their rows but will drop, or eliminate, the first four fields, returning the data from column 5 only. Again, the 4 doesn’t refer to column number four, but rather the number of columns to be ignored or dropped, starting from the left of the dataset.
Got that? And yes, DROP will implement negative numbers, too, the better to deepen your confusion. This formula
=DROP(names,-4)
will ignore, or drop, the last four names in the range. And that means that these formulas yield equivalent results, brought together in Figure 15-5.
And once you recognize where DROP and TAKE’s respective routes take you, two realizations should emerge: (1) many outcomes orchestrated by TAKE can be achieved as well by DROP, once you understand how their arguments get turned inside out, and (2) because TAKE and DROP can only process contiguous rows, you’ll very often need to sort the dataset before you can proceed – because the records that you’ll want to take or drop have to be atop one another, and also must be situated at the very top or the bottom of the range before you.
Some Real-World Uses
One job that TAKE can accomplish most efficiently is a ranking of data, for example, a list of the ten largest sales in a dataset (file: TAKE – ranking data). And while as is often the case in Excel, alternative approaches to the problem might be on hand – via say, the FILTER function – the TAKE alternative is notably slimmer. Here’s the FILTER rendition of a top-10 calculation:
Now compare the above to the TAKE version and the comparison in Figure 15-6.
=TAKE(SORT(all,5,-1),10)
Here the TAKE formula on the right sorts, via the highest-to-lowest option (the -1 planted inside the SORT formula), the fifth column of the dataset – the one containing the sales order amounts. It next goes on to take the first ten rows of the sorted output, matching FILTER’s output with considerably less keystroking. (Note that this top ten is technically problematic; if two equivalent sales figures hold the tenth position, the TAKE formula depicted above will return only one of them.)
Now what if you wanted to confine the top-ten results to say, the salesperson name and order amount? That’s an assignment TAKE can’t fulfill on its own, because salesperson and order amount are aligned in the second and fifth columns of the dataset – and TAKE can only reap adjoining fields that occupy the left or right edges of the data. As a result, we need to recruit the more agile CHOOSEROWS to help us complete the task, depicted by Figure 15-7.
CHOOSECOLS specifies that only the second (Salesperson) and fifth (Order Amount) columns/fields are to be returned.
A Unique Exercise
You can also let TAKE streamline the answer to a spreadsheet challenge that’s been described by several spreadsheet experts on the web. The task: to generate a set of ten uniquerandom numbers between say 1 and 50. While of course the RANDARRAY function can unleash thousands or even millions of random values upon your worksheet, it can’t guarantee that none of the values will repeat. But you can write this expression (try this on a blank worksheet):
This formula sequences a set of values 1 through 50 (of course you can select any span you wish). It sorts these by a virtual, parallel set of 50 random numbers courtesy of RANDARRAY, and when the sort is executed TAKE simply goes ahead and lops off the first ten rows from the 50 results. Because the SEQUENCE function furnishes 50 consecutive values 1 through 50, these must be unique by definition; sorting them randomly then throws those values into some unpredictable order, from which TAKE grabs the first 10.
More Ranking – but by Percentages
In addition to combing the data for an absolute top ten of values, TAKE can drum up results that yield a given percent of the records. Remaining with the TAKE – ranking data practice file – suppose that we want to serve up the top 15% of sales orders, a requirement that forces us to deal with an immediate, prior question: since the dataset comprises 798 records 15% of which amounts to 119.7, then somewhere in the formula a rounding off of that value has to be effected.
With that little complication in mind let’s enter the following in H6:
=TAKE(SORT(all,5,-1),ROWS(all)*0.15)
That expression should yield (in excerpt) what you see in Figure 15-9.
If you’ve been clicking – and counting – along, you’ve observed that our formula returns 119 records, a round-down of the decimal-punctuated 119.7. As for the formula itself, it sorts the order amounts highest-to-lowest and brings in the ROWS function to supply the rows argument that counts the number of rows in the order_amount field and multiplies that figure by .15. If you want to round 119.7 upwards, you can edit the formula to read
=TAKE(SORT(all,5,-1),ROUNDUP(ROWS(all)*0.15,0))
That tweak will lift 119.7 to the next integer – 120 – and rank that many records.
And for a more responsive, facile formula, you could of course post the desired percent to a cell instead, say H3, and modify the percent entry there as you wish (with or without ROUNDUP):
=TAKE(SORT(all,5,-1),ROWS(all)*H3)
Letting It DROP: Removing Lowest Grades
Suppose you’re a teacher who’s assigned eight tests to your students across the term. Since you want to do the right thing, you decide to compute each student’s average on the basis of their six highest scores, that is, you’re prepared to discard the lowest two.
That sounds like a job made for the DROP function, and it’s surely up to the task. Open the DROP – lowest test score practice file and try this formula in cell J2, for starters:
=DROP(SORT(B2:I2,,,1),,2)
Again, as per the standard DROP/TAKE strategy we’re subjecting the scores in the first row – the one housing Bill’s test scores – to a sort, this time in a columnar orientation as indicated by the 1 inside the SORT parentheses. We’re then commanding DROP to oust the scores in the first two columns of the spill range, as confirmed by the final 2 in the formula. You should see something like Figure 15-10.
Note the DROP formula spills six results across the row, having jettisoned Bill’s two-lowest 30 and 39. Now we can wrap the standard AVERAGE function around DROP, culminating in a single-cell result flashed in Figure 15-11 and formatted as you wish:
If that all looks good, we can simply copy the revised formula down the range for each student, as in Figure 15-12.
Up Next
The next, and final, second-generation dynamic array function we’ll review is one that at first blush seems slightly curious, one that seems to offer a kind of polar opposite to TAKE and DROP. While those functions narrow the data with which the user wants to work, the function that follows – EXPAND – can make room for data that’s yet to be entered. I told you it seems curious; so just turn the page.