Your dataset may be an embarrassment of riches, but sometimes more is less. Every spreadsheet user knows that their data often warehouses more information than they need right now, and could do with a bit of incisive, if temporary, trimming.
And Excel’s new dynamic array FILTER function is very much up to that task. And as with SORT and SORTBY, FILTER contributes a set of formula-centered solutions to a classic spreadsheet chore that’s been handled heretofore by a potpourri of tools, for example, pivot table filters and slicers, the Advanced Filter option, and the standard-issue Filter you see atop tables and the like.
They all have their place, of course, but because FILTER does its thing with formulas, a vast catalog of criteria by which the data can be summoned becomes available, making FILTER a most powerful and exceedingly agile means for getting what you want from the data. FILTER is the dynamic array’s Swiss army knife – it can slice and dice the data in so many ways.
The Formula
=FILTER(range/array,include,if_empty)
The first argument, again officially dubbed “array” by Excel, calls for the dataset you want to filter, and of course accepts range or table names in lieu of cell coordinates.
It’s the second argument – “include” – from which FILTER’s power radiates. Start with an elementary filter (file: FILTER, adapted from fellow Apress author Debra Dalgliesh’s Contexture compendium of free downloads, Again, the dataset in its entirety is called All).
=FILTER(All,City=L1)
(Again, as with SORT, the resulting spill range evinces basic, default formatting that you’d almost surely want to modify; and again, the results are header-free. That row could be copied and pasted atop the spill range.)
City=L1
=FILTER(All,Category=L1)
because Cookies is an item populating that field.
=IF(Category=L1…
Now About That [If Empty] Argument
=FILTER(All,Category=L1,”Not found”)
That caption will replace the error message.
Multiple Filter Criteria: Getting More (or Less) Out of the Data
=FILTER(All,(Category=L1)*(City=M1),“Not found”)
(1st criterion)*(2nd criterion)*(3rd criterion)
(In fact, the asterisk – Excel’s multiplier operator – is actually performing an act of multiplication inside the formula, too, but we can skim past the details of that maneuver here.)
OR Consider This Alternative
=FILTER(All,(City=L1)+(City=M1),“Not found”)
What clearly distinguishes the multi-criteria “or” filter from an “and” is a swap of the asterisk for the plus sign – nothing more.
Mix and Match: AND and OR Together
Next question: what if you want to filter for all the Cookies sales in both Los Angeles and San Diego? That’s a tasty problem, to be sure, requisitioning as it does both “and” as well as “or” criteria. Here, I want the filter to detect sales compiled in either Los Angeles or San Diego – but only of Cookies. How is this one written?
=FILTER(All,((City=L1)+(City=M1))*(Category=N1),“Not found”)
(City=M1))*(Category=N1)
because those pairings always call upon criteria 2 and 3, as they’re adjacent. But the Los Angeles records aren’t invariably matched with Cookies, because that city occupies criterion 1, and isn’t directly associated with criterion 3, the Category reference.)
Now let’s return to our inaugural FILTER formula, the one in which we returned the sales data for Boston. By definition, then, all our results there must disclose that city’s name in the City field – and precisely because that field will contain nothing but Boston, we don’t really need all that redundant information. As a result, and for good presentational reasons, we might want to eject the City field from this filter – and in order to make good on that intention we can reprise the INDEX formula we dusted off in Chapter 5, in the course of our perusal of the UNIQUE function.
Here, our formula reads
=FILTER(INDEX(All,SEQUENCE(ROWS(All)),{1,2,4,5,6,7,8}),City=M1,”Not found”)
Don’t be fooled or confused; in spite of its density, the filter here consists of the same three arguments with which we’ve been working all along. The INDEX formula is assembling an improvised dataset, one appropriating the All minus the City field – the third column in All, thus explaining the missing 3 in the bracketed array. Note in addition the SEQUENCE function here that’s performing the same work it did in the UNIQUE formula – it’s counting, and hence including, all of the rows in the All dataset.
All those Bostons are gone; and for good measure, we could have zapped the Region field, too – the dataset’s second field – from the formula, by excising the 2 from the bracketed array – because, after all, Boston is always in the East.
Our discussion of the new dynamic array function DROP in a later chapter will describe a considerably more elegant way in which this result can be achieved
How About Filtering Part of a Cell?
Thus far, we’ve learned about filtering fields whose data consists of single words, for example, Cookies, or Boston. But what about cells consisting of what’s typically called unstructured text – that is, multi-word content taking up residence in a single cell address? Can FILTER help us to unearth a keyword, or selected words from the larger text sitting in a cell?
Well, the answer is a decisive yes. FILTER can masterfully extract partial text excerpts from a range of cells, and for a case in point break open the FILTER – text search file, an adaptation of a spreadsheet that enumerates the current members of the House of Representatives (source: Everypolitican (https://everypolitician.org/united-states-of-america/house/download.html)).
Suppose for starters we want to filter the name field for all Representatives named John. The problem, of course, is that all the names are bunched into single cells – first, last, and middle names – and we need to somehow uproot “John” from the lengthier entries into which the name has been gathered.
=SEARCH(find_text,within_text,start_number)
The start number argument is optional.
=SEARCH(A2,A6)
will pump out 10. If the searched text isn’t there, a #VALUE! error message will stake the cell instead. Remember, of course, that we’re about to apply SEARCH to an entire range of names, but we’re in dynamic array mode, and that’s something we can do.
=FILTER(name,ISNUMBER(SEARCH(B2,name)),“Not found”)
The ISNUMBER function wrapping itself around SEARCH tests a value or formula for its numeric status; thus if a search of each cell in the name range yields a number – meaning that the search term has been found – that cell will instate itself into the filtered result.
That works – sort of. But we see that our filter has garnered a trio of Johnsons – which also contain the text John – that we probably didn’t have in mind. A workaround: enter in B2 instead
John[space]
Thus we see that FILTER does a superior job of performing text/keyword searches of free, or unstructured, text – a most flexible and dynamic capability.
Some approaches to dividing last and first names into distinct cells will be explored when we review Excel’s newest dynamic array functions TEXTBEFORE and TEXTAFTER.
There’s More
=FILTER(name,LEFT(name,1)=B2)
=COUNTA(B4#)
to deliver a total of members of Congress with names beginning with the designated letter.
FILTER’s Number Wizardry Adds Values to Your Values
And of course, FILTER is at your service should you need to count the numbers. By way of a simple example, let’s call up a familiar-looking dataset, here stored in the FILTER: Values file.
=FILTER(All,Order_Amount>=H3)
(Remember that the greater and less-than operators always precede the = sign in Excel formulas.)
Now for another value-driven task, what if you need to study all the sales conducted, say, between January 1, 2020, and June 30 of that year – that is, a 6-month summary?
=FILTER(all,(Order_Date>=H3)*(Order_Date<=I3))
You can also enter date criteria via this slightly more ponderous method, for example, =FILTER(All,Order_Date=VALUE(“6/10/2013”))
No More Blank Looks About Blank Rows
If you’re looking for a data-quality conundrum that can make the day of any spreadsheet analyst big time, start with blank rows. These nullities are the bane of conventional sorting, pivot tables, and traditional filters, but once you get with the FILTER program you can actually do real work – even if you leave those blanks in place.
By way of example, here’s a modification of a spreadsheet (File: Filter – working with blank rows) released by a New York City agency (for the original file go here https://media.githubusercontent.com/media/thecityny/housing-lottery-data/master/housing-lotteries.csv)), one that itemizes public housing apartments offered up via a lottery. Its blank rows as you see them, however, came that way in the original sheet. So what can we do? (As usual, the dataset is entitled All, and the field names derive from the header row.)
=FILTER(All,Rent<=B3)
And we get the following, in excerpt, in Figure 7-11.
=FILTER(All,(Rent<=B3)*(Rent<>0))
Coming Next
Sure, filtering a dataset half of whose rows contain absolutely nothing in them sounds a touch gimmicky, but you get the idea – and besides, we’ve been working here with an actual spreadsheet. The FILTER function can crunch a subset of your data that meets just about any specification, and the more specifications you can imagine, the more powerful the function. Next up is a very different dynamic array function that performs a very different set of tasks – RANDARRAY.