Performance

Broadly speaking, when the output of a cmdlet is piped to the input of another cmdlet, the first object is retrieved and processed. Only after it has passed through the pipeline is the next object retrieved. This continues until the flow of objects has stopped and there is nothing more to process. Finally, a clean-up task might run.

Let's visualize this in a function that accepts pipeline input (more on that in Chapter 5, Writing Reusable Code). The following code sample is a very simple function accepting entire objects from the pipeline:

function UsesPipeline
{
    param
    (
        [Parameter(ValueFromPipeline)]
        [string]
        $PipedObject
    )

    begin
    {
        # Optional - will be executed once before any object is retrieved from the pipeline
        # Usually used to initialize things like service connections or variables
        Write-Host -ForegroundColor Yellow 'Let the processing begin!'
        $pipedObjectList = New-Object -TypeName 'System.Collections.Generic.List[string]'
    }

    process
    {
        # Mandatory - each object passing down the pipeline will go through the entire
        # block once
        # The more expensive the processing is here, the longer it will take!
        $pipedObjectList.Add($PipedObject)
    }

    end
    {
        # Optional - will be executed once after all objects have been retrieved from the pipeline
        # Clean-up tasks are usually placed here
        Write-Host -ForegroundColor Yellow "We're done here..."
        return $pipedObjectList
    }
}

$null | UsesPipeline
$objects = 1..100000 | UsesPipeline

Without explaining too much of Chapter 5, Writing Reusable Code, this simple function illustrates perfectly what happens behind the scenes. At the very least, a command that accepts pipeline input invokes its process script block once for each item in the pipeline.
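To see the order in which these blocks run, the following sketch traces a two-command pipeline. Send-Number and Receive-Number are hypothetical functions made up for this illustration. The trace shows that each object travels all the way down the pipeline before the next one is produced, and that begin and end run only once.

```powershell
# Shared trace list; both hypothetical functions append to it
$trace = [System.Collections.Generic.List[string]]::new()

# Hypothetical producer: emits three numbers and records each emission
function Send-Number
{
    foreach ($n in 1..3)
    {
        $trace.Add("emit $n")
        $n
    }
}

# Hypothetical consumer: records when each of its blocks runs
function Receive-Number
{
    param
    (
        [Parameter(ValueFromPipeline)]
        [int]
        $Number
    )

    begin   { $trace.Add('begin') }
    process { $trace.Add("process $Number") }
    end     { $trace.Add('end') }
}

Send-Number | Receive-Number
$trace -join ' | '
```

This prints `begin | emit 1 | process 1 | emit 2 | process 2 | emit 3 | process 3 | end`: the producer does not get to emit its second number until the consumer has finished processing the first one.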

The optional begin and end blocks run exactly once, before and after all pipeline input, which makes them the place for initialization and clean-up tasks. Before we review the performance of ForEach-Object, take note of the parameters that ForEach-Object supports. Countless examples on the internet simply use the cmdlet without any named parameter:

# Notice the parameters Begin, Process and End - know them from somewhere?
Get-Command -Syntax -Name ForEach-Object

# What countless people write
Get-ChildItem | ForEach-Object {$_.BaseName}

# Even worse
Get-ChildItem | % {$_.BaseName}

# What it actually means
Get-ChildItem | ForEach-Object -Process {$_.BaseName}

# The begin and end blocks have the exact same purpose as they had in our function
# -File skips folders, which Get-FileHash cannot hash
Get-ChildItem -File |
    ForEach-Object -Begin {
        Write-Host 'Calculating hashes'
    } -Process {
        Get-FileHash -Path $_.FullName
    } -End {
        Write-Host 'Hashes returned'
    }
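Because ForEach-Object runs these script blocks in the caller's scope, -Begin can initialize a variable that -Process updates and -End outputs. The sketch below sums a small sample range that way. It also shows the positional shortcut described in the ForEach-Object help: with three unnamed script blocks, the first binds to Begin, the last to End, and the one in between to Process.

```powershell
# Running total: -Begin initializes, -Process accumulates, -End emits the result
$total = 1..5 | ForEach-Object -Begin { $sum = 0 } -Process { $sum += $_ } -End { $sum }
$total

# Three positional script blocks map to Begin, Process and End
$framed = 1..2 | ForEach-Object { 'start' } { $_ } { 'end' }
$framed -join ','
```

The first statement returns 15, the second `start,1,2,end`.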

The Process block is actually the mandatory parameter of ForEach-Object and is bound by position. With that in mind, let's examine the performance of ForEach-Object by comparing three common ways of iterating over each object of a collection: first the ForEach-Object cmdlet, next the LINQ-like .ForEach() method, and lastly the foreach statement.

Language-Integrated Query (LINQ) offers .NET developers a structured language to query objects for properties, execute loops, do conversions, filter datasets, and more.

For more information on LINQ (it is not required for PowerShell, however), see https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/linq/.
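For comparison, real LINQ can be called from PowerShell through the static System.Linq.Enumerable class. The sketch below is an illustration rather than a recommendation: it filters even numbers with Enumerable.Where and a script block cast to a Func delegate, next to the equivalent PowerShell .Where() method call on the same sample range.

```powershell
$numbers = [int[]](1..10)

# Actual LINQ: Enumerable.Where with a script block converted to Func[int,bool]
$isEven = [Func[int, bool]] { param($x) $x % 2 -eq 0 }
$linqEvens = [int[]][System.Linq.Enumerable]::Where($numbers, $isEven)

# LINQ-like PowerShell method with the same result
$methodEvens = $numbers.Where({ $_ % 2 -eq 0 })

$linqEvens -join ','
$methodEvens -join ','
```

Both lines print `2,4,6,8,10`; the PowerShell method needs no delegate cast and reads far more naturally in a script.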

The following example illustrates the performance difference between the ForEach-Object cmdlet, the ForEach() method, and the foreach language statement:

# Use enough items that each timing is measurable
$inputObjects = 1..10000

# Slowest
$timeCmdlet = (Measure-Command -Expression {
    $folders = $inputObjects | ForEach-Object {'Folder{0:d2}' -f $_}
}).Ticks

# Still slow
$timeLinq = (Measure-Command -Expression {
    $folders = $inputObjects.ForEach({'Folder{0:d2}' -f $_})
}).Ticks

# Acceptable
$timeConstruct = (Measure-Command -Expression {
    $folders = foreach ($i in $inputObjects)
    {
        'Folder{0:d2}' -f $i
    }
}).Ticks

# A ratio of 3.0 means the foreach statement needed a third of the time
Write-Host ('foreach statement was {0:n1} times faster than the LINQ-like method, and {1:n1} times faster than ForEach-Object!' -f ($timeLinq / $timeConstruct), ($timeCmdlet / $timeConstruct))

By now, you know why ForEach-Object is on the slower side: every object travels through the pipeline, and the Process script block is invoked once per object. Most of you probably have not yet seen the ForEach method, which was introduced in PowerShell 4, so basically in the Middle Ages. The ForEach method resembles LINQ in .NET and has a pretty easy syntax as well. While this method is faster than ForEach-Object, you cannot top the foreach statement.
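Beyond script blocks, the ForEach() method also accepts a property name, a method name, or a type to convert each element to. A short sketch of these forms, using made-up sample data:

```powershell
$words = 'alpha', 'beta', 'gamma'

# Script block
$folders = (1..3).ForEach({ 'Folder{0:d2}' -f $_ })

# Property name: returns the Length of each string
$lengths = $words.ForEach('Length')

# Method name: invokes ToUpper() on each string
$upper = $words.ForEach('ToUpper')

# Type: converts each string to [int]
$numbers = ('1', '2', '3').ForEach([int])

$folders -join ','
$lengths -join ','
$upper -join ','
$numbers[0] + $numbers[1]
```

The last line returns 3 rather than '12', which confirms that the strings were converted to integers.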

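Using foreach has other benefits as well. Inside the foreach statement, break and continue (optionally with a loop label) work exactly as you would expect. Inside a ForEach-Object script block they do not: they act on the nearest enclosing loop or, if there is none, stop the running script. To skip to the next object in ForEach-Object, use return instead.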
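A sketch of the difference, using arbitrary sample ranges: break, continue and a labeled continue control the foreach statement directly, while return skips an object inside ForEach-Object.

```powershell
# foreach statement: continue skips odd numbers, break leaves the loop after 6
$found = foreach ($n in 1..10)
{
    if ($n % 2) { continue }
    if ($n -gt 6) { break }
    $n
}

# Labeled continue: jumps straight to the next iteration of the outer loop
$pairs = [System.Collections.Generic.List[string]]::new()
:outer foreach ($row in 1..3)
{
    foreach ($col in 1..3)
    {
        if ($col -gt $row) { continue outer }
        $pairs.Add("$row,$col")
    }
}

# ForEach-Object: return ends the current Process invocation, acting like continue
$evens = 1..10 | ForEach-Object -Process {
    if ($_ % 2) { return }
    $_
}

$found -join ','
$pairs -join ' '
$evens -join ','
```

The three lines print `2,4,6`, `1,1 2,1 2,2 3,1 3,2 3,3` and `2,4,6,8,10`.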

A similar observation applies to filtering data with Where-Object. Using Where-Object in scripts and functions without giving any thought to performance invites all kinds of performance issues, even with the most basic cmdlets. The following code sample illustrates the performance difference between the Where-Object cmdlet, the Where() method, and letting the cmdlet filter by itself:

# Fastest: the cmdlet filters by itself
# Note: this first traversal may also pay for a cold file system cache
$timeFilter = (Measure-Command -Expression {
    $fastFiles = Get-ChildItem -Recurse -Path $env:SystemRoot -Filter *.dll -ErrorAction SilentlyContinue
}).Ticks

# Medium: the LINQ-like Where() method
$timeLinq = (Measure-Command -Expression {
    $mediumFiles = (Get-ChildItem -Recurse -Path $env:SystemRoot -ErrorAction SilentlyContinue).Where({$_.Extension -eq '.dll'})
}).Ticks

# Slowest: the Where-Object cmdlet
$timeCmdlet = (Measure-Command -Expression {
    $slowFiles = Get-ChildItem -Recurse -Path $env:SystemRoot -ErrorAction SilentlyContinue |
        Where-Object -Property Extension -eq '.dll'
}).Ticks

Write-Host ('-Filter was {0:n1} times faster than the LINQ-like Where() method, and {1:n1} times faster than Where-Object!' -f ($timeLinq / $timeFilter), ($timeCmdlet / $timeFilter))

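This very simple cmdlet call recursively retrieves all DLLs in a folder tree. Filtering early gives a potentially massive performance boost. With -Filter, the file system provider filters while it enumerates, so objects for non-matching files are never created. Where-Object, on the other hand, only sees each file after Get-ChildItem has already created a full object for it and passed it down the pipeline, only for most of those objects to be discarded. The Where() method avoids the per-object pipeline overhead, but it needs the complete collection in memory before it can do any work. This is especially excruciating when you have introduced a typo in your filter script: you sit through the entire retrieval, only to get nothing back.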
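Filtering early should change only the cost, not the result. The sketch below builds a small throwaway folder (the folder and file names are made up for illustration) and checks that -Filter, Where-Object and the Where() method return the same two DLLs. On a tree this small you will not see a timing difference; it only demonstrates that the three approaches are equivalent.

```powershell
# Hypothetical throwaway folder with two DLLs and two other files
$root = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath ('filter-demo-{0}' -f [guid]::NewGuid())
$sub = Join-Path -Path $root -ChildPath 'sub'
$null = New-Item -ItemType Directory -Path $root
$null = New-Item -ItemType Directory -Path $sub
foreach ($file in 'a.dll', 'b.txt')
{
    $null = New-Item -ItemType File -Path (Join-Path -Path $root -ChildPath $file)
}
foreach ($file in 'c.dll', 'd.log')
{
    $null = New-Item -ItemType File -Path (Join-Path -Path $sub -ChildPath $file)
}

# Filter early: the provider only returns matching files
$early = Get-ChildItem -Path $root -Recurse -File -Filter *.dll

# Filter late: every file becomes an object before Where-Object discards most of them
$late = Get-ChildItem -Path $root -Recurse -File | Where-Object -Property Extension -eq '.dll'

# Where() method: needs the complete collection first
$method = (Get-ChildItem -Path $root -Recurse -File).Where({ $_.Extension -eq '.dll' })

($early.Name | Sort-Object) -join ','
Remove-Item -Path $root -Recurse -Force
```

All three variables hold the same two files, `a.dll` and `c.dll`.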

The following screenshot shows the result of both the loop and the filter comparison.

The effect is even more noticeable when you request data from providers such as Active Directory. Compare Get-ADUser -Filter * to SELECT * on a SQL table. Are SQL administrators happy to see SELECT *, or would they rather you filtered on the server?

Filter early and use the freed-up time to improve your code!