When administering a system, you naturally spend a significant amount of time working with the files on that system. Many of the things you want to do with these files are simple: get their content, search them for a pattern, or replace text inside them.
For even these simple operations, PowerShell’s object-oriented flavor adds several unique and powerful twists.
Provide the filename as an argument to the Get-Content cmdlet:
PS > $content = Get-Content c:\temp\file.txt
Place the filename in a ${} section to use the cmdlet Get-Content variable syntax:
PS > $content = ${c:\temp\file.txt}
Provide the filename as an argument to the ReadAllText() method to use the System.IO.File class from the .NET Framework:
PS > $content = [System.IO.File]::ReadAllText("c:\temp\file.txt")
PowerShell offers three primary ways to get the content of a file. The first is the Get-Content cmdlet, the cmdlet designed for this purpose. In fact, the Get-Content cmdlet works on any PowerShell drive that supports the concept of items with content. This includes Alias:, Function:, and more. The second and third ways are the Get-Content variable syntax and the ReadAllText() method.
When working against files, the Get-Content
cmdlet returns the content of the
file line by line. When it does this, PowerShell supplies additional
information about that output line. This information, which PowerShell
attaches as properties to each output line, includes the drive and path
from where that line originated, among other things.
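As a minimal sketch of that extra information, the following writes a small sample file to the temporary directory (the file name is an assumption made for illustration) and inspects two of the note properties that Get-Content attaches to each line: ReadCount and PSPath.

```powershell
# Write a small sample file to a temporary location (the name is illustrative).
$samplePath = Join-Path ([System.IO.Path]::GetTempPath()) "gc-sample-$PID.txt"
Set-Content -Path $samplePath -Value 'first line', 'second line'

# Each line comes back as a string with extra note properties attached.
$lines = @(Get-Content -Path $samplePath)
$secondLine = $lines[1]

# ReadCount holds the number of lines read so far, so it acts as a line number.
$lineNumber = $secondLine.ReadCount

# PSPath records the provider-qualified path that the line came from.
$origin = $secondLine.PSPath

"Line $lineNumber came from $origin"

# Clean up the sample file.
Remove-Item -Path $samplePath
```

Piping a line to Get-Member -MemberType NoteProperty lists the full set of attached properties.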
If you want PowerShell to split the file
content based on a string that you choose (rather than the default of
newlines), the Get-Content
cmdlet’s
-Delimiter
parameter lets you
provide one.
While useful, having PowerShell attach this
extra information when you are not using it can sometimes slow down
scripts that operate on large files. If you need to process a large file
more quickly, the Get-Content
cmdlet’s ReadCount
parameter lets you
control how many lines PowerShell reads from the file at once. With a
ReadCount
of 1 (which is the
default), PowerShell returns each line one by one. With a ReadCount
of 2, PowerShell returns two lines at a time.
With a ReadCount
of less than 1,
PowerShell returns all lines from the file at once.
Beware of using a ReadCount
of less than 1 for extremely large
files. One of the benefits of the Get-Content
cmdlet is its streaming behavior. No matter how large the file, you
will still be able to process each line of the file without using up
all your system’s memory. Since a ReadCount
of less than 1 reads the entire
file before returning any results, large files have the potential to
use up your system’s memory. For more information about how to
effectively take advantage of PowerShell’s streaming capabilities, see Generate Large Reports and Text Streams.
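To make the effect of ReadCount concrete, here is a short sketch that reads a five-line sample file (created in the temporary directory purely for illustration) and reports how many lines arrive in each pipeline object.

```powershell
# Create a five-line sample file (the name is illustrative).
$samplePath = Join-Path ([System.IO.Path]::GetTempPath()) "readcount-sample-$PID.txt"
Set-Content -Path $samplePath -Value 'one', 'two', 'three', 'four', 'five'

# With -ReadCount 2, each pipeline object is an array of up to two lines.
$chunkSizes = Get-Content -Path $samplePath -ReadCount 2 |
    ForEach-Object { $_.Count }

# With -ReadCount 0, the whole file arrives as a single array.
$allAtOnce = Get-Content -Path $samplePath -ReadCount 0 |
    ForEach-Object { $_.Count }

"Chunk sizes with -ReadCount 2: $($chunkSizes -join ', ')"
"Chunk sizes with -ReadCount 0: $($allAtOnce -join ', ')"

# Clean up the sample file.
Remove-Item -Path $samplePath
```

Because each pipeline object is an array when ReadCount is greater than 1, downstream commands that expect single lines need an inner loop over $_.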
If performance is a primary concern, the
[File]::ReadAllText()
method from the
.NET Framework reads a file most quickly from the disk. Unlike the
Get-Content
cmdlet, it does not split
the file into newlines, attach any additional information, or work
against any other PowerShell drives. Like the Get-Content
cmdlet with a ReadCount
of less than 1, it reads all the
content from the file before it returns it to you—so be cautious when
using it on extremely large files.
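The difference in shape between the two approaches can be sketched as follows. The sample file is an assumption for illustration. Note also that .NET methods resolve relative paths against the process working directory, which can differ from PowerShell's current location, so the sketch passes a fully resolved path.

```powershell
# Create a two-line sample file (the name is illustrative).
$samplePath = Join-Path ([System.IO.Path]::GetTempPath()) "readalltext-sample-$PID.txt"
Set-Content -Path $samplePath -Value 'alpha', 'beta'

# Resolve to a full filesystem path before handing it to a .NET method.
$fullPath = (Resolve-Path -Path $samplePath).ProviderPath

# Get-Content returns one string per line...
$asLines = @(Get-Content -Path $fullPath)

# ...while ReadAllText returns a single string, newlines included.
$asString = [System.IO.File]::ReadAllText($fullPath)

"Get-Content returned $($asLines.Count) strings"
"ReadAllText returned one $($asString.GetType().Name) of length $($asString.Length)"

# Clean up the sample file.
Remove-Item -Path $samplePath
```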
For more information about the Get-Content cmdlet, type Get-Help Get-Content. For information on how to work with more structured files (such as XML and CSV), see Chapter 10. For more information on how to work with binary files, see Parse and Manage Binary Files.
To search a file for an exact (but case-insensitive) match, use the -Simple parameter of the Select-String cmdlet:
PS > Select-String -Simple SearchText file.txt
To search a file for a regular expression, provide that pattern to the Select-String cmdlet:
PS > Select-String "\(...\) ...-...." phone.txt
To recursively search all *.txt files for a regular expression, pipe the results of Get-ChildItem to the Select-String cmdlet:
PS > Get-ChildItem -Filter *.txt -Recurse | Select-String pattern
The Select-String
cmdlet is the easiest way to
search files for a pattern or specific string. In contrast to the
traditional text-matching utilities (such as grep
) that support the same type of
functionality, the matches returned by the Select-String
cmdlet include detailed
information about the match itself.
PS > $matches = Select-String "output file" transcript.txt
PS > $matches | Select LineNumber,Line

LineNumber Line
---------- ----
         7 Transcript started, output file...
With a regular expression match, you’ll often
want to find out exactly what text was matched by the regular
expression. PowerShell captures this in the Matches
property of the result. For each match, the Value
property represents the text matched by your pattern.
PS > Select-String "\(...\) ...-...." phone.txt | Select -Expand Matches

...
Value    : (425) 555-1212

...
Value    : (416) 556-1213
If your regular expression defines groups (portions of the pattern enclosed in parentheses), you can access the text matched by those groups through the Groups property. The first group (Groups[0]) represents all of the text matched by your pattern. Additional groups (1 and on) represent the groups you defined. In this case, we add additional parentheses around the area code to capture it.
PS > Select-String "\((...)\) ...-...." phone.txt | Select -Expand Matches | Foreach { $_.Groups[1] }

Success  : True
Captures : {425}
Index    : 1
Length   : 3
Value    : 425

Success  : True
Captures : {416}
Index    : 1
Length   : 3
Value    : 416
If your regular expression defines a named capture (with the text ?<Name> at the beginning of a group), the Groups collection lets you access those by name. In this example, we capture the area code using AreaCode as the capture name.
PS > Select-String "\((?<AreaCode>...)\) ...-...." phone.txt | Select -Expand Matches | Foreach { $_.Groups["AreaCode"] }

Success  : True
Captures : {425}
Index    : 1
Length   : 3
Value    : 425

Success  : True
Captures : {416}
Index    : 1
Length   : 3
Value    : 416
By default, the
Select-String
cmdlet captures only the first match
per line of input. If the input can have multiple matches per line, use
the -AllMatches
parameter.
PS > Get-Content phone.txt
(425) 555-1212
(416) 556-1213 (416) 557-1214

PS > Select-String "\((...)\) ...-...." phone.txt | Select -Expand Matches | Select -Expand Value
(425) 555-1212
(416) 556-1213

PS > Select-String "\((...)\) ...-...." phone.txt -AllMatches | Select -Expand Matches | Select -Expand Value
(425) 555-1212
(416) 556-1213
(416) 557-1214
For more information about captures, named captures, and other aspects of regular expressions, see Appendix B.
If the information you need is on a different line than the line that has the match, use the -Context parameter to have that line included in Select-String’s output. PowerShell places the result in the Context.PreContext and Context.PostContext properties of Select-String’s output.
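As a brief sketch of the -Context parameter, the following searches a three-line sample file (created here only for illustration) and retrieves one line on either side of the match.

```powershell
# Create a three-line sample file (the name and content are illustrative).
$samplePath = Join-Path ([System.IO.Path]::GetTempPath()) "context-sample-$PID.txt"
Set-Content -Path $samplePath -Value 'Name: Chrissy', 'Phone: 555-1212', 'City: Seattle'

# Ask for one line before and one line after each match.
$result = Select-String -Path $samplePath -Pattern 'Phone' -Context 1, 1

# PreContext and PostContext are arrays of the surrounding lines.
$before = $result.Context.PreContext
$after  = $result.Context.PostContext

"Before: $before"
"Match:  $($result.Line)"
"After:  $after"

# Clean up the sample file.
Remove-Item -Path $samplePath
```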
If you want to search multiple files of a
specific extension, the Select-String
cmdlet lets you use wildcards (such as *.txt
) on the
filename. For more complicated lists of files (which includes searching
all files in the directory), it is usually better to use the Get-ChildItem
cmdlet to generate the list of
files as shown previously in the solution.
Since the Select-String
cmdlet outputs the filename,
line number, and matching line for every match it finds, this output may
sometimes include too much detail. A perfect example is when you are
searching for a binary file that contains a specific string. A binary
file (such as a DLL or EXE) rarely makes sense when displayed as text,
so your screen quickly fills with apparent garbage.
The solution to this problem comes from
Select-String
’s -Quiet
switch. It
simply returns true or false, depending on whether the file contains the
string. So, to find the DLL or EXE in the current directory that
contains the text “Debug”:
Get-ChildItem | Where { $_ | Select-String "Debug" -Quiet }
Two other common tools used to search files
for text are the -match
operator and
the switch
statement with the
-file
option. For more information
about those, see Recipes and . For more information about the
Select-String
cmdlet, type Get-Help Select-String
.
You want to parse and analyze a text-based logfile using PowerShell’s standard object management commands.
Use the Convert-TextObject
script given in Program: Convert Text Streams to Objects to work with
text-based logfiles. With your assistance, it converts streams of text
into streams of objects, which you can then easily work with using
PowerShell’s standard commands.
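The Convert-TextObject script primarily takes two arguments: a regular expression (-ParseExpression) that describes how to break the incoming text into groups, and a list of property names (-PropertyName) to assign to those groups.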
As an example, you can use patch logs from the Windows directory. These logs track the patch installation details from updates applied to the machine (except for Windows Vista). One detail included in these logfiles is the names and versions of the files modified by that specific patch, as shown in Example 9-1.
Example 9-1. Getting a list of files modified by hotfixes
PS > cd $env:WINDIR
PS > $parseExpression = "(.*): Destination:(.*) \((.*)\)"
PS > $files = dir kb*.log -Exclude *uninst.log
PS > $logContent = $files | Get-Content | Select-String $parseExpression
PS > $logContent
(...)
0.734: Destination:C:\WINNT\system32\shell32.dll (6.0.3790.205)
0.734: Destination:C:\WINNT\system32\wininet.dll (6.0.3790.218)
0.734: Destination:C:\WINNT\system32\urlmon.dll (6.0.3790.218)
0.734: Destination:C:\WINNT\system32\shlwapi.dll (6.0.3790.212)
0.734: Destination:C:\WINNT\system32\shdocvw.dll (6.0.3790.214)
0.734: Destination:C:\WINNT\system32\digest.dll (6.0.3790.0)
0.734: Destination:C:\WINNT\system32\browseui.dll (6.0.3790.218)
(...)
Like most logfiles, the format of the text is very regular but hard to manage. In this example, you have:
A number (the number of seconds since the patch started)
The text “: Destination:”
The file being patched
An open parenthesis
The version of the file being patched
A close parenthesis
You don’t care about any of the text, but the time, file, and file version are useful properties to track:
$properties = "Time","File","FileVersion"
So now, you use the Convert-TextObject
script to convert the text
output into a stream of objects:
PS > $logObjects = $logContent | Convert-TextObject -ParseExpression $parseExpression -PropertyName $properties
We can now easily query those objects using PowerShell’s built-in commands. For example, you can find the files most commonly affected by patches and service packs, as shown by Example 9-2.
Example 9-2. Finding files most commonly affected by hotfixes
PS > $logObjects | Group-Object file | Sort-Object -Descending Count | Select-Object Count,Name | Format-Table -Auto

Count Name
----- ----
  152 C:\WINNT\system32\shdocvw.dll
  147 C:\WINNT\system32\shlwapi.dll
  128 C:\WINNT\system32\wininet.dll
  116 C:\WINNT\system32\shell32.dll
   92 C:\WINNT\system32\rpcss.dll
   92 C:\WINNT\system32\olecli32.dll
   92 C:\WINNT\system32\ole32.dll
   84 C:\WINNT\system32\urlmon.dll
(...)
Using this technique, you can work with most text-based logfiles.
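If the Convert-TextObject script is not at hand, the underlying idea can be sketched with the -match operator and named capture groups. This is a stand-in under stated assumptions, not the Convert-TextObject script itself: the log lines are hypothetical, and the property names simply mirror the ones chosen earlier.

```powershell
# Hypothetical log lines in the same shape as the hotfix logs.
$logLines = @(
    '0.734: Destination:C:\WINNT\system32\shell32.dll (6.0.3790.205)'
    '0.750: Destination:C:\WINNT\system32\wininet.dll (6.0.3790.218)'
    '1.020: Destination:C:\WINNT\system32\shell32.dll (6.0.3790.212)'
)

# Named groups play the role of the property names.
$pattern = '^(?<Time>.*): Destination:(?<File>.*) \((?<FileVersion>.*)\)$'

# Turn each matching line into an object with three properties.
$logObjects = foreach ($line in $logLines) {
    if ($line -match $pattern) {
        [PSCustomObject]@{
            Time        = [double] $Matches['Time']
            File        = $Matches['File']
            FileVersion = $Matches['FileVersion']
        }
    }
}

# The objects now work with the standard object commands.
$mostPatched = $logObjects |
    Group-Object File |
    Sort-Object Count -Descending |
    Select-Object -First 1

"$($mostPatched.Count) entries for $($mostPatched.Name)"
```

Casting Time to a number here is a design choice for the sketch; it lets Sort-Object and Measure-Object treat the value numerically rather than as text.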
In Example 9-2, you got all the
information you needed by splitting the input text into groups of simple
strings. The time offset, file, and version information served their
purposes as is. In addition to the features used by Example 9-2, however, the
Convert-TextObject
script also
supports a parameter that lets you control the data types of those
properties. If one of the properties should be treated as a number or a
DateTime
, you may get incorrect
results if you work with that property as a string. For more information
about this functionality, see the description of the -PropertyType
parameter in the Convert-TextObject
script.
Although most logfiles have entries designed to fit within a single line, some span multiple lines. When a logfile contains entries that span multiple lines, it includes some sort of special marker to separate log entries from each other. Look at this example:
PS > Get-Content AddressBook.txt
Name: Chrissy
Phone: 555-1212
----
Name: John
Phone: 555-1213
The key to working with this type of logfile
comes from two places. The first is the -Delimiter
parameter of the Get-Content
cmdlet,
which makes it split the file based on that delimiter instead of
newlines. The second is to write a ParseExpression
regular expression that
ignores the newline characters that remain in each record:
PS > $records = gc AddressBook.txt -Delimiter "----"
PS > $parseExpression = "(?s)Name: (\S*).*Phone: (\S*).*"
PS > $records | Convert-TextObject -ParseExpression $parseExpression

Property1 Property2
--------- ---------
Chrissy   555-1212
John      555-1213
The parse expression in this example uses the single line option (?s) so that the (.*) portion of the regular expression accepts newline characters as well. For more information about these (and other) regular expression options, see Appendix B.
For extremely large logfiles, handwritten parsing tools may not meet your needs. In those situations, specialized log management tools can prove helpful. One example is Microsoft’s free Log Parser (http://www.logparser.com). Another common alternative is to import the log entries to a SQL database, and then perform ad hoc queries on database tables instead.
There are two main techniques when working
with binary data in a file. The first is to read the file using the
Byte
encoding, so that PowerShell
does not treat the content as text. The second is to use the BitConverter
class to translate these bytes
back and forth into numbers that you more commonly care about.
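As a small sketch of the second technique, the following converts four bytes into a 32-bit integer and back again. The byte values are arbitrary and chosen only for illustration.

```powershell
# Four bytes as they might appear in a file (illustrative values).
[byte[]] $bytes = 0x78, 0x56, 0x34, 0x12

# BitConverter uses the machine's byte order; most systems are little-endian,
# meaning the least significant byte comes first.
$number = [BitConverter]::ToInt32($bytes, 0)

# Convert the number back into its bytes.
$roundTrip = [BitConverter]::GetBytes($number)

"Little-endian machine: $([BitConverter]::IsLittleEndian)"
"Number: 0x{0:X8}" -f $number
"Bytes:  $(($roundTrip | ForEach-Object { '{0:X2}' -f $_ }) -join ' ')"
```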
Example 9-3 displays the “characteristics” of a Windows executable. The beginning section of any executable (a .DLL, .EXE, or any of several others) starts with a binary section known as the portable executable (PE) header. Part of this header includes characteristics about that file, such as whether the file is a DLL.
For more information about the PE header format, see http://www.microsoft.com/whdc/system/platform/firmware/PECOFF.mspx.
Example 9-3. Get-Characteristics.ps1
##############################################################################
##
## Get-Characteristics
##
## From Windows PowerShell Cookbook (O'Reilly)
## by Lee Holmes (http://www.leeholmes.com/guide)
##
##############################################################################

<#

.SYNOPSIS

Get the file characteristics of a file in the PE Executable File Format.

.EXAMPLE

Get-Characteristics $env:WINDIR\notepad.exe
IMAGE_FILE_LOCAL_SYMS_STRIPPED
IMAGE_FILE_RELOCS_STRIPPED
IMAGE_FILE_EXECUTABLE_IMAGE
IMAGE_FILE_32BIT_MACHINE
IMAGE_FILE_LINE_NUMS_STRIPPED

#>

param(
    ## The path to the file to check
    [Parameter(Mandatory = $true)]
    [string] $Path
)

Set-StrictMode -Version Latest

## Define the characteristics used in the PE file header.
## Taken from:
## http://www.microsoft.com/whdc/system/platform/firmware/PECOFF.mspx
$characteristics = @{}
$characteristics["IMAGE_FILE_RELOCS_STRIPPED"] = 0x0001
$characteristics["IMAGE_FILE_EXECUTABLE_IMAGE"] = 0x0002
$characteristics["IMAGE_FILE_LINE_NUMS_STRIPPED"] = 0x0004
$characteristics["IMAGE_FILE_LOCAL_SYMS_STRIPPED"] = 0x0008
$characteristics["IMAGE_FILE_AGGRESSIVE_WS_TRIM"] = 0x0010
$characteristics["IMAGE_FILE_LARGE_ADDRESS_AWARE"] = 0x0020
$characteristics["RESERVED"] = 0x0040
$characteristics["IMAGE_FILE_BYTES_REVERSED_LO"] = 0x0080
$characteristics["IMAGE_FILE_32BIT_MACHINE"] = 0x0100
$characteristics["IMAGE_FILE_DEBUG_STRIPPED"] = 0x0200
$characteristics["IMAGE_FILE_REMOVABLE_RUN_FROM_SWAP"] = 0x0400
$characteristics["IMAGE_FILE_NET_RUN_FROM_SWAP"] = 0x0800
$characteristics["IMAGE_FILE_SYSTEM"] = 0x1000
$characteristics["IMAGE_FILE_DLL"] = 0x2000
$characteristics["IMAGE_FILE_UP_SYSTEM_ONLY"] = 0x4000
$characteristics["IMAGE_FILE_BYTES_REVERSED_HI"] = 0x8000

## Get the content of the file, as an array of bytes.
## (-Encoding Byte applies to Windows PowerShell; PowerShell 6 and later
## use the -AsByteStream switch instead.)
$fileBytes = Get-Content $path -ReadCount 0 -Encoding byte

## The offset of the signature in the file is stored as a 4-byte value
## at location 0x3c.
$signatureOffset = [BitConverter]::ToInt32($fileBytes, 0x3c)

## Ensure it is a PE file
$signature = [char[]] $fileBytes[$signatureOffset..($signatureOffset + 3)]
if([String]::Join('', $signature) -ne "PE`0`0")
{
    throw "This file does not conform to the PE specification."
}

## The location of the COFF header is 4 bytes into the signature
$coffHeader = $signatureOffset + 4

## The characteristics data are 18 bytes into the COFF header. The field is
## 2 bytes wide; the BitConverter class manages the conversion of those
## bytes into an integer.
$characteristicsData = [BitConverter]::ToUInt16($fileBytes, $coffHeader + 18)

## Go through each of the characteristics. If the data from the file has that
## flag set, then output that characteristic.
foreach($key in $characteristics.Keys)
{
    $flag = $characteristics[$key]
    if(($characteristicsData -band $flag) -eq $flag)
    {
        $key
    }
}
For most files, this technique is the easiest
way to work with binary data. If you actually modify the binary data,
then you will also want to use the Byte
encoding when you send it back to
disk:
$fileBytes | Set-Content modified.exe -Encoding Byte
For extremely large files, though, it may be
unacceptably slow to load the entire file into memory when you work with
it. If you begin to run against this limit, the solution is to use file
management classes from the .NET Framework. These classes include
BinaryReader
, StreamReader
, and
others. For more information about working with classes from the .NET
Framework, see Work with .NET Objects. For more
information about running scripts, see Run Programs, Scripts, and Existing Tools.
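As a hedged sketch of the BinaryReader approach, the following reads only the bytes it needs rather than loading a whole file into memory. The sample file is a small stand-in created for illustration (its layout loosely imitates an executable: the letters MZ at the start and a 4-byte value at offset 0x3C); in practice you would open your own large file.

```powershell
# Build a small stand-in file (the name and contents are illustrative).
$samplePath = Join-Path ([System.IO.Path]::GetTempPath()) "binaryreader-sample-$PID.bin"
$content = New-Object byte[] 1024
$content[0] = 0x4D          # 'M'
$content[1] = 0x5A          # 'Z'
$content[0x3C] = 0x80       # little-endian 32-bit value 0x00000080
[System.IO.File]::WriteAllBytes($samplePath, $content)

# Open the file as a stream and wrap it in a BinaryReader.
$stream = [System.IO.File]::OpenRead($samplePath)
$reader = New-Object System.IO.BinaryReader $stream
try
{
    # Read the first two bytes and interpret them as ASCII text.
    $magic = [System.Text.Encoding]::ASCII.GetString($reader.ReadBytes(2))

    # Jump directly to offset 0x3C and read a 4-byte integer.
    # BinaryReader always reads integers in little-endian order.
    $stream.Position = 0x3C
    $offset = $reader.ReadInt32()
}
finally
{
    # Closing the reader also closes the underlying stream.
    $reader.Close()
}

"Magic: $magic, value at 0x3C: $offset"

# Clean up the sample file.
Remove-Item -Path $samplePath
```

Only a handful of bytes are read here, so memory use stays constant no matter how large the file is.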
You want to create a file for temporary purposes and want to be sure that the file does not already exist.
Use the [System.IO.Path]::GetTempFileName() method from the .NET Framework to create a temporary file:
$filename = [System.IO.Path]::GetTempFileName()
(... use the file ...)
Remove-Item -Force $filename
It is common to want to create a file for temporary purposes. For example, you might want to search and replace text inside a file. Doing this to a large file requires a temporary file (see Search and Replace Text in a File). Another example is the temporary file used by Program: Interactively Filter Lists of Objects.
Often, people create this temporary file wherever they can think of: in C:\, the script’s current location, or any number of other places. Although this may work on the author’s system, it rarely works well elsewhere. For example, if the user does not use their Administrator account for day-to-day tasks, your script will not have access to C:\ and will fail.
Another difficulty comes from trying to create a unique name for the temporary file. If your script just hardcodes a name (no matter how many random characters it has), it will fail if you run two copies at the same time. You might even craft a script smart enough to search for a filename that does not exist, create it, and then use it. Unfortunately, this could still break if another copy of your script creates that file after you see that it is missing but before you actually create the file.
Finally, there are several security vulnerabilities that your script might introduce should it write its temporary files to a location that other users can read or write.
Luckily, the authors of the .NET Framework provided the [System.IO.Path]::GetTempFileName() method to resolve these problems for you. It creates a unique filename in a reliable location and in a secure manner. The method returns a filename, which you can then use as you want.
Remember to delete this file when your script no longer needs it; otherwise, your script will waste disk space and cause needless clutter on your users’ systems. Remember: your scripts should solve the administrator’s problems, not cause them!
By default, the GetTempFileName() method returns a file with a .tmp extension. For most purposes, the file extension does not matter, and this works well. In the rare instances when you need to create a file with a specific extension, the [System.IO.Path]::ChangeExtension() method lets you change the extension of that temporary file. The following example creates a new temporary file that uses the .cs file extension:
$filename = [System.IO.Path]::GetTempFileName()
$newname = [System.IO.Path]::ChangeExtension($filename, ".cs")
Move-Item $filename $newname
(... use the file ...)
Remove-Item $newname
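One way to make sure the temporary file is removed even when the script fails partway through is to wrap the work in try/finally. This is a sketch of that pattern rather than part of the recipe; the content written to the file is a placeholder.

```powershell
# Create the temporary file.
$filename = [System.IO.Path]::GetTempFileName()

try
{
    # Use the file (placeholder work for illustration).
    Set-Content -Path $filename -Value 'scratch data'
    $roundTrip = Get-Content -Path $filename
}
finally
{
    # This block runs whether or not the work above raised an error.
    Remove-Item -Path $filename -Force -ErrorAction SilentlyContinue
}

$stillExists = Test-Path -Path $filename
"Read back: $roundTrip (file still exists: $stillExists)"
```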
To search and replace text in a file, first store the content of the file in a variable, and then store the replaced text back in that file, as shown in Example 9-4.
Example 9-4. Replacing text in a file
PS > $filename = "file.txt"
PS > $match = "source text"
PS > $replacement = "replacement text"
PS >
PS > $content = Get-Content $filename
PS > $content
This is some source text that we want
to replace. One of the things you may need
to be careful about with Source
Text is when it spans multiple lines,
and may have different Source Text
capitalization.
PS >
PS > $content = $content -creplace $match,$replacement
PS > $content
This is some replacement text that we want
to replace. One of the things you may need
to be careful about with Source
Text is when it spans multiple lines,
and may have different Source Text
capitalization.
PS > $content | Set-Content $filename
Using PowerShell to search and replace text in a file (or many files!) is one of the best examples of using a tool to automate a repetitive task. What could literally take months by hand can be shortened to a few minutes (or hours, at most).
Notice that the solution uses the -creplace
operator to replace text in a
case-sensitive manner. This is almost always what you will want to do,
as the replacement text uses the exact capitalization that you
provide. If the text you want to replace is capitalized in several
different ways (as in the term "Source
Text
" from the solution), then search and replace several
times with the different possible capitalizations.
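Example 9-4 illustrates what is perhaps the simplest (but actually most common) scenario:
You work with an ASCII text file.
You replace some literal text with a literal text replacement.
You don’t worry that the text match might span multiple lines.
Your text file is relatively small.
If some of those assumptions don’t hold true, then this discussion shows you how to tailor the way you search and replace within this file.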
By default, the Set-Content
cmdlet
assumes that you want the output file to contain plain ASCII text. If
you work with a file in another encoding (for example, Unicode or an OEM code page such as Cyrillic), use the
-Encoding
parameter of the
Out-File
cmdlet to specify
that:
$content | Out-File -Encoding Unicode $filename
$content | Out-File -Encoding OEM $filename
Although it is most common to replace one literal string with another literal string, you might want to replace text according to a pattern in some advanced scenarios. One example might be swapping first name and last name. PowerShell supports this type of replacement through its support of regular expressions in its replacement operator:
PS > $content = Get-Content names.txt
PS > $content
John Doe
Mary Smith
PS > $content -replace '(.*) (.*)','$2, $1'
Doe, John
Smith, Mary
The Get-Content
cmdlet
used in the solution retrieves a list of lines from the file. When you
use the -replace
operator against
this array, it replaces your text in each of those lines individually.
If your match spans multiple lines, as shown between lines 3 and 4 in
Example 9-4, the -replace
operator will be unaware of the
match and will not perform the replacement.
If you want to replace text that spans multiple lines, then it becomes necessary to stop treating the input text as a collection of lines. Once you stop treating the input as a collection of lines, it is also important to use a replacement expression that can ignore line breaks, as shown in Example 9-5.
The first and second lines of Example 9-5 read the entire
content of the file as a single string. They do this by calling the
[System.IO.File]::ReadAllText()
method from
the .NET Framework, since the Get-Content
cmdlet splits the content of the
file into individual lines.
The third line of this solution replaces the text by using a regular expression pattern. The section Source(\s*)Text scans for the word Source, followed optionally by some whitespace, followed by the word Text. Since the whitespace portion of the regular expression has parentheses around it, we want to remember exactly what that whitespace was. The \s class already counts newline characters as whitespace, so this portion can span a line break. The first portion of the regular expression is the single-line option (?s), which additionally lets the dot wildcard match newline characters; it matters only when the pattern contains a dot. The replacement portion of the -replace operator replaces that match with Replacement, followed by the exact whitespace from the match that we captured ($1), followed by Text.
For more information, see Simple Operators.
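The multiline replacement described above can be sketched as follows. This is a reconstruction under stated assumptions rather than the original listing: the sample file and its content are created here for illustration, and the pattern and replacement strings follow the description in the text.

```powershell
# Create a sample file in which "Source Text" spans a line break
# (the name and content are illustrative).
$filename = Join-Path ([System.IO.Path]::GetTempPath()) "multiline-sample-$PID.txt"
[System.IO.File]::WriteAllText($filename, "Be careful with Source`r`nText and Source Text.")

# Read the entire file as one string instead of a collection of lines.
$content = [System.IO.File]::ReadAllText($filename)

# Capture the whitespace between the words (including any line break)
# and put it back between the replacement words via $1.
$content = $content -creplace '(?s)Source(\s*)Text', 'Replacement$1Text'

# Write the updated text back to the file.
[System.IO.File]::WriteAllText($filename, $content)

$content

# Clean up the sample file.
Remove-Item -Path $filename
```

The single-quoted replacement string matters: in double quotes, PowerShell would try to expand $1 as a variable before the regular expression engine sees it.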
The approaches used so far store the entire contents of the file in memory as they replace the text in them. Once we’ve made the replacements in memory, we write the updated content back to disk. This works well when replacing text in small, medium, and even moderately large files. For extremely large files (for example, more than several hundred megabytes), using this much memory may burden your system and slow down your script. To solve that problem, you can work on the files line by line, rather than with the entire file at once.
Since you’re working with the file line by line, it will still be in use when you try to write replacement text back into it. You can avoid this problem if you write the replacement text into a temporary file until you’ve finished working with the main file. Once you’ve finished scanning through your file, you can delete it and replace it with the temporary file.
$filename = "file.txt"
$temporaryFile = [System.IO.Path]::GetTempFileName()

$match = "source text"
$replacement = "replacement text"

Get-Content $filename |
    Foreach-Object { $_ -creplace $match,$replacement | Add-Content $temporaryFile }

Remove-Item $filename
Move-Item $temporaryFile $filename
Both PowerShell and the .NET Framework do a lot of work to
hide from you the complexities of file encodings. The
Get-Content
cmdlet automatically detects the encoding
of a file, and then handles all encoding issues before returning the
content to you. When you do need to know the encoding of a file, though,
the solution requires a bit of work.
Example 9-6 resolves
this by doing the hard work for you. Files with unusual encodings are
supposed to (and almost always do) have a byte order mark to identify the
encoding. After the byte order mark, they have the actual content. If a
file lacks the byte order mark (no matter how the content is encoded),
Get-FileEncoding
assumes the .NET
Framework’s default encoding of UTF-7. If the content is not actually
encoded as defined by the byte order mark,
Get-FileEncoding
still outputs the declared
encoding.
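To see what a byte order mark looks like, this short sketch prints the preamble bytes that the .NET Framework associates with three common encodings. The choice of encodings is only for illustration.

```powershell
# Collect the preamble (byte order mark) of a few encodings as hex strings.
$hexByName = [ordered]@{}

foreach ($name in 'UTF8', 'Unicode', 'BigEndianUnicode')
{
    # Look up the static property by name, then ask it for its preamble.
    $encoding = [System.Text.Encoding]::$name
    $preamble = $encoding.GetPreamble()

    $hexByName[$name] = ($preamble | ForEach-Object { '{0:X2}' -f $_ }) -join ' '
}

# Display the mapping of encoding name to preamble bytes.
$hexByName
```

A file that starts with FF FE, for example, is declaring itself as little-endian Unicode (UTF-16).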
Example 9-6. Get-FileEncoding.ps1
##############################################################################
##
## Get-FileEncoding
##
## From Windows PowerShell Cookbook (O'Reilly)
## by Lee Holmes (http://www.leeholmes.com/guide)
##
##############################################################################

<#

.SYNOPSIS

Gets the encoding of a file

.EXAMPLE

Get-FileEncoding.ps1 .\UnicodeScript.ps1

BodyName          : unicodeFFFE
EncodingName      : Unicode (Big-Endian)
HeaderName        : unicodeFFFE
WebName           : unicodeFFFE
WindowsCodePage   : 1200
IsBrowserDisplay  : False
IsBrowserSave     : False
IsMailNewsDisplay : False
IsMailNewsSave    : False
IsSingleByte      : False
EncoderFallback   : System.Text.EncoderReplacementFallback
DecoderFallback   : System.Text.DecoderReplacementFallback
IsReadOnly        : True
CodePage          : 1201

#>

param(
    ## The path of the file to get the encoding of.
    $Path
)

Set-StrictMode -Version Latest

## The hashtable used to store our mapping of encoding bytes to their
## name. For example, "255-254 = Unicode"
$encodings = @{}

## Find all of the encodings understood by the .NET Framework. For each,
## determine the bytes at the start of the file (the preamble) that the .NET
## Framework uses to identify that encoding.
$encodingMembers = [System.Text.Encoding] |
    Get-Member -Static -MemberType Property

$encodingMembers | Foreach-Object {
    $encodingBytes = [System.Text.Encoding]::($_.Name).GetPreamble() -join '-'
    $encodings[$encodingBytes] = $_.Name
}

## Find out the lengths of all of the preambles.
$encodingLengths = $encodings.Keys | Where-Object { $_ } |
    Foreach-Object { ($_ -split "-").Count }

## Assume the encoding is UTF7 by default
$result = "UTF7"

## Go through each of the possible preamble lengths, read that many
## bytes from the file, and then see if it matches one of the encodings
## we know about.
foreach($encodingLength in $encodingLengths | Sort -Descending)
{
    $bytes = (Get-Content -encoding byte -readcount $encodingLength $path)[0]
    $encoding = $encodings[$bytes -join '-']

    ## If we found an encoding that had the same preamble bytes,
    ## save that output and break.
    if($encoding)
    {
        $result = $encoding
        break
    }
}

## Finally, output the encoding.
[System.Text.Encoding]::$result
For more information about running scripts, see Run Programs, Scripts, and Existing Tools.
When dealing with binary data, it is often useful to see the value of the actual bytes being used in that binary data. In addition to the value of the data, finding its offset in the file or content is usually important as well.
Example 9-7 enables both scenarios by displaying content in a report that shows all of this information. The leftmost column displays the offset into the content, increasing by 16 bytes at a time. The middle 16 columns display the hexadecimal representation of the byte at that position in the content. The header of each column shows how far into the 16-byte chunk that character is. The far-right column displays the ASCII representation of the characters in that row.
To determine the position of a byte within the input, add the number at the far-left of the row to the number at the top of the column for that character. For example, 0000230 (shown at the far left) + C (shown at the top of the column) = 000023C. Therefore, the byte in this example is at offset 23C in the content.
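PowerShell can do this hexadecimal arithmetic for you. The following sketch uses the numbers from the example above; the variable names are chosen only for illustration.

```powershell
# The number at the far left of the row, and the column header.
$rowOffset    = 0x230
$columnOffset = 0xC

# Add them to get the position of the byte within the input.
$position = $rowOffset + $columnOffset

# Format the position as seven hexadecimal digits, and also show it in decimal.
$label = '{0:X7}' -f $position
"Offset $label (decimal $position)"
```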
Example 9-7. Format-Hex.ps1
##############################################################################
##
## Format-Hex
##
## From Windows PowerShell Cookbook (O'Reilly)
## by Lee Holmes (http://www.leeholmes.com/guide)
##
##############################################################################

<#

.SYNOPSIS

Outputs a file or pipelined input as a hexadecimal display. To determine the
offset of a character in the input, add the number at the far-left of the row
with the number at the top of the column for that character.

.EXAMPLE

"Hello World" | Format-Hex

            0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F

00000000   48 00 65 00 6C 00 6C 00 6F 00 20 00 57 00 6F 00  H.e.l.l.o. .W.o.
00000010   72 00 6C 00 64 00                                r.l.d.

.EXAMPLE

Format-Hex c:\temp\example.bmp

#>

[CmdletBinding(DefaultParameterSetName = "ByPath")]
param(
    ## The file to read the content from
    [Parameter(ParameterSetName = "ByPath", Position = 0)]
    [string] $Path,

    ## The input (bytes or strings) to format as hexadecimal
    [Parameter(
        ParameterSetName = "ByInput", Position = 0,
        ValueFromPipeline = $true)]
    [Object] $InputObject
)

begin
{
    Set-StrictMode -Version Latest

    ## Create the array to hold the content. If the user specified the
    ## -Path parameter, read the bytes from the path.
    [byte[]] $inputBytes = $null
    if($Path) { $inputBytes = [IO.File]::ReadAllBytes((Resolve-Path $Path)) }

    ## Store our header, and formatting information
    $counter = 0
    $header = "            0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F"
    $nextLine = "{0}   " -f [Convert]::ToString(
        $counter, 16).ToUpper().PadLeft(8, '0')
    $asciiEnd = ""

    ## Output the header
    "`r`n$header`r`n"
}

process
{
    ## If they specified the -InputObject parameter, retrieve the bytes
    ## from that input
    if(Test-Path variable:InputObject)
    {
        ## If it's an actual byte, add it to the inputBytes array.
        if($InputObject -is [Byte])
        {
            $inputBytes = $InputObject
        }
        else
        {
            ## Otherwise, convert it to a string and extract the bytes
            ## from that.
            $inputString = [string] $InputObject
            $inputBytes = [Text.Encoding]::Unicode.GetBytes($inputString)
        }
    }

    ## Now go through the input bytes
    foreach($byte in $inputBytes)
    {
        ## Display each byte, in 2-digit hexadecimal, and add that to the
        ## left-hand side.
        $nextLine += "{0:X2} " -f $byte

        ## If the character is printable, add its ascii representation to
        ## the righthand side. Otherwise, add a dot to the righthand side.
        if(($byte -ge 0x20) -and ($byte -le 0xFE))
        {
            $asciiEnd += [char] $byte
        }
        else
        {
            $asciiEnd += "."
        }

        $counter++;

        ## If we've hit the end of a line, combine the right half with the
        ## left half, and start a new line.
        if(($counter % 16) -eq 0)
        {
            "$nextLine $asciiEnd"
            $nextLine = "{0}   " -f [Convert]::ToString(
                $counter, 16).ToUpper().PadLeft(8, '0')
            $asciiEnd = "";
        }
    }
}

end
{
    ## At the end of the file, we might not have had the chance to output
    ## the end of the line yet. Only do this if we didn't exit on the 16-byte
    ## boundary, though.
    if(($counter % 16) -ne 0)
    {
        while(($counter % 16) -ne 0)
        {
            $nextLine += "   "
            $asciiEnd += " "
            $counter++;
        }
        "$nextLine $asciiEnd"
    }

    ""
}
For more information about running scripts, see Run Programs, Scripts, and Existing Tools.