Chapter 9. Simple Files

Introduction

When administering a system, you naturally spend a significant amount of time working with the files on that system. Many of the things you want to do with these files are simple: get their content, search them for a pattern, or replace text inside them.

For even these simple operations, PowerShell’s object-oriented flavor adds several unique and powerful twists.

Get the Content of a File

Problem

You want to get the content of a file.

Solution

Provide the filename as an argument to the Get-Content cmdlet:

PS > $content = Get-Content c:\temp\file.txt

Place the filename in a ${} section to use the cmdlet Get-Content variable syntax:

PS > $content = ${c:\temp\file.txt}

Provide the filename as an argument to the ReadAllText() method to use the System.IO.File class from the .NET Framework:

PS > $content = [System.IO.File]::ReadAllText("c:\temp\file.txt")

Discussion

PowerShell offers three primary ways to get the content of a file. The first is the Get-Content cmdlet—the cmdlet designed for this purpose. In fact, the Get-Content cmdlet works on any PowerShell drive that supports the concept of items with content. This includes Alias:, Function:, and more. The second and third ways are the Get-Content variable syntax and the ReadAllText() method.

When working against files, the Get-Content cmdlet returns the content of the file line by line. When it does this, PowerShell supplies additional information about that output line. This information, which PowerShell attaches as properties to each output line, includes the drive and path from where that line originated, among other things.

Note

If you want PowerShell to split the file content based on a string that you choose (rather than the default of newlines), the Get-Content cmdlet’s -Delimiter parameter lets you provide one.

While useful, having PowerShell attach this extra information when you are not using it can sometimes slow down scripts that operate on large files. If you need to process a large file more quickly, the Get-Content cmdlet’s ReadCount parameter lets you control how many lines PowerShell reads from the file at once. With a ReadCount of 1 (which is the default), PowerShell returns each line one by one. With a ReadCount of 2, PowerShell returns two lines at a time. With a ReadCount of less than 1, PowerShell returns all lines from the file at once.

Warning

Beware of using a ReadCount of less than 1 for extremely large files. One of the benefits of the Get-Content cmdlet is its streaming behavior. No matter how large the file, you will still be able to process each line of the file without using up all your system’s memory. Since a ReadCount of less than 1 reads the entire file before returning any results, large files have the potential to use up your system’s memory. For more information about how to effectively take advantage of PowerShell’s streaming capabilities, see Generate Large Reports and Text Streams.

If performance is a primary concern, the [File]::ReadAllText() method from the .NET Framework reads a file most quickly from the disk. Unlike the Get-Content cmdlet, it does not split the file into lines, attach any additional information, or work against any other PowerShell drives. Like the Get-Content cmdlet with a ReadCount of less than 1, it reads all the content from the file before it returns it to you, so be cautious when using it on extremely large files.
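To make the difference between these approaches concrete, here is a minimal sketch. The sample file and the variable names ($sample, $lines, $batches, $text) are illustrative assumptions, not part of the recipe.

```powershell
# Create a small sample file so the sketch is self-contained.
$sample = [System.IO.Path]::GetTempFileName()
Set-Content -Path $sample -Value "one","two","three"

# Default behavior: one string per line, streamed down the pipeline.
$lines = @(Get-Content -Path $sample)

# -ReadCount 0: all lines travel down the pipeline as a single array,
# so the script block below runs once and sees the whole batch.
$batches = @(Get-Content -Path $sample -ReadCount 0 |
    ForEach-Object { $_.Count })

# ReadAllText(): the whole file as one string, newlines included.
$text = [System.IO.File]::ReadAllText($sample)

"Lines read individually : $($lines.Count)"
"Lines in the one batch  : $($batches[0])"
"Type from ReadAllText() : $($text.GetType().Name)"

Remove-Item -Path $sample
```

The first two approaches report the same three lines and differ only in how many objects cross the pipeline. The third returns a single String.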

For more information about the Get-Content cmdlet, type Get-Help Get-Content. For information on how to work with more structured files (such as XML and CSV), see Chapter 10. For more information on how to work with binary files, see Parse and Manage Binary Files.

Search a File for Text or a Pattern

Problem

You want to find a string or regular expression in a file.

Solution

To search a file for an exact (but case-insensitive) match, use the -Simple parameter of the Select-String cmdlet:

PS > Select-String -Simple SearchText file.txt

To search a file for a regular expression, provide that pattern to the Select-String cmdlet:

PS > Select-String "\(...\) ...-...." phone.txt

To recursively search all *.txt files for a regular expression, pipe the results of Get-ChildItem to the Select-String cmdlet:

PS > Get-ChildItem -Filter *.txt -Recurse | Select-String pattern

Discussion

The Select-String cmdlet is the easiest way to search files for a pattern or specific string. In contrast to the traditional text-matching utilities (such as grep) that support the same type of functionality, the matches returned by the Select-String cmdlet include detailed information about the match itself.

PS > $matches = Select-String "output file" transcript.txt
PS > $matches | Select LineNumber,Line

                        LineNumber Line
                        ---------- ----
                                 7 Transcript started, output file...

With a regular expression match, you’ll often want to find out exactly what text was matched by the regular expression. PowerShell captures this in the Matches property of the result. For each match, the Value property represents the text matched by your pattern.

PS > Select-String "\(...\) ...-...." phone.txt | Select -Expand Matches

...
Value    : (425) 555-1212

...
Value    : (416) 556-1213

If your regular expression defines groups (portions of the pattern enclosed in parentheses), you can access the text matched by those groups through the Groups property. The first group (Groups[0]) represents all of the text matched by your pattern. Additional groups (1 and on) represent the groups you defined. In this case, we add additional parentheses around the area code to capture it.

PS > Select-String "\((...)\) ...-...." phone.txt |
    Select -Expand Matches | Foreach { $_.Groups[1] }



Success  : True
Captures : {425}
Index    : 1
Length   : 3
Value    : 425

Success  : True
Captures : {416}
Index    : 1
Length   : 3
Value    : 416

If your regular expression defines a named capture (with the text ?<Name> at the beginning of a group), the Groups collection lets you access those by name. In this example, we capture the area code using AreaCode as the capture name.

PS > Select-String "\((?<AreaCode>...)\) ...-...." phone.txt |
     Select -Expand Matches | Foreach { $_.Groups["AreaCode"] }



Success  : True
Captures : {425}
Index    : 1
Length   : 3
Value    : 425

Success  : True
Captures : {416}
Index    : 1
Length   : 3
Value    : 416

By default, the Select-String cmdlet captures only the first match per line of input. If the input can have multiple matches per line, use the -AllMatches parameter.

PS > Get-Content phone.txt
(425) 555-1212
(416) 556-1213 (416) 557-1214

PS > Select-String "\((...)\) ...-...." phone.txt |
    Select -Expand Matches | Select -Expand Value

(425) 555-1212
(416) 556-1213

PS > Select-String "\((...)\) ...-...." phone.txt -AllMatches |
    Select -Expand Matches | Select -Expand Value

(425) 555-1212
(416) 556-1213
(416) 557-1214

For more information about captures, named captures, and other aspects of regular expressions, see Appendix B.

Note

If the information you need is on a different line than the line that has the match, use the -Context parameter to have that line included in Select-String’s output. PowerShell places the result in the Context.PreContext and Context.PostContext properties of Select-String’s output.
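As a rough illustration of the -Context parameter, the sketch below asks for one line before each match and none after. The sample file content and the variable names are assumptions made up for this example.

```powershell
# Create a small sample file so the sketch is self-contained.
$sample = [System.IO.Path]::GetTempFileName()
Set-Content -Path $sample -Value "Name: Chrissy","Phone: 555-1212",
    "Name: John","Phone: 555-1213"

# -Context 1,0 requests one line before and zero lines after each match.
$hits = @(Select-String -Path $sample -Pattern "^Phone:" -Context 1,0)

foreach ($hit in $hits) {
    # PreContext is an array holding the lines that precede the match.
    "{0} -> {1}" -f $hit.Context.PreContext[0], $hit.Line
}

Remove-Item -Path $sample
```

Each result pairs the matching Phone line with the Name line that precedes it.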

If you want to search multiple files of a specific extension, the Select-String cmdlet lets you use wildcards (such as *.txt) on the filename. For more complicated lists of files (which includes searching all files in the directory), it is usually better to use the Get-ChildItem cmdlet to generate the list of files as shown previously in the solution.

Since the Select-String cmdlet outputs the filename, line number, and matching line for every match it finds, this output may sometimes include too much detail. A perfect example is when you are searching for a binary file that contains a specific string. A binary file (such as a DLL or EXE) rarely makes sense when displayed as text, so your screen quickly fills with apparent garbage.

The solution to this problem comes from Select-String’s -Quiet switch. It simply returns true or false, depending on whether the file contains the string. So, to find the DLL or EXE in the current directory that contains the text “Debug”:

Get-ChildItem | Where { $_ | Select-String "Debug" -Quiet }

Two other common tools used to search files for text are the -match operator and the switch statement with the -file option; both are covered in other recipes in this book. For more information about the Select-String cmdlet, type Get-Help Select-String.

Parse and Manage Text-Based Logfiles

Problem

You want to parse and analyze a text-based logfile using PowerShell’s standard object management commands.

Solution

Use the Convert-TextObject script given in Program: Convert Text Streams to Objects to work with text-based logfiles. With your assistance, it converts streams of text into streams of objects, which you can then easily work with using PowerShell’s standard commands.

The Convert-TextObject script primarily takes two arguments:

A regular expression that describes how to break the incoming text into groups
A list of property names that the script then assigns to those text groups

As an example, you can use patch logs from the Windows directory. These logs track the patch installation details from updates applied to the machine (except for Windows Vista). One detail included in these logfiles is the names and versions of the files modified by that specific patch, as shown in Example 9-1.

Example 9-1. Getting a list of files modified by hotfixes

PS > cd $env:WINDIR
PS > $parseExpression = "(.*): Destination:(.*) \((.*)\)"
PS > $files = dir kb*.log -Exclude *uninst.log
PS > $logContent = $files | Get-Content | Select-String $parseExpression
PS > $logContent

(...)
0.734: Destination:C:\WINNT\system32\shell32.dll (6.0.3790.205)
0.734: Destination:C:\WINNT\system32\wininet.dll (6.0.3790.218)
0.734: Destination:C:\WINNT\system32\urlmon.dll (6.0.3790.218)
0.734: Destination:C:\WINNT\system32\shlwapi.dll (6.0.3790.212)
0.734: Destination:C:\WINNT\system32\shdocvw.dll (6.0.3790.214)
0.734: Destination:C:\WINNT\system32\digest.dll (6.0.3790.0)
0.734: Destination:C:\WINNT\system32\browseui.dll (6.0.3790.218)
(...)

Like most logfiles, the format of the text is very regular but hard to manage. In this example, you have:

A number (the number of seconds since the patch started)
The text “: Destination:”
The file being patched
An open parenthesis
The version of the file being patched
A close parenthesis

You don’t care about any of the text, but the time, file, and file version are useful properties to track:

$properties = "Time","File","FileVersion"

So now, you use the Convert-TextObject script to convert the text output into a stream of objects:

PS > $logObjects = $logContent |
    Convert-TextObject -ParseExpression $parseExpression -PropertyName $properties

You can now easily query those objects using PowerShell’s built-in commands. For example, you can find the files most commonly affected by patches and service packs, as shown by Example 9-2.

Example 9-2. Finding files most commonly affected by hotfixes

PS > $logObjects | Group-Object file | Sort-Object -Descending Count |
    Select-Object Count,Name | Format-Table -Auto


Count Name
----- ----
  152 C:\WINNT\system32\shdocvw.dll
  147 C:\WINNT\system32\shlwapi.dll
  128 C:\WINNT\system32\wininet.dll
  116 C:\WINNT\system32\shell32.dll
   92 C:\WINNT\system32\rpcss.dll
   92 C:\WINNT\system32\olecli32.dll
   92 C:\WINNT\system32\ole32.dll
   84 C:\WINNT\system32\urlmon.dll
(...)

Using this technique, you can work with most text-based logfiles.
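If the Convert-TextObject script is not at hand, the same idea can be approximated with the -match operator and named capture groups. This is only a sketch, not the script's implementation: it assumes PowerShell 3.0 or later (for [pscustomobject]), and the sample lines and the third timestamp are made up to mirror the log format shown above.

```powershell
# Sample lines in the same shape as the patch log output.
$logLines = @(
    '0.734: Destination:C:\WINNT\system32\shell32.dll (6.0.3790.205)'
    '0.734: Destination:C:\WINNT\system32\wininet.dll (6.0.3790.218)'
    '1.120: Destination:C:\WINNT\system32\shell32.dll (6.0.3790.212)'
)

# Named groups stand in for the list of property names.
$pattern = '^(?<Time>.*): Destination:(?<File>.*) \((?<FileVersion>.*)\)$'

$logObjects = foreach ($line in $logLines) {
    if ($line -match $pattern) {
        # -match against a single string fills the automatic $Matches table.
        [pscustomobject]@{
            Time        = [double] $Matches['Time']
            File        = $Matches['File']
            FileVersion = $Matches['FileVersion']
        }
    }
}

# The resulting objects work with the standard object commands.
$logObjects | Group-Object File | Sort-Object Count -Descending |
    Select-Object Count, Name
```

Casting Time to [double] illustrates why typed properties matter: sorting on a numeric property sorts by value rather than alphabetically.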

Discussion

In Example 9-2, you got all the information you needed by splitting the input text into groups of simple strings. The time offset, file, and version information served their purposes as is. In addition to the features used by Example 9-2, however, the Convert-TextObject script also supports a parameter that lets you control the data types of those properties. If one of the properties should be treated as a number or a DateTime, you may get incorrect results if you work with that property as a string. For more information about this functionality, see the description of the -PropertyType parameter in the Convert-TextObject script.

Although most logfiles have entries designed to fit within a single line, some span multiple lines. When a logfile contains entries that span multiple lines, it includes some sort of special marker to separate log entries from each other. Look at this example:

PS > Get-Content AddressBook.txt
Name: Chrissy
Phone: 555-1212
----
Name: John
Phone: 555-1213

The key to working with this type of logfile comes from two places. The first is the -Delimiter parameter of the Get-Content cmdlet, which makes it split the file based on that delimiter instead of newlines. The second is to write a ParseExpression regular expression that ignores the newline characters that remain in each record:

PS > $records = gc AddressBook.txt -Delimiter "----"
PS > $parseExpression = "(?s)Name: (\S*).*Phone: (\S*).*"
PS > $records | Convert-TextObject -ParseExpression $parseExpression

Property1                                                   Property2
---------                                                   ---------
Chrissy                                                     555-1212
John                                                        555-1213

The parse expression in this example uses the single-line option (?s) so that the .* portions of the regular expression accept newline characters as well. For more information about these (and other) regular expression options, see Appendix B.

For extremely large logfiles, handwritten parsing tools may not meet your needs. In those situations, specialized log management tools can prove helpful. One example is Microsoft’s free Log Parser (http://www.logparser.com). Another common alternative is to import the log entries to a SQL database, and then perform ad hoc queries on database tables instead.

Parse and Manage Binary Files

Problem

You want to work with binary data in a file.

Solution

There are two main techniques when working with binary data in a file. The first is to read the file using the Byte encoding, so that PowerShell does not treat the content as text. The second is to use the BitConverter class to translate these bytes back and forth into numbers that you more commonly care about.

Example 9-3 displays the “characteristics” of a Windows executable. The beginning section of any executable (a .DLL, .EXE, or any of several others) starts with a binary section known as the portable executable (PE) header. Part of this header includes characteristics about that file, such as whether the file is a DLL.

For more information about the PE header format, see http://www.microsoft.com/whdc/system/platform/firmware/PECOFF.mspx.

Example 9-3. Get-Characteristics.ps1

##############################################################################
##
## Get-Characteristics
##
## From Windows PowerShell Cookbook (O'Reilly)
## by Lee Holmes (http://www.leeholmes.com/guide)
##
##############################################################################

<#

.SYNOPSIS

Get the file characteristics of a file in the PE Executable File Format.

.EXAMPLE

Get-Characteristics $env:WINDIR\notepad.exe
IMAGE_FILE_LOCAL_SYMS_STRIPPED
IMAGE_FILE_RELOCS_STRIPPED
IMAGE_FILE_EXECUTABLE_IMAGE
IMAGE_FILE_32BIT_MACHINE
IMAGE_FILE_LINE_NUMS_STRIPPED

#>

param(
    ## The path to the file to check
    [Parameter(Mandatory = $true)]
    [string] $Path
)

Set-StrictMode -Version Latest

## Define the characteristics used in the PE file header.
## Taken from:
## http://www.microsoft.com/whdc/system/platform/firmware/PECOFF.mspx
$characteristics = @{}
$characteristics["IMAGE_FILE_RELOCS_STRIPPED"] = 0x0001
$characteristics["IMAGE_FILE_EXECUTABLE_IMAGE"] = 0x0002
$characteristics["IMAGE_FILE_LINE_NUMS_STRIPPED"] = 0x0004
$characteristics["IMAGE_FILE_LOCAL_SYMS_STRIPPED"] = 0x0008
$characteristics["IMAGE_FILE_AGGRESSIVE_WS_TRIM"] = 0x0010
$characteristics["IMAGE_FILE_LARGE_ADDRESS_AWARE"] = 0x0020
$characteristics["RESERVED"] = 0x0040
$characteristics["IMAGE_FILE_BYTES_REVERSED_LO"] = 0x0080
$characteristics["IMAGE_FILE_32BIT_MACHINE"] = 0x0100
$characteristics["IMAGE_FILE_DEBUG_STRIPPED"] = 0x0200
$characteristics["IMAGE_FILE_REMOVABLE_RUN_FROM_SWAP"] = 0x0400
$characteristics["IMAGE_FILE_NET_RUN_FROM_SWAP"] = 0x0800
$characteristics["IMAGE_FILE_SYSTEM"] = 0x1000
$characteristics["IMAGE_FILE_DLL"] = 0x2000
$characteristics["IMAGE_FILE_UP_SYSTEM_ONLY"] = 0x4000
$characteristics["IMAGE_FILE_BYTES_REVERSED_HI"] = 0x8000

## Get the content of the file, as an array of bytes
$fileBytes = Get-Content $path -ReadCount 0 -Encoding byte

## The offset of the signature in the file is stored at location 0x3c.
$signatureOffset = $fileBytes[0x3c]

## Ensure it is a PE file
$signature = [char[]] $fileBytes[$signatureOffset..($signatureOffset + 3)]
if([String]::Join('', $signature) -ne "PE`0`0")
{
    throw "This file does not conform to the PE specification."
}

## The location of the COFF header is 4 bytes into the signature
$coffHeader = $signatureOffset + 4

## The characteristics data are 18 bytes into the COFF header. The field is
## 2 bytes wide, so the BitConverter class converts those 2 bytes into an
## unsigned 16-bit integer.
$characteristicsData = [BitConverter]::ToUInt16($fileBytes, $coffHeader + 18)

## Go through each of the characteristics. If the data from the file has that
## flag set, then output that characteristic.
foreach($key in $characteristics.Keys)
{
    $flag = $characteristics[$key]
    if(($characteristicsData -band $flag) -eq $flag)
    {
        $key
    }
}

Discussion

For most files, this technique is the easiest way to work with binary data. If you actually modify the binary data, then you will also want to use the Byte encoding when you send it back to disk:

$fileBytes | Set-Content modified.exe -Encoding Byte

For extremely large files, though, it may be unacceptably slow to load the entire file into memory when you work with it. If you begin to run against this limit, the solution is to use file management classes from the .NET Framework. These classes include BinaryReader, StreamReader, and others. For more information about working with classes from the .NET Framework, see Work with .NET Objects. For more information about running scripts, see Run Programs, Scripts, and Existing Tools.
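As a sketch of the BinaryReader approach, the following reads only the bytes it needs instead of loading the whole file. To stay self-contained it builds a tiny synthetic file containing just the PE fields it reads. That file, its 0x80 signature offset, and the 0x2102 characteristics value are assumptions invented for this illustration.

```powershell
# Build a 256-byte synthetic file with just enough PE structure:
# the signature offset at 0x3C, the "PE\0\0" signature, and the
# characteristics field 18 bytes into the COFF header.
$bytes = New-Object byte[] 256
$bytes[0x3c] = 0x80                        # signature starts at offset 0x80
$bytes[0x80] = 0x50; $bytes[0x81] = 0x45   # 'P','E' followed by two zero bytes
$bytes[0x96] = 0x02; $bytes[0x97] = 0x21   # characteristics 0x2102, little-endian
$path = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllBytes($path, $bytes)

$stream = [System.IO.File]::OpenRead($path)
$reader = New-Object System.IO.BinaryReader $stream
try {
    # Seek to 0x3C and read the 4-byte offset of the PE signature.
    $null = $stream.Seek(0x3c, [System.IO.SeekOrigin]::Begin)
    $signatureOffset = $reader.ReadInt32()

    # Skip the 4-byte signature plus 18 bytes of COFF header, then read
    # the 2-byte characteristics field.
    $null = $stream.Seek($signatureOffset + 4 + 18, [System.IO.SeekOrigin]::Begin)
    $characteristicsData = $reader.ReadUInt16()
}
finally {
    # Closing the reader also closes the underlying stream.
    $reader.Close()
}

"Characteristics: 0x{0:X4}" -f $characteristicsData
"Is a DLL: {0}" -f (($characteristicsData -band 0x2000) -ne 0)

Remove-Item -Path $path
```

Because the reader seeks directly to the two fields of interest, memory use stays constant no matter how large the file is.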

Create a Temporary File

Problem

You want to create a file for temporary purposes and want to be sure that the file does not already exist.

Solution

Use the [System.IO.Path]::GetTempFileName() method from the .NET Framework to create a temporary file:

$filename = [System.IO.Path]::GetTempFileName()
 (... use the file ...)
Remove-Item -Force $filename

Discussion

It is common to want to create a file for temporary purposes. For example, you might want to search and replace text inside a file. Doing this to a large file requires a temporary file (see Search and Replace Text in a File). Another example is the temporary file used by Program: Interactively Filter Lists of Objects.

Often, people create this temporary file wherever they can think of: in C:\, the script’s current location, or any number of other places. Although this may work on the author’s system, it rarely works well elsewhere. For example, if the user does not use their Administrator account for day-to-day tasks, your script will not have write access to C:\ and will fail.

Another difficulty comes from trying to create a unique name for the temporary file. If your script just hardcodes a name (no matter how many random characters it has), it will fail if you run two copies at the same time. You might even craft a script smart enough to search for a filename that does not exist, create it, and then use it. Unfortunately, this could still break if another copy of your script creates that file after you see that it is missing but before you actually create the file.

Finally, there are several security vulnerabilities that your script might introduce should it write its temporary files to a location that other users can read or write.

Luckily, the authors of the .NET Framework provided the [System.IO.Path]::GetTempFileName() method to resolve these problems for you. It creates a uniquely named, zero-byte file in a reliable location and in a secure manner. The method returns the full path of that file, which you can then use as you want.

Note

Remember to delete this file when your script no longer needs it; otherwise, your script will waste disk space and cause needless clutter on your users’ systems. Remember: your scripts should solve the administrator’s problems, not cause them!

By default, the GetTempFileName() method returns a file with a .tmp extension. For most purposes, the file extension does not matter, and this works well. In the rare instances when you need to create a file with a specific extension, the [System.IO.Path]::ChangeExtension() method lets you change the extension of that temporary file. The following example creates a new temporary file that uses the .cs file extension:

$filename = [System.IO.Path]::GetTempFileName()
$newname = [System.IO.Path]::ChangeExtension($filename, ".cs")
Move-Item $filename $newname
(... use the file ...)
Remove-Item $newname
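One way to make sure the temporary file is removed even when the script fails partway through is to wrap the work in try/finally. This is a minimal sketch; the file content and the variable names $roundTrip and $stillThere are illustrative assumptions.

```powershell
$filename = [System.IO.Path]::GetTempFileName()
try {
    # Stand-in for whatever work the script does with the file.
    Set-Content -Path $filename -Value "scratch data"
    $roundTrip = Get-Content -Path $filename
}
finally {
    # Runs whether or not the work above succeeded.
    Remove-Item -Path $filename -Force -ErrorAction SilentlyContinue
}

# Confirm that the cleanup happened.
$stillThere = Test-Path -Path $filename
"Read back: $roundTrip"
"Temporary file still exists: $stillThere"
```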

Search and Replace Text in a File

Problem

You want to search for text in a file and replace that text with something new.

Solution

To search and replace text in a file, first store the content of the file in a variable, and then store the replaced text back in that file, as shown in Example 9-4.

Example 9-4. Replacing text in a file

PS > $filename = "file.txt"
PS > $match = "source text"
PS > $replacement = "replacement text"
PS >
PS > $content = Get-Content $filename
PS > $content
This is some source text that we want
to replace. One of the things you may need
to be careful about with Source
Text is when it spans multiple lines,
and may have different Source Text
capitalization.
PS >
PS > $content = $content -creplace $match,$replacement
PS > $content
This is some replacement text that we want
to replace. One of the things you may need
to be careful about with Source
Text is when it spans multiple lines,
and may have different Source Text
capitalization.
PS > $content | Set-Content $filename

Discussion

Using PowerShell to search and replace text in a file (or many files!) is one of the best examples of using a tool to automate a repetitive task. What could literally take months by hand can be shortened to a few minutes (or hours, at most).

Note

Notice that the solution uses the -creplace operator to replace text in a case-sensitive manner. This is almost always what you will want to do, as the replacement text uses the exact capitalization that you provide. If the text you want to replace is capitalized in several different ways (as in the term "Source Text" from the solution), then search and replace several times with the different possible capitalizations.

Example 9-4 illustrates what is perhaps the simplest (but actually most common) scenario:

You work with an ASCII text file.
You replace some literal text with a literal text replacement.
You don’t worry that the text match might span multiple lines.
Your text file is relatively small.

If some of those assumptions don’t hold true, then this discussion shows you how to tailor the way you search and replace within this file.

Work with files encoded in Unicode or another (OEM) code page

By default, the Set-Content cmdlet assumes that you want the output file to contain plain ASCII text. If you work with a file in another encoding (for example, Unicode or an OEM code page such as Cyrillic), use the -Encoding parameter of the Out-File cmdlet to specify that:

$content | Out-File -Encoding Unicode $filename
$content | Out-File -Encoding OEM $filename

Replace text using a pattern instead of plain text

Although it is most common to replace one literal string with another literal string, you might want to replace text according to a pattern in some advanced scenarios. One example might be swapping first name and last name. PowerShell supports this type of replacement through its support of regular expressions in its replacement operator:

PS > $content = Get-Content names.txt
PS > $content
John Doe
Mary Smith
PS > $content -replace '(.*) (.*)','$2, $1'
Doe, John
Smith, Mary

Replace text that spans multiple lines

The Get-Content cmdlet used in the solution retrieves a list of lines from the file. When you use the -replace operator against this array, it replaces your text in each of those lines individually. If your match spans multiple lines, as shown between lines 3 and 4 in Example 9-4, the -replace operator will be unaware of the match and will not perform the replacement.

If you want to replace text that spans multiple lines, then it becomes necessary to stop treating the input text as a collection of lines. Once you stop treating the input as a collection of lines, it is also important to use a replacement expression that can ignore line breaks, as shown in Example 9-5.

Example 9-5. Replacing text across multiple lines in a file

$filename = Get-Item file.txt
$singleLine = [System.IO.File]::ReadAllText($filename.FullName)
$content = $singleLine -creplace "(?s)Source(\s*)Text",'Replacement$1Text'

The first and second lines of Example 9-5 read the entire content of the file as a single string. They do this by calling the [System.IO.File]::ReadAllText() method from the .NET Framework, since the Get-Content cmdlet splits the content of the file into individual lines.

The third line of this solution replaces the text by using a regular expression pattern. The section Source(\s*)Text scans for the word Source, followed optionally by some whitespace, followed by the word Text. Since the whitespace portion of the regular expression has parentheses around it, the match remembers exactly what that whitespace was. The \s character class matches newline characters as well as spaces and tabs, so the match can span a line break. The single-line option (?s) at the start of the expression changes only the behavior of the . character (letting it match newlines too); this pattern does not strictly need it, but it becomes important if you extend the pattern with a . wildcard. The replacement portion of the -creplace operator replaces that match with Replacement, followed by the exact whitespace from the match that was captured ($1), followed by Text. For more information, see Simple Operators.
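The replacement can be tried without touching a file. The sketch below applies the pattern from Example 9-5 to an in-memory string that stands in for the output of ReadAllText(); the sample sentence is an assumption chosen to put a line break between the two words.

```powershell
# A stand-in for the single string that ReadAllText() would return.
# The backtick-n escape embeds a newline between "Source" and "Text".
$singleLine = "careful about with Source`nText is when it spans"

# Capture the whitespace (including the newline) and put it back via $1.
$content = $singleLine -creplace "(?s)Source(\s*)Text", 'Replacement$1Text'
$content
```

The line break between the two words is preserved, so the result still spans two lines and now reads Replacement and Text across the break.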

Replace text in large files

The approaches used so far store the entire contents of the file in memory as they replace the text in them. Once we’ve made the replacements in memory, we write the updated content back to disk. This works well when replacing text in small, medium, and even moderately large files. For extremely large files (for example, more than several hundred megabytes), using this much memory may burden your system and slow down your script. To solve that problem, you can work on the files line by line, rather than with the entire file at once.

Since you’re working with the file line by line, it will still be in use when you try to write replacement text back into it. You can avoid this problem if you write the replacement text into a temporary file until you’ve finished working with the main file. Once you’ve finished scanning through your file, you can delete it and replace it with the temporary file.

$filename = "file.txt"
$temporaryFile = [System.IO.Path]::GetTempFileName()

$match = "source text"
$replacement = "replacement text"

Get-Content $filename |
    Foreach-Object { $_ -creplace $match,$replacement | Add-Content $temporaryFile }

Remove-Item $filename
Move-Item $temporaryFile $filename
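For very large files, the StreamReader and StreamWriter classes from the .NET Framework offer an alternative that keeps one output handle open instead of appending to the temporary file once per line. The following is a sketch under assumptions: it creates its own sample file, and StreamWriter writes UTF-8 by default, which may differ from the encoding of your original file.

```powershell
# Sample input so the sketch is self-contained.
$filename = [System.IO.Path]::GetTempFileName()
Set-Content -Path $filename -Value "some source text","no match here"
$temporaryFile = [System.IO.Path]::GetTempFileName()

$match = "source text"
$replacement = "replacement text"

$reader = New-Object System.IO.StreamReader $filename
$writer = New-Object System.IO.StreamWriter $temporaryFile
try {
    # ReadLine() returns $null at the end of the file.
    while ($null -ne ($line = $reader.ReadLine())) {
        $writer.WriteLine(($line -creplace $match, $replacement))
    }
}
finally {
    $reader.Close()
    $writer.Close()
}

# Swap the rewritten copy into place.
Remove-Item -Path $filename
Move-Item -Path $temporaryFile -Destination $filename

$result = @(Get-Content -Path $filename)
$result
Remove-Item -Path $filename
```

Both paths come from GetTempFileName() and are therefore absolute. With relative paths, resolve them first (for example with Resolve-Path), because .NET methods do not follow PowerShell's current location.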

Program: Get the Encoding of a File

Both PowerShell and the .NET Framework do a lot of work to hide from you the complexities of file encodings. The Get-Content cmdlet automatically detects the encoding of a file, and then handles all encoding issues before returning the content to you. When you do need to know the encoding of a file, though, the solution requires a bit of work.

Example 9-6 resolves this by doing the hard work for you. Files with unusual encodings are supposed to (and almost always do) have a byte order mark to identify the encoding. After the byte order mark, they have the actual content. If a file lacks the byte order mark (no matter how the content is encoded), Get-FileEncoding assumes the .NET Framework’s default encoding of UTF-7. If the content is not actually encoded as defined by the byte order mark, Get-FileEncoding still outputs the declared encoding.

Example 9-6. Get-FileEncoding.ps1

##############################################################################
##
## Get-FileEncoding
##
## From Windows PowerShell Cookbook (O'Reilly)
## by Lee Holmes (http://www.leeholmes.com/guide)
##
##############################################################################

<#

.SYNOPSIS

Gets the encoding of a file

.EXAMPLE

Get-FileEncoding.ps1 .\UnicodeScript.ps1

BodyName          : unicodeFFFE
EncodingName      : Unicode (Big-Endian)
HeaderName        : unicodeFFFE
WebName           : unicodeFFFE
WindowsCodePage   : 1200
IsBrowserDisplay  : False
IsBrowserSave     : False
IsMailNewsDisplay : False
IsMailNewsSave    : False
IsSingleByte      : False
EncoderFallback   : System.Text.EncoderReplacementFallback
DecoderFallback   : System.Text.DecoderReplacementFallback
IsReadOnly        : True
CodePage          : 1201

#>

param(
    ## The path of the file to get the encoding of.
    $Path
)

Set-StrictMode -Version Latest

## The hashtable used to store our mapping of encoding bytes to their
## name. For example, "255-254 = Unicode"
$encodings = @{}

## Find all of the encodings understood by the .NET Framework. For each,
## determine the bytes at the start of the file (the preamble) that the .NET
## Framework uses to identify that encoding.
$encodingMembers = [System.Text.Encoding] |
    Get-Member -Static -MemberType Property

$encodingMembers | Foreach-Object {
    $encodingBytes = [System.Text.Encoding]::($_.Name).GetPreamble() -join '-'
    $encodings[$encodingBytes] = $_.Name
}

## Find out the lengths of all of the preambles.
$encodingLengths = $encodings.Keys | Where-Object { $_ } |
    Foreach-Object { ($_ -split "-").Count }

## Assume the encoding is UTF7 by default
$result = "UTF7"

## Go through each of the possible preamble lengths, read that many
## bytes from the file, and then see if it matches one of the encodings
## we know about.
foreach($encodingLength in $encodingLengths | Sort -Descending)
{
    $bytes = (Get-Content -encoding byte -readcount $encodingLength $path)[0]
    $encoding = $encodings[$bytes -join '-']

    ## If we found an encoding that had the same preamble bytes,
    ## save that output and break.
    if($encoding)
    {
        $result = $encoding
        break
    }
}

## Finally, output the encoding.
[System.Text.Encoding]::$result

For more information about running scripts, see Run Programs, Scripts, and Existing Tools.

Program: View the Hexadecimal Representation of Content

When dealing with binary data, it is often useful to see the value of the actual bytes being used in that binary data. In addition to the value of the data, finding its offset in the file or content is usually important as well.

Example 9-7 enables both scenarios by displaying content in a report that shows all of this information. The leftmost column displays the offset into the content, increasing by 16 bytes at a time. The middle 16 columns display the hexadecimal representation of the byte at that position in the content. The header of each column shows how far into the 16-byte chunk that character is. The far-right column displays the ASCII representation of the characters in that row.

To determine the position of a byte within the input, add the number at the far left of the row to the number at the top of the column for that byte. For example, 00000230 (shown at the far left) + C (shown at the top of the column) = 0000023C. Therefore, the byte in this example is at offset 23C in the content.
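
The same arithmetic can be checked at the prompt. This is a small sketch using the row and column values from the example in the text; the variable names are illustrative only.

```powershell
## Row label (far-left column) and column header, both hexadecimal.
$rowOffset = 0x230
$columnOffset = 0xC

## Add them to get the position of the byte within the content.
$byteOffset = $rowOffset + $columnOffset
"{0:X8}" -f $byteOffset                   ## 0000023C

## Going the other way: find the row and column for a known offset.
$column = $byteOffset % 16
$row = $byteOffset - $column
"{0:X8} + {1:X}" -f $row, $column         ## 00000230 + C
```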

Example 9-7. Format-Hex.ps1

##############################################################################
##
## Format-Hex
##
## From Windows PowerShell Cookbook (O'Reilly)
## by Lee Holmes (http://www.leeholmes.com/guide)
##
##############################################################################

<#

.SYNOPSIS

Outputs a file or pipelined input as a hexadecimal display. To determine the
offset of a character in the input, add the number at the far-left of the row
to the number at the top of the column for that character.

.EXAMPLE

"Hello World" | Format-Hex

            0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F

00000000   48 00 65 00 6C 00 6C 00 6F 00 20 00 57 00 6F 00  H.e.l.l.o. .W.o.
00000010   72 00 6C 00 64 00                                r.l.d.

.EXAMPLE

Format-Hex c:\temp\example.bmp

#>

[CmdletBinding(DefaultParameterSetName = "ByPath")]
param(
    ## The file to read the content from
    [Parameter(ParameterSetName = "ByPath", Position = 0)]
    [string] $Path,

    ## The input (bytes or strings) to format as hexadecimal
    [Parameter(
        ParameterSetName = "ByInput", Position = 0,
        ValueFromPipeline = $true)]
    [Object] $InputObject
)

begin
{
    Set-StrictMode -Version Latest

    ## Create the array to hold the content. If the user specified the
    ## -Path parameter, read the bytes from the path.
    [byte[]] $inputBytes = $null
    if($Path) { $inputBytes = [IO.File]::ReadAllBytes((Resolve-Path $Path)) }

    ## Store our header, and formatting information
    $counter = 0
    $header = "            0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F"
    $nextLine = "{0}   " -f  [Convert]::ToString(
        $counter, 16).ToUpper().PadLeft(8, '0')
    $asciiEnd = ""

    ## Output the header
    "`r`n$header`r`n"
}

process
{
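    ## If they specified the -InputObject parameter, retrieve the bytes
    ## from that input. Check the parameter set rather than the variable,
    ## since the variable can exist even when -Path was used.
    if($PSCmdlet.ParameterSetName -eq "ByInput")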
    {
        ## If it's an actual byte, use it as the input directly.
        if($InputObject -is [Byte])
        {
            $inputBytes = $InputObject
        }
        else
        {
            ## Otherwise, convert it to a string and extract the bytes
            ## from that.
            $inputString = [string] $InputObject
            $inputBytes = [Text.Encoding]::Unicode.GetBytes($inputString)
        }
    }

    ## Now go through the input bytes
    foreach($byte in $inputBytes)
    {
        ## Display each byte, in 2-digit hexadecimal, and add that to the
        ## left-hand side.
        $nextLine += "{0:X2} " -f $byte

        ## If the character is printable, add its ASCII representation to
        ## the right-hand side.  Otherwise, add a dot to the right-hand side.
        if(($byte -ge 0x20) -and ($byte -le 0xFE))
        {
            $asciiEnd += [char] $byte
        }
        else
        {
            $asciiEnd += "."
        }

        $counter++;

        ## If we've hit the end of a line, combine the right half with the
        ## left half, and start a new line.
        if(($counter % 16) -eq 0)
        {

            "$nextLine $asciiEnd"
            $nextLine = "{0}   " -f [Convert]::ToString(
                $counter, 16).ToUpper().PadLeft(8, '0')
            $asciiEnd = "";
        }
    }
}

end
{
    ## At the end of the file, we might not have had the chance to output
    ## the end of the line yet. Only do this if we didn't exit on the 16-byte
    ## boundary, though.
    if(($counter % 16) -ne 0)
    {
        while(($counter % 16) -ne 0)
        {
            $nextLine += "   "
            $asciiEnd += " "
            $counter++;
        }
        "$nextLine $asciiEnd"
    }

    ""
}
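
The 00 bytes in the "Hello World" example appear because the script converts string input to bytes with the Unicode (UTF-16) encoding, which uses two bytes for each of these characters. The following minimal sketch, independent of the script, compares that conversion with an ASCII conversion of the same text; the variable names are illustrative.

```powershell
## The same text, converted to bytes under two encodings.
$text = "Hi"
$unicodeBytes = [Text.Encoding]::Unicode.GetBytes($text)
$asciiBytes = [Text.Encoding]::ASCII.GetBytes($text)

## Render each byte as two hexadecimal digits, as the script does.
$unicodeHex = ($unicodeBytes | Foreach-Object { "{0:X2}" -f $_ }) -join " "
$asciiHex = ($asciiBytes | Foreach-Object { "{0:X2}" -f $_ }) -join " "

$unicodeHex   ## 48 00 69 00
$asciiHex     ## 48 69
```

Since the script uses [Byte] input as is, piping the output of a GetBytes() call into it, rather than the string itself, should show the text in whichever encoding you chose.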

For more information about running scripts, see Run Programs, Scripts, and Existing Tools.
