To size any mail archiving solution, it is important to understand how much archive data is currently in use. For many companies this takes the form of Outlook Data Files (PSTs). Unfortunately, the only resource Microsoft provide is a VBScript dating back to 2005 on the TechNet Script Center.
I decided to have a go at implementing two methods to locate PST files on the network using PowerShell. The two options I came up with are:
- Enumerate the Outlook settings on client computers to determine which PST files are loaded in Outlook;
- Use WMI to call the search APIs on remote computers to locate the files.
In testing, the WMI search located twice as many files as relying on the information in the Windows registry alone. Both scripts also read the first 11 bytes of each PST file to determine its format, that is, whether it’s an ANSI or a Unicode PST file.
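The format check can be sketched on its own. This is a minimal illustration, not the code from either script: the function name Get-PstFormat and the sample header are made up for the example. It assumes, as both scripts do, that the byte at offset 10 is 14 or 15 for an ANSI file and 23 for a Unicode file.

```powershell
# Hypothetical helper: classify a PST from the first 11 bytes of its header.
# Offset 10 holds the low byte of the version field that both scripts test.
function Get-PstFormat
{
    Param( [byte[]]$Header )

    # Too little data to contain the version byte.
    if( $null -eq $Header -or $Header.Length -lt 11 )
    {
        return "Unknown"
    }

    switch( $Header[10] )
    {
        14      { return "ANSI" }
        15      { return "ANSI" }
        23      { return "Unicode" }
        default { return "Unknown" }
    }
}

# Made-up 11-byte header: "!BDN" magic, 4 filler bytes, "SM", version byte 23.
[byte[]]$sample = 0x21,0x42,0x44,0x4E, 0,0,0,0, 0x53,0x4D, 23
Get-PstFormat $sample
```

For the sample header this returns "Unicode", which the scripts report as version "2003"; an ANSI file is reported as "1997".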
Using the registry
The advantage of using the data in the Windows registry is that it’s quick. We can quickly enumerate user profiles and identify the PST files loaded in Outlook. Once we have that information, the file details can be checked over SMB. This does, however, require the Remote Registry service to be running and TCP port 139 (or 445, where SMB runs directly over TCP) to be open on client computers. Note that the script reads the Office 12.0 registry key, so as written it only finds files loaded in Outlook 2007.
The script below uses PowerShell background jobs, checking 5 computers at a time. To check more computers at once, increase the $MaxThreads variable.
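The step from a path found in the registry to one that can be checked over SMB can be sketched as follows. This is only an illustration: the function name ConvertTo-AdminSharePath and the computer name PC01 are hypothetical, and the script itself only rewrites paths on C:, whereas this sketch assumes any drive letter maps to its administrative share.

```powershell
# Hypothetical helper: turn a local path from the registry into the
# matching administrative share path on the remote computer.
function ConvertTo-AdminSharePath
{
    Param(
        [string]$ComputerName,
        [string]$LocalPath
    )

    # Only drive-letter paths (C:\...) are rewritten; anything else is
    # returned unchanged.
    if( $LocalPath -match '^([A-Za-z]):\\(.*)$' )
    {
        return "\\$ComputerName\$($Matches[1])`$\$($Matches[2])"
    }
    return $LocalPath
}

# PC01 and the path are placeholders.
ConvertTo-AdminSharePath "PC01" "C:\Users\bob\archive.pst"
```

The resulting path can then be handed to Get-Item or Get-Acl, which is how the script collects size, modified date and owner.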
cls
#
# PST Scanning Utility (Registry)
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Retrieves a list of computers (recursively) from an OU in Active
# Directory then uses Remote Registry calls to locate PST files for each
# user profile.
#
# The advantage of querying the registry is speed, however accuracy is
# potentially reduced since PST files may not be added to Outlook.
#
# Changelog
# ~~~~~~~~~
# 2012.03.28 Dave Hope Initial version.
# 2012.03.30 Dave Hope Added try/catch for Get-Item $tmpPath to
# handle missing files.
#
# ======================================================================
# SETTINGS
# ======================================================================
$cfgOU = "LDAP://DC=nwtraders,DC=msft"
$cfgInterval = -30 # Difference to lastLogonTimestamp (days)
$cfgOutpath = "H:\Registry.CSV"
$MaxThreads = 5 # Maximum number of checks to run at once.
$SleepTimer = 500 # Wait between checks.
# ======================================================================
# STOP CHANGING HERE.
# ======================================================================
#
# Uses CIFS/SMB and the remote registry APIs to determine remote PST
# files and their sizes for a given computername.
# ======================================================================
$GetPSTInfo = {
Param(
[string]$ComputerName = $(throw "ComputerName required.")
)
$ReturnArray = @()
# Test connection (ICMP) first rather than waiting for the slow
# OpenRemoteBaseKey call to fail.
if( (Test-Connection -ComputerName $ComputerName -Count 1 -Quiet) -eq $false )
{
Write-Host "Failed communicating with $ComputerName - ICMP Unreachable"
return;
}
# Connect to remote system.
try
{
$RegHive = [Microsoft.Win32.RegistryKey]::OpenRemoteBaseKey( "Users", $ComputerName )
}
catch {
Write-Host "Failed communicating with $ComputerName - OpenRemoteBaseKey failed"
return;
}
# Get the list of user profiles.
$RegUsers = $RegHive.getSubKeyNames()
# Iterate over user profiles on the computer
foreach( $RegUser in $RegUsers )
{
# Get list of Outlook datafiles in use for this profile.
$RegPath = "$RegUser\Software\Microsoft\Office\12.0\Outlook\Catalog"
$Catalogs = $RegHive.OpenSubKey( $RegPath );
if( $Catalogs -eq $null )
{
continue;
}
# Iterate over the data files, if the file name ends with
# something other than .pst, or resides somewhere other than
# C: skip it.
$Archives = $Catalogs.GetValueNames()
foreach( $Archive in $Archives )
{
if(
$Archive.ToLower().EndsWith(".pst") -And $Archive.ToLower().StartsWith("c:") )
{
# Replace the local path with that of a remote path.
$tmpPath = $Archive -Replace "C:\\", "\\$ComputerName\c$\"
try
{
$FileInfo = Get-Item $tmpPath -ErrorAction Stop
}
Catch
{
# PST file doesn't exist, or we can't reach it,
# continue on with the next one.
continue;
}
# Add file info to array.
$FileReturn = "" | Select Computer, Owner, Path, Size, Modified, Version
$FileReturn.Computer = $ComputerName
$FileReturn.Owner = (Get-Acl $tmpPath | select Owner).Owner
$FileReturn.Path = $Archive
$FileReturn.Size = $FileInfo.Length
$FileReturn.Modified = $FileInfo.LastWriteTime
#
# PST Version.
# Open inside the try block so a locked or unreadable file is
# reported as "Error" instead of raising an unhandled exception,
# and always close the stream in the finally block.
$fileStream = $null
try
{
$fileStream = [System.IO.File]::Open( $tmpPath , 'Open' , 'Read' , 'ReadWrite' )
[byte[]]$fileBytes = New-Object byte[] 11 # Length we need.
[void]$fileStream.Read( $fileBytes, 0, 11);
if ($fileBytes[10] -eq 23 )
{
$FileReturn.Version = "2003";
}
elseif ( ($fileBytes[10] -eq 14) -or ($fileBytes[10] -eq 15) )
{
$FileReturn.Version = "1997";
}
else
{
$FileReturn.Version = "Unknown";
}
}
catch
{
$FileReturn.Version = "Error";
}
finally
{
if( $fileStream -ne $null )
{
$fileStream.Close();
}
}
$ReturnArray += $FileReturn
}
}
}
return $ReturnArray;
}
#
# Gets a list of object names from AD recursively
# ======================================================================
Function GetAdObjects
{
Param(
[string]$Path = $(throw "Path required."),
[string]$desiredObjectClass = $(throw "DesiredObjectClass required.")
)
$ReturnArray = $null
# Bind to AD using the provided path.
$objADSI = [ADSI]$Path
# Iterate over each object and add its name to the array.
foreach( $obj in $objADSI.Children )
{
$thisItem = $obj | select objectClass,distinguishedName,name
if (
$thisItem.objectClass.Count -gt 0 -And
$thisItem.objectClass.Contains( $desiredObjectClass)
)
{
$ReturnArray += $thisItem.distinguishedName
}
elseif(
$thisItem.objectClass.Count -gt 0 -And
$thisItem.objectClass.Contains("organizationalUnit")
)
{
# Init to null rather than @() so we don't add empty
# values.
$RecurseItems = $null
$RecurseItems += GetAdObjects "LDAP://$($thisItem.distinguishedName.ToString())" $desiredObjectClass
if( $RecurseItems.Count -gt 0 )
{
$ReturnArray += $RecurseItems
}
}
}
# Make sure we have items to return, otherwise we'll push
# empty items to the array.
if( $ReturnArray.Count -gt 0)
{
return $ReturnArray;
}
}
#
# Converts an IADsLargeInteger COM object to a 64-bit integer
# ======================================================================
function Convert-IADSLargeInteger([object]$LargeInteger)
{
$type = $LargeInteger.GetType()
$highPart = $type.InvokeMember("HighPart","GetProperty",$null,$LargeInteger,$null)
$lowPart = $type.InvokeMember("LowPart","GetProperty",$null,$LargeInteger,$null)
$bytes = [System.BitConverter]::GetBytes($highPart)
$tmp = New-Object System.Byte[] 8
[Array]::Copy($bytes,0,$tmp,4,4)
$highPart = [System.BitConverter]::ToInt64($tmp,0)
$bytes = [System.BitConverter]::GetBytes($lowPart)
$lowPart = [System.BitConverter]::ToUInt32($bytes,0)
$lowPart + $highPart
}
#
# Evaluate the lastLogonTimestamp attribute for accounts and return only
# those that have logged on within the last $LoginDays days.
# ======================================================================
Function GetObjectsLoggedIntoSince
{
Param(
[Array] $Computers = $(throw "Computers required"),
[int] $LoginDays = $(throw "LoginDays required")
)
$earliestAllowedLogon = [DateTime]::Today.AddDays($LoginDays)
foreach( $Computer in $Computers )
{
$objADSI = [ADSI]"LDAP://$Computer"
if( $objADSI.Properties.Contains("lastLogonTimeStamp") -eq $false )
{
continue;
}
$lastLogon = [DateTime]::FromFileTime(
[Int64]::Parse(
$(Convert-IADSLargeInteger $objADSI.lastlogontimestamp.value)
)
)
if( [DateTime]::Compare( $earliestAllowedLogon , $lastLogon) -eq -1 )
{
$objADSI.name
}
continue;
}
}
#
# Get computer accounts from Active Directory.
$OutArray = @()
$Computers = GetAdObjects "$cfgOU" "computer"
$Computers = GetObjectsLoggedIntoSince $Computers $cfgInterval
#
# Remove any previous jobs.
$jobsTotal = $(Get-Job).Count
$i = 0
if( $jobsTotal -gt 0)
{
foreach( $job in Get-Job)
{
Write-Progress -Activity "Locating PST files" -Status "Removing existing jobs" -CurrentOperation "$i of $jobsTotal" -PercentComplete ($i / $jobsTotal * 100)
$job | Remove-Job -Force
$i++
}
}
#
# If we have no computers to check, just exit.
if( $Computers.Count -le 0 )
{
return;
}
#
# Create all the jobs.
$i = 0
ForEach ($Computer in $Computers)
{
#
# We're currently running at $MaxThreads, wait for one to close.
While ((Get-Job -state running).count -ge $MaxThreads)
{
$statTotal = $computers.count
$statComplete = $((Get-Job -state completed).count)
$statInProgress = $((Get-Job -state running).count)
Write-Progress -Activity "Locating PST files" -Status "Waiting for a scan to finish before starting another" -CurrentOperation "Total: $statTotal , Complete: $statComplete , In Progress: $statInProgress" -PercentComplete ($i / $Computers.count * 100)
Start-Sleep -Milliseconds $SleepTimer
$JobsRunning = (Get-Job -state running).count
}
#
# Start job.
$i++
Start-Job -ScriptBlock $GetPSTInfo -ArgumentList $Computer -Name $Computer | out-null
$statTotal = $computers.count
$statComplete = $((Get-Job -state completed).count)
$statInProgress = $((Get-Job -state running).count)
Write-Progress -Activity "Locating PST files" -Status "Starting a scan" -CurrentOperation "Total: $statTotal , Complete: $statComplete , In Progress: $statInProgress" -PercentComplete ($i / $Computers.count * 100)
}
#
# Finished creating all jobs, waiting for remaining running jobs to
# complete.
While (@(Get-Job -State Running).count -gt 0)
{
$statTotal = @(Get-Job).count
$statComplete = $((Get-Job -state completed).count)
$statInProgress = $((Get-Job -state running).count)
Write-Progress -Activity "Locating PST files" -Status "Waiting on final scans to complete" -CurrentOperation "Total: $statTotal , Complete: $statComplete , In Progress: $statInProgress" -PercentComplete ($statComplete / $statTotal * 100)
Start-Sleep -Milliseconds $SleepTimer
}
#
# Handle completed jobs
ForEach($Job in Get-Job)
{
$retVal = (Receive-Job $Job)
if( $retVal -ne $null)
{
$OutArray += $retVal
}
}
$OutArray | Export-Csv "$cfgOutpath" -NoClobber -NoTypeInformation
Using WMI
Using WMI is slow compared to relying on the registry, but it will locate files that are not open in Outlook. The Windows Firewall setting “Windows Firewall: Allow remote administration exception” should be enabled so that WMI can be accessed remotely.
Unfortunately I couldn’t get the PowerShell job functionality to work well with Get-WmiObject, so systems are checked one by one, which also slows things down.
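The search itself is a WQL query against the CIM_DataFile class. A minimal sketch of building that query is shown below. The function name New-PstSearchQuery and the computer name PC01 are made up for the example, and selecting a few named properties instead of * is my assumption about what may reduce the data returned, not something the script does.

```powershell
# Hypothetical helper: build the WQL query used to search one drive for
# PST files. The full script uses SELECT * and the C: drive only.
function New-PstSearchQuery
{
    Param( [string]$Drive = "c:" )

    return "SELECT Name, FileSize, LastModified, LastAccessed FROM CIM_DataFile WHERE Extension = 'pst' AND Drive = '$Drive'"
}

$query = New-PstSearchQuery
$query

# Running the query needs a reachable computer with WMI allowed through
# the firewall; "PC01" is a placeholder.
# Get-WmiObject -Namespace "root\CIMV2" -ComputerName "PC01" -Query $query
```

Because the query is evaluated by the remote computer against every file on the drive, this is the slow part of the WMI approach.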
#
# PST Scanning Utility (WMI)
# ~~~~~~~~~~~~~~~~~~~~~~~~~~
# Retrieves a list of computers (recursively) from an OU in Active
# Directory then uses WMI to search for PST files on the remote 'C:'
# drive, saving the results to CSV format with location, size and owner
# details.
#
# The advantage to performing a search is that PST files in non-default
# locations will be found, enumerating the registry only shows files in
# use by Outlook.
#
# This script doesn't make use of threading (Jobs) due to hangs/locks
# experienced when they were implemented.
#
# Changelog
# ~~~~~~~~~
# 2012.03.28 Dave Hope Initial version.
# 2012.04.24 Dave Hope Added PST file version information.
# 2012.04.26 Dave Hope Added try/catch around file owner check.
#
# ======================================================================
# SETTINGS
# ======================================================================
$cfgOU = "LDAP://DC=nwtraders,DC=msft"
$cfgInterval = -30
$cfgOutpath = "H:\WMI.CSV"
# ======================================================================
# STOP CHANGING HERE.
# ======================================================================
#
# Scans the specified hostname for PST files, returning an array of data;
# most of this is inline due to the nature of job functionality in PS.
# ======================================================================
Function GetPSTInfo
{
Param( [string]$ComputerName = $(throw "ComputerName required.") )
$ReturnArray = @()
# Test connection first rather than relying on the slow
# Get-WMiObject call to fail.
if( (Test-Connection -ComputerName $ComputerName -Count 1 -Quiet) -eq $false )
{
# Write-Host "Failed communicating with $ComputerName - ICMP Unreachable"
return;
}
# Connect and execute query.
try
{
#Path,FileSize,LastModified,LastAccessed,Extension,Drive
$PstFiles = Get-Wmiobject -namespace "root\CIMV2" -computername $computerName -ErrorAction Stop -Query "SELECT * FROM CIM_DataFile WHERE Extension = 'pst' AND Drive = 'c:'"
}
Catch
{
# Write-Host "Failed communicating with $ComputerName - Get-WMIObject failed"
return;
}
# Iterate over the found PST files.
foreach ($file in $PstFiles)
{
if($File.FileName)
{
$FileReturn = "" | select Computer,Owner,Path,FileSize,LastModified,LastAccessed,Version
$filepath = $file.description
#
# Try and find the owner of the file.
$Owner = "Unknown";
try
{
$query = "ASSOCIATORS OF {Win32_LogicalFileSecuritySetting=`'$filepath`'} WHERE AssocClass=Win32_LogicalFileOwner ResultRole=Owner"
$Owner = @(Get-Wmiobject -namespace "root\CIMV2" -computername $computerName -Query $query)
$Owner = "$($Owner[0].ReferencedDomainName)\$($Owner[0].AccountName)"
}
catch
{
# Write-Host "Unable to determine the owner of a PST File on $ComputerName"
}
$FileReturn.Computer = $computerName
$FileReturn.Path = $filepath
$FileReturn.FileSize = $file.FileSize/1KB
$FileReturn.Owner = $Owner
$FileReturn.LastModified = [System.Management.ManagementDateTimeConverter]::ToDateTime($($file.LastModified))
$FileReturn.LastAccessed = [System.Management.ManagementDateTimeConverter]::ToDateTime($($file.LastAccessed))
#
# Here, we're examining part of the PST file header.
# We only need the low byte of wVer (offset 10), so we read the
# first 11 bytes of the file.
$tmpPath = $filepath -Replace "C:\\", "\\$ComputerName\c$\"
# Open inside the try block so a locked or unreadable file is
# reported as "Error" instead of raising an unhandled exception,
# and always close the stream in the finally block.
$fileStream = $null
try
{
$fileStream = [System.IO.File]::Open( $tmpPath , 'Open' , 'Read' , 'ReadWrite' )
[byte[]]$fileBytes = New-Object byte[] 11 # Length we need.
[void]$fileStream.Read( $fileBytes, 0, 11);
if ($fileBytes[10] -eq 23 )
{
$FileReturn.Version = "2003";
}
elseif ( ($fileBytes[10] -eq 14) -or ($fileBytes[10] -eq 15) )
{
$FileReturn.Version = "1997";
}
else
{
$FileReturn.Version = "Unknown";
}
}
catch
{
$FileReturn.Version = "Error";
}
finally
{
if( $fileStream -ne $null )
{
$fileStream.Close();
}
}
$ReturnArray += $FileReturn
}
}
return $ReturnArray;
}
#
# Gets a list of object names from AD recursively
# ======================================================================
Function GetAdObjects
{
Param(
[string]$Path = $(throw "Path required."),
[string]$desiredObjectClass = $(throw "DesiredObjectClass required.")
)
$ReturnArray = $null
# Bind to AD using the provided path.
$objADSI = [ADSI]$Path
# Iterate over each object and add its name to the array.
foreach( $obj in $objADSI.Children )
{
$thisItem = $obj | select objectClass,distinguishedName,name
if (
$thisItem.objectClass.Count -gt 0 -And
$thisItem.objectClass.Contains( $desiredObjectClass)
)
{
$ReturnArray += $thisItem.distinguishedName
}
elseif(
$thisItem.objectClass.Count -gt 0 -And
$thisItem.objectClass.Contains("organizationalUnit")
)
{
# Init to null rather than @() so we don't add empty
# values.
$RecurseItems = $null
$RecurseItems += GetAdObjects "LDAP://$($thisItem.distinguishedName.ToString())" $desiredObjectClass
if( $RecurseItems.Count -gt 0 )
{
$ReturnArray += $RecurseItems
}
}
}
# Make sure we have items to return, otherwise we'll push
# empty items to the array.
if( $ReturnArray.Count -gt 0)
{
return $ReturnArray;
}
}
#
# Converts an IADsLargeInteger COM object to a 64-bit integer
# ======================================================================
function Convert-IADSLargeInteger([object]$LargeInteger)
{
$type = $LargeInteger.GetType()
$highPart = $type.InvokeMember("HighPart","GetProperty",$null,$LargeInteger,$null)
$lowPart = $type.InvokeMember("LowPart","GetProperty",$null,$LargeInteger,$null)
$bytes = [System.BitConverter]::GetBytes($highPart)
$tmp = New-Object System.Byte[] 8
[Array]::Copy($bytes,0,$tmp,4,4)
$highPart = [System.BitConverter]::ToInt64($tmp,0)
$bytes = [System.BitConverter]::GetBytes($lowPart)
$lowPart = [System.BitConverter]::ToUInt32($bytes,0)
$lowPart + $highPart
}
#
# Evaluate the lastLogonTimestamp attribute for accounts and return only
# those that have logged on within the last $LoginDays days.
# ======================================================================
Function GetObjectsLoggedIntoSince
{
Param(
[Array] $Computers = $(throw "Computers required"),
[int] $LoginDays = $(throw "LoginDays required")
)
$earliestAllowedLogon = [DateTime]::Today.AddDays($LoginDays)
foreach( $Computer in $Computers )
{
$objADSI = [ADSI]"LDAP://$Computer"
if( $objADSI.Properties.Contains("lastLogonTimeStamp") -eq $false )
{
continue;
}
$lastLogon = [DateTime]::FromFileTime(
[Int64]::Parse(
$(Convert-IADSLargeInteger $objADSI.lastlogontimestamp.value)
)
)
if( [DateTime]::Compare( $earliestAllowedLogon , $lastLogon) -eq -1 )
{
$objADSI.name
}
continue;
}
}
#
# Get computer accounts from Active Directory.
$OutArray = @()
$Computers = GetAdObjects "$cfgOU" "computer"
$Computers = GetObjectsLoggedIntoSince $Computers $cfgInterval
#
# If we have no computers to check, just exit.
if( $Computers.Count -le 0 )
{
return;
}
#
# Scan each computer in turn (no jobs are used in this script).
$statTotal = $computers.count
$statComplete = 0
ForEach ($Computer in $Computers)
{
Write-Progress -Activity "Locating PST files" -Status "Scanning $Computer" -CurrentOperation "Total: $statTotal , Complete: $statComplete" -PercentComplete ($statComplete/$statTotal * 100)
$RetVal = GetPSTInfo $Computer
if( $RetVal -ne $null)
{
$OutArray += $retVal
}
$statComplete++
}
$OutArray | Export-Csv "$cfgOutpath" -NoClobber -NoTypeInformation
Which method is best?
In the environments I’ve tested these scripts in, the WMI method returned significantly more PST files, simply because it runs a search on each client computer. If anyone has feedback on running these in their own environments, I’d love to hear it.
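For sizing purposes, the CSV output can be totalled per PST format. The sketch below is only an illustration using made-up rows in the shape of the registry script's output, where Size is in bytes; the WMI script writes a FileSize column in KB instead, so the property name and divisor would need adjusting for that file.

```powershell
# Made-up sample rows shaped like the registry script's CSV output.
$csv = @"
Computer,Owner,Path,Size,Modified,Version
PC01,NWTRADERS\alice,C:\Mail\archive.pst,1048576,2012-03-01,2003
PC01,NWTRADERS\alice,C:\Mail\old.pst,524288,2011-01-15,1997
PC02,NWTRADERS\bob,C:\Users\bob\mail.pst,2097152,2012-02-20,2003
"@
$rows = $csv | ConvertFrom-Csv

# Group by PST version and total the size of each group in MB.
$summary = $rows | Group-Object Version | ForEach-Object {
    $bytes = ($_.Group | Measure-Object -Property Size -Sum).Sum
    New-Object PSObject -Property @{
        Version = $_.Name
        Files   = $_.Count
        SizeMB  = [math]::Round( $bytes / 1MB, 2 )
    }
}
$summary | Sort-Object Version | Format-Table Version, Files, SizeMB -AutoSize
```

With a real export, replace the sample rows with Import-Csv pointed at the file named in $cfgOutpath.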
The post Locating PST files on a network appeared first on Blog of Dave Hope.