I need to do some data extraction from several hundreds of Access databases extracted from zip files. The extracted data will allow us to do some statistics on our customer’s behavior.
In order to do that, I decided to use Powershell, because it has all the features I need, bundled in one neat language.
The scaffolding for the scripts uses PSake for the task launcher, Pester for Powershell unit tests, and a few Nuget packages (including PSake).
In a vendor folder, add the Nuget exe, as well as a packages.config file listing the necessary Nuget packages. I have added PSake, and NUnitOrange to transform the Pester results to a nice HTML report.
I’m not using the Pester Nuget package because it includes the Pester unit tests, and some of them fail, which breaks my build (in addition to adding a thousand tests I don’t care about). Instead, I directly include the Pester scripts, and I cleaned up the samples and tests.
Since the script needs to read Access databases using some sort of ADODB or OLEDB provider, you will have to install either the Access 2007 or Access 2010 provider. If you’re like me (with Office x86 already installed), you won’t be able to install the Access 2010 x64 provider, which will trigger “The Jet/ACE OLEDB provider is not registered on the local machine” errors. So you will have to run the x86 version of Powershell. Which means you will have to use the x86 version of the SQLPS module (make sure you download the x86 version). Don’t worry, the SQL 2012 version of SQLPS is compatible with older SQL Servers (at least 2008 R2).
The batch bootstrapper is inspired by the bootstrapper from Pester:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
@echo off rem Install nuget packages %~dp0vendor\nuget.exe Install %~dp0vendor\packages.config -o %~dp0packages SET INPUT=%~dp0input SET OUTPUT=%~dp0output IF NOT '%1'=='' SET INPUT=%1 IF NOT '%2'=='' SET OUTPUT=%2 rem Detect 32 or 64 bits Windows: http://stackoverflow.com/a/24590583/6776 SET PSPATH=powershell reg Query "HKLM\Hardware\Description\System\CentralProcessor\0" | find /i "x86" > NUL && set OS=32BIT || set OS=64BIT if %OS%==64BIT SET PSPATH=%SystemRoot%\syswow64\WindowsPowerShell\v1.0\powershell.exe rem See: https://github.com/pester/Pester/blob/master/Build.bat %PSPATH% -NoProfile -ExecutionPolicy Bypass -Command ^ "$psakeDir = ([array](dir %~dp0packages\psake.*))[-1] ; Import-Module $psakeDir\tools\psake.psm1 ; Invoke-psake %~dp0tasks.ps1 Mining -properties @{'in'='%INPUT%';'out'='%OUTPUT%'}" |
The tasks in tasks.ps1 are pretty standard and minimalist, so that the maximum is tested in modules via Pester.
Reading Access databases is as simple as using OleDb, old-school style:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
function Read-AccessDataSet ( [string] $file, [string] $query ) { $conn = New-Object System.Data.OleDb.OleDbConnection "Provider=Microsoft.Jet.OLEDB.4.0; Data Source=$file" $conn.Open() $cmd = New-Object System.Data.OleDb.OleDbCommand($query, $conn) $adapter = New-Object System.Data.OleDb.OleDbDataAdapter $cmd $dataset = New-Object System.Data.DataSet $adapter.Fill($dataset) | Out-Null # returns the number of rows $conn.Close() return $dataset } |
And the corresponding Pester tests, using actual Access test databases:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
$here = Split-Path -Parent $MyInvocation.MyCommand.Path Import-Module "$here\db.psm1" -Force Describe "database module" { $in = Resolve-Path "$here\..\tests\input" Context "reading Access DB" { It "should read multiple Access rows" { (Read-AccessDataSet -file "$in\input.dat" -query "SELECT * FROM Users").Tables[0].Rows.Count | Should Be 10 } } } |
Executing SQL scripts uses the much more powerful Invoke-SqlCmd cmdlet from the SQLPS module. The most is that it returns a Powershell object, so I can do something like this:
1 2 3 4 5 |
$users = Invoke-SqlCommand -query "SELECT * FROM users" foreach ($user in $users) { write "id: $($user.id) ; name: $($user.name)" } |