Messing with the Zips
What’s this? Another post? Yep, I’m on a roll. This particular installment will be relatively quick, as such things go, but I hope it will also be valuable.
Use Case: View text in a Zip file, without extracting the file to disk
First off, if you just want to know how to extract or compress a Zip file using PowerShell, that is super easy. Both Windows PowerShell (5.1) and PowerShell (6+) contain native cmdlets for this. Simply run ‘Get-Command -Noun Archive’ and you’ll see that there are cmdlets for Compress and Expand. What you will not find, however, are Get, Set, Add, Remove, or other verbs you might expect for interacting with Zip files. That said, the help for the Compress-Archive cmdlet indicates you can also add new files to an existing Zip by specifying the -Update switch, which also replaces files already in the archive with newer versions when you supply them from the same path. A quick search on the PowerShell Gallery will also reveal a number of other modules for interacting with Zip files to varying degrees, though I didn’t see anything that offered reading without first extracting to the file system. Why would you want to do this? That, my friend, is not what this blog is about. I’m sure you have your reasons, just as I did.
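For reference, here is a minimal sketch of the built-in cmdlets, including the -Update switch mentioned above. The temp folder and the file names (first.txt, second.txt, demo.zip) are made up purely for illustration.

```powershell
# Work in a throwaway folder so nothing real is touched (paths are illustrative).
$work = Join-Path ([System.IO.Path]::GetTempPath()) ("archive-demo-{0}" -f [guid]::NewGuid())
New-Item -ItemType Directory -Path $work | Out-Null
Set-Content -Path (Join-Path $work 'first.txt')  -Value 'first file'
Set-Content -Path (Join-Path $work 'second.txt') -Value 'second file'
$zipPath = Join-Path $work 'demo.zip'

# Create the archive with one file, then add a second file with -Update.
Compress-Archive -Path (Join-Path $work 'first.txt')  -DestinationPath $zipPath
Compress-Archive -Path (Join-Path $work 'second.txt') -DestinationPath $zipPath -Update

# Extract to disk (exactly the step the rest of this post avoids) to confirm both files are present.
$outDir = Join-Path $work 'out'
Expand-Archive -Path $zipPath -DestinationPath $outDir
$names = Get-ChildItem -Path $outDir | Sort-Object Name | Select-Object -ExpandProperty Name
$names

Remove-Item -Path $work -Recurse -Force
```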
In my specific use case, I have a module folder location that already has a bunch of content. Specifically, I have pseudo nested modules, each with its own module manifest file. The expectation is that users of this module might wish to add more of these ‘plugin’ type sub-modules. To keep things orderly, I naturally wanted to build a set of cmdlets that would validate elements of these plugins and properly integrate them into the larger module without a lot of fuss on the part of the user. To that end, I have a cmdlet that deploys a base template, another that packages the template for distribution, and one for deploying a package within the main module. It is the last of these where my need arose. In order to keep things ‘orderly’, I wanted to make sure the plugin wasn’t already installed, as well as to provide a means of updating to a newer version in the event a given plugin was updated. I definitely did not want to extract the files to a temp directory just to perform these checks, however. I also did not want to create additional dependencies on other modules or libraries when this is clearly something that Windows already knows how to do natively, as evidenced by the ability to simply click on a Zip in Explorer and treat it as a folder.
I won’t go into all the details around why I’ve architected this module the way I have in this article. Suffice it to say that there are good reasons, at least in my mind, so for now we’ll just move on.
The Road to Success
The first step is to make the required assembly available as shown below. I found this info by looking at some of the other implementations of Zip solutions in the Gallery, as well as a few articles that covered extracting archives.
Add-Type -Assembly System.IO.Compression.FileSystem
Next we create a binding to our Zip file so that we can get at the contents. This can be accomplished via the IO.Compression.ZipFile class, which you can read more about here.
$ZipFile = [IO.Compression.ZipFile]::OpenRead((Get-Item Testing.Zip).FullName).Entries
The above code will open the Zip file for reading and obtain a list of all the files stored within. The content within will be an array of System.IO.Compression.ZipArchiveEntry items, which include helpful things such as the full name (path within the Zip and the filename), any comments, compressed size, the name by itself, and even a LastWriteTime, among other items.
$ZipFile | Get-Member
TypeName: System.IO.Compression.ZipArchiveEntry
Name MemberType Definition
---- ---------- ----------
Delete Method void Delete()
Equals Method bool Equals(System.Object obj)
GetHashCode Method int GetHashCode()
GetType Method type GetType()
Open Method System.IO.Stream Open()
ToString Method string ToString()
Archive Property System.IO.Compression.ZipArchive Archive {get;}
Comment Property string Comment {get;set;}
CompressedLength Property long CompressedLength {get;}
Crc32 Property uint Crc32 {get;}
ExternalAttributes Property int ExternalAttributes {get;set;}
FullName Property string FullName {get;}
IsEncrypted Property bool IsEncrypted {get;}
LastWriteTime Property System.DateTimeOffset LastWriteTime {get;set;}
Length Property long Length {get;}
Name Property string Name {get;}
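One thing worth noting about the one-liner above: it keeps only the Entries collection and drops the reference to the archive itself, so the file handle stays open until the garbage collector gets around to it. A minimal sketch of listing entries while releasing the handle explicitly follows. The archive is built on the fly, and its path and entry names are made up for the example.

```powershell
Add-Type -AssemblyName System.IO.Compression
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Build a small throwaway archive so the example is self-contained (names are illustrative).
$zipPath  = Join-Path ([System.IO.Path]::GetTempPath()) ("demo-{0}.zip" -f [guid]::NewGuid())
$writeZip = [System.IO.Compression.ZipFile]::Open($zipPath, 'Create')
foreach ($name in 'TestPlugin/TestPlugin.psd1', 'TestPlugin/TestPlugin.psm1') {
    $writer = [System.IO.StreamWriter]::new($writeZip.CreateEntry($name).Open())
    $writer.Write("# placeholder content for $name")
    $writer.Dispose()   # close this entry before creating the next one
}
$writeZip.Dispose()

# Open for reading, copy out the properties we care about, and always release the file.
$zip = [System.IO.Compression.ZipFile]::OpenRead($zipPath)
try {
    $listing = $zip.Entries | Select-Object FullName, Name, Length, LastWriteTime
}
finally {
    $zip.Dispose()
}

$listing | Format-Table -AutoSize
Remove-Item -Path $zipPath
```

Because the archive is disposed in the finally block, the Zip can be deleted or replaced immediately afterwards, which matters if the next step is an update of that same package.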
The ‘eagle-eyed’ among you will no doubt have noticed the same thing I did right off the bat, which is that there is a ‘ToString’ method. Curb your excitement, however, as all this does is return the value of the FullName property as a string. The other challenge we are facing is that we are dealing with a bunch of entries, when we actually only want to mess with one. We have a couple of different ways we can address this latter problem. For example, we can pass the collection to Where-Object and look for entries that end in our desired extension (PSD1 in my case). For my own purposes, I knew what the filename would be, and what the structure would look like, since it all aligned with my templatized approach. So to figure out a more streamlined approach, let’s take a step back and pass the root object to Get-Member…or you can cheat and go look at the doc link above…either way.
$ZipFile = [IO.Compression.ZipFile]::OpenRead((Get-Item Testing.Zip).FullName)
$ZipFile | Get-Member
TypeName: System.IO.Compression.ZipArchive
Name MemberType Definition
---- ---------- ----------
CreateEntry Method System.IO.Compression.ZipArchiveEntry CreateEntry(string entryName), System.IO.Compression.ZipArchiveEntry CreateEntry(string entryName, System.IO.Compression.CompressionLevel compressionLevel)
Dispose Method void Dispose(), void IDisposable.Dispose()
Equals Method bool Equals(System.Object obj)
GetEntry Method System.IO.Compression.ZipArchiveEntry GetEntry(string entryName)
GetHashCode Method int GetHashCode()
GetType Method type GetType()
ToString Method string ToString()
Comment Property string Comment {get;set;}
Entries Property System.Collections.ObjectModel.ReadOnlyCollection[System.IO.Compression.ZipArchiveEntry] Entries {get;}
Mode Property System.IO.Compression.ZipArchiveMode Mode {get;}
As you can see above, there is a GetEntry method that accepts a string value. It took a couple of tries, but I figured out that this is the ‘FullName’ value from the ‘Entries’ list, not just the filename. If the filename is in the root of the Zip, just the name is sufficient, but if the file resides in a subdirectory, then both the directory and the name must be supplied.
($ZipFile.Entries).Where({$_.Name -like "*.psd1"})
Archive : System.IO.Compression.ZipArchive
Crc32 : 2950623540
IsEncrypted : False
CompressedLength : 1271
ExternalAttributes : 32
Comment :
FullName : TestPlugin/TestPlugin.psd1
LastWriteTime : 4/18/2023 2:58:32 PM -05:00
Length : 3025
Name : TestPlugin.psd1
$MyEntry = $ZipFile.GetEntry('TestPlugin/TestPlugin.psd1')
$MyEntry | Get-Member
TypeName: System.IO.Compression.ZipArchiveEntry
Name MemberType Definition
---- ---------- ----------
Delete Method void Delete()
Equals Method bool Equals(System.Object obj)
GetHashCode Method int GetHashCode()
GetType Method type GetType()
Open Method System.IO.Stream Open()
ToString Method string ToString()
Archive Property System.IO.Compression.ZipArchive Archive {get;}
Comment Property string Comment {get;set;}
CompressedLength Property long CompressedLength {get;}
Crc32 Property uint Crc32 {get;}
ExternalAttributes Property int ExternalAttributes {get;set;}
FullName Property string FullName {get;}
IsEncrypted Property bool IsEncrypted {get;}
LastWriteTime Property System.DateTimeOffset LastWriteTime {get;set;}
Length Property long Length {get;}
Name Property string Name {get;}
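GetEntry does not throw when nothing matches; per the .NET documentation it returns null, so a guard is worth having before calling Open. Here is a minimal sketch comparing a name-only lookup, a full-path lookup, and a Where-Object fallback. The archive and its single entry are created just for the example.

```powershell
Add-Type -AssemblyName System.IO.Compression
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Throwaway archive with one manifest inside a subfolder (names are illustrative).
$zipPath  = Join-Path ([System.IO.Path]::GetTempPath()) ("demo-{0}.zip" -f [guid]::NewGuid())
$writeZip = [System.IO.Compression.ZipFile]::Open($zipPath, 'Create')
$writer   = [System.IO.StreamWriter]::new($writeZip.CreateEntry('TestPlugin/TestPlugin.psd1').Open())
$writer.Write("@{ ModuleVersion = '0.1.0' }")
$writer.Dispose()
$writeZip.Dispose()

$zip = [System.IO.Compression.ZipFile]::OpenRead($zipPath)
try {
    # The file lives in a subfolder, so the bare file name finds nothing.
    $foundByName = $null -ne $zip.GetEntry('TestPlugin.psd1')

    # The full path inside the archive (forward slashes) does match.
    $foundByPath = $null -ne $zip.GetEntry('TestPlugin/TestPlugin.psd1')

    # Fallback when the folder name is not known in advance: search by file name pattern.
    $manifest     = $zip.Entries | Where-Object { $_.Name -like '*.psd1' } | Select-Object -First 1
    $manifestPath = if ($null -ne $manifest) { $manifest.FullName } else { $null }
}
finally {
    $zip.Dispose()
}

"Found by name: $foundByName"
"Found by path: $foundByPath"
"Manifest path: $manifestPath"
Remove-Item -Path $zipPath
```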
Now we have our single entry for which we want to read some data. This file is simple text and, as you can see, there is an Open method available. One thing to take note of here, though, is the definition. If you look at it carefully, you’ll note that using this method returns a System.IO.Stream object. When we execute the method, the information is…less than helpful out of the box.
$MyEntry.Open()
BaseStream : System.IO.Compression.SubReadStream
CanRead : True
CanWrite : False
CanSeek : False
Length :
Position :
CanTimeout : False
ReadTimeout :
WriteTimeout :
In order to turn this raw stream into readable text, we are going to need a StreamReader.
$reader = New-Object System.IO.StreamReader($($MyEntry.Open()))
$reader | Get-Member
TypeName: System.IO.StreamReader
Name MemberType Definition
---- ---------- ----------
Close Method void Close()
DiscardBufferedData Method void DiscardBufferedData()
Dispose Method void Dispose(), void IDisposable.Dispose()
Equals Method bool Equals(System.Object obj)
GetHashCode Method int GetHashCode()
GetLifetimeService Method System.Object GetLifetimeService()
GetType Method type GetType()
InitializeLifetimeService Method System.Object InitializeLifetimeService()
Peek Method int Peek()
Read Method int Read(), int Read(char[] buffer, int index, int count), int Read(System.Span[char] buffer)
ReadAsync Method System.Threading.Tasks.Task[int] ReadAsync(char[] buffer, int index, int count), System.Threading.Tasks.ValueTask[int] ReadAsync(System.Memory[char] buffer, System.Threading.CancellationToken cancellationToken = default)
ReadBlock Method int ReadBlock(char[] buffer, int index, int count), int ReadBlock(System.Span[char] buffer)
ReadBlockAsync Method System.Threading.Tasks.Task[int] ReadBlockAsync(char[] buffer, int index, int count), System.Threading.Tasks.ValueTask[int] ReadBlockAsync(System.Memory[char] buffer, System.Threading.CancellationToken cancellationToken = default)
ReadLine Method string ReadLine()
ReadLineAsync Method System.Threading.Tasks.Task[string] ReadLineAsync(), System.Threading.Tasks.ValueTask[string] ReadLineAsync(System.Threading.CancellationToken cancellationToken)
ReadToEnd Method string ReadToEnd()
ReadToEndAsync Method System.Threading.Tasks.Task[string] ReadToEndAsync(), System.Threading.Tasks.Task[string] ReadToEndAsync(System.Threading.CancellationToken cancellationToken)
ToString Method string ToString()
BaseStream Property System.IO.Stream BaseStream {get;}
CurrentEncoding Property System.Text.Encoding CurrentEncoding {get;}
EndOfStream Property bool EndOfStream {get;}
Our StreamReader object has a particularly helpful method called ‘ReadToEnd’. While researching this, I also discovered several other approaches of varying complexity. One approach that I came across required analyzing the bytes to determine what kind of encoding the file employed. For my use case, I only needed the typical UTF-8, which is the default, so I went with the simpler approach. In the interest of completeness, however, I’ve shown the additional steps below.
$stream = $MyEntry.Open()
$memStream = [System.IO.MemoryStream]::new()
$stream.CopyTo($memStream)
$memStream | Get-Member
TypeName: System.IO.MemoryStream
Name MemberType Definition
---- ---------- ----------
BeginRead Method System.IAsyncResult BeginRead(byte[] buffer, int offset, int count, System.AsyncCallback callback, System.Object state)
BeginWrite Method System.IAsyncResult BeginWrite(byte[] buffer, int offset, int count, System.AsyncCallback callback, System.Object state)
Close Method void Close()
CopyTo Method void CopyTo(System.IO.Stream destination, int bufferSize), void CopyTo(System.IO.Stream destination)
CopyToAsync Method System.Threading.Tasks.Task CopyToAsync(System.IO.Stream destination, int bufferSize, System.Threading.CancellationToken cancellationToken), System.Threading.Tasks.Task CopyToAsync(System.IO.Stream destination), System.Threading.Ta…
Dispose Method void Dispose(), void IDisposable.Dispose()
DisposeAsync Method System.Threading.Tasks.ValueTask DisposeAsync(), System.Threading.Tasks.ValueTask IAsyncDisposable.DisposeAsync()
EndRead Method int EndRead(System.IAsyncResult asyncResult)
EndWrite Method void EndWrite(System.IAsyncResult asyncResult)
Equals Method bool Equals(System.Object obj)
Flush Method void Flush()
FlushAsync Method System.Threading.Tasks.Task FlushAsync(System.Threading.CancellationToken cancellationToken), System.Threading.Tasks.Task FlushAsync()
GetBuffer Method byte[] GetBuffer()
GetHashCode Method int GetHashCode()
GetLifetimeService Method System.Object GetLifetimeService()
GetType Method type GetType()
InitializeLifetimeService Method System.Object InitializeLifetimeService()
Read Method int Read(byte[] buffer, int offset, int count), int Read(System.Span[byte] buffer)
ReadAsync Method System.Threading.Tasks.Task[int] ReadAsync(byte[] buffer, int offset, int count, System.Threading.CancellationToken cancellationToken), System.Threading.Tasks.ValueTask[int] ReadAsync(System.Memory[byte] buffer, System.Threading.Ca…
ReadAtLeast Method int ReadAtLeast(System.Span[byte] buffer, int minimumBytes, bool throwOnEndOfStream = True)
ReadAtLeastAsync Method System.Threading.Tasks.ValueTask[int] ReadAtLeastAsync(System.Memory[byte] buffer, int minimumBytes, bool throwOnEndOfStream = True, System.Threading.CancellationToken cancellationToken = default)
ReadByte Method int ReadByte()
ReadExactly Method void ReadExactly(System.Span[byte] buffer), void ReadExactly(byte[] buffer, int offset, int count)
ReadExactlyAsync Method System.Threading.Tasks.ValueTask ReadExactlyAsync(System.Memory[byte] buffer, System.Threading.CancellationToken cancellationToken = default), System.Threading.Tasks.ValueTask ReadExactlyAsync(byte[] buffer, int offset, int count, …
Seek Method long Seek(long offset, System.IO.SeekOrigin loc)
SetLength Method void SetLength(long value)
ToArray Method byte[] ToArray()
ToString Method string ToString()
TryGetBuffer Method bool TryGetBuffer([ref] System.ArraySegment[byte] buffer)
Write Method void Write(byte[] buffer, int offset, int count), void Write(System.ReadOnlySpan[byte] buffer)
WriteAsync Method System.Threading.Tasks.Task WriteAsync(byte[] buffer, int offset, int count, System.Threading.CancellationToken cancellationToken), System.Threading.Tasks.ValueTask WriteAsync(System.ReadOnlyMemory[byte] buffer, System.Threading.Ca…
WriteByte Method void WriteByte(byte value)
WriteTo Method void WriteTo(System.IO.Stream stream)
CanRead Property bool CanRead {get;}
CanSeek Property bool CanSeek {get;}
CanTimeout Property bool CanTimeout {get;}
CanWrite Property bool CanWrite {get;}
Capacity Property int Capacity {get;set;}
Length Property long Length {get;}
Position Property long Position {get;set;}
ReadTimeout Property int ReadTimeout {get;set;}
WriteTimeout Property int WriteTimeout {get;set;}
$stream.Close()
$bin = $memStream.ToArray()
$bin | Get-Member
TypeName: System.Byte
Name MemberType Definition
---- ---------- ----------
CompareTo Method int CompareTo(System.Object value), int CompareTo(byte value), int IComparable.CompareTo(System.Object obj), int IComparable[byte].CompareTo(byte other)
Equals Method bool Equals(System.Object obj), bool Equals(byte obj), bool IEquatable[byte].Equals(byte other)
GetByteCount Method int IBinaryInteger[byte].GetByteCount()
GetHashCode Method int GetHashCode()
GetShortestBitLength Method int IBinaryInteger[byte].GetShortestBitLength()
GetType Method type GetType()
GetTypeCode Method System.TypeCode GetTypeCode(), System.TypeCode IConvertible.GetTypeCode()
ToBoolean Method bool IConvertible.ToBoolean(System.IFormatProvider provider)
ToByte Method byte IConvertible.ToByte(System.IFormatProvider provider)
ToChar Method char IConvertible.ToChar(System.IFormatProvider provider)
ToDateTime Method datetime IConvertible.ToDateTime(System.IFormatProvider provider)
ToDecimal Method decimal IConvertible.ToDecimal(System.IFormatProvider provider)
ToDouble Method double IConvertible.ToDouble(System.IFormatProvider provider)
ToInt16 Method short IConvertible.ToInt16(System.IFormatProvider provider)
ToInt32 Method int IConvertible.ToInt32(System.IFormatProvider provider)
ToInt64 Method long IConvertible.ToInt64(System.IFormatProvider provider)
ToSByte Method sbyte IConvertible.ToSByte(System.IFormatProvider provider)
ToSingle Method float IConvertible.ToSingle(System.IFormatProvider provider)
ToString Method string ToString(), string ToString(string format), string ToString(System.IFormatProvider provider), string ToString(string format, System.IFormatProvider provider), string IConvertible.ToString(System.IFormatProvider provider), string …
ToType Method System.Object IConvertible.ToType(type conversionType, System.IFormatProvider provider)
ToUInt16 Method ushort IConvertible.ToUInt16(System.IFormatProvider provider)
ToUInt32 Method uint IConvertible.ToUInt32(System.IFormatProvider provider)
ToUInt64 Method ulong IConvertible.ToUInt64(System.IFormatProvider provider)
TryFormat Method bool TryFormat(System.Span[char] destination, [ref] int charsWritten, System.ReadOnlySpan[char] format = default, System.IFormatProvider provider = null), bool ISpanFormattable.TryFormat(System.Span[char] destination, [ref] int charsWri…
TryWriteBigEndian Method bool IBinaryInteger[byte].TryWriteBigEndian(System.Span[byte] destination, [ref] int bytesWritten)
TryWriteLittleEndian Method bool IBinaryInteger[byte].TryWriteLittleEndian(System.Span[byte] destination, [ref] int bytesWritten)
WriteBigEndian Method int IBinaryInteger[byte].WriteBigEndian(byte[] destination), int IBinaryInteger[byte].WriteBigEndian(byte[] destination, int startIndex), int IBinaryInteger[byte].WriteBigEndian(System.Span[byte] destination)
WriteLittleEndian Method int IBinaryInteger[byte].WriteLittleEndian(byte[] destination), int IBinaryInteger[byte].WriteLittleEndian(byte[] destination, int startIndex), int IBinaryInteger[byte].WriteLittleEndian(System.Span[byte] destination)
I’ll take a moment to pause and explain before finishing up with the second method. The biggest difference here is that, instead of a System.IO.StreamReader object, we’ve created a System.IO.MemoryStream object. The former class is an actual text reader that consumes the streamed bytes and decodes them into an in-memory string. The latter class is still a stream, but backed by system memory as opposed to the actual file, which lets us close and dispose of the file while we continue to process the content. With the first approach, simply executing the ‘ReadToEnd’ method produces a string when pulling a text file; for a binary file it would still return a string, just not a meaningful one. In the latter approach, everything shown above only gives us a byte array, but this allows us to perform some additional analysis to determine the type of encoding. From there, we can decode the byte array using the correct encoding for that file. The first method is more capable than it might look: ASCII is a subset of UTF-8, and a StreamReader created this way also checks for a byte order mark, so ASCII files and UTF-16 files that start with a BOM should come out fine. Where I’d expect trouble is a file in UTF-16 or a legacy code page with no BOM, and that is where the second method earns its keep. I won’t list out the code for decoding that here, since you can see it in action in the ziphelper module available on the PowerShell Gallery here.
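To give a feel for what that byte analysis can look like, here is a minimal sketch that checks only for a byte order mark and falls back to UTF-8. To be clear, this is not the ziphelper module’s code; the function name ConvertFrom-EncodedBytes is hypothetical, and the UTF-8 fallback for BOM-less data is an assumption that suits my manifest files but not every file.

```powershell
function ConvertFrom-EncodedBytes {
    # Hypothetical helper: decode a byte array using its byte order mark, if it has one.
    param([byte[]]$Bytes)

    # UTF-32 LE is tested before UTF-16 LE because both preambles start with FF FE.
    $candidates = @(
        [System.Text.Encoding]::UTF8,
        [System.Text.Encoding]::UTF32,
        [System.Text.Encoding]::Unicode,
        [System.Text.Encoding]::BigEndianUnicode
    )

    foreach ($encoding in $candidates) {
        $bom = $encoding.GetPreamble()
        if ($Bytes.Length -lt $bom.Length) { continue }

        # Compare the leading bytes against this encoding's preamble.
        $isMatch = $true
        for ($i = 0; $i -lt $bom.Length; $i++) {
            if ($Bytes[$i] -ne $bom[$i]) { $isMatch = $false; break }
        }

        if ($isMatch) {
            # Skip the BOM itself so it does not end up in the decoded text.
            return $encoding.GetString($Bytes, $bom.Length, $Bytes.Length - $bom.Length)
        }
    }

    # Assumption: no BOM means UTF-8 (which also covers plain ASCII).
    return [System.Text.Encoding]::UTF8.GetString($Bytes)
}

# Usage: simulate a UTF-16 LE file with a BOM, as $memStream.ToArray() might return it.
$utf16 = [System.Text.Encoding]::Unicode
[byte[]]$bin = $utf16.GetPreamble() + $utf16.GetBytes("ModuleVersion = '0.1.0'")
$decoded = ConvertFrom-EncodedBytes -Bytes $bin
$decoded
```

A BOM check cannot identify a legacy code page or BOM-less UTF-16; that needs heuristics or outside knowledge of where the file came from.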
Going back to our first approach, we can now execute the ReadToEnd method of our StreamReader and store the results in a variable like so.
$content = $reader.ReadToEnd()
$content | Get-Member
$content.GetType()
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True True String System.Object
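Pulling the first approach together, the whole thing fits in a small helper that also cleans up after itself. This is a sketch rather than the code from my module: the function name Get-ZipEntryText and its parameters are made up for illustration, and it assumes the entry is UTF-8 (or carries a BOM that StreamReader can detect).

```powershell
Add-Type -AssemblyName System.IO.Compression
Add-Type -AssemblyName System.IO.Compression.FileSystem

function Get-ZipEntryText {
    # Hypothetical helper: return the text of one entry without extracting the archive.
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string]$EntryName
    )

    # .NET does not follow PowerShell's current location, so resolve to a full path first.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $zip = [System.IO.Compression.ZipFile]::OpenRead($fullPath)
    try {
        $entry = $zip.GetEntry($EntryName)
        if ($null -eq $entry) { throw "Entry '$EntryName' not found in '$fullPath'." }

        $reader = [System.IO.StreamReader]::new($entry.Open())
        try {
            return $reader.ReadToEnd()
        }
        finally {
            $reader.Dispose()   # also closes the entry stream
        }
    }
    finally {
        $zip.Dispose()          # releases the handle on the Zip file
    }
}

# Usage against a throwaway archive (names are illustrative).
$zipPath  = Join-Path ([System.IO.Path]::GetTempPath()) ("demo-{0}.zip" -f [guid]::NewGuid())
$writeZip = [System.IO.Compression.ZipFile]::Open($zipPath, 'Create')
$writer   = [System.IO.StreamWriter]::new($writeZip.CreateEntry('TestPlugin/TestPlugin.psd1').Open())
$writer.Write("@{`n    ModuleVersion = '0.1.0'`n}")
$writer.Dispose()
$writeZip.Dispose()

$text = Get-ZipEntryText -Path $zipPath -EntryName 'TestPlugin/TestPlugin.psd1'
$text
Remove-Item -Path $zipPath
```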
Theoretically, we’re now done…we can dump $content to the screen and see our content and, as shown above, the content is indeed now a string. Unfortunately there is one last gotcha (because of course there is). The key thing that I myself needed to do was to retrieve the ModuleVersion value, which is precisely what ‘Select-String’ was made for. Unfortunately this did not work. Select-String treats each input object as a single line, which suits the array of strings that Get-Content produces by default, one per row of the file. Add the Raw switch to Get-Content and you get a single string object instead, and that is effectively what ReadToEnd hands us too. Select-String does have a Raw switch of its own (in PowerShell 7 and later), but it had no effect on my streamed string content, so searching for ‘ModuleVersion’ returned the entire content no matter what I did. Fortunately for me, after some digging, the solution was to leverage the Split method as shown below.
$content.Split("`n") | Select-String -Pattern "ModuleVersion"
ModuleVersion = '0.1.0'
For those not ‘in-the-know’, the backtick (usually right above the Tab key on US keyboards) followed by ‘n’ indicates ‘New Line’. Even though we didn’t have separate strings for each line, the New Line characters were still present within the string. This can be seen by dumping $content to the screen and noting that all the formatting of the file remains intact. By splitting on the ‘New Line’ character, we essentially converted the value from a single string blob into an array of strings such as Get-Content produces. Once I had that, Select-String was able to process the content normally. This seemed somewhat weird to me at first, since the help says that the ‘Raw’ switch gives the closest thing to the Unix grep or Windows findstr commands. As far as I can tell, though, that switch only changes the output (plain matching strings instead of MatchInfo objects); it does not change how the input is divided into lines, so a single multi-line string is still matched, and returned, as one giant ‘line’.
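Here is a small sketch of that behavior using a made-up manifest snippet with Windows (CRLF) line endings. It also shows two variations that are my own additions rather than anything from the help: splitting with a regex so a stray carriage return doesn’t tag along, and skipping Select-String entirely with the -match operator.

```powershell
# Illustrative content: three lines separated by CRLF, held in ONE string.
$content = "RootModule = 'TestPlugin.psm1'`r`nModuleVersion = '0.1.0'`r`nAuthor = 'Someone'"

# One input object means one 'line', so the match comes back as the entire string.
$whole = $content | Select-String -Pattern 'ModuleVersion'
$wholeIsEverything = ($whole.Line -eq $content)

# Splitting on CRLF or LF gives Select-String one string per line to work with.
$lines = $content -split '\r?\n'
$hit   = $lines | Select-String -Pattern 'ModuleVersion'
$hit.Line

# Alternative: a regex capture against the whole string pulls out just the version.
$version = $null
if ($content -match "ModuleVersion\s*=\s*'([^']+)'") {
    $version = $Matches[1]
}
$version
```

The plain Split("`n") call works for the search too; the only difference on a CRLF file is that each resulting line keeps a trailing carriage return, which can bite later if you compare or parse the value.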
Until next time, stay fresh cheese bags!