Read Text Content (w/o Unzipping)

Messing with the Zips

What’s this? Another post? Yep, I’m on a roll. This particular installment will be relatively quick, as such things go, but I hope it will also be valuable.

Use Case: View text in a Zip file, without extracting the file to disk

First off, if you just want to know how to extract or compress a Zip file using PowerShell, that is super easy. Both Windows PowerShell (5.1) and PowerShell (6+) contain native cmdlets for this. Simply run ‘Get-Command -Noun Archive’ and you’ll see that there are cmdlets for Compress and Expand. What you will not find, however, is a Get, Set, Add, Remove, or any other verb you might expect for interacting with Zip files. That said, the help for the Compress-Archive cmdlet indicates you can also add new files to an existing Zip by specifying the -Update switch, which also replaces files already in the archive with newer versions that have the same names. A quick search on the PowerShell Gallery will also reveal a number of other modules for interacting with Zip files to varying degrees, though I didn’t see anything that offered reading without extracting to the file system first. Why would you want to do this? That, my friend, is not what this blog is about. I’m sure you have your reasons, just as I did.
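For reference, the -Update behavior mentioned above looks roughly like the sketch below. The scratch folder and file names are made up for illustration, and everything is created under the temp directory so it can be run safely.

```powershell
# Hypothetical scratch folder and file names, for illustration only.
$work = Join-Path ([IO.Path]::GetTempPath()) ("zipdemo_" + [guid]::NewGuid().ToString('N'))
New-Item -ItemType Directory -Path $work | Out-Null
$zip = Join-Path $work 'Demo.zip'

# Create the archive with a single file.
Set-Content -Path (Join-Path $work 'first.txt') -Value 'version 1'
Compress-Archive -Path (Join-Path $work 'first.txt') -DestinationPath $zip

# -Update adds a file that is not in the archive yet...
Set-Content -Path (Join-Path $work 'second.txt') -Value 'hello'
Compress-Archive -Path (Join-Path $work 'second.txt') -DestinationPath $zip -Update

# ...and replaces an existing entry when the same source file is supplied again.
Set-Content -Path (Join-Path $work 'first.txt') -Value 'version 2'
Compress-Archive -Path (Join-Path $work 'first.txt') -DestinationPath $zip -Update
```

After the last command the archive should hold two entries, with first.txt carrying the newer content.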

In my specific use case, I have a module folder location that already has a bunch of content. Specifically, I have pseudo nested modules, each with their own module manifest file. The expectation is that users of this module might wish to add more of these ‘plugin’ type sub-modules. To keep things orderly, I naturally wanted to build a set of cmdlets that would validate elements of these plugins and properly integrate them into the larger module without a lot of fuss on the part of the user. To that end, I have a cmdlet that deploys a base template, another that packages the template for distribution, and one for deploying a package within the main module, and it is the last of these where my need arose. In order to keep things ‘orderly’, I wanted to make sure it wasn’t already installed, as well as to provide a means of updating to newer versions in the event a given plugin was updated. I definitely did not want to extract the files to a temp directory just to perform these checks however. I also did not want to create additional dependencies on other modules or libraries when this is clearly something that Windows already knows how to do natively, as evidenced by the ability to simply click on a Zip in Explorer and treat it as a folder.

I won’t go into all the details of why I’ve architected this module the way I have in this article. Suffice it to say that there are good reasons, at least in my mind, so for now we’ll just move on.

The Road to Success

The first step is to load the assembly that contains the class we need, as shown below. I found this info by looking at some of the other implementations of Zip solutions in the Gallery, as well as a few articles that covered extracting archives.

Add-Type -Assembly System.IO.Compression.FileSystem

Next we open our Zip file so that we can get at the contents. This can be accomplished via the IO.Compression.ZipFile class, which you can read more about here.

$ZipFile = [IO.Compression.ZipFile]::OpenRead((Get-Item Testing.Zip).FullName).Entries

The above code will open the Zip file for reading and obtain a list of all the entries stored within. The result is a read-only collection of System.IO.Compression.ZipArchiveEntry items, which include helpful things such as the full name (path within the Zip plus the filename), any comments, compressed size, the name by itself, and even a LastWriteTime, among other items.

$ZipFile | Get-Member

   TypeName: System.IO.Compression.ZipArchiveEntry

Name               MemberType Definition
----               ---------- ----------
Delete             Method     void Delete()
Equals             Method     bool Equals(System.Object obj)
GetHashCode        Method     int GetHashCode()
GetType            Method     type GetType()
Open               Method     System.IO.Stream Open()
ToString           Method     string ToString()
Archive            Property   System.IO.Compression.ZipArchive Archive {get;}
Comment            Property   string Comment {get;set;}
CompressedLength   Property   long CompressedLength {get;}
Crc32              Property   uint Crc32 {get;}
ExternalAttributes Property   int ExternalAttributes {get;set;}
FullName           Property   string FullName {get;}
IsEncrypted        Property   bool IsEncrypted {get;}
LastWriteTime      Property   System.DateTimeOffset LastWriteTime {get;set;}
Length             Property   long Length {get;}
Name               Property   string Name {get;}
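One thing the one-liner above does not do is hand back the archive object, so there is nothing to dispose and the Zip can stay locked for the rest of the session. A minimal sketch of a listing helper that cleans up after itself follows; the function name Get-ZipEntryList is hypothetical, not part of any module.

```powershell
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Hypothetical helper name: lists the entries of a Zip without extracting anything.
function Get-ZipEntryList {
    param([Parameter(Mandatory)][string]$Path)

    # OpenRead needs a full path because .NET does not follow PowerShell's current location.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $archive  = [IO.Compression.ZipFile]::OpenRead($fullPath)
    try {
        # Copy the interesting properties so the results outlive the archive handle.
        foreach ($entry in $archive.Entries) {
            [pscustomobject]@{
                FullName         = $entry.FullName
                Name             = $entry.Name
                Length           = $entry.Length
                CompressedLength = $entry.CompressedLength
                LastWriteTime    = $entry.LastWriteTime
            }
        }
    }
    finally {
        # Release the file handle on the Zip.
        $archive.Dispose()
    }
}
```

Usage would be along the lines of: Get-ZipEntryList -Path .\Testing.Zip | Format-Table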

The ‘eagle-eyed’ among you will no doubt have noticed the same thing I did right off the bat, which is that there is a ‘ToString’ method. Curb your excitement, however, as all this does is return the value of the FullName property as a string. The other challenge we are facing is that we are dealing with a bunch of entries when we actually only want to mess with one. We have a couple of different ways to address this latter problem. For example, we can pass the collection to Where-Object and look for entries that end in our desired extension (PSD1 in my case). For my own purposes, I knew what the filename would be, and what the structure would look like, since it all aligned with my templatized approach. So to figure out a more streamlined approach, let’s take a step back and pass the root object to Get-Member…or you can cheat and go look at the doc link above…either way.

$ZipFile = [IO.Compression.ZipFile]::OpenRead((Get-Item Testing.Zip).FullName)
$ZipFile | Get-Member

   TypeName: System.IO.Compression.ZipArchive

Name        MemberType Definition
----        ---------- ----------
CreateEntry Method     System.IO.Compression.ZipArchiveEntry CreateEntry(string entryName), System.IO.Compression.ZipArchiveEntry CreateEntry(string entryName, System.IO.Compression.CompressionLevel compressionLevel)
Dispose     Method     void Dispose(), void IDisposable.Dispose()
Equals      Method     bool Equals(System.Object obj)
GetEntry    Method     System.IO.Compression.ZipArchiveEntry GetEntry(string entryName)
GetHashCode Method     int GetHashCode()
GetType     Method     type GetType()
ToString    Method     string ToString()
Comment     Property   string Comment {get;set;}
Entries     Property   System.Collections.ObjectModel.ReadOnlyCollection[System.IO.Compression.ZipArchiveEntry] Entries {get;}
Mode        Property   System.IO.Compression.ZipArchiveMode Mode {get;}

As you can see above, there is a GetEntry method that accepts a string value. It took a couple of tries, but I figured out that this is the ‘FullName’ value from the ‘Entries’ list, not just the filename. If the filename is in the root of the Zip, just the name is sufficient, but if the file resides in a subdirectory, then both the directory and the name must be supplied.

($ZipFile.Entries).Where({$_.Name -like "*.psd1"})

Archive            : System.IO.Compression.ZipArchive
Crc32              : 2950623540
IsEncrypted        : False
CompressedLength   : 1271
ExternalAttributes : 32
Comment            :
FullName           : TestPlugin/TestPlugin.psd1
LastWriteTime      : 4/18/2023 2:58:32 PM -05:00
Length             : 3025
Name               : TestPlugin.psd1

$MyEntry = $ZipFile.GetEntry('TestPlugin/TestPlugin.psd1')
$MyEntry | Get-Member

   TypeName: System.IO.Compression.ZipArchiveEntry

Name               MemberType Definition
----               ---------- ----------
Delete             Method     void Delete()
Equals             Method     bool Equals(System.Object obj)
GetHashCode        Method     int GetHashCode()
GetType            Method     type GetType()
Open               Method     System.IO.Stream Open()
ToString           Method     string ToString()
Archive            Property   System.IO.Compression.ZipArchive Archive {get;}
Comment            Property   string Comment {get;set;}
CompressedLength   Property   long CompressedLength {get;}
Crc32              Property   uint Crc32 {get;}
ExternalAttributes Property   int ExternalAttributes {get;set;}
FullName           Property   string FullName {get;}
IsEncrypted        Property   bool IsEncrypted {get;}
LastWriteTime      Property   System.DateTimeOffset LastWriteTime {get;set;}
Length             Property   long Length {get;}
Name               Property   string Name {get;}
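Since GetEntry wants the exact FullName, it is worth knowing that it returns $null rather than throwing when nothing matches. The sketch below wraps that in a small check; Test-ZipEntry is a hypothetical name used only for illustration.

```powershell
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Hypothetical helper: $true when the Zip contains an entry with this exact FullName.
function Test-ZipEntry {
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string]$EntryName
    )
    $archive = [IO.Compression.ZipFile]::OpenRead((Resolve-Path -LiteralPath $Path).ProviderPath)
    try {
        # GetEntry returns $null when the name does not match any entry.
        $null -ne $archive.GetEntry($EntryName)
    }
    finally {
        $archive.Dispose()
    }
}
```

With the sample archive from this post, Test-ZipEntry -Path .\Testing.Zip -EntryName 'TestPlugin/TestPlugin.psd1' should come back $true, while passing just 'TestPlugin.psd1' should come back $false because the directory part is missing.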

Now we have the single entry we want to read. This file is simple text and, as you can see, there is an Open method available. One thing to take note of here, though, is the definition. If you look at it carefully, you’ll note that this method returns a System.IO.Stream object. When we execute the method, the information is…less than helpful out of the box.

$MyEntry.Open()

BaseStream   : System.IO.Compression.SubReadStream
CanRead      : True
CanWrite     : False
CanSeek      : False
Length       :
Position     :
CanTimeout   : False
ReadTimeout  :
WriteTimeout :

A raw stream only hands out bytes, so to turn it into readable text we are going to need a StreamReader.

$reader = New-Object System.IO.StreamReader($($MyEntry.Open()))
$reader | Get-Member

   TypeName: System.IO.StreamReader

Name                      MemberType Definition
----                      ---------- ----------
Close                     Method     void Close()
DiscardBufferedData       Method     void DiscardBufferedData()
Dispose                   Method     void Dispose(), void IDisposable.Dispose()
Equals                    Method     bool Equals(System.Object obj)
GetHashCode               Method     int GetHashCode()
GetLifetimeService        Method     System.Object GetLifetimeService()
GetType                   Method     type GetType()
InitializeLifetimeService Method     System.Object InitializeLifetimeService()
Peek                      Method     int Peek()
Read                      Method     int Read(), int Read(char[] buffer, int index, int count), int Read(System.Span[char] buffer)
ReadAsync                 Method     System.Threading.Tasks.Task[int] ReadAsync(char[] buffer, int index, int count), System.Threading.Tasks.ValueTask[int] ReadAsync(System.Memory[char] buffer, System.Threading.CancellationToken cancellationToken = default)
ReadBlock                 Method     int ReadBlock(char[] buffer, int index, int count), int ReadBlock(System.Span[char] buffer)
ReadBlockAsync            Method     System.Threading.Tasks.Task[int] ReadBlockAsync(char[] buffer, int index, int count), System.Threading.Tasks.ValueTask[int] ReadBlockAsync(System.Memory[char] buffer, System.Threading.CancellationToken cancellationToken = default)
ReadLine                  Method     string ReadLine()
ReadLineAsync             Method     System.Threading.Tasks.Task[string] ReadLineAsync(), System.Threading.Tasks.ValueTask[string] ReadLineAsync(System.Threading.CancellationToken cancellationToken)
ReadToEnd                 Method     string ReadToEnd()
ReadToEndAsync            Method     System.Threading.Tasks.Task[string] ReadToEndAsync(), System.Threading.Tasks.Task[string] ReadToEndAsync(System.Threading.CancellationToken cancellationToken)
ToString                  Method     string ToString()
BaseStream                Property   System.IO.Stream BaseStream {get;}
CurrentEncoding           Property   System.Text.Encoding CurrentEncoding {get;}
EndOfStream               Property   bool EndOfStream {get;}

Our StreamReader object has a particularly helpful method called ‘ReadToEnd’. While researching this, I also discovered several other approaches of varying complexity. One approach that I came across required analyzing the bytes to determine what kind of encoding the file employed. For my use case, I only needed the typical UTF-8, which is what StreamReader assumes by default, so I went with the simpler approach. In the interest of completeness, however, I’ve shown the additional steps below.

$stream = $MyEntry.Open()
$memStream = [System.IO.MemoryStream]::new()
$stream.CopyTo($memStream)
$memStream | Get-Member

   TypeName: System.IO.MemoryStream

Name                      MemberType Definition
----                      ---------- ----------
BeginRead                 Method     System.IAsyncResult BeginRead(byte[] buffer, int offset, int count, System.AsyncCallback callback, System.Object state)
BeginWrite                Method     System.IAsyncResult BeginWrite(byte[] buffer, int offset, int count, System.AsyncCallback callback, System.Object state)
Close                     Method     void Close()
CopyTo                    Method     void CopyTo(System.IO.Stream destination, int bufferSize), void CopyTo(System.IO.Stream destination)
CopyToAsync               Method     System.Threading.Tasks.Task CopyToAsync(System.IO.Stream destination, int bufferSize, System.Threading.CancellationToken cancellationToken), System.Threading.Tasks.Task CopyToAsync(System.IO.Stream destination), System.Threading.Ta…
Dispose                   Method     void Dispose(), void IDisposable.Dispose()
DisposeAsync              Method     System.Threading.Tasks.ValueTask DisposeAsync(), System.Threading.Tasks.ValueTask IAsyncDisposable.DisposeAsync()
EndRead                   Method     int EndRead(System.IAsyncResult asyncResult)
EndWrite                  Method     void EndWrite(System.IAsyncResult asyncResult)
Equals                    Method     bool Equals(System.Object obj)
Flush                     Method     void Flush()
FlushAsync                Method     System.Threading.Tasks.Task FlushAsync(System.Threading.CancellationToken cancellationToken), System.Threading.Tasks.Task FlushAsync()
GetBuffer                 Method     byte[] GetBuffer()
GetHashCode               Method     int GetHashCode()
GetLifetimeService        Method     System.Object GetLifetimeService()
GetType                   Method     type GetType()
InitializeLifetimeService Method     System.Object InitializeLifetimeService()
Read                      Method     int Read(byte[] buffer, int offset, int count), int Read(System.Span[byte] buffer)
ReadAsync                 Method     System.Threading.Tasks.Task[int] ReadAsync(byte[] buffer, int offset, int count, System.Threading.CancellationToken cancellationToken), System.Threading.Tasks.ValueTask[int] ReadAsync(System.Memory[byte] buffer, System.Threading.Ca…
ReadAtLeast               Method     int ReadAtLeast(System.Span[byte] buffer, int minimumBytes, bool throwOnEndOfStream = True)
ReadAtLeastAsync          Method     System.Threading.Tasks.ValueTask[int] ReadAtLeastAsync(System.Memory[byte] buffer, int minimumBytes, bool throwOnEndOfStream = True, System.Threading.CancellationToken cancellationToken = default)
ReadByte                  Method     int ReadByte()
ReadExactly               Method     void ReadExactly(System.Span[byte] buffer), void ReadExactly(byte[] buffer, int offset, int count)
ReadExactlyAsync          Method     System.Threading.Tasks.ValueTask ReadExactlyAsync(System.Memory[byte] buffer, System.Threading.CancellationToken cancellationToken = default), System.Threading.Tasks.ValueTask ReadExactlyAsync(byte[] buffer, int offset, int count,
Seek                      Method     long Seek(long offset, System.IO.SeekOrigin loc)
SetLength                 Method     void SetLength(long value)
ToArray                   Method     byte[] ToArray()
ToString                  Method     string ToString()
TryGetBuffer              Method     bool TryGetBuffer([ref] System.ArraySegment[byte] buffer)
Write                     Method     void Write(byte[] buffer, int offset, int count), void Write(System.ReadOnlySpan[byte] buffer)
WriteAsync                Method     System.Threading.Tasks.Task WriteAsync(byte[] buffer, int offset, int count, System.Threading.CancellationToken cancellationToken), System.Threading.Tasks.ValueTask WriteAsync(System.ReadOnlyMemory[byte] buffer, System.Threading.Ca…
WriteByte                 Method     void WriteByte(byte value)
WriteTo                   Method     void WriteTo(System.IO.Stream stream)
CanRead                   Property   bool CanRead {get;}
CanSeek                   Property   bool CanSeek {get;}
CanTimeout                Property   bool CanTimeout {get;}
CanWrite                  Property   bool CanWrite {get;}
Capacity                  Property   int Capacity {get;set;}
Length                    Property   long Length {get;}
Position                  Property   long Position {get;set;}
ReadTimeout               Property   int ReadTimeout {get;set;}
WriteTimeout              Property   int WriteTimeout {get;set;}

$stream.Close()
$bin = $memStream.ToArray()
$bin | Get-Member

   TypeName: System.Byte

Name                 MemberType Definition
----                 ---------- ----------
CompareTo            Method     int CompareTo(System.Object value), int CompareTo(byte value), int IComparable.CompareTo(System.Object obj), int IComparable[byte].CompareTo(byte other)
Equals               Method     bool Equals(System.Object obj), bool Equals(byte obj), bool IEquatable[byte].Equals(byte other)
GetByteCount         Method     int IBinaryInteger[byte].GetByteCount()
GetHashCode          Method     int GetHashCode()
GetShortestBitLength Method     int IBinaryInteger[byte].GetShortestBitLength()
GetType              Method     type GetType()
GetTypeCode          Method     System.TypeCode GetTypeCode(), System.TypeCode IConvertible.GetTypeCode()
ToBoolean            Method     bool IConvertible.ToBoolean(System.IFormatProvider provider)
ToByte               Method     byte IConvertible.ToByte(System.IFormatProvider provider)
ToChar               Method     char IConvertible.ToChar(System.IFormatProvider provider)
ToDateTime           Method     datetime IConvertible.ToDateTime(System.IFormatProvider provider)
ToDecimal            Method     decimal IConvertible.ToDecimal(System.IFormatProvider provider)
ToDouble             Method     double IConvertible.ToDouble(System.IFormatProvider provider)
ToInt16              Method     short IConvertible.ToInt16(System.IFormatProvider provider)
ToInt32              Method     int IConvertible.ToInt32(System.IFormatProvider provider)
ToInt64              Method     long IConvertible.ToInt64(System.IFormatProvider provider)
ToSByte              Method     sbyte IConvertible.ToSByte(System.IFormatProvider provider)
ToSingle             Method     float IConvertible.ToSingle(System.IFormatProvider provider)
ToString             Method     string ToString(), string ToString(string format), string ToString(System.IFormatProvider provider), string ToString(string format, System.IFormatProvider provider), string IConvertible.ToString(System.IFormatProvider provider), string …
ToType               Method     System.Object IConvertible.ToType(type conversionType, System.IFormatProvider provider)
ToUInt16             Method     ushort IConvertible.ToUInt16(System.IFormatProvider provider)
ToUInt32             Method     uint IConvertible.ToUInt32(System.IFormatProvider provider)
ToUInt64             Method     ulong IConvertible.ToUInt64(System.IFormatProvider provider)
TryFormat            Method     bool TryFormat(System.Span[char] destination, [ref] int charsWritten, System.ReadOnlySpan[char] format = default, System.IFormatProvider provider = null), bool ISpanFormattable.TryFormat(System.Span[char] destination, [ref] int charsWri…
TryWriteBigEndian    Method     bool IBinaryInteger[byte].TryWriteBigEndian(System.Span[byte] destination, [ref] int bytesWritten)
TryWriteLittleEndian Method     bool IBinaryInteger[byte].TryWriteLittleEndian(System.Span[byte] destination, [ref] int bytesWritten)
WriteBigEndian       Method     int IBinaryInteger[byte].WriteBigEndian(byte[] destination), int IBinaryInteger[byte].WriteBigEndian(byte[] destination, int startIndex), int IBinaryInteger[byte].WriteBigEndian(System.Span[byte] destination)
WriteLittleEndian    Method     int IBinaryInteger[byte].WriteLittleEndian(byte[] destination), int IBinaryInteger[byte].WriteLittleEndian(byte[] destination, int startIndex), int IBinaryInteger[byte].WriteLittleEndian(System.Span[byte] destination)

I’ll take a moment to pause and explain before wrapping up the second method. The biggest difference here is that, instead of a System.IO.StreamReader object, we’ve created a System.IO.MemoryStream object. The former class is an actual text reader that is able to consume the streamed data and translate it into an in-memory string. The latter class is still technically a stream, but backed by system memory as opposed to the actual file, which lets us close and dispose of the file while we continue to process the content. With the first approach, simply executing the ‘ReadToEnd’ method will produce a string when pulling a text file; point it at a binary file and it will just decode the bytes as if they were text, which is not much use. In the latter approach, everything shown above only gives us a byte array, but this allows us to perform some additional analysis to determine the type of encoding. From there, we can decode the byte array using the correct encoding for that file. ASCII is a subset of UTF-8, and StreamReader checks for a byte order mark by default, so ASCII files and UTF-16 files that carry a BOM should read fine with the first method. Where it can fall down is a file in some other encoding with no BOM, and that is where the second method earns its keep, since we get to inspect the bytes and pick the decoder ourselves. I won’t list out the code for decoding that here, since you can see it in action in the ziphelper module available on the PowerShell Gallery here.
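For a rough idea of what the decoding step involves, here is a minimal sketch. It is not taken from the ziphelper module; it simply leans on StreamReader’s own byte order mark detection and lets the caller choose a fallback encoding for content without a BOM. The function name ConvertTo-TextFromBytes and its Fallback parameter are assumptions made up for this example.

```powershell
# Hypothetical helper: decodes a byte array, honoring a byte order mark if one is present.
function ConvertTo-TextFromBytes {
    param(
        [Parameter(Mandatory)][byte[]]$Bytes,
        # Assumed fallback for content that carries no BOM.
        [System.Text.Encoding]$Fallback = [System.Text.Encoding]::UTF8
    )
    $memStream = [System.IO.MemoryStream]::new($Bytes)
    # The third argument turns on BOM detection; without a BOM the fallback encoding is used.
    $reader = [System.IO.StreamReader]::new($memStream, $Fallback, $true)
    try {
        $reader.ReadToEnd()
    }
    finally {
        # Disposing the reader also disposes the memory stream.
        $reader.Dispose()
    }
}
```

Given the $bin array from above, ConvertTo-TextFromBytes -Bytes $bin would return the text, and a call such as ConvertTo-TextFromBytes -Bytes $bin -Fallback ([System.Text.Encoding]::Unicode) covers the case of UTF-16 content saved without a BOM.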

Going back to our first approach, we can now execute the ReadToEnd method of our StreamReader and store the results in a variable like so.

$content = $reader.ReadToEnd()
$content | Get-Member
$content.GetType()

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     String                                   System.Object
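Pulling the steps so far together, the first approach could be wrapped up along the lines of the sketch below. Get-ZipEntryText is a hypothetical name, and the try/finally blocks are my own addition so that the reader and the archive are always released.

```powershell
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Hypothetical helper: returns the text of one entry without extracting it to disk.
function Get-ZipEntryText {
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string]$EntryName
    )
    $archive = [IO.Compression.ZipFile]::OpenRead((Resolve-Path -LiteralPath $Path).ProviderPath)
    try {
        $entry = $archive.GetEntry($EntryName)
        if ($null -eq $entry) {
            throw "Entry '$EntryName' was not found in '$Path'."
        }
        # StreamReader assumes UTF-8 unless it finds a byte order mark.
        $reader = [System.IO.StreamReader]::new($entry.Open())
        try {
            $reader.ReadToEnd()
        }
        finally {
            # Disposing the reader also closes the entry stream.
            $reader.Dispose()
        }
    }
    finally {
        # Release the file handle on the Zip.
        $archive.Dispose()
    }
}
```

For the sample archive in this post the call would be: $content = Get-ZipEntryText -Path .\Testing.Zip -EntryName 'TestPlugin/TestPlugin.psd1'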

Theoretically, we’re now done…we can dump $content to the screen and see our content and, as shown above, the content is indeed now a string. Unfortunately there is one last gotcha (because of course there is). The key thing that I myself needed to do was to retrieve the ModuleVersion value, which is precisely what ‘Select-String’ was made for. Unfortunately this did not work. Select-String tests each input object as a whole, and it normally receives one string per line, which is how PowerShell pulls text content in when you use Get-Content, unless you add the Raw switch. When pulling in the content Raw, you get a single string object instead of an array of strings representing each row of the file, and that is exactly what ReadToEnd gave us. Select-String does have a Raw switch of its own (in PowerShell 7 and later), but it only changes the output from MatchInfo objects to plain strings and does nothing to split the input, so searching for ‘ModuleVersion’ returned the entire content no matter what I did. Fortunately for me, after some digging, the solution was to leverage the Split method as shown below.

$content.Split("`n") | Select-String -Pattern "ModuleVersion"

   ModuleVersion = '0.1.0'
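The difference is easy to reproduce with a small in-memory string. In the sketch below, $sample stands in for $content and its manifest lines are made up for illustration.

```powershell
# Sample text standing in for $content; the manifest lines are hypothetical.
$sample = "@{`n    RootModule = 'TestPlugin.psm1'`n    ModuleVersion = '0.1.0'`n}"

# One multi-line string is a single input object, so the match hands all of it back.
$whole = @($sample | Select-String -Pattern 'ModuleVersion')
$whole[0].Line -eq $sample

# Split first, and Select-String sees one string per line.
$perLine = @($sample.Split("`n") | Select-String -Pattern 'ModuleVersion')
$perLine[0].Line.Trim()
```

The first comparison evaluates to True because the ‘line’ that matched is the whole string, while the second result is only the ModuleVersion line.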

For those not ‘in-the-know’, the backtick (usually right above the Tab key on US keyboards) followed by ‘n’ indicates ‘New Line’. With the encoding intact, even though we didn’t have separate strings for each line, the New Line values were still present within the file. This can be seen by dumping $content to the screen and noting that all the formatting of the file remains intact. By splitting on the ‘New Line’ character, we essentially converted the value from a single string blob into an array of strings such as Get-Content produces. Once I had that, Select-String was able to process the content normally. This seemed somewhat weird to me at first, since the help says that the ‘Raw’ switch gives the behavior closest to the Unix grep or Windows findstr commands. That comparison is about the output, though: Raw returns the matching strings instead of MatchInfo objects, and a single multi-line string still counts as one ‘line’ of input. Mystery solved.
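One refinement worth considering: a file saved with Windows line endings keeps a trailing carriage return on every line when split on "`n" alone. The sketch below splits on either style and goes one step further by turning the match into a [version] object, which is handy for the ‘is the packaged plugin newer?’ check. The function name Get-ManifestVersionFromText and the regular expression are my own assumptions about a typical manifest layout.

```powershell
# Hypothetical helper: pulls ModuleVersion out of manifest text held in a single string.
function Get-ManifestVersionFromText {
    param([Parameter(Mandatory)][string]$Content)

    # Split on LF or CRLF so Windows line endings leave no trailing carriage return.
    $lines = $Content -split '\r?\n'

    # Anchor at the start of the line so commented-out entries are skipped.
    $pattern = '^\s*ModuleVersion\s*=\s*[''"]([^''"]+)[''"]'
    $match = $lines | Select-String -Pattern $pattern | Select-Object -First 1

    if ($match) {
        # Group 1 is the text between the quotes; the cast makes versions compare numerically.
        [version]$match.Matches[0].Groups[1].Value
    }
}
```

With that in place, something like (Get-ManifestVersionFromText -Content $content) -gt [version]'0.0.9' compares versions as numbers rather than as text.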

Until next time, stay fresh cheese bags!