Further down the AST rabbit-hole…
If you follow this blog, then hopefully you’ve had a chance to dig into some of my previous articles about the Abstract Syntax Tree, or ‘AST’ for short. If you haven’t had a chance yet, feel free to go read them using the links below. Don’t worry…I’ll wait.
- The Trouble with AST in PowerShell
- Read Text Content (w/o Unzipping)
- Parsing PowerShell – A Follow-Up
For today’s post, we’re going to go a bit further down the proverbial rabbit-hole of AST, by looking at the kinds of details you can get from different types of AST language constructs. The goal for the post is not to take a full deep-dive into PowerShell AST. To get into that, I would need to possess a lot more knowledge and experience with both programming and parsing. Instead, our goal today is to cover the specific kinds of information that can be obtained from various AST types (e.g. How is X useful). This, of course, continues to be driven by my efforts at building a new PowerShell module leveraging nested pseudo-modules, which I’ll be covering in greater detail in another blog once it is ready for beta release.
If you would like to take a deeper look into all things AST on your own, I suggest starting with the Microsoft documentation found here. You might also find some value in downloading the ShowPSAst PowerShell module from the PowerShell Gallery. While I don’t think it provides a comprehensive enumeration, it does provide some useful insight in the event you are trying to parse a few things out yourself.
One thing to note is that I will not be covering every available Ast sub-class below, just the ones I think you might be able to find some use case for. I encourage you to do your own explorations and tests, and if you find something I didn’t cover that you think is useful, I’d love to hear about it in the comments.
Getting Started
In general, PowerShell does a really fantastic job with discoverability. Piping just about any object in the shell to the ‘Get-Member’ cmdlet will tell you all kinds of things about that object. You can even take this a step further by adding the ‘-Force’ switch to see any properties and methods that have been hidden from you. I wrote an article talking a bit about this previously (All Your PSBase Are Belong to Us!!). Unfortunately for us, AST is not nearly so forthcoming with its secrets, probably due to the fact that we are essentially taking a peek behind the curtain, so to speak.
For those who are just starting to look into AST for parsing PowerShell, the first step is binding to the content using the ‘System.Management.Automation.Language.Parser’ class. This can be performed in a variety of ways, but the two most common methods are by using a file path, or by parsing the raw content. Both methods are shown below.
#Binding by file path to a PS1 in C:\temp
$myAst = (([System.Management.Automation.Language.Parser]::ParseFile('C:\temp\myScript.ps1',[ref]$null,[ref]$null)).GetScriptBlock()).Ast
#Binding by raw content
$myScriptContent = Get-Content -Path C:\temp\myScript.ps1 -Raw
$myAst = (([System.Management.Automation.Language.Parser]::ParseInput($myScriptContent,[ref]$null,[ref]$null)).GetScriptBlock()).Ast
Obviously I’m chaining calls together to take a couple of shortcuts. If I were to break everything out into individual steps, first I would bind a variable to the parsed input, then I would possibly use a second variable to hold the output of the ‘GetScriptBlock’ method, and finally I would likely use a third variable to bind specifically to the ‘Ast’ property. Strictly speaking, the Parse methods already return a ScriptBlockAst, so the last two steps are a round trip rather than a requirement. I could have used the same technique for ‘myScriptContent’ as well, but I wanted to really highlight the differences between the two approaches.
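For anyone who prefers to see the individual steps, here is a minimal sketch of the broken-out version. The script content and the intermediate variable names ($parsed, $scriptBlock) are made up for illustration, and the content is inlined so the example does not depend on a file in C:\temp.

```powershell
# Hypothetical script content, inlined so the sketch is self-contained
$myScriptContent = @'
function Get-Greeting {
    param([string]$Name)
    "Hello, $Name"
}
'@

# Step 1: parse the raw text; the parser hands back the root ScriptBlockAst
$parsed = [System.Management.Automation.Language.Parser]::ParseInput($myScriptContent, [ref]$null, [ref]$null)

# Step 2: convert that AST into a script block
$scriptBlock = $parsed.GetScriptBlock()

# Step 3: bind to the Ast property of the script block
$myAst = $scriptBlock.Ast

# Both ends of the chain are the same kind of object
$parsed.GetType().Name   # ScriptBlockAst
$myAst.GetType().Name    # ScriptBlockAst
```

Since step 1 already yields a ScriptBlockAst, you can call Find and FindAll directly on $parsed if you don't need the script block itself.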
As alluded to, and then shown in the example above, we have to provide the raw content of our PS1. This is because the Parse methods only accept a single string value in the form of a file path, or file content. When you use ‘Get-Content’ without the ‘-Raw’ switch, what you are actually getting back is an array of strings, with each line represented as a separate string object. For cmdlets that take advantage of the pipeline (e.g. Select-String), this is beneficial, as it allows the PowerShell engine to more efficiently process the content. If you passed the standard content to the parser however, PowerShell would coerce the array into a single string by joining the lines with spaces rather than newlines, so the parser would see a mangled one-line version of your script. You may get parse errors, or worse, a tree that silently differs from your script (everything after the first ‘#’ comment becomes part of that comment, for example).
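A small sketch of the difference, using two in-memory values to stand in for what ‘Get-Content’ returns with and without ‘-Raw’. The two-line "script" is invented for the demonstration.

```powershell
# Simulates Get-Content without -Raw: one string per line
$asLines = '# a comment', "'still here'"

# Simulates Get-Content -Raw: a single string with the newline intact
$asRaw = $asLines -join "`n"

$fromLines = [System.Management.Automation.Language.Parser]::ParseInput($asLines, [ref]$null, [ref]$null)
$fromRaw   = [System.Management.Automation.Language.Parser]::ParseInput($asRaw, [ref]$null, [ref]$null)

# Predicate that matches string constants such as 'still here'
$isString = { $args[0] -is [System.Management.Automation.Language.StringConstantExpressionAst] }

# Count how many string constants survived parsing in each case
$stringsFromLines = @($fromLines.FindAll($isString, $true)).Count
$stringsFromRaw   = @($fromRaw.FindAll($isString, $true)).Count

"Array input: $stringsFromLines string(s) found"
"Raw input:   $stringsFromRaw string(s) found"
```

With the array input, the second line ends up on the same line as the comment and disappears into it, which is why no error is raised even though the tree is wrong.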
Another element that I wanted to cover briefly is the two additional arguments after the content. Both are what programmers call ‘out’ parameters, meaning the method writes results back into variables that you hand to it. The first receives the array of tokens (Token[]) that the parser produced while reading the script, and the second receives any parse errors (ParseError[]). The ‘[ref]’ cast is how PowerShell passes a variable by reference, so that the method can populate the variable itself rather than working on a copy of its value. Passing ‘[ref]$null’, as I did above, simply tells the parser to throw both results away. If anyone with a better grounding in programming wants to chime in within the comments with a better explanation, I would love to hear from you!
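To see what those two arguments can hold, here is a short sketch that passes real variables instead of $null. The broken one-liner and the variable names ($tokens, $parseErrors) are just for illustration.

```powershell
# Variables that the parser will fill in
$tokens = $null
$parseErrors = $null

# Deliberately broken: the closing brace is missing
$brokenScript = 'Get-Process | Where-Object { $_.CPU -gt 10 '

$ast = [System.Management.Automation.Language.Parser]::ParseInput($brokenScript, [ref]$tokens, [ref]$parseErrors)

# The first few tokens the parser saw
$tokens | Select-Object -Property Kind, Text -First 3

# Any problems it ran into, with the line they were found on
foreach ($parseError in $parseErrors) {
    'Line {0}: {1}' -f $parseError.Extent.StartLineNumber, $parseError.Message
}
```

Checking the error collection before calling ‘GetScriptBlock’ is a reasonable habit, since a script that failed to parse cannot be converted into a script block.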
Pieces of the Pie
Naturally we are still dealing with objects, since we are still working with PowerShell, which means we still have properties and methods to deal with. One of the interesting things about working with AST, is that all of the base AST types have the same base properties and methods inherited from the parent AST class, in addition to any item specific ones. Every single one will have the following:
- Properties
  - Extent
  - Parent
- Methods
  - Copy
  - Find
  - FindAll
  - SafeGetValue
  - ToString
  - Visit
From a properties perspective, ‘Extent’ always describes the full span of the item being found: its text, plus where that text starts and ends in the source. For example, if you perform a FindAll from the AST object looking for ‘FunctionDefinitionAst’ items, then you will end up with a collection of items of that type, and the ‘Extent’ property of each will contain the entire string representing that function. The ‘Parent’ property, on the other hand, points at the node one level up the tree. You might expect that a function found in a PSM1 would show the module, or perhaps the script, as the parent. In actuality, the immediate parent of a top-level function is a NamedBlockAst (the implicit ‘end’ block of the script), and the script itself sits one level above that. If the function is the only thing in the file, the Parent and Extent properties will appear to contain the same text, namely the entire function block.
There is clearly a way to walk the tree, so to speak, to show items more cohesively, as this can be seen when using the ShowPSAst module. When you use the Find* functionality however, you get back a flat list of matching nodes rather than the tree itself, so the hierarchy is only visible if you follow each item’s Parent. Just something to keep in mind as you dig in further.
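A quick sketch of walking upward from a found item by following Parent until it runs out. The sample function and the $chain variable are hypothetical.

```powershell
$source = @'
function Get-Thing {
    param($Name)
    "Thing: $Name"
}
Get-Thing -Name 'one'
'@

$ast = [System.Management.Automation.Language.Parser]::ParseInput($source, [ref]$null, [ref]$null)

# Find returns only the first match; the second argument also searches nested script blocks
$function = $ast.Find({ $args[0] -is [System.Management.Automation.Language.FunctionDefinitionAst] }, $true)

# Extent carries position information as well as the text
'{0} spans lines {1}-{2}' -f $function.Name, $function.Extent.StartLineNumber, $function.Extent.EndLineNumber

# Follow Parent until we fall off the top of the tree
$chain = @()
$node = $function
while ($node) {
    $chain += $node.GetType().Name
    $node = $node.Parent
}
$chain -join ' -> '   # FunctionDefinitionAst -> NamedBlockAst -> ScriptBlockAst
```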
ArrayExpressionAst and ArrayLiteralAst
Obviously both of the above will find arrays, but it is the nature of those arrays that paints the difference between them.
#Array Expressions
$myArray = @()
#or
$myArray = @('item1','item2','item3')
#or
$myArray = @((Get-Process).Name)
#Array Literal
'item1','item2','item3' | Some-Cmdlet
#or
$myArray = 'item1','item2','item3'
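If you use the ‘FindAll’ method of our $myAst object to find all the arrays in a PS1, or a script, or a function, you will get back objects that look somewhat like the ones shown above. Note that the two types nest: a comma-separated list wrapped in ‘@( )’ shows up once as an ArrayExpressionAst and again, inside it, as an ArrayLiteralAst. There are some other differences, as one would expect, in terms of available properties as well.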
Both the Expression and the Literal have a ‘StaticType’ property, which will always be a typeof(object[]). This is because arrays are not strongly typed, and can therefore hold multiple types of item at once.
The Expression has an additional property, which is ‘SubExpression’. This holds the block of statements inside the ‘@( )’, so it corresponds to the content within the parentheses. In my second example above, SubExpression wraps a single statement, the comma-separated list of ‘item1’, ‘item2’, and ‘item3’, which is itself an ArrayLiteralAst; the three items are not exposed as separate entries at this level.
The Literal also has a unique property, which is ‘Elements’. When you pull up the properties of the Elements property, you will find that it is a collection of expressions, one per item in the array. For a list of quoted strings, each element is a StringConstantExpressionAst that exposes its text through a ‘Value’ property; other kinds of item (variables, numbers, nested arrays) show up as their own expression types.
The biggest use case I can see for this one is in finding any collections of objects. Limited utility, but possibly useful information if generating documentation for the script or function.
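Here is a rough sketch that pulls both array types out of a two-line sample and looks at the properties discussed above. The sample assignments and variable names are made up.

```powershell
$source = @'
$wrapped = @('item1','item2','item3')
$bare = 'item1','item2','item3'
'@

$ast = [System.Management.Automation.Language.Parser]::ParseInput($source, [ref]$null, [ref]$null)

$expressions = @($ast.FindAll({ $args[0] -is [System.Management.Automation.Language.ArrayExpressionAst] }, $true))
$literals    = @($ast.FindAll({ $args[0] -is [System.Management.Automation.Language.ArrayLiteralAst] }, $true))

# One @( ) expression, but two literals: the bare list and the list inside the @( )
"ArrayExpressionAst count: $($expressions.Count)"
"ArrayLiteralAst count:    $($literals.Count)"

# SubExpression is a block of statements, not the individual items
$expressions[0].SubExpression.GetType().Name

# Elements holds one expression per item
foreach ($element in $literals[0].Elements) {
    '{0}: {1}' -f $element.GetType().Name, $element.Value
}

# Both report the same static type
$expressions[0].StaticType.FullName
$literals[0].StaticType.FullName
```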
AssignmentStatementAst
This one is pretty straight-forward. Using find with this class will return every assignment statement (e.g. $myVar = myValue or $myVar += $myItem, or anything that uses similar assignments) in the searched content. It contains properties that will show the Operator, as well as Left and Right values, showing what is on either side of the operator. The remaining property is ErrorPosition, which is the extent PowerShell points at when it reports an error against the assignment; as far as I can tell, this is generally the position of the operator itself.
Brief aside on assignments: a classic source of errors is a command that returns only one item, so you don’t have an actual ‘array’ to index into. If you always use $myArr = @('mystring') when setting the value, then it forces the item to be an array, even if there is only a single value in it. This allows you to process the variable using your choice of looping construct, regardless of whether you get a single string back, or dozens of objects.
As far as use cases, this could readily be used to compile a list of all unique variable names, identify any variable name re-uses (not a great practice), or find places where someone is leveraging ‘$_’ in a loop rather than using something like ‘foreach($item in $items)’. The latter of these is more readable in my opinion. Another possible use case is as a step in mapping the number of dependencies on a given variable by finding all the places it gets used, possibly via the ExpressionAst or the more specific VariableExpressionAst sub-classes. With the latter of those two however, you will likely need to traverse up the tree by accessing the Parent to see where it is specifically being used, though it’s good for getting a count (e.g. this variable is found X number of times, then subtract the number of assignments made to get the number of dependent uses).
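As a sketch of the first two use cases, the following lists every assignment in a small sample and then picks out variable names that are assigned more than once. The sample script and the $reused variable are invented for the example.

```powershell
$source = @'
$total = 0
$names = @('a','b')
$total += 5
$total = $total * 2
'@

$ast = [System.Management.Automation.Language.Parser]::ParseInput($source, [ref]$null, [ref]$null)
$assignments = @($ast.FindAll({ $args[0] -is [System.Management.Automation.Language.AssignmentStatementAst] }, $true))

# One row per assignment: what is on each side, and which operator joins them
$assignments | ForEach-Object {
    [pscustomobject]@{
        Line     = $_.Extent.StartLineNumber
        Left     = $_.Left.Extent.Text
        Operator = $_.Operator
        Right    = $_.Right.Extent.Text
    }
}

# Variable names that are assigned more than once
$reused = @(
    $assignments |
        Where-Object { $_.Left -is [System.Management.Automation.Language.VariableExpressionAst] } |
        Group-Object -Property { $_.Left.VariablePath.UserPath } |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object { $_.Name }
)
"Assigned more than once: $($reused -join ', ')"
```

The Operator values come back as token kinds (Equals, PlusEquals and so on) rather than the literal symbols, which makes them easy to filter on.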
AttributeAst, AttributeBaseAst, AttributedExpressionAst, ParamBlockAst, and ParameterAst
The heading lists five classes, but I’ll start with the first two, AttributeAst and AttributeBaseAst. The former has both NamedArguments and PositionalArguments as properties, while the latter has only TypeName. As one might expect however, AttributeAst derives from the ‘Base’ class, so it carries TypeName as well. The key difference is the focus of each.
AttributeBaseAst is essentially focused on any types or type accelerators that are attached to something in your content. Essentially anything within square brackets that decorates a parameter, variable, or expression, such as [PSCustomObject], [string], or [switch], but also things like [Parameter()] and [CmdletBinding()]. The sole property, TypeName, essentially shows the type of the object.
The ‘AttributeAst’ is more focused on items that have the ability to provide arguments within the definition. This means items like [Parameter()] or [CmdletBinding()], or essentially anything else that has a set of parentheses inside of the square brackets. If you aren’t parsing an advanced function with those types of blocks present, then you won’t find any entries of this type. Since it is fairly common to skip specifying [Parameter()] unless you need to set one as Mandatory or the like, you may not always find this particular class useful.
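To make the distinction a bit more concrete, here is a small sketch run against a made-up advanced function. It lists the attributes that take arguments, then counts everything that falls under the base class.

```powershell
$source = @'
function Test-Thing {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory = $true, Position = 0)]
        [string]$Name
    )
}
'@

$ast = [System.Management.Automation.Language.Parser]::ParseInput($source, [ref]$null, [ref]$null)

# Attributes with parentheses: [CmdletBinding()] and [Parameter(...)]
$attributes = @($ast.FindAll({ $args[0] -is [System.Management.Automation.Language.AttributeAst] }, $true))
foreach ($attribute in $attributes) {
    $named = @($attribute.NamedArguments | ForEach-Object { '{0}={1}' -f $_.ArgumentName, $_.Argument.Extent.Text })
    '{0} ({1})' -f $attribute.TypeName.Name, ($named -join ', ')
}

# The base class also picks up plain type constraints such as [string]
$allBase = @($ast.FindAll({ $args[0] -is [System.Management.Automation.Language.AttributeBaseAst] }, $true))
$allBase | ForEach-Object { '{0}: {1}' -f $_.GetType().Name, $_.TypeName.Name }
```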
Now, if you read my last article on AST, you’ll likely notice that there are other classes that get somewhat similar information; the ParameterAst and the ParamBlockAst classes.
The latter of these two classes essentially gets the entire Param() block as a whole. Unless you have additional child functions defined inside your function that also have parameters, or you have multiple functions defined in the content you are parsing, you will generally only get one item back. Binding to this block will allow you to get to the individual parameters, and their various settings, by traversing the various child properties. While this technically ‘works’, it can also be unpleasant if you are trying to get to a single text value specifying the name of a parameter. This is because the single ParamBlockAst class contains what amounts to ParameterAst objects for each defined parameter. I say unpleasant because, even when binding directly to the ParameterAst, getting just the name of the parameter as a string requires the path ‘Name.VariablePath.UserPath’. Binding to the block instead means you must first traverse the block to get to the Parameter, and then also traverse the parameter tree to get to the name as a string.
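Both routes to the parameter names can be sketched as follows. The function and its parameters are hypothetical.

```powershell
$source = @'
function Set-Widget {
    param(
        [string]$Name,
        [switch]$Force,
        $Anything
    )
}
'@

$ast = [System.Management.Automation.Language.Parser]::ParseInput($source, [ref]$null, [ref]$null)

# Route 1: bind directly to each ParameterAst
$parameters = @($ast.FindAll({ $args[0] -is [System.Management.Automation.Language.ParameterAst] }, $true))
$parameters | ForEach-Object {
    [pscustomobject]@{
        Name = $_.Name.VariablePath.UserPath
        Type = $_.StaticType.Name
    }
}

# Route 2: bind to the block first, then traverse down to each parameter
$paramBlock = $ast.Find({ $args[0] -is [System.Management.Automation.Language.ParamBlockAst] }, $true)
$namesFromBlock = @($paramBlock.Parameters | ForEach-Object { $_.Name.VariablePath.UserPath })
$namesFromBlock -join ', '   # Name, Force, Anything
```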
With regards to use cases, the utility here is low, outside of documentation and validation. I suspect that many tools leverage these classes for things like generating the minimal help information when no help info exists.
CommandBaseAst, CommandAst, CommandElementAst, CommandExpressionAst, and CommandParameterAst
In a similar vein to the AttributeBaseAst class above, the CommandBaseAst is the root type for the CommandAst. I personally haven’t found a lot of utility for this one, unless you simply wish to inventory all the command calls, or to traverse the tree.
The CommandAst sub-class could be particularly handy, depending on what you were trying to accomplish. This class has three properties of its own: CommandElements, DefiningKeyword, and InvocationOperator. The most interesting of these, in my own opinion, is the CommandElements property, which itself contains CommandElementAst and possibly other sub-class types, depending on the command and how it is being called. What this Ast essentially gives you is the ability to identify any call to a function or cmdlet made from within the main Ast, along with any parameters used, and the variables or values being passed to each parameter.
Some ways this might be helpful would be things such as checking for the use of aliases, or figuring out module dependencies. For example, if someone is calling the ‘Connect-ExchangeOnline’ cmdlet, you would be able to determine that this function depended on the ExchangeOnlineManagement module. It would therefore be a relatively simple process to write a function that the author of a script could run to analyze their script for dependencies before distributing the script.
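A minimal sketch of that kind of dependency check might look like the following. The sample script is invented, and what ‘Get-Command’ reports depends entirely on which modules are installed on the machine running it, so treat the lookup as a starting point rather than a finished tool.

```powershell
$source = @'
Connect-ExchangeOnline -UserPrincipalName 'admin@contoso.example'
Get-Mailbox -ResultSize Unlimited | Where-Object { $_.IsShared }
gci C:\temp
'@

$ast = [System.Management.Automation.Language.Parser]::ParseInput($source, [ref]$null, [ref]$null)
$commands = @($ast.FindAll({ $args[0] -is [System.Management.Automation.Language.CommandAst] }, $true))

# Unique command names used in the script
$commandNames = @($commands | ForEach-Object { $_.GetCommandName() } | Sort-Object -Unique)

# Parameters supplied to the first command
$firstCommandParameters = @(
    $commands[0].CommandElements |
        Where-Object { $_ -is [System.Management.Automation.Language.CommandParameterAst] } |
        ForEach-Object { $_.ParameterName }
)

# Try to resolve each name; unresolved commands simply come back with empty values
foreach ($name in $commandNames) {
    $resolved = Get-Command -Name $name -ErrorAction SilentlyContinue | Select-Object -First 1
    [pscustomobject]@{
        Command = $name
        Type    = $resolved.CommandType
        Module  = $resolved.ModuleName
    }
}
```

Anything that resolves with a type of Alias is a candidate for the alias check, and anything that does not resolve at all points at a module that is not installed locally.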
The CommandExpression and CommandParameter Ast sub-classes are obviously closely related, however I haven’t found much in the way of utility for these as of yet.
StatementAst, LoopStatementAst, LabeledStatementAst, DoUntilStatementAst, DoWhileStatementAst, WhileStatementAst, ForEachStatementAst, and ForStatementAst
Obviously these are the Ast sub-classes representing the various looping constructs in PowerShell, not including the Process block. All of these sub-classes have essentially the same set of properties: Body, Condition, Extent, Label, and Parent. Some classes have additional properties. ForEachStatementAst adds Flags, Variable, and ThrottleLimit, while ForStatementAst adds Initializer and Iterator.
As is common with many of the other classes represented here, the StatementAst is essentially a generic parent for the other items, with LoopStatementAst being specific to all of the looping constructs. Finding all StatementAst objects will also net you any other class with ‘statement’ in the name as well (Block, Break, Continue, Data, Error, Exit, If, etc.). If you look at the LabeledStatementAst, you are looking specifically at constructs like Switch, While, etc.
As with a number of other areas, I think the biggest utility for these is in validating preferred patterns and practices. Keep in mind that ForEach-Object is a cmdlet rather than a loop statement, so if you want to ensure others avoid passing things directly to ForEach-Object, you would leverage CommandAst instead.
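Here is a rough sketch of both checks: an inventory of the loop statements in a sample, followed by a CommandAst search for ForEach-Object and its aliases. The sample loops are made up for the example.

```powershell
$source = @'
foreach ($item in 1..3) { $item }
for ($i = 0; $i -lt 3; $i++) { $i }
while ($false) { }
do { $x = 1 } until ($true)
1..3 | ForEach-Object { $_ }
'@

$ast = [System.Management.Automation.Language.Parser]::ParseInput($source, [ref]$null, [ref]$null)

# Every loop statement, whatever its flavour
$loops = @($ast.FindAll({ $args[0] -is [System.Management.Automation.Language.LoopStatementAst] }, $true))
$loops | ForEach-Object {
    [pscustomobject]@{
        Line      = $_.Extent.StartLineNumber
        Type      = $_.GetType().Name
        Condition = "$($_.Condition)"
    }
}

# ForEach-Object is a command, so it has to be found through CommandAst
$pipelineLoops = @($ast.FindAll({
    $args[0] -is [System.Management.Automation.Language.CommandAst] -and
    ($args[0].GetCommandName() -in 'ForEach-Object', 'foreach', '%')
}, $true))
"Pipeline loops found: $($pipelineLoops.Count)"
```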
VariableExpressionAst
The last Ast that I will be covering in this post is the VariableExpressionAst, which I mentioned briefly in the AssignmentStatementAst section above. This Ast is a more specific instance of the ExpressionAst that is focused on variable usage. I believe it will also capture the variable assignment instance, though not the substance of the assignment.
As mentioned in the prior section, this is mostly around documenting dependencies. Being able to tell how many times a given variable is referenced, which includes splatting by the way, can be useful in determining how broken something might end up being if you mess with what is being stored in the variable. At the very least, it tells you how much testing you might need to do.
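The counting idea can be sketched like this. The sample script is hypothetical, and the "dependent uses" figure is simply total references minus assignments, as described earlier.

```powershell
$source = @'
$params = @{ Name = 'notepad' }
$procs = Get-Process @params
$procs | ForEach-Object { $_.Id }
Write-Output $procs.Count
'@

$ast = [System.Management.Automation.Language.Parser]::ParseInput($source, [ref]$null, [ref]$null)
$variables = @($ast.FindAll({ $args[0] -is [System.Management.Automation.Language.VariableExpressionAst] }, $true))

# References that are the target of an assignment
$assigned = @($variables | Where-Object {
    $_.Parent -is [System.Management.Automation.Language.AssignmentStatementAst] -and
    [object]::ReferenceEquals($_.Parent.Left, $_)
})

# Total references per variable, minus assignments, gives the dependent uses
$usage = $variables | Group-Object -Property { $_.VariablePath.UserPath } | ForEach-Object {
    $name = $_.Name
    $assignmentCount = @($assigned | Where-Object { $_.VariablePath.UserPath -eq $name }).Count
    [pscustomobject]@{
        Variable      = $name
        References    = $_.Count
        Assignments   = $assignmentCount
        DependentUses = $_.Count - $assignmentCount
    }
}
$usage

# Splatted references are flagged on the node itself
$splatted = @($variables | Where-Object { $_.Splatted })
"Splatted: $(($splatted | ForEach-Object { $_.VariablePath.UserPath }) -join ', ')"
```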
Wrapping Up
Obviously there are a LOT of Ast classes I didn’t cover. This is because, as mentioned above, I tried sticking to the ones that I found usage for in my own work.
If enough people write into the comments asking about other classes, or offering info, or if I suddenly find new utility, I may post a follow-up to this article in the future. In the meantime, that’s all from me for now.
Stay fresh cheese bags!