I’ll start off by stating that I am very much not a data analyst type of person. That means I don’t know data theory, or the best approaches to building business intelligence dashboards and the like. If you are such a person, and you know of better ways of handling this, please do leave a comment or reach out, as I am always trying to learn more.
That all stated, those who know me, or have followed this blog (even though I rarely post), may find yourselves wondering why I would be posting about Power BI when all my other content is security and PowerShell. My intent for this blog has always been to write about whatever technical items gave me a particularly difficult time, either because I couldn’t find anything on the web, or because everything I did find required some sort of advanced degree to interpret. I generally try to refrain from posting in forums and whatnot to get help, because I’m never satisfied with the typical “Just do X and that will fix it.” or “Why would you do it that way?” kinds of answers. I need to understand why that particular approach is the answer, or why a given practice is wrong. As for “Why Power BI?”, well, I’m a huge fan of using whatever tool is most efficient for getting the job done, and the more I can automate, the happier I’ll be. PowerShell remains my number one choice for a great many things, but using it to automate SharePoint actions or to create reports is just…not efficient. You absolutely can do either of those, but you will have to work twice as hard to do something that better tools already exist for. Power BI, in general, makes a good deal of sense to me, and I’ve found that effective data visualizations go a very long way towards helping C-Suites wrap their heads around my message far more easily than raw data ever does.
Today, I’m writing about some very specific challenges that I encountered, as well as how I (eventually) overcame them. As is so often the case, trying to find answers online was more than a little frustrating, particularly because I had two compounding issues, combined with a looming deadline, all ratcheting up the pressure. I even gave in and tried using GitHub Copilot to solve the problem and, while the solution it gave was partially right, it completely missed other elements entirely. At the end of the day, I only solved the problem because I stumbled across a random one-off comment in a forum somewhere, so I decided to write this post in hopes of helping others. One last note: apologies for the lack of screenshots this time, but the environment where the issues occurred belongs to a customer, the issue is not something I can easily reproduce outside of it without some time, and I really wanted to get this down while it was fresh.
Background Info
We had a base data set with a lot of holes, so we had to collect and incorporate data from a bunch of different sources. Some of this was managed using merges, while the rest was put into a nice star relationship to enable dynamic associations and lookups. In some cases, data needed to be extracted or recombined in various ways, and in a few others, we had to leverage ad-hoc data sources that we built as Power Query defined tables so we could easily adjust them as we identified additional keyword use cases. In addition, we needed to validate information points in two different Active Directory environments, one of which had more than 189,000 objects. While this is not a substantial data set for Power BI, anyone who has ever tried to pull data from AD directly using Power Query will tell you that the process is…not fun…or quick (I’ve got an article coming on that soon). For this reason, among others, we used a number of custom functions that we could call from various data sources to pull in supplemental details. One such function was used to analyze user objects from AD and categorize them based on various keywords and traits from one of those Power Query defined tables I mentioned. Tests of the function on ad-hoc data points all worked, and the column within the preview populated as expected. The Column Quality assessment showed no errors in the 1,000-record preview, but when we tried to apply our changes, that’s where things went sideways.
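For a sense of how these pieces fit together, here’s a hedged sketch of roughly how that categorization function gets invoked as a custom column. Note that ‘CategorizeUser’, ‘Users’, and the column names are all stand-ins for illustration, not the customer’s actual model; the parameter list matches the function excerpted in the next section.

AddCategory = Table.AddColumn(
    Users,
    "Category",
    // Passing null for the final CustomKeywords parameter makes the
    // function fall back to its default keyword table.
    each CategorizeUser([SamAccountName], [CanonicalName], [HasPrivMark], [Description], [EmployeeID], null)
)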
Problem Description
Applying the model resulted in a popup indicating that we had errors on 8,957 rows out of 8,957 rows. When this happens at the apply stage, Power BI offers you the option to see the errors. Doing this opens the Transform Data window and creates a new collection of tables that are supposed to show you the rows with errors. From there, you can usually click on the cell in the column where the error occurred and see the error message. In this instance however, there were no rows shown, though the prior steps all showed data up until the ‘Keep Errors’ step. Again, every test we were able to perform showed everything was good, but each time the model was applied, the same message would appear. If we simply cleared the message, the data would all populate, with the exception of the column we used the function for, which was entirely blank. This only deepened the mystery, as normally all the rows would populate, even if some had errors instead of data points, but we had nothing. We spent more time than I’d like to admit trying everything we could think of to identify the problem and work past it, but nothing was working. Worse, because neither my colleague nor I were Power Query masters, we weren’t really sure how to go about tracking down the problem, and what information was available on the web required quite a bit more background in Power Query to understand.
Due to time constraints, we had to switch tactics, so we pivoted (heh, see what I did there? Data puns!) to using DAX instead to get the job done. This was a bit more cumbersome, but it worked, so we moved on, though with the intent to revisit it later. Unfortunately for me, I encountered a seemingly identical issue with a completely different data set and a much simpler function only a short time later. Persistence paid off, and I was able to identify the cause of the issue, which actually turned out to be two completely different problems compounding on each other.
Issue 1
The first issue was the simpler of the two. As mentioned in the description above, we were using some built-for-purpose keyword tables in our functions. An excerpt from the more complex of the two functions, containing the issue, is shown below.
(UserName as text, CanonicalName as text, HasPrivMark as logical, Description as nullable text, EmployeeID as nullable text, CustomKeywords as nullable table) as text =>
let
    DefaultKeywordsTable = #shared[CategorySource],
    CombinedKeywordsTable = if CustomKeywords = null then DefaultKeywordsTable else Table.Combine({DefaultKeywordsTable, CustomKeywords}),
    //.....
The first part of the problem was the use of ‘#shared[CategorySource]’ to refer to the outside table. I wrote it that way, in part, because I had only ever seen outside tables used in functions when they were specifically passed as arguments, so I had assumed this was some limitation imposed on custom functions versus normal queries. Even though I did have a parameter that could accept such a table, I only provided that one to make it easier for others to customize the keywords on the fly. Since I wasn’t sure of the correct approach, I asked GitHub Copilot to suggest updates that would let the function reference the table, and the ‘#shared’ line is what it recommended.
Obviously, I don’t just take Copilot at its word. I tried to do some spot research, which came up mostly empty because I didn’t know how to word the search to reach content from the real pros. I ended up testing the code, and it seemed to work just fine within the editor, as mentioned before.
The problem here was the ‘shared’ keyword. Apparently, this is a special-use item for accessing some of the mystical inner workings of the Power Query engine. When developing query extensions, the keyword is used to tell the mashup engine (I guess, because data mashup?) that a given query can be referenced from outside of the extension itself. On the inside, most extensions still leverage a lot of Power Query code with some additional syntax elements, and any functions marked with this keyword can be called by using projection. ‘#shared’ itself is a record containing all of the globally available names, and a specific item can be pulled out by placing its name inside square brackets, as I did with the table name.
As a neat side note (at least, I think it is), you can see and browse this namespace by creating a new blank query and setting the source to just ‘#shared’, though only if you have existing data connections loaded, as this is what loads the functions into the global namespace. If you try enumerating this space without any connections, you end up causing a stack overflow. Within this space, you’ll be able to see your own tables and functions, as well as any publicly exposed functions, including many of the core Power Query ones.
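If you want to poke at it yourself, the entire query is essentially one line; paste this into the Advanced Editor of a new blank query (with at least one data connection already loaded, per the caveat above):

let
    // Evaluates to a record of every name in the global namespace,
    // including your own queries and the built-in library functions.
    Source = #shared
in
    Source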
All this sounds great, right? Well, it turns out that these functions are semi-private for a reason, and that reason is that they are limited in how they can interact with, or be used outside of, the Advanced Editor. My first issue was that my code-defined table was not available via this namespace reference as part of the import process, regardless of whether or not it was configured to Load (I tried). Because the table wasn’t available, the function had nothing to use for keyword checks when it was called, which caused the second issue.
Issue 2
One of the admittedly many things I didn’t know about Power Query was how errors are functionally handled within the language, which is through the use of ‘containment-based programming principles’. This nifty tidbit is theoretically covered in the MS docs, but I was apparently not enough of a developer to grasp all the mechanics. As part of my efforts to solve this problem, however, I ended up buying a digital copy of a very excellent book called The Definitive Guide to Power Query (M), which did a much better job of explaining things in simple enough terms that even I could understand them.
Containment-based principles essentially boil down to a sort of hostage negotiation for implementing good coding practices. Within this framework, if you write good code that handles all of your potential errors, then those errors can be ‘contained’, meaning that whatever caused the error is restricted to the individual action and item that caused it. As an example, say you have a data set that contains numbers, but some of the numbers have been stored as text instead of as numerical values. If you attempt to process this column using one of the built-in Power Query functions for working with numbers, it will throw an error when it hits a text-formatted value. In the case of the built-in functions, Microsoft took care of the error handling for you, so you end up with a nice, neat error record with all the details. If, on the other hand, you call a custom function without strong typing for the parameters, you can end up with an uncontained error. When this happens, the Power Query engine essentially rage quits on you: it undoes any previously evaluated parts of the expression, which throws away any work already done, and then it stops processing. If this happens while you are adding a custom column, every row shows as having an error, because none of the rows remained processed. As for why there are no error rows to inspect, that’s because no error was technically recorded by the function to show you.
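To make ‘contained’ concrete, here’s a minimal sketch you can paste into a blank query. The conversion fails on the one bad value, but because the built-in function handles its own errors, the failure stays in that single cell as an error record while the other rows populate normally:

let
    // Three numbers stored as text; the middle one is bad data.
    Source = #table({"Value"}, {{"1"}, {"two"}, {"3"}}),
    // Number.FromText throws on "two", but the error is contained
    // to that one cell; rows one and three still convert fine.
    AddNumber = Table.AddColumn(Source, "AsNumber", each Number.FromText([Value]))
in
    AddNumber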
If you ask me, this is really kind of a raw deal for anyone trying to learn Power Query, because you don’t get an error that tells you anything useful. At best, you might get a data source access issue because you excluded certain elements, or you might get a stack overflow error, but neither of those actually tells you what the problem is.
Solution
The solution to the first issue is really quite simple: don’t use the ‘#shared’ keyword to reference functions or tables unless you know what you’re doing. In case that is in question for you, I’m sad to say that the fact that you are here reading this article likely means that you do not qualify.
Does this mean that you can’t reference tables within your functions directly? Absolutely not. This was simply a case of me overthinking things based on a narrow set of examples. You do need to create a reference handle for use within the function, but you can do this by simply referencing the properly cased name of your table. Using the example from my code sample above, I should have just used ‘DefaultKeywordsTable = CategorySource’, and that part of the problem would never have existed. Try not to overthink, kids…it’s not good for the nerves.
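For clarity, here’s the earlier excerpt with the fix applied; the only change is on the ‘DefaultKeywordsTable’ line:

(UserName as text, CanonicalName as text, HasPrivMark as logical, Description as nullable text, EmployeeID as nullable text, CustomKeywords as nullable table) as text =>
let
    // Reference the code-defined keyword table directly by its
    // properly cased query name instead of going through #shared.
    DefaultKeywordsTable = CategorySource,
    CombinedKeywordsTable = if CustomKeywords = null then DefaultKeywordsTable else Table.Combine({DefaultKeywordsTable, CustomKeywords}),
    //.....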
As to avoiding the second issue, that will take a bit more explaining.
Unfortunately for all of us, MS decided to give us the freedom to be lazy in our coding. What this means is that you can generally avoid the issue by simply wrapping things in a ‘try/otherwise’ construct. This can be wrapped around a single statement, as shown below.
// Single statement
AddNewColumn = Table.AddColumn(
    PriorStep,
    // Table.AddColumn requires a name for the new column; "Verdict"
    // is just a placeholder here.
    "Verdict",
    each try if Text.Contains([SomeColumn], "Stuff") then "Good" else "Big Dummy" otherwise ""
)
With this in place, in the event that ‘SomeColumn’ contains a non-text value that causes Text.Contains to throw up, the problem is basically swept under the rug and ignored. Instead of throwing an error, the ‘otherwise’ branch returns whatever “default” you specify. Depending on your use case, this might be enough, but the problem is visibility. If this becomes your standard approach (I put that sh!!t on every-thang), then you will never know that your data is bad, because you will never see an error pop up for it. In cases where this is inline within your main query and you are using a built-in function, it might be better to let the error surface so you can find the issue and fix it, though this was an admittedly very basic example where it might be fine to suppress. If you want to use a custom function, however, then I would recommend stepping up your game; otherwise, you’ll have headaches like this one to deal with.
Stepping Up
If you really, really want to step up your game, you want to implement real error handling functionality, which means returning rich objects so you know what to look at when things go sideways. There isn’t any sort of prescriptive guidance in this area that I’ve come across, other than “handle your errors”, and handling methods/practices will vary based on the data types involved. For example, when dealing with ‘logical’ items, an empty or null value can ruin your whole day. One method of dealing with this, which doesn’t involve returning an error, is the coalesce operator, a double question mark (??), which in its simplest form can be used to specify a default value. As an example, say you had a data column called ‘hasADObject’ that is supposed to be a logical true/false value. You could refer to the value as ‘hasADObject ?? false’ to indicate that the default value is a negative in the event the passed value is null or missing.
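Here’s a quick sketch of that pattern in a custom column; the table, column, and step names are made up for the example:

// If [hasADObject] is null, the coalesce operator substitutes false,
// so the logical test can't blow up on a missing value.
AddAccountState = Table.AddColumn(
    Users,
    "AccountState",
    each if ([hasADObject] ?? false) then "Verified" else "Unverified"
)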
In terms of more robust error handling, MS once again provides options, ranging from the simple and lazy ‘error “This thing is borked”‘, to generating a full error record, or, even lazier still, just an ellipsis (…). Trapping the error, much like in PowerShell or other languages, can be accomplished using ‘try/catch’, where the ‘catch’ portion receives the necessary information, or even as part of a perhaps more familiar ‘if/else’. Instead of just a simple message, you can provide an automatically formatted record using a built-in function, as shown below.
error Error.Record("Wrong Info Dude!", "That nonsense you tried to pass me was suss.", [MyColumnVal = [MyColumn], Action = "Format MyColumn"])
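And for completeness, a small self-contained sketch of the trapping side using ‘try/catch’. The catch handler receives the original error record (with its standard Reason, Message, and Detail fields), which you can inspect or rewrap with more context before re-raising:

let
    Source = #table({"Value"}, {{"1"}, {"two"}}),
    AddNumber = Table.AddColumn(
        Source,
        "AsNumber",
        // On failure, rewrap the original error with the input value
        // so the error record tells you exactly what to look at.
        each try Number.FromText([Value]) catch (e) =>
            error Error.Record(
                "Bad Value",
                "Could not convert '" & [Value] & "' to a number.",
                [OriginalReason = e[Reason], Input = [Value]]
            )
    )
in
    AddNumber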
For those who really want to go the extra mile, the book I mentioned before provides a really nice example of a custom error function that accepts a set of values as a record and then generates an error report. I plan to study their example to figure out what all it does and then try to make my own at some point. Also, before anyone asks: no, I won’t post their work here for people to use. Buy their book. This is not an advert, and I don’t get any kickbacks; it’s just one of the better books I’ve bought…honestly the first one that was actually helpful, and I have like five or six books on Power Query alone, so I think that says something.
Anyway, that does it for me this time around. Hopefully I’ll have more time soon to keep posting! As I said, I think I’ve got some really good stuff in the barrel for working with AD in Power BI, so stay tuned!