I’m Still Here, ESAE, and Other Stuff

So I know it’s been a while since I posted…right after I got started, I up and stopped posting for…well…months. The truth is there were quite a number of things that got in the way for a bit. First, I went on a long overdue vacation to Hawaii (highly recommend btw) back in mid-February. Right after that, I accepted an offer to join a new company, so I had all the normal stuff you’d expect in wrapping things up for one employer and ramping up at the other. On top of that, my new employer has a LOT more rules on things like blogs, so I had a bit of an approval process to go through, and then the pandemic, and…and…well you get the idea.

In case you were worried about whether I’ll be finishing my post series that is only a few articles in, worry not! That content is still underway, and I’ll get those posts up as soon as I get them all vetted for compliance with the requirements of my employer. The challenge is one of independence, as my new employer does tax audits (among many other things), so it’s very important that I ensure my posts aren’t showing partiality to any particular vendors…particularly Microsoft…which is slightly awkward…since that’s what I specialize in…but I digress. So now you know, and if you weren’t worried about it…well…carry on I suppose?

So let’s get into the real topic for today’s post…ESAE. Don’t know what that is? It stands for Enhanced Security Administrative Environment, and it’s a ‘zero trust’ framework introduced by Microsoft, and briefly pushed pretty hard by MS and their partners. You may also have heard it referred to as ‘red forest’, which is a nickname largely adopted due to the coloring in presentations MS used, which depicted a master administrative forest in red, and a second production forest in which all your users and resources generally lived, colored gold, thus Red Forest, and Gold Forest. In addition to the two forests, the model also called for separating objects within the production forest into risk tiers, with DCs and the like being in ‘Tier-0’, servers and most data in ‘Tier-1’, and everything else in ‘Tier-2’. The quicker of you may have noticed that I am using the past tense here…it’s not that MS has dropped the idea of ESAE exactly…it’s just they no longer use any of those terms, nor are they developing the framework any longer. They still provide guidance docs that leverage a lot of the elements from ESAE, but that’s it. I have absolutely ZERO inside information, but I suspect that this is largely due to the current direction of the company, which appears to be ‘who needs on-premises anything…just use Azure for ALL the things!!’, but I think it also has to do with the fact that their model had entirely too many flaws (shots fired!!). Yes, I said it.

Don’t get me wrong, I’m a fan of the zero trust model, and have been since long before it was cool (way back in 2003). I’m also a fan of a lot of the concepts introduced with ESAE, and their more recent guidance on securing privileged access (https://docs.microsoft.com/en-us/windows-server/identity/securing-privileged-access/securing-privileged-access). If you do some searches on the web, you’ll find a lot of people that have posted various concerns or problems with regards to ESAE, from security researchers to admins. Most argue that it’s not effective at stopping the main thing it was supposed to (pass-the-hash), while others argue that it isn’t realistic to use…MS tried to make it one-size-fits-all and that just doesn’t work. From the perspective of not being effective, the fact is that ESAE only works for its primary focus when it is consistently and thoroughly implemented and maintained which, without compensating controls and tools, is an exceedingly difficult prospect. The second part of the problem is the implementation…you can’t make EVERY account that has even the slightest elevated access be tagged as ‘Tier-0’ in a global organization without also increasing the number of so called ‘gold card admins’ to support those accounts. The more ultimate admins you have, the more risk to your environment. So, in the end, you either end up with a model that is easy to understand and maintain, but is not really any more effective than it was before, or you end up with something too complex to maintain.

So, how do we fix this problem? Unfortunately, the only answer is to add more granularity and increase complexity, but only in a logical manner, with clear rules and guidance driven by the assessed organizational risk, rather than trying to make a ‘one-size-fits-all’ model. Of course, the more complexity, no matter how logically driven, the more pieces you have to build and maintain to achieve the required granularity, which increases risk of mistakes from simple human error. A lot of this, of course, comes down to underlying frameworks and tools that I cover extensively in my first blog entry (https://mer-bach.com/2020/01/04/maturing-your-organizational-ad-security-posture-part-1/), and will continue to cover in that series as it moves forward.

Outside of those details, the next important detail is that of the delegation model itself…that is, how will you delegate administrative control over objects in a way that limits risk. This is a particularly big deal when you realize that, according to a 2018 Verizon study, nearly 40% of all attacks involve internal threat actors. Clearly, those admins that you make work ridiculous hours month after month, with no resources and no reprieve, don’t all have the absolute best interests of your company in mind at all times…after you fail to give them a raise…again…that is so crazy, right? Before you go off, no, I’m not saying that YOU would ever do such a thing, because I know that YOU have morals and a driving passion to do better…why else would you be reading this blog entry after all? What all this means though, is that you need to find a way to compartmentalize access so that when someone ELSE decides to give the ultimate resignation, or make that extra cheddar, or (let’s face it) someone just completely flubs something, the larger environment is protected.

Since we don’t want to deal with individual object level delegations (you do not…trust me), that means having a *shocked face* NOT flat OU structure upon which to hang your delegations. Sure, you’ve got separation by risk tiers (Tier-0, Tier-1, Tier-2), whether it’s a privileged or regular object, and probably object type (Users, Computers, Groups)…what more do you need right? What if you are a global company that has multiple helpdesks that support different business groups, or what if you outsource your helpdesk functions to another company (that may themselves outsource somewhere else)? Do you really want those users to have the ability to reset the password for the VP of HR or Finance? I suppose you could put those accounts in a different tier, but if you take the MS model, all of these admins are managed by your Tier-0 admins, which are ideally JUST your Gold cards…you really want 2-4 people providing support for hundreds of passwords? I mean, you could I guess…but I wouldn’t recommend it. So, that means we need a more complex structure than just three levels, and I’m sure you’re hoping that I will give you the magic structure that will meet the needs of every organization…sorry, but I’m going to have to disappoint you, because it doesn’t exist. What I can provide however, is a framework I’ve developed over the last 10 years or so to hopefully move you in the right direction, and maybe some tooling to help.

The very first part of the framework is about trust levels, which is going to be entirely specific to each organization as far as complexity, but we’ll start simple with a four-level structure:

  • None – Users man…ALL of them. No matter how smart someone is (or thinks they are), people do stupid stuff with their computers, and this should be the trust level for most of the org
  • Low – Ok, so maybe you’ve got some basic admins that are ok, and you can trust them a little bit more (like a really little), which is where we’d put maybe our desktop support or regional helpdesk folks
  • Medium – These guys…ok these guys don’t completely suck and, if properly trained (you ARE providing regular security training right? I mean, beyond just how to set a good password?), can be trusted with a fair bit more, so this is likely to be our server admins
  • High – These are your elites…your best trained (and hopefully best paid because…well…talent retention has a price), and these should be the few, the proud, the Gold Card Admins

In some orgs, I also have a Very High and sometimes a Very Low in the mix as well, depending on the size of the organization and how I want to delegate things. This trust is the basis for defining the limits we’ll use for our roles and access down the road.
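
To make the ordering concrete, here is a minimal Python sketch of the four starter trust levels as an ordered type. The role names and the `meets` helper are purely illustrative assumptions on my part, not something out of any product or tool; swap in whatever roles your org actually has.

```python
from enum import IntEnum


class Trust(IntEnum):
    """The four starter trust levels, ordered so they can be compared."""
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical role names, mapped the way the list above describes them.
ROLE_TRUST = {
    "standard_user": Trust.NONE,
    "desktop_support": Trust.LOW,
    "regional_helpdesk": Trust.LOW,
    "server_admin": Trust.MEDIUM,
    "gold_card_admin": Trust.HIGH,
}


def meets(role: str, required: Trust) -> bool:
    """True if the role's trust level is at least `required`.

    Roles we have never heard of fall back to NONE, i.e. no trust by default.
    """
    return ROLE_TRUST.get(role, Trust.NONE) >= required
```

Using an ordered type means adding a Very Low or Very High later is just a matter of adding members and renumbering, without touching the comparison logic.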

I know what you’re likely thinking “I thought we were talking about OU structure”…we are…if you aren’t sure how access applies to OU structure, I have unfortunate news for you…you need more training…like…a LOT more. An OU, despite its appearance in the MMC, is not a simple folder to allow you to organize things to make them easier to find (I know…shocking right?). The purpose of an OU, and what should drive the structure, is delegation of control…that’s all…not GPO assignment, not organization, just delegation. If you put the right data on your object attributes in a uniform manner, and keep it current, you shouldn’t need a folder for it…that’s what metadata filter views are for…sorry…kind of a pet peeve of mine…I’ll get back on track now.

Once you have your trust levels defined, you use that to determine what kind of objects each level of trust should be permitted to manage within each risk tier…not the permissions, just what kind of object…standard users, admin users, servers, file shares, distribution groups, etc. Once you have that, then you have to decide if you want to apply a limit to the objects that can be managed within that risk tier. This is largely going to depend on how you set up your trust levels and applied them to your people…should George from accounting, who has a Low trust level in Tier 2, be allowed to administer all distribution groups, or just Accounting specific distribution groups, or perhaps just the Accounting specific distribution groups for the manufacturing line of business? Obviously this can get out of hand pretty quickly, and this is where the first painful cut has to be made. While we’ve already shown why flat has challenges, we also don’t want fifty levels of depth either, and we also have to keep in mind that the key is consistency…there should be no one-off structures…anywhere…ever. For one, that’s just lazy design, and for another, the more one-off exceptions you have, the harder it is to keep track of what should and should not be there. Attackers like to hide in plain sight when they can, and the more chaotic the environment, the easier that is to do…plus automating, which is absolutely critical to our success here, is FAR easier to build when you keep things relatively standardized. It doesn’t mean you will NEVER have an exception, but this should be avoided…it’s not efficient (don’t be a lazy admin…I believe in you!!). What this all boils down to is that some people aren’t going to be able to do things they used to do anymore….either for the sake of consistency, or fairness, or security…or Narnia…er….
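
If it helps to see the George question as data, here is a rough Python sketch of that exercise: a list of "this trust level, in this tier, may manage this kind of object, optionally limited to this compartment". Every name in it (`Grant`, `GRANTS`, `may_manage`, the compartment strings) is a made-up assumption for illustration, not a real API.

```python
from typing import NamedTuple, Optional


class Grant(NamedTuple):
    tier: int                # AD risk tier: 0, 1, or 2
    trust: str               # trust level required to hold the grant
    object_kind: str         # what KIND of object, not which permissions
    scope: Optional[str]     # compartment limit; None means the whole tier


# Hypothetical grants. George (Low trust, Tier 2) is limited to Accounting.
GRANTS = [
    Grant(2, "Low", "distribution_group", "Accounting"),
    Grant(2, "Low", "standard_user", "Accounting"),
    Grant(1, "Medium", "server", None),
]


def may_manage(tier: int, trust: str, object_kind: str, compartment: str) -> bool:
    """True if some grant covers this tier, trust level, object kind, and compartment."""
    for g in GRANTS:
        if (g.tier, g.trust, g.object_kind) == (tier, trust, object_kind):
            if g.scope is None or g.scope == compartment:
                return True
    return False
```

Note there is no "exception" field anywhere. If a one-off is truly needed, it has to show up as its own row, which keeps it visible instead of buried.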

So, let’s cut to the quick…or…well as quick as possible considering this blog has already gotten somewhat away from me. My own personal model for OU structure is composed of up to four layers (not to be confused with levels): Tier, Focus, Functional, and Object. Tier and Focus are very similar to the original ESAE approach, with three top-level OUs to represent each AD risk tier, followed by a layer that tells us something about the type of objects. In my own models, I like to use Admin, Standard, and Server (ADM, STD, and SRV respectively). Any objects within the risk tier that have elevated permissions, or require specialized care outside of the standard, go in the Admin focus. Anything else, other than servers, goes into Standard. Servers is where some have a sticking point as people often ask why these can’t just go into Admin, and the answer is to avoid confusion and…again, automation. While YOU know the difference between a server and a workstation, the AD schema doesn’t really care except in very specific circumstances, like DCs. Since we will have Privileged Access Workstations (PAWs) in the Admin structure (you are using PAWs, or a PAW equivalent…right? Of course you are), we don’t want confusion between the two, since PAWs are hardened much more than is practical for most servers, as well as a few other things we’ll get into in a bit….for right now, just accept that there is a Server focus.
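
For the visual folks, here is a small Python sketch of how the four layers stack into a distinguished name. The `Tier-N` naming, the `example.com` domain, the functional codes, and the `ou_dn` helper are all assumptions for illustration; your conventions will differ, and the functional layer itself is covered next.

```python
FOCUSES = {"ADM", "STD", "SRV"}                  # Admin, Standard, Server
OBJECT_TYPES = {"Users", "Computers", "Groups"}  # illustrative subset of types


def ou_dn(tier, focus, functional, object_type, domain="example.com"):
    """Build the DN of an object-type container from the four layers.

    `functional` is a top-down list of functional OU names (it may be empty).
    """
    if tier not in (0, 1, 2):
        raise ValueError(f"unknown risk tier: {tier}")
    if focus not in FOCUSES:
        raise ValueError(f"unknown focus: {focus}")
    if object_type not in OBJECT_TYPES:
        raise ValueError(f"unknown object type: {object_type}")

    # Top-down order: Tier, Focus, Functional..., Object
    layers = [f"Tier-{tier}", focus, *functional, object_type]
    # A DN is written leaf-first, so reverse the layers before joining.
    rdns = [f"OU={name}" for name in reversed(layers)]
    rdns += [f"DC={part}" for part in domain.split(".")]
    return ",".join(rdns)


print(ou_dn(2, "STD", ["MFG", "ACC"], "Groups"))
# OU=Groups,OU=ACC,OU=MFG,OU=STD,OU=Tier-2,DC=example,DC=com
```

Because every path comes out of the same function, there is simply no way to produce a one-off structure without someone noticing the function changed.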

The next layer down is the Functional layer…I used to refer to it as the ‘Organizational’ layer, with the intention that this was the layer that was organization specific, but those same people who keep thinking of OUs as folders got confused into thinking it was to organize objects…that is not the purpose of this layer. This is the layer where all that work on trust levels and compartmentalization comes into play…these are the compartments to allow you to limit scope. Now, you could make this VERY flat if you wanted, or skip it, but that means you are at a higher level of risk. What people seem to fail to understand…consistently…is that ESAE is not about keeping the bad guys out…there is nothing, short of complete removal of users and connection to the internet, that will keep the bad guys out (even then, if they want in bad enough, they are GETTING in). All of these are about A) slowing threat actors down when they do get in and B) making it easier to detect threat actors once they get in and C) making it easier to kick them back out quickly, hopefully minimizing the damage done. Remember that stat on insider threats from above? If they are already inside your perimeter, and worse, they already have access, then the only thing left is limiting the damage they can do…and that means limiting the scope of access. I should point out that this scope should be based on what they should do on a daily basis, NOT what Susan might have to do if Jim-Bob gets sick and someone needs access right now. That’s what you build contingency access and emergency plans for…again…don’t be a lazy admin…unless….unless you want it all to be your fault when Susan goes off the rails and decides to delete everyone’s files on her way out the door…nah…why would you be here if that were the case?

I generally recommend that the Functional layer be limited to anywhere from one, to as many as three levels. That said, you can theoretically have as many levels as you want…up to the maximum path depth of 254 characters that is (including the object name)…you just then have to maintain it, though I have a tool that may help with that (more on that in a bit). The key piece here however, is that this layer should be directly related to the Trust level and compartmentalization exercise completed previously. The reason we do that first, rather than later, is to allow us to tie this layer directly to organizational attributes. Not only does this make it easier to maintain the structure, since we have clear criteria on how it gets adjusted, but it allows us to answer that other key purpose of the framework…making it easier to find the bad guys. Yes, the first thing a bad guy is going to do is to perform reconnaissance on the environment, but generally their goal is to identify targets that might have access to the data they want, or the ability to escalate their permissions to get them to the data another way. While they may grab some other basic information as well to enable a better mimic for establishing persistence with their own account, it’s unlikely to be enough to let them fit in entirely…tie organization attributes to functional container location, and use codes…you can see if something sticks out, but it isn’t overly obvious to others…at least, that’s been the feedback I’ve gotten when people run pen tests against my models and I find them…every…single…time…well, eventually.
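
Here is a rough Python sketch of that "does it stick out" check, assuming the layered DN layout described earlier and a hypothetical `dept_code` attribute on each object. The attribute name, the codes, and the sample objects are all invented for the example; the point is only the comparison between an attribute and the functional OUs in the path.

```python
def functional_codes(dn):
    """Return the functional-layer OU names found in an object's DN.

    Assumes the layout CN=object, OU=<object type>, OU=<functional>...,
    OU=<focus>, OU=<tier>, DC=... and does a naive comma split, so it will
    not cope with escaped commas inside names.
    """
    ous = [part[3:] for part in dn.split(",") if part.upper().startswith("OU=")]
    # ous is leaf-first: drop the object-type OU at the front,
    # and the focus and tier OUs at the back.
    return ous[1:-2]


def out_of_place(objects):
    """Return DNs whose organizational code does not match their container."""
    return [
        obj["dn"]
        for obj in objects
        if obj["dept_code"] not in functional_codes(obj["dn"])
    ]


# Hypothetical directory export: the second account was dropped into the
# ACC container, but its attribute code was never set to match.
sample = [
    {"dn": "CN=george,OU=Users,OU=ACC,OU=MFG,OU=STD,OU=Tier-2,DC=example,DC=com",
     "dept_code": "ACC"},
    {"dn": "CN=svc-backup2,OU=Users,OU=ACC,OU=MFG,OU=STD,OU=Tier-2,DC=example,DC=com",
     "dept_code": "IT"},
]
print(out_of_place(sample))
```

An attacker who copies a name and a description from a neighbor account still has to know that the code and the container are supposed to agree, and that is the part that tends to get missed.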

The last layer is, of course, the object type. I keep this strictly to the object types defined within the AD schema. Not only does this make automation easier, but it also increases supportability. This is the ONLY place at which I define, or allow, ACLs to be defined. I use a standardized set of groups that grant specific access, and I use them at only those specific points. If another permission pops up anywhere else, I can immediately notice and alert on it. If another permission is set on the object type container, and the naming convention doesn’t match what’s allowed for that container, it’s another obvious deviation. So, here’s the point at which I tend to get into…enthusiastic discussions…with other admins and customers. Despite the whole ‘AD is not a folder structure’, nearly every organization wants a ‘folder’ to separate some objects, such as Win10 vs Mac or Employees vs Contractors…obviously I am firmly against this practice, and my personal framework provides other ways of addressing this. There is NO reason to create OUs specifically to organize objects for something like this…or to target GPOs…policy preferences and WMI filters…use them…better yet, implement a desired state solution like DSC, Ansible, Chef, etc. The same thing happens for the servers…customers argue all the time that they need to use a container to group all servers of a particular type, or associated to a particular app together…I’m going to again refer you to my other blog article on CMDB (https://mer-bach.com/2020/01/04/maturing-your-organizational-ad-security-posture-part-1/)…please just….just don’t. If, in spite of my caution, you just can’t let go of what you’ve ‘always done’, then at least follow a few simple guidelines;

  • Limit it to a single level only…period
  • Be consistent…deploy the same sub-containers for everyone, and in every tier…same containers, same names, everything
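
Whether or not you keep those sub-containers, the "ACLs only on object-type containers, and only for groups that match the naming convention" rule is easy to check mechanically. Below is a minimal Python sketch, assuming you have already exported the explicit (non-inherited) ACEs as (container DN, trustee name) pairs. The underscore naming convention and the sample data are hypothetical, made up just to show the shape of the check.

```python
OBJECT_TYPES = {"Users", "Computers", "Groups"}  # illustrative subset


def ou_names(dn):
    """OU names in a DN, leaf-first (naive split, no escaped commas)."""
    return [part[3:] for part in dn.split(",") if part.startswith("OU=")]


def expected_prefix(dn):
    """Hypothetical convention: group names start with the top-down OU path."""
    return "_".join(reversed(ou_names(dn))) + "_"


def deviations(aces):
    """Flag ACEs set in the wrong place or granted to an unexpected trustee."""
    findings = []
    for dn, trustee in aces:
        names = ou_names(dn)
        if not names or names[0] not in OBJECT_TYPES:
            findings.append((dn, trustee, "ACE outside an object-type container"))
        elif not trustee.startswith(expected_prefix(dn)):
            findings.append((dn, trustee, "trustee name does not match container"))
    return findings


users_ou = "OU=Users,OU=ACC,OU=STD,OU=Tier-2,DC=example,DC=com"
sample_aces = [
    (users_ou, "Tier-2_STD_ACC_Users_ResetPassword"),   # fine
    ("OU=ACC,OU=STD,OU=Tier-2,DC=example,DC=com",
     "Tier-2_STD_ACC_Users_ResetPassword"),             # wrong layer
    (users_ou, "Domain Helpdesk"),                       # wrong naming
]
for finding in deviations(sample_aces):
    print(finding)
```

The stricter and more uniform the structure, the shorter this kind of check gets, which is exactly why I push so hard on consistency.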

Again, the reasons for this are A) ability to automate…more deviation means more work, and more chance of mistake and B) the more deviations you have, the harder it becomes to spot what doesn’t belong. Now, if you’re anything like me, this makes you groan…deploying all those OUs, all the granular delegation groups (please don’t assign permissions directly to roles), and of course setting up all the ACLs is a LOT of work, particularly if you have thousands of permutations to deal with. Don’t you worry my friends…I’ve got your back. I’ve done the hard work of building a PowerShell module to automate the whole process, for which a pre-release version is available on my GitHub here (https://github.com/merddyin/ADDeploy). The module uses an SQLite DB (no infrastructure required) that holds all the associations. You build out your Functional layer structure in the correct DB table (still working on a setup utility), and the module deploys everything with a single command. Don’t like the conventions I’ve used (maybe you want to call ‘Tier-0’ something like ‘Top-Secret’ or something), you can customize them. Need a new delegation group that wasn’t defined originally, specify the object type and permission using the defined variations, and the tool will create it for every single object container with the correct name and ACLs dynamically….need a new group of properties, add a new definition to the DB table (again, working on a utility). No code should have to be changed…just flip bits in the right DB table to turn items off or on, change naming, whatever, and the module will use it. Have hundreds of thousands of objects (or millions), the module uses runspaces for multi-threading support and leverages memory optimizations to keep the footprint small. I’ve deployed over 120K OUs, each with 142 groups, and thousands of ACLs, in under 4 hours.
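
To give a feel for the general "data drives the structure" idea, here is a tiny standalone Python sketch using an in-memory SQLite table. To be very clear, this is NOT the ADDeploy schema, its commands, or its code; the table, columns, and function names are invented purely to illustrate how flipping a bit in a table can change what gets planned without touching any code.

```python
import sqlite3
from itertools import product

LAYERS = ("tier", "focus", "functional", "object_type")


def open_demo_db():
    """Create a throwaway in-memory DB with one hypothetical settings table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE layer_value (layer TEXT, name TEXT, enabled INTEGER)")
    rows = [
        ("tier", "Tier-0", 1), ("tier", "Tier-1", 1), ("tier", "Tier-2", 1),
        ("focus", "ADM", 1), ("focus", "STD", 1), ("focus", "SRV", 1),
        ("functional", "ACC", 1), ("functional", "MFG", 1),
        ("object_type", "Users", 1), ("object_type", "Groups", 1),
        ("object_type", "Computers", 0),   # switched off in data, not in code
    ]
    conn.executemany("INSERT INTO layer_value VALUES (?, ?, ?)", rows)
    return conn


def planned_ous(conn):
    """Every enabled combination of the four layers, as top-down paths."""
    values = []
    for layer in LAYERS:
        cur = conn.execute(
            "SELECT name FROM layer_value WHERE layer = ? AND enabled = 1 ORDER BY name",
            (layer,),
        )
        values.append([row[0] for row in cur])
    return ["/".join(combo) for combo in product(*values)]


conn = open_demo_db()
print(len(planned_ous(conn)))   # 3 tiers x 3 focuses x 2 functional x 2 types = 36

# Flip one bit and the plan grows; no code changes involved.
conn.execute("UPDATE layer_value SET enabled = 1 WHERE name = 'Computers'")
print(len(planned_ous(conn)))   # 3 x 3 x 2 x 3 = 54
```

The full cross product is a deliberate simplification here (every focus gets every functional container), which happens to line up with the "same containers, same names, everywhere" guideline above.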

There’s lots of work that still needs to be done on it of course, which is why it’s still pre-release, but all the deployment functionality is fully functional based on the model I’ve described above. Making hard cuts in access, adopting uniformity, and employing automation are the keys to making ESAE functional for its designed purpose, and sooooo much more. As I said at the beginning of the article though, the effectiveness of the framework is the same as the effectiveness for anything else…it’s only as good as the effort you put into it (both the direct AND the indirect stuff…like training), and the degree of consistency you employ. If you follow the MS guidance, and you are still vulnerable to the same pass-the-hash attacks, you did it wrong…if that’s the case…call a professional (like me), and have them help you shore things up. Business won’t buy in, the professional will help you build your case…can’t get Susie to give up her rights? Security metrics and risk assessments are your friends…as long as your own house is in order first….no…not your Hogwarts house (nerd…kidding)…your tech house.