New challenges…
… New skills
So, in my last post I wrote about ESAE and introduced everyone to my budding deployment module, which is still available on my GitHub if you want to check it out. No, unfortunately that project is not yet finished, and the module is still very much late-stage alpha. All I can say is…well…consulting is hard sometimes. I’m absolutely planning to continue building out its capabilities, but I’m having to balance that with the creation of a net-new competing module that I may not get to make open source this time. Time will tell I suppose, but that doesn’t mean I can’t write about some of my trials, tribulations, and discoveries as I pick up some new skills along the way.
In my most recent engagement, I ran into some interesting challenges with my module that brought to light some limitations which, unfortunately, caused me to scramble a bit. For one, this particular customer was not a fan of the whole deployment uniformity approach. Their perspective was that it was too complex, and they didn’t believe me when I tried to point out that too much differentiation would simply create more work and complexity down the line (I know…consultant-world problems). On top of this, they also had not just one or two, but at least FIVE different groups that would require access to the module, and several of these groups were from different outsourcing companies working in a largely remote capacity. These challenges made me think a lot about the design approach of the module and what I wanted it to be able to do. I got it all working for the customer, but I went back to the drawing board for the module. Ultimately, I decided to move toward a centralized, DB-oriented approach for the data elements, using it to fix the data distribution issue and the various security issues that come with localized data, all in one go…and ya know what, let’s make it cross-platform and PowerShell 7 compatible while we’re at it…easy, right?
So, here’s the rub…I’m NOT a developer…shocking, I know. I come from an admin background and, while I’ve learned some nifty tricks over the years working with PowerShell, SharePoint, and various other technologies, trying to put something like this together was going to be…a challenge. Everything I know about ‘real’ programming comes from what I’ve learned over the years working with PowerShell…which means I know about objects, properties, and methods, a VERY little bit about classes, the tiniest sliver of C#, and nothing about .NET Core. There’s LOTS of information out there for getting started on the classic .NET Framework, but that isn’t really cross-platform. There’s also a decent amount of information out there on .NET Core. The problem with both options, though, is that almost everything I could find assumes you already know how to develop…sure, there are a few ‘hello world’ tutorials, but nothing that was really focused on a programming newb like me.
So this blog post is going to serve as a posterity entry for things I’ve learned this week, and I’ll do more entries as I collect enough to write about. Hopefully I’ll finish the series before I get pulled in another direction this time lol.
I decided to use a SQL Server on Linux Docker container for my new central data source. It’s portable everywhere and supports everything I’m hoping to do with it down the road. This could just as easily be SQL in Azure, or SQL Compact/Express in Docker, and at some point I may do a post on my data architecture, but not today.
Once I had the data source worked out, I needed to figure out how I was going to connect to it from PowerShell in a cross-platform supported manner. Yes, I could have used one of the several SQL modules out there, but that would have created module dependencies I’m hoping to minimize this time around. I also wanted to steer clear of having a bunch of SQL statements in the module code, obfuscate things a bit more, and add some security, so I opted for a RESTful API web service. I elected to use the latest and greatest version, which at the time of writing is .NET 5 (the ‘Core’ branding gets dropped with 5, but it’s the same cross-platform lineage), and to leverage Entity Framework Core instead of ADO.NET or ODBC as my data access layer. From my research, this would theoretically be easier than the legacy methods, and it would also provide better future-proofing and agility in terms of the database back end.
I could have used VSCode or Visual Studio Community, but I instead opted to use full Visual Studio, since my company is nice enough to pay for a subscription. I configured it for C# web development for the initial setup and also added the SQL development components. Once I had done that, I created a new project using the ASP.NET Core Web Application template…I already had a SQL Server 2019 on Linux Docker image up and running and had already created my database, not realizing that Entity Framework prefers a ‘code-first’ approach. I also grabbed and installed a copy of the SDK for 5.0.100, available here (https://dotnet.microsoft.com/download/dotnet/5.0), since I seemed to recall reading somewhere that having the relevant SDK installed makes some things easier…it was also required for one of the extensions I installed later.
From there, I needed a number of libraries and packages to support the specific use case I had in mind. This can be accomplished by going to Project -> Manage NuGet Packages:
This should open up the package browser within Visual Studio, and it should show a source of ‘nuget.org’. If it doesn’t, just click the little Gear icon in the upper right corner to access the Package Sources dialog. Click the green plus icon to add a new entry, specify a name, add the URL (https://api.nuget.org/v3/index.json), and click OK.
Change over to the ‘Browse’ view, and add the following packages (note: Some of these I installed because they are listed as dependencies in the package description, but don’t automatically get installed):
- Microsoft.EntityFrameworkCore
  - Holds the main library for working with Entity Framework
  - Sub-packages (not sure if they are all needed):
    - Microsoft.EntityFrameworkCore.Abstractions
    - Microsoft.EntityFrameworkCore.Analyzers
    - Microsoft.EntityFrameworkCore.Design
    - Microsoft.EntityFrameworkCore.Relational
- Microsoft.EntityFrameworkCore.SqlServer
  - Provides the logic for connecting to and interacting with a specific type of DB provider, so select the one that fits your use case
  - Sub-packages (not sure if they are all needed):
    - Microsoft.EntityFrameworkCore.SqlServer.Design
    - Microsoft.EntityFrameworkCore.SqlServer.NetTopologySuite (needed if your database uses spatial data types such as geography or geometry)
- Microsoft.EntityFrameworkCore.Tools
  - Adds PowerShell-style commands (e.g. Add-Migration and Update-Database) that run in the Visual Studio Package Manager Console for performing some tasks
- EntityFrameworkCore.SqlServer.HierarchyId
  - Only needed if you are using the HierarchyId data type in SQL (a more efficient approach than manually tracking parent IDs), since the core SqlServer provider doesn’t recognize it on its own
  - Sub-packages (not sure if they are all needed):
    - EntityFrameworkCore.SqlServer.HierarchyId.Abstractions
- Microsoft.VisualStudio.Web.CodeGeneration.Design
  - Adds the scaffolding tooling for creating the web service APIs
For extensions, I installed EF Core Power Tools from the Extensions manager in VS. Finally, I added a connection to my SQL instance running in my Docker container, which for me showed up under Tools -> Connect to Database. From some experimentation, it looks like some generators will handle the DB connection piece for you as part of the scaffolding process, but I elected to go this route instead.
With VS fully set up, and my new project opened and ready to go, I was able to right-click on my solution name in the Solution Explorer and go to EF Core Power Tools -> Reverse Engineer.
If you already added a database connection, you will see it selected and available; in my case I also chose to turn on the ‘Use EF Core 5’ option.
So…all of this work and all these steps to accomplish…what? What this gives us is a shortcut to a basic set of files reverse engineered from our previously created database. With Entity Framework, this is apparently the opposite of the intended direction: you’re supposed to define all of these files first, then create a ‘Migration’, which is the term for the process used to create or update your database in a code-first approach. In my case, I used a database-first approach to generate the code-first files for me, and going forward I can use the intended approach. For us infrastructure folks this may seem a bit backwards, but think of it as ‘Infrastructure-as-Code’…you never have to document all the steps, because it’s already documented in your code, which is then used to build. Pair this with source control, and you have a much easier time keeping your documentation updated (if you do it right), as well as MUCH better rollback and testing capabilities.
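To make the ‘Migration’ idea a little more concrete, here’s a simplified sketch of roughly what a generated migration class looks like (the ‘Employees’ table and column names are made up purely for illustration). You don’t write these by hand; the Tools package we installed earlier generates them for you (e.g. via Add-Migration in the Package Manager Console) and applies them with Update-Database:

// Migrations/20210101000000_InitialCreate.cs (illustrative only)
using Microsoft.EntityFrameworkCore.Migrations;

namespace MyAwesomeWebService.Migrations
{
    public partial class InitialCreate : Migration
    {
        // Up() describes how to build (or change) the database to match the model
        protected override void Up(MigrationBuilder migrationBuilder)
        {
            migrationBuilder.CreateTable(
                name: "Employees",
                columns: table => new
                {
                    Id = table.Column<int>(nullable: false)
                        .Annotation("SqlServer:Identity", "1, 1"),
                    DisplayName = table.Column<string>(maxLength: 50, nullable: false)
                },
                constraints: table =>
                {
                    table.PrimaryKey("PK_Employees", x => x.Id);
                });
        }

        // Down() describes how to undo the change – this is where the rollback story comes from
        protected override void Down(MigrationBuilder migrationBuilder)
        {
            migrationBuilder.DropTable(name: "Employees");
        }
    }
}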
Depending on the specific options you selected during the reverse engineering process and the number of tables you have, you’re going to end up with a bunch of .cs files. In my case, I initially selected most of the options, and it turned out to be the right approach for me. Some of the generators can also put everything into a single file, but I believe individual files are cleaner, as they let you keep work split out per table (or for the connection), and they also follow preferred practices. While we’re on that topic, apparently another preferred practice relates to capitalizing the first character of the table names…something I did not do in my case…so…pro-tip?
So now we get into the real ‘dev’ pieces. Again, I would ask you to keep in mind that I am NOT an experienced developer…this is simply what I have picked up and how I understand it. If you are a developer and come across this blog, I’m more than happy to hear any constructive feedback.
The first thing we see if we open up one of these .cs files in the editor is a bunch of lines that say ‘using’ followed by various items referred to as ‘namespaces’ (more on that in a sec). These namespaces live in assemblies (DLL files) that other developers created, and they contain the instructions for dealing with whatever objects they relate to. This saves us time, as we don’t have to go through and reinvent the wheel…instead we get to build on the work someone else already did. It’s sort of like using a PowerShell module someone else built to give us commands for a task instead of having to write the script to do all the work ourselves…even when we are writing our own module, this saves us work and troubleshooting. These references also let the IDE know how to handle syntax highlighting and IntelliSense for the type of task we are working on.
So, if we open up one of the .cs files representing a table, we’ll see something like the below:
// <auto-generated> This file has been auto generated by EF Core Power Tools. </auto-generated>
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;
using Microsoft.EntityFrameworkCore;
Some of these you will see, and likely use, in nearly everything, because ‘System’ is one of the core namespaces in C#. Others are obviously specific to what we are doing and the needs of our particular project.
The next part is where things get more interesting, as it’s where we’ll start to put in our own code. The first thing we need is our very own ‘namespace’ designation. This is what gets referenced by other .cs files down the road, so it’s important that the namespace makes sense so you know where to look for things. I’ve seen lots of different approaches, but the leading practice appears to be to use the name of the solution (no spaces or special characters), followed by the specific sub-component based on logical grouping, separated by a single dot/period character. So far in our little project we have two main types of files, depending on our DB and the options selected during the reverse engineering process: the database connection/definition (context) files, and one or more table (model) files. In my sample project I’ve also elected to organize the different file types into directories. Following the typical approach I’ve seen modeled elsewhere, I now have the following folders:
- Properties
- Controllers
- Data
- Models
All of my table definition files are in the Models folder, so if my project was called ‘My Awesome Web Service’, it might make sense to use the folder in my namespace, which could then be ‘MyAwesomeWebService.Models’ for the table files, ‘MyAwesomeWebService.Data’ for the context files, ‘MyAwesomeWebService.Controllers’ for the controller files, and so on. Again, this is largely a naming and reference thing, to keep a degree of separation when you need to refer to certain elements in the future. You could theoretically make everything part of a single namespace if you have a very simple app, but if the app keeps expanding over time, you may find that you need to go back and refactor things. From what I can tell, namespaces are less about memory or performance and more about organization: they keep names from colliding and make it obvious where a given class lives when you (or someone else) come back to the code later. For smaller programs this probably isn’t a huge issue, but I’d imagine it could get ugly if you have thousands and thousands of lines of code all dumped into one bucket. I don’t PLAN to have thousands of lines of code at this stage, but who knows what will happen in the future and, just like building a module or a script in PowerShell, it’s probably better to start with leading patterns and practices if I can, to save some potential headache later…even if it’s something simple.
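As a quick sketch of how that plays out (sticking with the made-up ‘MyAwesomeWebService’ name, and with placeholder class names), two classes in different folders get different namespaces, and one refers to the other with a ‘using’ line:

// For illustration only – in a real project these would be separate .cs files in their respective folders.
namespace MyAwesomeWebService.Models
{
    // a table/model class, sitting in the Models folder
    public partial class Employee
    {
        public int Id { get; set; }
    }
}

namespace MyAwesomeWebService.Controllers
{
    using MyAwesomeWebService.Models;   // this is the reference back to the Models namespace

    // a class in a different folder/namespace that works with Employee objects
    public class EmployeeLookup
    {
        public Employee Current { get; set; }
    }
}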
Now, within your namespace you will have one or more classes. At the most basic level, a class appears to be a definition of the object in question. In PowerShell parlance, that means we are defining the properties, and potentially also the methods, for that type of object. Each class definition starts with an accessibility keyword, such as public or internal (private and protected show up on nested classes and class members), and you may also see other modifiers like static or partial, all of which determine how the class can be accessed and used. Public classes, as expected, are visible inside and outside of the library, and new instances can be created for these items in things like PowerShell, or as reference items in other libraries. Microsoft has a bunch of documentation on the various options, but the best explanation I found came from Stack Overflow (https://stackoverflow.com/questions/614818/in-c-what-is-the-difference-between-public-private-protected-and-having-no). There are all kinds of responses showing the information in different ways, including several graphical formats, which I really liked.
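Here’s a tiny, throwaway example of how those keywords read in practice (none of these names come from my actual project):

public class WidgetRecord                   // public: usable from anything that references the compiled DLL, PowerShell included
{
    public string Name { get; set; }        // readable and settable by anyone who can see the class
    internal int RetryCount { get; set; }   // internal: only visible to code inside this same project/assembly
    private string _cachedValue;            // private: only code inside this class can touch it
}

internal static class WidgetHelpers         // static: no instances are ever created; internal: project-only
{
    public static string Normalize(string value) => value?.Trim();
}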
After the accessibility keyword, you’ll see the type of construct, which here is of course going to be ‘class’, followed by the name of the class. Again, the leading practice is that the first letter should be capitalized. If your table name is formatted this way, it gets translated that way in the generated code as well. In my case, I did not do this, and I discovered the hard way that trying to go back and change it can be a bit of a pain (pro-tip: if your IDE offers to go change all your references for you, let it). Depending on the options you select when reverse engineering, you might also see a ‘partial’ keyword right before ‘class’. From what I’m able to understand, this tells the compiler that this particular class is split across multiple class definition blocks, or even files. Once it gets compiled, all these pieces get put together into one class, but splitting them out helps us if we ever decide to make DB-first changes again and re-run the generation process…it lets us separate the generated base components from any custom code we add, and lets the two pieces be maintained separately…which reduces the chances that a re-generation wipes out our custom code or pushes unintended changes back during the build process.
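Here’s a hedged sketch of the idea (placeholder names again): the generated file holds the base definition, our own additions live in a second file that declares the same partial class, and the compiler merges the two, so re-running the generator never stomps on our custom code.

// Models/Employee.cs – generated by the reverse engineering / scaffolding tooling
namespace MyAwesomeWebService.Models
{
    public partial class Employee
    {
        public int Id { get; set; }
        public string DisplayName { get; set; }
    }
}

// Models/Employee.Custom.cs – our hand-written additions to the very same class
namespace MyAwesomeWebService.Models
{
    public partial class Employee
    {
        // a convenience member we added ourselves; at compile time both pieces become one Employee class
        public string Summary => $"{Id}: {DisplayName}";
    }
}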
The last bit I’ll be covering today is the items within the class. These are the actual properties and methods we talked about before. Just as with the class definition itself, we start off with an accessibility designation. If we want something to not be visible outside of the class, for example, we could set this to private. For the model files, these will be the various column names for our database table, which we will presumably want to access elsewhere, so we likely want them to be public. The next piece of the puzzle is the data type. If you are familiar with databases, or PowerShell, then you’ll be used to this…is the object a string of text (string), a number (int), a very small number (tinyint in SQL or byte in .NET), or perhaps a boolean value (bool, true/false, 0/1)? In order for .NET to know how to interact with it, it has to understand what kind of data it is. This is followed by the name of the property (again, capitalize the first letter), without spaces or special characters, and finally by the accessors.
In our case, these will typically be ‘get’ and ‘set’, meaning that we can read/get the current value or write/set a new value. In some cases, you may wish to restrict this. For example, in my solution I have a HierarchyId field that I’m using to track a set of parent and child relationships in a tree…sort of like an org chart. I have another column that shows what level the object is at within the hierarchy (at the root, the bottom, etc.) in the form of a number. Sure, I could calculate the level each time using the available HierarchyId functions, but I’m lazy…and because I’m lazy, I also don’t want to have to write code to keep this column up to date in the event the item is moved somewhere else in the tree, so this column is a computed column that SQL dynamically sets and manages on its own. That being the case, I don’t want anyone trying to set that value from somewhere else…such as from a .NET Core Web API, for example. The easiest way to do this is to remove or restrict the ‘set’ accessor and leave the get. There may be other accessor tricks you can use here as well, but I haven’t had a need to look into that yet.
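A trimmed-down sketch of what that might look like (the class and column names are invented, and the exact approach depends on how your context is configured): one variation is a private setter, which keeps outside callers, including our future API, from writing the value while still letting EF Core populate it from the database when it loads the row.

public partial class OrgNode
{
    // normal read/write column
    public string DisplayName { get; set; }

    // computed/managed by SQL – expose a getter, but keep the setter private so
    // nothing outside this class (like the Web API) can try to push a value back
    public int HierarchyLevel { get; private set; }
}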
Each of these is, of course, a property, and may therefore have a few other pieces, depending on the type of class you’re defining. In our table (model) classes, these show up as attributes in square brackets above the property, indicating constraints or type information for the column. For example, a primary key column will have a ‘[Key]’ attribute, and a mandatory value will have a ‘[Required]’ attribute. You can also stack more than one of these: if an item is both required and has a maximum size, say a VarChar column limited to only 3 characters, then you could have both ‘[Required]’ and ‘[StringLength(3)]’.
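Strung together, that VarChar(3) example might look something like this (the class and column names are just placeholders):

using System.ComponentModel.DataAnnotations;

public partial class Currency
{
    [Key]                   // primary key column
    public int CurrencyId { get; set; }

    [Required]              // NOT NULL in the database
    [StringLength(3)]       // e.g. a VarChar(3) ISO-style code
    public string IsoCode { get; set; }
}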
In my own database, I have a number of tables that have relationships…most of them actually, and these are represented within the class as well. Since they reference another class, they show up with ‘[InverseProperty]’ and ‘[ForeignKey]’ attributes that point at the related class and property. They are also marked ‘virtual’ after the accessibility keyword, because they aren’t plain column values of our own table…they are navigation links to another object (and, as I understand it, marking them virtual is what lets EF Core override them for features like lazy loading). In PowerShell, this is like those annoying objects that have properties that are actually a completely separate object. I say ‘annoying’ because it always annoys me when sending them out to a CSV, because they usually show up as ‘<System.Object>’ in the output if not handled the right way. The generated code also uses the ‘nameof’ operator in these attributes; from my brief research, this turns a class or property name into a string at compile time, so if something gets renamed later the compiler flags the mismatch instead of leaving a stale hard-coded string behind.
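To show how the pieces fit together, here’s a simplified two-table sketch in the style of the generated code (table and column names invented for illustration):

using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

public partial class Department
{
    public Department()
    {
        Employees = new HashSet<Employee>();
    }

    [Key]
    public int DepartmentId { get; set; }

    // the 'many' side: every Employee row that points back at this department
    [InverseProperty(nameof(Employee.Department))]
    public virtual ICollection<Employee> Employees { get; set; }
}

public partial class Employee
{
    [Key]
    public int EmployeeId { get; set; }

    // the plain foreign key column
    public int DepartmentId { get; set; }

    // the navigation property: the actual related Department object,
    // joined via the DepartmentId column named in the ForeignKey attribute
    [ForeignKey(nameof(DepartmentId))]
    [InverseProperty(nameof(Department.Employees))]
    public virtual Department Department { get; set; }
}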
Alrighty kids…I think that’s enough for one post. I know I didn’t cover any of the content around the Data or Controller files yet, but there’s a lot of info in this post already, and I have a chronic oversharing issue I’m working on. I plan to write another entry that digs into the DB context (Data) files next, but the short of it is that these files define the DB schema, as well as how to connect to the DB and what kind of provider to use…I’m still working on fully understanding the Controllers, but these are essentially the actual API definitions that we’ll use to interact with our database later from our PowerShell module.
Hopefully someone finds this article helpful.