Fun with Source Generators

As part of .NET 5 we got source generators: new compiler infrastructure that lets us use all of the metadata and tooling available to Roslyn to generate new source code, which is added to the compilation at a later stage.

Building off the tech used in Roslyn analyzers, generators are incredibly powerful and allow us to do a lot of things that were previously impractical or impossible. They have unfortunately flown a bit under the radar so far, so here I'd like to give them some much needed attention and show you some of what they're capable of.

A Practical Example

If you've ever created an ASP.NET Core application, you've seen a controller like this:

public class HomeController : Controller
{
    private ILogger<HomeController> _logger;

    public HomeController(ILogger<HomeController> logger)
    {
        _logger = logger;
    }

    public IActionResult Index()
    {
        return View();
    }
}

The dependency injection model forces you to have an extremely noisy constructor which provides no useful information, especially when your controller needs an ILogger and a DatabaseService and an EmailService and so on.

Let's create a source generator to help remove some of this boilerplate.

First, we need a way to signal to the generator that we want it to run for a particular class, which we'll do using an attribute. When the generator finds a class with our attribute, it will create a constructor that takes one parameter for every private field in the class and assigns each parameter to its field. Generators are explicitly additive only; that is, they cannot modify existing code, only add new code. Hence, we will have to declare HomeController as partial, to allow its definition to be split across multiple files.
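To make the goal concrete, here is a rough sketch of the two halves of a partial class: the half we write by hand, and the half a generator like ours would be expected to add. The Greeter class and its fields are hypothetical stand-ins that use only base class library types, so the sketch compiles without ASP.NET Core; the constructor is written by hand here purely for illustration.

```csharp
using System;
using System.IO;

// Hand-written half: declares the private fields but no constructor.
// Greeter is a hypothetical example class, not part of the article's projects.
public partial class Greeter
{
    private TextWriter _output;
    private string _name;

    public void Greet() => _output.WriteLine($"Hello, {_name}!");
}

// The half a generator would add: one constructor parameter per
// private field, each assigned to the field of the same name.
public partial class Greeter
{
    public Greeter(TextWriter _output, string _name)
    {
        this._output = _output;
        this._name = _name;
    }
}
```

Calling new Greeter(Console.Out, "world").Greet() writes "Hello, world!" to the console. Because both declarations are partial, the compiler merges them into a single class, which is what lets a generator contribute the constructor without touching the file we wrote.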

We're going to make two new projects: a class library where we will write our source generator, and an MVC app to test it.

dotnet new classlib -f netstandard2.0 -n Generator
dotnet new mvc -n Test

If you are using Visual Studio, generators must have a <TargetFramework> of netstandard2.0 or lower, or you will get arcane compiler errors. The CLI will happily compile with any <TargetFramework>.

We will update HomeController.cs in Test to look like this, so that it fits the constraints we outlined just above:

Test/Controllers/HomeController.cs
// The generated attribute is declared in the Generator namespace.
using Generator;

[InjectDependencies]
public partial class HomeController : Controller
{
    private ILogger<HomeController> _logger;

    public IActionResult Index()
    {
        return View();
    }
}

We're going to add two dependencies to Generator: Microsoft.CodeAnalysis.Common and Microsoft.CodeAnalysis.CSharp. These packages contain the types needed to work with C# source code, as well as the interface for creating a source generator.

Now we create the outline for our source generator:

Generator/DIConstructorGenerator.cs
using Microsoft.CodeAnalysis;

namespace Generator
{
    [Generator]
    class DIConstructorGenerator : ISourceGenerator
    {
        public void Execute(GeneratorExecutionContext context)
        {

        }

        public void Initialize(GeneratorInitializationContext context)
        {
            // defined below.
            context.RegisterForSyntaxNotifications(() => new SyntaxReciever());
        }
    }
}

Generation happens in two steps. First, in Initialize, we register a syntax receiver; the compiler then hands it every syntax node in the code that's going to be compiled, and we filter those nodes and collect the candidates we want to generate source code for. Then, in Execute, we generate the code and add it back to the compilation.

In practical terms, we need to create a class that implements ISyntaxReceiver and store a list of candidates in it.

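using System.Collections.Generic;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

internal class SyntaxReciever : ISyntaxReceiver
{
    public readonly List<ClassDeclarationSyntax> Candidates = new List<ClassDeclarationSyntax>();

    // Called by the compiler once for every syntax node in the compilation.
    public void OnVisitSyntaxNode(SyntaxNode node)
    {
        // Notice that these fit the constraints we outlined
        // initially, and that we only check that the node has at
        // least 1 attribute, not that it is specifically our
        // `InjectDependenciesAttribute`. That check comes later.
        if (node is ClassDeclarationSyntax syntax
            && syntax.AttributeLists.Count > 0
            && syntax.Modifiers.Any(SyntaxKind.PartialKeyword))
        {
            Candidates.Add(syntax);
        }
    }
}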

Moving on to Execute, we're first going to create our InjectDependenciesAttribute, and add it as a source file.

private const string k_injectDependenciesAttribute =
@"namespace Generator
{
    using System;
    [AttributeUsage(AttributeTargets.Class)]
    public class InjectDependenciesAttribute : Attribute
    {
    }
}";

context.AddSource("InjectDependenciesAttribute.g", k_injectDependenciesAttribute);

What should stick out here is that our attribute is just a string. We're not working with any specific format or syntax trees or anything like that - it's all just strings. We're free to build up our generated code however we please.

Our generator doesn't do a whole lot right now; it just creates our attribute for us. Still, it's a good time to move over to our Test project and see it in action. Source generators are treated a lot like Roslyn analyzers by MSBuild and the compiler, and so, like analyzers, we have to reference them a bit differently.

Test/Test.csproj
<Project Sdk="Microsoft.NET.Sdk.Web">
  <PropertyGroup>
    <TargetFramework>net5.0</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <!-- The two attributes on the project reference are what make it act as an analyzer/generator, rather than a normal reference. -->
    <ProjectReference Include="../Generator/Generator.csproj" OutputItemType="Analyzer" ReferenceOutputAssembly="false" />
  </ItemGroup>
</Project>

If we go back to our MVC Test project now and compile it, we should see that it builds correctly, even though we never wrote the InjectDependenciesAttribute as code anywhere! We're still not generating the constructor though, so the ILogger field is never assigned and trying to use it will throw an exception. Let's get that implemented!

Back in Generator and the Execute function, we're going to make working with our candidates a bit easier. The compiler effectively has two "views" of our code: how it actually looks (syntax), with all the braces and whitespace and so forth, and what we actually mean by our code (semantics). Broadly, if something inherits from SyntaxNode it's in syntax land, and if it implements ISymbol it's in semantics land. We want to get the semantic model of our code because it's far easier to reason about its properties (e.g. Is this field readonly? Does it have XAttribute? Is it nullable?).

We're going to convert each ClassDeclarationSyntax to an INamedTypeSymbol, and filter out everything that doesn't have our InjectDependenciesAttribute.

Generator/DIConstructorGenerator.cs
// We use `StartsWith` here, since the attribute might be
// written out fully, i.e. InjectDependenciesAttribute instead
// of InjectDependencies. AttributeClass can be null, hence `?.`.
static bool HasAttribute(ISymbol symbol, string attributeName)
    => symbol.GetAttributes().Any(attr => attr.AttributeClass?.Name.StartsWith(attributeName) == true);

private IEnumerable<INamedTypeSymbol> GetSymbols(GeneratorExecutionContext context)
{
    var candidates = ((SyntaxReciever)context.SyntaxReceiver).Candidates;
    foreach (var candidate in candidates)
    {
        var model = context.Compilation.GetSemanticModel(candidate.SyntaxTree);
        var symbol = model.GetDeclaredSymbol(candidate);

        if (HasAttribute(symbol, "InjectDependencies"))
        {
            yield return symbol;
        }
    }
}

Now that we have our final list of symbols, we can generate a constructor for each of them.

foreach (var symbol in GetSymbols(context))
{
    var privateMembers = symbol.GetMembers()
        .OfType<IFieldSymbol>()
        .Where(member => member.DeclaredAccessibility == Accessibility.Private)
        .ToList();

    StringBuilder sb = new StringBuilder($@"namespace {symbol.ContainingNamespace}
{{
    partial class {symbol.Name}
    {{
        public {symbol.Name}(");

    foreach (var member in privateMembers)
    {
        sb.Append($"{member.Type} {member.Name},");
    }

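    // Drop the trailing comma. Skip this when there are no fields,
    // otherwise the opening parenthesis would be removed instead.
    if (privateMembers.Count > 0)
    {
        sb.Length -= 1;
    }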
    sb.Append("){");

    foreach (var member in privateMembers)
    {
        sb.AppendLine($"this.{member.Name} = {member.Name};");
    }

    sb.Append("}}}");
    context.AddSource($"{symbol.Name}.privateFieldConstructor", sb.ToString());
}

Unfortunately, emacs' csharp-mode and csharp-tree-sitter both have terrible support for interpolated strings, which makes this harder to read than it should be.

We're using a StringBuilder to help build up our source code. Since we're making a constructor, we need to add all the private members to the method signature, then assign them in the body. We can use member.Type to get the fully qualified type name (e.g. System.Guid), and member.Name to get the name of the field. After writing out the signature of our constructor, we subtract 1 from sb.Length, which truncates the StringBuilder and drops its last character. Our constructor would otherwise have a trailing comma in its parameter list, which is invalid syntax.
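The trailing-comma trick can be tried in isolation. The following is a minimal sketch using only the base class library; SignatureSketch and both of its methods are hypothetical helpers written for this illustration, not part of the generator. It also skips the trim when there are no fields, so the opening parenthesis is never removed, and shows string.Join as an alternative that avoids the trim entirely.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper class for illustration only.
public static class SignatureSketch
{
    // Builds "public Name(T1 a,T2 b)" from (type, name) pairs by
    // appending "type name," for each field and trimming the last comma.
    public static string BuildSignature(
        string className,
        IReadOnlyList<(string Type, string Name)> fields)
    {
        var sb = new StringBuilder($"public {className}(");

        foreach (var (type, name) in fields)
        {
            sb.Append($"{type} {name},");
        }

        // Shrinking Length truncates the builder, dropping the trailing comma.
        // Only do this if something was appended, or the '(' would be lost.
        if (fields.Count > 0)
        {
            sb.Length -= 1;
        }

        return sb.Append(")").ToString();
    }

    // Same result, letting string.Join place the separators instead.
    public static string BuildSignatureWithJoin(
        string className,
        IReadOnlyList<(string Type, string Name)> fields)
    {
        var parameters = new List<string>();
        foreach (var (type, name) in fields)
        {
            parameters.Add($"{type} {name}");
        }

        return $"public {className}({string.Join(",", parameters)})";
    }
}
```

For example, passing "HomeController" with the pairs ("ILogger<HomeController>", "_logger") and ("DatabaseService", "_db") yields public HomeController(ILogger<HomeController> _logger,DatabaseService _db) from either method, and an empty list yields public HomeController(). The Join version is a little longer but has no special case to remember.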

... And that's it! No, really. The whole thing is less than 100 lines of code, and actually generating our constructor is only 30.

Rebuilding the Test project now, our controller actually has the ILogger injected via the constructor we generated, and we can even write to it!

This is only scratching the surface of what's possible; the wealth of metadata and tooling around Roslyn means we can generate code for basically any situation, requirement, or constraint, or take advantage of GeneratorExecutionContext's AdditionalFiles to take arbitrary file types and generate code from them, like I did for brainfuck. The sky is absolutely the limit with generators.

Source generators have really impressed me so far, and I think they're going to make the next few years of C# development very interesting. I hope you learned something reading this, and if you want to give generators a try yourself, the code for this article is on GitHub.

Thanks for reading.

See you later~

Tidbits

Viewing Generated Files

If you want to get a closer look at the code that is being generated, add these MSBuild properties to your .csproj:

<PropertyGroup>
  <CompilerGeneratedFilesOutputPath>$(MSBuildProjectDirectory)/Generated</CompilerGeneratedFilesOutputPath>
  <EmitCompilerGeneratedFiles>true</EmitCompilerGeneratedFiles>
</PropertyGroup>

When you next build the project the directory /Generated will be created, containing sub-directories for all of the source generators and their generated files.

If you use these properties as-is, you will have to manually delete /Generated each time you build the project, since these source files are now part of the project tree and picked up by the compiler at the start of the build process. This causes the compiler to think there is duplicate code when the source generators run. We can fix this with a new build target.

<Target Name="ExcludeGenerated" BeforeTargets="AssignTargetPaths">
  <ItemGroup>
    <Generated Include="Generated/**/*.cs" />
    <Compile Remove="@(Generated)" />
  </ItemGroup>
  <Delete Files="@(Generated)" />
</Target>

NuGet Package

If we want to make our source generator into a NuGet package, we've got to mess around with our csproj a little more. In our Test project, we added special attributes to the <ProjectReference> which told the build system to treat it as a Roslyn analyzer, rather than a normal dll. When creating a NuGet package, we instead need to do this from the generator's side.

<PropertyGroup>
  <GeneratePackageOnBuild>true</GeneratePackageOnBuild>
  <IncludeBuildOutput>false</IncludeBuildOutput>
</PropertyGroup>

<ItemGroup>
  <None Include="$(OutputPath)/$(AssemblyName).dll" Pack="true" PackagePath="analyzers/dotnet/cs" Visible="false" />
</ItemGroup>

The important lines to look at here are <IncludeBuildOutput>, which stops MSBuild from copying the built dll to the default location in the .nupkg, and the line beginning <None>, which tells the build system to put our compiled assembly in the "analyzers" folder.

Running dotnet pack now should produce a .nupkg which correctly functions as a source generator when installed.

.NET Framework

Source generators do work with old .NET Framework projects, assuming you have a .NET 5 or higher compiler. If you're using <ProjectReference>, everything will just sort of work out of the box, no extra configuration required, but a NuGet package is a little more difficult to work with.

This has to do with how analyzers work in old style msbuild projects, that is, the dlls have to be explicitly added to the .csproj in an <Analyzer> tag, but NuGet is too stupid to manage this for you.

To handle this programmatically, we need to create install and uninstall PowerShell scripts, which I have left documented examples of in the GitHub repository.

Consuming Dependencies

If our generator relies on a package at runtime, we can add dependencies to our generator as normal. When publishing it as a NuGet package, however, dependencies included the normal way would become runtime dependencies of every downstream project, which is not ideal.

The way around this is to fiddle around with our csproj some more.

<!-- Assume we also have everything from the NuGet Package section. -->
<ItemGroup>
  <PackageReference Include="Newtonsoft.Json" Version="13.0.1" PrivateAssets="all" GeneratePathProperty="true" />
  <None Include="$(PkgNewtonsoft_Json)/lib/netstandard2.0/*.dll" Pack="true" PackagePath="analyzers/dotnet/cs" Visible="false" />
</ItemGroup>

I've got you all fooled thinking this is a blogpost about C#, it's clearly about msbuild/xml.

Just like we copy our generator's dll to a specific location when making a NuGet package normally, we tell MSBuild here that Newtonsoft.Json isn't a dependency that consumers of our package should inherit (PrivateAssets="all"), and copy the files we need directly to the correct folder ourselves.

Since NuGet pulls from a global cache, we use GeneratePathProperty to get the variable PkgNewtonsoft_Json, which resolves to the directory of the package at build time.
