Oleg Sych - Pros and Cons of T4 in Visual Studio 2008
Pros and Cons of T4 in Visual Studio 2008
T4 is a fully-featured, template-based code generation engine built into Visual Studio 2008. It offers rich functionality comparable to leading .NET code generation tools such as CodeSmith, MyGeneration and NVelocity. However, several areas of T4 could be improved: template editing, debugging and integration with Visual Studio. At the foundation level, T4 needs to provide support for component-oriented programming and a more powerful code generation framework. The main obstacle to wider adoption of T4 today is the limited number of readily available code generators. This article provides a detailed analysis of the pros and cons of T4 from the standpoint of an application developer.
What works great in T4
Proven, template-based approach to code generation. T4 template syntax is similar to ASP.NET, ASP, and other web development platforms. It is intuitively familiar to most application developers today, making them immediately productive and significantly reducing the learning curve.
Power. With a two-stage code generation process, T4 allows you to use modern .NET languages, C# or Visual Basic, to implement your code generators. The assembly directive gives you access to any of the available .NET APIs, and class feature blocks allow you to use virtually any programming construct available in the language.
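As a rough illustration of these building blocks, the sketch below shows a minimal template that combines a statement block, an expression block and a class feature block. The Customer class, the property names and the WriteProperty helper are made up for this example and are not taken from any real code generator.

```csharp
<#@ template language="C#" #>
<#@ output extension=".cs" #>
<#
    // Statement block: ordinary C# code that drives the generation.
    // The property names are hypothetical sample data.
    string[] properties = { "Id", "Name", "Email" };
#>
// Generated code. Do not edit by hand.
public class Customer
{
<#
    foreach (string property in properties)
    {
        WriteProperty("string", property);
    }
#>
}
<#+
    // Class feature block: this helper becomes a method of the generated
    // transformation class and can mix code with output text.
    void WriteProperty(string type, string name)
    {
#>
    public <#= type #> <#= name #> { get; set; }
<#+
    }
#>
```

The text outside the code blocks is copied to the output file as is, while the expression blocks insert the values computed by the template code.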
Simplicity. There is no new language to learn, no new tool to install. T4 did not fall into the XML trap: there is no custom XML-based language to describe configuration of your code generator, no custom API to access the configuration, no custom UI to configure it and no limitations on what can be configured. You can do it all in plain old .NET code, which you already know how to do. Of course, if you prefer using XML configuration files, you can do that as well.
Visual Studio integration. T4 is installed as part of Visual Studio 2008. Code generation is (almost) invisible, as it should be. Visual Studio automatically generates the output file when you add a code generator to your project. It also regenerates the output file when you change the code generator. The output file is automatically added to the project and to source control, and checked out if necessary.
Price. It is free, assuming you already own Visual Studio.
Support. It is a supported Microsoft product.
What different developers need from T4
T4 is a solid code generation engine, but nothing is perfect, of course. There are some things that could be improved. However, different groups of T4 users have different, sometimes conflicting needs and the answers to questions like “what could be improved” and “how it needs to be done” will depend on who you ask.
Tool developers create, package and distribute code generators as a product, such as a Visual Studio add-on or a standalone utility. For this audience, it is important to seamlessly integrate code generation with modeling tools, target multiple output languages from a single code generator and protect intellectual property. People in this group are typically experienced developers, so increased complexity will not be as important as increased code reuse or improved functionality of the product.
Application architects use, customize and create code generators as tools to solve business problems. They define parts of the application code that will be generated and how code generators will be used in the development process. Application architects choose a set of available built-in and third-party code generators to produce as much of the “plumbing” code as possible. For this audience, it is important to have a large number of ready-to-use code generators and be able to quickly customize code generators to fit the needs of a particular application. Application architects are typically experienced developers who will choose increased flexibility and shorter initial time over added complexity of using a particular code generator.
Application developers use generated “plumbing” code to write “real” code - business logic and user interface. Application developers use code generators over and over again and need them to be easy to use and require as few manual steps as possible. People in this group may be entry- or mid-level developers who prefer simplicity and may need to be shielded from extra complexity added by code generation. On the other hand, people in this role may be experienced developers also performing duties of an application architect.
Although the requirements of these user groups largely overlap, there are two major trade-offs. On one hand, tool developers need to target multiple output languages with a single code generator. On the other hand, application architects need to generate code for just one language, but as quickly as possible. Tool developers want to simplify deployment and protect intellectual property by packaging code generators inside of .NET assemblies. However, application architects need to be able to customize virtually any code generator at their disposal. When a built-in code generator doesn’t satisfy application requirements and doesn’t allow customization, the application architect may be forced to choose a third-party replacement, which immediately makes the application developer’s job more difficult.
What can be improved in T4
To avoid making self-contradicting suggestions, I will argue that the needs of application architects and developers outweigh the needs of tool developers. As the booming market of third-party code generators shows, we can be certain that no packaged code generator will ever be perfect and that application architects will always want to generate code their way. Tool developers need to provide the ability to quickly customize their code generators or risk their tools becoming irrelevant in mainstream software development.
Please keep in mind that I am not trying to diminish the needs of tool developers. I am only taking this standpoint in this article to make it more useful for Gareth, Jean-Mark and the rest of their team as I am sure they are getting plenty of feedback from other tool developers using T4 at Microsoft.
On a similar note, I will argue that the needs of application developers outweigh the needs of application architects. There are many modeling and code generation products designed to be as powerful and as flexible as possible and to satisfy even the most demanding software architect. However, many of these tools are too difficult to understand and too cumbersome to use to make an average application developer more productive, which makes these tools equally irrelevant in mainstream software development.
Editing experience
The experience of editing T4 files in Visual Studio currently lacks some features we have come to take for granted over the past decade. There is no built-in syntax coloring for T4 files, which makes it hard to distinguish generated application code in text blocks from the template code in code blocks. There is also no built-in IntelliSense for T4 directives or for C#/Visual Basic code. As a workaround, you can use T4 Editor by Tangible Engineering, which provides syntax coloring and IntelliSense.
IntelliSense is usually the first productivity feature that developers look for in a modern programming environment. However, the underlying programming platform itself determines how accurate IntelliSense will be, or whether it will be possible at all. The lack of support for component-oriented programming in T4 makes it impossible for tools like the T4 Editor to provide meaningful IntelliSense when editing T4 fragments, that is, .tt files that contain fragments of a larger code generator assembled from multiple files with include directives.
Debugging experience
Visual Studio doesn’t allow you to place breakpoints precisely in T4 code. Developers have to use calls to the Debugger.Break method as a workaround. This becomes a hassle when you need to debug a T4 file written by someone else, such as a third-party code generator, which may be read-only or checked into source control.
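A minimal sketch of the Debugger.Break workaround might look like the template below. It assumes the debug parameter of the template directive is set to true so that the compiled transformation carries debugging information; the loop is placeholder logic added only to give the debugger something to step through.

```csharp
<#@ template language="C#" debug="true" #>
<#@ output extension=".txt" #>
<#
    // Temporary line: asks for a debugger to be attached at this point.
    // Remove it once the problem has been found.
    System.Diagnostics.Debugger.Break();

    // Placeholder logic to step through.
    int total = 0;
    for (int i = 1; i <= 3; i++)
    {
        total += i;
    }
#>
Total: <#= total #>
```

Because the call has to be typed into the template itself, this approach requires a writable copy of the file, which is exactly the inconvenience with read-only or third-party templates.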
In order to debug a T4 file, you currently have to use a second instance of Visual Studio attached as a debugger to the first instance, where you are editing the file. This limitation may be the result of T4 running code generation inside the Visual Studio process itself (devenv.exe). Perhaps changing it to run in a separate process would allow a single instance of Visual Studio to both edit and debug T4 code.
Discovering how to debug a T4 file is not easy. You need to know to either trigger the Just-In-Time debugger with a call to Debugger.Break or attach a second instance of Visual Studio as a debugger to the first. It would be nice to be able to start debugging of a T4 file by right-clicking it in Solution Explorer and selecting “Debug” from the context menu. This would be similar to how you start debugging of a C# or Visual Basic project and would be intuitive for most developers.
Visual Studio integration
If you want to use a T4 file to generate code based on another (input) file in your project, such as a LINQ to SQL dbml file, currently you have to have a separate .tt file in addition to the input file itself. The .tt file would typically include a reusable T4 file to do the actual code generation. For well-packaged T4 code generators this unnecessarily increases the number of .tt files you have to maintain. It would be nice for T4 to allow you to associate a single .tt file as a custom tool with an input file directly. In other words, allow using T4 files as custom tools without having to implement and register IVsSingleFileGenerator.
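The extra wrapper file described here could look roughly like the sketch below. All names are hypothetical: Northwind.dbml stands for the input file, DataClassesGenerator.tt for a reusable generator, and GenerateDataClasses for a method that generator is assumed to define in a class feature block. Host.ResolvePath is used only to turn the file name into a full path and requires the hostspecific parameter.

```csharp
<#@ template language="C#" hostspecific="true" #>
<#@ output extension=".cs" #>
<#@ include file="DataClassesGenerator.tt" #>
<#
    // Hypothetical wrapper: its only job is to point the reusable
    // generator at one particular input file in the project.
    string inputFile = Host.ResolvePath("Northwind.dbml");

    // GenerateDataClasses is assumed to come from the included file.
    GenerateDataClasses(inputFile);
#>
```

Every input file needs its own copy of a wrapper like this, which is the maintenance overhead that a direct custom tool association would remove.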
In order to make T4 files work as custom tools, we would also need the ability to associate code generation parameters with the input files. Standard custom tools in Visual Studio support a single parameter - CustomToolNamespace editable in Properties Window and stored in the project file. Letting users specify a list of parameters for a code generator would make T4 custom tools both easier to use and more powerful.
In order to access parameters specified by the user in Visual Studio from T4 template code, we also need a standard mechanism for defining and accessing external parameters. The property directive added to T4 by the Guidance Automation Toolkit is a good starting point as it defines parameter name, type, converter and editor. However, I would recommend against using the GAT implementation of this directive because it relies on a custom T4 engine host. In contrast, the approach used by Clarius in the implementation of this directive included with their T4 Editor relies on CallContext and is a lot more robust and elegant.
Another side effect of not being able to associate T4 files directly with their input files is that a change made in the input file does not automatically result in regeneration of the output. For simple scenarios where a code generator depends on a single input file, the custom tool solution suggested above would be sufficient. However, for more complex scenarios, where a single code generator uses multiple input files, it would be nice for T4 to provide an explicit mechanism for describing external input files that should trigger template transformation. Visual Studio would need to be able to get this information from a .tt file and automatically run the code generator when one or more of its input files are changed. A new processing directive may work better for this purpose.
Visual Studio projects provide rich metadata that can be very useful during code generation. Unfortunately, in order to access this metadata from a T4 template, you have to use Visual Studio extensibility APIs. It would be nice for T4 to provide easy access to project and environment variables. Although the immediate suggestion would be to allow MSBuild-like syntax, $(VariableName), in both processing directives and code blocks, this may fundamentally change the T4 language and not be feasible to implement.
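To show how much ceremony this takes today, here is a sketch of reaching project metadata through the extensibility APIs. It assumes the template runs inside Visual Studio, where the T4 host can be cast to IServiceProvider and asked for the DTE automation object; with any other host this cast is likely to fail.

```csharp
<#@ template language="C#" hostspecific="true" #>
<#@ output extension=".txt" #>
<#@ assembly name="EnvDTE" #>
<#@ import namespace="EnvDTE" #>
<#
    // Assumption: the Visual Studio host implements IServiceProvider.
    IServiceProvider serviceProvider = (IServiceProvider)Host;
    DTE dte = (DTE)serviceProvider.GetService(typeof(DTE));

    // Find the project that contains this template file.
    ProjectItem templateItem = dte.Solution.FindProjectItem(Host.TemplateFile);
    Project project = templateItem.ContainingProject;
#>
Project name: <#= project.Name #>
Project file: <#= project.FullName #>
```

Two directives, a cast and a service lookup are needed before the first piece of metadata can be read, which is the friction a simpler variable syntax would avoid.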
Visual Studio provides limited support for one-time code generation in project item templates, based on simple parameter substitution. However, this approach is far less capable than T4 templates. Guidance Automation Toolkit provides an extension that does allow you to use T4 templates in Visual Studio project item templates. Unfortunately, this implementation uses a custom T4 engine host which significantly changes and limits the built-in assembly and include directives. This custom host also leaks memory by loading compiled transformation assemblies into the main AppDomain. It would be nice for T4 itself to provide an extension for using T4 in Visual Studio project item templates. Implementation of this extension would also rely on the standard mechanism for defining and accessing external parameters described above.
Component-oriented programming
T4 provides the include directive and class feature blocks, which allow you to build complex code generators from simpler parts defined in separate files. Unfortunately, several shortcomings prevent you from building these parts in a truly component-oriented manner. Let’s define a T4 component as a self-contained .tt file that defines one or more .NET classes along with all of their template processing requirements and dependencies on other components defined in external .NET assemblies or other .tt files. An ideal T4 component is readily available for reuse in another T4 component or code generator by adding a single include directive that points to the location of the component’s .tt file. Think of a T4 component as a .NET assembly, only it’s not compiled and has to be executed by the T4 engine.
Although T4 allows using relative paths in the include directive, when a T4 file includes a second file, which in turn includes a third file, T4 will resolve paths relative to the root file only. As a workaround, you have to either reduce the number of include levels, encode knowledge of the root file location in the included files or rely on the IncludeFolders registry setting. Each of these options breaks encapsulation and creates additional challenges when T4 code generators are used in a team environment. It would be nice for T4 to resolve include paths relative to the location of the file containing the include directive.
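The second workaround, encoding knowledge of the root file location in an included file, can be sketched as follows. The folder layout and file names (Model.tt, Generators\Entity.tt, Generators\Helper.tt) are hypothetical and exist only to illustrate the path resolution described above.

```csharp
<#
    // File: Generators\Entity.tt (hypothetical), included by the root
    // file Model.tt that lives one folder up.
    //
    // Helper.tt sits in the same Generators folder as this file, but the
    // include path below is resolved relative to Model.tt, so it has to
    // repeat the folder name.
#>
<#@ include file="Generators\Helper.tt" #>
```

If Entity.tt is later included from a root file in a different folder, the path stops resolving, which is how this workaround breaks encapsulation.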
When the same T4 file is referenced multiple times with separate include directives, T4 will add multiple copies of its contents to the GeneratedTextTransformation, one for each include directive that references it. So if a particular T4 file defines a reusable class called Helper, which is referenced by several T4 files that are then combined in a complex code generator, the resulting GeneratedTextTransformation will have several definitions of the Helper class and will not compile. As a workaround, you currently have to make sure that each T4 file is included only once, which breaks encapsulation of T4 components and forces you to define dependencies of individual components in the topmost T4 file. It would be nice for T4 to use a single copy of the file referenced by multiple include directives.
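As an illustration of this workaround, the hypothetical top-level template below includes Helper.tt once on behalf of two components that both depend on it. EntityGenerator.tt and MapperGenerator.tt are invented names, and neither of them is assumed to include Helper.tt itself.

```csharp
<#@ template language="C#" #>
<#@ output extension=".cs" #>
<#
    // Both components below use the Helper class, so its file is included
    // exactly once here, in the topmost file. If each component included
    // Helper.tt on its own, the Helper class would be defined twice and
    // the transformation would not compile.
#>
<#@ include file="Helper.tt" #>
<#@ include file="EntityGenerator.tt" #>
<#@ include file="MapperGenerator.tt" #>
```

Anyone reusing either component has to know about its dependency on Helper.tt and list it manually.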
It is interesting that when the same .NET assembly is referenced multiple times with separate assembly directives, T4 will compile the GeneratedTextTransformation successfully. However, this appears to be a coincidence rather than the intended behavior of the current implementation. T4 actually passes multiple assembly references to the C# and Visual Basic compilers, which are smart enough to ignore the duplicates. However, to make reading the .cmdline files easier, it would be nice for T4 to use a single compiler reference for an assembly referenced by multiple assembly directives.
When an included T4 file has an import directive, it has a global effect on all other T4 files included in the GeneratedTextTransformation. As a result, when trying to reuse a separately developed T4 file to build a code generator, the user may run into unexpected naming collisions by simply including a file with an import directive. As a workaround, developers have to avoid using the import directive in T4 files intended for reuse and use fully qualified type names instead, making T4 code unnecessarily verbose and difficult to read. It would be nice for T4 to limit the scope of the import directive to the file in which it is used. The current limitation may be the result of the T4 engine merging the content of all included files into a single source file for compilation. This limitation could be eliminated by changing the T4 engine to keep included files in separate partial source files that are compiled together.
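A reusable file written with this workaround might look like the sketch below. The GetColumnNames helper and its sample values are hypothetical; the point is only that every type outside the System namespace is spelled out in full instead of relying on an import directive.

```csharp
<#+
    // Reusable file without any import directive, so that including it
    // cannot introduce naming collisions in other files.
    System.Collections.Generic.List<string> GetColumnNames()
    {
        System.Collections.Generic.List<string> names =
            new System.Collections.Generic.List<string>();
        names.Add("Id");    // sample data for illustration
        names.Add("Name");
        return names;
    }
#>
```

With a file-scoped import directive, the same helper could simply declare List<string> without affecting any other file.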
When more than one of the included files uses a template or an output directive, T4 currently uses the first directive, ignores the rest and displays a warning in the Error List of Visual Studio. This becomes a problem with the hostspecific parameter of the template directive. When a T4 component requires the GeneratedTextTransformation to be host-specific, we want to specify this in the T4 component that needs it. Unfortunately, this is not possible because multiple template directives will always conflict and whichever directive happens to come first will win. As a workaround, you have to use the template and output directives only in the top-most T4 file, which breaks encapsulation of a T4 component by moving its internal requirements outside of its definition. It would be nice for T4 to allow multiple, non-conflicting instances of the template and output directives. This way, if two T4 components set the hostspecific parameter of the template directive to true, there is no conflict, and the user can add a third template directive in their own T4 file to set the debug parameter without overwriting the hostspecific parameter. A warning or an error should be produced only when two instances of the same directive specify conflicting parameter values.
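In practice the workaround looks something like the following top-level template. ProjectInfo.tt is a hypothetical component that is assumed to use the Host property and therefore needs a host-specific transformation, yet it cannot declare that requirement itself.

```csharp
<#@ template language="C#" hostspecific="true" debug="true" #>
<#@ output extension=".cs" #>
<#
    // hostspecific="true" is set here only because the included
    // component needs the Host property. The debug parameter is this
    // file's own choice. Both must share the single template directive.
#>
<#@ include file="ProjectInfo.tt" #>
```

Nothing in this file uses the host directly, so the reason for the hostspecific parameter is visible only to someone who has read the included component.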
Code Generation Framework
T4 is currently designed to produce a single output file per template file. While this works well for the simplest code generation tasks, it is a significant limitation for real-world business application projects. Using DAL code generation as an example, the amount of generated code can reach tens of thousands of lines for a mid-size project and hundreds of thousands for a large project. Managing a single generated file of this size is very difficult. As a workaround, you can create multiple T4 files and break the generated code into several output files. However, having to maintain multiple T4 files for a single logical input increases complexity and the chance of errors, and doesn’t scale well with team size. To be relevant to the needs of application developers and architects on mid- to large-scale projects, T4 must allow generating multiple output files from a single code generation file.
Although it is possible to use proven object-oriented design techniques to make T4 templates reusable and extensible, the vast majority of the T4 templates available in Visual Studio SDK and on the Internet rely on making changes directly in the original source code for extensibility. This approach is very fragile and will result in a significant amount of rework required to maintain customizations as new versions of the widely adopted T4 code generators are released. It would be nice for T4 to define an explicit extensibility model and provide guidance on good template design.
It is my belief that the best way to accomplish these goals is by providing an object-oriented framework that can be used and extended inside of standard T4 code blocks. .NET framework design is a well-established engineering discipline. On the other hand, extending template syntax with specialized processing directives to accomplish these goals means extending the T4 programming language. Design of new programming languages is still closer to art than science; it is difficult to get right and takes a lot more time and effort.
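To make the idea of extending a generator inside standard code blocks more concrete, here is a small sketch that uses nothing beyond what T4 offers today. The EntityNaming and PrefixedEntityNaming classes are invented for this example; a real framework would define much richer extension points.

```csharp
<#@ template language="C#" #>
<#@ output extension=".cs" #>
<#
    // The customized behavior is selected here, without editing the
    // original EntityNaming class.
    EntityNaming naming = new PrefixedEntityNaming();
#>
public class <#= naming.GetClassName("Customer") #>
{
}
<#+
    // Hypothetical reusable part: a virtual method is the extension point.
    class EntityNaming
    {
        public virtual string GetClassName(string tableName)
        {
            return tableName;
        }
    }

    // Hypothetical customization: overrides the extension point instead
    // of changing the original source code.
    class PrefixedEntityNaming : EntityNaming
    {
        public override string GetClassName(string tableName)
        {
            return "Db" + base.GetClassName(tableName);
        }
    }
#>
```

In a real project the base class would live in a shared .tt file and the derived class in the consuming template, so a new version of the shared file could be dropped in without redoing the customization.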
Ready-to-use code generators
Developers will be happy to live with the current limitations of T4 if it provides enough immediate value in the form of ready-to-use code generators to justify the initial learning curve. Unfortunately, the limited number of ready-to-use code generators coming from Microsoft teams are currently embedded in heavy software factories. It would be nice for T4 to provide a number of ready-to-use code generators that don’t require additional investment. From an application development standpoint, it would be ideal if all code generators in Visual Studio were implemented in T4 or allowed using T4 to extend them.