Effective C# Item 32: Prefer Smaller, Cohesive Assemblies
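This item should really be named "Build assemblies that are a reasonable size and contain a small number of public types." But that is too long-winded, so I named it after the mistake I see most often: developers put everything but the kitchen sink into one assembly. (Translator's note: this is hyperbole; "everything but the kitchen sink" is an English idiom meaning almost everything imaginable.) That makes it hard to reuse the components inside and hard to update small parts of a system. Many small assemblies, each usable as a binary component, make both of these easier.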
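The title also draws attention to cohesion. An assembly's cohesion is the degree to which the responsibilities of a single component form one conceptual unit. A cohesive component can be summed up in one simple sentence, and you can see this in many of the .NET FCL assemblies. Two simple examples: the System.Collections assembly provides data structures for storing collections of related objects, and the System.Windows.Forms assembly provides classes that model Windows controls. Web Forms and Windows Forms live in different assemblies because they are unrelated. You should be able to describe your own assemblies the same way, in one simple sentence. No cheating: "The MyApplication assembly provides everything you need." Yes, that is one simple sentence, but it is also lazy, and My2ndApplication probably does not need all of that functionality (though you will probably want to reuse some of it, and that "some of it" belongs in its own assembly).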
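You should not create an assembly that contains only one public class. There is a middle ground. If you go too far and create too many assemblies, you lose some benefits of encapsulation. First, you lose the chance to use internal types, because related public classes are no longer packaged in the same assembly (see Item 33). (Translator's note: put simply, an internal type is accessible only within its own assembly; access from outside the assembly is restricted.) The JIT compiler can also inline more efficiently within an assembly than across assembly boundaries. So packaging related types in the same assembly works in your favor. The goal is to create the best-sized assembly for your component, and that goal is easier to reach when each component has a single responsibility.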
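In some sense, an assembly is the binary equivalent of a class. We use classes to encapsulate algorithms and data storage. Only the public interface is part of the official contract, so only the public interface is visible to users. Likewise, an assembly provides a binary package for a set of related classes, and only public and protected classes are visible outside that assembly. Utility classes can be internal to the assembly. They are indeed more widely visible than private nested classes, but you gain a mechanism for sharing common implementation inside the assembly without exposing it to every user. Partitioning an application into multiple assemblies encapsulates related types in a single package.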
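Second, using multiple assemblies makes many deployment options simpler. Consider a three-tier application in which one part runs as a smart client and another part runs on the server. You supply some validation rules on the client so that users get feedback as they enter or edit data. You repeat those rules on the server and combine them with other rules to make validation more robust. The complete set of business rules lives on the server, and each client holds only a subset.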
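Yes, you could reuse the source files to build different assemblies for the client-side and server-side business rules, but that complicates your deployment. Whenever you update the rules, you have two installations to complete. Instead, separate the client-side validation from the stricter server-side validation and package them in different assemblies. That way you reuse binary objects packaged as assemblies, which is much better than reusing source code or object code by recompiling it into multiple assemblies.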
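An assembly should be an organized library of related functionality. That is a familiar platitude, but it is hard to achieve in practice. In a distributed application you may not know in advance which classes will be deployed to both the server and the client. Even when you do, the split between server-side and client-side functionality is likely to be fluid, and you will probably move features between the two. Keeping assemblies as small as possible makes it easier to redeploy on both server and client. An assembly is a binary building block of your application, which makes it easy to plug a new component into a working application. If you are going to err, creating too many small assemblies is much easier to deal with than creating a few very large ones.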
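I often compare assemblies and binary components to Lego bricks. You can easily pull out one brick and replace it with another. In the same way, you should be able to pull out one assembly and replace it with a new one that has the same interfaces, and the rest of the program should keep running as before. To push the analogy a little further: if all of your parameters and return values are interfaces, any assembly can easily be replaced by another that implements the same interfaces (see Item 19).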
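Smaller assemblies also let you amortize the cost of application startup. A larger assembly takes more CPU time to load and more time to compile the necessary IL into machine instructions. Only the code needed at startup is JIT-compiled, but an assembly is loaded as a whole, and the CLR creates a stub for every method in it.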
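Time to pause and make sure we do not go to extremes. This item is about not building a single monolithic program, and instead building systems out of reusable binary components. Do not take it to the opposite extreme. A large application built from too many small assemblies has costs of its own. If your program uses too many assemblies, crossing assembly boundaries adds overhead. The CLR loader also has a little extra work to do when it loads more assemblies and turns IL into machine instructions, in particular resolving function entry addresses.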
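Crossing assembly boundaries also incurs extra security checks. All code in the same assembly has the same level of trust (not necessarily the same access rights, but the same trust level). Whenever a call leaves one assembly for another, the CLR performs some security checks. The less time your program spends crossing assembly boundaries, the more efficient it will be.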
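None of these performance notes is meant to dissuade you from splitting an assembly that is too large into smaller ones. The performance cost is minor. C# and .NET were designed around components, and the greater flexibility is usually worth more.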
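So how do you decide how much code, or how many classes, to put in one assembly? More important, how do you decide which code belongs in an assembly? That depends heavily on the specific application, so there is no single answer. Here is my recommendation: start by looking at all your public classes, and combine public classes that share a common base class into one assembly. Then add the utility classes needed to provide all the functionality associated with those public classes. Package related public interfaces into their own assembly. As a final step, look for classes that are used horizontally across the application. They are candidates for a broadly used utility assembly that holds your application's utility library.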
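The end result is a component built around a single related set of public classes plus the utility classes that support them. You get an assembly small enough to benefit from easy updates and reuse while still minimizing the costs of having multiple assemblies. A well-designed, cohesive component can be described in one sentence. For example, "Common.Storage.dll manages the offline data cache and all user settings" describes a component with low cohesion. Instead, make two components: "Common.Data.dll manages the offline data cache. Common.Settings.dll manages user settings." Once you have split them, you may need a third component: "Common.EncryptedStorage.dll manages file system IO for encrypted local storage." You can then update any of the three components independently.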
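Small is a relative term. Mscorlib.dll is roughly 2MB, while System.Web.RegularExpressions.dll is only 56KB. Yet both meet the core design goal of a small, reusable assembly: each contains a related set of classes and interfaces. The difference in absolute size comes from the difference in functionality. mscorlib.dll contains the lowest-level classes that every application needs. System.Web.RegularExpressions.dll is very specific; it contains only the regular expression classes used by Web controls. You will create both kinds of component: small, focused assemblies for one specific feature, and larger, broadly used assemblies that contain common functionality. In either case, make them as small as is reasonable, but no smaller.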
======================
Item 32: Prefer Smaller, Cohesive Assemblies
This item should really be titled "Build Assemblies That Are the Right Size and Contain a Small Number of Public Types." But that's too wordy, so I titled it based on the most common mistake I see: developers putting everything but the kitchen sink in one assembly. That makes it hard to reuse components and harder to update parts of a system. Many smaller assemblies make it easier to use your classes as binary components.
The title also highlights the importance of cohesion. Cohesion is the degree to which the responsibilities of a single component form a meaningful unit. Cohesive components can be described in a single simple sentence. You can see this in many of the .NET FCL assemblies. Two examples are: the System.Collections assembly provides data structures for storing sets of related objects and the System.Windows.Forms assembly provides classes that model Windows controls. Web forms and Windows Forms are in different assemblies because they are not related. You should be able to describe your own assemblies in the same fashion using one simple sentence. No cheating: The MyApplication assembly provides everything you need. Yes, that's a single sentence. But it's also lazy, and you probably don't need all of that functionality in My2ndApplication (though you'd probably like to reuse some of it. That "some of it" should be packaged in its own assembly).
You should not create assemblies with only one public class. You do need to find the middle ground. If you go too far and create too many assemblies, you lose some benefits of encapsulation: You lose the benefits of internal types by not packaging related public classes in the same assembly (see Item 33). The JIT compiler can perform more efficient inlining inside an assembly than across assembly boundaries. This means that packaging related types in the same assembly is to your advantage. Your goal is to create the best-sized package for the functionality you are delivering in your component. This goal is easier to achieve with cohesive components: Each component should have one responsibility.
In some sense, an assembly is the binary equivalent of a class. We use classes to encapsulate algorithms and data storage. Only the public interfaces are part of the official contract, so only the public interfaces are visible to users. In the same sense, assemblies provide a binary package for a related set of classes. Only public and protected classes are visible outside an assembly. Utility classes can be internal to the assembly. Yes, they are more visible than private nested classes, but you have a mechanism to share common implementation inside that assembly without exposing that implementation to all users of your classes. Partitioning your application into multiple assemblies encapsulates related types in a single package.
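To make these visibility rules concrete, here is a minimal sketch. The `Orders` namespace, `Invoice`, and `MoneyRounding` are hypothetical names invented for illustration, and the namespace is assumed to be compiled into a single assembly (say, Orders.dll).

```csharp
using System;
using System.Collections.Generic;

// Assumption: everything in this namespace is compiled into one hypothetical
// assembly, Orders.dll. The names are illustrative, not from the original text.
namespace Orders
{
    // internal: shared by every type inside Orders.dll, hidden from its users.
    internal static class MoneyRounding
    {
        internal static decimal ToCents(decimal amount) =>
            Math.Round(amount, 2, MidpointRounding.AwayFromZero);
    }

    // public: part of the assembly's official contract.
    public sealed class Invoice
    {
        private readonly List<decimal> lines = new List<decimal>();

        public void AddLine(int quantity, decimal unitPrice)
        {
            if (quantity <= 0)
                throw new ArgumentOutOfRangeException(nameof(quantity));
            // The shared helper is used here without being exposed to callers.
            lines.Add(MoneyRounding.ToCents(quantity * unitPrice));
        }

        public decimal Total
        {
            get
            {
                decimal sum = 0m;
                foreach (decimal line in lines) sum += line;
                return MoneyRounding.ToCents(sum);
            }
        }
    }
}
```

Code in another assembly can create an `Invoice` and call `AddLine`, but a reference to `MoneyRounding` from outside the assembly would not compile.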
Second, using multiple assemblies makes a number of different deployment options easier. Consider a three-tiered application, in which part of the application runs as a smart client and part of the application runs on the server. You supply some validation rules on the client so that users get feedback as they enter or edit data. You replicate those rules on the server and combine them with other rules to provide more robust validation. The complete set of business rules is implemented at the server, and only a subset is maintained at each client.
Sure, you could reuse the source code and create different assemblies for the client and server-side business rules, but that would complicate your delivery mechanism. That leaves you with two builds and two installations to perform when you update the rules. Instead, separate the client-side validation from the more robust server-side validation by placing them in different assemblies. You are reusing binary objects, packaged in assemblies, rather than reusing object code or source code by compiling those objects into the multiple assemblies.
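One way to picture this split is sketched below. All names (`Validation.Core`, `Validation.Server`, `Order`, the rule classes) are hypothetical, and each namespace is assumed to be built as its own assembly: the core assembly ships to both client and server, while the server assembly references it and adds the stricter rules.

```csharp
using System;
using System.Collections.Generic;

// Assumed Validation.Core.dll: deployed to the client AND the server.
namespace Validation.Core
{
    public sealed class Order
    {
        public Order(int quantity, decimal unitPrice)
        {
            Quantity = quantity;
            UnitPrice = unitPrice;
        }
        public int Quantity { get; }
        public decimal UnitPrice { get; }
    }

    public interface IOrderRule
    {
        string Message { get; }
        bool IsSatisfiedBy(Order order);
    }

    public sealed class PositiveQuantityRule : IOrderRule
    {
        public string Message => "Quantity must be positive.";
        public bool IsSatisfiedBy(Order order) => order.Quantity > 0;
    }

    public static class ClientRules
    {
        // The subset of rules that gives users immediate feedback.
        public static IReadOnlyList<IOrderRule> All { get; } =
            new IOrderRule[] { new PositiveQuantityRule() };
    }

    public static class RuleRunner
    {
        // Returns the messages of every rule the order fails.
        public static List<string> Validate(Order order, IEnumerable<IOrderRule> rules)
        {
            var failures = new List<string>();
            foreach (IOrderRule rule in rules)
                if (!rule.IsSatisfiedBy(order)) failures.Add(rule.Message);
            return failures;
        }
    }
}

// Assumed Validation.Server.dll: deployed to the server only,
// with a binary reference to Validation.Core.dll.
namespace Validation.Server
{
    using Validation.Core;

    public sealed class CreditLimitRule : IOrderRule
    {
        private readonly decimal limit;
        public CreditLimitRule(decimal limit) { this.limit = limit; }
        public string Message => "Order exceeds the credit limit.";
        public bool IsSatisfiedBy(Order order) =>
            order.Quantity * order.UnitPrice <= limit;
    }

    public static class ServerRules
    {
        // Complete rule set: the shared client rules plus server-only rules.
        public static IReadOnlyList<IOrderRule> For(decimal creditLimit)
        {
            var rules = new List<IOrderRule>(ClientRules.All);
            rules.Add(new CreditLimitRule(creditLimit));
            return rules;
        }
    }
}
```

Under these assumptions, updating a shared rule means redeploying one binary to both tiers, and the server rules can change without touching the client at all.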
An assembly should contain an organized library of related functionality. That's an easy platitude, but it's much harder to implement in practice. The reality is that you might not know beforehand which classes will be distributed to both the server and client portions of a distributed application. Even more likely, the set of server- and client-side functionality will be somewhat fluid; you'll move features between the two locations. By keeping the assemblies small, you'll be more likely to redeploy more easily on both client and server. The assembly is a binary building block for your application. That makes it easier to plug a new component into place in a working application. If you make a mistake, make too many smaller assemblies rather than too few large ones.
I often use Legos as an analogy for assemblies and binary components. You can pull out one Lego and replace it easily; it's a small block. In the same way, you should be able to pull out one assembly and replace it with another assembly that has the same interfaces. The rest of the application should continue as if nothing happened. Follow the Lego analogy a little further. If all your parameters and return values are interfaces, any assembly can be replaced by another that implements the same interfaces (see Item 19).
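The Lego idea can be sketched as follows. Every name here (`Contracts`, `IDataCache`, `DictionaryCache`, `NoCache`, `ProfileService`) is hypothetical, and each namespace is assumed to be a separate assembly. The application depends only on the interface, so either cache assembly can be swapped in.

```csharp
using System;
using System.Collections.Generic;

// Assumed Contracts.dll: holds only the shared interface.
namespace Contracts
{
    public interface IDataCache
    {
        void Put(string key, string value);
        string GetOrDefault(string key, string fallback);
    }
}

// Assumed Cache.InMemory.dll: one implementation of the contract.
namespace Cache.InMemory
{
    using Contracts;

    public sealed class DictionaryCache : IDataCache
    {
        private readonly Dictionary<string, string> store =
            new Dictionary<string, string>();

        public void Put(string key, string value) => store[key] = value;

        public string GetOrDefault(string key, string fallback) =>
            store.TryGetValue(key, out var value) ? value : fallback;
    }
}

// Assumed Cache.Disabled.dll: a drop-in replacement that caches nothing.
namespace Cache.Disabled
{
    using Contracts;

    public sealed class NoCache : IDataCache
    {
        public void Put(string key, string value) { }
        public string GetOrDefault(string key, string fallback) => fallback;
    }
}

// Assumed application assembly: knows only about Contracts.dll.
namespace App
{
    using Contracts;

    public sealed class ProfileService
    {
        private readonly IDataCache cache;

        public ProfileService(IDataCache cache)
        {
            this.cache = cache ?? throw new ArgumentNullException(nameof(cache));
        }

        public string DisplayName(string userId)
        {
            string cached = cache.GetOrDefault(userId, string.Empty);
            if (cached.Length > 0) return cached;

            string name = "user-" + userId; // stands in for an expensive lookup
            cache.Put(userId, name);
            return name;
        }
    }
}
```

`ProfileService` behaves the same whichever implementation it receives, which is the property that lets one assembly be pulled out and another plugged in.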
Smaller assemblies also let you amortize the cost of application startup. The larger an assembly is, the more work the CPU does to load the assembly and convert the necessary IL into machine instructions. Only the routines called at startup are JITed, but the entire assembly gets loaded and the CLR creates stubs for every method in the assembly.
Time to take a break and make sure we don't go to extremes. This item is about making sure that you don't create single monolithic programs, but that you build systems of binary, reusable components. You can take this advice too far. Some costs are associated with a large program built on too many small assemblies. You will incur a performance penalty when program flow crosses assembly boundaries. The CLR loader has a little more work to do to load many assemblies and turn IL into machine instructions, particularly resolving function addresses.
Extra security checks also are done across assembly boundaries. All code from the same assembly has the same level of trust (not necessarily the same access rights, but the same trust level). The CLR performs some security checks whenever code flow crosses an assembly boundary. The fewer times your program flow crosses assembly boundaries, the more efficient it will be.
None of these performance concerns should dissuade you from breaking up assemblies that are too large. The performance penalties are minor. C# and .NET were designed with components in mind, and the greater flexibility is usually worth the price.
So how do you decide how much code or how many classes go in one assembly? More important, how do you decide which code goes in an assembly? It depends greatly on the specific application, so there is not one answer. Here's my recommendation: Start by looking at all your public classes. Combine public classes with common base classes into assemblies. Then add the utility classes necessary to provide all the functionality associated with the public classes in that same assembly. Package related public interfaces into their own assemblies. As a final step, look for classes that are used horizontally across your application. Those are candidates for a broad-based utility assembly that contains your application's utility library.
The end result is that you create a component with a single related set of public classes and the utility classes necessary to support it. You create an assembly that is small enough to get the benefits of easy updates and easier reuse, while still minimizing the costs associated with multiple assemblies. Well-designed, cohesive components can be described in one simple sentence. For example, "Common.Storage.dll manages the offline data cache and all user settings" describes a component with low cohesion. Instead, make two components: "Common.Data.dll manages the offline data cache. Common.Settings.dll manages user settings." When you've split those up, you might need a third component: "Common.EncryptedStorage.dll manages file system IO for encrypted local storage." You can update any of those three components independently.
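As a rough illustration of the split, the sketch below gives each of the first two components one responsibility. The class names `OfflineCache` and `UserSettings` and their members are assumptions made for this example, each namespace is assumed to be built as the DLL of the same name, and the encrypted storage component is left out.

```csharp
using System;
using System.Collections.Generic;

// Assumed Common.Data.dll: "manages the offline data cache."
namespace Common.Data
{
    public sealed class OfflineCache
    {
        private readonly Dictionary<string, byte[]> entries =
            new Dictionary<string, byte[]>();

        public int Count => entries.Count;

        // Stores a private copy so later changes by the caller do not leak in.
        public void Store(string key, byte[] payload) =>
            entries[key] = (byte[])payload.Clone();

        public bool Contains(string key) => entries.ContainsKey(key);

        public void Evict(string key) => entries.Remove(key);
    }
}

// Assumed Common.Settings.dll: "manages user settings."
namespace Common.Settings
{
    public sealed class UserSettings
    {
        // Setting names are treated as case-insensitive in this sketch.
        private readonly Dictionary<string, string> values =
            new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        public void Set(string name, string value) => values[name] = value;

        public string Get(string name, string fallback) =>
            values.TryGetValue(name, out var value) ? value : fallback;
    }
}
```

Neither type refers to the other, so a change to how settings are handled would not force a new build of the cache assembly.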
Small is a relative term. Mscorlib.dll is roughly 2MB; System.Web.RegularExpressions.dll is merely 56KB. But both satisfy the core design goal of a small, reusable assembly: They contain a related set of classes and interfaces. The difference in absolute size has to do with the difference in functionality: mscorlib.dll contains all the low-level classes you need in every application. System.Web.RegularExpressions.dll is very specific; it contains only those classes needed to support regular expressions in Web controls. You will create both kinds of components: small, focused assemblies for one specific feature and larger, broad-based assemblies that contain common functionality. In either case, make them as small as is reasonable, but not smaller.
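If you want a rough feel for how many public types an assembly exposes, reflection can count them. The helper below is a hypothetical sketch (`AssemblySurvey` is not part of any library), and the count it reports depends on the runtime: on current .NET versions, for example, `typeof(object)` lives in System.Private.CoreLib rather than mscorlib.

```csharp
using System;
using System.Reflection;

// Hypothetical helper for inspecting the public surface of an assembly.
public static class AssemblySurvey
{
    // Number of public types visible outside the assembly that defines `marker`.
    public static int PublicTypeCount(Type marker) =>
        marker.Assembly.GetExportedTypes().Length;

    // Short summary line: "<assembly name>: <n> public types".
    public static string Describe(Type marker)
    {
        Assembly assembly = marker.Assembly;
        return assembly.GetName().Name + ": "
            + PublicTypeCount(marker) + " public types";
    }
}
```

Calling `AssemblySurvey.Describe(typeof(object))` summarizes the core library; passing a type from one of your own assemblies summarizes that assembly instead. A count alone says nothing about cohesion, but an unexpectedly large one is a hint to check whether the assembly still fits a one-sentence description.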