500 Lines or Less: A Template Engine

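Introduction: Most programs contain a lot of logic and only a little literal text, and programming languages are designed to be good at that kind of program. Some tasks, though, involve only a little logic and a great deal of text. For those tasks we want a tool better suited to text-heavy problems, and a template engine is such a tool. In this chapter, we build a simple template engine.
The most common example of a text-heavy task is a web application. An important part of any web application is generating the HTML sent to the browser. Very few HTML pages are completely static: most contain at least some dynamic data, such as the user's name, and usually a great deal of it: product listings, friends' updates, and so on.
At the same time, every HTML page contains a lot of static text, and pages can be large, tens of thousands of bytes of text. Web developers therefore have a problem to solve: what is the best way to generate a large string that mixes static and dynamic data?
To make this concrete, suppose we want to produce this HTML: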
<p>Welcome, Charlie!</p>
<p>Products:</p>
<ul>
    <li>Apple: $1.00</li>
    <li>Fig: $1.50</li>
    <li>Pomegranate: $3.25</li>
</ul>
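Here the user's name is dynamic, and so are the product names and prices. Even the number of products isn't fixed: at another moment there may be more or fewer products to display.
One way to build this HTML is to keep string constants in our code and join them together to produce the page. Dynamic data is inserted with string substitution. Some of the dynamic data is repetitive, like the product list, which means chunks of HTML repeat; those have to be handled separately and then combined with the rest of the page.
Producing the page this way might look like this: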
# The main HTML for the whole page.
PAGE_HTML = """
<p>Welcome, {name}!</p>
<p>Products:</p>
<ul>
{products}
</ul>
"""

# The HTML for each product displayed.
PRODUCT_HTML = "<li>{prodname}: {price}</li>\n"

def format_price(price):
    # Helper assumed here so the example runs: 1.5 -> "$1.50"
    return "$%.2f" % price

def make_page(username, products):
    product_html = ""
    for prodname, price in products:
        product_html += PRODUCT_HTML.format(
            prodname=prodname, price=format_price(price))
    html = PAGE_HTML.format(name=username, products=product_html)
    return html
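This works, but it is a mess. The HTML is split into multiple string constants embedded in the application code, so the logic of the page is hard to see: one complete page is broken into separate pieces. To change the page, a front-end developer would have to edit Python code. Imagine what the code would look like if the page were ten or a hundred times more complicated; this approach would quickly become unworkable.
Templates:
A better way to produce HTML pages is with templates. The page is authored as a template, meaning the file is mostly static HTML, with dynamic pieces embedded in it using special notation. Our toy page above could look like this as a template: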
<p>Welcome, {{user_name}}!</p>
<p>Products:</p>
<ul>
{% for product in product_list %}
    <li>{{ product.name }}:
        {{ product.price|format_price }}</li>
{% endfor %}
</ul>
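Here the focus is on the HTML text, with the logic embedded in the HTML. Contrast this text-centric approach with the logic-centric code above: the earlier program was mostly Python code with the HTML embedded in it, whereas now it is mostly static HTML markup.
How does the programming language inside a template work? In Python, most of a source file is executable code; if you need literal static text, you embed it in a string literal: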
def hello():
    print("Hello, world!")
 
hello()
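When Python reads this source file, it interprets text such as def hello(): as instructions to execute. The double quotes in print("Hello, world!") mark the text inside them as literal. This is how most programming languages work: the code is mostly dynamic, and static text is marked by enclosing it in quotes.
Template files are the reverse: they are mostly static page text, with special notation marking the executable, dynamic parts: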
<p>Welcome, {{user_name}}!</p>
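So in this HTML the text is literal until {{ marks the switch into dynamic mode, and the value of the user_name variable is substituted into the output.
In Python, an expression such as "foo = {foo}!".format(foo=17) creates text from a string literal plus the data to insert into it. Templates take a similar approach.
To use HTML templates, we need a template engine. The engine takes a template together with data and combines them: it interprets the template and replaces the dynamic parts with real data to produce the final HTML.
In Python, the following three forms mean different things: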
dict["key"]
obj.attr
obj.method()
In the template syntax, however, all three are written with a dot:
dict.key
obj.attr
obj.method
For example:
<p>The price is: {{product.price}}, with a {{product.discount}}% discount.</p>
Data can also be modified with filters, which are applied with a pipe character (|):
<p>Short name: {{story.subject|slugify|lower}}</p>
Conditionals:
{% if user.is_logged_in %}
    <p>Welcome, {{ user.name }}!</p>
{% endif %}
Loops:
<p>Products:</p>
<ul>
{% for product in product_list %}
    <li>{{ product.name }}: {{ product.price|format_price }}</li>
{% endfor %}
</ul>
 
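There are two broad ways to implement a template engine: (1) interpretation, where the template is parsed into a data structure that is walked every time a page is rendered, and (2) compilation, where the template is turned into Python code once and that code is run for each render. The author uses compilation here.
First, look at this HTML template: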
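A toy contrast between the two approaches, handling only {{name}} substitutions (no dots, filters, or tags). The functions interpret and compile_template are hypothetical names for this sketch, not part of the engine built here: the first re-scans the template on every render, while the second turns the template into Python source once, runs it with exec, and reuses the resulting function.

```python
import re

TOKEN_RE = r"({{.*?}})"   # capture {{...}} so re.split keeps the tags

def interpret(template, context):
    """Interpretation: walk the tokens again on every render."""
    out = []
    for token in re.split(TOKEN_RE, template):
        if token.startswith("{{"):
            out.append(str(context[token[2:-2].strip()]))
        else:
            out.append(token)
    return "".join(out)

def compile_template(template):
    """Compilation: build Python source once, exec it, return the function."""
    lines = ["def render(context):", "    result = []"]
    for token in re.split(TOKEN_RE, template):
        if token.startswith("{{"):
            name = token[2:-2].strip()
            lines.append("    result.append(str(context[%r]))" % name)
        elif token:
            lines.append("    result.append(%r)" % token)
    lines.append("    return ''.join(result)")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["render"]

tpl = "<p>Welcome, {{user_name}}!</p>"
render = compile_template(tpl)               # parse once...
print(interpret(tpl, {"user_name": "Charlie"}))
print(render({"user_name": "Charlie"}))      # ...then render many times
```

The compiled version pays the parsing cost once, which is why it suits templates that are rendered many times.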
<p>Welcome, {{user_name}}!</p>
<p>Products:</p>
<ul>
{% for product in product_list %}
    <li>{{ product.name }}:
        {{ product.price|format_price }}</li>
{% endfor %}
</ul>
Here is Python code that produces the HTML above (the kind of function our engine will generate from the template):
def render_function(context, do_dots):
    c_user_name = context['user_name']
    c_product_list = context['product_list']
    c_format_price = context['format_price']

    result = []
    append_result = result.append
    extend_result = result.extend
    to_str = str

    extend_result([
        '<p>Welcome, ',
        to_str(c_user_name),
        '!</p>\n<p>Products:</p>\n<ul>\n'
    ])
    for c_product in c_product_list:
        extend_result([
            '\n    <li>',
            to_str(do_dots(c_product, 'name')),
            ':\n        ',
            to_str(c_format_price(do_dots(c_product, 'price'))),
            '</li>\n'
        ])
    append_result('\n</ul>\n')
    return ''.join(result)
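In this code, each piece of the HTML is added to a list, and the pieces are joined into one complete string at the end with ''.join(result).
Now let's write the engine.
The engine class: first, construct a Templite instance from the template text, then call its render() method to produce the finished page.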
# Make a Templite object.
templite = Templite('''
    <h1>Hello {{name|upper}}!</h1>
    {% for topic in topics %}
        <p>You are interested in {{topic}}.</p>
    {% endfor %}
    ''',
    {'upper': str.upper},
)
# Later, use it to render some data.
text = templite.render({
    'name': "Ned",
    'topics': ['Python', 'Geometry', 'Juggling'],
})
 
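The code builder:
As described earlier, the engine turns the dynamic parts of the template into Python code, runs that code, and produces the data shown in the page. The code builder, CodeBuilder, is what generates that executable Python code.
Each CodeBuilder starts by initializing two attributes: self.code and self.indent_level. code stores the Python source, and indent_level tracks the current indentation, starting at 0.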
class CodeBuilder(object):
    """Build source code conveniently."""
 
    def __init__(self, indent=0):
        self.code = []
        self.indent_level = indent
add_line appends one line to self.code: indent_level spaces, then the line itself, then a newline. This matches Python's syntax, in which indentation marks nesting levels.
    def add_line(self, line):
        """Add a line of source to the code.

        Indentation and newline will be added for you, don't provide them.

        """
        self.code.extend([" " * self.indent_level, line, "\n"])
 
indent and dedent adjust the indentation. The indentation step, INDENT_STEP, is set to 4: indent increases indent_level by 4, and dedent decreases it by 4.
    INDENT_STEP = 4      # PEP8 says so!
 
    def indent(self):
        """Increase the current indent for following lines."""
        self.indent_level += self.INDENT_STEP
 
    def dedent(self):
        """Decrease the current indent for following lines."""
        self.indent_level -= self.INDENT_STEP
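add_section adds a nested section of code. It creates a new CodeBuilder with the current indent level, appends it to self.code, and returns it. Because the section is stored as an object rather than as text, lines can be added to it later and will still appear at the position where the section was created.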
    def add_section(self):
        """Add a section, a sub-CodeBuilder."""
        section = CodeBuilder(self.indent_level)
        self.code.append(section)
        return section
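Finally, __str__ gives the builder's string form: it joins everything in self.code into one string, calling str() on each item so that nested sections are expanded as well.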
    def __str__(self):
        return "".join(str(c) for c in self.code)
get_globals executes the generated Python code. It gets the complete source by calling str(self), which invokes __str__.
 
    def get_globals(self):
        """Execute the code, and return a dict of globals it defines."""
        # A check that the caller really finished all the blocks they started.
        assert self.indent_level == 0
        # Get the Python source as a single string.
        python_source = str(self)
        # Execute the source, defining globals, and return them.
        global_namespace = {}
        exec(python_source, global_namespace)
        return global_namespace
exec() runs the Python source, and the global names it defines are stored in the global_namespace dictionary. Here is an example:
python_source = """\
SEVENTEEN = 17

def three():
    return 3
"""
global_namespace = {}
exec(python_source, global_namespace)
After this, global_namespace['SEVENTEEN'] is 17, and global_namespace['three'] is the function three, so calling global_namespace['three']() returns 3.
Next, let's check what CodeBuilder does:
code = CodeBuilder()
code.add_line("def render_function(context, do_dots):")
code.indent()
vars_code = code.add_section()
code.add_line("result = []")
code.add_line("if 'a' in result:")
code.indent()
code.add_line("pass")
code.dedent()
code.add_line("append_result = result.append")
code.add_line("extend_result = result.extend")
code.add_line("to_str = str")
print(code)
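Call indent whenever the following lines need more indentation, then add_line to insert code; when the indented block is finished, call dedent to leave it and continue adding code.
Output: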
def render_function(context, do_dots):
    result = []
    if 'a' in result:
        pass
    append_result = result.append
    extend_result = result.extend
    to_str = str
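For convenience, here are the CodeBuilder fragments above assembled into one class (method bodies as in the snippets, comments shortened), followed by a small check that also shows the point of add_section: a section created early can be filled in after later lines have been added, and its lines still appear where the section was created.

```python
class CodeBuilder(object):
    """Build source code conveniently."""

    INDENT_STEP = 4      # PEP8 says so!

    def __init__(self, indent=0):
        self.code = []
        self.indent_level = indent

    def add_line(self, line):
        """Add a line of source, with indentation and a trailing newline."""
        self.code.extend([" " * self.indent_level, line, "\n"])

    def add_section(self):
        """Add a sub-CodeBuilder whose lines can be filled in later."""
        section = CodeBuilder(self.indent_level)
        self.code.append(section)
        return section

    def indent(self):
        """Increase the current indent for following lines."""
        self.indent_level += self.INDENT_STEP

    def dedent(self):
        """Decrease the current indent for following lines."""
        self.indent_level -= self.INDENT_STEP

    def __str__(self):
        # Sections are CodeBuilders, so str(c) expands them recursively.
        return "".join(str(c) for c in self.code)

    def get_globals(self):
        """Execute the code, and return a dict of globals it defines."""
        assert self.indent_level == 0
        python_source = str(self)
        global_namespace = {}
        exec(python_source, global_namespace)
        return global_namespace


code = CodeBuilder()
code.add_line("def render_function(context, do_dots):")
code.indent()
vars_code = code.add_section()
code.add_line("result = []")
code.add_line("return ''.join(result)")
code.dedent()
# Filled in *after* the lines above, yet printed before them.
vars_code.add_line("c_name = context['name']")
print(code)
```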
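Now the template class, Templite:
__init__ receives text and *contexts: text is the template text, and *contexts collects any dictionaries passed in at construction time (such as filter functions), which are merged into self.context.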
    def __init__(self, text, *contexts):
        self.context = {}
        for context in contexts:
            self.context.update(context)
For example, with the initialization below, contexts is the tuple ({'upper': str.upper},), so self.context becomes {'upper': str.upper}:
Templite('''
    <h1>Hello {{name|upper}}!</h1>
    {% for topic in topics %}
        <p>You are interested in {{topic}}.</p>
    {% endfor %}
    ''',
    {'upper': str.upper},
)
Two sets are defined: all_vars stores every variable name used in the template, and loop_vars stores the names defined by loops.
self.all_vars = set()
self.loop_vars = set()
Next, a CodeBuilder is instantiated, and the code for a function named render_function is started in it:
code = CodeBuilder()
code.add_line("def render_function(context, do_dots):")
code.indent()
vars_code = code.add_section()
code.add_line("result = []")
code.add_line("append_result = result.append")
code.add_line("extend_result = result.extend")
code.add_line("to_str = str")
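1. First, the definition of render_function is added.
2. Then the code is indented and a section, vars_code, is created. Later, the lines that extract variables from the context will be added to this section.
3. Finally come four fixed lines: they create the result list, shortcuts for its append and extend methods, and to_str as a shortcut for str.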
 
 
The buffered list and flush_output:
        buffered = []
        def flush_output():
            """Force `buffered` to the code builder."""
            if len(buffered) == 1:
                code.add_line("append_result(%s)" % buffered[0])
            elif len(buffered) > 1:
                code.add_line("extend_result([%s])" % ", ".join(buffered))
            del buffered[:]
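buffered collects pieces of output (variable expressions and page markup) that have not yet been written to the code. The parsing code below shows how tokens such as {{ and {% are handled. A token starting with {{ is certainly an expression, so it is stored with buffered.append("to_str(%s)" % expr).
A token that starts with none of {{, {% or {# (neither an expression, a control tag, nor a comment) must be literal page text, so buffered.append(repr(token)) adds it to buffered.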
 
Now the purpose of flush_output. When buffered holds exactly one item, the generated code uses append_result; when it holds more than one, it uses extend_result. Why the two forms? Consider this example:
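result = []
buffer = []
buffer.append('<h1>Hello')
buffer.append('!</h1>')
buffer.append('name')
# Join the buffered pieces into one string; append() adds it as one element.
result.append(', '.join(buffer))
del buffer[:]
buffer.append('topic')
# append() adds the list object itself as a single (nested) element.
result.append(buffer)
print(result)
Output:
['<h1>Hello, !</h1>, name', ['topic']]
append adds a single object to the list, whereas extend adds each element of a sequence to the list. That is why appending buffer after buffer.append('topic') adds the nested list ['topic'].
If the last step is changed to result.extend(buffer), the output becomes: ['<h1>Hello, !</h1>, name', 'topic']
For the generated code this means append_result(x) and extend_result([x]) add exactly the same thing; flush_output uses append_result for a single item only to avoid building a throwaway one-element list, and extend_result to add several items in one call.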
 
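Finally, where flush_output is called. flush_output is a closure: it uses buffered and code from the enclosing __init__. When the parser reaches text starting with {% (a control tag), it calls flush_output first, so that the output buffered so far is written out as code before the control statement.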
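A standalone sketch of the buffering logic, assuming a plain list named lines stands in for the CodeBuilder (so code.add_line becomes lines.append). Literal text is buffered as repr(token) and expressions as to_str(...); each flush turns the buffer into one line of generated code.

```python
lines = []      # stands in for the CodeBuilder in this sketch
buffered = []

def flush_output():
    """Turn the buffered pieces into a single line of generated code."""
    if len(buffered) == 1:
        lines.append("append_result(%s)" % buffered[0])
    elif len(buffered) > 1:
        lines.append("extend_result([%s])" % ", ".join(buffered))
    # Empty the list in place; assigning buffered = [] here would
    # make buffered a new local variable instead.
    del buffered[:]

# "<h1>Hello {{name}}!</h1>" before a {% ... %} tag: three pieces, one extend.
buffered.append(repr("<h1>Hello "))
buffered.append("to_str(c_name)")
buffered.append(repr("!</h1>"))
flush_output()

# A lone piece of text: a single append, no one-item list.
buffered.append(repr("</ul>"))
flush_output()

for line in lines:
    print(line)
# extend_result(['<h1>Hello ', to_str(c_name), '!</h1>'])
# append_result('</ul>')
```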
 
 
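Now we reach the core of the code: parsing the template text.
1. tokens = re.split(r"(?s)({{.*?}}|{%.*?%}|{#.*?#})", text) uses a regular expression to split the template text. (?s) is an inline modifier for single-line (DOTALL) mode: it changes the meaning of . so that it matches every character, including newlines.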
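Commonly cited regular-expression modifiers:
(?i): case-insensitive matching.
(?s): single-line (DOTALL) mode; . matches every character, including newlines.
(?m): multi-line mode; ^ and $ match at the start and end of every line, not only at the start and end of the whole string. (In this mode, $ matches just before each newline and at the end of the string.)
(?x): verbose mode; whitespace in the pattern is ignored unless it is escaped.
The remaining items are PHP (PCRE) pattern modifiers, which PHP writes after the closing delimiter; Python's re module does not accept them as inline flags:
e: only affects the replacement, which is evaluated as PHP code.
A: the pattern must match at the start of the subject string; for example, "/a/A" matches "abcd".
E: the opposite of m; $ matches only at the very end of the string, not before a trailing newline (PHP spells this modifier D).
U: ungreedy mode; it inverts quantifier greediness, so quantifiers behave as if followed by ?.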
The split separates the different kinds of content in the page. Take this template text:
text = """
    <h1>Hello {{name|upper}}!</h1>
            {% for topic in topics %}
                <p>You are interested in {{topic}}.</p>
            {% endfor %}
"""
Splitting it gives the following pieces (whitespace around each piece is not shown):
<h1>Hello
{{name|upper}}
!</h1>
{% for topic in topics %}
<p>You are interested in
{{topic}}
.</p>
{% endfor %}
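The split can be tried directly. Because the whole pattern is wrapped in a capturing group, re.split keeps the matched tags in the result, alternating with the literal text around them; two adjacent tags leave an empty string between them. The last two lines show why (?s) matters for a tag that spans a newline. The sample template here is made up for illustration.

```python
import re

text = """
<h1>Hello {{name|upper}}!</h1>
{% for topic in topics %}<p>{{topic}}</p>{% endfor %}{# a comment #}
"""

# The capturing group makes re.split return the tags as well as the text.
tokens = re.split(r"(?s)({{.*?}}|{%.*?%}|{#.*?#})", text)
for token in tokens:
    print(repr(token))

# With (?s), "." also matches newlines, so a tag may span lines:
print(re.split(r"(?s)({{.*?}})", "a{{ x\n}}b"))   # ['a', '{{ x\n}}', 'b']
print(re.split(r"({{.*?}})", "a{{ x\n}}b"))       # ['a{{ x\n}}b']
```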
 
2. Each kind of token is then handled separately:
 
(1) A comment is simply ignored:
            if token.startswith('{#'):
                # Comment: ignore it and move on.
                continue
 
(2) A token starting with {{ is an expression. self._expr_code(token[2:-2].strip()) compiles it into Python code:
            elif token.startswith('{{'):
                # An expression to evaluate.
                expr = self._expr_code(token[2:-2].strip())
                buffered.append("to_str(%s)" % expr)
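Now let's look at _expr_code. An expression takes one of three forms: {{name}}, {{user.name}}, or {{name|func}}. The first is simple: token[2:-2].strip() has already extracted the name, so _expr_code records it in all_vars and returns c_name.
{{user.name}} is handled like this: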
        elif "." in expr:
            dots = expr.split(".")
            code = self._expr_code(dots[0])
            args = ", ".join(repr(d) for d in dots[1:])
            code = "do_dots(%s, %s)" % (code, args)
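First the expression is split on the dots. The first part is compiled recursively with _expr_code, and the remaining parts become string arguments to do_dots. For user.name, code becomes c_user and args is 'name', so the generated code is do_dots(c_user, 'name').
So what does do_dots actually do? For each name in dots, it first tries to read it as an attribute of value; if that raises AttributeError, value is usually a dictionary, so the name is looked up as a key instead. If the result is callable (for example a method), it is called and its return value is used.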
    def _do_dots(self, value, *dots):
        """Evaluate dotted expressions at run time."""
        for dot in dots:
            try:
                value = getattr(value, dot)
            except AttributeError:
                value = value[dot]
            if callable(value):
                value = value()
        return value
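The same logic as a plain function, to show the three cases (attribute, dictionary key, callable) side by side; do_dots, Product, and the sample data are made up for this sketch.

```python
def do_dots(value, *dots):
    """Resolve value.dot1.dot2... the way the engine's _do_dots does."""
    for dot in dots:
        try:
            value = getattr(value, dot)      # 1. try an attribute
        except AttributeError:
            value = value[dot]               # 2. fall back to a key lookup
        if callable(value):
            value = value()                  # 3. call it if it is callable
    return value


class Product(object):
    def __init__(self, name, price):
        self.name = name
        self.price = price

    def label(self):
        return "%s ($%.2f)" % (self.name, self.price)


user = {"name": "Charlie"}
fig = Product("Fig", 1.5)

print(do_dots(user, "name"))            # Charlie (dict key)
print(do_dots(fig, "price"))            # 1.5 (attribute)
print(do_dots(fig, "label"))            # Fig ($1.50) (method, called)
print(do_dots(fig, "name", "upper"))    # FIG (str.upper, called)
```

Because attributes are tried first, a dictionary key with the same name as a dict method (such as items or keys) cannot be reached with the dot syntax: the method is found and called instead.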
 
{{user|func}} is handled like this:
        if "|" in expr:
            pipes = expr.split("|")
            code = self._expr_code(pipes[0])
            for func in pipes[1:]:
                self._variable(func, self.all_vars)
                code = "c_%s(%s)" % (func, code)
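A pipe can be read as func(user), where func is the function name and user the variable. So the variable part is compiled first. If there are several pipes, the filter functions are taken in order; each one is recorded in the all_vars set and wraps the code built so far in a call. {{user|func}} therefore becomes c_func(c_user), and {{user|func|display}} becomes c_display(c_func(c_user)).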
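Putting the three cases together, here is a standalone sketch of _expr_code written as a plain function. Passing all_vars in explicitly, and the add_variable helper (a simplified, hypothetical stand-in for the engine's _variable method), are assumptions of this sketch.

```python
import re

def expr_code(expr, all_vars):
    """Compile a template expression into a Python expression string."""
    if "|" in expr:
        # name|f|g  ->  c_g(c_f(<code for name>))
        pipes = expr.split("|")
        code = expr_code(pipes[0], all_vars)
        for func in pipes[1:]:
            add_variable(func, all_vars)
            code = "c_%s(%s)" % (func, code)
    elif "." in expr:
        # a.b.c  ->  do_dots(c_a, 'b', 'c')
        dots = expr.split(".")
        code = expr_code(dots[0], all_vars)
        args = ", ".join(repr(d) for d in dots[1:])
        code = "do_dots(%s, %s)" % (code, args)
    else:
        # a plain name  ->  c_name
        add_variable(expr, all_vars)
        code = "c_%s" % expr
    return code


def add_variable(name, vars_set):
    """Hypothetical stand-in for _variable: check and record a name."""
    if not re.match(r"[_a-zA-Z][_a-zA-Z0-9]*$", name):
        raise SyntaxError("Not a valid name: %r" % name)
    vars_set.add(name)


names = set()
print(expr_code("name", names))                 # c_name
print(expr_code("user.name", names))            # do_dots(c_user, 'name')
print(expr_code("user|func|display", names))    # c_display(c_func(c_user))
print(expr_code("story.subject|slugify|lower", names))
# c_lower(c_slugify(do_dots(c_story, 'subject')))
```

Checking for | before . matters: in story.subject|slugify|lower the pipe binds loosest, so the dotted lookup ends up innermost.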
 
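(3) A token starting with {% is a control tag (a conditional or a loop). First, flush_output() is called to write out the buffered output; then the tag's contents are split into words with words = token[2:-2].strip().split().
For an if tag: push 'if' onto ops_stack, emit the if statement, and indent.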
                if words[0] == 'if':
                    # An if statement: evaluate the expression to determine if.
                    if len(words) != 2:
                        self._syntax_error("Don't understand if", token)
                    ops_stack.append('if')
                    code.add_line("if %s:" % self._expr_code(words[1]))
                    code.indent()
 
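For a for tag: push 'for' onto ops_stack, and record the loop variable in loop_vars with self._variable(words[1], self.loop_vars); the expression being looped over is compiled with _expr_code, which records its names in all_vars. Finally the for statement is emitted and indented. For example, in for topic in topics, topic goes into loop_vars and topics into all_vars.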
                elif words[0] == 'for':
                    # A loop: iterate over expression result.
                    if len(words) != 4 or words[2] != 'in':
                        self._syntax_error("Don't understand for", token)
                    ops_stack.append('for')
                    self._variable(words[1], self.loop_vars)
                    code.add_line(
                        "for c_%s in %s:" % (
                            words[1],
                            self._expr_code(words[3])
                        )
                    )
                    code.indent()
 
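For an end tag, a control block is being closed. end_what = words[0][3:] tells whether it closes an if or a for. start_what = ops_stack.pop() pops the most recently opened tag; if it differs from end_what, a syntax error is raised. Finally, code.dedent() removes the indentation.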
                elif words[0].startswith('end'):
                    # Endsomething.  Pop the ops stack.
                    if len(words) != 1:
                        self._syntax_error("Don't understand end", token)
                    end_what = words[0][3:]
                    if not ops_stack:
                        self._syntax_error("Too many ends", token)
                    start_what = ops_stack.pop()
                    if start_what != end_what:
                        self._syntax_error("Mismatched end tag", end_what)
                    code.dedent()
                else:
                    self._syntax_error("Don't understand tag", words[0])
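The ops_stack bookkeeping can be isolated into a small checker. This sketch, with the hypothetical name check_tags, looks only at {% ... %} tokens and applies the same rules as the code above; it generates no code and raises Python's built-in SyntaxError instead of calling _syntax_error. It also adds a final check, not shown in the excerpt above, for blocks that are opened but never closed.

```python
import re

def check_tags(text):
    """Verify that every {% if %}/{% for %} has a matching {% end... %}."""
    ops_stack = []
    for token in re.split(r"(?s)({{.*?}}|{%.*?%}|{#.*?#})", text):
        if not token.startswith("{%"):
            continue                          # only control tags matter here
        words = token[2:-2].strip().split()
        if words[0] in ("if", "for"):
            ops_stack.append(words[0])        # open a block
        elif words[0].startswith("end"):
            end_what = words[0][3:]           # "endfor" -> "for"
            if not ops_stack:
                raise SyntaxError("Too many ends: %r" % token)
            start_what = ops_stack.pop()
            if start_what != end_what:
                raise SyntaxError("Mismatched end tag: %r" % end_what)
        else:
            raise SyntaxError("Don't understand tag: %r" % words[0])
    if ops_stack:
        raise SyntaxError("Unmatched action tag: %r" % ops_stack[-1])


check_tags("{% for t in ts %}{% if t %}x{% endif %}{% endfor %}")  # no error

try:
    check_tags("{% for topic in topics %}...{% endif %}")
except SyntaxError as exc:
    print(exc)        # Mismatched end tag: 'if'
```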
 
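At this point parsing is complete, and every variable name is recorded in all_vars or loop_vars. Now the variables that come from outside the loops must be extracted from the context.
For example, in this template: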
<h1>Hello {{name|upper}}!</h1>
                {% for topic in topics %}
                    <p>You are interested in {{topic}}.</p>
                {% endfor %}
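all_vars contains {'topic', 'upper', 'name', 'topics'}
loop_vars contains {'topic'}
topic, however, is a loop variable, so self.all_vars - self.loop_vars yields exactly the variables defined outside the loops. For each of them, a line that fetches it from the context is added to the vars_code section created earlier: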
        for var_name in self.all_vars - self.loop_vars:
            vars_code.add_line("c_%s = context[%r]" % (var_name, var_name))
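With the sets from the example above, the set difference and the lines it produces look like this. sorted() is used only in this sketch, to make the output stable: iteration order over a set of strings can change between runs, which is why the c_... lines in generated code may appear in any order.

```python
all_vars = {'topic', 'upper', 'name', 'topics'}
loop_vars = {'topic'}

# Only names not bound by a {% for %} must be fetched from the context.
lines = ["c_%s = context[%r]" % (v, v) for v in sorted(all_vars - loop_vars)]
print("\n".join(lines))
# c_name = context['name']
# c_topics = context['topics']
# c_upper = context['upper']
```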
 
At the end of __init__, the return line is added and the indentation is closed. Then self._render_function is set to code.get_globals()['render_function'], i.e. the render_function function object, such as <function render_function at 0x7f4eb1632510>.
 
code.add_line("return ''.join(result)")
code.dedent()
self._render_function = code.get_globals()['render_function']
 
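The last step is the render method. It makes a copy of self.context, updates the copy with the data passed to render (so self.context itself is never modified), and returns the result of calling the compiled function with that context and self._do_dots.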
    def render(self, context=None):
        render_context = dict(self.context)
        if context:
            render_context.update(context)
        return self._render_function(render_context, self._do_dots)
 
For example, in the call below, {'upper': str.upper} is merged into self.context in __init__:
templite = Templite('''
                <h1>Hello {{name|upper}}!</h1>
                {% for topic in topics %}
                    <p>You are interested in {{topic}}.</p>
                {% endfor %}
                ''',
                        {'upper': str.upper},
                        )
When templite.render is then called, 'name': "Ned" and 'topics': ['Python', 'Geometry', 'Juggling'] are merged into a copy of self.context, which is finally passed to render_function:
text = templite.render({
    'name': "Ned",
    'topics': ['Python', 'Geometry', 'Juggling'],
})
 
That completes the code. Let's run it and see the result:
class var_init(object):
    def __init__(self):
        self.value=1
 
if __name__=="__main__":
    v=var_init()
    templite = Templite('''
                <h1>Hello {{name|upper}}!</h1>
                {% for topic in topics %}
                    <p>You are interested in {{topic}}.</p>
                {% endfor %}
                {% if para.value %}
                    <p>it is true.</p>
                {% endif %}    
                ''',
                        {'upper': str.upper},
                        )
    text = templite.render({
        'name': "Ned",
        'topics': ['Python', 'Geometry', 'Juggling'],
        'para': v,
    })
    print(text)
Output:
Printed generated code:
def render_function(context, do_dots):
    c_topics = context['topics']
    c_upper = context['upper']
    c_para = context['para']
    c_name = context['name']
    result = []
    append_result = result.append
    extend_result = result.extend
    to_str = str
    extend_result(['\n                <h1>Hello ', to_str(c_upper(c_name)), '!</h1>\n                '])
    for c_topic in c_topics:
        extend_result(['\n                    <p>You are interested in ', to_str(c_topic), '.</p>\n                '])
    append_result('\n                ')
    if do_dots(c_para, 'value'):
        append_result('\n                    <p>it is true.</p>\n                ')
    append_result('    \n                ')
    print(result)
    return ''.join(result)
Printed rendered page:
                <h1>Hello NED!</h1>
                
                    <p>You are interested in Python.</p>
                
                    <p>You are interested in Geometry.</p>
                
                    <p>You are interested in Juggling.</p>
                    <p>it is true.</p>
Original article: https://www.cnblogs.com/zhanghongfeng/p/8732956.html