Building a Decision Tree Recursively

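1. What is recursion?
- A function can call other functions from inside its body. If a function calls itself, it is called a recursive function.
  - PS: Calling another function from inside a function is not function nesting; nesting means defining a sub-function inside a function's body.
- Properties of recursion:
  - A recursive function must have an explicit termination condition (base case).
  - Each deeper level of recursion should work on a smaller problem than the level before it.
  - Consecutive calls are closely linked: each call prepares for the next (usually the output of one call serves as the input of the next).
  - Recursion is not very efficient, and too many levels cause a stack overflow. In a computer, function calls are implemented with a data structure called the stack: each function call pushes a stack frame, and each return pops one. Because the stack is finite, too many nested recursive calls overflow it.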
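The base case, the shrinking problem size and the stack limit can all be observed directly in Python. The sketch below uses a hypothetical `countdown` function (explicit base case, problem shrinks by one per call) and a hypothetical helper `overflows` that checks whether a given recursion depth fails. CPython guards the stack by raising `RecursionError` once the call depth exceeds `sys.getrecursionlimit()` (1000 by default).

```python
import sys


def countdown(n):
    """Recurse down to 0: explicit base case, problem shrinks by 1 per call."""
    if n == 0:                # termination condition
        return 0
    return countdown(n - 1)   # each call adds one stack frame


def overflows(depth):
    """Return True if recursing `depth` levels raises RecursionError."""
    try:
        countdown(depth)
        return False
    except RecursionError:
        return True


limit = sys.getrecursionlimit()
print("recursion limit:", limit)
print(overflows(10))           # False: shallow recursion is fine
print(overflows(limit * 10))   # True: far past the limit
```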
- Let's start with an example: two ways to compute the sum of the integers from 1 to n.
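```
# Loop version: accumulate the sum step by step
def sum1(n):
    """Return the sum of the integers from 1 to n (loop version)."""
    total = 0  # avoid shadowing the built-in sum()
    for i in range(1, n + 1):
        total += i
    return total

# Recursive version: the function calls itself
def sum2(n):
    """Return the sum of the integers from 1 to n (recursive version)."""
    if n > 0:
        return n + sum2(n - 1)  # call itself on a smaller problem
    else:
        return 0  # base case: stop when n reaches 0

print("Loop sum -->", sum1(100))
print("Recursive sum -->", sum2(100))

# Both print: 5050
```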
  - Both versions produce the same sum. So what are the advantages and disadvantages of the recursive version compared with the loop?
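To make the call stack behind `sum2` visible, here is a minimal traced variant (the name `sum2_traced` and its `depth` parameter are illustrative additions, not part of the original). Each pending call has to wait for the deeper call to return, and that waiting is exactly where the extra stack frames come from.

```python
def sum2_traced(n, depth=0):
    """Same recursion as sum2, but prints each call and its return value."""
    indent = "  " * depth
    print(f"{indent}sum2({n})")          # entering a call: a new frame is pushed
    if n > 0:
        result = n + sum2_traced(n - 1, depth + 1)
    else:
        result = 0                        # base case
    print(f"{indent}-> {result}")        # returning: the frame is popped
    return result


sum2_traced(3)
# sum2(3)
#   sum2(2)
#     sum2(1)
#       sum2(0)
#       -> 0
#     -> 1
#   -> 3
# -> 6
```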
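2. What are the pros and cons of recursive functions?
- Advantages of recursion
  - Simple to define, with clear logic. In theory every recursion can be rewritten as a loop, but the loop version is usually less clear.
- Disadvantages of recursion
  - Too many nested recursive calls overflow the stack (stack overflow).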
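Since any recursion can in principle be rewritten as a loop, one common technique is to replace the call stack with an explicit stack. The sketch below (hypothetical name `sum_explicit_stack`) mimics what `sum2` does: it first pushes every pending `n`, then unwinds by popping and adding. The list lives on the heap rather than in call frames, so a large `n` such as 100000 works, whereas the recursive `sum2` would exceed CPython's default recursion limit.

```python
def sum_explicit_stack(n):
    """Sum 1..n like sum2, but with an explicit list as the stack instead of recursion."""
    stack = []
    while n > 0:              # "descend": remember each pending n, as each recursive call would
        stack.append(n)
        n -= 1
    total = 0                 # value of the base case, like sum2(0) == 0
    while stack:              # "unwind": pop in reverse order and add, like returning calls
        total += stack.pop()
    return total


print(sum_explicit_stack(100))      # 5050
print(sum_explicit_stack(100_000))  # 5000050000
```

For a plain sum the simple loop in `sum1` is clearly better; the explicit-stack pattern pays off for tree-shaped recursion such as depth-first search.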
3. Building a decision tree with a recursive function
- Implement the function build_tree(rows). This is the function that actually builds our tree. Follow the steps below:
  - We will use a recursive function here.
  - Find the best split using the method we implemented earlier (find_best_split), and store the information gain and the question in local variables.
  - Define the termination condition: if there is no gain, i.e. gain == 0, return a leaf node, Leaf(rows).
  - Otherwise, partition the rows at the current node with the best question (the Determine object we obtained earlier).
  - We build the tree depth-first (DFS, depth-first search), recursing into the true branch first.
  - We then build the false branch recursively.
  - Finally, return a DecisionNode object holding the question and both branches, since the starting point (the root) is also a DecisionNode.
  - Notes:
    - This function might take some time and thought. Be patient.
    - Make sure you understand the logic behind our decision tree (DT) before you start. Talk to me if you are not feeling confident enough.
    - Look up recursive functions and depth-first search if necessary.
- The code is as follows:
```
def build_tree(rows):
    """
    Build our decision tree recursively.
    :param rows: a subset of our data set
    :return: recursively returns a decision node, and finally the whole tree
    """
    # Find the best split for this subset of the data.
    # best_question is itself a question (Determine) object and can be used directly.
    best_gain, best_question = find_best_split(rows)
    # Base case: no information gain, so return a leaf node
    if best_gain == 0:
        return Leaf(rows)
    # Partition the rows with the best question
    true_rows, false_rows = partition(rows, best_question)
    # Depth-first: build the true branch first, then the false branch
    true_branch = build_tree(true_rows)
    false_branch = build_tree(false_rows)
    # Otherwise return a DecisionNode holding the question and both branches
    return DecisionNode(best_question, true_branch, false_branch)
```
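`build_tree` relies on `find_best_split`, `partition`, `Leaf` and `DecisionNode` from earlier parts of the tutorial, which are not shown here. To run the recursion end to end, the sketch below fills them in with hypothetical implementations: the question class is named `Determine` (matching the text), rows are assumed to be lists whose last element is the class label, and the best split is assumed to be chosen by Gini-impurity information gain. None of this is the original author's code. A hypothetical `classify` helper walks the finished tree depth-first, and the toy data set is made up for illustration.

```python
from collections import Counter


class Determine:
    """Hypothetical question: does row[column] match value? (>= for numbers, == otherwise)"""
    def __init__(self, column, value):
        self.column = column
        self.value = value

    def match(self, row):
        v = row[self.column]
        if isinstance(v, (int, float)):
            return v >= self.value
        return v == self.value


class Leaf:
    """Leaf node: stores the label counts of the rows that reached it."""
    def __init__(self, rows):
        self.predictions = Counter(row[-1] for row in rows)


class DecisionNode:
    """Internal node: a question plus the subtrees for True and False answers."""
    def __init__(self, question, true_branch, false_branch):
        self.question = question
        self.true_branch = true_branch
        self.false_branch = false_branch


def partition(rows, question):
    """Split rows into (rows that match the question, rows that don't)."""
    true_rows = [row for row in rows if question.match(row)]
    false_rows = [row for row in rows if not question.match(row)]
    return true_rows, false_rows


def gini(rows):
    """Gini impurity of the labels (last column) in rows."""
    counts = Counter(row[-1] for row in rows)
    n = len(rows)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def find_best_split(rows):
    """Try every (column, value) question; return (best_gain, best_question)."""
    best_gain, best_question = 0.0, None
    current = gini(rows)
    for col in range(len(rows[0]) - 1):          # last column is the label
        for value in {row[col] for row in rows}:
            question = Determine(col, value)
            true_rows, false_rows = partition(rows, question)
            if not true_rows or not false_rows:  # split separates nothing, skip it
                continue
            p = len(true_rows) / len(rows)
            gain = current - p * gini(true_rows) - (1 - p) * gini(false_rows)
            if gain > best_gain:
                best_gain, best_question = gain, question
    return best_gain, best_question


def build_tree(rows):
    """Same recursion as in the text: leaf if no gain, else split and recurse (true branch first)."""
    gain, question = find_best_split(rows)
    if gain == 0:
        return Leaf(rows)
    true_rows, false_rows = partition(rows, question)
    true_branch = build_tree(true_rows)
    false_branch = build_tree(false_rows)
    return DecisionNode(question, true_branch, false_branch)


def classify(row, node):
    """Walk down the tree depth-first until a leaf is reached."""
    if isinstance(node, Leaf):
        return node.predictions
    if node.question.match(row):
        return classify(row, node.true_branch)
    return classify(row, node.false_branch)


# Toy data set, made up for illustration: [color, diameter, label]
training_data = [
    ["Green", 3, "Apple"],
    ["Yellow", 3, "Apple"],
    ["Red", 1, "Grape"],
    ["Red", 1, "Grape"],
    ["Yellow", 3, "Lemon"],
]
tree = build_tree(training_data)
print(classify(["Red", 1], tree))  # prints Counter({'Grape': 2})
```

Skipping splits that leave one side empty guarantees that every recursive call receives strictly fewer rows, so the recursion always reaches the `gain == 0` base case.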
- JAN 1.9
Original article: https://www.cnblogs.com/jcjc/p/10245847.html