Computing Definite Integrals with the Monte Carlo Method: Importance Sampling

[Figure: the area under the curve f(x) on the interval [a, b], shaded red]

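As the figure above shows, computing the integral of f(x) over the interval [a, b] means finding the area of the red region enclosed by the curve and the x-axis. Below, the Monte Carlo method is used to compute the definite integral over [2, 3]: ∫(x² + 4x·sin(x))dx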

# -*- coding: utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt

def f(x):
    # integrand
    return x**2 + 4*x*np.sin(x)

def intf(x):
    # antiderivative of f, used to compute the exact value
    return x**3/3.0 + 4.0*np.sin(x) - 4.0*x*np.cos(x)

a = 2
b = 3

# use N draws
N = 10000

X = np.random.uniform(low=a, high=b, size=N)  # N values uniformly drawn from a to b
Y = f(X)                                      # evaluate f at the sample points

# Monte Carlo estimate of the integral: area = width * average height
Imc = (b-a) * np.sum(Y) / N

exactval = intf(b) - intf(a)

print("Monte Carlo estimation=", Imc, "Exact number=", exactval)

# How does the accuracy depend on the number of points (samples)? Let's try the same 1-D integral.
# Monte Carlo methods yield approximate answers whose accuracy depends on the number of draws.
Na = np.arange(1, 1001)        # sample sizes 1..1000 (starting at 1 avoids dividing by zero)
Imc = np.zeros(len(Na))

for i, N in enumerate(Na):
    X = np.random.uniform(low=a, high=b, size=N)  # N values uniformly drawn from a to b
    Y = f(X)                                      # evaluate f at the sample points
    Imc[i] = (b-a) * np.sum(Y) / N

plt.plot(Na[10:], np.sqrt((Imc[10:]-exactval)**2), alpha=0.7)  # absolute error
plt.plot(Na[10:], 1/np.sqrt(Na[10:]), 'r')                     # 1/sqrt(N) reference curve
plt.xlabel("N")
plt.ylabel("sqrt((Imc-ExactValue)$^2$)")
plt.show()

>>>

Monte Carlo estimation= 11.8181144118 Exact number= 11.8113589251

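As the plot above shows, the error decreases as the number of sample points increases. There are two ways to improve the accuracy of the simulation: increase the number of trials N, or reduce the variance σ². Because the standard error of the estimate shrinks only as σ/√N, gaining one more digit of accuracy takes about 100 times as many trials. Increasing N inevitably increases the total computing time, so it is clearly not a good way to improve accuracy. The rest of this post introduces importance sampling, which reduces the variance and so improves the accuracy of the computed integral.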
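The trade-off between N and σ² can be checked with a small experiment. The sketch below repeats the plain Monte Carlo estimate of the same integral many times for two sample sizes and compares the spread of the results. The helper name `mc_integrate`, the fixed seed, and the repeat count are illustrative assumptions and not part of the original code. If the error really scales as 1/√N, quadrupling N should roughly halve the spread.

```python
import numpy as np

def f(x):
    # same integrand as in the listing above
    return x**2 + 4*x*np.sin(x)

def mc_integrate(func, a, b, n, rng):
    """Plain Monte Carlo estimate of the integral of func over [a, b]."""
    x = rng.uniform(a, b, size=n)
    return (b - a) * np.mean(func(x))

rng = np.random.default_rng(0)   # fixed seed, only for reproducibility
a, b = 2.0, 3.0
repeats = 2000                   # number of independent estimates per sample size

spread = {}
for n in (250, 1000):
    estimates = np.array([mc_integrate(f, a, b, n, rng) for _ in range(repeats)])
    spread[n] = estimates.std()  # empirical standard error for this n
    print("N =", n, "std of estimate =", spread[n])

ratio = spread[250] / spread[1000]
print("ratio of spreads:", ratio)  # should come out near 2 if the error scales as 1/sqrt(N)
```

Halving the error this way costs four times the work, which is why reducing σ² is the more attractive route.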

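The defining feature of importance sampling is that it does not sample from the probability distribution of the given process, but from a modified distribution in which the events that matter most to the result occur more often. This raises the sampling efficiency and reduces the computing time spent on events that barely affect the result. Suppose, for example, that we want the integral of g(x) over [a, b]. With uniform sampling, about as many sample points land where g(x) is small as where it is large, which is clearly inefficient. Instead, the sampling density can be changed to f(x), chosen so that its shape resembles that of g(x). Sample values that contribute more to the integral then occur more often than those that contribute less. The integral can be rewritten as:

$$I = \int_a^b g(x)\,dx = \int_a^b \frac{g(x)}{f(x)}\,f(x)\,dx$$

Here x is a random variable sampled according to the probability density f(x). On the interval [a, b] this density must clearly satisfy:

$$\int_a^b f(x)\,dx = 1$$

The integral I can therefore be viewed as the expectation of the random variable Y = g(x)/f(x), where the x_i are sample points drawn from the density f(x):

$$I = E\left[\frac{g(x)}{f(x)}\right] \approx \frac{1}{N}\sum_{i=1}^{N}\frac{g(x_i)}{f(x_i)}$$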
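A minimal sketch of this idea, on a toy problem that is not from the original post: integrate g(x) = x² over [0, 1] (exact value 1/3) using the sampling density f(x) = 2x, which integrates to 1 on [0, 1] and roughly follows the shape of g. The density, the inverse-transform step x = √u, the seed, and the function names are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)   # fixed seed, only for reproducibility
N = 1000                         # draws per estimate
repeats = 2000                   # number of independent estimates

g = lambda x: x**2               # integrand; exact integral over [0, 1] is 1/3
f = lambda x: 2.0 * x            # sampling density on [0, 1]; integrates to 1

def uniform_estimate():
    # plain Monte Carlo: interval width is 1, so the estimate is the mean of g
    x = rng.random(N)
    return np.mean(g(x))

def importance_estimate():
    # inverse-transform sampling: the CDF of f is x^2, so x = sqrt(u)
    u = 1.0 - rng.random(N)      # uniform on (0, 1], avoids x = 0 and a 0/0 ratio
    x = np.sqrt(u)
    return np.mean(g(x) / f(x))  # average of Y = g(x)/f(x)

uni = np.array([uniform_estimate() for _ in range(repeats)])
imp = np.array([importance_estimate() for _ in range(repeats)])

print("uniform:    mean =", uni.mean(), "std =", uni.std())
print("importance: mean =", imp.mean(), "std =", imp.std())
```

Both estimators are centered on 1/3, but the importance-sampling estimates should be more tightly clustered, because g(x)/f(x) = x/2 varies less over the sampled points than g(x) does under uniform sampling.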

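The following example uses a normal density f(x) to approximate g(x) = x·sin(x), and draws sample values from that normal distribution to compute the integral ∫g(x)dx over [0, π]. Note that the code names the integrand f and the sampling density p.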

# -*- coding: utf-8 -*-
# Example: Calculate ∫sin(x)xdx

# The function has a shape that is similar to a Gaussian and therefore
# we choose here a Gaussian as the importance sampling distribution.

from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt


mu = 2
sig = .7

f = lambda x: np.sin(x)*x                       # integrand
infun = lambda x: np.sin(x) - x*np.cos(x)       # antiderivative of the integrand
p = lambda x: (1/np.sqrt(2*np.pi*sig**2))*np.exp(-(x-mu)**2/(2.0*sig**2))  # Gaussian pdf
normfun = lambda x: norm.cdf(x-mu, scale=sig)   # Gaussian cdf

plt.figure(figsize=(18,8))  # set the figure size

# range of integration
xmax = np.pi
xmin = 0

# Number of draws
N = 1000

# Just want to plot the function
x = np.linspace(xmin, xmax, 1000)
plt.subplot(1,2,1)
plt.plot(x, f(x), 'b', label=u'Original  $xsin(x)$')
plt.plot(x, p(x), 'r', label=u'Importance Sampling Function: Normal')
plt.xlabel('x')
plt.legend()
# =============================================
# EXACT SOLUTION
# =============================================
Iexact = infun(xmax) - infun(xmin)
print(Iexact)
# ============================================
# VANILLA MONTE CARLO
# ============================================
Ivmc = np.zeros(1000)
for k in np.arange(0,1000):
    x = np.random.uniform(low=xmin, high=xmax, size=N)
    Ivmc[k] = (xmax-xmin)*np.mean(f(x))


# ============================================
# IMPORTANCE SAMPLING
# ============================================
# CHOOSE a Gaussian so it is similar to the original function

# Importance sampling: choose the random points so that
# more points are chosen around the peak, less where the integrand is small.
Iis = np.zeros(1000)
for k in np.arange(0,1000):
    # DRAW FROM THE GAUSSIAN: xis~N(mu,sig^2)
    xis = mu + sig*np.random.randn(N,1)
    xis = xis[(xis<xmax) & (xis>xmin)]      # keep only the draws inside [0, pi]

    # normalization for gaussian from 0..pi
    normal = normfun(np.pi) - normfun(0)    # Note: the pdf must integrate to 1 over the sampling interval [0, pi]
    Iis[k] = np.mean(f(xis)/p(xis))*normal  # so multiply by a coefficient: the integral of p(x) over [0, pi]

plt.subplot(1,2,2)
plt.hist(Iis, 30, histtype='step', label=u'Importance Sampling')
plt.hist(Ivmc, 30, color='r', histtype='step', label=u'Vanilla MC')
plt.vlines(np.pi, 0, 100, color='g', linestyle='dashed')
plt.legend()
plt.show()

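The left plot shows that the curve x·sin(x) has a shape close to that of the normal density, so more sample points fall near the peak of the curve than where the curve is low. The exact result is π. In the right plot, each method has computed the integral 1000 times. Most results lie close to the exact value π ≈ 3.1416, and the farther from the exact value, the fewer results there are, as one would expect. However, the variance of the estimates from the traditional method (red histogram) is clearly larger than that from importance sampling (blue histogram). Importance sampling therefore reduces the variance and improves accuracy. Note also that the choice of the sampling density f(x) affects the accuracy of the result: when f(x) differs greatly from g(x), the variance of the result increases.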
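The closing remark about the choice of sampling density can be checked numerically. The sketch below reuses the truncated-Gaussian scheme from the listing above, written with NumPy and `math.erf` only, and compares the spread of the estimates for the well-matched density (mu = 2, sig = 0.7) against a deliberately poor one. The poor parameters (mu = 0.8, sig = 0.4), the helper names, the seed, and the repeat count are assumptions made for illustration.

```python
import math
import numpy as np

rng = np.random.default_rng(2)   # fixed seed, only for reproducibility
xmin, xmax = 0.0, math.pi
N, repeats = 1000, 500

g = lambda x: x * np.sin(x)      # integrand; exact integral over [0, pi] is pi

def normal_cdf(x, mu, sig):
    # cdf of N(mu, sig^2) via the error function
    return 0.5 * (1.0 + math.erf((x - mu) / (sig * math.sqrt(2.0))))

def is_estimate(mu, sig):
    """One importance-sampling estimate using N(mu, sig^2) truncated to [xmin, xmax]."""
    x = mu + sig * rng.standard_normal(N)
    x = x[(x > xmin) & (x < xmax)]   # keep only draws inside the interval
    pdf = np.exp(-(x - mu)**2 / (2.0 * sig**2)) / (sig * math.sqrt(2.0 * math.pi))
    z = normal_cdf(xmax, mu, sig) - normal_cdf(xmin, mu, sig)  # Gaussian mass inside the interval
    return np.mean(g(x) / pdf) * z

def vanilla_estimate():
    x = rng.uniform(xmin, xmax, size=N)
    return (xmax - xmin) * np.mean(g(x))

std_vanilla = np.std([vanilla_estimate() for _ in range(repeats)])
std_good = np.std([is_estimate(2.0, 0.7) for _ in range(repeats)])
std_bad = np.std([is_estimate(0.8, 0.4) for _ in range(repeats)])

print("vanilla MC:         ", std_vanilla)
print("IS, mu=2.0, sig=0.7:", std_good)
print("IS, mu=0.8, sig=0.4:", std_bad)
```

The poorly matched density places almost no samples where x·sin(x) is largest, so the few samples that do land there carry very large weights g(x)/p(x) and inflate the variance.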

Reference:

http://iacs-courses.seas.harvard.edu/courses/am207/blog/lecture-3.html

Original article: https://www.cnblogs.com/21207-iHome/p/5269191.html