• Two probability algorithm problems


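    Problem 1. Given a file containing n lines of data (n unknown), design an algorithm that outputs one line uniformly at random while reading the file only once. That is, every line is output with the same probability (1/n, with n unknown). Assume the file may be very large and memory is limited, so the whole file cannot be held in memory.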

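    Problem 2. n data items are arranged in a row, with item Ci at position Li, 1<=i<=n. Design an algorithm that shuffles the items so that every item is equally likely to end up at every position, i.e. p(Ci, Lj)=1/n, where 1<=i<=n, 1<=j<=n.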

    -------------------------------------------------------------------

    Solution 1.

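    [Algorithm] Open the file and read the first line, storing it in memory as the record: R=line(1). Read the second line and generate a random number in (0,1); if it is less than 0.5, replace the record with the second line, R=line(2). ... On reading the k-th line, generate a random number in (0,1); if it is less than 1/k, replace the record with the k-th line, R=line(k). ... Continue until all n lines have been read, then output R. The probability that R holds line(i) is 1/n for every 1<=i<=n.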
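    The steps above can be sketched in Python as follows. This is a minimal sketch, not the author's code: the function name `pick_random_line` and the optional `rng` parameter are assumptions introduced here for illustration.

```python
import random


def pick_random_line(lines, rng=random):
    """Return one item chosen uniformly from an iterable, in a single pass.

    Only the currently kept item is held in memory. Returns None if the
    iterable is empty.
    """
    chosen = None
    for k, line in enumerate(lines, start=1):
        # Replace the kept line with probability 1/k.
        # For k == 1 the test always succeeds, so the first line is kept.
        if rng.random() < 1.0 / k:
            chosen = line
    return chosen
```

    Because a file object iterates line by line, passing an open file (for example `pick_random_line(open(path))`, with `path` a hypothetical file name) reads the file once without loading it into memory.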

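    [Proof] Suppose the file has n lines, n>=1. Let P(k) be the statement: after the k-th line has been read, R holds line(i) with probability 1/k for every 1<=i<=k, where k>=1.

      basis step: after the first line has been read, R holds line(1) with probability 1, so P(1) is true.

          induction step: if P(k) is true, k>=1, then on reading line k+1, R becomes line(k+1) if the random number is less than 1/(k+1), so R=line(k+1) with probability 1/(k+1). If the random number is greater than or equal to 1/(k+1), R=R_old, which happens with probability k/(k+1). By the hypothesis, the probability that R then holds line(i) (1<=i<=k) is k/(k+1)*1/k=1/(k+1), i.e. P(k+1) is true.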

    Since both the basis step and the induction step hold, P(n) is true for every positive integer n.
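    The result can also be checked directly, without induction. Assuming the random draws are independent, line(i) is the final output exactly when it is selected on being read (probability 1/i) and is not replaced by any later line j (probability 1-1/j each), so the product telescopes:

```latex
P\bigl(R=\mathrm{line}(i)\bigr)
  = \frac{1}{i}\prod_{j=i+1}^{n}\left(1-\frac{1}{j}\right)
  = \frac{1}{i}\cdot\frac{i}{i+1}\cdot\frac{i+1}{i+2}\cdots\frac{n-1}{n}
  = \frac{1}{n}, \qquad 1 \le i \le n .
```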

    [Postscript] I came across this problem in the book Expert C Programming (《C专家编程》).

    Solution 2.

    [Algorithm] Suppose the n elements are stored in an array, and initially each item C[i] is at position L[i], i.e. p(C[i], L[i])=1. Start at position L[k] with k=1.

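      step1, if k==n, stop. Otherwise generate a random number in (0,1); if it is less than 1/(n+1-k), go to step3 (the element at L[k] stays where it is), otherwise go to step2

      step2, generate a random integer m uniformly in [k+1, n] and swap the elements at positions L[k] and L[m]

      step3, move to the next position, i.e. k=k+1, go to step1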
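    A minimal Python sketch of these steps follows, using 0-based indices. It assumes that the element at the current position stays with probability 1/(n+1-k), which is what the basis step of the proof relies on. The name `shuffle_in_place` and the `rng` parameter are assumptions introduced here for illustration.

```python
import random


def shuffle_in_place(items, rng=random):
    """Shuffle a list in place so each element is equally likely at each position."""
    n = len(items)
    # 0-based index k corresponds to position L[k+1] in the text.
    # The last position needs no pass, matching "if k==n, stop".
    for k in range(n - 1):
        remaining = n - k  # equals n+1-k in the text's 1-based numbering
        # step1: keep items[k] in place with probability 1/remaining.
        if rng.random() < 1.0 / remaining:
            continue
        # step2: otherwise swap with a uniformly chosen later position.
        m = rng.randint(k + 1, n - 1)  # both endpoints inclusive
        items[k], items[m] = items[m], items[k]
    return items
```

    Combining the two cases, every position from the current one to the end is chosen with probability 1/(n+1-k), so this behaves like the usual Fisher-Yates shuffle, which draws one index from the whole remaining range with a single random number.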

    [Proof] Let the propositional function Q(k) be: p(C[i], L[k])=1/n for i=1,2...n, and p(C[k], L[j])=1/n for j=1,2...n, for every positive integer k with 1<=k<=n.

      basis step: we will show Q(1) is true. If the random number is smaller than 1/n, C[1] stays at L[1], i.e. p(C[1], L[1])=1/n. Otherwise, a random integer m in [2, n] is generated and C[m] is swapped with C[1]. The probability that m=i (2<=i<=n) is 1/(n-1), so p(C[i], L[1]) = (1-1/n)*1/(n-1)=1/n.

    At the same time, this also means p(C[1], L[j])=1/n, j=1,2...n

      induction step: we will prove [Q(1)^Q(2)^...^Q(k)]->Q(k+1), k>=1. So the hypothesis is as follows.

      a) p(C[i], L[j]) = 1/n, where i=1,2...n, and j=1,2...k

      b) p(C[i], L[j]) = 1/n, where i=1,2...k, and j=1,2...n

      Firstly, we can show p(C[i], L[k+1]) = 1/n, k+1<=i<=n. The only way for C[k+1] to end at location L[k+1] is that C[k+1] is still at L[k+1] before the (k+1)th pass and stays there during this pass, so p(C[k+1], L[k+1]) = (1-k/n)*1/(n-k)=1/n. For k+2<=i<=n, the only way for C[i] to end at location L[k+1] is that C[i] is still at L[i] before the (k+1)th pass and is swapped into L[k+1] during this pass, so p(C[i], L[k+1]) = (1-k/n)*1/(n-k)=1/n. So p(C[i], L[k+1]) = 1/n, where k+1<=i<=n.

        Secondly, we can show p(C[i], L[k+1]) = 1/n for 1<=i<=k after the (k+1)th pass. One way for C[i] to end at location L[k+1] is that C[i] was at L[k+1] before this pass and stays there during this pass. The probability is 1/n*1/(n-k). The other way is that C[i] was at some L[j] (k+2<=j<=n) and is swapped into L[k+1] during this pass, which has total probability 1/n*1/(n-k)*(n-(k+2)+1). Adding them together gives p(C[i], L[k+1]) = 1/n, where 1<=i<=k.

      Thirdly, we can show p(C[k+1], L[j]) = 1/n where k+2<=j<=n. The way for C[k+1] to reach L[j] is that C[k+1] was still at L[k+1] before the (k+1)th pass and is swapped with the element at L[j] during this pass, so the probability is p(C[k+1], L[j]) = (n-k)/n*1/(n-k)=1/n. Note p(C[k+1], L[k+1]) = 1/n has already been proved.

      Finally, we can show p(C[k+1], L[j]) = 1/n where 1<=j<=k. This has already been determined by the passes before the (k+1)th pass.

    In conclusion, we have proved both the basis step and the induction step, so Q(k) is true for every k with 1<=k<=n. This means Q(1)^Q(2)^...^Q(n) is true, in other words, p(C[i], L[j]) = 1/n, where i=1,2...n, and j=1,2...n.
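    As a sanity check on the proof, the probabilities can be computed exactly for small n by following every branch of the algorithm with rational arithmetic. This is an illustrative sketch under the assumption that the element at the current position stays with probability 1/(n+1-k) and otherwise swaps with a uniformly chosen later position; the name `position_probabilities` is introduced here and is not from the original.

```python
from fractions import Fraction


def position_probabilities(n):
    """Return probs[i][j], the exact probability that item i ends at position j."""
    # Map each arrangement (as a tuple) to its exact probability so far.
    dist = {tuple(range(n)): Fraction(1)}
    for k in range(n - 1):  # 0-based pass index; the last position needs no pass
        remaining = n - k
        # Staying has probability 1/remaining. Each swap target also has
        # probability (1 - 1/remaining) * 1/(remaining - 1) = 1/remaining.
        step = Fraction(1, remaining)
        new_dist = {}
        for arr, p in dist.items():
            new_dist[arr] = new_dist.get(arr, Fraction(0)) + p * step
            for m in range(k + 1, n):
                swapped = list(arr)
                swapped[k], swapped[m] = swapped[m], swapped[k]
                key = tuple(swapped)
                new_dist[key] = new_dist.get(key, Fraction(0)) + p * step
        dist = new_dist
    # Accumulate, for each arrangement, which item sits at which position.
    probs = [[Fraction(0)] * n for _ in range(n)]
    for arr, p in dist.items():
        for j, i in enumerate(arr):
            probs[i][j] += p
    return probs
```

    For a small value such as n=4, every entry of the returned table equals Fraction(1, 4), in agreement with p(C[i], L[j]) = 1/n. The enumeration grows like n!, so it is only practical for small n.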

    [Postscript] I think this problem comes from Programming Pearls (《编程珠玑》).

  • Original article: https://www.cnblogs.com/Torstan/p/2645489.html