45
| Fraction | Binary representation | Decimal representation |
| --- | --- | --- |
| 1/8 | 0.001 | 0.125 |
| 3/4 | 1/2 + 1/4 = 0.11 | 0.75 |
| 25/16 | (16+8+1)/16 = 11001b/16 = 1.1001 | 1.5625 |
| 101011b/2^4 = 43/16 | 10.1011 | 2.6875 |
| 1001b/2^3 = 9/8 | 1.001 | 1.125 |
| (5*8+7)/8 = 47/8 | 101111b/8 = 101.111 | 5.875 |
| 51/16 | 110011b/16 = 11.0011 | 3.1875 |
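The conversions in this table can be checked mechanically. The following is a minimal sketch, not part of the original solution; `binary_to_double` is a hypothetical helper name, and it assumes the input contains only the characters `0`, `1` and at most one `.`.

```c
#include <stddef.h>

/* Convert a binary string such as "10.1011" to its numeric value.
 * Hypothetical helper: no validation, input must be '0', '1' and one optional '.'. */
double binary_to_double(const char *s)
{
    double value = 0.0;    /* accumulated result */
    double weight = 0.5;   /* weight of the next fractional bit: 2^-1, 2^-2, ... */
    int seen_point = 0;

    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] == '.') {
            seen_point = 1;
        } else if (!seen_point) {
            value = value * 2.0 + (s[i] - '0');   /* integer part: shift left, add bit */
        } else {
            value += (s[i] - '0') * weight;       /* fractional part: add bit * 2^-j */
            weight /= 2.0;
        }
    }
    return value;
}
```

For example, `binary_to_double("10.1011")` yields 2.6875, matching the 43/16 row.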
46
A: Binary representation of 0.1 - x
0.1 = 0.0001100110011001100110011[0011]...
x = 0.0001100110011001100110000
0.1 - x = 0.0000000000000000000000011[0011]...
B: Approximate decimal value of 0.1 - x
x = 0.0001100110011001100110000b
= 1100110011001100110000b / 2^25
= 110011001100110011b / 2^21
= 209715/2097152
= 0.0999999046325684
0.1 - x = 0.0000000953674316 = 0.953674316 * 10^(-7)
C: 100 h = 360000 s, and the clock ticks once every 0.1 s, so count = 3600000
count * x = (3600000*209715)/2097152 = 359999.6566772461 s
delta = 360000 - 359999.6566772461 = 0.34332275390625 s
The clock error is 0.34332275390625 s.
D: Error per second of elapsed time = (0.1-x) * 10 = 0.953674316 * 10^(-6) s. At 2000 m/s this is 0.953674316 * 10^(-6) s * 2000 m/s = 1.907348632 * 10^(-3) m ~= 1.91 mm for every second of uptime. After the full 100 hours the accumulated error is 0.34332275390625 s * 2000 m/s ~= 687 m.
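The arithmetic in parts B to D can be reproduced with a short sketch. The function names below are hypothetical, and the code assumes what the solution assumes: x is 0.1 truncated to 23 bits after the binary point, and the clock advances by x once per 0.1 s tick.

```c
/* x: 0.1 truncated to 23 bits after the binary point,
 * i.e. floor(2^23 / 10) / 2^23 = 838860 / 8388608 = 209715 / 2097152. */
double truncated_tenth(void)
{
    const long scale = 1L << 23;     /* 2^23 */
    long numerator = scale / 10;     /* integer division truncates: 838860 */
    return (double)numerator / (double)scale;
}

/* Seconds by which the software clock lags behind real time
 * after `hours` of uptime. */
double clock_lag_seconds(double hours)
{
    double real_seconds = hours * 3600.0;
    double ticks = real_seconds * 10.0;            /* one tick per 0.1 s */
    double clock_seconds = ticks * truncated_tenth();
    return real_seconds - clock_seconds;
}
```

`clock_lag_seconds(100.0)` comes out at about 0.3433 s; multiplying by 2000 m/s gives roughly 687 m.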
47
Bias = 2^(k-1) -1 = 2^1 - 1 = 1
| Bits | e | E | 2^E | f | M | 2^E * M | V | Decimal |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 00 00 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 0 00 01 | 0 | 0 | 1 | 1/4 | 1/4 | 1/4 | 1/4 | 0.25 |
| 0 00 10 | 0 | 0 | 1 | 1/2 | 1/2 | 2/4 | 1/2 | 0.5 |
| 0 00 11 | 0 | 0 | 1 | 3/4 | 3/4 | 3/4 | 3/4 | 0.75 |
| 0 01 00 | 1 | 0 | 1 | 0 | 1 | 4/4 | 1 | 1 |
| 0 01 01 | 1 | 0 | 1 | 1/4 | 5/4 | 5/4 | 5/4 | 1.25 |
| 0 01 10 | 1 | 0 | 1 | 1/2 | 3/2 | 6/4 | 3/2 | 1.5 |
| 0 01 11 | 1 | 0 | 1 | 3/4 | 7/4 | 7/4 | 7/4 | 1.75 |
| 0 10 00 | 2 | 1 | 2 | 0 | 1 | 8/4 | 2 | 2 |
| 0 10 01 | 2 | 1 | 2 | 1/4 | 5/4 | 10/4 | 5/2 | 2.5 |
| 0 10 10 | 2 | 1 | 2 | 1/2 | 3/2 | 12/4 | 3 | 3 |
| 0 10 11 | 2 | 1 | 2 | 3/4 | 7/4 | 14/4 | 7/2 | 3.5 |
| 0 11 00 | - | - | - | - | - | - | +∞ | - |
| 0 11 01 | - | - | - | - | - | - | NaN | - |
| 0 11 10 | - | - | - | - | - | - | NaN | - |
| 0 11 11 | - | - | - | - | - | - | NaN | - |
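The rows of this table follow the usual IEEE rules: E = 1 - Bias and M = f when e = 0 (denormalized), E = e - Bias and M = 1 + f otherwise, and an all-ones exponent field marks infinity or NaN. A minimal decoder sketch under those rules is shown here; `decode_small_float` is a hypothetical name, and the layout (one sign bit, then k exponent bits, then n fraction bits) is assumed from the bit patterns in the table.

```c
/* Decode a bit pattern of an IEEE-like format with one sign bit,
 * k exponent bits and n fraction bits (k = 2, n = 2 for this table).
 * Returns 1 and stores the value in *out for finite patterns;
 * returns 0 for infinity and NaN (exponent field all ones). */
int decode_small_float(unsigned bits, int k, int n, double *out)
{
    unsigned frac = bits & ((1u << n) - 1);           /* low n bits */
    unsigned e    = (bits >> n) & ((1u << k) - 1);    /* next k bits */
    unsigned sign = (bits >> (n + k)) & 1u;           /* top bit */
    int bias = (1 << (k - 1)) - 1;
    int E;
    double M;

    if (e == (1u << k) - 1)
        return 0;                                     /* infinity or NaN */

    if (e == 0) {                                     /* denormalized */
        E = 1 - bias;
        M = (double)frac / (double)(1u << n);
    } else {                                          /* normalized */
        E = (int)e - bias;
        M = 1.0 + (double)frac / (double)(1u << n);
    }

    double value = M;
    for (int i = 0; i < E; i++) value *= 2.0;         /* scale by 2^E when E > 0 */
    for (int i = 0; i > E; i--) value /= 2.0;         /* scale by 2^E when E < 0 */

    *out = sign ? -value : value;
    return 1;
}
```

Calling it with the pattern `0 10 11` (0x0B), k = 2 and n = 2 stores 3.5 in `*out`. The same function handles other small formats by changing k and n.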
48
3510593 = 1101011001000101000001b
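3510593.0 = 0x4a564504 = 01001010010101100100010100000100 = 0 10010100 10101100100010100000100
M = 1 . 10101100100010100000100
e = 10010100 = 148
E = 148 - 127 = 21
So the binary point moves 21 places to the right: V = 1101011001000101000001.00 = 3510593.00
The 21 bits that follow the leading 1 of the integer are exactly the first 21 bits of the fraction field.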
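The bit pattern can be read back from a running program. This is a minimal sketch, assuming `float` is a 32-bit IEEE 754 single-precision value on the target platform; the three function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw bit pattern of a float (assumes a 32-bit IEEE 754 float). */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* well-defined way to reinterpret the bytes */
    return u;
}

/* Extract the 8-bit exponent field e and the 23-bit fraction field. */
uint32_t exponent_field(uint32_t u) { return (u >> 23) & 0xFFu; }
uint32_t fraction_field(uint32_t u) { return u & 0x7FFFFFu; }
```

With these, `float_bits(3510593.0f)` is 0x4A564504 and `exponent_field` of that value is 148. Dropping the low 2 bits of the fraction field gives the same 21 bits as the integer 3510593 without its leading 1.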
49
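A: The positive integer is 2^(n+1) + 1. In binary it is a 1, followed by n zeros, followed by a 1, so it needs n+1 fraction bits.
B: For single precision (n = 23) that is 2^24 + 1 = 16777217.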
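For single precision the answer can be confirmed by brute force: with n = 23 fraction bits, every integer up to 2^24 fits, and 2^24 + 1 = 16777217 is the first one that does not. The sketch below assumes an IEEE 754 `float` and a 32-bit `int`; the function names are hypothetical.

```c
/* Returns 1 if x survives a round trip through float unchanged. */
int fits_in_float(int x)
{
    volatile float f = (float)x;   /* volatile: force rounding to single precision */
    return (long long)f == x;      /* long long avoids overflow when f is 2^31 */
}

/* Smallest positive int that float cannot represent exactly. */
int first_unrepresentable(void)
{
    int x = 1;
    while (fits_in_float(x))
        x++;
    return x;
}
```

The search stops at 16777217; the next integer, 16777218, is representable again because it is even.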
50
| | Binary | Value | Rounded | Value |
| --- | --- | --- | --- | --- |
| A | 10.010 | 2.25 | 10.0 | 2 |
| B | 10.011 | 2.375 | 10.1 | 2.5 |
| C | 10.110 | 2.75 | 11.0 | 3 |
| D | 10.001 | 2.125 | 10.0 | 2 |
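Rows A and C are exact midpoints between two multiples of 1/2, so round-to-even decides them; rows B and D simply go to the nearer neighbor. A minimal sketch of that rule on scaled integers follows. The name `round_eighths_to_halves` is hypothetical, and the scaling (input = value * 8, result = rounded value * 2) is an assumption chosen to keep the example free of floating point.

```c
/* Round a value given in eighths (three bits right of the binary point)
 * to the nearest half (one bit right of the binary point), ties to even.
 * Argument: value * 8.  Result: rounded value * 2. */
unsigned round_eighths_to_halves(unsigned eighths)
{
    unsigned halves = eighths >> 2;   /* keep the bits down to the 1/2 position */
    unsigned rest   = eighths & 3u;   /* the two discarded bits */

    if (rest > 2u)                    /* above the midpoint: round up */
        halves++;
    else if (rest == 2u && (halves & 1u))
        halves++;                     /* exact midpoint: make the last kept bit 0 */
    return halves;
}
```

For row C, 10.110b is 22 eighths; the function returns 6 halves, i.e. 11.0b = 3.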
51
x = 0.00011001100110011001100
0.1 = 0.00011001100110011001100 110011[0011]
x' = 0.00011001100110011001101
x' - 0.1 = 00011001100110011001101 / 2^23 - 0.1
= 838861/8388608 - 0.1
= (838861 - 838860.8)/8388608
= 0.2 / 8388608
= 1 / 41943040
= 2.384185791015625e-8
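The size of x' - 0.1 is easy to get wrong by a factor of ten, so a numeric cross-check is useful. The sketch below computes the signed error of an approximation of 1/10 with a given number of bits after the binary point, either truncated or rounded to nearest; `tenth_error` is a hypothetical name. The difference is formed with integers first so that it is not lost to cancellation.

```c
/* Signed error (approximation minus 0.1) when 0.1 is stored with
 * `bits` digits after the binary point.
 * round_nearest = 0 truncates, round_nearest = 1 rounds to nearest. */
double tenth_error(int bits, int round_nearest)
{
    long long scale = 1LL << bits;                       /* 2^bits */
    long long num = round_nearest ? (scale + 5) / 10     /* round(scale / 10) */
                                  : scale / 10;          /* floor(scale / 10) */

    /* num/scale - 1/10 = (10*num - scale) / (10*scale) */
    return (double)(10 * num - scale) / (10.0 * (double)scale);
}
```

`tenth_error(23, 1)` is about 2.38e-8 (x' is slightly above 0.1), while `tenth_error(23, 0)` is about -9.54e-8 (the truncated x is below 0.1).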
52
| Bits (k=3, n=4) | Value | Bits (k=4, n=3) | Value |
| --- | --- | --- | --- |
| 011 0000 | 1 | 0111 000 | 1 |
| 101 1110 | 7.5 | 1001 111 | 7.5 |
| 010 1001 | 0.78125 | 0110 100 | 0.75 |
| 110 1111 | 15.5 | 1011 000 | 16 |
| 000 0001 | 0.015625 | 0001 000 | 0.015625 |
53
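/* 1.0e300 is finite, but its square exceeds the double range and overflows to +infinity. */
#define HUGE_NUM (1.0e300)
#define POS_INFINITY (HUGE_NUM*HUGE_NUM)
#define NEG_INFINITY (-POS_INFINITY)
/* -1.0 divided by +infinity gives -0.0; the integer constant 0 carries no sign. */
#define NEG_ZERO (-1.0/POS_INFINITY)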
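Whether such macros produce the intended special values can be tested without `<math.h>`. The predicates below are a minimal sketch with hypothetical names; they assume IEEE 754 `double` arithmetic, where dividing 1.0 by a zero yields an infinity with the sign of that zero.

```c
#include <float.h>

/* Anything larger than DBL_MAX can only be +infinity (NaN compares false). */
int is_pos_infinity(double d) { return d > DBL_MAX; }
int is_neg_infinity(double d) { return d < -DBL_MAX; }

/* -0.0 compares equal to 0.0, so look at the sign of 1/d instead:
 * 1/(-0.0) is -infinity, 1/(+0.0) is +infinity. */
int is_neg_zero(double d) { return d == 0.0 && 1.0 / d < 0.0; }
```

`is_neg_zero(-1.0/POS_INFINITY)` returns 1, whereas `is_neg_zero(0)` returns 0, because a plain 0 converts to positive zero.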
54
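A: True. A double has a 52-bit fraction, so every 32-bit integer is represented exactly.
B: False. Counterexample: x = 0x7fffffff. This value has 31 one bits, so a float cannot represent it exactly.
C: False. Counterexample: d = 1.11111111111111111111111111111 (binary). Any value with more than 23 ones after the binary point works, because a float keeps only 23 fraction bits.
D: True. Likewise, a double can represent every float value exactly.
E: True. Positive and negative floating-point values cover the same range, so negation cannot overflow.
F: True.
G: True. Floating-point arithmetic never overflows into negative values; the product of two numbers with the same sign is never negative.
H: False. Counterexample: d = 18014398509481984 (2^54), f = 2. The construction picks a large d for which d+f cannot be represented exactly but d-f can.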
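The counterexamples for C and H can be checked directly. The sketch below uses hypothetical function names and assumes IEEE 754 `float` and `double` with the default round-to-nearest-even mode; the `volatile` temporaries are there only to force each intermediate result to be rounded to its declared type.

```c
/* Returns 1 when d survives a round trip through float unchanged. */
int double_float_roundtrip(double d)
{
    volatile float f = (float)d;
    return (double)f == d;
}

/* Returns 1 when adding f to d and subtracting it again gives back d. */
int add_then_subtract(float f, double d)
{
    volatile double sum = f + d;   /* rounded to double here */
    return sum - f == d;
}
```

With d = 2^54 and f = 2, the sum 2^54 + 2 lies halfway between two doubles and rounds back to 2^54; subtracting 2 then gives 2^54 - 2, which is representable and differs from d, so `add_then_subtract(2.0f, 18014398509481984.0)` returns 0. For C, `double_float_roundtrip(2.0 - 1.0/536870912.0)` returns 0, since that value is 1.11...1b with 29 ones after the binary point.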