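This part mainly covers how numbers are represented in a computer:
(1) Integers: two's complement representation.
Suppose an integer w is represented with N bits. The leftmost bit is the sign bit: 0 means the number is non-negative, 1 means it is negative. The sign bit carries weight -2^(N-1), and every other bit i carries weight 2^i.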
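As a minimal sketch of the rule above, the value of an N-bit pattern can be computed by letting the sign bit count negatively. The function name `twos_value` is hypothetical and only serves this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret the low n bits of `bits` as an n-bit two's complement integer.
 * Hypothetical helper for illustration; requires 1 <= n <= 32. */
int64_t twos_value(uint32_t bits, int n) {
    assert(n >= 1 && n <= 32);
    uint64_t mask = (1ULL << n) - 1;   /* selects the low n bits */
    uint64_t low  = bits & mask;
    uint64_t sign = 1ULL << (n - 1);   /* magnitude of the sign bit's weight */
    /* Bits below the sign bit count positively; the sign bit counts as -2^(n-1). */
    return (int64_t)(low & (sign - 1)) - (int64_t)(low & sign);
}
```

For instance, the 4-bit pattern 1011 evaluates to -8 + 3 = -5, while 0111 evaluates to 7.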
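(2) Floating-point numbers: the encoding is similar to scientific notation.
Taking float as an example, the encoding has three parts: first the sign bit S, then 8 exponent bits exp, and finally 23 fraction bits frac.
That is: x = (-1)^S * M * 2^E
For example: -1.10110 × 2^10, with S = 1 (negative), M = 1.10110 in binary, and 10 as the exponent E.
Here:
Usually E = exp - bias, and for float, bias = 2^(8-1)-1 = 127; M = 1 + frac, with frac read as a binary fraction in [0, 1).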
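How a floating-point number is decoded depends on the value of exp:
A. When exp = 0xFF: if frac is all zeros, the value is ±∞; if frac is not all zeros, the value is NaN (Not a Number).
B. When exp = 0x00, the value is denormalized. Here exp = 0, but E is not 0 - bias; instead E is defined as 1 - bias.
In addition, M is not 1 + frac but M = frac, so exp = 0 with frac = 0 represents ±0.
C. When exp is neither 0xFF nor 0x00, the value is normalized, and only then do E = exp - bias and M = 1 + frac apply.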
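The three cases can be sketched as a small decoder. This is only an illustration under the assumption that `float` is the 32-bit format described above; `decode_float`, `float_to_bits` and `pow2` are hypothetical names, not part of the lab.

```c
#include <assert.h>
#include <math.h>    /* only for the NAN and INFINITY macros */
#include <stdint.h>
#include <string.h>

/* 2^e by repeated multiplication or division; exact for the range used here. */
static double pow2(int e) {
    double r = 1.0;
    for (; e > 0; --e) r *= 2.0;
    for (; e < 0; ++e) r /= 2.0;
    return r;
}

/* Copy the bits of a float into an integer (assumes a 32-bit float). */
uint32_t float_to_bits(float x) {
    uint32_t bits;
    assert(sizeof bits == sizeof x);
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Decode a 32-bit float encoding into its numeric value. */
double decode_float(uint32_t bits) {
    const int bias = 127;
    uint32_t s    = bits >> 31;
    int      exp  = (int)((bits >> 23) & 0xFF);
    uint32_t frac = bits & 0x7FFFFF;
    double sign = s ? -1.0 : 1.0;
    double f = frac / 8388608.0;                  /* frac / 2^23, in [0, 1) */

    if (exp == 0xFF)                              /* case A: infinity or NaN */
        return frac ? NAN : sign * INFINITY;
    if (exp == 0x00)                              /* case B: E = 1 - bias, M = frac */
        return sign * f * pow2(1 - bias);
    return sign * (1.0 + f) * pow2(exp - bias);   /* case C: M = 1 + frac */
}
```

Calling `decode_float(float_to_bits(x))` for a finite `x` should give back the same value, which is a quick way to check the three formulas against the hardware.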
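A note on the design in case B:
First, it can encode 0. Second, the numbers near 0 are evenly spaced. Finally, the transition from denormalized to normalized values is smooth: because E = 1 - bias equals the exponent used for exp = 1, the smallest normalized value lies exactly one step above the largest denormalized value.
(For an example, see float_twice in the data lab below.)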
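The even spacing and the smooth transition can be observed by measuring the gap between consecutive bit patterns. The sketch below assumes a 32-bit IEEE 754 `float` and default denormal handling; `bits_to_float` and `gap_after` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float (assumes a 32-bit float). */
float bits_to_float(uint32_t bits) {
    float f;
    assert(sizeof f == sizeof bits);
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Distance between the float encoded by `bits` and the one encoded by
 * `bits + 1`. Computed in double, where both values are exact. */
double gap_after(uint32_t bits) {
    return (double)bits_to_float(bits + 1) - (double)bits_to_float(bits);
}
```

With these helpers, the gap after 0x00000000, after 0x00000001, and after 0x007FFFFF (the largest denormalized value, whose successor is the smallest normalized value) should all equal 2^-149; the gap only doubles once exp reaches 2.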
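Rounding of floating-point numbers
The frac field has limited length, so precision is limited: a float stores 23 fraction bits (24 significant bits counting the implicit leading 1).
Converting a number that needs more precision than this to float requires rounding. By default, floating-point rounding follows two rules:
round to nearest, and on a tie, round to even.
(For an example, see float_i2f in the data lab below.)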
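The two rules can be sketched on plain integers: drop some low bits, then decide whether to bump the kept part. This is an illustrative helper (`shift_round_even` is a hypothetical name), not the lab solution itself.

```c
#include <assert.h>
#include <stdint.h>

/* Drop the low `drop` bits of x, rounding to nearest with ties to even.
 * Hypothetical helper for illustration; requires 1 <= drop <= 31. */
uint32_t shift_round_even(uint32_t x, int drop) {
    assert(drop >= 1 && drop <= 31);
    uint32_t kept = x >> drop;                 /* bits that survive */
    uint32_t tail = x & ((1u << drop) - 1);    /* bits that are discarded */
    uint32_t half = 1u << (drop - 1);          /* tail value of an exact tie */
    /* Round up when past the halfway point, or on a tie when kept is odd. */
    if (tail > half || (tail == half && (kept & 1)))
        kept += 1;
    return kept;
}
```

For example, dropping 2 bits from binary 1010 is a tie with an even kept part (10), so it stays 10, while 1110 is a tie with an odd kept part (11) and rounds up to 100. The same rule explains why 16777217 (2^24 + 1), which needs 25 significant bits, becomes 16777216 when converted to float.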
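Here is another example:
#include <stdio.h>

int main(int argc, char *argv[]) {
    double dt = 0x0.0000008p+0;  /* 2^-25: a quarter of the float spacing near 1.0 */
    double d0 = 0x1.0000010p+0;  /* 1 + 2^-24: exactly halfway between two floats */
    for (int i = 0; i < 6; ++i) {
        printf("=======\n");
        printf("double: %a\n", d0);
        printf("float: %a\n", (float)d0);
        d0 += dt;
    }
    return 0;
}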
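Output:
=======
double: 0x1.000001p+0
float: 0x1p+0
=======
double: 0x1.0000018p+0
float: 0x1.000002p+0
=======
double: 0x1.000002p+0
float: 0x1.000002p+0
=======
double: 0x1.0000028p+0
float: 0x1.000002p+0
=======
double: 0x1.000003p+0
float: 0x1.000004p+0
=======
double: 0x1.0000038p+0
float: 0x1.000004p+0
In brief: a float keeps 23 fraction bits, so neighboring floats near 1 are 0x0.000002 apart. 0x1.000001 and 0x1.000003 lie exactly halfway between two floats and go to the neighbor whose last fraction bit is 0 (0x1p+0 and 0x1.000004p+0); every other value goes to the nearer float.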
data lab
/*
 * CS:APP Data Lab
 *
 * <Please put your name and userid here>
 *
 * bits.c - Source file with your solutions to the Lab.
 *          This is the file you will hand in to your instructor.
 *
 * WARNING: Do not include the <stdio.h> header; it confuses the dlc
 * compiler. You can still use printf for debugging without including
 * <stdio.h>, although you might get a compiler warning. In general,
 * it's not good practice to ignore compiler warnings, but in this
 * case it's OK.
 */

#if 0
/*
 * Instructions to Students:
 *
 * STEP 1: Read the following instructions carefully.
 */

You will provide your solution to the Data Lab by
editing the collection of functions in this source file.

INTEGER CODING RULES:

  Replace the "return" statement in each function with one
  or more lines of C code that implements the function. Your code
  must conform to the following style:

  int Funct(arg1, arg2, ...) {
      /* brief description of how your implementation works */
      int var1 = Expr1;
      ...
      int varM = ExprM;

      varJ = ExprJ;
      ...
      varN = ExprN;
      return ExprR;
  }

  Each "Expr" is an expression using ONLY the following:
  1. Integer constants 0 through 255 (0xFF), inclusive. You are
     not allowed to use big constants such as 0xffffffff.
  2. Function arguments and local variables (no global variables).
  3. Unary integer operations ! ~
  4. Binary integer operations & ^ | + << >>

  Some of the problems restrict the set of allowed operators even further.
  Each "Expr" may consist of multiple operators. You are not restricted to
  one operator per line.

  You are expressly forbidden to:
  1. Use any control constructs such as if, do, while, for, switch, etc.
  2. Define or use any macros.
  3. Define any additional functions in this file.
  4. Call any functions.
  5. Use any other operations, such as &&, ||, -, or ?:
  6. Use any form of casting.
  7. Use any data type other than int. This implies that you
     cannot use arrays, structs, or unions.

  You may assume that your machine:
  1. Uses 2s complement, 32-bit representations of integers.
  2. Performs right shifts arithmetically.
  3. Has unpredictable behavior when shifting an integer by more
     than the word size.

EXAMPLES OF ACCEPTABLE CODING STYLE:
  /*
   * pow2plus1 - returns 2^x + 1, where 0 <= x <= 31
   */
  int pow2plus1(int x) {
     /* exploit ability of shifts to compute powers of 2 */
     return (1 << x) + 1;
  }

  /*
   * pow2plus4 - returns 2^x + 4, where 0 <= x <= 31
   */
  int pow2plus4(int x) {
     /* exploit ability of shifts to compute powers of 2 */
     int result = (1 << x);
     result += 4;
     return result;
  }

FLOATING POINT CODING RULES

For the problems that require you to implement floating-point operations,
the coding rules are less strict. You are allowed to use looping and
conditional control. You are allowed to use both ints and unsigneds.
You can use arbitrary integer and unsigned constants.

You are expressly forbidden to:
  1. Define or use any macros.
  2. Define any additional functions in this file.
  3. Call any functions.
  4. Use any form of casting.
  5. Use any data type other than int or unsigned. This means that you
     cannot use arrays, structs, or unions.
  6. Use any floating point data types, operations, or constants.

NOTES:
  1. Use the dlc (data lab checker) compiler (described in the handout) to
     check the legality of your solutions.
  2. Each function has a maximum number of operators (! ~ & ^ | + << >>)
     that you are allowed to use for your implementation of the function.
     The max operator count is checked by dlc. Note that '=' is not
     counted; you may use as many of these as you want without penalty.
  3. Use the btest test harness to check your functions for correctness.
  4. Use the BDD checker to formally verify your functions
  5. The maximum number of ops for each function is given in the
     header comment for each function. If there are any inconsistencies
     between the maximum ops in the writeup and in this file, consider
     this file the authoritative source.

/*
 * STEP 2: Modify the following functions according to the coding rules.
 *
 *   IMPORTANT. TO AVOID GRADING SURPRISES:
 *   1. Use the dlc compiler to check that your solutions conform
 *      to the coding rules.
 *   2. Use the BDD checker to formally verify that your solutions produce
 *      the correct answers.
 */

#endif

/*
 * bitAnd - x&y using only ~ and |
 *   Example: bitAnd(6, 5) = 4
 *   Legal ops: ~ |
 *   Max ops: 8
 *   Rating: 1
 */
int bitAnd(int x, int y) {
    return ~((~x) | (~y));
}

/*
 * getByte - Extract byte n from word x
 *   Bytes numbered from 0 (LSB) to 3 (MSB)
 *   Examples: getByte(0x12345678,1) = 0x56
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 6
 *   Rating: 2
 */
int getByte(int x, int n) {
    int y = x >> (n << 3);
    return y & 0xFF;
}

/*
 * logicalShift - shift x to the right by n, using a logical shift
 *   Can assume that 0 <= n <= 31
 *   Examples: logicalShift(0x87654321,4) = 0x08765432
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 20
 *   Rating: 3
 */
int logicalShift(int x, int n) {
    int y = x >> n;

    int helper = (1 << 31) >> n;
    helper = ~(helper << 1);
    return y & helper;
}

/*
 * bitCount - returns count of number of 1's in word
 *   Examples: bitCount(5) = 2, bitCount(7) = 3
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 40
 *   Rating: 4
 */
int bitCount(int x) {
    int mk1, mk2, mk3, mk4, mk5, result;
    mk5 = 0xff | (0xff << 8);
    mk4 = 0xff | (0xff << 16);
    mk3 = 0x0f | (0x0f << 8);
    mk3 = mk3 | (mk3 << 16);
    mk2 = 0x33 | (0x33 << 8);
    mk2 = mk2 | (mk2 << 16);
    mk1 = 0x55 | (0x55 << 8);
    mk1 = mk1 | (mk1 << 16);

    // First count the 1s in each of the 16 adjacent 2-bit pairs and store each
    // count in its own 2 bits, then repeat with wider and wider fields,
    // i.e.: 32->16, 16->8, 8->4, 4->2, 2->1
    result = (mk1 & x) + (mk1 & (x >> 1));
    result = (mk2 & result) + (mk2 & (result >> 2));
    result = mk3 & (result + (result >> 4));
    result = mk4 & (result + (result >> 8));
    result = mk5 & (result + (result >> 16));
    return result;
}

/*
 * bang - Compute !x without using !
 *   Examples: bang(3) = 0, bang(0) = 1
 *   Legal ops: ~ & ^ | + << >>
 *   Max ops: 12
 *   Rating: 4
 */
int bang(int x) {
    return ((x | (~x + 1)) >> 31) + 1;
}

/*
 * tmin - return minimum two's complement integer
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 4
 *   Rating: 1
 */
int tmin(void) {
    return 1 << 31;
}

/*
 * fitsBits - return 1 if x can be represented as an
 *  n-bit, two's complement integer.
 *   1 <= n <= 32
 *   Examples: fitsBits(5,3) = 0, fitsBits(-4,3) = 1
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 15
 *   Rating: 2
 */
int fitsBits(int x, int n) {
    /*
       An n-bit number has n-1 bits besides the sign bit. Viewed as a 32-bit int,
       the top 32-(n-1) bits must be all 0 for a non-negative value and
       all 1 for a negative value.
    */
    int signX = x >> 31;
    int y = x >> (n + (~0));
    return !(signX ^ y);
}

/*
 * divpwr2 - Compute x/(2^n), for 0 <= n <= 30
 *  Round toward zero
 *   Examples: divpwr2(15,1) = 7, divpwr2(-33,4) = -2
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 15
 *   Rating: 2
 */
int divpwr2(int x, int n) {
    int signX = x >> 31;
    int bias = (1 << n) + (~0);
    bias = signX & bias;
    return (x + bias) >> n;
}

/*
 * negate - return -x
 *   Example: negate(1) = -1.
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 5
 *   Rating: 2
 */
int negate(int x) {
    return (~x) + 1;
}

/*
 * isPositive - return 1 if x > 0, return 0 otherwise
 *   Example: isPositive(-1) = 0.
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 8
 *   Rating: 3
 */
int isPositive(int x) {
    return !((x >> 31) | (!x));
}

/*
 * isLessOrEqual - if x <= y  then return 1, else return 0
 *   Example: isLessOrEqual(4,5) = 1.
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 24
 *   Rating: 3
 */
int isLessOrEqual(int x, int y) {
    int signX = x >> 31;
    int signY = y >> 31;
    int signSame = !(signX ^ signY);
    int diff = x + (~y) + 1;
    int diffNegZero = (diff >> 31) | (!diff);
    return (signSame & diffNegZero) | ((!signSame) & signX);
}

/*
 * ilog2 - return floor(log base 2 of x), where x > 0
 *   Example: ilog2(16) = 4
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 90
 *   Rating: 4
 */
int ilog2(int x) {
    int bn = (!!(x >> 16)) << 4;
    bn = bn + ((!!(x >> (bn + 8))) << 3);
    bn = bn + ((!!(x >> (bn + 4))) << 2);
    bn = bn + ((!!(x >> (bn + 2))) << 1);
    bn = bn + (!!(x >> (bn + 1)));
    return bn;
}

/*
 * float_neg - Return bit-level equivalent of expression -f for
 *   floating point argument f.
 *   Both the argument and result are passed as unsigned int's, but
 *   they are to be interpreted as the bit-level representations of
 *   single-precision floating point values.
 *   When argument is NaN, return argument.
 *   Legal ops: Any integer/unsigned operations incl. ||, &&. also if, while
 *   Max ops: 10
 *   Rating: 2
 */
unsigned float_neg(unsigned uf) {
    /*
     * s111 1111 1xxx xxxx xxxx xxxx xxxx xxxx
     * s is sign bit, when xs are all ZERO, this represents inf,
     * and when xs are not all ZERO, it's NaN.
     */
    unsigned fracMask, expMask;
    unsigned fracPart, expPart;
    fracMask = (1 << 23) - 1;
    expMask = 0xff << 23;
    fracPart = uf & fracMask;
    expPart = uf & expMask;
    if ((expMask == expPart) && fracPart) {
        return uf;
    }

    return (1 << 31) + uf;
}

/*
 * float_i2f - Return bit-level equivalent of expression (float) x
 *   Result is returned as unsigned int, but
 *   it is to be interpreted as the bit-level representation of a
 *   single-precision floating point values.
 *   Legal ops: Any integer/unsigned operations incl. ||, &&. also if, while
 *   Max ops: 30
 *   Rating: 4
 */
unsigned float_i2f(int x) {
    unsigned signX, expPart, fracPart;
    unsigned absX;
    unsigned hp = 1 << 31;
    unsigned shiftLeft = 0;
    unsigned roundTail;
    unsigned result;
    if (0 == x) {
        return 0;
    }
    absX = x;
    signX = 0;
    if (x < 0) {
        absX = -x;
        signX = hp;
    }
    // Normalize: shift left until the leading 1 reaches bit 31.
    while (0 == (hp & absX)) {
        absX = absX << 1;
        shiftLeft += 1;
    }
    expPart = 127 + 31 - shiftLeft;
    roundTail = absX & 0xff;
    fracPart = (~(hp >> 8)) & (absX >> 8);
    result = signX | (expPart << 23) | fracPart;
    // Round up when closer to the larger neighbor; round down when closer to the smaller one.
    if (roundTail > 0x80) {
        result += 1;
    } else if (0x80 == roundTail) {
        // On a tie, round to even: round up if the lowest kept bit is 1, otherwise round down.
        if (fracPart & 1) {
            result += 1;
        }
    }
    return result;
}

/*
 * float_twice - Return bit-level equivalent of expression 2*f for
 *   floating point argument f.
 *   Both the argument and result are passed as unsigned int's, but
 *   they are to be interpreted as the bit-level representation of
 *   single-precision floating point values.
 *   When argument is NaN, return argument
 *   Legal ops: Any integer/unsigned operations incl. ||, &&. also if, while
 *   Max ops: 30
 *   Rating: 4
 */
unsigned float_twice(unsigned uf) {
    unsigned signX, expPart, fracPart;
    unsigned helper = 1 << 31;
    unsigned fracMask = (1 << 23) - 1;
    if (0 == uf) { // positive 0
        return 0;
    }
    if (helper == uf) { // negative 0
        return helper;
    }
    signX = uf & helper;
    expPart = (uf >> 23) & 0xff;
    if (expPart == 0xff) { // infinity or NaN: return unchanged
        return uf;
    }
    fracPart = uf & fracMask;
    if (0 == expPart) { // denormalized value
        fracPart = fracPart << 1;
        if (fracPart & (1 << 23)) { // crossed into the normalized range
            fracPart = fracPart & fracMask;
            expPart += 1;
        }
    } else {
        expPart += 1;
        if (expPart == 0xff) { // overflow: the result is infinity, so frac must be 0
            fracPart = 0;
        }
    }
    return signX | (expPart << 23) | fracPart;
}
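Outside the lab's restrictions, a bit-level solution such as float_twice can be cross-checked against the hardware. The sketch below is one possible way to do that, assuming `unsigned` and `float` are both 32 bits; `twice_reference`, `count_mismatches` and the two conversion helpers are hypothetical names and are not part of btest.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reinterpret bits as a float and back (assumes 32-bit unsigned and float). */
static float bits_to_float(unsigned u) { float f; memcpy(&f, &u, sizeof f); return f; }
static unsigned float_to_bits(float f) { unsigned u; memcpy(&u, &f, sizeof u); return u; }

/* Reference for float_twice: let the hardware compute 2*f.
 * NaN arguments are returned unchanged, as the lab comment requires. */
unsigned twice_reference(unsigned uf) {
    float f;
    assert(sizeof(float) == sizeof(unsigned));
    f = bits_to_float(uf);
    if (f != f)                 /* only NaN compares unequal to itself */
        return uf;
    return float_to_bits(2.0f * f);
}

/* Run `impl` on a few hand-picked edge cases and count disagreements
 * with the reference. */
int count_mismatches(unsigned (*impl)(unsigned)) {
    static const unsigned cases[] = {
        0x00000000u, 0x80000000u,               /* +0 and -0 */
        0x00000001u, 0x00400000u, 0x007FFFFFu,  /* denormalized values */
        0x00800000u, 0x3F800000u, 0xC0490FDBu,  /* normalized values */
        0x7F000000u, 0x7F7FFFFFu,               /* doubling overflows to infinity */
        0x7F800000u, 0xFF800000u,               /* infinities */
        0x7FC00000u, 0x7F800001u                /* NaNs */
    };
    int bad = 0;
    size_t i;
    for (i = 0; i < sizeof cases / sizeof cases[0]; ++i)
        if (impl(cases[i]) != twice_reference(cases[i]))
            ++bad;
    return bad;
}
```

Calling `count_mismatches(float_twice)` from a separate test file should return 0 for a correct solution. The overflow cases are worth including because doubling the largest finite value must give infinity (exp = 0xFF with frac = 0), not a NaN pattern.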