• 1. Yuanrenxue (猿人学) web crawler attack-and-defense, challenge 1: JS obfuscation with garbled source


    Challenge link: http://match.yuanrenxue.com/match/1

    1. First, open the developer tools (F12) and click .....

    In the captured request we can see an encrypted parameter named m.

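    2. Since the challenge is about JS obfuscation, we look for the JS code that generates the m parameter.

    • Here I searched for the | character, because it rarely appears in JS code. There are other ways to locate the code; use whichever you prefer.

    The search result shows the JS code we are looking for.

    Copy the JS code and format it to find the part we need.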

    • We can see that the m parameter is built as the return value of oo0O0 plus window.f.
    • Next, look for the oo0O0 function.

    The oo0O0 function is defined in the same JS, so we copy it out.

    function oo0O0(mw) {
        window.b = '';
        for (var i = 0, len = window.a.length; i < len; i++) {
            console.log(window.a[i]);
            window.b += String[document.e + document.g](window.a[i][document.f + document.h]() - i - window.c)
        }
        var U = ['W5r5W6VdIHZcT8kU', 'WQ8CWRaxWQirAW=='];
        var J = function (o, E) {
            o = o - 0x0;
            var N = U[o];
            if (J['bSSGte'] === undefined) {
                var Y = function (w) {
                    var m = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+/=',
                    T = String(w)['replace'](/=+$/, '');
                    var A = '';
                    for (var C = 0x0, b, W, l = 0x0; W = T['charAt'](l++); ~W && (b = C % 0x4 ? b * 0x40 + W : W, C++ % 0x4) ? A += String['fromCharCode'](0xff & b >> (-0x2 * C & 0x6)) : 0x0) {
                        W = m['indexOf'](W)
                    }
                    return A
                };
                var t = function (w, m) {
                    var T = [],
                    A = 0x0,
                    C,
                    b = '',
                    W = '';
                    w = Y(w);
                    for (var R = 0x0, v = w['length']; R < v; R++) {
                        W += '%' + ('00' + w['charCodeAt'](R)['toString'](0x10))['slice'](-0x2)
                    }
                    w = decodeURIComponent(W);
                    var l;
                    for (l = 0x0; l < 0x100; l++) {
                        T[l] = l
                    }
                    for (l = 0x0; l < 0x100; l++) {
                        A = (A + T[l] + m['charCodeAt'](l % m['length'])) % 0x100,
                        C = T[l],
                        T[l] = T[A],
                        T[A] = C
                    }
                    l = 0x0,
                    A = 0x0;
                    for (var L = 0x0; L < w['length']; L++) {
                        l = (l + 0x1) % 0x100,
                        A = (A + T[l]) % 0x100,
                        C = T[l],
                        T[l] = T[A],
                        T[A] = C,
                        b += String['fromCharCode'](w['charCodeAt'](L) ^ T[(T[l] + T[A]) % 0x100])
                    }
                    return b
                };
                J['luAabU'] = t,
                J['qlVPZg'] = {},
                J['bSSGte'] = !![]
            }
            var H = J['qlVPZg'][o];
            return H === undefined ? (J['TUDBIJ'] === undefined && (J['TUDBIJ'] = !![]), N = J['luAabU'](N, E), J['qlVPZg'][o] = N) : N = H,
            N
        };
        eval(atob(window['b'])[J('0x0', ']dQW')](J('0x1', 'GTu!'), '\x27' + mw + '\x27'));
        return ''
    }
    

    Oddly, the function ends with return '', so the value of m depends entirely on window.f.
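
The loop at the top of oo0O0 is easier to follow when written out plainly. The sketch below is a Python reading of it, assuming that document.e + document.g evaluates to "fromCharCode" and document.f + document.h to "charCodeAt" (the article does not show these values). It uses made-up sample data rather than the real window.a and window.c, and shift_encode is a hypothetical helper that exists only to build that sample input.

```python
import base64


def rebuild_b(a: str, c: int) -> str:
    """Mirror of the loop in oo0O0: b += fromCharCode(a[i].charCodeAt() - i - c)."""
    return "".join(chr(ord(ch) - i - c) for i, ch in enumerate(a))


def decode_payload(a: str, c: int) -> str:
    """Rebuild window.b, then base64-decode it, as atob(window['b']) does."""
    return base64.b64decode(rebuild_b(a, c)).decode("latin-1")


def shift_encode(b64_text: str, c: int) -> str:
    """Hypothetical inverse of rebuild_b, used only to build sample input."""
    return "".join(chr(ord(ch) + i + c) for i, ch in enumerate(b64_text))


if __name__ == "__main__":
    sample_js = "window.f = hex_md5(mwqqppz)"
    sample_b = base64.b64encode(sample_js.encode("latin-1")).decode("ascii")
    sample_a = shift_encode(sample_b, c=3)  # c=3 is an arbitrary sample value
    print(decode_payload(sample_a, c=3))
```

Read this way, window.a is just a base-64 string with each character shifted by its index plus a constant, which is why atob(window['b']) yields readable JS.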

    Let's look at the last part of the oo0O0 function.

      eval(atob(window['b'])[J('0x0', ']dQW')](J('0x1', 'GTu!'), '\x27' + mw + '\x27'));
      return ''
    

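    In JS, eval() evaluates a string and executes the JavaScript code it contains.

    Inside eval() there is also a call to atob(), which decodes a base-64 encoded string.
    Copy atob(window['b']) and run it in the Console of the developer tools; it prints the code shown below.

    Copy that output and format it to get: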

    var hexcase = 0;
    var b64pad = "";
    var chrsz = 16;
    function hex_md5(a) {
        return binl2hex(core_md5(str2binl(a), a.length * chrsz))
    }
    function b64_md5(a) {
        return binl2b64(core_md5(str2binl(a), a.length * chrsz))
    }
    function str_md5(a) {
        return binl2str(core_md5(str2binl(a), a.length * chrsz))
    }
    function hex_hmac_md5(a, b) {
        return binl2hex(core_hmac_md5(a, b))
    }
    function b64_hmac_md5(a, b) {
        return binl2b64(core_hmac_md5(a, b))
    }
    function str_hmac_md5(a, b) {
        return binl2str(core_hmac_md5(a, b))
    }
    function md5_vm_test() {
        return hex_md5("abc") == "900150983cd24fb0d6963f7d28e17f72"
    }
    function core_md5(p, k) {
        p[k >> 5] |= 128 << ((k) % 32);
        p[(((k + 64) >>> 9) << 4) + 14] = k;
        var o = 1732584193;
        var n = -271733879;
        var m = -1732584194;
        var l = 271733878;
        for (var g = 0; g < p.length; g += 16) {
            var j = o;
            var h = n;
            var f = m;
            var e = l;
            o = md5_ff(o, n, m, l, p[g + 0], 7, -680976936);
            l = md5_ff(l, o, n, m, p[g + 1], 12, -389564586);
            m = md5_ff(m, l, o, n, p[g + 2], 17, 606105819);
            n = md5_ff(n, m, l, o, p[g + 3], 22, -1044525330);
            o = md5_ff(o, n, m, l, p[g + 4], 7, -176418897);
            l = md5_ff(l, o, n, m, p[g + 5], 12, 1200080426);
            m = md5_ff(m, l, o, n, p[g + 6], 17, -1473231341);
            n = md5_ff(n, m, l, o, p[g + 7], 22, -45705983);
            o = md5_ff(o, n, m, l, p[g + 8], 7, 1770035416);
            l = md5_ff(l, o, n, m, p[g + 9], 12, -1958414417);
            m = md5_ff(m, l, o, n, p[g + 10], 17, -42063);
            n = md5_ff(n, m, l, o, p[g + 11], 22, -1990404162);
            o = md5_ff(o, n, m, l, p[g + 12], 7, 1804660682);
            l = md5_ff(l, o, n, m, p[g + 13], 12, -40341101);
            m = md5_ff(m, l, o, n, p[g + 14], 17, -1502002290);
            n = md5_ff(n, m, l, o, p[g + 15], 22, 1236535329);
            o = md5_gg(o, n, m, l, p[g + 1], 5, -165796510);
            l = md5_gg(l, o, n, m, p[g + 6], 9, -1069501632);
            m = md5_gg(m, l, o, n, p[g + 11], 14, 643717713);
            n = md5_gg(n, m, l, o, p[g + 0], 20, -373897302);
            o = md5_gg(o, n, m, l, p[g + 5], 5, -701558691);
            l = md5_gg(l, o, n, m, p[g + 10], 9, 38016083);
            m = md5_gg(m, l, o, n, p[g + 15], 14, -660478335);
            n = md5_gg(n, m, l, o, p[g + 4], 20, -405537848);
            o = md5_gg(o, n, m, l, p[g + 9], 5, 568446438);
            l = md5_gg(l, o, n, m, p[g + 14], 9, -1019803690);
            m = md5_gg(m, l, o, n, p[g + 3], 14, -187363961);
            n = md5_gg(n, m, l, o, p[g + 8], 20, 1163531501);
            o = md5_gg(o, n, m, l, p[g + 13], 5, -1444681467);
            l = md5_gg(l, o, n, m, p[g + 2], 9, -51403784);
            m = md5_gg(m, l, o, n, p[g + 7], 14, 1735328473);
            n = md5_gg(n, m, l, o, p[g + 12], 20, -1921207734);
            o = md5_hh(o, n, m, l, p[g + 5], 4, -378558);
            l = md5_hh(l, o, n, m, p[g + 8], 11, -2022574463);
            m = md5_hh(m, l, o, n, p[g + 11], 16, 1839030562);
            n = md5_hh(n, m, l, o, p[g + 14], 23, -35309556);
            o = md5_hh(o, n, m, l, p[g + 1], 4, -1530992060);
            l = md5_hh(l, o, n, m, p[g + 4], 11, 1272893353);
            m = md5_hh(m, l, o, n, p[g + 7], 16, -155497632);
            n = md5_hh(n, m, l, o, p[g + 10], 23, -1094730640);
            o = md5_hh(o, n, m, l, p[g + 13], 4, 681279174);
            l = md5_hh(l, o, n, m, p[g + 0], 11, -358537222);
            m = md5_hh(m, l, o, n, p[g + 3], 16, -722881979);
            n = md5_hh(n, m, l, o, p[g + 6], 23, 76029189);
            o = md5_hh(o, n, m, l, p[g + 9], 4, -640364487);
            l = md5_hh(l, o, n, m, p[g + 12], 11, -421815835);
            m = md5_hh(m, l, o, n, p[g + 15], 16, 530742520);
            n = md5_hh(n, m, l, o, p[g + 2], 23, -995338651);
            o = md5_ii(o, n, m, l, p[g + 0], 6, -198630844);
            l = md5_ii(l, o, n, m, p[g + 7], 10, 11261161415);
            m = md5_ii(m, l, o, n, p[g + 14], 15, -1416354905);
            n = md5_ii(n, m, l, o, p[g + 5], 21, -57434055);
            o = md5_ii(o, n, m, l, p[g + 12], 6, 1700485571);
            l = md5_ii(l, o, n, m, p[g + 3], 10, -1894446606);
            m = md5_ii(m, l, o, n, p[g + 10], 15, -1051523);
            n = md5_ii(n, m, l, o, p[g + 1], 21, -2054922799);
            o = md5_ii(o, n, m, l, p[g + 8], 6, 1873313359);
            l = md5_ii(l, o, n, m, p[g + 15], 10, -30611744);
            m = md5_ii(m, l, o, n, p[g + 6], 15, -1560198380);
            n = md5_ii(n, m, l, o, p[g + 13], 21, 1309151649);
            o = md5_ii(o, n, m, l, p[g + 4], 6, -145523070);
            l = md5_ii(l, o, n, m, p[g + 11], 10, -1120210379);
            m = md5_ii(m, l, o, n, p[g + 2], 15, 718787259);
            n = md5_ii(n, m, l, o, p[g + 9], 21, -343485551);
            o = safe_add(o, j);
            n = safe_add(n, h);
            m = safe_add(m, f);
            l = safe_add(l, e)
        }
        return Array(o, n, m, l)
    }
    function md5_cmn(h, e, d, c, g, f) {
        return safe_add(bit_rol(safe_add(safe_add(e, h), safe_add(c, f)), g), d)
    }
    function md5_ff(g, f, k, j, e, i, h) {
        return md5_cmn((f & k) | ((~f) & j), g, f, e, i, h)
    }
    function md5_gg(g, f, k, j, e, i, h) {
        return md5_cmn((f & j) | (k & (~j)), g, f, e, i, h)
    }
    function md5_hh(g, f, k, j, e, i, h) {
        return md5_cmn(f ^ k ^ j, g, f, e, i, h)
    }
    function md5_ii(g, f, k, j, e, i, h) {
        return md5_cmn(k ^ (f | (~j)), g, f, e, i, h)
    }
    function core_hmac_md5(c, f) {
        var e = str2binl(c);
        if (e.length > 16) {
            e = core_md5(e, c.length * chrsz)
        }
        var a = Array(16),
        d = Array(16);
        for (var b = 0; b < 16; b++) {
            a[b] = e[b] ^ 909522486;
            d[b] = e[b] ^ 1549556828
        }
        var g = core_md5(a.concat(str2binl(f)), 512 + f.length * chrsz);
        return core_md5(d.concat(g), 512 + 128)
    }
    function safe_add(a, d) {
        var c = (a & 65535) + (d & 65535);
        var b = (a >> 16) + (d >> 16) + (c >> 16);
        return (b << 16) | (c & 65535)
    }
    function bit_rol(a, b) {
        return (a << b) | (a >>> (32 - b))
    }
    function str2binl(d) {
        var c = Array();
        var a = (1 << chrsz) - 1;
        for (var b = 0; b < d.length * chrsz; b += chrsz) {
            c[b >> 5] |= (d.charCodeAt(b / chrsz) & a) << (b % 32)
        }
        return c
    }
    function binl2str(c) {
        var d = "";
        var a = (1 << chrsz) - 1;
        for (var b = 0; b < c.length * 32; b += chrsz) {
            d += String.fromCharCode((c[b >> 5] >>> (b % 32)) & a)
        }
        return d
    }
    function binl2hex(c) {
        var b = hexcase ? "0123456789ABCDEF" : "0123456789abcdef";
        var d = "";
        for (var a = 0; a < c.length * 4; a++) {
            d += b.charAt((c[a >> 2] >> ((a % 4) * 8 + 4)) & 15) + b.charAt((c[a >> 2] >> ((a % 4) * 8)) & 15)
        }
        return d
    }
    function binl2b64(d) {
        var c = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
        var f = "";
        for (var b = 0; b < d.length * 4; b += 3) {
            var e = (((d[b >> 2] >> 8 * (b % 4)) & 255) << 16) | (((d[b + 1 >> 2] >> 8 * ((b + 1) % 4)) & 255) << 8) | ((d[b + 2 >> 2] >> 8 * ((b + 2) % 4)) & 255);
            for (var a = 0; a < 4; a++) {
                if (b * 8 + a * 6 > d.length * 32) {
                    f += b64pad
                } else {
                    f += c.charAt((e >> 6 * (3 - a)) & 63)
                }
            }
        }
        return f
    };
    window.f = hex_md5(mwqqppz)
    

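    Reading the JS code, we see window.f = hex_md5(mwqqppz).

    window.f is produced by hex_md5, but what is mwqqppz? Let's keep analysing.

    Besides atob(window['b']), the eval call contains:

    • J('0x0', ']dQW')
    • J('0x1', 'GTu!')
    • '\x27' + mw + '\x27'

    Try them in the Console one at a time.

    J is not defined there, so copy the definition of J out of the JS code and run it first.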

        var U = ['W5r5W6VdIHZcT8kU', 'WQ8CWRaxWQirAW=='];
        var J = function (o, E) {
            o = o - 0x0;
            var N = U[o];
            if (J['bSSGte'] === undefined) {
                var Y = function (w) {
                    var m = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+/=',
                    T = String(w)['replace'](/=+$/, '');
                    var A = '';
                    for (var C = 0x0, b, W, l = 0x0; W = T['charAt'](l++); ~W && (b = C % 0x4 ? b * 0x40 + W : W, C++ % 0x4) ? A += String['fromCharCode'](0xff & b >> (-0x2 * C & 0x6)) : 0x0) {
                        W = m['indexOf'](W)
                    }
                    return A
                };
                var t = function (w, m) {
                    var T = [],
                    A = 0x0,
                    C,
                    b = '',
                    W = '';
                    w = Y(w);
                    for (var R = 0x0, v = w['length']; R < v; R++) {
                        W += '%' + ('00' + w['charCodeAt'](R)['toString'](0x10))['slice'](-0x2)
                    }
                    w = decodeURIComponent(W);
                    var l;
                    for (l = 0x0; l < 0x100; l++) {
                        T[l] = l
                    }
                    for (l = 0x0; l < 0x100; l++) {
                        A = (A + T[l] + m['charCodeAt'](l % m['length'])) % 0x100,
                        C = T[l],
                        T[l] = T[A],
                        T[A] = C
                    }
                    l = 0x0,
                    A = 0x0;
                    for (var L = 0x0; L < w['length']; L++) {
                        l = (l + 0x1) % 0x100,
                        A = (A + T[l]) % 0x100,
                        C = T[l],
                        T[l] = T[A],
                        T[A] = C,
                        b += String['fromCharCode'](w['charCodeAt'](L) ^ T[(T[l] + T[A]) % 0x100])
                    }
                    return b
                };
                J['luAabU'] = t,
                J['qlVPZg'] = {},
                J['bSSGte'] = !![]
            }
            var H = J['qlVPZg'][o];
            return H === undefined ? (J['TUDBIJ'] === undefined && (J['TUDBIJ'] = !![]), N = J['luAabU'](N, E), J['qlVPZg'][o] = N) : N = H,
            N
        };
    

    Try again, and this time it works: J('0x0', ']dQW') returns "replace".

    Next, J('0x1', 'GTu!') returns "mwqqppz".

    Then try '\x27' + mw + '\x27'.

    mw is not defined. That is because mw is a parameter of the oo0O0() function.

    Seen this way, the logic becomes very clear.
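
For readers who would rather not paste J into the Console, here is a minimal Python sketch of what J appears to do, based on my reading of the code above: a base-64 decode with a case-swapped alphabet (the inner function Y), a UTF-8 decode (the decodeURIComponent step), and then RC4 with the second argument as the key. The names rc4 and decode_string are my own, and the port has not been verified against the browser.

```python
import base64


def rc4(text: str, key: str) -> str:
    """RC4 over character codes, following the inner function t of J."""
    s = list(range(256))
    j = 0
    for i in range(256):  # key scheduling
        j = (j + s[i] + ord(key[i % len(key)])) % 256
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = []
    for ch in text:  # keystream generation and XOR
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(chr(ord(ch) ^ s[(s[i] + s[j]) % 256]))
    return "".join(out)


def decode_string(encoded: str, key: str) -> str:
    """Decode one entry of the string table U with the given key."""
    stripped = encoded.rstrip("=")
    padded = stripped + "=" * (-len(stripped) % 4)
    # Y uses a lowercase-first alphabet, i.e. standard base-64 with case swapped.
    raw = base64.b64decode(padded.swapcase())
    return rc4(raw.decode("utf-8"), key)


if __name__ == "__main__":
    # The same calls the article runs in the Console.
    print(decode_string("W5r5W6VdIHZcT8kU", "]dQW"))
    print(decode_string("WQ8CWRaxWQirAW==", "GTu!"))
```

If the port is faithful, the two calls should print the same strings the Console returned, replace and mwqqppz. The point of the sketch is only that J is a string-table lookup with a per-call RC4 key, not part of the signing logic itself.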

    3. Translating the eval() call

    eval(atob(window['b'])[J('0x0', ']dQW')](J('0x1', 'GTu!'), '\x27' + mw + '\x27'));
    

    After translation:

    eval(atob(window['b'])["replace"]("mwqqppz", "'" + mw + "'"));
    

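    '\x27' is simply the escape sequence for a single quote (').

    In other words, the call replaces mwqqppz inside atob(window['b']) with the value of mw wrapped in single quotes.

    mw is actually the timestamp string.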
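
The substitution can be sketched in Python as follows. This is an illustration only; substitute_placeholder is a hypothetical name, and the one-line input stands in for the much longer decoded source. Note that JS replace with a string pattern changes only the first match, which is why the count argument is 1.

```python
def substitute_placeholder(decoded_js: str, mw: str) -> str:
    """Replace the first mwqqppz with mw wrapped in single quotes."""
    return decoded_js.replace("mwqqppz", "'" + mw + "'", 1)


if __name__ == "__main__":
    tail = "window.f = hex_md5(mwqqppz)"
    print(substitute_placeholder(tail, "1612253498000"))
```

After the substitution the last statement reads window.f = hex_md5('1612253498000'), so eval ends up hashing the timestamp string.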

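    // Working through it step by step, each line reduces to the next:
    var timestamp = Date.parse(new Date()) + 100000000;
    var m = oo0O0(timestamp.toString()) + window.f;  // oo0O0 returns ''
    var m = window.f;
    var m = hex_md5(mwqqppz);                        // mwqqppz is replaced by the quoted timestamp
    var m = hex_md5(timestamp.toString());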
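
If the timestamp is generated on the Python side instead of in JS, the first line above can be mirrored as in the sketch below. It assumes that Date.parse(new Date()) yields the current time in milliseconds truncated to whole seconds; make_timestamp and its now parameter are hypothetical names introduced for illustration.

```python
import time
from typing import Optional


def make_timestamp(now: Optional[float] = None) -> str:
    """Python counterpart of Date.parse(new Date()) + 100000000, as a string."""
    seconds = int(time.time() if now is None else now)  # drop sub-second part
    return str(seconds * 1000 + 100_000_000)


if __name__ == "__main__":
    print(make_timestamp())
```

The resulting string still has to go through the site's own hex_md5, since the decoded code sets chrsz to 16 and should not be assumed to match a stock MD5 implementation.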
    

    4. Result

    function get_m() {
        // var timestamp = Date.parse(new Date()) + 100000000;
        var timestamp = '1612253498000';  // fixed sample value used for testing
        var m = hex_md5(timestamp);       // hex_md5 comes from the decoded code above
        return m;
    }
    

    5. The final crawler and the answer

    In Python, the third-party library execjs can run this JS file and return the ciphertext.

    # -*- coding: utf-8 -*-
    '''
    @Time    : 2021/2/1 11:24
    @Author  : 水一RAR
    '''
    
    import requests
    import execjs
    import time
    
    def get_md5_value():
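        # Read the JS file that contains get_m
        with open(r'./js代码/01.js',encoding='utf-8',mode='r') as f:
            JsData = f.read()
        # Compile the JS, then use call() to run the named function and get its return value
        psd = execjs.compile(JsData).call('get_m')
        # Percent-encode the '丨' character in case the returned value contains it
        psd = psd.replace('丨','%E4%B8%A8')
        return psd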
    
    def get_data(page_num,md5):
        url = f'http://match.yuanrenxue.com/api/match/1?page={page_num}&m={md5}'
        headers = {
            'Host':'match.yuanrenxue.com',
            'Referer':'http://match.yuanrenxue.com/match/1',
            'User-Agent':'yuanrenxue.project',
        }
        response = requests.get(url,headers=headers)
        return response.json()
    
    if __name__ == '__main__':
    
        sum_num = 0
        index_num = 0
    
        for page_num in range(1,6):
            info = get_data(page_num,get_md5_value())
            price_list = [i['value'] for i in info['data']]
            print(f'Price list for page {page_num}: {price_list}')
            sum_num += sum(price_list)
            index_num += len(price_list)
            time.sleep(1)
    
        average_price = sum_num / index_num
        print(f'Average ticket price: {average_price}')
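
The hand-written replace of '丨' in get_md5_value could also be left to the standard library. A small sketch, assuming the only goal is to percent-encode the value before it is placed in the query string; encode_m is a hypothetical helper name.

```python
from urllib.parse import quote


def encode_m(m: str) -> str:
    """Percent-encode every character that is not URL-safe, as UTF-8."""
    return quote(m, safe="")


if __name__ == "__main__":
    print(encode_m("丨"))
```

Another option is to pass the values through the params argument of requests.get, which percent-encodes query values itself.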
    
    

    6. Result:

    [Screenshot of the script output]

  • Original article: https://www.cnblogs.com/fushengliuyi/p/14356418.html