• Android (Java): Simulating a Zhihu Login and Scraping User Information


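    Not long ago I read an article titled "I Used a Crawler to 'Steal' One Million Zhihu Users in a Day, Just to Prove PHP Is the Best Language in the World". That article logs in by pasting a browser cookie straight into the code. My goal here is not to scrape data; I only want to walk through the basic process of simulating a login in Java. An earlier article of mine, "Android Project in Practice: Building Super Timetable's One-Tap Timetable Import", is really a case of simulated login too, and recently I have seen many people on Zhihu asking how Super Timetable (超级课程表) is implemented. The essence is simulated login, so once you have worked through this article you should no longer worry about being unable to fetch the data. This article builds on two earlier ones: "Automatic Cookie Management with OkHttp on Android" for keeping cookies, and "A Complete Guide to the Jsoup Library" for parsing. To keep things simple I use plain Java SE rather than Android. If you port the code to Android, probably the only change needed is to move the network requests onto a worker thread.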
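    The remark about worker threads can be illustrated with a minimal sketch that uses only `java.util.concurrent`. The class name `BackgroundRequestSketch` and the helper `runInBackground` are hypothetical names introduced for this example; on Android the result would additionally have to be delivered back to the main thread, which is not shown here.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: runs a blocking task (for example a network call) off the calling thread.
public class BackgroundRequestSketch {

    /** Submits the task to a fresh single worker thread and returns a handle to its result. */
    public static <T> Future<T> runInBackground(Callable<T> task) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            return executor.submit(task);
        } finally {
            // shutdown() still lets the submitted task finish; the worker thread exits afterwards.
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Future<String> result = runInBackground(new Callable<String>() {
            @Override
            public String call() {
                // Placeholder for a blocking call such as client.newCall(request).execute().
                return "finished on " + Thread.currentThread().getName();
            }
        });
        // get() blocks the caller until the background task has completed.
        System.out.println(result.get());
    }
}
```

    Creating one executor per call keeps the sketch short; a real application would more likely share a single executor across requests.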

    First, open the Zhihu home page in Chrome and click Log in. You will see the following screen:
    [Screenshot: the Zhihu login form]

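    In Chrome, press F12 to open the developer tools, switch to the Network tab, and check Preserve log. Be sure to check it; otherwise the request log is cleared when the page navigates and you will not see the login request.

    [Screenshot: the Network tab with Preserve log checked]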

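    Once everything is set up, type your account and password into the form and click Log in. After a successful login you will see a record like this:

    [Screenshot: the email request in the Network tab]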

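    Click the email entry in the request list. At the bottom you can see that the request submitted four parameters, and at the top that the request URL is http://www.zhihu.com/login/email

    [Screenshots: the request URL and the submitted form data]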

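    You may be surprised to find that Zhihu sends the password in plaintext. The submitted parameters are easy to read: email is the account, password is the password, and remember_me controls whether the login is remembered (just pass true). The remaining parameter, _xsrf, is as its name suggests an anti-XSRF (cross-site request forgery) token, which also gets in the way of naive crawlers.

    So before submitting, we have to extract this value from the page source. It lives in a hidden field of the login form.

    [Screenshot: the hidden _xsrf input in the page source]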

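    With everything ready, you happily simulate the login in code, only to get back a captcha error. It turns out we also have to submit a captcha. Its parameter name is captcha, and the captcha image is served at:

    http://www.zhihu.com/captcha.gif?r=<timestamp>

    That gives us the following data.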

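    • Request URL
    http://www.zhihu.com/login/email
    • Request parameters
    _xsrf        value extracted from the hidden form field
    captcha      captcha text
    email        email address
    password     password
    remember_me  whether to remember the login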
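    As a rough illustration of the data listed above, the sketch below builds the captcha URL and encodes the five parameters as an `application/x-www-form-urlencoded` body using only the JDK. The class `LoginFormSketch`, its method names, and the sample values are assumptions made for this example; the article itself lets OkHttp build the form body.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper showing what the login request carries on the wire.
public class LoginFormSketch {

    /** Builds the captcha URL; the r parameter is a timestamp in milliseconds. */
    public static String captchaUrl(long timestampMillis) {
        return "http://www.zhihu.com/captcha.gif?r=" + timestampMillis;
    }

    /** Encodes the fields as a form body, keeping the map's iteration order. */
    public static String encodeForm(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(encode(field.getKey())).append('=').append(encode(field.getValue()));
        }
        return body.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("_xsrf", "value-from-hidden-field"); // placeholder value
        fields.put("captcha", "abcd");                  // placeholder value
        fields.put("email", "user@example.com");        // placeholder value
        fields.put("password", "secret");               // placeholder value
        fields.put("remember_me", "true");
        System.out.println(captchaUrl(System.currentTimeMillis()));
        System.out.println(encodeForm(fields));
    }
}
```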

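    One question remains: how do we get the captcha text? The answer is manual entry. Save the captcha image locally, read it by eye, type it in, and then log in.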
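    The manual step could look roughly like the sketch below, which prompts on the console and reads one line. `CaptchaPromptSketch` and `readCaptcha` are hypothetical names; the article does not show how the typed value reaches the login call.

```java
import java.io.InputStream;
import java.util.Scanner;

// Hypothetical helper: asks a human to type the captcha they read from the saved image.
public class CaptchaPromptSketch {

    /** Prompts for the captcha and returns the typed line with surrounding whitespace removed. */
    public static String readCaptcha(InputStream in) {
        System.out.print("Type the captcha shown in the saved image: ");
        Scanner scanner = new Scanner(in, "UTF-8");
        // An empty string is returned if the input ends before a line is entered.
        return scanner.hasNextLine() ? scanner.nextLine().trim() : "";
    }

    public static void main(String[] args) {
        String captcha = readCaptcha(System.in);
        System.out.println("captcha = " + captcha);
    }
}
```

    Taking the `InputStream` as a parameter, rather than reading `System.in` directly, makes the helper easy to exercise with canned input.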

    Network requests use OkHttp and parsing uses Jsoup; we will also need Gson. Add all three as Maven dependencies:

        <dependencies>
            <dependency>
                <groupId>com.squareup.okhttp</groupId>
                <artifactId>okhttp</artifactId>
                <version>2.4.0</version>
            </dependency>
            <dependency>
                <groupId>org.jsoup</groupId>
                <artifactId>jsoup</artifactId>
                <version>1.8.3</version>
            </dependency>
            <dependency>
                <groupId>com.google.code.gson</groupId>
                <artifactId>gson</artifactId>
                <version>2.3.1</version>
            </dependency>
        </dependencies>

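    Before writing any code we have to think about how to stay logged in. That comes down to keeping the cookies: we log in only once and afterwards collect data directly, so the cookies must be persisted. I adapted an Android class from the earlier article so that it works on plain Java. As you can see, instead of saving to SharedPreferences it now saves to a file in JSON form, which is why Gson is needed.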

    package cn.edu.zafu.zhihu;
    
    
    
    import com.google.gson.Gson;
    import com.google.gson.GsonBuilder;
    import com.google.gson.reflect.TypeToken;
    
    import java.io.*;
    import java.net.CookieStore;
    import java.net.HttpCookie;
    import java.net.URI;
    import java.net.URISyntaxException;
    import java.util.*;
    import java.util.concurrent.ConcurrentHashMap;
    
    /**
     * User:lizhangqu(513163535@qq.com)
     * Date:2015-07-18
     * Time: 16:54
     */
    public class PersistentCookieStore implements CookieStore {
        private static final Gson gson= new GsonBuilder().setPrettyPrinting().create();
        private static final String LOG_TAG = "PersistentCookieStore";
        private static final String COOKIE_PREFS = "CookiePrefsFile";
        private static final String COOKIE_NAME_PREFIX = "cookie_";
    
        private final HashMap<String, ConcurrentHashMap<String, HttpCookie>> cookies;
        private  Map<String,String> cookiePrefs=new HashMap<String, String>();
    
        /**
         * Construct a persistent cookie store.
         *
         */
        public PersistentCookieStore() {
            String cookieJson = readFile("cookie.json");
            Map<String,String> fromJson = gson.fromJson(cookieJson,new TypeToken<Map<String, String>>() {}.getType());  
            if(fromJson!=null){
                System.out.println(fromJson);
                cookiePrefs=fromJson;
            }
    
    
            cookies = new HashMap<String, ConcurrentHashMap<String, HttpCookie>>();
    
            // Load any previously stored cookies into the store
    
            for(Map.Entry<String, ?> entry : cookiePrefs.entrySet()) {
                // Host entries map a host to its cookie names; entries whose key starts with the prefix hold the cookies themselves
                if (entry.getValue() != null && !entry.getKey().startsWith(COOKIE_NAME_PREFIX)) {
                    String[] cookieNames = split((String) entry.getValue(), ",");
                    for (String name : cookieNames) {
                        String encodedCookie = cookiePrefs.get(COOKIE_NAME_PREFIX + name);
                        if (encodedCookie != null) {
                            HttpCookie decodedCookie = decodeCookie(encodedCookie);
                            if (decodedCookie != null) {
                                if(!cookies.containsKey(entry.getKey()))
                                    cookies.put(entry.getKey(), new ConcurrentHashMap<String, HttpCookie>());
                                cookies.get(entry.getKey()).put(name, decodedCookie);
                            }
                        }
                    }
    
                }
            }
        }
    
        public void add(URI uri, HttpCookie cookie) {
            String name = getCookieToken(uri, cookie);
    
            // Save cookie into local store, or remove if expired
            if (!cookie.hasExpired()) {
                if(!cookies.containsKey(uri.getHost()))
                    cookies.put(uri.getHost(), new ConcurrentHashMap<String, HttpCookie>());
                cookies.get(uri.getHost()).put(name, cookie);
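            } else {
                // The in-memory map is keyed by host, not by the full URI string
                if(cookies.containsKey(uri.getHost()))
                    cookies.get(uri.getHost()).remove(name);
            }
            // Only rewrite the host entry if the host has a cookie map, to avoid a NullPointerException
            if(cookies.containsKey(uri.getHost()))
                cookiePrefs.put(uri.getHost(), join(",", cookies.get(uri.getHost()).keySet()));
            cookiePrefs.put(COOKIE_NAME_PREFIX + name, encodeCookie(new SerializableHttpCookie(cookie)));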
    
            String json=gson.toJson(cookiePrefs);
            saveFile(json.getBytes(), "cookie.json");
    
        }
    
        protected String getCookieToken(URI uri, HttpCookie cookie) {
            return cookie.getName() + cookie.getDomain();
        }
    
        public List<HttpCookie> get(URI uri) {
            ArrayList<HttpCookie> ret = new ArrayList<HttpCookie>();
            if(cookies.containsKey(uri.getHost()))
                ret.addAll(cookies.get(uri.getHost()).values());
            return ret;
        }
    
        public boolean removeAll() {
            cookiePrefs.clear();
            cookies.clear();
            return true;
        }
    
    
        public boolean remove(URI uri, HttpCookie cookie) {
            String name = getCookieToken(uri, cookie);
    
            if(cookies.containsKey(uri.getHost()) && cookies.get(uri.getHost()).containsKey(name)) {
                cookies.get(uri.getHost()).remove(name);
                if(cookiePrefs.containsKey(COOKIE_NAME_PREFIX + name)) {
                    cookiePrefs.remove(COOKIE_NAME_PREFIX + name);
                }
                cookiePrefs.put(uri.getHost(), join(",", cookies.get(uri.getHost()).keySet()));
    
                return true;
            } else {
                return false;
            }
        }
    
        public List<HttpCookie> getCookies() {
            ArrayList<HttpCookie> ret = new ArrayList<HttpCookie>();
            for (String key : cookies.keySet())
                ret.addAll(cookies.get(key).values());
    
            return ret;
        }
    
        public List<URI> getURIs() {
            ArrayList<URI> ret = new ArrayList<URI>();
            for (String key : cookies.keySet())
                try {
                    ret.add(new URI(key));
                } catch (URISyntaxException e) {
                    e.printStackTrace();
                }
    
            return ret;
        }
    
        /**
         * Serializes Cookie object into String
         *
         * @param cookie cookie to be encoded, can be null
         * @return cookie encoded as String
         */
        protected String encodeCookie(SerializableHttpCookie cookie) {
            if (cookie == null)
                return null;
            ByteArrayOutputStream os = new ByteArrayOutputStream();
            try {
                ObjectOutputStream outputStream = new ObjectOutputStream(os);
                outputStream.writeObject(cookie);
            } catch (IOException e) {
                System.out.println("IOException in encodeCookie"+ e);
                return null;
            }
    
            return byteArrayToHexString(os.toByteArray());
        }
    
        /**
         * Returns cookie decoded from cookie string
         *
         * @param cookieString string of cookie as returned from http request
         * @return decoded cookie or null if exception occured
         */
        protected HttpCookie decodeCookie(String cookieString) {
            byte[] bytes = hexStringToByteArray(cookieString);
            ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);
            HttpCookie cookie = null;
            try {
                ObjectInputStream objectInputStream = new ObjectInputStream(byteArrayInputStream);
                cookie = ((SerializableHttpCookie) objectInputStream.readObject()).getCookie();
            } catch (IOException e) {
                System.out.println("IOException in decodeCookie"+e);
            } catch (ClassNotFoundException e) {
                System.out.println("ClassNotFoundException in decodeCookie"+e);
            }
    
            return cookie;
        }
    
        /**
         * Using some super basic byte array &lt;-&gt; hex conversions so we don't have to rely on any
         * large Base64 libraries. Can be overridden if you like!
         *
         * @param bytes byte array to be converted
         * @return string containing hex values
         */
        protected String byteArrayToHexString(byte[] bytes) {
            StringBuilder sb = new StringBuilder(bytes.length * 2);
            for (byte element : bytes) {
                int v = element & 0xff;
                if (v < 16) {
                    sb.append('0');
                }
                sb.append(Integer.toHexString(v));
            }
            return sb.toString().toUpperCase(Locale.US);
        }
    
        /**
         * Converts hex values from strings to byte array
         *
         * @param hexString string of hex-encoded values
         * @return decoded byte array
         */
        protected byte[] hexStringToByteArray(String hexString) {
            int len = hexString.length();
            byte[] data = new byte[len / 2];
            for (int i = 0; i < len; i += 2) {
                data[i / 2] = (byte) ((Character.digit(hexString.charAt(i), 16) << 4) + Character.digit(hexString.charAt(i + 1), 16));
            }
            return data;
        }
        public static String join(CharSequence delimiter, Iterable tokens) {
            StringBuilder sb = new StringBuilder();
            boolean firstTime = true;
            for (Object token: tokens) {
                if (firstTime) {
                    firstTime = false;
                } else {
                    sb.append(delimiter);
                }
                sb.append(token);
            }
            return sb.toString();
        }
        public static String[] split(String text, String expression) {
            if (text.length() == 0) {
                return new String[]{};
            } else {
                return text.split(expression, -1);
            }
        }
    
        public static void saveFile(byte[] bfile, String fileName) {
            BufferedOutputStream bos = null;
            FileOutputStream fos = null;
            File file = null;
            try {
                file = new File(fileName);
                fos = new FileOutputStream(file);
                bos = new BufferedOutputStream(fos);
                bos.write(bfile);
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                if (bos != null) {
                    try {
                        bos.close();
                    } catch (IOException e1) {
                        e1.printStackTrace();
                    }
                }
                if (fos != null) {
                    try {
                        fos.close();
                    } catch (IOException e1) {
                        e1.printStackTrace();
                    }
                }
            }
        }
        public static String readFile(String fileName) {
            BufferedInputStream bis = null;
            FileInputStream fis = null;
            File file = null;
            try {
                file = new File(fileName);
                fis = new FileInputStream(file);
                bis = new BufferedInputStream(fis);
    
                int available = bis.available();
                byte[] bytes=new byte[available];
                bis.read(bytes);
                String str=new String(bytes);
                return str;
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                if (bis != null) {
                    try {
                        bis.close();
                    } catch (IOException e1) {
                        e1.printStackTrace();
                    }
                }
                if (fis != null) {
                    try {
                        fis.close();
                    } catch (IOException e1) {
                        e1.printStackTrace();
                    }
                }
            }
            return "";
        }
    }
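    The class above refers to `SerializableHttpCookie`, which is not listed in this article. A minimal sketch of what such a wrapper might look like follows, assuming only what the calls above require: a constructor taking an `HttpCookie` and a `getCookie()` accessor. Which fields are written is an assumption for this example and not the original class.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.net.HttpCookie;

// Sketch of a wrapper that makes java.net.HttpCookie usable with Java serialization.
public class SerializableHttpCookie implements Serializable {
    private static final long serialVersionUID = 1L;

    // HttpCookie itself is not Serializable, so it is marked transient and written field by field.
    private transient HttpCookie cookie;

    public SerializableHttpCookie(HttpCookie cookie) {
        this.cookie = cookie;
    }

    public HttpCookie getCookie() {
        return cookie;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeObject(cookie.getName());
        out.writeObject(cookie.getValue());
        out.writeObject(cookie.getDomain());
        out.writeObject(cookie.getPath());
        out.writeLong(cookie.getMaxAge());
        out.writeBoolean(cookie.getSecure());
        out.writeInt(cookie.getVersion());
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        // Fields are read back in exactly the order they were written.
        String name = (String) in.readObject();
        String value = (String) in.readObject();
        cookie = new HttpCookie(name, value);
        cookie.setDomain((String) in.readObject());
        cookie.setPath((String) in.readObject());
        // The creation time is not stored, so the max-age countdown restarts when the cookie is loaded.
        cookie.setMaxAge(in.readLong());
        cookie.setSecure(in.readBoolean());
        cookie.setVersion(in.readInt());
    }
}
```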

    Next, create an OkHttp client and set its cookie handler to the class we just wrote.

    private static OkHttpClient client = new OkHttpClient();
    static {
        // Route every cookie through the file-backed store so the login survives restarts
        client.setCookieHandler(new CookieManager(new PersistentCookieStore(), CookiePolicy.ACCEPT_ALL));
    }
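    To see what a cookie handler does for the client, the JDK-only sketch below feeds a `Set-Cookie` header into a `java.net.CookieManager` and asks which `Cookie` header it would attach to the next request for the same URI. It uses the JDK's default in-memory store (by passing `null`) instead of `PersistentCookieStore`; the class name `CookieHandlerSketch` and the sample cookie are assumptions for this example.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical demo of the response-to-request cookie round trip that a CookieHandler performs.
public class CookieHandlerSketch {

    /** Stores one Set-Cookie value for the URI and returns the Cookie header values for a follow-up request. */
    public static List<String> roundTrip(URI uri, String setCookieValue) throws IOException {
        // A null store makes CookieManager fall back to its built-in in-memory CookieStore.
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);

        // What an HTTP client hands to the handler after receiving a response.
        Map<String, List<String>> responseHeaders =
                Collections.singletonMap("Set-Cookie", Collections.singletonList(setCookieValue));
        manager.put(uri, responseHeaders);

        // What the client asks for before sending the next request.
        Map<String, List<String>> requestHeaders =
                manager.get(uri, Collections.<String, List<String>>emptyMap());
        List<String> cookieHeader = requestHeaders.get("Cookie");
        return cookieHeader == null ? Collections.<String>emptyList() : cookieHeader;
    }

    public static void main(String[] args) throws IOException {
        URI uri = URI.create("http://www.zhihu.com/");
        System.out.println(roundTrip(uri, "session=abc123; Path=/"));
    }
}
```

    Swapping the `null` argument for a `CookieStore` implementation such as the one above changes where the cookies are kept, while the put/get flow stays the same.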

    Now we can fetch _xsrf and the captcha. The captcha is saved in the project root directory as code.png.

    private static String xsrf;
    public static void getCode() throws IOException{
            Request request = new Request.Builder()
            .url("http://www.zhihu.com/")
            .addHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36")
            .build();
    
            Response response = client.newCall(request).execute();
            String result = response.body().string();
    
            Document parse = Jsoup.parse(result);
            System.out.println(parse + "");
            result = parse.select("input[type=hidden]").get(0).attr("value")
                    .trim();
            xsrf=result;
            System.out.println("_xsrf:" + result);
            String codeUrl = "http://www.zhihu.com/captcha.gif?r=";
            codeUrl += System.currentTimeMillis();
            System.out.println("codeUrl:" + codeUrl);
            Request getcode = new Request.Builder()
                    .url(codeUrl)
                    .addHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36")
                    .build();
    
            Response code = client.newCall(getcode).execute();
    
            byte[] bytes = code.body().bytes();
            saveCode(bytes, "code.png");
        }
        public static void saveCode(byte[] bfile, String fileName) {
            BufferedOutputStream bos = null;
            FileOutputStream fos = null;
            File file = null;
            try {
                file = new File(fileName);
                fos = new FileOutputStream(file);
                bos = new BufferedOutputStream(fos);
                bos.write(bfile);
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                if (bos != null) {
                    try {
                        bos.close();
                    } catch (IOException e1) {
                        e1.printStackTrace();
                    }
                }
                if (fos != null) {
                    try {
                        fos.close();
                    } catch (IOException e1) {
                        e1.printStackTrace();
                    }
                }
            }
        }

    Then submit the fetched parameters together with the account and password to log in:

        public static void login(String randCode,String email,String password) throws IOException{
            RequestBody formBody = new FormEncodingBuilder()
            .add("_xsrf", xsrf)
            .add("captcha", randCode)
            .add("email", email)
            .add("password", password)
            .add("remember_me", "true")
            .build();
            Request login = new Request.Builder()
            .url("http://www.zhihu.com/login/email")
            .post(formBody)
            .addHeader("User-Agent","Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36")
            .build();
    
    
            Response execute = client.newCall(login).execute();
            System.out.println(decode(execute.body().string()));
    
        }
    public static String decode(String unicodeStr) {
            if (unicodeStr == null) {
                return null;
            }
            StringBuffer retBuf = new StringBuffer();
            int maxLoop = unicodeStr.length();
            for (int i = 0; i < maxLoop; i++) {
                if (unicodeStr.charAt(i) == '\\') {
                    if ((i < maxLoop - 5)
                            && ((unicodeStr.charAt(i + 1) == 'u') || (unicodeStr
                            .charAt(i + 1) == 'U')))
                        try {
                            retBuf.append((char) Integer.parseInt(
                                    unicodeStr.substring(i + 2, i + 6), 16));
                            i += 5;
                        } catch (NumberFormatException localNumberFormatException) {
                            retBuf.append(unicodeStr.charAt(i));
                        }
                    else
                        retBuf.append(unicodeStr.charAt(i));
                } else {
                    retBuf.append(unicodeStr.charAt(i));
                }
            }
            return retBuf.toString();
        }

    When you see output like the following, the login has succeeded:

    [Screenshot: console output of a successful login]

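    After that you can fetch whatever information you want. As a simple example, the code below fetches the nicknames of the followers of 轮子哥 (vczh). Handling pagination is left to you.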

    public static void getFollowers() throws IOException{
            Request request = new Request.Builder()
            .url("http://www.zhihu.com/people/zord-vczh/followers")
            .addHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36")
            .build();
            Response response = client.newCall(request).execute();
    
            String result=response.body().string();
    
            Document parse = Jsoup.parse(result);
    
            Elements select = parse.select("div.zm-profile-card");
            StringBuilder builder=new StringBuilder();
            for (int i=0;i<select.size();i++){
                Element element = select.get(i);
                String name=element.select("h2").text();
                System.out.println(name+"");
                builder.append(name);
                builder.append("\n");
            }
        }

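    The figure below shows the fetched information. Of course, once you are logged in, you can fetch any information that a logged-in user can see.
    [Screenshot: console output listing the fetched nicknames]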

    Finally, here is the source code, as an IntelliJ Maven project:
    http://download.csdn.net/detail/sbsujjbcy/8984375

  • Original article: https://www.cnblogs.com/llguanli/p/7400421.html