• A First MongoDB Map&Reduce Exercise


    • Requirements

    Use Map&Reduce to count, for each of several classes, the students who are older than 10 and younger than 20.

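    • Requirement analysis

    • Fields of the students collection:

    db.students.insert({classid:1, age:14, name:'Tom'})

    classid is randomly 1 or 2, age is a random integer from 8 to 24, and name is a random string of 3 to 6 lowercase letters (these are the ranges the loader code below actually produces).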
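    The ranges are easier to see with the random-number arithmetic written out. The Java loader in this post computes `8 + (int) (Math.random() * 17)` for age and `3 + (int) (Math.random() * 4)` for the name length, which yields ages 8 to 24 and names of 3 to 6 letters. Below is a minimal sketch of the same generation in JavaScript (the language of the mongo shell); `randomInt` and `randomStudent` are hypothetical helper names, not part of the original post.

```javascript
// Hypothetical helper: uniform integer in [min, max], both ends inclusive.
function randomInt(min, max) {
  return min + Math.floor(Math.random() * (max - min + 1));
}

// Hypothetical helper: one student document shaped like the sample insert.
function randomStudent() {
  const letters = "abcdefghijklmnopqrstuvwxyz";
  const length = randomInt(3, 6); // same range as 3 + (int)(Math.random() * 4)
  let name = "";
  for (let i = 0; i < length; i++) {
    name += letters[randomInt(0, 25)];
  }
  return {
    classid: randomInt(1, 2), // same range as 1 + (int)(Math.random() * 2)
    age: randomInt(8, 24),    // same range as 8 + (int)(Math.random() * 17)
    name: name,
  };
}

console.log(randomStudent());
```

    Writing the bounds as inclusive `[min, max]` pairs makes it harder to be off by one than with the `offset + random * width` form.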

    • Writing the data

    • Java program for writing the data

    Write 10 million documents into the students collection of the mrtask database:

     
    
    package org.test;

    import java.util.ArrayList;
    import java.util.List;

    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.DBCollection;
    import com.mongodb.MongoClient;
    import com.mongodb.ServerAddress;

    public class TestMongoDBReplSet {

        public static void main(String[] args) {
            MongoClient client = null;
            try {
                List<ServerAddress> addresses = new ArrayList<ServerAddress>();
                addresses.add(new ServerAddress("172.16.16.89", 27017));
                client = new MongoClient(addresses);
                DB db = client.getDB("mrtask");
                DBCollection coll = db.getCollection("students");

                // Insert 10 million documents, one per iteration.
                for (int i = 1; i <= 10000000; i++) {
                    // A fresh object per document, so no state carries over between inserts.
                    BasicDBObject object = new BasicDBObject();
                    object.append("classid", 1 + (int) (Math.random() * 2)); // 1 or 2
                    object.append("age", 8 + (int) (Math.random() * 17));    // 8..24
                    object.append("name", getName());
                    coll.insert(object);
                }
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                // Release the connection even if an insert fails.
                if (client != null) {
                    client.close();
                }
            }
        }

        // Returns a random name of 3 to 6 lowercase letters.
        public static String getName() {
            int length = 3 + (int) (Math.random() * 4);
            StringBuilder name = new StringBuilder(length);
            for (int i = 0; i < length; i++) {
                name.append((char) ('a' + (int) (Math.random() * 26)));
            }
            return name.toString();
        }
    }
    

      

    • Checking the written data

    A check confirms that the students collection in the mrtask database holds 10 million documents:

    [root@localhost bin]# ./mongo
    MongoDB shell version: 2.6.11
    connecting to: test
    > show dbs
    admin   (empty)
    local   0.078GB
    mrtask  3.952GB
    test    0.453GB
    > use mrtask
    switched to db mrtask
    > db.students.find().count()
    10000000

     

    • Map&Reduce computation

    • Map step

    > mapfun = function(){emit(this.classid,1)}
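    To see what this map function produces, it can be run outside MongoDB against a few sample documents. The sketch below is only an illustration: the `emit` defined here is a local stand-in for the server-side `emit`, and `groups` and `sampleDocs` are hypothetical names with made-up data.

```javascript
// Local stand-in for MongoDB's emit(): collect emitted values per key.
const groups = new Map();
function emit(key, value) {
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(value);
}

// Same map function as in the post: one (classid, 1) pair per document.
const mapfun = function () { emit(this.classid, 1); };

// Made-up sample documents.
const sampleDocs = [
  { classid: 1, age: 14, name: "tom" },
  { classid: 2, age: 12, name: "amy" },
  { classid: 1, age: 19, name: "bob" },
];

// MongoDB calls map with `this` bound to the current document.
for (const doc of sampleDocs) {
  mapfun.call(doc);
}

console.log(groups.get(1)); // [ 1, 1 ]
console.log(groups.get(2)); // [ 1 ]
```

    Each key ends up with a list of ones, which is the input the reduce step sums.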

    • Reduce step

    > reducefun=function (key, values) { var count = 0; values.forEach(function (v) {count += v;}); return count; }

    > ff = function (key, value) { return {classid:key, count:value}; }
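    MongoDB may call reduce more than once for the same key, feeding earlier reduce results back in as values, so reduce has to return the same type it receives. That is why the count stays a plain number in reducefun and is wrapped into a document only in the finalize function ff. A small check of this property in plain JavaScript, using made-up inputs:

```javascript
// Reduce and finalize functions as defined in the post.
const reducefun = function (key, values) {
  var count = 0;
  values.forEach(function (v) { count += v; });
  return count;
};
const ff = function (key, value) { return { classid: key, count: value }; };

// Reducing all emitted values at once ...
const oneShot = reducefun(1, [1, 1, 1, 1, 1]);

// ... must give the same answer as reducing partial results (a "re-reduce").
const partialA = reducefun(1, [1, 1]);
const partialB = reducefun(1, [1, 1, 1]);
const reReduced = reducefun(1, [partialA, partialB]);

console.log(oneShot, reReduced); // 5 5
console.log(ff(1, reReduced));   // { classid: 1, count: 5 }
```

    If reducefun returned `{classid, count}` directly, the re-reduce call would try to add objects and the count would be wrong.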

    • Result output

    > classid_res = db.runCommand({
        mapreduce:"students",
        map:mapfun,
        reduce:reducefun,
        out:"students_classid_res",
        finalize:ff,
        query:{age:{$gt:10,$lt:20}}
    });
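    The order in which the command applies its options (query, then map, then reduce, then finalize) can be sketched in memory. `simulateMapReduce` below is a hypothetical stand-in written for this illustration, not a MongoDB API; the query is passed as a plain predicate that mirrors `{age:{$gt:10,$lt:20}}`, and the documents are made up.

```javascript
// Hypothetical in-memory stand-in for the mapreduce command (not a MongoDB API).
function simulateMapReduce(docs, options) {
  const groups = new Map();
  // Map functions call a global emit(); expose one for the duration of the run.
  globalThis.emit = function (key, value) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // 1. query filters the input, 2. map runs with `this` bound to each document.
  docs.filter(options.query).forEach(function (doc) { options.map.call(doc); });
  delete globalThis.emit;

  const out = [];
  for (const [key, values] of groups) {
    // 3. reduce is skipped for keys that have a single value.
    const reduced = values.length > 1 ? options.reduce(key, values) : values[0];
    // 4. finalize runs once per key on the reduced value.
    out.push({ _id: key, value: options.finalize(key, reduced) });
  }
  return out.sort(function (a, b) { return a._id - b._id; });
}

const result = simulateMapReduce(
  [
    { classid: 1, age: 10 }, // excluded: $gt is strict
    { classid: 1, age: 14 },
    { classid: 1, age: 19 },
    { classid: 2, age: 20 }, // excluded: $lt is strict
    { classid: 2, age: 11 },
    { classid: 2, age: 8 },
  ],
  {
    query: function (doc) { return doc.age > 10 && doc.age < 20; },
    map: function () { emit(this.classid, 1); },
    reduce: function (key, values) {
      return values.reduce(function (sum, v) { return sum + v; }, 0);
    },
    finalize: function (key, value) { return { classid: key, count: value }; },
  }
);

console.log(JSON.stringify(result));
```

    The output documents have the same `{ _id, value: { classid, count } }` shape as the ones stored in students_classid_res.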

     

    • Computed result

    > db.students_classid_res.find()

    { "_id" : 1, "value" : { "classid" : 1, "count" : 2643128 } }

    { "_id" : 2, "value" : { "classid" : 2, "count" : 2650870 } }
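    These counts can be sanity-checked with a little arithmetic. The loader generates ages with `8 + (int) (Math.random() * 17)`, that is 17 possible values from 8 to 24, and the query `{age:{$gt:10,$lt:20}}` matches the 9 values from 11 to 19. Assuming the ages are uniformly distributed, about 9/17 of the 10 million documents should match:

```javascript
// Ages produced by 8 + (int)(Math.random() * 17): 8..24, i.e. 17 values.
const totalAges = 24 - 8 + 1;
// Ages matching {$gt: 10, $lt: 20}: 11..19, i.e. 9 values.
const matchingAges = 19 - 11 + 1;
const totalDocs = 10000000;

// Expected number of matches under a uniform age distribution.
const expectedMatches = totalDocs * matchingAges / totalAges;
// Sum of the two per-class counts reported by the map-reduce job.
const observedMatches = 2643128 + 2650870;

console.log(Math.round(expectedMatches)); // 5294118
console.log(observedMatches);             // 5293998
```

    The observed total differs from the expectation by roughly 120 documents out of more than five million, which is consistent with random sampling.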

     

  • Original article: https://www.cnblogs.com/ljai/p/5017864.html