• 第十四篇 Mongodb数据库


    Mongodb

     SQL
        - 结构化查询语言
        - 关系数据库全都同SQL来操作
    MongoDB
        - MongoDB是一个NoSQL的数据库
        - MongoDB是一款文档型数据库
        - 数据库指的就是一个存储数据的仓库
            数据库可以使我们完成对数据的持久化的操作
        - MongoDB数据库中存储的数据的基本单位就是文档,
            MongoDB中存储的就是文档,所谓文档其实就是一个“JSON”
        - MongoDB中的“JSON”我们称为BSON,比普通的JSON的功能要更加的强大
        - MongoDB数据库使用的是JavaScript进行操作的,在MongoDB含有一个对ES标准实现的引擎,
            在MongoDB中所有ES中的语法中都可以使用
        - 安装
                     - 配置环境变量
                C:Program FilesMongoDBServer3.2in
                     - 在c盘根目录
               - 创建一个文件夹 data
               - 在data中创建一个文件夹db
            
        - MongoDB的基本的指令
            - 启动服务器
                mongod --dbpath 路径 --port 端口号
                
            - 启动客户端
                mongo
        - 基本概念
            数据库(database)
            集合(collection)
            文档(document)
                - 在MongoDB中,数据库和集合都不需要手动创建,
                    当我们创建文档时,如果文档所在的集合或数据库不存在会自动创建数据库和集合
            
            
        - MongoDB的CRUD的操作            
            - 基本操作
                use 数据库
                    - 进入指定的数据库
                show dbs
                    - 显示所有的数据库
                show collections
                    - 显示数据库中所有的集合
                db
                    - 显示当前所在的数据库
        
            - 向数据库中插入文档
                - db.collection.insert()
                    - insert()可以向集合中插入一个或多个文档
                    - 例子:向test数据库中的,stus集合中插入一个新的学生对象
                                {name:"孙悟空",age:18,gender:""}
                               db.stus.insert({name:"孙悟空",age:18,gender:""}
                - db.collection.insertOne()
                    - 向集合中插入一个文档
                - db.collection.insertMany()
                    - 向集合中插入多个文档
                    
            - 查询数据库中的文档
                - db.collection.find()
                    - 可以根据指定条件从集合中查询所有符合条件的文档
                    - 返回的是一个数组
                     示例: db.stu.find({age:28})
                - db.collection.findOne()
                    - 查询第一个符合条件的文档
                    - 返回的是一个对象
                - db.collection.find().count()
                    - 查询符合条件的文档的数量
                    
            - 修改数据库中的文档
                - db.collection.update()
                    - 可以修改、替换集合中的一个或多个文档
                - db.collection.updateOne()
                    - 修改集合中的一个文档
                - db.collection.updateMany()
                    - 修改集合中的多个文档
                - db.collection.replaceOne()
                    - 替换集合中的一个文档
                    
            - 删除集合中的文档
                - db.collection.remove()
                    - 删除集合中的一个或多个文档(默认删除多个)
                - db.collection.deleteOne()
                    - 删除集合中的一个文档
                - db.collection.deleteMany()
                    - 删除集合中的多个文档
                - 清空一个集合
                    db.collection.remove({})
                - 删除一个集合
                    db.collection.drop()
                - 删除一个数据库
                    db.dropDatabase()
                    
    示例:
    
    1查询文档
    /*
        查询
            db.collection.find()
            - find()用来查询集合中所有符合条件的文档
            - find()可以接收一个对象作为条件参数
                {} 表示查询集合中所有的文档
                {属性:值} 查询属性是指定值的文档
            - find()返回的是一个数组
                
            db.collection.findOne()
            - 用来查询集合中符合条件的第一个文档  
            - findOne()返回的是一个文档对象 
           
           db.collection.find({}).count() 
            - 查询所有结果的数量
    
           pretty() 方法以格式化的方式来显示所有文档。
                   ---db.col.find().pretty()
    */
    db.stus.find({_id:"hello"});
    db.stus.find({age:16 , name:"白骨精"});
    db.stus.find({age:28});
    db.stus.findOne({age:28});
    
    db.stus.find({}).count();
    
    2插入文档                
    /*
        向数据库插入文档
            db.<collection>.insert()
            - 向集合中插入一个或多个文档
            - 当我们向集合中插入文档时,如果没有给文档指定_id属性,则数据库会自动为文档添加_id
                该属性用来作为文档的唯一标识
            - _id我们可以自己指定,如果我们指定了数据库就不会在添加了,如果自己指定_id 也必须确保它的唯一性
            
        db.collection.insertOne()
            - 插入一个文档对象
        db.collection.insertMany() 
            - 插入多个文档对象
    */
    db.stus.insert({name:"猪八戒",age:28,gender:""});
    
    db.stus.insert([
        {name:"沙和尚",age:38,gender:""},
        {name:"白骨精",age:16,gender:""},
        {name:"蜘蛛精",age:14,gender:""}
    ]);
    
    db.stus.insert({_id:"hello",name:"猪八戒",age:28,gender:""});
    
    db.stus.find();
    ObjectId()                
    
    
    3 修改文档
    /*
        修改
         db.collection.update(查询条件,新对象)
            - update()默认情况下会使用新对象来替换旧的对象
            - 如果需要修改指定的属性,而不是替换需要使用“修改操作符”来完成修改
                $set 可以用来修改文档中的指定属性
                $unset 可以用来删除文档的指定属性
            - update()默认只会修改一个
                
            db.collection.updateMany()
            - 同时修改多个符合条件的文档
       
            db.collection.updateOne()
            - 修改一个符合条件的文档    
            
            db.collection.replaceOne()
            - 替换一个文档
    */
    db.stus.find({});
    
    //替换
    db.stus.update({name:"沙和尚"},{age:28});
    
    db.stus.update(
        {"_id" : ObjectId("59c219689410bc1dbecc0709")},
        {$set:{
            gender:"",
            address:"流沙河"
        }}    
    )
    
    db.stus.update(
        {"_id" : ObjectId("59c219689410bc1dbecc0709")},
        {$unset:{
            address:1
        }}    
    )
    
    db.stus.updateMany(
        {"name" : "猪八戒"},
        {
            $set:{
                address:"猪老庄"
            }
        }    
    );
        
    db.stus.update(
        {"name" : "猪八戒"},
        
        {
            $set:{
            address:"呵呵呵"
            }
        }  ,
        {
            multi:true
        }    
    )
    
    db.stus.find()
    
    
    4删除文档
    /*
        db.collection.remove()
            - 删除一个或多个,可以第二个参数传递一个true,则只会删除一个
            - 如果传递一个空对象作为参数,则会删除所有的
        db.collection.deleteOne()
        db.collection.deleteMany()
        db.collection.drop() 删除集合
        db.dropDatabase() 删除数据库
        
            - 一般数据库中的数据都不会删除,所以删除的方法很少调用
                一般会在数据中添加一个字段,来表示数据是否被删除
    
    */ db.stu.deleteOne({name:"齐天大圣"})
    db.stu.remove({name:"孙悟空"}, {justOne: false})    
    db.stus.insert([
        
        {
            name:"zbj",
            isDel:0
            },
            {
            name:"shs",
            isDel:0
            },
        {
        name:"ts",
            isDel:0
        }
    
    ]);
    
    db.stus.updateOne({name:"ts"},{$set:{isDel:1}});
        
    db.stus.find({isDel:0})    
                    
                    
            
            
    mongodb.txt
    /*
      文档之间的关系
        一对一(one to one)
            - 夫妻 (一个丈夫 对应 一个妻子)
            - 在MongoDB,可以通过内嵌文档的形式来体现出一对一的关系
        
        一对多(one to many)/多对一(many to one)
            - 父母 - 孩子
              用户 - 订单
              文章 - 评论
              - 也可以通过内嵌文档来映射一对多的关系
              
        
        多对多(many to many)
           - 分类 - 商品
             老师 - 学生 
        
    */
    db.wifeAndHusband.insert([
        {
            name:"黄蓉",
            husband:{
                name:"郭靖"
            }
        },{
            name:"潘金莲",
            husband:{
                name:"武大郎"
            }
        }
    
    ]);
    
    db.wifeAndHusband.find();
    
    
    //一对多 用户(users) 和 订单(orders)
    db.users.insert([{
        username:"swk"
        },{
        username:"zbj"
    }]);
    
    db.order.insert({
        
        list:["牛肉","漫画"],
        user_id: ObjectId("59c47e35241d8d36a1d50de0")
        
    });
    
    db.users.find()
    db.order.find()
    
    //查找用户swk的订单
    var user_id = db.users.findOne({username:"zbj"})._id;
    db.order.find({user_id:user_id});
    
    //多对多
    db.teachers.insert([
        {name:"洪七公"},
        {name:"黄药师"},
        {name:"龟仙人"}
    ]);
    
    db.stus.insert([
        {
            name:"郭靖",
            tech_ids:[
                ObjectId("59c4806d241d8d36a1d50de4"),
                ObjectId("59c4806d241d8d36a1d50de5")
            ]
        },{
            name:"孙悟空",
            tech_ids:[
                ObjectId("59c4806d241d8d36a1d50de4"),
                ObjectId("59c4806d241d8d36a1d50de5"),
                ObjectId("59c4806d241d8d36a1d50de6")
            ]
        }
    ])
    
    db.teachers.find()
    
    db.stus.find()
    文档间关系

    示例:

    //1.进入my_test数据库
    use my_test
    
    //2.向数据库的user集合中插入一个文档  
    db.users.insert({
        username:"sunwukong"
    });
    
    //3.查询user集合中的文档
    db.users.find();
    
    //4.向数据库的user集合中插入一个文档   
    db.users.insert({
        username:"zhubajie"
    });
       
    //5.查询数据库user集合中的文档
    db.users.find();
    
    //6.统计数据库user集合中的文档数量
    db.users.find().count();
    
    //7.查询数据库user集合中username为sunwukong的文档
    db.users.find({username:"sunwukong"});
    
    //8.向数据库user集合中的username为sunwukong的文档,添加一个address属性,属性值为huaguoshan
    db.users.update({username:"sunwukong"},{$set:{address:"huaguoshan"}});
    
    
    //9.使用{username:"tangseng"} 替换 username 为 zhubajie的文档
    db.users.replaceOne({username:"zhubajie"},{username:"tangseng"});    
        
    //10.删除username为sunwukong的文档的address属性
    db.users.update({username:"sunwukong"},{$unset:{address:1}});
    
    
    //11.向username为sunwukong的文档中,添加一个hobby:{cities:["beijing","shanghai","shenzhen"] , movies:["sanguo","hero"]}
    //MongoDB的文档的属性值也可以是一个文档,当一个文档的属性值是一个文档时,我们称这个文档叫做 内嵌文档
    db.users.update({username:"sunwukong"},{$set:{hobby:{cities:["beijing","shanghai","shenzhen"] , movies:["sanguo","hero"]}}});
    db.users.find();
    
    //12.向username为tangseng的文档中,添加一个hobby:{movies:["A Chinese Odyssey","King of comedy"]}
    db.users.update({username:"tangseng"},{$set:{hobby:{movies:["A Chinese Odyssey","King of comedy"]}}})
    
    //13.查询喜欢电影hero的文档
    //MongoDB支持直接通过内嵌文档的属性进行查询,如果要查询内嵌文档则可以通过.的形式来匹配
    //如果要通过内嵌文档来对文档进行查询,此时属性名必须使用引号 
    db.users.find({'hobby.movies':"hero"});
    
    //14.向tangseng中添加一个新的电影Interstellar
    //$push 用于向数组中添加一个新的元素
    //$addToSet 向数组中添加一个新元素 , 如果数组中已经存在了该元素,则不会添加
    db.users.update({username:"tangseng"},{$push:{"hobby.movies":"Interstellar"}});
    db.users.update({username:"tangseng"},{$addToSet:{"hobby.movies":"Interstellar"}});
    db.users.find();
    
    //15.删除喜欢beijing的用户
    db.users.remove({"hobby.cities":"beijing"});
    
    //16.删除user集合
    db.users.remove({});
    db.users.drop();
    
    show dbs;
    
    //17.向numbers中插入20000条数据 7.2s
    for(var i=1 ; i<=20000 ; i++){
        db.numbers.insert({num:i});
    }
    
    db.numbers.find()
    
    db.numbers.remove({});
    
    
    //0.4s
    var arr = [];
    
    for(var i=1 ; i<=20000 ; i++){
        arr.push({num:i});
    }
    
    db.numbers.insert(arr);
    增删改查
    //查询文档时,默认情况是按照_id的值进行排列(升序)
    //sort()可以用来指定文档的排序的规则,sort()需要传递一个对象来指定排序规则 1表示升序 -1表示降序
    //limit skip sort 可以以任意的顺序进行调用
    db.emp.find({}).sort({sal:1,empno:-1});
    
    //在查询时,可以在第二个参数的位置来设置查询结果的 投影(展示所要展示的字段))
    db.emp.find({},{ename:1 , _id:0 , sal:1});
    sort-投影
    //18.查询numbers中num为500的文档
    db.numbers.find({num:500})
    
    //19.查询numbers中num大于5000的文档
    db.numbers.find({num:{$gt:500}});
    db.numbers.find({num:{$eq:500}});
    
    //20.查询numbers中num小于30的文档
    db.numbers.find({num:{$lt:30}});
    
    //21.查询numbers中num大于40小于50的文档
    db.numbers.find({num:{$gt:40 , $lt:50}});
    
    //22.查询numbers中num大于19996的文档
    db.numbers.find({num:{$gt:19996}});
    
    //23.查看numbers集合中的前10条数据
    db.numbers.find({num:{$lte:10}});
    
    //limit()设置显示数据的上限
    db.numbers.find().limit(10);
    //在开发时,我们绝对不会执行不带条件的查询
    db.numbers.find();
    
    //24.查看numbers集合中的第11条到20条数据
    /*
        分页 每页显示10条
            1-10     0
            11-20    10
            21-30    20
            。。。
            
            skip((页码-1) * 每页显示的条数).limit(每页显示的条数);
            
        skip()用于跳过指定数量的数据    
        
        MongoDB会自动调整skip和limit的位置
    */
    db.numbers.find().skip(10).limit(10);
    
    //25.查看numbers集合中的第21条到30条数据
    db.numbers.find().skip(20).limit(10);
    
    db.numbers.find().limit(10).skip(10);
    条件查询
    //26.将dept和emp集合导入到数据库中
    db.dept.find()
    db.emp.find()
    
    //27.查询工资小于2000的员工
    db.emp.find({sal:{$lt:2000}});
    
    //28.查询工资在1000-2000之间的员工
    db.emp.find({sal:{$lt:2000 , $gt:1000}});
    
    //29.查询工资小于1000或大于2500的员工
    db.emp.find({$or:[{sal:{$lt:1000}} , {sal:{$gt:2500}}]});
    
    //30.查询财务部的所有员工
    //(depno)
    var depno = db.dept.findOne({dname:"财务部"}).deptno;
    db.emp.find({depno:depno});
    
    //31.查询销售部的所有员工
    var depno = db.dept.findOne({dname:"销售部"}).deptno;
    db.emp.find({depno:depno});
    
    //32.查询所有mgr为7698的所有员工
    db.emp.find({mgr:7698})
    
    //33.为所有薪资低于1000的员工增加工资400元
    db.emp.updateMany({sal:{$lte:1000}} , {$inc:{sal:400}});
    db.emp.find()

    pymongo模块

    1.连接mongo

    # -*- coding:utf-8 -*-
    import pymongo  # 导入pymongo模块
    
    def mongodb_init01():
        """连接mongodb"""
        mongo = pymongo.MongoClient(host='127.0.0.1', port=27017, tz_aware=True)
    
    
    def mongodb_init02():
        """连接mongodb"""
        uri = "mongodb://{}:{}".format('127.0.0.1', 27017)
        mongo = pymongo.MongoClient(uri, tz_aware=True)

    2.操作文档

    emp1 = {"_id":1,"name":"武大郎","sex":"male","age":18,"hire_date":"20170301","post":"烧饼检察官","salary":7300.33}
    emp2 = {"_id":2,"name":"武松","sex":"male","age":78,"hire_date":"20150302","post":"公务员","salary":71000000.31}
    emp3 = {"_id":3,"name":"宋江","sex":"male","age":81,"hire_date":"20130305","post":"公务员","salary":78300}
    emp4 = {"_id":4,"name":"林冲","sex":"male","age":73,"hire_date":"20140701","post":"公务员","salary":73500}
    emp5 = {"_id":5,"name":"林冲","sex":"male","age":73,"hire_date":"20140701","post":"公务员","salary":73500}
    emp6 = {"_id":6,"name":"柴进","sex":"male","age":28,"hire_date":"20121101","post":"公务员","salary":72100}
    emp7 = {"_id":7,"name":"卢俊义","sex":"female","age":18,"hire_date":"20110211","post":"公务员","salary":79000}
    emp8 = {"_id":8,"name":"高俅","sex":"male","age":18,"hire_date":"19000301","post":"公务员","salary":730000}
    emp9 = {"_id":9,"name":"鲁智深","sex":"male","age":48,"hire_date":"20101111","post":"公务员","salary":710000}
    emp10 = {"_id":10,"name":"史进","sex":"female","age":48,"hire_date":"20150311","post":"打手","salary":73000.13}
    emp11 = {"_id":11,"name":"李逵","sex":"female","age":38,"hire_date":"20101101","post":"打手","salary":72000.35}
    emp12 = {"_id":12,"name":"周通","sex":"female","age":18,"hire_date":"20110312","post":"打手","salary":71000.37}
    emp13 = {"_id":13,"name":"石秀","sex":"female","age":18,"hire_date":"20160513","post":"打手","salary":73000.29}
    emp14 = {"_id":14,"name":"李忠","sex":"female","age":28,"hire_date":"20170127","post":"打手","salary":74000.33}
    emp15 = {"_id":15,"name":"吴用","sex":"male","age":28,"hire_date":"20160311","post":"文人","salary":710000.13}
    emp16 = {"_id":16,"name":"萧让","sex":"male","age":18,"hire_date":"19970312","post":"文人","salary":720000}
    emp17 = {"_id":17,"name":"安道全","sex":"female","age":18,"hire_date":"20130311","post":"文人","salary":719000}
    emp18 = {"_id":18,"name":"公孙胜","sex":"male","age":18,"hire_date":"20150411","post":"文人","salary":718000}
    emp19 = {"_id":19,"name":"朱贵","sex":"female","age":18,"hire_date":"20140512","post":"文人","salary":717000}
    
    db.emp.insertMany([emp1, emp2, emp3, emp4, emp5, emp6, emp7, emp8, emp9, emp10, emp11, emp12, emp13, emp14, emp15, emp16, emp17, emp18, emp19])
    emp.txt(测试数据)
    # -*- coding:utf-8 -*-
    import pymongo  # 导入pymongo模块
    from bson import ObjectId
    from pymongo import IndexModel
    from pymongo.database import Database
    
    mongo = pymongo.MongoClient(host='192.168.10.100', port=27017, tz_aware=True)
    
    
    def handler_db():
        """操作数据库"""
        # 创建数据库
        # db = Database(name='abc', client=mongo)
        db = mongo.abc
        print(db)
    
        # 删除数据库
        print(mongo.drop_database('abc'))
    
        # 获取数据库
        for db_name in mongo.list_database_names():
            print(db_name)
    
    
    def handler_collection():
        """操作集合"""
        # db = mongo.get_database('abc') # 获取数据库,如果没有会自动创建
        db = mongo.abc
        # 创建一个集合
        col = db.create_collection('col')  # 创建集合,如果存在就会报错
        print('创建的集合:', col)
        # 获取一个集合
        # col = db.get_collection('col') # 获取集合,如果没有会自动创建集合
        col = db.col  # 获取集合,如果没有会自动创建集合
        print('获取的集合:', col)
        # 获取所有集合
        collection_names = db.list_collection_names()  # 获取集合名称,返回list
        for name in collection_names:
            print('集合名称:', name)
    
        collections = db.list_collections()  # 返回所有集合对象
        for collection in collections:
            print('获取集合:', collection)
        # 删除集合
        print(db.drop_collection('col'))  # 删除集合,返回字典
    
    
    def handler_index():
        """操作索引"""
        # 创建升序索引
        users = mongo.sxt.users
        r = users.create_index([('name', pymongo.ASCENDING)])
        print(r)
        # 创建降序索引
        r = users.create_index([('age', pymongo.DESCENDING)])
        print(r)
        # 创建混合索引
        r = users.create_index([('name', pymongo.ASCENDING), ('status', pymongo.DESCENDING)])
        print(r)
        # 创建唯一索引
        r = users.create_index([('user_id', pymongo.ASCENDING)], unique=True)
        print(r)
    
        # 删除索引
        users.drop_index('name_1')  # 如果索引不存在,就会报错
        users.drop_indexes()  # 删除所有索引
    
        # 创建多个索引
        user_id_index = IndexModel([('user_id', pymongo.ASCENDING)], unique=True)
        name_index = IndexModel([('name', pymongo.DESCENDING)])
        r = users.create_indexes([user_id_index, name_index])  # 返回索引名称的list
        print(r)
    
    
    def handler_document():
        """操作文档"""
        shsxt = Database(name='sxt', client=mongo)  # 获取数据库
        col = shsxt.get_collection(name='col')  # 获取集合
        # 添加
        doc = {'name': 'xiaohuang', 'age': 18, 'sex': '男'}
        r = col.insert_one(doc)
        print(r.acknowledged, r.inserted_id)  # acknowledged是否成功,inserted_id主键id
    
        doc1 = {'name': '老王', 'age': 20, 'sex': '男'}
        doc2 = {'name': '老李', 'age': 15, 'sex': '女'}
        rs = col.insert_many([doc1, doc2], ordered=True)
        print(rs.acknowledged, rs.inserted_ids)  # acknowledged是否成功,inserted_id主键id
        # 删除
        d = col.delete_one({'name': '老王'})  # 删除一条
        print(d.acknowledged, d.deleted_count)  # acknowledged是否成功,deleted_count删除记录数
    
        d = col.delete_many({'name': '老李'})  # 删除多条
        print(d.acknowledged, d.deleted_count)  # acknowledged是否成功,deleted_count删除记录数
    
        # 修改
        upt = col.update_one(filter={'name': '老李'}, update={'$set': {'name': '小李', 'age': 19}}, upsert=True)
        print(upt.acknowledged, upt.matched_count, upt.modified_count, upt.raw_result, upt.upserted_id)
    
        upt = col.update_many(filter={'name': '老李'}, update={'$set': {'name': '小李', 'age': 19}}, upsert=True)
        print(upt.acknowledged, upt.matched_count, upt.modified_count, upt.raw_result, upt.upserted_id)
    
        # upt = col.update_one(filter={'_id': ObjectId("5b2d041fa3dbd1ad597db57e")}, update={'$set': {'name': '小李'}})
        # print(upt.acknowledged, upt.matched_count, upt.modified_count, upt.raw_result, upt.upserted_id)
    
    
    if __name__ == '__main__':
        handler_document()
    操作文档
    # -*- coding:utf-8 -*-
    from collections import OrderedDict
    
    import pymongo
    
    client = pymongo.MongoClient(host='192.168.10.100', port=27017, tz_aware=True)
    
    db = client.get_database('shsxt')
    emp = db.get_collection('emp')
    user = db.get_collection('user')
    
    
    def show(document):
        for d in document:
            print(d)
    
    
    def test_and():
        """and操作"""
        '''
            db.user.find({"$and":[
                {"_id":{"$gte":3,"$lte":4}},
                {"age":{"$gte":4}}
            ]})
        '''
        r = user.find({"$and": [
            {"_id": {"$gte": 3, "$lte": 4}},
            {"age": {"$gte": 4}}
        ]})  # 返回Cursor
        print(r)
        show(r)
    
    
    def test_or():
        """or操作"""
        '''
            db.user.find({"$or":[
                {"_id":{"$lte":1,"$gte":0}},
                {"_id":{"$gte":4}},
                {"name":"tianqi"}
            ]})
        '''
        r = user.find({"$or": [
            {"_id": {"$lte": 1, "$gte": 0}},
            {"_id": {"$gte": 4}},
            {"name": "tianqi"}
        ]})
        show(r)
    
    
    def test_reg():
        """正则操作"""
        # 匹配规则:z开头、n或u结尾,不区分大小写
        # db.user.find({'name':/^z.*(n|u)$/i})
        r = user.find({"name": {"$regex": '^z.*(n|u)$'}})
        show(r)
    
        # db.users.find({name: /a/})
        r = user.find({'name': {'$regex': 'a'}})
        print()
        print('like %a%')
        show(r)
        # db.users.find({name: /^zh/}) // like 'pa%'
        r = user.find({'name': {'$regex': '^zh'}})
        print()
        print('like zh%')
        show(r)
        # db.users.find({name: /qi$/}) // like'%ro'
        r = user.find({'name': {'$regex': 'qi$'}})
        print()
        print('like %qi')
        show(r)
    
    
    def test_array():
        """数组操作"""
        '''
            #查看所有人的第1个到第3个爱好,第一个{}表示查询条件为所有,第二个是显示条件
            db.user.find(
                {},
                {
                    "_id":0,
                    "name":0,
                    "age":0,
                    "addr":0,
                    "hobbies":{"$slice":[0,2]},
                }
            )
        '''
        r = user.find({}, {
            "_id": 0,
            "name": 0,
            "age": 0,
            "addr": 0,
            "hobbies": {"$slice": [0, 2]},
        })
        show(r)
    
    
    def test_page(page=1, page_size=2):
        """分页操作"""
        # 获取结果集
        r = user.find().skip((page - 1) * page_size).limit(page_size)
        show(r)
        # 获取总记录数
        # count = db.user.find().count()
        count = user.count()
        print('总记录数:', count)
    
    
    def test_project():
        """投影操作"""
        '''
            db.user.find({
                    "name":/^z.*(n|u)$/i
                },
                {
                    "_id":0,
                    "name":1,
                    "age":1
                }
            )
        '''
        r = user.find({"name": {"$regex": "^z.*(n|u)$"}}, {
            "_id": 0,
            "name": 1,
            "age": 1
        })
        show(r)
    
    
    def test_sort():
        """排序操作"""
        '''
            # 按姓名正序
            db.user.find().sort({"name":1})
            # 按年龄倒序 按id正序
            db.user.find().sort({"age":-1,'_id':1})
        '''
        r = user.find({}, {'age': 1}).sort('age', pymongo.ASCENDING)
        show(r)
        print()
    
        r = user.find({}, {'age': -1, '_id': 1}).sort([('age', pymongo.DESCENDING),
                                                       ('_id', pymongo.ASCENDING)])
        show(r)
    
    
    def test_group():
        """分组查询"""
        # {"$match": {"字段": "条件"}}, 可以使用任何常用查询操作符$gt,$lt,$in等
        # select * from db1.emp where post='公务员';
        r = db.emp.aggregate([{"$match": {"post": "公务员"}}])
    
        # select * from db1.emp where id > 3 group by post;
        r = db.emp.aggregate([
            {"$match": {"_id": {"$gt": 3}}},
            {"$group": {"_id": "$post", 'avg_salary': {"$avg": "$salary"}}}
        ])
    
        # select * from db1.emp where id > 3 group by post having avg(salary) > 10000;
        r = db.emp.aggregate([
            {"$match": {"_id": {"$gt": 3}}},
            {"$group": {"_id": "$post", 'avg_salary': {"$avg": "$salary"}}},
            {"$match": {"avg_salary": {"$gt": 10000}}}
        ])
    
        # {"$group": {"_id": 分组字段, "新的字段名": 聚合操作符}}
    
        # 1、将分组字段传给$group函数的_id字段即可
        # {"$group": {"_id": "$sex"}}  # 按照性别分组
        # {"$group": {"_id": "$post"}}  # 按照职位分组
        # {"$group": {"_id": {"state": "$state", "city": "$city"}}}  # 按照多个字段分组,比如按照州市分组
    
        # 2、分组后聚合得结果,类似于sql中聚合函数的聚合操作符:$sum、$avg、$max、$min、$first、$last
        # 例1:select post,max(salary) from db1.emp group by post;
        r = db.emp.aggregate([{"$group": {"_id": "$post", "max_salary": {"$max": "$salary"}}}])
    
        # 例2:取每个部门最大薪资与最低薪资
        r = db.emp.aggregate(
            [{"$group": {"_id": "$post", "max_salary": {"$max": "$salary"}, "min_salary": {"$min": "$salary"}}}])
    
        # 例3:如果字段是排序后的,那么$first,$last会很有用,比用$max和$min效率高
        r = db.emp.aggregate([{"$group": {"_id": "$post", "first_id": {"$first": "$_id"}}}])
    
        # 例4:求每个部门的总工资
        r = db.emp.aggregate([{"$group": {"_id": "$post", "count": {"$sum": "$salary"}}},
                              {"$sort": {"count": 1}}])
    
        # 例5:求每个部门的人数
        r = db.emp.aggregate([{"$group": {"_id": "$post", "count": {"$sum": 1}}},
                              {"$sort": {"count": 1}}])
    
        # 3、数组操作符
        # {"$addToSet": expr}  # 不重复
        # {"$push": expr}  # 重复
    
        # 例:查询岗位名以及各岗位内的员工姓名:select post,group_concat(name) from db1.emp group by post;
        r = db.emp.aggregate([{"$group": {"_id": "$post", "names": {"$push": "$name"}}}])
        r = db.emp.aggregate([{"$group": {"_id": "$post", "names": {"$addToSet": "$name"}}}])
        show(r)
    
    
    def test_project02():
        """投影操作"""
        # {"$project": {"要保留的字段名": 1, "要去掉的字段名": 0, "新增的字段名": "表达式"}}
    
        # select name,post,(age+1) as new_age from db1.emp;
        r = db.emp.aggregate([
            {
                "$project": {
                    "name": 1,
                    "post": 1,
                    "new_age": {"$add": ["$age", 1]}
                }
            }
        ])
        show(r)
    
    
    def test_sort_limit_skip():
        """排序限制跳过"""
        # 例1、取平均工资最高的前两个部门
        r = db.emp.aggregate([
            {
                "$group": {"_id": "$post", "平均工资": {"$avg": "$salary"}}
            },
            {
                "$sort": {"平均工资": -1}
            },
            {
                "$limit": 2
            }
        ])
        # 例2、取平均工资最高的第二个部门
        r = db.emp.aggregate([
            {
                "$group": {"_id": "$post", "平均工资": {"$avg": "$salary"}}
            },
            {
                "$sort": {"平均工资": -1}
            },
            {
                "$limit": 2
            },
            {
                "$skip": 1
            }
        ])
        show(r)
    
    
    def test_sample():
        """随机获取"""
        # 随机获取3个文档
        r = db.emp.aggregate([
            {"$sample": {"size": 3}}
        ])
        show(r)
    
    
    def test_str():
        """字符串操作"""
        # 截取字符串
        r = db.emp.aggregate([
            {
                "$project": {
                    "_id": 0,
                    "str": {"$substr": ["$sex", 0, 2]}
                }
            }
        ])
    
        # 拼接
        r = db.emp.aggregate([
            {
                "$project": {
                    "name": 1,
                    "post": 1,
                    "name_sex": {"$concat": ["$name", "测试拼接", "$sex"]}
                }
            }
        ])
    
        # 将性别的英文转为大写
        r = db.emp.aggregate([{"$project": {"sex": {"$toUpper": "$sex"}}}])
        show(r)
    
    
    if __name__ == '__main__':
        test_str()
    mongodb_query.py

    3 示例:scrapy+mongo(爬取bibi排行榜相关数据)

     bibi_ranke.py

    import scrapy
    from bibi.items import BibiItem
    
    class BibiRankeSpider(scrapy.Spider):
        name = 'bibi_ranke'
        # allowed_domains = ['https://www.bilibili.com']
        start_urls = ['https://www.bilibili.com/v/popular/rank/all']
    
        def parse(self, response):
    
            titles =response.xpath('//div[@class="info"]//a[@class="title"]/text()').extract()
            play_nums =response.xpath('//div[@class="detail"]/span[@class="data-box"][1]/text()').extract()
            up_names =response.xpath("//div[@class='detail']/a//text()").extract()
            scores =response.xpath('//div[@class="info"]//div[@class="pts"]/div/text()').extract()
    
            play_nums=[i.strip() for i in play_nums]
            up_names=[i.strip() for i in up_names]
            # print(titles,play_nums,up_names,scores)
            # 打包
            for title,play_num,up_name,score in zip(titles,play_nums,up_names,scores):
                item=BibiItem()
                item['play_num']=play_num
                item['title']=title
                item['up_name']=up_name
                item['score']=score
                yield item

    item.py

    # Define here the models for your scraped items
    #
    # See documentation in:
    # https://docs.scrapy.org/en/latest/topics/items.html
    
    import scrapy
    
    
    class BibiItem(scrapy.Item):
        _id =scrapy.Field()
        # define the fields for your item here like:
        title = scrapy.Field()
        play_num = scrapy.Field()
        up_name = scrapy.Field()
        score = scrapy.Field()

    pipelines.py

    # Define your item pipelines here
    #
    # Don't forget to add your pipeline to the ITEM_PIPELINES setting
    # See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html
    
    
    # useful for handling different item types with a single interface
    from itemadapter import ItemAdapter
    import pymongo
    
    
    class BibiPipeline:
        # 爬虫开始时连接数据库
        def open_spider(self,spider):
            self.client =pymongo.MongoClient()
    
        def process_item(self, item, spider):
            # 存入mongodb数据库中
            self.client.bilibli.rank.insert_one(item)
            return item
    
        # 爬虫结束关闭数据库
        def close_spider(self,spider):
            self.client.close()
    # Scrapy settings for bibi project
    #
    # For simplicity, this file contains only settings considered important or
    # commonly used. You can find more settings consulting the documentation:
    #
    #     https://docs.scrapy.org/en/latest/topics/settings.html
    #     https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
    #     https://docs.scrapy.org/en/latest/topics/spider-middleware.html
    
    BOT_NAME = 'bibi'
    
    SPIDER_MODULES = ['bibi.spiders']
    NEWSPIDER_MODULE = 'bibi.spiders'
    
    from fake_useragent import UserAgent
    # Crawl responsibly by identifying yourself (and your website) on the user-agent
    #USER_AGENT = 'bibi (+http://www.yourdomain.com)'
    
    USER_AGENT = UserAgent().random # 随机生成UA伪装
    
    # Obey robots.txt rules
    ROBOTSTXT_OBEY = False
    
    # Configure maximum concurrent requests performed by Scrapy (default: 16)
    #CONCURRENT_REQUESTS = 32
    
    # Configure a delay for requests for the same website (default: 0)
    # See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
    # See also autothrottle settings and docs
    DOWNLOAD_DELAY = 1
    # The download delay setting will honor only one of:
    #CONCURRENT_REQUESTS_PER_DOMAIN = 16
    #CONCURRENT_REQUESTS_PER_IP = 16
    
    # Disable cookies (enabled by default)
    #COOKIES_ENABLED = False
    
    # Disable Telnet Console (enabled by default)
    #TELNETCONSOLE_ENABLED = False
    
    # Override the default request headers:
    #DEFAULT_REQUEST_HEADERS = {
    #   'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    #   'Accept-Language': 'en',
    #}
    
    # Enable or disable spider middlewares
    # See https://docs.scrapy.org/en/latest/topics/spider-middleware.html
    #SPIDER_MIDDLEWARES = {
    #    'bibi.middlewares.BibiSpiderMiddleware': 543,
    #}
    
    # Enable or disable downloader middlewares
    # See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
    #DOWNLOADER_MIDDLEWARES = {
    #    'bibi.middlewares.BibiDownloaderMiddleware': 543,
    #}
    
    # Enable or disable extensions
    # See https://docs.scrapy.org/en/latest/topics/extensions.html
    #EXTENSIONS = {
    #    'scrapy.extensions.telnet.TelnetConsole': None,
    #}
    
    # Configure item pipelines
    # See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
    ITEM_PIPELINES = {
       'bibi.pipelines.BibiPipeline': 300,
    }
    
    # Enable and configure the AutoThrottle extension (disabled by default)
    # See https://docs.scrapy.org/en/latest/topics/autothrottle.html
    #AUTOTHROTTLE_ENABLED = True
    # The initial download delay
    #AUTOTHROTTLE_START_DELAY = 5
    # The maximum download delay to be set in case of high latencies
    #AUTOTHROTTLE_MAX_DELAY = 60
    # The average number of requests Scrapy should be sending in parallel to
    # each remote server
    #AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
    # Enable showing throttling stats for every response received:
    #AUTOTHROTTLE_DEBUG = False
    
    # Enable and configure HTTP caching (disabled by default)
    # See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
    #HTTPCACHE_ENABLED = True
    #HTTPCACHE_EXPIRATION_SECS = 0
    #HTTPCACHE_DIR = 'httpcache'
    #HTTPCACHE_IGNORE_HTTP_CODES = []
    #HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
    settings.py
    作者:华王 博客:https://www.cnblogs.com/huahuawang/
  • 相关阅读:
    SQL语句编写
    触发器
    plot函数中的type中的参数
    【转】R中read.table详解
    7月18日R笔记
    RMySQL在windows下的安装方法
    WinXP下面实现JAVA对R调用 (rJava包设置)
    用R进行文档层次聚类完整实例(tm包)
    R学习之R层次聚类方法(tm包)
    R对term进行层次聚类完整实例(tm包)
  • 原文地址:https://www.cnblogs.com/huahuawang/p/14814176.html
Copyright © 2020-2023  润新知