PyMongo Tutorial

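> Read [*MongoDB: The Definitive Guide*](http://book.douban.com/subject/6068947/) first.
>
> This is an unofficial, non-professional translation of the [official tutorial](http://api.mongodb.org/python/current/tutorial.html).

---

This tutorial is an introduction to working with __MongoDB__ and __PyMongo__.

Prerequisites
---

Before we start, make sure the __PyMongo__ module is installed. Run the following statement in the Python shell; if it raises no exception, __pymongo__ is working:

    >>> import pymongo

This tutorial also assumes that a __MongoDB__ instance is running on the default port. If you have downloaded and installed __MongoDB__, you can start it like this:

    $ mongod

Making a Connection
---

The first step is to use __PyMongo__ to create a connection to the running __mongod__ instance:

    >>> from pymongo import Connection
    >>> connection = Connection()

The code above connects to the default host and port. We can also specify the host and port explicitly:

    >>> connection = Connection('localhost', 27017)

Getting a Database
---

A single __MongoDB__ instance can support multiple independent databases. In __PyMongo__ you can get a database from the __connection__ using attribute-style access:

    >>> db = connection.test_database

If attribute-style access does not work, for example because the database name is not a valid Python attribute name (like `test-database`), use dictionary-style access instead:

    >>> db = connection['test-database']

Getting a Collection
---

A collection is a group of documents stored in a __MongoDB__ database, roughly equivalent to a table in a relational database. Getting a collection works much like getting a database:

    >>> collection = db.test_collection

or, using dictionary-style access:

    >>> collection = db['test-collection']

Note that collections and databases are created lazily in MongoDB. None of the statements above has performed any operation on the server; the collection and the database are created when the first document is inserted.

Documents
---

MongoDB represents data as JSON-style __BSON__ documents. In __PyMongo__ we use dictionaries (`dict`) to represent documents. For example, the following dictionary represents a blog post:

    >>> import datetime
    >>> post = {"author": "Mike",
    ...         "text": "My first blog post!",
    ...         "tags": ["mongodb", "python", "pymongo"],
    ...         "date": datetime.datetime.utcnow()}

Note that the document contains a native Python type (a `datetime.datetime` instance), which is automatically converted to the appropriate __BSON__ type.

Inserting a Document
---

To insert a document into a collection we can use the `insert()` method:

    >>> posts = db.posts
    >>> posts.insert(post)
    ObjectId('...')

When a document is inserted without an explicit `_id` key, an `ObjectId` value is generated automatically for `_id`. If you do specify `_id`, make sure its value is unique within the collection. `insert()` returns the `_id` value of the inserted document.

Inserting the first document also created the *posts* collection on the server. We can verify the lazy creation mentioned earlier by listing all the collections in the database:

    >>> db.collection_names()
    [u'posts', u'system.indexes']

*__Note:__* *system.indexes* is a special internal collection that is created automatically.

Getting a Single Document with `find_one()`
---

The most basic type of query in __MongoDB__ is `find_one()`. It returns a single document matching the query, or `None` if nothing matches; with no arguments it returns the first document in the collection. It is useful when you know there is only one matching document, or when you are only interested in the first match. Here we get the first document from the *posts* collection:

    >>> posts.find_one()
    {u'date': datetime.datetime(...), u'text': u'My first blog post!', u'_id': ObjectId('...'), u'author': u'Mike', u'tags': [u'mongodb', u'python', u'pymongo']}

The returned dictionary matches the document we inserted earlier.

*__Note:__* The `_id` in the result was added automatically on insert.

`find_one()` also accepts a query that the resulting document must match. To limit the result, we query only for a document whose *author* is *Mike*:

    >>> posts.find_one({"author": "Mike"})
    {u'date': datetime.datetime(...), u'text': u'My first blog post!', u'_id': ObjectId('...'), u'author': u'Mike', u'tags': [u'mongodb', u'python', u'pymongo']}

If we try a different author, such as *Eliot*, we get no result, because the only document in the collection does not match:

    >>> posts.find_one({"author": "Eliot"})

Unicode Strings
---

You may have noticed that the strings in the results look different from the default strings in __Python__ (for example `u'Mike'` instead of `'Mike'`). A short explanation is in order.

MongoDB stores data in __BSON__ format, and __BSON__ strings are UTF-8 encoded, so __PyMongo__ must make sure that every string it stores is valid UTF-8. Regular strings (`str`) are stored unaltered; unicode strings are first encoded to UTF-8 by __PyMongo__. On the way back, __PyMongo__ decodes each __BSON__ string to a Python unicode string rather than a regular `str`, which is why the shell shows `u'Mike'`.

Bulk Inserts
---

To make querying more interesting, let's insert a few more documents. Besides inserting a single document, we can insert several at once by passing an iterable (here a `list`) as the argument. Every document in the iterable is inserted, and only a single command is sent to the server:

    >>> new_posts = [{"author": "Mike",
    ...               "text": "Another post!",
    ...               "tags": ["bulk", "insert"],
    ...               "date": datetime.datetime(2009, 11, 12, 11, 14)},
    ...              {"author": "Eliot",
    ...               "title": "MongoDB is fun",
    ...               "text": "and pretty easy too!",
    ...               "date": datetime.datetime(2009, 11, 10, 10, 45)}]
    >>> posts.insert(new_posts)
    [ObjectId('...'), ObjectId('...')]

A couple of interesting things to note:

1. `insert()` now returns two `ObjectId` instances, one for each document in the bulk insert.
2. *new_posts[1]* has a different shape from the other posts: it has no `tags` field and adds a new `title` field. This is what we mean when we say that __MongoDB__ is schema-free.

Querying for More Than One Document
---

To get more than one document as the result of a query, we use the `find()` method. `find()` returns a `Cursor` object, which lets us iterate over all matching documents. For example, we can iterate over every document in the *posts* collection:

    >>> for post in posts.find():
    ...     post
    ...
    {u'date': datetime.datetime(...), u'text': u'My first blog post!', u'_id': ObjectId('...'), u'author': u'Mike', u'tags': [u'mongodb', u'python', u'pymongo']}
    {u'date': datetime.datetime(2009, 11, 12, 11, 14), u'text': u'Another post!', u'_id': ObjectId('...'), u'author': u'Mike', u'tags': [u'bulk', u'insert']}
    {u'date': datetime.datetime(2009, 11, 10, 10, 45), u'text': u'and pretty easy too!', u'_id': ObjectId('...'), u'author': u'Eliot', u'title': u'MongoDB is fun'}

Like `find_one()`, `find()` accepts a query document. Here we get all documents whose *author* is *Mike*:

    >>> for post in posts.find({"author": "Mike"}):
    ...     post
    ...
    {u'date': datetime.datetime(...), u'text': u'My first blog post!', u'_id': ObjectId('...'), u'author': u'Mike', u'tags': [u'mongodb', u'python', u'pymongo']}
    {u'date': datetime.datetime(2009, 11, 12, 11, 14), u'text': u'Another post!', u'_id': ObjectId('...'), u'author': u'Mike', u'tags': [u'bulk', u'insert']}

Counting
---

If we only want to know how many documents match a query, we can call `count()` instead of running a full query. We can count all the documents in a collection:

    >>> posts.count()
    3

or only those that match a specific query:

    >>> posts.find({"author": "Mike"}).count()
    2

Range Queries
---

__MongoDB__ supports many different types of advanced queries. For example, we can query only for posts older than a certain date and sort the results by *author*:

    >>> d = datetime.datetime(2009, 11, 12, 12)
    >>> for post in posts.find({"date": {"$lt": d}}).sort("author"):
    ...     print post
    ...
    {u'date': datetime.datetime(2009, 11, 10, 10, 45), u'text': u'and pretty easy too!', u'_id': ObjectId('...'), u'author': u'Eliot', u'title': u'MongoDB is fun'}
    {u'date': datetime.datetime(2009, 11, 12, 11, 14), u'text': u'Another post!', u'_id': ObjectId('...'), u'author': u'Mike', u'tags': [u'bulk', u'insert']}

Here we use the special `$lt` operator to express the range condition and the `sort()` method to order the results by author.

Indexing
---

To make the query above fast, we can add a compound index on *date* and *author*. First, let's use the `explain` tool to see how the query is executed without the index:

    >>> posts.find({"date": {"$lt": d}}).sort("author").explain()["cursor"]
    u'BasicCursor'
    >>> posts.find({"date": {"$lt": d}}).sort("author").explain()["nscanned"]
    3

The query uses a *BasicCursor*, which means no index is used, and *nscanned* shows that the database scanned all 3 documents. Now let's add a compound index and look again:

    >>> from pymongo import ASCENDING, DESCENDING
    >>> posts.create_index([("date", DESCENDING), ("author", ASCENDING)])
    u'date_-1_author_1'
    >>> posts.find({"date": {"$lt": d}}).sort("author").explain()["cursor"]
    u'BtreeCursor date_-1_author_1'
    >>> posts.find({"date": {"$lt": d}}).sort("author").explain()["nscanned"]
    2

The query now uses a *BtreeCursor*, which means the index is used and that the index is stored in a B-tree. *nscanned* shows that the database scanned only the 2 documents that match the query.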
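To make the semantics of the range query concrete, here is a minimal pure-Python sketch of what `find({"date": {"$lt": d}}).sort("author")` selects. It does not use PyMongo or a running server: the helper name `find_lt_sorted` is hypothetical, the list `posts` only imitates the three tutorial documents, and `datetime.datetime.now()` stands in for the insertion time of the first post. Treat it as an illustration of the filter-then-sort logic, not as how MongoDB executes the query internally.

```python
import datetime

# Plain dictionaries shaped like the tutorial's three posts (no _id, no server).
posts = [
    {"author": "Mike", "text": "My first blog post!",
     "tags": ["mongodb", "python", "pymongo"],
     "date": datetime.datetime.now()},  # assumption: stands in for the insert time
    {"author": "Mike", "text": "Another post!",
     "tags": ["bulk", "insert"],
     "date": datetime.datetime(2009, 11, 12, 11, 14)},
    {"author": "Eliot", "title": "MongoDB is fun",
     "text": "and pretty easy too!",
     "date": datetime.datetime(2009, 11, 10, 10, 45)},
]


def find_lt_sorted(docs, field, upper, sort_key):
    """Hypothetical helper: keep docs whose `field` is below `upper`,
    then order the matches by `sort_key` (ascending)."""
    matched = [doc for doc in docs if field in doc and doc[field] < upper]
    return sorted(matched, key=lambda doc: doc[sort_key])


d = datetime.datetime(2009, 11, 12, 12)
result = find_lt_sorted(posts, "date", d, "author")

for post in result:
    print(post["author"], post["text"])
```

The helper looks at every document before filtering, which is comparable in spirit to the unindexed case where `nscanned` was 3. The compound index exists so the server can avoid that full scan.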
Original article: https://www.cnblogs.com/hangxin1940/p/2806471.html