(14) Exploring Your Data

Sample Dataset

Now that we’ve had a glimpse of the basics, let’s work with a more realistic dataset. I’ve prepared a sample of fictitious JSON documents containing customer bank account information. Each document has the following schema:

{
    "account_number": 0,
    "balance": 16623,
    "firstname": "Bradshaw",
    "lastname": "Mckenzie",
    "age": 29,
    "gender": "F",
    "address": "244 Columbus Place",
    "employer": "Euron",
    "email": "bradshawmckenzie@euron.com",
    "city": "Hobucken",
    "state": "CO"
}
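To make the schema concrete, here is a minimal sketch that parses the sample document above and checks that it carries exactly these fields with the expected types. The `Account` class and the `parse_account` helper are hypothetical names introduced for illustration; they are not part of Elasticsearch or of the dataset.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Account:
    """Hypothetical in-memory mirror of one account document."""
    account_number: int
    balance: int
    firstname: str
    lastname: str
    age: int
    gender: str
    address: str
    employer: str
    email: str
    city: str
    state: str


def parse_account(raw: str) -> Account:
    """Parse one JSON document and verify it matches the schema's fields and types."""
    data = json.loads(raw)
    expected = {f.name for f in fields(Account)}
    if set(data) != expected:
        # Report fields that are either missing or not part of the schema.
        raise ValueError(f"field mismatch: {sorted(set(data) ^ expected)}")
    for f in fields(Account):
        if not isinstance(data[f.name], f.type):
            raise TypeError(f"{f.name} should be of type {f.type.__name__}")
    return Account(**data)


# The sample document shown above, copied verbatim.
SAMPLE = """
{
    "account_number": 0,
    "balance": 16623,
    "firstname": "Bradshaw",
    "lastname": "Mckenzie",
    "age": 29,
    "gender": "F",
    "address": "244 Columbus Place",
    "employer": "Euron",
    "email": "bradshawmckenzie@euron.com",
    "city": "Hobucken",
    "state": "CO"
}
"""

account = parse_account(SAMPLE)
print(account.firstname, account.lastname, account.balance)
```

Since the values are randomly generated, a check like this only confirms the shape of a document, not that its contents are meaningful.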

For the curious, this data was generated using www.json-generator.com/, so please ignore the actual values and semantics of the data; they are all randomly generated.

Loading the Sample Dataset

You can download the sample dataset (accounts.json) from here. Extract it to the current directory, then load it into the cluster as follows:

curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_doc/_bulk?pretty&refresh" --data-binary "@accounts.json"
curl "localhost:9200/_cat/indices?v"

And the response:

health status index uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   bank  l7sSYV2cQXmu6_4rJWVIww   5   1       1000            0    128.6kb        128.6kb

This means that we have just successfully bulk indexed 1000 documents into the bank index (under the _doc type).
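The bulk endpoint used above expects newline-delimited JSON: each document is preceded by an action line, and the payload ends with a newline. As a rough sketch of how a file in that shape could be built from a list of documents, consider the following. The `to_bulk_ndjson` helper is a hypothetical name, and using `account_number` as the document `_id` is an assumption made for this example.

```python
import json


def to_bulk_ndjson(docs, id_field="account_number"):
    """Build a bulk request body: one action line plus one source line per document."""
    lines = []
    for doc in docs:
        # Action line: index this document under an explicit _id (assumed to be id_field).
        lines.append(json.dumps({"index": {"_id": str(doc[id_field])}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline character.
    return "\n".join(lines) + "\n"


docs = [
    {"account_number": 0, "balance": 16623, "firstname": "Bradshaw"},
]
payload = to_bulk_ndjson(docs)
print(payload, end="")
```

A payload built this way could be saved to a file and sent with the same curl command shown earlier, via `--data-binary`.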

Original article: https://www.cnblogs.com/shuaiandjun/p/10273257.html