elasticsearch data importing

Elasticsearch stores each piece of data as a document.

That's what I need.

The way to get documents in is the bulk API.

First, transform the raw data file data.json into a new file, new_data.json, in the format the bulk API expects.

Then run this to import the data into Elasticsearch:
```
curl -s -XPOST 'localhost:9200/_bulk' --data-binary @new_data.json
```
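The same request can be made from Python with only the standard library. This is a minimal sketch, not the only way to do it: the function names `build_bulk_request` and `send_bulk_file` are made up for illustration, and the URL assumes a node on localhost:9200 as in the curl command above. The `Content-Type` header is an addition that is not in the curl command; recent Elasticsearch releases reject requests that lack one, so it is set here to be safe.

```python
import urllib.request


def build_bulk_request(payload, url="http://localhost:9200/_bulk"):
    """Build (but do not send) a POST request carrying a bulk payload (bytes)."""
    if not payload.endswith(b"\n"):
        # The bulk API expects the last line to end with a newline.
        payload += b"\n"
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


def send_bulk_file(path, url="http://localhost:9200/_bulk"):
    """Send a bulk file and return the response body as text.

    Needs a running Elasticsearch node at `url`.
    """
    with open(path, "rb") as f:
        request = build_bulk_request(f.read(), url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `send_bulk_file("new_data.json")` should then be roughly equivalent to the curl command.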
For example, suppose I have a raw JSON data file like the following, with one JSON object per line:

The file data.json
```
{"key1":"valueA_row_1","key2":"valueB_row_1","key3":"valueC_row_1"}
{"key1":"valueA_row_2","key2":"valueB_row_2","key3":"valueC_row_2"}
{"key1":"valueA_row_3","key2":"valueB_row_3","key3":"valueC_row_3"}
```
To import this data into Elasticsearch, I have to rewrite the file so that every record names its index and type.

The result is a new file, new_data.json
```
{"index":{"_index":"myindex1","_type":"mytype1"}}
{"key1":"valueA_row_1","key2":"valueB_row_1","key3":"valueC_row_1"}
{"index":{"_index":"myindex1","_type":"mytype1"}}
{"key1":"valueA_row_2","key2":"valueB_row_2","key3":"valueC_row_2"}
{"index":{"_index":"myindex1","_type":"mytype1"}}
{"key1":"valueA_row_3","key2":"valueB_row_3","key3":"valueC_row_3"}
```
In new_data.json, each data line is preceded by a line saying which index and type it belongs to.

If the records in the JSON data file do not all belong to the same _index or _type, just change the corresponding {"index":{"_******** line.
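The transformation from data.json to new_data.json can be sketched in a few lines of Python. The names `to_bulk_lines` and `convert_file` are hypothetical, chosen only for this example; the index and type names are the ones used above. `doc_type` is optional here because newer Elasticsearch versions have dropped mapping types, in which case the `_type` field should be left out.

```python
import json


def to_bulk_lines(raw_lines, index, doc_type=None):
    """Yield an action line followed by the data line for every record."""
    meta = {"_index": index}
    if doc_type is not None:
        meta["_type"] = doc_type
    # Compact separators give the same layout as the example file above.
    action = json.dumps({"index": meta}, separators=(",", ":"))
    for line in raw_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        json.loads(line)  # raises ValueError if the line is not valid JSON
        yield action
        yield line


def convert_file(src="data.json", dst="new_data.json",
                 index="myindex1", doc_type="mytype1"):
    """Write the bulk version of `src` to `dst`, one line per entry."""
    with open(src, encoding="utf-8") as fin, \
            open(dst, "w", encoding="utf-8") as fout:
        for out_line in to_bulk_lines(fin, index, doc_type):
            fout.write(out_line + "\n")
```

To send records to different indexes, call `to_bulk_lines` once per group of records with a different `index` and `doc_type`, and write the results into the same output file.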

Here is an example of a valid bulk file for Elasticsearch.

full_data.json
```
{"index":{"_index":"myindex1","_type":"mytype1"}}
{"key1":"value1","key2":"value2","key3":"value3"}
{"index":{"_index":"myindex1","_type":"mytype1"}}
{"key1":"abcde","key2":"efg","key3":"klm"}
{"index":{"_index":"myindex2","_type":"mytype2"}}
{"newkey":"newvalue"}
```
Notice that there are 2 indexes in the file above: myindex1 and myindex2.

The data schema in index myindex2 is also different from the one in index myindex1.

That's why every data line needs its own {"index":{"_******** line in the new data file.
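To make the pairing concrete, here is a small sketch that reads such a file two lines at a time and groups the documents by their target. The function name `group_by_target` is made up for this example, and it assumes the file contains only `index` actions, as in the files shown here.

```python
import json
from collections import defaultdict


def group_by_target(bulk_lines):
    """Map (index, type) to the list of documents sent there."""
    lines = [line.strip() for line in bulk_lines if line.strip()]
    if len(lines) % 2 != 0:
        raise ValueError("every data line needs an action line before it")
    grouped = defaultdict(list)
    # Even positions hold action lines, odd positions hold the documents.
    for action_line, data_line in zip(lines[0::2], lines[1::2]):
        action = json.loads(action_line)
        if "index" not in action:
            raise ValueError("only 'index' actions are handled in this sketch")
        meta = action["index"]
        target = (meta["_index"], meta.get("_type"))
        grouped[target].append(json.loads(data_line))
    return dict(grouped)
```

Run on full_data.json, this gives two documents under `("myindex1", "mytype1")` and one under `("myindex2", "mytype2")`.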

-----

Now I am writing a Python script to process some raw JSON data files.

Let's assume every line of a JSON data file has the same schema. The script below is meant to extract that schema.

A first version of the script, meant to be run against example_raw_data.json
```
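import json
import sys


def get_schema(path):
    """
    Return {key: type name} taken from the first non-empty line of a
    line-delimited JSON file, assuming every line has the same schema.
    (One possible way to fill in the stub, not a finished design.)
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                record = json.loads(line)
                return {key: type(value).__name__ for key, value in record.items()}
    return None  # empty file: no schema to report


if __name__ == "__main__":
    # Pass the data file as the first argument, e.g. example_raw_data.json
    print(get_schema(sys.argv[1]))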
```
-------------Updated on 27th Nov. 2015 ----------

I solved this by reinventing the wheel and writing my own tool.

You can check this out:

https://github.com/xros/json-py-es

-------------Updated on 28th Nov. 2015 at 01:33 A.M. ----------
```
pip install jsonpyes
```
I wrote this module and it works!

Happy hacking!
Original article: https://www.cnblogs.com/spaceship9/p/4974607.html