• Elasticsearch data importing


    Elasticsearch stores each piece of data as a document.

    That's what I need.

    I'm using the bulk API.

    First, transform the raw data file data.json into new_data.json.

    Then run this to import the data into Elasticsearch:

    curl -s -XPOST 'localhost:9200/_bulk' -H 'Content-Type: application/x-ndjson' --data-binary @new_data.json
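
    For reference, here is a minimal sketch of building the same bulk request with Python's standard library instead of curl. It assumes Elasticsearch listens on localhost:9200 as in the curl command, and the helper name build_bulk_request is hypothetical. It only builds the request, so it can be inspected without a running cluster. Pass it to urllib.request.urlopen to actually send it.

```python
import urllib.request

# Assumption: Elasticsearch runs locally on the default port, as in the curl example.
BULK_URL = "http://localhost:9200/_bulk"


def build_bulk_request(body: bytes, url: str = BULK_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the bulk API.

    build_bulk_request is a hypothetical helper for this sketch.
    The bulk body is newline-delimited JSON and must end with a newline.
    """
    if not body.endswith(b"\n"):
        body += b"\n"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        # Elasticsearch 6.0 and later require an explicit, supported Content-Type.
        headers={"Content-Type": "application/x-ndjson"},
    )


if __name__ == "__main__":
    payload = (
        b'{"index":{"_index":"myindex1","_type":"mytype1"}}\n'
        b'{"key1":"valueA_row_1"}'
    )
    req = build_bulk_request(payload)
    print(req.get_method(), req.full_url)
    # To send it against a live cluster: urllib.request.urlopen(req).read()
```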

    For example, I have the following raw JSON data file:

    The file data.json

    {"key1":"valueA_row_1","key2":"valueB_row_1","key3":"valueC_row_1"}
    {"key1":"valueA_row_2","key2":"valueB_row_2","key3":"valueC_row_2"}
    {"key1":"valueA_row_3","key2":"valueB_row_3","key3":"valueC_row_3"}

    To import this data into Elasticsearch, I have to transform the file so that each document line is preceded by a line naming its index and type.

    This produces a new file, new_data.json:

    {"index":{"_index":"myindex1","_type":"mytype1"}}
    {"key1":"valueA_row_1","key2":"valueB_row_1","key3":"valueC_row_1"}
    {"index":{"_index":"myindex1","_type":"mytype1"}}
    {"key1":"valueA_row_2","key2":"valueB_row_2","key3":"valueC_row_2"}
    {"index":{"_index":"myindex1","_type":"mytype1"}}
    {"key1":"valueA_row_3","key2":"valueB_row_3","key3":"valueC_row_3"}
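
    The transformation from data.json to new_data.json can be sketched in Python as below. This is a minimal sketch, not the author's tool: it assumes one JSON object per input line and a single target index and type (myindex1/mytype1 from the example). The names to_bulk_lines and convert_file are hypothetical.

```python
import json
from typing import Iterable, Iterator


def to_bulk_lines(raw_lines: Iterable[str], index: str, doc_type: str) -> Iterator[str]:
    """Yield bulk-API lines: one action line before every document line.

    Hypothetical helper; assumes one JSON object per input line.
    """
    # Compact separators reproduce the exact format shown in new_data.json.
    action = json.dumps({"index": {"_index": index, "_type": doc_type}}, separators=(",", ":"))
    for line in raw_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        json.loads(line)  # fail early if a line is not valid JSON
        yield action
        yield line


def convert_file(src: str, dst: str, index: str, doc_type: str) -> None:
    """Convert a file such as data.json into a file such as new_data.json."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for out_line in to_bulk_lines(fin, index, doc_type):
            fout.write(out_line + "\n")  # the bulk body must end with a newline


if __name__ == "__main__":
    raw = ['{"key1":"valueA_row_1","key2":"valueB_row_1","key3":"valueC_row_1"}']
    for out_line in to_bulk_lines(raw, "myindex1", "mytype1"):
        print(out_line)
```

    Usage would be convert_file("data.json", "new_data.json", "myindex1", "mytype1"), followed by the curl command.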


    In new_data.json, every data line is preceded by an action line that names its index and type.

    If the documents in the JSON data file do not all share the same _index or _type, just change the corresponding {"index":{...}} lines.
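
    When documents belong to different indexes or types, the action line can be computed per document. A minimal sketch, assuming the caller supplies a routing function; example_route is a hypothetical rule that sends documents containing a newkey field to myindex2/mytype2 and everything else to myindex1/mytype1, mirroring the full_data.json example that follows.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, Tuple

# A router maps a parsed document to its (index, type) pair.
Router = Callable[[Dict], Tuple[str, str]]


def to_bulk_lines_routed(raw_lines: Iterable[str], route: Router) -> Iterator[str]:
    """Yield an action line chosen per document, followed by the document itself."""
    for line in raw_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)
        index, doc_type = route(doc)
        yield json.dumps({"index": {"_index": index, "_type": doc_type}}, separators=(",", ":"))
        yield line


def example_route(doc: Dict) -> Tuple[str, str]:
    """Hypothetical rule: documents with 'newkey' go to myindex2/mytype2."""
    if "newkey" in doc:
        return "myindex2", "mytype2"
    return "myindex1", "mytype1"


if __name__ == "__main__":
    raw = ['{"key1":"abcde","key2":"efg","key3":"klm"}', '{"newkey":"newvalue"}']
    for out_line in to_bulk_lines_routed(raw, example_route):
        print(out_line)
```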

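    Here is an example of a valid bulk file for Elasticsearch. Strictly speaking it is newline-delimited JSON (one JSON object per line), not a single JSON document.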

    full_data.json

    {"index":{"_index":"myindex1","_type":"mytype1"}}
    {"key1":"value1","key2":"value2","key3":"value3"}
    {"index":{"_index":"myindex1","_type":"mytype1"}}
    {"key1":"abcde","key2":"efg","key3":"klm"}
    {"index":{"_index":"myindex2","_type":"mytype2"}}
    {"newkey":"newvalue"}


    Notice that there are two indexes in the file above: myindex1 and myindex2.

    The document schema in myindex2 also differs from the one in myindex1.

    That's why every document needs its own {"index":{...}} line in the new data file.
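
    To double-check a file like full_data.json before sending it, one can pair each action line with the document after it and collect the field names seen per index. This is a minimal sketch, assuming every action is an index action followed by exactly one document line (true for the files above, but not for delete actions, which have no document line). summarize_bulk is a hypothetical name.

```python
import json
from typing import Dict, Iterable, Set


def summarize_bulk(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each _index in a bulk file to the set of field names of its documents.

    Assumes only "index" actions, each followed by exactly one document line.
    """
    fields: Dict[str, Set[str]] = {}
    # Iterate over non-blank lines; next(it) inside the loop consumes the document line.
    it = (line for line in (raw.strip() for raw in lines) if line)
    for action_line in it:
        action = json.loads(action_line)["index"]
        doc_line = next(it, None)
        if doc_line is None:
            raise ValueError("action line without a following document line")
        doc = json.loads(doc_line)
        fields.setdefault(action["_index"], set()).update(doc.keys())
    return fields


if __name__ == "__main__":
    full_data = [
        '{"index":{"_index":"myindex1","_type":"mytype1"}}',
        '{"key1":"value1","key2":"value2","key3":"value3"}',
        '{"index":{"_index":"myindex1","_type":"mytype1"}}',
        '{"key1":"abcde","key2":"efg","key3":"klm"}',
        '{"index":{"_index":"myindex2","_type":"mytype2"}}',
        '{"newkey":"newvalue"}',
    ]
    for index, names in sorted(summarize_bulk(full_data).items()):
        print(index, sorted(names))
```

    For the full_data.json lines this reports two indexes: myindex1 with fields key1, key2 and key3, and myindex2 with newkey.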

    -----

    Now I am writing a Python script to process some raw JSON data files.

    Let's assume every line of the JSON data file has the same schema. This is how I will extract the schema from example_raw_data.json:

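    import json
    import sys
    
    
    def get_schema(path):
        """Return the top-level keys of the first JSON line in `path`.
    
        Assumes every line shares the same schema, so the first line is enough.
        """
        with open(path, encoding="utf-8") as f:
            first_line = f.readline()
        return list(json.loads(first_line).keys())
    
    
    if __name__ == "__main__":
        # Usage: python this_script.py example_raw_data.json
        print(get_schema(sys.argv[1]))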
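
    The text above assumes every line has the same schema. A minimal sketch for checking that assumption: it collects the key set of each line and reports the line numbers that deviate from the first. check_same_schema is a hypothetical helper, and "schema" here means only the set of top-level keys, not value types.

```python
import json
from typing import FrozenSet, Iterable, List, Optional, Tuple


def check_same_schema(lines: Iterable[str]) -> Tuple[Optional[FrozenSet[str]], List[int]]:
    """Return the first line's key set and the 1-based numbers of lines whose keys differ.

    Blank lines are skipped. Returns (None, []) for input with no JSON lines.
    """
    schema: Optional[FrozenSet[str]] = None
    mismatches: List[int] = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        keys = frozenset(json.loads(line))  # iterating a dict yields its keys
        if schema is None:
            schema = keys
        elif keys != schema:
            mismatches.append(lineno)
    return schema, mismatches


if __name__ == "__main__":
    rows = [
        '{"key1":"valueA_row_1","key2":"valueB_row_1","key3":"valueC_row_1"}',
        '{"key1":"valueA_row_2","key2":"valueB_row_2","key3":"valueC_row_2"}',
        '{"newkey":"newvalue"}',
    ]
    schema, bad = check_same_schema(rows)
    print(sorted(schema), bad)  # prints ['key1', 'key2', 'key3'] [3]
```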

    -------------Updated on 27th Nov. 2015 ----------

    I solved this by reinventing the wheel and writing my own tool.

    You can check it out:

    https://github.com/xros/json-py-es

    -------------Updated on 28th Nov. 2015  at 01:33 A.M. ----------

    pip install jsonpyes

    I wrote this module and it works!

    Happy hacking!

  • Original article: https://www.cnblogs.com/spaceship9/p/4974607.html