• Hadoop Elementary Course


    Introduction
    Two main questions:
    How to store massive amounts of data
    How to analyze massive amounts of data

    "Hadoop" refers to the Hadoop family of projects,
    which includes Common, Avro, MapReduce, HDFS, Pig, Hive, HBase, ZooKeeper, Sqoop, and Oozie.

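    The Hadoop file system (HDFS) is suited to massive data with streaming access (write once, read many times), stored on ordinary commodity hosts.
    It is not suited to low-latency data access, multiple writers, arbitrary file modifications, or large numbers of small files.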

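    HDFS architecture
    Block: 64 MB by default in Hadoop 1.x (large, because HDFS is built for massive data; Hadoop 2.x raised the default to 128 MB)
    NameNode: holds the file system namespace (directory tree), file metadata, and the block mapping of each file (critical)
    DataNode: stores the data blocks themselves
    HA strategy: 1.x allows only a single NameNode; from 2.x on, the NameNode can run in an active-standby pair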
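
    The following minimal in-memory sketch makes the Block/NameNode/DataNode split concrete. It assumes a 64 MB block size and a simple round-robin placement policy. The names `ToyNameNode`, `add_file`, and `locate` are hypothetical and are not part of any Hadoop API. The NameNode here keeps only metadata, and real HDFS replication and rack-aware placement are omitted.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # assumed 64 MB block size (the Hadoop 1.x default)


class ToyNameNode:
    """Hypothetical NameNode model: it stores metadata only, never file contents."""

    def __init__(self, datanodes):
        self.datanodes = list(datanodes)
        self.files = {}            # file path -> ordered list of block IDs
        self.block_locations = {}  # block ID -> DataNode that stores the block
        self._next_block_id = 0

    def add_file(self, path, size_bytes):
        # Integer ceiling: the last block of a file may be only partially filled.
        num_blocks = (size_bytes + BLOCK_SIZE - 1) // BLOCK_SIZE
        block_ids = []
        for _ in range(num_blocks):
            block_id = self._next_block_id
            self._next_block_id += 1
            # Round-robin placement across DataNodes (a simplification; no replication).
            self.block_locations[block_id] = self.datanodes[block_id % len(self.datanodes)]
            block_ids.append(block_id)
        self.files[path] = block_ids
        return block_ids

    def locate(self, path):
        # A client asks the NameNode where each block lives, then reads from those DataNodes.
        return [(b, self.block_locations[b]) for b in self.files[path]]


nn = ToyNameNode(["dn1", "dn2", "dn3"])
nn.add_file("/logs/big.log", 200 * 1024 * 1024)  # 200 MB needs 4 blocks of 64 MB
print(nn.locate("/logs/big.log"))
# [(0, 'dn1'), (1, 'dn2'), (2, 'dn3'), (3, 'dn1')]
```

    If `ToyNameNode`'s dictionaries were lost, every block would become unreachable even though the DataNodes still hold the data. This is why the NameNode is singled out as critical.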

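    MapReduce
    A programming model for processing massive data with parallel computation.
    For example, to find the maximum of nine numbers:
    Step 1: the map function computes the maximum of each group of three numbers; the numbers are stored in HDFS.
    Step 2: the reduce function takes the largest of those partial maxima.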
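
    The two steps can be simulated in a single Python process. This is a sketch, not Hadoop code. The nine sample values and the helper names `map_max` and `reduce_max` are made up for illustration. Each slice of three numbers stands in for a split that a real map task would read from HDFS.

```python
from functools import reduce


def map_max(split):
    # Map step: each "mapper" sees only its own split and emits the local maximum.
    return max(split)


def reduce_max(partial_maxima):
    # Reduce step: fold the mappers' outputs into one global maximum.
    return reduce(max, partial_maxima)


numbers = [12, 7, 45, 3, 88, 19, 56, 21, 9]  # hypothetical input data
splits = [numbers[i:i + 3] for i in range(0, len(numbers), 3)]  # three splits of three

partial = [map_max(s) for s in splits]  # in Hadoop, one map task per split, in parallel
print(partial)              # [45, 88, 56]
print(reduce_max(partial))  # 88
```

    Each map call sees only its own split, so the three calls could run on three different hosts. Only the short list of partial maxima has to travel to the reducer.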

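    In summary, HDFS provides a way, along with supporting strategies, to store massive data across many hosts,
    and MapReduce analyzes that data using divide and conquer.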

    INTRODUCTION
    The two main questions:
    First, how to store massive data: HDFS
    Second, how to analyze massive data: MapReduce

    Hadoop = the family of Hadoop projects,
    including Common, Avro, MapReduce, HDFS, Pig, Hive, HBase, ZooKeeper, Sqoop, and Oozie.

    Hadoop (HDFS) is suitable for very large files with streaming data access (write once, read many times), running on commodity hardware.
    But it is not a good fit for lots of small files, low-latency data access, multiple writers, or arbitrary file modifications.
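
    A quick calculation shows why many small files are a poor fit. The NameNode holds metadata for every file and every block in memory, so its load grows with the number of files rather than with the total amount of data. The sketch assumes a 64 MB block size and counts one metadata entry per file plus one per block. The memory cost per entry is deliberately left out, and `namenode_entries` is a hypothetical helper.

```python
BLOCK_SIZE_MB = 64  # assumed block size (Hadoop 1.x default)


def blocks_for(file_size_mb):
    # Integer ceiling: a partially filled last block still counts as a block.
    return (file_size_mb + BLOCK_SIZE_MB - 1) // BLOCK_SIZE_MB


def namenode_entries(file_sizes_mb):
    # Rough metadata count: one entry per file plus one per block.
    return sum(1 + blocks_for(size) for size in file_sizes_mb)


one_big_file = [10 * 1024]      # a single 10 GB file
many_small = [1] * (10 * 1024)  # 10,240 files of 1 MB each (same 10 GB total)

print(namenode_entries(one_big_file))  # 161
print(namenode_entries(many_small))    # 20480
```

    Both inputs hold 10 GB, yet the small-file layout needs over 100 times as many NameNode entries.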


    HDFS Architecture
    Block: 64 MB by default in Hadoop 1.x (large, because HDFS targets massive data; Hadoop 2.x raised the default to 128 MB)
    NameNode: holds the file system namespace (directory tree), file metadata, and each file's block mapping (crucial).
    DataNode: stores the data blocks themselves.
    HA strategy: 1.x has only a single NameNode; from 2.x on, there is an active-standby pattern for the NameNode.
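
    The active-standby pattern can be pictured as a tiny state machine. Clients are served by whichever NameNode is active, and when it fails the standby is promoted. This is a hypothetical sketch: `NameNodePair`, `fail_active`, and the node names are invented here. Real Hadoop 2.x HA also relies on shared edit-log storage and, for automatic failover, on ZooKeeper; neither is modeled.

```python
class NameNodePair:
    """Hypothetical model of an active-standby NameNode pair."""

    def __init__(self, active, standby):
        self.active = active
        self.standby = standby

    def serve(self, request):
        # Clients always talk to the currently active NameNode.
        if self.active is None:
            raise RuntimeError("no active NameNode")
        return f"{self.active} handled {request}"

    def fail_active(self):
        # Failover: promote the standby; afterwards there is no standby left.
        self.active, self.standby = self.standby, None


pair = NameNodePair("nn1", "nn2")
print(pair.serve("open /data/a.txt"))  # nn1 handled open /data/a.txt
pair.fail_active()
print(pair.serve("open /data/a.txt"))  # nn2 handled open /data/a.txt
```

    A 1.x-style single NameNode has no standby to promote, so the first failure leaves the whole file system unavailable.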


    MapReduce
    A programming model for parallel computation over massive data.
    For example, to find the maximum of nine numbers:
    First, the map function computes the maximum of each group of three numbers (the input data is stored in HDFS).
    Second, the reduce function takes the maximum of those three partial results.


    In conclusion, HDFS provides a way, with supporting strategies, to store massive data across many hosts,
    and MapReduce analyzes that data by divide and conquer.

  • Original article: https://www.cnblogs.com/chuanlong/p/2822933.html