• Cloudera learning 4: Hadoop cluster planning


    This part touches on hardware, which I don't know well yet, so I'm writing it down to study later.

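    A Hadoop cluster usually starts small and grows. It may begin with only 4 to 6 nodes, then expand gradually as stored data, compute load, and memory requirements increase.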

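    For example, when growing the cluster by storage volume: if 3TB of new data is stored each week and the HDFS block replication factor is 3, the cluster needs 9TB of additional disk per week, and you would normally budget another 25% as buffer. If one machine holds 16 x 3TB of disk, that works out to adding roughly one machine to the cluster per month.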
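    The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name and parameters are made up for this note, and it assumes the 25% buffer is added on top of the replicated volume.

```python
def weeks_per_new_node(weekly_ingest_tb, replication, buffer_fraction, node_capacity_tb):
    """Return how many weeks one new node's disks last at the given ingest rate.

    Assumption: the buffer is applied on top of the replicated data volume.
    """
    replicated_tb = weekly_ingest_tb * replication        # 3 TB * 3 replicas = 9 TB
    needed_tb = replicated_tb * (1 + buffer_fraction)     # 9 TB * 1.25 = 11.25 TB
    return node_capacity_tb / needed_tb


# Numbers from the example: 3 TB/week, replication 3, 25% buffer, 16 x 3 TB per node.
weeks = weeks_per_new_node(3, 3, 0.25, 16 * 3)
print(f"one new node roughly every {weeks:.1f} weeks")
```

    With these numbers a node lasts a little over four weeks, which matches the "about one machine per month" estimate.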

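    How do you choose hardware? Hadoop nodes are generally divided into master nodes and worker nodes. Master nodes run the NameNode, the Standby NameNode (in an HA setup) or the SecondaryNameNode (in a non-HA setup), and the ResourceManager. Worker nodes run the DataNode, NodeManager, and Impala server processes.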

    Recommended configurations for worker nodes:

    Midline configuration (deep storage, 1Gb Ethernet):

    – 16 x 3TB SATA II hard drives, in a non-RAID, JBOD (Just a Bunch Of Disks) configuration
    – 1 or 2 of the 16 drives for the OS, with RAID-1 mirroring
    – 2 x 6-core 2.9GHz CPUs, 15MB cache
    – 256GB RAM
    – 2 x 1 Gigabit Ethernet

    High-end configuration (high memory, spindle dense, 10Gb Ethernet):

    – 24 x 1TB Nearline/MDL SAS hard drives, in a non-RAID, JBOD configuration
    – 2 x 6-core 2.9GHz CPUs, 15MB cache
    – 512GB RAM (or more)
    – 1 x 10 Gigabit Ethernet

    For worker nodes, RAID is not recommended for the data disks, and blade servers are not recommended either.

    Recommended configuration for master nodes:

    Carrier-class hardware

    Dual power supplies

    Dual Ethernet cards
    – Bonded to provide failover

    RAIDed hard drives

    Reasonable amount of RAM
    – 64 GB for clusters of 20 nodes or less

    – 96 GB for clusters of up to 300 nodes

    – 128 GB for larger clusters 
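    The RAM tiers above can be written as a small lookup. This is just a sketch of the rule of thumb from the list; the function name is hypothetical, and reading "20 nodes or less" and "up to 300 nodes" as inclusive bounds is my assumption.

```python
def master_ram_gb(cluster_nodes: int) -> int:
    """Suggested master-node RAM in GB for a cluster of the given size.

    Thresholds are taken from the list above; the bounds are treated
    as inclusive (an assumption).
    """
    if cluster_nodes < 1:
        raise ValueError("cluster_nodes must be at least 1")
    if cluster_nodes <= 20:
        return 64
    if cluster_nodes <= 300:
        return 96
    return 128


for size in (6, 20, 150, 1000):
    print(size, "nodes ->", master_ram_gb(size), "GB")
```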

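    Deploying on virtualized hosts is not recommended, because virtualization introduces a lot of uncertainty. For example, three virtual servers may actually keep their storage on the same physical server, which puts the HDFS block replicas at risk.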

    Network recommendations:

    Nodes are connected to a top-of-rack switch

    Nodes should be connected at a minimum speed of 1Gb/sec

    Consider 10Gb/sec connections in the following cases:

    – Clusters storing very large amounts of data

    – Clusters in which typical jobs produce large amounts of intermediate data 

    Racks are interconnected via core switches
    Core switches should connect to top-of-rack switches at 10Gb/sec or faster

    Beware of oversubscription in top-of-rack and core switches

    Consider bonded Ethernet to mitigate against failure

    Consider redundant top-of-rack and core switches 
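    To make the oversubscription warning concrete, here is a rough calculation of a top-of-rack switch's oversubscription ratio, taken as total node-facing bandwidth divided by total uplink bandwidth. The function name and the rack sizes in the example are hypothetical and not from the course material.

```python
def oversubscription_ratio(node_ports, node_gbps, uplinks, uplink_gbps):
    """Ratio of total downlink (node-facing) bandwidth to total uplink bandwidth.

    A ratio of 1.0 means the uplinks can carry everything the nodes can send;
    higher values mean cross-rack traffic may be throttled under load.
    """
    downlink = node_ports * node_gbps
    uplink = uplinks * uplink_gbps
    return downlink / uplink


# Hypothetical rack: 40 nodes at 1Gb/sec behind a single 10Gb/sec uplink.
print(oversubscription_ratio(40, 1, 1, 10))   # 4.0, i.e. 4:1
# Hypothetical rack: 20 nodes at 10Gb/sec behind two 40Gb/sec uplinks.
print(oversubscription_ratio(20, 10, 2, 40))  # 2.5, i.e. 2.5:1
```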

    Refer to hosts by hostname rather than by IP address, and preferably set up DNS.
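    One way to sanity-check name resolution is to confirm that a hostname resolves to an IP address and that the IP resolves back to the same name. The sketch below uses Python's standard `socket` module; the helper name and the example hostname are made up, and the resolvers are passed in as parameters so the logic can be tried without a real DNS server.

```python
import socket


def check_forward_reverse(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Return (ip, consistent) for a hostname.

    consistent is True when the reverse lookup of the resolved IP yields
    the original hostname, either as the primary name or as an alias.
    """
    ip = forward(hostname)
    primary, aliases, _addresses = reverse(ip)
    return ip, hostname in [primary, *aliases]


# On a cluster host you might call (hostname is hypothetical):
#   check_forward_reverse("worker1.example.com")
```

    A False result suggests that the forward and reverse records disagree, which is worth fixing before installing the cluster.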

    For the OS, CentOS or Red Hat Enterprise Linux (RHEL) is recommended.

    Spread the data over as many separate disks/partitions as possible, avoid LVM (Logical Volume Manager), and mount the data disks with noatime.
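    A quick way to audit these settings is to scan the mount table. The sketch below parses text in the `/proc/mounts` format (device, mount point, filesystem type, options) and flags data mounts that lack `noatime` or whose device sits under `/dev/mapper/`. The `/data/` prefix and the sample lines are assumptions for illustration, and the `/dev/mapper/` test is only a heuristic, since other device-mapper targets live there too.

```python
SAMPLE_MOUNTS = """\
/dev/sdb1 /data/1 ext4 rw,noatime 0 0
/dev/sdc1 /data/2 ext4 rw,relatime 0 0
/dev/mapper/vg0-data3 /data/3 ext4 rw,noatime 0 0
"""


def check_data_mounts(mounts_text, prefix="/data/"):
    """Return a list of (mount_point, problem) tuples for data mounts."""
    problems = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fstype, options = fields[:4]
        if not mount_point.startswith(prefix):
            continue  # only look at the assumed data directories
        if "noatime" not in options.split(","):
            problems.append((mount_point, "noatime not set"))
        if device.startswith("/dev/mapper/"):
            problems.append((mount_point, "device looks like an LVM volume"))
    return problems


for mount_point, problem in check_data_mounts(SAMPLE_MOUNTS):
    print(mount_point, "->", problem)
```

    On a real host the same function could be fed the contents of `/proc/mounts` instead of the sample text.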

    The larger the files stored in HDFS, the better.
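    One reason large files are preferred is that the NameNode keeps metadata for every file and every block in memory, so many small files mean many more objects to track. The sketch below counts files and blocks for the same data volume stored at two different file sizes. It assumes a 128MB block size (the default in Hadoop 2 and later), equally sized files, and a hypothetical function name.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4


def namenode_objects(total_bytes, file_bytes, block_bytes=128 * MIB):
    """Return (files, blocks) needed to store total_bytes as equal-sized files.

    Assumes total_bytes is a multiple of file_bytes. Each file occupies at
    least one block, and larger files are split into block_bytes chunks.
    """
    files = total_bytes // file_bytes
    blocks_per_file = -(-file_bytes // block_bytes)  # ceiling division
    return files, files * blocks_per_file


# The same 1 TiB of data stored as 1 MiB files versus 1 GiB files.
print(namenode_objects(TIB, MIB))  # (1048576, 1048576)
print(namenode_objects(TIB, GIB))  # (1024, 8192)
```

    In this example the small-file layout makes the NameNode track about two million objects, against fewer than ten thousand for the large-file layout.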

    Settings such as the OS, network, system time, users and groups, and component versions can be checked with the Cloudera Manager Host Inspector.

  • Original article: https://www.cnblogs.com/zhq1007/p/5922065.html