• Simple single-node Spark setup


    Components to install:
    hadoop
    hive
    scala
    spark
    I. Environment variables
    Add the following to ~/.bash_profile:
    PATH=$PATH:$HOME/bin

    export PATH

    export JAVA_HOME=/usr/local/jdk
    export SCALA_HOME=/usr/local/scala
    export SPARK_HOME=/usr/local/spark
    export PATH=.:$JAVA_HOME/bin:$SCALA_HOME/bin:$PATH

    HADOOP_HOME=/usr/local/hadoop
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    export HADOOP_COMMON_HOME=$HADOOP_HOME
    export HADOOP_HDFS_HOME=$HADOOP_HOME
    export YARN_HOME=$HADOOP_HOME
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export HDFS_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
    PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
    export HADOOP_HOME PATH

    HIVE_HOME=/usr/local/hive
    PATH=$HIVE_HOME/bin:$PATH
    export HIVE_HOME PATH
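Only exported variables are visible to child processes such as start-all.sh or spark-shell, which is why JAVA_HOME has to be exported as well. As a quick sanity check, a small Java sketch like the one below reports which of the variables above are unset in the current environment. The class name EnvCheck and the chosen list of variables are illustrative assumptions, not part of Hadoop or Spark:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reports which environment variables from the profile are missing.
class EnvCheck {

    // Variable names taken from the ~/.bash_profile snippet above (an assumed selection).
    static final List<String> REQUIRED = List.of(
        "JAVA_HOME", "SCALA_HOME", "SPARK_HOME", "HADOOP_HOME", "HADOOP_CONF_DIR", "HIVE_HOME");

    // Returns the required names that are unset or empty in the given environment map.
    static List<String> missing(Map<String, String> env, List<String> required) {
        List<String> result = new ArrayList<>();
        for (String name : required) {
            String value = env.get(name);
            if (value == null || value.isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // System.getenv() only contains exported variables.
        List<String> absent = missing(System.getenv(), REQUIRED);
        System.out.println(absent.isEmpty() ? "All variables are set" : "Missing: " + absent);
    }
}
```

Run it from a new login shell (or after sourcing the profile) so that it sees the updated environment.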

    II. Hadoop installation
    1. Set up passwordless SSH (SSH trust) to the local host
    ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
    cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
    chmod 700 ~/.ssh
    chmod 600 ~/.ssh/authorized_keys

    2. Change the hostname to yul32: vi /etc/hosts and vi /etc/sysconfig/network
    3. Edit hadoop-env.sh:
    export JAVA_HOME=/usr/local/jdk

    4. Edit core-site.xml (the properties below go inside <configuration>):
    <property>
    <name>fs.defaultFS</name>
    <value>hdfs://yul32:9000</value>
    </property>

    5. Edit hdfs-site.xml (/usr/hadoop-2.3.0/etc/hadoop):
    <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/local/hadoop/dfs/name</value>
    </property>
    <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/local/hadoop/dfs/data</value>
    </property>
    <property>
    <name>dfs.replication</name>
    <value>1</value>
    </property>
    <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
    </property>

    6. Edit mapred-site.xml (/usr/hadoop-2.3.0/etc/hadoop; if it does not exist, create it by copying mapred-site.xml.template):
    <property>
    <name>mapreduce.cluster.temp.dir</name>
    <value></value>
    <description>No description</description>
    <final>true</final>
    </property>
    <property>
    <name>mapreduce.cluster.local.dir</name>
    <value></value>
    <description>No description</description>
    <final>true</final>
    </property>

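    7. Edit yarn-site.xml (/usr/hadoop-2.3.0/etc/hadoop). Note that mapreduce.framework.name is a MapReduce setting that is normally placed in mapred-site.xml: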
    <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    </property>
    <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    </property>

    8. Edit etc/hadoop/slaves (the file is called slaves; slaves.sh is a helper script in sbin):
    yul32
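    9. Format the NameNode
    Run: hadoop namenode -format (in Hadoop 2.x, hdfs namenode -format is the non-deprecated form)
    10. Start Hadoop
    cd $HADOOP_HOME/sbin && ./start-all.sh (deprecated in Hadoop 2.x but still works; it runs start-dfs.sh and start-yarn.sh)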
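Steps 4 to 7 all edit files with the same `<configuration><property><name>...</name><value>...</value></property></configuration>` layout. The Java sketch below reads such a file into a map, which can help spot a misspelled property name before starting the daemons. HadoopConfReader is a hypothetical class that uses only the JDK's DOM parser, not Hadoop's own Configuration class:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads name/value pairs from a Hadoop-style *-site.xml file.
class HadoopConfReader {

    // Parses <property><name>...</name><value>...</value></property> entries in document order.
    static Map<String, String> read(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        Map<String, String> props = new LinkedHashMap<>();
        NodeList properties = doc.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element property = (Element) properties.item(i);
            String name = childText(property, "name");
            if (name != null) {
                props.put(name, childText(property, "value"));
            }
        }
        return props;
    }

    // Convenience wrapper for XML held in a string; parser errors become an unchecked exception.
    static Map<String, String> parse(String xml) {
        try {
            return read(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse configuration", e);
        }
    }

    // Trimmed text of the first descendant element with the given tag, or null if there is none.
    private static String childText(Element parent, String tag) {
        NodeList matches = parent.getElementsByTagName(tag);
        return matches.getLength() == 0 ? null : matches.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<configuration>"
            + "<property><name>fs.defaultFS</name><value>hdfs://yul32:9000</value></property>"
            + "<property><name>dfs.replication</name><value>1</value></property>"
            + "</configuration>";
        System.out.println(parse(xml));
    }
}
```

To check a real file, pass a FileInputStream for, say, $HADOOP_CONF_DIR/hdfs-site.xml to read.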

    (Network interface commands: ifup / ifdown)

    III. Spark setup
    (/usr/spark-1.1.0-bin-hadoop2.3/conf) <error when saving: readonly; the files there must be writable by the current user>
    1. Edit conf/slaves
    yul32

    2. Edit spark-env.sh (/usr/spark-1.1.0-bin-hadoop2.3/conf; copy it from spark-env.sh.template if it does not exist)
    export SCALA_HOME=/usr/local/scala
    export JAVA_HOME=/usr/local/jdk
    export SPARK_MASTER_IP=yul32
    export SPARK_WORKER_CORES=1
    export SPARK_WORKER_INSTANCES=1
    export SPARK_MASTER_PORT=7077
    export SPARK_WORKER_MEMORY=1g
    export MASTER=spark://${SPARK_MASTER_IP}:${SPARK_MASTER_PORT}

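    3. Start Spark (from the Spark directory)
    ./sbin/start-all.sh
    4. Run a Spark example
    ./bin/run-example org.apache.spark.examples.JavaSparkPi 2
    5. Run the Scala shell
    ./bin/spark-shell --master local[2]
    6. Run the Python shell
    ./bin/pyspark --master local[2]
    7. Start the Spark SQL Thrift server
    ./sbin/start-thriftserver.sh (on YARN: ./sbin/start-thriftserver.sh --master yarn)
    To run it in the background: nohup ./sbin/start-thriftserver.sh --master yarn &
    To list background jobs: jobs -l
    Once it is running, jps shows a SparkSubmit process.
    8. Connect with the Spark SQL client
    ./bin/beeline -u jdbc:hive2://yul32:10000 -n spark -p spark
    (-n is the user name, -p the password)
    Alternatively, start ./bin/beeline and connect interactively:
    beeline> !connect jdbc:hive2://yul32:10000
    (enter the user name when prompted)
    (enter the password when prompted)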
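Step 4 runs JavaSparkPi with the argument 2, the number of slices (partitions). As far as I know, the Spark 1.x example samples 100000 * slices random points in the square [-1, 1] x [-1, 1] and estimates pi as 4 * (points inside the unit circle) / (total points), with Spark spreading the sampling over the slices. The single-machine Java sketch below shows only that arithmetic, without Spark; the class name and the fixed seed are assumptions added for reproducibility:

```java
import java.util.Random;

// Plain-Java sketch (no Spark) of the Monte Carlo estimate computed by the JavaSparkPi example.
class MonteCarloPi {

    // slices mirrors the example's command-line argument; total samples = 100000 * slices.
    static double estimate(int slices, long seed) {
        int n = 100000 * slices;
        Random random = new Random(seed);
        int inside = 0;
        for (int i = 0; i < n; i++) {
            // Uniform point in [-1, 1) x [-1, 1)
            double x = random.nextDouble() * 2 - 1;
            double y = random.nextDouble() * 2 - 1;
            if (x * x + y * y <= 1) {
                inside++;
            }
        }
        // The circle/square area ratio is pi/4.
        return 4.0 * inside / n;
    }

    public static void main(String[] args) {
        int slices = args.length > 0 ? Integer.parseInt(args[0]) : 2;
        System.out.println("Pi is roughly " + estimate(slices, 42L));
    }
}
```

More slices means more samples, so the estimate tightens roughly with the square root of the sample count.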

    Upload a file and create tables:
    1. hadoop fs -ls /user/ocdc/coc
    hadoop fs -put /home/ocdc/CI_CUSER_20141104112305197.csv /user/ocdc/coc
    2. shark> create table CI_CUSER_20141104112305196( PRODUCT_NO string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
    shark> load data inpath '/user/ocdc/coc/CI_CUSER_20141104112305197.csv' into table CI_CUSER_20141104112305196;
    shark> create table CI_CUSER_20141104112305197( PRODUCT_NO string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' stored as rcfile;
    shark> insert into table CI_CUSER_20141104112305197 select * from CI_CUSER_20141104112305196;
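The two-table pattern above exists because LOAD DATA only moves the file into the table's directory without converting it, so the CSV is first loaded into a delimited text table and then rewritten into RCFile by INSERT ... SELECT. A hypothetical Java helper that builds both CREATE TABLE statements makes it visible that they differ only in the STORED AS clause:

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper that builds Hive CREATE TABLE statements like the ones above.
class HiveDdl {

    // All columns are STRING, as in the example; storedAs == null keeps Hive's default text format.
    static String createTable(String table, List<String> columns, String storedAs) {
        StringJoiner cols = new StringJoiner(", ", "(", ")");
        for (String column : columns) {
            cols.add(column + " STRING");
        }
        StringBuilder sql = new StringBuilder("CREATE TABLE ")
            .append(table)
            .append(cols.toString())
            .append(" ROW FORMAT DELIMITED FIELDS TERMINATED BY ','")
            // In Java source "\\n" is a backslash followed by n, i.e. Hive's '\n' escape.
            .append(" LINES TERMINATED BY '\\n'");
        if (storedAs != null) {
            sql.append(" STORED AS ").append(storedAs);
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        List<String> columns = List.of("PRODUCT_NO");
        // Staging table in text format, filled with LOAD DATA INPATH.
        System.out.println(createTable("CI_CUSER_20141104112305196", columns, null) + ";");
        // Target table in RCFile format, filled with INSERT INTO ... SELECT.
        System.out.println(createTable("CI_CUSER_20141104112305197", columns, "RCFILE") + ";");
    }
}
```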

    IV. Hive installation and configuration (optional)
    1. Edit hive-env.sh
    export HADOOP_HOME=/usr/local/hadoop
    export HIVE_CONF_DIR=/usr/local/hive/conf
    2. Start the Hive remote service (port 10000)
    hive --service hiveserver &
    Hive JDBC URL: jdbc:hive://ip:10000/default (default Hive port: 10000, default database: default)
    Location of the Hive data warehouse
    hive/conf/hive-site.xml
    hive.metastore.warehouse.dir: location of the data warehouse, /user/hive/warehouse by default:
    <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
    </property>

    Shark JDBC connection
    1. Check whether SharkServer is running
    [ocdc@oc98 conf]$ jps
    7983 Kafka
    8803 SharkCliDriver
    7377 ResourceManager
    16894 SharkServer
    6925 JournalNode
    12601 CoarseGrainedExecutorBackend
    17056 CoarseGrainedExecutorBackend
    18424 Jps
    14486 Master
    4108 QuorumPeerMain
    23408 HRegionServer
    17655 RunJar
    6727 DataNode
    7132 DFSZKFailoverController
    7510 NodeManager
    12553 WorkerLauncher
    6614 NameNode
    23268 HMaster
    12415 SharkCliDriver
    2. Find the SharkServer port
    [ocdc@oc98 conf]$ netstat -apn | grep 16894
    tcp 0 0 ::ffff:10.1.251.98:57902 :::* LISTEN 16894/java
    tcp 0 0 :::52309 :::* LISTEN 16894/java
    tcp 0 0 :::9977 :::* LISTEN 16894/java
    tcp 0 0 :::41222 :::* LISTEN 16894/java
    tcp 0 0 :::4040 :::* LISTEN 16894/java
    tcp 0 0 :::45192 :::* LISTEN 16894/java
    tcp 0 0 ::ffff:10.1.251.98:35289 ::ffff:10.1.251.98:3306 ESTABLISHED 16894/java
    tcp 0 0 ::ffff:10.1.251.98:57902 ::ffff:10.1.251.104:41877 ESTABLISHED 16894/java
    tcp 0 0 ::ffff:10.1.251.98:57902 ::ffff:10.1.251.98:53176 ESTABLISHED 16894/java
    tcp 0 0 ::ffff:10.1.251.98:9977 ::ffff:10.1.48.20:60586 ESTABLISHED 16894/java
    tcp 1 0 ::ffff:10.1.251.98:57320 ::ffff:10.1.251.98:50012 CLOSE_WAIT 16894/java
    tcp 0 0 ::ffff:10.1.251.98:9977 ::ffff:10.1.48.20:59756 ESTABLISHED 16894/java
    tcp 0 0 ::ffff:10.1.251.98:57902 ::ffff:10.1.251.101:50160 ESTABLISHED 16894/java
    tcp 0 0 ::ffff:10.1.251.98:57902 ::ffff:10.1.251.98:53172 ESTABLISHED 16894/java
    tcp 0 0 ::ffff:10.1.251.98:57902 ::ffff:10.1.251.101:50159 ESTABLISHED 16894/java
    unix 2 [ ] STREAM CONNECTED 8889813 16894/java
    unix 2 [ ] STREAM CONNECTED 8889793 16894/java
    Port 9977 is the port SharkServer was started on: nohup ./bin/shark --service sharkserver --p 9977 &
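Instead of reading netstat output, a quick way to confirm that SharkServer (or the Thrift server on 10000) accepts connections is to open a TCP connection to the port. A minimal Java sketch, assuming the host and port from the output above; PortCheck is a hypothetical name, and the check only shows that something is listening, not that it speaks the Hive protocol:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: reports whether a TCP connection to host:port can be established.
class PortCheck {

    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out or host unreachable
            return false;
        }
    }

    public static void main(String[] args) {
        // Host and port taken from the netstat output above; adjust for your machine.
        System.out.println("9977 reachable: " + isListening("10.1.251.98", 9977, 2000));
    }
}
```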

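    3. JDBC connection
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class SharkTest {
        // HiveServer1-style driver, shipped in hive-jdbc-0.8.1.jar (see the dependency list below)
        private static final String DRIVER_NAME = "org.apache.hadoop.hive.jdbc.HiveDriver";

        public static void main(String[] args) throws SQLException {
            try {
                Class.forName(DRIVER_NAME);
            } catch (ClassNotFoundException e) {
                e.printStackTrace();
                System.exit(1);
            }
            // SharkServer listens on port 9977 (see the netstat output above); resources are closed automatically
            try (Connection con = DriverManager.getConnection(
                    "jdbc:hive://10.1.251.98:9977/default", "ocdc", "asiainfo");
                 Statement stmt = con.createStatement();
                 ResultSet res = stmt.executeQuery("select * from src")) {
                // Print the first two columns of the first row, if there is one
                if (res.next()) {
                    System.out.println(res.getString(1) + " " + res.getString(2));
                }
            }
        }
    }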


    Spark SQL server start command
    ./sbin/start-thriftserver.sh --master yarn
    Client connection
    ./bin/beeline -u jdbc:hive2://10.1.251.98:10000 -n ocdc -p asiainfo
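The Spark SQL Thrift server speaks the HiveServer2 protocol, so a Java client uses a `jdbc:hive2://` URL and the driver class `org.apache.hive.jdbc.HiveDriver`, unlike the `jdbc:hive://` URL and `org.apache.hadoop.hive.jdbc.HiveDriver` used for SharkServer above. A sketch, assuming a hive-jdbc version with HiveServer2 support (and its dependencies) is on the classpath at run time; SparkSqlJdbc is a hypothetical name, and host, port and credentials are copied from the beeline command:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch of a JDBC client for the Spark SQL Thrift server (HiveServer2 protocol).
class SparkSqlJdbc {

    // Builds a HiveServer2 JDBC URL such as jdbc:hive2://yul32:10000/default.
    static String url(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) throws Exception {
        // Register the driver explicitly in case the JAR does not self-register via JDBC 4 service loading.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection con = DriverManager.getConnection(
                 url("10.1.251.98", 10000, "default"), "ocdc", "asiainfo");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("show tables")) {
            // Print the first column of each result row
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```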
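    Make profile changes take effect in the current shell
    source /etc/profile (or source ~/.bash_profile for the per-user file from section I)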

    Dependency JARs for the Hive JDBC client:
    hive-common-0.8.1.jar
    hive-exec-0.8.1.jar
    hive-jdbc-0.8.1.jar
    hive-metastore-0.8.1.jar
    hive-service-0.8.1.jar
    libfb303.jar
    slf4j-api-1.4.3.jar
    slf4j-log4j12-1.4.3.jar
    httpclient-4.2.5.jar
    hadoop-common-2.3.0.jar
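A JDBC client fails at Class.forName when one of these JARs is missing from the classpath. The sketch below (hypothetical ClasspathCheck class) checks whether classes can be loaded without initializing them. The two class names in main are assumptions about which JARs provide them: the HiveServer1 driver from hive-jdbc-0.8.1.jar and Hadoop's Configuration class from hadoop-common-2.3.0.jar.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports which classes cannot be loaded from the current classpath.
class ClasspathCheck {

    // Looks the class up without running its static initializers.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Returns the class names that could not be loaded, in input order.
    static List<String> missing(List<String> classNames) {
        List<String> result = new ArrayList<>();
        for (String name : classNames) {
            if (!isLoadable(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed mapping of classes to the JARs listed above.
        List<String> required = List.of(
            "org.apache.hadoop.hive.jdbc.HiveDriver",
            "org.apache.hadoop.conf.Configuration");
        System.out.println("Missing classes: " + missing(required));
    }
}
```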

    :wq saves and quits
    i enters insert (edit) mode
    :q! quits without saving

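    (Granting permissions)
    1. Go to the directory that contains the file or folder
    2. chmod 777 slaves (gives everyone read, write and execute permission on this file)
    3. Make user ysy132 the owner of dfs, recursively, so that it can write there: chown -R ysy132:ysy132 dfs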
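Each octal digit of a chmod mode is the sum of read (4), write (2) and execute (1), for owner, group and others in that order. A small hypothetical Java helper that expands a mode into the rwx form printed by `ls -l`:

```java
// Hypothetical helper: expands a three-digit octal chmod mode into rwx notation.
class ChmodMode {

    static String toSymbolic(String octal) {
        if (!octal.matches("[0-7]{3}")) {
            throw new IllegalArgumentException("Expected three octal digits, got " + octal);
        }
        StringBuilder sb = new StringBuilder();
        // Digits are owner, group, others
        for (char digit : octal.toCharArray()) {
            int bits = digit - '0';
            sb.append((bits & 4) != 0 ? 'r' : '-');
            sb.append((bits & 2) != 0 ? 'w' : '-');
            sb.append((bits & 1) != 0 ? 'x' : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (String mode : new String[] {"700", "600", "777"}) {
            System.out.println(mode + " -> " + toSymbolic(mode));
        }
    }
}
```

It prints `700 -> rwx------`, `600 -> rw-------` and `777 -> rwxrwxrwx`. Since 777 makes the file writable by every user, changing the owner with chown, as in step 3, is usually the narrower fix.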

    Switch user: su - ysy

    (Hadoop logs are in /usr/hadoop-2.3.0/logs)
    tail -500 hadoop-root-namenode-ysy0915.log (shows the last 500 lines of the NameNode log)
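`tail -500 file` prints the last 500 lines. The same idea in Java keeps a sliding window of at most n lines while reading the file once. A sketch with a hypothetical TailLog class, assuming the log is UTF-8 encoded:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper: returns the last n lines of a text file, similar to `tail -n`.
class TailLog {

    static List<String> tail(Path file, int n) throws IOException {
        if (n <= 0) {
            return new ArrayList<>();
        }
        Deque<String> window = new ArrayDeque<>(n);
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Drop the oldest line once the window is full
                if (window.size() == n) {
                    window.removeFirst();
                }
                window.addLast(line);
            }
        }
        return new ArrayList<>(window);
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the NameNode log named above; pass another path as the first argument.
        Path log = Paths.get(args.length > 0 ? args[0]
            : "/usr/hadoop-2.3.0/logs/hadoop-root-namenode-ysy0915.log");
        for (String line : tail(log, 500)) {
            System.out.println(line);
        }
    }
}
```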

    (Starting Hadoop)
    In the hadoop-2.3.0 directory run ./sbin/start-dfs.sh
    To stop: ./sbin/stop-dfs.sh

    List the running daemons: jps
    (Go up to the parent directory: cd ..)


    Example: Spark SQL code generation
    For select a + b from table, the work done for each input row amounts to:
    val a:Int = inputRow.getInt(0)
    val b:Int = inputRow.getInt(1)
    val result:Int = a + b
    resultRow.setInt(0,result)

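    // Generating that code from the expression tree with Scala quasiquotes.
    // Paste into a Scala 2.11+ REPL (e.g. with :paste); needs scala-reflect on the classpath.
    // Expression/Attribute/Add are simplified stand-ins, not Spark's Catalyst classes.
    import scala.reflect.runtime.universe._

    trait Expression
    case class Attribute(ordinal: Int) extends Expression
    case class Add(left: Expression, right: Expression) extends Expression

    def generateCode(e: Expression): Tree = e match {
      case Attribute(ordinal) =>
        q"inputRow.getInt($ordinal)"
      case Add(left, right) =>
        q"""
        {
          val leftResult = ${generateCode(left)}
          val rightResult = ${generateCode(right)}
          leftResult + rightResult
        }
        """
    }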
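The Scala snippet turns the expression tree into code once, instead of walking the tree for every row. A rough Java analogue, with hypothetical Attribute/Add records standing in for the expression classes, generates Java source text for `a + b` and, for comparison, includes a tree-walking interpreter that computes the same value. It only builds the source string and does not compile or load it:

```java
// Hypothetical Java analogue of generateCode: emits source text for an expression tree once,
// next to a tree-walking interpreter that redoes the traversal on every call.
class CodegenSketch {

    interface Expr {}

    // Column reference by position, like Attribute(ordinal) in the Scala snippet.
    record Attribute(int ordinal) implements Expr {}

    // Integer addition of two sub-expressions.
    record Add(Expr left, Expr right) implements Expr {}

    // Emits Java source text that reads int columns from a row variable named inputRow.
    static String generateCode(Expr e) {
        if (e instanceof Attribute a) {
            return "inputRow.getInt(" + a.ordinal() + ")";
        }
        if (e instanceof Add add) {
            return "(" + generateCode(add.left()) + " + " + generateCode(add.right()) + ")";
        }
        throw new IllegalArgumentException("Unsupported expression: " + e);
    }

    // Interprets the tree directly against one row of int columns.
    static int eval(Expr e, int[] row) {
        if (e instanceof Attribute a) {
            return row[a.ordinal()];
        }
        if (e instanceof Add add) {
            return eval(add.left(), row) + eval(add.right(), row);
        }
        throw new IllegalArgumentException("Unsupported expression: " + e);
    }

    public static void main(String[] args) {
        Expr aPlusB = new Add(new Attribute(0), new Attribute(1)); // select a + b
        System.out.println(generateCode(aPlusB));
        System.out.println(eval(aPlusB, new int[] {3, 4}));
    }
}
```

It prints `(inputRow.getInt(0) + inputRow.getInt(1))` and then `7`. This sketch needs Java 16 or later (records and pattern matching for instanceof).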

  • Original article: https://www.cnblogs.com/yangsy0915/p/4866917.html