• Integrating Sqoop 1.4.6 with Spring Boot for bidirectional data synchronization between relational databases and Hive


    First install and verify Hive, Hadoop, HBase, and Sqoop on your server (search online if you are not familiar with the setup). Sqoop currently comes in two lines: Sqoop1 (1.4.x) and Sqoop2 (1.99.x); Sqoop1 is used here (the reasons for avoiding Sqoop2 are well documented online). This post covers integrating Sqoop 1.4.6 with Spring Boot to synchronize data between relational databases and Hive in both directions, so let's get straight into it.

    Dependencies
    <!-- sqoop -->
    <dependency>
        <groupId>org.apache.sqoop</groupId>
        <artifactId>sqoop</artifactId>
        <version>1.4.6</version>
    </dependency>
    <!-- hadoop -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.10.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.10.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>2.10.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
        <version>2.10.0</version>
        <scope>test</scope>
    </dependency>
    One thing to emphasize: the same tools, at the same versions, must also be installed on your local Windows machine, with the corresponding HOME environment variables configured, and the Maven dependency versions must match the versions installed on the server. Otherwise the integration fails with all sorts of obscure errors for which solutions are hard to find online. Once the environment is in place, create a hive-site.xml file in the root of your project (the file can be copied straight from your Hive installation).

    Contents of hive-site.xml

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>

        <property>
            <name>javax.jdo.option.ConnectionURL</name>
            <value>jdbc:mysql://your-metastore-db-host:3306/metastore?createDatabaseIfNotExist=true</value>
            <description>JDBC connect string for a JDBC metastore</description>
        </property>

        <property>
            <name>javax.jdo.option.ConnectionDriverName</name>
            <value>com.mysql.jdbc.Driver</value>
            <description>Driver class name for a JDBC metastore</description>
        </property>

        <property>
            <name>javax.jdo.option.ConnectionUserName</name>
            <value>your metastore database username</value>
            <description>username to use against metastore database</description>
        </property>

        <property>
            <name>javax.jdo.option.ConnectionPassword</name>
            <value>your metastore database password</value>
            <description>password to use against metastore database</description>
        </property>

        <property>
            <name>hive.cli.print.header</name>
            <value>true</value>
        </property>

        <property>
            <name>hive.cli.print.current.db</name>
            <value>true</value>
        </property>

        <property>
            <name>hive.metastore.warehouse.dir</name>
            <value>/warehouse</value>
            <description>location of default database for the warehouse</description>
        </property>

        <property>
            <name>hive.metastore.schema.verification</name>
            <value>false</value>
            <description>
                Enforce metastore schema version consistency.
                True: Verify that version information stored in the metastore is compatible with the Hive jars. Also disable
                automatic schema migration. Users are required to manually migrate the schema after a Hive upgrade, which ensures
                proper metastore schema migration. (Default)
                False: Warn if the version information stored in the metastore doesn't match the Hive jars.
            </description>
        </property>

        <property>
            <name>hive.metastore.schema.verification.record.version</name>
            <value>false</value>
            <description>
                When true the current MS version is recorded in the VERSION table. If this is disabled and verification is
                enabled the MS will be unusable.
            </description>
        </property>

        <property>
            <name>hive.server2.thrift.port</name>
            <value>10000</value>
        </property>

        <property>
            <name>hive.server2.thrift.bind.host</name>
            <value>the host where HiveServer2 runs</value>
        </property>

    </configuration>
    At this point the Spring Boot + Sqoop integration is essentially done; what remains is your own business development. A minimal sketch of picking up the configuration from code follows.
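    Before running Sqoop from code it is worth making sure the cluster configuration files are actually picked up. Below is a minimal sketch, assuming hive-site.xml (and, optionally, core-site.xml and hdfs-site.xml copied from the cluster) sit on the classpath under src/main/resources; the class name and the file set are assumptions for illustration, not something prescribed by the original post.

    // Hypothetical helper: build a Configuration that loads the cluster config
    // files from the classpath (assumes they were copied into src/main/resources).
    import org.apache.hadoop.conf.Configuration;

    public class HadoopConfigFactory {
        public static Configuration build() {
            Configuration conf = new Configuration();
            // Resources are resolved from the classpath; the file names are assumptions.
            conf.addResource("core-site.xml");
            conf.addResource("hdfs-site.xml");
            conf.addResource("hive-site.xml");
            return conf;
        }
    }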

    A simple import/export example
    // Required imports:
    // import java.io.IOException;
    // import org.apache.hadoop.conf.Configuration;
    // import org.apache.hadoop.fs.FileSystem;
    // import org.apache.hadoop.fs.Path;
    // import org.apache.sqoop.Sqoop;
    // import org.apache.sqoop.tool.SqoopTool;
    public static void main(String[] args) throws IOException {
        String[] oracle1 = new String[] {
                "--connect", "jdbc:oracle:thin:@//xxx.xxx.x.xx:1521/orcl",
                "-username", "xxxx",
                "-password", "xxxx",
                "--table", "xxxx",
                "--hive-import",
                "--hive-database", "db_test",
                "--create-hive-table",
                "--fields-terminated-by", "\t",   // tab as the field delimiter
                "-m", "1",
        };

        SqoopTool tool = SqoopTool.getTool("import");    // relational database -> Hive
        // SqoopTool tool = SqoopTool.getTool("export"); // Hive -> relational database
        Configuration conf = new Configuration();
        // NameNode address. It cannot be written here as a raw IP; the host name has to be
        // mapped in the hosts file (the exact reason is unclear).
        conf.set("fs.default.name", "hdfs://hadoop001:9000/");
        Configuration loadPlugins = SqoopTool.loadPlugins(conf);
        Sqoop sqoop = new Sqoop((com.cloudera.sqoop.tool.SqoopTool) tool, loadPlugins);

        FileSystem fileSystem = FileSystem.get(conf);
        Path path = fileSystem.getHomeDirectory();
        // Delete the HDFS home directory (where Sqoop stages the imported data) if it
        // already exists; "true" deletes it recursively even if it is not empty.
        if (fileSystem.exists(path)) {
            fileSystem.delete(path, true);
        }

        int res = Sqoop.runSqoop(sqoop, oracle1);
        System.out.println(res);
        System.out.println("Sqoop run finished");
    }
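    For the opposite direction (Hive -> relational database) the tool name switches to "export" and the argument array uses the export flags instead. The following is only a hedged sketch: the export directory assumes the Hive table lives under the /warehouse directory configured in hive-site.xml above, and the connection details, table name, and delimiter are placeholders.

    String[] export1 = new String[] {
            "--connect", "jdbc:oracle:thin:@//xxx.xxx.x.xx:1521/orcl",
            "--username", "xxxx",
            "--password", "xxxx",
            "--table", "xxxx",                              // target table in the relational database
            "--export-dir", "/warehouse/db_test.db/xxxx",   // HDFS directory backing the Hive table (assumed path)
            "--input-fields-terminated-by", "\t",           // must match the delimiter used on import
            "-m", "1",
    };
    // Run it the same way as the import:
    // SqoopTool exportTool = SqoopTool.getTool("export");
    // Sqoop exportJob = new Sqoop((com.cloudera.sqoop.tool.SqoopTool) exportTool, SqoopTool.loadPlugins(conf));
    // int rc = Sqoop.runSqoop(exportJob, export1);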
    Sqoop1 supports synchronization not only between relational databases and Hive, but also with HBase and HDFS; an HBase variant of the argument list is sketched below.
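    As a rough illustration of the HBase case, the same import tool can write straight into an HBase table by swapping the Hive flags for HBase ones. The table, column family, and row-key names below are placeholders and do not come from the original post.

    String[] hbase1 = new String[] {
            "--connect", "jdbc:oracle:thin:@//xxx.xxx.x.xx:1521/orcl",
            "--username", "xxxx",
            "--password", "xxxx",
            "--table", "xxxx",
            "--hbase-table", "hb_test",   // target HBase table (placeholder name)
            "--column-family", "cf",      // column family to write into (placeholder)
            "--hbase-row-key", "ID",      // source column used as the HBase row key (placeholder)
            "--hbase-create-table",       // create the HBase table if it does not exist
            "-m", "1",
    };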

    For the full set of Sqoop 1.4.6 commands, see the Sqoop 1.4.6 documentation.

    Here is an odd problem worth recording.
    Error: Hadoop complains that it cannot find JAVA_HOME.

    Fix: open etc/hadoop/hadoop-env.cmd under your local Hadoop installation and change the line to set JAVA_HOME=C:\PROGRA~1\java\jdk (note: do not write the path in its "Program Files" form; the space in that directory name makes Hadoop misread the path, so use the 8.3 short name C:\PROGRA~1 instead). If Hadoop still cannot locate the local installation at runtime, a programmatic fallback is sketched below.
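    Related to the Windows environment issues above: if the Hadoop client still cannot find the local installation (typically reported as HADOOP_HOME / hadoop.home.dir being unset, or a missing winutils.exe), the home directory can also be set programmatically before any Hadoop class is used. A minimal sketch, assuming Hadoop 2.10.0 is unpacked at C:\hadoop-2.10.0 (the path is an assumption):

    // Run this before the first use of any Hadoop class, e.g. at the top of main()
    // or in a static initializer. The path below is an assumed local install location.
    static {
        if (System.getProperty("os.name").toLowerCase().contains("windows")) {
            System.setProperty("hadoop.home.dir", "C:\\hadoop-2.10.0");
        }
    }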


    Copyright notice: this is an original article by the CSDN blogger 「一个不称职的程序猿」, licensed under CC 4.0 BY-SA; please include the original link and this notice when reposting.
    Original link: https://blog.csdn.net/qq_39035773/article/details/107961561
