Hue is an open-source UI system for Apache Hadoop. It evolved from Cloudera Desktop, which Cloudera later contributed to the Hadoop community under the Apache foundation, and it is built on the Python web framework Django. With Hue, you can interact with a Hadoop cluster from a browser-based web console to analyze and process data. For its own data, including user authentication and authorization, Hue uses a SQLite database by default; it can also be configured to use MySQL, PostgreSQL, or Oracle.
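As a minimal sketch, switching Hue from SQLite to MySQL is a matter of editing the `[[database]]` section of hue.ini (the host, credentials, and database name below are placeholders for your own environment):

```ini
[desktop]
  [[database]]
    # Placeholder values -- substitute your own MySQL host and credentials
    engine=mysql
    host=127.0.0.1
    port=3306
    user=hue
    password=huepassword
    name=hue
```

After switching backends, Hue's schema still has to be created in the new database; the manual linked in the installation section below covers the migration steps.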
Contents:
- Features (demo: http://gethue.com/)
- Installation and deployment
- Installing CDH on Azure
Features
- HDFS access: browse and inspect HDFS data from the browser
- Hive editor: write and run HQL scripts and view their results, along with related Hive functionality (a sample query follows this list)
- A Solr search application, with corresponding visual data views and dashboards
- An Impala application for interactive data queries
- The latest versions integrate a Spark editor and dashboard
- A Pig editor that can run the scripts you write
- An Oozie scheduler: submit and monitor Workflows, Coordinators, and Bundles from the dashboard
- HBase support: query, modify, and visualize data
- Metastore browsing: access Hive metadata and the corresponding HCatalog
- Job browsing, plus support for Sqoop, ZooKeeper, and more
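As a small illustration of the Hive editor, a query like the following can be typed into Hue's query editor and run directly against the cluster (the `web_logs` table here is illustrative; substitute any table you have):

```sql
-- Top 10 cities by number of requests; table and column names are illustrative
SELECT city, COUNT(*) AS requests
FROM web_logs
GROUP BY city
ORDER BY requests DESC
LIMIT 10;
```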
Installation and deployment
- For the installation and configuration procedure, see: http://cloudera.github.io/hue/docs-3.6.0/manual.html
- Hue supports many components and pulls in many dependencies, so installation gets painful when the system environment is missing something. For example, `make` builds its own virtual Python environment, which can diverge from the system defaults and cause problems during compilation and installation (see the build sketch after this list).
- The simplest installation route is of course the CDH RPM packages, but those assume the full CDH cluster stack; with an existing cluster already in place that is hardly reasonable, so it is rarely a viable option.
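For reference, a minimal sketch of a source build, assuming the usual build prerequisites (gcc, Python development headers, libxml2/libxslt, database client headers, etc.) are already installed; the bind address below is just an example:

```sh
# Fetch the source and build; `make apps` creates Hue's own
# virtualenv under build/env and compiles all bundled apps
git clone https://github.com/cloudera/hue.git
cd hue
make apps

# Start the development server from the virtualenv it just built
build/env/bin/hue runserver 0.0.0.0:8000
```

This is the development-server route; for production, the manual linked above describes installing with `PREFIX=... make install` and running under the supervisor instead.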
Installing CDH on Azure
- Go to https://ms.portal.azure.com
- Click on resource groups on the left navigation bar
- Enter a name for your resource group, pick the subscription and availability region, and click on “Create”. This will create a resource group that we will use in the cluster setup (an Azure CLI equivalent is sketched after this list).
- Click on “New”, then on “Data + Analytics” and then on “Cloudera Enterprise Data Hub”
- In the blade that opens up, under “Select deployment model”, click on “Resource Manager”, then click “Create”.
- In the blade that opens, click on “Basics, Configure basic settings”. Here, enter the following: user name (Linux user), password, ...
- Next, click on “Infrastructure information” and review the settings; a few can be customized, and the rest can be left at their defaults.
- Next, click on “Cloudera setup information”. Here, enter the following: Cloudera Manager user name, password, cluster type (two options: POC and Production), and the number of data nodes.
- Click on “User information” and enter some details about yourself.
- Click on “Buy” and then “Create”. This will provision the cluster.
- Step away for a long break; at the time this post was written, provisioning took more than an hour. You can monitor the progress from the portal.
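If you prefer scripting the first step, the resource group from earlier can also be created with the Azure CLI; this assumes the `az` CLI is installed and logged in, and the group name and region below are placeholders:

```sh
# Create the resource group that the Cloudera deployment will live in
az group create --name cdh-cluster-rg --location westus
```

The Cloudera Enterprise Data Hub offer itself is then deployed from the Marketplace blade as described above.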
Nodes and Roles
- In the setup, we entered 3 data nodes and selected Production. The following are the nodes and the roles running on them:
Connecting to the cluster