Using the Cloudera Manager API

A Quick Guide to the Cloudera Manager API

Preface

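Hadoop is an open-source project, so many companies have built commercial offerings on top of it. Cloudera has made its own modifications to Hadoop,
and we refer to Cloudera's distribution as CDH (Cloudera Distribution Hadoop).

My company uses CDH: it is quick and convenient to deploy, it visualizes metrics, and its health checks are quite good.

Cloudera Manager is the management platform for CDH. By connecting to Cloudera Manager we can pull information from it,
and then use that information to drive monitoring, alerting, and restart-on-crash operations.

We previously used CDH 5.7.1, where the API worked out of the box. Our new cluster runs CDH 6.3.1, whose API has been reworked into a Swagger-based client,
and following the old approach led to all sorts of pitfalls.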

Usage

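Cloudera Manager API official site

The examples on the official site are mostly sufficient; just make sure to select the API version that matches your cluster.

You can check the API version directly against your CM address at http://host:port/api/version (this comes from the first link).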
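
As a minimal sketch of that version check, the snippet below builds the URL and requests it with HTTP basic authentication using only the standard library. The helper names (version_url, basic_auth_header, fetch_api_version) are hypothetical, and it assumes the endpoint answers with a short plain-text version string such as v30; verify this against your own CM instance.

```python
import base64

try:  # Python 3
    from urllib.request import Request, urlopen
except ImportError:  # Python 2.7
    from urllib2 import Request, urlopen


def version_url(host, port, scheme="http"):
    """Build the version URL, e.g. http://cmhost:7180/api/version."""
    return "{0}://{1}:{2}/api/version".format(scheme, host, port)


def basic_auth_header(username, password):
    """Return the value of an HTTP basic Authorization header."""
    raw = "{0}:{1}".format(username, password).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def fetch_api_version(host, port, username, password, timeout=10):
    """Ask CM for its API version (assumed to be plain text such as 'v30')."""
    request = Request(version_url(host, port))
    request.add_header("Authorization", basic_auth_header(username, password))
    response = urlopen(request, timeout=timeout)
    try:
        return response.read().decode("utf-8").strip()
    finally:
        response.close()


# Example call (needs a reachable CM host and a valid account):
# print(fetch_api_version("192.168.1.240", 7180, "hadoop", "xxxx"))
```

The returned string can then be used as the api_version segment of the base URL, so the client and server agree on the API version.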

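The Swagger approach is quite different from the old one, so I am saving the Python demo I wrote here for future reference. Note that you should register a read-only account for API access.

Python 2.7.9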

#!/usr/bin/python
# -*- coding: utf-8 -*-

# Make print(a, b) behave the same on Python 2.7 and Python 3
from __future__ import print_function

import cm_client
from cm_client.rest import ApiException

# Configure HTTP basic authorization (use a read-only account)
cm_client.configuration.username = 'hadoop'
cm_client.configuration.password = 'xxxx'

# Construct the base URL for the API, e.g. http://cmhost:7180/api/v30
api_host = 'http://192.168.1.240'
port = '7180'
api_version = 'v30'
api_url = api_host + ':' + port + '/api/' + api_version

# Create the API client and the resource APIs used below
api_client = cm_client.ApiClient(api_url)
cluster_api_instance = cm_client.ClustersResourceApi(api_client)
services_api_instance = cm_client.ServicesResourceApi(api_client)
role_api_instance = cm_client.RolesResourceApi(api_client)

try:
    # List all known clusters
    api_response = cluster_api_instance.read_clusters(view='SUMMARY')
    for cluster in api_response.items:
        print(cluster.name, "-", cluster.full_version)
        if not cluster.full_version.startswith("6."):
            continue
        # List every service in the CDH 6 cluster
        services = services_api_instance.read_services(cluster.name, view='FULL')
        for service in services.items:
            print(service.name, "-", service.type)
            if service.type != 'HDFS':
                continue
            # Print the state and health checks of the HDFS service
            hdfs = service
            print(hdfs.name, hdfs.service_state, hdfs.health_summary)
            for health_check in hdfs.health_checks or []:
                print(health_check.name, "---", health_check.summary)
            # Print the state and health checks of each NameNode role
            roles = role_api_instance.read_roles(cluster.name, hdfs.name)
            for role in roles.items:
                if role.type == 'NAMENODE':
                    nn = role
                    print(nn.name, nn.role_state, nn.health_summary, nn.host_ref.host_id)
                    for hc in nn.health_checks or []:
                        print(hc.name, "---", hc.summary)
            # To restart the service, uncomment the two lines below
            # cmd = services_api_instance.restart_command(cluster.name, hdfs.name)
            # print(cmd.active)
except ApiException as e:
    print("Exception when calling the Cloudera Manager API: %s" % e)
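
The demo only prints health information. As a hedged sketch of the next step, the snippet below picks out roles whose health summary looks unhealthy, which is the kind of list a restart or alerting script would act on. The Role namedtuple is a hypothetical stand-in for the role objects returned by read_roles(), modelling only the attributes printed in the demo, and treating both BAD and CONCERNING as unhealthy is an assumption to adjust to your own policy.

```python
from collections import namedtuple

# Hypothetical stand-in for the role objects returned by read_roles();
# only the attributes used in the demo are modelled here.
Role = namedtuple("Role", ["name", "type", "role_state", "health_summary"])

# Assumption: these two health summaries count as "unhealthy"
UNHEALTHY = ("BAD", "CONCERNING")


def find_unhealthy_roles(roles, role_types=None):
    """Return the names of unhealthy roles, optionally limited to some types."""
    selected = []
    for role in roles:
        # Skip role types the caller is not interested in
        if role_types is not None and role.type not in role_types:
            continue
        if role.health_summary in UNHEALTHY:
            selected.append(role.name)
    return selected


sample = [
    Role("nn1", "NAMENODE", "STARTED", "GOOD"),
    Role("nn2", "NAMENODE", "STOPPED", "BAD"),
    Role("dn1", "DATANODE", "STARTED", "CONCERNING"),
]
print(find_unhealthy_roles(sample))                           # ['nn2', 'dn1']
print(find_unhealthy_roles(sample, role_types={"NAMENODE"}))  # ['nn2']
```

Because the function only reads attributes, it should also accept the real role objects from roles.items, as long as they expose type, name, and health_summary as in the demo.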

Use cases

Since I started operating CDH clusters, the use cases have come down to monitoring and alerting:

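Use Python scripts to collect information and feed it into a third-party alerting system such as open-falcon
Use Java to fetch metrics and feed them into our in-house management platform, which visualizes cluster metrics
Use Java to detect roles that exited abnormally and restart them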
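
For the first use case, the following is a rough sketch of turning role health summaries into the JSON items that an open-falcon agent accepts on its push interface (commonly http://127.0.0.1:1988/v1/push). The metric name cdh.role.health, the numeric mapping, and the function name to_falcon_metrics are all assumptions made for illustration, not something taken from my production scripts.

```python
import json
import time

# Assumed numeric encoding of CM health summaries; unknown values map to -1
HEALTH_TO_VALUE = {"GOOD": 0, "CONCERNING": 1, "BAD": 2}


def to_falcon_metrics(health_by_role, endpoint, timestamp=None, step=60):
    """Convert {role_name: health_summary} into open-falcon push items."""
    if timestamp is None:
        timestamp = int(time.time())
    metrics = []
    for role_name in sorted(health_by_role):
        summary = health_by_role[role_name]
        metrics.append({
            "endpoint": endpoint,          # host the metric is reported for
            "metric": "cdh.role.health",   # hypothetical metric name
            "timestamp": timestamp,
            "step": step,                  # reporting interval in seconds
            "value": HEALTH_TO_VALUE.get(summary, -1),
            "counterType": "GAUGE",
            "tags": "role={0}".format(role_name),
        })
    return metrics


payload = to_falcon_metrics(
    {"hdfs-NAMENODE-1": "GOOD", "hdfs-DATANODE-1": "BAD"},
    endpoint="cm-host",
    timestamp=1600000000,
)
print(json.dumps(payload, indent=2))
```

The resulting list can be sent as the JSON body of a POST to the agent; alert thresholds (for example value >= 2) are then configured on the open-falcon side.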

References

cm_api.api_client.ApiException: (error 404)