0%

Hadoop API

记录自己工作中遇到的 Hadoop API。 可以当作字典来查找


前言

记录自己工作中遇到的 Hadoop API。 可以当作字典来查找

HDSF

HDFS 获取目录大小

获取文件大小,在命令行上,使用 hadoop fs -du 命令可以,但是通过 Java API 怎么获取呢

最开始想到的是递归的方法,这个方法很慢,后来发现 FileSystem.getContentSummary 的方法。

参考自java api获取hdfs目录大小

1
fs.getContentSummary(new Path(PathSrc)).getLength();
获取 HDFS 存储信息

FsStatus 对象提供了获取 hadoop 总空间、已使用空间、剩余空间 的方法。使用方式如下

1
2
3
4
5
FileSystem fileSystem = FileSystem.get(new URI(hdfsURL), configuration);
FsStatus fsStatus = fileSystem.getStatus();
long capacity = fsStatus.getCapacity(); //Configured Capacity
long remaining = fsStatus.getRemaining(); //DFS Remaining
long used = fsStatus.getUsed(); // DFS Used
获取 NN 的 active 节点

是使用 Hadoop 源码提供的 HAUtil 工具类来做的

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.HAUtil;

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;

/**
* @author lihm
* @date 2019-07-26 09:48
* @description TODO
*/

public class DemoApplication {

public void getNameNodeAdress() throws Exception {
Configuration conf = new Configuration();
String nameservices = "testcluster";
String[] namenodesAddr = {"massive-dataset-new-001:8020", "massive-dataset-new-008:8020"};
String[] namenodes = {"nn1", "nn2"};
conf.set("fs.defaultFS", "hdfs://" + nameservices);
conf.set("dfs.nameservices", nameservices);
conf.set("dfs.ha.namenodes." + nameservices, namenodes[0] + "," + namenodes[1]);
conf.set("dfs.namenode.rpc-address." + nameservices + "." + namenodes[0], namenodesAddr[0]);
conf.set("dfs.namenode.rpc-address." + nameservices + "." + namenodes[1], namenodesAddr[1]);
conf.set("dfs.client.failover.proxy.provider." + nameservices, "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
String hdfsRPCUrl = "hdfs://" + nameservices + ":" + 8020;

FileSystem fs = null;
try {
fs = FileSystem.get(new URI(hdfsRPCUrl), conf);
InetSocketAddress active = HAUtil.getAddressOfActive(fs);
System.out.println(active.getHostString());
// System.out.println("hdfs port:" + active.getPort());
// InetAddress address = active.getAddress();
// System.out.println("hdfs://" + address.getHostAddress() + ":"+ active.getPort()); 
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
if (fs != null) {
fs.close();
}
} catch (IOException e) {
e.printStackTrace();
}
}
}

public static void main(String[] args) {
DemoApplication nn = new DemoApplication() ;
try {
nn.getNameNodeAdress();
} catch (Exception e) {
e.printStackTrace();
}
}
}

参考自通过api获取NN的active节点