Rack awareness in Hadoop lets the cluster know the physical topology of its nodes, that is, which rack each node sits in. With that information, HDFS can make better data placement decisions, which improves performance, fault tolerance, and network usage across the cluster. The main benefits are:
- Data Locality: Minimizes network traffic by storing data closer to the compute nodes processing it.
- Fault Tolerance: Places replicas of each block on more than one rack, so data survives the loss of an entire rack.
- Network Traffic Management: Reduces cross-rack data transfer, improving performance.
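To make the fault-tolerance point concrete, here is a minimal sketch of how HDFS's default policy spreads three replicas: the first on the writer's node, the second on a node in a different rack, and the third on another node in that same remote rack. This is a simplified illustration, not Hadoop's actual code. The function name place_replicas and the topology dictionary are hypothetical, and real HDFS picks nodes randomly, whereas this sketch picks them alphabetically so the result is repeatable.

```python
# Simplified illustration of HDFS's default replica placement (replication factor 3).
# place_replicas and the topology dict are hypothetical names for this sketch.

def place_replicas(topology, writer):
    """Return [writer, node on another rack, second node on that rack or None]."""
    local_rack = topology[writer]
    # Second replica: a node on a different rack than the writer
    # (sorted for repeatability; real HDFS chooses randomly)
    remote_nodes = sorted(n for n, r in topology.items() if r != local_rack)
    if not remote_nodes:
        raise ValueError("need at least two racks for cross-rack placement")
    second = remote_nodes[0]
    remote_rack = topology[second]
    # Third replica: another node on the same rack as the second replica.
    # Simplification: return None when that rack has no other node.
    same_remote = [n for n in remote_nodes
                   if topology[n] == remote_rack and n != second]
    third = same_remote[0] if same_remote else None
    return [writer, second, third]


topology = {
    "dn1": "/default/rack1",
    "dn2": "/default/rack2",
    "dn3": "/default/rack2",
}
print(place_replicas(topology, "dn1"))
```

With a block written from dn1, the copies end up on two racks, so losing either rack still leaves at least one replica available.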
Follow these steps to configure rack awareness in your Hadoop cluster:
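Define your rack setup in rackmap.txt. Each line maps a node's IP address or hostname to its rack path. List both the IP address and the hostname of every node, since the script may be called with either.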
rackmap.txt Example
192.168.56.109 /default/rack1 # node1
192.168.56.110 /default/rack2 # node2
192.168.56.111 /default/rack2 # node3
dn1 /default/rack1
dn2 /default/rack2
dn3 /default/rack2
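Before wiring the file into Hadoop, it can help to sanity-check it. The sketch below is one possible checker, not part of Hadoop: validate_rackmap is a hypothetical helper that flags lines without exactly two fields, rack paths that do not start with "/", hosts mapped to two different racks, and rack paths of differing depth (Hadoop expects all racks to sit at the same depth in the topology). It assumes that anything after a "#" is a comment.

```python
# Hypothetical helper for this tutorial: checks rackmap.txt content for common mistakes.

def validate_rackmap(text):
    """Return a list of human-readable problems found in rackmap.txt content."""
    problems = []
    depths = set()
    seen = {}
    for number, raw in enumerate(text.splitlines(), start=1):
        fields = raw.split("#", 1)[0].split()  # ignore trailing comments
        if not fields:
            continue  # blank or comment-only line
        if len(fields) != 2:
            problems.append(f"line {number}: expected 'host rack', got {len(fields)} fields")
            continue
        host, rack = fields
        if not rack.startswith("/"):
            problems.append(f"line {number}: rack '{rack}' should start with '/'")
        if host in seen and seen[host] != rack:
            problems.append(f"line {number}: {host} is already mapped to {seen[host]}")
        seen[host] = rack
        depths.add(rack.count("/"))
    if len(depths) > 1:
        problems.append("rack paths have different depths")
    return problems


sample = """192.168.56.109 /default/rack1 # node1
dn1 /default/rack1
dn2 /default/rack2
dn3 rack2
"""
for problem in validate_rackmap(sample):
    print(problem)
```

The last line of the sample is deliberately wrong, so the checker reports both the missing leading slash and the inconsistent depth.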
Create the Python script rackaware.py to read the rackmap.txt file and return rack details.
rackaware.py Example
#!/usr/bin/env python3
import sys

DEFAULT_RACK = "/default/default"
HOST_RACK_FILE = "/usr/local/hadoop/etc/hadoop/rackmap.txt"

host_rack = {}

# Load the rack mapping from the file
with open(HOST_RACK_FILE) as f:
    for line in f:
        # Drop trailing "# ..." comments, then split into fields
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:  # Skip empty or malformed lines
            host_rack[fields[0]] = fields[1]

# Print the rack for each host provided as a command-line argument
for host in sys.argv[1:]:
    print(host_rack.get(host, DEFAULT_RACK))

Make rackaware.py executable:
chmod +x /usr/local/hadoop/etc/hadoop/rackaware.py

Modify core-site.xml to use the new rack awareness script.
core-site.xml Example
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://manager:9000</value>
  </property>
  <property>
    <name>net.topology.script.file.name</name>
    <value>/usr/local/hadoop/etc/hadoop/rackaware.py</value>
  </property>
</configuration>

Restart your Hadoop services to apply changes:
stop-all.sh
start-all.sh

Verify the setup:
hdfs dfsadmin -printTopology
- rackaware.py: the Python script that returns the rack for each host.
- rackmap.txt: lists the IP address and hostname of every DataNode with its rack, for mapping.
- core-site.xml: the configuration that points Hadoop at the script.
- hdfs dfsadmin -printTopology: prints the resulting topology.
- Block information for rack-test.txt: hdfs fsck / -files -locations -blocks
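If you want to check the topology output automatically rather than by eye, something like the following sketch could work. It assumes the output consists of "Rack: <path>" header lines followed by indented "ip:port (hostname)" lines. The sample text, the port 9866, and the helper name parse_topology are illustrative assumptions, so compare them against what your own cluster actually prints.

```python
# Hypothetical helper: parses text shaped like `hdfs dfsadmin -printTopology` output.
# The sample below is an assumed format, not captured from a real cluster.

def parse_topology(output):
    """Map each DataNode IP to the rack it is listed under."""
    racks = {}
    current = None
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("Rack:"):
            current = line.split(":", 1)[1].strip()
        elif line and current is not None:
            # First token is assumed to be "ip:port"; keep only the IP part
            ip = line.split()[0].rsplit(":", 1)[0]
            racks[ip] = current
    return racks


sample = """\
Rack: /default/rack1
   192.168.56.109:9866 (dn1)

Rack: /default/rack2
   192.168.56.110:9866 (dn2)
   192.168.56.111:9866 (dn3)
"""

# Expected mapping, taken from the rackmap.txt example
expected = {
    "192.168.56.109": "/default/rack1",
    "192.168.56.110": "/default/rack2",
    "192.168.56.111": "/default/rack2",
}

actual = parse_topology(sample)
mismatches = {ip: (rack, actual.get(ip))
              for ip, rack in expected.items() if actual.get(ip) != rack}
print(mismatches or "topology matches rackmap.txt")
```

On a real cluster you would feed the command's captured output into parse_topology instead of the sample string.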
Crafted by: Suraj Kumar Choudhary | Feel free to DM for any help: csuraj982@gmail.com




