Course description
Hadoop Fundamentals is a comprehensive course designed to introduce you to the core components of the Hadoop ecosystem, providing the foundational knowledge and practical skills necessary to work with big data technologies. Whether you’re a beginner or have some experience, this course will equip you with the expertise needed to effectively manage and process large-scale data using Hadoop.
Course Overview:
- Basic Concepts of Modern Data Architecture
  - Begin with an introduction to modern data architecture, focusing on the role Hadoop plays in managing and processing big data. Understand the evolution of data management technologies and how Hadoop fits into the larger ecosystem.
- HDFS: Hadoop Distributed File System
  - Delve into the architecture of HDFS, exploring how it manages distributed storage, replication, and data accessibility. Learn key commands for working with HDFS and get hands-on experience connecting to a Hadoop cluster and managing files using both the shell and Hue interface.
- The MapReduce Paradigm and Its Implementation in Java and Hadoop Streaming
  - Explore the MapReduce programming model, a core component of Hadoop for processing large datasets. Learn how to implement MapReduce in Java and through Hadoop Streaming. Practice by launching applications and observing how data is processed in a distributed environment.
- YARN: Distributed Application Execution Management
  - Understand the role of YARN in managing distributed applications within Hadoop. Learn about YARN’s architecture, how to launch applications in YARN, and monitor them through the user interface.
- Introduction to Hive
  - Discover Hive, a data warehouse infrastructure built on top of Hadoop. Learn about its architecture, table metadata, file formats, and the HiveQL query language. Practice creating tables, working with different file formats (CSV, Parquet, ORC), and executing SQL queries with aggregation and joins.
- Introduction to Spark
  - Get introduced to Apache Spark, focusing on its DataFrame/SQL API, metadata management, file formats, and data sources. Practice by reading and writing data using JDBC, CSV, and Parquet formats, and explore partitioning, query execution plans, and monitoring tasks through the Spark UI.
- Introduction to Streaming Data Processing
  - Learn about real-time data processing using Spark Streaming, Spark Structured Streaming, and Flink. Practice reading, processing, and writing data streams between Kafka, relational databases, and file systems.
- Introduction to HBase
  - Conclude with an introduction to HBase, a NoSQL database for Hadoop. Learn its architecture and query language, then practice writing and reading data through the HBase shell.
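To give a feel for the MapReduce model named in the overview, here is a minimal sketch of a word count written in the style of a Hadoop Streaming job. It is not course material: the function names `mapper`, `reducer`, and `run_local` are hypothetical, and the sort step only imitates, in a single process, the shuffle that Hadoop performs between the map and reduce phases.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reducer(sorted_pairs):
    """Reduce phase: sum the counts per key. Input must be sorted by key."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Imitate map -> shuffle/sort -> reduce inside one process (assumption:
    a real cluster would distribute these phases across many nodes)."""
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))


if __name__ == "__main__":
    sample = ["big data on hadoop", "hadoop stores big files"]
    for word, count in sorted(run_local(sample).items()):
        print(f"{word}\t{count}")
```

In an actual Hadoop Streaming job the mapper and reducer would typically be separate scripts reading from standard input and writing tab-separated key/value lines to standard output; the single-file layout here is only for readability.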
By the end of this course, participants will:
- Understand the core components of the Hadoop ecosystem and how they interact to manage big data.
- Gain practical experience with HDFS, MapReduce, YARN, Hive, Spark, and HBase.
- Develop the skills necessary to manage and process large-scale datasets using Hadoop and its associated tools.
- Apply concepts learned in real-world scenarios, including data storage, processing, and analysis.
This course offers a balanced mix of theory and practice, with 24 hours of content (13 hours of theory and 11 hours of practice). You’ll engage in hands-on exercises that complement the theoretical knowledge, ensuring you’re ready to apply Hadoop technologies in practical settings.
Prerequisites
- Basic Java programming skills and familiarity with the Unix/Linux shell. Experience with databases is optional.
- Desired requirements:
  - NoSQL/RDBMS experience
  - Understanding of Big Data concepts
Training content
1. Basic concepts of modern data architecture (1h theory)
2. HDFS: Hadoop Distributed File System (2h theory, 1h practice)
   - Architecture, replication, data in/out, HDFS commands
   - Practice (shell, Hue): connecting to a cluster, working with the file system
3. The MapReduce paradigm and its implementation in Java and Hadoop Streaming (2h theory, 1h practice)
   - Practice: launching applications
4. YARN: Distributed application execution management (1h theory, 1h practice)
   - YARN architecture, application launch in YARN
   - Practice: launching applications and monitoring the cluster through the UI
5. Introduction to Hive (2h theory, 3h practice)
   - Architecture, table metadata, file formats, HiveQL query language
   - Practice (Hue, hive, beeline, Tez UI): creating tables, reading & writing CSV, Parquet, ORC, partitioning, SQL queries with aggregation and joins
6. Introduction to Spark (2h theory, 3h practice)
   - DataFrame/SQL, metadata, file formats, data sources, RDD
   - Practice (Zeppelin, Spark UI): reading & writing from the database (JDBC), CSV, Parquet, partitioning, SQL queries with aggregation and joins, query execution plans, monitoring
7. Introduction to streaming data processing (2h theory, 1h practice)
   - Spark Streaming, Spark Structured Streaming, Flink
   - Practice: reading/processing/writing streams between Kafka, relational database and file system
8. Introduction to HBase (1h theory, 1h practice)
   - Architecture, query language
   - Practice (HBase shell): writing and reading data
Total: theory 13h (54%), practice 11h (46%)
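The totals above can be re-derived from the per-module hours. The short sketch below is only a check of that arithmetic: the shortened module names are illustrative labels, and the hour values are copied from the outline.

```python
# (theory_hours, practice_hours) per module, copied from the outline above.
# The shortened module names are illustrative labels, not official titles.
modules = {
    "Modern data architecture": (1, 0),
    "HDFS": (2, 1),
    "MapReduce": (2, 1),
    "YARN": (1, 1),
    "Hive": (2, 3),
    "Spark": (2, 3),
    "Streaming": (2, 1),
    "HBase": (1, 1),
}

theory = sum(t for t, _ in modules.values())
practice = sum(p for _, p in modules.values())
total = theory + practice

# Percentages are rounded to whole numbers, as in the outline.
print(f"theory {theory}h ({theory / total:.0%}), "
      f"practice {practice}h ({practice / total:.0%}), total {total}h")
```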
Certification / Credits
Objectives
Upon completion of the "Hadoop Fundamentals" course, trainees will be able to:
- Effectively navigate and manage the core components of the Hadoop ecosystem, including HDFS, MapReduce, YARN, Hive, and Spark.
- Implement data processing pipelines using MapReduce, HiveQL, and Spark SQL.
- Utilize HDFS and HBase for efficient data storage and retrieval.
- Process real-time data streams with Spark Streaming and Flink.
- Monitor and optimize Hadoop applications through various user interfaces.
Quick stats about Luxoft Training Center
More than 200 training courses
Conducted over 1,500 training sessions
Customized training solutions for business
Luxoft Training Center
Luxoft Training Center — an essential part of the global technology leader, Luxoft, a DXC Technology Company. We play a pivotal role in propelling B2B businesses forward by delivering customized training solutions. Emphasizing the significance of learning and employee development,...