Course description
We’ll look at the Spark framework for distributed code execution and at companion projects in the MapReduce paradigm. We’ll work with RDDs, DataFrames, and Datasets, and describe processing logic with Spark SQL and the DataFrame DSL. We’ll also cover loading data to and from external storage systems such as Cassandra, Kafka, PostgreSQL, and S3, as well as working with HDFS and common data formats.
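For orientation, here is a minimal, self-contained sketch (invented data, illustrative names only) of the three APIs the course compares, expressing the same word count with an RDD, with the DataFrame DSL, and with Spark SQL:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("api-comparison").getOrCreate()

# RDD: functional transformations followed by an action.
rdd = spark.sparkContext.parallelize(["spark rdd", "spark sql"])
counts = (rdd.flatMap(lambda line: line.split())
             .map(lambda word: (word, 1))
             .reduceByKey(lambda a, b: a + b))
print(counts.collect())

# DataFrame + DSL: the same logic with named columns.
df = spark.createDataFrame([("spark rdd",), ("spark sql",)], ["line"])
(df.select(F.explode(F.split("line", " ")).alias("word"))
   .groupBy("word").count()
   .show())

# Spark SQL: register a view and query it.
df.createOrReplaceTempView("lines")
spark.sql("""
    SELECT word, COUNT(*) AS cnt
    FROM (SELECT explode(split(line, ' ')) AS word FROM lines)
    GROUP BY word
""").show()
```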
Who should attend?
Prerequisites
Basic programming skills in Java, Python, or Scala, and familiarity with the Unix/Linux shell. Experience with databases is helpful but optional.
Training content
- Spark concepts and architecture
- Programming with RDDs: transformations and actions
- Using key/value pairs
- Loading and storing data
- Accumulators and broadcast variables (previewed in the sketch after this list)
- Spark SQL, DataFrames, Datasets
- Spark Streaming
- Machine learning using MLlib and Spark ML
- Graph analysis using GraphX
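As a preview of several topics above, the following minimal sketch (hypothetical data) combines key/value pair RDDs, an accumulator for counting lookup misses, and a broadcast variable for a read-only table shared across executors:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pair-rdd-sketch").getOrCreate()
sc = spark.sparkContext

country_names = sc.broadcast({"DE": "Germany", "FR": "France"})  # shared lookup
unknown = sc.accumulator(0)                                      # counts misses

events = sc.parallelize([("DE", 3), ("FR", 5), ("XX", 1), ("DE", 2)])

def resolve(pair):
    code, value = pair
    name = country_names.value.get(code)
    if name is None:
        unknown.add(1)   # accumulators are write-only on executors
        name = "unknown"
    return (name, value)

totals = events.map(resolve).reduceByKey(lambda a, b: a + b)
print(totals.collect())  # collect() is the action that triggers execution
print("unmapped codes:", unknown.value)
```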
Certification / Credits
Objectives
During the training, participants will:
- Write a Spark pipeline using functional Python and RDDs;
- Write a Spark pipeline using Python with DataFrames, the Spark DSL, and Spark SQL;
- Design pipeline architectures that combine multiple data sources;
- Write a Spark pipeline that integrates external systems (Kafka, Cassandra, PostgreSQL) and runs in parallel;
- Diagnose and resolve slow joins (see the sketch below).
After the training, participants will be able to build a simple PySpark application and run it on a cluster in parallel.
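A common fix for the slow-join objective is a broadcast hash join, which ships a small dimension table to every executor instead of shuffling both sides. Below is a minimal, self-contained sketch (invented table and column names) of the kind of simple application the course builds:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

if __name__ == "__main__":
    spark = SparkSession.builder.appName("broadcast-join-demo").getOrCreate()

    # Large fact table and small dimension table (stand-ins for real sources).
    orders = spark.createDataFrame(
        [(1, "DE", 20.0), (2, "FR", 35.5), (3, "DE", 12.0)],
        ["order_id", "country_code", "amount"])
    countries = spark.createDataFrame(
        [("DE", "Germany"), ("FR", "France")],
        ["country_code", "country_name"])

    # F.broadcast() hints the optimizer to replicate the small side,
    # avoiding a full shuffle of the large table.
    result = (orders
              .join(F.broadcast(countries), "country_code")
              .groupBy("country_name")
              .agg(F.sum("amount").alias("total_amount")))

    result.show()
    spark.stop()
```

Such a script can be submitted to a cluster with spark-submit; the master URL and deploy mode depend on your environment.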
Quick stats about Luxoft Training Center
More than 200 training courses
Conducted over 1,500 training sessions
Customized training solutions for business
Luxoft Training Center
Luxoft Training Center is an essential part of the global technology leader Luxoft, a DXC Technology Company. Emphasizing the significance of learning and employee development, we play a pivotal role in propelling B2B businesses forward by delivering customized training solutions.