Course description
Developing With Spark for Big Data | Enterprise-Grade Spark Programming for the Hadoop & Big Data Ecosystem
Apache Spark, a significant component in the Hadoop ecosystem, is a cluster computing engine used in Big Data. Building on top of Hadoop YARN and HDFS, it offers order-of-magnitude faster processing than Map/Reduce for many in-memory computing tasks. It can be programmed in Java, Scala, Python, and R - the favorite languages of data scientists - along with SQL-based front ends. With advanced libraries such as Mahout and MLlib for machine learning, GraphX or Neo4j for rich graph data processing, as well as access to other NoSQL data stores, rule engines and other enterprise components, Spark is a linchpin in modern Big Data and Data Science computing.
Geared for experienced developers, Developing with Spark for Big Data is an intermediate-level (and beyond) course that provides students with a comprehensive, hands-on exploration of enterprise-grade Spark programming, interacting with the significant components mentioned above to craft complete data science solutions. Students will leave this course armed with the skills they need to work with Spark at an advanced level in a practical, real-world environment.
NOTE: Students newer to data science or with a lighter development background should consider TTSK7503 Spark Developer | Introduction to Spark for Big Data, Hadoop & Machine Learning, our three-day subset of this course, as an alternative.
This course is offered in support of the Java programming language, with alternative editions available in R, Python and Scala. Our team will work with you to coordinate the languages, tools and environment that will work best for your organization and needs.
Course Topics: This is a high-level list of the course topics covered in this training. Please see the detailed Course Agenda with session details, lessons and labs listed below:
- Spark Overview
- Spark Component Overview
- RDDs: Resilient Distributed Datasets
- DataFrames
- Spark Applications
- DataFrame Persistence
- Distributed Persistence
- Spark Streaming
- Accessing NoSQL Data
- Enterprise Integration
- Algorithms and Patterns
- Spark SQL
- GraphX
- Alternate Languages (R, Python, Scala, Web Notebooks)
- Clustering Spark for Developers
- Performance and Tuning
Learning Objectives
This course provides immersion in the practical use of the umbrella of technologies at the leading edge of data science development, focused on Spark and related tools. Working in a hands-on learning environment, students will learn:
- The essentials of Spark architecture and applications
- How to execute Spark Programs
- How to create and manipulate both RDDs (Resilient Distributed Datasets) and the unified DataFrames introduced in Spark 2
- How to persist and restore data frames
- Essential NoSQL access
- How to integrate machine learning into Spark applications
- How to use Spark Streaming and Kafka to create streaming applications
Who should attend?
This intermediate-level course is geared for experienced developers seeking proficiency in Spark tools & technologies. Attendees should be experienced developers who are comfortable with Java, Scala or Python programming. Students should also be able to navigate the Linux command line and have basic knowledge of Linux editors (such as vi / nano) for editing code.
Take Before: Students should have attended the course(s) below, or should have basic skills in these areas:
- TT2104 Java Programming Fundamentals (for the Java-supported course flavor)
- TTPS4800 Introduction to Python Programming (for the Python-supported course flavor)
- TTSQLB3 Introduction to SQL (basic familiarity is needed for all editions)
Training content
Spark Overview
- Hadoop Ecosystem
- Hadoop YARN vs. Mesos
- Spark vs. Map/Reduce
- Spark with Map/Reduce: Lambda Architecture
- Spark in the Enterprise Data Science Architecture
Spark Component Overview
- Spark Shell
- RDDs: Resilient Distributed Datasets
- DataFrames
- Spark 2 Unified DataFrames
- Spark Sessions
- Functional Programming
- Spark SQL
- MLlib
- Structured Streaming
- Spark R
- Spark and Python
RDDs: Resilient Distributed Datasets
- Coding with RDDs
- Transformations
- Actions
- Lazy Evaluation and Optimization
- RDDs in Map/Reduce
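The lazy-evaluation idea listed above can be sketched without a Spark cluster: transformations only describe work, and nothing runs until an action forces it. Below is a plain-Python analogy using generators (the data values and lambdas are illustrative, not from the course materials); in real Spark the equivalent calls would be `rdd.map(...)`, `rdd.filter(...)` and `rdd.collect()`.

```python
# Plain-Python analogy of RDD laziness (no Spark required).
data = range(1, 6)  # stand-in for an RDD of [1, 2, 3, 4, 5]

# "Transformations": generators are created but nothing is computed yet.
squared = (x * x for x in data)             # like rdd.map(lambda x: x * x)
evens = (x for x in squared if x % 2 == 0)  # like .filter(lambda x: x % 2 == 0)

# "Action": materializing the pipeline finally runs all the steps at once.
result = list(evens)                        # like rdd.collect()
print(result)  # [4, 16]
```

As with Spark's lineage-based optimization, chaining the lazy steps lets the whole pipeline run in a single pass over the data.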
DataFrames
- RDDs vs. DataFrames
- Unified DataFrames in Spark 2.0
- Partitioning
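The partitioning topic above can also be illustrated in plain Python: Spark's default hash partitioning assigns each row to a partition by hashing its key, so all rows with the same key land together. This is a simplified sketch (the partition count of 4 and the sample rows are arbitrary choices for illustration), not Spark's actual implementation.

```python
# Simplified sketch of hash partitioning, the scheme Spark uses by default
# to distribute keyed rows across partitions/executors.
NUM_PARTITIONS = 4  # arbitrary choice for this sketch


def partition_for(key, num_partitions=NUM_PARTITIONS):
    # Spark's HashPartitioner is essentially hash(key) mod numPartitions.
    return hash(key) % num_partitions


rows = [("alice", 1), ("bob", 2), ("alice", 3), ("carol", 4)]
partitions = {i: [] for i in range(NUM_PARTITIONS)}
for key, value in rows:
    partitions[partition_for(key)].append((key, value))

# Every row sharing a key ends up in the same partition, which is why
# per-key operations (grouping, joins) can run without further shuffling.
```

This co-location of keys is what makes partition-aware operations cheap, and why repartitioning (a shuffle) is one of the more expensive steps in a Spark job.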
Spark Applications
- Spark Sessions
- Running Applications
- Logging
DataFrame Persistence
- RDD Persistence
- DataFrame and Unified DataFrame Persistence
Distributed Persistence
Spark Streaming
- Streaming Overview
- Streams
- Structured Streaming
- DStreams and Apache Kafka
Accessing NoSQL Data
- Ingesting data
- Parquet Files
- Relational Databases
- Graph Databases (Neo4j, GraphX)
- Interacting with Hive
- Accessing Cassandra Data
- Document Databases (MongoDB, CouchDB)
Enterprise Integration
- Map/Reduce and Lambda Integration
- Camel Integration
- Drools and Spark
Algorithms and Patterns
- MLlib and Mahout
- Classification
- Clustering
- Decision Trees
- Decompositions
- Pipelines
- Spark Packages
Spark SQL
- Spark SQL
- SQL and DataFrames
- Spark SQL and Hive
- Spark SQL and JDBC
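The pattern behind the Spark SQL topics above is running SQL over in-memory tabular data: in PySpark this is `df.createOrReplaceTempView("people")` followed by `spark.sql("SELECT ...")`. The same idea can be sketched with the standard library's sqlite3 module (the `people` table and its rows are made up for illustration; this is an analogy, not Spark code).

```python
import sqlite3

# sqlite3 analogy for Spark SQL: register tabular data, then query it
# with SQL instead of programmatic DataFrame operations.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [("alice", 34), ("bob", 28), ("carol", 41)])

# Like spark.sql("SELECT name FROM people WHERE age > 30 ORDER BY name")
rows = conn.execute(
    "SELECT name FROM people WHERE age > 30 ORDER BY name").fetchall()
print(rows)  # [('alice',), ('carol',)]
conn.close()
```

In Spark the same SQL text would be planned and optimized by Catalyst and executed across the cluster, but the developer-facing workflow - register a view, issue SQL, get rows back - is the same shape.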
GraphX
- Graph APIs
- GraphX
- ETL in GraphX
- Exploratory Analysis
- Graph computation
- Pregel API Overview
- GraphX Algorithms
- Neo4j as an alternative
Alternate Languages
- Using Web Notebooks (Zeppelin, Jupyter)
- R on Spark
- Python on Spark
- Scala on Spark
Clustering Spark for Developers
- Parallelizing Spark Applications
- Clustering concerns for Developers
Performance and Tuning
- Monitoring Spark Performance
- Tuning Memory
- Tuning CPU
- Tuning Data Locality
- Troubleshooting
Course delivery details
Student Materials: Each participant will receive a Student Guide with course notes, code samples, software tutorials, step-by-step written lab instructions, diagrams and related reference materials and resource links. Students will also receive the project files (or code, if applicable) and solutions required for the hands-on work.
Hands-On Setup Made Simple! Our dedicated tech team will work with you to ensure our ‘easy-access’ cloud-based course environment is accessible, fully tested and verified as ready to go well in advance of the course start date, ensuring a smooth start to class and an effective learning experience for all participants. Please inquire for details and options.
Costs
- Price: $2,695.00
- Discounted Price: $1,751.75
Quick stats about Trivera Technologies LLC
Over 25 years of technology training expertise.
Robust portfolio of over 1,000 leading edge technology courses.
Guaranteed to run courses and flexible learning options.
Contact this provider
Trivera Technologies
Trivera Technologies is an IT education services & courseware firm that offers a wide range of professional technical education services including: end-to-end IT training development and delivery, skills-based mentoring programs, new hire training and re-skilling services, courseware licensing and...