Cloudera University’s four-day data analyst training course, which focuses on Apache Pig, Apache Hive, and Cloudera Impala, teaches you to apply traditional data analytics and business intelligence skills to big data. Cloudera presents the tools data professionals need to access, manipulate, transform, and analyze complex data sets using SQL and familiar scripting languages.
Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the Hadoop ecosystem, learning topics such as:
The features that Pig, Hive, and Impala offer for data acquisition, storage, and analysis
The fundamentals of Apache Hadoop and data ETL (extract, transform, load), ingestion, and processing with Hadoop
How Pig, Hive, and Impala improve productivity for typical analysis tasks
Joining diverse datasets to gain valuable business insight
Distributed Data Processing: YARN, MapReduce, and Spark
Data Processing and Analysis: Pig, Hive, and Impala
Data Integration: Sqoop
Other Hadoop Data Tools
Exercise Scenarios Explanation
Introduction to Pig
What Is Pig?
Pig Use Cases
Interacting with Pig
Basic Data Analysis with Pig
Pig Latin Syntax
Simple Data Types
Viewing the Schema
Filtering and Sorting Data
Processing Complex Data with Pig
Complex/Nested Data Types
Built-In Functions for Complex Data
Iterating Grouped Data
Multi-Dataset Operations with Pig
Techniques for Combining Data Sets
Joining Data Sets in Pig
Splitting Data Sets
Pig Troubleshooting and Optimization
Using Hadoop’s Web UI
Data Sampling and Debugging
Understanding the Execution Plan
Tips for Improving the Performance of Your Pig Jobs
Introduction to Hive and Impala
What Is Hive?
What Is Impala?
Schema and Data Storage
Comparing Hive to Traditional Databases
Hive Use Cases
Querying with Hive and Impala
Databases and Tables
Basic Hive and Impala Query Language Syntax
Differences Between Hive and Impala Query Syntax
Using Hue to Execute Queries
Using the Impala Shell
Creating Databases and Tables
Altering Databases and Tables
Simplifying Queries with Views
Storing Query Results
Data Storage and Performance
Choosing a File Format
Controlling Access to Data
Relational Data Analysis with Hive and Impala
Common Built-In Functions
Aggregation and Windowing
Working with Impala
How Impala Executes Queries
Extending Impala with User-Defined Functions
Improving Impala Performance
Analyzing Text and Complex Data with Hive
Complex Values in Hive
Using Regular Expressions in Hive
Sentiment Analysis and N-Grams
Understanding Query Performance
Controlling Job Execution Plan
Data Transformation with Custom Scripts
Choosing the Best Tool for the Job
Comparing MapReduce, Pig, Hive, Impala, and Relational Databases
Which to Choose?
This course is designed for data analysts, business intelligence specialists, developers, system architects, and database administrators. Knowledge of SQL is assumed, as is basic Linux command-line familiarity. Knowledge of at least one scripting language (e.g., Bash scripting, Perl, Python, Ruby) would be helpful but is not essential. Prior knowledge of Apache Hadoop is not required.
College Credit, CEUs, PDUs, and CDUs
When you take courses with Babbage Simmel, be sure you get the credit you deserve. Curriculum offered by Babbage Simmel can earn you college credit, CEUs, PDUs, or CDUs.
College Credit
Select curriculum offered by Babbage Simmel is part of the accredited University of Findlay's undergraduate course catalog. For questions, please email firstname.lastname@example.org or call 614-481-4345.
Continuing Education Units (CEUs)
CEUs are nationally recognized standard units of measurement earned for satisfactory completion of qualified continuing education programs. For more information about CEUs, please email email@example.com or call 614-481-4345.
Professional Development Units (PDUs)
PDUs can be issued by PMI® for formal learning activities related to project management. Project Management Professionals (PMPs®) are required to earn a minimum of 60 PDUs every 3 years to maintain certification. For more information about this program, go to the PMI® web site or call 1-855-746-4849.
Continuing Development Units (CDUs)
CDUs may be earned by attending professional development activities (e.g., courses, seminars) offered by organizations that IIBA® has endorsed and designated as EEP vendors. Babbage Simmel is an IIBA® Endorsed Education Provider (EEP), so its IIBA®-endorsed courses qualify for CDU credit. For more information about CDUs, go to the IIBA® web site or call 1-647-426-3735.
Our babsimLIVE distance learning brings the classroom experience to you by seating you virtually in a real-life, instructor-led classroom taught by award-winning, world-class instructors, alongside other IT professionals like yourself. From the comfort of your home, your workplace, or the Babbage Simmel Columbus Campus, you get the training you need, when you want it, in the environment where you are most comfortable.