Course Outline

Introduction to Data Analysis and Big Data

  • What Defines Big Data as 'Big'?
    • Velocity, Volume, Variety, and Veracity (the four Vs)
  • Limits of Traditional Data Processing
  • Distributed Processing
  • Statistical Analysis
  • Types of Machine Learning Analysis
  • Data Visualization

Big Data Roles and Responsibilities

  • Administrators
  • Developers
  • Data Analysts

Programming Languages for Data Analysis

  • R Language
    • Why use R for Data Analysis?
    • Data manipulation, calculation, and graphical display
  • Python
    • Why use Python for Data Analysis?
    • Manipulating, processing, cleaning, and crunching data

Approaches to Data Analysis

  • Statistical Analysis
    • Time Series analysis
    • Forecasting using Correlation and Regression models
    • Inferential Statistics (estimation)
    • Descriptive Statistics in Big Data sets (e.g., calculating mean)
  • Machine Learning
    • Supervised vs. unsupervised learning
    • Classification and clustering
    • Evaluating the cost of specific methods
    • Filtering
  • Natural Language Processing
    • Processing text
    • Understanding text meaning
    • Automatic text generation
    • Sentiment analysis and topic analysis
  • Computer Vision
    • Acquiring, processing, analyzing, and interpreting images
    • Reconstructing, interpreting, and understanding 3D scenes
    • Leveraging image data for decision-making

Big Data Infrastructure

  • Data Storage
    • Relational databases (SQL)
      • MySQL
      • PostgreSQL
      • Oracle
    • NoSQL databases
      • Cassandra
      • MongoDB
      • Neo4j
    • Understanding Key Differences
      • Hierarchical databases
      • Object-oriented databases
      • Document-oriented databases
      • Graph-oriented databases
      • Other types
  • Distributed Processing
    • Hadoop
      • HDFS as a distributed filesystem
      • MapReduce for distributed processing
    • Spark
      • An all-in-one, in-memory cluster computing framework for large-scale data processing
      • Structured streaming
      • Spark SQL
      • Machine Learning libraries: MLlib
      • Graph processing with GraphX
  • Scalability
    • Public cloud
      • AWS, Google Cloud, Alibaba Cloud (Aliyun), etc.
    • Private cloud
      • OpenStack, Cloud Foundry, etc.
    • Auto-scalability

Choosing the Right Solution for the Problem

The Future of Big Data

Summary and Next Steps

Requirements

  • A foundational understanding of mathematics
  • A foundational understanding of programming
  • A foundational understanding of databases

Target Audience

  • Developers / programmers
  • IT consultants

Duration: 35 hours
