DATA SCIENCE AND ANALYTICS COURSES

SINGAPORE TRAINING PROSPECTUS 2017

www.dataseer.com.sg

Contents

About
• Become a data ninja
• Our training clients
• Our faculty
• Meet the lead trainer: Isaac Reyes
• What is data science?
• The skills of the data scientist

Our Courses

Data Storytelling for Business
• Course Overview
• Course Outline – Day 1
• Course Outline – Day 2

Advanced Visualization and Dashboard Design
• Course Overview
• Course Outline – Day 1

Introduction to R Programming for Business Applications
• Course Overview
• Course Outline – Day 1 & Day 2
• Course Outline – Day 3

Introduction to Data Science and Machine Learning in R and Azure
• Course Overview
• Course Outline – Day 1
• Course Outline – Day 2
• Course Outline – Day 3

Predictive Analytics and Advanced Machine Learning in R and Azure
• Course Overview
• Course Outline – Day 1
• Course Outline – Day 2
• Course Outline – Day 3

For more information, visit our website at www.dataseer.com.sg


Become a data ninja.

Data is useless without the skill to analyse it. Data alone is merely a commodity. It’s data scientists and analysts who breathe life into this data and create value, advantage and impact. And the business world agrees–McKinsey predicts that the United States alone faces a shortage of 140,000-190,000 people with deep analytical skills. We train the region’s analytics talent so that they are prepared to face the challenges and opportunities posed by the new data environment.

Our difference - real business datasets.

Computer science and statistics courses from the university sector do not create professionals who are prepared for the rigors of commercial data. Real business data is often large (millions of rows), high dimensional (hundreds of variables), unstructured and high velocity. It is also rarely clean: it is awash with missing values, data breaks and outliers. All of our courses utilise real commercial datasets that will prepare you for the information you will encounter in your next role as a data scientist or analyst.

You cannot give me too much data. I see big data as storytelling — whether it is through information graphics or other visual aids that explain it in a way that allows others to understand across sectors. I always push for the full scope of the data over averages and aggregations — and I like to go to the raw data because of the possibilities of things you can do with it. Mike Cavaretta Data Scientist and Manager, Ford Motor Company


Trusted by industry

Data science and analytics is revolutionizing business across all industry verticals. Since 2015, we’ve trained over 100 companies, government departments and NGOs in fundamental data science skills. From banking to telcos and retail to real estate: we’ve trained people in your field.


OUR FACULTY

Learn from thought leaders in the field

DataSeer is an analytics and data science training provider that has been offering innovative public and private training courses since 2015.

ISAAC REYES, Data Scientist
Isaac is concurrently lead trainer at DataSeer and Head of Data Science at Altis, Australia’s largest information management consultancy. At Altis, Isaac leads a team of data scientists who design analytics and machine learning solutions for enterprise clients throughout AU/NZ. A former university lecturer in statistics at the Australian National University, Isaac is also a TEDx speaker and regular keynote at big data conferences. Isaac holds a Master’s Degree in Statistics from the Australian National University and a Bachelor’s Degree in Actuarial Science from Macquarie University.

JAY MANAHAN, Data Storytelling Expert
A data storytelling expert, Jay is concurrently a trainer at DataSeer and Head of Operations at Magpie.IM, an online payments startup. In his prior role, Jay was the Head of the Manila Shared Services Center for Kforce (Nasdaq: KFRC) and a Business Development Director at analytics company, Sencor. Jay holds an MBA and B.S. in Mathematics from Ateneo de Manila University.

ALAN WHITE, Data Management Expert
Alan is a leading data management expert with over 15 years’ experience in the data management and analytics industry. A published data management thought leader, he has delivered successful data management solutions to multiple Fortune 500 clients including Pfizer, AIG and ADP. Alan holds a Mini-MBA from the Wharton School of the University of Pennsylvania.


Meet our lead trainer: Isaac Reyes

“I live and breathe it.” This is how Isaac Reyes describes his decade-long relationship with data. And with over 3,000 hours of data science training experience at the world’s leading institutions, the numbers certainly add up.

“Teaching is definitely a passion”, says Isaac. “I’ve always kept one foot in the education sector and one foot in the commercial sector. A trainer who is unfamiliar with the commercial application of his methods risks becoming too esoteric in his teaching. On the other hand, a practitioner who doesn’t teach misses out on the peer review process that occurs when presenting to a smart audience.”

A thought leader in the field, Isaac was the keynote speaker at last year’s Big Data Analytics Conference 2016, where he spoke about bringing analytics projects from conceptualization through to productionization.

Responsible for the design and delivery of the DataSeer training curriculum, Isaac is a big believer in what he calls ‘data driven education’: “We like to practice what we preach, so DataSeer’s whole training process is heavily data driven. We are big believers in pre and post training assessments that allow us to measure whether our clients are getting the specific training outcomes they need. We also encourage validation of the effectiveness of our courses by measuring the ROI or quality of analytics projects both before and after our training.”

Yet another method is used to validate DataSeer training outcomes: global data science competitions such as Kaggle. “I couldn’t be more proud of our graduates who have gone on to succeed in international data science competitions, like Kaggle”, Isaac says. “Back in 2015, our first batch of DataSeer bootcamp graduates ranked in the top 2.6% of Data Scientists worldwide. Since then, DataSeer graduates have continued to post top 5% finishes in the world’s most competitive machine learning competitions, including those issued by industrial equipment giant, Caterpillar, and the large European retailer, Rossmann.”

More recently, Isaac shared his vision for Data Science on perhaps the biggest stage of them all – TEDx. “Speaking about the intersection of data science and world issues at a TED event was something that I’ve always wanted to do. My TED talk focused on how we can use data science to measure how much we really care about the issues that matter.”

So what does Isaac have in store for DataSeer training in 2017? “2017 is the year we implement all of the feedback we collected from our course attendees. We plan on creating more realistic workshop problems around commercial datasets that reflect the digital, high throughput and unstructured data environment of 2017. We are also set to deliver domain specific custom trainings that provide our corporate partners with the exact outcomes they need for their specific industry vertical or departmental needs. Finally, the vision is for our clients and course attendees to keep winning. Our training attendees will continue to win because they build analytical skills that increase their value in the labor market. Our clients will continue to win because they end up with staff capable of playing at the highest levels of the analytics value chain.”

Isaac holds a Bachelor’s Degree in Actuarial Science from Macquarie University and a Master’s Degree in Statistics from the Australian National University. He was previously a Data Scientist at Quantium, a Biostatistician at Datapharm and an Actuarial Analyst at PricewaterhouseCoopers.


What is data science?

Data Science is one of the fastest growing disciplines in the business sector today. New findings from MIT research show that companies with data-driven decision making environments had 4% higher productivity and 6% higher profits than other businesses. In 2008, Dr DJ Patil and Jeff Hammerbacher, heads of analytics and data at LinkedIn and Facebook respectively, coined the term ‘data science’ to describe the emerging field of study that focused on teasing out the hidden value in the data that was being collected from touchpoints all over the retail and business sectors.

Data Science is now the umbrella term used for a discipline that spans Programming, Statistics, Data Mining, Artificial Intelligence, Networking, Analytics, Business Intelligence, Visualisation and a host of other subject areas. The science is constantly changing and evolving, as it moves to keep abreast of technology and business practices alike. Data Science has applications not only in business decisions, but also across a wide range of verticals including biostatistics, astronomy and molecular biology. Wherever you find large amounts of information, you’ll find an application for data science.

You Can’t Hide From Data

The combination of distributed processing power in the cloud, ultra-fast internet and cheap storage has made one thing clear: data is here to stay. Unprecedented amounts of data are now being collected, saved, and stored safely in the cloud. As exabyte upon exabyte is stored, a new discipline grows to tunnel through the mountain of datasets to find the nuggets of gold: actionable insights that can change the way you do business.

“Without big data, companies are blind and deaf, wandering out onto the web like deer on a freeway.” - Geoffrey Moore

Data Scientist: The Sexiest Job of the 21st Century

The Big Three Skills: Coding, Statistics and Business

Never before have these three been so closely aligned:
• Coding to query and manipulate large datasets.
• Statistics to run robust analyses.
• Business expertise to know how to ask the right questions and create useable insights.

But data science isn’t just a static flowchart - it’s a conglomeration of skills in individuals who can use data to let companies know how to move forward and along which vertices.

[Figure: Venn diagram of Coding Skills, Math & Statistics Knowledge and Business Expertise. Region labels: Machine Learning, Traditional Research, Danger Zone!, Data Science.]

1 – Coding Skills
Every good data scientist knows that the quality of insights is dependent on the quality of data input. The first task of any analytics project is to extract data, whether that data is stored within an on-premise data warehouse or housed alongside terabytes in the cloud. Coding skills in languages like SQL, R, Spark and Python are required to extract, clean and prepare data for analysis.

2 – Math and Statistics
While statistics is hardly a new field, today’s data scientists have experienced a paradigm shift in statistical application. Where once the field of statistics concentrated on achieving valid results with small samples, today, with a torrent of information, modern data scientists face the challenge of separating the signal from the noise. Judicious application of statistical methods, coupled with rigorous mathematical theory, allows data scientists to create models that power actionable insights.

3 – Business: Analysing the Results
Data Scientists are a rare class among their technical brethren: they need to have excellent client facing and human interfacing skills to complement their technical skills. Often the point of contact between the C-suite and analytics teams, data scientists must have a firm grasp on core business processes, costs, project management methodologies, production systems and corporate culture. The creation of actionable, positive ROI recommendations, backed by solid analysis and good data, is the end game, and is the primary reason the profession has grown to be one of the most desirable skillsets in corporate circles today.

“Consumer data will be the biggest differentiator in the next two to three years. Whoever unlocks the reams of data and uses it strategically will win.” -Angela Ahrendts, Apple

COURSE DURATION: 2 DAYS

PREREQUISITES: None.

LAPTOP SPECS: Minimum required specs of Intel i3 processor, 2GB RAM. Either Mac or Windows operating system.

REQUIRED SOFTWARE:
• Any data visualization software package (e.g. Excel, Tableau, PowerBI, Qlik, R, Python) and PowerPoint

COURSE ONE

DATA STORYTELLING FOR BUSINESS

Data Storytelling is predicted to be the top business skill of the next 5 years. Well told data stories are change drivers within the modern organisation. But how do we find the most important insights in our business data and communicate them in a compelling way? How do we connect the data that we have to the key underlying business issue? This course takes students from the fundamentals (what should we be measuring and why?) through to the elements of good visualisation design (what does a good chart look like?) through to proficiency in data storytelling. By the end of the course, participants will know how to produce engaging, cohesive and memorable data stories using Excel and PowerPoint. The course also teaches attendees the importance of producing statistically robust visualisations and insights.

Suitable For
This is our most popular course. It’s suited towards any professional who works with data and charts. If you need to tell better stories with your data, then this course is for you.

Course Outline – Day 1

I. Introductions, Ice Breaker (9:00am – 9:15am)

II. Overview of the Four Keys to Data Storytelling (9:15am – 9:30am)
• Knowing your audience
• Preparing your data
• Choosing the right visual and designing it well
• Telling the story

III. Preparing your data: Exploratory Data Analysis in the Business Setting (9:30am – 10:15am)
• Step 1 - Know the story behind your data
• Step 2 - Variable classification
• Step 3 - Handle missingness
• Step 4 - Sanity check
• Step 5 - Univariate EDA
• Step 6 - Bivariate EDA

IV. Q&A / Break (10:15am – 10:30am)

V. Tables Versus Charts Versus Single Metrics - What to Use and When? (10:30am – 11:15am)
• Choosing between tables, charts and single headline metrics - guidelines
• Visualisation is the fastest bandwidth channel for transferring high dimensional information into the human brain
• Visualisation separates data structure from data noise
• Visualisation uncovers hidden patterns
• Visualisation grabs attention
• Visualisation uncovers cause and effect relationships
• When to not use graphs - Recognizing situations where a table is most appropriate
• When to not use graphs - Recognizing situations where a single headline metric is appropriate

VI. Q&A / Break (11:15am – 11:30am)

VII. The Visualisation Arsenal (11:30am – 12:00pm)
• The Histogram - The most underutilized visualization in business
• The Bar Chart - The king of flexibility, guidelines on vertical and horizontal variations
• The Case for and Against Stacked Bar Charts
• The Pie Chart - Theory and controversy, smack down with bar charts
• The Scatter Plot - Theory and guidelines for large datasets
• The Line Chart - Theory, comparison with clustered bar charts, discussion on dual axis line charts
• Bubble, Waterfall and Area Charts - Quick opinions

VIII. LUNCH (12:00pm – 1:00pm)

IX. Recent Developments in Data Visualization Media (1:15pm – 1:45pm)
• Virtual Reality Data Visualization Demo
• Interactivity and animation, d3.js
• Macros for more efficient and consistent designs
• Histograms in Excel 2016 - An Applied Walkthrough

X. Workshop: Team Activity (1:45pm – 4:15pm)

XI. Group Work Submission Deadline (4:15pm)

XII. Group Presentations, Feedback and Day 1 Wrap Up (4:15pm – 5:00pm)

Course Outline – Day 2

I. Ice Breaker Exercise – Let’s Tell Stories as a Group (9:00am – 9:15am)

II. The Elements of Data Visualisation Design (9:15am – 10:00am)
• Above all else, show the data
• Tufte’s war on chart-junk
• Tufte’s data-ink ratio
• Using color to focus attention
• Dimension, perspective and 3D
• The Gestalt principles of visual perception: proximity, similarity, closure, continuity, connectedness and enclosure

III. Q&A / Break (10:00am – 10:15am)

IV. The Elements of Data Storytelling (10:15am – 11:00am)
• Knowing your audience
• Designing your visuals and narrative around ‘The Big Takeaway’
• Delivering insights
• Creating memorable soundbites
• Structuring your data story - What is an appropriate story flow?
• From reporting to strategy - Is your data story actionable?

V. Q&A / Break (11:00am – 11:15am)

VI. Examples of good data stories (11:15am – 12:00pm)
• ‘The Apathy Gap’ – Real life replay of Isaac’s TEDx talk
• ‘200 Countries, 200 Years, 4 Minutes’ – Hans Rosling’s Animated Take on Global Health
• Examples of data stories from the top management consulting firms

VII. LUNCH (12:00pm – 1:00pm)

VIII. The Statistics Behind Good Data Storytelling (1:00pm – 1:30pm)
• Sample size and inference - Why it’s important
• Correlation and causation - Applied examples

IX. Workshop: Team Activity and Presentation (1:30pm – 4:15pm)

X. Group Work Submission Deadline (4:15pm)

XI. Group Feedback, Course Wrap Up, Awarding of Certificates of Completion (4:15pm – 5:00pm)

Dataset
This course utilises a 50,000 row, 70 variable Customer Relationship Management (CRM) dataset as a learning tool.

Data Fields
The dataset includes over 25 customer behavior variables including information about customer spend, customer complaints, customer retention and purchase frequency. The dataset also features over 20 customer demographic variables including age, occupation and marital status.

Data Format
The data is provided to participants in unstructured .dat format. Participants are taught how to import the dataset into Excel and convert the .dat file into an .xlsx file.

COURSE DURATION: 1 DAY

PREREQUISITES: None.

LAPTOP SPECS:
• Intel i3 processor, 2GB RAM
• Either Mac or Windows operating system

REQUIRED SOFTWARE:
• Any data visualization software package (e.g. Excel, Tableau, PowerBI, Qlik, R, Python) and PowerPoint

COURSE TWO

ADVANCED VISUALIZATION AND DASHBOARD DESIGN

Take your visualization and dashboard skills to the next level.

Advanced Visualization and Dashboard Design is aimed at the professional who already possesses fundamental data visualization and data storytelling skills. A natural continuation point from Data Storytelling for Business, this course provides participants with the skills needed to produce stunning, understandable business dashboards and graphs. Taught using a variety of visualization tools, the course covers the keys to designing for interactivity and drill down effects. The course also covers less commonly used but valuable visualization methods, including methods for visualizing networks and flows. Dashboard design is covered in detail, with participants creating a dashboard ‘makeover’ during the class practical workshop.

Suitable For
This course is suited to any professional who wants to improve their data visualization and dashboard skills.

Course Outline – Day 1

I. Introductions, Ice Breaker (9:00am – 9:15am)

II. Advanced Visualization Design (9:15am – 10:00am)
• Interactivity - “Overview first, zoom and filter, details on demand” - Shneiderman (1996)
• Taxonomy of interactive dynamics for visual analysis in Heer & Shneiderman (2012)
• Guidelines for Annotation layers: rollovers, highlights, auto-summaries
• Tools for adding interactivity and annotations - from Excel to d3.js

III. Q&A / Break (10:00am – 10:15am)

IV. Extremely Useful Charts That You Won’t Find in Excel (10:15am – 11:00am)
• Tree maps
• Mosaic plots
• Trellis displays
• Chord diagrams
• Sankey diagrams

V. Q&A / Break (11:00am – 11:15am)

VI. Good Dashboard Design (11:15am – 12:00pm)
• The unique challenges and opportunities posed by the dashboard layout
• Dashboard variations
• To label or not to label
• Common dashboard features such as the ‘speedometer’
• The characteristics of well designed dashboards

VII. LUNCH (12:00pm – 1:00pm)

VIII. Walkthrough - Let’s Give a Poor Dashboard a Makeover (1:00pm – 1:30pm)

IX. Workshop: Team Activity - Let’s Create Good Dashboards Together (1:30pm – 4:15pm)

X. Workshop Feedback, Presentation from Winning Model, Awarding of Certificates and Course Wrap Up (4:15pm – 5:00pm)

COURSE DURATION: 3 DAYS

PREREQUISITES: None.

LAPTOP SPECS:
• Intel i3 processor, 4GB RAM
• Windows operating system
• Unrestricted PC that has install permissions

REQUIRED SOFTWARE:
• Base R or Microsoft R Open
• RStudio
• Microsoft account (for Jupyter via Azure ML Studio or Azure

COURSE THREE

INTRODUCTION TO R PROGRAMMING FOR BUSINESS APPLICATIONS

R is the world’s leading data science and statistics programming language. In this introduction to R, you will master the basics of this beautiful open source language, including factors, lists and data frames. After completing the course, you will be ready to undertake your very own end-to-end data analysis projects using the world’s most sophisticated data analysis tool. R itself is completely free and can be used to extend the capabilities of data warehousing software such as SQL Server 2016 and Microsoft Azure ML Studio! Working on business datasets in class, you will leverage the power of R to inform business decision making and analyses. Join millions of R users worldwide in a user community that is growing by 40% every year!

Suitable For
This course is suited for quants and IT professionals who want a crash course in an end-to-end data science workflow that is completely implemented in R. It is also suitable for professionals who seek to understand the ecosystem and community behind R and make it a powerful and cost-effective application for their enterprise.

Course Outline – Day 1

I. Introductions, Ice Breaker (9:00am – 9:15am)

II. R overview: open-source statistical programming language (9:15am – 9:45am)

III. R core: R Development Core Team and enhanced distros (9:45am – 10:15am)

Q&A/Break (10:15am – 10:30am)

IV. Extending R: user-contributed packages and repositories (10:30am – 11:15am)

V. R communities: journal, online fora, and blogs (11:15am – 12:00pm)

Lunch (12:00pm – 1:00pm)

VI. Using stock R: shell and RGUI (1:00pm – 1:15pm)

VII. R notebooks: Jupyter (1:15pm – 1:45pm)

VIII. De facto R IDE: RStudio (1:45pm – 2:30pm)

Q&A/Break (2:30pm – 2:45pm)

IX. R integration: R in-database and R in the cloud (2:45pm – 3:30pm)

X. R connections: samples of R APIs and bindings with other languages (3:30pm – 4:15pm)

XI. Day 1 wrap-up (4:15pm – 5:00pm)

Course Outline – Day 2

I. From spreadsheets to prompts: intro to interactive programming in R (9:00am – 9:45am)

II. R data objects: modes, classes, and coercion (9:45am – 10:30am)

Q&A/Break (10:30am – 10:45am)

III. Workshop 1 (10:45am – 11:30am)

Lunch (11:30am – 12:30pm)

IV. Special values in R: missing values, nulls, infinite values, and NaNs (12:30pm – 1:00pm)

V. Functions: class-specific behaviour and user-defined functions (1:00pm – 1:15pm)

VI. R packages: Installing packages and exposing libraries (1:15pm – 2:00pm)

Q&A/Break (2:00pm – 2:15pm)

VII. Loops and conditionals: basic programming in R (2:15pm – 3:00pm)

VIII. Vectorisation: *apply() and do.call() (3:00pm – 3:30pm)

IX. Workshop 2 (3:30pm – 4:30pm)

X. Workshop feedback & Day 2 wrap-up (4:30pm – 5:00pm)

Course Outline – Day 3

I. Chaining R commands: magrittr package (9:00am – 9:30am)

II. Workshop 1 (9:30am – 10:00am)

Q&A/Break (10:00am – 10:15am)

III. Reading and writing data: readxl and readr packages (10:15am – 10:45am)

IV. Data wrangling: dplyr and reshape2 packages (10:45am – 11:30am)

Lunch (11:30am – 12:30pm)

V. Workshop 2 (12:30pm – 1:15pm)

VI. Introduction to modelling in R: OLS regression (1:15pm – 2:00pm)

VII. Workshop 3 (2:00pm – 2:45pm)

Q&A/Break (2:45pm – 3:00pm)

VIII. Introduction to visualization in R: ggplot2 package (3:00pm – 3:45pm)

IX. Workshop 4 (3:45pm – 4:30pm)

X. Workshop feedback & course wrap-up (4:30pm – 5:00pm)

COURSE DURATION: 3 DAYS

PREREQUISITES: It is recommended that participants have completed an introductory R programming course or MOOC and at least one introductory statistics unit at the university level.

LAPTOP SPECS:
• Intel i3 processor, 4GB RAM
• Windows operating system
• Unrestricted PC that has install permissions

REQUIRED SOFTWARE:
• Excel 2010, 2013 or 2016
• R or RStudio latest version
• A free trial or paid subscription to Microsoft Azure ML Studio

COURSE FOUR

INTRODUCTION TO DATA SCIENCE AND MACHINE LEARNING IN R AND AZURE

Learn the fundamentals of data science and analytics, from problem formulation through to model building and interpretation of results.

Introduction to Data Science and Machine Learning in R and Azure is aimed at the professional who wants an understanding of data science fundamentals with a strong focus on business applications. By the end of the course, participants will be capable of building, tuning and deploying regression and classification models for a variety of business problems. Participants will also gain an understanding of unsupervised learning techniques and big data architecture. Taught using a variety of open source and cloud technologies, the course teaches techniques for handling, manipulating and analyzing high volume (millions of rows), high dimension (thousands of variables) business data. Real world projects from the DataSeer analytics consulting team are extensively used to illustrate how each model is used in the real world.

Suitable For
This course is suitable for any person who wants to acquire fundamental data science skills.

Course Outline – Day 1

I. Introductions, Ice Breaker (9:00am – 9:15am)

II. Introduction to Data Science, Big Data and Analytics (9:15am – 10:00am)
• What is data science? What does a data scientist do?
• What is analytics? What is predictive analytics?
• The analytics value chain - Myth and reality
• Mapping business problems to data science problems
• The 5 V’s of big data
• The current state and future of machine learning and AI

III. Q&A / Break (10:00am – 10:15am)

IV. The Data Science Process (10:15am – 11:00am)
• Ask an interesting question
• Get the data
• Explore the data
• Model the data (including comparison of regression and classification problems)
• Communicate and visualize the results
• Walkthrough of process using a real world DataSeer data science consulting project
• Technology overview - from Azure to Hadoop to RStudio

V. Q&A / Break (11:00am – 11:15am)

VI. Linear Regression I - ‘Breaking Open the Blackbox’ (11:15am – 12:00nn)
• Recognizing regression model applications in business
• Why is it called ‘ordinary least squares’ (OLS) regression?
• Comparison of OLS with Least Absolute Deviations (LAD) regression
• A ‘simple’ linear regression model calculated from first principles using Solver
• A ‘multiple’ linear regression model calculated from first principles using Solver
• Caution on extrapolating outside the range of provided data
• Using a linear regression model to make predictions

VII. LUNCH (12:00nn – 1:00pm)

VIII. Workshop: Team Activity – Fitting a Regression Model to Business Data (1:00pm – 2:15pm)

IX. Machine Learning Fundamentals and Linear Regression II (2:15pm – 3:00pm)
• What is a machine learning model?
• What is a test and training dataset?
• Polynomial regression
• Variable transformations
• Feature selection and feature engineering
• Overfitting and underfitting

X. Q&A / Break (3:00pm – 3:15pm)

XI. Workshop: Apply Your Regression Skills in a Kaggle Style Competition (3:15pm – 5:00pm)
• In this workshop, small groups use their new regression skills to build models on real business data with the aim of achieving the lowest possible root mean square error on a hold out test dataset

XII. Workshop Feedback, Presentation from Winning Model and Day 1 Wrap Up (5:00pm – 5:15pm)
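For readers who would like a feel for the Day 1 regression topics before attending, the following is a minimal sketch of a 'simple' linear regression fitted from first principles and scored by root mean square error on a hold out set. It is written in Python purely for illustration; the course itself works in Excel Solver, R and Azure ML Studio. The function names and the tiny spend-versus-sales dataset are hypothetical and are not course material.

```python
from math import sqrt


def fit_simple_ols(xs, ys):
    """Return (intercept, slope) that minimise the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least squares solution for a single predictor.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, xs):
    """Apply a fitted (intercept, slope) pair to new x values."""
    intercept, slope = model
    return [intercept + slope * x for x in xs]


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical training data (marketing spend vs. sales), invented for this sketch.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

# Hypothetical hold out data that the model never sees during fitting.
test_x = [5.0, 6.0]
test_y = [11.5, 12.5]

model = fit_simple_ols(train_x, train_y)
test_rmse = rmse(test_y, predict(model, test_x))
print("intercept, slope:", model)
print("hold out RMSE:", test_rmse)
```

Scoring on data held back from fitting is what makes the competition fair: a model that merely memorises the training rows gains nothing on the hold out set.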

Course Outline – Day 2

I. Introduction to Azure ML Studio (9:00am – 9:30am)
• Azure ML Studio - Overview, capabilities, limitations
• Business considerations - data center locations, data confidentiality, cost
• Connecting ML Studio to a data source
• Exploring and pre-processing data in ML Studio
• Running experiments and setting up workflows
• Supervised and unsupervised learning in ML Studio
• Deploying and productionizing an ML Studio model as a service

II. Workshop: Team Activity – Fitting Yesterday’s Regression Model in Azure ML Studio and Comparing Results (9:30am – 10:30am)

III. Q&A / Break (10:30am – 10:45am)

IV. Overview of Other Regression Models and Fitting in ML Studio (11:15am – 12:00nn)
• Decision trees
• Random forest
• Neural networks
• Parametric and non-parametric models
• Comparing models in Azure ML Studio

V. LUNCH (12:00nn – 1:00pm)

VI. Logistic Regression I - ‘Breaking Open the Blackbox’ (1:00pm – 1:45pm)
• Recognizing classification model applications in business
• True positives, false positives, true negatives and false negatives
• Walkthrough of a classification model case study from business
• What does ‘maximum likelihood’ mean?
• A logistic regression model calculated from first principles using Solver
• Using a logistic regression model to make probability predictions
• Converting probability predictions into labelled predictions

VII. Q&A / Break (1:45pm – 2:00pm)

VIII. Comparing Classification Models and Logistic Regression II (2:00pm – 2:30pm)
• Assessing model accuracy
• True positives, false positives, true negatives and false negatives
• The Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC)
• A brief introduction to other classification models - Decision trees, support vector machines, gradient boosting and neural nets

IX. Workshop: Apply Your Classification Skills in a Kaggle Style Competition (2:30pm – 5:00pm)
• In this workshop, small groups use their new classification skills to build models on real business data in Azure ML Studio with the aim of achieving the lowest possible classification error on a hold out test dataset

X. Workshop Feedback, Presentation From Winning Model and Day 2 Wrap Up (5:00pm – 5:15pm)
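As a taste of the Day 2 classification topics, the sketch below shows how probability predictions can be converted into labelled predictions, and how true/false positives and negatives and the classification error are then counted. It is written in Python purely for illustration (the course uses Excel Solver, R and Azure ML Studio). The 0.5 threshold, the function names and the six-customer churn example are assumptions made for this sketch, not course material.

```python
def to_labels(probabilities, threshold=0.5):
    """Label a case 1 when its predicted probability meets the threshold, else 0."""
    return [1 if p >= threshold else 0 for p in probabilities]


def confusion_counts(actual, predicted):
    """Count true positives, false positives, true negatives and false negatives."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1
        elif a == 0 and p == 1:
            counts["fp"] += 1
        elif a == 0 and p == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def classification_error(actual, predicted):
    """Share of cases where the predicted label differs from the actual label."""
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


# Hypothetical hold out data: 1 = customer churned, 0 = customer stayed.
actual = [1, 0, 1, 1, 0, 0]
# Hypothetical probabilities, as a fitted logistic regression model might return.
probabilities = [0.9, 0.4, 0.35, 0.8, 0.6, 0.1]

predicted = to_labels(probabilities)
counts = confusion_counts(actual, predicted)
error = classification_error(actual, predicted)
print(predicted, counts, round(error, 3))
```

Moving the threshold trades false positives against false negatives, which is the trade-off an ROC curve traces out across all possible thresholds.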

19

INTRODUCTION TO DATA SCIENCE AND MACHINE LEARNING IN R AND AZURE 20

Course Outline – Day 3

I. Decision Trees (9:00am – 9:45am)
• Introduction to decision trees
• Advantages and disadvantages compared to statistical models
• Decision tree algorithms
• Prediction using decision trees
• Example using business data

II. Q&A / Break (9:45am – 10:00am)

III. Workshop: Team Activity – Decision Trees (10:00am – 11:15am)

IV. Unsupervised Learning and PCA (11:15am – 12:00nn)
• Unsupervised learning models
• Introduction to Principal Components Analysis
• Visualizing PCA results
• Interpreting PCA results

V. LUNCH (12:00nn – 1:00pm)

VI. Workshop: Team Activity – PCA with a Customer Demographics Dataset (1:00pm – 2:45pm)

VII. Fundamentals of Big Data Engineering (2:45pm – 3:30pm)
• Introduction to distributed computing
• MapReduce
• Hadoop and HDFS
• Hive
• Mahout
• Spark
• Event ingestion and stream processing
• How will things change as IoT ramps up?

VIII. Q&A / Break (3:30pm – 3:45pm)

IX. Workshop: Big Data Engineering (3:45pm – 4:45pm)

X. Workshop Feedback, Awarding of Certificates and Course Wrap Up (4:45pm – 5:15pm)
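As a flavour of the decision tree session in this outline, the sketch below shows how a tree algorithm might choose a single split. It is an illustration in Python, not course material. It assumes a binary outcome, one numeric variable, Gini impurity as the splitting criterion (one common choice among several) and a hypothetical spend-versus-churn dataset.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 0 when pure, 0.5 when evenly mixed."""
    if not labels:
        return 0.0
    p1 = sum(labels) / len(labels)
    return 2.0 * p1 * (1.0 - p1)


def best_split(xs, ys):
    """Return (threshold, weighted impurity) of the best rule 'x <= threshold'.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (None, impurity of the unsplit data) if no split improves on it.
    """
    n = len(xs)
    values = sorted(set(xs))
    best = (None, gini(ys))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: monthly spend and whether the customer churned (1 = yes).
spend = [10, 20, 30, 40, 50, 60]
churned = [1, 1, 1, 0, 1, 0]

threshold, impurity = best_split(spend, churned)
print("split at spend <=", threshold, "with weighted Gini", round(impurity, 4))
```

A full tree repeats this search inside each resulting group until a stopping rule is met.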

“The topics were handled very well. Statistical theories were explained in a way that it can be grasped by a participant with a non-statistical background.” Celina, Indra Philippines Inc.


COURSE DURATION: 3 DAYS

PREREQUISITES: It is recommended that participants have completed an introductory R programming course or MOOC and at least one 2nd year statistics unit at the university level.

LAPTOP SPECS:
• Intel i3 processor, 4GB RAM
• Windows operating system
• Unrestricted PC that has install permissions

REQUIRED SOFTWARE:
• Excel 2010, 2013 or 2016
• R or RStudio latest version
• A free trial or paid subscription to Microsoft Azure ML Studio

COURSE FIVE

PREDICTIVE ANALYTICS AND ADVANCED MACHINE LEARNING IN R AND AZURE

Use R and Azure ML Studio to build and tune advanced machine learning models.

Predictive analytics and machine learning techniques are revolutionizing business and government. Predictive Analytics and Advanced Machine Learning in R and Azure is aimed at the person who wants a better understanding of the mechanics behind the models and how these models are realistically applied in the business setting. The course covers advanced machine learning methods in depth, including those used to win global data science competitions. It also covers analytics project management and the management of stakeholder expectations during predictive analytics projects.

Suitable For

• This course is suited to any professional who already understands analytics and machine learning basics and is ready to progress to higher levels of sophistication. It is also suitable for any professional who is interested in how predictive analytics projects are conceptualized, scoped and project managed.


Course Outline – Day 1

I. Introductions, Ice Breaker (9:00am – 9:15am)

II. Dimensionality, Parsimony, Testing Accuracy (9:15am – 10:00am)
• The curse of dimensionality
• The principle of parsimony
• Testing model accuracy
• John Elder’s Target Shuffling
• Lift charts
• Bootstrap sampling

III. Q&A / Break (10:00am – 10:15am)

IV. Shrinkage - More Than What Happens in the Pool (10:15am – 11:00am)
• How shrinkage methods depart from traditional statistical methods
• Ridge regression
• The LASSO method
• How does the LASSO method help perform variable selection?
• Sparsity

V. Q&A / Break (11:00am – 11:15am)

VI. Workshop: Team Activity - Let’s compare LASSO and ridge regression (11:15am – 12:00nn)

VII. LUNCH (12:00nn – 1:00pm)

VIII. Workshop: Team Activity (cont.) - Let’s compare LASSO and ridge regression (1:00pm – 1:30pm)

IX. Cross Validation, Bagging and Ensembling (1:30pm – 2:15pm)
• Bootstrap aggregation
• K-fold cross validation
• Model ensembling
• Choosing weights for ensemble models

X. Q&A / Break (2:15pm – 2:30pm)

XI. Workshop: Let’s bag, ensemble and cross validate! (2:30pm – 4:45pm)

XII. Workshop Feedback, Presentation from Winning Model and Day 1 Wrap Up (4:45pm – 5:00pm)
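The shrinkage session in this outline asks how the LASSO performs variable selection while ridge regression does not. The sketch below is a Python illustration, not course material. It assumes the simplest possible setting: one predictor at a time, no intercept, and the objectives ½·Σ(y − b·x)² + (λ/2)·b² for ridge and ½·Σ(y − b·x)² + λ·|b| for the LASSO. In that setting both estimates have closed forms. The data and the value of λ are hypothetical.

```python
def dot(a, b):
    """Sum of element-wise products."""
    return sum(u * v for u, v in zip(a, b))


def ridge_coef(x, y, lam):
    """Ridge estimate for one predictor: shrinks toward zero but never reaches it."""
    return dot(x, y) / (dot(x, x) + lam)


def soft_threshold(value, lam):
    """Pull value toward zero by lam, returning exactly 0.0 if |value| <= lam."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coef(x, y, lam):
    """LASSO estimate for one predictor: weak signals are set exactly to zero."""
    return soft_threshold(dot(x, y), lam) / dot(x, x)


# Hypothetical toy data: one centred predictor, one strong and one weak response.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_strong = [-4.1, -1.9, 0.2, 2.1, 3.9]
y_weak = [0.3, -0.2, 0.1, -0.1, 0.4]
lam = 1.0  # assumed penalty strength

ridge_strong = ridge_coef(x, y_strong, lam)
lasso_strong = lasso_coef(x, y_strong, lam)
ridge_weak = ridge_coef(x, y_weak, lam)
lasso_weak = lasso_coef(x, y_weak, lam)

print("strong signal: ridge", round(ridge_strong, 3), "lasso", round(lasso_strong, 3))
print("weak signal:   ridge", round(ridge_weak, 3), "lasso", round(lasso_weak, 3))
```

With many predictors the same thresholding effect is what produces sparse models, although the estimates then have to be found iteratively rather than in closed form.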


Course Outline – Day 2

I. Artificial Neural Networks (9:00am – 10:00am)
• A gentle introduction to ANNs using colors
• What is deep learning?
• What is forward and back propagation?
• How many hidden layers should we use?
• ANN and linear regression smackdown in Azure ML Studio

II. Q&A / Break (10:00am – 10:15am)

III. Workshop: Team Activity - Let’s build and tune neural nets (10:15am – 12:00nn)

IV. LUNCH (12:00nn – 1:00pm)

V. Predictive Analytics in Practice - Managing Analytics Projects and Teams (1:00pm – 1:45pm)
• Where should the analytics team be situated in the corporate structure? Research findings.
• Managing stakeholder expectations in analytics projects
• The importance of having analytics champions
• Project management for analytics projects - how does it differ from regular IT projects?

VI. Q&A / Break (1:45pm – 2:00pm)

VII. Support Vector Machines (2:00pm – 3:00pm)
• The maximal margin classifier
• The support vector classifier
• Kernels and SVMs
• Performance comparison to other classification methods

VIII. Q&A / Break (3:00pm – 3:15pm)

IX. Workshop: Let’s build and tune SVMs! (3:15pm – 4:45pm)

X. Workshop Feedback, Presentation from Winning Model and Day 2 Wrap Up (4:45pm – 5:00pm)

Dataset

This course utilises the following datasets as learning tools:
• A 50,000 row, 70 variable Customer Relationship Management (CRM) dataset
• A 750,000 row, 30 variable digital marketing dataset from the insurance sector
• A 227,000 row, 21 variable airlines dataset

Data Fields

The CRM dataset includes over 25 customer behavior variables, including information about customer spend, customer complaints, customer retention and purchase frequency. It also features over 20 customer demographic variables, including age, occupation and marital status.

The digital marketing dataset includes information about customer demographics, product category purchased and the digital marketing channel the customer engaged with at each respective online touchpoint.

The airlines dataset includes information on domestic US flights that departed Houston in 2011. The fields include departure time, arrival time, flight number and destination location (alongside 17 other fields).

Data Format

The data is provided to participants in unstructured .dat format. Participants are taught how to import the dataset into Excel and convert the .dat file into an .xlsx file.


Course Outline – Day 3

I. Market Basket Analysis and Affinity Analysis (9:00am – 9:45am)
• What is association rule mining?
• What is the business case for market basket analysis?
• Support, lift and confidence
• Visualizing market basket results

II. Q&A / Break (9:45am – 10:00am)

III. Workshop: Let’s use arules to perform market basket analysis (MBA) on supermarket data (10:00am – 11:30am)

IV. Introduction to Kaggle Competitions (11:30am – 12:15pm)
• Kaggle overview
• Kaggle competition strategies
• Private and public leaderboards
• Team merging

V. LUNCH (12:15pm – 1:15pm)

VI. Workshop: Team Activity - Let’s Kaggle! (1:15pm – 4:45pm)
• During this capstone team activity, course participants will enrol in a live Kaggle competition and implement the full data science process, with the aim of achieving a top 50% leaderboard ranking by the end of the day. Toward the end of the task, a strategy for continued learning and success in the competition will be discussed.

VII. Workshop Feedback, Awarding of Certificates and Course Wrap-up (4:45pm – 5:00pm)
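The three market basket measures named in this outline (support, confidence and lift) come down to simple counting. The sketch below is a Python illustration, not course material: the workshop itself uses the R package arules, which this code does not call or imitate. The baskets and the rule being scored are hypothetical.

```python
def support(transactions, items):
    """Share of transactions that contain every item in 'items'."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Assumes both sides of the rule occur in at least one transaction,
    otherwise the divisions below are undefined.
    """
    s_a = support(transactions, antecedent)
    s_c = support(transactions, consequent)
    s_both = support(transactions, set(antecedent) | set(consequent))
    confidence = s_both / s_a  # how often the consequent appears given the antecedent
    lift = confidence / s_c    # above 1 means the items co-occur more than chance
    return s_both, confidence, lift


# Hypothetical supermarket baskets, one set of items per transaction.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

sup, conf, lift = rule_metrics(baskets, {"bread"}, {"butter"})
print("support:", sup, "confidence:", conf, "lift:", round(lift, 4))
```

Rule mining tools apply the same arithmetic across every candidate rule and keep those above chosen support and confidence thresholds.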

“I like that I have a better handle on the backend workings of regression, instead of just automatically generating it using a tool.” JP, ABS-CBN Corporation


How can we help? Contact us today for enrollment and inquiries.

DataSeer

www.dataseer.com
[email protected]
111 North Bridge Rd #08-18 Peninsula Plaza, Singapore 179098
PH: +632 908 2565 or +632 908 2566 (Business Hours), +639176773825 (After Hours)
SG: +65 3152 6845