Oracle Machine Learning Office Hours

February 18

17:00 UTC


Oracle Machine Learning for Spark
We saw how Oracle Machine Learning for Spark (OML4Spark) provides interfaces for running machine learning algorithms on data lakes, using Spark to distribute computation across nodes. It also integrates with the big data ecosystem: with the R language as the front end, users can manipulate tables in Hive and Impala and work with data in HDFS and Oracle Database.

Oracle Machine Learning for Spark makes the open source R scripting language and environment ready for the enterprise and big data. Designed for problems involving both large and small volumes of data, it integrates R with data lakes, allowing users to run R commands and scripts for data processing, statistics, and machine learning on Hive, Impala, and Spark DataFrame tables and views using R and Spark SQL syntax. Many familiar R functions are overloaded so that calls to them are translated into SQL and executed inside the data lake.
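The idea of overloading familiar functions so that they generate SQL instead of computing locally can be illustrated with a small sketch. This is not OML4Spark code or its API (OML4Spark's front end is R); it is a minimal Python illustration of the general technique, and the `Column` and `Table` classes, the `bike_rentals` table, and the `temp` and `humidity` columns are all hypothetical names chosen for the example.

```python
class Column:
    """A lazy column expression that records operations as SQL text."""

    def __init__(self, sql):
        self.sql = sql

    def _binary(self, op, other):
        # Another Column contributes its SQL; a literal is rendered with repr().
        rhs = other.sql if isinstance(other, Column) else repr(other)
        return Column(f"({self.sql} {op} {rhs})")

    # Overloaded operators build SQL fragments instead of computing values.
    def __gt__(self, other):
        return self._binary(">", other)

    def __mul__(self, other):
        return self._binary("*", other)

    def __and__(self, other):
        return self._binary("AND", other)


class Table:
    """A hypothetical proxy for a remote table; no data is held locally."""

    def __init__(self, name, where=None):
        self.name = name
        self.where = where

    def __getitem__(self, key):
        if isinstance(key, str):
            # table["col"] returns a column reference.
            return Column(key)
        if isinstance(key, Column):
            # table[condition] returns a new, filtered proxy.
            cond = key.sql if self.where is None else f"{self.where} AND {key.sql}"
            return Table(self.name, where=cond)
        raise TypeError("index with a column name or a Column condition")

    def to_sql(self):
        sql = f"SELECT * FROM {self.name}"
        if self.where:
            sql += f" WHERE {self.where}"
        return sql


# Familiar-looking filtering syntax; only SQL text is produced.
bikes = Table("bike_rentals")
query = bikes[(bikes["temp"] > 30) & (bikes["humidity"] > 0.5)]
print(query.to_sql())
# SELECT * FROM bike_rentals WHERE ((temp > 30) AND (humidity > 0.5))
```

In a design like this the filtering expression never touches the data on the client. The generated SQL would be sent to the engine that holds the table, which is the general benefit the paragraph above describes for in-data-lake execution.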

Oracle Machine Learning consists of complementary components supporting scalable machine learning algorithms for in-database and big data environments (including Cloud and on-premises), notebook technology, SQL, Python and R APIs, and Hadoop/Spark environments.

The slides used in the presentation can be found in the Resources section below.

Video highlights:
04:50 Introduction to Oracle Machine Learning for Spark
07:10 Oracle Machine Learning for Spark integration
09:56 OML4Spark R language API
11:40 OML4Spark performance benchmark
13:55 OML4Spark benefits for Spark MLlib users on R
17:20 Demo - Manipulating HDFS data
22:00 Demo - Manipulating Hive, Impala and Spark DataFrames
36:48 Demo - Using OML4Spark ML models to predict Bike Demand
43:45 Demo - OML4Spark Cross-Validation and Classification Model Selection
47:54 Demo - Benchmark of OML4Spark GLM Logistic on 100 million records
49:26 OML4Spark Roadmap
51:09 Q&A

Your Experts

Marcos Arancibia, Product Manager, Data Science and Big Data
Marcos Arancibia is the Product Manager for Oracle Data Science and Big Data. He works with Machine Learning in the Oracle Database and on Big Data clusters under Hadoop and Spark, on premises and in the Oracle Cloud. Within Product Management he develops product strategy, roadmap prioritization, product positioning and product evangelization, and he works closely with the engineering team to define the product roadmaps for Oracle Machine Learning and Big Data in the Cloud. Before joining Oracle 9 years ago, he spent 13 years at SAS Institute Inc. as a Data Mining architect and expert in the US and Latin America. He holds a Bachelor of Science degree in Statistics, with additional courses in the Master of Science in Statistics, both from UNICAMP in Brazil. He has certifications from Stanford on AI and Machine Learning, and from the University of Washington on Computational Neuroscience. He is an expert on Deep Learning and passionate about Machine Learning.
Mark Hornick, Senior Director, Product Management, Data Science and Machine Learning
Mark Hornick is the Senior Director of Product Management for the Oracle Machine Learning (OML) family of products. He leads the OML PM team and works closely with Product Development on product strategy, positioning, and evangelization. Mark has over 20 years of experience integrating and leveraging machine learning with Oracle technologies, working with internal and external customers to apply Oracle’s machine learning technologies in scalable and deployable data science projects. Mark is Oracle’s representative on the R Consortium’s Board of Directors, an Oracle Adviser and founding member of the Business Intelligence Warehousing and Analytics (BIWA) User Community, and Content Selection Committee Chair for the Analytics and Data Summits.