Oracle Machine Learning Office Hours

February 18, 2020

17:00 UTC

Oracle Machine Learning for Spark
We saw how Oracle Machine Learning for Spark (OML4Spark) offers interfaces for running machine learning algorithms on top of data lakes, using Spark to distribute computation across nodes. With the R language as the front end, it integrates with the Big Data ecosystem, allowing manipulation of tables in Hive and Impala, and it also integrates with HDFS and the Oracle Database.

OML4Spark makes the open source R scripting language and environment ready for the enterprise and big data. Designed for problems involving both large and small volumes of data, it integrates R with data lakes, allowing users to execute R commands and scripts for data processing, statistical analysis, and machine learning on Hive, Impala, and Spark DataFrame tables and views using R and Spark SQL syntax. Many familiar R functions are overloaded so that calls to them are translated into SQL and executed in the data lake.
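The idea of overloading familiar functions so that they emit SQL, instead of computing locally, can be illustrated with a small sketch. This is a conceptual illustration in Python only, not the OML4Spark R API: the `Column` and `TableProxy` classes, the `bike_demand` table, and its `temp` and `humidity` columns are all hypothetical names chosen for this example.

```python
class Column:
    """Hypothetical column proxy: operators build SQL text instead of computing values."""

    def __init__(self, sql):
        self.sql = sql

    def _binary(self, op, other):
        # Another Column contributes its SQL text; a plain number is rendered with repr().
        rhs = other.sql if isinstance(other, Column) else repr(other)
        return Column(f"({self.sql} {op} {rhs})")

    def __gt__(self, other):
        return self._binary(">", other)

    def __lt__(self, other):
        return self._binary("<", other)

    def __and__(self, other):
        return self._binary("AND", other)


class TableProxy:
    """Hypothetical stand-in for a remote table; no data is ever pulled to the client."""

    def __init__(self, name, where=None):
        self.name = name
        self.where = where

    def __getitem__(self, column_name):
        return Column(column_name)

    def filter(self, condition):
        # Return a new proxy that remembers the condition; nothing is executed here.
        return TableProxy(self.name, condition)

    def to_sql(self):
        sql = f"SELECT * FROM {self.name}"
        if self.where is not None:
            sql += f" WHERE {self.where.sql}"
        return sql


bikes = TableProxy("bike_demand")  # hypothetical table name
query = bikes.filter((bikes["temp"] > 30) & (bikes["humidity"] < 0.5))
print(query.to_sql())
# SELECT * FROM bike_demand WHERE ((temp > 30) AND (humidity < 0.5))
```

The user writes ordinary-looking comparisons, and the proxy objects accumulate a SQL string that an engine such as Hive, Impala, or Spark SQL could run where the data lives. A real implementation would also need to handle quoting, escaping, and type mapping, which this sketch omits.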

Oracle Machine Learning consists of complementary components supporting scalable machine learning algorithms for in-database and big data environments (including Cloud and on-premises), notebook technology, SQL, Python and R APIs, and Hadoop/Spark environments.

The slides used in the presentation can be found in the Resources section below.

Video highlights:
04:50 Introduction to Oracle Machine Learning for Spark
07:10 Oracle Machine Learning for Spark integration
09:56 OML4Spark R language API
11:40 OML4Spark performance benchmark
13:55 OML4Spark benefits for Spark MLlib users on R
17:20 Demo - Manipulating HDFS data
22:00 Demo - Manipulating Hive, Impala and Spark DataFrames
36:48 Demo - Using OML4Spark ML models to predict Bike Demand
43:45 Demo - OML4Spark Cross-Validation and Classification Model Selection
47:54 Demo - Benchmark of OML4Spark GLM Logistic on 100 million records
49:26 OML4Spark Roadmap
51:09 Q&A

Your Experts

    Marcos Arancibia

    Marcos Arancibia is a Senior Principal Product Manager in the Oracle Autonomous Database team. He is charged with developing a comprehensive platform that enables all customers and use cases to be successful on Autonomous Database, and with working with developers to bring their applications and workloads to it as well. Within Product Management he works on product strategy, roadmap prioritization, product positioning, and product evangelization, working closely with the engineering teams to define the product roadmap for Autonomous Database. He was previously a PM for Oracle Machine Learning, and before joining Oracle in 2010 he spent 13 years at SAS Institute Inc., in roles ranging from Country Manager in LAD to Regional Data Mining lead in the US. He holds a bachelor's degree in Statistics from UNICAMP in Brazil, where he also completed additional courses toward a master's degree in Statistics. He has certifications from Stanford on AI and Machine Learning, and from the University of Washington on Computational Neuroscience.
    Mark Hornick

    Mark Hornick is the Senior Director of Product Management for the Oracle Machine Learning (OML) family of products. He leads the OML PM team and works closely with Product Development on product strategy, positioning, and evangelization. Mark has over 20 years of experience with integrating and leveraging machine learning with Oracle technologies, and working with internal and external customers in applying Oracle’s machine learning technologies for scalable and deployable data science projects. Mark is Oracle’s representative to the R Consortium and an Oracle adviser and founding member of the Analytics and Data Oracle User Community. He holds a bachelor's degree from Rutgers University and a master's degree from Brown University, both in Computer Science.
