Oracle Machine Learning Office Hours

February 18

17:00 UTC


Oracle Machine Learning for Spark
We saw how Oracle Machine Learning for Spark offers interfaces to run machine learning algorithms on top of data lakes, using Spark to distribute computation across nodes. It integrates with the Big Data ecosystem, allowing manipulation of tables in Hive and Impala as well as integration with HDFS and Oracle Database, with the R language as the front end.

Oracle Machine Learning for Spark (OML4Spark) makes the open source R scripting language and environment ready for the enterprise and big data. Designed for problems involving both large and small volumes of data, it integrates R with data lakes, allowing users to execute R commands and scripts for data processing, statistical analysis, and machine learning on Hive, Impala, and Spark DataFrame tables and views using R and Spark SQL syntax. Many familiar R functions are overloaded so that calls to them are translated into SQL and executed in the data lake.
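
The idea of overloading familiar R functions so that they produce SQL can be illustrated with a small toy sketch in base R. This is not the OML4Spark API: the `lake_table` class, the `to_sql` helper, and the `bike_demand` table and `hour` column are hypothetical names chosen for illustration, and no connection to Hive, Impala, or Spark is made. The sketch only shows how S3 method dispatch lets `subset()` and `head()` build a query string instead of working on rows in local memory.

```r
# Toy illustration only: "lake_table" and "to_sql" are hypothetical names,
# not part of OML4Spark. No database or Spark connection is made.

# A proxy object that remembers a table name and an optional WHERE clause.
lake_table <- function(name, where = NULL) {
  structure(list(name = name, where = where), class = "lake_table")
}

# Render the SQL text that the proxy currently represents.
to_sql <- function(x, limit = NULL) {
  sql <- paste("SELECT * FROM", x$name)
  if (!is.null(x$where)) sql <- paste(sql, "WHERE", x$where)
  if (!is.null(limit)) sql <- paste(sql, "LIMIT", limit)
  sql
}

# Overload subset(): capture the unevaluated R condition and keep it as
# SQL text instead of filtering rows in local memory.
subset.lake_table <- function(x, subset, ...) {
  cond <- paste(deparse(substitute(subset)), collapse = " ")
  cond <- gsub("==", "=", cond, fixed = TRUE)  # R equality -> SQL equality
  lake_table(x$name, where = cond)
}

# Overload head(): return the LIMIT query rather than fetching rows.
head.lake_table <- function(x, n = 6L, ...) {
  to_sql(x, limit = n)
}

bikes <- lake_table("bike_demand")   # hypothetical table name
busy  <- subset(bikes, hour >= 17)   # hypothetical column name
query <- head(busy, 10)
print(query)
```

In a real system the generated statement would be sent to the SQL engine and only the result returned to R; here the string is simply printed, so the example shows the translation step alone.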

Oracle Machine Learning consists of complementary components supporting scalable machine learning algorithms for in-database and big data environments (including Cloud and on-premises), notebook technology, SQL, Python and R APIs, and Hadoop/Spark environments.

The slides used in the presentation can be found in the Resources section below.

Video highlights:
04:50 Introduction to Oracle Machine Learning for Spark
07:10 Oracle Machine Learning for Spark integration
09:56 OML4Spark R language API
11:40 OML4Spark performance benchmark
13:55 OML4Spark benefits for Spark MLlib users on R
17:20 Demo - Manipulating HDFS data
22:00 Demo - Manipulating HIVE, IMPALA and Spark DataFrames
36:48 Demo - Using OML4Spark ML models to predict Bike Demand
43:45 Demo - OML4Spark Cross-Validation and Classification Model Selection
47:54 Demo - Benchmark of OML4Spark GLM Logistic on 100 million records
49:26 OML4Spark Roadmap
51:09 Q&A

Your Experts

Marcos Arancibia, Product Manager, Data Science and Big Data
Marcos Arancibia is the Product Manager for Oracle Data Science and Big Data. He works with Machine Learning in the Oracle Database and on Big Data clusters under Hadoop and Spark, on premises and in the Oracle Cloud. He works within Product Management to develop product strategy, roadmap prioritization, product positioning and product evangelization, working closely with the engineering team to define the product roadmaps for Oracle Machine Learning and Big Data in the Cloud. Before joining Oracle 9 years ago, he was at SAS Institute Inc. for 13 years as a Data Mining architect and expert in the US and Latin America. He holds a Bachelor of Science degree in Statistics and completed additional coursework in the Master of Science in Statistics program, both at UNICAMP in Brazil. He has certifications from Stanford in AI and Machine Learning, and from the University of Washington in Computational Neuroscience. He is an expert on Deep Learning and passionate about Machine Learning.

Mark Hornick, Senior Director, Product Management, Data Science and Machine Learning
Mark Hornick is the Senior Director of Product Management for the Oracle Machine Learning (OML) family of products. He leads the OML PM team and works closely with Product Development on product strategy, positioning, and evangelization. Mark has over 20 years of experience integrating and leveraging machine learning with Oracle technologies, working with internal and external customers to apply Oracle’s machine learning technologies to scalable and deployable data science projects. Mark is Oracle’s representative on the R Consortium’s Board of Directors, an Oracle Adviser and founding member of the Business Intelligence Warehousing and Analytics (BIWA) User Community, and Content Selection Committee Chair for the Analytics and Data Summits.

All Sessions

September 8 2021 16:00:00 UTC - Oracle Machine Learning Office Hours
August 11 2021 16:00:00 UTC - Oracle Machine Learning Office Hours
June 29 2021 - Weekly Office Hours: OML on Autonomous Database - Ask & Learn
June 22 2021 - ML Concepts - Encoding of Categorical Attributes: OneHot vs Mean vs WoE and when to use them
June 15 2021 - OML usage highlight: Machine Learning Recommendations for Maintenance and Repair
May 25 2021 - Hands-On Lab using Oracle Machine Learning AutoML UI on Autonomous Database
May 18 2021 - Hands-On Lab using Oracle Machine Learning Services on Autonomous Database
May 11 2021 - OML usage highlight: Oracle Process Automation with Real-time OML Services scoring
April 20 2021 - OML usage highlight: Oracle Stream Analytics with Real-time OML Services scoring
April 13 2021 - OML usage highlight: Making Oracle Digital Assistant smarter with OML Services
March 30 2021 - OML feature highlight: OML AutoML UI for Automated Model Building
March 23 2021 - Weekly Office Hours: OML on Autonomous Database - Ask & Learn
March 11 2021 - OML feature highlight: OML Services on Autonomous for Model Deployment
March 2 2021 - Weekly Office Hours: OML on Autonomous Database - Ask & Learn
February 23 2021 - Hands-On Lab using Oracle Machine Learning for Python on Autonomous Database
February 18 2021 - Machine Learning 102 - Feature Extraction
December 17 2020 - Machine Learning 101: Feature Extraction
November 5 2020 - Machine Learning 102: Clustering
October 28 2020 - Oracle Machine Learning for R: An Introduction
September 29 2020 - Machine Learning 101: Clustering