Skip to Main Content
  • Questions
  • Unload Javascript web page contents to a text file to load in database table

Breadcrumb

Question and Answer

Connor McDonald

Thanks for the question, Chris.

Asked: June 19, 2017 - 1:08 pm UTC

Last updated: June 30, 2017 - 4:06 am UTC

Version: 11.2.0.4.0

Viewed 1000+ times

You Asked

Hello Experts,

I have a requirement wherein I need to unload/export web page contents whose data is populated by javascript to a text/csv file; which in turn will be be loaded in the database table.

I am not sure if utl_http will get the job done, and I couldn't find any satisfactory answer hence my question to you experts.

Thanks,
Chris





and Connor said...

OK, take a look at these links from related questions

Reading html from a UTL_HTTP and storing it in a blob

https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:9533637800346631296

Parsing html output into usable data

https://asktom.oracle.com/pls/apex/f?p=100:11:::NO:RP:P11_QUESTION_ID:9532363700346931156

Combining the two should get you on the way. If the web page data is coming back as pure CSV, here's a routine to parse that

https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:9535071900346238234


Rating

  (5 ratings)

Is this answer out of date? If it is, please let us know via a Comment

Comments

Chris, June 20, 2017 - 10:31 pm UTC

Connor,

I understand you must be super busy especially after the weekend and yet your response was fairly quick, appreciate your timely response.

Before I get started one clarification, would I be needing some sort of javascript virtual machine(like Oracle Nashorn) if I wanted to retrieve html page and have the javascript run to pull the data into some sort of text or csv file onto my end or this is no longer needed as utl_http will do it

Thanks


Connor McDonald
June 21, 2017 - 1:22 am UTC

UTL_HTTP should be sufficient to retrieve the HTML, after which you can parse it / store it using standard PL/SQL.

Of course, that stores it on the database server.


Chris, June 23, 2017 - 11:59 pm UTC

Connor,

Not sure if my earlier post was updated as I don't see it at the bottom. One clarification as the webpage(https) data is populated dynamically through javascript, will I be needing some sort of javascript engine like(Nashorn) to pull the data or the below examples should suffice ?

Thanks,
Chris

Chris, June 24, 2017 - 2:09 am UTC

Connor,

Apologies for not looking before posting, as I see you already replied to my question. So I went ahead and tried as you mentioned, all I am getting is the html page along with the javascript code however not getting the csv data.

Please take a look and let me know what am I missing here

CREATE TABLE http_blob_test (
id NUMBER(10),
url VARCHAR2(255),
data BLOB,
CONSTRAINT http_blob_test_pk PRIMARY KEY (id)
);

CREATE SEQUENCE http_blob_test_seq;

CREATE OR REPLACE PROCEDURE load_binary_from_url (p_url IN VARCHAR2) AS
l_http_request UTL_HTTP.req;
l_http_response UTL_HTTP.resp;
l_blob BLOB;
l_raw RAW(32767);
BEGIN
-- Initialize the BLOB.
DBMS_LOB.createtemporary(l_blob, FALSE);

UTL_HTTP.set_wallet('file:link', 'pwd');

-- Make a HTTP request and get the response.
l_http_request := UTL_HTTP.begin_request(p_url);

l_http_response := UTL_HTTP.get_response(l_http_request);

-- Copy the response into the BLOB.
BEGIN
LOOP
UTL_HTTP.read_raw(l_http_response, l_raw, 32766);
DBMS_LOB.writeappend (l_blob, UTL_RAW.length(l_raw), l_raw);
END LOOP;
EXCEPTION
WHEN UTL_HTTP.end_of_body THEN
UTL_HTTP.end_response(l_http_response);
END;

-- Insert the data into the table.
INSERT INTO http_blob_test (id, url, data)
VALUES (http_blob_test_seq.NEXTVAL, p_url, l_blob);

-- Relase the resources associated with the temporary LOB.
DBMS_LOB.freetemporary(l_blob);
EXCEPTION
WHEN OTHERS THEN
UTL_HTTP.end_response(l_http_response);
DBMS_LOB.freetemporary(l_blob);
RAISE;
END load_binary_from_url;
/

exec load_binary_from_url(' https://marketchameleon.com/Reports/optionVolumeReport' )


I dumped the blob as file(csv) on my directory to verify however its all html and javascript code.

Thanks.


Connor McDonald
June 28, 2017 - 1:47 am UTC

Ah, I see what you mean now - you actually need to run the Javascript in order to get the data you want.

I'll ask around and see what I can find out.

Addenda:

The consensus from asking around inside the org seems to be - There is JavaScript in java in the database but it's a long ways from the js engine and env in a browser, so you're unlikely to have a lot of success.


Chris, June 29, 2017 - 1:03 am UTC

Connor,

Thank you for your efforts. Perhaps Oracle Nashorn could be of some use

http://www.oracle.com/technetwork/articles/java/jf14-nashorn-2126515.html

Cheers!

node.js

Jonathan Taylor, June 29, 2017 - 11:31 am UTC

I have never used this technique myself but perhaps an alternative is to use node.js - which can use an "headless" browser to both get and render the page.

Node.js would run on the server - so you would need to setup oracle to call an external O/S command, passing URL and a filename to deposit the results in.

https://stackoverflow.com/questions/28739098/how-can-i-scrape-pages-with-dynamic-content-using-node-js
http://noodlejs.com/
Connor McDonald
June 30, 2017 - 4:06 am UTC

Nice input.

More to Explore

CLOB/BLOB/etc

Complete documentation on Securefiles and Large Objects here