

Question and Answer

Connor McDonald

Thanks for the question, Steve.

Asked: July 10, 2008 - 4:41 pm UTC

Last updated: March 29, 2016 - 5:49 am UTC

Version: Oracle 9.2

Viewed 50K+ times!

You Asked

Hi Tom, if possible could you please explain the advantages and disadvantages of storing files within an Oracle database. We are currently using 9.2 and 10g. We have no immediate plans to move to 11g. Thanks, we all appreciate your answers!

and Tom said...

I know of no advantages to storing data I want to keep for a long time outside of a database.

If it is in the database I can

be sure it is professionally managed

backed up

recoverable (with the rest of the data)

secured

scalable (try putting 100,000 documents in a single directory; now put them in a table - which one 'scales'? it is not the directory)

I can undelete (flashback) easily

I have locking

I have read consistency


I honestly cannot think of an advantage to storing anything of importance in the filesystem.
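To make that concrete, here is a minimal sketch of what "files as data" looks like; the table, column names, and LOB options are purely illustrative, not a prescription:

create table documents
( doc_id    number primary key,
  file_name varchar2(400) not null,
  mime_type varchar2(100),
  uploaded  date default sysdate,
  content   blob
)
lob ( content ) store as ( cache );  -- cache frequently read documents in the buffer cache

-- "saving a file" becomes a transaction: atomic, logged, recoverable
insert into documents ( doc_id, file_name, mime_type, content )
values ( 1, 'report.pdf', 'application/pdf', empty_blob() );
commit;

Once a document is a row, everything listed above - locking, read consistency, flashback, backup - applies to it with no extra code.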


Comments

What about the disk storage amount?

Amin Adatia, July 10, 2008 - 10:37 pm UTC

I am trying to determine the actual space taken up by the files in each of the environments. If anyone else has that information I would be grateful. At my project, the System Administrators always hand out the low-performing disks for file storage. So there is an immediate hit on performance if you do anything other than read the file one time, extract the "text", and save it in a CLOB -- which defeats the whole purpose of not putting the files in the database table.
Tom Kyte
July 11, 2008 - 8:12 am UTC

if you store them in the database, they are not files anymore - they are data.

So the comparison is meaningless - they become important data in the database, in the file system - they are just boring files.

I'd be amazed though if you had disks of such differing performance that you actually could tell they were on the "low performing" disks unless you were a really high end sort of system with thousands of transactions/second or minute against these files. And if you were - that is all the more reason to put them in the database so they are backed up, recoverable, secured, audited, etc etc etc (eg: they must be important)

depends entirely on your situation

Ben, July 11, 2008 - 12:14 pm UTC

I find that a mix of file system & database storage works best.

For retaining images (high end photos, x-rays, etc), MS Office documents for general use, and other very large files I recommend file system. Some imaging systems can add multiple GB of storage daily. Do you really want your db archives to grow like this? There are also frequently issues with getting 3rd party software to render documents properly. We use a server based control for rendering tiff & jpg through a webpage and it can not use a db based repository. Not to mention all of the archiving/retention products out there that work with file level storage (except for Oracle's products, of course...)

For central file storage, when the files themselves will be relatively small but you will have a lot of them, a database works well. It does scale well - until the users start trying to upload 300MB pdf files & then the archive dest fills up every couple of hours...
Tom Kyte
July 11, 2008 - 1:32 pm UTC

yes, you do really want your database archives to grow like this.

Do I want you to keep MY medical stuff in a directory - unsecured, unmanaged, not backed up like the database is? We have no idea if recovery is possible, who accessed it last, or who has access to it.

The document is stored on a server, the document gets to a client for display. Documents in the database can actually appear as a network share on a client - just like a file, looks like a file, behaves like a file - but we have everything - everything the database offers as far as data features.


The only use I have for files is to give them to the database to store important stuff.

data on network shares

Donat, July 14, 2008 - 5:27 am UTC

> Documents in the database can actually appear
> as a network share on a client - just like a file,
> looks like a file, behaves like a file - but we
> have everything - everything the database offers
> as far as data features.

Tom, could you please elaborate on how this can be achieved?

Thanking you in anticipation and best regards,
Donat Callens

To Donat - Re access files in DB via network share

Kim Berg Hansen, July 15, 2008 - 5:09 am UTC

Regarding accessing files in DB via network share.

One way is using WEBDAV. For example we have a system where we upload files to the XML repository in Oracle - here we can add XML metadata to the files (tags and other info so we can sort of "attach" the files to records in our relational data.) The XML repository can then be published via WEBDAV and the files accessed via HTTP, FTP or network shares. Or you can install the ODrive client, which gives you a more "advanced" network share for collaborating on the files.

That way (using the XML repository in Oracle) works fine - to integrate it into our application we have had to code a bit ourselves. Another more "full-fledged" solution would have been to get Oracle Content Services, which (as far as I know) provides most of your needs for files in the DB "out of the box." :-)
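As a hedged illustration of how a file can land in the XML DB repository programmatically (the path and content here are invented; DBMS_XDB.createResource is the documented entry point, and XML DB must be installed):

declare
  l_ok boolean;
begin
  -- create /public/docs/hello.txt in the XML DB repository;
  -- once committed, it is reachable via HTTP, FTP and WebDAV
  l_ok := dbms_xdb.createResource(
            abspath => '/public/docs/hello.txt',
            data    => 'Hello from inside the database' );
  commit;
end;
/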

file system

sam, July 15, 2008 - 5:09 pm UTC

Tom:

1. Do you agree that most IT managers or architects prefer to store files on the filesystem? I guess because they are afraid of the DB, or they treat it as a black box or data dump.

2. If I have my files in the DB, can users see them like on a Windows share drive (drive/directory tree), or do you have to create an application that displays that for them?

3. What did you mean by scalable? Try putting 100,000 documents in one directory versus one table.

4. Isn't there one disadvantage: uploading the files from CD or disk into Oracle can take considerable time compared to a copy to a Unix or Windows filesystem?

thanks,
Tom Kyte
July 15, 2008 - 8:15 pm UTC

1) they are therefore not architects if they think of the database as a black box.

Those that value their data, the security of their data, the availability of their data store their data in a database.

2) see above, we just talked about that.

3) go ahead, put 100,000 documents into a directory and then start to try to open the last one. See how long it takes. Now triple that, and again and again. File System Directories are not made, not designed to hold thousands of files, they do not scale. You would have to find some magic way to distribute them logically over many hundreds/thousands of directories yourself and pray that you can find them later when you need to.

4) No, there isn't. You still have to UPLOAD the file regardless of where you store it, we can write lobs into the buffer cache if you want.

Another disadvantage of the file system - you would upload the file to the application server - which people tend to run N of instead of just one. Therefore, you have to cross-mount all of the file systems to every application server - yet another single point of failure, contention, whatever.

filesystem

A reader, July 15, 2008 - 9:04 pm UTC

Tom:

I did not understand your last comment. Can you explain what you mean? What do you mean by "run N"? Do you mean when 10 people want to download a file from the web server file system?


Tom Kyte
July 16, 2008 - 10:00 am UTC

... the file to the application server - which people tend to run N of instead of just one. ....


run N application servers.


people tend to run more than one application server.

Oracle File system outperforms windows file system

Balaji Chellappa, July 15, 2008 - 10:05 pm UTC

The entire argument below is based on keeping/maintaining the files in Oracle 11g XDB.

I was under the impression that keeping the files inside Oracle was no better than keeping them in a Windows share. But that is not true. It is just a mindset, and it takes some time to come out of it. Think about it: if you access a file through Windows, it checks your permissions and then makes the file available to you. In Oracle the file is stored in the same kind of chunks, only inside the tablespace datafiles. When you access a file through Oracle, it does the same thing: it goes to the physical location where the file is stored and gives it to you. I tell myself (it may or may not be true, but it makes me more comfortable with the Oracle file system) that Oracle handles files in a special way. It doesn't cache big files in memory like regular tables before making them available to the user; rather, as soon as it sees that the user requested the file, it reads it from where it is physically stored in the tablespace and hands it over, while obviously doing smart things like caching the physical location and checking whether the current user has access to it.

Actually I am working on a project migrating our sales force application from an Oracle 10g R2/Apache web server/Apex 2.0 environment to Oracle 11g/XDB-WebDAV/Apex 3.1. Our application is distributed across 350 laptops out in the field, synchronizing with the home office database every day. We have run it successfully for the past 3 years. We had numerous issues while copying resource files (images, css, pdf, etc.) to the local machine from a centralized Windows share. Especially when a user connected to the home office through a telephone-line VPN connection, the file copy process often failed due to the unstable connection, and there was no way to know whether the process had succeeded or not. Also, Windows does numerous security checks before it allows the user access, and it was slow. Hence we decided to give Oracle 11g XDB/WebDAV a shot as a replacement for Apache: keep everything in the database and do away with the Windows file share/copy.

Our original file copy process (we have around 1,100 files, most of them Apex images of 1-50 KB, some pdfs, xls, etc., about 100 MB total) from the Windows share to the local machine took about 3 minutes - and that was with laptops connected through the local network (LAN).

Then I stored those files as BLOBs in a physical table in a centralized database and used database links to read the BLOBs and copy them to Oracle WebDAV. Believe it or not, it took less than a minute to copy everything. And it is not just speed: it is more reliable, and I am 200% sure the process went through fine if my commit succeeds.

Also, you can make the Oracle file system look like a regular Windows Explorer file system using WebDAV (copying entire folders between WebDAV and Windows, pasting, and deleting all work like regular Windows Explorer operations). Moreover, FTP can also be used to access the Oracle file system.

So far I have talked about the advantages. The only disadvantage I am seeing right now is file editing: I don't have any tools that let me work directly (open/edit/save) on the WebDAV files. There might be some tools available on the web; I have yet to google for them.

PS 1: I had a lot of issues while copying the BLOBs to WebDAV because of the new security features (ACLs) introduced in Oracle 11g. It took a good amount of time to understand and make it work. I did everything by myself and none of the forums talk about it. I will share the details when I get some time.

PS 2: We are considering a further process improvement: compressing the files using utl_compress.lz_compress while storing them in the centralized database, and uncompressing them on the local notebook after downloading from the centralized DB server.
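As a rough sketch of that compression idea (the documents table here is hypothetical; utl_compress works on BLOBs and returns a new LOB):

declare
  l_raw        blob;
  l_compressed blob;
begin
  select content into l_raw from documents where doc_id = 1;

  -- shrink the payload before shipping it over the slow link...
  l_compressed := utl_compress.lz_compress( src => l_raw );

  -- ...and reverse it on the laptop after the download
  l_raw := utl_compress.lz_uncompress( src => l_compressed );
end;
/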

filesystem

A reader, July 16, 2008 - 5:15 pm UTC

Tom:

1. How is mounting a filesystem on N application servers worse than storing files in Oracle? Would they mount one filesystem from one app server on all the other app servers?

2. I like your ideas about storing files in the DB. In order to sell that, I need to get more educated on the benefits, which you listed briefly. Is there a book or article that goes into more depth on the benefits?

3. I have a system now where a vendor uploads a bunch of electronic files for a book (3 GB) to a remote server A. Then a bunch of Perl programs run some tests, and if they pass, the files move across the network from server A to server B for production control. There they do some QA checks and then move the files to server C for archival and server D for the website.

Each book is about 10 files (mp3 and xml), but there will be thousands of books on the filesystem in directories and subdirectories.

a) How would you design this in Oracle? The way I am thinking, I can have one database or three databases (instead of servers) on one machine and let the vendor upload files to one table. Then I can flag the book in that table, or copy the record to another table for production control, etc.

So instead of having several servers/filesystems, I have one database and several tables. Staff can easily query info or access the files for any book and process them.

b) Would I run into size problems here if the table became 100 TB or whatever? Would it create a 100% dependency on a full-time DBA (one of the issues)?

c) How can some external program like Perl run tests on files stored inside Oracle?

A reader, July 17, 2008 - 2:00 am UTC

Is it not recommended now to use CLOB and BLOB to store data in a table, and to use only the Oracle XML DB Repository for better performance?

If we only need to upload scanned data and don't need to update it, should we then use CLOB and BLOB?
Tom Kyte
July 17, 2008 - 11:45 am UTC

where did you see that recommendation?

as with anything, the only reasonable answer is "it depends, what do you want to do, what features/functions do you need - then we can tell you the best way to do it"

Differences in XDB's handling of LOB and XML data

Max, July 18, 2008 - 8:27 am UTC

As far as I know one can't have LOB data "sent" to XDB *AND* get it stored in *user-defined* tables by XDB (without any additional coding) -- as opposed to what can be achieved with XDB's processing of XML data (by registering XML schemas to define mappings of XML elements to user-defined tables).

Or is there any way to "move" LOB data to LOB columns of user-defined tables instead of XDB tables like that...?

LOBs and performance

Stew Ashton, July 29, 2008 - 1:25 pm UTC


Tom, I accept (and repeat in my company) all your arguments for storing files in the database; however, developers and software companies generally object that File System access is much faster than access to LOBs in the database.

Oracle has published proof that this is true for versions prior to 11g:
http://www.oracle.com/technology/products/database/securefiles/pdf/securefiles_oow2007.pdf
On slide 10 they show that with SecureFiles, LOB access is about as fast as File System access, while on slide 11 they show that SecureFiles access is about 3 times faster than pre-11g LOB access. It is easy to deduce from this that pre-11g LOB access is about 3 times slower than File System access.

I gather from this that for 11g, storing files in the database has huge advantages and no drawbacks, whereas prior to 11g it has great advantages and one drawback.

By the way, I wonder how many files per directory they had in their File System test?
Tom Kyte
August 01, 2008 - 9:54 am UTC

and i wonder if they enabled any sort of caching for the lobs myself. OS file system writes/reads - buffered. LOB writes and reads, by default, not buffered.

buffer it and you'd see a huge difference

buffered?

ap, August 09, 2008 - 11:57 pm UTC

Tom

Could you elaborate on your last comment?

"And i wonder if they enabled any sort of caching for the lobs myself. OS file system writes/reads - buffered. LOB writes and reads, by default, not buffered. Buffer it and you'd see a huge difference"

How would the results from the Oracle performance paper change?
Tom Kyte
August 12, 2008 - 4:38 am UTC

if you write a lob, right now - by default, that is not done into the buffer cache, it is done as an IO to disk that must complete before the application gets control back.

if you go to read that same lob, right now - by default, that is a physical IO to disk - no buffer cache.

The file system typically buffers; LOBs, by default, do not.
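For what it's worth, turning LOB caching on is a one-line change; a sketch, assuming an illustrative documents table with a content BLOB column:

-- read/write caching: lob blocks flow through the buffer cache
alter table documents modify lob ( content ) ( cache );

-- or, to cache reads but write directly to disk
alter table documents modify lob ( content ) ( cache reads );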


ap

ap, August 12, 2008 - 11:20 am UTC

Tom

If I understand you correctly, are you saying that the actual performance of SecureFiles compared to a filesystem is even better than what is mentioned in the Oracle paper, given that their results were based on a buffered filesystem versus unbuffered Oracle?

Also, given that they used a SAN in their test, I would assume it was set up with a write-back cache (which is the default). In this case, both types of tests would be sort of buffered, right?

filesystem

A reader, August 14, 2008 - 11:16 pm UTC

Tom:

It seems there are two camps on how to store files:

http://en.wikibooks.org/wiki/Programming:WebObjects/Web_Applications/Development/Database_vs_Filesystem

The biggest arguments for using filesystems are:

a. faster performance
b. cheaper disk space (than hiring a DBA)

Are you saying I can store 100,000 files, or one million, in one table and my performance will never change?

What if you build a nested directory structure and spread those 100,000 files across it? Would that let the filesystem keep yielding good performance?
Tom Kyte
August 20, 2008 - 8:25 am UTC

Think about this...

cheaper is better right? is it? what happens when you have the data not being managed and something bad happens. How do you secure, manage, backup, recover, protect this data in your file systems?


I'm saying you can store billions of documents in a single table and I would expect the access time to be approximately the same to get to a single document as I would if there were one row in that table.

With a table, you have the ability to scale up like that.

Now with the file system you have to write code, lots and lots of code to spread the files out - try to remember where you put them - adding directories left and right as the documents arrive. And you have no point in time recovery, you have no rollback, you have no commit, you have no real ability to secure the files, you have no audit trail really to tell you what has been done to them, etc etc etc.


Databases were 'born' to protect data - that is what they do, they are called database Management systems - they were born to manage data.


If you value these documents, if you want them around for a long long time, you'll protect them.

no such thing as a perfect solution

Ben, August 25, 2008 - 4:38 pm UTC

Everyone will have an opinion on this; on Oracle forums I would expect them to lean one way, and on other vendors' (non-DB) forums I'd expect them to lean the other way. Storing files in the database does have all of the advantages of storing any other data in a database.

That said, I can sum up why not to in two words: other vendors. Archiving systems, image manipulation, etc., etc., etc. are all written first to work with a file system & maybe second to work with a DB.

Yes, webdav is good; yes, it can give a user the experience of a file system. User experience is _not_ system experience, & sometimes it just won't work - the only way to find out for certain is to try it.

If you can make your complete system work with storing the files in the DB, you're better off in the long run. But be warned that it could influence future software choices.

"If it is in the database I can"
be sure it is professionally managed" : ??? are you slightly systems admins saying DBAs are better?
"backed up" : been doing this with file systems for decades before databases existed. Pretty sure it can be done still
"recoverable (with the rest of the data)" : see above
"secured" : again, see above
"scalable" : you need a caveat. I frequently here this from DB people. I must chime in to say this is in fact absolutly FALSE. It is not the file system that gets slower, but rather the interface that bogs down. For users this is a file list from a shell, command prompt, Windows Explorer, or whatever interface they are using. Modern file systems operate at the same speed if 10 files, 100,000 files or a million. It's all about how you interact with it.
"undelete (flasback) easily)" : 1) not all systems can allow delete to exist. 2) again, its' all about how you interact with it, but generally yes, the db recovers faster.
"locking" :
"read consistency" :

We store millions of images in the file system, with a max of 5,000 per directory, and store file path references in our database. The only reason we keep it to 5,000 per folder is so that, given the average image size, we can archive a directory to an optical archive system. We use a 3rd party rendering server to serve up any file format to a web client, and it must use a file system. We have successfully used this setup with over 10,000 files in a folder (average size 200KB) and actually get FASTER times than for the 3KB text files we store in our database (10gR1). The DB response isn't slow; the file system is just faster. If we were to put 100,000 files in the same directory, the response time wouldn't change for the system (from the user's perspective).

Due to regulations, no file ever imaged is allowed to be deleted. This is easily enforced with file permissions, confirmed with 3rd party change management software, and logged & audited by other 3rd party management software. Yes, I know you can log & audit et al. in a db, but the key here is that the 3rd party change management system is enterprise wide, monitoring shares, LDAP/Active Directory servers, system configs, yada, yada. If it were stored in the DB, that would be the limit of the auditing. Here we can store the file once, use it in many systems, & audit it with the same tool we use to audit non-database information.



Tom Kyte
August 26, 2008 - 9:06 pm UTC

... are you slighting systems admins
saying DBAs are better?
...

yes, 100%. DBA's manage data, they care about data, they understand the need to perform a consistent point in time recovery, they know about auditing, version control and many other things that frankly - SA's don't think about. This is about data, not about getting eth0 plumbed correctly.

... been doing this with file systems for decades before databases
existed. Pretty sure it can be done still
....

see above, there is something more than just backing up in data management - especially when it comes to "files". Non-repudiation, auditing, securing, version control, yadda, yadda, yadda - you know, the features you "buy" in those version control things that are really trying to be - get this - a database on top of the file system! Yes, that is what they actually do, they are *a database*....



... It is not the file system
that gets slower, but rather the interface that bogs down....

hhhmmmmmmm

interesting

so you have put 1,000,000 files in a directory?

... with a max of 5,000 per directory and store file path references in our database. ...

oh, no, you don't. You know it is a "bad idea"



It is true, YOU can write a lot of software to do a lot of things.

Or, you can use what you paid for already.

You are correct, and it is entirely up to you. Me, I prefer to .... not write code, as often as possible. I am a bad coder (we all are), I prefer to use what I can that has come before and is used heavily by others.

database

A reader, August 26, 2008 - 10:30 pm UTC

Tom:

if you call yourself a "bad" coder, I do not know who is good in this world.

Is there a way, when you use a system, to tell whether a file is stored in the DB or the filesystem?

I noticed that my bank stores all the previous monthly statements (PDF) in the web app. I am just wondering if it is in DB.

Do you know if Facebook or YouTube store all those files in the DB?

I think the majority of applications out there store files on the filesystem due to the lack of APIs and the slowness of LOB access.

However, I think the trend is starting to reverse and many enterprises are storing unstructured data in the DB. But this all depends on what database they use. I do not think that if you use Sybase, Informix, MySQL, DB2, or SQL Server you would want to store files in the DB.

I think Oracle is "years" ahead of all those players in this area.
Tom Kyte
August 27, 2008 - 8:22 am UTC

we are all bad coders, we get lucky sometimes.

... Is there a way, when you use a system, to tell whether a file is stored in the DB or the filesystem?
...

depends on how it was set up; you can make it so that you cannot tell whether the file is in a file system or the database. I can mount my "file share" as a network disk, ftp in and out of it, or use a web interface to down/upload a file - among many other access methods. And all of the files are in.... the database.


... I noticed that my bank stores all the previous monthly statements (PDF) in the
web app. I am just wondering if it is in DB.
...

probably not - and that is probably OK. Many times the pdf's are generated on the fly (you know, there are no pages stored for asktom, the page you see is generated every time you access it). And even if not, they are generated and can be easily regenerated from the source data that IS in the database - then they expire in 12-18 months.


try this

Ben, August 27, 2008 - 10:08 am UTC

Tom,
Databases have their place in the enterprise, but they are not the enterprise as you seem to suggest.

Scaling : File system vs Database
Please try this experiment and determine for yourself whether a file system scales:
Create a script/application that creates 100,000 files named fileXXX.txt in a single folder (any OS).
Store all of these files in a database as well, by your preferred method.
Do a directory listing via the filesystem.
Do a full table scan in the database to get a list of all of the filenames.
How do the times relate? I'll bet the DB is faster, but this will come down to setup.
Now: open a file from the database & open a file from the file system. Does the one from the file system open any slower than when you are storing only one file? (A sketch of such a setup follows below.)
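Purely as a sketch, both halves of that experiment could even be driven from PL/SQL (the directory object, table, and payloads are made up; the filesystem half could equally be a shell loop):

create directory test_dir as '/tmp/filetest';  -- needs CREATE ANY DIRECTORY privilege
create table file_test ( fname varchar2(100), content varchar2(100) );

declare
  l_file utl_file.file_type;
begin
  for i in 1 .. 100000 loop
    -- one copy of the payload as a real file in a single folder...
    l_file := utl_file.fopen( 'TEST_DIR', 'file' || i || '.txt', 'w' );
    utl_file.put_line( l_file, 'payload ' || i );
    utl_file.fclose( l_file );

    -- ...and the same payload as a row
    insert into file_test ( fname, content )
    values ( 'file' || i || '.txt', 'payload ' || i );
  end loop;
  commit;
end;
/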

now create a database, turn on archiving & flashback (because we're good little DBAs) and load 16TB of high-resolution medical images into the database. How much storage do you need? How long does it take? Back up the database any way you like. Put it in the file system on a SAN LUN & use a mirror tech to copy. Which backup is faster? Thought so.

You can purchase a lot more software already battle-tested than you can write, & it all works with files stored in a file system. Also, you can scale up file system capacity far more easily than you can a DB (NAS, SAN, NFS, etc.)

FYI - System admins are not the high-school MS-certified lackeys sent to worry about "eth0". That's helpdesk. A System Admin has in-depth knowledge of all applications that run on a system, & this MUST include any Oracle application (RDBMS, OAS, anything) on the host. A DBA is just a limited subset of a System Admin.

I am a bad coder (we all are) : no, we are not all bad coders. When you go to a hospital & they hook up all kinds of diagnostic equipment, do you think the software on that equipment has tons of bugs? There is a BIG difference between a Software Engineer & people who code.
Tom Kyte
August 28, 2008 - 9:01 am UTC

... but they are not the enterprise
as you seem to suggest.
...

interesting, I agree they are not "*THE*" enterprise, just one of the biggest components of it. Erase my applications and I'm slowed down. Erase my data and I'm *out of business*.


so, why did you limit your implementation to 5,000? Why do that?


Your backups are not even remotely equivalent to our backups and you know what? I'd be using partitioning and disk based backup and backing up a fraction of the database - so, mine would be much faster - most of the image data would over time become read only, we'd ignore it entirely. And with block change tracking and the ability to catch our backups up - using incrementals only - I'd always have a full database backup ready to recover with and one that only took a short period of time to create.


we are all bad coders; some of us have to pass more hurdles than others - and yes, that diagnostic stuff has lots of hurdles. I'll bet your imaging storage and retrieval system did not. (There are many cases of machines overdosing patients and doing 'bad' things because of software glitches. Heck, there is even a virus on the space station right now - that software was written by some engineers, not just coders.)


I prefer, whenever possible, to not reinvent the wheel - if the product I already purchase has the feature set I need, I'll use it. Backup, recovery, flashing back, partitioning, scaling to billions of rows in a single table (directory), etc... I'm there.

filesystem

A reader, August 27, 2008 - 12:13 pm UTC

Ben:

Most system admins I have seen do not know much about the applications running on the machine.
All they know is Linux, Unix, or Windows stuff. Those who do know more are very few.

Regarding loading 16 TB into the DB: what difference would it make from a storage perspective?
You still need 16 TB on your filesystem to hold it.

Concerning backup, I am no expert. But don't you need to back up the 16 TB from the filesystem too?
Is it much faster to back up 16 TB from the filesystem than to back up the DB?

I think the question is which database you are testing with. Are you using the 11g LOB or the new SecureFiles storage?

"Software Engineering"

S, August 27, 2008 - 4:33 pm UTC

code

rc, August 27, 2008 - 5:18 pm UTC

>>I am a bad coder (we all are), I prefer to use what I can that has come before and is used heavily by others.

That is why I use Windows instead of Linux. Windows is more heavily used.
Tom Kyte
August 29, 2008 - 1:16 pm UTC

ahh, but windows as a server platform is not necessarily more heavily used.

And the comparison would be between:

a) an OS you write
b) versus an off-the-shelf OS

I know which one I would almost certainly choose, and it would not be a) in the year 2008.

Oleksandr Alesinskyy, August 28, 2008 - 7:27 am UTC

Really, in most cases storing files in the database is not a very wise solution. The declared advantages are mostly imaginary. Backup? There are tons of products that perform backups of the filesystem (and often database backups are written to external devices by means of those filesystem backup products).

Flashback? Version control systems are a viable (and in many cases preferable) alternative.

Presentation of data in the database as files to the client? Yes, attempts have been made, but not very successful ones. What about OFS now? Is it dead?

Performance? In most cases write/update performance will be an order of magnitude higher for the FS than for the DB. And BTW, most OSes support several filesystems, so the optimal FS may be selected for a task. And migration from one FS to another is much simpler than from one DB to another.

And so on, and so on. A DB is optimal for storing and processing data with a structure known to the DB, when you can leverage the full power of SQL. But for "black-box" data... things are different.
Tom Kyte
August 29, 2008 - 4:14 pm UTC

"mostly imaginable" huh. interesting.

The problem isn't "could it be done in a file system", sure with lots of work - it absolutely could be.

Backing up a database - large databases - is well known, done all of the time; bread and butter for most DBAs, it is what they do. Data management - it is what they do.

Enter the file system and - well, it is really not being managed by the same mind set, the same group of people.

Flashback is for short-term "whoops" operations - version control systems do not necessarily lend themselves to that - version control would still have to be built into the file itself. OFS exists - superseded, but it exists - same underpinnings, different upper-layer infrastructure - so to say "it is gone or not successful" would be somewhat misleading.


I prefer the security, the backup, the recoverability, the concurrency, the openness of the database - the database makes it look like a file system - and if you start getting into version control systems - you are not talking about a filesystem anymore.


for ANY data that is relevant to you, the database is the right place to store, organize, index, retrieve, search, version, backup, recover - whatever.

filesystem

A reader, August 28, 2008 - 8:25 am UTC

Are not XML documents text files? You can use XQuery just like SQL with XML documents in the DB. Can you do that if the XML file is on the filesystem?

What about security? Can you secure your filesystem like a DB?

You are also counting on your file pointers in the DB always being in sync with the filesystem, because if they are not, you will have issues that won't occur if the file is in a row/table.
Tom Kyte
August 29, 2008 - 10:15 pm UTC

besides - people do seem to enjoy setting up many application servers - now you have to create a distributed file system - available to all servers and mediate shared access and ....

RE

Ben, August 28, 2008 - 8:26 am UTC

A Reader :
This topic is a little off the initial subject, so I'd understand it being bumped.

The fastest backup you can get will be a SAN to SAN FC mirror copy, which duplicates the entire disk while it is live. The key is recovery.

If it is hosting database files, you now have a copy of one or more datafiles, but not necessarily in the form of a transportable tablespace unless you've done some other work. Which process do you use to recover?
There are plenty of excellent forum/Metalink articles that go over what to do with datafiles for recovery if you don't have a TT, but suffice to say you need a better than average DBA.

If it is hosting native file system files, how do you recover?
Here it is easy. Just dynamically add the mirror copy to any SAN-attached host, recover the file(s) you need & you're done.

An alternative that is becoming very popular is never backing it up (collective gasp here). These systems use Content Addressed Storage and multi-node duplication/sync to ensure that as every document is written, it is written to multiple disks & synced off site. If a node goes down, just put in a new node "off the shelf" & configure it to replicate from your off-site node(s). Properly configured, you can lose entire sites without any loss of data & rebuild in minutes (and then it resyncs automatically). EMC's DiskExtender/Centera system is an example.

The best storage solution depends entirely on the knowledge of your staff & the amount of money you've got.

As to the systems admins who know only the OS, please, please, please call them something else (I call them Bobs - think Office Space). The word system implicitly defines an entire process, which should NOT refer to an isolated host, but rather all physical hosts, their OSes, and ALL software in the stack.

The major point I keep trying to make is that under most circumstances I agree with Tom & you'd be better off storing files in the db, but unlike Tom I can see plenty of situations where storing in the database would be a poor choice. Which fits best can only be answered by taking the entire system into account.

S: What is the % fail on medical equipment vs business code? I'll bet medical equipment software is of higher quality.

rc: there's more quality code for POSIX OS's than for Windows, so I must disagree. The choice of OS should also be situational.

Oleksandr Alesinskyy : Flashback is a recovery tool, not a versioning tool. Where do version control tools store data? Many use a database. I've used some that run on an Oracle DB that happens to have Flashback enabled.
Presentation of the data is only part of the justification for how to store it. How searchable is the data? Legal discovery laws are being passed in many US states & other countries. Sometimes the lawmakers are clueless about anything electronic (such as the new "anti-hacker" law in Germany effectively killing the security research industry there). Know your laws!!!
Performance should be taken into account, but from a user acceptance level, not just the fastest read. Does everyone need a Formula One race car to get to work?

some advantages with filesystems

Peter Köhler / Awaro GmbH, August 29, 2008 - 5:10 am UTC

Hi Tom,

some problems are better solved with filesystems:

1.: If you download files from the filesystem, no database call has to be made. With big files and clients having really slow or unreliable internet connections (we have clients in Africa and the Middle East!), this means that no expensive database connection is occupied over many minutes.

2.: Architecturally it seems strange to me that the view or the controller should deal with database connections. Those belong in the service layer.

Just my two cents.

Greetings from Frankfurt, Germany

Peter Köhler
Tom Kyte
August 29, 2008 - 11:25 pm UTC

1) hmm, I wonder how all of our files in Oracle are available then. You make it sound like the tens of terabytes of data we have stored in our single network file server would be almost always unavailable?

And we have developers / sales people / support / whatever in almost every country, including the two geographies you mention.

We must be totally hosed, we never get our files....

2) eh? you just lost me entirely; with the first point you lost me mostly, with the second one - entirely. Who said anything about a database connection? We can ftp, smb, nfs, webdav, whatever. Under the covers, way under the covers, there be a database.

MVC

rc, August 29, 2008 - 5:42 am UTC

@Peter Köhler

>> "2.: Architecturally it seems strange to me that the view or the controller should deal with
database connections. They belong to the service layer."

You mean the model-view-controller design pattern? Nobody in this thread has said that the view or the controller should connect to the database (correct me if I'm wrong). So I don't understand what you want to say.

In MVC the model should connect to a service and this service connects to the database.



@rc

Peter Köhler / Awaro GmbH, August 29, 2008 - 7:11 am UTC

Sorry, I was not clear enough.
What I meant was that if you download a file which is in the database, then you have to deal with the HTTP request, which should only be present in the view or controller layer of a web application, AND you need to deal with a database connection (for the file).
In Java this is often hidden by some strange proxy objects, but to me this has a bad taste.

Greetings from Frankfurt / Germany

Peter Köhler


Tom Kyte
August 29, 2008 - 11:26 pm UTC

not correct.

layers

rc, August 29, 2008 - 9:54 am UTC

I think there should be one layer for data access. Data access can be accessing a database, files on the file system or calling a web service. Data is data, the other layers should not know the origin of the data.

If you change your mind and want to store the files in the database instead of the file system you should only have to change one layer in your code.

Maybe that is what you mean?
Tom Kyte
August 30, 2008 - 9:31 am UTC

well, if you change the data layer, you are changing TWO layers in your code - no?


and if you decided for whatever reason to put the files in the file system, you can still access them from the database without really changing the "data access" layer.


Regardless of where you put your "data access code", if you remove or replace the entire data layer (go from database to file system, from file system to database) you are going to reimplement that *data access layer* - regardless.

I prefer to put my data access layer in stored procedures because DATA goes into a DATA BASE.

RE

Ben, August 29, 2008 - 10:10 am UTC

A reader,
"Regarding loading 16 TB into DB. What difference would it make from a storage prespective. You still need 16 TB on your filesystem to load it."

You actually need quite a bit more. With archiving enabled, the archives will grow at a rate determined by the inflow of data. With Flashback, whatever blocks get updated will be logged in your flashback location. With RMAN to disk, a good backup will back up both the datafiles & the archives. So a "worst case" scenario would be a system with high insert & high update rates on these files and routine backups. In this example it would be 16TB of datafiles, 16TB of archives, ??TB of Flashback, and 16TB+16TB for RMAN.

Tom,
Sorry I didn't respond to you in my last post. The 5,000 file limit was a purely arbitrary decision made by a senior exec (non-IT) who had read an article about how, with a fancy new CD write drive, you could take a Write Once Read Many approach. So we bought us a fancy new 2x CDR drive & a CD-ROM tower. Since then the CD archive has been abandoned (first DVD, now CAS-based archiving with automatic off-site syncing), but there has been no business reason to change this arbitrary limit (why change existing code...)
FS backups are capable of doing _everything_ you describe for DB backups. Most times an FS backup is faster than a DB backup & uses a simple enough interface that you can hand the task off to a Bob. Don't get me wrong, RMAN came a loooong way in 10g, but it's still not user friendly enough that I'd trust an operator with it.
I believe the virus on the laptops on the space station (not on the space station systems themselves) was the result of some Bob who should have his contract revoked for incompetence.

Even Oracle recognizes that at times an FS storage approach is a better fit (note fit, not solution). In OAS, the files for J2EE & Forms apps are stored in the FS, not the DB. You could argue this is because these could be deployed without an infrastructure DB, but that argument fails for the times when someone does deploy an infrastructure. There is NO option for storing application files in the DB (except of course for PSP pages & ApEx...)

"interesting, I agree they are not "*THE*" enterprise, just one of the biggest components of it. Erase my applications and I'm slowed down. Erase my data and I'm *out of business*."
actually, most enterprises can exist without their data for a surprising amount of time (hours to days). They can't go minutes without the communication infrastructure (email, IM, phone, etc.). The db has information critical to the business. Communication is critical to people. Customers don't work with businesses, they work with people.

I believe in the theory that a chain is only as strong as the weakest link. Everything a user needs to perform their duties is part of this chain. Yes the DB is critical, but so is the application, the network, & the user's workstation, etc. Data is USELESS without the tools to interpret it.

RE : Peter Köhler / Awaro GmbH
There are setups where the user's end doesn't make any DB connections & just sees the files as any other mapped drive/share. As far as network traffic goes, it's almost identical. However, there is a performance hit. Most systems can easily accommodate the speed drop from storing a file in the DB, but this is a per-system-requirements decision.

Tom Kyte
August 30, 2008 - 9:44 am UTC

... You actually need quite a bit more. With archiving enabled, the archives
will grow at a rate determined by the inflow of data. ...

excellent, so we can actually do a point in time recovery, this is what databases are all about.

and actually, if you didn't have that need, you can do these things non-logged, caveat emptor - penny wise - pound foolish as they say.


... With Flashback, whatever
blocks get updated will get logged into your flashback location. ....

you are confusing the flash recovery area for FLASHBACK DATABASE with flashback query, which uses undo (or the pctversion'd data). Don't confuse the two: you do NOT NEED a flash recovery area for flashback query, flashback table, or flashback versions query - only for flashback database, and that isn't what we were talking about. We were talking about flashback query to quickly undo a 'whoops' operation.
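To illustrate the distinction with a hedged example (undo-based flashback only, no flash recovery area involved; the table name is illustrative):

-- see the row as it was before the "whoops"
select file_name
  from documents as of timestamp ( systimestamp - interval '15' minute )
 where doc_id = 1;

-- or put the whole table back (flashback table requires row movement)
alter table documents enable row movement;
flashback table documents
  to timestamp ( systimestamp - interval '15' minute );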

and truth be told, a flash recovery area isn't a bad idea if you have serious uptime requirements. And, well, many a file system these days does things very very very similar.




... Most times an FS backup is faster than a DB backup & use a simple enough
interface you can hand the task off to a Bob. ...


anything like that you hand off to a "bob" as you call it is something that'll fail. I would never in a million years consider handing off the single most important thing ever to a person that is not competent. That would be a CLM (career limiting move).

and database backups take 0 seconds. They are totally a background thing; don't get hung up on the time (but if you do, let's race - let me set up a proper database with block change tracking and a flash recovery area and I'll use incrementals to catch my backup up)....

... There is NO option for storing application
files in the DB (except of course for PSP pages & ApEx...)
...


I do not believe anyone proposed storing application (exe's, zip's, war's, whatever) in the database. Applications come and go - they are source code controlled somewhere else (one hopes). We have been talking about the core data you need for your business.

Lose an application - slow me down.
Lose my data - SHUT ME DOWN (out of business).


... actually, most enterprises can exist without their data for a surprising amount
of time (hours to days). ...

which ones? I'm pretty sure I wouldn't want my doctor to diagnose me without having access to my records. I'm pretty sure Oracle would be hard pressed to operate without its support/tar database. What industry could suffer the loss of data for hours or days without something bad happening? Remember - if the already stored data has gone unavailable, the ability to CREATE NEW data is pretty much gone too. You cannot work with existing customers, and you cannot really create new customers (because the data needed to do so is - well - not there).

As for email (hey, wait, that is actually data, isn't it - what if the datastore is the IMAP repository that you decided to store in a file system....) - well, actually, email is gone too.

The only thing IM would do is let us all sit around and say "hey, remember the days before computers, we were never down like this, we had that paper stuff"

The phone would do the same thing, assuming the phones work (eg: our phones are sort of data driven these days, if we suffered a major data loss, we could well lose our phones).

Communication - without anything to talk about - would just be a big "complain session". "hey when will the systems be up".




LJ, August 30, 2008 - 4:20 am UTC

I've been using the Oracle database as a content repository for over 2 years now, including the 10g and now 11g versions. I've also been a Documentum Administrator for 7 years - and after comparing both experiences, I would never go back to using a file system as a content repository if given the choice. Even if it meant going back to a 10g database.

I'll admit to not having the technical ability to do a "SAN to SAN FC mirror copy" without expensive software - but I also found that I don't *have* to do one, because I use RMAN, a block change tracking file, and incrementally applied image copies to do backups on RAID 10 disks in separate enclosures from the live storage. This is by design, not accident.
In addition to not having to do SAN to SAN FC mirror copying, I also don't have to worry about securing separate content server(s) in addition to the database. I don't have to worry about losing transactional control, or integrity, or reliability, or scalability. And I don't have to worry about whether I'll be able to restore content files and metadata, completely or incompletely, to the exact point in time I need.
I can use caching on the application server to make up for some (maybe not all) of the performance loss from not using a file system. But performance isn't everything. Performance sucks when your site is down for 3 days trying to match up separate metadata and content.

And to reiterate one of Tom's mantras... there is no "one size fits all" answer for anything. There is no perfect solution for everyone. The right decision will completely depend on the other components in your environment. For Ben, who commented above: he's obviously a talented Sys. Admin. and can do advanced maneuvers like SAN to SAN FC mirroring as a backup. For his projects, perhaps this is the best way of doing things. For me, I'd screw it up in under 10 seconds since I don't have those specific Sys. Admin. skills. But since I have to manage the storage, backups and database myself, it's OK, because I have ASM and SecureFiles and Partitioning and RMAN to accomplish the same goal, just in a different way.
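For reference, the incrementally updated image copy strategy LJ describes typically reduces to a short nightly RMAN script along these lines (the tag is arbitrary; block change tracking is enabled once, in SQL):

-- once, in SQL*Plus: alter database enable block change tracking;
-- then, nightly, in RMAN:
run {
  recover copy of database with tag 'nightly_copy';
  backup incremental level 1
    for recover of copy with tag 'nightly_copy'
    database;
}

Each night the level 1 backup reads only changed blocks and the image copy is rolled forward, so a full, nearly current copy of the database is always on disk.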


data access layer

rc, August 30, 2008 - 11:44 am UTC

The data access layer is often simply the layer that calls the stored procs. But you can also call a web service to retrieve data in the data access layer. And I think it is a good place to access files on your file system too.

I don't understand what Tom and Peter Köhler want to say.

Anyway, the view and the controller should not access the file system or a database or a web service. Those two layers should not have to know the exact origin of the data.

Tom Kyte
August 31, 2008 - 9:23 am UTC

and you can have a web service that calls a stored procedure.


If you store data in the database, files even, you can

a) pretend it is a file system and access it that way
b) use web services to access it
c) access the information in SQL
d) etc etc etc etc


meaning - put it in a database and you can access this same information from multiple environments in an open fashion.

And your "view" or "controller" layers need not be bothered with the details.

Performance Difference in Files vs CLOB

Amin Adatia, August 30, 2008 - 12:03 pm UTC

I would like to store my data in the database. However, we are finding that read performance between a CLOB and an OS file is about 3 times in favour of the file. Is there something we can do in our code, which uses the Java stream functions? Are we stuck with the slow performance because of Java?
Tom Kyte
August 31, 2008 - 9:25 am UTC

tell us about your test and your needs here.

Are you needing to serve these files up over the "internet", for example? Does your test include a web browser downloading the files? Do you have to use Java as the language in the middle (e.g., mod_plsql can get files out of the database pretty easily - one line of code actually)?

also, how did you create your clob? do you read and re-read the clob over and over? have you enabled caching of the clob (read-only caching or read/write caching - you decide)?
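The "one line of code" refers to wpg_docload; a hedged sketch of a mod_plsql download procedure (the documents table and its columns are invented for illustration):

create or replace procedure get_doc( p_id in number )
as
  l_blob blob;
  l_mime varchar2(100);
begin
  select content, mime_type into l_blob, l_mime
    from documents where doc_id = p_id;

  owa_util.mime_header( l_mime, false );
  htp.p( 'Content-Length: ' || dbms_lob.getlength( l_blob ) );
  owa_util.http_header_close;

  wpg_docload.download_file( l_blob );  -- the one line that streams the lob
end;
/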

data access layer

rc, August 31, 2008 - 11:38 am UTC

???
Tom Kyte
September 01, 2008 - 12:51 pm UTC

eh? did you have a question or comment?


DB or FS

Ben, September 02, 2008 - 10:25 am UTC

I don't want to get pulled into an "anything you can do, I can do better" debate. For every situation that I can create to justify moving files into a database, I can create another scenario to justify using the file system.

Below are some of the questions that I would typically ask to help determine which is a better fit (in no particular order):

What is the expected growth rate of data storage? Remember that unstructured data tends to grow much faster than structured data.
What is the source of the files: users (word processing/spreadsheets)? (This is another way to ask about growth rate. I've seen managers claim slow expected growth rates, but in reality, when users are unleashed on the system...)
Does the DB server have sufficient capacity to allow for at least 3 years of this growth rate?
Assuming archiving is turned on in the DB: does the server have sufficient capacity in its archive destination? (Apply the same to Flashback DB.)
How does the expanded DB size fit into your backup systems and procedures?
Can your Disaster Recovery plan accommodate the increased DB size?
Do any other vendor products have to interact with these files? If so, can those products work with the files stored in the DB?
What is the budget?
Is performance on file reads critical?
What technologies are already in place & paid for?
What are the skillsets of your current admins?
What are the skillsets of your users?

If Tom's mantra is that there is no "one size fits all", then why is he advocating that the database is ALWAYS the better fit? I am simply advocating that there are plenty of situations where an FS or a blended storage solution is a better fit.
Tom Kyte
September 02, 2008 - 12:04 pm UTC

I'll say it as simply as I can:

if this data is valuable to your business, if this data is necessary for your business, if the loss of this data would damage your ability to do business - it has no business NOT being in the database.

The only place I know data can be secured, properly backed up and recovered, safe - is a database.

A file is just data.


If the files are static images for your website, if the files are generated reports - reports that can easily be regenerated - put them where ever you want.

If the files contain information crucial to your business, there is only one place for them - a database.

respond

Ben, September 02, 2008 - 12:32 pm UTC

And I'll respond as simply as I can

"The only place I know data can be secured, properly backed up and recovered, safe - is a database."

Whereas I cannot attest to your knowledge, the idea that an FS is limited compared to a DB for security, backup, recovery, etc. is simply not true.

Name one way in which data can be secured in the database that has no equiv in the FS.

Name one way a DB can be backed up that can't be matched in an FS.

Same for recovery


If these are your predominant arguments for the DB as the sole file repository, then I respectfully suggest that you're suffering from multiple logical fallacies, predominantly the Appeal to Belief: http://www.nizkor.org/features/fallacies/appeal-to-belief.html
Tom Kyte
September 02, 2008 - 1:27 pm UTC

backup/recovery:

do a consistent point in time recovery of your structured (relational) and unstructured data (files) - I need it to be restored to 11:59:23am last Tuesday.

And it obviously needs to be consistent. The metadata that describes where and what the files are needs to be consistent with the files themselves (using just a file system).


securing:

I would like an exhaustive list of everyone that has accessed or modified the file for the last two years - and for compliance reasons, I need all of the intermediate versions of the document (using just a file system).
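In the database, both challenges map onto built-in commands; a hedged sketch (the object names and timestamp are illustrative, and the audit trail assumes the AUDIT_TRAIL parameter is enabled):

-- an access/modification trail on the documents, kept by the database itself
audit select, insert, update, delete on doc_owner.documents by access;

-- a consistent point-in-time recovery of ALL the data, files included, in RMAN:
run {
  set until time "to_date('2008-08-26 11:59:23','yyyy-mm-dd hh24:mi:ss')";
  restore database;
  recover database;
}
-- followed by: alter database open resetlogs;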




And please bear in mind, database administrators - they do backup, recovery, securing as part of their day to day job. It is expected of them. System admins, frankly, they are *not* data oriented people - it is just not what they do. You want good, safe, secured data - you'll put it in a database.

You want files to go accidentally missing, not noticed they are missing, go ahead, put them in to the file system.

Sorry, but every time, every single time, I see people store things outside of the database - it eventually goes missing, parts of it at least. Or the ability to consistently restore a file system with a database so the related data bits and bytes all "align" - it just isn't there.

And too many times - people do not even think about this stuff when they "design" their system.

For the same reason I hate

o triggers (do more harm than good in real life)
o autonomous transactions
o when others

I will always and adamantly say "put it in the database"

Could you build something in the file system yourself to achieve most/many/all of your goals? Probably.

That isn't the point, the point is - most implementations don't; don't do it right, don't think it through - and they don't realize it until it is too late.


This is not an appeal to belief, this is "I've seen it go horribly wrong so many times, I've wasted too many weekends and evenings trying to fix these kinds of mishaps"


response

Ben, September 02, 2008 - 2:10 pm UTC

"Could you build something in the file system yourself to achieve most/many/all of your goals? Probably."

already done; off-the-shelf products exist to do this, yes, including point-in-time recovery. No, it's not native to the OS, but that means companies COMPETE to create products & the end result is better than if one vendor did it & said "this is the best"

Does Oracle do it well in the DB? Yes, but it has it's own limitations.

FYI - I've seen "DBAs" drop data left and right, run incorrect update scripts updating millions of rows with bad data, etc. This is human error, not a system limitation.
Tom Kyte
September 02, 2008 - 2:24 pm UTC

Ok, say I install this "off the shelf product" - and I as an admin go around the product and just access files. Can you tell? What if I modify them - can you tell? What if I take a whole bunch home with me - can you tell?




Data loss, when data is managed in a database, is far less than the data loss when data isn't stored in a database. When you put it into a database - people tend to treat it a little more seriously. And that million row botched update - we have the tools to easily, readily, quickly put it back right.
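
As a concrete illustration of "put it back right" - in 10g and up, the fix for a botched update can be a single flashback statement. A sketch, with emp as a stand-in table (row movement must be enabled first, and undo must still cover the interval):

alter table emp enable row movement;

-- rewind the table to just before the bad update ran
flashback table emp to timestamp systimestamp - interval '15' minute;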

but anyway...

If someone asks me "store in file system or store in database", I sort of presume also they are sitting down to WRITE this stuff - they are not in the market to buy yet another thing. And if they put it in the file system - they have basically nothing. They try to write it all. And that is just "not smart"

update

Ben, September 02, 2008 - 2:58 pm UTC

What happens when the db crashes? Can users still process spreadsheets, word processing, etc.? Surely you don't suggest that Oracle's DB is bug free?

Put it on an FS and you divorce yourself from db stability issues.

Every solution has pros and cons, and I hope that in the future you are capable of taking your blinders off and re-evaluating your stance on this topic.
Tom Kyte
September 02, 2008 - 3:27 pm UTC

Ben

let me put it this way - by the time you get the feature set you say comes with being on a file system along with a layer:

...
Already done - off-the-shelf products exist to do this, yes, including point-in-time recovery. No,
it's not native to the OS, but that means companies COMPETE to create products, and the end result
is better than if one vendor did it and said "this is the best"
.....

you know what you have? You have *a database*. It might not be Oracle (heck, you know, we use a file system too - we are typically a layer on top of a file system, although we can provide our own). But by the time you have the feature set of backup, recovery, any sort of security, auditing - etc. - you have either a) bought a database (call it what you will, it is a database - access to the data must be through that layer, else the recovery is not there, else the security is not there) or b) written your own database

But a rose by any other name....

My database stays up, file systems and applications on top of them - that is the height of instability to me.

Surely you don't suggest that *anything* is bug free. What about all of the 3rd party products you mention? They are bug free? What happens when the nfs server freezes, veritas decides to go awry, the network becomes unavailable, the OS hosting the file system blue screens or panics, etc etc etc. The database is not inherently "unstable".



And in Oracle, we actually do run a single file server for the entire company, and a single email database for the entire company and so on... And if email and documents were unavailable - we'd not be able to work - virtually everything in the company runs on it.

Systems are ultimately only as "unstable" as the people managing them.

And actually by the time you have your backup capability, recovery capability, versioning capability, auditing capability, <most any> capability - you have either installed a database or written your own.

So, actually, point made. To get the feature set you need - you install a database of some name and use it.


Data goes into a database. Stuff you don't care about can go willy nilly on the file system. And if someone in the future writes to me:

... if possible could you please explain the advantages and dis-advantages of storing files within an Oracle database. We are currently using 9.2 and 10g. We have no immediate plans to move to 11g. Thanks, we all appreciate your answers! ....

The answer would be the same: "DO NOT WRITE your own database, it is already written, put data you value into the database"

Because the person asking it is not asking "should we use product X or put it into Oracle", they are asking "should we just stuff our files into the file system and other stuff in the database". If the files mean a thing, they will put them in the database.

You choose your database, but ultimately, you choose a database.

All I'm saying is "data you value, goes into a database".

A reader, September 04, 2008 - 10:32 am UTC

"what happens when the db crashes? Can users still process spreadsheets, wordprocessing ,etc?
Surely you don't suggest that Oracle's DB is bug free? "

Uh, if your database crashes, how on earth are your users going to know which file is theirs without the associated metadata from that database? Unless you give your users direct access to the FS and let them poke around through all the files until they find the one they're looking for? (nice security model)

If your database crashes, then your files are useless whether they are in the DB or not. However, if your files are in the database and it crashes, we can recover the database - which includes the files - to whatever point in time. If the files were on a file system, we'd need to hope we could restore the database to the same point-in-time that the files could also be restored to on another file server. Without the metadata stored in the database, you just have a bunch of files on a FS with no way to tell anything except "here is a PDF file, wonder who created it and what's in it".
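
To make that concrete, a minimal sketch of keeping the bytes and the metadata in one place - the table and column names are made up:

create table documents (
  doc_id      number        primary key,
  file_name   varchar2(255) not null,
  mime_type   varchar2(100),
  uploaded_by varchar2(30)  default user,
  uploaded_at date          default sysdate,
  content     blob          -- the file itself, recovered with everything else
);

One backup strategy, one recovery, one point in time now covers both the file and everything you know about it.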


File system interface to the database?

Stew Ashton, November 18, 2008 - 9:06 am UTC


Tom, with the advent of SecureFiles there is growing interest among my colleagues for storing files in the database; however, in most cases we are using purchased software that only knows how to store files on a file system. We were wondering if an interface exists to access an Oracle DB as if it were a file system? The idea would be to use the database completely "under the covers".

My searches only found rather old stuff (IFS), or else services that seemed to be wrapped up in other products such as Content Management.

Any pointers?

By the way, I am not saying this is a good solution. It would be infinitely better to store all the data together in business transactions. However, this might be a step in the right direction and it would at least show that SecureFiles performance meets our needs.
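
For reference, opting a LOB column into SecureFiles is just DDL in 11g - a sketch only (the table is hypothetical, the tablespace must use automatic segment space management, and compress/deduplicate require the Advanced Compression option):

create table documents_sf (
  doc_id  number primary key,
  content blob
)
lob (content) store as securefile (
  compress medium   -- optional, licensed separately
  deduplicate       -- optional, licensed separately
  cache
);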

Re: file-system interface

Stew Ashton, November 19, 2008 - 4:09 am UTC


So you can type in search keywords better than I; we all knew that ;)

"used to provide users with a file-system interface"

Unfortunately, we would need something that provides programs with what appears as a local file-system interface: you know, fopen(), fclose() and the like. I can see where it would be much more feasible to serve higher-level protocols like FTP or WebDav, since they only deal with entire files and they are protocols, not interfaces.

Maybe providing a low-level "read from / write to a file" interface is just too much of a reach technically. In any case there would have to be specific interface code installed as a "file system" on the calling program's side.
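
For what it's worth, the XML DB repository in 10g/11g already speaks FTP, HTTP, and WebDAV out of the box, and files written into the repository land in the database. A sketch of turning the listeners on, assuming XML DB is installed (the port numbers are arbitrary):

-- enable the XML DB protocol server endpoints (run from SQL*Plus)
exec dbms_xdb.setftpport(2100)
exec dbms_xdb.sethttpport(8080)

It is not an fopen()/fclose() interface - the calling program still has to speak the protocol - but it does let unmodified FTP/WebDAV clients read and write files that physically live in the database.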

WinFS

Parthiban Nagarajan, December 01, 2008 - 2:59 am UTC

Hi Tom

You are suggesting storing IMPORTANT files in the database rather than in the FileSystem for the sake of backup, recovery and security. Though it's acceptable, it's not practical. Instead of the database, we can depend on third-party software for recovery, and on user/group privileges provided by the OS for security. This should be enough for most IMPORTANT files, I believe.

If you are not satisfied with the above paragraph, what if the FileSystem itself acts as a database, instead of making the database act as a FileSystem? Yes, I am talking about WinFS from Microsoft.

http://en.wikipedia.org/wiki/WinFS

I hope it will be shipped with the next version of the Windows OS.

I wonder what Oracle's, or any other technology vendor's, answer to this would be?!
Tom Kyte
December 01, 2008 - 7:59 am UTC

Gee, not practical. Better not tell many of our customers, or even our internal IT staff, that - they might be very surprised.

Many people, people who really value their data, place data that is important to their business in the ....... database.


And it is hugely practical.


I don't believe in what you believe in.


My answer to something like 'winfs' (which you hope will be shipped some years into the future) is the same as the one I have for distributed databases


distributed complexity is just that. If you want to query documents and relational data (and spatial and xml and image and audio and whatever) your best bet is to have.....

a single data repository - so they are all as of the same point in time, with consistent, centralized security policies, managed by the same team - providing recovery and backup in a consistent fashion.



You give no factual or even "in my opinion" reasoning behind why you believe this is not practical. Perhaps that means you haven't read far enough into the documentation - haven't used the database in enough situations to know what it is actually capable of doing?

Storing files in the database is always a bad bad idea.

Victor Logan, April 30, 2009 - 7:34 pm UTC

Tom,

Sure you would come up with every reason in the world to use a DB -- duh. Any reason why you would suggest not to?...:)

When you have large files to manage, retrieving them from a webapp is a pain. Read this for the why: http://faler.wordpress.com/2009/04/23/storing-imagesfiles-from-in-a-database-is-bad-mmmkay/

The best suggestion would be to use a content management system like Documentum, which still uses a database (this should make Tom happy) but uses the file system for storing files, all of which is transparent to the users.



Tom Kyte
April 30, 2009 - 9:34 pm UTC

you know, whenever you say "is always", I know one thing for a fact

You are wrong.

never say never, never say always, I always say.



did you know that a file system is..... a database (a database with a really poor feature set, but a database nonetheless)

did you know that every content management system - every single one - is a database


Don't tell me storing files in a database is always bad - or hard - or whatever, it just means to me that you haven't been around long enough.

All of our files in Oracle - terabytes and terabytes of them - are in.... a database (a popular relational database with lots of other stuff like text capabilities, backup, recovery, security, etc)

I read that blog entry and see:

... What does this mean? It means that a single user getting a single page will require 21 connections out of your database pool! Why? Because browsers and other clients will do a separate "GET" for each perceived "file" (such as images) that is referenced on the page. ....

well - I don't think anyone suggested putting your trivial images into the database - what is suggested is:

if your data means something to you, if the data is critical - hard to replace or irreplaceable - then the only responsible, sensible place for it is a database that provides for all kinds of contingencies

On asktom, my images - simple file system. I can get them a million times over if they get mucked up. My files - database. They need to be backed up and recoverable to the same point in time as my application data. They need to be secured in the same fashion as my application data. And if I decide to add more application servers, I need ubiquitous access to them (without having to cross-mount a file system all over the place).

The patent and trademark office, for example - millions and millions and millions of documents, images, drawings - guess where they are? They are not in a file system - they are professionally managed in....

a database (just happens to be an Oracle database)

as an example.


I did like that the blog entry says "almost all of the time" - very wise to caveat like that. Never say never, never say always, I always say.


That blog entry doesn't say "do not store critical data in the database"

It does say "don't store trivial things that could easily be reproduced at the drop of a hat in there"

And it does not say why "retrieving them from a webapp is a pain" - that is an entirely false statement. Go to my files tab, click on it, select a file, click on it - retrieved. You know what would be a pain? IF I STORED THAT ON A FILE SYSTEM.

What happens when you go to scale out your application server - the one that stores all of these files? Do you really want to cross-mount your single disk with your *critical* data all over the place?

sorry, you present nothing here to really argue against.

filesystem

A reader, May 01, 2009 - 6:12 pm UTC

Tom:

I am not a Unix or Windows expert, but is cross-mounting a single disk with critical data to other app servers a complex task? I thought it was a simple Unix command.


Tom Kyte
May 02, 2009 - 12:04 pm UTC

there will be limits to how many places you can cross-mount it, and now you have your left hand all over your right hand messing with stuff.

There will be performance issues as well - filesystems are not built the same way a multi-user database is


A common phrase heard in an environment like this (with lots of cross mounts) will be "oh, you needed those files? I needed more space, and they looked old - sorry about that"


But all the big kids are doing it...

Ken, May 04, 2009 - 3:53 am UTC

To Sam,

Let's assume that what you say about YouTube and Facebook is correct. If you're one of those guys, it might be worthwhile to have a highly customized system.

I think that what Tom is saying is that it is possible to construct a fast, filesystem-based repository; it is just not the wisest choice for most of us.

If YouTube loses a video or two, who's going to notice? Even if they lost 10%, would they really suffer? Would they ever need to roll forward to a point in time? Do they need a history of document changes?

The point is to think about the needs of a business: speed, cost, security, recoverability, traceability, staffing needs, etc. The web sites you mention are not typical businesses, but rather exceptions which prove the rule.

Advantages of Oracle Database

Tunji Olaniyan, May 18, 2009 - 5:42 am UTC

I quite agree with Tom that there are a lot of advantages to be derived from storing data in an Oracle database. That's why today, the critical business data of most companies the world over is stored in Oracle. Those hard-to-beat features have made Oracle the hottest database.

Thank you.

files

A reader, March 30, 2010 - 12:13 pm UTC


Everything can be stored in the database

senthil kumar, September 16, 2010 - 3:59 pm UTC

Thanks Tom,

Why does the operating system need to change from file system to file system, like FAT -> FAT32 -> NTFS -> EXT3? Because it improves the handling of data storage. So I would simply say an OS is just managing software. Why shouldn't a database management system do more than that (i.e., manage or handle more than an OS does)? If Oracle were to release a server operating system (like Sun's Solaris), it might really be more powerful than other OSes.


Tom Kyte
September 16, 2010 - 5:29 pm UTC

<quote src=chapter 1, Expert Oracle Database Architecture>

My Approach

Before we begin, I feel it is only fair that you understand my approach to development. I tend to take a database-centric approach to problems. If I can do it in the database, I will. There are a couple of reasons for this—the first and foremost being that I know that if I build functionality in the database, I can deploy it anywhere. I am not aware of a popular, commercially viable server operating system on which Oracle is not available—from Windows to dozens of UNIX and Linux systems, to AIX and more—the same exact Oracle software and options are available. I frequently build and test solutions on my laptop, running Oracle11g or Oracle10g under Linux or Windows on a virtual machine. I can then deploy them on a variety of servers running the same database software but different operating systems. When I have to implement a feature outside of the database, I find it extremely hard to deploy that feature anywhere I want. One of the main features that makes the Java language appealing to many people—the fact that their programs are always compiled in the same virtual environment, the Java Virtual Machine (JVM), and so are highly portable—is the exact same feature that make the database appealing to me. The database is my virtual machine. It is my virtual operating system.
</quote>


believe me, I've said it myself....

filesystem

A reader, September 20, 2010 - 3:49 pm UTC

excellent

files

A reader, July 10, 2011 - 1:42 pm UTC


Wrong place to ask this question

A reader, March 29, 2016 - 5:36 am UTC

There clearly are benefits, but none that would be mentioned by an Oracle rep, so asking that type of question here is not appropriate.
Connor McDonald
March 29, 2016 - 5:49 am UTC

Actually we try to give advice that we think will serve the customer (ie, the person that asked the question) best.