
Question and Answer

Connor McDonald

Thanks for the question, Gleisson.

Asked: January 02, 2006 - 11:17 pm UTC

Last updated: June 18, 2020 - 3:31 am UTC

Version: 9.2.0.1.0

Viewed 50K+ times

You Asked

Does Oracle provide a package or function for data masking? For example, in a development environment, the information in the customer table needs to be masked for data protection purposes.

create table customer (last_name varchar2(25), first_name varchar2(25), address varchar2(25));
insert into customer values('Doe', 'John', '10 someroad st');

In the production environment, if we do a select on customer, the result would be without any surprise:

select *
from customer;

LAST_NAME FIRST_NAME ADDRESS
-------------- -------------- --------------------
Doe John 10 someroad st

In the development environment, the same information needs to be masked for something like:

LAST_NAME FIRST_NAME ADDRESS
-------------- -------------- --------------------
Ahd Uiea 55 xxxx ue

It doesn't really need to make sense, and it doesn't need to be readable; it just needs to be masked. Does Oracle supply anything for that purpose at all? I tried to find something but didn't have much success. I did find a third-party product that will do the job. If you want, I can post its name here.

Thanks Tom and happy 2006.

Gleisson Henrique

and Tom said...

Updated in 2018:
================

There are various mechanisms available - data masking and data redaction.

Check out the whitepaper for examples and details

http://www.oracle.com/us/products/database/data-masking-best-practices-161213.pdf

Original Answer from 2006
==========================
No, although we can "hide" information (Oracle 9i provides row hiding, and Oracle 10g provides row and/or column hiding).

But data obfuscation - no, the database cannot do that for you. You need to physically obscure the data itself prior to storage (and it had better be a one-way obfuscation). Think about things like indexes and such.

Don't forget, as soon as you start obscuring data, you start to lose the natural distribution of data. Do not be surprised when your obfuscated system behaves entirely differently from your "clear text" system - clustering factors, data lengths, distributions of values - all will be radically different.


I'll post to see if anyone has a suggestion on a tool to use sparingly on names/addresses and the like.


Comments

Data Masking

David, January 03, 2006 - 8:29 am UTC

A most excellent tool for data masking may be found at the following :

www.datamasker.com

This software provides a simple repeatable method of scrambling data.

Thanks Tom.

Gleisson Henrique, January 03, 2006 - 9:04 am UTC

I will look in the documentation for the row hiding in 9i that you mentioned to see how it works and what it does. I found this link that might work for my purposes.

http://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:6382864029082

Analytics really *rock*, but I agree with Robert on that post: analytics can be hard to read. By the way, I saw on some page on AskTom that a reader suggested you should publish a book on analytics. That's actually a good idea. What do you say?

I didn't want to advertise on AskTom, but since you said you would post to see if anyone knows a tool that might do this: I found one last night after I posted the question, called The Data Masker. I watched the tutorial and it looks like it works pretty well.

Tom Kyte
January 03, 2006 - 9:08 am UTC

That would kill performance (using analytics to shift data) - you want to obscure it, physically scramble it, for testing purposes.

The row/column hiding is Fine Grained Access Control, and I don't think it applies to this problem.

And I don't mind "referring to software that solves a problem" on here at all.

That was quick ! ! ! ! ! !

Gleisson Henrique, January 03, 2006 - 9:14 am UTC

Yeah, I started to look into row hiding, and it is not really what I thought it was.

Rajiv, January 03, 2006 - 9:47 am UTC

Why not write a couple of PL/SQL functions using dbms_random to scramble the data while loading it into development (either through SQL*Loader or an insert-as-select over a database link)? That's pretty much what the tool would do behind the scenes anyway.
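Rajiv's dbms_random idea, sketched outside the database in Python (a hypothetical illustration of the algorithm only; a PL/SQL version would lean on dbms_random in the same spirit):

```python
import random
import string

def scramble(value, seed=None):
    """Replace each letter with a random letter of the same case and each
    digit with a random digit; length and punctuation are preserved."""
    rnd = random.Random(seed)
    out = []
    for ch in value:
        if ch.islower():
            out.append(rnd.choice(string.ascii_lowercase))
        elif ch.isupper():
            out.append(rnd.choice(string.ascii_uppercase))
        elif ch.isdigit():
            out.append(rnd.choice(string.digits))
        else:
            out.append(ch)  # keep spaces/punctuation as-is
    return ''.join(out)

print(scramble('10 someroad st'))
```

Note that Tom's caveat above still applies: randomly regenerated values destroy the natural distribution of the data.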

Jon, January 03, 2006 - 9:50 am UTC

> A most excellent tool for data masking may be found at
> the following :

> www.datamasker.com

> This software provides a simple repeatable method of
> scrambling data.

Do you know what the price is?

Does it install packages in Oracle and perform the data masking there?

Shift columns in PL/SQL table

Dejan, January 03, 2006 - 12:23 pm UTC

Hi,

We have the same requirement of scrambling test data and we did the following:

- Bulk load "customer data" into PL/SQL table
- in PL/SQL table shift column "last_name" 1 up,
column first_name 1 down ..
- Update original customer table ..

Regards Dejan
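Dejan's shift can be sketched in Python (rows as tuples stand in for the bulk-loaded PL/SQL table; this is an illustration of the idea, not the production code):

```python
def shift_columns(rows):
    """Shift last_name up one row and first_name down one row (wrapping),
    so no row keeps its original name pair; other columns stay put."""
    last_names  = [r[0] for r in rows]
    first_names = [r[1] for r in rows]
    shifted_last  = last_names[1:] + last_names[:1]      # shift up 1
    shifted_first = first_names[-1:] + first_names[:-1]  # shift down 1
    return [(l, f) + r[2:] for l, f, r in zip(shifted_last, shifted_first, rows)]

rows = [('Doe', 'John', '10 someroad st'),
        ('Brown', 'Jane', '55 other ave'),
        ('Smith', 'Sue', '9 third rd')]
print(shift_columns(rows))
```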




DBMS_Obfuscation_Toolkit

Will, January 04, 2006 - 5:09 am UTC

Hi Tom,

I used the Oracle obfuscation package successfully before, after reading your Expert One-on-One book. Select/insert/update operations were all carried out using stored procedures, which made implementation easy. I just wondered if you had a reason for not suggesting it this time?

Would it be possible to combine this with FGAC? If so could we then store the data in its naturally readable form, but obfuscate it, maybe with an obfuscated view, and depending on the querying user?

Best regards
Will

Tom Kyte
January 04, 2006 - 9:01 am UTC

data distributions.... (it is not an easy problem)
data widths change....

This data masking for testing is not a "trivial problem" to solve.

Book on Analytics

Michael T., January 04, 2006 - 10:04 am UTC

I know on a couple of threads you have been asked about, or mentioned thinking about, writing a book on analytics. I think a cookbook-style book on analytics, e.g. like the Perl Cookbook, would be terrific. It would be chock full of examples, and each example would be explained, so even if readers couldn't find an exact match they would most likely be able to use one of the examples as a starting point for their solution.

Book on analytics

Jonathan Gennick, January 04, 2006 - 11:05 am UTC

A couple of people have mentioned a desire for a book on analytics. I hope I'm not too out of line mentioning this, but Anthony Molinaro uses analytics extensively in his just-published SQL Cookbook. I was at times worried that he leaned on them too heavily (I was his editor). Anyway, if you want a source of examples, that book is a good possibility.

Are views an option?

MarkB, January 04, 2006 - 12:23 pm UTC

Could you not mask the required columns behind a view?

Columns that are not to be shown can either be totally omitted or replaced with something like
'DETAILS NOT ALLOWED - CONTACT HELPDESK'

or masked out

'****'||SUBSTR(customer.address,LENGTH(customer.address)-3)

However, this still leaves the "base table" visible, so permissions would need to be changed appropriately.

Tom Kyte
January 04, 2006 - 1:06 pm UTC

data distribution problems.
indexes are not used problems..
updates, inserts - problems...

if the goal was to test - this would not permit a useful test.

Police data

Connor, January 05, 2006 - 8:36 am UTC

We have a similar problem (we record police criminal records and want to populate our test systems with realistic data).

You need to find a balance between "masking" and "realism". For example, we maintain a table which contains all column names with a tag of: "do nothing", "wipe", "scramble" where
- wipe means null out the data (eg images etc)
- do nothing means leave the data untouched (typically dates, numeric columns etc)
- scramble means mask the data

We also have some smarts in there about constraints and index maintenance options which I won't bore you with.

Our masking algorithm has the characteristics of:

- retain the first character (to keep some semblance of the data distribution)
- case is not retained (except for first character)
- length is retained

with the scrambling done with an old fashioned substitution cipher, with a different substitution array for each of the possible leading characters of the source column.

Now that of course makes things a lot more hackable (eg, "Smith" is a popular name, so "Sxy@f" in the masked data probably means "Smith"). But we're not trying to make our data *impossible* to decode; we're simply trying to make sure our development/testing teams don't "stumble upon" information they were not meant to see. Many of them have legitimate access to the unscrambled production data anyway (that access is obviously more finely audited). Ultimately, if you don't trust your developers, then you're dead in the water anyway - they could simply easter-egg the code to get whatever production data they needed.

Also, the fact that the cipher is 1-to-1 gives benefits that are probably worth the loss of masking strength. For example, if a user has a problem when they do (say) an operation on "Johnson", we can let the test team know to attempt that same operation in our test environment on (say) "Jr8w8dhq" to see if they can replicate the problem. The moment you go with some more "hard core" scrambling, you will lose that facility.

Another plus for the simple cipher method is that it's also a balance between the amount of scrambling you do and the resources required to actually scramble that data. With substitution, we can build a two-dimensional array package variable which contains all the substitutions and simply issue:

update table
set col1 = translate(col1,
                     the_standard_ascii_table,
                     the_substitution_table(ascii(col1)));

so the scrambling is done with a single SQL statement rather than a mish-mash of PL/SQL and unload/reload operations.

hth
Connor
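Connor's substitution scheme can be sketched in Python. The per-leading-character tables here are generated from a seed string, which is an assumption of this sketch (any repeatable source of substitution arrays would do), and it covers plain A-Z names only:

```python
import random
import string

def make_tables(secret='1234'):
    """One repeatable substitution table per possible leading character (A-Z)."""
    tables = {}
    for lead in string.ascii_uppercase:
        rnd = random.Random(secret + lead)   # distinct, repeatable table per lead
        shuffled = list(string.ascii_lowercase)
        rnd.shuffle(shuffled)
        tables[lead] = str.maketrans(string.ascii_lowercase, ''.join(shuffled))
    return tables

def mask_name(name, tables):
    """Retain the first character and the length; scramble the rest with the
    substitution table chosen by the leading character."""
    lead = name[0].upper()
    return name[0] + name[1:].lower().translate(tables[lead])

tables = make_tables()
print(mask_name('Smith', tables))   # 'S' followed by 4 scrambled letters
```

As Connor notes, the 1-to-1 property means the same input always masks to the same output, which is what makes problem reproduction in test possible.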

Don't scramble - mix'n'match

Isaac Blank, February 14, 2006 - 5:10 pm UTC

We once used the following method of obfuscating personal data: we would randomly swap column values between different rows. So if you have John Brown, living at 123 Main St., Columbus, OH and Jane Doe from 567 Broadway, Manhattan, NY, they become John Doe, 567 Broadway, Columbus, NY and Jane Brown, 123 Main St., Manhattan, OH. That way the data distribution for individual columns stays intact. Composite indexes are, of course, affected, but it gave us a pretty good database to play with.
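The mix'n'match idea, sketched in Python: each column is shuffled independently, so every column keeps its exact value distribution while the row associations are broken.

```python
import random

def mix_and_match(rows, seed=None):
    """Independently shuffle each column across rows; every column keeps
    its exact set of values, only the row associations change."""
    rnd = random.Random(seed)
    columns = [list(col) for col in zip(*rows)]   # transpose to columns
    for col in columns:
        rnd.shuffle(col)
    return list(zip(*columns))                    # transpose back to rows

rows = [('John', 'Brown', '123 Main St.', 'Columbus', 'OH'),
        ('Jane', 'Doe',   '567 Broadway', 'Manhattan', 'NY')]
print(mix_and_match(rows, seed=42))
```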


Looking for mix'n'match solution

siddu, September 26, 2006 - 3:03 pm UTC

We are looking at a mix'n'match solution like the one posted by Isaac Blank above. What is the best way to achieve this? We tested using cursor logic, and it is very slow. We have over 20 million customer records. I'd appreciate any help on this.

Tom Kyte
September 26, 2006 - 5:08 pm UTC

OK - it does seem that Isaac told you what to do, then? What more do you need to get started?

this is not an algorithm I really want to play with "too much". scrambling data is "hard", requires some thought about YOUR data and what makes sense.

so, what are you scrambling here? and why? and what does it look like?

Re: Looking for mix'n'match solution

Siddu, September 27, 2006 - 12:19 am UTC

I'm looking for some logic or SQL to shuffle the names and addresses across rows with better performance than a cursor. We want to scramble names, addresses, SSNs, dates of birth, phone numbers and email addresses; they are all stored in different tables. We do not want any sensitive data to be available outside the production systems, but at the same time we want meaningful data available for testing and other needs.

We want to shuffle the data randomly in the name and address tables, scramble the SSN, phone number and email address columns, and add a few days to the date fields.


Tom Kyte
September 27, 2006 - 4:59 am UTC

it is going to be slow regardless of the approach - think of the work involved for a table of any size.

Say the table has 5 columns you want to "scramble"; we'd have to sort that table randomly 5 times in order to scramble it up. Trying to do it in a single bulk SQL statement is going to be hard, since you cannot predictably get functions to be called over and over and over.

So, say you wanted to "scramble" the owner and object_name columns from all_objects. You might be able to do something like this - but it will require hitting all_objects three times into temp and then joining the results:

create table t
as
with main_data as
( select rownum r,
         object_id,
         subobject_name,
         data_object_id,
         object_type,
         created
    from all_objects
),
new_object_names as
( select object_name,
         row_number() over (order by dbms_random.random) r
    from all_objects
),
new_owners as
( select owner,
         row_number() over (order by dbms_random.random) r
    from all_objects
)
select object_id, subobject_name,
       data_object_id, object_type,
       created, object_name, owner
  from main_data a, new_object_names b, new_owners c
 where a.r = b.r and a.r = c.r
/


karthick, December 04, 2007 - 3:18 am UTC

Data masking cannot be generic; it must be application specific. The tool that masks the data must know the meaning of the data it's masking.

When you hand over masked production data for testing, and you want your application to be tested with that data, the data must still be valid: it must satisfy all the rules that the application applies to it.

The masker should know the nature of the information it's handling.

It should identify all constraints on the data, and the masked information should satisfy them. By constraints I don't mean database constraints alone, but application-level constraints too. Even though constraints should live in the database, this world has the dirty habit of putting them in different places.

What if there is a front-end restriction that salary cannot be > 1000 for a manager, and your masked data has a salary of 2000 for a manager? How is that going to be handled?

Think about indexes: what if there is a unique index on a column? You must ensure that the masked data is unique. What if you have a function-based index on a column - how will it be affected when you mask the data it uses?

So when you mask, you are not masking just the data in your database but the data in the application. Thinking at the application level is very important here.

Beyond the technical problems Tom pointed out, we actually have bigger functional and logical problems too.

Masking different sources with same algorithm ?

A reader, October 28, 2009 - 10:07 am UTC

Hi - we are currently designing an 11g data warehouse database for a BI application. This database will have data loaded from different sources, and a lot of them will carry the same information for a person, such as SSN and name. We have a masking requirement, but the trick is that a person's SSN should mask to the same value across all sources: for example, if a person's SSN is 012345678, then the resultant masked value should be 555667777 from every source. Many of the algorithms we looked at are not repeatable in that sense - run once on 012345678 they give 555667777, but run another time on another source they give 999883333. In the final data warehouse it would then appear that these are different people. Has anyone come across this before? What have you used to get over this drawback? We are also looking at the new masking in OEM - can that be used to overcome this?

Any help is appreciated.

Thanks.

Regarding masking different sources

A reader, October 29, 2009 - 10:22 am UTC

I have posted a question above regarding masking different sources. Can you please reply ?
Tom Kyte
October 29, 2009 - 10:48 am UTC

I am not personally aware of anything like that. You'd have to keep a mapping yourself to do something like this.

Masking from multiple sources

A reader, October 29, 2009 - 2:02 pm UTC

What is the common way that people do this? I am surprised that I am not getting many responses, because I thought this would be common for any BI application.
Tom Kyte
October 29, 2009 - 3:35 pm UTC

It is common for testing scenarios - take this production data and mask it so we can test the set of functionality.

But for BI - the rule has always seemed to be that either

a) you are allowed to see the data
b) you are not allowed to see the data


If there was sensitive information in the base data, it would just not be exposed to me. For testing a production application - that isn't feasible (the application relies on seeing that data) but for a BI application that certainly is feasible - you are either allowed to see or NOT allowed to see it.

So, I haven't heard this request myself before, it goes against what I know.

Other ideas for SSN

Jack, October 30, 2009 - 3:53 pm UTC

A reader in the comment 3 prior said "For a lot of the algorithms, they are not repeatable in the sense, if I run it once on 012345678 it will give 555667777 but another time in another source it will give 999883333."

What algorithms?  "A lot of the algorithms" ARE repeatable (aka deterministic).  Here is a trivial example: ssn_masked = ssn + 1.

What is the point of masking the SSN?

If it is to render the SSN useless except as a join key (an indicator of whether or not "the people are different"), then I would suggest a one-way keyed-hash Message Authentication Code (MAC).  Oracle provides dbms_crypto.mac.  Here is an example:


SQL> col ssn for a40
SQL>
SQL> select id,
  2         dbms_crypto.mac(
  3                        utl_raw.cast_to_raw(ssn),
  4                        2, -- hmac_sh1
  5                        utl_raw.cast_to_raw('a 32 byte secret key that only you know')
  6                        ) ssn
  7  from (
  8       select 1 id, '111223333' ssn from dual
  9       union all
 10       select 2 id, '444556666' ssn from dual
 11       union all
 12       select 3 id, '444556666' ssn from dual
 13       union all
 14       select 4 id, '111223333' ssn from dual
 15       union all
 16       select 5 id, '777889999' ssn from dual
 17       union all
 18       select 6 id, '111223333' ssn from dual
 19       )
 20  ;

              ID SSN
---------------- ----------------------------------------
               1 79022167C1ABEB8E56504517F38AC9B22AAAEA65
               2 14A1B87278AE44FC5087245E70090751DFEC8F53
               3 14A1B87278AE44FC5087245E70090751DFEC8F53
               4 79022167C1ABEB8E56504517F38AC9B22AAAEA65
               5 716B2D9C3420E6AC9D9E3B90B712C9356F0AD29B
               6 79022167C1ABEB8E56504517F38AC9B22AAAEA65

6 rows selected.


Now someone may use SSN to know if "the people are different," but not for anything else (e.g. identity theft).  Note, this hash is not reversible even by you (it is a one-way hash, not encryption).

Of course this changes the length of SSN from 9 to 20 bytes (raw).  It is not a simple "mask" as discussed earlier in this thread.  A simple mask is not going to prevent improper use of SSN.

You might also consider:

b) Maintain your own mapping to a surrogate value.  This will require more work to load a row to your data warehouse because you have to look up the surrogate value (or generate a new one if you haven't seen the SSN before).  This approach does allow you to have a 9 byte value for the surrogate.

c) Don't show SSN (better).

d) Don't put SSN in your data warehouse (best).
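Option (b), the self-maintained surrogate mapping, might look like this in outline (a Python sketch; the dict stands in for a mapping table maintained in the warehouse, and the 900-prefixed starting value is just an illustrative choice, since 900-prefixed numbers are not issued as real SSNs):

```python
class SsnSurrogates:
    """Deterministic SSN -> 9-digit surrogate mapping; unseen SSNs get the
    next value, already-seen SSNs always get the same surrogate back."""
    def __init__(self, start=900000001):
        self._map = {}
        self._next = start

    def lookup(self, ssn):
        if ssn not in self._map:
            self._map[ssn] = str(self._next)
            self._next += 1
        return self._map[ssn]

s = SsnSurrogates()
print(s.lookup('012345678'))   # same input, from any source ...
print(s.lookup('012345678'))   # ... always yields the same surrogate
```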


- Jack

Tom Kyte
November 09, 2009 - 11:36 am UTC

The problem is, the applications are expecting an SSN. You cannot mask data in general by changing its type; that will break the application. The goal of data masking usually is to permit QA and development staff to play with the application without exposing 'sensitive' data to them.

I agree with (d) and (c) - for their warehouse, they should not be exposing this information to anyone that is not allowed to see it - period. It either should not be there OR should be hidden (via dbms_rls) from the view of those that are not allowed to see it.
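If an application really must receive something shaped like an SSN, one possible compromise (a sketch, not an Oracle feature; folding the digest down to 9 digits can collide, so uniqueness would have to be verified separately) is to reduce a keyed MAC like the one above to 9 digits:

```python
import hmac
import hashlib

def ssn_pseudonym(ssn, key=b'a secret key that only you know'):
    """Deterministic 9-digit pseudonym via a keyed HMAC: the same input
    always maps to the same output, but the mapping is not reversible."""
    digest = hmac.new(key, ssn.encode(), hashlib.sha1).digest()
    return str(int.from_bytes(digest, 'big') % 10**9).zfill(9)

print(ssn_pseudonym('111223333'))  # repeatable across all sources
```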

Do not mask data in BI/DWH if possible

Jakub Illner, November 05, 2009 - 11:21 am UTC

My experience is that it is extremely difficult and costly to mask data in BI/DWH applications and that it is almost always better to use unmasked data with correctly enforced access privileges and auditing than to try to mask the data.

Why: in BI/DWH you take data from source systems, apply some transformations, possibly use some staging tables and then you load data to the target table. After that you might populate aggregated tables or OLAP cubes. Then you publish data to the analytical environment.

So the data you need to test is not in one place but in several layers. It is not possible to mask just one layer; you have to mask all of them. But the data on the input (source data) might be in a different format than the target data, so how could you mask it? And you cannot mask just the input data, since the input is matched with the content already stored in the system. And for some types of data extraction (e.g. Streams) you cannot mask data at all.

Additionally, a transformation or check which works on non-masked data will not work on masked data. For example, we have a rule that a personal number (which is unique per citizen) must be divisible by 11. Or there are rules for how IBAN numbers are checksummed. Or the prefix of a phone number is used to identify the country and operator, etc. All these rules will be broken if you mask the data, unless you mask it in a very sophisticated way which preserves the rules.
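Jakub's divisible-by-11 example shows why rule-preserving masking has to be built around the rule itself. A hypothetical sketch of such a mask:

```python
import random

def mask_personal_number(pn, seed=None):
    """Replace a personal number with a random value of the same digit
    width that is also divisible by 11: the rule survives, the value doesn't."""
    rnd = random.Random(seed)
    width = len(str(pn))
    lo = (10**(width - 1) + 10) // 11   # smallest width-digit multiple of 11
    hi = (10**width - 1) // 11          # largest width-digit multiple of 11
    return rnd.randint(lo, hi) * 11

masked = mask_personal_number(8800154709)
print(masked % 11)   # 0: the application's check still passes
```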

And then we have data cleansing activities, which are quite difficult in themselves. We try to standardize names, compare them with lists of known names, and produce clean data. And do the same with addresses, and maybe car license plates, etc. This cannot be done with masked data. And suppose you have a rule that beginning balance + sum of transactions = ending balance - how could you mask account balances and still be able to apply this check?

And, of course, you have to consider how to actually create a test/dev environment. Since the environment should be as close to production as possible (for good testing), most often the test environment is created as a clone of the production database, maybe with some subsetting applied (e.g. historical partitions are dropped).

So my recommendation is not to mask the data but spend the effort on having good access rights and auditing in place. And if you really need to hide some sensitive data like credit card numbers, then do it very selectively, applying column specific data masking rules. And be aware of the impacts.

And a final note - it is much better if the data model normalizes the sensitive information into one place (e.g. customer dimension, card dimension). If you use SSN or card numbers as keys which are all over the place, then to mask this information will be a nightmare. It is better to use surrogate keys.

--
Jakub
Tom Kyte
November 11, 2009 - 12:53 pm UTC

My experience is that it is extremely difficult and costly to mask data in
BI/DWH applications and that it is almost always better to use unmasked data
with correctly enforced access privileges and auditing than to try to mask the
data.



agreed 100%

Tony Cannizzo, December 24, 2009 - 2:07 pm UTC

We have implemented a Dynamic Data Masking solution at the SQL*Net Protocol Layer. In this way we can intercept all inbound SQL while it is still outside of the database.

We have a Rules Engine that acts as an Oracle listener. The Rules allow us to implement security policies at the enterprise level.

A rule can match SQL based on a number of attributes - not least the SQL text, but also the UserID, program, time of day, and originating IP address.

If the criteria of the rule are matched, we apply an appropriate action. Actions include rewrite, block, and redirect, to mention a few.

One of the simplest illustrations: an application sends in SELECT NAME, and we rewrite the statement to SELECT SUBSTR(NAME,1,2)||'***', which will take the name 'Tiger' and display it as 'Ti***'. There are a variety of masking algorithms, many of which were referenced in previous posts, depending on the specific masking requirements.
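For illustration, the rewrite in that example amounts to this transform (Python sketch of the SUBSTR(NAME,1,2)||'***' expression):

```python
def mask_display(name, keep=2):
    """Show the first `keep` characters and replace the rest with a fixed
    '***' suffix, as in SUBSTR(NAME,1,2)||'***'."""
    return name[:keep] + '***'

print(mask_display('Tiger'))  # Ti***
```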

This has proven to work with any application, including in DWH environments that use a variety of tools, BO, OBIEE, Cognos, etc.

It also works with Dev Tools such as Toad, DBArtisan, PL/SQL, SQL*Plus and others.

Also we have implemented this in ERP/CRM applications where we do not have access to the source code.

The beauty of this solution is that the underlying data is not masked, but it is returned masked at the presentation layer. This allows us to use it in PRODUCTION as well.

We have different groups of production DBAs. Some are local employees and others are cross-border contractors. Both groups use Toad and have SYSDBA. The local employees are permitted to see the real data, but a rule identifies requests from the cross-border contractors, and they do not get to see the real data when they are browsing. If we do need to give access, we can temporarily disable the rule for an hour and then turn it back on.
Tom Kyte
December 31, 2009 - 4:44 pm UTC

the beauty of this solution, since you've told me about it, is that it would take about 15 seconds to get around.

and you cannot upgrade easily.

and you are entirely unsupported.

and all of this already exists in the existing product :(


... Some are local employees and
others are cross-border contractors. Both groups use Toad and have SYSDBA...

and you are worried about security? ouch.

Do you understand how easy it would be to get around the controls you believe you have? Have you ever invited someone in and said "here is what we've done, see if you can see something we don't think you should"


I'll just send in a block of code like this:


declare
   l_query  long;
   l_cursor sys_refcursor;
   l_name   varchar2(4000);
   function foo return varchar2 is begin return ' from t1'; end;
begin
   l_query := 'name ';
   open l_cursor for 'select ' || l_query || foo;
   loop
       fetch l_cursor into l_name;
       exit when l_cursor%notfound;
       dbms_output.put_line( l_name );
   end loop;
   close l_cursor;
end;
/


or some variant of the above, I'll find a way to "trick" you.

Fine grained access control does this - and you cannot get around it - for it is not a man in the middle (which can always be gotten around), but the "man itself", the database.



Data Masking using real data

Barry Chase, January 01, 2010 - 9:29 am UTC

I took a slightly different approach at my shop. I should say that I would be happy to share the code with those who are interested. Anyways, on to the meat and gravy...

The process can run in two modes: either for one individual or en masse. Mass mode is passed an array of employee numbers. Key data like employee number and other join keys stay the same. However, I scan the system for all of the last names and store them in an index array; the same for all the male first names and female first names; then for address lines 1 and 2; then date of birth and salary information for non-exempt and exempt. I get the phone number as well. Finally, I generate SSNs by getting the full set of existing SSNs and, starting with 900, swapping out the first 3 digits and incrementing up. Why 900? Because 900 is not a valid SSN prefix and is reserved by the government agency, which makes it safe that I won't duplicate anything.

When collecting that information, I have added a random key generator on an order by so that the lists are always collected in random order.

I then begin the process of updating the information. What gets updated is configurable via input parameters, in case you only want to do some of it or, for that matter, expand the set. When creating the new name, I make sure the gender is respected and that the new name is not the same as an existing name (forcing it to pick the next generated name). And so on for the other data points.

It's not perfect, but it does work, and the data is real. The demographic split of the data has obviously been skewed, but one could easily alter the process to refer to a cross-reference table of standard conversions and, when no standard exists, default to the generated value.


As a follow up to the data masking method we use

Barry Chase, January 01, 2010 - 9:34 am UTC

I should add that since the lists are random in nature and order, reverse engineering the values would be nearly impossible, since it is not an encryption algorithm.

Also, on data points like date of birth, we had to include some extra intelligence to make sure we didn't have someone born after they were hired, or someone underage. And when we included a manager in the scramble request, I scanned for their direct reports, because some of the manager's key data is on their direct reports' records too. We had to scramble that data as well, and it had to be in sync with the actual manager's data.

Tony Cannizzo, January 24, 2010 - 9:19 am UTC

Tom,
We would never pretend to have absolute security, not sure anyone could.

We do have functionality to address, at least in part, some of the concerns you expressed when you wrote:
Followup December 31, 2009 - 4pm Central time zone:
the beauty of this solution, since you've told me about it, is that it would take about 15 seconds to get around.


There are two approaches to prevent DBA's from bypassing our application:

1. Install it on the database server, configure our listener port to REPLACE the current Oracle listener port address, and assign the Oracle listener to a hidden port. In addition, the Oracle listener can be configured to receive connections ONLY from our application's listener port, blocking all other connections that try to bypass our listener.

2. We have created a logon trigger which controls users not connected through our listener. When applications/users connect through our application, we automatically add a prefix to the terminal name of all sessions connected through us. This prefix is visible in the terminal column of the v$session view (e.g., administrators can identify all sessions NOT connected through our app by running: select terminal from v$session where terminal not like 'AB-%').

This way we can determine whether someone is trying to circumvent our solution. Oh, and they would be fired on the spot.

Again, there is no ABSOLUTE security - never was, never will be - but we think we have closed the exposure with 99.9% effectiveness.
Tom Kyte
January 26, 2010 - 1:43 am UTC

... We would never pretend to have absolute security, not sure anyone could. ...

but, if you have obvious holes, why would you not even be concerned about them? If someone told me "I found a way around your stuff", my first concern would be plugging the hole. I would not say "well, I know it is not perfect - but it stopped 5 people last week".


1) hidden, how do you hide a port.

and further, how does that stop my attack?????

and further further, how does that even begin to let the DBA do their job???? They cannot use enterprise manager, they cannot use any tool.


2) ummm, who monitors the monitor? If you are trying to limit the people that monitor?

and if I connect via "your listener", what stops me from using whatever I want to submit sql again?

I think you might be mistaken in your 99.9% solution.




and again, since there is actually a builtin way to already do what your myriad of code attempts to do - why wouldn't you use it?

Tony Cannizzo, January 24, 2010 - 7:05 pm UTC

Addendum to previous post:

We have also deployed a function, written in Java, that will intercept calls to any stored procedures that generate result sets from embedded SELECT statements. Short version: any sensitive data will be masked on return to the application.
Tom Kyte
January 26, 2010 - 1:53 am UTC

for x in (select * from t)
loop
dbms_output.put_line(....);
end loop;

select sensitive_data into l_array from t;


there are so many ways to return "results" from a stored procedure. Again, since the database DOES data masking, why would you write what sounds like thousands of lines of complex fragile code to reinvent the wheel? A wheel that I'm pretty sure would be gotten around pretty easily in real life?

Alexander, January 26, 2010 - 9:40 am UTC

"Again, since the database DOES data masking, why would you write what sounds like thousands of lines of complex fragile code to reinvent the wheel?"

How about because the data masking pack costs $11,500 per processor, how's that for a reason ;)

Oracle loves to promote these features, and they totally fail to mention they cost extra, and are usually expensive.
Tom Kyte
January 29, 2010 - 2:38 pm UTC

data masking via fine grained access control comes with the database. I'm not talking about enterprise manager and the data masking pack, I'm talking about putting in the rules that give the ability to filter data down to the ROW and COLUMN level - that ability COMES WITH THE DATABASE - you already own it with Enterprise edition.

Now what?


and do you believe this other piece of software was free? And the time spent doing all of the magic (that can and will be gotten around anytime someone wants) was free? I doubt it.

Expensive, sure. Supported, sure.
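To make the "comes with" point concrete, here is a minimal sketch of the built-in fine grained access control (VPD via DBMS_RLS). The schema, table, and column names are hypothetical - adapt the predicate to your own security rules:

```sql
-- Hypothetical sketch: restrict each session to rows it owns,
-- with no changes to application SQL. The predicate returned by
-- the policy function is appended to every query on the table.
create or replace function customer_policy (
    p_schema in varchar2,
    p_object in varchar2
) return varchar2
as
begin
    return 'owner_id = sys_context(''userenv'',''session_user'')';
end;
/

begin
    dbms_rls.add_policy(
        object_schema   => 'APP',
        object_name     => 'CUSTOMER',
        policy_name     => 'CUSTOMER_ROWS',
        function_schema => 'APP',
        policy_function => 'CUSTOMER_POLICY',
        statement_types => 'select, insert, update, delete' );
end;
/
```

Once the policy is in place, every tool - SQL*Plus, enterprise manager, ad hoc queries - sees only the permitted rows, which is exactly why no custom interception layer is needed.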

Paul, January 26, 2010 - 10:23 am UTC

How valuable is data security to your business?
I would think 11,500/cpu is pretty cheap when you consider that the solution is

1) supported by the vendor
2) upgradeable
3) tool agnostic

Would you build your own airbag for your car?
Doubt it.


Tom Kyte
January 29, 2010 - 2:39 pm UTC

and in this case, comes with - it is not an option that I'm talking about.

Alexander, January 26, 2010 - 10:36 am UTC

I'm not saying don't use it; All I'm saying is, you don't get it for nothing, it costs extra.
Tom Kyte
January 29, 2010 - 2:40 pm UTC

comes with - see above, comes with.

To: Alexander

Marcus, January 26, 2010 - 3:58 pm UTC

Versus writing and maintaining your own "algorithm" as Tom pointed out? 11.5k is pennies. You point out that it costs extra, but it costs a whole lot less than a custom solution.
Tom Kyte
January 29, 2010 - 2:56 pm UTC

and it comes with...

Alexander, January 29, 2010 - 2:48 pm UTC

Ok, gotcha. I just know that as FGA, or VPD.

mask input parameters passed to function for exec

A reader, March 26, 2010 - 3:23 pm UTC

Is there any way we can mask input parameters passed to functions during execution - so that the password encryption phrase does not appear (or appears as '*****') even in data dictionary views like v$sql?

thx
Anto

Tom Kyte
March 26, 2010 - 3:51 pm UTC

is there any way I can get people to ask their questions in one and only one place? See the other place you asked this identical question.

A reader, March 26, 2010 - 4:12 pm UTC

sorry about that. As the response was not coming, I thought I would post the same question again in some thread that has been active of late.

thanks
Anto
Tom Kyte
March 26, 2010 - 5:24 pm UTC

the response was not coming because.... I was on a plane all day yesterday. Give me time sometimes...

A reader, March 29, 2010 - 9:32 am UTC

sorry about that - thanks

Anto

mfz, September 24, 2010 - 11:44 am UTC

My requirement: to expose only the last 4 digits of the SSN to any application user.

The application team wants to employ a masking technique in the "Java" app, of which I am not a big fan.

I would like to use the "Data Masking feature" in the database. I see the data masking feature (under Schema / Data Masking) in EM.

Is there any "HOW TO" document to accomplish this? Would it satisfy my requirement? Is there an additional cost (license) to use this feature? I have not used "Data Masking" up till now...

Thanks
Tom Kyte
September 27, 2010 - 10:33 am UTC

data masking would hide the column in its entirety.

This would best be done via a view or a virtual column (11g)

You would not expose the SSN column to anyone, just the columns in your view.
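The view approach can be sketched as follows - table and column names here are hypothetical, and the fixed-prefix format is just one choice:

```sql
-- Hypothetical names: expose only the last 4 digits of the SSN
-- through a view, then grant access to the view rather than the
-- base table.
create or replace view customer_v as
select customer_id,
       last_name,
       first_name,
       'XXX-XX-' || substr(ssn, -4) as ssn_masked
  from customer;

grant select on customer_v to app_user;
```

The application queries customer_v and never sees the full SSN; the base table is granted to no one.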

"Data Masking"

A reader, October 11, 2010 - 4:29 pm UTC

Here's the thing. I had this data masking argument with my superiors a few months ago and they finally realized I was right. We had credit card information in the DB and they asked me to mask it. I said no, that won't work... we need to AUDIT access to the information. They were trying to be, as I call it, crunchy inside but chewy outside!

What's the use of configuring all sorts of security options on your DB when you can't even control who has access to THAT said DB, and also control what they can see? No sense locking server racks if the server room is a free-for-all! VPD and Data Masking are great for consultants and developers, but for production systems, your work as a DBA would be a nightmare... SPARE ME!! We currently use VPD and FGA for the credit card information and have better access controls to the DB.... EVERYONE'S HAPPY!

Tool option

Eric Fales, October 18, 2010 - 6:15 pm UTC

We use a tool called pii masker, works very well, but could use more rulesets and better documentation.

FGA or data masking?

A reader, January 13, 2011 - 2:20 pm UTC

Hi Tom,
In your earlier discussion with Tony on their "Dynamic Data Masking solution at the SQL*Net Protocol Layer", you said that FGA does it and it is built-in. However, I thought the goal of the Tony's approach was to mask the data on its way to the user, while FGA is just a data protection feature. With FGA, if I'm allowed to see some rows and columns I will see their original values, unless they were masked by another tool.
I'm not clear why you equated FGA functionality to the mentioned "Dynamic Data Masking" layer.
Tom Kyte
January 13, 2011 - 2:47 pm UTC

first - fine grained auditing does not do data masking.

fine grained ACCESS CONTROL does data masking.


With fine grained access control you limit:

what rows an individual is allowed to see, to update, to delete, to insert.

And of those rows they are allowed to see - what columns they may see - if they are not allowed to see a set of columns under a set of conditions - they will retrieve NULL values for those columns. We *mask* the columns.


Where did I equate fine grained AUDITING with data masking ??? I cannot find it - please point to it.
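The column masking behavior described above can be sketched with the sec_relevant_cols options of DBMS_RLS (schema, table, and column names hypothetical):

```sql
-- Hypothetical sketch: every row is returned to every user, but the
-- SSN column comes back as NULL unless the session is the schema
-- owner. This is the "we *mask* the columns" behavior.
create or replace function mask_ssn_fn (
    p_schema in varchar2,
    p_object in varchar2
) return varchar2
as
begin
    -- rows matching this predicate show real column values;
    -- all other rows get NULL in the secured columns
    return 'sys_context(''userenv'',''session_user'') = ''' || p_schema || '''';
end;
/

begin
    dbms_rls.add_policy(
        object_schema         => 'APP',
        object_name           => 'CUSTOMER',
        policy_name           => 'MASK_SSN',
        policy_function       => 'MASK_SSN_FN',
        sec_relevant_cols     => 'SSN',
        sec_relevant_cols_opt => dbms_rls.all_rows );
end;
/
```

With sec_relevant_cols_opt => dbms_rls.all_rows, rows are not filtered out - only the listed columns are nulled for sessions that fail the predicate.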

EM grid control has data masking pack

A reader, January 14, 2011 - 9:14 am UTC

With 10.2.0.4, Enterprise Manager Grid Control has data masking pack. It is separately licensed and I haven't had chance to use it yet but it is there. It has been around for at least two years.
Tom Kyte
January 14, 2011 - 12:20 pm UTC

correct, it came out in the 10g timeframe (wasn't applicable to the original question) - we mentioned it briefly above last year as well...

Masking with trigger before insert

A reader, January 14, 2011 - 9:21 am UTC

In an 11gR2 data warehouse environment (with OWB), what do you think about masking data using a before-insert-for-each-row trigger? I mean: create a trigger on each table with sensitive fields, and the trigger encrypts these fields by applying a function based on the dbms_obfuscation_toolkit package. So at the end of the ETL phase I'll have the data masked, it's not necessary to modify any ETL process, and the sensitive data is never written to the data warehouse unmasked. The triggers won't exist in the production environment.
Do you think it would be an acceptable approach?

Thanks
Tom Kyte
January 14, 2011 - 12:24 pm UTC

I think using a trigger in a data warehouse load would be the kiss of death performance wise - for two reasons

a) it means you must be using conventional path loads, probably slow by slow loads and that is very very slow to begin with

b) the trigger will introduce slow by slow processing. It would make it so you CANNOT use direct path loads, as they do not fire triggers - and having a trigger that doesn't even do anything makes you slower; having it actually perform work would slow you down even more.



Encryption is NOT ABOUT ACCESS CONTROL - it never has been, never will be. ACCESS CONTROL is about that and you have fine grained access control that hides the data people are NOT supposed to see.

encryption is about protecting the data in the event of theft. This encryption wouldn't accomplish even that. Think about key management and how you would do it. You would never be able to do it securely.


This would not be an approach to even consider.

Data Masking Pack for 11.2

Oliver, January 18, 2011 - 5:15 am UTC

if you use 11.2 you can also use the data masking pack with the standalone dbconsole, no need for Grid Control anymore!

With the Data Masking Pack, you can use an rman backup of your production dwh to create a development db where the sensitive data is masked. You can then test your ETL processes in this environment.
This way, you can test your backup and create a safe development environment.

Better Tool

Hector, September 08, 2011 - 3:03 pm UTC

A better tool for data masking is the one provided by Solix Technologies. It works better in a heterogeneous environment and comes with knowledge bases for a variety of packaged applications like Oracle EBS, JD Edwards, PeopleSoft and Siebel. It also works with custom applications, so you can control all your masking needs from a centralized tool. Additionally, implementation was quick, and it is not priced per CPU, so the total cost tends to be much lower.

Data masking

Reviewer, November 16, 2011 - 6:19 am UTC

Hi Tom,
Can data masking potentially impact the performance of queries? For example: if we copy data from prod (along with stats) into a test env and mask some of the sensitive columns, without re-gathering the stats, could the optimizer potentially choose a bad execution plan?


Tom Kyte
November 16, 2011 - 9:51 am UTC

It in theory should not materially affect the plans nor the work performed. If rows got migrated during the masking (possible), then the performance of some queries might degrade a little bit chasing down the migrated blocks

Data Masking

Reviewer, November 17, 2011 - 2:06 am UTC

Hi Tom,
Thanks for the prompt response. Just wanted to know: if we have a credit card table with a masked credit card number column that is joined to another table on that (masked) column, and stats are not refreshed, do you think the optimizer may behave differently?
Regards,
Tom Kyte
November 17, 2011 - 6:59 pm UTC

give me the real world case where you would actually join by credit card numbers? If you are storing that card number in more than one table, you've done it wrong already.

but in any case - if you used the masking pack AND you identified the relations so we know this is a primary key in one table and a foreign key in another, the relations would be preserved.

Stats

Reviewer, November 18, 2011 - 5:56 pm UTC

Hi Tom,
Thanks - what about the stats? Are they skewed?
If yes, do you think it is better to gather stats again? Or will it not make much difference to the performance of queries?
Regards,
Tom Kyte
November 21, 2011 - 1:31 pm UTC

the high and low values of some fields may change - the histograms, if any, would definitely change. There may be a need to gather statistics, yes.
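If the masking has shifted high/low values or histograms, re-gathering is straightforward with DBMS_STATS (schema and table names hypothetical):

```sql
-- Hypothetical names: refresh optimizer statistics, including
-- histograms, on the masked table after the masking job completes.
begin
    dbms_stats.gather_table_stats(
        ownname    => 'APP',
        tabname    => 'CUSTOMER',
        method_opt => 'for all columns size auto',
        cascade    => true );  -- gather index statistics too
end;
/
```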

Data Masking

Reviewer, November 22, 2011 - 6:39 am UTC

Hi Tom,
Thanks a lot!

Masking sub string of a column data

Deepak, May 23, 2013 - 11:20 am UTC

Hi Tom,

Need your suggestion in addressing the following issue.

Our tables have multiple VARCHAR2 columns and we want to mask only certain words irrespective of the tables and the columns. The particular words may appear anywhere in a VARCHAR2 column data.

e.g.,

Assuming that we want to mask the word "Oracle" by using some text "ABCD".

Case1:
======

Column1
-------
Oracle


Should be replaced as

Column1
-------
ABCD


Case2:
======

Column1
-------
I like Oracle Database


Should be replaced as

Column1
-------
I like ABCD Database



We want to perform this for all the VARCHAR2 columns.

Please provide your expert comments.

Sorry, if I forgot to provide some important information about the problem.




Tom Kyte
May 23, 2013 - 3:05 pm UTC

use regexp_replace: http://docs.oracle.com/cd/E18283_01/olap.112/e17122/dml_functions_2068.htm - it seems most appropriate for this.
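For the example above, a sketch with regexp_replace (table and column names hypothetical; the case-insensitive flag is one reason to prefer it over plain replace):

```sql
-- Replace every occurrence of the word "Oracle" with "ABCD" in a
-- VARCHAR2 column, regardless of case ('i' = case-insensitive).
update t
   set column1 = regexp_replace(column1, 'Oracle', 'ABCD', 1, 0, 'i')
 where regexp_like(column1, 'Oracle', 'i');
```

To apply this across all VARCHAR2 columns in a schema, one could generate such update statements from user_tab_columns.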

Data Masking

Nivlek, October 16, 2013 - 9:23 am UTC

I see the data masking for GC 11i and 12c is almost the same; what are the significant changes or advantages in 12c compared to 11i?
Tom Kyte
November 01, 2013 - 6:59 pm UTC

what is "GC"?

11i might have been an application release, but not a database release.


search for masking in this
http://docs.oracle.com/cd/E24628_01/doc.121/e25353/whats_new.htm


Practical Implementation of Data Masking

ASR, November 12, 2013 - 8:02 pm UTC

We are currently using Oracle Data Masking but are having issues.

For example, we use it to mask a table that contains company information. Two relevant fields that get masked are: company abbreviation and company name.

We can easily use Oracle Data Masking to generate a 3 letter string to mask the abbreviation. However, we would like to mask the company name by using the masked abbreviation value concatenated with " Incorporated."

So as an example, we would create a mask for the abbreviation as "ABC" and then the masking format for the company name would be "ABC Incorporated." Is it possible to do this? The only way I could think of is to add a post-job update statement at the end of the data masking job to do this; however, this doesn't seem like an elegant solution and might cause trouble down the road.

Practical Data Masking

Brendan Martin, February 02, 2014 - 1:59 am UTC

Hi Tom

Excellent post. It's been going for 8 years and is still getting referenced and updated.

Just in response to the requirement raised by ASR from Washington, this type of masking quality is now becoming very commonplace. Making the data appear more realistic can be a challenge, but there are a variety of ways to overcome these issues. The key way to treat these requirements is to run the concept of a ruleset for the masking of your dataset: run a rule to change name A, run another rule to change name B, then run a third rule after both those rules have completed to populate a field and make it a combination of A & B, or truncate a set value from a masked dataset field and append it to a newly created masked value.

A really good white paper can be found at http://www.datakitchen.com.au/features-mainmenu-47/data-masking-what-you-need-to-know

The White Paper is product agnostic and focuses on the application of masking methods. Hope this info helps a few of the visitors to this site.

Once again a great post.


DATA MASKING PROBLEM

Phool Singh, March 14, 2015 - 11:52 am UTC

We want to do masking in our environment using the Oracle data masking pack in 12c.

The Application Data Model (ADM) was successfully created; it took 9 hours, 32 minutes and 14 seconds.

The next step is generating the masking script; that process has now taken more than 15 hours and has not yet finished.

Our road map so far:

1. Creating the application data model (ADM) -> OK, took 9.5 hours

2. Identifying sensitive data -> OK (single column identified for testing)

3. Creating the masking definition and format -> OK

4. Generate masking script job -> taking more time and does not give output?



IBM optim solution

A reader, June 05, 2016 - 8:50 am UTC

Hi team,
What are the drawbacks of using this software versus Oracle's, in terms of performance effects, stats, cardinalities, etc.?
Have you ever used InfoSphere Optim?

Very old answer

Hoyt L Kesterson II, May 16, 2018 - 10:26 pm UTC

This old answer came up for an Internet search request. Perhaps you should eliminate this response and use this (which was also a response to the same question)

http://www.oracle.com/us/products/database/data-masking-best-practices-161213.pdf
Connor McDonald
May 18, 2018 - 2:29 am UTC

Thanks - we've updated it

Data masking XML in CLOB

BC, June 17, 2020 - 6:50 pm UTC

Is it possible to data mask XML elements that are stored in a CLOB column using Oracle's data masking utility? If so, can you provide an example?

Thanks

BC
MT, MI
Connor McDonald
June 18, 2020 - 3:31 am UTC

CLOBs can be masked, eg

https://docs.oracle.com/database/121/DMKSB/data_masking.htm#DMKSB-GUID-77E7E40E-75A9-4CE8-8E88-F375927DD575

but I don't think we have a specific XML handler, so you'd probably need to define that mapping yourself with a custom function.
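Such a custom function might be sketched like this - the element name, the replacement value, and the use of a simple regex (rather than a full XML parser) are all assumptions for illustration:

```sql
-- Hypothetical sketch: blank out the contents of every <ssn> element
-- in an XML document stored in a CLOB. A function like this could be
-- plugged in as a user-defined masking function.
create or replace function mask_xml_ssn (p_doc in clob) return clob
as
begin
    return regexp_replace(
               p_doc,
               '<ssn>.*?</ssn>',            -- non-greedy: one element at a time
               '<ssn>XXX-XX-XXXX</ssn>',
               1, 0, 'in');                 -- 'i' case-insensitive,
                                            -- 'n' lets . match newlines
end;
/
```

For documents with namespaces, attributes, or nested markup inside the element, a proper XML approach (e.g., XMLType with XQuery) would be safer than a regex.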

More to Explore

Security

All of the vital components for a secure database are covered in the Security guide.