Database, SQL and PL/SQL

Planning for Trouble

You always write your best code, but you should be prepared for imperfection.

By Steven Feuerstein

November/December 2014

The folks at extremememe.info have been feeling lots better about their application lately. Their daily job now runs in well under 24 hours (as documented at bit.ly/ZtmApg). They’ve learned to make the most of SQL, but also that there are times when persistent data and SQL processing should be replaced by in-session memory management enabled by PL/SQL code (as reported at bit.ly/YU4Wto). In my latest visit to the company, I found that their focus had shifted to best practices, specifically with the objective of producing more-consistent, high-quality code across their entire team.

Answer to Last Issue’s Challenge The PL/SQL Challenge quiz in last issue’s “Persistence Doesn’t Always Pay” article presented a block of code that included a #REPLACE# tag and offered choices to provide a replacement for the #REPLACE# tag so that the resulting block displays “1-Papua New Guinea” after execution.

All three choices are correct. The only differences between the choices have to do with the readability of the code. The first choice uses SUBTYPEs to give names to the ways that integers and strings are being used. By doing this, I can avoid writing comments and let the code fully tell its own story. The second choice has neither SUBTYPEs nor comments and is rather difficult to sort out. The third choice uses comments to explain the code, but it is far better to let the code explain itself.

They asked me to come up with a standards document and go through a detailed training on best practices. I responded with the voice of experience.

I’ve worked with many companies over the years on best practices for PL/SQL. Along the way, I’ve come to accept certain harsh realities when it comes to improving code quality, including the following:

It’s very hard to change routines that developers have established over many years.
Documents that spell out naming conventions and detailed recommendations fit very nicely inside desk drawers, rarely see the light of day, and rarely have much impact on development practices.
It’s best to aim for small changes that can be internalized and sustained.

So I suggested that the team at extremememe.info identify a specific and significant pain point in the development process. They can then come up with a plan of action, figure out how best to apply it, and see if it can be put into practice effectively.

“Ooh, ooh, I’ve got one!” piped up Miguel, one of the newest members of the team. “A user reported a bug on her screen last week, and I had the hardest time sorting out what might be causing the problem. I couldn’t reproduce it in our sandbox, and, of course, I couldn’t see what the user was seeing. There was nothing in the error log, and the error message shown to the user was simply ‘Application Error. Call Support.’”

Several other developers were shaking their heads sadly. It seems that they knew exactly what Miguel was talking about.

“All right,” I replied. “That’s a fine place to start. Miguel, did you ever figure it out?”

“Oh yeah, I did. I tracked down the procedure, spent a few hours with the Oracle SQL Developer debugger, and finally nailed it. But I felt really stupid about the whole thing. It shouldn’t have taken that long. It shouldn’t have been that hard. Maybe it’s just the way it has to be when you are relying on lots of back-end PL/SQL code, buried deep in the database. Would I be better off moving the log into the middle tier—maybe implement it in JavaScript?”

“No way,” I replied. “I really doubt that your problem has to do with defining business logic ‘deep’ inside the database. Instead, it has more to do with how you anticipate problems that might occur in that logic—and bullet-proof your code to make it easier to diagnose and resolve problems that arise. Show me your procedure.”

Miguel showed me the em_process_data procedure:

CREATE TABLE em_memes
(
  meme_id        INTEGER PRIMARY KEY,
  meme_name      VARCHAR2 (1000) UNIQUE,
  discovered_on  DATE,
  meme_status    VARCHAR2 (100)
)
/
PROCEDURE em_process_data (
  for_this_meme_id_in   IN INTEGER)
IS
  l_meme_info   em_memes%ROWTYPE
     := em_pkg.g_memes (
            for_this_meme_id_in);
BEGIN
 /* Do lots of stuff with l_meme_info. */
  . . . .
END;

With the quickest glance at the start and end of this procedure, I was able to identify a number of potential problems, including

Assignment of a value in the declaration section. It’s very compact, but what if the assignment results in an error?
No exception section. Will nothing ever go wrong in this procedure?
Reference to a global variable inside the procedure. What is em_pkg.g_memes?
No up-front checks on the validity of an argument value. Will a sensible value for the meme ID always be passed in?
No instrumentation or tracing of the program’s execution. What if I can’t reproduce what is happening in production?

Assignment of a Value in the Declaration Section

Of course, a quick glance can lead to quick and careless advice. So it was time to dive in and drive the lessons home. First up: the assignment of a value within the declaration section, as a default value.

It’s always important to keep in mind one key feature of PL/SQL exception handling: The exception section of a block can only possibly handle exceptions raised in the executable section of that block.

If an exception is raised when a value is assigned to a variable or constant in the declaration section of a block, that exception will always cause that block to shut down with an unhandled exception, even if you have included a WHEN OTHERS handler.

In the em_process_data procedure, Miguel used the value of the for_this_meme_id parameter as an index value in the em_pkg global collection. If there is no element defined at that index value, Oracle Database raises the NO_DATA_FOUND exception.

There is no way to trap this NO_DATA_FOUND exception inside em_process_data, as the procedure is currently written.

Bottom line, I told Miguel, is that if you do want to trap this exception inside the em_process_data procedure, you should move the assignment from the declaration section to the executable section, optimally into its own nested initialization subprogram, like this:

PROCEDURE em_process_data (
   for_this_meme_id_in   IN INTEGER)
IS
   l_meme_info   em_memes%ROWTYPE;
   PROCEDURE initialize
   IS
   BEGIN
      l_meme_info :=
         em_pkg.g_memes (
            for_this_meme_id_in);
   END;
BEGIN
   initialize;
   /* Do lots more stuff */
   . . . .
END;

Miguel’s response was quick: “OK, I get that, and I like using that subprocedure to hide all the initialization logic. One thing, though: The guideline we follow in our application is that we do not trap the exception locally. That would mean having to write WHEN clauses all over the place. Very unproductive. Instead, we just use WHEN OTHERS in the very top-level PL/SQL block. That way, no matter what goes wrong, and no matter where it goes wrong, we’ve got it covered.”

I somehow managed to restrain myself from pointing out that in fact they didn’t have it covered at all, since it took Miguel way too long to identify and fix last week’s bug.

“That’s a strategy I hear occasionally from developers,” I replied, “but I must admit it has never made a whole lot of sense to me. So let’s take a closer look at that issue.”

No Exception Sections in Subprograms

I shared with Miguel that most developers I had worked with readily agree that they really don’t want exceptions to go unhandled and sent straight to users. When that happens, the users see a most unfriendly and usually unintelligible error message. This reduces their confidence in the application and, even worse, in their developers.

Miguel nodded his head vigorously. “At my last job, users got so sick of seeing those errors that they stopped even bothering to report them and stopped coming to meetings with our team. It was not a good feeling.”

And that’s not all. Some organizations also need to be very careful, from a security standpoint, about the error information provided to users. If people are trying to hack into a system, they are likely to hit lots of errors along the way. You don’t want an error message to inadvertently help the hacking process along.

This caution is evident in Oracle Database’s error messages. For example, if I try to select from a table that does not exist, I see this message:

ORA-00942: table or view does not exist

Notice that it does not say this:

ORA-00942: "CRITICAL_DATA" does not exist

So sorting out what kind of message to show users when an error occurs requires some planning and attention to the requirements of the application itself. Just as important, however, is deciding what information you need to store in your error log to make it as easy and quick as possible to diagnose and repair the problem.

PL/SQL provides a number of functions that allow you to retrieve “generic” information about the state of your application when the exception was raised:

DBMS_UTILITY.FORMAT_CALL_STACK. This function returns the execution call stack and answers the question, “How did I get there?”
DBMS_UTILITY.FORMAT_ERROR_STACK. This function returns the error stack, which in most circumstances really means the error message.
DBMS_UTILITY.FORMAT_ERROR_BACKTRACE. This function returns a formatted string that allows you to trace the error back to the line of code on which it was raised.
UTL_CALL_STACK. This new package, added in Oracle Database 12c, provides all the information listed in the previous three functions, but with a more sophisticated API. (For more information, see “Sophisticated Call Stack Analysis”.)
SQLCODE. This function returns the code for the error that has been raised.

This is all great information to have, and whenever you handle an error you should call these functions and store the relevant information in your log.

These functions, however, mostly make it easier to figure out where the error occurred, not why. And as many of you undoubtedly know, the why of an error is usually much more difficult to figure out than the where.

To figure out why an error occurred, you almost always need to know the state of your application at that moment, and that state is reflected not only in the contents of tables, but in the values held by variables and constants.

This obvious fact bears directly on the question, where do you put your exception handlers? If you choose to avoid exception sections on “inner” subprograms, and only include a WHEN OTHERS or a WHEN clause at the top-level block, two facts become clear:

Avoiding Repetitive Error Handling Code

While not the main focus of this article, I should also point out how to log errors. I offered a list of functions you could call to get generic information about your error. You should not, however, call those functions in every exception handler.

Instead, you should use a generic error logging procedure that calls all those error logging functions on your behalf. When you write your exception section, you should only have to call this generic procedure and pass it the values of local variables. And always remember to re-raise the exception so that it is not ignored by the block that called your subprogram.

Here’s a way to write that exception section:

EXCEPTION
   WHEN OTHERS
   THEN
      em_error_log_pkg.log_error (
            'Error for meme '
         || l_meme_info.meme_id);
      RAISE;
END;

There are several benefits to this approach:

As Oracle Database adds more built-in error handling functionality, all exception handlers can be instantly upgraded to take advantage of these features.
Developers improve their productivity, because they can focus more of their time implementing user requirements.
Developers are much more likely to add an exception handler to their program units, because the effort to do so is greatly reduced.

You can check (and record) the contents of tables, because that information is available at the session level and is not specific to a PL/SQL block.
You will not be able to record the values of variables and constants defined in the original block in which the error was raised.

Inaccessibility of values of local variables is the price you pay when you choose to rely solely on a single exception section at the “top.” It is certainly easier and improves developer productivity when writing new programs, but you pay the price of a loss of information when an error occurs, which can lead to dramatically reduced productivity when it comes to debugging that code.

When writing new applications, you are always under lots of pressure to deliver that application on time. So the idea of a single exception section is terribly attractive. Yet if you consider the overall lifespan of an application, developers spend much more time in maintenance and support of an application than with new software development. So a guideline that makes it easier to meet your initial production deadline and makes it harder to keep that application up and running—and responsive to user enhancement requests—seems like a bad idea.

For that reason, I offer the following recommendation for where and how to add exception sections to your subprograms:

When you’ve “finished” your subprogram (are you ever really finished?), ask yourself, “When something goes wrong, will I want to see the values of any locally declared data?” If the answer is yes, add an exception section and log those values. If the answer is no, then at least for the time being, you can let the exception propagate out of the subprogram unhandled.

Reference to a Global Variable Inside the Procedure

Inside extrememe.info’s em_process_data procedure, a reference to the meme ID is passed as an argument in the following expression:

em_pkg.g_memes (for_this_meme_id_in);

Looking at the names, I can deduce that em_pkg is a package and g_memes is a global collection declared in the package specification. A deduction, however, is not a certainty, so I need to look at that specification to make sure.

Miguel very obligingly opens up a window for the em_pkg package, and I find the following:

PACKAGE em_pkg
IS
   TYPE meme_ids_t
      IS TABLE OF em_memes%ROWTYPE
      INDEX BY PLS_INTEGER;
   g_memes   meme_ids_t;
END;

So the g_memes collection is an associative array of meme records, indexed by integer.

Looking at this code does not, unfortunately, answer all my questions. Instead, it raises other questions, including the following:

Where is this array filled, and what is it filled with? It is not clear from this definition that the index value is a meme ID, although em_process_data assumes that to be the case.
Is it possible that the ID value passed to em_process_data does not identify an element in the collection?
What happens when a NULL value is passed to em_process_data?

These are the kinds of questions that usually arise when you reference global data structures (declared in a package specification) directly inside your subprograms. These kinds of hidden references (not visible in the subprogram header) cause entanglements and dependencies that result in easily broken code.

A much better way to design the code is to move all global references into the parameter list. The interfaces between subprograms will then be cleaner: easy to understand, debug, and test.

Assuming that the em_process_data procedure does not reference the for_this_meme_id_in parameter after using it to get the meme info from the global collection, I could change the procedure header as follows:

procedure em_process_data (
   for_this_meme_id_in in
      em_memes%rowtype)

and it can be called as follows:

em_process_data (em_pkg.g_memes
(l_my_id));

This change does not address concerns about the index value used in the reference to g_memes, but it does remove the dependency on em_pkg.g_memes inside the em_process_data procedure.

No Up-front Checks on the Validity of Argument Values

Every program makes assumptions about the state of the application when the program is executed. Of course, programs do not write themselves, so what this really means is that the programmer made assumptions—and Miguel is no exception to this rule.

Some assumptions are fundamental, such as, “The em_memes table is defined and contains data.” But other assumptions, especially those very specific to the program being written, should be given careful consideration and attention—right at the start of the program.

Specifically, it is critical to make sure that the assumptions have been met, before the application starts executing logic that depends on those assumptions. If assumptions are not validated, the programs are much more likely to break in ways that confuse both users and developers.

Consider the original form of extrememe.info’s em_process_data procedure:

PROCEDURE em_process_data (
   for_this_meme_id_in   IN INTEGER)
IS
   l_meme_info   em_memes%ROWTYPE
      := em_pkg.g_memes (
            for_this_meme_id_in);

If the value of for_this_meme_id_in does not identify a defined index value in g_memes, Oracle Database will raise a NO_DATA_FOUND exception. If for_this_meme_id_in is NULL, then Oracle Database will raise a VALUE_ERROR exception with this error message:

ORA-06502: PL/SQL: numeric or value error:
NULL index table key value

Clearly, em_process_data assumes that a meme ID value has been passed to the procedure. Simply hoping that the meme ID will be valid can leave a developer having to sort out the source of the very common NO_DATA_FOUND and VALUE_ERROR exceptions.

A much better approach is to explicitly validate any assumptions, and raise an exception that proactively warns of the problem, as in

PROCEDURE em_process_data (
   for_this_meme_id_in   IN INTEGER)
IS
   l_meme_info   em_memes%ROWTYPE;
   PROCEDURE initialize
   IS
   BEGIN
      IF for_this_meme_id_in IS NULL
      THEN
         raise_application_error (
            -20000,
            'Meme ID is NULL');
      END IF;
      IF NOT em_pkg.g_memes.EXISTS (
                for_this_meme_id_in)
      THEN
         raise_application_error (
            -20000,
            'Meme ID not found
             in g_memes array');
      END IF;
      l_meme_info :=
         em_pkg.g_memes (
            for_this_meme_id_in);
   END;

Miguel interrupted me. “I hate to say it, Steven, but I don’t think that I am ever going to feel like I have the time to write all that validation code, especially if I have to do it over and over again. Should I just plan on copying and pasting the original assertion, and change the message each time?”

“No, no, no!” I almost shouted, immediately feeling bad, but that’s what happens when I hear the words “copy and paste” associated with code. “A much better approach is to use a generic, reusable assertion package that lets you easily assert that your assumptions are valid, and if not, procedures in the assertion package will raise an error, write information to a log, display text on the screen—really, whatever you want it to do.”

In just a few minutes, I modified Miguel’s em_process_data procedure to give him an idea of what that might look like:

PROCEDURE em_process_data (
   for_this_meme_id_in   IN INTEGER)
IS
   l_meme_info   em_memes%ROWTYPE;
   PROCEDURE initialize
   IS
   BEGIN
      em_assert.is_not_null (
         for_this_meme_id_in,
         'Meme ID is NULL');
      em_assert.is_true (
         em_pkg.g_memes.EXISTS (
            for_this_meme_id_in),
         'Meme ID not found
          in g_memes array');
      l_meme_info :=
         em_pkg.g_memes (
            for_this_meme_id_in);
   END;

Miguel looked relieved. “Ah, OK, that looks much more manageable. Now I just need to write that package. Maybe I can find some time over the weekend.”

“No, no, no, Miguel,” I said, this time in a much calmer voice. “I’ve already done the work for you. Download my assertion package from the PL/SQL Challenge website. Click Library, then Utilities, and then search on assert.”

No Instrumentation or Tracing of the Program’s Execution

“Thanks, Steven, that’s going to help a lot,” said Miguel. “But I don’t think it’s going to help with one of the most frustrating situations I have to deal with: a user reports a problem in her production version of the application that I simply cannot reproduce in our development environment.”

“You’re right, Miguel. Assertions are much more helpful during development and less so in production. And I suppose you’d really rather not tell them, ‘Sorry, we cannot reproduce the problem. Let us know if it happens again.’”

Miguel shook his head emphatically. “No way. I tried that once. Next thing I knew, my manager was calling me to say that she had gotten a call from the user’s manager. I definitely do not want to experience that again.”

“OK,” I replied. “We need to talk about instrumenting your code.”

As with the error-related functions, Oracle Database does provide lots of real-time information about the database instance and sessions to help developers resolve production issues. Yet, once again, there is often a big hole in that data, specifically when it comes to viewing the values of arguments and local variables in user sessions.

There is, unfortunately, no way to ask Oracle Database to write those values to a trace file or return them to the program with a function call. Developers must, instead, add code to their programs to save the values they know are most important to diagnosing a problem to a table or some other repository. This is known as tracing or instrumentation.

It’s not a terribly complicated aspect of programming. Generally, the program will call a procedure and pass it the values to store, along with text describing what is being saved. That procedure inserts the data into your trace table and then commits the insert. To avoid also committing the application transaction, trace procedures are defined as autonomous transactions.

Oracle Database offers the DBMS_APPLICATION_INFO package to help with tracing. There are also open source tracing utilities available for download.

“Which one should I use?” asked Miguel.

I responded, “That’s a good question, but an even better question is, ‘What features should I look for to ensure effective tracing implementation?’”

Regardless of the way a developer chooses to do tracing, I offer the following guidelines for tracing most effectively:

Do not use DBMS_OUTPUT.PUT_LINE directly in your code to do tracing. Your trace procedure could call this built-in package, but it is not in itself flexible enough to provide you with the kind of tracing you need to analyze production issues. For example, while it might be convenient during testing to send your trace output to the screen, that will almost never work in production.
Always put your calls to the tracing procedure inside an IF statement to ensure that the procedure is called only when tracing is turned on. This is critical for keeping to a minimum the overhead of running your application when tracing is disabled. Here’s an example of what this approach might look like:
```
BEGIN
   IF em_trace_pkg.trace_enabled
   THEN
      em_trace_pkg.trace_parameter (
         parameter_name
            => 'for_this_meme_id_in',
         parameter_value
            => for_this_meme_id_in);
   END IF;
```
Make it possible to turn on tracing without disrupting use of the application in production. This usually means you will have a trace configuration table, with a column whose value indicates whether or not tracing is enabled. Change the value in the table, commit, and trace data starts filling your table.
At least for the most error-prone, critical subprograms, you should trace all the parameter values right at the start of the executable section.

“Those first three make perfect sense, and I will definitely use them to review tracing options,” said Miguel. “But that fourth point has me worried. Sure, with em_process_data, I have only a single parameter, so it’s no big deal. It’s like what you just showed me in guideline number 2. But another procedure I have to debug all the time has 20 parameters. Do I really have to write an IF statement and call trace_parameter for every one of those? Honestly, Steven, I will probably never take the time to do that.”

Over my many years of giving advice to developers, I have come to accept that what Miguel is saying is bedrock reality: It doesn’t matter how good the idea is. If you don’t make it really easy and fast to do, many developers won’t do it.

“Good point, Miguel,” I replied. “You need to find a way to generate the trace code for procedures like that. And whenever you think about generating code based on database objects, you should always check to see if there is a data dictionary view that will help. In this case, we can use ALL_ARGUMENTS.”

I showed Miguel how you can query the ALL_ARGUMENTS view to obtain all the parameters of em_process_data:

SELECT *
  FROM all_arguments
 WHERE owner = USER
   AND object_name = 'EM_PROCESS_DATA'

That certainly looks straightforward enough, but ALL_ARGUMENTS can be a bit tricky, especially when you are dealing with overloaded subprograms. I don’t have space in this article to get into all the details, but that’s not necessary anyway, because I have created a procedure that does all the work for you.

Here’s an example that calls the sf_gen_trace_call procedure for em_process_data:

BEGIN
  sf_gen_trace_call (
    pkg_or_prog_in
      => 'EM_PROCESS_DATA',
    pkg_subprog_in
      => NULL,
    tracing_enabled_func_in
      => 'em_trace_pkg.trace_enabled',
    trace_func_in
      => 'em_trace_pkg.trace_parameter');
END;

Note: You can download the procedure at bit.ly/1uLkSbJ.

The Best Coders Assume the Worst

Most programmers I meet are optimists. No matter how bad the last application rollout went, they are convinced that this time they’ll get it right. Oh, and that most of the problems in the last rollout were not their fault.

Let’s face the facts, though: We will all still make mistakes, and even if we didn’t, we can certainly depend on our users to make their own. There will be production problems; there will be upset, impatient users.

That’s why experienced programmers assume that trouble will be knocking on their doors. So they build into the software, right from the start, the ability to gather lots of information when an error occurs, as well as the ability to “flip a switch” in their production application and obtain live tracing data so they can see what users and their applications are doing.

I assured Miguel that I thought his code was better than most that I review. And I encouraged him to detach his ego from his code as much as possible, while also remaining committed to applying best practices whenever possible to fend off problems later in production.

Take the Challenge

Each PL/SQL article offers a quiz to test your knowledge of the information provided in it. The quiz appears below and also at PL/SQL Challenge (plsqlchallenge.com), a website that offers online quizzes on the PL/SQL language as well as SQL, Oracle Application Express, database design, and deductive logic.

Here is your quiz for this article:

I execute the following statements:

CREATE OR REPLACE PACKAGE plch_pkg
IS
   g_save_value   NUMBER;
END;
/
CREATE OR REPLACE PROCEDURE plch_log_error (
   data_in   IN VARCHAR2)
IS
BEGIN
   DBMS_OUTPUT.put_line (
      SQLCODE || '-' || data_in);
END;
/

Which of the following choices display “-6502-Value=1000” (and perhaps display other text as well) after execution?

Note: SQLCODE returns -6502 for a VALUE_ERROR exception.

CREATE OR REPLACE PROCEDURE plch_do_stuff (
   value_in   IN NUMBER)
IS
   l_value   NUMBER := value_in * 100;
BEGIN
   RAISE VALUE_ERROR;
EXCEPTION
   WHEN OTHERS
   THEN
      plch_log_error ('Value=' || l_value);
      RAISE;
END;
/
DECLARE
   l_value   NUMBER;
BEGIN
   plch_do_stuff (10);
EXCEPTION
   WHEN OTHERS
   THEN
      plch_log_error ('Value=' || l_value);
END;
/

CREATE OR REPLACE PROCEDURE plch_do_stuff (
   value_in   IN NUMBER)
IS
   l_value   NUMBER := value_in * 100;
BEGIN
   RAISE VALUE_ERROR;
EXCEPTION
   WHEN OTHERS
   THEN
      plch_pkg.g_save_value := l_value;
      RAISE;
END;
/
DECLARE
   l_value   NUMBER;
BEGIN
   plch_do_stuff (10);
EXCEPTION
   WHEN OTHERS
   THEN
      plch_log_error ('Value=' || plch_pkg.g_save_value);
END;
/

CREATE OR REPLACE PROCEDURE plch_do_stuff (
   value_in   IN NUMBER)
IS
   l_value   NUMBER := value_in * 100;
BEGIN
   RAISE VALUE_ERROR;
END;
/
DECLARE
   l_value   NUMBER;
BEGIN
   plch_do_stuff (10);
EXCEPTION
   WHEN OTHERS
   THEN
      plch_log_error ('Value=' || l_value);
END;

Next Steps

Share

Planning for Trouble

You always write your best code, but you should be prepared for imperfection.