Database, SQL and PL/SQL

Refactoring for PL/SQL Developers

 

Go beyond identifying best PL/SQL practices to create better code.

By Steven Feuerstein, Oracle ACE Director

January/February 2005

 


What is Refactoring?

Most PL/SQL developers stay very busy building applications. As a result, relatively few explore the wider world of software development, where refactoring is a well-known topic.

Refactoring, to quote the book Refactoring: Improving the Design of Existing Code by Martin Fowler et al. (Addison-Wesley, 1999), "is the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure. . . . It is a disciplined way to clean up code that minimizes the chances of introducing bugs. In essence when you refactor you are improving the design of the code after it has been written."

Who Needs Refactoring?

All professional software developers, in fact, need to refactor their code. That's not bad news; it's good news, as I will demonstrate in this article. Additional good news: Many of us already do refactor our code. We are constantly going back into existing programs—written by ourselves or by someone else—to fix a problem, enhance the program to match new requirements, make it run faster, upgrade it to use new PL/SQL features, and more.

With each of these steps, we often—but not always—improve the internal design of our code. So why introduce a new buzzword to describe what we are already doing? Refactoring goes beyond simply making recommendations for best practices; the discipline of refactoring offers step-by-step guides to making very specific improvements. These guides help us make the changes in a consistent manner, without introducing new bugs.

In this article, I am going to perform a refactoring exercise on a program that checks to see if two files are equal (have the same contents). By seeing the refactoring process in action, you will be able to better evaluate its potential impact on your own development environment.

An Initial Review of eqfiles

The eqfiles function performs a line-by-line comparison of two files to see if their contents are equivalent. The code for this function is shown in Listing 1. It is a fairly straightforward program, and it is written that way, with each step exposed in the main body of the program:

Code Listing 1: The original eqfiles function

 1  CREATE OR REPLACE FUNCTION eqfiles (
 2     check_this_in         IN   VARCHAR2
 3    ,check_this_dir_in     IN   VARCHAR2
 4    ,against_this_in       IN   VARCHAR2
 5    ,against_this_dir_in   IN   VARCHAR2 := NULL
 6  )
 7     RETURN BOOLEAN
 8  IS
 9     checkid       UTL_FILE.file_type;
10     checkline     VARCHAR2 (32767);
11     check_eof     BOOLEAN;
12     againstid     UTL_FILE.file_type;
13     againstline   VARCHAR2 (32767);
14     against_eof   BOOLEAN;
15     retval        BOOLEAN;
16  BEGIN
17     -- Open both files, read-only.
18     checkid :=
19        UTL_FILE.fopen (check_this_dir_in
20                       ,check_this_in
21                       ,'R'
22                       ,max_linesize      => 32767
23                       );
24
25     againstid :=
26        UTL_FILE.fopen (NVL (against_this_dir_in, check_this_dir_in)
27                       ,against_this_in
28                       ,'R'
29                       ,max_linesize      => 32767
30                       );
31
32     LOOP
33        BEGIN
34           UTL_FILE.get_line (checkid, checkline);
35        EXCEPTION
36           WHEN NO_DATA_FOUND
37           THEN
38              check_eof := TRUE;
39        END;
40
41        BEGIN
42           UTL_FILE.get_line (againstid, againstline);
43        EXCEPTION
44           WHEN NO_DATA_FOUND
45           THEN
46              against_eof := TRUE;
47        END;
48
49        IF (check_eof AND against_eof)
50        THEN
51           retval := TRUE;
52           EXIT;
53        ELSIF (checkline != againstline) OR
54              (check_eof OR against_eof)
55        THEN
56           retval := FALSE;
57           EXIT;
58        END IF;
59     END LOOP;
60
61     UTL_FILE.fclose (checkid);
62     UTL_FILE.fclose (againstid);
63     RETURN retval;
64* END eqfiles;

Lines 17 through 30. Open both files, read-only, maximum line size possible.
Line 32. Start up a loop to read through the files.
Lines 33 through 47. Read the next line from each file.
Lines 49 through 58. Determine if the files are still the same. If not, exit the loop. The files are no longer the same if either (a) I reach the end of one file before reaching the end of the other or (b) the lines from each file are not the same.
Lines 61 through 63. Clean up (close) the files and return the Boolean value.

The eqfiles function is simple and short, and it gets the job done. So should I bother refactoring this program? To see why the answer is a definitive "yes!" let's review this code for possible problems and opportunities for improvement. I list below the issues that stand out for me; send a note to me at best-practice-plsql_us@oracle.com if you see anything I missed.

My program has no exception section. Worse, I am using UTL_FILE and there is no exception section. When performing file I/O, the chance of an error is quite high. It is extremely important to handle UTL_FILE errors in the same block in which the code is executed. That way you can get as much information as possible about the error—and, as important, you can close any open files. The way this program is written now, if any error occurs while I'm reading from one of the files, the files will be left open until I terminate my session!
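
To make the principle concrete before we get to the actual fix later in this article, here is a minimal sketch (with placeholder directory and file names) of trapping a UTL_FILE error in the same block that opened the file, so that the handle is still in scope and the file can be closed:

DECLARE
   l_file   UTL_FILE.file_type;
BEGIN
   -- 'MY_DIR' and 'data.txt' are placeholders for a real directory and file.
   l_file := UTL_FILE.fopen ('MY_DIR', 'data.txt', 'R');
   -- ... read from the file here ...
   UTL_FILE.fclose (l_file);
EXCEPTION
   WHEN OTHERS
   THEN
      -- The handle is still visible in this block, so the file can be
      -- closed before the error propagates.
      IF UTL_FILE.is_open (l_file)
      THEN
         UTL_FILE.fclose (l_file);
      END IF;

      RAISE;
END;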

Presence of hard-coded literal values. I want to read from the files, so I specify 'R' as the file open mode. I want to make sure I can read long lines, so I specify the maximum, 32767, for the max_linesize parameter. Now how did I know those values? I looked them up in the documentation, or I simply remembered them from frequent use. Regardless of the answer, this is not a good way to write code. The values may change over time, and I should not have to trust my memory to get it right.

Repetition of the same logic. The entire FOPEN statement to open a file is repeated, which means that the hard-coded values are also repeated. Repetition of the same or similar logic is one of the biggest problems in software. Multiple instances mean extra work to maintain, debug, and upgrade the code. There is also a greatly increased likelihood of bugs working their way into the application.

Exposed complex Boolean logic. The expression that I wrote to evaluate whether the files are the same is complicated and hard to follow. If that's how the author of the code feels about it, it isn't hard to predict how challenging a new reader of this code will find those lines.

Confusing loop structure. Linked to the complexity of the conditional logic is the loop in which the IF statement is found. I am using a simple loop, so the EXIT condition is not explicitly stated; rather, it is buried inside the loop. Furthermore, there are two different EXIT statements, which means that debugging the code will be more challenging (which EXIT path did it follow?).

An Imperfect Program—Like All Others

The eqfiles function isn't perfect, but there's nothing very surprising about that. There isn't any such thing as a perfect program. The question is how to make it better. Given the imperfections described above, I suggest that the following refactorings would apply very well to eqfiles:

  • Create local initialization and cleanup procedures.

  • Include an exception section in every block of code, or justify its absence.

  • Replace all hard-coded literal values with named constants.

  • Hide complex expressions and logic behind a function or variable.

  • Construct each loop so that there is only one way to exit.

Let's take these refactorings one at a time, applying them to eqfiles.

Create local initialization and cleanup procedures. Most executable sections of a PL/SQL block can be logically segmented into three separate phases:

Phase 1. Initialize and check: Initialize local variables, check parameters and other conditions.
Phase 2. Run main body of code.
Phase 3. Clean up when done. This cleanup takes place in the executable section, but you will also usually want to execute at least some part of it in error handlers.

Many times when we write code, all of these phases are just jumbled together inside the executable section. This approach makes that section longer and harder to read. You are much better off creating local modules to store the steps for both initialization and cleanup.
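
Here is a rough template of that three-phase structure (process_data and its contents are placeholders for illustration, not part of eqfiles):

CREATE OR REPLACE PROCEDURE process_data
IS
   -- ... local variable declarations ...

   PROCEDURE initialize
   IS
   BEGIN
      NULL;   -- Phase 1: initialize variables, check parameters, open resources.
   END initialize;

   PROCEDURE cleanup
   IS
   BEGIN
      NULL;   -- Phase 3: release resources; also call this from error handlers.
   END cleanup;
BEGIN
   initialize;
   NULL;       -- Phase 2: run the main body of code.
   cleanup;
EXCEPTION
   WHEN OTHERS
   THEN
      cleanup;
      RAISE;
END process_data;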

Let's look at how to apply this refactoring to eqfiles. Before I can actually run the main body of logic for eqfiles (analyzing if the lines are equal), I need to open the files successfully; this step constitutes the initialization of my program (lines 17 through 30 in Listing 1).

First I create a "stub" at the end of my declaration section:

PROCEDURE initialize
IS
BEGIN
   NULL;   -- NULL placeholder keeps the stub compilable until real code is pasted in.
END initialize;

Then I copy the UTL_FILE.FOPEN calls out of the executable section and paste them into the initialize procedure:

PROCEDURE initialize
IS
BEGIN
   checkid := UTL_FILE.fopen ...
   againstid := UTL_FILE.fopen ...
END initialize;

Finally, I replace the original UTL_FILE.FOPEN lines of code with a call to initialize, and thus my executable section now looks like this:

BEGIN
   initialize;
   LOOP
      BEGIN

Now for the cleanup (lines 61 through 62 in Listing 1). Again, I create a stub:

PROCEDURE cleanup
IS
BEGIN
   NULL;   -- NULL placeholder keeps the stub compilable until the FCLOSE calls are pasted in.
END cleanup;

Next I copy the FCLOSE statements out from the end of the program and paste them into this cleanup procedure:

PROCEDURE cleanup
IS
BEGIN
   UTL_FILE.fclose (checkid);
   UTL_FILE.fclose (againstid);
END cleanup;

Then I replace the original FCLOSE statements with my call to this cleanup procedure. My executable section is now roughly 10 lines shorter and focused entirely on the core business logic. I have also set myself up to quickly and easily follow the next refactoring.
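
At this point the executable section looks roughly like this (the loop body itself is unchanged from lines 32 through 59 of Listing 1):

BEGIN
   initialize;

   LOOP
      -- ... read the next line from each file and compare, as before ...
   END LOOP;

   cleanup;
   RETURN retval;
END eqfiles;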

Include an exception section in every block of code, or justify its absence. There are certainly reasons for not including an exception section in a PL/SQL block, but the point is that you should have a good reason (besides "I forgot" or "I don't have time for that sort of thing").

In the case of a program that manipulates files with UTL_FILE, you should always include an exception section, for two main reasons:

  1. You can trap the error and then close any open files. If you do not, the files remain open and—since you have "lost" the handles to those files because they are generally declared locally in the block—you will not be able to close those specific files in an outer block that traps the error.

  2. You can check for specific UTL_FILE-related errors and take appropriate corrective action. If you let the error propagate to an outer block, it will be much harder to check for and correct file-related failures.

Now that I have created a cleanup procedure for eqfiles, it is quite easy for me to take care of No. 1 above. I simply add an exception section to the function, call the cleanup routine, and then reraise the exception:

EXCEPTION
   WHEN OTHERS
   THEN
      cleanup;
      RAISE;

Alternatively, I might decide to simply return FALSE instead of propagating the exception, because if I encounter an error, then clearly the two files are not the same.
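
That alternative is a one-line change; I replace the RAISE with a RETURN (this is, in fact, what the final version in Listing 7 does):

EXCEPTION
   WHEN OTHERS
   THEN
      cleanup;
      RETURN FALSE;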

To take care of No. 2 above, I can add an error handler for a specific exception, and I can easily call the cleanup procedure there as well. If, for example, I provide the name of a file that does not exist, Oracle will raise the UTL_FILE.INVALID_OPERATION error (which in Oracle9i Database now has its own error number: ORA-29283). So I can check for that and take a different action, as in:

EXCEPTION
   WHEN UTL_FILE.INVALID_OPERATION
   THEN
      cleanup;
      DBMS_OUTPUT.PUT_LINE (
         'Please make sure that these files exist:');
      DBMS_OUTPUT.PUT_LINE (
         check_this_in || ' in ' || check_this_dir_in);
      DBMS_OUTPUT.PUT_LINE (
         against_this_in || ' in ' || against_this_dir_in);

Replace all hard-coded literal values with named constants. I am sure this refactoring is an "old story" for most of you. We should never hard-code literal values into our programs. We all know this is a bad idea, because those values may change over time, while the code that references them likely will not be updated to match. Yet we all find ourselves doing it anyway. Let's go through the exercise of removing hard-coded values from eqfiles. Here are the values I want to replace with named elements of some sort:

  • 'R' for read-only as the file open mode in UTL_FILE.FOPEN (lines 21 and 28 in Listing 1)

  • 32767 for the maximum line size allowed in UTL_FILE.FOPEN (lines 22 and 29 in Listing 1)

I could simply create named constants inside this function and assign values to those constants. The problem with this approach is that I may need to reference the same values in different parts of my application. Am I going to create named constants all over my application? That is scarcely much of an improvement over the existing code.

Given this situation, a much better approach is to create a single container for these values (and any other hard-coded values related to UTL_FILE) and then reference the named elements in that container throughout my application.

So to get rid of these hard-coded values, I will

  1. Create a package specification that defines named constants for any hard-coded UTL_FILE values and compile it into my schema, as shown in Listing 2.

    Code Listing 2: The utl_file_constants package

    CREATE OR REPLACE PACKAGE utl_file_constants
    IS
       c_read_only      CONSTANT VARCHAR2 (1) := 'R';
       c_write_only     CONSTANT VARCHAR2 (1) := 'W';
       c_append         CONSTANT VARCHAR2 (1) := 'A';
       c_min_linesize   CONSTANT PLS_INTEGER := 1;
       c_def_linesize   CONSTANT PLS_INTEGER := 1024;
       c_max_linesize   CONSTANT PLS_INTEGER := 32767;
    END utl_file_constants;
    /

  2. Replace references to the hard-coded values with the package-based constants. Here is one such replacement:

    checkid :=
       UTL_FILE.fopen (check_this_dir_in
                      ,check_this_in
                      ,utl_file_constants.c_read_only
                      ,max_linesize => utl_file_constants.c_max_linesize
                      );

Hide complex expressions and logic behind a function or variable. This program contains a good example of tricky conditional logic (lines 49 through 58 in Listing 1). There isn't a lot of code involved, but it certainly is less than transparent:

IF (check_eof AND against_eof)
THEN
   retval := TRUE;
   EXIT;
ELSIF (checkline != againstline) OR
      (check_eof OR against_eof)
THEN
   retval := FALSE;
   EXIT;
END IF;

Allow me to translate: After I read the next lines from each file, if I have reached the end of both files at the same time, my files are identical and I should stop reading. Otherwise, if the lines are different or if I have reached the end of one file before reaching the end of the other, then my files are not identical and I should stop reading.

Exposing complex logic like this in the main body of a procedure is problematic. It is hard to understand, and it is also a major distraction from seeing the overall flow of the program. A much better approach is to hide all of this logic behind a procedure or a function.

After reading (or attempting to read) the next line from each file, I need to compare the results to see if they are identical and if I should continue to read through the files. So why not create a program just for that? See Listing 3.

Code Listing 3: The compare_latest_read procedure

PROCEDURE compare_latest_read (
   check_eof_in      IN   BOOLEAN
  ,against_eof_in    IN   BOOLEAN
  ,check_line_in     IN   VARCHAR2
  ,against_line_in   IN   VARCHAR2
  ,identical_out     OUT  BOOLEAN
  ,read_next_out     OUT  BOOLEAN
)
IS
BEGIN
   IF check_eof_in AND against_eof_in
   THEN
      -- Made it to the end of both files simultaneously.
      identical_out := TRUE;
      read_next_out := FALSE;
   ELSIF check_eof_in OR against_eof_in
   THEN
      -- Reached end of one before the other. Not identical!
      identical_out := FALSE;
      read_next_out := FALSE;
   ELSE
      -- Only continue IF the two lines are identical.
      identical_out := check_line_in = against_line_in;
      read_next_out := identical_out;
   END IF;
END compare_latest_read;

Now the logic is more clearly expressed. Sure, there's more code than before, but that's not such an awful thing, if the program is more readable and easier to maintain.

There is another area of exposed complexity in the body of eqfiles. Lines 33 through 47 in Listing 1 contain the logic to read the next line of each file. UTL_FILE.GET_LINE raises the NO_DATA_FOUND exception when it reads past the end of a file. I must therefore wrap each call to GET_LINE inside its own BEGIN-END block, because reaching the end of file in this case is not an error.

Rather than repeat this error-handling logic twice in this block of code, again creating quite a distraction from the main event, I prefer to isolate the read-and-trap logic in its own procedure, as shown in Listing 4.

Code Listing 4: The get_next_line_from_file procedure

PROCEDURE get_next_line_from_file (
   file_in    IN   UTL_FILE.file_type
  ,line_out   OUT  VARCHAR2
  ,eof_out    OUT  BOOLEAN
)
IS
BEGIN
   UTL_FILE.get_line (file_in, line_out);
   eof_out := FALSE;
EXCEPTION
   WHEN NO_DATA_FOUND
   THEN
      line_out := NULL;
      eof_out := TRUE;
END get_next_line_from_file;

With these changes, the body of the loop now looks like the code in Listing 5.

Code Listing 5: The body of the loop

LOOP
   get_next_line_from_file (checkid, checkline, check_eof);
   get_next_line_from_file (againstid, againstline, against_eof);
   compare_latest_read (check_eof
                       ,against_eof
                       ,checkline
                       ,againstline
                       ,l_identical
                       ,l_keep_checking
                       );
END LOOP;

This is very straightforward code, but it doesn't quite make sense. I am using a simple loop to read each line—but I no longer have an EXIT statement in my loop. This loop will run forever, which leads me to my final refactoring.

Construct each loop so that there is only one way to exit. In the original structure of my program, I used a simple loop that had two different EXIT statements (lines 52 and 57 in Listing 1):

LOOP
   ... read from files ...

   IF (check_eof AND against_eof)
   THEN
      retval := TRUE;
      EXIT;
   ELSIF (checkline != againstline) OR (check_eof OR against_eof)
   THEN
      retval := FALSE;
      EXIT;
   END IF;
END LOOP;

As a consequence, I have multiple EXIT points from the loop. This situation is generally undesirable, as it makes debugging and analysis of program flow more difficult. In this simple piece of code, two different EXIT points don't seem like a big deal (besides the fact that the logic is confusing, which we addressed in the previous refactoring). In a larger, more complex program, however, multiple EXIT points in a simple loop can be a major headache.

To change this code to have a single EXIT point, I switch to a WHILE statement:

WHILE (l_keep_checking)
LOOP
   ... body of loop ...
END LOOP;

and then I simply need to provide the logic to set the "keep checking" flag, which I do with the compare_latest_read procedure. Applying my new loop structure to my modularized loop body, I end up with this fully refactored executable section for eqfiles shown in Listing 6.

Code Listing 6: Refactored executable section for eqfiles

BEGIN
   initialize;

   WHILE (l_keep_checking)
   LOOP
      get_next_line_from_file (checkid, checkline, check_eof);
      get_next_line_from_file (againstid, againstline, against_eof);
      compare_latest_read (check_eof
                          ,against_eof
                          ,checkline
                          ,againstline
                          ,l_identical
                          ,l_keep_checking
                          );
   END LOOP;

   cleanup;
   RETURN l_identical;


The Full Refactoring

The full refactoring is shown in Listing 7. The main body is short and self-explanatory. If I need the details on how to get the next line or the logic I use to compare the latest lines read from the files, I zoom in on those procedures. I have gotten rid of my hard-coded values and carefully cleaned up any messes that I have made (or, to be accurate, that the user has made by passing in bad values for filenames or directories).

Code Listing 7: The refactored eqfiles function

CREATE OR REPLACE FUNCTION eqfiles (
   check_this_in         IN   VARCHAR2
  ,check_this_dir_in     IN   VARCHAR2
  ,against_this_in       IN   VARCHAR2
  ,against_this_dir_in   IN   VARCHAR2 := NULL
)
   RETURN BOOLEAN
IS
   checkid           UTL_FILE.file_type;
   checkline         VARCHAR2 (32767);
   check_eof         BOOLEAN;
   --
   againstid         UTL_FILE.file_type;
   againstline       VARCHAR2 (32767);
   against_eof       BOOLEAN;
   --
   l_keep_checking   BOOLEAN DEFAULT TRUE;
   l_identical       BOOLEAN DEFAULT FALSE;

   PROCEDURE initialize
   IS
   BEGIN
      -- Open both files, read-only.
      checkid :=
         UTL_FILE.fopen (check_this_dir_in
                        ,check_this_in
                        ,utl_file_constants.c_read_only
                        ,max_linesize => utl_file_constants.c_max_linesize
                        );
      againstid :=
         UTL_FILE.fopen (NVL (against_this_dir_in, check_this_dir_in)
                        ,against_this_in
                        ,utl_file_constants.c_read_only
                        ,max_linesize => utl_file_constants.c_max_linesize
                        );
   END initialize;

   PROCEDURE cleanup
   IS
   BEGIN
      UTL_FILE.fclose (checkid);
      UTL_FILE.fclose (againstid);
   END cleanup;

   PROCEDURE get_next_line_from_file (
      file_in    IN   UTL_FILE.file_type
     ,line_out   OUT  VARCHAR2
     ,eof_out    OUT  BOOLEAN
   )
   IS
   BEGIN
      UTL_FILE.get_line (file_in, line_out);
      eof_out := FALSE;
   EXCEPTION
      WHEN NO_DATA_FOUND
      THEN
         line_out := NULL;
         eof_out := TRUE;
   END get_next_line_from_file;

   PROCEDURE compare_latest_read (
      check_eof_in      IN   BOOLEAN
     ,against_eof_in    IN   BOOLEAN
     ,check_line_in     IN   VARCHAR2
     ,against_line_in   IN   VARCHAR2
     ,identical_out     OUT  BOOLEAN
     ,read_next_out     OUT  BOOLEAN
   )
   IS
   BEGIN
      IF check_eof_in AND against_eof_in
      THEN
         -- Made it to the end of both files simultaneously.
         identical_out := TRUE;
         read_next_out := FALSE;
      ELSIF check_eof_in OR against_eof_in
      THEN
         -- Reached end of one before the other. Not identical!
         identical_out := FALSE;
         read_next_out := FALSE;
      ELSE
         -- Only continue IF the two lines are identical.
         identical_out := check_line_in = against_line_in;
         read_next_out := identical_out;
      END IF;
   END compare_latest_read;
BEGIN
   initialize;

   WHILE (l_keep_checking)
   LOOP
      get_next_line_from_file (checkid, checkline, check_eof);
      get_next_line_from_file (againstid, againstline, against_eof);
      compare_latest_read (check_eof
                          ,against_eof
                          ,checkline
                          ,againstline
                          ,l_identical
                          ,l_keep_checking
                          );
   END LOOP;

   cleanup;
   RETURN l_identical;
EXCEPTION
   WHEN OTHERS
   THEN
      cleanup;
      RETURN FALSE;
END eqfiles;
/
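
To try out the refactored function, I can run a quick test from an anonymous block such as the following (the directory and file names are placeholders, and SERVEROUTPUT must be enabled to see the messages):

DECLARE
   l_same   BOOLEAN;
BEGIN
   -- 'MY_DIR', 'before.txt', and 'after.txt' are hypothetical values.
   l_same :=
      eqfiles (check_this_in       => 'before.txt'
              ,check_this_dir_in   => 'MY_DIR'
              ,against_this_in     => 'after.txt'
              );

   IF l_same
   THEN
      DBMS_OUTPUT.put_line ('The files are identical.');
   ELSE
      DBMS_OUTPUT.put_line ('The files are different.');
   END IF;
END;
/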


The Power of Refactoring

I have found the idea and discipline of refactoring to be very helpful. Refactoring takes us a step beyond identifying best practices by also providing a step-by-step approach to making changes, to minimize the chance of introducing bugs.

Next Steps

READ more Feuerstein
 oracle.com/technetwork/issue-archive/index-092362.html
 oracle.com/technetwork/articles
 www.oreillynet.com/cs/catalog/view/au/344

more on refactoring
 www.refactoring.com
 Refactoring: Improving the Design of Existing Code
 www.qnxo.com

 

 

 
