Radio Free Tooting: Oracle

Tuesday, August 25, 2020

How PL/SQL Development Standards work

I have been gigging at a place which has documented PL/SQL Development Standards. This is not so unusual: most Oracle shops have such a document. What makes it unusual is that they enforce the standards. With code reviews. And I mean properly enforce: programs fail QA for egregious breaches of the standards or a sufficient accumulation of minor breaches. This is less common than it ought to be.

Many coders are sceptical about development standards; I have been in the past. Standards generally focus on things which are easy to standardise (indentation, case, naming conventions) rather than functional correctness or design principles. They frequently codify arbitrary or outdated practices (mandating explicit cursors is a particular bugbear of mine). They either go into so much detail that they are unreadably long (and dull) or are so sketchy that they operate as easy-to-ignore guidelines. But I think many experienced developers' objections boil down to: I don't like being told how to write my code; my style is the best style; my code is clean, clear and readable.

The catch is, readability is not simply a function of personal style: it emerges from consistency across the entire codebase. Just because I find my personal coding style clear doesn't mean everybody else will. At the very least a colleague reading my program will have to invest time in understanding how I name my variables, how I use table aliases, and a dozen other things, none of them important individually but all together adding friction to the crucial task of understanding how a program works (or does not work).

This particular set of standards certainly had a lot to say about layout. Many strictures fitted with my natural coding style (all lower case, one column per line in a SQL projection, comma before the column name rather than after it). Others were rather tiresome: the rules for clause alignment entail a lot of spacing and backspacing to ensure elements line up. There are a few strictures I actively disagree with (notably mandatory use of SQL-89 syntax i.e. implicit joins). But here's the rub: I didn't get to pick and choose which of the standards I followed. I just had to knuckle down and follow them all. Because the discipline of the code review meant my programs failed QA when I hadn't applied the standards.
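To make those layout rules concrete, here is a sketch of a query written that way: all lower case, one column per line, leading commas, aligned clauses and a SQL-89 implicit join. The EMPLOYEES and DEPARTMENTS tables are hypothetical stand-ins, and the exact alignment is my guess at the kind of thing such a standard mandates rather than a quotation from it.

```sql
-- hypothetical tables; the layout is illustrative, not the actual standard
select e.employee_id
     , e.last_name
     , d.department_name
from   employees   e
     , departments d
where  d.department_id = e.department_id   -- implicit (SQL-89) join
and    e.hire_date    >= date '2020-01-01'
order by e.last_name;
```

Whether we like the implicit join or not is beside the point: a reviewer can see at a glance when a query deviates from the shape everybody else uses.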

There's more to consistency than just layout and naming conventions. There's also functional consistency: use of SQL and PL/SQL idioms, how to organise programs within a package, and so forth. Too many things to cover in a single document. But again, code reviews enforce standardisation of these aspects, by applying undocumented conventions with the same rigour as documented standards. A couple of times I tripped over such an undocumented convention and it didn't feel fair: my code failed the review because I wrote something which was wrong even though not explicitly covered by the standards. One of these times it was something awry in the layout. "That's wrong", the reviewer said. It was a difference I hadn't even noticed, and probably you wouldn't have noticed either, and even if I had noticed it I wouldn't have thought it was wrong. But it was different from what everybody else was doing. That made it wrong.

Everybody undertakes code reviews and everybody's code is reviewed. Thus code reviews shape the codebase, by enforcing documented standards and undocumented conventions. As a result this is the most readable codebase I have ever worked on. It's almost impossible to tell who wrote any given program, because all programs look the same. It's easy to reason about a piece of code because it follows rigorous naming conventions and consistent architectural principles. The code is habitable. A colleague can read a program I wrote and feel comfortable doing so. The layout, the naming conventions, the consistent selection of one approach in situations where PL/SQL offers more than one way of doing something, all these factors mean my program looks just like the program anybody else would have written. So the reader is freer to understand what the program actually does and how it works. Standardisation reduces friction.

It is a virtuous circle. Code reviews enforce a consistent programming style, which eliminates trivial (i.e. non-functional) differences in the program. In turn this makes the program easier to review: all the programs look basically the same which highlights the things which need to be different, the business logic and the data structures.

Readability is a feature

Readability is a feature. It's a feature our code must have. We all know readability makes code easier to maintain, easier to re-use, easier to debug. Yet still many developers bridle at the suggestion that their PL/SQL must look like everybody else's PL/SQL. I get this. It's not that I think the way I write PL/SQL is intrinsically correct, it just looks the way I have evolved to write it over the years. A new set of coding standards, rigorously applied, disrupts my flow. I must slow down to correct the variable names or fix the layout. It's tedious.

Tedious but also necessary. A software system is a shared enterprise. It's not "my" code, it's the project's code; I am just the person checking it into source control. As a discipline, programming is a craft not an art. PL/SQL is simply a device for turning data into business value. It's more important that other people on the team can work with our code than that it has our signature style. So let's not be precious about appearance. We must follow the rules, and save our self-expression for our poems and our tweets.

Above all, know this: there are no development standards without code reviews.

Thursday, July 30, 2020

Minimal declaration of foreign key columns

Here is the full declaration of an inline foreign key constraint (referencing a primary key column on a table called PARENT):

, parent_id number(12,0) constraint chd_par_fk references parent(parent_id)

But what is the fewest number of words required to implement the same constraint? Two. This does exactly the same thing:

, parent_id references parent

The neat thing about this minimalist declaration is the child column inherits the datatype of the referenced primary key column. Here's what it looks like (with an odd primary key declaration, just to prove the point):

 
SQL> create table parent1
  2  (parent_id number(15,3) primary key)
  3  /

Table PARENT1 created.

SQL> create table child1
  2  ( id        number(12,0) primary key
  3   ,parent_id references parent1) 
  4  /

Table CHILD1 created.

SQL> desc child1
Name      Null?    Type         
--------- -------- ------------ 
ID        NOT NULL NUMBER(12)   
PARENT_ID          NUMBER(15,3) 
SQL>

If we want to specify a name for the foreign key we need to include the constraint keyword:

 
SQL> create table parent2
  2  (parent_id number(15,3) constraint par2_pk primary key)
  3  /

Table PARENT2 created.

SQL> create table child2
  2  ( id        number(12,0) constraint chd2_pk primary key
  3   ,parent_id              constraint chd2_par2_fk references parent2) 
  4  /

Table CHILD2 created.

SQL> desc child2
Name      Null?    Type         
--------- -------- ------------ 
ID        NOT NULL NUMBER(12)   
PARENT_ID          NUMBER(15,3) 
SQL>

This minimal declaration always references the parent table's primary key. Suppose we want to reference a unique key rather than the primary key. (I would regard this as a data model smell, but sometimes we need to do it.) To make this work we merely need to reference the unique key column explicitly:

SQL> create table parent3
  2  ( parent_id  number(15,3)          constraint par3_pk primary key
  3   ,parent_ref varchar2(16) not null constraint par3_uk unique
  4  )
  5  /

Table PARENT3 created.

SQL> create table child3
  2  ( id         number(12,0) constraint chd3_pk primary key
  3   ,parent_ref              constraint chd3_par3_fk references parent3(parent_ref)) 
  4  /

Table CHILD3 created.

SQL> desc child3
Name       Null?    Type         
---------- -------- ------------ 
ID         NOT NULL NUMBER(12)   
PARENT_REF          VARCHAR2(16) 
SQL>

Hmmm, neat. What if we have a compound primary key? Well, that's another data model smell but it still works. Because we're constraining multiple columns we need to use a table level constraint and so the syntax becomes more verbose; we need to include the magic words foreign key:

SQL> create table parent4
  2  ( parent_id  number(15,3)   
  3   ,parent_ref varchar2(16) 
  4   ,constraint par4_pk primary key (parent_id, parent_ref)
  5  )
  6  /

Table PARENT4 created.

SQL> create table child4
  2  ( id number(12,0) constraint chd4_pk primary key
  3   ,parent_id
  4   ,parent_ref
  5   ,constraint chd4_par4_fk foreign key (parent_id, parent_ref) references parent4) 
  6  /

Table CHILD4 created.

SQL> desc child4
Name       Null?    Type         
---------- -------- ------------ 
ID         NOT NULL NUMBER(12)   
PARENT_ID           NUMBER(15,3) 
PARENT_REF          VARCHAR2(16) 
SQL>

Okay, but supposing we change the declaration of the parent column, does Oracle ripple the change to the child table?

 
SQL> alter table parent4 modify parent_ref varchar2(24);

Table PARENT4 altered.

SQL> desc child4
Name       Null?    Type         
---------- -------- ------------ 
ID         NOT NULL NUMBER(12)   
PARENT_ID           NUMBER(15,3) 
PARENT_REF          VARCHAR2(16) 
SQL>

Nope. And rightly so. This minimal syntax is a convenience when we're creating a table, but there's no object-style inheritance mechanism.
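So if we want the child column to stay in step with its parent, we have to alter it ourselves. A minimal sketch, using the PARENT4 and CHILD4 tables from above; the dictionary query is just one way of confirming the two columns now agree.

```sql
-- widen the child column by hand to match the altered parent column
alter table child4 modify parent_ref varchar2(24);

-- compare the declarations on both ends of the foreign key
select table_name
     , column_name
     , data_type
     , data_length
from   user_tab_columns
where  table_name in ('PARENT4', 'CHILD4')
and    column_name = 'PARENT_REF'
order by table_name;
```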

Generally I prefer a verbose declaration over minimalism, because clarity trumps concision. I appreciate the rigour of enforcing the same datatype on both ends of a foreign key constraint. However, I hope that in most cases our CREATE TABLE statements have been generated from a data modelling tool. So I think this syntactical brevity is a neat thing to know about, but of limited practical use.

Monday, September 03, 2018

UKOUG London Development and Middleware event - free!

The Oracle development landscape is an extremely broad and complicated one these days. It covers such a wide range of tools, technologies and practices it is hard to keep up.

The UKOUG is presenting a day of sessions which can bring you up to speed. It's a joint initiative between the Development and Middleware SIGs - a composite if you will - at the Oracle City Office on Thursday 6th September. This event is free. If you are a UKOUG member attending it won't count against your allotment of SIG delegates; if you're not a UKOUG member there's no charge so come along and get a taste of what the UKOUG has to offer.

The day covers a broad spectrum. Martin Beeby is a popular speaker; his talk covers how Oracle is embracing new cool technologies such as Blockchain, Docker and chatbots. There are talks from Oracle ACE Director Simon Haslam on mobile applications and Oracle ACE Director Mark Simpson on real-life uses for AI. There are also sessions on API design, building bots and JavaScript frameworks.

Even last year these things might have been considered cutting edge, certainly in the enterprise realm. But most organisations of whatever size are at least thinking about or running Proof of Concept projects in AI or blockchain. Some already have these technologies active in Production. These things will affect everybody working in IT, and probably sooner rather than later. It's always good to know what's coming.

Check out the full agenda here.
Register here.

Oh, and did I mention it's free? Treat yourself to a day out from the present and get a glimpse of the future.

Wednesday, May 31, 2017

Avoiding Coincidental Cohesion

Given that Coincidental Cohesion is bad for our code base, we obviously want to avoid writing utilities packages. Fortunately it is mostly quite easy to do so, although it requires vigilance on our part. Utilities packages are rarely planned. More often we are writing a piece of business functionality when we find ourselves in need of some low level functionality. It doesn't fit in the application package we're working on, perhaps we suspect that it might be more generally useful, so we need somewhere to put it.

The important thing is to recognise and resist the temptation of the Utilities package. The name itself (and similarly vague synonyms like helper or utils) should be a red flag. When we find ourselves about to type create or replace package utilities we need to stop and think: what would be a better name for this package? Consider whether there are related functions we might end up needing. Suppose we're about to write a function to convert a date into a Unix epoch string. It doesn't take much imagination to think we might need a similar function to convert a Unix timestamp into a date. We don't need to write that function now but let's start a package dedicated to Time functions instead of a miscellaneous utils package.
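Here is a sketch of what such a dedicated package might look like. The package name EPOCH_TIME and its function names are made up for illustration. I have assumed the input dates are already in UTC and returned the epoch as a number of seconds, leaving any string formatting to the caller; a real implementation would need to decide how to handle time zones.

```sql
create or replace package epoch_time as
  -- seconds since 1970-01-01 00:00:00 (input assumed to be UTC)
  function to_epoch   (p_date  in date)   return number;
  -- the reverse conversion; could equally be added later
  function from_epoch (p_epoch in number) return date;
end epoch_time;
/

create or replace package body epoch_time as

  c_epoch_start constant date   := date '1970-01-01';
  c_secs_in_day constant number := 86400;

  function to_epoch (p_date in date) return number is
  begin
    -- date subtraction yields days, so scale up to seconds
    return round((p_date - c_epoch_start) * c_secs_in_day);
  end to_epoch;

  function from_epoch (p_epoch in number) return date is
  begin
    return c_epoch_start + (p_epoch / c_secs_in_day);
  end from_epoch;

end epoch_time;
/

select epoch_time.to_epoch(date '2000-01-01') from dual;  -- 946684800
```

The point is less the arithmetic than the name: anybody looking for a time conversion has an obvious place to look, and an obvious place to add the next one.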

Looking closely at the programs which comprise the DBMS_UTILITY package it is obviously unfair to describe them as a random selection. In fact there are seven or eight groups of related procedures.

DB Info

INSTANCE_RECORD Record Type
DBLINK_ARRAY Table Type
INSTANCE_TABLE Table Type
ACTIVE_INSTANCES Procedure
CURRENT_INSTANCE Function
DATA_BLOCK_ADDRESS_BLOCK Function
DATA_BLOCK_ADDRESS_FILE Function
DB_VERSION Procedure
GET_ENDIANNESS Function
GET_PARAMETER_VALUE Function
IS_CLUSTER_DATABASE Function
MAKE_DATA_BLOCK_ADDRESS Function
PORT_STRING Function

Runtime Messages

FORMAT_CALL_STACK Function
FORMAT_ERROR_BACKTRACE Function
FORMAT_ERROR_STACK Function

Object Management

COMMA_TO_TABLE Procedures
COMPILE_SCHEMA Procedure
CREATE_ALTER_TYPE_ERROR_TABLE Procedure
INVALIDATE Procedure
TABLE_TO_COMMA Procedures
VALIDATE Procedure

Object Info (Object Management?)

INDEX_TABLE_TYPE Table Type
LNAME_ARRAY Table Type
NAME_ARRAY Table Type
NUMBER_ARRAY Table Type
UNCL_ARRAY Table Type
CANONICALIZE Procedure
GET_DEPENDENCY Procedure
NAME_RESOLVE Procedure
NAME_TOKENIZE Procedure

Session Info

OLD_CURRENT_SCHEMA Function
OLD_CURRENT_USER Function

SQL Manipulation

EXPAND_SQL_TEXT Procedure
GET_SQL_HASH Function
SQLID_TO_SQLHASH Function

Statistics (deprecated)

ANALYZE_DATABASE Procedure
ANALYZE_PART_OBJECT Procedure
ANALYZE_SCHEMA Procedure

Time

GET_CPU_TIME Function
GET_TIME Function
GET_TZ_TRANSITIONS Procedure

Unclassified

WAIT_ON_PENDING_DML Function
EXEC_DDL_STATEMENT Procedure
GET_HASH_VALUE Function
IS_BIT_SET Function

We can see an alternative PL/SQL code suite, with several highly cohesive packages. But there will be some procedures which are genuinely unrelated to anything else. The four procedures in the Unclassified section above are examples. But writing a miscellaneous utils package for these programs is still wrong. There are better options.

Find a home. It's worth considering whether we already have a package which would fit the new function. Perhaps WAIT_ON_PENDING_DML() should have gone in DBMS_TRANSACTION; perhaps IS_BIT_SET() properly belongs in UTL_RAW.
A package of their own. Why not? It may seem extravagant to have a package with a single procedure but consider DBMS_DG with its lone procedure INITIATE_FS_FAILOVER(). The package delivers the usual architectural benefits plus it provides a natural home for related procedures we might discover a need for in the future.
Standalone procedure. Again, why not? We are so conditioned to think of a PL/SQL program as a package that we forget it can be just a Procedure or Function. Some programs are suited to standalone implementation.
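As a sketch of the standalone option, here is a small bit-testing function which lives on its own rather than in a utilities package. The name BIT_IS_SET and its signature are hypothetical (this is not the signature of DBMS_UTILITY.IS_BIT_SET), and I have assumed bit positions are counted from zero.

```sql
create or replace function bit_is_set
  ( p_value in number
  , p_bit   in pls_integer )  -- zero-based bit position (an assumption)
  return boolean
is
begin
  -- mask out the single bit we are interested in
  return bitand(p_value, power(2, p_bit)) != 0;
end bit_is_set;
/

begin
  -- 5 is binary 101, so bit 2 is set
  if bit_is_set(5, 2) then
    dbms_output.put_line('bit 2 is set');
  end if;
end;
/
```

If we later find ourselves writing a second and third bit-twiddling function, that is the moment to gather them into a properly named package.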

So avoiding the Utilities package requires vigilance. Code reviews can help here. Preventing the Utilities package becoming entrenched is crucial: once we have a number of packages dependent on a Utilities package it is pretty hard to get rid of it. And once it becomes a fixture in the code base developers will consider it more acceptable to add procedures to it.

Part of the Designing PL/SQL Programs series

Utilities - the Coincidental Cohesion anti-pattern

One way to understand the importance of cohesion is to examine an example of a non-cohesive package, one exhibiting a random level of cohesion. The poster child for Coincidental Cohesion is the utility or helper package. Most applications will have one or more of these, and Oracle's PL/SQL library is no exception. DBMS_UTILITY has 37 distinct procedures and functions (i.e. not counting overloaded signatures) in 11gR2 and 38 in 12cR1 (and R2). Does DBMS_UTILITY deliver any of the benefits the PL/SQL Reference says packages deliver?

Easier Application Design?

One of the characteristics of utilities packages is that they aren't designed in advance. They are the place where functionality ends up because there is no apparently better place for it. Utilities occur when we are working on some other piece of application code; we discover a gap in the available functionality such as hashing a string. When this happens we generally need the functionality now: there's little benefit to deferring the implementation until later. So we write a GET_HASH_VALUE() function, stick it in our utilities package and proceed with the task at hand.

The benefit of this approach is we keep our focus on the main job, delivering business functionality. The problem is, we never go back and re-evaluate the utilities. Indeed, now there is business functionality which depends on them: refactoring utilities introduces risk. Thus the size of the utilities package slowly increases, one tactical implementation at a time.

Hidden Implementation Details?

Another characteristic of utility functions is that they tend not to share concrete implementations. Often a utilities package beyond a certain size will have groups of procedures with related functionality. It seems probable that DBMS_UTILITY.ANALYZE_DATABASE(), DBMS_UTILITY.ANALYZE_PART_OBJECT() and DBMS_UTILITY.ANALYZE_SCHEMA() share some code. So there are benefits to co-locating them in the same package. But it is unlikely that CANONICALIZE(), CREATE_ALTER_TYPE_ERROR_TABLE() and GET_CPU_TIME() have much code in common.

Added Functionality?

Utility functions are rarely part of a specific business process. They are usually called on a one-off basis rather than being chained together. So there is no state to be maintained across different function calls.

Better Performance?

For the same reason there is no performance benefit from a utilities package. Quite the opposite. When there is no relationship between the functions we cannot make predictions about usage. We are not likely to call EXPAND_SQL_TEXT() right after calling PORT_STRING(). So there is no benefit in loading the former into memory when we call the latter. In fact the performance of EXPAND_SQL_TEXT() is impaired because we have to load the whole DBMS_UTILITY package into the shared pool, plus it uses up a larger chunk of memory until it gets aged out. Although to be fair, in these days of abundant RAM, some unused code in the library cache need not be our greatest concern. But whichever way we bounce it, it's not a boon.

Grants?

Privileges on utility packages are a neutral concern. Often utilities won't be used outside the owning schema. In cases where we do need to make them more widely available we're probably granting access on some procedures that the grantee will never use.

Modularity?

From an architectural perspective, modularity is the prime benefit of cohesion. A well-designed library should be frictionless and painless to navigate. The problem with random assemblages like DBMS_UTILITY is that it's not obvious what functions it may contain. Sometimes we write a piece of code we didn't need to.
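One way to find out what a package like this actually contains is to ask the data dictionary. The following is a sketch of such a query against ALL_PROCEDURES; the list it returns will vary by database version, and it shows only subprograms, not the record and table types the package also declares.

```sql
-- list the subprograms of DBMS_UTILITY, with a count of overloads for each
select procedure_name
     , count(*) as signatures
from   all_procedures
where  owner = 'SYS'
and    object_name = 'DBMS_UTILITY'
and    procedure_name is not null   -- skip the row for the package itself
group by procedure_name
order by procedure_name;
```

Of course this only helps if we already suspect the package is worth searching, which is exactly the discovery problem.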

The costs of utility packages

Perhaps your PL/SQL code base has a procedure like this:

create or replace procedure run_ddl
  ( p_stmt in varchar2)
is
  pragma autonomous_transaction;
  v_cursor number := dbms_sql.open_cursor;
  n pls_integer;
begin
  dbms_sql.parse(v_cursor, p_stmt, dbms_sql.native);
  n := dbms_sql.execute(v_cursor);
  dbms_sql.close_cursor(v_cursor);
exception
  when others then
    if dbms_sql.is_open(v_cursor) then
      dbms_sql.close_cursor(v_cursor);
    end if;
    raise;
end run_ddl;
/

It is a nice piece of code for executing DDL statements. The autonomous_transaction pragma prevents the execution of arbitrary DML statements (by throwing ORA-06519), so it's quite safe. The only problem is, it re-implements DBMS_UTILITY.EXEC_DDL_STATEMENT().
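For comparison, this is roughly what using the built-in looks like. The table name DEMO_T is made up for the example; the call will succeed or fail according to the privileges of whoever runs it.

```sql
begin
  -- one call replaces the whole of run_ddl
  dbms_utility.exec_ddl_statement('create table demo_t (id number)');
end;
/
```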

Code duplication like this is a common side effect of utility packages. Discovery is hard because their program units are clumped together accidentally. Nobody sets out to deliberately re-write DBMS_UTILITY.EXEC_DDL_STATEMENT(), it happens because not enough people know to look in that package before they start coding a helper function. Redundant code is a nasty cost of Coincidental Cohesion. Besides the initial wasted effort of writing an unnecessary program there are the incurred costs of maintaining it, testing it, the risk of introducing bugs or security holes. Plus each additional duplicated program makes our code base a little harder to navigate.

Fortunately there are tactics for avoiding or dealing with this. Find out more.

Part of the Designing PL/SQL Programs series

Monday, December 05, 2016

UKOUG Tech 2016 - Super Sunday

UKOUG 2016 is underway. This year I'm staying at the Jury's Inn hotel, one of a clutch of hotels within a stone's throw of the ICC and all the action of Brindley Place. Proximity is the greatest luxury. My room is on the thirteenth floor, so I have a great view across Birmingham; a view, which in the words of Telly Savalas "almost takes your breath away".

Although the conference proper - with keynotes, exhibition hall and so on - opens today, Monday, the pre-conference Super Sunday has already delivered some cracking talks. For the second year on the trot we have had a stream devoted to database development, which is great for Old Skool developers like me.

Fighting Bad PL/SQL, Philipp Salvisberg

The first talk in the stream discussed various metrics for assessing the quality of PL/SQL code: McCabe Cyclomatic Complexity, Halstead Volume, Maintainability Index. Cyclomatic Complexity evaluates the number of paths through a piece of code; the more paths the harder it is to understand what the code does under any given circumstance. The volume approach assesses information density (the number of distinct words/total number of words); a higher number means more concepts, and so more to understand. The Maintainability Index takes both measures and throws in some extra calculations based on LoC and comments.

All these measures are interesting, and often insightful, but none are wholly satisfactory. Philipp showed how easy it is to game the MI by putting all the code of a function on a single line: the idea that such a layout makes our code more maintainable is laughable. More worryingly, none of these measures evaluate what the code actually does. The presented example of better PL/SQL (according to the MI measure) replaced several lines of PL/SQL with a single REGEXP_LIKE call. Regular expressions are notorious for getting complicated and hard to maintain. Also there are performance considerations. Metrics won't replace wise human judgement just yet. In the end I agree with Philipp that the most useful metric remains WTFs per minute.

REST enabling Oracle tables with Oracle REST Data Services, Jeff Smith

It was standing room only for That Jeff Smith, who coped well with jetlag and sleep deprivation. ORDS is the new name for the APEX listener, a misleading name because it is used for more than just APEX calls, and APEX doesn't need it. ORDS is a Java application which brokers JSON calls between a web client and the database: going one way it converts JSON payload into SQL statements, going the other way it converts result sets into JSON messages. Apparently Oracle is going to REST enable the entire database - Jeff showed us the set of REST commands for managing DataGuard. ORDS is the backbone of Oracle Cloud.

Most of the talk centred on Oracle's capabilities for auto-enabling REST access to tables (and PL/SQL with the next release of ORDS). This is quite impressive and certainly I can see the appeal of standing up a REST web service to the database without all the tedious pfaffing in Hibernate or whatever Java framework is in place. However I think auto-enabling is the wrong approach. REST calls are stateless and cannot be assembled to form transactions; basically each one auto-commits. It's Table APIs all over again. TAPI 2.0, if you will. It's a recipe for bad applications.
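For reference, auto-enabling a table looks something like the following sketch, which is the kind of block SQL Developer generates. The HR schema and EMPLOYEES table are assumptions for illustration, and the ORDS package parameters may differ between ORDS releases, so check against the installed version.

```sql
begin
  -- expose the schema to ORDS (HR is a hypothetical example schema)
  ords.enable_schema
    ( p_enabled             => true
    , p_schema              => 'HR'
    , p_url_mapping_type    => 'BASE_PATH'
    , p_url_mapping_pattern => 'hr'
    , p_auto_rest_auth      => false );

  -- auto-enable REST access to a single table
  ords.enable_object
    ( p_enabled        => true
    , p_schema         => 'HR'
    , p_object         => 'EMPLOYEES'
    , p_object_type    => 'TABLE'
    , p_object_alias   => 'employees'
    , p_auto_rest_auth => false );

  commit;
end;
/
```

It is easy to see the appeal, and equally easy to see how a dozen of these turn into a set of table-shaped endpoints with no transactional design behind them.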

But I definitely like this vision of the future: an MVC implementation with JavaScript clients (V) passing JSON payloads to ORDS (C) with PL/SQL APIs doing all the business logic (M). The nineties revival starts here.

Meet your match: advanced row pattern matching, Stew Ashton

Stew's talk was one of those ones which are hard to pull off: Oracle 12c's MATCH_RECOGNIZE clause is a topic more suited to an article with a database on hand so we can work through the examples. Stew succeeded in making it work as a talk because he's a good speaker with a nice style and a knack for lucid explanation. He made a very good case for the importance of understanding this arcane new syntax.

MATCH_RECOGNIZE is lifted from event processing. It allows us to define arbitrary sets of data which we can iterate over in a SELECT statement. This allows us to solve several classes of problems relating to bin filtering, positive and negative sequencing, and hierarchical summaries. The most impressive example showed how to code an inequality (i.e. range) join that performs as well as an equality join. I will certainly be downloading this presentation and learning the syntax when I get back home.
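To give a flavour of the syntax, here is a sketch which is not taken from Stew's talk: it follows the well-known stock ticker pattern of finding a V shape (a run of falling prices followed by a run of rising prices). The TICKER data is made up inline so the query is self-contained.

```sql
with ticker (symbol, tstamp, price) as (
  select 'ACME', date '2016-12-01', 12 from dual union all
  select 'ACME', date '2016-12-02', 10 from dual union all
  select 'ACME', date '2016-12-03',  8 from dual union all
  select 'ACME', date '2016-12-04', 11 from dual union all
  select 'ACME', date '2016-12-05', 14 from dual
)
select symbol
     , start_tstamp
     , bottom_tstamp
     , end_tstamp
from   ticker
match_recognize (
  partition by symbol
  order by tstamp
  measures strt.tstamp       as start_tstamp    -- first row of the match
         , last(down.tstamp) as bottom_tstamp   -- last falling row
         , last(up.tstamp)   as end_tstamp      -- last rising row
  one row per match
  after match skip to last up
  pattern (strt down+ up+)                      -- regex over rows, not characters
  define
    down as down.price < prev(down.price)
  , up   as up.price   > prev(up.price)
);
```

The PATTERN clause is a regular expression over rows, and the DEFINE clause says what each pattern variable means. With the sample rows above the prices fall for two days and then rise for two, so we should get one match per symbol.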

If only Stew had done a talk on the MODEL clause several years ago.

SQL for change history with Temporal Validity and Flash Back Data Archive, Chris Saxon

Chris Saxon tackled the tricky concept of time travel in the database, as a mechanism for handling change. The first type of change is change in transactional data. For instance, when a customer moves house we need to retain a record of their former address as well as their new one. We've all implemented history like this, with START_DATE and END_DATE columns. The snag has always been how to formulate the query to establish which record applies at a given point in time. Oracle 12c solves this with Temporal Validity, a syntax for defining a PERIOD using those start and end dates. Then we can query the history using a simple AS OF PERIOD clause. It doesn't solve all the problems in this area (primary keys remain tricky) but at least the queries are solved.
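A minimal sketch of Temporal Validity, assuming a hypothetical CUSTOMER_ADDRESS table; the period name ADDRESS_VALID is likewise made up.

```sql
create table customer_address
( customer_id number(12,0)
 ,address     varchar2(200)
 ,start_date  date
 ,end_date    date
 ,period for address_valid (start_date, end_date)
);

-- which address applied to customer 42 on 1st June 2016?
select customer_id
     , address
from   customer_address
       as of period for address_valid date '2016-06-01'
where  customer_id = 42;
```

The query no longer needs the fiddly start and end date predicates (including the null end date for the current row); the period definition takes care of them.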

The other type of change is change in metadata: when was a particular change applied? what are all the states of a record over the last year? etc. These are familiar auditing requirements, which are usually addressed through triggers and journalling tables. That approach carries an ongoing burden of maintenance and is too easy to get wrong. Oracle has had a built-in solution for several years now, Flashback Data Archive. Not enough people use it, probably because in 11g it was called Total Recall and a chargeable extra. In 12c Flashback Data Archive is free; shorn of the data optimization (which requires the Advanced Compression package) it is available in Standard Edition not just Enterprise. And it's been back-ported to 11.2.0.4. The syntax is simple: to get a historical version of the data we simply use AS OF TIMESTAMP. No separate query for a journalling table, no more nasty triggers to maintain... I honestly don't know why everybody isn't using it.
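Setting it up takes only a couple of statements. In this sketch the archive name AUDIT_1YR, the USERS tablespace and the CUSTOMERS table are all assumptions, and creating the archive needs the appropriate administrative privilege.

```sql
-- create an archive which keeps one year of history
create flashback archive audit_1yr
  tablespace users
  retention 1 year;

-- start tracking changes to an existing (hypothetical) table
alter table customers flashback archive audit_1yr;

-- what did the record look like this time yesterday?
select *
from   customers
       as of timestamp (systimestamp - interval '1' day)
where  customer_id = 42;
```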

So that was Super Sunday. Roll on Not-So-Mundane Monday.

Thursday, November 24, 2016

UKOUG Conference 2016 coming up fast

The weather has turned cold, the lights are twinkling in windows and Starbucks is selling pumpkin lattes. Yes, it's starting to look a lot like Christmas. But first there's the wonder-filled advent calendar that is the UKOUG Annual Conference in Birmingham, UK. So many doors to choose from!

The Conference is the premier event for Oracle users in the UK (and beyond). This year has another cracker of an agenda: check it out.

The session I'm anticipating most is Monday's double header with Bryn Llewellyn and Toon Koppelaars' A Real-World Comparison of the NoPLSQL & Thick Database Paradigms. Will they come down on the side of implementing business logic in stored procedures or won't they? It'll be tense :) But it will definitely be insightful and elegantly argued.

Oracle's bailiwick has expanded vastly over the years, and it's become increasingly hard to cover everything. Even so, it's fair to say in recent years older technologies such as Forms have been neglected in favour of shinier baubles. Not this year. There's a good representation of Forms sessions this year, including a talk from Michael Ferrante, the Forms Product Manager. These sessions are all scheduled for the Wednesday, in a day targeted at database developers. If you're an Old Skool developer, especially if you're a Forms developer, and your boss will allow you only one day at the conference, then Wednesday is the day to pick.

Hope to see you there

Tuesday, April 19, 2016

The importance of cohesion

"Come on, come on, let's stick together" - Bryan Ferry

There's more to PL/SQL programs than packages, but most of our code will live in packages. The PL/SQL Reference offers the following benefits of organising our code into packages:

Modularity - we encapsulate logically related components into an easy to understand structure.

Easier Application Design - we can start with the interface in the package specification and code the implementation later.

Hidden Implementation Details - the package body is private so we can prevent application users having direct access to certain functionality.

Added Functionality - we can share the state of Package public variables and cursors for the life of a session.

Better Performance - Oracle Database loads the whole package into memory the first time you invoke a package subprogram, which makes subsequent invocations of any other subprogram quicker. Also packages prevent cascading dependencies and unnecessary recompilation.

Grants - we can grant permission on a single package instead of a whole bunch of objects.

However, we can only realise these benefits if the packaged components belong together: in other words, if our package is cohesive.
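To make a couple of those benefits concrete, here is a sketch of a tiny cohesive package. The package SESSION_COUNTER is hypothetical. The specification is the public interface, which we could publish before writing the body; the counter variable lives in the body, so it is hidden from callers but keeps its value for the life of the session.

```sql
create or replace package session_counter as
  -- the public interface: all that callers can see
  procedure add_one;
  function current_value return pls_integer;
end session_counter;
/

create or replace package body session_counter as

  -- private state: persists across calls within a session
  g_count pls_integer := 0;

  procedure add_one is
  begin
    g_count := g_count + 1;
  end add_one;

  function current_value return pls_integer is
  begin
    return g_count;
  end current_value;

end session_counter;
/

begin
  session_counter.add_one;
  session_counter.add_one;
  dbms_output.put_line(session_counter.current_value);  -- 2 in a fresh session
end;
/
```

The two subprograms belong together because they share the same piece of state. Put them in a package with twenty unrelated procedures and the benefits evaporate.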

The ever reliable Wikipedia defines cohesion like this: "the degree to which the elements of a module belong together"; in other words, it's a measure of the strength of the relationship between components. It's common to think of cohesion as a binary state - either a package is cohesive or it isn't - but actually it's a spectrum. (Perhaps computer science should use "cohesiveness" which is more expressive, but cohesion it is.)

Cohesion

Cohesion owes its origin as a Comp Sci term to Stevens, Myers, and Constantine. Back in the Seventies they used the terms "module" and "processing elements", but we're discussing PL/SQL so let's use Package and Procedure instead. They defined seven levels of cohesion, with each level being better - more usefully cohesive - than its predecessor.

Coincidental

The package comprises an arbitrary selection of procedures and functions which are not related in any way. This obviously seems like a daft thing to do, but most packages with "Utility" in their name fall into this category.

Logical

The package contains procedures which all belong to the same logical class of functions. For instance, we might have a package to collect all the procedures which act as endpoints for REST Data Services.

Temporal

The package consists of procedures which are executed at the same system event. So we might have a package of procedures executed when a user logs on - authentication, auditing, session initialisation - and a similar package for tidying up when the user logs off. Other than the triggering event the packaged functions are unrelated to each other.

Procedural

The package consists of procedures which are executed as part of the same business event. For instance, in an auction application there is a set of actions to follow whenever a bid is made: compare to asking price, evaluate against existing maximum bid, update lot's status, update bidder's history, send an email to the bidder, send an email to the user who's been outbid, etc.

Communicational

The package contains procedures which share common inputs or outputs. For example a payroll package may have procedures to calculate base salary, overtime, sick pay, commission, bonuses and produce the overall remuneration for an employee.

Sequential

The package comprises procedures which are executed as a chain, so that the output of one procedure becomes the input for another procedure. A classic example of this is an ETL package with procedures for loading data into a staging area, validating and transforming the data, and then loading records into the target table(s).

Functional

The package comprises procedures which are focused on a single task. Not only are all the procedures strongly related to each other but they are fitted to user roles too. So procedures for power users are in a separate package from procedures for normal users. The Oracle built-in packages for Advanced Queuing are a good model of Functional cohesion.
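As an illustration of one rung of that ladder, here is a rough sketch of Sequential cohesion. All the names are hypothetical, and nested tables stand in for the staging and target tables so that the example needs no schema objects: each function consumes exactly what the previous one produced.

```plsql
create or replace package stg_pipeline as
    -- Hypothetical types: in-memory stand-ins for staging and target data.
    type raw_tt   is table of varchar2(30);
    type clean_tt is table of number;

    function extract_rows return raw_tt;
    function transform_rows (p_raw in raw_tt) return clean_tt;
    function load_rows (p_clean in clean_tt) return pls_integer;
    -- Runs the whole chain and returns the number of rows "loaded".
    function run return pls_integer;
end stg_pipeline;
/

create or replace package body stg_pipeline as
    function extract_rows return raw_tt is
    begin
        -- Stand-in for reading a feed into the staging area.
        return raw_tt('10', ' 20 ', 'n/a', '40');
    end extract_rows;

    function transform_rows (p_raw in raw_tt) return clean_tt is
        l_clean clean_tt := clean_tt();
    begin
        for i in 1 .. p_raw.count loop
            -- Keep only values which are whole numbers once trimmed.
            if regexp_like(trim(p_raw(i)), '^[0-9]+$') then
                l_clean.extend;
                l_clean(l_clean.count) := to_number(trim(p_raw(i)));
            end if;
        end loop;
        return l_clean;
    end transform_rows;

    function load_rows (p_clean in clean_tt) return pls_integer is
    begin
        -- Stand-in for the insert into the target table(s).
        return p_clean.count;
    end load_rows;

    function run return pls_integer is
    begin
        -- The output of each step is the input of the next.
        return load_rows(transform_rows(extract_rows));
    end run;
end stg_pipeline;
/
```

Remove any one of the three steps and the other two stop making sense, which is a fair informal test for this level of cohesion.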

How cohesive is cohesive enough?

The grades of cohesion, with Coincidental as the worst and Functional as the best, are guidelines. Not every package needs to have Functional cohesion. In a software architecture we will have modules at different levels. The higher modules will tend to be composed of calls to lower level modules. The low level modules are the concrete implementations and they should aspire to Sequential or Functional cohesion.

The higher level modules can be organised to other levels. For instance we might want to build packages around user roles - Sales, Production, HR, IT - because Procedural cohesion makes it easier for the UI teams to develop screens, especially if they need to skin them for various different technologies (desktop, web, mobile). Likewise we wouldn't want to have Temporally cohesive packages with concrete code for managing user logon or logoff. But there is a value in organising a package which bundles up all the low level calls into a single abstract call for use in schema level AFTER LOGON triggers.

Cohesion is not an easily evaluated condition. We need cohesion with a purpose, a reason to stick those procedures together. It's not enough to say "this package is cohesive". We must take into consideration how cohesive the package needs to be: how will it be used? what are its relationships with the other packages?

Applying design principles such as Single Responsibility, Common Reuse, Common Closure and Interface Segregation can help us to build cohesive packages. Getting the balance right requires an understanding of the purpose of the package and its place within the overall software architecture.

Part of the Designing PL/SQL Programs series

Sunday, April 03, 2016

Working with the Interface Segregation Principle

Obviously Interface Segregation is crucial for implementing restricted access. For any given set of data there are three broad categories of access:

reporting
manipulation
administration and governance

So we need to define at least one interface - packages - for each category in order that we can grant the appropriate access to different groups of users: read-only users, regular users, power users.

But there's more to Interface Segregation. This example is based on a procedure posted on a programming forum. Its purpose is to maintain medical records relating to a patient's drug treatments. The procedure has some business logic (which I've redacted) but its overall structure is defined by the split between the Verification task and the De-verification task, and flow is controlled by the value of the p_verify_mode parameter.

 
procedure rx_verification
     (p_drh_id in number,
       p_patient_name in varchar2,
       p_verify_mode in varchar2)
as
    new_rxh_id number;
    rxh_count number;
    rxl_count number;
    drh_rec drug_admin_history%rowtype;
begin
    select * into drh_rec ....;
    select count(*) into rxh_count ....;

    if p_verify_mode = 'VERIFY' then

        update drug_admin_history ....;
        if drh_rec.pp_id <> 0 then
            update patient_prescription ....;
        end if;
        if rxh_count = 0 then
            insert into prescription_header ....;
        else
            select rxh_id into new_rxh_id ....;
        end if;
        insert into prescription_line ....;
        if drh_rec.threshhold > 0 then
            insert into prescription_line ....;
        end if;

    elsif p_verify_mode = 'DEVERIFY' then

        update drug_admin_history ....;
        if drh_rec.pp_id <> 0 then
            update patient_prescription ....;
        end if;
        select rxl_rxh_id into new_rxh_id ....;
        delete prescription_line ....;
        delete prescription_header ....;

    end if;
end;

Does this procedure have a Single Responsibility? Hmmm. It conforms to Common Reuse - users who can verify can also de-verify. It doesn't break Common Closure, because both tasks work with the same tables. But there is a nagging doubt. It appears to be doing two things: Verification and De-verification.

So, how does this procedure work as an interface? There is a definite problem when it comes to calling the procedure: how do I as a developer know what value to pass to p_verify_mode?

  rx_management.rx_verification
     (p_drh_id => 1234,
       p_patient_name => 'John Yaya',
       p_verify_mode => ???);

The only way to know is to inspect the source code of the procedure. That breaks the Information Hiding principle, and it might not be viable (if the procedure is owned by a different schema). Clearly the interface could benefit from a redesign. One approach would be to declare constants for the acceptable values; while we're at it, why not define a PL/SQL subtype for verification mode and tweak the procedure's signature to make it clear that's what's expected:

create or replace package rx_management is
 
  subtype verification_mode_subt is varchar2(10);
  c_verify constant verification_mode_subt := 'VERIFY'; 
  c_deverify constant verification_mode_subt := 'DEVERIFY'; 
 
  procedure rx_verification
     (p_drh_id in number,
       p_patient_name in varchar2,
       p_verify_mode in verification_mode_subt);

end rx_management;

Nevertheless it is still possible for a caller program to pass a wrong value:

  rx_management.rx_verification
     (p_drh_id => 1234,
       p_patient_name => 'John Yaya',
       p_verify_mode => 'Verify');

What happens then? Literally nothing. The value drops through the control structure without satisfying any condition. It's an unsatisfactory outcome. We could change the implementation of rx_verification() to validate the parameter value and raise an exception. Or we could add an ELSE branch and raise an exception. But those are runtime exceptions. It would be better to mistake-proof the interface so that it is not possible to pass an invalid value in the first place.
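For comparison, the ELSE branch option might look something like this. It is only a sketch: rx_mode_check_sketch is a hypothetical stand-in for rx_verification(), the data changes are omitted, and the error number -20002 is an arbitrary choice.

```plsql
create or replace procedure rx_mode_check_sketch
    (p_verify_mode in varchar2)
as
begin
    if p_verify_mode = 'VERIFY' then
        null;  -- verification steps would go here
    elsif p_verify_mode = 'DEVERIFY' then
        null;  -- de-verification steps would go here
    else
        -- Anything else, including NULL, is rejected rather than silently ignored.
        raise_application_error
            (-20002, 'Unknown verify mode: '||nvl(p_verify_mode, 'NULL'));
    end if;
end rx_mode_check_sketch;
/
```

Calling rx_mode_check_sketch('Verify') now fails with ORA-20002 instead of doing nothing, but the caller still only finds out when the code runs.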

Which leads us to a Segregated Interface:

create or replace package rx_management is
 
  procedure rx_verification
     (p_drh_id in number,
       p_patient_name in varchar2);
 
  procedure rx_deverification
     (p_drh_id in number);
     
end rx_management;

Suddenly it becomes clear that the original procedure was poorly named (I call rx_verification() to issue an RX de-verification?!) We have two procedures but their usage is now straightforward and the signatures are cleaner (the p_patient_name is only used in the Verification branch so there's no need to pass it when issuing a De-verification).

Summary

Interface Segregation creates simpler and safer controls but more of them. This is a general effect of the Information Hiding principle. It is a trade-off. We need to be sensible. Also, this is not a proscription against flags. There will always be times when we need to pass instructions to called procedures to modify their behaviour. In those cases it is important that the interface includes a definition of acceptable values.

Part of the Designing PL/SQL Programs series

Three more principles

Here are some more principles which can help us design better programs. These principles aren't part of an organized theory, and they aren't particularly related to any programming paradigm. But each is part of the canon, and each is about the relationship between a program's interface and its implementation.

The Principle Of Least Astonishment

Also known as the Principle of Least Surprise, the rule is simple: programs should do what we expect them to do. This is more than simply honouring the contract of the interface. It means complying with accepted conventions of our programming. In PL/SQL programming there is a convention that functions are read-only, or at least do not change database state. Another such convention is that low-level routines do not execute COMMIT statements; transaction management is the prerogative of the program at the top of the call stack, which may be interacting directly with a user or may be an autonomous batch process.

Perhaps the most common flouting of the Principle Of Least Astonishment is this:

   exception
      when others then
      null;

It is reasonable to expect that a program will hurl an exception if something has gone awry. Unfortunately, we are not as astonished as we should be when we find a procedure with an exception handler which swallows any and every exception.
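A less astonishing handler records what happened and then re-raises, so the caller still finds out. Here is a minimal sketch, assuming a hypothetical function divide_sketch() and using DBMS_OUTPUT as a stand-in for a proper logging routine:

```plsql
create or replace function divide_sketch
    ( p_numerator   in number
    , p_denominator in number )
    return number
is
begin
    return p_numerator / p_denominator;
exception
    when others then
        -- Record the error and where it came from...
        dbms_output.put_line('divide_sketch failed: '||sqlerrm);
        dbms_output.put_line(dbms_utility.format_error_backtrace);
        -- ...then pass the original exception up the call stack.
        raise;
end divide_sketch;
/
```

A call such as divide_sketch(1, 0) still raises ZERO_DIVIDE in the calling program; the handler adds information without hiding the failure.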

Information Hiding Principle

Another venerable principle, this one was expounded by David Parnas in 1972. It requires that a calling program should not need to know anything about the implementation of a called program. The definition of the interface should be sufficient. It is the cornerstone of black-box programming. The virtue of Information Hiding is that knowledge of internal details inevitably leads to coupling between the called and calling routines: when we change the called program we need to change the caller too. We honour this principle any time we call a procedure in a package owned by another schema, because the EXECUTE privilege grants visibility of the package specification (the interface) but not the body (the implementation).

The Law Of Leaky Abstractions

Joel Spolsky coined this one: "All non-trivial abstractions, to some degree, are leaky." No matter how hard we try, some details of the implementation of a called program will be exposed to the calling program, and will need to be acknowledged. Let's consider this interface again:

    function get_employee_recs
        ( p_deptno in number ) 
        return emp_refcursor;

We know it returns a result set of employee records. But in what order? Sorting by EMPNO would be pretty useless, given that it is a surrogate key (and hence without meaning). Other candidates - HIREDATE, SAL - will be helpful for some cases and irrelevant for others. One approach is to always return an unsorted set and leave it to the caller to sort the results; but it is usually more efficient to sort records in a query rather than a collection. Another approach would be to write several functions - get_employee_recs_sorted_hiredate(), get_employee_recs_sorted_sal() - but that leads to a bloated interface which is hard to understand. Tricky.

Conclusion

Principles are guidelines. There are tensions between them. Good design is a matter of trade-offs. We cannot blindly follow Information Hiding and ignore the Leaky Abstractions. We need to exercise our professional judgement (which is a good thing).

Part of the Designing PL/SQL Programs series

It's all about the interface

When we talk about program design we're mainly talking about interface design. The interface is the part of our program that the users interact with. Normally discussion of UI focuses on GUI or UX, that is, the interface with the end user of our application.

But developers are users too.

Another developer writing a program which calls a routine in my program is a user of my code (and, I must remember, six months after I last touched the program, I am that other developer). A well-designed interface is frictionless: it can be slotted into a calling program without too much effort. A poor interface breaks the flow: it takes time and thought to figure it out. In the worst case we have to scramble around in the documentation or the source code.

Formally, an interface is the mechanism which allows the environment (the user or agent) to interact with the system (the program). What the system actually does is the implementation: the interface provides access to the implementation without the environment needing to understand the details. In PL/SQL programs the implementation will usually contain a hefty chunk of SQL. The interface mediates access to data.

An interface is a contract. It specifies what the caller must do and what the called program will do in return. Take this example:

function get_employee_recs
     ( p_deptno in number )
     return emp_refcursor;

The contract says, if the calling program passes a valid DEPTNO the function will return records for all the employees in that department, as a strongly-typed ref cursor. Unfortunately the contract doesn't say what will happen if the calling program passes an invalid DEPTNO. Does the function return an empty set or throw an exception? The short answer is we can't tell. We must rely on convention or the documentation, which is an unfortunate gap in the PL/SQL language; the Java keyword throws is quite neat in this respect.
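One convention which narrows the gap is to declare the exceptions a package may raise in its specification, next to the subprograms which raise them. The sketch below is an assumption-laden illustration, not a language feature: the package name emp_api, the exception e_invalid_deptno, the error number -20100 and the validation rule are all hypothetical, and the query returns a stand-in row so that the example compiles without an EMP table.

```plsql
create or replace package emp_api as
    -- Published as part of the contract: callers can handle it by name.
    e_invalid_deptno exception;
    pragma exception_init(e_invalid_deptno, -20100);

    type emp_r is record
        ( empno  number
        , ename  varchar2(30)
        , deptno number );
    type emp_refcursor is ref cursor return emp_r;

    -- Returns the employees in p_deptno.
    -- Raises e_invalid_deptno when p_deptno is null or not positive.
    function get_employee_recs
        ( p_deptno in number )
        return emp_refcursor;
end emp_api;
/

create or replace package body emp_api as
    function get_employee_recs
        ( p_deptno in number )
        return emp_refcursor
    is
        rc emp_refcursor;
    begin
        if p_deptno is null or p_deptno <= 0 then
            raise_application_error
                (-20100, 'Invalid DEPTNO: '||nvl(to_char(p_deptno), 'NULL'));
        end if;
        -- Stand-in query: a real implementation would select from the employees table.
        open rc for
            select cast(7839 as number)
                 , cast('KING' as varchar2(30))
                 , p_deptno
            from dual;
        return rc;
    end get_employee_recs;
end emp_api;
/
```

The compiler does not force anybody to handle emp_api.e_invalid_deptno, so this is still convention rather than contract, but at least the convention is visible in the interface.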

The interface is here to help

The interface presents an implementation of business logic. The interface is a curated interpretation, and doesn't enable unfettered access. Rather, a well-designed interface helps a developer use the business logic in a sensible fashion. Dan Lockton calls this Design With Intent: Good design expresses how a product should be used. It doesn't have to be complicated. We can use simple control mechanisms to help other developers use our code properly.

Restriction of access

Simply, the interface restricts access to certain functions or denies it altogether. Only certain users are allowed to view salaries, and even fewer to modify them. The interface to Employee records should separate salary functions from more widely-available functions. Access restriction can be implemented in a hard fashion, using architectural constructs (views, packages, schemas) or in a soft fashion (using VPD or Database Vault). The hard approach benefits from clarity, the soft approach offers flexibility.

Forcing functions

If certain things must be done in a specific order then the interface should only offer a method which enforces the correct order. For instance, if we need to insert records into a parent table and a child table in the same transaction (perhaps a super-type/sub-type implementation of a foreign key arc) a helpful interface will only expose a procedure which inserts both records in the correct order.
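A forcing function of this kind could be sketched as follows. The tables party_demo and person_demo and the package party_api are hypothetical, created here only so that the example stands alone; the assumption is that callers are granted EXECUTE on the package and no INSERT privilege on either table.

```plsql
create table party_demo
    ( party_id   number primary key
    , party_type varchar2(10) not null
    );

create table person_demo
    ( party_id number primary key references party_demo
    , surname  varchar2(30) not null
    );

create or replace package party_api as
    -- The only published way to create a person: one call, both rows.
    procedure add_person
        ( p_party_id in number
        , p_surname  in varchar2 );
end party_api;
/

create or replace package body party_api as
    procedure add_person
        ( p_party_id in number
        , p_surname  in varchar2 )
    is
    begin
        -- Parent first, then child: the caller cannot get the order wrong.
        insert into party_demo (party_id, party_type)
        values (p_party_id, 'PERSON');

        insert into person_demo (party_id, surname)
        values (p_party_id, p_surname);
        -- No COMMIT here: transaction control stays with the caller.
    end add_person;
end party_api;
/
```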

Mistake-proofing

A well-designed interface prevents its users from making obvious mistakes. The signature of a procedure should be clear and unambiguous. Naming is important. If a parameter represents a table attribute the parameter name should echo the column name: p_empno is better than p_id. Default values for parameters should lead developers to sensible and safe choices. If several parameters have default values they must play nicely together: accepting all the defaults should not generate an error condition.
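As a small illustration of safe defaults, consider this sketch of a housekeeping routine. The procedure purge_audit_sketch and its parameters are hypothetical, and DBMS_OUTPUT stands in for the real work: accepting every default produces a harmless dry run rather than an error or a deletion.

```plsql
create or replace procedure purge_audit_sketch
    ( p_older_than_days in pls_integer default 90
    , p_dry_run         in boolean     default true )
is
begin
    if p_older_than_days is null or p_older_than_days < 1 then
        raise_application_error(-20003, 'p_older_than_days must be at least 1');
    end if;

    if p_dry_run then
        -- The safe choice is the default: report, don't delete.
        dbms_output.put_line
            ('Would purge rows older than '||p_older_than_days||' days');
    else
        dbms_output.put_line
            ('Purging rows older than '||p_older_than_days||' days');
        -- the DELETE statement would go here
    end if;
end purge_audit_sketch;
/
```

A developer has to write p_dry_run => false explicitly before anything is removed, so the destructive path is always a deliberate choice.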

Abstraction

Abstraction is just another word for interface. It allows us to focus on the details of our own code without needing to understand the concrete details of the other code we depend upon. That's why good interfaces are the key to managing large codebases.

Part of the Designing PL/SQL Programs series

Wednesday, March 16, 2016

Designing PL/SQL Programs: Series home page

Designing PL/SQL Programs is a succession of articles published in a nonlinear fashion. Eventually it will evolve into a coherent series. In the meantime this page serves as a map and navigation aid. I will add articles to it as and when I publish them.

Introduction

Designing PL/SQL Programs
It's all about the interface

Principles and Patterns

Introducing the SOLID principles
Introducing the RCCASS principles
Three more principles
The Single Responsibility principle
The Dependency Inversion Principle: a practical example
Working with the Interface Segregation Principle

Software Architecture

The importance of cohesion
Utilities - the Coincidental Cohesion anti-pattern
Avoiding Coincidental Cohesion

Interface design

Data Access Layer versus Table APIs
The use and misuse of %TYPE and %ROWTYPE attributes in PL/SQL APIs

Tools and Techniques

The Dependency Inversion Principle: a practical example

These design principles may seem rather academic, so let's look at a real life demonstration of how applying the Dependency Inversion Principle led to an improved software design.

Here is a simplified version of an ETL framework which uses SQL Types in a similar fashion to the approach described in my blog post here. The loading process is defined using an abstract non-instantiable Type like this:

create or replace type load_t force as object
    ( txn_date date
      , tgt_name varchar2(30)
      , member function load return number
      , final member function get_tgt return varchar2
      )
not final not instantiable;
/

create or replace type body load_t as
    member function load return number
    is
    begin
        return 0;
    end load;
    final member function get_tgt return varchar2
    is
    begin
        return self.tgt_name;
    end get_tgt;
end;
/

The concrete behaviour for each target table in the ABC feed is defined by sub-types like this:

create or replace type load_tgt1_t under load_t
    ( overriding member function load return number
        , constructor function load_tgt1_t
            (self in out nocopy load_tgt1_t
             , txn_date date)
           return self as result
      )
;
/
create or replace type body load_tgt1_t as
    overriding member function load return number
    is
    begin
        insert into tgt1 (col1, col2)
        select to_number(col_a), col_b
        from stg_abc stg
        where stg.txn_date = self.txn_date;
        return sql%rowcount;
    end load;
    constructor function load_tgt1_t
            (self in out nocopy load_tgt1_t
             , txn_date date)
           return self as result
    is
    begin
        self.txn_date := txn_date;
        self.tgt_name := 'TGT1';
        return;
    end load_tgt1_t;
end;
/

This approach is neat because ETL is a fairly generic process: the mappings and behaviour for a particular target table are specific but the shape of the loading process is the same for any and all target tables. So we can build a generic PL/SQL procedure to handle them. This simplistic example does some logging, loops through a set of generic objects and, through the magic of polymorphism, calls a generic method which executes specific code for each target table:

    procedure load  
     (p_txn_date in date
        , p_load_set in sys_refcursor)
    is
        type loadset_r is record (
            tgtset load_t
            );
        lrecs loadset_r;
        load_count number;
    begin
        logger.logm('LOAD START::txn_date='||to_char(p_txn_date,'YYYY-MM-DD'));
        loop
            fetch p_load_set into lrecs;
            exit when p_load_set%notfound;
            logger.logm(lrecs.tgtset.get_tgt()||' start');
            load_count := lrecs.tgtset.load();
            logger.logm(lrecs.tgtset.get_tgt()||' loaded='||to_char(load_count));
        end loop;
        logger.logm('LOAD FINISH');
    end load;

So far, so abstract. The catch is the procedure which instantiates the objects:

    procedure load_abc_from_stg  
         (p_txn_date in date)
    is
        rc sys_refcursor;
    begin
        open rc for
            select load_tgt1_t(p_txn_date) from dual union all
            select load_tgt2_t(p_txn_date) from dual;
       load(p_txn_date, rc);
    end load_abc_from_stg;

On casual inspection it doesn't seem problematic but the call to the load() procedure gives the game away. Both procedures are in the same package:

create or replace package loader as
    procedure load 
     (p_txn_date in date
        , p_load_set in sys_refcursor);
    procedure load_abc_from_stg
         (p_txn_date in date);
end loader;
/

So the package mixes generic and concrete functionality. What makes this a problem? After all, it's all ETL so doesn't the package follow the Single Responsibility Principle? Well, up to a point. But if we want to add a new table to the ABC feed we need to update the LOADER package. Likewise if we want to add a new feed, DEF, we need to update the LOADER package. So it breaks the Stable Abstractions principle. It also creates dependency problems, because the abstract load() process has dependencies on higher level modules. We can't deploy the LOADER package without deploying objects for all the feeds.

Applying the Dependency Inversion Principle.

The solution is to extract the load_abc_from_stg() procedure into a concrete package of its own. To make this work we need to improve the interface between the load() procedure and programs which call it. Both sides of the interface should depend on a shared abstraction.

The LOADER package is now properly generic:

create or replace package loader authid current_user as
    type loadset_r is record (
            tgtset load_t
            );
    type loadset_rc is ref cursor return loadset_r;
    procedure load 
        (p_txn_date in date
          , p_load_set in loadset_rc);
end loader;
/

The loadset_r type has moved into the package specification, and defines a strongly-typed ref cursor. The load() procedure uses the strongly-typed ref cursor.

Similarly the LOADER_ABC package is wholly concrete:

create or replace package loader_abc as
    procedure load_from_stg
            (p_txn_date in date);
end loader_abc;
/

create or replace package body loader_abc as
    procedure load_from_stg
            (p_txn_date in date)
    is
        rc loader.loadset_rc;
    begin
        open rc for
            select load_tgt1_t(p_txn_date) from dual union all
            select load_tgt2_t(p_txn_date) from dual;
       loader.load(p_txn_date, rc);
    end load_from_stg;
end loader_abc;
/

Both package bodies now depend on abstractions: the strongly-typed ref cursor in the LOADER specification and the LOAD_T SQL Type. These should change much less frequently than the tables in the feed or even the loading process itself. This is the Dependency Inversion Principle in action.

Separating generic and concrete functionality into separate packages produces a more stable application. Users of a feed package are shielded from changes in other feeds. The LOADER package relies on strongly-typed abstractions. Consequently we can code a new feed package which can call loader.load() without peeking into that procedure's implementation to see what it's expecting.

Part of the Designing PL/SQL Programs series

Tuesday, March 15, 2016

Introducing the RCCASS design principles

Robert C. Martin actually defined eleven principles for OOP. The first five, the SOLID principles, relate to individual classes. The other six, the RCCASS principles, deal with the design of packages (in the C++ or Java sense, i.e. libraries). They are far less known than the first five. There are two reasons for this:

Unlike "SOLID", "RCCASS" is awkward to say and doesn't form a neat mnemonic.
Programmers are far less interested in software architecture.

Software architecture tends to be an alien concept in PL/SQL. Usually a codebase of packages simply accretes over the years, like a coral reef. Perhaps the RCCASS principles can help change that.

The RCCASS Principles

Reuse Release Equivalency Principle

The Reuse Release Equivalency Principle states that the unit of release matches the unit of reuse, which is the parts of the program unit which are consumed by other programs. Basically the unit of release defines the scope of regression testing for consuming applications. It's an ill-mannered release which forces projects to undertake unnecessary regression testing. Cohesive program units allow consumers to do regression testing only for functionality they actually use. It's less of a problem for PL/SQL because (unlike C++ libraries or Java jars) the unit of release can have a very low level of granularity: individual packages or stored procedures.

Common Reuse Principle

The Common Reuse principle supports the definition of cohesive program units. Functions which share a dependency belong together, because they are likely to be used together. For instance, procedures which maintain the Employees table should be co-located in one package (or a group of related packages). They will share sub-routines, constants and exceptions. Packaging related procedures together makes the package easier to write and easier for calling programs to use.

Common Closure Principle

The Common Closure principle also supports the definition of cohesive program units. Functions which share a dependency belong together, because they have a common axis of change. Common Closure helps to minimise the number of program units affected by a change. For instance, programs which use the Employees table may need to change if the structure of the table changes. All the changes must be released together: table, PL/SQL, types, etc.

Acyclic Dependencies Principle

Avoid cyclic dependencies between program units: if package A depends on package B then B must not have a dependency on A. Cyclic dependencies make an application hard to use and harder to deploy. The dependency graph shows the order in which objects must be built. Designing a dependency graph upfront is futile, but we can keep to rough guidelines. Higher level packages implementing business rules tend to depend on generic routines which in turn tend to depend on low-level utilities. There should be no application logic in those lower-level routines. If SALES requires a special logging implementation then that should be handled in the SALES subsystem not in the standard logging package.
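We can ask the data dictionary whether a schema already contains such cycles. The block below is a rough sketch rather than a polished tool: it assumes we only care about package-to-package references within the current schema, it treats a package and its body as one unit by matching on name, and the hierarchical query may be slow on a large codebase.

```plsql
begin
    for r in (
        select distinct name, referenced_name
        from   ( -- one row per "package NAME references package REFERENCED_NAME"
                 select name, referenced_name
                 from   user_dependencies
                 where  type in ('PACKAGE', 'PACKAGE BODY')
                 and    referenced_type = 'PACKAGE'
                 and    referenced_owner = user
                 and    name <> referenced_name  -- ignore a body's link to its own spec
               )
        where  connect_by_iscycle = 1
        connect by nocycle prior referenced_name = name
        order by name, referenced_name
    ) loop
        -- Each reported reference leads back to a package already on the path.
        dbms_output.put_line(r.name||' -> '||r.referenced_name||' closes a cycle');
    end loop;
end;
/
```

No output means no cycles were found under those assumptions; any line printed is a dependency worth a second look.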

Stable Dependencies Principle

Any change to the implementation of a program unit which is widely used will generate regression tests for all the programs which call it. At the most extreme, a change to a logging routine could affect all the other programs in our application. As with the Open/Closed Principle we need to fix bugs. But new features should be introduced by extension not modification. And refactoring of low-level dependencies must not be done on a whim.

Stable Abstractions Principle

Abstractions are dependencies, especially when we're talking about PL/SQL. So this Principle is quite similar to Stable Dependencies Principle. The key difference is that this relates to the definition of interfaces rather than implementation. A change to the signature of a logging routine could require code changes to all the other programs in the application. Obviously this is even more inconvenient than enforced regression testing. Avoid changing the signature of a public procedure or the projection of a public view. Again, extension rather than modification is the preferred approach.
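Overloading is one way to extend a published interface without modifying it. Here is a minimal sketch, assuming a hypothetical package app_logger which writes to DBMS_OUTPUT: the original one-parameter logm() keeps its signature, and the new behaviour arrives as a second overload.

```plsql
create or replace package app_logger as
    -- Original published signature: unchanged, so existing callers are untouched.
    procedure logm (p_msg in varchar2);
    -- Extension: a new overload adds severity without altering the original.
    procedure logm (p_msg in varchar2, p_severity in varchar2);
end app_logger;
/

create or replace package body app_logger as
    procedure logm (p_msg in varchar2, p_severity in varchar2) is
    begin
        dbms_output.put_line('['||p_severity||'] '||p_msg);
    end logm;

    procedure logm (p_msg in varchar2) is
    begin
        -- The old entry point delegates to the new one with a default severity.
        logm(p_msg, 'INFO');
    end logm;
end app_logger;
/
```

Existing calls such as app_logger.logm('LOAD START') need no code change, while new code can call app_logger.logm('disk full', 'ERROR').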

Applicability of RCCASS principles in PL/SQL

The focus of these principles is the stability of a shared codebase, and minimising the impact of change on the consumers of our code. This is vital in large projects, where communication between teams is often convoluted. It is even more important for open source or proprietary libraries.

We can apply the Common Reuse Principle and Common Closure Principle to define the scope of the Reuse Release Equivalency Principle, and hence define the boundaries of a sub-system (whisper it, schema). Likewise we can apply the Stable Dependencies Principle and Stable Abstractions Principle to enforce the Acyclic Dependencies Principle to build stable PL/SQL libraries. So the RCCASS principles offer some useful pointers towards a stable PL/SQL software architecture.

Part of the Designing PL/SQL Programs series

Monday, March 14, 2016

Introducing the SOLID design principles

PL/SQL programming standards tend to focus on layout (case of keywords, indentation, etc), naming conventions, and implementation details (such as use of cursors). These are all important things, but they don't address questions of design. How easy is it to use the written code? How easy is it to test? How easy will it be to maintain? Is it robust? Is it secure?

Simply put, there are no agreed design principles for PL/SQL. So it's hard to define what makes a well-designed PL/SQL program.

The SOLID principles

It's different for object-oriented programming. OOP has more design principles and paradigms and patterns than you can shake a stick at. Perhaps the most well-known are the SOLID principles, which were first mooted by Robert C. Martin, AKA Uncle Bob, back in 1995 (although it was Michael Feathers who coined the acronym).

Although Martin put these principles together for Object-Oriented code, they draw on a broader spectrum of programming practice. So they are transferable, or at least translatable, to the other forms of modular programming. For instance, PL/SQL.

Single Responsibility Principle

This is the foundation stone of modular programming: a program unit should do only one thing. Modules which do only one thing are easier to understand, easier to test and generally more versatile. Higher level procedures can be composed of lower level ones. Sometimes it can be hard to define what "one thing" means in a given context, but some of the other principles provide clarity. Martin's formulation is that there should be just one axis of change: there's just one set of requirements which, if modified or added to, would lead to a change in the package.

Open/closed Principle

The slightly obscure name conceals a straightforward proposal. It means program units are closed to modification but open to extension. If we need to add new functionality to a package, we create a new procedure rather than modifying an existing one. (Bertrand Meyer, the father of Design By Contract programming, originally proposed it; in OO programming this principle is implemented through inheritance or polymorphism.) Clearly we must fix bugs in existing code. Also it doesn't rule out refactoring: we can tune the implementation providing we don't change the behaviour. This principle mainly applies to published program units, ones referenced by other programs in Production. Also the principle can be looser when the code is being used within the same project, because we can negotiate changes with our colleagues.

Liskov Substitution Principle

This is a real Computer Science-y one, good for dropping in code reviews. Named for Barbara Liskov it defines rules for behavioural sub-typing. If a procedure has a parameter defined as a base type it must be able to take an instance of any sub-type without changing the behaviour of the program. So a procedure which uses IS OF to test the type of a passed parameter and do something different is violating Liskov Substitution. Obviously we don't make much use of Inheritance in PL/SQL programming, so this Principle is less relevant than in other programming paradigms.
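For the occasions when we do use Type inheritance, here is a sketch of a procedure which honours the principle. The types shape_demo_t and square_demo_t are hypothetical, invented for this example: report() never asks what kind of shape it has been given, it simply calls the method and lets dynamic dispatch pick the implementation.

```plsql
create or replace type shape_demo_t force as object
    ( name varchar2(30)
    , member function area return number
    )
not final;
/
create or replace type body shape_demo_t as
    member function area return number is
    begin
        return 0;
    end area;
end;
/
create or replace type square_demo_t under shape_demo_t
    ( side number
    , overriding member function area return number
    );
/
create or replace type body square_demo_t as
    overriding member function area return number is
    begin
        return self.side * self.side;
    end area;
end;
/
declare
    procedure report (p_shape in shape_demo_t) is
    begin
        -- No IS OF test here: any sub-type can be substituted for the base type.
        dbms_output.put_line(p_shape.name||' area='||to_char(p_shape.area()));
    end report;
begin
    report(shape_demo_t('blank'));
    report(square_demo_t('square', 3));
end;
/
```

Adding a new sub-type later requires no change to report(), which is the practical pay-off of avoiding type tests.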

Interface Segregation Principle

This principle is about designing fine-grained interfaces. It is an extension of the Single Responsibility Principle. Instead of building one huge package which contains all the functions relating to a domain, build several smaller, more cohesive packages. For example Oracle's Advanced Queuing subsystem comprises five packages, to manage different aspects of AQ. Users who write to or read from queues have DBMS_AQ; users who manage queues and subscribers have DBMS_AQADM.
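
One practical pay-off of segregated packages is that privileges can follow the same lines. A small sketch, assuming two hypothetical accounts, APP_USER (which only enqueues and dequeues messages) and QUEUE_ADMIN (which manages the queues):

```sql
-- Hypothetical accounts: each gets only the interface it needs.
grant execute on dbms_aq to app_user
/
grant execute on dbms_aqadm to queue_admin
/
```

With a single monolithic package there would be no way to grant one set of operations without the other.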

Dependency Inversion Principle

Interactions between programs should be through abstract interfaces rather than concrete ones. Abstraction means the implementation of one side of the interface can change without changing the other side. PL/SQL doesn't support Abstract objects in the way that say Java does. To a certain extent Package Specifications provide a layer of abstraction but there can only be one concrete implementation. Using Types to pass data between Procedures is an interesting idea, which we can use to decouple data providers and data consumers in a useful fashion.
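
Here is one way the idea of passing data through Types might be sketched. Everything in it is an assumption for the sake of illustration: the ORDERS table, the type names and the function name are all hypothetical.

```sql
-- Hypothetical names throughout; the types are the contract between the two sides.
create or replace type order_summary_t as object
    ( order_id number
      , order_total number )
/
create or replace type order_summary_nt as table of order_summary_t
/
-- The provider: how it assembles the data is hidden behind the type.
create or replace function get_order_summaries
    return order_summary_nt
as
    l_summaries order_summary_nt;
begin
    select order_summary_t(o.order_id, o.order_total)
    bulk collect into l_summaries
    from orders o;
    return l_summaries;
end get_order_summaries;
/
-- The consumer depends only on the type, not on the ORDERS table.
select s.order_id
       , s.order_total
from table(get_order_summaries()) s
/
```

If the provider later switches to a different table, or a join, the consumer is unaffected so long as the type stays the same.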

Applicability of SOLID principles in PL/SQL

So it seems like we can apply SOLID practices to PL/SQL. True, some Principles fit better than others. But we have something which we might use to distinguish good design from bad when it comes to PL/SQL interfaces.

The SOLID principles apply mainly to individual modules. Is there something similar we can use for designing module groups? Why, yes there is. I'm glad you asked.

Part of the Designing PL/SQL Programs series

Wednesday, December 31, 2014

UKOUG Annual Conference (Tech 2014 Edition)

The conference

This year the UKOUG's tour of Britain's post-industrial heritage brought the conference to Liverpool. The Arena & Convention Centre is based in Liverpool docklands, formerly the source of the city's wealth and now a touristic playground of museums, souvenir shops and bars. Still at least the Pumphouse functions as a decent pub, which is one more decent pub than London Docklands can boast. The weather was not so much cool in the 'Pool as flipping freezing, with the wind coming off the Mersey like a chainsaw that had been kept in a meat locker for a month. Plus rain. And hail. Which is great: nothing we Brits like more than moaning about the weather.

After last year's experiment with discrete conferences, Apps 2014 was co-located with Tech 2014; each was still a separate conference with their own exclusive agendas (and tickets) but with shared interests (Exhibition Hall, social events). Essentially DDD's Bounded Context pattern. I'll be interested to know how many delegates purchased the Rover ticket which allowed them to cross the borders. The conferences were colour-coded, with the Apps team in Blue and the Tech team in Red; I thought this was an, er, interesting decision in a footballing city like Liverpool. Fortunately the enforced separation of each team's supporters kept violent confrontation to a minimum.

The sessions

This is not all of the sessions I attended, just the ones I want to comment on.

There's no place like ORACLE_HOME

I started my conference by chairing Niall Litchfield's session on Monday morning. Niall experienced every presenter's nightmare: switch on the laptop, nada, nothing, completely dead. Fortunately it turned out to be the fuse in the charger's plug, and a marvellous tech support chap was able to find a spare kettle cable. Niall coped well with the stress and delivered a wide-ranging and interesting introduction to some of the database features available to developers. It's always nice to hear a DBA say how difficult the task of developers is these days. I'd like to hear more of them acknowledge it and, more importantly, be helpful rather than becoming part of the developer's burden :)

The least an Oracle DBA needs to know about Linux

Turns out "the least" is still an awful lot. Martin Nash started with installing a distro and creating a file system, and moved on from there. As a developer I find I'm rarely allowed OS access to the database server these days; I suspect many enterprise DBAs also spend most of their time in OEM rather than at a shell prompt. But Linux falls into that category of things which when you need to know them you need to know them in the worst possible way. So Martin has given me a long list of commands with which to familiarize myself.

Why solid SQL still delivers the best performance

Robyn Sands began her session with the shocking statement that the best database performance requires good application design. Hardware improvements won't save us from the consequences of our shonky code. From her experience in Oracle's Real World Performance team, the top three causes of database slowness are:

People not using the database the way it was designed to be used
Sub-optimal architecture or code
Sub-optimal algorithm (my new favourite synonym for "bug")

The bulk of her session was devoted to some demos, racing different approaches to DML:

Row-by-row processing
Array (bulk) processing
Manual parallelism i.e. concurrency
Set-based processing i.e. pure SQL

There was a series of races, starting with a simple copying of data from one table to another and culminating in a complex transformation exercise. If you have attended any Oracle performance session in the last twenty years you'll probably know the outcome already but it was interesting to see how much faster pure SQL was compared to the other approaches. In fact the gap between the set-based approach and the row-based approach widened with each increase in complexity of the task. What probably surprised many people (including me) was how badly manual parallelism fared: concurrent threads have a high impact on system resource usage, because of things like index contention.
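
These are not Robyn's demos, just a minimal sketch of what three of the contenders look like for the simplest race, copying one table to another. The tables SOURCE_T and TARGET_T, each with columns ID and VAL, are hypothetical, and the bulk version assumes 11g or later (earlier releases don't allow references to record fields inside FORALL).

```sql
-- Hypothetical tables source_t and target_t, both with columns (id, val).

-- Row-by-row: one INSERT execution per row.
begin
    for r in (select id, val from source_t) loop
        insert into target_t (id, val) values (r.id, r.val);
    end loop;
end;
/

-- Array (bulk) processing: fetch and insert in batches of 1000.
declare
    cursor c_src is select id, val from source_t;
    type src_nt is table of c_src%rowtype;
    l_rows src_nt;
begin
    open c_src;
    loop
        fetch c_src bulk collect into l_rows limit 1000;
        exit when l_rows.count = 0;
        forall i in 1 .. l_rows.count
            insert into target_t (id, val)
            values (l_rows(i).id, l_rows(i).val);
    end loop;
    close c_src;
end;
/

-- Set-based: a single SQL statement.
insert into target_t (id, val)
select id, val from source_t
/
```

The set-based version is also the shortest, which is usually a hint in itself.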

Enterprise Data Warehouse Architecture for Big Data

Dai Clegg was at Oracle for a long time and has since worked for a couple of startups which used some of the new-fangled Big Data/NoSQL products. This mix of experience has given him a breadth of insight which is not common in the Big Data discussion.

His first message is one of simple economics: these new technologies solve the problem of linear scale-out at a price-point below that of Oracle. Massively parallel programs using cheap or free open source software on commodity hardware. Commodity hardware is more failure prone than enterprise tin (and having lots of the blighters actually reduces the MTTF) but these distributed frameworks are designed to handle node failures; besides, commodity hardware has gotten a lot more reliable over the years. So, it's not that we couldn't implement most Big Data applications using relational databases, it's just cheaper not to.

Dai's other main point addressed the panoply of products in the Big Data ecosystem. Even in just the official Hadoop stack there are lots of products with similar or overlapping capabilities: do we need Kafka or Flume or both? There is no one Big Data technology which is cheaper and better for all use cases. Therefore it is crucial to understand the requirements of the application before starting on the architecture. Different applications will demand different permutations from the available options. Properly defined use cases (which don't need to be heavyweight - Dai hymned the praises of the Agile-style "user story") will indicate which kinds of products are required. Organizations are going to have to cope with heterogeneous environments. Let's hope they save enough on the licensing fees to pay for the application wranglers.

How to write better PL/SQL

After last year's fiasco with shonky screen rendering and failed demos I went extremely low tech: I could have given my presentation from the PDF on a thumb-drive. Fortunately that wasn't necessary. My session was part of the Beginners' Track: I'm not sure how many people in the audience were actual beginners; I hope the grizzled veterans got something out of it.

One member of the audience turned out to be a university lecturer; he was distressed by my advice to use pure SQL rather than PL/SQL whenever possible. Apparently his students keep doing this and he has to tell them to use PL/SQL features instead. I'm quite heartened to hear that college students are familiar with the importance of set-based programming. I'm even chuffed to have my prejudice confirmed that it is university lecturers who teach people to write what is bad code in the real world. I bet he tells them to use triggers as well :)

Oracle Database 12c New Indexing Features

I really enjoy Richard Foote's presenting style: it is breezily Aussie in tone, chatty and with the occasional mild cuss word. If anybody can make indexes entertaining it is Richard (and he did).

His key point is that indexes are not going away. Advances in caching and fast storage will not remove the need for indexed reads, and the proof is Oracle's commitment to adding further capabilities. In fact, there are so many new indexing features that Richard's presentation was (for me) largely a list of things I need to go away and read about. Some of these features are quite arcane: an invisible index? on an invisible column? Hmmmm. I'm not sure I understand when I might want to implement partial indexing on a partitioned table. What I'm certain about is that most DBAs these days are responsible for so many databases that they don't have the time to acquire the requisite understanding of individual applications and their data; so it seems to me unlikely that they will be able to decide which partitions need indexing. This is an optimization for the consultants.

Make your data models sing

It was one of the questions in the Q&A section of Susan Duncan's talk which struck me. The questioner talked about their "legacy" data warehouse. How old did that make me feel? I can remember when Data Warehouses were new and shiny and going to solve every enterprise's data problems.

The question itself dealt with foreign keys: as is a common practice the data warehouse had no defined foreign keys. Over the years it had sprawled across several hundred tables, without the documentation keeping up. Is it possible, the petitioner asked, to reverse engineer the data model with foreign keys in the database? Of course the short answer is No. While it might be possible to infer relationships from common column names, there isn't any tool we were aware of which could do this. Another reminder that disabled foreign keys are better than no keys at all.

Getting started with JSON in the Database

Marco Gralike has a new title: he is no longer Mr XMLDB, he is now Mr Unstructured Data in the DB. Or at least his bailiwick has been extended to cover JSON. JSON (JavaScript Object Notation) is a lightweight data transfer mechanism: basically it's XML without the tags. All the cool kids like JSON because it's the basis of RESTful web interfaces. Now we can store JSON in the database (which probably means all the cool kids will wander off to find something else now that fusty old Oracle can do it).

The biggest surprise for me is that Oracle haven't introduced a JSON data type (apparently there were so many issues around the XMLType nobody had the appetite for another round). So that means we store JSON in VARCHAR2, CLOB, BLOB or RAW. But like XML there are operators which allow us to include JSON documents in our SQL. The JSON dot notation works pretty much like XPath, and we can use it to build function-based indexes on the stored documents. However, we can't (yet) update just part of a JSON doc: it is wholesale replacement only.

Error handling is cute: by default invalid JSON syntax in a query produces null in the result set rather than an exception. Apparently that's how the cool kids like it. For those of us that prefer our exceptions hurled rather than swallowed there is an option to override this behaviour.
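
To make that a bit more concrete, here is a small sketch of the 12c JSON features as I understand them. It is not taken from Marco's slides: the table CUSTOMER_DOCS and the "name" attribute are hypothetical, and it assumes Oracle 12.1.0.2 or later.

```sql
-- Hypothetical table; the check constraint tells Oracle the CLOB holds JSON.
create table customer_docs
    ( id number not null
      , doc clob
      , constraint cd_pk primary key (id)
      , constraint cd_json_ck check (doc is json)
    )
/
-- Dot notation needs a table alias and an IS JSON check constraint on the column.
select cd.doc.name
from customer_docs cd
/
-- Default behaviour: an error evaluating the path returns null ...
select json_value(cd.doc, '$.name')
from customer_docs cd
/
-- ... unless we ask for the exception to be raised instead.
select json_value(cd.doc, '$.name' error on error)
from customer_docs cd
/
-- A function-based index on a value inside the stored documents.
create index cd_name_idx on customer_docs
    ( json_value(doc, '$.name' error on error) )
/
```

The ERROR ON ERROR clause is the override for those of us who like our exceptions hurled.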

SQL is the best development language for Big Data

This was Tom Kyte giving the obverse presentation to Dai Clegg: Oracle can do all this Big Data stuff, and has been doing it for some time. He started with two historical observations:

XML data stores were going to kill off relational databases. Which didn't happen.
Before relational databases and SQL there was NoSQL, literally no SQL. Instead there were things like PL/1, which was a key-value data store.

Tom had a list of features in Oracle which support Big Data applications. They were:

Analytic functions which have enabled ordered array semantics in SQL since the last century.
SQL Developer's support for Oracle Data Mining.
The MODEL clause (for those brave enough to use it).
Advanced pattern matching with the MATCH_RECOGNIZE clause in 12c
External tables with their support for extracting data from flat files, including from HDFS (with the right connectors)
Support for JSON documents (see above).
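
For anybody who hasn't met the first item on that list, this is the sort of thing analytic functions make possible: each row can see its neighbours in a defined order. A minimal sketch against a hypothetical SALES table with columns SALE_DATE and AMOUNT:

```sql
-- Hypothetical table sales (sale_date, amount).
select sale_date
       , amount
       , lag(amount) over (order by sale_date) as previous_amount
       , sum(amount) over (order by sale_date) as running_total
from sales
order by sale_date
/
```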

He could also have discussed document storage with XMLType and Oracle Text, Enterprise R, In-Memory columnar storage, and so on. We can even do Map/Reduce in PL/SQL if we feel so inclined. All of these are valid assertions; the problem is (pace Dai Clegg) simply one of licensing. Too many of the Big Data features are chargeable extras on top of Enterprise Edition licenses. Big Data technology is suited to a massively parallel world where all processors are multi-core and Oracle's licensing policy isn't.

Five hints for efficient SQL

This was an almost philosophical talk from Jonathan Lewis, in which he explained how he uses certain hints to fix poorly performing queries. The optimizer takes a left-deep approach, which can lead to a bad choice of transformation, bad estimates (but check your stats as well!) and bad join orders. His strategic solution is to shape the query with hints so that Oracle's execution plan meets our understanding of the data.

So his top five hints are:

(NO_)MERGE
(NO_)PUSH_PRED
(NO_)UNNEST
(NO_)PUSH_SUBQ
DRIVING_SITE

Jonathan calls these strategic hints, because they advise the optimizer how to join tables or how to transform a sub-query. They don't hard-code paths in the way that say the INDEX hint does.
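
These are not Jonathan's examples, just a sketch of where two of the hints go, using hypothetical CUSTOMERS and ORDERS tables. Whether either hint actually improves a given query depends entirely on the data.

```sql
-- Hypothetical tables customers and orders.
-- NO_MERGE asks the optimizer to evaluate the inline view as a unit
-- rather than merging it into the outer query block.
select /*+ no_merge(big_orders) */
       c.customer_name
       , big_orders.order_total
from customers c
     join ( select o.customer_id
                   , sum(o.amount) as order_total
            from orders o
            group by o.customer_id ) big_orders
         on big_orders.customer_id = c.customer_id
/
-- NO_UNNEST goes inside the sub-query it applies to.
select c.customer_name
from customers c
where exists ( select /*+ no_unnest */ null
               from orders o
               where o.customer_id = c.customer_id )
/
```

Note that neither hint names an index or an access path: they shape the query and leave the rest to the optimizer.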

Halfway through the presentation Jonathan's laptop slid off the lectern and slammed onto the stage floor. End of presentation? Luckily not. Apparently his laptop is made of the same stuff they use for black box flight recorders, because after a few anxious minutes it rebooted successfully and he was able to continue with his talk. I was struck by how unflustered he was by the situation (even though he didn't have a backup due to last minute tweaking of the slides). A lovely demonstration of grace under pressure.

Thursday, October 10, 2013

Oracle Big Data Meetup - 09-OCT-2013

The Oracle guys running the Big Data 4 the Enterprise Meetup are always apologetic about marketing. The novelty is quite amusing. They do this because most Big Data Meetups are full of brash young people from small start-ups who use cool open source software. They choose cool open source software partly because they're self-styled hackers who like being able to play with their software any way they choose. But mainly it is because the budgetary constraints of being a start-up mean they have to choose between a Clerkenwell office and Aeron chairs, or enterprise software licenses, and that's no choice at all.

But an Oracle Big Data meetup has a different constituency. We come from an enterprise background, we've all been using Oracle software for a long time and we know what to expect from an Oracle event. We're prepared to tolerate a certain amount of Oracle marketing because we want to hear the Oracle take on things, and we come prepared with our shields up. Apart from anything else, the Meetup sponsor is always cut some slack, in exchange for the beer'n'pizza.

Besides, the Oracle Big Data Appliance is quite an easy sell, certainly compared to the rest of the engineered systems. The Exa stack largely comprises machines which replace existing servers whereas Big Data is a new requirement. Most Oracle shops probably don't have a pool of Linux/Java/Network hackers on hand to cobble together a parallel cluster of machines and configure them to run Hadoop. A pre-configured Exadoop appliance with Oracle's imprimatur is just what those organisations need. The thing is, it seems a bit cheeky to charge a six figure sum for a box with a bunch of free software on it. No matter how good the box is. Particularly when it can be so hard to make the business case for a Big Data initiative.

Stephen Sheldon's presentation on Big Data Analytics As A Service addressed exactly this point. He works for Detica. They have stood up an Oracle BDA instance which they rent out for a couple of months to organisations who want to try a Big Data initiative. Detica provide a pool of data scientists and geeks to help out with the processing and analytics. At the end of the exercise the customer has a proven case showing whether Big Data can give them sufficient valuable insights into their business. This strikes me as a highly neat idea, one which other companies will wish they had thought of first.

Ian Sharp (one of the apologetic Oracle guys) presented on Oracle's Advanced Analytics. The big idea here is R embedded in the database. This gives data scientists access to orders of magnitude more data than they're used to having on their desktop R instances. Quants working in FS organisations will most likely have an accident when they realise just how great an idea this is. Unfortunately, Oracle R Enterprise is part of the Advanced Analytics option, so probably only the big FS companies will go for it. But the Oracle R distro is still quite neat, and free.

Mark Sampson from Cloudera rounded off the evening with a talk on a new offering, Cloudera Search. This basically provides a mechanism for building a Google / Amazon style search facility over a Hadoop cluster. The magic here is that Apache Solr is integrated into the Hadoop architecture instead of as a separate cluster, plus a UI building tool. I spent five years on a project which basically did this with an Oracle RDBMS, hand-rolled ETL and XML generators and lots of Java code plumbing an external search engine into the front-end. It was a great system, loved by its users and well worth the effort at the time. But I expect we could do the whole thing again in a couple of months with this tool set. Which is good news for the next wave of developers.

Some people regard attending technical meetups a bit odd. I mean, giving up your free time to listen to a bunch of presentations on work matters? But if you find this stuff interesting you can't help yourself. And if you work with Oracle tech and are interested in data then this meetup is definitely worth a couple of hours of your free time.

Thursday, July 11, 2013

UKOUG Analytics Event: a semi-structured analysis

Yesterday's UKOUG Analytics event was a mixture of presentations about OBIEE with sessions on the frontiers of data analysis. I'm not going to cover everything, just dipping into a few things which struck me during the day.

During the day somebody described dashboards as "Fisher Price activity centres for managers". Well, Neil Sellers showed a mobile BI app called RoamBI which is exactly that. Swipe that table, pinch that graph, twirl that pie chart! (No really, how have we survived so long with pie charts which can't be rotated?) The thing is so slick, it'll keep the boss amused for hours. Neil's theme on the importance of data visualization to convey a message or tell a story was picked up by Claudio Bastia and Nicola Sandol. Their presentation included a demo of IConsulting's Location Intelligence extension for OBIEE. The tool not only does impressive things with the display of geographic data, it also allows users to interact with the maps to refine queries and drill down into the data. This is visualization which definitely goes beyond the gimmick: it's an extremely powerful way of communicating complex data sets.

A couple of presentations quoted the statistic that 90% of our data was created in the last two years. This is a figure which has been bandied about but I've never seen a citation which explains who calculated it and what method they used (although it's supposed to have originated at IBM). It probably comes from the same place as most other statistics (and project estimates). What is the "data" the figure measures? I'm sure in some areas of human endeavour (bioinformatics, say, or CERN) the amount of data they produce has gone metastatic. And obviously digital cameras, especially on phones, are now ubiquitous, so video and photographs account for a lot of the data growth. But are selfies, instagrammed burgers and cute kittens really data? Same with other content: how much of this data explosion is mirroring, retweets, quoting, spam and AdSense farms? Not to mention the smut. Anyway, that 90% was first cited in 2012; it's now 2013 and somebody needs to ~~invent~~ derive a new figure.

The day rounded off with a panel and a user presentation. Toby Price opened the Q&A by asking Oracle's Nick Whitehead, how does Hadoop fit into an Oracle estate? It's a good question. After all, Oracle has been able to handle unstructured data, i.e. text, since the introduction of ConText in 8.0 (albeit as a chargeable extra in those days). And there's nothing special about MapReduce: PL/SQL can do that. So what's the deal with Hadoop? Here's the impertinent answer to this pertinent question: Hadoop allows us to run massively parallel jobs without paying Oracle's per processor licenses. Let's face it, not even Tony Stark could afford to run a one-thousand core database.

The closing session was a presentation from James Wyper & Dirk Shelley about upgrading the BI architecture at John Lewis Partnership. They described it as a war story, but actually it was a report from the front lines, because the implementation is not yet finished. James and Dirk covered the products - which ones worked as advertised, which ones gave them grief (integration was a particular source of grief). They also discussed their approach to the project, relating what they did well and what they would do differently with the advantage of hindsight. This sort of session is the best part of any user group: real users sharing their experiences with the community. We need more of them.

Monday, June 24, 2013

Let me SLEEP!

DBMS_LOCK is a slightly obscure built-in package. It provides components with which we can build our own locking schemes. Its obscurity stems from the default access on the package, which is restricted to its owner SYS and the other power user accounts. That is because implementing your own locking strategy is a good way to wreck a system, unless you really know what you're doing. Besides, Oracle's existing functionality is such that there is almost no need to build something extra (especially since 11g finally made the SELECT ... FOR UPDATE SKIP LOCKED syntax legal). So it's just fine that DBMS_LOCK is private to SYS. Except ...

... except that one of the sub-programs in the package is SLEEP(). And SLEEP() is highly useful. Most PL/SQL applications of any sophistication need the ability to pause processing for a short while, either a fixed time or perhaps polling for a specific event. So it is normal for PL/SQL applications to need access to DBMS_LOCK.SLEEP().

Commonly this access is granted at the package level, that is grant execute on dbms_lock to joe_dev. Truth be told, there's not much harm in that. The privilege is granted to a named account, and if somebody uses the access to implement a roll-your-own locking strategy which brings Production to its knees, well, the DBAs know who to look for.

But we can employ a schema instead. The chief virtue of a schema is managing rights on objects. So let's create a schema for mediating access to powerful SYS privileges:

create user sys_utils identified by &pw
temporary tablespace temp
/
grant create procedure, create view, create type to sys_utils
/

Note that SYS_UTILS does not get the create session privilege. Hence nobody can connect to the account, a sensible precaution for a user with potentially damaging privileges. Why bar connection in all databases and not just Production? The lessons of history tell us that developers will assume they can do in Production anything they can do in Development, and write their code accordingly.

Anyway, as well as granting privileges, the DBA user will need to build SYS_UTILS's objects on its behalf:

grant execute on dbms_lock to sys_utils
/
create or replace procedure sys_utils.sleep
    ( i_seconds in number)
as
begin
    dbms_lock.sleep(i_seconds);
end sleep;
/
create public synonym sleep for sys_utils.sleep
/
grant execute on sys_utils.sleep to joe_dev
/

I think it's a good idea to be proactive about creating an account like this; granting it some obviously useful privileges before developers ask for them, simply because some developers won't ask. The forums occasionally throw up extremely expensive PL/SQL loops whose sole purpose is to burn CPU cycles or wacky DBMS_JOB routines which run every second. These WTFs have their genesis in ignorance of, or lack of access to, DBMS_LOCK.SLEEP().
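
By way of contrast with those CPU-burning loops, here is a sketch of a polite polling routine using the SLEEP wrapper created above. The JOB_CONTROL table, its STATUS column and the job name are hypothetical, and the block assumes a row for the job already exists.

```sql
-- Hypothetical table job_control (job_name, status); sleep is the public
-- synonym for sys_utils.sleep created earlier.
declare
    l_status job_control.status%type;
begin
    for i in 1 .. 60 loop                -- give up after roughly ten minutes
        select status into l_status
        from job_control
        where job_name = 'NIGHTLY_LOAD';
        exit when l_status = 'COMPLETE';
        sleep(10);                       -- pause for ten seconds without burning CPU
    end loop;
end;
/
```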

Oracle 10g - a time traveller's tale

Time travel sucks, especially going back in time. Nobody takes a bath, there are no anaesthetics and you can't get a decent wi-fi signal anywhere. As for killing your own grandfather, forget about it.

The same is true for going back in database versions. In 2009 I had gone straight from an Oracle 9i project to an Oracle 11g one. So when I eventually found myself on a 10g project it was rather disorientating. I would keep reaching for tools which weren't in the toolbox: LISTAGG(), preprocessor scripts for external tables, generalized invocation for objects.

I had missed out on 10g while it was shiny and new, and now it just seemed limited. Take Partitioning. Oracle 10g supported exactly the same composite partitioning methods as 9i: just Range-Hash and Range-List, whereas 11g is full of wonders like Interval-Range, Hash-Hash and the one I needed, List-List.

Faking a List-List composite partitioning scheme in 10g

Consider this example of a table with a (slightly forced) need for composite List-List partitioning. It is part of an engineering stock control system, in which PRODUCTS are grouped in LINES (Ships, Cars, Planes) and COMPONENTS are grouped into CATEGORIES (Frame, interior fittings, software, etc). We need an intersection table which links components to products.

There are hundreds of thousands of components and tens of thousands of products. But we are almost always only interested in components for a single category within a single product line (or product) so composite partitioning on (product_line, component_category) is a good scheme. In 11g the List-List method works just fine:

SQL> create table product_components
  2      (product_line varchar2(10) not null
  3          , product_id number not null
  4          , component_category varchar2(10) not null
  5          , component_id number not null
  6          , constraint pc_pk primary key (product_id, component_id )
  7          , constraint pc_prd_fk foreign key (product_id )
  8             references products (product_id)
  9          , constraint pc_com_fk foreign key (component_id )
 10             references components (component_id)
 11      )
 12  partition by list(product_line) subpartition by list(component_category)
 13       subpartition template
 14           (subpartition sbody values ('BODY')
 15            , subpartition sint values ('INT')
 16            , subpartition selectr values ('ELECTR')
 17            , subpartition ssoft values ('SOFT')
 18           )
 19      (partition pship values ('SHIP')
 20       , partition pcar values  ('CAR')
 21       , partition pplane values ('PLANE')
 22       )
 23  /

Table created.

SQL>

But in 10g the same statement hurls ORA-00922: missing or invalid option. The workaround is a bit of a nasty hack: replace the first List with a Range, producing a legitimate Range-List composite:

SQL> create table product_components
  2      (product_line varchar2(10) not null
  3          , product_id number not null
  4          , component_category varchar2(10) not null
  5          , component_id number not null
  6          , constraint pc_pk primary key (product_id, component_id )
  7          , constraint pc_prd_fk foreign key (product_id )
  8             references products (product_id)
  9          , constraint pc_com_fk foreign key (component_id )
 10             references components (component_id)
 11      )
 12  partition by range(product_line) subpartition by list(component_category)
 13       subpartition template
 14           (subpartition sbody values ('BODY')
 15            , subpartition sint values ('INT')
 16            , subpartition selectr values ('ELECTR')
 17            , subpartition ssoft values ('SOFT')
 18           )
 19      (partition pcar values less than ('CAS')
 20       , partition pplane values less than ('PLANF')
 21       , partition pship values less than ('SHIQ')
 22       )
 23  /

Table created.

SQL>

Note the wacky spellings which ensure that 'CAR' ends up in the right partition. Also we have to re-order the partition clause so that the partition bounds don't raise an ORA-14037 exception. We are also left with the possibility that a rogue typo might slip records into the wrong partition, so we really ought to have a foreign key constraint on the product_line column:

alter table product_components add constraint pc_prdl_fk foreign key (product_line) 
           references product_lines (line_code)
/

I described this as a nasty hack. It is not really that nasty, in fact it actually works very well in daily processing. But managing the table is less intuitive. Say we want to manufacture another line, rockets. We cannot just add a new partition:

SQL> alter table product_components
  2      add partition prock values less than ('ROCKEU')
  3  /
    add partition prock values less than ('ROCKEU')
                  *
ERROR at line 2:
ORA-14074: partition bound must collate higher than that of the last partition


SQL>

Instead we have to split the PSHIP partition in two:

SQL> alter table product_components split partition pship
  2     at ('ROCKEU')
  3     into (partition  prock, partition pship)
  4  /

Table altered.

SQL>
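
To check that the split has put the bounds where we expect, we can query the data dictionary. This is just a sketch of the sort of query I would run; USER_TAB_PARTITIONS is the standard dictionary view, and the table name is the one used in the example above.

```sql
-- List the partitions in bound order, with the upper bound of each.
select partition_name
       , high_value
       , partition_position
from user_tab_partitions
where table_name = 'PRODUCT_COMPONENTS'
order by partition_position
/
```

PROCK should now sit between PPLANE and PSHIP.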

The other snag is, that once we do get back to the future it's a bit of a chore to convert the table to a proper List-List scheme. Probably too much of a chore to be worth the effort. Even with a time machine there are only so many hours in the day.