Monday 3 October 2022

Beware when using SQL.LINE

In my experience, there are two general cases of insertion into a table during a procedure: the table is normally one of the STACK tables or GENERALLOAD, but in certain cases can be another table, normally a private one that has been defined specially for the procedure. The identity of the table is not important in the cases that I am going to describe. The two types might be called 'explicit' and 'implicit', with reference to the key field of the table into which data will be inserted.

I am going to describe in general terms the copying of a customer order. This would use the GENERALLOAD table; the fields of the order header would go into a tuple whose value for RECORDTYPE would be '1' and whose LINE would be 1. The technique for copying the order lines depends on whether only the lines are being copied, or whether any sub-forms of the lines are being copied as well. In the first case (no sub-forms), one can simply write

INSERT INTO GENERALLOAD (LINE, RECORDTYE, ... SELECT 1 + ORDERITEMS.LINE, '2', ...
If one were feeling adventurous, or there was no natural key for the sub-form, one could replace ORDERITEMS.LINE with SQL.LINE. What is important is that this number is incremented by one every time, as line 1 in GENERALLOAD holds the header line. This is what I would describe as 'implicit' inserting.

Should there be sub-forms, the data has to be entered by means of a cursor, where first line data is added then sub-form data. As there will no longer be any correspondence between GENERALLOAD.LINE and ORDERITEMS.LINE, one has to maintain a local variable (normally :LINE) whose value is incremented prior to every insert. This is 'explicit' inserting. At the same time, data for the sub-form could be inserted either implicitly or explicitly.
:LINE = 1; INSERT INTO GENERALLOAD (LINE, RECORDTYE, ... SELECT :LINE, '1', {header data}; DECLARE C1 CURSOR FOR SELECT ORDERITEMS.ORDI, .... OPEN C1; GOTO 200 WHERE :RETVAL <= 0; LABEL 100; FETCH C1 INTO :ORDI, .... GOTO 200 WHERE :RETVAL <= 0; :LINE = :LINE + 1; INSERT INTO GENERALLOAD (LINE, RECORDTYPE, ... VALUES (:LINE, '2', {line data} ...); /* sub-form */ INSERT INTO GENERALLOAD (LINE, RECORDTYPE, ... SELECT :LINE + SQL.LINE, '3', {sub-form data} ...; LOOP 100; LABEL 200; CLOSE C1; EXECUTE INTERFACE ....
There is a deliberate mistake in the above code, but first let's think it through. The value of GENERALLOAD.LINE for header data will obviously be 1, and the value of this field for the first line's data will be 2 (as the variable :LINE is explicitly incremented). For the first line of the sub-form data, GENERALLOAD.LINE will be :LINE + SQL.LINE, i.e. 2 + 1, or 3. The second line will have GENERALLOAD.LINE = 4. This can be represented in the following table
 
GENERALLOAD.LINE DATA
1 order header
2 first line of order
3 first line of sub-form for order line 1
4 second line of sub-form for order line 1

So far so good. For the second order line, :LINE will explicitly be incremented, so this line will have GENERALLOAD.LINE = 3 ... except for the fact that there is already a tuple with this key value in GENERALLOAD, and so the second line will not be inserted. What is missing is the following line:
SELECT MAX (LINE) INTO :LINE FROM GENERALLOAD;
This line should appear just before LOOP 100. As a result of this line, :LINE will have the value 4 after the insertion of the second sub-form line, and as this value is incremented prior to inserting the second order line, this line will have GENERALLOAD.LINE = 5.

One can generalise this: whenever one uses the construct 'INSERT INTO ... SELECT SQL.LINE', one must remember to increment SQL.LINE with a variable (such as :LINE or :MAX), then after the INSERT statement should come the statement that selects the maximum key number inserted so far into the above variable.

Such a simple heuristic, so easy to forget: if this variable (:LINE) is not used again, it seems that there is no point in extracting its value. For example, I have written many procedures that send a report by email: the procedure starts by defining some variables, then there is an insert statement into STACK4 (or similar) based on those variables, using SQL.LINE as the key field, then a report is executed passing STACK4 as its data. There is no loop and so there is no need to select the maximum value of KEY from STACK4. 

The problem rears its head when such code - which was written for a non-looping procedure - gets copied into a procedure that does have a loop. For example, I wrote a somewhat complicated procedure that collects BOM data from all the lines in a given order; this was then extended to work on several orders. Each order is selected via a cursor; local variables have to be reinitialised for each new order, but as the data are being inserted into the same STACK table, the key value should continue to increment. In other words, the maximum value of KEY should be extracted before the LOOP command that causes a new order to be selected.

This blog is of course being written because I fell foul of this heuristic. In this case, the procedure had to call an external program for calculating budget use; this program appears to work on one year's data at a time. As a result, I was forced to use the rather arcane structure of having a loop at the level of SQLI stages as shown below.

Stage remarks
10 Set up variables
20 increment variables
30 external procedure
40 insert the data returned from the external procedure into a special table. At the end check the terminating condition and set :$.GO appropriately
50 GOTO 20 if the end has not been reached, 60 if it has been reached
60 cleanup

One can guess what the problem was: the code in stage 40 had originally been written for one year's data and used the 'INSERT INTO ... SELECT SQL.LINE' construct without having 'SELECT MAX (LINE)' at its end. When the procedure was run, it seemed to work. Let's say that the first year had 200 lines to be inserted, the second year 300 lines and the third year 100 lines. In this case, the 200 lines of the first year would get inserted without problem; the first 200 lines of the second year would not be inserted because SQL.LINE would return values that had already been inserted, but the final 100 lines (i.e. 201-300) would be inserted, giving the impression that the procedure worked for this year. Data from the final year would not be inserted at all.

It took me quite some time to figure out what the problem was; this was exacerbated by the facts that the procedure was working on a client's data (that are unfamiliar to me) and that the procedure was based on complicated code involving budgets (that too are somewhat unfamiliar to me). All the checks that I inserted (sorry for the inadvertent pun) showed the expected results, but somehow data was not being inserted into the table. I think that I wrote about a similar problem years ago: when data does not get inserted into a table, check its primary key.

It was less clear in this case because the loop construct was not within the same stage as the data insertion, but even so .... After 'INSERT INTO ... SELECT SQL.LINE', ALWAYS add 'SELECT MAX (LINE)' at the insert statement's end.