Tuesday 12 January 2021

NFILE: advantages and disadvantages

I am sure that I have written about NFILE at some time but I can't find the reference and so I'll start from the beginning. [Edit: I found the entry which now has the NFILE label]

Let's say that one has a procedure that accepts a list of customers as its input - let's even say that it's the HTML document that I worked on yesterday. When one enters a list of customers, the linked customers table will include only those customers, obviously. When one writes * as the input, all the customers from the unlinked customer table have to be copied into the linked customers table; this takes time! 

Enter NFILE: this was introduced in version 17 IIRC. The documentation says: Select NFILE if the linked file comprises a group of records and you want the link table to remain empty when the user enters * or leaves the field empty. How does one use this? By writing code as follows

LINK CUSTOMERS TO :$.CST; /* NFILE */ ERRMSG 1 WHERE :RETVAL <= 0; GOTO 1 FROM CUSTOMERS WHERE CUST > 0; UNLINK CUSTOMERS; LABEL 1;

This saves time in copying records from the unlinked table to the linked table. Tables that benefit from this are PART, DOCUMENTS, INVOICES and CUSTOMERS. Maybe a few others.

Now back to the HTML document that I described yesterday. I realised that I could filter unwanted customers using a cursor and a series of expressions such as

GOTO 999 WHERE NOT EXISTS (SELECT 1 FROM BRANCHES WHERE BRANCH > 0); LINK CUSTOMERS TO :$.CST; LINK BRANCHES TO :$.BRN; DECLARE C15 CURSOR FOR SELECT CUST FROM CUSTOMERS WHERE CUST > 0; OPEN C15; GOTO 999 WHERE :RETVAL <= 0; LABEL 100; FETCH C15 INTO :CST; GOTO 800 WHERE :RETVAL <= 0; GOTO 200 WHERE :$.A3 <> 'Y'; /* No need to check price quotes if not wanted */ :HASDATA = 0; SELECT 1 INTO :HASDATA FROM CPROF, CPROFA, BRANCHES WHERE CPROF.PROF = CPROFA.PROF AND CPROFA.BRANCH = BRANCHES.BRANCH AND CPROF.CUST = :CST AND CPROF.PDATE >= :$.DAT; LOOP 100 WHERE :HASDATA = 1; .... LABEL 700; /* If we are here, then the customer has failed all the tests so delete it from the linked table */ DELETE FROM CUSTOMERS WHERE CUST = :CST; LOOP 100; LABEL 800; ...

This code is repeated, once for each type of record - sales opportunities, price quotes, orders and projects. The first two lines check whether any branches were passed as a parameter: there's no point in checking anything if there are no branches. I added all the necessary code, ran the program using one specific branch without designating customers: the procedure took a few minutes to sort itself out then showed the output: as designed, only customers with data from the chosen branch were shown.

I wondered how I could improve the procedure's speed - the obvious step would be to define CUSTOMERS as type NFILE instead of FILE. There would be no need to copy customers to the linked table which would save time. So I added this change, ran the program ... and then realised with a shudder that the DELETION WAS NOT FROM THE LINKED TABLE BUT FROM THE REAL TABLE!!! The first code snippet that I showed unlinks CUSTOMERS; thus the DELETE statement does not work on the linked table but rather the real table.

I reached for the power switch (figuratively) as soon as possible but it was too late. A few minutes later, someone called to ask whether customers had disappeared. My code has managed to delete about 16,000 customers out of 24,000 in a few minutes. Straightaway I phoned the network manager who took Priority off line, and then we turned to our database administrator who started restoring the damaged table.

Bottom line: be wary of NFILE! Normally it can help but sometimes it is dangerous.

2 comments:

  1. Oops. I imagine that your heart was in your mouth or just stopped altogether. I hope that you have made a good recovery.

    But I think it would be fair to say that this is actually not related to NFILE as such and in fact about the Priority quirk called linked tables that has us working like that in the first place. We do things which look like they're acting on the real table but they're not.

    I also used do to things like your example. But shortly after I started my current job, my team leader at the time saw some code I wrote which must have done something similar to the above and said, "we don't do that". To that I replied, "but I check that the link succ.... OK, I see what you mean" and changed it.

    It's really too easy to get to a situation where the link doesn't happen like you thought it would. Your example is one. Or if the link fails, your code skips the whole section with a GOTO but then someone adds a LABEL in the middle with the same number, and one day something causes the link to fail and you arrive at the DELETE and the table's not linked...

    So, as a matter of course, I now never (ever ever) delete from a linked copy of a "permanent" table, unlike when I'm working with tables which primarily exist to be linked to. Even updates and inserts on linked "real" tables will be with care; I'm not likely to insert anything except data from the original table. Rather than manipulating the "real" table (even as a link) I'll manipulate a stack table which contains the relevant keys (and maybe any other columns needed to do that manipulation). If need be, once done, one can fill a linked copy of the "real" table with only the records indicated by the stack. I strongly recommend this strategy.

    Regarding restoring the data: I don't know if your Priority is on SQL Server or Oracle, but since I started working on Oracle I learned about a rather nice feature called Flashback which allows for an "AS OF" clause when specifying a table in a query. It allows the query to show the data in the table at a particular time in the past (subject to the transaction logs having been retained and not cleared by a backup or whatever). In your case, on Oracle your simple deletion of records would be relatively straightforward to restore - something like (this is a pseudoquery) "INSERT INTO CUSTOMERS SELECT * FROM CUSTOMERS-as-it-was-the-moment-before-the-deletion WHERE CUST NOT IN (SELECT CUST FROM CUSTOMERS)". So long as it's done quickly enough, there's no need to mess around making a backup in order to do a point-in-time restore on the side and pull the data from there, as is necessary on SQL Server. If you're on Oracle, then I hope your sysadmin/DBA did that if they could, and that that's useful knowledge for you for the future.

    I learned just this last week a new thing about NFILE. One could perhaps guess that this happens but unsurprisingly it's not actually documented.
    If you have a procedure which takes an NFILE input, we know that it might be empty and if we access it in an SQLI step we need to unlink as you did above. But what if your procedure just passes that parameter to a report. Surely the report will receive the empty table too? No! When the report step runs, the system "magically" knows that that table is empty (but at the same time represents all records) and doesn't link the table!! You have to be disciplined and make the parameter NFILE on the report too; if it's FILE (they should agree but they can not agree) then that check doesn't happen and you'll get the empty linked table. On the other hand, I don't think there's any way to pass an NFILE which isn't an input to a report and have it link or not link depending on what's in it.

    Hope that's all of interest...

    ReplyDelete
  2. It would be a good idea to copy the customers into STACK; either only suitable customers would be copied, or all customers would be copied and then deleted. BUT I am working in the context of an HTML document, and the all important stage HTMLCURSOR works implicitly on the table that is linked to the PAR parameter in the initial stage. I've tried a few ideas to get around this problem, but so far with no success. Maybe the filtering code should be in the HTMLCURSOR stage, but I haven't succeeded in writing a single SQL statement that would fulfil the requirements.

    ReplyDelete