Over the months, I have collected various mistakes that I have made whilst programming procedures in Priority and that were not noted by the built-in syntax checker. I want to write a program that will check things that are not caught by the internal checker, to supplement it and not to replace it. Simply put, I want to write an external syntax checker that will check things like matched LINK/UNLINK pairs, uninitialised variables and a few other problems. I've been devoting a fair amount of thought to how to store the data of such a program; traditional cross reference programs in Pascal used linked lists, and indeed I found such a program yesterday evening. But I would like to have a much more modern interface and use types such as stringlists. A stringlist is ideal for storing what would have been an array of identifiers, but it isn't so useful when additional data regarding those identifiers is required.
I will no doubt continue to debate the subject in my mind until I commence coding; at the moment, my inclination is to take the old school cross referencer and adapt it to the Priority SQL syntax. Adapting to the syntax is a complex task that would have to be done one way or another, so it's probably better to start with something that works so that I can concentrate on the syntax and not on how everything is stored. A cross referencer is a good idea anyway: it makes finding references to a variable much easier. I started writing a long program in Priority yesterday afternoon and finished writing and debugging it this morning; it is 270 lines long, which is fairly long but not too complicated. During the writing process, I moved pieces from place to place within the program (primarily moving loop-invariant operations out of loops, to be technical) and sometimes these edits slightly mangled the text. I discovered a new bug this way: a variable always starts with a colon (e.g. :DAYS), and in the course of one of these cuts and pastes I ended up with a variable named ::DAYS, which is not the same as :DAYS.
A cross referencer helps in finding variables that appear only once; this can mean either that the variable is superfluous as it is never used or, more problematically, that it is a variable without a value (as in the above case of ::DAYS). I wasted an hour yesterday on another procedure, trying to figure out why a value being saved in a variable was not being written later on. Eventually I saw the problem: the value was saved in :PARTCOST but was later accessed as :PARCOST. A cross referencer would find this immediately.
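To illustrate the idea of flagging identifiers that appear only once, here is a minimal sketch in Python (not the language of the actual program). The regular expression for variable names and the sample lines are my own assumptions, invented for the illustration; extra leading colons are kept on purpose so that ::DAYS would be listed separately from :DAYS.

```python
import re
from collections import defaultdict

# Assumption: a variable is one or more colons followed directly by a name.
VARIABLE = re.compile(r":+[A-Za-z_][A-Za-z0-9_]*")

def cross_reference(lines):
    """Map each variable name to the list of line numbers on which it appears."""
    refs = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        for name in VARIABLE.findall(line):
            refs[name].append(number)
    return dict(refs)

def singletons(refs):
    """Names referenced exactly once: either superfluous or mistyped."""
    return sorted(name for name, where in refs.items() if len(where) == 1)

# Invented sample text, loosely in the style of a Priority procedure.
sample = [
    ":PARTCOST = 0;",
    "SELECT COST INTO :PARTCOST FROM PART WHERE PART = :PART;",
    "SELECT :PARCOST FROM DUMMY;",
    ":PART = 7;",
]
refs = cross_reference(sample)
print(singletons(refs))  # [':PARCOST']
```

The sketch ignores string literals and comments, so a colon inside a quoted string would be miscounted; a real checker has to tokenise properly first.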
After spending more than a few hours over the weekend working on the cross-referencer, I have completed the first version.
As in the army, everything divides into three. For this program, the first stage is parsing the input file, the second is displaying the references and the third is displaying the analysis. The first stage can also be split into three: the tokeniser, the lexical analysis and the storage. A token is a string extracted from a text file; for example, if the current line is 'select part, partname from part', then there are five tokens: 'select', 'part', 'partname', 'from' and again 'part'. In programming languages with a regular syntax, the tokeniser is normally quite straightforward, but it turns out that the procedural SQL language of Priority does not have such a regular syntax: the meaning of a token often depends on the tokens around it, so, informally at least, the language cannot be considered context free.
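The five-token example can be reproduced with a deliberately naive tokeniser. This is only a sketch in Python under the assumption that a token is a run of letters, digits and underscores; it throws away punctuation, which is exactly what a tokeniser for Priority cannot afford to do.

```python
import re

# Assumption: a token is a name made of letters, digits and underscores.
WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def tokenise(line):
    """Return the word tokens of one line, in order of appearance."""
    return WORD.findall(line)

print(tokenise("select part, partname from part"))
# ['select', 'part', 'partname', 'from', 'part']
```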
Here are two examples of the ad hoc syntax. First, I want to note when a variable is initialised and when it is not. Initialisation can occur in one of two forms: either there is an equals sign after the token (e.g. :SEARCHNAME = '12345') or the keyword INTO precedes the token (e.g. SELECT DAY INTO :DAYS). Having two opposite forms (one postfix and one prefix, to use the technical terms) makes this complicated to program. The second problem is the colon (:). Normally this serves to mark variables, e.g. :DAYS, but it can also separate the two alternatives of a ternary expression (e.g. :DAYS < 7 ? 3 : 5).
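Both problems can be sketched in a few lines of Python: look one token ahead for an equals sign, look one token back for INTO, and treat a colon as part of a variable only when a name follows it directly. The token pattern is my own assumption, not Priority's real grammar, and the sketch applies only the two rules exactly as stated: it does not tell an assignment from a comparison in a WHERE clause, nor does it handle several variables after one INTO.

```python
import re

# Assumed token classes; alternatives are tried in the order written.
TOKEN = re.compile(r"""
    '[^']*'                   # quoted string literal
  | :[A-Za-z_][A-Za-z0-9_]*   # variable: colon directly followed by a name
  | [A-Za-z_][A-Za-z0-9_]*    # keyword, table or column name
  | \d+                       # integer literal
  | \S                        # any other single character, e.g. a lone colon
""", re.VERBOSE)

def classify(line):
    """Return (variable, initialised_here) for every variable on the line."""
    tokens = TOKEN.findall(line)
    result = []
    for i, tok in enumerate(tokens):
        if len(tok) < 2 or not tok.startswith(":"):
            continue  # not a variable (a lone ':' belongs to a ternary)
        after_equals = i + 1 < len(tokens) and tokens[i + 1] == "="
        before_into = i > 0 and tokens[i - 1].upper() == "INTO"
        result.append((tok, after_equals or before_into))
    return result

print(classify(":SEARCHNAME = '12345'"))  # [(':SEARCHNAME', True)]
print(classify("SELECT DAY INTO :DAYS"))  # [(':DAYS', True)]
print(classify(":DAYS < 7 ? 3 : 5"))      # [(':DAYS', False)]
```

In the last line the colon between 3 and 5 comes out as a token of its own, because it is followed by a space and not by a name.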
The correct tokenisation of table aliases (e.g. GENERALLOAD F1) took quite a bit of time.
Storage of the identifiers and their references is by means of a binary tree; this part was based on the cross referencer that I found a few days ago, which was written in standard Pascal. The references to each identifier are stored in a queue attached to its node. I added a few fields to these types in order to store further information: the type of identifier (variable, cursor, table) and the operation in progress at the reference (e.g. variable initialisation, opening a cursor, linking a table). This part was simple. Displaying the references was also fairly straightforward.
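The shape of this storage can be sketched as follows. This is Python, not the Pascal of the real program, and every name in it (Node, Reference, insert, walk, the operation strings) is hypothetical; it only shows a binary tree ordered by identifier name, with a queue of references hanging off each node.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Reference:
    line: int
    operation: str  # e.g. "init", "use", "open", "link"

@dataclass
class Node:
    name: str
    kind: str       # "variable", "cursor" or "table"
    refs: deque = field(default_factory=deque)
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def insert(root, name, kind, line, operation):
    """Add a reference, creating the identifier's node on first sight."""
    if root is None:
        root = Node(name, kind)
    if name == root.name:
        root.refs.append(Reference(line, operation))  # queue keeps source order
    elif name < root.name:
        root.left = insert(root.left, name, kind, line, operation)
    else:
        root.right = insert(root.right, name, kind, line, operation)
    return root

def walk(root):
    """In-order traversal: yields the nodes sorted by name."""
    if root is not None:
        yield from walk(root.left)
        yield root
        yield from walk(root.right)

root = None
for name, kind, line, op in [(":DAYS", "variable", 3, "init"),
                             ("GENERALLOAD", "table", 1, "link"),
                             (":DAYS", "variable", 9, "use")]:
    root = insert(root, name, kind, line, op)

for node in walk(root):
    print(node.name, node.kind, [(r.line, r.operation) for r in node.refs])
```

Displaying the references is then just the in-order walk, which is why that part is straightforward once the tree is filled.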
The analysis part depends on the type of identifier: there are certain checks for variables, certain checks for cursors and certain checks for tables. I found a method to make these checks as streamlined as possible.
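One possible way of organising checks per identifier type, which is not necessarily the method used in the program, is a table of check functions keyed by type. In this Python sketch the operation names and the individual checks are assumptions for the illustration; in particular, the cursor check assumes that every OPEN should be balanced by a CLOSE.

```python
def check_variable(name, ops):
    if "init" not in ops:
        return [f"{name}: used but never initialised"]
    if "use" not in ops:
        return [f"{name}: initialised but never used"]
    return []

def check_cursor(name, ops):
    # Assumption: an opened cursor should also be closed.
    if ops.count("open") != ops.count("close"):
        return [f"{name}: OPEN and CLOSE do not balance"]
    return []

def check_table(name, ops):
    if ops.count("link") != ops.count("unlink"):
        return [f"{name}: LINK and UNLINK do not balance"]
    return []

# One check function per identifier type.
CHECKS = {"variable": check_variable, "cursor": check_cursor, "table": check_table}

def analyse(identifiers):
    """identifiers maps a name to (type, list of operations in source order)."""
    problems = []
    for name, (kind, ops) in sorted(identifiers.items()):
        problems.extend(CHECKS[kind](name, ops))
    return problems

sample = {
    ":DAYS": ("variable", ["init"]),
    ":PARCOST": ("variable", ["use"]),
    "GENERALLOAD": ("table", ["link", "use"]),
    "C1": ("cursor", ["open", "close"]),
}
for problem in analyse(sample):
    print(problem)
```

Adding a new type of identifier then means writing one function and adding one entry to the table, without touching the loop.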
I tested the program by running it alternately on a short test file, into which I at times inserted deliberate errors (so that I could check that the errors were being picked up), and on the file for the procedure that I wrote a few days ago. Each time I would look at the references, noting mistakes that had to be fixed. Now I'm 99% confident that the files are parsed correctly and that variable initialisation is correctly detected (this was very complicated). Running the finished program on my procedure finds three variables that were initialised and never used. These can be safely deleted from the procedure.
My next step is to publicise the program within a small community, inviting examples of procedures whose analysis appears to be wrong. Maybe there are other checks that need to be added.