Data (rdbtable)

The data is contained in regular UNIX, ASCII files and therefore can be manipulated by regular UNIX utilities, e.g. grep, ls, wc, mv, cp, cat, more, less, head, RCS, and editors like the RAND editor 'e', vi, etc. A good way to view the data of course, would be to use the RDB operator that prints such datafiles 'ptbl'.

The relation, or table structure is achieved by separating the columns with ASCII TAB characters, and terminating the rows with ASCII NEWLINE characters. That is, each row of data in a file contains the data values (a data field) separated by TAB characters and terminated with a NEWLINE character. Therefore a fundamental rule is that data values must NOT contain TAB characters.

The first section of the file, called the header, contains the file structure information used by the operators. The header also contains optional embedded documentation relating to the entire datafile (table documentation) and/or each data column (column documentation). The rest of the file, called the body, contains the actual data values. A file of data, so structured, is said to be an 'rdbtable'.

The header consists of two or more lines. There is an optional number (zero or more) of lines of table documentation followed by exactally two lines that contain the structure information: the column name row and the column definition row. The table documentation lines start with either a sharp sign (#) followed by a space character, or one or more space characters followed by a sharp sign (#). The rest of each line may contain any documentation desired. Note that the table documentation lines are the only lines in an rdbtable that are not required to conform to the table structure defined above. The fields in the column name row contain the names of each column. The fields in the column definition row contain the data definitions and optional column documentation for each column.

The column names are case sensitive, i.e. 'COUNT' is different from 'Count'. The guideline for characters that may be used in column names is that alphabetic, numeric, and non-alphanumeric characters that are not special to the UNIX shell are good choices. Column names must include at least one alphabetic character. It is highly recommended (but not required) that column names start with an alphabetic or numeric character.

Non-alphanumeric characters that are acceptable in column names are the percent sign (%) colon (:) at sign (@) equals (=) comma (,) and dot (.). The sharp sign (#) underscore (_) and dash (-) characters may also be used but they must not be the first character in a column name. The TAB character must never be used in column names, nor should internal spaces or UNIX I/O redirection characters (<,>,|) be used.

The data definitions include column width, data type, and justification. The column width must be explicitly specified; the others are optional and are frequently specified by default.

The data definitions are specified by adjacent characters in a single word. The width of each field is specified by a numeric count. The type of data is "string", "numeric", or "month". The types are specified by an 'S', 'N', or 'M' respectively, and the default is type string. Printout justification is 'left', or 'right', and is specified by an '<' or '>' character respectively. If not specified, data types string and month will be left justified and type numeric will be right justified.

Note that column width is used primarily by the operator 'ptbl' and in no way limits the actual data size. It is not an error if some actual data in a column is wider than the defined width; a listing produced with 'ptbl' may be out of alignment however.

The optional documentation for each column follows the data definition word in the field. There must be one or more space characters after the data definition word and before the column documentation; the column documentation may be as long as necessary. Note that the data definition and the optional column documentation are contained in a single field in the row.

If the column name and/or column definition rows contain much information and/or column documentation they can become long and confusing to read. However the operators 'valid' and 'headchg' have options to print the header contents as a 'template' file, an organized list of information about the header.

A sample rdbtable (named sample) that will be used in later examples is shown in Table 1. The picture in Table 1 is for illustrative purposes; what the file would actually look like is shown in Table 2, where a TAB character is represented by '<T>' and a NEWLINE character is represented by '<N>'.

It is important to note that only actual data is stored in the data fields, with no leading or trailing space characters. This fact can (and usually does) have a major effect on the size of the resulting datafiles (rdbtables) compared to data stored in "fixed field width" systems. The datafiles in RDB are almost always smaller, sometimes dramatically smaller.

                                Table 1
                           RDBTABLE (SAMPLE)

        # Table documentation lines. These describe and
        # identify the rdbtable contents.
        # They may be read by many normal UNIX utilities,
        # which is useful to easily identify a file.
        # May also contain RCS or SCCS control information.
        NAME    COUNT   TYP     AMT     OTHER   RIGHT
        6       5N      3       5N      8       8>
        Bush    44      A       133     Another This
        Hansen  44      A       23      One     Is
        Jones   77      X       77      Here    On
        Perry   77      B       244     And     The
        Hart    77      D       1111    So      Right
        Holmes  65      D       1111    On      Edge

                                Table 2
                    RDBTABLE (SAMPLE) ACTUAL CONTENT

        # Table documentation lines. These describe and<N>
        # identify the rdbtable contents.<N>
        # They may be read by many normal UNIX utilities,<N>
        # which is useful to easily identify a file.<N>
        # May also contain RCS or SCCS control information.<N>
        NAME<T>COUNT<T>TYP<T>AMT<T>OTHER<T>RIGHT<N>
        6<T>5N<T>3<T>5N<T>8<T>8><N>
        Bush<T>44<T>A<T>133<T>Another<T>This<N>
        Hansen<T>44<T>A<T>23<T>One<T>Is<N>
        Jones<T>77<T>X<T>77<T>Here<T>On<N>
        Perry<T>77<T>B<T>244<T>And<T>The<N>
        Hart<T>77<T>D<T>1111<T>So<T>Right<N>
        Holmes<T>65<T>D<T>1111<T>On<T>Edge<N>