There must be one variable record for each numeric variable and each string variable with width 8 bytes or less. String variables wider than 8 bytes have one variable record for each 8 bytes, rounding up. The first variable record for a long string specifies the variable’s correct dictionary information. Subsequent variable records for a long string are filled with dummy information: a type of -1, no variable label or missing values, print and write formats that are ignored, and an empty string as name. A few system files have been encountered that include a variable label on dummy variable records, so readers should take care to parse dummy variable records in the same way as other variable records.
The dictionary index of a variable is a 1-based offset in the set of variable records, including dummy variable records for long string variables. The first variable record has a dictionary index of 1, the second has a dictionary index of 2, and so on.
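The record count and dictionary index rules above can be sketched as follows. This is a minimal illustration, not PSPP's implementation: the function names are hypothetical, and a width of 0 is used here to stand for a numeric variable, matching the `type` field described below.

```python
def n_variable_records(width):
    """Number of variable records a variable occupies (width 0 = numeric)."""
    if width == 0:
        return 1
    return (width + 7) // 8  # one record per 8 bytes, rounding up


def dictionary_indexes(widths):
    """1-based dictionary index of the first record of each variable."""
    indexes = []
    next_index = 1
    for width in widths:
        indexes.append(next_index)
        # Dummy records for long strings consume dictionary indexes too.
        next_index += n_variable_records(width)
    return indexes
```

For example, a numeric variable followed by a 20-byte string takes dictionary indexes 1 and 2 through 4, so the next variable starts at index 5.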
The system file format does not directly support string variables wider than 255 bytes. Such very long string variables are represented by a number of narrower string variables. See Very Long String Record for details.
A system file should contain at least one variable and thus at least one variable record, but system files have been observed in the wild without any variables (thus, no data either).
    int32 rec_type;
    int32 type;
    int32 has_var_label;
    int32 n_missing_values;
    int32 print;
    int32 write;
    char name[8];

    /* Present only if has_var_label is 1. */
    int32 label_len;
    char label[];

    /* Present only if n_missing_values is nonzero. */
    flt64 missing_values[];
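As a rough sketch, the fixed 32-byte portion of this layout (six `int32` fields plus the 8-byte name) could be unpacked with Python's `struct` module. The names `VarRecordHeader`, `FIXED`, and `parse_fixed_fields` are hypothetical, and little-endian byte order is an assumption made for the example; a real reader would take the byte order from the file header.

```python
import struct
from collections import namedtuple

# Hypothetical container for the fixed-size fields of a variable record.
VarRecordHeader = namedtuple(
    "VarRecordHeader",
    "rec_type type has_var_label n_missing_values print write name",
)

# Six int32 fields followed by char name[8]; "<" assumes little-endian.
FIXED = struct.Struct("<6i8s")


def parse_fixed_fields(buf, offset=0):
    """Unpack the fixed part of a variable record starting at offset."""
    header = VarRecordHeader(*FIXED.unpack_from(buf, offset))
    if header.rec_type != 2:
        raise ValueError(f"not a variable record: rec_type={header.rec_type}")
    return header
```

The optional `label_len`, `label`, and `missing_values` fields follow at `offset + FIXED.size` and have to be read separately, because their presence depends on `has_var_label` and `n_missing_values`.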
int32 rec_type;
Record type code. Always set to 2.
int32 type;
Variable type code. Set to 0 for a numeric variable. For a short string variable or the first part of a long string variable, this is set to the width of the string. For the second and subsequent parts of a long string variable, set to -1, and the remaining fields in the structure are ignored.
int32 has_var_label;
If this variable has a variable label, set to 1; otherwise, set to 0.
int32 n_missing_values;
If the variable has no missing values, set to 0. If the variable has one, two, or three discrete missing values, set to 1, 2, or 3, respectively. If the variable has a range of missing values, set to -2; if the variable has a range of missing values plus a single discrete value, set to -3.
A long string variable always has the value 0 here. A separate record indicates missing values for long string variables (see Long String Missing Values Record).
int32 print;
Print format for this variable. See below.
int32 write;
Write format for this variable. See below.
char name[8];
Variable name. The variable name must begin with a capital letter or the at-sign (‘@’). Subsequent characters may also be digits, octothorpes (‘#’), dollar signs (‘$’), underscores (‘_’), or full stops (‘.’). The variable name is padded on the right with spaces.
The ‘name’ fields should be unique within a system file. System files written by SPSS that contain very long string variables with similar names sometimes contain duplicate names that are later eliminated by resolving the very long string names (see Very Long String Record). PSPP handles duplicates by assigning them new, unique names.
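A reader might check the naming rule along these lines. This sketch assumes plain ASCII names, which is a simplification, and `clean_name` is a hypothetical helper. It returns `None` for anything that does not match, including the empty name found on dummy records.

```python
import re

# First character: capital letter or '@'. Later characters may also be
# digits, '#', '$', '_', or '.'. ASCII-only matching is an assumption.
NAME_RE = re.compile(rb"[A-Z@][A-Z0-9@#$_.]*")


def clean_name(raw):
    """Strip right padding from an 8-byte name field and validate it."""
    name = raw.rstrip(b" ")
    if not NAME_RE.fullmatch(name):
        return None
    return name.decode("ascii")
```

Uniqueness is not checked here; a caller would need to track the names already seen and rename duplicates itself.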
int32 label_len;
This field is present only if has_var_label is set to 1. It is set to the length, in characters, of the variable label. The documented maximum length varies from 120 to 255 based on SPSS version, but some files have been seen with longer labels. PSPP accepts labels of any length.
char label[];
This field is present only if has_var_label is set to 1. It has length label_len, rounded up to the nearest multiple of 32 bits. The first label_len characters are the variable’s variable label.
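The padding rule for the label can be sketched as below. The helper `read_label` is hypothetical. It assumes little-endian byte order by default and treats `label_len` as a byte count, which matches the documented meaning only for single-byte encodings.

```python
import struct


def read_label(buf, offset, byte_order="<"):
    """Read label_len and label; return (label_bytes, next_offset)."""
    (label_len,) = struct.unpack_from(f"{byte_order}i", buf, offset)
    start = offset + 4
    # The label field is padded to a multiple of 32 bits (4 bytes).
    padded_len = (label_len + 3) // 4 * 4
    label = buf[start:start + label_len]
    return label, start + padded_len
```

The returned offset points just past the padding, which is where `missing_values` begins when `n_missing_values` is nonzero.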
flt64 missing_values[];
This field is present only if n_missing_values is nonzero. It has the same number of 8-byte elements as the absolute value of n_missing_values. Each element is interpreted as a number for numeric variables (with HIGHEST and LOWEST indicated as described in the chapter introduction). For string variables of width less than 8 bytes, elements are right-padded with spaces; for string variables wider than 8 bytes, only the first 8 bytes of each missing value are specified, with the remainder implicitly all spaces.
For discrete missing values, each element represents one missing value. When a range is present, the first element denotes the minimum value in the range, and the second element denotes the maximum value in the range. When a range plus a value are present, the third element denotes the additional discrete missing value.
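For a numeric variable, the interpretation of `n_missing_values` and the elements of `missing_values` might be sketched as follows. The function name and the returned dictionary shape are hypothetical, little-endian byte order is assumed by default, and the HIGHEST and LOWEST sentinels are returned as raw floats rather than decoded.

```python
import struct


def decode_numeric_missing_values(n_missing_values, payload, byte_order="<"):
    """Split missing_values into discrete values and an optional range."""
    if n_missing_values not in (0, 1, 2, 3, -2, -3):
        raise ValueError(f"invalid n_missing_values: {n_missing_values}")
    count = abs(n_missing_values)
    values = struct.unpack_from(f"{byte_order}{count}d", payload) if count else ()
    if n_missing_values >= 0:
        # One to three discrete values, no range.
        return {"discrete": list(values), "range": None}
    # Negative count: first two elements are the range minimum and maximum;
    # with -3, the third element is an extra discrete value.
    return {"discrete": list(values[2:]), "range": (values[0], values[1])}
```

With `n_missing_values` set to -3 and elements 1.0, 5.0, and 9.0, this yields the range (1.0, 5.0) plus the discrete value 9.0.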
The print and write members of sysfile_variable are output formats coded into int32 types. The least-significant byte of the int32 represents the number of decimal places, and the next two bytes in order of increasing significance represent field width and format type, respectively. The most-significant byte is not used and should be set to zero.
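The byte packing described above can be illustrated with a pair of small helpers. The names `decode_format` and `encode_format` are hypothetical, and the value is assumed to have already been read as an integer in the file's byte order.

```python
def decode_format(spec):
    """Split a print/write int32 into (format_type, width, decimals)."""
    decimals = spec & 0xFF          # least-significant byte
    width = (spec >> 8) & 0xFF      # next byte
    fmt_type = (spec >> 16) & 0xFF  # third byte; top byte is unused
    return fmt_type, width, decimals


def encode_format(fmt_type, width, decimals):
    """Pack (format_type, width, decimals) back into an int32 value."""
    return (fmt_type << 16) | (width << 8) | decimals
```

For instance, a format of type 5 with width 8 and 2 decimal places is stored as 0x050802.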
Format types are defined as follows:
Value  Meaning
0      Not used.
1      A
2      AHEX
3      COMMA
4      DOLLAR
5      F
6      IB
7      PIBHEX
8      P
9      PIB
10     PK
11     RB
12     RBHEX
13     Not used.
14     Not used.
15     Z
16     N
17     E
18     Not used.
19     Not used.
20     DATE
21     TIME
22     DATETIME
23     ADATE
24     JDATE
25     DTIME
26     WKDAY
27     MONTH
28     MOYR
29     QYR
30     WKYR
31     PCT
32     DOT
33     CCA
34     CCB
35     CCC
36     CCD
37     CCE
38     EDATE
39     SDATE
40     MTIME
41     YMDHMS
A few system files have been observed in the wild with invalid write fields, in particular with value 0. Readers should probably treat invalid print or write fields as some default format.
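One way a reader might apply such a fallback is sketched below. The choice of defaults (type 5 with width 8 and 2 decimals for numeric variables, type 1 with the variable's own width for strings) is an assumption made for this example and is not specified by the format. Likewise `format_or_default` is a hypothetical name, and only the format type is validated.

```python
# Format type codes that the table above does not mark as "Not used".
VALID_FORMAT_TYPES = set(range(1, 13)) | {15, 16, 17} | set(range(20, 42))


def format_or_default(spec, var_width):
    """Return (format_type, width, decimals), falling back on a default.

    var_width is 0 for a numeric variable, else the string width.
    """
    fmt_type = (spec >> 16) & 0xFF
    if fmt_type in VALID_FORMAT_TYPES:
        return fmt_type, (spec >> 8) & 0xFF, spec & 0xFF
    if var_width == 0:
        return 5, 8, 2         # assumed numeric default
    return 1, var_width, 0     # assumed string default
```

A stricter reader could also reject a format type that does not fit the variable, such as a string format on a numeric variable, but that check is left out here.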