Jan's Blog

Programming, SAS

SAS LOCF For Multiple Variables


It is often necessary to replace missing measurements with the closest, previous measurement. This technique is referred to as LOCF (last observation carried forward).

In this example, we will create a dataset with 4 columns: subject ID, visit number, body weight, and systolic blood pressure.

data have;
1 10 85 125
1 20 84 .
1 30 86 .
1 40 . 130
1 50 . 128
1 60 85 .
2 10 . 110
2 20 90 .
2 30 91 .
2 40 91 123
2 50 . .
2 60 . 130
Input dataset, with subject ID, visit number, body weight, and systolic blood pressure

Thereafter, we will sort the dataset to ensure it is in the order we expect. Never assume that your input will be appropriately sorted.

proc sort data=have;
	by id visit;

Now we will define 2 macro variables, which are simply lists of variables. The first contains the original variables available in the dataset, which will not be altered, and the second names the variables which will contain the LOCF values.

%let origvars = %str(WT  SBP);
%let locfvars = %str(WT_ SBP_);

This brings us to our final block of code. The lists of variables defined above are loaded into arrays and a loop performs the LOCF operation across all the variables defined.

data want(drop = j);
	set have;
	by id visit;

	/*Create arrays of the variable lists*/
	array orig[*] &origvars.;
	array locf[*] &locfvars.;
	retain 	      &locfvars.;

	do j = 1 to dim(orig);
		if first.id then do;
			/*Set a placeholder value for initial missings*/
			if orig(j) = . then locf(j) = -99;

		/*Replace retained value with latest non-missing value*/
		if not missing(orig(j)) then locf(j) = orig(j);
The final dataset with LOCF’ed variables, WT_ and SBP_
Back to top