Creating a temporary SAS array of dynamic size

Arrays in SAS are incredibly useful things. For example, if you’re dealing with a concomitant medications dataset and want to check for certain medicines across multiple columns, you’d be hard-pressed to find a faster method than using arrays!

Within the SDTM.CM domain, medication names are spread across several columns, usually: CMTRT (Reported Name of Drug, Med, or Therapy), CMMODIFY (Modified Reported Name), and CMDECOD (Standardized Medication Name).

If we wanted to find and flag, for example, the following three medicines: Aspirin, Antacid, Potassium Chloride ; we could do it as follows:

data cm;

    array all_meds {*} $ cmtrt cmmodify cmdecod;
    array check_meds {3} $ _temporary_ ("aspirin", "antacid", "potassium chloride");

    do i = 1 to dim(all_meds);
        do j = 1 to dim(check_meds);
            if upcase(all_meds[i]) = upcase(check_meds[j]) then occur = "Y";

    if occur ^= "Y" then delete;

The above method works well to quickly check for multiple conmeds across multiple columns. However, usually a conmeds list is not restricted to only 3 items and counting the number of unique items is a slow and tedious process. If an item needs to be added or removed, then the space to be reserved needs to be updated.

Unfortunately specifying the list of conmeds in a temporary array prevents the use of the dynamic sizing, usually indicated by {*}. This is something that will hopefully be fixed in a newer version of SAS, but until such time I’ve taken it upon myself to create a macro which creates dynamically sized (sort of!) temporary arrays. It does this by counting the number of items and automatically reserving the space for it.

Note: if you are feeling lazy, you can still create a temporary array and simply oversize it, e.g., specify a size of 100 items even though you may only need half that. However, this will result in SAS posting a WARNING to the log about partial array initialization, which is not ideal in the pharmaceutical environment where we want clean logs!

If you’d prefer to avoid another O(n) loop which checks for the maximum length needed for character variables, simply set a static length, say $20.

    Macro makeTempArray
    Purpose: To create temporary arrays without knowing the
             size of the array needed beforehand. This is a 
             limitation of the original SAS procedure for
             creating temporary arrays.
        arrayname : an arbitrary name for your array
        ischar    : pass either Y for a character array 
                    or N for numeric
        items     : pass the list of items to be contained in
                    the array, wrapped in %str() and separated
                    by commas

%macro makeTempArray(arrayname=, ischar=, items=);
    %let n=%sysfunc(countw(&items., %str(,), )); /*count the number of items to reserve space for*/
    %if &ischar.=Y %then %do; /*if this is a character array, we need the length of the longest item*/ 
        %let l = 1;
        %do j = 1 %to &n.;
            %let l0 = %sysfunc(length(%sysfunc(scan(&items., &j., %str(,), r))));
            %if  &l0. > &l. %then %let l = &l0.;
    array &arrayname. {&n.} %if &ischar.=Y %then $&l.; _temporary_ (
        %do i = 1 %to &n.;
            %let item = %sysfunc(scan(&items., &i., %str(,), r));
            %if &ischar.=Y %then %str("&item." ); %else &item.;
%mend makeTempArray;

With this macro, we can now modify our initial starting block of code as follows (items must be separated by commas):

data cm;

    array all_meds {*} $ cmtrt cmmodify cmdecod;
    %makeTempArray(arrayname=%str(check_meds), ischar=%str(Y), items=%str(aspirin, antacid, potassium chloride));

    do i = 1 to dim(all_meds);
        do j = 1 to dim(check_meds);
            if upcase(all_meds[i]) = upcase(check_meds[j]) then occur = "Y";

        if occur = "Y" then leave;

    if occur ^= "Y" then delete;

Hope this helps you next time you need to cross-check multiple items across multiple columns! Happy hacking!

Updates: My colleague, Mazi Ntintelo has rightly pointed out that the commas within the macro’s scan functions should be wrapped as %str(,) and also that the do loop checking for conmeds can be optimised with a leave statement. Thanks, Mazi!

Significant Figures in SAS

For three significant figures, the SAS Institute provides the following code snippet to accomplish the task.

However, it is often useful to round to more or less than 3 significant figures. I’ve developed a macro to do so for my own use and am sharing the code below.

%macro _nsigfig(varin=, varout=, n=);
if &varin. = 0 then &varout. = 0;
else do;
	if int(&varin.) ^= 0 then do;
	    &varout. = round(&varin., 10**(int(log10(abs(&varin.))) + (1 - &n.)));
	else do;
	    &varout. = round(&varin., 10**(-1*(abs(int(log10(abs(&varin.)))) + &n.)));
%mend _nsigfig;

Searching for a string in an entire SAS library

For the most part, experienced SAS programmers know where to look for the source data they need. In the pharmaceutical industry, we are familiar with CDISC standards and data structures.

However, should the data standard be unfamiliar or the source datasets include new or unusual parameters, it may be prudent to have SAS look through the data on your behalf to save time.

We can do this by making use of PROC CONTENTS and SAS macro loops with the forward re-scan rule. If you’re not familiar with the forward re-scan rule, see my other blog post covering the topic.

First, let us define the two input parameters we’ll need: the library we’re delving into, and the character string we’re looking for. These are the only two parameters the end user needs to edit.

/*Search parameters - LIBRARY and text to search for*/
%let lib = %str(sashelp);
%let searchtext = %str(cholesterol);

Next, we’ll find all the datasets in the library specified above by running PROC CONTENTS and packing the results into iterable macro variables. In this case we can allocate a maximum of 999 datasets, but you can increase this value as needed.

/*Find all datasets within the library and allocate into numbered macro variables DS1, DS2, ...*/
proc contents data=&lib.._all_ out=members noprint;

proc sql noprint;
    select distinct memname into :ds1-:ds999 from members;
%let ds_num = &sqlobs.;

Specify an empty results dataset. As matches are found during iteration, they will be appended to this dataset.

/*Clear the results dataset. As matches are found, they will be appended here.*/
data results;
    set _null_;

Next we will loop over the datasets allocated as DS1, DS2, DS… above. The logical outline of the loop is as follows:

/*The macro logic for looping over each dataset in DS1, DS2, ...*/
%macro loop;
    %do i = 1 %to &ds_num.;
        /*For each dataset in the library, output its metadata contents*/

        /*Check the metadata for matches of the search text. This includes the dataset name, as well as the column names and labels.*/

        /*Check the actual data values for matches of the search text.*/

        /*For the current dataset, set the metadata and row value results.*/

        /*For the output dataset append the results of each loop iteration.*/
%mend loop;

For each step above, detailed logic follows below:

/*For each dataset in the library, output its metadata contents*/
proc contents data=&lib..&&ds&i out=c noprint;
/*Check the metadata for matches of the search text. This includes the dataset name, as well as the column names and labels.*/
data metadata;
    length msg $500. dataset varname varlabel $100.;
    set c;
    /*Dataset name checking*/
    if index(upcase(memname), upcase("&searchtext.")) > 0 then do;
        msg = "Dataset: Found &searchtext.";
        dataset = "&&ds&i";
        varname = "";
        varlabel = "";

    /*Column name and label checking*/
    if index(upcase(name), upcase("&searchtext.")) > 0 or index(upcase(label), upcase("&searchtext.")) > 0 then do;
        msg = "Col Meta: Found &searchtext.";
        dataset = "&&ds&i";
        varname = name;
        varlabel = label;
    keep dataset varname varlabel msg;
    /*Get rid of duplicate messages in the case of multiple identical matches.*/
    proc sort nodupkey; by msg dataset varname varlabel;
/*Check the actual data values for matches of the search text.*/
data actualdata;
    length msg $500. dataset varname varlabel varval $100.;
    set &lib..&&ds&i;
    /*Create an array of all character variables in the current dataset.*/
    array allChar _CHARACTER_;

    /*Loop over each character column and check for matches of the search text.*/
    do over allChar;
        if index(upcase(allchar), upcase("&searchtext.")) > 0 then do;
            msg = "Value: Found &searchtext.";
            dataset = "&&ds&i";
            varname = vname(allchar);
            varlabel = vlabel(allchar);
            varval = allchar;
    keep dataset varname varlabel varval msg;
    /*Get rid of duplicate messages in the case of multiple identical matches.*/
    proc sort nodupkey; by msg dataset varname varlabel varval;
/*For the current dataset, set the metadata and row value results.*/
data allmsgs;
    set metadata actualdata;
    msg = upcase(msg);
/*For the output dataset, OUT, keep appending the results of each loop iteration.*/
data results;

The steps above conclude the inner logic of the loop. Finally, outside the loop, we can clean up the datasets we no longer need.

/*Clean up*/
proc datasets library=work nolist;
    delete members c metadata actualdata allmsgs;

We started off with the SASHELP library, looking for the string “cholesterol”. These are the results:

Since we are searching in SASHELP, the results also include values from the SAS helper datasets, i.e., VCOLUMN and VMACRO.

This search function is not case sensitive. Optimise and implement as needed for your particular scenario.

NONMEM ADDL Calculation and Compression in SAS

ADDL represents the number of additional doses that are copies of the current row, with the time since first dose (RTFD) increased at the regular dosing interval (II). This allows for the compression of dose records in the NONMEM dataset. To put it more eloquently:

The NONMEM data item ADDL on a dose record expresses the number of additional implicit doses that should follow at a regular interval II. In the case where explicit doses exist, ADDL supports compacting them into ADDL/II notation.

The gist of calculating ADDL is the following:

  • Determine an allowed window between doses – if dosing occurs outside of this window (too soon or too late) the information should not be compressed and the modeller should be aware of it
  • Group and sort the data by subject ID, treatment, date of exposure, and dose
  • Assign a sequence number starting from 1 for each treatment (A, B, C, …) per subject
  • Determine which doses are “must-list”; the first record of every subject should be listed and any exposure attribute changes or exposure delays should be listed
  • Filter the exposure records to keep only the “must-list” data and other data required at the modeller’s discretion, e.g., the exposure records linked to PK samples
  • After filtering, the differences in SEQ values are the ADDL values

For this example, we will focus on compressing explicit doses and we will stick to date-based dosing only. The principles can be easily adjusted for date-and-time-based exposure data. Instead of defining the dosing interval (II) as 24h, we will just define it as 1 day in a macro variable.

%let allowed_dose_delay = 1;

The exposure data will be defined across two subjects, with multiple treatments, multiple dose levels, and occasional gaps between doses exceeding the 1 day window.

data ex;
	format date date9.;
	input usubjid trt $ dose date : date9.;
1 A 50 01JAN1990
1 B 75 02JAN1990
1 A 50 02JAN1990
1 B 100 03JAN1990
1 A 50 03JAN1990
1 B 100 04JAN1990
1 A 50 05JAN1990
1 A 100 06JAN1990
1 B 50 06JAN1990
1 A 100 07JAN1990
1 A 100 08JAN1990
1 A 100 09JAN1990
2 B 50 01JAN1990
2 A 75 02JAN1990
2 B 50 02JAN1990
2 A 100 03JAN1990
2 B 50 03JAN1990
2 B 100 04JAN1990
2 B 50 05JAN1990
2 A 100 06JAN1990
2 B 50 06JAN1990
2 A 100 07JAN1990

proc sort data=ex out=addl_pre;
	by usubjid trt date dose;

After the initial dataset has been populated and sorted, a unique sequence number starting from 1 should be assigned to each treatment within each subject. In addition, a “must_list” variable will be populated with “Y” if:

  • the first record for the subject is encountered
  • a new treatment within subject is encountered
  • the dose level within a treatment was changed
  • a delay in dosing longer or shorter than the allowed delay window was encountered
data addl1;
	set addl_pre;
	by usubjid trt date dose;

	retain seq 1;

	if first.usubjid or first.trt then do;
		must_list = "Y";
		seq = 1;
	else do;
		seq + 1;

	if trt ^= lag(trt) then must_list = "Y";
	if dose ^= lag(dose) then must_list = "Y";
	if date - lag(date) ^= &allowed_dose_delay. then must_list = "Y";

The maximum sequence per subject and treatment should be determined to calculate ADDL for the final, must-list exposure records.

proc sql noprint;
	create table addl2 as
		select *, max(seq) as max_seq
		from addl1
		group by usubjid, trt
		order by usubjid, trt, date

The exposure events are filtered to keep only the “must-list” records. The remaining records will be compressed into the ADDL counter. With real-world data, you may want to add any record linked to PK sampling to the “must-list” group. This can be done by linking the exposure date and time with the sample reference dose date and time, e.g., PC.PCRFTDT.

data addl3;
	set addl2(where=(must_list = "Y"));

Reverse sort the dataset to allow peeking at the next sequence records.

proc sort data=addl3 out=addl4_pre; 
	by descending usubjid descending trt descending date descending seq;

data addl4;
	set addl4_pre;
	by descending usubjid descending trt descending date descending seq;

	next_seq = lag(seq);
	if usubjid ^= lag(usubjid) or trt ^= lag(trt) then next_seq = .;

	proc sort; by usubjid trt date seq;

Finally, ADDL is calculated as the difference between MAX_SEQ and SEQ in the case of the final record, otherwise (NEXT_SEQ – (SEQ + 1)) to ensure we have exclusive bounds between the current and next exposure sequence numbers.

data addl5;
	set addl4;

	if missing(next_seq) then addl = max_seq - seq;
	else addl = next_seq - (seq + 1);

	keep usubjid trt date dose addl;
While not explicitly specified here, our data has II defined as 24 hours or 1 day. In real-world data an II column would be visible in the data.

Provided that your data is well-structured and clean, compressing the exposure records and calculating ADDL is a straightforward task!

SAS and the Forward Re-scan Rule

The “Forward Re-scan Rule” (FRR) is used by SAS to resolve macro variables over several passes. This is especially useful when having one macro variable point to another macro variable, or when trying to resolve numbered macro variables.

The SAS Advanced Prep Guide summarises the FRR as follows:

  • When multiple ampersands or percent signs precede a name token, the macro processor resolves two ampersands (&&) to one ampersand (&), and re-scans the reference.
  • To re-scan a reference, the macro processor scans and resolves tokens from left to right from the point where multiple ampersands or percent signs are coded, until no more triggers can be resolved.

Example: numbered list of macro variables

To illustrate this example for a numbered list of macro variables, we can load each unique car manufacturer from SASHELP.CARS into a unique macro variable:

data cars;

proc sql;
	select distinct make into :car1-:car999
	from cars;

%put Number of obs = &sqlobs;

SAS will not create more macro variables than necessary. We have accounted for the possibility of 999 distinct manufacturers, but in reality the dataset contains only 38. SAS will only reserve the variables car1 to car38.

We can loop through the variables we’ve just created, by using the FRR. To further illustrate, also turn on the MPRINT, MLOGIC, and SYMBOLGEN options. Note the use of the double ampersand (&&).

options mprint mlogic symbolgen;
%macro printCars;
	%do i = 1 %to &sqlobs;
		%put &&car&i;
%mend printCars;
 MLOGIC(PRINTCARS):  %DO loop beginning; index variable I; start value is 1; stop value is 38; by value is 1.  
 SYMBOLGEN:  && resolves to &.
 SYMBOLGEN:  Macro variable I resolves to 1
 SYMBOLGEN:  Macro variable CAR1 resolves to Acura

From the log output, we can trace the FRR resolution as follows:

  1. &&car&i
  2. &car1
  3. Acura

Example: nested macro variables

Suppose we declare the following macro variables:

%let one = two;
%let two = three;
%let three = one;

To test your understanding of the FRR, can you accurately predict the resolution of these macro variables?

%put &one;
%put &&one;
%put &&&one;
%put &&&&one;
%put &&&&&one;
%put &&&&&&one;
%put &&&&&&&one;
%put &&&&&&&&one;
%put &&&&&&&&&one;
%put &&&&&&&&&&one;

The FRR will process from left to right. Any double ampersand (&&) will be resolved to a single ampersand (&) and any instances of a single ampersand will be resolved.

Let’s work through two examples together.

%put &&&&&one;

To better organise our desk-checking of the code, we can rewrite it in a more human-readable, and -friendly format:

&& && &one;

Each double ampersand (&&) resolves to a single ampersand, and the remaining single ampersand and macro reference is resolved.

& & two;

The remaining two ampersands are resolved to a single ampersand:


Which resolves to:


And upon checking the SAS log, we can see our result has been confirmed:

%put &&&&&one;

As a final example, let’s work through:

%put &&&&&&&&one;

Organise the ampersands into a more human-readable format, group and resolve double ampersands, and finally resolve remaining single ampersands:

&& && && && one;
& & & & one;     compress ==>  && && one;
& & one;         compress ==>  && one;

Our result is once again confirmed by the SAS log:

%put &&&&&&&&one;