{"id":213,"date":"2020-10-29T15:22:53","date_gmt":"2020-10-29T13:22:53","guid":{"rendered":"http:\/\/jvdl.me\/wordpress\/?p=213"},"modified":"2020-10-29T15:28:30","modified_gmt":"2020-10-29T13:28:30","slug":"searching-for-a-string-in-an-entire-sas-library","status":"publish","type":"post","link":"https:\/\/jvdl.me\/blog\/?p=213","title":{"rendered":"Searching for a string in an entire SAS library"},"content":{"rendered":"\n<p>For the most part, experienced SAS programmers know where to look for the source data they need. In the pharmaceutical industry, we are familiar with CDISC standards and data structures.<\/p>\n\n\n\n<p>However, should the data standard be unfamiliar or the source datasets include new or unusual parameters, it may be prudent to have SAS look through the data on your behalf to save time.<\/p>\n\n\n\n<p>We can do this by making use of PROC CONTENTS and SAS macro loops with the forward re-scan rule. If you&#8217;re not familiar with the forward re-scan rule, see my other blog post covering the topic. <\/p>\n\n\n\n<p>First, let us define the two input parameters we&#8217;ll need: the library we&#8217;re delving into, and the character string we&#8217;re looking for. These are the only two parameters the end user needs to edit.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"sas\" class=\"language-sas line-numbers\">\/*Search parameters - LIBRARY and text to search for*\/\n%let lib = %str(sashelp);\n%let searchtext = %str(cholesterol);<\/code><\/pre>\n\n\n\n<p>Next, we&#8217;ll find all the datasets in the library specified above by running PROC CONTENTS and packing the results into iterable macro variables. In this case we can allocate a maximum of 999 datasets, but you can increase this value as needed. <\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"sas\" class=\"language-sas line-numbers\">\/*Find all datasets within the library and allocate into numbered macro variables DS1, DS2, ...*\/\nproc contents data=&amp;lib.._all_ out=members noprint;\nrun;\n\nproc sql noprint;\n    select distinct memname into :ds1-:ds999 from members;\nquit;\n%let ds_num = &amp;sqlobs.;<\/code><\/pre>\n\n\n\n<p>Specify an empty results dataset. As matches are found during iteration, they will be appended to this dataset.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"sas\" class=\"language-sas line-numbers\">\/*Clear the results dataset. As matches are found, they will be appended here.*\/\ndata results;\n    set _null_;\nrun;<\/code><\/pre>\n\n\n\n<p>Next we will loop over the datasets allocated as DS1, DS2, DS&#8230; above. The logical outline of the loop is as follows:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"sas\" class=\"language-sas line-numbers\">\/*The macro logic for looping over each dataset in DS1, DS2, ...*\/\n%macro loop;\n    %do i = 1 %to &amp;ds_num.;\n        \/*For each dataset in the library, output its metadata contents*\/\n\n        \/*Check the metadata for matches of the search text. This includes the dataset name, as well as the column names and labels.*\/\n\n        \/*Check the actual data values for matches of the search text.*\/\n\n        \/*For the current dataset, set the metadata and row value results.*\/\n\n        \/*For the output dataset append the results of each loop iteration.*\/\n    %end;\n%mend loop;\n%loop;<\/code><\/pre>\n\n\n\n<p>For each step above, detailed logic follows below:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"sas\" class=\"language-sas line-numbers\">\/*For each dataset in the library, output its metadata contents*\/\nproc contents data=&amp;lib..&amp;&amp;ds&amp;i out=c noprint;\nrun;<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"sas\" class=\"language-sas line-numbers\">\/*Check the metadata for matches of the search text. This includes the dataset name, as well as the column names and labels.*\/\ndata metadata;\n    length msg $500. dataset varname varlabel $100.;\n    set c;\n            \n    \/*Dataset name checking*\/\n    if index(upcase(memname), upcase(\"&amp;searchtext.\")) > 0 then do;\n        msg = \"Dataset: Found &amp;searchtext.\";\n        dataset = \"&amp;&amp;ds&amp;i\";\n        varname = \"\";\n        varlabel = \"\";\n        output;\n    end;\n\n    \/*Column name and label checking*\/\n    if index(upcase(name), upcase(\"&amp;searchtext.\")) > 0 or index(upcase(label), upcase(\"&amp;searchtext.\")) > 0 then do;\n        msg = \"Col Meta: Found &amp;searchtext.\";\n        dataset = \"&amp;&amp;ds&amp;i\";\n        varname = name;\n        varlabel = label;\n        output;\n    end;\n            \n    keep dataset varname varlabel msg;\n            \n    \/*Get rid of duplicate messages in the case of multiple identical matches.*\/\n    proc sort nodupkey; by msg dataset varname varlabel;\nrun;<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"sas\" class=\"language-sas line-numbers\">\/*Check the actual data values for matches of the search text.*\/\ndata actualdata;\n    length msg $500. dataset varname varlabel varval $100.;\n    set &amp;lib..&amp;&amp;ds&i;\n            \n    \/*Create an array of all character variables in the current dataset.*\/\n    array allChar _CHARACTER_;\n\n    \/*Loop over each character column and check for matches of the search text.*\/\n    do over allChar;\n        if index(upcase(allchar), upcase(\"&amp;searchtext.\")) > 0 then do;\n            msg = \"Value: Found &amp;searchtext.\";\n            dataset = \"&amp;&amp;ds&amp;i\";\n            varname = vname(allchar);\n            varlabel = vlabel(allchar);\n            varval = allchar;\n            output;\n        end;\n    end;\n    keep dataset varname varlabel varval msg;\n            \n    \/*Get rid of duplicate messages in the case of multiple identical matches.*\/\n    proc sort nodupkey; by msg dataset varname varlabel varval;\nrun;<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"sas\" class=\"language-sas line-numbers\">\/*For the current dataset, set the metadata and row value results.*\/\ndata allmsgs;\n    set metadata actualdata;\n    msg = upcase(msg);\nrun;<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"sas\" class=\"language-sas line-numbers\">\/*For the output dataset, OUT, keep appending the results of each loop iteration.*\/\ndata results;\n    set \n        results \n        allmsgs\n    ;\nrun;<\/code><\/pre>\n\n\n\n<p>The steps above conclude the inner logic of the loop. Finally, outside the loop, we can clean up the datasets we no longer need.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"sas\" class=\"language-sas line-numbers\">\/*Clean up*\/\nproc datasets library=work nolist;\n    delete members c metadata actualdata allmsgs;\nquit;<\/code><\/pre>\n\n\n\n<p>We started off with the SASHELP library, looking for the string &#8220;cholesterol&#8221;. These are the results:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"976\" height=\"251\" src=\"http:\/\/jvdl.me\/wordpress\/wp-content\/uploads\/2020\/10\/find_string_sas.png\" alt=\"\" class=\"wp-image-222\" srcset=\"https:\/\/jvdl.me\/blog\/wp-content\/uploads\/2020\/10\/find_string_sas.png 976w, https:\/\/jvdl.me\/blog\/wp-content\/uploads\/2020\/10\/find_string_sas-300x77.png 300w, https:\/\/jvdl.me\/blog\/wp-content\/uploads\/2020\/10\/find_string_sas-150x39.png 150w, https:\/\/jvdl.me\/blog\/wp-content\/uploads\/2020\/10\/find_string_sas-768x198.png 768w, https:\/\/jvdl.me\/blog\/wp-content\/uploads\/2020\/10\/find_string_sas-604x155.png 604w\" sizes=\"auto, (max-width: 976px) 100vw, 976px\" \/><\/figure>\n\n\n\n<p>Since we are searching in SASHELP, the results also include values from the SAS helper datasets, i.e., VCOLUMN and VMACRO. <\/p>\n\n\n\n<p>This search function is not case sensitive. Optimise and implement as needed for your particular scenario.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>For the most part, experienced SAS programmers know where to look for the source data they need. In the pharmaceutical industry, we are familiar with CDISC standards and data structures. However, should the data standard be unfamiliar or the source datasets include new or unusual parameters, it may be prudent to have SAS look through [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[53,54],"tags":[56,57,59,60,55,58],"class_list":["post-213","post","type-post","status-publish","format-standard","hentry","category-programming","category-sas","tag-datasets","tag-find","tag-libraries","tag-library","tag-sas","tag-string"],"_links":{"self":[{"href":"https:\/\/jvdl.me\/blog\/index.php?rest_route=\/wp\/v2\/posts\/213","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/jvdl.me\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/jvdl.me\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/jvdl.me\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/jvdl.me\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=213"}],"version-history":[{"count":13,"href":"https:\/\/jvdl.me\/blog\/index.php?rest_route=\/wp\/v2\/posts\/213\/revisions"}],"predecessor-version":[{"id":227,"href":"https:\/\/jvdl.me\/blog\/index.php?rest_route=\/wp\/v2\/posts\/213\/revisions\/227"}],"wp:attachment":[{"href":"https:\/\/jvdl.me\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=213"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/jvdl.me\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=213"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/jvdl.me\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=213"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}