This blog is mainly used to take notes related to SAS.

DECLARE: All stories and plots mentioned in this article is made-up, including names, locations, time and etc.

Table

June

20210616

CALL SYMPUT: Assigns a value produced in a DATA step to a macro variable.

CALL SYMPUT(macro-variable, value);

Link

I have shared one of my frind G’s career stories here before, when I just wrote this blog. But out of respect for his privacy, I decided to delete it.

There are more than that, but I guess G is too tired to share(complain) his career bullying story with me anymore. I believe he knows complaining can’t change anything, it’s much better to improve and arm ourself in order to get rid of the circle that we are unsatisfied with.

“Keep learning” Might He think.

Grow faster, grow stronger.

20210617

%sysfunc(): Execute SAS functions or user-written functions.

%SYSFUNC(function(argument(s))<, format>);

where function is the name of the function to execute. This function can be a SAS function, a function written with SAS/TOOLKIT software, or a function created using the FCMP Procedure in Base SAS Procedures Guide. The function CANNOT be a macro function. format is an optional format to apply to the result of function. This format can be provided by SAS, generated by PROC FORMAT, or created with SAS/TOOLKIT. There is no default value for format. If you do not specify a format, the SAS macro facility does not perform a format operation on the result and uses the default of the function.

PWD: Print Working Directory

SYSGET: function, returns the value of the specified operating-environment variable.

GETOPTION: function, returns the value of a SAS system or graphics option.

%EVAL: Evaluates arithmetic and logical expressions using integer arithmetic.

%EVAL operates by converting its argument from a character value to a numeric or logical expression. Then, it performs the evaluation. Finally, %EVAL converts the result back to a character value and returns that value.

libname … INENCODING=, OUTENCODING=: The INENCODING= option is used to read SAS data sets in the SAS library. The OUTENCODING= option is used to write SAS data sets in the SAS library.

OPTIONS MISSING = System option: Specifies the character to print for missing numeric values.

OPTIONS NONUMBER: Suppress page numbers.

OPTIONS NOTES: Specifies whether notes are written to the SAS log.

OPTIONS SOURCE: Specifies whether SAS writes source statements to the SAS log.

Similarly for arguments including mprint and mlogic.

OPTIONS FMTSEARCH: Specifies the order in which format catalogs are searched.

OPTIONS SASAUTOS: Specifies the location of one or more autocall libraries.

options ps = &stndrdps ls =&stndrdls nodate center nonumber missing = " ";
options notes source mprint mlogic symbolgen=yes msglevel=i  validvarname=v7;
options fmtsearch=(raw tab ana) sasautos = ('!SASROOT/sasautos/' "&wkdir/macros/" "&wkdir/macros/utilities/");

20210618

Remember CALL SYMPUT: Assigns a value produced in a DATA step to a macro variable.

CALL SYMPUT(macro-variable, value);

Now, CALL SYMPUTX: Assigns a value to a macro variable, and removes both leading and trailing blanks.

CALL SYMPUTX(macro-variable, value <, symbol-table>);

which is doing the same work as

call symput(macro, trim(left(value)));

NOBS: a SAS automatic variable which contains the number of records in the data set named in the set statement.

data _NULL_;
	if 0 then set sashelp.class nobs=n;
	call symputx('nrows',n);
	stop;
run;
%put nobs=&nrows;

PUT: Use the PUT function to convert a numeric value to a character value. You cannot use the PUT function to directly change the type of variable in a data set from numeric to character. However, you can create a new character variable as the result of the PUT function.

INPUT: The INPUT function enables you to convert the value of source by using a specified informat. The informat determines whether the result is numeric or character. Use INPUT to convert character values to numeric values or other character values.

var1 = PUT(source, format.);
var2 = INPUT(source, <? | ??> informat.);

20210621

Sick.

20210622

best.: SAS chooses the best notation when a format is not specified for writing a numeric value. For example,

var = put(x, best.);

name: appears when using a tranpose step, so let’s talk about the transpose step.

20210623

PROC TRANSPOSE:

STRIP: Returns a character string with all leading and trailing blanks removed.

Put proc sort into a data step

data ...;

statements;

proc sort; 
by ...;

run;

20210624

index(source, string);: Searches a character expression for a string of characters, and returns the position of the string’s first character for the first occurrence of the string.

20210625

SO FAR SO GOOD.

20210628

Array: a list of variables(columns) in SAS. See here.

PROC COMPARE: Compares the contents of two SAS data sets, selected variables in different data sets, or variables within the same data set. See here.

libname proclib 'SAS-library';
options nodate pageno=1 linesize=80 pagesize=40;
proc compare base=proclib.one compare=proclib.two nosummary;
   var gr1;
   with gr2;
   title 'Comparison of Variables in Different Data Sets';
run;

20210629

I have been struggled with three variables for a whole day, they are VISIT, VISITNUM and VISITDY. I am still confused with how to program these three variables when I wrote this line. There are several documents/sheets involving, I need to refer to different document from time to time… And then I lost myself. It could be something wrong with the guide, or more possibly, I sucked.

Okay, anyway, let me reorganize my mind. I have several resoueces here:

  • The original dataset sheet (IE/PE), which requires me to add these three variables
  • The METHODS sheet told me how to program VISIT
  • The (TV) sheet should be referred to generate VISIT
  • Some other datasets including TV and SV

Now, when I worked on the VISITNUM & VISITDY, the programming notes said “Assign from MAPPING.TV.VISITNUM for Schedule visits for Unschedule visits mapp from MAPPING.SV.VISITNUM”. So I had my first question here, how could I tell whether it is schedule or not? I didn’t see any variable regarding this, if my memory is correct…

Okay let’s talk about the VISIT variable, like I said, the METHODS sheet is used to program VISIT, but I met the same problem like I mentioned above, what’s schedule and unscheduled visit?

Besides this, it said “Refer to (TV), set VISIT from FOLDERNAME and INSTANCENAME. “ Then I got confused again… I guess I know what it means generally but I can’t be totally sure about my guess. In a word, I am not very sure about how to assign values to the VISIT variable.

Now I am waiting for my manager’s feedback, hopefully he could help me out.

One of the most challenging things I met so far is the meaning of various terms and abbreviations… Because the project is related to clinical research, it always involves with kinds of trials and treatments. To some extent, those abbreviations bring much more problems for me. Oh, the standard could be referred to CDISC (Clinical Data Interchange Standards Consortium).

20210630

Finally finished the problem I mentioned above… Next step is to work on the CV and PR…

This is the first time that I don’t have an example even the raw data when dealing with a sdtm dataset. I guess I need to spend more time on them for the moment.

July

20210701

Today is Canada Day, as well as CPC(Communist Party of China) Founding Day.

20210702

Jokes on myself. The CV(Cardiovascular System) dataset is based on the raw ECHO(Echocardiogram/MUGA) instead of cv (actually there is no raw data called CV.)

Until just now I figured out several variables only. The worse thing is, the person who are expected to help and lead me with the programming rejected to provide his help. And the person he assigned to me for asking is in a totally opposite working time zone. Well, what a lucky dog as i am!

in operator: note add , among values, such as x in ('a', 'b', 'c');.

20210705

Now it is 3:40 am, and I am working. Yes, because I want to finish these two datasets, but until now, I am still confused about the specs in the document.

And again, the worst thing is, I basically have no one to ask. Honestly, I feel very very very terrible now. These whole bunch of things makes me feel like an idiot.


I was stuck at somewhere early this morning, around 5 am I guess. Then just now, after digging for a while, I found this dataset is not defined in the attrib document, which is used to add labels, order and normalize variables. See, this is another tricky point as well as one of the hardest things in my work, because it’s not guaranteed that everyone’s work is accurate 100%, but usually everyone’s work is dependent on others (although no one rely on mine so far).

Anyway, as a REMINDER: when something wrong with the %m_attr() step in your program but no errors happened before, check the attrib.csv file, some variables may be missed over there.

20210706

Finally I finished those two datasets, but with the reminder from another team member, I found there are still some problems with my previous work…

Anyway, keep working! BTW, to have a chat with that teammate and don’t bother her too much, I worked at midnight again from 11 pm to 12:45 am (July 7th)… I guess next time I should contact her earlier, better before her lunch time. But the result is good. Here is what I learned:

In a raw SDTM dataset, there could be some datetime variables such as –STDAT and –STDAT_RAW (usually the former one is numeric/datetime type and the latter is character), and in the spec, the programming notes for a –STDTC variable could be “set to –STDAT”.

Note, this doesn’t mean you are actually expected to use –STDAT to create the required variable, it could be –STDAT_RAW. Since they essentially stand for the same thing/date, so you really need to look at the macro %miso() and decide which variable you should go with.

20210707

Nothing big deal. Revised my previous work and had a meeting. In the next few days, I think I will spend most of my time learning spec of several datasets including DM TA TE TD TI TS and TV.

20210708

One of the most torturous things as a new statistical programmer at a CRO the is the standards/specifications of various datasets, that is, CDISC.

I am glad there are some good articles online, even those written in Chinese. They definitely help me a lot.

20210712

For some large or complex datasets, we usually need to construct several subsets based on different raw datasets and then merge them together, usually by subject. Under this case, we could first make sure the topic variable and finish general variables and those variables defined in the same dataset. Then based on the different data source, we could finish the rest variables group by group.


OMG I have to say the naming convention of CDISC is soooooo tough. Many of them share the similar abbreviations, although I can find their full names, I am still confused about their relations and differences sometimes. I think this is just due to my lack of knowledge regarding the clinical trials, perhaps.

COMPRESS(source ,< characters> ,< modifier(s)>): Returns a character string with specified characters removed from the original string. Examples:

data one;
   a='AB C D ';
   b=compress(a);
   put b=;
run;

b=ABCD

data _null_;
   x='123–4567–8901 B 234–5678–9012 c';
   y=compress(x, 'ABCD', 'l');
   put y=;
run;

y=123-4567-8901 234-5678-9012

COMPBL(source): Removes multiple blanks from a character string. Note, The COMPRESS function removes every occurrence of the specific character from a string. If you specify a blank as the character to remove from the source string, the COMPRESS function removes all blanks from the source string, The COMPBL function compresses multiple blanks to a single blank and has no effect on a single blank. For example,

data one;
   string='Hey
   Diddle Diddle';
   string=compbl(string);
   put string=;
run;

string=HeyDiddle Diddle

data one;
string='125 E Main St';
length address $10;
address=compbl(string);
put address=;
run;

address=125 E Main

20210713

Fail to save.

20210714

Fail to save.


It turns out I should remind myself to submit it after making any changes everytime :(


20210715 - 20210716

PUT() VS INPUT()

  • PUT() always creates character variables
  • INPUT() can create character or numeric variables based on the informat
  • The source format must match the source variable type in PUT()
  • The source variable type for INPUT() must always be character variables

20210719

Please remember to click Commit changes before leaving next time! PLEASE!

I wrote many things today but since I closed my laptop without saving, they are just gone.

20210720

IFC(logical-expression, value-returned-when-true, value-returned-when-false <, value-returned-when-missing>): Returns a character value based on whether an expression is true, false, or missing.

retain: Causes a variable that is created by an INPUT or assignment statement to retain its value from one iteration of the DATA step to the next.

data ...;
   ...
   array City{3} $ City1-City3;
   array cp{3} Citypop1-Citypop3;
   retain Year Taxyear 1999 City ' ' cp (10000,50000,100000);
   ...
run;

Year and Taxyear are assigned the initial value 1999.
City1, City2, and City3 are assigned missing values.
Citypop1 is assigned the value 10000.
Citypop2 is assigned 50000.
Citypop3 is assigned 100000.


Although I am very very very unpleasant with my current position, it’s still a good chance for me to exercise my mind and tolerance. Keep working, keep asking, please do NOT step back even if your path ahead is full of enemy and challenge.

20210721

From now I will move forward to study the intervention datasets, which reminds me the sad fact that my notes about the special purpose datasets was gone without saving.

20210722

Although I still have some questions about the special purpose datasets, I think I should move forward for now, maybe later I will figure them out at some point.

20210723

Failed to save.

Exposure Domains

Clinical trial study designs can range from open label (where subjects and investigators know which product each subject is receiving) to blinded (where the subject, investigator, or anyone assessing the outcome is unaware of the treatment assignment(s) to reduce potential for bias). To support standardization of various collection methods and details, as well as process differences between open-label and blinded studies, two SDTM domains based on the Interventions General Observation Class are available to represent details of subject exposure to protocol-specified study treatment(s).

  • EXPOSURE (EX): An interventions domain that contains the details of a subject’s exposure to protocol-specified study treatment. Study treatment may be any intervention that is prospectively defined as a test material within a study, and is typically but not always supplied to the subject
  • Exposure as Collected (EC): An interventions domain that contains information about protocol-specified study treatment administrations, as collected.

20210726

When some dataset has a supplement dataset, perhaps take a look at the ValueLevel sheet of it first, so you could prepare the variables required in SUPP– while creating the dataset, which will help you save a lot of time.


cmiss(argument-1 <, argument-2,…, argument-n>): Counts the number of missing arguments.

nmiss(argument-1 <, argument-2,…, argument-n>): Counts the number of missing numeric arguments.

Note: The CMISS function does not convert any argument. The NMISS function converts all arguments to numeric values.

SCAN(string, count <, character-list <, modifier»): Returns the nth word from a character string. (Remember the order of arguments.)

20210727

Procedures (PR): An interventions domain that contains interventional activity intended to have diagnostic, preventive, therapeutic, or palliative effects.

20210728

Failed to save… :(

20210729

Domain Code Domain Name Description
AE Adverse Events An events domain that contains data describing untoward medical occurrences in a patient or subjects that are
administered a pharmaceutical product and which may not necessarily have a causal relationship with the treatment.    
BE Biospecimen Events  
CE Clinical Events An events domain that contains clinical events of interest that would not be classified as adverse events.
DE Device Events  
DS Disposition An events domain that contains information encompassing and representing data related to subject disposition.
DT Device Tracking and Disposition  
DV Protocol Deviations An events domain that contains protocol violations and deviations during the course of the study.
HO Healthcare Encounters A events domain that contains data for inpatient and outpatient healthcare events (e.g., hospitalization, nursing home stay, rehabilitation facility stay, ambulatory surgery).
MH Medical History The medical history dataset includes the subject’s prior history at the start of the trial. Examples of subject medical history information could include general medical history, gynecological history, and primary diagnosis.

Z Format: Writes standard numeric data with leading 0s.

20210730

It is great! I got some different work to do now. They are about TLG(TLF), and my job is to do five listings. Let’s do it!

Well, it seems that I forgot to save my notes, AGAIN…

August

When I worked on the TLGs, I found the most difficult point is to find the corresponding variables among the ADAM datasets.

20210803

I could find the variables required to build the table but now there is something wrong with the proc report process, which is very strange to me.

I checked with another two of my colleagues, they both think finishing 5 listings within two days is a pretty pushing (actually I believe they need to take two or more days to finish one listing at the beginning). Well, what can I say? Nothing. Here I am. I am not sure whether I can finish them before the DDL, but I could promise that I would try my best! Cheers!

OK, having said that. Given so many problems I met when dealing with the first listing assignment, I think I better ignore the bugs at first and focus on the variable selections. After constructing the datasets I could then move forward to solving these programming bugs, since most of them are pretty common I believe!

Great, let do it!


Now it’s 11:23 pm. I decided to add my extra work time to my time sheet. I wanna use the F word here but to be a gentleman, I decide to save it until I leave this company, at least this team.

Btw, because of this new assignment, I ruined my girlfriend party, which is another reason why I don’t think I will stay here for a long time. If I worked so hard but still don’t get a happy life, then why should I insist on the same work? Time to change.

20210804

Now it’s 1:20 am. I guess I may understand a little bit why the team lead think I could finish 5 listings within 2 days. In my opinion, I wish to figure out every detail of each listing, but for them, they just want me to modify the existing example program a little bit to fit the requirement of the TLGs shell. My purpose is to learn during this process, but they just focused on the efficiency and results, instead of what I learned in this process. I believe if one company wish to accomplish more outstanding achievements, they must generously allow their employees to learn and get used to the specific working pattern. Otherwise, this company’s culture will never attract other great candidates.


ODS ESCAPECHAR= ‘escape-character’; : specifies the special character that identifies the inline formatting symbol. The escape-character should be one of the following rarely used characters: @, ^, or . (escape character: 转义字符).

For example,

title "test of ^{super ^{style [color=red] red ^{style [color=green] green} and ^{style [color=blue] blue }formatting }} and such" ;

The ^{super<text> is invoked to start using the superscript function. Then the style function is used to add another style attribute, ^{style [color=red]<text> for your text.

Column statement in Report procedure: COLUMN column-specification(s); describes the arrangement of all columns and of headings that span more than one column.

column-specification(s) is one or more of the following:

  • report-item(s)
  • report-item-1, report-item-2 <. . . , report-item-n>
  • (‘header-1 ‘ < . . . ‘header-n ‘> report-item(s) )
  • report-item=name
proc report data=grocery;
	column sector sales,min;
	define sector/group;
	define sales/analysis sum;
run;

20210805

ANYDTDTE Informat: Reads and extracts the date value from various date, time, and datetime forms.

ANYDTDTM Informat: Reads and extracts datetime values from various date, time, and datetime forms.

ANYDTTME Informat: Reads and extracts time values from various date, time, and datetime forms.

20210806

COMPUTE Statement in the PROC REPORT procedure: Starts a compute block containing one or more programming statements that PROC REPORT executes as it builds the report.

The code below would add your text just between any titles and the column headings.

compute before _page_;
   line @1 "~S={bordertopcolor=white borderbottomcolor=black borderbottomwidth=1}Your text.";  
endcomp;

Here is an example to add a bottom line to the column: (Or try "^S={borderbottomcolor=black borderbottomwidth=2}Spanned Header")

proc report nowd data=sashelp.class 
  style(report)=[rules=none frame=void] 
  style(header)=[borderbottomcolor=black borderbottomwidth=2 background=_undef_]
;
  column
    ('^{style[borderbottomcolor=white borderbottomwidth=2
        bordertopcolor=black bordertopwidth=2]}' name) 
    ('^{style[borderbottomcolor=black borderbottomwidth=2
        bordertopcolor=black bordertopwidth=2]spanned header}' 
    age sex weight height)
  ;

  compute after _page_ / style={bordertopcolor=black bordertopwidth=2};
    line ' ';
  endcomp;
run;

I think it’s good to spend some time writing a summary after finishing the five listings:

  1. TLGs are mainly based on the analysis datasets (ADaMs), but sometimes they also depend on some SDTM datasets. For example, when worked on the listing for the protocol deviations, I used the DV and SUPPDV in the SDTM. The reason why this happened is simple, because ADaM datasets may not contain all information required for the TLGs.
  2. In the same study, the TLG shells may differ at different stages, but not too much.
  3. Basically, when doing TLGs, there are mainly several parts: i) prepare the title and table settings; ii) prepare your data (this part usually takes the longest time); iii) Generate titles and footnotes; iv) Generate reports (usually takes much time preparing the required format). These four should be quite standard in every study, but not absolutely.

20210809

Okay, now I got another 5 listings to do, but without examples, which means I have to find the corresponding variables to create the listings by myself. And this is THE PROBLEM. Because I haven’t create any ADaM datasets, I don’t know which one I should use actually. Anyway, cheers.

So far so good. Except that I have to look through the variable list in the ADaM dataset. One problem is I don’t know how to remove the WARNING when creating report for a null dataset.

20210810

Refer to this article Compute Block Basics – Part I Tutorial for Compute block in the proc report process.

20210811

This is a list of issues I met so far:

  1. I am not sure whether the treatment should be determined by the variable cohort.

Yes

  1. How to output a sentence saying there is no data without generating any warning when working in a proc report procedure?

Try to understand the following code:

%macro empty;
  %if &nobs =0  %then %do;
    data final;
      length col1-col5 $200;
      subjid="";
      cohort="";
      *col0="Number of subjects with any AE";
      col1="No violation reported.";
      col2="";
      col3="";
      col4="";
      col5="";
    run;
  %end;
%mend;
%empty;

The core idea is to create a null dataset before the proc report process, if the there is no data in the desired dataset.

  1. When working on variables about frequency, sometimes I don’t know what the abbreviation stands for.

Still not clear, but doesn’t matter a lot.

FILE Statement: Specifies the current output file for PUT statements.

FILE file-specification <device-type> <options> <operating-environment-options>;

ATTRN(data-set-id, attribute-name): Returns the value of a numeric attribute for a SAS data set.

CMISS(arguments…): counts the number of missing values, for character and numeric.

20210812

My wrist got hurtful again today. I guess this is due to my typing in the previous days… This feeling is terriable. It’s so unbearable. But I am glad I basically finished my work yesterday so I got some time to rest it.

20210813

Had a meeting with one of my colleagues. Have a better understanding then.

20210816

Modify previous programs.

20210817

Modify previous programs.

Well, it seems that I have to be more careful about the details in the specs/shells.

20210818

How to solve WARNING: Multiple lengths were specified for the BY variable USUBJID by input data sets. This might cause unexpected results. ?

Ans

The fix if you actually have data issues can be very simple by adding a length statement before the SET statement. The length statement should use the larger value that appears for either data set.

20210819

Using the compute block to add a blank line break in proc report:

proc repot ...;
	column ...;
	
	define var1 /order ... ;
	define ...;
	...
	define ...;
	
	compute after var1;
	line " ";
	endcomp;
run;

Here is a problem that I met yesterday but haven’t figured out:

How to display the ordered variable at the first row across pages?

The optimal solution I found is to use the spanrows argument in the proc report, but it only fits for the TAGSETS.RTF file, excluding the plain rtf file.

20210823

Ridiculous.

20210825

COMPRESS(source ,< characters> ,< modifier(s)>): Returns a character string with specified characters removed from the original string.

20210826

time w.d : time format. tod w.d : time format with leading zeros.

**PROC PRINTTO ; **: Defines destinations, other than ODS destinations, for SAS procedure output and for the SAS log.

September

Update Spet 15, 2021

I have stopped taking notes for a while, but this doesn’t mean I didn’t learn anything new in the past few days. On the contrary, I think I may have learned more and become more practiced with SAS clinical programming jobs. But after thinking twice, I still decide to make some changes because I realized I am not in favour of my current job and the environment. If possible, I would take some time to record what happened in the last few months.

Reference

Here is a list of articles that I found useful.

如何判断数据属于哪一个SDTM域?

How can I learn TLF, SDTM, ADAM with SAS on myself?

Compute Block Basics – Part I Tutorial