Data Types and Formatting

Numerics

This section discusses Python types and provides a basic introduction to strings and numeric formatting.

Python has three distinct numeric data types:

Integers
Floating Point
Complex Numbers

Booleans are a subtype of integers.

Example of mixing numeric types: multiplying an int by a float produces a float.

x = 1
y = 1.5
z = x * y

nl = '\n'

print('x type:', type(x),
      nl,
      'y type:', type(y),
      nl,
      'z is:', z, 
      nl,
      'z type:', type(z))

x type: <class 'int'>   
y type: <class 'float'>   
z is: 1.5
z type: <class 'float'>

The corresponding SAS Example.

data types;
   x = 1;
   y = 1.5;
   z = x*y;

   put "x is: " x /
       "y is: " y /
       "z is: " z;
run;

title "SAS Reported 'type' for numeric variables";

proc print data=sashelp.vcolumn(where=(libname='WORK' and
                                       memname='TYPES'));
   id name;
   var type;
run;

x is: 1
y is: 1.5
z is: 1.5

Unlike Python, SAS maps all numerics to type num.

Booleans

Python’s two Boolean values are True and False, capitalized as shown. In a numeric context, such as when they are used as arguments to arithmetic operations, they behave like integers with the value 0 (zero) for False and 1 (one) for True.

print(bool(0))

False

print(bool(1))

True
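Because bool is a subclass of int, the integer behavior described above can be checked directly. This is a minimal sketch. The list name flags is only an illustration.

```python
# bool is a subclass of int, so True and False take part in arithmetic
print(issubclass(bool, int))   # True
print(True + True)             # 2
print(True * 10)               # 10
print(False + 1)               # 1

# Summing Booleans counts how many values are True
flags = [True, False, True, True]
print(sum(flags))              # 3
```

Summing a list of Booleans is a common way to count how many values meet a condition.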

These Python objects always evaluate to False in a Boolean context:

None
False
0 (for integer, float, and complex: 0, 0.0, 0j)
Empty strings ('')
Empty collections such as (), [], {}
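A quick loop confirms that each of these values evaluates to False. In this sketch set() is included as one more empty collection. A string holding a single blank is not empty, so it is truthy.

```python
# Each of these values evaluates to False in a Boolean context
falsy_values = [None, False, 0, 0.0, 0j, '', (), [], {}, set()]

for value in falsy_values:
    print(repr(value), bool(value))

# A string containing one blank is NOT empty, so it is truthy
print(bool(' '))   # True
```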

Comparison Operators

Operator	Meaning
<	Strictly less than
<=	Less than or equal
>	Strictly greater than
>=	Greater than or equal
==	Equal
!=	Not equal
is	Object identity
is not	Negated object identity

Equality Testing

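SAS also evaluates expressions as Boolean values: a numeric value of 0 (zero) or missing is false, and any other value is true. For example, the FINDC function searches a string for any of the specified characters and returns 0 (zero), which is false, if none of them is found in the target string.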

A Python equivalence test.

x = 32.0
y = 32
if x == y:
    print("True. 'x' and 'y' are equal")
else:
    print("False. 'x' and 'y' are not equal")

True. 'x' and 'y' are equal

Python uses == to test for equality, in contrast to SAS, which uses = (or the mnemonic EQ).

Python’s is operator tests identity: whether x and y refer to the same object.

x = 32.0
y = 32
x is y

False

Another Python identity test.

x = 32.0
y = x
x is y

True
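The two tests above can be placed side by side to separate value equality from object identity. This sketch uses id(), which returns an object's identity, to show what is compares. For singletons such as None, the usual idiom is x is None. For values, use ==.

```python
x = 32.0
y = 32

# == compares values; is compares object identity
print(x == y)            # True: 32.0 and 32 are numerically equal
print(x is y)            # False: a float and an int are different objects
print(id(x) == id(y))    # False: `is` amounts to comparing id() values

y = x                    # y now refers to the same float object as x
print(x is y)            # True
print(id(x) == id(y))    # True
```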

Boolean Tests for Empty and Non-Empty Strings.

print(bool(''))

False

print(bool(' '))

True

print(bool('Arbitrary String'))

True

This is a departure from how SAS handles missing character values. In SAS, assigning an empty string or any number of blanks (ASCII 32) to a character variable produces a missing value.
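To mimic the SAS rule in Python, you can treat a string as missing when it is empty or contains only blanks. The helper below is a sketch. The name is_missing is hypothetical, not part of any library. It uses strip(' ') so that only ASCII 32 blanks are removed. A plain strip() would also discard tabs and newlines.

```python
def is_missing(s):
    """Hypothetical helper: treat '' or blank-only strings as missing, SAS-style."""
    # strip(' ') removes leading/trailing blanks; an all-blank string becomes ''
    return not s.strip(' ')

print(is_missing(''))                  # True
print(is_missing('   '))               # True
print(is_missing('Arbitrary String'))  # False
```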

Boolean chained comparisons.

x = 20
1 < x < 100

True

Another Boolean chained comparison.

x = 20
10 < x < 20

False
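A chained comparison is evaluated like the individual comparisons joined with and, except that the middle operand is evaluated only once. This sketch shows both forms side by side.

```python
x = 20

# Chained form and the equivalent expanded form with `and`
print(1 < x < 100)             # True
print(1 < x and x < 100)       # True

print(10 < x < 20)             # False: 20 < 20 is False
print(10 < x and x < 20)       # False

# Use <= for an inclusive upper bound
print(10 < x <= 20)            # True
```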

Inequality Testing

Python numeric inequality.

x = 2
y = 3
x != y

True

SAS numeric inequality.

data _null_;

   x = 2;
   y = 3;

if x ^= y then put 'True';
   else put 'False';
run;

True

Boolean string inequality.

s1 = 'String'
s2 = 'string'
s1 == s2

False

The comparison is case-sensitive: the first character of s1 is ‘S’, while the first character of s2 is ‘s’.
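When case should not matter, compare normalized copies of the strings, much as you might compare UPCASE(s1) = UPCASE(s2) in SAS. This is a minimal sketch. casefold() is the more thorough choice for non-ASCII text.

```python
s1 = 'String'
s2 = 'string'

print(s1 == s2)                        # False: comparison is case-sensitive
print(s1.lower() == s2.lower())        # True
print(s1.casefold() == s2.casefold())  # True; casefold() also handles non-ASCII case rules
```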

SAS string inequality.

data _null_;     

   s1 = 'String';
   s2 = 'string';

if s1 = s2 then put 'True';
   else put 'False';
run;

False

in/not in

The in and not in operators perform membership tests.

'on' in 'Python is easy to learn'

True

'on' not in 'Python is easy to learn'

False
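The in and not in operators also work on containers other than strings. Used with a list, in is roughly like the SAS IN operator with a list of values, such as if state in ('CO', 'NY'). The names ratings and states in this sketch are illustrative only.

```python
# `in` works on any container, not only strings
ratings = ['Agree', 'Neutral', 'Disagree']
print('Neutral' in ratings)           # True
print('Unsure' not in ratings)        # True

# For a dict, `in` tests the keys, not the values
states = {'CO': 'Colorado', 'NY': 'New York'}
print('CO' in states)                 # True
print('Colorado' in states)           # False
print('Colorado' in states.values())  # True
```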

Precedence (1 = highest)	Operation	Result
1	not x	If x is false, then True; otherwise False
2	x and y	If x is false, x is returned; otherwise y is evaluated and its value is returned
3	x or y	If x is true, x is returned; otherwise y is evaluated and its value is returned
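The Result column above means that and and or return one of their operands, which is not necessarily True or False. Only not always returns a bool. This sketch makes the distinction concrete.

```python
# `and` / `or` return one of their operands
print(0 or 'default')        # default: 0 is false, so y is returned
print('value' or 'default')  # value: x is true, so x is returned
print('' and 'never used')   # (empty line) x is false, so x ('') is returned
print(1 and 'second')        # second: x is true, so y is returned

# `not` always returns a bool
print(not 'text')            # False
```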

and/or

Python and/or precedence example. Because and binds more tightly than or, True and False or True is evaluated as (True and False) or True.

True and False

False

True and False or True

True

(True or False) or True

True

SAS Boolean AND operator.

data _null_;

   s3 = 'Longer String';

if findc(s3,'r') ^= 0 and findc(s3,' ') ^= 0 then put 'True';
   else put 'False';
run;

True

Python Boolean or.

s4 = 'Skinny'
s5 = 'Hunger'
'y' in s4 or 'y' in s5

True
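Each side of or must be a complete test. Writing 'y' in s4 or s5 parses as ('y' in s4) or s5. That expression gives the right answer only when 'y' happens to be in s4. Otherwise it returns s5 itself, and any non-empty string is truthy. This sketch shows the pitfall and two safer forms.

```python
s4 = 'Skinny'
s5 = 'Hunger'

# Parsed as ('z' in s4) or s5: the left side is False, so s5 itself is returned
print('z' in s4 or s5)                  # Hunger  (a truthy string, not a real test)

# Spell out the membership test on both sides
print('z' in s4 or 'z' in s5)           # False
print('y' in s4 or 'y' in s5)           # True

# any() applies the same test to many strings
print(any('y' in s for s in (s4, s5)))  # True
```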

SAS Boolean OR.

data _null_;

   s4 = 'Skinny';
   s5 = 'hunger';

if findc(s4,'y') ^= 0 or findc(s5,'y') ^= 0 then put 'True';
   else put 'False';
run;

True

The FINDC function searches the s4 variable from left to right for the character ‘y’ and returns the position of its first occurrence (6). Because that value is nonzero, the first half of the OR condition is true. The whole IF condition is therefore true, and the THEN statement executes.

Numerical Precision

It is a mathematical truth that adding .1 ten times produces 1 (in base 10, of course).

x = 0
for i in range(10):    # add .1 ten times
    x += .1
x == 1

False

So how is this possible? OK, let's try:

.1 + .1 + .1 + .1 + .1 + .1 + .1 + .1 + .1 + .1

0.9999999999999999

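It turns out that 0.1 cannot be represented exactly as a base 2 (binary) fraction, because its binary expansion repeats infinitely. The floating-point arithmetic chapter of the Python tutorial covers the details. Use Python’s round function to return a number rounded to a given precision after the decimal point. With no precision argument, round returns the nearest integer.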
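To see the value actually stored for the literal 0.1, you can convert the float to a Decimal or ask for its exact fraction. This sketch uses the standard decimal module and float.as_integer_ratio(). The denominator it reports is 2**55, a power of two.

```python
from decimal import Decimal

# Decimal(float) shows the exact binary value stored for the literal 0.1
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625

# The stored value is a fraction whose denominator is a power of 2
print((0.1).as_integer_ratio())   # (3602879701896397, 36028797018963968)

# round() to a given number of decimal places
print(round(0.1 + 0.2, 2))        # 0.3
```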

nl = '\n'

total = 0
values = [.1] * 10    # named values to avoid shadowing the built-in list

for i in values:
    total += i

print(nl,
      'Boolean expression: 1 == total:', 1 == total,
      nl,
      'Boolean expression: 1 == round(total):', 1 == round(total),
      nl,
      'total:', total,
      nl,
      'total type:', type(total))

Boolean expression: 1 == total: False
Boolean expression: 1 == round(total): True
total: 0.9999999999999999
total type: <class 'float'>
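Rounding to an integer works here, but a tolerance-based comparison is often more general. This sketch uses math.isclose, whose default relative tolerance is 1e-09. It also uses math.fsum, which tracks the low-order bits lost during summation. The loop uses explicit addition so the result does not depend on how the built-in sum() is implemented.

```python
import math

# Add 0.1 ten times with ordinary floating-point addition
total = 0.0
for _ in range(10):
    total += 0.1

print(total == 1)                  # False
print(math.isclose(total, 1))      # True: within the default rel_tol=1e-09
print(math.isclose(1.001, 1))      # False: difference exceeds the tolerance

# math.fsum() compensates for rounding error during summation
print(math.fsum([0.1] * 10) == 1)  # True
```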

This issue is not unique to Python. The same challenge exists for SAS, or any other language utilizing floating-point arithmetic, which is to say nearly all computer languages.

data _null_;
   one = 1;
   total = 0;
   inc = .1;

do i = 1 to 10;
   total + inc;
   put inc ', ' @@;
end;

put;

if total = one then put 'True';
   else put 'Comparison of "one = total" evaluates: False';

if round(total) = one then put 'Comparison of "one = round(total)" evaluates: True';
   else put 'False';

put 'Total: ' total 8.3;
run;

0.1 , 0.1 , 0.1 , 0.1 , 0.1 , 0.1 , 0.1 , 0.1 , 0.1 , 0.1 , 
Comparison of "one = total" evaluates: False
Comparison of "one = round(total)" evaluates: True
Total:    1.000

Strings

Strings are immutable, meaning they cannot be updated in place. String methods such as replace and upper return a new string rather than modifying the original. The split method returns a new list of strings.
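This sketch makes immutability visible. Assigning to a single character raises a TypeError, while replace leaves the original string untouched.

```python
s = 'Hello'

# Item assignment is not allowed on an immutable str
try:
    s[0] = 'J'
except TypeError as err:
    print('TypeError:', err)   # 'str' object does not support item assignment

# replace() builds and returns a new string; s itself is unchanged
t = s.replace('H', 'J')
print(s, t)                    # Hello Jello

# split() returns a new list of strings
print('Hello World'.split())   # ['Hello', 'World']
```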

s5 = 'Hello'
s6 = "World"
nl = '\n'

print(s5, s6,
      nl,
      s5 + s6,
      nl,
     'Type for s5:', type(s5),
      nl,
     'Type for s6:', type(s6))

Hello World  
HelloWorld  
Type for s5: <class 'str'>
Type for s6: <class 'str'>

Written in SAS as:

data _null_;
   s5 = 'Hello';
   s6 = 'World';

   concat1 = cat(s5, s6);
   concat2 = cat(s5, ' ', s6);

put s5= s6= /
    concat1= /
    concat2=;
run;

s5=Hello s6=World
concat1=HelloWorld
concat2=Hello World

Like SAS, Python provides an extensive set of string manipulation functions, available as methods of the built-in str type.

print(s5 + " " + s6.upper())

Hello WORLD

SAS UPCASE Function.

data _null_;
   s5 = 'Hello';
   s6 = 'World';

   upcase = cat(s5, ' ', upcase(s6));

put upcase;
run;

Hello WORLD

String Slicing

Python indexes the characters in a string starting at position zero (0) for the first character and incrementing up to position len(s) - 1 for the last character.

The general form for Python string slicing is:

string[start : stop : step]

Python also supports negative indexes, which count from the end of the string: -1 is the last character, -2 the next-to-last, and so on back to -len(s) for the first character.

A number of SAS character-handling functions use modifiers to scan from right to left.

Character           H   e   l    l    o       W    o    r    l    d
Index Value         0   1   2    3    4   5   6    7    8    9   10
R - L Index Value -11 -10  -9   -8   -7  -6  -5   -4   -3   -2   -1
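The index table above can be checked directly. This sketch confirms that the left-to-right and right-to-left indexes name the same characters.

```python
s = 'Hello World'

print(len(s))        # 11
print(s[0], s[-11])  # H H: the first character, counted from either end
print(s[10], s[-1])  # d d: the last character
print(s[5] == ' ')   # True: index 5 (and -6) is the blank

# For every index i >= 0, s[i] is the same character as s[i - len(s)]
print(all(s[i] == s[i - len(s)] for i in range(len(s))))  # True
```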

Python string slicing.

s = 'Hello World'
s[0]

'H'

SAS SUBSTR Function.

data _null_;
   s       = 'Hello World';
   extract = substr(s, 1, 1);

put extract;
run;

H

If no start position is supplied, the default is 0 (zero). The stop position following the colon (:) is exclusive: the slice goes up to, but does not include, index position 5.

s = 'Hello World'
s[:5]

'Hello'

What happens when the start index is greater than the length of the sequence?

s          = 'Hello World'    
empty      = s[12:]
empty_len  = len(empty)
empty_bool = bool(empty)

nl = '\n'

print(nl,
     'String length:',  len(s),
      nl,
     'empty value:' ,   empty,
      nl,
     'empty length:' , empty_len,
      nl,
     'empty_bool:' ,    empty_bool)

String length: 11
empty value:
empty length: 0
empty_bool: False
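Slicing is forgiving about out-of-range positions, but indexing a single position is not. This sketch contrasts the two.

```python
s = 'Hello World'

# Slicing past the end returns an empty string
print(repr(s[12:]))    # ''

# Indexing one position past the end raises IndexError
try:
    s[12]
except IndexError as err:
    print('IndexError:', err)   # string index out of range
```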

What happens when the stop value is greater than the length of the string?

s = 'Hello World'
s[:12]

'Hello World'

What happens when the stop position value is -1?

s = 'Hello World'
s[3:-1]

'lo Worl'

If we want to include the last letter in this sequence, leave the stop index value blank.

s = 'Hello World'
s[3:]

'lo World'
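The general form string[start : stop : step] also takes a step value, which none of the examples above use. This sketch shows a few common step patterns. A negative step walks from right to left.

```python
s = 'Hello World'

print(s[::2])     # HloWrd: every second character
print(s[1::2])    # el ol: every second character, starting at index 1
print(s[::-1])    # dlroW olleH: the whole string reversed
print(s[-5:])     # World: the last five characters
```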

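Python's escape character is the backslash (\). It lets a quote character appear inside a string delimited by the same quote, and a doubled backslash (\\) represents a single literal backslash.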

q1   = 'Python\'s capabilities'
q2   = "Python's features"
path = 'C:\\Path\\to\\Nowhere'

nl = '\n'

print(nl,
     'q1:  ',   q1,
      nl,
     'q2:  ',   q2,
      nl,
     'path:',   path)

q1:   Python's capabilities  
q2:   Python's features
path: C:\Path\to\Nowhere
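For Windows paths and similar text, a raw string (r prefix) avoids doubling every backslash. This sketch compares the two spellings of the path above. One caveat: a raw string cannot end with a single backslash.

```python
# A raw string keeps backslashes as literal characters
raw_path = r'C:\Path\to\Nowhere'
escaped_path = 'C:\\Path\\to\\Nowhere'

print(raw_path)                   # C:\Path\to\Nowhere
print(raw_path == escaped_path)   # True

# In a normal string, \t is an escape sequence for a tab character
print(len('a\tb'))                # 3: a, tab, b
print(len(r'a\tb'))               # 4: a, backslash, t, b
```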

Formatting Strings

Formatting a string involves defining a string constant that contains one or more replacement fields, delimited by curly braces { }. Anything outside a replacement field is literal text. The values substituted into the replacement fields are supplied as positional arguments, referenced as {0}, {1}, and so on, or as keyword arguments, referenced by name, e.g. {gender}.

'The subject\'s gender is {0}'.format("Female")

"The subject's gender is Female"

The format method with multiple positional arguments. Replacement fields can reference the positional arguments in any order.

nl = '\n'
scale = 'Ratings are: {0} {1} or {2}'
scale_out1 = scale.format('1. Agree', '2. Neutral', '3. Disagree')

scale = 'Ratings are: {2} {0} {1}'
scale_out2 = scale.format('1. Agree', '2. Neutral', '3. Disagree')

print(nl,
      'scale_out1:', scale_out1,
      nl,
      'scale_out2:', scale_out2)

scale_out1: Ratings are: 1. Agree 2. Neutral or 3. Disagree
scale_out2: Ratings are: 3. Disagree 1. Agree 2. Neutral

The format method also accepts keyword arguments, referenced by name in the replacement fields.

location = 'Subject is in {city}, {state} {zip}'
location.format(city='Denver', state='CO', zip='80218')

'Subject is in Denver, CO 80218'

Combining positional and keyword arguments together.

location = 'Subject is in {city}, {state}, {0}'
location.format(80218, city='Denver', state='CO')

'Subject is in Denver, CO, 80218'
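When the keyword values already live in a dict, you can unpack the dict into format with ** or pass it to format_map. In this sketch the subject dict is a hypothetical record used only for illustration.

```python
location = 'Subject is in {city}, {state} {zip}'

# Hypothetical record, e.g. one subject's data held in a dict
subject = {'city': 'Denver', 'state': 'CO', 'zip': '80218'}

# ** unpacks the dict into keyword arguments for format()
print(location.format(**subject))      # Subject is in Denver, CO 80218

# format_map() accepts the mapping directly
print(location.format_map(subject))    # Subject is in Denver, CO 80218
```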

Formatting Integers

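The pattern for applying formats to integers is similar to that of strings. A format specification follows a colon (:) inside the replacement field. In this example, > right-aligns the value in a field 20 characters wide.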

'pi estimate: {:>20}'.format("3.14159265")

'pi estimate:           3.14159265'

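Combining format specifications: > right-aligns, 10 sets the minimum field width, the comma (,) adds a thousands separator, and d formats the value as a decimal integer. Because the width is only a minimum, 123,456,789 (11 characters) is not truncated.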

print("{:>10,d}\n".format(123456789),
"{:>10,d}".format(1089))

123,456,789
      1,089

Integers can be displayed with their corresponding octal, hexadecimal, and binary representation.

"99 is: {0:x} hex, {0:o} octal, {0:b} binary".format(99)

'99 is: 63 hex, 143 octal, 1100011 binary'
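The same format codes accept a few extra options. In this sketch, # adds the base prefix, 08b zero-pads the binary digits to eight places like the SAS binary8. format, and int() with an explicit base converts the text back to an integer.

```python
n = 99

# '#' adds the base prefix; uppercase X gives uppercase hex digits
print('{0:#x} {0:#o} {0:#b}'.format(n))   # 0x63 0o143 0b1100011
print('{0:X}'.format(255))                # FF

# Zero-pad the binary digits to a width of 8
print('{0:08b}'.format(n))                # 01100011

# int() with an explicit base converts the text back to an integer
print(int('63', 16), int('143', 8), int('1100011', 2))  # 99 99 99
```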

The analog SAS program.

data _null_;
   input int 8.;
   int_left = left(put(int, 8.));

   put 'int:      ' int_left /
       'hex:      ' int hex2. /
       'octal:    ' int  octal. /
       'binary:   ' int binary8. /
       'Original: ' _infile_;
list;
datalines;
99
;;;;
run;

int:      99
hex:      63
octal:    143
binary:   01100011
Original: 99
RULE:      ----+----1----+----2----+----3----+----4----+----5
15         99

By default, both Python and SAS display integer values without leading zeros (0) and without a plus (+) sign for positive values. Both behaviors can be changed.

'Integer 99 displayed as {:04d}'.format(99)

'Integer 99 displayed as 0099'

The analog SAS program uses the SAS-supplied Zw.d format.

data _null_;
   input int 8.;
   display_int = put(int, z4.);

put 'int:      ' display_int/
    'Original: ' _infile_;

list;
datalines;
99
;;;;
run;

int:      0099
Original: 99
RULE:      ----+----1----+----2----+----3----+----4----+----5
13         99

Insert leading plus sign (+).

'{:+3d}'.format(99)

'+99'

Analog SAS program using PROC FORMAT

proc format;
   picture plussign
           0 - 99 = '  00' (prefix='+');
run;

data _null_;
   input int 8.;

put 'int:      ' int plussign. /
    'Original: ' _infile_;
list;
datalines;
99
;;;;
run;

int:      +99
Original: 99
RULE:      ----+----1----+----2----+----3----+----4----+----5
15         99

Formatting Floats

Format specifications can control how many digits are displayed to the right of the decimal point.

"precision: {0:.1f} or {0:.4f}".format(3.14159265)

'precision: 3.1 or 3.1416'
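The same format specification can be passed to the built-in format() function. Formatting produces a new string and does not change the number itself. This is a small sketch of both points.

```python
pi = 3.14159265

# The built-in format() function takes the same specification
print(format(pi, '.4f'))       # 3.1416
print(format(pi, '10.2f'))     # 3.14 right-aligned in a field 10 characters wide

# Formatting returns a string; the float value itself is not rounded
formatted = '{:.1f}'.format(pi)
print(type(formatted), pi)     # <class 'str'> 3.14159265
```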

For both Python and SAS, the percent format multiplies the number by 100 and appends a percent (%) sign.

"6.33 as a Percentage of 150: {0:.2%}".format(6.33/150)

'6.33 as a Percentage of 150: 4.22%'

Analog SAS program using the PERCENTw.d format.

data _null_;
   pct = 6.33 / 150;

put '6.33 as a percentage of 150: ' pct percent8.2;
run;

6.33 as a percentage of 150:   4.22%

Formatting Datetimes

Python's date, datetime, and time objects support the strftime method, which returns a string representation of the object. A format string made up of directives, such as %Y for a four-digit year, controls how that string appears in the output.

from datetime import datetime, date, time

now = datetime.now()
nl = '\n'

print(nl,
      'now:     ', now,
      nl,
      'now type:', type(now))

now:      2018-12-30 16:21:51.880366
now type: <class 'datetime.datetime'>

The following example introduces several strftime formatting directives. The table of strftime() and strptime() format codes in the Python datetime documentation lists them all.

from datetime import datetime, date, time

nl = '\n'
now = datetime.now()

print(nl,
      'now:    ' , now,
      nl         , 
      'Year:   ' , now.strftime("%Y"), 
      nl         ,
      'Month:  ' , now.strftime("%B"), 
      nl         ,
      'Day:    ' , now.strftime("%d"), 
      nl, nl     , 
      'concat1:' , now.strftime("%A, %B %d, %Y A.D."), 
      nl, 
      'datetime:' , now.strftime("%c"))

now:     2018-12-30 16:28:37.072155
Year:    2018
Month:   December
Day:     30

concat1: Sunday, December 30, 2018 A.D.
datetime: Sun Dec 30 16:28:37 2018
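Because datetime.now() changes on every run, a fixed datetime makes the directives easier to verify. This sketch also uses strptime, which parses a string back into a datetime with the same directives. The name stamp is illustrative. Directives such as %A and %B depend on the locale, and the comments assume an English (C) locale.

```python
from datetime import datetime

# A fixed datetime, so the output is repeatable (unlike datetime.now())
stamp = datetime(2018, 12, 30, 16, 28, 37)

print(stamp.strftime('%Y-%m-%d'))        # 2018-12-30
print(stamp.strftime('%H:%M:%S'))        # 16:28:37
print(stamp.strftime('%A, %B %d, %Y'))   # Sunday, December 30, 2018 (English locale)

# strptime() is the inverse: it parses a string using the same directives
parsed = datetime.strptime('2018-12-30 16:28:37', '%Y-%m-%d %H:%M:%S')
print(parsed == stamp)                   # True
```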