Numerics
This section discusses Python types and provides a basic introduction to strings and numeric formatting.
Python has three distinct numeric data types:
- Integers
- Floating Point
- Complex Numbers
Booleans are a subtype of integers.
Example of manipulating mixed types.
x = 1
y = 1.5
z = x * y
nl = '\n'
print('x type:', type(x),
nl,
'y type:', type(y),
nl,
'z is:', z,
nl,
'z type:', type(z))
x type: <class 'int'>
y type: <class 'float'>
z is: 1.5
z type: <class 'float'>
The corresponding SAS Example.
data types;
x = 1;
y = 1.5;
z = x*y;
Put "x is: " x /
"y is: " y /
"z is: " z;
title "SAS Reported 'type' for numeric variables";
proc print data=sashelp.vcolumn(where=(libname='WORK' and
memname='TYPES'));
id name;
var type;
run;
x is: 1
y is: 1.5
z is: 1.5
Unlike Python, SAS maps all numerics to type num.
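The mixed-type example above covers integers and floats only. The following minimal sketch extends it to complex numbers and to the two division operators; the variable names and values are illustrative and not part of the original example.

```python
# Illustrative values only; these names do not appear in the example above.
i = 7
f = 2.0
c = 1 + 2j

int_times_float = i * f       # int combined with float yields a float
float_plus_complex = f + c    # float combined with complex yields a complex
true_division = i / 2         # "/" returns a float even for two integers
floor_division = i // 2       # "//" on two integers returns an integer

print(type(int_times_float), int_times_float)
print(type(float_plus_complex), float_plus_complex)
print(type(true_division), true_division)
print(type(floor_division), floor_division)
```

In each mixed operation the result takes the "wider" of the two operand types, which is why z in the earlier example is a float.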
Booleans
Python’s two Boolean values are True and False, capitalized as shown. In a numerical context, such as when used as operands in arithmetic operations, they behave like the integers 0 (zero) for False and 1 (one) for True.
print(bool(0))
False
print(bool(1))
True
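A short sketch of Booleans behaving like integers in arithmetic. The flags list is made-up sample data used only for illustration.

```python
# bool is a subclass of int, so Booleans can be used in arithmetic.
is_int_subclass = isinstance(True, int)
two = True + True
zero = False * 10

# Summing Booleans counts the True values (sample data is illustrative).
flags = [True, False, True, True]
true_count = sum(flags)

print(is_int_subclass, two, zero, true_count)
```

Summing a sequence of Booleans is a common way to count how many items satisfy a condition.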
These Python objects are always False:
- None
- False
- 0 (for integer, float, and complex)
- Empty strings
- Empty collections such as ( ), [ ], { }
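The list above can be checked directly with bool(). This is a minimal sketch; the names falsy_values and results are chosen here for illustration.

```python
# Each of these objects evaluates to False in a Boolean context.
falsy_values = [None, False, 0, 0.0, 0j, '', (), [], {}]
results = [bool(value) for value in falsy_values]
print(results)

# A collection with at least one item is True, even if that item is itself empty.
nested_is_true = bool([[]])
print(nested_is_true)
```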
Comparison Operators
Operator | Meaning |
---|---|
< | Strictly less than |
<= | Less than or equal |
> | Strictly greater than |
>= | Greater than or equal |
== | Equal |
!= | Not equal |
is | Object identity |
is not | Negated object identity |
Equality Testing
SAS uses Boolean operators. For example, the FINDC function searches strings for characters and returns a value of 0 (zero), or false, if the search excerpt is not found in the target string.
A Python equivalence test.
x = 32.0
y = 32
if x == y:
    print("True. 'x' and 'y' are equal")
else:
    print("False. 'x' and 'y' are not equal")
True. 'x' and 'y' are equal
Python uses == to test for equality, in contrast to SAS, which uses =.
Python’s is operator for identity testing between x and y.
x = 32.0
y = 32
x is y
False
Another Python identity test.
x = 32.0
y = x
x is y
True
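The difference between equality (==) and identity (is) is easier to see with a mutable object. The sketch below uses lists; the names a, b, and c are illustrative.

```python
a = [1, 2, 3]
b = [1, 2, 3]   # a second list with equal contents
c = a           # a second name bound to the same list object

equal_values = a == b    # compares contents
same_object_ab = a is b  # compares identity: two distinct objects
same_object_ac = a is c  # both names refer to one object

c.append(4)              # a change made through c is visible through a
print(equal_values, same_object_ab, same_object_ac, a)
```

Identity testing is most often used for comparisons with None, for example `if value is None:`.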
Boolean tests for empty and non-empty strings.
print(bool(''))
False
print(bool(' '))
True
print(bool('Arbitrary String'))
True
The above is a departure from how SAS handles missing character variables. In SAS, a character variable assigned zero or more blanks (ASCII 32) is considered a missing value.
Boolean chained comparisons.
x = 20
1 < x < 100
True
Another Boolean chained comparison.
x = 20
10 < x < 20
False
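A chained comparison behaves like the two individual comparisons joined with and. The following sketch illustrates the equivalence using the same value of x as above.

```python
x = 20

chained = 10 < x < 20
expanded = (10 < x) and (x < 20)   # the same test written out in full

inclusive = 10 < x <= 20           # operators can be mixed within a chain
print(chained, expanded, inclusive)
```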
Inequality Testing
Python numeric inequality.
x = 2
y = 3
x != y
True
SAS numeric inequality.
data _null_;
x = 2;
y = 3;
if x ^= y then put 'True';
else put 'False';
run;
True
Boolean string inequality.
s1 = 'String'
s2 = 'string'
s1 == s2
False
The first character in object s1 is ‘S’ and the first character in object s2 is ‘s’.
SAS string inequality.
data _null_;
s1 = 'String';
s2 = 'string';
if s1 = s2 then put 'True';
else put 'False';
run;
False
in/not in
The in and not in operators perform membership tests.
'on' in 'Python is easy to learn'
True
'on' not in 'Python is easy to learn'
False
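Membership tests are not limited to strings. The sketch below applies in to a list and a dictionary; the sample data is made up for illustration.

```python
# Sample data for illustration only.
ratings = ['Agree', 'Neutral', 'Disagree']
states = {'CO': 'Colorado', 'NY': 'New York'}

in_list = 'Neutral' in ratings
in_keys = 'CO' in states                  # a dictionary test searches the keys
in_keys_by_value = 'Colorado' in states   # values are not searched
in_values = 'Colorado' in states.values()

print(in_list, in_keys, in_keys_by_value, in_values)
```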
Precedence | Operation | Results |
---|---|---|
1 | not x | If x is false, then True, False otherwise |
2 | x and y | If x is false, its value is returned; otherwise y is evaluated and the resulting value is returned |
3 | x or y | If x is true, its value is returned; otherwise, y is evaluated and the resulting value is returned |
and/or
Python Boolean operators and / or precedence example.
True and False
False
True and False or True
True
(True or False) or True
True
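As the table of Boolean operations states, and and or return one of their operands rather than strictly True or False, and the second operand is evaluated only when needed. A minimal sketch, with the helper function fail defined here purely for illustration:

```python
and_falsy_first = 0 and 'unused'          # x is false, so x is returned
and_truthy_first = 'first' and 'second'   # x is true, so y is returned
or_falsy_first = '' or 'default'          # x is false, so y is returned
or_truthy_first = 'value' or 'default'    # x is true, so x is returned

def fail():
    # Hypothetical helper: raises if it is ever called.
    raise RuntimeError('should not be called')

# Short-circuit evaluation: fail() is never called because x is already false.
short_circuit = False and fail()

print(and_falsy_first, and_truthy_first, or_falsy_first, or_truthy_first, short_circuit)
```

The `or` form is sometimes used to supply a fallback when the first value is empty.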
SAS Boolean AND operator.
data _null_;
s3 = 'Longer String';
if findc(s3,'r') ^= 0 and findc(s3,' ') ^= 0 then put 'True';
else put 'False';
run;
True
Python Boolean or.
s4 = 'Skinny'
s5 = 'Hunger'
'y' in s4 or 'y' in s5
True
SAS Boolean OR .
data _null_;
s4 = 'Skinny';
s5 = 'hunger';
if findc(s4,'y') ^= 0 or findc(s5,'y') ^= 0 then put 'True';
else put 'False';
run;
True
The FINDC function searches the s4 variable left to right for the character ‘y’ and returns the location of its first occurrence. This causes the first half of the IF predicate to evaluate true. Since the IF predicate evaluates true, the THEN statement executes.
Numerical Precision
It is a mathematical truth that .1 multiplied by 10 produces 1 (for Base10, of course). Yet adding .1 ten times does not produce a total equal to 1.
x = 0
for _ in range(10):
    x += .1
x == 1
False
So how is this possible? OK, let's try:
.1 + .1 + .1 + .1 + .1 + .1 + .1 + .1 + .1 + .1
0.9999999999999999
It turns out that 0.1 cannot be represented exactly as a Base2 fraction because its binary expansion repeats infinitely. The Python documentation on floating-point arithmetic covers the details. Use Python’s round function to return numbers rounded to a given precision after the decimal point.
nl = '\n'
total = 0
values = [.1] * 10   # ten copies of 0.1
for value in values:
    total += value   # repeated addition accumulates rounding error
print(nl,
      "Boolean expression: 1 == total:", 1 == total,
      nl,
      "Boolean expression: 1 == round(total):", 1 == round(total),
      nl,
      "total:", total,
      nl,
      "total type:", type(total))
Boolean expression: 1 == total: False
Boolean expression: 1 == round(total): True
total: 0.9999999999999999
total type: <class 'float'>
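Rounding is one way to compare floats. As a hedged alternative sketch, the standard library also offers math.isclose for tolerance-based comparison, math.fsum for accurate summation, and the decimal module for exact decimal arithmetic. None of these appear in the original example; they are shown as options rather than as the recommended approach.

```python
import math
from decimal import Decimal

# Accumulate 0.1 ten times, as in the example above.
total = 0.0
for _ in range(10):
    total += 0.1

exact_match = total == 1.0                  # affected by rounding error
close_match = math.isclose(total, 1.0)      # compares within a relative tolerance
fsum_match = math.fsum([0.1] * 10) == 1.0   # fsum tracks the lost low-order bits

# Decimal stores 0.1 exactly when it is constructed from a string.
decimal_total = sum([Decimal('0.1')] * 10)
decimal_match = decimal_total == 1

print(exact_match, close_match, fsum_match, decimal_match)
```

math.isclose is usually the lightest-weight choice for comparisons; Decimal suits cases such as currency, where decimal fractions must be exact.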
This issue is not unique to Python. The same challenge exists for SAS, or any other language utilizing floating-point arithmetic, which is to say nearly all computer languages.
data _null_;
one = 1;
total = 0;
inc = .1;
do i = 1 to 10;
total + inc;
put inc ', ' @@;
end;
put;
if total = one then put 'True';
else put 'Comparison of "one = total" evaluates: False';
if round(total) = one then put 'Comparison of "one = round(total)" evaluates: True';
else put 'False';
put 'Total: ' total 8.3;
run;
0.1 , 0.1 , 0.1 , 0.1 , 0.1 , 0.1 , 0.1 , 0.1 , 0.1 , 0.1 ,
Comparison of "one = total" evaluates: False
Comparison of "one = round(total)" evaluates: True
Total: 1.000
Strings
Strings are immutable, meaning they cannot be updated in place. A method such as replace returns a new string rather than modifying the original, and split returns a list of new strings.
s5 = 'Hello'
s6 = "World"
nl = '\n'
print(s5, s6,
nl,
s5 + s6,
nl,
'Type for s5:', type(s5),
nl,
'Type for s6:', type(s6))
Hello World
HelloWorld
Type for s5: <class 'str'>
Type for s6: <class 'str'>
Written in SAS as:
data _null_;
s5 = 'Hello';
s6 = 'World';
concat1 = cat(s5, s6);
concat2 = cat(s5, ' ', s6);
put s5= s6= /
concat1= /
concat2=;
run;
s5=Hello s6=World
concat1=HelloWorld
concat2=Hello World
Like SAS, the Python Standard Library has an extensive set of string manipulation methods.
print(s5 + " " + s6.upper())
Hello WORLD
SAS UPCASE Function.
data _null_;
s5 = 'Hello';
s6 = 'World';
upcase = cat(s5, ' ', upcase(s6));
put upcase;
run;
Hello WORLD
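Returning to the point about immutability made at the start of this section, the following sketch shows that string methods leave the original string untouched and that assigning to an index position raises an error. The variable names are illustrative.

```python
s = 'Hello World'

replaced = s.replace('World', 'Python')   # returns a new string
words = s.split()                         # returns a list of new strings

print(s)          # the original string is unchanged
print(replaced)
print(words)

# Attempting to update a character in place raises a TypeError.
try:
    s[0] = 'J'
except TypeError as err:
    error_message = str(err)
    print('TypeError:', error_message)
```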
String Slicing
With a sequence of characters Python automatically creates an index with a start position of zero (0) for the first character in the sequence and increments to the end position of the string (length -1).
The general form for Python string slicing is:
string[start : stop : step]
Python provides an 'end-to-begin' indexer with a start position of -1 for the last character in the string and decrements to the beginning position.
A number of the SAS character handling functions use modifiers to scan right to left.
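Character | H | e | l | l | o | (space) | W | o | r | l | d |
---|---|---|---|---|---|---|---|---|---|---|---|
Index Value | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
R - L Index Value | -11 | -10 | -9 | -8 | -7 | -6 | -5 | -4 | -3 | -2 | -1 |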
Python string slicing.
s = 'Hello World'
s[0]
'H'
SAS SUBSTR Function.
data _null_;
s = 'Hello World';
extract = substr(s, 1, 1);
put extract;
run;
H
If no start position is supplied the default is 0 (zero). The stop position following the colon (:) goes up to but does not include index position 5.
s = 'Hello World'
s[:5]
'Hello'
What happens when the start index value is greater than the length of the sequence?
s = 'Hello World'
empty = s[12:]
empty_len = len(empty)
empty_bool = bool(empty)
nl = '\n'
print(nl,
'String length:', len(s),
nl,
'empty value:' , empty,
nl,
'empty length:' , empty_len,
nl,
'empty_bool:' , empty_bool)
String length: 11
empty value:
empty length: 0
empty_bool: False
What happens when the stop value is greater than the length of the string?
s = 'Hello World'
s[:12]
'Hello World'
What happens when the stop position value is -1?
s = 'Hello World'
s[3:-1]
'lo Worl'
If we want to include the last letter in this sequence, leave the stop index value blank.
s = 'Hello World'
s[3:]
'lo World'
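The slicing examples above use only the start and stop positions. The sketch below exercises the step component of the general form and the right-to-left index values, using the same string.

```python
s = 'Hello World'

every_other = s[::2]    # step of 2 takes every second character
reversed_s = s[::-1]    # a negative step walks the string right to left
last_char = s[-1]       # right-to-left index -1 is the final character
last_word = s[-5:]      # start five characters from the end

print(every_other)
print(reversed_s)
print(last_char)
print(last_word)
```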
Python's quoting escape character is backslash (\).
q1 = 'Python\'s capabilities'
q2 = "Python's features"
path = 'C:\\Path\\to\\Nowhere'
nl = '\n'
print(nl,
'q1: ', q1,
nl,
'q2: ', q2,
nl,
'path:', path)
q1: Python's capabilities
q2: Python's features
path: C:\Path\to\Nowhere
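For strings with many backslashes, such as the Windows path above, a raw string literal (prefix r) is an alternative to doubling each backslash. This is a minimal sketch; the variable names are illustrative.

```python
escaped = 'C:\\Path\\to\\Nowhere'   # each backslash is escaped
raw = r'C:\Path\to\Nowhere'         # raw string: backslashes are kept literally

same_value = raw == escaped
newline_len = len('\n')             # one character: a newline
raw_newline_len = len(r'\n')        # two characters: a backslash and the letter n

print(raw)
print(same_value, newline_len, raw_newline_len)
```

Note that a raw string literal cannot end with a single backslash.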
Formatting Strings
Formatting a string involves defining a string constant containing one or more format codes. Format codes are replacement fields enclosed in curly braces { }. Anything outside a replacement field is treated as literal text. The values substituted into replacement fields are passed to the format method as keyword arguments (for example, {gender}) or positional arguments ({0}, {1}, and so on).
'The subject\'s gender is {0}'.format("Female")
"The subject's gender is Female"
The format method with multiple positional arguments. Positional arguments can be referenced in any order.
nl = '\n'
scale = 'Ratings are: {0} {1} or {2}'
scale_out1 = scale.format('1. Agree', '2. Neutral', '3. Disagree')
scale = 'Ratings are: {2} {0} {1}'
scale_out2 = scale.format('1. Agree', '2. Neutral', '3. Disagree')
print(nl,
      'scale_out1:', scale_out1,
      nl,
      'scale_out2:', scale_out2)
scale_out1: Ratings are: 1. Agree 2. Neutral or 3. Disagree
scale_out2: Ratings are: 3. Disagree 1. Agree 2. Neutral
The format method also accepts keyword= arguments.
location = 'Subject is in {city}, {state} {zip}'
location.format(city='Denver', state='CO', zip='80218')
'Subject is in Denver, CO 80218'
Combining positional and keyword arguments together.
location = 'Subject is in {city}, {state}, {0}'
location.format(80218, city='Denver', state='CO')
'Subject is in Denver, CO, 80218'
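Formatted string literals (f-strings, available from Python 3.6) are an alternative to the format method that the original examples do not cover. In this sketch the replacement fields refer directly to variables in scope; zip_code is a name chosen here to avoid shadowing the built-in zip function.

```python
city = 'Denver'
state = 'CO'
zip_code = '80218'   # illustrative name, chosen to avoid shadowing built-in zip

# f-string: expressions inside { } are evaluated where the string is defined.
location = f'Subject is in {city}, {state} {zip_code}'

# The equivalent call using the format method with keyword arguments.
via_format = 'Subject is in {city}, {state} {zip}'.format(
    city=city, state=state, zip=zip_code)

print(location)
print(location == via_format)
```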
Formatting Integers
The pattern for applying formats to integers is similar to that of strings.
'pi estimate: {:>20}'.format("3.14159265")
'pi estimate:           3.14159265'
Combining format specifications.
print("{:>10,d}\n".format(123456789),
"{:>10,d}".format(1089))
123,456,789
      1,089
Integers can be displayed with their corresponding octal, hexadecimal, and binary representation.
"99 is: {0:x} hex, {0:o} octal, {0:b} binary".format(99)
'99 is: 63 hex, 143 octal, 1100011 binary'
The analogous SAS program.
data _null_;
input int 8.;
int_left = left(put(int, 8.));
put 'int: ' int_left /
'hex: ' int hex2. /
'octal: ' int octal. /
'binary: ' int binary8. /
'Original: ' _infile_;
list;
datalines;
99
;;;;
run;
int: 99
hex: 63
octal: 143
binary: 01100011
Original: 99
RULE: ----+----1----+----2----+----3----+----4----+----5
15 99
For Python and SAS the default is to display integer values without leading zeros (0) or a plus (+) sign to indicate positive integer values. This behavior can be altered.
'Integer 99 displayed as {:04d}'.format(99)
'Integer 99 displayed as 0099'
The analogous SAS program uses the SAS-supplied Zw.d format.
data _null_;
input int 8.;
display_int = put(int, z4.);
put 'int: ' display_int/
'Original: ' _infile_;
list;
datalines;
99
;;;;
run;
int: 0099
Original: 99
RULE: ----+----1----+----2----+----3----+----4----+----5
13 99
Insert leading plus sign (+).
'{:+3d}'.format(99)
'+99'
Analogous SAS program using PROC FORMAT.
proc format;
picture plussign
0 - 99 = ' 00' (prefix='+');
run;
data _null_;
input int 8.;
put 'int: ' int plussign. /
'Original: ' _infile_;
list;
datalines;
99
;;;;
run;
int: +99
Original: 99
RULE: ----+----1----+----2----+----3----+----4----+----5
15 99
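The integer format codes used in this section can also be combined and applied with the built-in format function. The following is a sketch under that assumption; the values are illustrative.

```python
hex_prefixed = format(99, '#x')       # '#' adds the 0x prefix
octal_prefixed = format(99, '#o')     # '#' adds the 0o prefix
binary_padded = format(99, '08b')     # zero-padded to eight binary digits
signed_grouped = format(1234567, '+,d')   # sign plus thousands separators
negative_signed = format(-99, '+d')   # '+' leaves negative values as they are

print(hex_prefixed, octal_prefixed, binary_padded)
print(signed_grouped, negative_signed)
```

The zero-padded binary value has the same eight digits as the output of the SAS binary8. format shown earlier.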
Formatting Floats
Format specifications control how many digits are displayed to the right of the decimal point.
"precision: {0:.1f} or {0:.4f}".format(3.14159265)
'precision: 3.1 or 3.1416'
For both Python and SAS, the percent format multiplies the number by 100 and places a trailing percent (%) sign at the end.
"6.33 as a Percentage of 150: {0:.2%}".format(6.33/150)
'6.33 as a Percentage of 150: 4.22%'
Analogous SAS program using the PERCENTw.d format.
data _null_;
pct = 6.33 / 150;
put '6.33 as a percentage of 150: ' pct percent8.2;
run;
6.33 as a percentage of 150: 4.22%
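Float format specifications can combine a field width, a thousands separator, and a precision. The sketch below shows a few combinations; the input values are illustrative.

```python
fixed_width = '{:10.2f}'.format(3.14159265)   # width 10, two decimal places
grouped = '{:,.2f}'.format(1234567.891)       # thousands separators
scientific = '{:.3e}'.format(12345.678)       # scientific notation
percent_one = '{:.1%}'.format(0.0422)         # percent with one decimal place

print(repr(fixed_width))
print(grouped, scientific, percent_one)
```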
Formatting Datetimes
Python date, datetime, and time objects support the strftime method, which returns a string representation of the object. Formatting directives passed to strftime control the appearance of that string.
from datetime import datetime, date, time
now = datetime.now()
nl = '\n'
print(nl,
'now: ', now,
nl,
'now type:', type(now))
now: 2018-12-30 16:21:51.880366
now type: <class 'datetime.datetime'>
The following example introduces formatting directives for datetime formatting.
from datetime import datetime, date, time
nl = '\n'
now = datetime.now()
print(nl,
'now: ' , now,
nl ,
'Year: ' , now.strftime("%Y"),
nl ,
'Month: ' , now.strftime("%B"),
nl ,
'Day: ' , now.strftime("%d"),
nl, nl ,
'concat1:' , now.strftime("%A, %B %d, %Y A.D."),
nl,
'datetime:' , now.strftime("%c"))
now: 2018-12-30 16:28:37.072155
Year: 2018
Month: December
Day: 30
concat1: Sunday, December 30, 2018 A.D.
datetime: Sun Dec 30 16:28:37 2018
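Because datetime.now() changes on every run, a fixed datetime makes the directives easier to check. The sketch below reuses the date and time from the output above and also shows strptime, the inverse of strftime, which the original examples do not cover. Only numeric directives are used, since names such as %A and %B depend on the locale.

```python
from datetime import datetime

# A fixed point in time, taken from the output shown above.
fixed = datetime(2018, 12, 30, 16, 28, 37)

iso_date = fixed.strftime('%Y-%m-%d')         # datetime to string
inline = '{:%Y-%m-%d %H:%M}'.format(fixed)    # directives also work inside format()

# strptime parses a string back into a datetime using the same directives.
parsed = datetime.strptime('2018-12-30 16:28:37', '%Y-%m-%d %H:%M:%S')

print(iso_date)
print(inline)
print(parsed == fixed)
```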