Collections Module
https://www.udemy.com/complete-python-bootcamp/learn/lecture/3512826?start=0#questions
The collections module is a built-in module that implements specialized container datatypes, providing alternatives to Python’s general-purpose built-in containers. We’ve already discussed several of the basics: dict, list, set and tuple.
Counter
Counter counts the number of times each item appears in an iterable, such as a list.
Counter is a dict subclass for counting hashable objects. Elements are stored as dictionary keys, and their counts are stored as the values.
from collections import Counter
mylist = [1,1,1,1,2,2,3,3,3,4,4,2,2,5,1,5,2,5,4,6,6,7]
Counter(mylist)
Counter({1: 5, 2: 5, 3: 3, 4: 3, 5: 3, 6: 2, 7: 1})
s = 'Mississippi'
Counter(s)
Counter({'M': 1, 'i': 4, 's': 4, 'p': 2})
Count the word repetitions in a sentence
s = "How many Up times does times show times uP in show this UP sentence"
Counter(s.lower().split())
Counter({'how': 1,
         'many': 1,
         'up': 3,
         'times': 3,
         'does': 1,
         'show': 2,
         'in': 1,
         'this': 1,
         'sentence': 1})
Common methods
s = "How many Up times does times show times uP in show this UP sentence"
words = s.lower().split()
c = Counter(words)
.most_common(n)
c.most_common(2)
[('up', 3), ('times', 3)]
To get the least common elements:
.most_common()[:-n-1:-1]
Hint: call most_common() with no argument, then slice the result with a negative step
c.most_common()[:-2-1:-1]
[('sentence', 1), ('this', 1)]
list()
Shows the unique elements
list(c)
['how', 'many', 'up', 'times', 'does', 'show', 'in', 'this', 'sentence']
sum(c.values())
Total of all counts
sum(c.values())
14
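A few other Counter behaviours are worth knowing alongside the methods above. The snippet below is a minimal sketch (the sample strings and variable names are illustrative, not from the course): it shows that a missing key reports 0 instead of raising KeyError, that update() adds to existing counts, and that two Counters can be added together.

```python
from collections import Counter

c = Counter("mississippi")

# A key that was never counted reports 0 rather than raising KeyError
missing = c["z"]

# update() adds the counts from another iterable to the existing ones
c.update("miss")

# Counters can be added; counts for shared keys are summed
combined = Counter(a=2, b=1) + Counter(a=1, c=4)

print(missing)        # 0
print(c["s"])         # 6
print(combined["a"])  # 3
```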
defaultdict
https://www.udemy.com/complete-python-bootcamp/learn/lecture/3512824#questions
defaultdict is a dictionary-like object that provides all the methods of a regular dict, but takes a callable (default_factory) as its first argument. That callable supplies the value for any missing key. Using defaultdict is simpler, and usually faster, than doing the same thing with the dict.setdefault method.
A defaultdict with a default_factory does not raise a KeyError when you look up d[key]. Any key that does not exist is inserted with the value returned by the default factory.
from collections import defaultdict
d = {}
d['one']
KeyError: 'one'
d = defaultdict(object)
d['one']
for item in d:
    print(item)
one
Assign default values to 0 using lambda
d = defaultdict(lambda: 0)
d['two']
0
d['three'] = 3
d['three']
3
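To make the comparison with dict.setdefault concrete, here is a small sketch that groups words by first letter both ways. The word list and variable names are made up for illustration. It also shows the one case where a defaultdict still raises KeyError: when no default_factory is given.

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# Plain dict: setdefault creates the empty list the first time a key is seen
by_letter_plain = {}
for word in words:
    by_letter_plain.setdefault(word[0], []).append(word)

# defaultdict: the factory (list) is called automatically for missing keys
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

print(dict(by_letter))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# Without a default_factory, a missing key raises KeyError as usual
no_factory = defaultdict()
try:
    no_factory["missing"]
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```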
Ordered Dictionaries OrderedDict
https://www.udemy.com/complete-python-bootcamp/learn/lecture/3779906#questions
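Before Python 3.7, standard dict objects did not guarantee any particular order, as the output below (from an older interpreter) shows. Since Python 3.7, dict preserves insertion order, so the same loop now prints the keys in the order they were added.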
d = {}
d['a']=1
d['b']=2
d['c']=3
d['d']=4
d['e']=5
for k, v in d.items():
    print(k, v)
a 1
b 2
e 5
d 4
c 3
from collections import OrderedDict
d = OrderedDict()
d['a']=1
d['b']=2
d['c']=3
d['d']=4
d['e']=5
for k, v in d.items():
    print(k, v)
a 1
b 2
c 3
d 4
e 5
With OrderedDict, order also matters when testing equality. Two regular dicts with the same items compare equal regardless of order:
d1={'a':1, 'b':2}
d2={'b':2, 'a':1}
d1 == d2
True
With two OrderedDict objects, the same comparison returns False, because the keys were inserted in a different order.
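The following sketch spells out that comparison (the names od1 and od2 are illustrative). It also shows that comparing an OrderedDict with a regular dict ignores order, and that move_to_end() can bring the two OrderedDicts back into agreement.

```python
from collections import OrderedDict

od1 = OrderedDict([("a", 1), ("b", 2)])
od2 = OrderedDict([("b", 2), ("a", 1)])

# Two OrderedDicts: same items, different order, so not equal
different_order = od1 == od2
print(different_order)  # False

# OrderedDict compared with a regular dict: order is ignored
versus_dict = od1 == {"b": 2, "a": 1}
print(versus_dict)  # True

# Moving 'a' to the end makes od1's order b, a, matching od2
od1.move_to_end("a")
after_move = od1 == od2
print(after_move)  # True
```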
namedtuple
https://www.udemy.com/complete-python-bootcamp/learn/lecture/3512830#questions
A namedtuple is a lightweight alternative to writing a small class: each value can be accessed by field name or by index.
The field names are passed as a single string, separated by spaces.
from collections import namedtuple
Dog = namedtuple('Dog','age breed name')
sam = Dog(age=2, breed='Lab', name='Sammy')
print(sam.breed)
print(sam[0])
Lab
2
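Because a namedtuple is still a tuple, it unpacks like one and cannot be modified in place. The sketch below reuses the Dog example and adds a few standard helpers (_replace and _asdict); the variable names are only for illustration.

```python
from collections import namedtuple

Dog = namedtuple("Dog", "age breed name")
sam = Dog(age=2, breed="Lab", name="Sammy")

# Unpacks like a regular tuple
age, breed, name = sam

# Fields are read-only; _replace returns a modified copy instead
older = sam._replace(age=3)

try:
    sam.age = 5
    immutable = False
except AttributeError:
    immutable = True

# _asdict() converts the fields to a mapping
as_dict = dict(sam._asdict())

print(older)      # Dog(age=3, breed='Lab', name='Sammy')
print(immutable)  # True
print(as_dict)    # {'age': 2, 'breed': 'Lab', 'name': 'Sammy'}
```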
Datetime
https://www.udemy.com/complete-python-bootcamp/learn/lecture/3547908#questions
datetime.time
import datetime
# datetime.time(hour, minute, second, microsecond)
t = datetime.time(5, 25, 1)
print(t.hour)
print(t.minute)
print(t.second)
print(t.microsecond)
print(t.resolution)
5
25
1
0
0:00:00.000001
datetime.date
today = datetime.date.today()
print(today)
2019-06-17
today.timetuple()
time.struct_time(tm_year=2019, tm_mon=6, tm_mday=17, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=168, tm_isdst=-1)
d1 = datetime.date(2019, 6, 17)
d2 = d1.replace(year=2018)
print(d2)
2018-06-17
Math on Dates
lastyear = today.replace(year=2018)
print(today - lastyear)
365 days, 0:00:00
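Subtracting two dates gives a datetime.timedelta, which can also be added back to a date. The sketch below uses fixed dates (rather than today()) so the results are reproducible; the variable names are illustrative.

```python
import datetime

start = datetime.date(2018, 6, 17)
end = datetime.date(2019, 6, 17)

# Subtracting two dates yields a timedelta
delta = end - start
print(delta.days)  # 365

# A timedelta can be added to a date
one_week_later = end + datetime.timedelta(weeks=1)
print(one_week_later)  # 2019-06-24

# The same arithmetic works for full datetimes
dt1 = datetime.datetime(2019, 6, 17, 12, 0)
dt2 = datetime.datetime(2019, 6, 17, 8, 30)
gap = dt1 - dt2
print(gap)                  # 3:30:00
print(gap.total_seconds())  # 12600.0
```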
Python Debugger pdb
https://www.udemy.com/complete-python-bootcamp/learn/lecture/3547912#questions
pdb is useful for stepping through your code and inspecting variables at the point where an error occurs.
import pdb
x = [1,3,5]
y = 4
z = 7
result = y + z
print(result)
result2 = x + y
print(result2)
11
TypeError: can only concatenate list (not "int") to list
Modify your code to stop just before the line that fails, then inspect the variables.
To quit the debugger, type q at the (Pdb) prompt.
import pdb
x = [1,3,5]
y = 4
z = 7
result = y + z
print(result)
pdb.set_trace()
result2 = x + y
print(result2)
11
(Pdb) x
[1, 3, 5]
Timing your Code
https://www.udemy.com/complete-python-bootcamp/learn/lecture/3547914#questions
This is useful for timing small sections of code when optimizing. Running these small snippets many times gives more reliable results than a single run.
The command is passed to the function as a string.
import timeit
# How long does it take to create the string: '0-1-2-3...99'
timeit.timeit('"-".join(str(n) for n in range(100))', number = 10000)
0.6442793710000387
# as a list comprehension
timeit.timeit('"-".join([str(n) for n in range(100)])', number=10000)
0.5642282049999494
# as a map function
timeit.timeit('"-".join(map(str, range(100)))', number=10000)
0.3922121389999802
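timeit.timeit also accepts a callable instead of a string, and timeit.repeat runs the whole measurement several times. The sketch below is one way to use both; the function name join_with_map and the loop counts are arbitrary choices for illustration, and the timings will vary by machine.

```python
import timeit

def join_with_map():
    # Same expression as the map() example, wrapped in a function
    return "-".join(map(str, range(100)))

# A callable can be passed instead of a code string
total = timeit.timeit(join_with_map, number=1000)

# repeat() returns one total per run; the minimum is usually the least noisy
runs = timeit.repeat(join_with_map, number=1000, repeat=3)
best = min(runs)

print(f"{best / 1000 * 1e6:.2f} microseconds per call")
```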
To use Jupyter Notebook’s built-in magic function:
%timeit "-".join(str(n) for n in range(100))
61.9 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit "-".join(map(str, range(100)))
39.4 µs ± 1.28 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Regular Expressions – re
https://www.udemy.com/complete-python-bootcamp/learn/lecture/3547904#questions
These are a common interview question!
Regular expressions are text-matching patterns used for
- finding repetition
- matching patterns
Basic search
import re
re.search('hello','hello world')
<re.Match object; span=(0, 5), match='hello'>
Notice this is a ‘Match’ object
import re
patterns = ['term1', 'term2']
text = 'This is a string with term2, but not the other term'
for pattern in patterns:
    print(f"Searching for {pattern} in:\n{text}")
    if re.search(pattern, text):
        print("\nMatch found!\n")
    else:
        print("\nNo match found.\n")
Searching for term1 in:
This is a string with term2, but not the other term
No match found.
Searching for term2 in:
This is a string with term2, but not the other term
Match found!
Object type
No match: re.search returns None (NoneType)
Match: re.search returns an re.Match object
match = re.search(patterns[0], text)
type(match)
NoneType
match = re.search(patterns[1], text)
type(match)
re.Match
Methods on Match
match.start()
22
match.end()
27
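The start and end positions can be used to slice the matched text back out of the original string. A minimal sketch using the same text as above; span() and group() are standard Match methods, while the variable names are illustrative.

```python
import re

text = "This is a string with term2, but not the other term"
match = re.search("term2", text)

# re.search returns None when nothing matches, so check before using it
if match:
    span = match.span()      # (start, end) as a tuple
    matched = match.group()  # the text that matched
    sliced = text[match.start():match.end()]
    print(span)     # (22, 27)
    print(matched)  # term2
    print(sliced)   # term2
```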
Splitting with Regular Expressions
splitterm = '@'
phrase = "Is your email address hello@gmail.com?"
re.split(splitterm, phrase)
['Is your email address hello', 'gmail.com?']
findall
re.findall('match', 'Here is one match and here is another match')
['match', 'match']
Using Meta Characters
def multi_re_find(patterns, phrase):
    '''
    Takes in a list of regex patterns
    Prints a list of all matches
    '''
    for pattern in patterns:
        print('Searching the phrase using the re check: %r' % (pattern))
        print(re.findall(pattern, phrase))
        print('\n')
Repetition Syntax
There are five ways to express repetition in a pattern:
- A pattern followed by the meta-character * is repeated zero or more times.
- Replace the * with + and the pattern must appear at least once.
- Using ? means the pattern appears zero or one time.
- For a specific number of occurrences, use {m} after the pattern, where m is replaced with the number of times the pattern should repeat.
- Use {m,n}, where m is the minimum number of repetitions and n is the maximum. Leaving out n, as in {m,}, means the pattern appears at least m times, with no maximum.
test_phrase = 'sdsd..sssddd...sdddsddd...dsds...dsssss...sdddd'
test_patterns = [ 'sd*', # s followed by zero or more d's
'sd+', # s followed by one or more d's
'sd?', # s followed by zero or one d's
'sd{3}', # s followed by three d's
'sd{2,3}', # s followed by two to three d's
]
multi_re_find(test_patterns,test_phrase)
Searching the phrase using the re check: 'sd*'
['sd', 'sd', 's', 's', 'sddd', 'sddd', 'sddd', 'sd', 's', 's', 's', 's', 's', 's', 'sdddd']
Searching the phrase using the re check: 'sd+'
['sd', 'sd', 'sddd', 'sddd', 'sddd', 'sd', 'sdddd']
Searching the phrase using the re check: 'sd?'
['sd', 'sd', 's', 's', 'sd', 'sd', 'sd', 'sd', 's', 's', 's', 's', 's', 's', 'sd']
Searching the phrase using the re check: 'sd{3}'
['sddd', 'sddd', 'sddd', 'sddd']
Searching the phrase using the re check: 'sd{2,3}'
['sddd', 'sddd', 'sddd', 'sddd']
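The open-ended form {m,} is described in the list above but not exercised by the test patterns. As a small sketch on the same test phrase, the snippet below tries 'sd{2,}' and, for contrast, 'sd{4}'; these two extra patterns are additions for illustration.

```python
import re

test_phrase = 'sdsd..sssddd...sdddsddd...dsds...dsssss...sdddd'

# s followed by two or more d's, with no upper limit
at_least_two = re.findall(r'sd{2,}', test_phrase)
print(at_least_two)  # ['sddd', 'sddd', 'sddd', 'sdddd']

# s followed by exactly four d's
exactly_four = re.findall(r'sd{4}', test_phrase)
print(exactly_four)  # ['sdddd']
```

Compared with 'sd{2,3}', the last match is now 'sdddd' rather than 'sddd', because the repetition is no longer capped at three.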
Character Sets
(Think of a list without commas) [ab] = ‘a’ or ‘b’
Character sets are used when you wish to match any one of a group of characters at a point in the input. Brackets are used to construct character set inputs. For example: the input [ab] searches for occurrences of either a or b. Let’s see some examples:
test_phrase = 'sdsd..sssddd...sdddsddd...dsds...dsssss...sdddd'
test_patterns = ['[sd]', # either s or d
's[sd]+'] # s followed by one or more s or d
multi_re_find(test_patterns,test_phrase)
Searching the phrase using the re check: '[sd]'
['s', 'd', 's', 'd', 's', 's', 's', 'd', 'd', 'd', 's', 'd', 'd', 'd', 's', 'd', 'd', 'd', 'd', 's', 'd', 's', 'd', 's', 's', 's', 's', 's', 's', 'd', 'd', 'd', 'd']
Searching the phrase using the re check: 's[sd]+'
['sdsd', 'sssddd', 'sdddsddd', 'sds', 'sssss', 'sdddd']
Exclusion
We can use ^ to exclude terms by incorporating it into the bracket syntax notation. For example: [^…] will match any single character not in the brackets. Let’s see some examples:
test_phrase = 'This is a string! But it has punctuation. How can we remove it?'
re.findall('[^!.? ]+',test_phrase)
['This', 'is', 'a', 'string', 'But', 'it', 'has', 'punctuation', 'How', 'can', 'we', 'remove', 'it']
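The findall call above returns the words as a list. If the goal is a single string with the punctuation stripped, re.sub with a character set is one option. This is a sketch of an alternative, not the approach used in the course:

```python
import re

test_phrase = 'This is a string! But it has punctuation. How can we remove it?'

# Replace every '!', '.' or '?' with an empty string
cleaned = re.sub(r'[!.?]', '', test_phrase)
print(cleaned)
# This is a string But it has punctuation How can we remove it
```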
Character Ranges
As character sets grow larger, typing every character that should (or should not) match could become very tedious. A more compact format using character ranges lets you define a character set to include all of the contiguous characters between a start and stop point. The format used is [start-end].
Common use cases are to search for a specific range of letters in the alphabet. For instance, [a-f] would return matches with any occurrence of letters between a and f.
Let’s walk through some examples:
test_phrase = 'This is an example sentence. Lets see if we can find some letters.'
test_patterns=['[a-z]+', # sequences of lower case letters
'[A-Z]+', # sequences of upper case letters
'[a-zA-Z]+', # sequences of lower or upper case letters
'[A-Z][a-z]+'] # one upper case letter followed by lower case letters
multi_re_find(test_patterns,test_phrase)
Searching the phrase using the re check: '[a-z]+'
['his', 'is', 'an', 'example', 'sentence', 'ets', 'see', 'if', 'we', 'can', 'find', 'some', 'letters']
Searching the phrase using the re check: '[A-Z]+'
['T', 'L']
Searching the phrase using the re check: '[a-zA-Z]+'
['This', 'is', 'an', 'example', 'sentence', 'Lets', 'see', 'if', 'we', 'can', 'find', 'some', 'letters']
Searching the phrase using the re check: '[A-Z][a-z]+'
['This', 'Lets']
Escape Codes
You can use special escape codes to find specific types of patterns in your data, such as digits, non-digits, whitespace, and more. For example:
| Code | Meaning |
|---|---|
| \d | a digit |
| \D | a non-digit |
| \s | whitespace (tab, space, newline, etc.) |
| \S | non-whitespace |
| \w | alphanumeric (letters, digits, and underscore) |
| \W | non-alphanumeric |
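Escapes are indicated by prefixing the character with a backslash. A backslash must itself be escaped in normal Python strings, which makes expressions harder to read. Using raw strings, created by prefixing the literal with r, eliminates this problem and maintains readability.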
Personally, I think this use of r to escape a backslash is probably one of the things that block someone who is not familiar with regex in Python from being able to read regex code at first. Hopefully after seeing these examples this syntax will become clear.
test_phrase = 'This is a string with some numbers 1233 and a symbol #hashtag'
test_patterns=[ r'\d+', # sequence of digits
r'\D+', # sequence of non-digits
r'\s+', # sequence of whitespace
r'\S+', # sequence of non-whitespace
r'\w+', # alphanumeric characters
r'\W+', # non-alphanumeric
]
multi_re_find(test_patterns,test_phrase)
Searching the phrase using the re check: '\\d+'
['1233']
Searching the phrase using the re check: '\\D+'
['This is a string with some numbers ', ' and a symbol #hashtag']
Searching the phrase using the re check: '\\s+'
[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']
Searching the phrase using the re check: '\\S+'
['This', 'is', 'a', 'string', 'with', 'some', 'numbers', '1233', 'and', 'a', 'symbol', '#hashtag']
Searching the phrase using the re check: '\\w+'
['This', 'is', 'a', 'string', 'with', 'some', 'numbers', '1233', 'and', 'a', 'symbol', 'hashtag']
Searching the phrase using the re check: '\\W+'
[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' #']
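The doubled backslashes in the output above come from %r, which prints the pattern as a normal (non-raw) string literal. The short sketch below shows that the two spellings are the same string, and why the r prefix matters for an escape such as \b. The \b example and the sample text are additions for illustration.

```python
import re

# A raw string and its escaped equivalent are the same string
same = r'\d+' == '\\d+'
print(same)  # True

# Without r, Python interprets some escapes itself: '\n' is a single newline
print(len(r'\n'), len('\n'))  # 2 1

# r'\b' reaches the regex engine as a word-boundary escape
raw_result = re.findall(r'\bterm\b', 'term terms term2')
print(raw_result)  # ['term']

# '\b' without r is a backspace character, so nothing matches
plain_result = re.findall('\bterm\b', 'term terms term2')
print(plain_result)  # []
```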
StringIO
https://www.udemy.com/complete-python-bootcamp/learn/lecture/3547906#questions
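StringIO implements an in-memory file-like object. In Python 3 it lives in the io module (io.StringIO); the standalone StringIO module exists only in Python 2. This object can then be used as input or output for most functions that expect a standard file object.
import io
message = "This is just a normal string."
f = io.StringIO(message)
f.read()
# example text; any string can be written
f.write(' Second line written to the file-like object')
f.seek(0)
f.read()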
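As a sketch of using such an object where a file is expected, the snippet below hands an io.StringIO buffer to the standard csv module, first as the output target and then as the input source. The row data and variable names are made up for illustration.

```python
import csv
import io

# Use the in-memory buffer where csv.writer expects a file opened for writing
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "age"])
writer.writerow(["Sammy", 2])

# getvalue() returns everything written so far, regardless of position
contents = buffer.getvalue()

# Rewind, then use the same buffer where csv.reader expects a file to read
buffer.seek(0)
rows = list(csv.reader(buffer))

print(rows)  # [['name', 'age'], ['Sammy', '2']]
```

Note that the age comes back as the string '2': csv.reader always returns text.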