4.4.3. Developing a regex
Think of the PATTERN you want to capture in general terms. For example: “I want three-letter words.”
Write
pattern = "\w{3}"
and then try it on a few practice strings. The goal is to BREAK your pattern, find out where it fails, and notice new parts of the pattern you missed.
import re
pattern = "\w{3}"
re.findall(pattern,"hey there guy") # whoops, "the" isn't a 3-letter word
['hey', 'the', 'guy']
# tried but failed:
# "(\w{3}) " <-- a space
# "(\w{3})\b" <-- a word boundary should work! why not?
pattern = r"(\w{3})\b" # trying that raw string notation thing
re.findall(pattern,"hey there guy")
# it made the `\b` work! But the pattern is still failing...
['hey', 'ere', 'guy']
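Why did the raw string matter? Here is a minimal sketch of what is probably going on: in an ordinary Python string literal, "\b" is the backspace character, so the regex engine never sees a word-boundary token at all. The variable names (`text`, `not_raw`, `raw`) are just for illustration.

```python
import re

text = "hey there guy"

# In an ordinary string literal, "\b" is one character: backspace (ASCII 8).
print(len("\b"), ord("\b"))
# In a raw string the backslash survives, so re receives the two characters \ and b.
print(len(r"\b"))

# Not raw: three word characters followed by a literal backspace character.
# (The \w is written as \\w here so the only difference is the \b.)
not_raw = re.findall("(\\w{3})\b", text)

# Raw: three word characters followed by a word boundary.
raw = re.findall(r"(\w{3})\b", text)

print(not_raw)  # []
print(raw)      # ['hey', 'ere', 'guy']
```

The non-raw version finds nothing because the practice string contains no backspace characters. Writing patterns as raw strings by default avoids this kind of surprise.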
pattern = r"\b(\w{3})\b" # make sure the word has a boundary before it
re.findall(pattern,"hey there guy") # got it!
['hey', 'guy']
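The pattern works on this one string, but the goal is still to BREAK it. A small sketch of how you might keep going: loop over a few more practice strings and look at what comes back. The strings in `practice` and the tightened `letters_only` pattern are made up for illustration, and the tightening assumes "letters" means ASCII a-z only.

```python
import re

pattern = r"\b(\w{3})\b"

# Hypothetical practice strings chosen to stress the pattern.
practice = [
    "hey there guy",
    "call 911 now",    # \w matches digits too, not just letters
    "don't you see",   # the apostrophe creates a word boundary inside "don't"
]

# Collect the matches for each string so they are easy to compare.
results = {text: re.findall(pattern, text) for text in practice}
for text, found in results.items():
    print(text, "->", found)

# One possible tightening (an assumption about what "letter" should mean):
# only ASCII letters count, so "911" no longer matches.
letters_only = r"\b[A-Za-z]{3}\b"
stricter = re.findall(letters_only, "call 911 now")
print(stricter)  # ['now']
```

With the original pattern, "call 911 now" returns "911" and "don't you see" returns "don", neither of which is a three-letter word. Each failure like this points at a part of the pattern you still need to spell out.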