
Introduction to regular expressions

Regular expressions are a powerful tool for many kinds of string manipulation. They are a domain-specific language (DSL) that is available as a library in most modern programming languages, not just Python. A DSL is a highly specialized programming language: regular expressions are one example, and SQL, which is mainly used for data manipulation, is another.

They are useful for two main tasks:

1. Verifying that a string matches a pattern (for example, that a string has the format of an email address).

2. Performing substitutions in a string (such as changing all American spellings to British ones).
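
Both tasks can be sketched in a few lines of Python. The helper names looks_like_date and to_british, the date format, and the tiny spelling table are made up for illustration; re.fullmatch and re.sub are functions of the standard re module.

```python
import re

# Hypothetical helper: checks that the whole string looks like YYYY-MM-DD.
def looks_like_date(text):
    return re.fullmatch(r"\d{4}-\d{2}-\d{2}", text) is not None

# Hypothetical, deliberately tiny spelling table (American -> British).
SPELLINGS = {"color": "colour", "center": "centre"}

def to_british(text):
    # \b keeps the pattern from matching inside longer words.
    pattern = r"\b(" + "|".join(SPELLINGS) + r")\b"
    # The replacement is looked up from the word that was matched.
    return re.sub(pattern, lambda m: SPELLINGS[m.group(1)], text)

print(looks_like_date("2024-01-31"))          # True
print(looks_like_date("31/01/2024"))          # False
print(to_british("the color of the center"))  # the colour of the centre
```

A real spelling converter would need a much larger table and some care with capitalization; this only shows the shape of the two tasks.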

Regular expressions in Python can be accessed through the re module, which is part of the standard library.

The re.match function determines whether a pattern matches at the beginning of a string. If it does, match returns a match object; if not, it returns None. To avoid confusion when working with regular expressions, we write patterns as raw strings, r"expression". Raw strings do not treat backslashes as escape characters, which makes patterns easier to write.
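
The difference a raw string makes is easiest to see with \b, which means "word boundary" to the re module but "backspace" inside a normal Python string. The sample text below is an assumption chosen for the demonstration.

```python
import re

# In a normal string "\b" is a single backspace character.
print(len("\b"))    # 1
# In a raw string r"\b" stays as backslash + b (two characters),
# which the re module then reads as a word boundary.
print(len(r"\b"))   # 2

text = "spam spammer"
print(re.findall(r"\bspam\b", text))  # ['spam']
print(re.findall("\bspam\b", text))   # []
```

The second findall finds nothing because, without the r prefix, the pattern looks for literal backspace characters around "spam".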

Other functions

re.search() -> finds a match of a pattern anywhere in the string

re.findall() -> returns a list of all substrings that match a pattern

re.finditer() -> does the same job as re.findall(), except that it returns an iterator of match objects rather than a list of strings

Because re.match only looks at the beginning of the string, it does not find a pattern that first appears later in the string, while re.search does.

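import re

pattern = r"spam"
text = "eggspam sausagespam"

# match only checks the start of the string, which here is "eggs..."
if re.match(pattern, text):
    print("match")
else:
    print("no match")

# search scans the whole string
if re.search(pattern, text):
    print("match")
else:
    print("no match")

print(re.findall(pattern, text))
# finditer returns an iterator, so loop over it to see the matches
for m in re.finditer(pattern, text):
    print(m.span())

"""
no match
match
['spam', 'spam']
(3, 7)
(15, 19)
"""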

 

re.search returns a match object with several methods that give details about the match.

These methods include group, which returns the string matched; start and end, which return the start and end positions of the first match; and span, which returns the start and end positions of the first match as a tuple.

import re
pattern=r"spam"
match=re.search(pattern,"eggspam sausagespam")
if match:
    print(match.group())
    print(match.start())
    print(match.end())
    print(match.span())
else:
    print("no match")
"""
spam
3
7
(3, 7)
"""

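re.search only reports the first match. As a small sketch, the same details can be collected for every occurrence by looping over re.finditer, since each item it yields is a match object. The variable names text and spans are illustrative.

```python
import re

pattern = r"spam"
text = "eggspam sausagespam"

# Each item produced by finditer is a match object, so group() and
# span() are available for every occurrence, not only the first.
spans = [(m.group(), m.span()) for m in re.finditer(pattern, text)]
print(spans)  # [('spam', (3, 7)), ('spam', (15, 19))]
```
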
Search and replace

Syntax: re.sub(pattern, repl, string, count=0)

This method replaces occurrences of pattern in string with repl. All occurrences are replaced unless count is provided, in which case at most count replacements are made. The method returns the modified string.

import re

string = "my name is amrit. i am studying computer engineering"
pattern = r"am studying"
# Replace every occurrence of the pattern and return the new string
new_str = re.sub(pattern, "have completed", string)
print(new_str)

"""
my name is amrit. i have completed computer engineering
"""

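The effect of the count argument can be seen with a short sketch; the sample text is an assumption chosen so that the pattern occurs three times.

```python
import re

text = "spam spam spam"

# The default count=0 replaces every occurrence.
print(re.sub(r"spam", "eggs", text))           # eggs eggs eggs
# count=2 stops after the first two replacements.
print(re.sub(r"spam", "eggs", text, count=2))  # eggs eggs spam
```
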
RE for email validation

/^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/
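
The pattern above is written with JavaScript-style / delimiters. In Python the slashes are dropped and the rest is passed to the re module as a raw string. A minimal sketch, assuming a hypothetical helper named is_valid_email and made-up sample addresses:

```python
import re

# The same pattern as above, without the surrounding / delimiters.
EMAIL_PATTERN = r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$"

# Hypothetical helper name, used only for this illustration.
def is_valid_email(address):
    return re.match(EMAIL_PATTERN, address) is not None

print(is_valid_email("amrit@example.com"))     # True
print(is_valid_email("amrit.example.com"))     # False (no @)
print(is_valid_email("amrit @example.com"))    # False (space not allowed)
```

This only checks the shape of the address. It does not confirm that the domain exists or that the mailbox can receive mail, and it accepts a domain with no dot, such as amrit@localhost.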