A regular expression is a pattern, written in a specialized syntax, that describes a set of strings, characters, and symbols. As in most other programming languages, regular expressions are supported in Python: the built-in 're' module lets you search and process any amount of text for strings that match a pattern. If you need to process a large amount of text according to some conditions, regular expressions are often a good fit. There are several common reasons to use them in your programs.
Why Regular Expressions?
- Search and Replace: It's easy to extract specific strings from large documents. You can use regular expressions as a search-and-replace feature to correct recurring errors or to add and remove strings in a document. You can also use them purely to search for occurrences of a string.
- Splitting Strings: You can split a string wherever a character, symbol, or pattern is found. There are many ways to split a document based on regex matches.
- Validation: You can check that text meets certain requirements. This is useful when you're testing input against web standards or company-specific standards in your code.
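As a minimal sketch of the validation use case, the snippet below checks strings against a made-up rule. The ID format and the names ID_PATTERN and is_valid_id are assumptions for illustration only; fullmatch() is used so that the whole string, not just a part of it, has to fit the pattern.

```python
import re

# Hypothetical rule for illustration: two uppercase letters, a hyphen, four digits.
ID_PATTERN = re.compile(r"[A-Z]{2}-\d{4}")

def is_valid_id(text):
    """Return True only if the whole string matches the pattern."""
    return ID_PATTERN.fullmatch(text) is not None

print(is_valid_id("AB-1234"))   # True
print(is_valid_id("AB-12345"))  # False: one digit too many
print(is_valid_id("ab-1234"))   # False: lowercase letters
```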
Let's look at some examples so that you feel comfortable writing your own regular expressions. The examples use one simple string and apply regex to replace, split, count occurrences, search, and match.
Replace
To replace a particular word in a statement or block of text, use the sub() method. It substitutes new text wherever the pattern matches. Its commonly used arguments are sub(pattern, repl, string, count): the first three are required, and the optional count limits how many matches are replaced (the default, 0, replaces all of them).
Check this example:
import re
st = "big black fox jumps over lazy blue fox"
# Replace only the first 'fox' (count=1)
st_new = re.sub('fox', 'beaver', 'big black fox jumps over lazy blue fox', count=1)
print(st_new)
You can also write this code using the variable st instead of repeating the string. If you have not stored the string in a variable, you have to pass the literal directly to the method.
import re
st = "big black fox jumps over lazy blue fox"
st_new = re.sub('fox', 'beaver', st, count=1)
print(st_new)
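To see what the count argument changes, here is a small sketch that runs the same substitution with and without it. The variable names are illustrative; subn() is the standard variant of sub() that also reports how many replacements were made.

```python
import re

st = "big black fox jumps over lazy blue fox"

first_only = re.sub("fox", "beaver", st, count=1)  # replace the first match only
all_matches = re.sub("fox", "beaver", st)          # count defaults to 0: replace all
new_text, n = re.subn("fox", "beaver", st)         # also returns the number of replacements

print(first_only)   # big black beaver jumps over lazy blue fox
print(all_matches)  # big black beaver jumps over lazy blue beaver
print(n)            # 2
```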
Split Strings
You can split a block of text at every occurrence of a particular string or pattern using the split() method. It takes three arguments, split(pattern, string, maxsplit), where the optional maxsplit limits the number of splits. For example,
import re
st = "big black fox jumps over lazy blue fox"
# Split at 'fox', at most twice
parts = re.split('fox', st, maxsplit=2)
print(parts)
Here, the word 'fox' appears twice in the string and maxsplit is set to 2, so the string is split at both occurrences. If the word appeared more than twice, you would need a higher maxsplit, or you could leave maxsplit at its default of 0 to split at every occurrence.
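The effect of maxsplit is easier to see side by side. The sketch below reuses the same sentence; the variable names and the whitespace pattern in the last line are only illustrative.

```python
import re

st = "big black fox jumps over lazy blue fox"

every = re.split("fox", st)               # no limit: split at every match
once = re.split("fox", st, maxsplit=1)    # stop after the first split
words = re.split(r"\s+", st, maxsplit=2)  # split on whitespace, at most twice

print(every)  # ['big black ', ' jumps over lazy blue ', '']
print(once)   # ['big black ', ' jumps over lazy blue fox']
print(words)  # ['big', 'black', 'fox jumps over lazy blue fox']
```

Note the empty string at the end of the first result: the sentence ends with a match, so nothing is left after the last split.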
Occurrence of Word in Strings
As you can see in the previous example, a word can appear more than once in a string. You can find every occurrence of a word with findall() and count the results. The next example does this for the word "wolf".
import re
st = "big black wolf jumps over the lazy blue wolf"
# Membership test, then the list of every match
print('wolf' in st, re.findall('wolf', st))
This prints True, because 'wolf' occurs in the string, followed by the list of matches; the length of that list, len(re.findall('wolf', st)), is the number of occurrences. If the word can also appear as part of a longer word, such as wolfstream or wolfram, wrap the pattern in the word-boundary marker r'\b' so that only the exact word matches.
import re
st = "big black wolf jumps over the lazy wolf but there is no wolfpack here to see"
# \b matches only at word boundaries, so 'wolfpack' is skipped
print('wolf' in st, re.findall(r'\bwolf\b', st))
If you look at the output of the preceding program, you'll notice that the list contains only two matches: 'wolfpack' is not counted.
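The counting idea can be wrapped in a small function. This is only a sketch: count_word is a hypothetical helper, not part of the re module, and it uses re.escape() so that a word containing special regex characters is still treated literally.

```python
import re

def count_word(word, text):
    """Count whole-word occurrences of word in text (hypothetical helper)."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text))

st = "big black wolf jumps over the lazy wolf but there is no wolfpack here to see"

print(len(re.findall("wolf", st)))  # 3: the plain pattern also matches inside 'wolfpack'
print(count_word("wolf", st))       # 2: whole words only
```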
search() and match() Methods
The search() method scans the whole string for the pattern and returns a match object for the first place it matches, or None if there is no match. The match() method checks only at the beginning of the string, so it returns None when the string does not start with the queried word, even if the word appears later.
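import re
st = "big black wolf jumps over the lazy wolf but there is no wolfpack here to see"
# search() finds 'wolf' anywhere in the string and prints a match object
print(re.search('wolf', st))
# match() looks only at the start of the string, so this prints None
print(re.match('wolf', st))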
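A match object is more useful once you read values from it. The following sketch, with illustrative variable names, pulls out the matched text and its position, and shows match() succeeding when the pattern really is at the start of the string.

```python
import re

st = "big black wolf jumps over the lazy wolf but there is no wolfpack here to see"

found = re.search("wolf", st)
if found:  # search() returns None when nothing matches, so check first
    print(found.group(), found.start(), found.end())  # wolf 10 14

at_start = re.match("wolf", st)  # the string starts with 'big', not 'wolf'
print(at_start)                  # None

print(re.match("big", st).group())  # big
```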
Possible Errors and Solutions
While writing programs that use regular expressions, you will run into some errors. Make sure you read the error message printed by the Python interpreter. To save you some time, here are a few common ones.
- Method and argument errors: Some methods in re have required arguments and fail if any of them is missing. Other arguments are optional, and the code runs without them. Check each method carefully to see which arguments it requires.
- Failed to import re: If you forget to import the re module, every regex method called in your program will raise an error.
- Comma and semicolon: Check that you're separating arguments with commas, not semicolons. This mistake is common if you're migrating from another language.
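As an illustration of the first kind of error, the sketch below leaves out a required argument and catches the resulting TypeError. The wrapper safe_sub is a hypothetical name used only for this demonstration; in real code you would normally just fix the call.

```python
import re

def safe_sub(*args):
    """Hypothetical wrapper: call re.sub and report argument errors instead of crashing."""
    try:
        return re.sub(*args)
    except TypeError as err:
        return "TypeError: " + str(err)

# The string argument is missing, so Python raises a TypeError.
print(safe_sub("fox", "beaver"))

# With all three required arguments the call succeeds.
print(safe_sub("fox", "beaver", "big black fox"))  # big black beaver
```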
This completes our simple tutorial on regular expressions in Python. I hope it helped you understand how to use regex in Python. Many topics are not covered here, and I encourage you to explore them on your own and share what you learn with others. Regular expressions can save you a lot of time, so I suggest you learn them and use them in your code. If you have any questions or suggestions to improve this tutorial, feel free to send a tweet to @maheshkale.