Regular Expressions in Python – Pattern Matching & Data Extraction
Regular expressions (regex) let hackers and developers search, match, and manipulate strings based on patterns. Python’s built-in re module is a practical tool for CTFs, log analysis, web scraping, and custom data filters.
Getting Started with the re Module
import re

# Basic match: re.search() scans the whole string and returns a Match object or None
pattern = r"hacker"
text = "python for hackers"
match = re.search(pattern, text)
if match:
    print("Match found!")
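The Match object returned by re.search() carries more than a yes/no answer. The sketch below uses an illustrative pattern and sample strings to show how to read the matched text, a capture group, and the match position, and how a flag such as re.IGNORECASE changes matching:

```python
import re

text = "python for hackers"

# re.search() returns a Match object on success, or None when nothing matches
match = re.search(r"hack(er)s?", text)
if match:
    print(match.group())   # full match: hackers
    print(match.group(1))  # first capture group: er
    print(match.span())    # (start, end) indices: (11, 18)

# re.IGNORECASE makes the comparison case-insensitive
print(re.search(r"hacker", "PYTHON FOR HACKERS", re.IGNORECASE) is not None)  # True

# No match gives None, which is why the `if match:` check matters
print(re.search(r"cracker", text))  # None
```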
Common Regex Functions
- re.search() – returns the first match anywhere in the string (or None)
- re.findall() – returns all matching substrings as a list
- re.match() – checks for a match only at the beginning of the string
- re.sub() – replaces matched substrings and returns the new string
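To see how these four functions differ on the same input, here is a small comparison. The passwd-style string is made up for the example:

```python
import re

text = "root:x:0:0 admin:x:1000:1000"

print(re.search(r"\d+", text).group())  # '0' (first match anywhere)
print(re.match(r"\d+", text))           # None (text does not start with a digit)
print(re.match(r"root", text).group())  # 'root' (match anchored at the start)
print(re.findall(r"\d+", text))         # ['0', '0', '1000', '1000']
print(re.sub(r"\d+", "N", text))        # 'root:x:N:N admin:x:N:N'
```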
Extracting All Matches
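text = "Email me at vaibhav@example.com or test123@hackmail.com"
# One or more word characters, dots, or hyphens on each side of "@"
pattern = r"[\w.-]+@[\w.-]+"
emails = re.findall(pattern, text)
print(emails)  # ['vaibhav@example.com', 'test123@hackmail.com']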
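The pattern above is deliberately loose: because the domain part accepts dots anywhere, a sentence-ending period gets captured too. One possible tightening requires the domain to be dot-separated labels. This is an assumption for illustration, not a full RFC 5322 email validator. re.finditer() is also shown, since it yields Match objects with positions instead of plain strings:

```python
import re

text = "Contact vaibhav@example.com. Backup: test123@hackmail.com"

loose = r"[\w.-]+@[\w.-]+"
# Hypothetical tighter variant: the domain must be labels joined by single dots,
# so a trailing period after the address is not swallowed
strict = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"

print(re.findall(loose, text))   # ['vaibhav@example.com.', 'test123@hackmail.com']
print(re.findall(strict, text))  # ['vaibhav@example.com', 'test123@hackmail.com']

# re.finditer() yields Match objects, so each address's position is available too
for m in re.finditer(strict, text):
    print(m.start(), m.group())
```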
Replacing Sensitive Data
log = "User: root, Password: 123456"
# Match the label plus the word characters that follow it
pattern = r"Password: \w+"
clean_log = re.sub(pattern, "Password: ****", log)
print(clean_log)  # User: root, Password: ****
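Because \w+ stops at the first non-word character, a password such as p@ss!w0rd would only be partly masked (the result would be Password: ****@ss!w0rd). A slightly more robust sketch captures the key in a group and re-inserts it with the \1 backreference, masking everything up to the next whitespace or comma. The key names in the alternation and the redact() helper are hypothetical; adjust them to your own log format:

```python
import re

# Hypothetical key names; extend the alternation to whatever your logs contain
SECRET = re.compile(r"((?:password|passwd|token)\s*[:=]\s*)[^\s,]+", re.IGNORECASE)


def redact(line):
    # \1 re-inserts the captured key and separator; only the value is masked
    return SECRET.sub(r"\1****", line)


print(redact("User: root, Password: p@ss!w0rd"))  # User: root, Password: ****
print(redact("password=hunter2, token=abc123"))   # password=****, token=****
```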
Regex Meta-Characters
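- . – any character except a newline
- \d – digit
- \w – word character (letters, digits, and _; with re.ASCII, only a-z, A-Z, 0-9, _)
- \s – whitespace
- + – one or more
- * – zero or more
- ? – optional (zero or one)
- [] – character set
- () – capture group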
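A quick way to internalize these meta-characters is to run each one against a tiny string. The sample inputs below are arbitrary. Note that with more than one capture group, re.findall() returns tuples:

```python
import re

# . matches any character except a newline (unless re.DOTALL is set)
dots = re.findall(r"h.t", "hat hit h\nt")                 # ['hat', 'hit']
dots_all = re.findall(r"h.t", "hat hit h\nt", re.DOTALL)  # ['hat', 'hit', 'h\nt']

digits = re.findall(r"\d+", "port 22, port 8080")     # ['22', '8080']
words = re.findall(r"\w+", "user_1, admin!")          # ['user_1', 'admin']
fields = re.split(r"\s+", "root   x\t0")              # ['root', 'x', '0']
stars = re.findall(r"ab*", "a ab abbb")               # ['a', 'ab', 'abbb']
optional = re.findall(r"colou?r", "color colour")     # ['color', 'colour']
vowels = re.findall(r"[aeiou]", "regex")              # ['e', 'e']
pairs = re.findall(r"(\w+)=(\d+)", "uid=0 gid=1000")  # [('uid', '0'), ('gid', '1000')]

for label, result in [("dots", dots), ("dots_all", dots_all), ("digits", digits),
                      ("words", words), ("fields", fields), ("stars", stars),
                      ("optional", optional), ("vowels", vowels), ("pairs", pairs)]:
    print(label, result)
```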
Capture Groups Example
text = "IP: 192.168.1.1"
# The parentheses form group 1, which holds just the address
pattern = r"IP: (\d+\.\d+\.\d+\.\d+)"
match = re.search(pattern, text)
if match:
    print("Captured IP:", match.group(1))  # Captured IP: 192.168.1.1
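Since \d+ accepts any run of digits, the pattern above would also capture something like 999.10.10.10. One way to handle this, sketched here as an assumption rather than the only approach, is to use a named group for readability and then let the standard-library ipaddress module reject invalid addresses. The function name extract_valid_ips is hypothetical:

```python
import ipaddress
import re

# Named group "ip"; each octet is 1-3 digits, and \b keeps matches from starting mid-number
IP_PATTERN = re.compile(r"\b(?P<ip>(?:\d{1,3}\.){3}\d{1,3})\b")


def extract_valid_ips(text):
    """Return IPv4 addresses in text, skipping strings like 999.10.10.10 that only look valid."""
    valid = []
    for m in IP_PATTERN.finditer(text):
        candidate = m.group("ip")
        try:
            ipaddress.IPv4Address(candidate)  # raises AddressValueError if an octet is > 255
        except ipaddress.AddressValueError:
            continue
        valid.append(candidate)
    return valid


print(extract_valid_ips("IP: 192.168.1.1, bogus: 999.10.10.10, gw: 10.0.0.254"))
# ['192.168.1.1', '10.0.0.254']
```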
Use Cases for Hackers
- Log parsing for credentials/IPs
- Extracting tokens or secrets from web content
- Matching responses in brute-force automation to spot leaked credentials
- Scraping specific patterns (emails, phone numbers, IPs)
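As an illustration of the first use case, the sketch below counts failed SSH password attempts per source IP. The log lines are made-up samples modeled on OpenSSH auth-log messages such as "Failed password for root from <ip> port <port> ssh2"; real formats vary by system, so treat the pattern as a starting point. failed_logins_by_ip is a hypothetical helper name:

```python
import re
from collections import Counter

# Hypothetical sample lines in the style of an OpenSSH auth log
LOG = """\
Jan 10 10:01:01 srv sshd[101]: Failed password for root from 203.0.113.5 port 52100 ssh2
Jan 10 10:01:05 srv sshd[102]: Failed password for invalid user admin from 203.0.113.5 port 52101 ssh2
Jan 10 10:02:11 srv sshd[103]: Accepted password for alice from 198.51.100.7 port 40022 ssh2
Jan 10 10:03:42 srv sshd[104]: Failed password for root from 192.0.2.44 port 61000 ssh2
"""

# Optional "invalid user " prefix, then named groups for the username and source IP
FAILED = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) from (?P<ip>\d{1,3}(?:\.\d{1,3}){3})"
)


def failed_logins_by_ip(log_text):
    """Count failed SSH password attempts per source IP."""
    return Counter(m.group("ip") for m in FAILED.finditer(log_text))


counts = failed_logins_by_ip(LOG)
print(counts.most_common())  # [('203.0.113.5', 2), ('192.0.2.44', 1)]
```

Counter.most_common() orders results by count, so the noisiest sources come first.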
Conclusion
Regex is an essential skill in any hacker’s toolbox. Mastering the re module helps you automate the detection, extraction, and redaction of data, making your scripts more powerful and efficient. Keep experimenting and building!