What is a RegEx to find phone numbers in Python?

I am trying to make a regex in Python to detect 7-digit numbers and update contacts from a .vcf file. It then changes each number into an 8-digit number (just adding 5 in front of it). The problem is that the regex does not work.
I am getting the error message "EOL while scanning string literal".
regex=re.compile(r'^(25|29|42[1-3]|42[8-9]|44|47[1-9]|49|7[0-9]|82|85|86|871|87[5-8]|9[0-8])/I s/^/5/')
#Open file for scanning
f = open("sample.vcf")
#scan each line in file
for line in f:
    #find all results corresponding to regex and store in pattern
    pattern=regex.findall(line)
    #isolate results
    for word in pattern:
        print word
        count = count+1 #display number of occurrences
        wordprefix = '5{}'.format(word)
        s=open("sample.vcf").read()
        s=s.replace(word,wordprefix)
        f=open("sample.vcf",'w')
        print wordprefix
        f.write(s)
        f.close()
I suspect my regex is not in the correct format for detecting this pattern: numbers whose first 2 digits follow a particular format (like 25x and 29x), followed by 5 digits that can be anything (7 digits in total).
Can anyone help me with the correct format to use in this case?

/I is not how you give modifiers to a regex in Python, and neither do you do substitution with s///.
You should use re.sub() for substitution, and pass the modifier re.I as the second argument to re.compile:
reg = re.compile(regexPattern, re.I)
And then for a string s, the substitution would look like:
re.sub(reg, replacement, s)
Beyond that, your regex looks odd to me. If you want to match 7-digit numbers starting with 25 or 29, you should use:
r'(2[59][0-9]{5})'
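For the replacement, use r"5\1" (a raw string: in a normal string "\1" is the control character \x01, not a backreference). In all, for a string s, your code would look like:
import re
reg = re.compile(r'(2[59][0-9]{5})', re.I)
new_s = re.sub(reg, r"5\1", s)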
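For the original task of rewriting the .vcf file, here is a minimal sketch built on the pattern above. It assumes each number stands on its own, so \b word boundaries are added to avoid touching 7-digit runs inside longer numbers; the helper names add_prefix and update_vcf and the sample TEL lines are hypothetical. Unlike the loop in the question, it reads the file once, substitutes every match, and writes once.

```python
import re

# Pattern from the answer, with \b word boundaries added (an assumption)
# so that digits inside longer numbers are left alone.
SEVEN_DIGIT = re.compile(r'\b(2[59][0-9]{5})\b')

def add_prefix(text):
    """Return text with every matching 7-digit number prefixed by 5."""
    # r"5\1" is a raw string, so \1 is a backreference to group 1
    return SEVEN_DIGIT.sub(r"5\1", text)

def update_vcf(path):
    """Hypothetical helper: rewrite the file at path in place."""
    with open(path) as f:
        content = f.read()
    with open(path, "w") as f:
        f.write(add_prefix(content))

# Hypothetical vCard lines, for illustration only
sample = "TEL;CELL:2512345\nTEL;HOME:2998765\nTEL;WORK:12345678\n"
print(add_prefix(sample))
```

Calling update_vcf("sample.vcf") would rewrite the file in place, so keeping a backup copy first is a good idea.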

Related

Split string from digits/number according to sentence length

I have cases where I need to separate chars/words from digits/numbers that are written consecutively, but only when the char/word length is more than 3.
For example,
input
ferrari03
output must be:
ferrari 03
However, it shouldn't do anything for the following:
fe03, 03fe, 03ferrari etc.
Can you help me with this one? I'm trying to do this without writing any logic, using only the re library in Python.
Using re.sub() we can try:
import re
inp = ["ferrari03", "fe03", "03ferrari", "03fe"]
output = [re.sub(r'^([A-Za-z]{3,})([0-9]+)$', r'\1 \2', i) for i in inp]
print(output) # ['ferrari 03', 'fe03', '03ferrari', '03fe']
Given an input word, the regex above matches only if the whole word is 3 or more letters followed by 1 or more digits. In that case, we capture the letters and the digits in the \1 and \2 capture groups, respectively, and replace the word with the two groups separated by a space. If the letters must be strictly more than 3, as the question says, use {4,} instead of {3,}.
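If the words are embedded in a longer string instead of one per list item, the ^...$ anchors can be swapped for \b word boundaries. The sketch below assumes that; the function name split_word_digits and its min_letters parameter are hypothetical, and min_letters=4 gives the strict "more than 3" reading.

```python
import re

def split_word_digits(text, min_letters=3):
    """Insert a space between a run of at least min_letters letters and
    the digits that directly follow it, for every such word in text."""
    # In the rf-string, {{ and }} are literal braces, so this builds {3,}
    pattern = rf'\b([A-Za-z]{{{min_letters},}})([0-9]+)\b'
    return re.sub(pattern, r'\1 \2', text)

print(split_word_digits("ferrari03 fe03 03ferrari bmw7"))
print(split_word_digits("ferrari03 fe03 03ferrari bmw7", min_letters=4))
```

The \b at the start means a letter run can only begin at a word boundary, so 03ferrari is left alone just as with the anchored version.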

Match the words using re compile in Python

I'm new to Python. I have a text file that consists of punctuation and other words. How can I use re.compile to match specific text?
The text file looks like the excerpt below; the actual file has more than 100 sentences like these:
file.txt
copy() {
foundation.d.k("cloud control")
this.is.a(context),reality, new point {"copy.control.ZOOM_CONTROL", "copy.control.ACTIVITY_CONTROL"},
context control
I just want output like this:
copy.control.ZOOM_CONTROL
copy.control.ACTIVITY_CONTROL
I coded something like this:
file=(./data/.txt)
data=re.compile('copy.control. (.*?)', re.DOTALL | re.IGNORECASE).findall(file)
res= str("|".join(data))
The above regex doesn't produce my required output. Please help me with this issue. Thanks in advance.
You need to open and read the file first, then apply the re.findall method:
import re
data = []
with open('./data/.txt', 'r') as file:
    data = re.findall(r'\bcopy\.control\.(\w+)', file.read())
The \bcopy\.control\.(\w+) regex matches:
\bcopy\.control\. - a copy.control. string as a whole word (\b is a word boundary)
(\w+) - Capturing group 1 (the output of re.findall): 1 or more letters, digits or _
Then, you may print the matches:
for m in data:
    print(m)
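One caveat: because the pattern has a capturing group, re.findall returns only the group (ZOOM_CONTROL, ACTIVITY_CONTROL), while the desired output in the question is the full copy.control.ZOOM_CONTROL. This sketch uses an excerpt of the question's file as an inline string (an assumption, in place of reading ./data/.txt) to show both forms; dropping the parentheses makes findall return whole matches.

```python
import re

# Excerpt of the question's file.txt, inlined for illustration
sample = '''copy() {
foundation.d.k("cloud control")
this.is.a(context),reality, new point {"copy.control.ZOOM_CONTROL", "copy.control.ACTIVITY_CONTROL"},
context control'''

# With a capturing group, findall returns only the group: the suffix
suffixes = re.findall(r'\bcopy\.control\.(\w+)', sample)
# Without a group, findall returns the whole match, as the question wants
full = re.findall(r'\bcopy\.control\.\w+', sample)

for name in full:
    print(name)
```

The copy() on the first line is not matched because it is not followed by .control.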

Correct way to replace date YYYYMMDD to YYYY-MM-DD by using replace method

What is the best practice to open a text file with Python and replace dates in YYYYMMDD format with YYYY-MM-DD?
.replace('YYYYMMDD','YYYY-MM-DD')
I need to find all 8-digit numbers (located within the first 20 characters of each line) and add a dash after the 4th digit and after the 6th.
You can use regex to replace numbers with the same numbers in a different format:
>>> import re
>>> s = "Text to replace parts of 20190913 here is the part to replace"
>>> re.sub(r'\s(\d{4})([0-1][0-9])([0-3][0-9])\s', r' \1-\2-\3 ', s)
'Text to replace parts of 2019-09-13 here is the part to replace'
Explanation:
\d matches a digit.
{n} repeats the preceding element exactly n times.
\s matches a whitespace character; requiring one on each side means digit runs longer than 8 digits are not matched (it also means a date at the very start or end of the string is not matched).
(...) marks a capturing group, which can be referred to by its index (\1, \2, \3) in the replacement.
[...] character sets allow only certain characters to match, which helps reject non-dates.
So we capture three groups, right after each other: the first is any 4 digits, the second must start with 0 or 1, and the third must start with a digit from 0 to 3. In the replacement we use the same groups, separated by - characters.
The r'' prefix makes it a raw string, so the backslashes are passed to the regex engine instead of being interpreted by Python.
You can open a text file and read its content like this:
with open(filePath) as f:
    content = f.read()
Then you can reformat all 8-digit numbers into your date format with this (the \b word boundaries keep longer digit runs from being split):
content = re.sub(r'\b(\d{4})(\d{2})(\d{2})\b', r'\1-\2-\3', content)
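Neither answer limits the search to the first 20 characters of each line, as the question asks. Here is a minimal sketch of one way to do that, assuming "in the first 20 characters" means the whole 8-digit number must end by position 20: substitute with a function and leave any later match unchanged. The names reformat_line, reformat_file and limit are hypothetical, and unlike the first answer this does not validate the month or day digits.

```python
import re

# Whole 8-digit numbers only: \b on both sides rejects longer digit runs
DATE = re.compile(r'\b(\d{4})(\d{2})(\d{2})\b')

def reformat_line(line, limit=20):
    """Turn YYYYMMDD into YYYY-MM-DD, but only for numbers that end
    within the first `limit` characters of the line (an assumption)."""
    def repl(m):
        if m.end() <= limit:
            return '{}-{}-{}'.format(*m.groups())
        return m.group(0)  # matches past the limit are left unchanged
    return DATE.sub(repl, line)

def reformat_file(path, limit=20):
    """Hypothetical helper: apply reformat_line to every line in place."""
    with open(path) as f:
        lines = f.readlines()
    with open(path, 'w') as f:
        f.writelines(reformat_line(line, limit) for line in lines)

print(reformat_line("20190913 some text 20200101"))
```

Matching against the full line, rather than slicing line[:20] first, avoids a false \b where the slice cuts through a longer number.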

Extracting numbers from a text file using regexp

I am trying to make a Python script that reads a text file input.txt, scans it for phone numbers, and writes all matching phone numbers to output.txt.
Let's say the text file looks like this:
Hey my number is 1234567890 and another number is +91-1234567890. but if none of these is available you can call me on +91 5645454545 (or) mail me at abc#xyz.com
It should match 1234567890, +91-1234567890 and +91 5645454545.
import re
no = '^(\+[1-9]\d{0,2}[- ]?)?[1-9][0-9]{9}' #i think problem is here
f2 = open('output.txt','w+')
for line in open('input.txt'):
    out = re.findall(no,line)
    for i in out :
        f2.write(i + '\n')
The regex in no works like this: an optional country code of up to 3 digits, followed by an optional - or space, and then a 10-digit number.
Yes, the problem is with your regex. Fortunately, it's a small one. You just need to remove the ^ character:
r'(\+[1-9]\d{0,2}[- ]?)?[1-9]\d{9}'
The ^ anchors the match to the beginning of the string, but you want to match multiple times throughout the string.
In Python, you'll also need to make the group non-capturing with (?:...). Otherwise, re.findall does not return the complete match. From the re.findall documentation:
Return all non-overlapping matches of pattern in string, as a list of
strings. The string is scanned left-to-right, and matches are returned
in the order found. If one or more groups are present in the pattern,
return a list of groups.
This is what you get when you specify non-capturing groups for your problem:
In [485]: re.findall('(?:\+[1-9]\d{0,2}[- ]?)?[1-9]\d{9}', text)
Out[485]: ['1234567890', '+91-1234567890', '+91 5645454545']
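As an alternative to changing the groups, re.finditer yields match objects, and m.group(0) is always the complete match, whatever groups the pattern contains. This sketch keeps the original capturing group (minus the ^) and uses the question's sample line as an inline string; PHONE and text are illustrative names.

```python
import re

# The question's pattern without ^, still with a capturing group
PHONE = re.compile(r'(\+[1-9]\d{0,2}[- ]?)?[1-9]\d{9}')

text = ("Hey my number is 1234567890 and another number is +91-1234567890. "
        "but if none of these is available you can call me on +91 5645454545 "
        "(or) mail me at abc#xyz.com")

# findall returns only group 1 (the country-code part, '' when absent)
print(PHONE.findall(text))
# finditer yields match objects; group(0) is the whole match
print([m.group(0) for m in PHONE.finditer(text)])
```

The first print shows exactly the problem described above: findall hands back ['', '+91-', '+91 '] instead of the numbers.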
This code will work:
import re
no = r'(?:\+[1-9]\d{0,2}[- ]?)?[1-9][0-9]{9}'
with open('input.txt') as f1, open('output.txt', 'w') as f2:
    for line in f1:
        out = re.findall(no, line)
        for i in out:
            f2.write(i + '\n')
The output will be:
1234567890
+91-1234567890
+91 5645454545
You can also use:
(?:\+[1-9]\d{1,2}-?)?\s?[1-9][0-9]{9}
Note that the \s? lets a leading space become part of a match, so the first number comes back as " 1234567890".
Another option is a simpler pattern with alternation:
pattern = r'\d{10}|\+\d{2}[- ]+\d{10}'
matches = re.findall(pattern, text)
Output: ['1234567890', '+91-1234567890', '+91 5645454545']

Using regular expressions to find a pattern

If I have a file that consists of sentences like this:
1001 apple
1003 banana
1004 grapes
1005
1007 orange
Now I want to detect and print all lines that have a number but no corresponding text (e.g. 1005). How can I design a regular expression to find such lines? I find regexes a bit confusing to construct.
res=[]
with open("fruits.txt","r") as f:
    for fruit in f:
        res.append(fruit.strip().split())
Would it be something like this: re.sub("10**"/.")
Well, you don't need a regular expression for this:
with open("fruits.txt", "r") as f:
    res = [int(line.strip()) for line in f if len(line.split()) == 1]
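A regex that detects a number, then a space, then a word (\w matches letters, digits and underscores) is ([0-9]+)[ ]\w+.
A good resource for trying that stuff out is http://regexr.com/
For the lines you are after, the pattern would be [0-9][0-9][0-9][0-9], checked with re.fullmatch() against each stripped line, e.g. re.fullmatch("[0-9][0-9][0-9][0-9]", line.strip()). re.fullmatch() only succeeds if the entire line is four digits and nothing else, so it will find your 1005 (a plain re.search() would also match the 1001 in 1001 apple, and re.sub() is for replacing, not finding).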
Hope this helps!
There are two ways to go about this: search() and findall(). The former will find the first instance of a match, and the latter will give a list of every match.
In any case, the regex you want to use is "^\d{4}$". It's a simple regex which matches a 4-digit number that takes up the entirety of a string, or, in multiline mode, a line. So, to find 'only number' sections, you will use the following code:
# assume 'func' is set to either be re.search or re.findall, whichever you prefer
with open("fruits.txt", "r") as f:
    solo = func(r"^\d{4}$", f.read(), re.MULTILINE)
# 'solo' now has either the first 'non-labeled' number,
# or a list of all such numbers in the file, depending on
# the function you used. search() will return None if there
# are no such numbers, and findall() will return an empty list.
# if you prefer brevity, re.MULTILINE is equivalent to re.M
Additional explanation of the regex:
^ matches at the beginning of the line.
\d is a special sequence which matches any numeric digit.
{4} matches the prior element (\d) exactly four times.
$ matches at the end of the line.
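To make the func placeholder concrete, here is a sketch that runs both calls on the question's sample data, held in an inline string (an assumption, in place of reading fruits.txt); ONLY_NUMBER is an illustrative name.

```python
import re

# The question's sample data, inlined for illustration
data = "1001 apple\n1003 banana\n1004 grapes\n1005\n1007 orange\n"

# re.MULTILINE makes ^ and $ match at every line boundary
ONLY_NUMBER = re.compile(r"^\d{4}$", re.MULTILINE)

first = ONLY_NUMBER.search(data)         # Match object, or None
all_matches = ONLY_NUMBER.findall(data)  # list of matching strings

print(first.group(0) if first else None)
print(all_matches)
```

For the real file, data would come from f.read() inside the with block, as in the answer above.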
Please try:
(?:^|\s+)(\d{4}\b)(?!\s.*\w+)
