Python Findall not finding anything - python

I am learning about Regex and am stuck with this code:
import re
resume = '''
(738) 383-5729
(373) 577-0492
(403) 443-2759
(375) 880-8576
(641) 576-2342
(951) 268-8744
'''
phoneRegex = re.compile(r'\d')
mo = phoneRegex.findall(resume)
print(mo.group())
When I try with search instead of findall, it works. But it can't find any match with findall.
What am I doing wrong?

findall() returns a plain list of the strings that match the pattern.
A list has no group() method, so just drop that call:
>>> print(mo)
['7', '3', '8', '3', '8', '3', '5', '7', '2', '9', '3', '7', '3', '5', '7',
'7', '0', '4', '9', '2', '4', '0', '3', '4', '4', '3', '2', '7', '5', '9',
'3', '7', '5', '8', '8', '0', '8', '5', '7', '6', '6', '4', '1', '5', '7',
'6', '2', '3', '4', '2', '9', '5', '1', '2', '6', '8', '8', '7', '4', '4']

It seems that you're trying to match phone numbers in resume, for that you can use:
resume = '''
(738) 383-5729
(373) 577-0492
(403) 443-2759
(375) 880-8576
(641) 576-2342
(951) 268-8744
'''
mo = re.findall(r'\(\d{3}\) \d{3}-\d{4}', resume)
for x in mo:
    print(x)
Output:
(738) 383-5729
(373) 577-0492
(403) 443-2759
(375) 880-8576
(641) 576-2342
(951) 268-8744

Since (it looks like) you're just out for the numbers, you could do something like
>>> [''.join(c for c in l if c in '0123456789') for l in resume.strip().splitlines()]
['7383835729', '3735770492', '4034432759', '3758808576', '6415762342', '9512688744']
That might save you some trouble from internationally formed numbers (such as +46-(0)7-08/123 456 and the like).

re.findall() returns a list of all the matches as strings, not a match object, so there is nothing to call group() on.
In your case, index into the list instead:
print(mo[0])
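To make the difference in return types concrete, here is a minimal sketch that runs search(), findall() and finditer() with the phone-number pattern from the answer above. The shortened resume string and the variable names are only for illustration.

```python
import re

# Shortened sample text, in the same shape as the question's resume string.
resume = '''
(738) 383-5729
(373) 577-0492
'''

phone_regex = re.compile(r'\(\d{3}\) \d{3}-\d{4}')

# search() stops at the first match and returns a Match object (or None),
# which is why group() works on its result.
first = phone_regex.search(resume)
print(first.group())

# findall() returns every match as a plain string inside a list.
numbers = phone_regex.findall(resume)
print(numbers)

# finditer() yields one Match object per match, so group() and span()
# are available for each of them.
spans = [match.span() for match in phone_regex.finditer(resume)]
print(spans)
```

If you need positions or groups for every match, finditer() is usually the better fit; if you only need the matched text, findall() is enough.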

Related

Read lines of .txt or excel file into tuples

I would like to read line by line two .txt files. THE FILES HAVE DATA DIVIDED IN FIVE COLUMNS
FILE_1:
843.19598 2396.10278 3579.13778 4210.15674 4209.37549
841.93976 2397.21948 3573.11963 4205.89209 4226.73926
842.01642 2397.72266 3573.06494 4202.88379 4226.93799
842.22083 2397.47974 3574.27515 4204.19043 4223.82088
842.42065 2397.20142 3575.47437 4205.52246 4220.64795
FILE_2:
3586.02124 2391.50342 837.45227 -837.29681 -2385.97513
3587.69238 2387.48218 836.60445 -840.75067 -2390.17529
3588.44531 2387.44556 836.00555 -840.79022 -2389.77612
3588.08203 2388.25439 836.26544 -840.17017 -2389.07544
3587.66553 2389.05566 836.53046 -839.53912 -2388.40405
Each line of the files must be converted into a tuple. For example for the first line of both files, the output should be:
FILE_1/1stLine = (843.19598, 2396.10278, 3579.13778, 4210.15674, 4209.37549)
FILE_2/1stline = (3586.02124, 2391.50342, 837.45227, -837.29681, -2385.97513)
Then I need to combine the lines of these two files into a new variable called aux, in which the first element is a line of FILE_1 and the second element is the line at the same position in FILE_2
aux = (FILE_1/1stLine, FILE_2/1stline) ----- aux 1stLine
aux = (FILE_1/2ndLine, FILE_2/2ndline) ----- aux 2ndLine
.
.
aux = (FILE_1/LastLine, FILE_2/Lastline) ----- aux 2ndLastLine
For instance, taking the first lines of both files, the first aux must be:
((843.19598, 2396.10278, 3579.13778, 4210.15674, 4209.37549), (3586.02124, 2391.50342, 837.45227, -837.29681, -2385.97513))
Any ideas?
f1 = open("FILE_1.txt", "r")
f2 = open("FILE_2.txt", "r")
for a in f1:
    for b in f2:
        x = tuple(a)
        y = tuple(b)
        aux = (x, y)
The result of this code is:
('8', '4', '3', '.', '1', '9', '5', '9', '8', ' ', '2', '3', '9', '6', '.', '1', '0', '2', '7', '8', ' ', '3', '5', '7', '9', '.', '1', '3', '7', '7', '8', ' ', '4', '2', '1', '0', '.', '1', '5', '6', '7', '4', ' ', '4', '2', '0', '9', '.', '3', '7', '5', '4', '9', '\n')
('3', '5', '8', '6', '.', '0', '2', '1', '2', '4', ' ', '2', '3', '9', '1', '.', '5', '0', '3', '4', '2', ' ', '8', '3', '7', '.', '4', '5', '2', '2', '7', ' ', '-', '8', '3', '7', '.', '2', '9', '6', '8', '1', ' ', '-', '2', '3', '8', '5', '.', '9', '7', '5', '1', '3', '\n')
(('8', '4', '3', '.', '1', '9', '5', '9', '8', ' ', '2', '3', '9', '6', '.', '1', '0', '2', '7', '8', ' ', '3', '5', '7', '9', '.', '1', '3', '7', '7', '8', ' ', '4', '2', '1', '0', '.', '1', '5', '6', '7', '4', ' ', '4', '2', '0', '9', '.', '3', '7', '5', '4', '9', '\n'), ('3', '5', '8', '6', '.', '0', '2', '1', '2', '4', ' ', '2', '3', '9', '1', '.', '5', '0', '3', '4', '2', ' ', '8', '3', '7', '.', '4', '5', '2', '2', '7', ' ', '-', '8', '3', '7', '.', '2', '9', '6', '8', '1', ' ', '-', '2', '3', '8', '5', '.', '9', '7', '5', '1', '3', '\n'))
Many thanks!
Instead of getting each element of f1/f2 as a string like '843.19598', I need the elements as numbers without quotes, like 843.19598.
Let me show the code that takes this data as input (with a set of points as an example).
The problem is that I have to read x and y from these files, and for each set I need to fit an ellipse.
import ellipses as el
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
x = (5727.53135, 7147.62235, 10330.93573, 8711.17228, 7630.40262,
4777.24983, 4828.27655, 9449.94416, 5203.81323, 6299.44811,
6494.21906)
y = (67157.77567 , 66568.50068 , 55922.56257 , 54887.47348 ,
65150.14064 , 66529.91705 , 65934.25548 , 55351.57612 ,
63123.5103 , 67181.141725, 56321.36025)
data = (x, y)
lsqe = el.LSqEllipse()
lsqe.fit(data)
center, width, height, phi = lsqe.parameters()
print (center, width, height, phi)
plt.close('all')
fig = plt.figure(figsize=(6,6))
ax = fig.add_subplot(111)
ax.axis('equal')
ax.plot(data[0], data[1], 'ro', label='test data', zorder=1)
ellipse = Ellipse(xy=center, width=2*width, height=2*height, angle=np.rad2deg(phi),
edgecolor='b', fc='None', lw=2, label='Fit', zorder = 2)
ax.add_patch(ellipse)
plt.legend()
plt.show()
The dataset
FILE 1 (saved as f1.csv and f1.xls)
843.19598 2396.10278 3579.13778 4210.15674 4209.37549
841.93976 2397.21948 3573.11963 4205.89209 4226.73926
842.01642 2397.72266 3573.06494 4202.88379 4226.93799
842.22083 2397.47974 3574.27515 4204.19043 4223.82088
842.42065 2397.20142 3575.47437 4205.52246 4220.64795
FILE 2 (saved as f2.csv and f2.xls)
3586.02124 2391.50342 837.45227 -837.29681 -2385.97513
3587.69238 2387.48218 836.60445 -840.75067 -2390.17529
3588.44531 2387.44556 836.00555 -840.79022 -2389.77612
3588.08203 2388.25439 836.26544 -840.17017 -2389.07544
3587.66553 2389.05566 836.53046 -839.53912 -2388.40405
Using the csv module (works for plain-text files such as .csv and .txt)
import csv
# Files to read
files = ['f1.csv', 'f2.csv']
tup_files = ()
aux = ()
# Read each file and concatenate to tup_files
for file in files:
    with open(file) as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=' ')
        tmp_rows = ()
        for row in csv_reader:
            tmp_rows += (tuple(row), )
    tup_files += (tmp_rows, )
for row_f1, row_f2 in zip(tup_files[0], tup_files[1]):
    aux += (row_f1, row_f2)
print(f'printing f1\n{tup_files[0]}\n')
print(f'printing f2\n{tup_files[1]}\n')
print(f'printing aux\n{aux}')
Using pandas (works for .xls)
import pandas as pd
# Files to read
files = ['f1.xls', 'f2.xls']
tup_files = ()
aux = ()
# Read each file and concatenate to tup_files
for file in files:
    data = pd.read_excel(file, header=None)
    tup_files += (tuple(data.itertuples(index=False, name=None)), )
for row_f1, row_f2 in zip(tup_files[0], tup_files[1]):
    aux += (row_f1, row_f2)
print(f'printing f1\n{tup_files[0]}\n')
print(f'printing f2\n{tup_files[1]}\n')
print(f'printing aux\n{aux}')
The csv version yields (csv.reader returns every value as a string):
printing f1
(('843.19598', '2396.10278', '3579.13778', '4210.15674', '4209.37549'),
('841.93976', '2397.21948', '3573.11963', '4205.89209', '4226.73926'),
('842.01642', '2397.72266', '3573.06494', '4202.88379', '4226.93799'),
('842.22083', '2397.47974', '3574.27515', '4204.19043', '4223.82088'),
('842.42065', '2397.20142', '3575.47437', '4205.52246', '4220.64795'))
printing f2
(('3586.02124', '2391.50342', '837.45227', '-837.29681', '-2385.97513'),
('3587.69238', '2387.48218', '836.60445', '-840.75067', '-2390.17529'),
('3588.44531', '2387.44556', '836.00555', '-840.79022', '-2389.77612'),
('3588.08203', '2388.25439', '836.26544', '-840.17017', '-2389.07544'),
('3587.66553', '2389.05566', '836.53046', '-839.53912', '-2388.40405'))
printing aux
(('843.19598', '2396.10278', '3579.13778', '4210.15674', '4209.37549'),
('3586.02124', '2391.50342', '837.45227', '-837.29681', '-2385.97513'),
('841.93976', '2397.21948', '3573.11963', '4205.89209', '4226.73926'),
('3587.69238', '2387.48218', '836.60445', '-840.75067', '-2390.17529'),
('842.01642', '2397.72266', '3573.06494', '4202.88379', '4226.93799'),
('3588.44531', '2387.44556', '836.00555', '-840.79022', '-2389.77612'),
('842.22083', '2397.47974', '3574.27515', '4204.19043', '4223.82088'),
('3588.08203', '2388.25439', '836.26544', '-840.17017', '-2389.07544'),
('842.42065', '2397.20142', '3575.47437', '4205.52246', '4220.64795'),
('3587.66553', '2389.05566', '836.53046', '-839.53912', '-2388.40405'))
Results using tuples as required.
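Note that the output above still holds strings, and aux is one flat tuple, whereas the question asks for numbers and for (FILE_1 line, FILE_2 line) pairs. A minimal sketch of that variant follows, assuming whitespace-separated columns. read_rows and pair_rows are hypothetical helper names, and io.StringIO stands in for the real files so the example runs on its own.

```python
import io

def read_rows(lines):
    """Turn each non-empty, whitespace-separated line into a tuple of floats."""
    return [tuple(float(value) for value in line.split())
            for line in lines if line.strip()]

def pair_rows(lines_1, lines_2):
    """Pair the n-th row of the first source with the n-th row of the second."""
    return list(zip(read_rows(lines_1), read_rows(lines_2)))

# Stand-ins for open("FILE_1.txt") and open("FILE_2.txt"); any iterable
# of text lines (including a real file object) works the same way.
file_1 = io.StringIO(
    "843.19598 2396.10278 3579.13778 4210.15674 4209.37549\n"
    "841.93976 2397.21948 3573.11963 4205.89209 4226.73926\n"
)
file_2 = io.StringIO(
    "3586.02124 2391.50342 837.45227 -837.29681 -2385.97513\n"
    "3587.69238 2387.48218 836.60445 -840.75067 -2390.17529\n"
)

aux = pair_rows(file_1, file_2)
for pair in aux:
    print(pair)
```

Each element of aux is then a (FILE_1 row, FILE_2 row) pair of float tuples, which is the shape the question describes.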

Is there a way to generate a list string within Python, without any other 3rd party packages?

this code generates a list of integers
dir_list = list(range(11))
dir_list
numpy could transfer each element to string type
import numpy as np
dir_list = np.array(dir_list, dtype=str)
dir_list
array(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10'],
dtype='<U2')
is there a way to finish the job within Python, without any other 3rd party packages?
You can map each integer to a string with the built-in map function. map returns an iterator, so wrap it in list() to get a list.
list(map(str, range(11))) should do.
output:
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
Alternatively, a list comprehension iterates over each element of the range object, converts it to str, and stores the results in a list:
dir_list = [str(i) for i in range(11)]
>>> dir_list
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']

Remove duplicate items from list

I tried following this post, but it doesn't seem to be working for me.
I tried this code:
for bresult in response.css(LIST_SELECTOR):
    NAME_SELECTOR = 'h2 a ::attr(href)'
    yield {
        'name': bresult.css(NAME_SELECTOR).extract_first(),
    }
    b_result_list.append(bresult.css(NAME_SELECTOR).extract_first())
#set b_result_list to SET to remove dups, then change back to LIST
set(b_result_list)
list(set(b_result_list))
for brl in b_result_list:
    print("brl: {}".format(brl))
This prints out:
brl: https://facebook.site.com/users/login
brl: https://facebook.site.com/users
brl: https://facebook.site.com/users/login
When I just need:
brl: https://facebook.site.com/users/login
brl: https://facebook.site.com/users
What am I doing wrong here?
Thank you!
You are discarding the result when you need to save it: set(b_result_list) and list(set(b_result_list)) each build a new object, so b_result_list itself never changes and you are just iterating over the original list. Instead, save the result of the set operation:
b_result_list = list(set(b_result_list))
(note that sets do not preserve order)
If you want to remove duplicates while keeping the original order, you can do:
>>> li
['1', '1', '2', '2', '3', '3', '3', '3', '1', '1', '4', '5', '4', '6', '6']
>>> seen=set()
>>> [e for e in li if not (e in seen or seen.add(e))]
['1', '2', '3', '4', '5', '6']
Or, you can use the keys of an OrderedDict (wrapped in list(), since Python 3 returns a view rather than a list):
>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys(li))
['1', '2', '3', '4', '5', '6']
But a set alone may substantially change the order of the original list:
>>> list(set(li))
['1', '3', '2', '5', '4', '6']
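Applied to the URLs from the question, an order-preserving de-duplication could look like the sketch below. It relies on plain dicts keeping insertion order, which is guaranteed from Python 3.7 onward; the urls list is copied from the printed output in the question and the variable names are illustrative.

```python
# Sample data taken from the question's printed output.
urls = [
    "https://facebook.site.com/users/login",
    "https://facebook.site.com/users",
    "https://facebook.site.com/users/login",
]

# dict.fromkeys() keeps the first occurrence of every key, in order;
# list() turns the keys back into a list.
unique_urls = list(dict.fromkeys(urls))

for url in unique_urls:
    print("brl: {}".format(url))
```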

Python 3: how to create list out of float numbers?

Does anyone know how I can solve this issue?
I have the following code.
result=[]
for i in range(len(response_i['objcontent'][0]['rowvalues'])):
    lat = response_i['objcontent'][0]['rowvalues'][i][0]
    print(lat)
    for i in lat:
        result.append(i)
print (result)
Following is the output of print(lat):
92.213725
191.586143
228.981615
240.353291
and following is the output of print(result):
['9', '2', '.', '2', '1', '3', '7', '2', '5', '1', '9', '1', '.', '5', '8',
'6', '1', '4', '3', '2', '2', '8', '.', '9', '8', '1', '6', '1', '5', '2',
'4', '0', '.', '3', '5', '3', '2', '9', '1']
I expected to get the output in following format:
[92.213725, 191.586143, 228.981615, 240.353291]
Does anyone know how to fix this issue?
Thanks
So, your error is that instead of simply adding your latitude to the list, you are iterating over each character of the latitude, as a string, and adding that character to the list.
result=[]
for value in response_i['objcontent'][0]['rowvalues']:
    lat = value[0]
    print(lat)
    result.append(float(lat))
print (result)
Besides that, looping over range(len(...)) is a habit carried over from languages whose for loop only counts indices, or whose "for each" construct is missing or incomplete.
In Python, it has been a given since the beginning that a for loop iterates over the items of a sequence, not over its indices (to look the items up afterwards). A few auxiliary built-ins help you keep iterating over the sequence itself: zip to combine two or more sequences, and enumerate to yield the indices as well if you need them.
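As a small illustration of those built-ins, here is a sketch that uses made-up rows shaped like response_i['objcontent'][0]['rowvalues']. The second column and all variable names are assumptions for the example, not part of the original data.

```python
# Hypothetical rows: latitude as a string first, then an invented label.
rowvalues = [
    ["92.213725", "a"],
    ["191.586143", "b"],
    ["228.981615", "c"],
]

# Iterate over the rows directly; no index is needed to build the list.
result = [float(row[0]) for row in rowvalues]

# enumerate() supplies the index alongside each item when it is wanted.
for index, lat in enumerate(result):
    print(index, lat)

# zip() walks two sequences in step and pairs up their items.
labels = [row[1] for row in rowvalues]
pairs = list(zip(labels, result))
print(pairs)
```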

How could i refresh a list once an item has been removed from a list within a list in python

This is quite complicated, but I would like to be able to refresh a larger list once an item has been taken out of a mini list within the bigger list.
listA = ['1','2','3','4','5','6','6','8','9','5','3','7']
I used the code below to split it into lists of three
split = [listA[i:(i+3)] for i in range(0, len(listA) - 1, 3)]
print(split)
# [['1','2','3'],['4','5','6'],['6','8','9'],['5','3','7']]
split = [['1','2','3'],['4','5','6'],['6','8','9'],['5','3','7']]
If I delete '3' from the first list, split will now be
del split[0][-1]
split = [['1','2'],['4','5','6'],['6','8','9'],['5','3','7']]
After '3' has been deleted, I would like to be able to refresh the list so that it looks like:
split = [['1','2','4'],['5','6','6'],['8','9','5'],['3','7']]
thanks in advance
Not sure how big this list is getting, but you would need to flatten it and recalculate it:
>>> listA = ['1','2','3','4','5','6','6','8','9','5','3','7']
>>> split = [listA[i:(i+3)] for i in range(0, len(listA), 3)]
>>> split
[['1', '2', '3'], ['4', '5', '6'], ['6', '8', '9'], ['5', '3', '7']]
>>> del split[0][-1]
>>> split
[['1', '2'], ['4', '5', '6'], ['6', '8', '9'], ['5', '3', '7']]
>>> listA = sum(split, []) # <- flatten split list back to 1 level
>>> listA
['1', '2', '4', '5', '6', '6', '8', '9', '5', '3', '7']
>>> split = [listA[i:(i+3)] for i in range(0, len(listA), 3)]
>>> split
[['1', '2', '4'], ['5', '6', '6'], ['8', '9', '5'], ['3', '7']]
Just recreate the single list from your nested lists, then re-split.
You can join the lists, assuming they are only one level deep, with something like:
rejoined = [element for sublist in split for element in sublist]
There are no doubt fancier ways, or one-liners that use itertools or some other library, but don't overthink it. If you're only talking about a few hundred or even a few thousand items this solution is quite good enough.
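Putting the flatten and re-split steps together gives a sketch like the one below, where regroup is a hypothetical helper name. It slices with range(0, len(flat), size) rather than the len(listA) - 1 bound used in the question, because that bound drops the last item whenever the length is one more than a multiple of three (for example, 10 items).

```python
from itertools import chain

def regroup(groups, size=3):
    """Flatten one level of nesting, then cut the items into chunks of `size`."""
    flat = list(chain.from_iterable(groups))
    return [flat[i:i + size] for i in range(0, len(flat), size)]

# State of the nested list after '3' was deleted from the first group.
split = [['1', '2'], ['4', '5', '6'], ['6', '8', '9'], ['5', '3', '7']]
split = regroup(split)
print(split)
```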
I need this for turning over cards in the deck in a solitaire game.
You can deal your cards using itertools.groupby() with a good key function:
import itertools
def group_key(x, n=3, flag=[0], counter=itertools.count(0)):
    # Flip the flag at the start of every block of n items.
    if next(counter) % n == 0:
        flag[0] = flag[0] ^ 1
    return flag[0]
^ is the bitwise XOR operator; here it flips the flag between 0 and 1. The flag is stored in a list because default arguments are created once and persist between calls, which lets the function keep state.
Example:
>>> deck = ['1', '2', '3', '4', '5', '6', '6', '8', '9', '5', '3', '7']
>>> for k,g in itertools.groupby(deck, key=group_key):
...     print(list(g))
['1', '2', '3']
['4', '5', '6']
['6', '8', '9']
['5', '3', '7']
Now let's say you've used card '9' and '8', so your new deck looks like:
>>> deck = ['1', '2', '3', '4', '5', '6', '6', '5', '3', '7']
>>> for k,g in itertools.groupby(deck, key=group_key):
...     print(list(g))
['1', '2', '3']
['4', '5', '6']
['6', '5', '3']
['7']
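One caveat with a key function that keeps its counter and flag in default arguments: that state carries over between calls, so after a deck whose length is not a multiple of n, the next groupby call starts in the middle of a group. A stateless variant, sketched below with the hypothetical name deal, groups on the item's index instead.

```python
from itertools import groupby

def deal(deck, n=3):
    """Group consecutive cards into piles of n using index // n as the key."""
    return [
        [card for _, card in group]
        for _, group in groupby(enumerate(deck), key=lambda pair: pair[0] // n)
    ]

full_deck = ['1', '2', '3', '4', '5', '6', '6', '8', '9', '5', '3', '7']
print(deal(full_deck))

# The same deck after cards '8' and '9' have been used.
smaller_deck = ['1', '2', '3', '4', '5', '6', '6', '5', '3', '7']
print(deal(smaller_deck))
```

Because the key depends only on each card's position, calling deal repeatedly on decks of any length gives consistent piles.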
Build an object that contains the list and tracks when it is altered (probably by controlling writes to it), then have the object redo its own split every time the data changes and save the split list to a member of the object.
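A minimal sketch of such an object follows. The class name SplitDeck, its remove_at method and the split attribute are all hypothetical, and only removal is handled; other kinds of writes would need similar methods.

```python
class SplitDeck:
    """Keeps a flat list of cards plus an up-to-date view in groups of `size`."""

    def __init__(self, cards, size=3):
        self._cards = list(cards)
        self._size = size
        self._resplit()

    def _resplit(self):
        # Rebuild the grouped view from the flat list.
        self.split = [
            self._cards[i:i + self._size]
            for i in range(0, len(self._cards), self._size)
        ]

    def remove_at(self, group, position):
        """Remove the card at `position` inside group number `group`, then regroup."""
        del self._cards[group * self._size + position]
        self._resplit()

deck = SplitDeck(['1', '2', '3', '4', '5', '6', '6', '8', '9', '5', '3', '7'])
deck.remove_at(0, 2)  # take '3' out of the first group
print(deck.split)
```

All changes go through remove_at, so the grouped view cannot get out of step with the underlying cards.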
