Python regex find single digit if no digits before it - python

I have a list of strings and I want to use regex to get a single digit if there are no digits before it.
import re

strings = ['5.8 GHz', '5 GHz']
for s in strings:
    print(re.findall(r'\d\s[GM]?Hz', s))
# output
['8 GHz']
['5 GHz']
# desired output
['5 GHz']
I want it to return only '5 GHz'; the first string shouldn't produce any matches. How can I modify my pattern to get the desired output?

As per my comment, it seems that you can use:
(?<!\d\.)\d+\s[GM]?Hz\b
This matches:
(?<!\d\.) - A negative lookbehind asserting that the position is not right after a single digit and a literal dot.
\d+ - One or more digits, matching the integer part of the frequency.
\s - A single whitespace character.
[GM]?Hz - An optional uppercase G or M followed by "Hz".
\b - A word boundary.
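To see how the lookbehind behaves, here is a small sketch that runs the pattern on the two strings from the question plus one extra sample of my own ('15.83 GHz'). The lookbehind only inspects the two characters before the match, so a second decimal digit slips through. The `strict` pattern is a hypothetical variation, not part of the answer: it rejects any digit or dot directly before the number.

```python
import re

# Pattern from the answer above.
pattern = re.compile(r'(?<!\d\.)\d+\s[GM]?Hz\b')
# Hypothetical stricter variant: no digit or dot may precede the number.
strict = re.compile(r'(?<![\d.])\d+\s[GM]?Hz\b')

# '15.83 GHz' is an extra sample, not taken from the question.
samples = ['5.8 GHz', '5 GHz', '15.83 GHz']
for s in samples:
    print(repr(s), pattern.findall(s), strict.findall(s))
# '5.8 GHz' [] []
# '5 GHz' ['5 GHz'] ['5 GHz']
# '15.83 GHz' ['3 GHz'] []
```

Which variant fits depends on whether values with more than one decimal digit can occur in the data.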

>>> import re
>>> strings = ['5.8 GHz', '5 GHz']
>>>
>>> for s in strings:
...     match = re.match(r'^[^0-9]*([0-9] [GM]Hz)', s)
...     if match:
...         print(match.group(1))
...
5 GHz

Updated Answer
import re
a = ['5.8 GHz', '5 GHz', '8 GHz', '1.2', '1.2 Some Random String', '1 Some String', '1 MHz of frequency', '2 Some String in Between MHz']
res = []
for fr in a:
    if re.match(r'^[0-9](?=.[^0-9])(\s)[GM]Hz$', fr):
        res.append(fr)
print(res)
Output:
['5 GHz', '8 GHz']

My two cents:
selected_strings = list(filter(
    lambda x: re.findall(r'(?:^|\s+)\d+\s+(?:G|M)Hz', x),
    strings
))
With ['2 GHz', '5.8 GHz', ' 5 GHz', '3.4 MHz', '3 MHz', '1 MHz of Frequency'] as strings, selected_strings is:
['2 GHz', ' 5 GHz', '3 MHz', '1 MHz of Frequency']

Related

Separation of a split string

def getOnlyNames(unfilteredString):
    unfilteredString = unfilteredString[unfilteredString.index(":"):]
    NamesandNumbers = [item.strip() for item in unfilteredString.split(';')]
    OnlyNames = []
    for i in len(productsPrices):
        x = [item.strip() for item in productsPrices[i].split(',')]
        products.append(x[0])
    return products
I'm trying to write a function that splits the following string:
"Cars: Mazda 3,30000; Mazda 5, 49900;"
so that I get only:
Mazda 3,Mazda 5
First I remove the part before the colon, then I try to get only the name of each car, without its price.
You can use regex for this:
>>> import re
>>> s = "Cars: Mazda 3,30000; Mazda 5, 49900;"
>>> re.findall(r"[:;]\W*([^:;]*?)(?:,)", s)
['Mazda 3', 'Mazda 5']
>>> s = "Mazda 3, 35000; Cars: Mazda 4,30000; Mazda 5, 49900;"
>>> re.findall(r"[:;]\W*([^:;]*?)(?:,)", s)
['Mazda 4', 'Mazda 5']
"Cars: Mazda 3,30000; Mazda 5, 49900;"
Split on the colon:
['Cars', ' Mazda 3,30000; Mazda 5, 49900;']
Split the last item on the semicolon:
[' Mazda 3,30000', ' Mazda 5, 49900', '']
Split the first two items on the comma:
[' Mazda 3', '30000'], [' Mazda 5', ' 49900']
Take the first item of each and strip the whitespace:
'Mazda 3'
'Mazda 5'
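The splitting steps above can be sketched as a small function. This is only one possible way to write it: the name get_only_names is made up for illustration, and str.partition is used instead of str.split for the colon and comma so that a missing separator does not raise an error.

```python
def get_only_names(unfiltered):
    # Keep only what follows the first colon (drops the "Cars" label).
    _, _, rest = unfiltered.partition(":")
    names = []
    for entry in rest.split(";"):
        if not entry.strip():
            continue  # skip the empty piece after the trailing semicolon
        # Everything before the first comma is the name; the rest is the price.
        name, _, _price = entry.partition(",")
        names.append(name.strip())
    return names


print(get_only_names("Cars: Mazda 3,30000; Mazda 5, 49900;"))
# ['Mazda 3', 'Mazda 5']
```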

How can I sort this list numerically?

How do I sort this list numerically?
sa = ['3 :mat', '20 :zap', '20 :jhon', '5 :dave', '14 :maya' ]
print(sorted(sa))
This shows
['14 :maya', '20 :jhon', '20 :zap', '3 :mat', '5 :dave']
You can do it like this, since your numbers are part of the string:
sorted(sa, key = lambda x: int(x.split(' ')[0]))
You can do something like the below, which will use the numbers in the string and sort them.
sa.sort(key=lambda x: int(''.join(filter(str.isdigit, x))))
print(sa)
Using regex:
import re
sorted(sa, key=lambda x: int(re.findall(r'\d+', x)[0]))
['3 :mat', '5 :dave', '14 :maya', '20 :zap', '20 :jhon']
Using the natsort module:
from natsort import natsorted
natsorted(sa)
['3 :mat', '5 :dave', '14 :maya', '20 :jhon', '20 :zap']
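The answers above differ in how they order the two entries that share the number 20: a key that looks only at the number keeps their original order, while natsort also orders the names. If ordering ties by name is wanted without a third-party module, a tuple key is one option. The sketch below assumes every entry has the form "<number> :<name>"; the helper name number_then_name is made up for illustration.

```python
sa = ['3 :mat', '20 :zap', '20 :jhon', '5 :dave', '14 :maya']


def number_then_name(item):
    # Split "20 :zap" into the number and the name around " :".
    number, _, name = item.partition(' :')
    # Sort by the integer first, then by the name to break ties.
    return int(number), name


print(sorted(sa, key=number_then_name))
# ['3 :mat', '5 :dave', '14 :maya', '20 :jhon', '20 :zap']
```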

What's the best way to parse through a list of strings and return joined strings based on slices of these strings?

Here is an example list and the desired output:
list = ['1 Michael Jessica', '2 Christopher Ashley', '3 Matthew Brittany', '4 Joshua Amanda']
output = [ 'Michael 1', 'Jessica 1', 'Christopher 2', 'Ashley 2', 'Matthew 3', 'Brittany 3', etc]
# Then I sort it but that doesn't matter right now
I'm a python newbie and combined the concepts I understand to yield this horrendously ridiculous code that I'm almost embarrassed to post. No doubt there is a proper and easier way! I'd love some advice and help. Please don't worry about my code or editing it. Just posting it for reference if it helps. Ideally, brand new code is what I'm looking for.
list = ['1 Michael Jessica', '2 Christopher Ashley', '3 Matthew Brittany', '4 Joshua Amanda']
list3 = []
list4 = []
y = []
for n in list:
    x = n.split()
    y.append(x)
print(y)
for str in y:
    for pos in range(0, 3, 2): # Number and Name 1
        test = str[pos]
        list3.append(test)
for str in y:
    for pos in range(0, 2): # Number and Name 2
        test = str[pos]
        list4.append(test)
list3.reverse()
list4.reverse()
print(list3)
print(list4)
length = int(len(list3) / 2)
start = 0
finish = 2
length2 = int(len(list4) / 2)
start2 = 0
finish2 = 2
for num in range(0, length):
    list3[start:finish] = [" ".join(list3[start:finish])]
    start += 1
    finish += 1
for num in range(0, length):
    list4[start2:finish2] = [" ".join(list4[start2:finish2])]
    start2 += 1
    finish2 += 1
print(list3)
print(list4)
list5 = list3 + list4
list5.sort()
print(list5)
The other answers also look good, but I believe this is a more flexible approach if the position of the number within the string varies, so re is a good choice for slicing the strings.
import re
ls = ['1 Michael Jessica', '2 Christopher Ashley', '3 Matthew Brittany', '4 Joshua Amanda']
result = []
for l in ls:
    key = re.findall(r'\d+', l)[0]
    for i in re.findall(r'\D+', l):
        for val in i.split():
            result.append('{} {}'.format(val, key))
print(result)
Below is a one-liner for the same:
result2 = ['{} {}'.format(val, re.findall(r'\d+', l)[0]) for l in ls for i in re.findall(r'\D+', l) for val in i.split()]
print(result2)
Happy Coding !!!
This is one approach using a simple iteration and str.split
Ex:
lst = ['1 Michael Jessica', '2 Christopher Ashley', '3 Matthew Brittany', '4 Joshua Amanda']
result = []
for i in lst:
    key, *values = i.split()
    for n in values:
        result.append(f"{n} {key}")  # or result.append(n + " " + key)
print(result)
Output:
['Michael 1', 'Jessica 1', 'Christopher 2', 'Ashley 2', 'Matthew 3', 'Brittany 3', 'Joshua 4', 'Amanda 4']
# Here a is the input list of strings.
[" ".join([item, name.split()[0]]) for name in a for index, item in enumerate(name.split()) if index != 0]
items = ['1 Michael Jessica', '2 Christopher Ashley', '3 Matthew Brittany', '4 Joshua Amanda']
result = []
for item in items:
    item_split = item.split(' ')
    item_number = item_split.pop(0)
    for item_part in item_split:
        result.append('{} {}'.format(item_part, item_number))
print(result)
lst = ['1 Michael Jessica', '2 Christopher Ashley', '3 Matthew Brittany', '4 Joshua Amanda']
result = []
for item in lst:
    a, b, c = item.split()
    result.append("{} {}".format(b, a))
    result.append("{} {}".format(c, a))
print(result)
Output:
['Michael 1', 'Jessica 1', 'Christopher 2', 'Ashley 2', 'Matthew 3', 'Brittany 3', 'Joshua 4', 'Amanda 4']

Match and append inside a list

I am working on a project and have the following list:
a = ['2 co',' 2 tr',' 2 pi', '2 ca', '3 co', '3 ca', '3 pi', '6 tr', '6 pi', '8 tr', '7 ca', '7 pi']
I want to write code that checks whether the first character of each string also starts another string, and if so adds those strings to a new list.
I know how to do it, but only for two strings. Here, I want to select all the strings that start with the same character and group them by a given number of original strings. For example, I want all possible combinations of 3 strings (taken from the original list) that start with the same character.
Also, I want each set of strings to appear only once in the result, not several times in different orders.
The expected result in that case (i.e. when I want groups of 3 strings with a = ['2 co',' 2 tr',' 2 pi', '2 ca', '3 co', '3 ca', '3 pi', '6 tr', '6 pi', '8 tr', '7 ca', '7 pi']) is:
['2 co, 2 tr, 2 pi', '2 co, 2 tr, 2 ca', '2 pi, 2 ca, 2 tr', '2 pi, 2 ca, 2 co', '3 co, 3 ca, 3 pi']
You can see that I don't have '2 tr, 2 co, 2 pi' here, because I already have '2 co, 2 tr, 2 pi'.
And when I want to group by sublists of 4, the expected output is
['2 co, 2 tr, 2 pi, 2 ca']
I managed to do it, but only when grouping by subsets of two, and it gives all the combinations, including those with the same strings in a different order. Here it is:
a = ['2 co',' 2 tr',' 2 pi', '2 ca', '3 co', '3 ca', '3 pi', '6 tr', '6 pi', '8 tr', '7 ca', '7 pi']
result = []
for i in range(len(a)):
    for j in a[:i]+a[i+1:]:
        if a[i][0] == j[0]:
            result.append(j)
print(result)
Thanks for your help!
You can use itertools.groupby and itertools.combinations for that task:
import itertools as it
import operator as op
groups = it.groupby(sorted(a), key=op.itemgetter(0))
result = [', '.join(c) for g in groups for c in it.combinations(g[1], 3)]
Note that if the order of elements should depend only on the first character, you might want to add another key=op.itemgetter(0) to the sorted function. If the data is already presorted such that "similar" items (with the same first character) are next to each other, then you can drop the sorted call altogether.
Details
it.groupby puts the data into groups, based on their first character (due to key=op.itemgetter(0), which selects the first item, i.e. the first character, from each string). Expanding groups, it looks like this:
[('2', ['2 co', '2 tr', '2 pi', '2 ca']),
('3', ['3 co', '3 ca', '3 pi']),
('6', ['6 tr', '6 pi']),
('7', ['7 ca', '7 pi']),
('8', ['8 tr'])]
Then, for each of the groups, it.combinations(..., 3) computes all possible combinations of length 3, and the list comprehension joins each one into a string (for groups with fewer than 3 members no combinations are possible):
['2 co, 2 tr, 2 pi',
'2 co, 2 tr, 2 ca',
'2 co, 2 pi, 2 ca',
'2 tr, 2 pi, 2 ca',
'3 co, 3 ca, 3 pi']
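For reference, here is a runnable sketch of the same idea applied to the list from the question. Two things here are my own assumptions rather than part of the answer: the entries are stripped first, because items such as ' 2 tr' start with a space and would otherwise be grouped under ' ' instead of '2', and the group size is a parameter of a helper with the made-up name same_prefix_combinations.

```python
import itertools as it

a = ['2 co', ' 2 tr', ' 2 pi', '2 ca', '3 co', '3 ca', '3 pi',
     '6 tr', '6 pi', '8 tr', '7 ca', '7 pi']


def first_char(s):
    return s[0]


def same_prefix_combinations(items, size):
    # Assumption: leading/trailing whitespace is not meaningful.
    cleaned = [s.strip() for s in items]
    # Stable sort on the first character only, so items keep their
    # original relative order inside each group.
    cleaned.sort(key=first_char)
    result = []
    for _, group in it.groupby(cleaned, key=first_char):
        # combinations never yields the same set in a different order.
        result.extend(', '.join(c) for c in it.combinations(group, size))
    return result


print(same_prefix_combinations(a, 3))
print(same_prefix_combinations(a, 4))
# ['2 co, 2 tr, 2 pi, 2 ca']
```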

python - removing all non-numeric characters from a string inside a list

I have this list:
my_list = ['Judy 88 5', 'animal 91 5', 'Mo 86 5', 'Geno 87 6', 'exhaled 87 6']
I want to remove all non-numeric items from this list, i.e.:
['88 5', '91 5', '86 5', '87 6', '87 6']
and then I want just the two-digit numbers, i.e.:
['88', '91', '86', '87', '87']
How can I do this without the indices changing?
I tried using
my_list = [elem for elem in my_list if not any(c.isalpha() for c in elem)]
but it just returned an empty list...
Edit:
Regex helped me with this list, but what if the list were something like this:
my_list = ['J55udy 88 5', 'anim31al 91 5', 'Mo2 86 5', 'Geno 87 6', 'exhaled 87 6']
My list is always changing, but it has a constant format: first a user name, then a two-digit number, then a one-digit number. The problem is that users sometimes have digits in their names. How can I get only the two digits in the middle, even if the list looks like this?
my_list = ['J558udy 88 5', 'anim31al 91 5', 'Mo52 86 5', 'Gen3o 87 6', 'exhaled 87 6']
Using Regex.
Ex:
import re
my_list = ['Judy 88 5', 'animal 91 5', 'Mo 86 5', 'Geno 87 6', 'exhaled 87 6']
res = []
for i in my_list:
    m = re.search(r"\b(\d{2})\b", i)
    if m:
        res.append(m.group())
print(res)
Output:
['88', '91', '86', '87', '87']
\b - A word boundary.
\d{2} - Exactly two digits.
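As a quick check against the edited question, here is a sketch that runs the same word-boundary pattern on the list whose names contain digits. Digits embedded in a name (such as the 55 in 'J558udy') are not surrounded by word boundaries, so they are skipped. Appending None when nothing matches is my own addition, made so that positions in the result stay aligned with the input.

```python
import re

two_digits = re.compile(r'\b\d{2}\b')

my_list = ['J558udy 88 5', 'anim31al 91 5', 'Mo52 86 5', 'Gen3o 87 6', 'exhaled 87 6']

middle = []
for entry in my_list:
    m = two_digits.search(entry)
    # Keep a placeholder for non-matching entries so indices line up.
    middle.append(m.group() if m else None)

print(middle)
# ['88', '91', '86', '87', '87']
```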
You can use the following regex:
import re
my_list = ['Judy 88 5', 'animal 91 5', 'Mo 86 5', 'Geno 87 6', 'exhaled 87 6']
regex = re.compile(r'\b\d\d\b')
my_list = [regex.search(i).group() for i in my_list]
my_list would become:
['88', '91', '86', '87', '87']
Regex is indeed a good solution, but this can also be done without it. The solution below finds every pair of adjacent digits in each string, even if there are multiple instances, as in 'blabla 88 5 63'.
my_list = ['Judy 88 5', 'animal 91 5', 'Mo 86 5', 'Geno 87 6', 'exhaled 87 6']
digits = "0123456789"
new_list = []
for elt in my_list:
    for k, l in enumerate(elt):
        if l in digits and k != len(elt) - 1 and elt[k+1] in digits:
            new_str = elt[k:k+2]
            new_list.append(new_str)
It can be turned into a one-liner:
digits = "0123456789"
[elt[k:k+2] for elt in my_list for k, l in enumerate(elt) if l in digits and k != len(elt) - 1 and elt[k+1] in digits]
Out[37]: ['88', '91', '86', '87', '87']
You can use a regular expression to extract the first run of digits from each string.
import re
my_list = ['Judy 88 5', 'animal 91 5', 'Mo 86 5', 'Geno 87 6', 'exhaled 87 6']
nums = [re.search(r'\d+', lst).group(0) for lst in my_list]
print(nums)
Output
['88', '91', '86', '87', '87']
