Match and append inside a list - python

So, I am working on a project, and I have the following list:
a = ['2 co', '2 tr', '2 pi', '2 ca', '3 co', '3 ca', '3 pi', '6 tr', '6 pi', '8 tr', '7 ca', '7 pi']
I want to run code that checks whether the first character of each string matches the first character of another string and, if so, adds them to a new list.
I know how to do this, but only for pairs of strings. Here, I want to select all the strings that start with the same character and group them by how many of the original strings each group contains. For example, I want to group into sublists of 3 strings (taken from the original list) all the possible combinations of strings that start with the same character.
Also, I want the result to contain only one entry per combination of substrings, not the same substrings in different orders.
The expected result in that case (i.e. when I want groups of 3 substrings, with a = ['2 co', '2 tr', '2 pi', '2 ca', '3 co', '3 ca', '3 pi', '6 tr', '6 pi', '8 tr', '7 ca', '7 pi']) is:
['2 co, 2 tr, 2 pi', '2 co, 2 tr, 2 ca', '2 pi, 2 ca, 2 tr', '2 pi, 2 ca, 2 co', '3 co, 3 ca, 3 pi']
Note that I don't have '2 tr, 2 co, 2 pi' here, because I already have '2 co, 2 tr, 2 pi'.
And when I want to group into sublists of 4, the expected output is:
['2 co, 2 tr, 2 pi, 2 ca']
I managed to do it, but only when grouping into pairs, and it gives all the combinations, including those with the same substrings in a different order. Here it is:
a = ['2 co', '2 tr', '2 pi', '2 ca', '3 co', '3 ca', '3 pi', '6 tr', '6 pi', '8 tr', '7 ca', '7 pi']
result = []
for i in range(len(a)):
    for j in a[:i] + a[i+1:]:
        if a[i][0] == j[0]:
            result.append(j)
print(result)
Thanks for your help!

You can use itertools.groupby and itertools.combinations for that task:
import itertools as it
import operator as op
groups = it.groupby(sorted(a, key=op.itemgetter(0)), key=op.itemgetter(0))
result = [', '.join(c) for g in groups for c in it.combinations(g[1], 3)]
The key=op.itemgetter(0) passed to sorted makes the order depend only on the first character; because sorted is stable, items within each group keep their original order, which is what produces the output shown below. Drop that key if you prefer each group sorted alphabetically instead. If the data is already presorted such that "similar" items (with the same first character) are next to each other, you can drop the sorted call altogether.
Details
it.groupby puts the data into groups based on their first character (due to key=op.itemgetter(0), which selects the first item, i.e. the first character, of each string). Expanding groups (turning each group iterator into a list), it looks like this:
[('2', ['2 co', '2 tr', '2 pi', '2 ca']),
('3', ['3 co', '3 ca', '3 pi']),
('6', ['6 tr', '6 pi']),
('7', ['7 ca', '7 pi']),
('8', ['8 tr'])]
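One detail worth seeing in isolation: it.groupby only merges runs of consecutive items that share a key, which is why the answer sorts first. The short sketch below uses a hypothetical interleaved list (not from the question) to show that without sorting, the same key can appear in more than one group.

```python
import itertools as it
import operator as op

# Hypothetical list where the two '2' items are not adjacent
data = ['2 co', '3 co', '2 tr']

# groupby on the raw list: '2' shows up as two separate groups
unsorted_groups = [(k, list(g)) for k, g in it.groupby(data, key=op.itemgetter(0))]

# Sorting by the same key first brings equal keys together
ordered = sorted(data, key=op.itemgetter(0))
sorted_groups = [(k, list(g)) for k, g in it.groupby(ordered, key=op.itemgetter(0))]

print(unsorted_groups)  # [('2', ['2 co']), ('3', ['3 co']), ('2', ['2 tr'])]
print(sorted_groups)    # [('2', ['2 co', '2 tr']), ('3', ['3 co'])]
```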
Then, for each group, it.combinations(..., 3) computes all possible combinations of length 3, and the list comprehension joins each combination into a single string with ', '.join (groups with fewer than 3 members yield no combinations):
['2 co, 2 tr, 2 pi',
'2 co, 2 tr, 2 ca',
'2 co, 2 pi, 2 ca',
'2 tr, 2 pi, 2 ca',
'3 co, 3 ca, 3 pi']
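The same approach can be wrapped in a small function so the group size is a parameter, which covers both the size-3 and size-4 cases from the question. This is a sketch, not the answerer's code: the name group_combinations is hypothetical, and the strip() call is an added defensive step, because groupby keys on the very first character and a leading space (as in ' 2 tr' below) would otherwise become its own group key.

```python
import itertools as it
import operator as op


def group_combinations(items, size):
    """Hypothetical helper: every unordered `size`-combination of items sharing a first character."""
    # Defensive: remove surrounding whitespace so ' 2 tr' is keyed on '2', not ' '
    cleaned = [s.strip() for s in items]
    # Stable sort on the first character only, so each group keeps the input order
    ordered = sorted(cleaned, key=op.itemgetter(0))
    result = []
    for _, group in it.groupby(ordered, key=op.itemgetter(0)):
        # combinations() never repeats the same members in a different order
        for combo in it.combinations(group, size):
            result.append(', '.join(combo))
    return result


# Sample list deliberately keeps two entries with a leading space
a = ['2 co', ' 2 tr', ' 2 pi', '2 ca', '3 co', '3 ca', '3 pi',
     '6 tr', '6 pi', '8 tr', '7 ca', '7 pi']

print(group_combinations(a, 3))
print(group_combinations(a, 4))  # ['2 co, 2 tr, 2 pi, 2 ca']
```

For size 3 this prints the same five strings as the list above; groups smaller than the requested size simply contribute nothing.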

Related

Setting X-Tick Labels on Transposed Line Plot

I'm trying to properly label my line plot and set the x-tick labels but have been unsuccessful.
Here is what I've tried so far:
plt.xticks(ticks=..., labels=...)
AND
labels = ['8 pcw', '12 pcw', '13 pcw', '16 pcw', '17 pcw', '19 pcw', '21 pcw',
'24 pcw', '35 pcw', '37 pcw', '4 mos', '1 yrs', '2 yrs', '3 yrs',
'4 yrs', '8 yrs', '11 yrs', '13 yrs', '18 yrs', '19 yrs', '21 yrs',
'23 yrs', '30 yrs', '36 yrs', '37 yrs', '40 yrs']
ax.set_xticks(labels)
The code that I've used to transpose this dataframe into a line graph is this:
mean_df.transpose().plot.line(figsize=(25, 10))
plt.xlabel("Age")
plt.ylabel("Raw RPKM")
plt.title("BTRC Expression in V1C")
The dataframe I'm using (mean_df) contains columns that are already named with their respective labels (8 pcw, 12 pcw, ... 36 yrs, 40 yrs), so I would have thought the labels would be pulled from there automatically. However, matplotlib seems to thin out the x-ticks and display only 5 values. How can I get it to display all 24 values instead?
I keep getting one of the following two errors when I try the methods listed above:
Failed to convert value(s) to axis units:
OR
ValueError: The number of FixedLocator locations (n), usually from a call to set_ticks, does not match the number of ticklabels (n)
Here is an image of my plot:

Python regex find single digit if no digits before it

I have a list of strings and I want to use regex to get a single digit if there are no digits before it.
import re
strings = ['5.8 GHz', '5 GHz']
for s in strings:
    print(re.findall(r'\d\s[GM]?Hz', s))
# output
['8 GHz']
['5 GHz']
# desired output
['5 GHz']
I want it to just return '5 GHz', the first string shouldn't have any matches. How can I modify my pattern to get the desired output?
As per my comment, it seems that you can use:
(?<!\d\.)\d+\s[GM]?Hz\b
This matches:
(?<!\d\.) - A negative lookbehind to assert position is not right after any single digit and literal dot.
\d+ - One or more digits matching the integer part of the frequency.
\s - A single whitespace character.
[GM]?Hz - An optional uppercase G or M followed by "Hz".
\b - A word boundary.
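A quick way to check the lookbehind pattern is to run it against the question's strings. The sketch also adds a hypothetical input, '5.18 GHz', which shows a limitation: the answer's lookbehind only rejects a single digit followed by a dot, so with two decimal digits it still matches the trailing '8 GHz'. The strict_pattern variant is my own assumption, not part of the answer; it rejects any preceding digit or dot instead.

```python
import re

# Pattern from the answer: the number must not follow "digit + dot"
answer_pattern = re.compile(r'(?<!\d\.)\d+\s[GM]?Hz\b')
# Hypothetical stricter variant: the number must not follow any digit or dot
strict_pattern = re.compile(r'(?<![\d.])\d+\s[GM]?Hz\b')

strings = ['5.8 GHz', '5 GHz', '5.18 GHz']
answer_results = [answer_pattern.findall(s) for s in strings]
strict_results = [strict_pattern.findall(s) for s in strings]

print(answer_results)  # [[], ['5 GHz'], ['8 GHz']]
print(strict_results)  # [[], ['5 GHz'], []]
```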
>>> import re
>>> strings = ['5.8 GHz', '5 GHz']
>>>
>>> for s in strings:
...     match = re.match(r'^[^0-9]*([0-9] [GM]Hz)', s)
...     if match:
...         print(match.group(1))
...
5 GHz
Updated Answer
import re
a = ['5.8 GHz', '5 GHz', '8 GHz', '1.2', '1.2 Some Random String', '1 Some String', '1 MHz of frequency', '2 Some String in Between MHz']
res = []
for fr in a:
    # Raw string so that \s reaches the regex engine unchanged
    if re.match(r'^[0-9](?=.[^0-9])(\s)[GM]Hz$', fr):
        res.append(fr)
print(res)
Output:
['5 GHz', '8 GHz']
My two cents:
import re
selected_strings = list(filter(
    lambda x: re.findall(r'(?:^|\s+)\d+\s+(?:G|M)Hz', x),
    strings
))
With ['2 GHz', '5.8 GHz', ' 5 GHz', '3.4 MHz', '3 MHz', '1 MHz of Frequency'] as strings, selected_strings is:
['2 GHz', ' 5 GHz', '3 MHz', '1 MHz of Frequency']

How can I sort this list numerically?

How do I sort this list numerically?
sa = ['3 :mat', '20 :zap', '20 :jhon', '5 :dave', '14 :maya' ]
print(sorted(sa))
This shows:
['14 :maya', '20 :jhon', '20 :zap', '3 :mat', '5 :dave']
You can do it like this, since your numbers are part of the string:
sorted(sa, key = lambda x: int(x.split(' ')[0]))
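To make the behaviour of that key concrete, the sketch below runs it on the question's list. Because sorted is stable, the two '20' entries keep their original relative order; a tuple key (number first, then the whole string) is one way to break such ties alphabetically without a third-party module.

```python
sa = ['3 :mat', '20 :zap', '20 :jhon', '5 :dave', '14 :maya']

# Sort by the integer before the first space; ties keep their input order
by_number = sorted(sa, key=lambda x: int(x.split(' ')[0]))
print(by_number)  # ['3 :mat', '5 :dave', '14 :maya', '20 :zap', '20 :jhon']

# Tuple key: compare the number first, then the full string for ties
by_number_then_name = sorted(sa, key=lambda x: (int(x.split(' ')[0]), x))
print(by_number_then_name)  # ['3 :mat', '5 :dave', '14 :maya', '20 :jhon', '20 :zap']
```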
You can do something like the following, which extracts the digits from each string and sorts by them:
sa.sort(key=lambda x: int(''.join(filter(str.isdigit, x))))
print(sa)
Using regex:
import re
sorted(sa, key=lambda x: int(re.findall(r'\d+', x)[0]))
['3 :mat', '5 :dave', '14 :maya', '20 :zap', '20 :jhon']
Using the natsort module:
from natsort import natsorted
natsorted(sa)
['3 :mat', '5 :dave', '14 :maya', '20 :jhon', '20 :zap']

What's the best way to parse through a list of strings and return joined strings based on slices of these strings?

Here is an example list and the desired output:
list = ['1 Michael Jessica', '2 Christopher Ashley', '3 Matthew Brittany', '4 Joshua Amanda']
output = [ 'Michael 1', 'Jessica 1', 'Christopher 2', 'Ashley 2', 'Matthew 3', 'Brittany 3', etc]
# Then I sort it but that doesn't matter right now
I'm a Python newbie and combined the concepts I understand to produce this horrendously ridiculous code that I'm almost embarrassed to post. No doubt there is a proper and easier way! I'd love some advice and help. Please don't worry about my code or editing it; I'm just posting it for reference in case it helps. Ideally, brand new code is what I'm looking for.
list = ['1 Michael Jessica', '2 Christopher Ashley', '3 Matthew Brittany', '4 Joshua Amanda']
list3 = []
list4 = []
y = []
for n in list:
    x = n.split()
    y.append(x)
print(y)
for str in y:
    for pos in range(0, 3, 2):  # Number and Name 1
        test = str[pos]
        list3.append(test)
for str in y:
    for pos in range(0, 2):  # Number and Name 2
        test = str[pos]
        list4.append(test)
list3.reverse()
list4.reverse()
print(list3)
print(list4)
length = int(len(list3) / 2)
start = 0
finish = 2
length2 = int(len(list4) / 2)
start2 = 0
finish2 = 2
for num in range(0, length):
    list3[start:finish] = [" ".join(list3[start:finish])]
    start += 1
    finish += 1
for num in range(0, length):
    list4[start2:finish2] = [" ".join(list4[start2:finish2])]
    start2 += 1
    finish2 += 1
print(list3)
print(list4)
list5 = list3 + list4
list5.sort()
print(list5)
The other answers also look good, but I believe this is a more flexible approach if the numbers are not always in the same position, so re is a good choice for slicing the strings.
import re
ls = ['1 Michael Jessica', '2 Christopher Ashley', '3 Matthew Brittany', '4 Joshua Amanda']
result = []
for l in ls:
    key = re.findall(r'\d+', l)[0]
    for i in re.findall(r'\D+', l):
        for val in i.split():
            result.append('{} {}'.format(val, key))
print(result)
Here is the same thing as a one-liner:
result2 = ['{} {}'.format(val, re.findall(r'\d+', l)[0]) for l in ls for i in re.findall(r'\D+', l) for val in i.split()]
print(result2)
Happy Coding !!!
This is one approach using a simple iteration and str.split.
Ex:
lst = ['1 Michael Jessica', '2 Christopher Ashley', '3 Matthew Brittany', '4 Joshua Amanda']
result = []
for i in lst:
    key, *values = i.split()
    for n in values:
        result.append(f"{n} {key}")  # or result.append(n + " " + key)
print(result)
Output:
['Michael 1', 'Jessica 1', 'Christopher 2', 'Ashley 2', 'Matthew 3', 'Brittany 3', 'Joshua 4', 'Amanda 4']
[" ".join([item, name.split()[0]]) for name in a for index, item in enumerate(name.split()) if index != 0]
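The one-liner above assumes the input list is named a. Written out as plain loops it may be easier to follow: every word after index 0 is a name, and each name is paired with the leading number. This expansion is my own sketch of the same logic, not code from the answer.

```python
a = ['1 Michael Jessica', '2 Christopher Ashley', '3 Matthew Brittany', '4 Joshua Amanda']

result = []
for name in a:
    parts = name.split()
    number = parts[0]
    # Skip index 0 (the number itself); pair every other word with it
    for index, item in enumerate(parts):
        if index != 0:
            result.append(" ".join([item, number]))

print(result)
# ['Michael 1', 'Jessica 1', 'Christopher 2', 'Ashley 2', 'Matthew 3', 'Brittany 3', 'Joshua 4', 'Amanda 4']
```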
names = ['1 Michael Jessica', '2 Christopher Ashley', '3 Matthew Brittany', '4 Joshua Amanda']
result = []
for item in names:  # avoid naming the list `input`, which shadows the built-in
    item_split = item.split(' ')
    item_number = item_split.pop(0)
    for item_part in item_split:
        result.append('{} {}'.format(item_part, item_number))
print(result)
lst = ['1 Michael Jessica', '2 Christopher Ashley', '3 Matthew Brittany', '4 Joshua Amanda']
result = []
for item in lst:
    a, b, c = item.split()
    result.append("{} {}".format(b, a))
    result.append("{} {}".format(c, a))
print(result)
Output:
['Michael 1', 'Jessica 1', 'Christopher 2', 'Ashley 2', 'Matthew 3', 'Brittany 3', 'Joshua 4', 'Amanda 4']

How to get all elements that contain a number from a list in Python

Suppose I have the following list
['0 1 2', '1 2 1', '2 3 2', '3 4 1', '4 5 2', '5 0 1', '0 2 3', '0 3 2', '0 4 3', '1 3 3', '1 4 1', '1 5 3', '2 4 3', '2 5 2', '3 5 2']
I want to get all elements that contain '0'.
How would I do that? I am a beginner in Python and have been stuck on this problem for days.
You could use list comprehension to iterate through each element in the list and check if that element contains '0'. If so, include that element.
nums = ['0 1 2', '1 2 1', '2 3 2', '3 4 1', '4 5 2', '5 0 1', '0 2 3', '0 3 2', '0 4 3', '1 3 3', '1 4 1', '1 5 3', '2 4 3', '2 5 2', '3 5 2']
has_zero = [num for num in nums if '0' in num]
print(has_zero)
Output:
['0 1 2', '5 0 1', '0 2 3', '0 3 2', '0 4 3']
Try:
list2 = []
for element in list1:
    if "0" in element:
        list2.append(element)
This should work (with list1 being your original list).
You can use filter:
>>> l = ['0 1 2', '1 2 1', '2 3 2', '3 4 1', '4 5 2', ...]
>>> list(filter(lambda x: '0' in x, l))
Try this:
data = ['0 1 2', '1 2 1', '2 3 2', '3 4 1', '4 5 2', '5 0 1', '0 2 3', '0 3 2', '0 4 3', '1 3 3', '1 4 1', '1 5 3',
'2 4 3', '2 5 2', '3 5 2']
print([x for x in data if '0' in x])
This will work:
myList = ['0 1 2', '1 2 1', '2 3 2', '3 4 1', '4 5 2', '5 0 1', '0 2 3', '0 3 2', '0 4 3', '1 3 3', '1 4 1', '1 5 3',
          '2 4 3', '2 5 2', '3 5 2']
resultList = []
for item in myList:
    if "0" in item:
        resultList.append(item)
How it works: it loops through the list items, checks whether "0" is in each item, and if it is, adds the item to the result list.
The options already given are better, but here is a version that is as easy as possible for a beginner to follow:
myList = ['0 1 2', '1 2 1', '2 3 2', '3 4 1', '4 5 2', '5 0 1', '0 2 3', '0 3 2', '0 4 3', '1 3 3', '1 4 1', '1 5 3', '2 4 3', '2 5 2', '3 5 2']
elementsContainingZero = []
for element in myList:
    splitElement = element.split()
    for number in splitElement:
        if number == '0':
            elementsContainingZero.append(element)
            break  # stop at the first '0' so an element is never added twice
print(elementsContainingZero)
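For the list in the question every number is a single digit, so the substring test ('0' in element) and the split-based test give the same result. They differ once multi-digit numbers appear: a substring test also matches '10' or '20'. The sketch below uses a hypothetical list (not from the question) to show the difference.

```python
# Hypothetical data: '10 2 3' contains the character '0' but not the number 0
data = ['0 1 2', '10 2 3', '5 0 1']

# Substring check: matches any element containing the character '0'
substring_hits = [x for x in data if '0' in x]
# Token check: matches only elements where '0' is a whole space-separated number
token_hits = [x for x in data if '0' in x.split()]

print(substring_hits)  # ['0 1 2', '10 2 3', '5 0 1']
print(token_hits)      # ['0 1 2', '5 0 1']
```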
