Python split a string by delimiter and trim all elements - python

Is there a Python lambda (or other one-liner) that splits the following string on the delimiter ">" and returns a list with each element trimmed?
Input: "p1 > p2 > p3 > 0"
Output: ["p1", "p2", "p3", "0"]

I agree with the comment that all you need is:
>>> "p1 > p2 > p3 > 0".split(" > ")
['p1', 'p2', 'p3', '0']
However, if the whitespace is inconsistent and you need to do exactly what you said (split, then trim), you could use a list comprehension like:
>>> s = "p1 > p2 > p3 > 0"
>>> [x.strip() for x in s.split(">")]
['p1', 'p2', 'p3', '0']
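Since the question asks for a lambda, here is a sketch of both variants written as lambdas (PEP 8 prefers a def for named functions, so treat the names split_trim and split_re as illustrative). The second one uses re.split with a pattern that swallows the whitespace around ">"; that is an alternative technique, not the answer's method:

```python
import re

# Split on ">", then strip each element (the list-comprehension approach as a lambda).
split_trim = lambda s: [x.strip() for x in s.split(">")]

# Alternative: let the regex consume optional whitespace around each ">".
# The whole string is stripped first so leading/trailing spaces don't survive.
split_re = lambda s: re.split(r"\s*>\s*", s.strip())

print(split_trim("p1 > p2 > p3 > 0"))   # ['p1', 'p2', 'p3', '0']
print(split_re("p1>p2   >  p3 > 0"))    # ['p1', 'p2', 'p3', '0']
```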

Related

Splitting string, ignoring brackets including nested brackets

I would like to split a string at spaces (and colons), except inside curly brackets and rounded brackets. Similar questions have been asked, but the answers fail with nested brackets.
Here is an example of a string to split:
p1: I/out p2: (('mean', 5), 0.0, ('std', 2)) p3: 7 p4: {'name': 'check', 'value': 80.0}
The actual goal is to obtain a list of keys (p1, p2, p3 and p4) along with their values. When I try to split the string at spaces and colons, I can avoid splitting at spaces and colons inside the curly brackets. But I cannot avoid the splitting at some spaces inside the rounded brackets because of the nested brackets.
The closest I got is
[\s:]+(?=[^\{\(\)\}]*(?:[\{\(]|$))
which is fine except that it splits between (('mean', 5), and 0.0.
You can use the following pattern, which works in PCRE and in the third-party PyPI regex module (the built-in re module does not support recursion such as (?1)):
(?:(\((?:[^()]++|(?1))*\))|(\{(?:[^{}]++|(?2))*})|[^\s:])+
It matches
(?: - start of a container non-capturing group:
(\((?:[^()]++|(?1))*\)) - Group 1: a substring between two nested round brackets
| - or
(\{(?:[^{}]++|(?2))*}) - Group 2: a substring between two nested braces
| - or
[^\s:] - a char other than whitespace and colon
)+ - one or more occurrences.
See the Python demo:
import regex  # third-party PyPI module (pip install regex); the built-in re cannot do recursion
text = "p1: I/out p2: (('mean', 5), 0.0, ('std', 2)) p3: 7 p4: {'name': 'check', 'value': 80.0}"
pattern = r"(?:(\((?:[^()]++|(?1))*\))|(\{(?:[^{}]++|(?2))*})|[^\s:])+"
print( [x.group() for x in regex.finditer(pattern, text)] )
Output:
['p1', 'I/out', 'p2', "(('mean', 5), 0.0, ('std', 2))", 'p3', '7', 'p4', "{'name': 'check', 'value': 80.0}"]
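If installing the third-party regex module is not an option, a plain character scan that tracks bracket depth gives the same tokens for this input. This is a swapped-in technique rather than the answer's regex; split_outside_brackets is a hypothetical helper name, and it assumes the brackets are balanced and that no bracket appears inside a quoted string. Pairing the tokens two at a time then gives the key/value mapping the question was after:

```python
def split_outside_brackets(text, seps=" :"):
    """Split text on any character in `seps`, ignoring separators nested
    inside () or {} at any depth.

    Assumes balanced brackets and no brackets inside quoted strings.
    """
    tokens, current, depth = [], [], 0
    for ch in text:
        if ch in "({":
            depth += 1
        elif ch in ")}":
            depth -= 1
        if depth == 0 and ch in seps:
            # Top-level separator: close the current token. Empty tokens are
            # skipped, so ": " counts as a single break.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


text = "p1: I/out p2: (('mean', 5), 0.0, ('std', 2)) p3: 7 p4: {'name': 'check', 'value': 80.0}"
tokens = split_outside_brackets(text)
# Pair each key with the value that follows it.
params = dict(zip(tokens[::2], tokens[1::2]))
print(params)
```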

Convert a file of strings to required format : replace / with _ except the ones before comma

I need to convert a file into a particular format.
Here is the example:
>>> x = "abc/xyz/abc/xyz/ab_c : abc/xyz/abc/xyz/ab_c,ab_c/xy_z/ab_c/xy_z/ab_c/xy_z,1"
I need to replace every / with _, except the last / before each comma, and also add a space after each comma.
Output needed:
>>> 'abc_xyz_abc_xyz_ab_c : abc_xyz_abc_xyz/ab_c, ab_c_xy_z_ab_c_xy_z_ab_c/xy_z, 1'
I tried replacing every / with _, but that way I have no way to keep the / before the comma.
>>> x.replace("/", "_").replace(",", ", ")
'abc_xyz_abc_xyz_ab_c : abc_xyz_abc_xyz_ab_c, ab_c_xy_z_ab_c_xy_z_ab_c_xy_z, 1'
Is there any other way to achieve this? Thanks in advance.
zip() the text split at '/' with itself shifted by one, then put it back together using the correct separator between each pair:
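x = "abc/xyz/abc/xyz/ab_c : abc/xyz/abc/xyz/ab_c,ab_c/xy_z/ab_c/xy_z/ab_c/xy_z,1"
parts = x.split("/")
# pair each part with the part that follows it
pp = zip(parts, parts[1:])
l = []
for at, after in pp:
    # if the next part contains a comma, this is the last "/" before
    # that comma, so keep it; otherwise replace it with "_"
    if ',' in after:
        l.extend([at, '/'])
    else:
        l.extend([at, '_'])
# the last part has no separator after it
l.append(parts[-1])
# join and add spaces after ,
new_t = ''.join(l).replace(",", ", ")
print(new_t)
# expected result, printed for comparison
print('abc_xyz_abc_xyz_ab_c : abc_xyz_abc_xyz/ab_c, ab_c_xy_z_ab_c_xy_z_ab_c/xy_z, 1')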
Output:
abc_xyz_abc_xyz_ab_c : abc_xyz_abc_xyz/ab_c, ab_c_xy_z_ab_c_xy_z_ab_c/xy_z, 1
abc_xyz_abc_xyz_ab_c : abc_xyz_abc_xyz/ab_c, ab_c_xy_z_ab_c_xy_z_ab_c/xy_z, 1
The zipped pairs look like this:
# list(pp): parts zipped with itself, shifted by 1
[('abc', 'xyz'), ('xyz', 'abc'), ('abc', 'xyz'), ('xyz', 'ab_c : abc'),
('ab_c : abc', 'xyz'), ('xyz', 'abc'), ('abc', 'xyz'), ('xyz', 'ab_c,ab_c'),
('ab_c,ab_c', 'xy_z'), ('xy_z', 'ab_c'), ('ab_c', 'xy_z'), ('xy_z', 'ab_c'),
('ab_c', 'xy_z,1')]
This code uses Python 3-style printing, but it works in Python 2 as well.
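A regular-expression alternative (not from the answer above) does the replacement in one substitution using a negative lookahead: a / is left alone only when nothing but non-slash, non-comma characters stands between it and the next comma. convert is a hypothetical helper name; for a file you would call it on each line:

```python
import re

def convert(line):
    # Replace every "/" unless it is the last "/" before a comma, i.e. unless
    # only characters other than "/" and "," separate it from the next ",".
    line = re.sub(r"/(?![^/,]*,)", "_", line)
    # Then put a space after each comma.
    return line.replace(",", ", ")


x = "abc/xyz/abc/xyz/ab_c : abc/xyz/abc/xyz/ab_c,ab_c/xy_z/ab_c/xy_z/ab_c/xy_z,1"
print(convert(x))
# abc_xyz_abc_xyz_ab_c : abc_xyz_abc_xyz/ab_c, ab_c_xy_z_ab_c_xy_z_ab_c/xy_z, 1
```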

Separating strings with lists

I was trying to find a way to split strings in a project of mine called 'Chemistry Calculator'. The project takes strings from input() and compares them against a list:
substance1 = input('Substance 1: ')
substance2 = input('Substance 2: ')
elements = ['f', 'o', 'cl', 'br', 'i', 's', 'c']

def affinity_table(element1: str, element2: str, table: list) -> None:
    s = element1.lower()
    r = element2.lower()
    if s in table and r in table:
        # compare the positions of the two elements in the table
        if table.index(s) < table.index(r):
            print(s, "will change with", r)
        else:
            print(s, "won't change with", r)
    else:
        print("Those substances aren't in the list")

affinity_table(substance1, substance2, elements)
The code above works well.
So I wanted it to work with whole substances, not just single elements. To do this I need to split each substance into parts:
the cation part
the anion part.
Then I need to compare them with the list. I noticed that the contains() function showed exactly what I wanted, but only for one comparison.
My question is:
Is there a way to use the contains() function with more than one string, and then split the string where the match is found?
Something similar to this:
a = 'NaCO3'  # First input.
b = 'KCO3'  # Second input.
table = ['Na', 'K']  # The list.
# Way of separating the values with the list.
# ^ my objective.
a1 = 'Na'  # Separation with a.
a2 = 'CO3'  # The rest of a.
b1 = 'K'  # Separation with b.
b2 = 'CO3'  # The rest of b.
# ^ expected outputs from the separation.
if table.index(a1) < table.index(b1):
    print(a1, 'will change with', b1, 'and become', a1 + b2)
else:
    print(a1, "won't change with", b1, 'and will stay normal')
# ^ the list index comparison from the 1st code.
#After the solution, here are the results:
Disclaimer
Just to be clear: for the constrained scope of what you are doing, this solution might be applicable. If you want to parse arbitrary chemical compounds (and those can look quite complicated), you need a full-fledged parser, not the toy regex solution I came up with.
Here's an idea:
Dynamically build a regex from the elements of your list, each as an alternative in its own capturing group. (re.split keeps captured groups in its result.)
>>> import re
>>> lst = ['Na', 'K']
>>> regex = '|'.join('({})'.format(a) for a in lst)
>>> regex
'(Na)|(K)'
Apply the regex...
>>> re.split(regex, 'NaCO3')
['', 'Na', None, 'CO3']
>>> re.split(regex, 'KCO3')
['', None, 'K', 'CO3']
... and filter out falsy values (None, '')
>>> list(filter(None, re.split(regex, 'NaCO3')))
['Na', 'CO3']
>>> list(filter(None, re.split(regex, 'KCO3')))
['K', 'CO3']
You can assign those values to names with extended iterable unpacking:
>>> b1, b2, *unexpected_rest = filter(None, re.split(regex, 'KCO3'))
>>> b1
'K'
>>> b2
'CO3'
If you want to bias the split in favor of longer matches, sort lst by length in descending order first (alternatives are tried left to right, and the first one that matches wins).
Not good:
>>> lst = ['N', 'Na', 'CO3']
>>> regex = '|'.join('({})'.format(a) for a in lst)
>>> list(filter(None, re.split(regex, 'NaCO3')))
['N', 'a', 'CO3']
Better:
>>> lst = ['N', 'Na', 'CO3']
>>> lst = sorted(lst, key=len, reverse=True)
>>> regex = '|'.join('({})'.format(a) for a in lst)
>>> list(filter(None, re.split(regex, 'NaCO3')))
['Na', 'CO3']
Let me know if that works for you.
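If you only need the leading cation and the rest of the formula, re.match with the same longest-first alternation may be enough and avoids filtering out None values. This is a sketch under the same toy-scope caveat; split_compound is a hypothetical helper name, and re.escape is added in case a list entry ever contains regex metacharacters:

```python
import re

def split_compound(compound, prefixes):
    """Split `compound` into (matched prefix, remainder), or return None.

    Toy-scope sketch: only looks for one of `prefixes` at the very start.
    """
    # Longest alternatives first, so 'Na' is tried before 'N'.
    alternatives = sorted(prefixes, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(p) for p in alternatives))
    m = pattern.match(compound)
    if m is None:
        return None
    return m.group(), compound[m.end():]


symbols = ['N', 'Na', 'K']
print(split_compound('NaCO3', symbols))  # ('Na', 'CO3')
print(split_compound('KCO3', symbols))   # ('K', 'CO3')
```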

Extracting a substring from a string in python based on Delimiter

I have an input string like:
a=1|b=2|c=3|d=4|e=5 and so on...
What I would like to do is extract the d=4 part from a very long string with a similar pattern. Is there any way to get a substring based on a starting delimiter and an ending delimiter?
That is, I could start from 'd=' and search until '|' to extract its value. Any insights would be welcome.
You can use a regex here:
>>> import re
>>> data = 'a=1|b=2|c=3|d=4|e=5'
>>> var = 'd'
>>> output = re.search(r'(?:^|\|)(' + re.escape(var) + r'=[0-9]*)', data).group(1)
>>> print(output)
d=4
Or you can use split, which is the more recommended option (note that output[3] relies on d=4 being the fourth field):
>>> data = 'a=1|b=2|c=3|d=4|e=5'
>>> output = data.split('|')
>>> print(output[3])
d=4
Or you can also build a dict:
>>> data = 'a=1|b=2|c=3|d=4|e=5'
>>> output = dict(i.split('=') for i in data.split('|'))
>>> output
{'a': '1', 'b': '2', 'c': '3', 'd': '4', 'e': '5'}
>>> output['d']
'4'
Construct a dictionary!
>>> s = 'a=1|b=2|c=3|d=4|e=5'
>>> dic = dict(sub.split('=') for sub in s.split('|'))
>>> dic['d']
'4'
If you want to store the integer values, use a for loop:
>>> s = 'a=1|b=2|c=3|d=4|e=5'
>>> dic = {}
>>> for sub in s.split('|'):
... name, val = sub.split('=')
... dic[name] = int(val)
...
>>> dic['d']
4
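You can also try startswith(), specifying the variable whose value you want:
string = " a=1|b=2|c=3|d=4|e=5 "
# strip the surrounding spaces before splitting into fields
array = string.strip().split("|")
for word in array:
    # include "=" so that a key like "dd" would not match
    if word.startswith("d="):
        print(word)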
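The question literally asks for the text between a start delimiter and an end delimiter. A minimal sketch with str.find, assuming the first occurrence of the start delimiter is the one you want (so 'd=' would also match inside a hypothetical key such as 'dd='); between is a hypothetical helper name:

```python
def between(text, start, end):
    """Return the text after the first `start`, up to the next `end`.

    Returns None if `start` is absent; runs to the end of `text` if `end` is.
    """
    i = text.find(start)
    if i == -1:
        return None
    i += len(start)
    j = text.find(end, i)
    return text[i:] if j == -1 else text[i:j]


data = 'a=1|b=2|c=3|d=4|e=5'
print(between(data, 'd=', '|'))  # 4
print(between(data, 'e=', '|'))  # 5 (no trailing "|", so it runs to the end)
```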

For loop, if element does not equal value, replace with empty string

my_list = ['1 ab ac bbba','23 abcba a aabb ab','345 ccc ab aaaaa']
I'm trying to get rid of the numbers and the spaces, basically everything that's not an 'a', 'b', or 'c'.
I tried this but it didn't work and I'm not sure why:
for str in my_list:
    for i in str:
        if i != 'a' or 'b' or 'c':
            i = ''
        else:
            pass
I want to eventually get:
my_list2 = ['abacbbba','abcbaaaabbab','cccabaaaaa']
You're misunderstanding how or works:
if i != 'a' or 'b' or 'c':
is equivalent to
if (i != 'a') or ('b') or ('c'):
and will therefore always be truthy (when i != 'a' is False, the expression evaluates to 'b', and a non-empty string is truthy).
You probably meant to write
if i != 'a' and i != 'b' and i != 'c':
which can also be written as
if i not in ('a', 'b', 'c'):
or even (since a string can iterate over its characters)
if i not in 'abc':
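A quick way to see the precedence issue described above is to evaluate the expression directly; i = 'a' is just a sample value:

```python
i = 'a'
# Parsed as (i != 'a') or 'b' or 'c': the first operand is False,
# so `or` returns its next operand, the non-empty (truthy) string 'b'.
print(i != 'a' or 'b' or 'c')        # b
print(bool(i != 'a' or 'b' or 'c'))  # True

# The membership test behaves as intended for single characters.
print(i not in 'abc')  # False
print('1' not in 'abc')  # True
```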
But even then, you're not doing anything with that information; a string is immutable, and by assigning '' to i, you're not changing the string at all. So if you want to do it without a regex, the correct way would be:
>>> my_list = ['1 ab ac bbba','23 abcba a aabb ab','345 ccc ab aaaaa']
>>> new_list = [''.join(c for c in s if c in 'abc') for s in my_list]
>>> new_list
['abacbbba', 'abcbaaaabbab', 'cccabaaaaa']
Use re.sub to replace everything that is not a, b, or c, i.e., [^abc], with an empty string:
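import re
my_list = ['1 ab ac bbba', '23 abcba a aabb ab', '345 ccc ab aaaaa']
my_list2 = []
for s in my_list:
    # delete every character that is not a, b or c
    my_list2.append(re.sub("[^abc]", "", s))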
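m = ['1 ab ac bbba', '23 abcba a aabb ab', '345 ccc ab aaaaa']
# drop everything before the first space (assumes each string starts with a number and a space)
n = [s[s.index(" "):] for s in m]
# then remove the remaining spaces
n = [x.replace(" ", "") for x in n]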
