Extracting a substring from a string in Python based on a delimiter

I have an input string like:
a=1|b=2|c=3|d=4|e=5 and so on...
I would like to extract the d=4 part from a very long string that follows this pattern. Is there a way to get a substring based on a starting delimiter and an ending delimiter, so that I can start from 'd=' and search up to the next '|' to extract the value? Any insights would be welcome.

You can use a regex here:
>>> import re
>>> data = 'a=1|b=2|c=3|d=4|e=5'
>>> var = 'd'
>>> output = re.search(r'(?:^|\|)(' + re.escape(var) + r'=[0-9]*)', data).group(1)
>>> print(output)
d=4
Or you can use split, which is the more recommended option:
>>> data = 'a=1|b=2|c=3|d=4|e=5'
>>> output = data.split('|')
>>> print(output[3])
d=4
Or you can also use a dict:
>>> data = 'a=1|b=2|c=3|d=4|e=5'
>>> output = dict(i.split('=') for i in data.split('|'))
>>> output
{'a': '1', 'b': '2', 'c': '3', 'd': '4', 'e': '5'}
>>> output['d']
'4'
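The question asks specifically for a search from a start delimiter ('d=') up to an end delimiter ('|'). A minimal sketch of that with plain str.find and no regex follows. The helper name extract_between is made up for this illustration, and the sketch assumes the start marker only occurs where a key begins (it would also match inside a longer key such as 'ad=').

```python
def extract_between(text, start, end):
    """Return the text between `start` and the next `end`, or None if `start` is absent."""
    begin = text.find(start)
    if begin == -1:
        return None
    begin += len(start)
    stop = text.find(end, begin)
    if stop == -1:
        # Last field in the string: there is no trailing delimiter.
        stop = len(text)
    return text[begin:stop]


data = 'a=1|b=2|c=3|d=4|e=5'
print(extract_between(data, 'd=', '|'))  # 4
print(extract_between(data, 'e=', '|'))  # 5
print(extract_between(data, 'z=', '|'))  # None
```

For many lookups on the same string, building the dict once is probably the better choice; this helper is mainly useful for a single lookup in a very long string.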

Construct a dictionary!
>>> s = 'a=1|b=2|c=3|d=4|e=5'
>>> dic = dict(sub.split('=') for sub in s.split('|'))
>>> dic['d']
'4'
If you want to store the integer values, use a for loop:
>>> s = 'a=1|b=2|c=3|d=4|e=5'
>>> dic = {}
>>> for sub in s.split('|'):
...     name, val = sub.split('=')
...     dic[name] = int(val)
...
>>> dic['d']
4
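The dict approach raises ValueError if a field has no '=' or has more than one. A hedged sketch that tolerates both cases using str.partition is shown here; the function name parse_pairs and the sample input are invented for the example.

```python
def parse_pairs(text, field_sep='|', kv_sep='='):
    """Build a dict from 'key=value' fields, skipping fields without a separator."""
    result = {}
    for field in text.split(field_sep):
        # partition splits on the FIRST separator only, so values may contain '='.
        key, sep, value = field.partition(kv_sep)
        if not sep:
            continue  # no '=' in this field, so it is not a key=value pair
        result[key.strip()] = value.strip()
    return result


print(parse_pairs('a=1|b=2|note=x=y|broken|d=4'))
# {'a': '1', 'b': '2', 'note': 'x=y', 'd': '4'}
```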

You can try startswith, specifying which variable you want the value of:
string = "a=1|b=2|c=3|d=4|e=5"
array = string.split("|")
for word in array:
    if word.startswith("d="):
        print(word)

Related

Remove specific word in Python [duplicate]

I have a set of strings and all the strings have one of two specific substrings which I want to remove:
set1 = {'Apple.good', 'Orange.good', 'Pear.bad', 'Pear.good', 'Banana.bad', 'Potato.bad'}
I want the ".good" and ".bad" substrings removed from all the strings. I tried this:
for x in set1:
    x.replace('.good', '')
    x.replace('.bad', '')
but it doesn't seem to work, set1 stays exactly the same. I tried using for x in list(set1) instead but that doesn't change anything.

Strings are immutable. str.replace creates a new string. This is stated in the documentation:
str.replace(old, new[, count])
Return a copy of the string with all occurrences of substring old replaced by new. [...]
This means you have to rebuild the set or repopulate it (rebuilding is easier with a set comprehension):
new_set = {x.replace('.good', '').replace('.bad', '') for x in set1}
P.S. if you want to change the prefix or suffix of a string and you're using Python 3.9 or newer, use str.removeprefix() or str.removesuffix() instead:
new_set = {x.removesuffix('.good').removesuffix('.bad') for x in set1}
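The two variants above are not equivalent: replace removes the substring wherever it occurs, while removesuffix only touches the end of the string. A small sketch of the difference follows; the entry 'A.goodness.bad' is an invented example, and removesuffix needs Python 3.9 or newer.

```python
names = {'Apple.good', 'Pear.bad', 'A.goodness.bad'}

# replace() removes '.good' anywhere, including the middle of 'A.goodness.bad'.
with_replace = {x.replace('.good', '').replace('.bad', '') for x in names}

# removesuffix() (Python 3.9+) only strips a match at the very end.
with_suffix = {x.removesuffix('.good').removesuffix('.bad') for x in names}

print(sorted(with_replace))  # ['Aness', 'Apple', 'Pear']
print(sorted(with_suffix))   # ['A.goodness', 'Apple', 'Pear']
```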

>>> x = 'Pear.good'
>>> y = x.replace('.good','')
>>> y
'Pear'
>>> x
'Pear.good'
.replace doesn't change the string, it returns a copy of the string with the replacement. You can't change the string directly because strings are immutable.
You need to take the return values from x.replace and put them in a new set.

In Python 3.9+ you could remove the suffix using str.removesuffix('mysuffix'). From the docs:
If the string ends with the suffix string and that suffix is not empty, return string[:-len(suffix)]. Otherwise, return a copy of the original string
So you can either create a new empty set and add each element without the suffix to it:
set1 = {'Apple.good', 'Orange.good', 'Pear.bad', 'Pear.good', 'Banana.bad', 'Potato.bad'}
set2 = set()
for s in set1:
    set2.add(s.removesuffix(".good").removesuffix(".bad"))
Or create the new set using a set comprehension:
set2 = {s.removesuffix(".good").removesuffix(".bad") for s in set1}
print(set2)
Output:
{'Orange', 'Pear', 'Apple', 'Banana', 'Potato'}

All you need is a bit of black magic!
>>> a = ["cherry.bad","pear.good", "apple.good"]
>>> a = list(map(lambda x: x.replace('.good','').replace('.bad',''),a))
>>> a
['cherry', 'pear', 'apple']

When there are multiple substrings to remove, one simple and effective option is to use re.sub with a compiled pattern that involves joining all the substrings-to-remove using the regex OR (|) pipe.
import re
to_remove = ['.good', '.bad']
strings = ['Apple.good','Orange.good','Pear.bad']
p = re.compile('|'.join(map(re.escape, to_remove))) # escape to handle metachars
[p.sub('', s) for s in strings]
# ['Apple', 'Orange', 'Pear']
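If the substrings should only be removed when they appear at the end of a string, one possible variation is to wrap the alternation in a non-capturing group and anchor it with $. This is a sketch of that idea, not part of the answer above; the name suffix_pattern and the third sample string are made up.

```python
import re

to_remove = ['.good', '.bad']

# (?:...) groups the alternatives so that `$` applies to all of them.
suffix_pattern = re.compile('(?:' + '|'.join(map(re.escape, to_remove)) + ')$')

strings = ['Apple.good', 'Pear.bad', 'A.goodness.bad']
cleaned = [suffix_pattern.sub('', s) for s in strings]
print(cleaned)  # ['Apple', 'Pear', 'A.goodness']
```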

You could do this, using an anchored regex so that only a trailing '.good' or '.bad' is removed:
import re
set1 = {'Apple.good', 'Orange.good', 'Pear.bad', 'Pear.good', 'Banana.bad', 'Potato.bad'}
for x in set1:
    # re.sub returns a new string, so the result has to be assigned
    x = re.sub(r'\.good$', '', x)
    x = re.sub(r'\.bad$', '', x)
    print(x)

Another example, on a single string:
text = "Amin Is A Good Programmer"
new_text = text.replace('Good ', '')
print(new_text)
Output: Amin Is A Programmer

I ran a test (not with your example) and found that a set comprehension returns the data neither in order nor complete, because sets are unordered and drop duplicates:
>>> ind = ['p5','p1','p8','p4','p2','p8']
>>> newind = {x.replace('p','') for x in ind}
>>> newind
{'1', '2', '8', '5', '4'}
A list comprehension keeps both the order and the duplicates:
>>> ind = ['p5','p1','p8','p4','p2','p8']
>>> newind = [x.replace('p','') for x in ind]
>>> newind
['5', '1', '8', '4', '2', '8']
or
>>> newind = []
>>> ind = ['p5','p1','p8','p4','p2','p8']
>>> for x in ind:
...     newind.append(x.replace('p',''))
...
>>> newind
['5', '1', '8', '4', '2', '8']
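If you want the duplicates removed as in the set version but the original order kept as in the list version, one option is dict.fromkeys, since dicts preserve insertion order from Python 3.7 on. A short sketch, with unique_in_order as an invented name:

```python
ind = ['p5', 'p1', 'p8', 'p4', 'p2', 'p8']

# dict keys are unique and keep insertion order (Python 3.7+).
unique_in_order = list(dict.fromkeys(x.replace('p', '') for x in ind))
print(unique_in_order)  # ['5', '1', '8', '4', '2']
```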

If you have a list of strings and want to remove every line that contains a certain substring, you can do this:
import re
def RemoveInList(sub, LinSplitUnOr):
    # indices of the lines that match the pattern
    indices = [i for i, x in enumerate(LinSplitUnOr) if re.search(sub, x)]
    # keep only the lines whose index is not in that list
    A = [i for j, i in enumerate(LinSplitUnOr) if j not in indices]
    return A
where sub is a pattern that you do not want to appear in the list of lines LinSplitUnOr.
For example:
A = ['Apple.good', 'Orange.good', 'Pear.bad', 'Pear.good', 'Banana.bad', 'Potato.bad']
sub = 'good'
A = RemoveInList(sub, A)
Then A will be ['Pear.bad', 'Banana.bad', 'Potato.bad'].
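When the thing to filter on is a plain substring rather than a regex pattern, the same filtering can probably be done in one comprehension with the in operator. A minimal sketch; remove_matching is a hypothetical name, and unlike the function above it does no regex matching.

```python
def remove_matching(substring, lines):
    """Return the lines that do NOT contain `substring` (plain text test, no regex)."""
    return [line for line in lines if substring not in line]


A = ['Apple.good', 'Orange.good', 'Pear.bad', 'Pear.good', 'Banana.bad', 'Potato.bad']
print(remove_matching('good', A))  # ['Pear.bad', 'Banana.bad', 'Potato.bad']
```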

How to convert a list to a dict?

I am using subprocess to print the output of ls.
output = subprocess.getoutput("ssh -i key.pem ubuntu@10.127.6.83 ls -l --time-style=long-iso /opt/databases | awk -F' ' '{print $6 $8}'")
lines = output.splitlines()
print(lines)
format = '%Y-%m-%d'
for line in lines:
    if line != '':
        date = datetime.strptime(line, format)
When I print lines, I get a large list in the following format:
['', '2019-04-25friendship_graph_43458', '2019-07-18friendship_graph_45359', '2019-09-03friendship_graph_46553', '2019-10-02friendship_graph_46878']
I am trying to convert the above output to a dict with the dates in '%Y-%m-%d' format. So output would be something like:
{ '2019-04-25' : 'friendship_graph_43458',
'2019-07-18': 'friendship_graph_45359',
'2019-09-03': 'friendship_graph_46553' }
and so on, but not quite sure how to do so.

Technically, if you don't want to use re and all dates are formatted the same way, they will all be 10 characters long, so you can slice the strings to build the dict in a comprehension:
data = ['', '2019-04-25friendship_graph_43458', '2019-07-18friendship_graph_45359', '2019-09-03friendship_graph_46553', '2019-10-02friendship_graph_46878']
output = {s[:10]: s[10:] for s in data if len(s) > 10}
{'2019-04-25': 'friendship_graph_43458', '2019-07-18': 'friendship_graph_45359', '2019-09-03': 'friendship_graph_46553', '2019-10-02': 'friendship_graph_46878'}
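Slicing at 10 characters trusts that every long enough line really starts with a date. A hedged sketch that also checks the prefix with datetime.strptime, as the question attempted, is shown below; the function name lines_to_dict and the 'total 12' sample line are assumptions added for illustration.

```python
from datetime import datetime


def lines_to_dict(lines, date_format='%Y-%m-%d', date_len=10):
    """Map a leading date string to the rest of the line, skipping lines without a valid date."""
    result = {}
    for line in lines:
        prefix, rest = line[:date_len], line[date_len:]
        try:
            datetime.strptime(prefix, date_format)  # raises ValueError if not a date
        except ValueError:
            continue
        result[prefix] = rest
    return result


data = ['', '2019-04-25friendship_graph_43458', 'total 12', '2019-07-18friendship_graph_45359']
print(lines_to_dict(data))
# {'2019-04-25': 'friendship_graph_43458', '2019-07-18': 'friendship_graph_45359'}
```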

You could use a regular expression for each item in the list. For example:
(\d{4}-\d{2}-\d{2})(.*)
Then you can iterate through each item in the list and use the regular expression to split the string into its two parts.
>>> import re
>>> regex = re.compile(r"(\d{4}-\d{2}-\d{2})(.*)")
>>> items = ['', '2019-04-25friendship_graph_43458', '2019-07-18friendship_graph_45359', '2019-09-03friendship_graph_46553', '2019-10-02friendship_graph_46878']
>>> items_dict = {}
>>> for i in items:
...     match = regex.search(i)
...     if match is None:
...         continue
...     items_dict[match.group(1)] = match.group(2)
...
>>> items_dict
{'2019-04-25': 'friendship_graph_43458', '2019-07-18': 'friendship_graph_45359', '2019-09-03': 'friendship_graph_46553', '2019-10-02': 'friendship_graph_46878'}

For lines that start with the date, use slices to separate the key from the value.
>>> s = '2019-04-25friendship_graph_43458'
>>> d = {}
>>> d[s[:10]] = s[10:]
>>> d
{'2019-04-25': 'friendship_graph_43458'}
>>>

Use re.findall and dictionary comprehension:
import re
lst = ['', '2019-04-25friendship_graph_43458', '2019-07-18friendship_graph_45359', '2019-09-03friendship_graph_46553', '2019-10-02friendship_graph_46878']
dct = {k: v for s in lst for k, v in re.findall(r'(\d\d\d\d-\d\d-\d\d)(.*)', s) }
print(dct)
# {'2019-04-25': 'friendship_graph_43458', '2019-07-18': 'friendship_graph_45359', '2019-09-03': 'friendship_graph_46553', '2019-10-02': 'friendship_graph_46878'}

Python Convert a part of a String into Float

I am trying to make a list containing 2 strings:
List=["Hight = 7.2", "baselength = 8.32"]
But I am having a problem trying to extract the numbers from the strings:
For example:
If "Hight = 7.2" then the result should be: 7.2
or if the "Hight= 7.3232" then the result should be: 7.3232

Using re.findall:
>>> import re
>>> l = ["Hight = 7.2", "baselength = 8"]
>>> out = []
>>> for s in l:
...     out.append(float(re.findall(r'\d+(?:\.\d+)?', s)[0]))
...
>>> out
[7.2, 8.0]
Or, without regex, using split:
>>> out = []
>>> for s in l:
...     # strip whitespace first so that `n = 2` and `n=2` are handled the same way
...     num = s.replace(' ', '').split('=')[1]
...     out.append(float(num))
...
>>> out
[7.2, 8.0]

How about this
[(item.split('=')[0],float(item.split('=')[1]) ) for item in List]
Output :
[('Hight ', 7.2), ('baselength ', 8.32)]
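Note that the labels in that output keep their trailing space ('Hight '). A small sketch that strips the labels and returns a dict instead of a list of tuples is given here; parse_measurements is a hypothetical name, and the 'width=3' entry is added only to show that missing spaces are handled.

```python
def parse_measurements(items):
    """Turn 'label = number' strings into a {label: float} dict."""
    result = {}
    for item in items:
        label, sep, number = item.partition('=')
        if not sep:
            continue  # no '=' present, nothing to parse
        # float() ignores surrounding whitespace, so only the label needs strip().
        result[label.strip()] = float(number)
    return result


values = ["Hight = 7.2", "baselength = 8.32", "width=3"]
print(parse_measurements(values))
# {'Hight': 7.2, 'baselength': 8.32, 'width': 3.0}
```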

Having a label associated with a value is best managed with a dictionary. However, if you must keep each label=value pair as an entry in a list, perhaps because you are reading it into Python from elsewhere, you can use the re module to extract the numeric value from each string in the list:
import re
dims = ["height = 7.2", "length = 8.32"]
for dim in dims:
    # the dot is escaped so that it matches a literal decimal point only
    print(float(re.search(r'\d+\.\d+', dim).group()))

You could convert your list to a dictionary using a comprehension:
import re
List=["Height = 7.2", "baselength = 8.32"]
rx = re.compile(r'(?P<key>\w+)\s*=\s*(?P<value>\d+(?:\.\d+)?)')
Dict = {m.group('key'): float(m.group('value'))
        for item in List
        for m in [rx.search(item)]}
print(Dict)
# {'Height': 7.2, 'baselength': 8.32}
Afterwards, you can access your values with e.g. Dict["Height"] (here: 7.2).

It's very simple. This approach works for any value in the list:
List = ["Hight = 7.2", "baselength = 8.32"]
# example for one value; you can loop over the entire list
a = List[0].split("= ")[1]  # take the first element and split on "= "
print(a)
# 7.2

Get values in string - Python

I am new to Python so I have lots of doubts. For instance I have a string:
string = "xtpo, example1=x, example2, example3=thisValue"
Is it possible to get the values after the equals sign for example1 and example3, knowing only the keywords and not what comes after the =?

You can use regex:
>>> import re
>>> strs = "xtpo, example1=x, example2, example3=thisValue"
>>> key = 'example1'
>>> re.search(r'{}=(\w+)'.format(key), strs).group(1)
'x'
>>> key = 'example3'
>>> re.search(r'{}=(\w+)'.format(key), strs).group(1)
'thisValue'
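Two things may be worth guarding against with the pattern above: a key that contains regex metacharacters, and a key that is only the tail of a longer word. A hedged sketch using re.escape and a negative lookbehind follows; the helper name value_for is invented, and it returns None instead of raising when the key is missing.

```python
import re


def value_for(key, text):
    """Return the \\w+ value after 'key=', or None if the key has no value in `text`."""
    # (?<!\w) rejects matches where the key is preceded by another word character.
    match = re.search(r'(?<!\w){}=(\w+)'.format(re.escape(key)), text)
    return match.group(1) if match else None


strs = "xtpo, example1=x, example2, example3=thisValue"
print(value_for('example1', strs))  # x
print(value_for('example3', strs))  # thisValue
print(value_for('example2', strs))  # None
print(value_for('ample3', strs))    # None
```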

Spacing things out for clarity
>>> Sstring = "xtpo, example1=x, example2, example3=thisValue"
>>> items = Sstring.split(',')  # Get the comma-separated items
>>> for i in items:
...     Pair = i.split('=')  # Try splitting on =
...     if len(Pair) > 1:  # Did split
...         print(Pair)  # or whatever you would like to do
...
[' example1', 'x']
[' example3', 'thisValue']
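The same split can also collect everything into a dict so that values can be looked up by keyword afterwards. A minimal sketch, assuming items without an = should map to None (that choice, and the name parsed, are mine rather than the answer's):

```python
text = "xtpo, example1=x, example2, example3=thisValue"

parsed = {}
for item in text.split(','):
    # strip() removes the space left behind after each comma
    key, sep, value = item.strip().partition('=')
    parsed[key] = value if sep else None

print(parsed)
# {'xtpo': None, 'example1': 'x', 'example2': None, 'example3': 'thisValue'}
print(parsed['example3'])  # thisValue
```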
