Is there any way to remove the brackets () when the content inside them is numeric (i.e. would pass .isnumeric())?
I do know a little bit of regex, but I'm unable to find a way to do it with regex.
Example:
input = '((1)+(1))+2+(1+2)+((2))'
output = somefunction(input)
Here the output should look like this:
(1+1)+2+(1+2)+2
import re
x = '((1)+(1))+2+(1+2)+((2))'
re.sub(r'(\()([\d*\.]+)(\))', r"\2", x)
# or, more compactly (suggested by deceze):
re.sub(r'\(([\d*\.]+)\)', r"\1", x)
But this will give you
(1+1)+2+(1+2)+(2)
You could call re.subn in a loop and repeat it until the number of replacements is 0.
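A minimal sketch of that loop. It assumes "numeric" means digits and dots only, so the character class is simplified to [\d.] (the * from the patterns above is dropped); the function name strip_numeric_parens is just illustrative. re.subn returns a (new_string, count) tuple, and each pass unwraps one more level of nesting.

```python
import re

def strip_numeric_parens(s):
    # "(" + one or more digits/dots + ")" -> keep only what is inside
    pattern = r'\(([\d.]+)\)'
    while True:
        s, n = re.subn(pattern, r'\1', s)
        if n == 0:  # nothing left to unwrap, so we are done
            return s

print(strip_numeric_parens('((1)+(1))+2+(1+2)+((2))'))
# (1+1)+2+(1+2)+2
```

Groups that contain an operator, such as (1+2), never match the pattern, so they are left alone.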
I'm trying to extract financial data from a wall of text. Basically, I have a function that splits the text three times. I know there is a more efficient way of doing it, but I can't figure it out. The curly braces really throw a wrench into my plan, because I'm trying to format a string.
I want to pass my function a string such as:
"totalCashflowsFromInvestingActivities"
and extract the following raw number:
"-2478000"
This is my current function, which works but is not efficient at all:
def splitting(value, text):
    x = text.split('"{}":'.format(value))[1]  # everything after "value":
    y = x.split(',"fmt":')[0]  # cut at the fmt field -> {"raw":-2478000
    z = y.split(':')[1]  # keep what follows "raw": -> -2478000
    return z
Any help would be greatly appreciated!
Sample text:
"cashflowStatementHistory":{"cashflowStatements":[{"changeToLiabilities":{"raw":66049000,"fmt":"66.05M","longFmt":"66,049,000"},"totalCashflowsFromInvestingActivities":{"raw":-2478000,"fmt":"-2.48M","longFmt":"-2,478,000"},"netBorrowings":{"raw":-31652000,"fmt":"-31.65M","longFmt":"-31,652,000"}
Here is a solution using regex. It assumes the format is always the same, with the raw value immediately following the title and separated from it by ":{.
import re
def get_value(value_name, text):
    """Find all occurrences of the passed `value_name`
    and return their `raw` values."""
    # re.escape guards against regex metacharacters in value_name
    pattern = re.escape(value_name) + r'":{"raw":(-?\d+)'
    return re.findall(pattern, text)
text = '"cashflowStatementHistory":{"cashflowStatements":[{"changeToLiabilities":{"raw":66049000,"fmt":"66.05M","longFmt":"66,049,000"},"totalCashflowsFromInvestingActivities":{"raw":-2478000,"fmt":"-2.48M","longFmt":"-2,478,000"},"netBorrowings":{"raw":-31652000,"fmt":"-31.65M","longFmt":"-31,652,000"}'
val = get_value('totalCashflowsFromInvestingActivities', text)
print(val)
['-2478000']
You can convert the results to integers with map by replacing the return line with:
return list(map(int, re.findall(pattern, text)))
If Buran is right and your source is JSON, you might find this helpful:
import json
s = '{"cashflowStatementHistory":{"cashflowStatements":[{"changeToLiabilities":{"raw":66049000,"fmt":"66.05M","longFmt":"66,049,000"},"totalCashflowsFromInvestingActivities":{"raw":-2478000,"fmt":"-2.48M","longFmt":"-2,478,000"},"netBorrowings":{"raw":-31652000,"fmt":"-31.65M","longFmt":"-31,652,000"}}]}}'
j = json.loads(s)
for i in j["cashflowStatementHistory"]["cashflowStatements"]:
    if "totalCashflowsFromInvestingActivities" in i:
        print(i["totalCashflowsFromInvestingActivities"]["raw"])
In this way you can find anything in the wall of text.
Take a look at this too: https://www.w3schools.com/python/python_json.asp
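Building on that, here is a sketch of a small recursive helper that walks the parsed JSON and yields every value stored under a given key, however deeply it is nested, so you don't need to know the path in advance. The helper name find_key is hypothetical, and the JSON string is a trimmed version of the sample above.

```python
import json

def find_key(obj, key):
    """Yield every value stored under `key` at any depth of parsed JSON."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            yield from find_key(v, key)  # keep searching inside the value
    elif isinstance(obj, list):
        for item in obj:
            yield from find_key(item, key)

s = ('{"cashflowStatementHistory":{"cashflowStatements":[{'
     '"totalCashflowsFromInvestingActivities":{"raw":-2478000,"fmt":"-2.48M"},'
     '"netBorrowings":{"raw":-31652000,"fmt":"-31.65M"}}]}}')
data = json.loads(s)
raws = [v["raw"] for v in find_key(data, "totalCashflowsFromInvestingActivities")]
print(raws)  # [-2478000]
```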
I have several strings in a list:
['~/tmp/GROUP-G07T01/items/GROUP-G07T01-000021_item2.png', '~/tmp/GROUP-G07T01/items/GROUP-G07T01-000021_item3.png', '~/tmp/GROUP-G07T01/items/GROUP-G07T01-000021_item4.png']
I need to remove the 'item2', 'item3', 'item4' parts so I can later replace them with another variable that changes each time I pass it in: variable = {changing item}
I have tried things like string.replace("item{i}.format(i) for i in range(20), "") or re.sub, but I can't seem to get it to work. Any suggestions?
I would expect the output [~/tmp/GROUP-G07T01/items/GROUP-G07T01-000021_{changing item1}.png, ~/tmp/GROUP-G07T01/items/GROUP-G07T01-000021_{changing item2}.png, ~/tmp/GROUP-G07T01/items/GROUP-G07T01-000021_{changing item3}.png]
You can use re.sub to replace the 'item<number>' part like so:
re.sub(r'item\d+', var, x)
Code:
import re
lst = ['~/tmp/GROUP-G07T01/items/GROUP-G07T01-000021_item2.png', 'some/thing/0034_item5.png']
var = 'foo'
result = [re.sub(r'item\d+', var, x) for x in lst]
# ['~/tmp/GROUP-G07T01/items/GROUP-G07T01-000021_foo.png', 'some/thing/0034_foo.png']
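The question asks for a different replacement in each path. A minimal sketch, assuming the replacement values come as a list with the same length as the paths (new_items and its values are hypothetical), pairs each path with its own value using zip. Because item\d+ needs a digit right after "item", the items/ directory in the path is left alone.

```python
import re

paths = [
    '~/tmp/GROUP-G07T01/items/GROUP-G07T01-000021_item2.png',
    '~/tmp/GROUP-G07T01/items/GROUP-G07T01-000021_item3.png',
    '~/tmp/GROUP-G07T01/items/GROUP-G07T01-000021_item4.png',
]
new_items = ['apple', 'banana', 'cherry']  # hypothetical replacement values

# zip pairs each path with its own replacement value
result = [re.sub(r'item\d+', new, path) for path, new in zip(paths, new_items)]
for r in result:
    print(r)
```

If the replacement values may contain backslashes, pass a function such as lambda m: new instead of the plain string, since re.sub interprets backslash escapes in a string replacement.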
Try using the re (regex) module. You need to use a valid regex; you can match one or more digits 0 to 9 with [0-9]+.
import re
fixed_str = re.sub(r"item[0-9]+", "", input_str)
Here is the reference to how to format regexs:
https://docs.python.org/3/library/re.html
You can also use online sites such as regex101.com to experiment with regex formatting in real time to make sure it works ahead of time.
I have a string from which I want to take the values within the parentheses, and then get the individual values separated by commas.
Example: x(142,1,23ERWA31)
I would like to get:
142
1
23ERWA31
Is it possible to get everything with one regex?
I have found a method to do so, but it is ugly.
This is how I did it in python:
import re
string = "x(142,1,23ERWA31)"
firstResult = re.search(r"\((.*?)\)", string)
secondResult = re.search(r"(?<=\()(.*?)(?=\))", firstResult.group(0))
finalResult = [x.strip() for x in secondResult.group(0).split(',')]
for i in finalResult:
    print(i)
142
1
23ERWA31
This works for your example string:
import re
string = "x(142,1,23ERWA31)"
l = re.findall(r'([^(,)]+)(?!.*\()', string)
print(l)
Result: a plain list
['142', '1', '23ERWA31']
The expression matches a run of characters that are not (, a comma, or ), and, to prevent the leading x from being picked up, that run may not be followed by a ( anywhere later in the string. This also makes it work if your preamble x is longer than a single character.
findall rather than search makes sure all items are found, and as a bonus it returns a plain list of the results.
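To see the lookahead at work, here is the same pattern on a longer preamble and, as a caveat worth knowing, on a string with two parenthesized groups (both inputs are made up for illustration). Because every value before the last ( is followed by a ( somewhere later, only the last group survives.

```python
import re

pattern = r'([^(,)]+)(?!.*\()'

# A multi-character preamble is skipped because a "(" follows it.
print(re.findall(pattern, "abc(142,1,23ERWA31)"))
# ['142', '1', '23ERWA31']

# Caveat: with two groups, only the values of the last one are returned.
print(re.findall(pattern, "x(1,2)y(3,4)"))
# ['3', '4']
```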
You can make this a lot simpler. You run your first regex but then don't use its captured group: you want .group(1) (the text inside the brackets), not .group(0) (the whole match). Once you have that, you can just split it on commas:
import re
string = "x(142,1,23ERWA31)"
firstResult = re.search(r"\((.*?)\)", string)
for e in firstResult.group(1).split(','):
    print(e)
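If a string may contain more than one parenthesized group, one possible generalization is to findall the insides first and then split each one on commas. The second group y(7,8) in the input is made up for the example, and [^()]* assumes the groups are not nested.

```python
import re

s = "x(142,1,23ERWA31) y(7,8)"
# Capture the text inside each pair of parentheses, then split it on commas.
groups = [inner.split(',') for inner in re.findall(r'\(([^()]*)\)', s)]
print(groups)  # [['142', '1', '23ERWA31'], ['7', '8']]
```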
It's a little wonky looking, and it assumes there will always be exactly 3 values in the parentheses, but try this regex:
\((.*?),(.*?),(.*?)\)
To extract all the group matches into a single object, your code would look like this:
import re
string = "x(142,1,23ERWA31)"
firstResult = re.search(r"\((.*?),(.*?),(.*?)\)", string).groups()
You can then index firstResult (a tuple) like a list:
>>> print(firstResult[2])
23ERWA31
Write a Python function, remove_duplicate(), which accepts a string, removes all duplicate characters from it, and returns the result.
Sample
Input:
1122334455ababzzz##123#*#*
Output:
12345abz##*
I tried this regular expression:
import re
re.subn(r'([(0-9)(a-z)])\1+', r'\1', Sample)
It gives me '12345ababz##123#*#*', which is not the expected output.
You can use built-ins like set(), but a set does not preserve the order of the characters. If you want to keep the order and only remove the repetitions, you can write a simple for loop:
test = "1122334455ababzzz##123#*#*"
_out = ""
for x in test:
    if x not in _out:  # keep only the first occurrence of each character
        _out = _out + x
print(_out)
12345abz#*
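A shorter alternative, not from the answer above, is dict.fromkeys: dict keys are unique and, since Python 3.7, keep insertion order, so joining them keeps the first occurrence of each character in its original position. Like the loop, it keeps every character exactly once, so # appears a single time in the result.

```python
def remove_duplicate(s):
    # dict.fromkeys drops repeated characters but keeps first-seen order
    return "".join(dict.fromkeys(s))

print(remove_duplicate("1122334455ababzzz##123#*#*"))  # 12345abz#*
```

It also avoids rebuilding a string and scanning it with `in` on every iteration, which the loop does.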
I have something like:
GCF_002904975:2.6672e-05):2.6672e-05.
and I would like to add the suffix '_S' right after every GCF_<number> entry, before the next colon.
In other words I would like my text becoming like:
GCF_002904975_S:2.6672e-05):2.6672e-05.
This pattern is repeated all along my text.
This can be easily done with re.sub function. A working example would look like this:
import re
inp_string='(((GCF_001297375:2.6671e-05,GCF_002904975:2.6672e-05)0.924:0.060046136,(GCF_000144955:0.036474926,((GCF_001681075:0.017937143,...'
if __name__ == "__main__":
    outp_string = re.sub(r'GCF_(?P<gfc_number>\d+)\:', r'GCF_\g<gfc_number>_S:', inp_string)
    print(outp_string)
This code gives the following result, which is hopefully what you need:
(((GCF_001297375_S:2.6671e-05,GCF_002904975_S:2.6672e-05)0.924:0.060046136,(GCF_000144955_S:0.036474926,((GCF_001681075_S:0.017937143,...
For more info take a look at the docs:
https://docs.python.org/3/library/re.html
You can use regular expressions with a function substitution. The solution below depends on the numbers always being 9 digits, but could be modified to work with other cases.
import re
test_str = '(((GCF_001297375:2.6671e-05,GCF_002904975:2.6672e-05)0.924:0.060046136,GCF_000144955:0.036474926,((GCF_001681075:0.017937143,...'
new_str = re.sub(r"GCF_\d{9}", lambda x: x.group(0) + "_S", test_str)
print(new_str)
#(((GCF_001297375_S:2.6671e-05,GCF_002904975_S:2.6672e-05)0.924:0.060046136,GCF_000144955_S:0.036474926,((GCF_001681075_S:0.017937143,...
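If the number of digits varies, one option (a variation, not part of either answer above) is to match GCF_ followed by any number of digits and require a colon right after it with a lookahead. The lookahead (?=:) checks for the colon without consuming it, so the colon stays in place. The input below is a truncated copy of the sample.

```python
import re

test_str = ('(((GCF_001297375:2.6671e-05,GCF_002904975:2.6672e-05)'
            '0.924:0.060046136,GCF_000144955:0.036474926')
# \d+ accepts any number of digits; (?=:) requires a colon right after them
new_str = re.sub(r'(GCF_\d+)(?=:)', r'\1_S', test_str)
print(new_str)
```

Colons that do not follow a GCF entry, such as the one in )0.924:0.060046136, are left untouched.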
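Why not just do a plain str.replace? This only works if every colon in your text directly follows a GCF entry; in your full sample, the colon in )0.924:0.060046136 would also get the _S suffix. Shortening your example string to make it easier to read: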
"(((GCF_001297375:2.6671e-05,GCF_002904975:2.6672e-05)...".replace(":","_S:")