How to extract a Double from a string in Python?

At the moment I have this but it only works for integers (whole numbers) not doubles:
S = "Weight is 3.5 KG"
weight = [int(i) for i in S.split() if i.isdigit()]
print(weight)
result: []

You can use a regular expression to extract the floating-point number:
import re
S = "Weight is 3.5 KG"
pattern = re.compile(r'\-?\d+\.\d+')
weights = list(map(float, re.findall(pattern, S)))
print(weights)
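re.findall() returns a list of the numbers found in the text, as strings. Note that this pattern only matches numbers that contain a decimal point.
The map function converts each match to a floating-point number. Since in Python 3 it returns a lazy iterator rather than a list, you need to wrap it in list().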
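If the text may also contain whole numbers, the fractional part of the pattern can be made optional. The following is only a minimal sketch of that idea, not part of the answer above; the helper name extract_numbers is made up for illustration, and a hyphen directly before a digit is assumed to be a minus sign.

```python
import re

# Optional minus sign, digits, and an optional fractional part.
NUMBER_RE = re.compile(r'-?\d+(?:\.\d+)?')

def extract_numbers(text):
    """Return every integer or decimal number found in text as a float."""
    return [float(m) for m in NUMBER_RE.findall(text)]

print(extract_numbers("Weight is 3.5 KG"))        # [3.5]
print(extract_numbers("From -2 to 10.25 units"))  # [-2.0, 10.25]
```

The non-capturing group (?:...) matters here: with a capturing group, re.findall() would return only the group contents instead of the whole number.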

The following code will do the job for the example you gave:
if __name__ == "__main__":
    S = "Weight is 3.5 KG"
    # search for the dot (.)
    t = S.find('.')
    # check if the dot (.) exists in the string
    # make sure the dot (.) is not the last character
    # of the string
    if t > 0 and t+1 != len(S):
        # check the character before and after
        # the dot (.)
        t_before = S[t-1]
        t_after = S[t+1]
        # check if the character before and after the
        # dot (.) is a digit
        if t_before.isdigit() and t_after.isdigit():
            # split the string
            S_split = S.split()
            for x in S_split:
                if '.' in x:
                    print(float(x))
Output:
3.5

You can use re:
import re
S = "Weight is 3.5 KG"
print(re.findall(r'\d+\.\d+', S))
# ['3.5']
Or use try-except:
new = []
for i in S.split():
    try:
        new.append(float(i))
    except ValueError:
        pass
print(new)
# [3.5]

Related

How can I get the specific character in string except the last one in Python?

I have several strings like the ones below. How can I remove every dot except the decimal dot (the last one)?
Please note: I need the digit after the decimal. I don't want to lose the decimal dot and the digit after that.
For example:
"8.2"
"88.2"
"888.2"
"8.888.2"
"8.888.888.2"
The output will be like this:
"8.2"
"88.2"
"888.2"
"8888.2"
"8888888.2"
Use str.rpartition. It will work correctly even when there isn't a . in the input, as evident below.
def fix_number(n):
    a, sep, b = n.rpartition(".")
    return a.replace(".", "") + sep + b
for case in ["8.2", "88.2", "888.2", "8.888.2", "8.888.888.2", "8"]:
    print(case, fix_number(case))
8.2 8.2
88.2 88.2
888.2 888.2
8.888.2 8888.2
8.888.888.2 8888888.2
8 8
One way would be to split on the dots, then join together, treating the last one specially:
s = '8.888.888.2'
*whole, decimal = s.split('.')
res = ''.join(whole) + '.' + decimal # gives: 8888888.2
You can use a similar method if you want to replace the thousands and/or decimal separators with another character:
s = '8.888.888.2'
*whole, decimal = s.split('.')
res = "'".join(whole) + ',' + decimal # gives: 8'888'888,2
Another idea is to call str.replace() with the optional third argument count, which will replace only the first count occurrences of the character.
If we set count to be equal to the number of "." minus one we get the desired result:
words = ["8.2", "88.2", "888.2", "8.888.2", "8.888.888.2"]
new_words = []
for word in words:
    new_word = word.replace('.', '', word.count('.')-1)
    new_words.append(new_word)
print(new_words)
Output
['8.2', '88.2', '888.2', '8888.2', '8888888.2']
I have broken down the code into simpler lines for better understanding and readability.
Try this:
words = ["8", "8.2", "88.2", "888.2", "8.888.2", "8.888.888.2"]
changed_word = []
for word in words:
    split_word = word.split(".")
    if len(split_word) > 2:
        before_decimal = "".join(split_word[0:len(split_word)-1])
        after_decimal = split_word[-1]
        final_word = before_decimal + "." + after_decimal
    else:
        final_word = word
    changed_word.append(final_word)
print(changed_word)
The output is:
['8', '8.2', '88.2', '888.2', '8888.2', '8888888.2']
The next step: try to optimise this code in fewer lines.
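You can use re here:
\.(?=.*\.)
This matches every dot that is followed by another dot later on the same line, so only the last dot is left alone. Just replace the matches with an empty string.
See demo: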
https://regex101.com/r/T9iX3B/1
import re
regex = r"\.(?=.*\.)"
test_str = ("\"8.2\"\n\n"
            "\"88.2\"\n\n"
            "\"888.2\"\n\n"
            "\"8.888.2\"\n\n"
            "\"8.888.888.2\" \n"
            "8\n\n"
            "the out put will be like this:\n\n"
            " ")
subst = ""
# You can limit the number of replacements with the count argument (0 means replace all)
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)
if result:
    print(result)
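Applied to one string at a time, the same pattern can be wrapped in a small helper. This is only a sketch; the name strip_thousands_dots is invented for illustration and does not come from the answer.

```python
import re

# A dot that still has another dot somewhere after it.
NON_FINAL_DOT = re.compile(r'\.(?=.*\.)')

def strip_thousands_dots(s):
    """Remove every dot except the last one."""
    return NON_FINAL_DOT.sub('', s)

for case in ["8.2", "888.2", "8.888.2", "8.888.888.2", "8"]:
    print(case, strip_thousands_dots(case))
```

A string without any dot, such as "8", is returned unchanged because the pattern never matches.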
x = "8.888.888.22"
y = x.split(".")
z = "".join(y[:-1]) + "." + y[-1]
print(z)
# '8888888.22'
a = ["8", "8.2", "88.2", "888.2", "8.888.2", "8.888.888.2"]
for x in a:
    if '.' not in x:
        print(x)
        continue
    ridx = x.rindex('.')
    print(x[:ridx].replace('.', '') + x[ridx:])
You need to find the last occurrence of the dot(.). And then before that index, replace all the dots with an empty character, and add the remaining part of the string:
s = "8.888.888.2"
last = -1
for index, item in enumerate(s):
    if item == ".":
        last = max(index, last)
if last != -1:
    s = s[:last].replace(".", "") + s[last:]
print(s)
There are two ways I can imagine to solve your problem.
The first one is to slice the string into another variable:
example = 8888888.2
example = str(example) # in case your input is of another type
example_1 = example[-1:]
The output in this case would be "2".
For the second way, you can simply split the string into a list and then take only the part you want:
example = 8888888.2
example = str(example)
example_2 = example.split('.') # the argument is the separator to split on, e.g. a space, a comma, or a dot
example_output = example_2[1]
In this case the output is again "2", and in both cases the original variable is kept in case you still need the original input.
A simpler function:
def remove_dot(word):
    dot_cnt = word.count('.')
    if dot_cnt <= 1:
        return word
    else:
        word = word.replace(".", "", dot_cnt - 1)
        return word
print(remove_dot("8.888.888.2"))
Output: 8888888.2
Yet another solution, without special functions
(inspired by the answer from @AKX).
def fix_number(n):
    return int( n.replace(".","") ) / ( 10 ** ('.' in n) )
for case in ["8.2", "88.2", "888.2", "8.888.2", "8.888.888.2", "8", "8.888.888"]:
    print(case, fix_number(case))
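The idea is that '.' in n evaluates to False (which counts as 0) if there are no dots, in which case we divide by 10^0 = 1. It evaluates to True (which counts as 1) if there are one or more dots, in which case we divide by 10. This assumes exactly one digit after the decimal dot, as in the examples.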
8.2 8.2
88.2 88.2
888.2 888.2
8.888.2 8888.2
8.888.888.2 8888888.2
8 8.0
8.888.888 888888.8 <- not correct
As you can see, it returns a float even when there are no decimals. Not that the OP asked, but I think that's a nice feature :). However, it fails on the last test-case (which is not among the examples from the OP).
For this case we can replace the function with
def fix_number(n):
    return int( n.replace(".","") ) / ( 10 ** ('.' == n[-2]) )
but this fails on "8" (there is no n[-2], so it raises an IndexError).
Fixing that leads us to
def fix_number(n):
    return int( n.replace(".","") ) / ( 10 ** (n.rfind('.') == max(0,len(n)-2)) )
which outputs
8.2 8.2
88.2 88.2
888.2 888.2
8.888.2 8888.2
8.888.888.2 8888888.2
8 8.0
8.888.888 8888888.0
But at this point it gets a bit ridiculous :). Also, the answer by @AKX is about 3.5 times faster.

Extract substring from a python string

I want to extract the string before the 9 digit number below:
tmp = place1_128017000_gw_cl_mask.tif
The output should be place1
I could do this:
tmp.split('_')[0] but I also want the solution to work for:
tmp = place1_place2_128017000_gw_cl_mask.tif where the result would be:
place1_place2
You can assume that the number will always be 9 digits long.
Using regular expressions and the lookahead feature of regex, this is a simple solution:
import re
tmp = "place1_place2_128017000_gw_cl_mask.tif"
m = re.search(r'.+(?=_\d{9}_)', tmp)
print(m.group())
Result:
place1_place2
Note that the \d{9} bit matches exactly 9 digits. The part of the regex inside (?= ... ) is a lookahead: it is not included in the match itself, but the match only succeeds if the lookahead pattern follows it.
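To make the lookahead behaviour concrete, here is a small sketch that runs the same pattern against both file names from the question and also handles a name with no 9-digit number. The helper name prefix_before_id and the third file name are made up for illustration.

```python
import re

# Everything up to (but not including) an underscore that is followed
# by exactly nine digits and another underscore.
PREFIX_RE = re.compile(r'.+(?=_\d{9}_)')

def prefix_before_id(name):
    """Return the part before the 9-digit number, or None if there is none."""
    m = PREFIX_RE.search(name)
    return m.group() if m else None

print(prefix_before_id("place1_128017000_gw_cl_mask.tif"))         # place1
print(prefix_before_id("place1_place2_128017000_gw_cl_mask.tif"))  # place1_place2
print(prefix_before_id("no_number_here.tif"))                      # None
```

Checking for None before calling group() avoids an AttributeError when the pattern does not match.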
Assuming we can phrase your problem as wanting the substring up to, but not including, the underscore that is followed by a run of digits, we can try:
import re
tmp = "place1_place2_128017000_gw_cl_mask.tif"
m = re.search(r'^([^_]+(?:_[^_]+)*)_\d+_', tmp)
print(m.group(1)) # place1_place2
Use a regular expression:
import re
places = (
    "place1_128017000_gw_cl_mask.tif",
    "place1_place2_128017000_gw_cl_mask.tif",
)
pattern = re.compile(r"(place\d+(?:_place\d+)*)_\d{9}")
for p in places:
    matched = pattern.match(p)
    if matched:
        print(matched.group(1))
prints:
place1
place1_place2
The regex works like this (adjust as needed, e.g., for fewer than 9 digits or a variable number of digits):
( starts a capture
place\d+ matches "place" followed by one or more digits
(?: starts a group, but does not capture it (no need to capture)
_place\d+ matches further "_place" parts
) closes the group
* means the previous group is repeated zero or more times
) closes the capture
_\d{9} matches an underscore followed by 9 digits
The result is in the first (and only) capture group.
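The same breakdown can be written into the pattern itself with the re.VERBOSE flag, which ignores whitespace in the pattern and allows # comments. This is a sketch of an equivalent pattern; the names PLACES_RE and leading_places are invented for illustration.

```python
import re

PLACES_RE = re.compile(r"""
    (                   # start the capture
        place\d+        # 'place' followed by one or more digits
        (?:_place\d+)*  # zero or more further '_placeN' parts, not captured
    )                   # close the capture
    _\d{9}              # an underscore followed by nine digits
""", re.VERBOSE)

def leading_places(name):
    """Return the leading 'placeN' parts, or None if the name does not match."""
    m = PLACES_RE.match(name)
    return m.group(1) if m else None

print(leading_places("place1_128017000_gw_cl_mask.tif"))         # place1
print(leading_places("place1_place2_128017000_gw_cl_mask.tif"))  # place1_place2
```

To accept a variable number of digits, \d{9} could be changed to \d+ in the last line of the pattern.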
Here's a possible solution without regex (unoptimized!):
def extract(s):
    result = ''
    for x in s.split('_'):
        # stop at the first piece that is a 9-digit number
        if x.isdigit() and len(x) == 9:
            return result[:-1]
        result += x + '_'
tmp = 'place1_128017000_gw_cl_mask.tif'
tmp2 = 'place1_place2_128017000_gw_cl_mask.tif'
print(extract(tmp)) # place1
print(extract(tmp2)) # place1_place2

Python: Search a string for a number, decrement that number and replace in the string

If I have a string such as:
string = 'Output1[10].mystruct.MyArray[4].mybool'
what I want to do is search the string for the number in the array, decrement by 1 and then replace the found number with my decremented number.
What I have tried:
import string
import re
string = 'Output1[10].mystruct.MyArray[4].mybool'
pattern = r'\[(\d+)\]'
num = re.findall(pattern, string)
So, I can get a list of the numbers and convert them to integers, but I don't know how to use re.sub to search the string and replace them. Note that there might be multiple arrays. If anyone is expert enough to do that, help would be much appreciated.
Cheers
There is one thing I don't understand: if there is more than one array, do you want to decrease the number in all arrays, or just in one of them?
If you want to decrease in all arrays, you can do this:
import re
string = 'Output1[10].mystruct.MyArray[4].mybool'
pattern = r'\[(\d+)\]'
num = re.findall(pattern, string)
num = [int(elem) for elem in num]
num.sort()
for elem in num:
    aux = elem - 1
    string = string.replace(str(elem), str(aux))
If you want to decrease just the first array, you can do this:
import re
string = 'Output1[10].mystruct.MyArray[4].mybool'
pattern = r'\[(\d+)\]'
num = re.findall(pattern, string)
new_num = int(num[0]) - 1
string = string.replace(num[0], str(new_num), 1)
Thanks to @João Castilho for his answer. Based on it, I changed the code slightly to work exactly how I want:
import re
string = 'Output1[2].mystruct.MyArray[2].mybool'
pattern = r'\[(\d+)\]'
num = re.findall(pattern, string)
num = [int(elem) for elem in set(num)]
num.sort()
for elem in num:
    aux = elem - 1
    string = string.replace('[%d]'% elem, '[%d]'% aux)
print(string)
This now replaces any number between brackets with the decremented value, wherever such a number occurs in the string.
Cheers
ice.
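Since the question asked specifically about re.sub: its replacement argument may be a function that receives each match object and returns the replacement text, which allows every bracketed index to be decremented in a single pass. A minimal sketch (the helper name decrement_indices is made up for illustration):

```python
import re

# An integer enclosed in square brackets; the digits are captured.
INDEX_RE = re.compile(r'\[(\d+)\]')

def decrement_indices(text):
    """Decrement every integer that appears inside square brackets."""
    return INDEX_RE.sub(lambda m: '[%d]' % (int(m.group(1)) - 1), text)

print(decrement_indices('Output1[10].mystruct.MyArray[4].mybool'))
# Output1[9].mystruct.MyArray[3].mybool
print(decrement_indices('Output1[2].mystruct.MyArray[2].mybool'))
# Output1[1].mystruct.MyArray[1].mybool
```

Because each match is replaced exactly once, there is no need to sort or deduplicate the numbers, and digits outside brackets (such as the 1 in Output1) are never touched. Note that an index of 0 would become -1.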

How to generate a list of numbers from a list of numeric and alphanumeric string ranges?

Given a list
n = ['4276-4279', 'I69-I71', 'V104-V112', '11528']
From the list above, I want to match the strings with hyphens and expand each one into the full range of numeric or alphanumeric values it describes. So far I could only match the value using re:
p = re.compile('([\d]|[A-Z\d]{1,})[\-]')
Expected output:
['4276', '4277', '4278', '4279', 'I69', 'I70', 'I71', 'V104', 'V105', 'V106', 'V107', 'V108', 'V109', 'V110', 'V111', 'V112', '11528']
You can process each element in your list, seeing if it matches the pattern
^([A-Z]*)(\d+)-\1(\d+)$
i.e. an optional prefix of capital letters, some digits, a hyphen (-), the same prefix repeated if it was present, and finally some more digits.
If it does, you can generate a range from the 2nd and 3rd groups, and prepend the first group to each value generated from that range:
import re
lst = ['4276-4279', 'I69-I71', 'V104-V112', '11528']
new = []
for l in lst:
    m = re.match(r'^([A-Z]*)(\d+)-\1(\d+)$', l)
    if m:
        new += [m.group(1) + str(i) for i in range(int(m.group(2)), int(m.group(3))+1)]
    else:
        new += [l]
print(new)
Output:
['4276', '4277', '4278', '4279', 'I69', 'I70', 'I71', 'V104', 'V105', 'V106', 'V107', 'V108', 'V109', 'V110', 'V111', 'V112', '11528']
import re
n = ['4276-4279', 'I69-I71', 'V104-V112', '11528']
big_list = []
for item in n:
    if '-' in item:
        part1, part2 = item.split("-")
        if part1.isnumeric() and part2.isnumeric():
            # purely numeric range; +1 makes the end inclusive
            big_list.extend(str(x) for x in range(int(part1), int(part2) + 1))
            continue
        if part1.isalnum() and part2.isalnum():
            # split each part into its letter prefix and its number
            list1 = re.findall(r"[^\W\d_]+|\d+", part1)
            list2 = re.findall(r"[^\W\d_]+|\d+", part2)
            temp_list = []
            for i in range(int(list1[1]), int(list2[1]) + 1):
                temp_list.append(list1[0] + str(i))
            big_list.extend(temp_list)
    else:
        # single value: keep it as one string
        big_list.append(item)
print(big_list)
This code works for your input.
Please try it and tell me if it works for you.

clear and comprehensible way to calculate the string [12:3]

I'm new to Python.
I have this string "[12:3]" and I want to calculate the difference between these two numbers.
Ex: 12 - 3 = 9
Of course I can do something (not very clear) like this:
num1 = []
num2 = []
s = '[12:3]'
dot = 0;
#find the ':' sign
for i in range(len(s)):
    if s[i] == ':' :
        dot = i
#left side
for i in range(dot):
    num1.append(s[i])
#right side
for i in range(len(s) - dot-1):
    num2.append(s[i+dot+1])
return str(int("".join(num1))-int("".join(num2))+1)
But I'm sure there is a clearer and more comprehensible way.
Thanks!
You could use regex to pick the numbers out of your string:
import re
s = '[12:3]'
numbers = [int(x) for x in re.findall(r'\d+',s)]
print(numbers[0] - numbers[1])
Or, without re:
numbers = [int(x) for x in s.strip('[]').split(':')]
print(numbers[0] - numbers[1])
prints
9
You should use regular expressions.
>>> import re
>>> match = re.match(r'\[(\d+):(\d+)\]', '[12:3]')
>>> match.groups()
('12', '3')
>>> a = int(match.groups()[0])
>>> b = int(match.groups()[1])
>>> a - b
9
The regular expression there says "match starting at the beginning of the string, find [, then one or more digits \d+ (and store them), then a :, then one or more digits \d+ (and store them), and finally ]". We then extract the stored digits using .groups() and do arithmetic on them.
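The steps above can also be packaged as a function that rejects input in any other shape. This is a sketch only; the name bracket_difference is hypothetical, and re.fullmatch is used here instead of re.match so that trailing characters are not silently accepted.

```python
import re

# '[', digits, ':', digits, ']' with both numbers captured.
BRACKET_PAIR_RE = re.compile(r'\[(\d+):(\d+)\]')

def bracket_difference(s):
    """Return a - b for a string of the form '[a:b]'."""
    match = BRACKET_PAIR_RE.fullmatch(s)
    if match is None:
        raise ValueError("expected '[a:b]', got %r" % s)
    a, b = map(int, match.groups())
    return a - b

print(bracket_difference('[12:3]'))  # 9
```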
