python robustly convert any string with units to float - python

I would like to convert any string (a number with units) to a float. I have a list of values like
myList = ["$800", "0.1mm", "54.6%", "100,000,000", "89.6", "1,017.16%"]
And I would like to convert it to:
myList = [800.0, 0.1, 54.6, 100000000.0, 89.6, 1017.16]
And I would like to do this without using multiple replaces stacked together .replace("%","").replace(",","").replace(...)...
I feel like there is a really easy pythonic solution...

You could use str.translate, but the best approach here is probably a regex replacement, since you can negate what you want to keep, i.e. digits, dots, and minus signs.
import re
myList = ["$800", "0.1mm", "54.6%", "100,000,000", "89.6", "1,017.16%"]
newlist = [float(re.sub(r"[^0-9.\-]", "", x)) for x in myList]
print(newlist)
result:
[800.0, 0.1, 54.6, 100000000.0, 89.6, 1017.16]
That converts every number to float. It could be refined to produce an int when there is no dot, for instance by chaining the comprehension with a generator expression that picks out the candidates for integer conversion:
newlist = [float(y) if "." in y else int(y) for y in (re.sub(r"[^0-9.\-]", "", x) for x in myList)]
(This doesn't take scientific notation into account; a check for "E" in y would have to be added if needed, and the E must not be filtered out by the regex.)
Result is now:
[800, 0.1, 54.6, 100000000, 89.6, 1017.16]
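If scientific notation does matter, one option is to search for the number instead of deleting everything else. The following is a minimal sketch, not part of the answer above: the helper name to_float and the pattern NUMBER are assumptions made for illustration, and only the first numeric token in each string is considered.

```python
import re

# Illustrative pattern (an assumption, not taken from the answer above):
# optional sign, digits with optional thousands separators and decimals,
# then an optional exponent such as e-3.
NUMBER = re.compile(r"[-+]?(?:\d+(?:,\d{3})*(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?")

def to_float(text):
    """Return the first number found in text as a float."""
    match = NUMBER.search(text)
    if match is None:
        raise ValueError("no number found in {!r}".format(text))
    # Thousands separators are dropped before the conversion.
    return float(match.group().replace(",", ""))

myList = ["$800", "0.1mm", "54.6%", "100,000,000", "89.6", "1,017.16%"]
print([to_float(x) for x in myList])
print(to_float("1.2e3kg"))  # the exponent is kept, the unit is ignored
```

Because the exponent is only accepted when digits follow the e, a unit such as "em" is not mistaken for scientific notation.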

You could build a new list by iterating over each item and keeping only the characters that are numeric, using the str.isdigit() method.
myList = ["$800", "0.1mm", "54.6%", "100,000,000", "89.6", "1,017.16%"]
new_list = []
for i in myList:
    f = ''.join(x for x in i if x.isdigit() or x in ['.', '-'])
    new_list.append(float(f))
or, as a single-line expression:
new_list = [float(''.join([x for x in y if x.isdigit() or x in ['.', '-']])) for y in myList]
EDIT: I missed decimals and negatives; fixed. I'm not sure about supporting notation such as 1.2e384.
EDIT2: In general, this whole approach is unsafe practice and I wouldn't recommend it.
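One reason the approach is fragile is that the filtered text is not always a valid float. A small sketch of a defensive wrapper follows; safe_float is a hypothetical name, and returning None on failure is just one possible policy.

```python
def safe_float(text):
    """Keep digits, dots and minus signs, then try to convert; None on failure."""
    filtered = "".join(ch for ch in text if ch.isdigit() or ch in ".-")
    try:
        return float(filtered)
    except ValueError:
        return None

print(safe_float("$800"))      # 800.0
print(safe_float("10-20mm"))   # None: "10-20" is not a valid float
print(safe_float("v1.2.3"))    # None: "1.2.3" is not a valid float
print(safe_float("n/a"))       # None: nothing numeric is left
```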

You can use the re module in Python:
import re
values = ["$800", "0.1mm", "54.6%", "100,000,000", "89.6", "1,017.16%"]  # avoid naming a variable "list"
subbed_list = [float(re.sub(r'[^0-9.\-]', '', i)) for i in values]
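For comparison, the str.translate route mentioned in the first answer could look roughly like this. It is a sketch under the assumption that the unwanted characters are ASCII letters plus the few symbols in the sample data; DROP and TABLE are illustrative names.

```python
import string

# Characters to delete: all ASCII letters plus the symbols seen in the sample data.
# This set is an assumption; extend it for other units or currency signs.
DROP = string.ascii_letters + "$%, "
TABLE = str.maketrans("", "", DROP)

values = ["$800", "0.1mm", "54.6%", "100,000,000", "89.6", "1,017.16%"]
floats = [float(v.translate(TABLE)) for v in values]
print(floats)
```

Unlike the negated regex, this lists what to remove rather than what to keep, so any character missing from DROP will make float() raise ValueError.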

Related

how to filter a value from list in python?

I have a list of values and need to filter out the ones that don't follow a naming convention.
For example, given the list: list = ['a1-23','b1-24','c1-25','c1-x-25']
I need to filter out all values that start with 'c1-', except those that start with 'c1-x-'.
Expected output: ['a1-23','b1-24','c1-x-25']
What I tried:
list = ['a1-23','b1-24','c1-25','c1-x-25']
[x for x in list if not x.startswith('c1-')]
['a1-23', 'b1-24']
You have the right idea, but you're missing the handling of values that start with c1-x-:
[x for x in list if not x.startswith('c1-') or x.startswith('c1-x-')]
import re
list1 = ['a1-23','b1-24','c1-25','c1-x-25',"c1-22"]
r = re.compile(r"\bc1-\b\d{2}$") # this regex matches anything with `c1-{2 digits}` exactly
[x for x in list1 if x not in list(filter(r.match,list1))]
# output
['a1-23', 'b1-24', 'c1-x-25']
So what my pattern does is match EXACTLY a word that starts with c1- and ends with two digits only.
Therefore, list(filter(r.match,list1)) will give us all the c1-## and then we do a list comprehension to filter out from list1 all the x's that aren't in the new provided list containing the matches.
[x for x in [1,2,3] if x not in [1,2]]
#output
[3]
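Another option, not used in the answers above, is a negative lookahead, which expresses "starts with c1- but not with c1-x-" in a single pattern. A minimal sketch, with unwanted and kept as illustrative names:

```python
import re

values = ['a1-23', 'b1-24', 'c1-25', 'c1-x-25', 'c1-22']

# Matches strings that start with "c1-" unless the next characters are "x-".
unwanted = re.compile(r"c1-(?!x-)")

# re.match only matches at the start of the string, so no ^ anchor is needed.
kept = [v for v in values if not unwanted.match(v)]
print(kept)  # ['a1-23', 'b1-24', 'c1-x-25']
```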

pattern match get list and dict from string

I have the string below, and I want to extract the lists, dicts, and variables from it.
How can I split this string into that format?
s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}'
import re
m1 = re.findall (r'(?=.*,)(.*?=\[.+?\],?)',s)
for i in m1:
    print('m1:', i)
Only the first result is correct.
Does anyone know how to do this?
m1: list_c=[1,2],
m1: a=3,b=1.3,c=abch,list_a=[1,2],
Split on '=' instead; then you can work with each variable name and its value.
You still need to handle the type casting for the values (regex, split, or try/except with casting may help).
Also, as others commented, using a dict may be easier to handle.
s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}'
al = s.split('=')
var_l = [al[0]]
value_l = []
for a in al[1:-1]:
    var_l.append(a.split(',')[-1])
    value_l.append(','.join(a.split(',')[:-1]))
value_l.append(al[-1])
output = dict(zip(var_l, value_l))
print(output)
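The type casting mentioned above could be sketched with ast.literal_eval, falling back to the raw string when the value is not a Python literal. The helper name cast is hypothetical, and raw below is simply the dictionary the split-based code produces for the sample string.

```python
import ast

def cast(value):
    """Turn '3' into 3, '1.3' into 1.3, '[1,2]' into [1, 2]; leave other text alone."""
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        # Bare words such as abch, or {a:2,b:3} with unquoted keys, are not literals.
        return value

raw = {'list_c': '[1,2]', 'a': '3', 'b': '1.3', 'c': 'abch',
       'list_a': '[1,2]', 'dict_a': '{a:2,b:3}'}
typed = {name: cast(text) for name, text in raw.items()}
print(typed)
```

Note that dict_a stays a string here, because its keys are unquoted names rather than literals; parsing it would need extra handling.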
You may have better luck if you more or less explicitly describe the right-hand side expressions: numbers, lists, dictionaries, and identifiers:
re.findall(r"([^=,]+)="                 # LHS (commas excluded) and assignment operator
           +r"([+-]?\d+(?:\.\d+)?|"     # Numbers
           +r"[+-]?\d+\.|"              # More numbers
           +r"\[[^]]+\]|"               # Lists
           +r"{[^}]+}|"                 # Dictionaries
           +r"[a-zA-Z_][a-zA-Z_\d]*)",  # Idents
           s)
# [('list_c', '[1,2]'), ('a', '3'), ('b', '1.3'), ('c', 'abch'),
#  ('list_a', '[1,2]'), ('dict_a', '{a:2,b:3}')]
The solution I ended up with is below:
import re
from pprint import pprint
s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1],Save,Record,dict_a={a:2,b:3}'
m1 = re.findall(r"([^=]+)=" # LHS and assignment operator
+r"([+-]?\d+(?:\.\d+)?|" # Numbers
+r"[+-]?\d+\.|" # More numbers
+r"\[[^]]+\]|" # Lists
+r"{[^}]+}|" # Dictionaries
+r"[a-zA-Z_][a-zA-Z_\d]*)", # Idents
s)
temp_d = {}
for i,j in m1:
    temp = i.strip(',').split(',')
    if len(temp)>1:
        for k in temp[:-1]:
            temp_d[k]=''
        temp_d[temp[-1]] = j
    else:
        temp_d[temp[0]] = j
pprint(temp_d)
The output is:
{'Record': '',
'Save': '',
'a': '3',
'b': '1.3',
'c': 'abch',
'dict_a': '{a:2,b:3}',
'list_a': '[1]',
'list_c': '[1,2]'}
Instead of picking out the types, you can start by capturing the identifiers. Here's a regex that captures all the identifiers in the string (for lowercase only, but see note):
regex = re.compile(r'([a-z]|_)+=')
# note: to also allow uppercase letters and digits, use r'([a-z]|[A-Z]|[0-9]|_)+='
cases = [x.group() for x in re.finditer(regex, s)]
This gives a list of all the identifiers in the string:
['list_c=', 'a=', 'b=', 'c=', 'list_a=', 'dict_a=']
We can now define a function that uses the above list to chop up s one identifier at a time:
def chop(mystr, mylist):
    temp = mystr.partition(mylist[0])[2]
    cut = temp.find(mylist[1]) #strip leading bits
    return mystr.partition(mylist[0])[2][cut:], mylist[1:]
mystr = s[:]
temp = [mystr]
mylist = cases[:]
while len(mylist) > 1:
    mystr, mylist = chop(mystr, mylist)
    temp.append(mystr)
This (convoluted) slicing operation gives this list of strings:
['list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'list_a=[1,2],dict_a={a:2,b:3}',
'dict_a={a:2,b:3}']
Now cut off the ends using each successive entry:
result = []
for x in range(len(temp) - 1):
    cut = temp[x].find(temp[x+1]) - 1 #-1 to remove commas
    result.append(temp[x][:cut])
result.append(temp.pop()) #get the last item
Now we have the full list:
['list_c=[1,2]', 'a=3', 'b=1.3', 'c=abch', 'list_a=[1,2]', 'dict_a={a:2,b:3}']
Each element is easily parsable into key:value pairs (and is also executable via exec).
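As a rough illustration of that last step, each "name=value" string can be split once on '=' and fed to dict(). This is only a sketch; the values remain strings, and pairs is an illustrative name.

```python
result = ['list_c=[1,2]', 'a=3', 'b=1.3', 'c=abch', 'list_a=[1,2]', 'dict_a={a:2,b:3}']

# maxsplit=1 keeps any later '=' characters inside the value.
pairs = dict(item.split('=', 1) for item in result)
print(pairs)
```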

How do I split a Very huge string And string without spaces?(python3)

Firstly I have this long string
s = '1MichaelAngelo'
How can I get the output as
new_s = '1 Michael Angelo'
and as a list
new_list = [1,'Michael', 'Angelo']
Note: I have about a thousand of these, parsed from an HTML page.
Secondly, I have this huge string (consisting of names and numbers up to 1000), e.g.
1\nfirstName\nlastName\n.......999\nfirstName\nlastName
where \n denotes a newline.
How can I extract data from it to output something like:
[1, 'Michael', 'Emily'], [2, 'Mathew', 'Jessica'], [3, 'Jacob', 'Ashley']
and so on.
Two questions, two answers. Next time please ask one question at a time.
import re
s = '1MichaelAngelo'
[int(x) for x in re.findall(r'\d+',s)] + re.findall('[A-Z][^A-Z]*',s)
>>> [1, 'Michael', 'Angelo']
or, alternatively,
import re
s = '1MichaelAngelo'
[int(x) if re.match(r'\d+',x) else x for x in re.findall(r'\d+|[A-Z][^A-Z]*',s)]
where re.findall splits the longer string on the required boundaries;
and
import re
s = '1\nfirstName\nlastName\n999\nfirstName2\nlastName2'
[[int(x) if re.match(r'\d+',x) else x for x in s.split('\n')[i:i+3]] for i in range(0,len(s.split('\n')),3)]
>>> [[1, 'firstName', 'lastName'], [999, 'firstName2', 'lastName2']]
where the list comprehension first splits the entire string into groups of three (using the trick shown in https://stackoverflow.com/a/15890829/2564301), then scans each newly formed list for integers and converts only those.
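The same grouping can be written as a small function, which avoids splitting the string twice and may be easier to read. This is a sketch: parse_records and its size parameter are hypothetical names, and str.isdigit is used here in place of re.match so that a field like '1abc' is left as text.

```python
def parse_records(text, size=3):
    """Split newline-separated text into records of `size` fields each."""
    lines = text.split('\n')
    records = []
    for start in range(0, len(lines), size):
        chunk = lines[start:start + size]
        # Convert purely numeric fields to int, keep everything else as a string.
        records.append([int(x) if x.isdigit() else x for x in chunk])
    return records

s = '1\nMichael\nEmily\n2\nMathew\nJessica'
print(parse_records(s))  # [[1, 'Michael', 'Emily'], [2, 'Mathew', 'Jessica']]
```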

Extraction of 2 or more digit number into a list from string like 12+13

I'm trying to extract numbers from a string like "12+13".
When I extract only the digits from it into a list, it becomes [1,2,1,3].
What I actually want is for the list to hold the numbers as [12,13], and 12 and 13 should be integers.
I have tried my best to solve this; my code is below,
but it still has a disadvantage:
I am forced to put a space at the end of the string for it to work correctly.
My code:
def extract(string1):
    l=len(string1)
    pos=0
    num=[]
    continuity=0
    for i in range(l):
        if string1[i].isdigit()==True:
            continuity+=1
        else:
            num=num+ [int(string1[pos:continuity])]
            continuity+=1
            pos=continuity
    return num
string="1+134-15 " # added a space at the end of the string
num1=[]
num1=extract(string)
print(num1)
This will work perfectly with your situation (and with all operators, not just +):
>>> import re
>>> equation = "12+13"
>>> tmp = re.findall('\\b\\d+\\b', equation)
>>> [int(i) for i in tmp]
[12, 13]
But if you format your string to be with spaces between operators (which I think is the correct way to go, and still supports all operators, with a space) then you can do this without even using regex like this:
>>> equation = "12 + 13"
>>> [int(s) for s in equation.split() if s.isdigit()]
[12, 13]
Side note: If your only operator is the + one, you can avoid regex by doing:
>>> equation = "12+13"
>>> [int(s) for s in equation.split("+") if s.isdigit()]
[12, 13]
The other answer is great (as of now), but I want to provide you with a detailed explanation. What you are trying to do is split the string on the "+" symbol. In python, this can be done with str.split("+").
When that translates into your code, it turns out like this.
ourStr = "12+13"
ourStr = ourStr.split("+")
But don't you want to convert those to integers? In Python, we can use a list comprehension with int() to achieve this.
The comprehension loops over each element and converts the string to an integer:
ints = [int(s) for s in ourStr]
Combining this together, we get
ourStr = "12+13"
ourStr = ourStr.split("+")
ourStr = [int(s) for s in ourStr]
But let's say there might be other unknown symbols in the list. As @Idos did, it is probably a good idea to check that each item is a number before putting it in the list.
We can further refine the code to:
ourStr = "12+13"
ourStr = ourStr.split("+")
ourStr = [int(s) for s in ourStr if s.isdigit()]
This can be solved with just list comprehension or built-in methods, no need for regex:
s = '12+13+14+15+16'
l = [int(x) for x in s.split('+')]
l = map(int, s.split('+'))
l = list(map(int, s.split('+'))) #If Python3
[12, 13, 14, 15, 16]
If you are not sure whether there are any non-digit strings, then just add condition to the list comprehension:
l = [int(x) for x in s.split('+') if x.isdigit()]
l = map(lambda s:int(s) if s.isdigit() else None, s.split('+'))
l = list(map(lambda s:int(s) if s.isdigit() else None, s.split('+'))) #If python3
Now consider a case where you could have something like:
s = '12 + 13 + 14+15+16'
l = [int(x.strip()) for x in s.split('+') if x.strip().isdigit()]#had to strip x for any whitespace
l = map(lambda s:int(s.strip()) if s.strip().isdigit() else None, s.split('+'))
l = list(map(lambda s:int(s.strip()) if s.strip().isdigit() else None, s.split('+'))) #Python3
[12, 13, 14, 15, 16]
Or:
l = [int(x) for x in map(str.strip,s.split('+')) if x.isdigit()]
l = map(lambda y:int(y) if y.isdigit() else None, map(str.strip,s.split('+')))
l = list(map(lambda y:int(y) if y.isdigit() else None, map(str.strip,s.split('+')))) #Python3
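Note that the map/lambda variants above are not equivalent to the list comprehensions: the comprehension drops non-numeric pieces, while the lambda leaves a None in their place. A short sketch of the difference, using a made-up input with one non-numeric piece:

```python
s = '12+x+14'  # illustrative input with a non-numeric piece

# The comprehension filters: non-digit pieces are skipped.
filtered = [int(x) for x in s.split('+') if x.isdigit()]

# The map/lambda version only transforms: non-digit pieces become None.
mapped = list(map(lambda x: int(x) if x.isdigit() else None, s.split('+')))

# Dropping the placeholders afterwards gives the same result as the comprehension.
cleaned = [x for x in mapped if x is not None]

print(filtered)  # [12, 14]
print(mapped)    # [12, None, 14]
print(cleaned)   # [12, 14]
```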
You can just use Regular Expressions, and this becomes very easy:
>>> s = "12+13"
>>> import re
>>> re.findall(r'\d+',s)
['12', '13']
Basically, \d matches any digit and + means 1 or more. So re.findall(r'\d+',s) looks for any part of the string that is 1 or more digits in a row and returns each instance it finds.
To turn them into integers, as many people have said, you can just use a list comprehension after you get the result:
result = ['12', '13']
int_list = [int(x) for x in result]
python regex documentation
I have made a function which extracts numbers from a string.
def extract(string1):
    string1=string1+" "
    # added a space at the end of the string so that the last number is also extracted
    l=len(string1)
    pos=0
    num=[]
    continuity=0
    for i in range(l):
        if string1[i].isdigit()==True:
            continuity+=1
        else:
            if pos!=continuity:
                # this condition prevents an empty slice from being converted
                # when several non-digit characters appear in a row
                num=num+ [int(string1[pos:continuity])]
            continuity+=1
            pos=continuity
    return num
string="ab73t9+-*/182"
num1=[]
num1=extract(string)
print(num1)
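If a regex-free version is preferred, the manual index bookkeeping can be replaced with itertools.groupby, which groups consecutive characters by whether they are digits. This is a sketch of an alternative technique, not the function above rewritten line by line; it reuses the name extract only for comparison.

```python
from itertools import groupby

def extract(text):
    """Return every run of consecutive digits in text as an int."""
    return [int(''.join(group))
            for is_digit, group in groupby(text, key=str.isdigit)
            if is_digit]

print(extract("ab73t9+-*/182"))  # [73, 9, 182]
print(extract("1+134-15"))       # [1, 134, 15]
```

No trailing space is needed, because groupby closes the last run when the string ends.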

python split a vector format string

I have a string input in the following format: (x,y) where x and y are doubles.
For example : (1,2.556) can be a vector.
I want the easiest way to split it into the x,y values, 1 and 2.556 in this case.
What would you suggest?
You could use code like this:
import ast
text = '(1,2.556)'
vector = ast.literal_eval(text)
print(vector)
The literal_eval function does not have the security risks associated with eval and works just as well in this particular case.
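To get x and y as floats and reject input that is not a pair, the call can be wrapped in a small function. This is a sketch: parse_vector is a hypothetical name, and raising ValueError for the wrong shape is one possible choice.

```python
import ast

def parse_vector(text):
    """Parse a string such as '(1,2.556)' into a pair of floats."""
    value = ast.literal_eval(text)
    # literal_eval accepts any Python literal, so check the shape explicitly.
    if not (isinstance(value, tuple) and len(value) == 2):
        raise ValueError("expected a pair like (x,y), got {!r}".format(text))
    x, y = value
    return float(x), float(y)

x, y = parse_vector('(1,2.556)')
print(x, y)  # 1.0 2.556
```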
The eval answers are good. But if you are sure of the format of your strings -- always start and end with parentheses, no spaces in the string, etc., then you can do this fairly efficiently:
x, y = (float(num) for num in s[1:-1].split(','))
eval works:
>>> s = "(1.2,3.40)"
>>> eval(s)
(1.2, 3.4)
>>> x,y = eval(s)
>>> x
1.2
>>> y
3.4
eval has potential security risks, but if you trust that you are dealing with strings of that form then this is adequate.
Remove the leading ( and the trailing ), then split on the comma.
re.sub(r'^\(|\)$', '',string).split(',')
OR
>>> s = "(1,2.556)"
>>> x = [i for i in re.split(r'[,()]', s) if i]
>>> x[0]
'1'
>>> x[1]
'2.556'
If you're sure they'll be passed in exactly this way, try this:
>>> s = '(1,2.556)'
>>> [float(i) for i in s[1:-1].split(',')]
[1.0, 2.556]
