python split a vector format string

I have a string input in the following format: (x,y) where x and y are doubles.
For example : (1,2.556) can be a vector.
I want the easiest way to split it into the x,y values, 1 and 2.556 in this case.
What would you suggest?

You could use code like this:
import ast
text = '(1,2.556)'
vector = ast.literal_eval(text)
print(vector)
The literal_eval function does not carry the security risks associated with eval, and it works just as well in this particular case.
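A slightly more defensive sketch wraps ast.literal_eval and checks that the result really is a pair of numbers, because literal_eval will just as happily return a list, a string, or a 3-tuple for other inputs. The helper name parse_vector is hypothetical, not part of any library.

```python
import ast
from numbers import Real


def parse_vector(text):
    """Parse a string like '(1,2.556)' into a tuple of two floats.

    Hypothetical helper: raises ValueError if the literal is not a
    2-tuple of real numbers (malformed syntax may raise SyntaxError).
    """
    value = ast.literal_eval(text.strip())
    if (not isinstance(value, tuple) or len(value) != 2
            or not all(isinstance(v, Real) and not isinstance(v, bool) for v in value)):
        raise ValueError(f"not an (x, y) vector: {text!r}")
    x, y = value
    return float(x), float(y)


print(parse_vector('(1,2.556)'))  # (1.0, 2.556)
```

Booleans are excluded explicitly because bool is a subclass of int, so (True, False) would otherwise pass the check.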

The eval answers are good. But if you are sure of the format of your strings -- always start and end with parentheses, no spaces in the string, etc., then you can do this fairly efficiently:
x, y = (float(num) for num in s[1:-1].split(','))
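If the format is almost fixed but may contain stray spaces, the same slicing idea can be made a little more forgiving. This is a sketch under the assumption that there is exactly one comma and no nested parentheses; split_vector is a hypothetical name.

```python
def split_vector(s):
    """Split '(x,y)' into two floats, tolerating surrounding whitespace.

    Sketch only: assumes exactly one comma and no nested parentheses.
    """
    s = s.strip()
    if not (s.startswith('(') and s.endswith(')')):
        raise ValueError(f"expected '(x,y)', got {s!r}")
    # Tuple unpacking raises ValueError unless there is exactly one comma
    x_text, y_text = s[1:-1].split(',')
    # float() ignores leading/trailing whitespace, so ' 2.556 ' is fine
    return float(x_text), float(y_text)


x, y = split_vector(' ( 1 , 2.556 ) ')
print(x, y)  # 1.0 2.556
```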

eval works:
>>> s = "(1.2,3.40)"
>>> eval(s)
(1.2, 3.4)
>>> x,y = eval(s)
>>> x
1.2
>>> y
3.4
eval has potential security risks, but if you trust that you are dealing with strings of that form then this is adequate.
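To make the security difference concrete, here is a small sketch: ast.literal_eval only accepts Python literals, so an expression such as a function call is refused with an exception instead of being executed. The payload string is just an illustration, and is_rejected is a hypothetical helper.

```python
import ast


def is_rejected(text):
    """Return True if ast.literal_eval refuses to evaluate text."""
    try:
        ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return True
    return False


# A plain tuple literal is accepted and evaluated
print(ast.literal_eval("(1.2,3.40)"))  # (1.2, 3.4)

# A function call is not a literal, so it is refused rather than executed
print(is_rejected("__import__('os').getcwd()"))  # True
```

eval on the same payload would actually import os and run the call, which is why it is only adequate for trusted input.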

Remove the leading ( and trailing ) and then split on the comma:
import re
re.sub(r'^\(|\)$', '', string).split(',')
OR
>>> s = "(1,2.556)"
>>> x = [i for i in re.split(r'[,()]', s) if i]
>>> x[0]
'1'
>>> x[1]
'2.556'
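A regex can also validate the whole string instead of just splitting it. This sketch uses re.fullmatch so that extra components or stray text make the match fail; the NUMBER pattern (optional sign, optional fraction, optional exponent) and the match_vector name are assumptions, not a standard.

```python
import re

# Assumed number shape: optional sign, digits and/or a fraction, optional
# exponent such as e-3. Whitespace is allowed around each component.
NUMBER = r'[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?'
VECTOR_RE = re.compile(rf'\(\s*({NUMBER})\s*,\s*({NUMBER})\s*\)')


def match_vector(s):
    """Return (x, y) as floats, or None if s is not exactly one '(x,y)' vector."""
    m = VECTOR_RE.fullmatch(s.strip())
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2))


print(match_vector('(1,2.556)'))      # (1.0, 2.556)
print(match_vector('(1, 2.556, 3)'))  # None
```

Unlike re.split, this returns floats directly and rejects inputs such as '(1,abc)' instead of producing a string that fails later.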

If you're sure they'll be passed in exactly this way, try this:
>>> s = '(1,2.556)'
>>> [float(i) for i in s[1:-1].split(',')]
[1.0, 2.556]

Related

python robustly convert any string with units to float

I would like to convert any string (a number with units) to a float. I have a list of values like
myList = ["$800", "0.1mm", "54.6%", "100,000,000", "89.6", "1,017.16%"]
And I would like to convert it to:
myList = [800.0, 0.1, 54.6, 100000000.0, 89.6, 1017.16]
And I would like to do this without using multiple replaces stacked together .replace("%","").replace(",","").replace(...)...
I feel like there is a really easy pythonic solution...

You could have used str.translate, but the best way here is probably a regex replacement, since a negated character class lets you describe what you want to keep, i.e. digits, dots, and minus signs, and delete everything else.
import re
myList = ["$800", "0.1mm", "54.6%", "100,000,000", "89.6", "1,017.16%"]
newlist = [float(re.sub(r"[^0-9.\-]", "", x)) for x in myList]
print(newlist)
result:
[800.0, 0.1, 54.6, 100000000.0, 89.6, 1017.16]
That converts every number to float. It could be refined to convert to int when there is no dot, for instance by chaining the comprehension with a generator expression that produces the cleaned candidates for integer conversion, like this:
newlist = [float(y) if "." in y else int(y) for y in (re.sub(r"[^0-9.\-]", "", x) for x in myList)]
(This doesn't take scientific notation into account: if needed, "e"/"E" would have to be kept by the regex and also checked for in y.)
Result is now:
[800, 0.1, 54.6, 100000000, 89.6, 1017.16]
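One way to also cover scientific notation is to search for the number rather than delete everything around it, so letters in the units are never kept by accident. This is a sketch assuming one number per string and that a "-" directly before a digit is a sign; the pattern and the to_number name are hypothetical.

```python
import re

# Assumed number shape: optional sign, digits with optional thousands commas,
# optional fraction, optional exponent such as e-3.
NUMBER_RE = re.compile(r'[-+]?\d[\d,]*(?:\.\d+)?(?:[eE][-+]?\d+)?')


def to_number(text):
    """Return the first number found in text as int or float (sketch)."""
    m = NUMBER_RE.search(text)
    if m is None:
        raise ValueError(f"no number in {text!r}")
    token = m.group().replace(',', '')  # drop thousands separators
    if any(c in token for c in '.eE'):
        return float(token)
    return int(token)


myList = ["$800", "0.1mm", "54.6%", "100,000,000", "89.6", "1,017.16%", "1.5e3kg"]
print([to_number(s) for s in myList])
# [800, 0.1, 54.6, 100000000, 89.6, 1017.16, 1500.0]
```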

You could rebuild the list by iterating through each item and keeping only the characters that are digits (checked with the str.isdigit() method), dots, or minus signs.
myList = ["$800", "0.1mm", "54.6%", "100,000,000", "89.6", "1,017.16%"]
new_list = []
for i in myList:
    f = ''.join(x for x in i if x.isdigit() or x in ['.', '-'])
    new_list.append(float(f))
or, for a single line expression:
new_list = [float(''.join([x for x in y if x.isdigit() or x in ['.', '-'])) for y in myList]
EDIT: I missed including decimals and negatives; fixed. I'm not sure about supporting notation such as 1.2e384.
EDIT2: In general, this whole situation is really unsafe practice and I wouldn't recommend it.
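The first answer mentions str.translate. str.translate normally needs the characters to delete listed up front, but it looks characters up in the table by code point and treats a value of None as "delete", so a dict subclass whose __missing__ returns None turns it into a whitelist filter. KeepOnly is a hypothetical helper, shown only as a sketch of that idea.

```python
class KeepOnly(dict):
    """Translation table that keeps the given characters and deletes all others.

    str.translate looks each character's code point up in the table; a value
    of None means "delete", so __missing__ returning None drops everything
    that was not explicitly listed.
    """

    def __init__(self, keep):
        super().__init__((ord(c), c) for c in keep)

    def __missing__(self, key):
        return None


NUMERIC = KeepOnly('0123456789.-')

myList = ["$800", "0.1mm", "54.6%", "100,000,000", "89.6", "1,017.16%"]
print([float(s.translate(NUMERIC)) for s in myList])
# [800.0, 0.1, 54.6, 100000000.0, 89.6, 1017.16]
```

Like the isdigit() filter, it keeps every dot and minus sign, so malformed input such as "1.2.3" still fails at float().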

You can use the re (regular expression) module in Python:
import re
my_list = ["$800", "0.1mm", "54.6%", "100,000,000", "89.6", "1,017.16%"]
subbed_list = [float(re.sub(r'[^0-9.\-]', '', i)) for i in my_list]

Extraction of 2 or more digit number into a list from string like 12+13

I'm trying to extract numbers from a string like "12+13".
When I extract only the numbers into a list, it becomes [1,2,1,3],
but I actually want the list to hold the numbers as [12,13], and 12 and 13 should also be integers.
I have tried my level best to solve this; the following is the code,
but it still has a disadvantage:
I am forced to put a space at the end of the string for it to work correctly.
My Code
def extract(string1):
    l = len(string1)
    pos = 0
    num = []
    continuity = 0
    for i in range(l):
        if string1[i].isdigit():
            continuity += 1
        else:
            num = num + [int(string1[pos:continuity])]
            continuity += 1
            pos = continuity
    return num
string = "1+134-15 "  # added a space at the end of the string
num1 = extract(string)
print(num1)

This will work perfectly with your situation (and with all operators, not just +):
>>> import re
>>> equation = "12+13"
>>> tmp = re.findall('\\b\\d+\\b', equation)
>>> [int(i) for i in tmp]
[12, 13]
But if you format your string with spaces around the operators (which I think is the correct way to go, and which still supports all operators), then you can do this without regex at all:
>>> equation = "12 + 13"
>>> [int(s) for s in equation.split() if s.isdigit()]
[12, 13]
Side note: if + is your only operator, you can avoid regex by doing:
>>> equation = "12+13"
>>> [int(s) for s in equation.split("+") if s.isdigit()]
[12, 13]
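Another regex option works for any mix of operators and does not rely on word boundaries: split on every run of non-digit characters with re.split(r'\D+', ...) and drop the empty strings that appear at the ends. This is a sketch; extract_ints is a hypothetical name, and signs are discarded, so "-15" comes back as 15.

```python
import re


def extract_ints(expression):
    """Split on every run of non-digit characters and convert the pieces.

    Empty strings appear when the expression starts or ends with a
    non-digit, so they are filtered out.
    """
    return [int(part) for part in re.split(r'\D+', expression) if part]


print(extract_ints("12+13"))       # [12, 13]
print(extract_ints("1+134-15"))    # [1, 134, 15]
print(extract_ints("(7*8) / 2 "))  # [7, 8, 2]
```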

The other answer is great (as of now), but I want to provide you with a detailed explanation. What you are trying to do is split the string on the "+" symbol. In python, this can be done with str.split("+").
Applied to your code, it looks like this:
ourStr = "12+13"
ourStr = ourStr.split("+")
But don't you want to convert those to integers? In Python, we can use a list comprehension with int() to achieve this.
To convert the entire list to ints, we can use the following, which loops over each element and converts the string to an integer:
ourStr = [int(s) for s in ourStr]
Combining this together, we get
ourStr = "12+13"
ourStr = ourStr.split("+")
ourStr = [int(s) for s in ourStr]
But let's say there might be other unknown symbols in the list. As @Idos did, it is probably a good idea to check that each item is a number before putting it in the list.
We can further refine the code to:
ourStr = "12+13"
ourStr = ourStr.split("+")
ourStr = [int(s) for s in ourStr if s.isdigit()]

This can be solved with just list comprehension or built-in methods, no need for regex:
s = '12+13+14+15+16'
l = [int(x) for x in s.split('+')]
l = map(int, s.split('+'))  # Python 2 (map returns a list)
l = list(map(int, s.split('+')))  # Python 3
# each gives [12, 13, 14, 15, 16]
If you are not sure whether there are any non-digit strings, just add a condition to the list comprehension (the map versions keep None in place of a non-digit item instead of dropping it):
l = [int(x) for x in s.split('+') if x.isdigit()]
l = map(lambda s: int(s) if s.isdigit() else None, s.split('+'))  # Python 2
l = list(map(lambda s: int(s) if s.isdigit() else None, s.split('+')))  # Python 3
Now consider a case where you could have something like:
s = '12 + 13 + 14+15+16'
l = [int(x.strip()) for x in s.split('+') if x.strip().isdigit()]  # had to strip x of any whitespace
l = map(lambda s: int(s.strip()) if s.strip().isdigit() else None, s.split('+'))  # Python 2
l = list(map(lambda s: int(s.strip()) if s.strip().isdigit() else None, s.split('+')))  # Python 3
# each gives [12, 13, 14, 15, 16]
Or:
l = [int(x) for x in map(str.strip,s.split('+')) if x.isdigit()]
l = map(lambda y: int(y) if y.isdigit() else None, map(str.strip, s.split('+')))  # Python 2
l = list(map(lambda y: int(y) if y.isdigit() else None, map(str.strip, s.split('+'))))  # Python 3

You can just use regular expressions, and this becomes very easy:
>>> s = "12+13"
>>> import re
>>> re.findall(r'\d+',s)
['12', '13']
Basically, \d matches any digit and + means one or more. So re.findall(r'\d+', s) looks for every part of the string that is one or more digits in a row and returns each match it finds.
To turn them into integers, as many people have said, you can use a list comprehension on the result:
result = ['12', '13']
int_list = [int(x) for x in result]
See the Python regex (re module) documentation.

I have made a function which extracts numbers from a string.
def extract(string1):
    # add a space at the end of the string so that the last number is also extracted
    string1 = string1 + " "
    l = len(string1)
    pos = 0
    num = []
    continuity = 0
    for i in range(l):
        if string1[i].isdigit():
            continuity += 1
        else:
            if pos != continuity:
                # this condition stops the else branch from appending an
                # empty slice when several non-digits follow each other
                num = num + [int(string1[pos:continuity])]
            continuity += 1
            pos = continuity
    return num
string = "ab73t9+-*/182"
num1 = extract(string)
print(num1)
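The sentinel space can be avoided entirely by grouping consecutive characters by whether they are digits. This sketch uses itertools.groupby with str.isdigit as the key; extract_numbers is a hypothetical name, not the function from the answer above.

```python
from itertools import groupby


def extract_numbers(string1):
    """Return the integers found in string1, in order.

    groupby splits the string into runs of digit / non-digit characters,
    so no sentinel space is needed at the end.
    """
    return [int(''.join(run))
            for is_digit, run in groupby(string1, key=str.isdigit)
            if is_digit]


print(extract_numbers("ab73t9+-*/182"))  # [73, 9, 182]
print(extract_numbers("1+134-15"))       # [1, 134, 15]
```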

Python Regular expression repeat

I have a string like this
--x123-09827--x456-9908872--x789-267504
I am trying to get all value like
123:09827
456:9908872
789:267504
I've tried (--x([0-9]+)-([0-9])+)+
but it only gives me the last pair's result. I am testing it in Python:
>>> import re
>>> x = "--x123-09827--x456-9908872--x789-267504"
>>> p = "(--x([0-9]+)-([0-9]+))+"
>>> re.match(p,x)
>>> re.match(p,x).groups()
('--x789-267504', '789', '267504')
How should I write with nested repeat pattern?
Thanks a lot!
David

Code it like this:
import re
x = "--x123-09827--x456-9908872--x789-267504"
p = "--x(?:[0-9]+)-(?:[0-9]+)"
print(re.findall(p, x))

Just use the .findall method instead; it makes the expression simpler.
>>> import re
>>> x = "--x123-09827--x456-9908872--x789-267504"
>>> r = re.compile(r"--x(\d+)-(\d+)")
>>> r.findall(x)
[('123', '09827'), ('456', '9908872'), ('789', '267504')]
You can also use .finditer, which returns an iterator of match objects and might be helpful for longer strings.
>>> [m.groups() for m in r.finditer(x)]
[('123', '09827'), ('456', '9908872'), ('789', '267504')]
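To get exactly the 123:09827 form asked for, each (first, second) tuple from findall can simply be joined with a colon. A minimal sketch:

```python
import re

x = "--x123-09827--x456-9908872--x789-267504"
pairs = re.findall(r"--x(\d+)-(\d+)", x)
# With two groups, findall returns (first, second) tuples; join each with ':'
print([':'.join(pair) for pair in pairs])
# ['123:09827', '456:9908872', '789:267504']
```

Because the groups stay strings, leading zeros such as the one in 09827 are preserved.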

Use re.finditer or re.findall. Then you don't need the extra pair of parentheses that wrap the entire expression. For example,
>>> import re
>>> x = "--x123-09827--x456-9908872--x789-267504"
>>> p = "--x([0-9]+)-([0-9]+)"
>>> for m in re.finditer(p, x):
...     print('{0} {1}'.format(m.group(1), m.group(2)))
...
123 09827
456 9908872
789 267504

Try this:
p='--x([0-9]+)-([0-9]+)'
re.findall(p,x)

No need to use regex:
>>> "--x123-09827--x456-9908872--x789-267504".replace('--x',' ').replace('-',':').strip()
'123:09827 456:9908872 789:267504'

You don't need regular expressions for this. Here is a simple one-liner, non-regex solution:
>>> s = "--x123-09827--x456-9908872--x789-267504"
>>> [x.replace("-", ":") for x in s.split("--x")[1:]]
['123:09827', '456:9908872', '789:267504']
If this is an exercise on regex, here is a solution that (technically) uses repetition, though the findall(...) solution may be preferred. Note that it hard-codes exactly three pairs:
>>> import re
>>> s = "--x123-09827--x456-9908872--x789-267504"
>>> regex = '--x(.+)'
>>> [x.replace("-", ":") for x in re.match(regex*3, s).groups()]
['123:09827', '456:9908872', '789:267504']

Python - Make sure string is converted to correct Float

I have possible strings of prices like:
20.99, 20, 20.12
Sometimes the string could be sent to me wrongly by the user to something like this:
20.99.0, 20.0.0
These should be converted back to :
20.99, 20
So basically, remove everything from the second '.' onward, if there is one.
Just to be clear, each price would be alone, one at a time, so just one price in one string.
Any nice one liner ideas?

For a one-liner, you can use .split() and .join():
>>> '.'.join('20.99.0'.split('.')[:2])
'20.99'
>>> '.'.join('20.99.1231.23'.split('.')[:2])
'20.99'
>>> '.'.join('20.99'.split('.')[:2])
'20.99'
>>> '.'.join('20'.split('.')[:2])
'20'
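The same idea can be written with str.partition, which never raises and always returns three parts, and then converted straight to a float. This is a sketch assuming one price per string, as in the question; clean_price is a hypothetical name.

```python
def clean_price(text):
    """Keep everything up to the second '.', then convert to float.

    Sketch: assumes one price per string, as in the question.
    """
    whole, _, rest = text.strip().partition('.')
    fraction = rest.partition('.')[0]  # drop the second '.' and anything after it
    return float(f"{whole}.{fraction}" if fraction else whole)


for s in ["20.99", "20", "20.12", "20.99.0", "20.0.0"]:
    print(s, "->", clean_price(s))
```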

You could do something like this
>>> s = '20.99.0, 20.0.0'
>>> s.split(',')
['20.99.0', ' 20.0.0']
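>>> list(map(lambda x: x[:x.find('.', x.find('.')+1)], s.split(',')))
['20.99', ' 20.0']
Look at the inner find expression: I find the first '.', add 1, then find the next '.', and the slice keeps everything before it. Note that if there is no second '.', find returns -1 and the slice drops the last character instead, so this should only be applied to strings that actually have two dots.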

Edit: Note that this solution does not discard everything from the second decimal point; it removes only the extra points and keeps the digits after them. If you want to discard those digits, you could use e.g. @Blender's solution.
It only qualifies as a one-liner if two statements on one line separated by ; count, but here's what I came up with:
>>> x = "20.99.1234"
>>> s = x.split("."); x = s[0] + "." + "".join(s[1:])
>>> x
'20.991234'
It should be a little faster than scanning through the string multiple times, though. At the cost of splitting the string twice, you can do it in a single statement:
>>> x = x.split(".")[0] + "." + "".join(x.split(".")[1:])
For a whole list:
>>> def numify(x):
...     s = x.split(".")
...     return float(s[0] + "." + "".join(s[1:]))
...
>>> x = ["123.4.56", "12.34", "12345.6.7.8.9"]
>>> [ numify(f) for f in x ]
[123.456, 12.34, 12345.6789]

>>> s = '20.99, 20, 20.99.23'
>>> ','.join(x if x.count('.') in [1,0] else x[:x.rfind('.')] for x in s.split(','))
'20.99, 20, 20.99'

If you are looking for a regex-based solution and your intended behaviour is to discard everything after the second . (decimal point), then match the leading digits and at most one fractional part (the optional group means a plain "20" also works):
>>> import re
>>> st = "20.99.123"
>>> float(re.match(r'\d+(?:\.\d+)?', st).group())
20.99

How to get integer values from a string in Python?

Suppose I had a string
string1 = "498results should get"
Now I need to get only the integer value from the string, like 498. I don't want to use string slicing here, because the number of digits may grow, as in these examples:
string2 = "49867results should get"
string3 = "497543results should get"
So I want to get only integer values out from the string exactly in the same order. I mean like 498,49867,497543 from string1,string2,string3 respectively.
Can anyone let me know how to do this in one or two lines?

>>> import re
>>> string1 = "498results should get"
>>> int(re.search(r'\d+', string1).group())
498
If there are multiple integers in the string:
>>> list(map(int, re.findall(r'\d+', string1)))
[498]

An answer taken from ChristopheD here: https://stackoverflow.com/a/2500023/1225603
r = "456results string789"
s = ''.join(x for x in r if x.isdigit())
print(int(s))
456789

Here's your one-liner, without using any regular expressions, which can get expensive at times:
>>> ''.join(filter(str.isdigit, "1234GAgade5312djdl0"))
returns:
'123453120'

If you have multiple sets of numbers, then this is another option:
>>> import re
>>> print(re.findall(r'\d+', 'xyz123abc456def789'))
['123', '456', '789']
It's no good for floating-point number strings, though.
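For floating-point strings, the pattern can be extended with an optional minus sign and an optional fractional part. This is a minimal sketch; the FLOAT_RE pattern is an assumption, it ignores exponents such as 1e5, and a "-" directly before a digit is always read as a sign (so "3-4" gives 3 and -4).

```python
import re

# Assumed format: optional minus sign, digits, optional fractional part.
FLOAT_RE = re.compile(r'-?\d+(?:\.\d+)?')

text = 'xyz12.5abc-4def789'
print([float(n) for n in FLOAT_RE.findall(text)])
# [12.5, -4.0, 789.0]
```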

Iterator version
>>> import re
>>> string1 = "498results should get"
>>> [int(x.group()) for x in re.finditer(r'\d+', string1)]
[498]

>>> import itertools
>>> int(''.join(itertools.takewhile(lambda s: s.isdigit(), string1)))

In Python 3, these two lines return a list (which may be empty):
>>> import re
>>> [int(x) for x in re.findall(r'\d+', your_string)]
Similar to:
>>> list(map(int, re.findall(r'\d+', your_string)))

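This approach uses a list comprehension: just pass the string as an argument to the function and it will return a list of the integers in that string. It only finds numbers separated from the surrounding words by whitespace, so it returns [] for "498results should get".
def getIntegers(string):
    numbers = [int(x) for x in string.split() if x.isnumeric()]
    return numbers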
Like this
print(getIntegers('this text contains some numbers like 3 5 and 7'))
Output
[3, 5, 7]
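One caveat with isnumeric() (and isdigit()): they return True for characters such as '½' or the superscript '²', but int() raises ValueError on those. str.isdecimal() only accepts decimal digit characters, which int() can convert. A sketch of the same function with that filter (get_integers is a hypothetical name):

```python
def get_integers(text):
    """Like getIntegers above, but filters with str.isdecimal.

    isnumeric() is True for characters such as '½', and isdigit() for '²',
    yet int() raises ValueError on both; isdecimal() rejects them.
    """
    return [int(word) for word in text.split() if word.isdecimal()]


print(get_integers('this text contains some numbers like 3 5 and 7 and ½ and 2²'))
# [3, 5, 7]
```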

def function(string):
    final = ''
    for i in string:
        try:
            final += str(int(i))
        except ValueError:
            # stop at the first non-digit character
            return int(final)
    return int(final)  # the whole string was digits
print(function("4983results should get"))

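Another option is to remove the trailing letters using rstrip with string.ascii_lowercase (which supplies the letters to strip):
import string
strings = ["498results should get", "49867results should get", "497543results should get"]
out = [int(s.replace(' ','').rstrip(string.ascii_lowercase)) for s in strings]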
Output:
[498, 49867, 497543]

integerstring = ""
string1 = "498results should get"
for i in string1:
    if i.isdigit():
        integerstring = integerstring + i
print(integerstring)

Develop Reference
