I need to split a string into three values (x, y, z). The string looks like this: (48,25,19). I used re.findall and it works fine, but sometimes it produces this error:
plane_X, plane_Y, plane_Z = re.findall("\d+\.\d+", planepos)
ValueError: not enough values to unpack (expected 3, got 0)
This is the code:
import re

def read_data():
    # reading from file
    file = open("D:/Cs/Grad/Tests/airplane test/Reading/Positions/PlanePos.txt", "r")
    planepos = file.readline()
    file.close()
    file = open("D:/Cs/Grad/Tests/airplane test/Reading/Positions/AirportPosition.txt", "r")
    airportpos = file.readline()
    file.close()
    # ==================================================================
    # splitting and getting numbers
    plane_X, plane_Y, plane_Z = re.findall("\d+\.\d+", planepos)
    airport_X, airport_Y, airport_Z = re.findall("\d+\.\d+", airportpos)
    return plane_X, plane_Y, plane_Z, airport_X, airport_Y, airport_Z
What I need is to split the string (48,25,19) into x=48, y=25, z=19. So if someone knows a better way to do this, or how to solve this error, it would be appreciated.
Your regex only works for numbers with a decimal point and not for integers, hence the error. You can instead strip the string of parentheses and whitespace, then split it by commas, and map the resulting sequence of strings to the float constructor:
x, y, z = map(float, planepos.strip('() \n').split(','))
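For example, with a value shaped like the asker's lines (the planepos value below is hypothetical, including the trailing newline that readline leaves in place):
planepos = "(48,25,19)\n"
x, y, z = map(float, planepos.strip('() \n').split(','))
# x, y, z -> 48.0, 25.0, 19.0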
You can use ast.literal_eval which safely evaluates your string:
import ast
s = '(48,25,19)'
x, y, z = ast.literal_eval(s)
# x => 48
# y => 25
# z => 19
If your numbers are integers, you can use the regex:
re.findall(r"\d+","(48,25,19)")
['48', '25', '19']
If there are mixed numbers:
re.findall(r"\d+(?:\.\d+)?","(48.2,25,19.1)")
['48.2', '25', '19.1']
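Tying it back to the original code, a minimal sketch of how the mixed-number pattern and a float conversion could be combined (the planepos value below is hypothetical):
import re

planepos = "(48,25,19)\n"  # example of a line read from PlanePos.txt
nums = re.findall(r"\d+(?:\.\d+)?", planepos)
plane_X, plane_Y, plane_Z = map(float, nums)
# plane_X, plane_Y, plane_Z -> 48.0, 25.0, 19.0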
Related
I'm having a problem managing some data that is saved in a really awful format.
I have data for points that correspond to the edges of a polygon. The data for each polygon is separated by the string >, while the x and y values of the points are separated inconsistently, sometimes with a number of spaces, sometimes with some spaces and a tab. I've tried to load the data into an array of arrays with the following code:
f = open('/Path/Data.lb', 'r')
data = f.read()
splat = data.split('>')
region = []
for number, polygon in enumerate(splat[1:len(splat)], 1):
    region.append(float(polygon))
But I keep getting an error trying to run the float() function (I've cut it as it's much longer):
ValueError: could not convert string to float: '\n -73.311 -48.328\n -73.311 -48.326\n -73.318 -48.321\n ...
... -73.324\t -48.353\n -73.315\t -48.344\n -73.313\t -48.337\n'
Is there a way to convert the data to float without modifying the source file? If not, is there a way to easily modify the source file so that all columns are separated the same way? I guess that way the same code should run smoothly.
Thanks!
Try:
with open("PataIce.lb", "r") as file:
polygons = file.read().strip(">").strip().split(">")
region =[]
for polygon in polygons:
sides = polygon.strip().split("\n")
points = [[float(num) for num in side.split()[:2]] for side in sides]
region.append(points)
Some of the points contain more than 2 values and I've restricted the script to only read the first two numbers in these cases.
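As a small illustration of what the loop produces, using a couple of the points quoted in the error message (the sample string below is hypothetical file content):
sample = ">\n -73.311 -48.328\n -73.311 -48.326\n>\n -73.324\t -48.353\n -73.315\t -48.344\n"
polygons = sample.strip(">").strip().split(">")
region = []
for polygon in polygons:
    sides = polygon.strip().split("\n")
    points = [[float(num) for num in side.split()[:2]] for side in sides]
    region.append(points)
# region -> [[[-73.311, -48.328], [-73.311, -48.326]],
#            [[-73.324, -48.353], [-73.315, -48.344]]]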
You can use regex to match decimal numbers.
import re

PATH = <path_to_file>

coords = []
with open(PATH) as f:
    for line in f:
        nums = re.findall(r'-?\d+\.\d+', line)
        if len(nums) > 0:
            coords.append(nums)
print(coords)
Note: this solution ignores the trailing 0 at the end of some lines.
Be aware that the results in coords are still strings. You'll need to convert them to float using float().
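For instance, assuming coords was built as above, the conversion can be a one-liner:
# Convert each list of matched strings to a list of floats
coords = [[float(num) for num in nums] for nums in coords]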
In [79]: astr = '\n -73.311 -48.328\n -73.311 -48.326\n -73.318 -48.321\n -73.324\t -48.353\n -73.315\t -48.344\n -73.313\t -48.337\n'
In [80]: lines =astr.splitlines()
In [81]: lines
Out[81]:
['',
' -73.311 -48.328',
' -73.311 -48.326',
' -73.318 -48.321',
' -73.324\t -48.353',
' -73.315\t -48.344',
' -73.313\t -48.337']
splitlines deals with the \n separator; split() handles the tab and spaces.
In [82]: [line.split() for line in lines]
Out[82]:
[[],
['-73.311', '-48.328'],
['-73.311', '-48.326'],
['-73.318', '-48.321'],
['-73.324', '-48.353'],
['-73.315', '-48.344'],
['-73.313', '-48.337']]
The initial [] needs to be removed one way or another:
In [84]: np.array(Out[82][1:], dtype=float)
Out[84]:
array([[-73.311, -48.328],
[-73.311, -48.326],
[-73.318, -48.321],
[-73.324, -48.353],
[-73.315, -48.344],
[-73.313, -48.337]])
This works only if each line has the same number of elements, here 2. As long as the lists of strings in Out[82] are clean enough, you can let np.array do the conversion from string to float.
Your actual file may require some further handling, but this should give you an idea of the basics.
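As a rough sketch of how the same idea might be combined with the '>' grouping from the question, keeping only the first two numbers per line as the earlier answer does and letting np.array do the string-to-float conversion:
import numpy as np

with open('/Path/Data.lb') as f:
    data = f.read()

region = []
for polygon in data.split('>')[1:]:
    # One row of two strings per non-empty line of the polygon block
    rows = [line.split()[:2] for line in polygon.splitlines() if line.strip()]
    region.append(np.array(rows, dtype=float))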
I have a .txt file and would like to plot the data.
Here is my code:
import matplotlib.pyplot as plt

with open('/home/tont_fe/Desktop/Extra_for_paper/Lifbase OH emission2.txt') as f:
    lines = f.readlines()
    x = [(line.split()[0]) for line in lines]
    item = [float(item) for item in x]
    y = [(line.split()[1]) for line in lines]
I receive the following error:
ValueError: invalid literal for float(): 281,2
example of data:
281,228 0,01097
281,2289 0,0096
281,2297 0,00888
281,2306 0,00883
281,2315 0,00932
281,2324 0,01008
281,2333 0,01062
281,2341 0,01058
281,235 0,01013
281,2359 0,00981
281,2367 0,01013
281,2376 0,01141
281,2385 0,01377
ValueError: invalid literal for float(): 281,2
float expects . as the decimal separator but got , instead, which caused the ValueError. You can either replace the , with . before feeding the string into float, or use the built-in locale module in the following way:
import locale
locale.setlocale(locale.LC_NUMERIC, 'de_DE')  # German locale, but it can be any locale that uses , as the decimal separator
value_str = "281,2"
value_float = locale.atof(value_str)
print(value_float)
output
281.2
You are trying to convert a string containing , to float; that's why the error.
Replace the , with . and then convert to float.
with open('/home/tont_fe/Desktop/Extra_for_paper/Lifbase OH emission2.txt') as f:
    lines = f.readlines()
    x = [(line.split()[0]) for line in lines]
    # This line will replace ',' with '.' in each string of x
    x = [i.replace(',', '.') for i in x]
    item = [float(item) for item in x]
    y = [(line.split()[1]) for line in lines]
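Since the goal is to plot the data, here is a hedged sketch of the complete flow, assuming both columns use the decimal comma (the path is the one from the question):
import matplotlib.pyplot as plt

with open('/home/tont_fe/Desktop/Extra_for_paper/Lifbase OH emission2.txt') as f:
    lines = f.readlines()

# Replace the decimal comma with a dot in both columns before converting
x = [float(line.split()[0].replace(',', '.')) for line in lines]
y = [float(line.split()[1].replace(',', '.')) for line in lines]

plt.plot(x, y)
plt.show()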
I have a hex variable that I want to print as hex.
data = '\x99\x02'
print (data)
Result is: ™
I want Python to print 0x9902.
Thank you for your help.
Please check this one (note that it treats data as a raw string containing the literal characters \x99\x02):
data = r'\x99\x02'
a, b = [x for x in data.split(r'\x') if x]
d = int(a + b, base=16)
print('%#x' % d)
You have to convert every char to its number with ord(char), convert every number to a hex value with '{:02x}'.format(), concatenate these values into one string, and prepend '0x'.
data = '\x99\x02'
print('0x' + ''.join('{:02x}'.format(ord(char)) for char in data))
EDIT: The same, but the string is first converted to bytes using encode('raw_unicode_escape'):
data = '\x99\x02'
print('0x' + ''.join('{:02x}'.format(code) for code in data.encode('raw_unicode_escape')))
And if you already have bytes, then you don't need encode():
data = b'\x99\x02'
print('0x' + ''.join('{:02x}'.format(code) for code in data))
BTW: in a similar way you can convert to binary using {:08b}:
data = '\x99\x02'
print(''.join('{:08b}'.format(code) for code in data.encode('raw_unicode_escape')))
I've got a large JSON-formatted string that I'm trying to convert into a Python dictionary, but all of the keys and values are unicode, so they have a leading u in the string. When attempting to use json.loads() it complains with ValueError: Expecting property name: line 1 column 2 (char 1) because of the u.
I have:
x = "{u'abc': [{u'xyz': u'XYZ'}, {u'lmno': u'LMNO'}], u'def': u'DEF'}"
json.loads(x) --> ValueError
I want:
x = "{u'abc': [{u'xyz': u'XYZ'}, {u'lmno': u'LMNO'}], u'def': u'DEF'}"
z = x.strip_unicode()
r = json.loads(z)
# r = {'abc': [{'xyz':'XYZ'}, {'lmno': 'LMNO'}], 'def': 'DEF'}
So is there something like strip_unicode, or maybe a different function from json, that can handle the leading u?
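One possible approach, sketched under the assumption that the string is really the repr of a Python dict rather than JSON: ast.literal_eval (used in an answer further up) accepts the u'' literals directly, and the result can be re-serialized with json.dumps if an actual JSON string is needed.
import ast
import json

x = "{u'abc': [{u'xyz': u'XYZ'}, {u'lmno': u'LMNO'}], u'def': u'DEF'}"

# Evaluate the string as a Python literal; the u'' prefix is valid literal syntax
r = ast.literal_eval(x)
# r -> {'abc': [{'xyz': 'XYZ'}, {'lmno': 'LMNO'}], 'def': 'DEF'}

# If an actual JSON string is needed afterwards
z = json.dumps(r)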
def inp(text):
    tmp = str()
    arr = ['.' for x in range(1, 40 - len(text))]
    tmp += text + ''.join(arr)
    print tmp

s = ['tester', 'om', 'sup', 'jope']
sr = ['тестер', 'ом', 'суп', 'жопа']
for i in s:
    inp(i)
for i in sr:
    inp(i)
Output:
tester.................................
om.....................................
sup....................................
jope...................................
тестер...........................
ом...................................
суп.................................
жопа...............................
Why is Python not handling the Cyrillic properly? The ends of the lines are uneven and ragged. Using string formatting gives the same result. How can this be corrected? Thanks.
Read this:
http://docs.python.org/2/howto/unicode.html
Basically, what you have in the text parameter of the inp function is a string. In Python 2.7, strings are bytes by default. Cyrillic characters are not mapped 1-1 to bytes when encoded in e.g. the utf-8 encoding, but require more than one byte (usually 2 in utf-8), so when you do len(text) you don't get the number of characters, but the number of bytes.
In order to get the number of characters, you need to know your encoding. Assuming it's utf-8, you can decode text from that encoding and it will print right:
#!/usr/bin/python
# coding=utf-8

def inp(text):
    tmp = str()
    utext = text.decode('utf-8')
    l = len(utext)
    arr = ['.' for x in range(1, 40 - l)]
    tmp += text + ''.join(arr)
    print tmp

s = ['tester', 'om', 'sup', 'jope']
sr = ['тестер', 'ом', 'суп', 'жопа']
for i in s:
    inp(i)
for i in sr:
    inp(i)
The important lines are these two:
utext = text.decode('utf-8')
l = len(utext)
where you first decode the text, which results in a unicode string. After that, you can use the built-in len to get the length in characters, which is what you want.
Hope this helps.