Convert python string into numpy array [duplicate] - python

This question already has answers here:
what is the fastest way in python to convert a string with formatted numbers in an numpy array
I am hoping someone can help me convert a Python string into a NumPy array. Essentially, given a Python string like this:
'[ 0.11591 0.044932 0.66926 -0.67844 0.47253 -0.84737\n 1.0734 -0.075396 -0.22688 0.84021 -0.46608 0.019941\n -0.0020394 -0.13038 0.8911 -0.40015 0.52048 0.69283\n -0.10257 0.54296 -0.416 0.36585 0.96078 0.50816\n 0.50144 0.66489 -0.79224 0.44567 0.90822 -0.67522\n 0.047322 0.48399 -0.53316 0.76157 -0.86072 0.091377\n 0.30159 -1.194 0.8679 -0.58691 0.48712 -0.66167\n -0.24265 -0.18849 -0.19353 0.0014832 0.88768 0.36672\n 0.16211 0.56235 ]'
I want to convert it into a 1x50 array in Python. Is there an efficient way to do it? Thanks in advance.
EDIT: How did I get that string? It is initially a NumPy array stored as a value in a dictionary. I then save it into the database in a column of type TEXT, and afterward I load that text back from the database.
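Since the string comes from storing the array in a TEXT column, one option, if you can change how it is written, is to store a JSON list instead of the printed array, so that reading it back needs no custom parsing. A minimal sketch, assuming the JSON text goes into the same TEXT column (the database calls are omitted, and `vec` is a hypothetical stand-in for one dictionary value):

```python
import json

import numpy as np

# Hypothetical stand-in for one array stored as a dictionary value.
vec = np.array([0.11591, 0.044932, 0.66926, -0.67844, 0.47253])

# Serialize to a JSON list before writing it to the TEXT column...
text = json.dumps(vec.tolist())

# ...and rebuild the array after reading the text back.
restored = np.array(json.loads(text), dtype=float)
print(text)
print(restored)
```

JSON writes each float with its shortest round-trip representation, so the restored values match the originals exactly, and the text never contains the line breaks that NumPy inserts when printing long arrays.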

Given you have such a string:
line = '[ 0.11591 0.044932 0.66926 -0.67844 0.47253 -0.84737\n 1.0734 -0.075396 -0.22688 0.84021 -0.46608 0.019941\n -0.0020394 -0.13038 0.8911 -0.40015 0.52048 0.69283\n -0.10257 0.54296 -0.416 0.36585 0.96078 0.50816\n 0.50144 0.66489 -0.79224 0.44567 0.90822 -0.67522\n 0.047322 0.48399 -0.53316 0.76157 -0.86072 0.091377\n 0.30159 -1.194 0.8679 -0.58691 0.48712 -0.66167\n -0.24265 -0.18849 -0.19353 0.0014832 0.88768 0.36672\n 0.16211 0.56235 ]'
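Strip the surrounding brackets, split on whitespace and convert the elements to floats (in Python 3, map returns an iterator, so wrap it in list() before building the array):
numpy.array(list(map(float, line.strip('[] ').split())))
Or use the numpy.fromstring function in text mode:
numpy.fromstring(line.strip('[] '), dtype=float, sep=' ')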
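Since the question asks for a 1x50 array rather than a flat one, here is a small sketch that wraps the split-based approach and reshapes the result. `text_to_row` is a hypothetical helper name, and the example string is shortened to eight values:

```python
import numpy as np


def text_to_row(text):
    """Parse a printed NumPy vector such as '[ 0.1 -0.2\n 0.3 ]' into a 1 x N array."""
    # Drop the surrounding brackets; str.split() with no argument splits on
    # any whitespace, so the embedded '\n' characters need no special handling.
    values = np.array(text.strip().strip('[]').split(), dtype=float)
    # Add a leading axis to get shape (1, N) instead of (N,).
    return values.reshape(1, -1)


line = '[ 0.11591 0.044932 0.66926 -0.67844 0.47253 -0.84737\n 1.0734 -0.075396 ]'
row = text_to_row(line)
print(row.shape)  # (1, 8)
```

With the full 50-value string from the question, the same call should return shape (1, 50).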

This is one way to solve it:
import numpy as np
import re
txt = '[ 0.11591 0.044932 0.66926 -0.67844 0.47253 -0.84737\n 1.0734 -0.075396 -0.22688 0.84021 -0.46608 0.019941\n -0.0020394 -0.13038 0.8911 -0.40015 0.52048 0.69283\n -0.10257 0.54296 -0.416 0.36585 0.96078 0.50816\n 0.50144 0.66489 -0.79224 0.44567 0.90822 -0.67522\n 0.047322 0.48399 -0.53316 0.76157 -0.86072 0.091377\n 0.30159 -1.194 0.8679 -0.58691 0.48712 -0.66167\n -0.24265 -0.18849 -0.19353 0.0014832 0.88768 0.36672\n 0.16211 0.56235 ]'
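txt = re.sub(r'\s+', ' ', txt)  # collapse newlines and repeated spaces into single spaces
myList = txt.split()[1:-1]  # drop the '[' and ']' tokens
myList2 = list(map(float, myList))
n_arr = np.array(myList2)  # build the array from the floats, not from the original strings
print(n_arr)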

Related

Data cleaning: extracting numbers out of string array by deleting '.' and ';' characters

I have a big data set that is messed up, and I am trying to clean it.
The data looks like this:
data= np.array(['0,51\n0,64\n0,76\n0,84\n1,00', 1.36]) #...
My goal is to extract the raw numbers:
numbers= [51, 64, 76, 84, 100, 136]
What I tried worked, but I think it is not that elegant. Is there a better way to do it?
import numpy as np
import re
clean = np.array([])
for i in data:
    i = str(i)
    if ',' in i:
        without = i.replace(',', '')
        clean = np.append(clean, without)
    elif '.' in i:
        without = i.replace('.', '')
        clean = np.append(clean, without)
#detect all numbers
numbers = np.array([])
for i in clean:
    if type(i) == np.str_:
        a = re.findall(r'\b\d+\b', i)
        numbers = np.append(numbers, a)
Generally, you should never use np.append in a loop, since it creates a new array and copies all existing elements on every call, resulting in quadratic time complexity.
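A sketch of the usual alternative: collect values in a Python list and convert to an array once at the end. The loop bound of 1000 is arbitrary and only for illustration:

```python
import numpy as np

# Slow pattern: np.append copies the whole array on every call,
# so n appends cost O(n^2) element copies in total.
slow = np.array([])
for k in range(1000):
    slow = np.append(slow, k)

# Faster pattern: list.append is amortized O(1),
# and the conversion to an array happens only once.
items = []
for k in range(1000):
    items.append(k)
fast = np.array(items, dtype=float)
```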
Besides this, you can use the following one-liner to solve your problem:
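result = [round(float(n.replace(',', '.')) * 100) for e in data for n in e.split()]
The idea is to replace the decimal comma with a dot, parse the string as a float, multiply by 100 and round to the nearest integer. Rounding matters: int() truncates, so floating-point error such as 0.29 * 100 == 28.999999999999996 would give 28 instead of 29. You can convert the result to a NumPy array with np.fromiter(result, dtype=int).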
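A runnable sketch of the one-liner on the question's sample data, ending with the np.fromiter conversion. It uses round() rather than int(), because int() truncates toward zero and can lose one unit to floating-point error:

```python
import numpy as np

data = np.array(['0,51\n0,64\n0,76\n0,84\n1,00', 1.36])

# e.split() separates the newline-joined values; replacing ',' with '.'
# lets float() parse them; round() picks the nearest integer.
result = [round(float(n.replace(',', '.')) * 100) for e in data for n in e.split()]
numbers = np.fromiter(result, dtype=int)
print(numbers)
```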

Splitting integers from string in python

Suppose I have a numpy array like -
A = ['83.56%' '2.74%' '2.74%' '4.11%' '4.11%' '19.18%' '76.71%' '20.55%'
'34.25%' '54.79%']
and I want to strip the '%' signs so that only the numbers remain, like -
B = ['83.56' '2.74' '2.74' '4.11' '4.11' '19.18' '76.71' '20.55'
'34.25' '54.79']
How can I do this in Python?
This is a use case for str.rstrip:
B = [item.rstrip('%') for item in A]
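Since `A` is a NumPy array, the same stripping can also be done element-wise with `np.char.rstrip`, without writing the loop yourself. The final conversion to float is an extra step, assuming numeric values are the end goal:

```python
import numpy as np

A = np.array(['83.56%', '2.74%', '2.74%', '4.11%', '4.11%',
              '19.18%', '76.71%', '20.55%', '34.25%', '54.79%'])

# np.char.rstrip applies str.rstrip to every element and returns a string array.
B = np.char.rstrip(A, '%')

# Convert to floats if numbers, rather than strings, are what you need.
B_float = B.astype(float)
print(B)
print(B_float)
```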

List of arrays into .txt file without brackets and well spaced

I'm trying to save a list of arrays to a .txt file as follows:
list_array =
[array([-20.10400009, -9.94099998, -27.10300064]),
array([-20.42099953, -9.91499996, -27.07099915]),
...
This is the call I made:
np.savetxt('path/file.txt', list_array, fmt='%s')
This is what I get
[-20.10400009 -9.94099998 -27.10300064]
[-20.42099953 -9.91499996 -27.07099915]
...
This is what I want
-20.10400009 -9.94099998 -27.10300064
-20.42099953 -9.91499996 -27.07099915
...
EDIT :
It is translated from MATLAB as follows, where I use .append to transform it:
Cell([array([[[-20.10400009, -9.94099998, -27.10300064]]]),
array([[[-20.42099953, -9.91499996, -27.07099915]]]),
array([[[-20.11199951, -9.88199997, -27.16399956]]]),
array([[[-19.99500084, -10.0539999 , -27.13899994]]]),
array([[[-20.4109993 , -9.87100029, -27.12800026]]])],
dtype=object)
I cannot really see what is wrong with your code, except for the missing imports. With array, do you mean numpy.array, or are you importing like from numpy import array (which you should refrain from doing)?
Running this example gives exactly what you want.
import numpy as np
list_array = [np.array([-20.10400009, -9.94099998, -27.10300064]),
np.array([-20.42099953, -9.91499996, -27.07099915])]
np.savetxt('test.txt', list_array, fmt='%s')
> cat test.txt
-20.10400009 -9.94099998 -27.10300064
-20.42099953 -9.91499996 -27.07099915
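If the elements are nested like the (1, 1, 3) arrays shown in the EDIT, NumPy cannot stack them into a plain 2-D table directly, which is one likely source of the bracketed output. A sketch under that assumption: flatten each element with `np.ravel` and stack the results with `np.vstack` before calling `np.savetxt`. `save_rows` is a hypothetical helper name, and the file is written to a temporary directory only for the example:

```python
import os
import tempfile

import numpy as np


def save_rows(path, arrays, fmt="%s"):
    # Flatten every element (e.g. a (1, 1, 3) array coming from MATLAB) to 1-D,
    # then stack them into one 2-D float array with one row per element.
    rows = np.vstack([np.ravel(a) for a in arrays])
    np.savetxt(path, rows, fmt=fmt)


# Elements shaped like the MATLAB-derived arrays in the EDIT: (1, 1, 3).
list_array = [
    np.array([[[-20.10400009, -9.94099998, -27.10300064]]]),
    np.array([[[-20.42099953, -9.91499996, -27.07099915]]]),
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")
    save_rows(path, list_array)
    with open(path) as fh:
        text = fh.read()

print(text)
```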

Reading a line with scientific numbers (like 0.4E-03)

I would like to process the following line (output of a Fortran program) from a file, with Python:
74 0.4131493371345440E-03 -0.4592776407685850E-03 -0.1725046324754540
and obtain an array such as:
[74,0.4131493371345440e-3,-0.4592776407685850E-03,-0.1725046324754540]
My previous attempts do not work. In particular, if I do the following:
with open(filename,"r") as myfile:
    line=np.array(re.findall(r"[-+]?\d*\.*\d+",myfile.readline())).astype(float)
I have the following error :
ValueError: could not convert string to float: 'E-03'
Steps:
Get a list of strings (str.split(' '))
Get rid of the trailing "\n" (del arr[-1])
Turn the list of strings into numbers (see Converting a string (with scientific notation) to an int in Python)
Code:
import decimal # you may also leave this out and use `float` instead of `decimal.Decimal()`
arr = "74 0.4131493371345440E-03 -0.4592776407685850E-03 -0.1725046324754540 \n"
arr = arr.split(' ')
del arr[-1]
arr = [decimal.Decimal(x) for x in arr]
# do your np stuff
Result:
>>> print(arr)
[Decimal('74'), Decimal('0.0004131493371345440'), Decimal('-0.0004592776407685850'), Decimal('-0.1725046324754540')]
PS:
I don't know if you wrote the file that gives the output in the first place, but if you did, you could just think about outputting an array of float() / decimal.Decimal() from that file instead.
@ant.kr Here is a possible solution:
import numpy as np
# Initial data
a = "74 0.4131493371345440E-03 -0.4592776407685850E-03 -0.1725046324754540 \n"
# Given the structure of the initial data, we can proceed as follows:
# - split the initial string at each space; this produces a list whose last
#   element is "\n"
# - drop that last element, convert each remaining element to a float and
#   store the results in a NumPy array.
line = np.array([float(i) for i in a.split(" ")[:-1]])
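The original regex fails because it has no exponent part, so "E-03" ends up as a separate match. A sketch of two alternatives: splitting on any whitespace (float() already understands E notation), or keeping the regex but adding an optional exponent group. The pattern below is a suggested fix, not the asker's:

```python
import re

import numpy as np

line = "74 0.4131493371345440E-03 -0.4592776407685850E-03 -0.1725046324754540 \n"

# Option 1: str.split() with no argument splits on any whitespace
# and discards the trailing "\n"; float() parses the E notation.
values = np.array(line.split(), dtype=float)

# Option 2: keep the regex approach, but allow an optional exponent part.
pattern = r"[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?"
values_re = np.array(re.findall(pattern, line), dtype=float)
print(values)
```

For a whole file of such rows, np.loadtxt(filename) should also handle this notation, since it converts each field with the same float parsing.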

numpy matrix string with python3.4

I'm having trouble using NumPy with Python 3.4. How can I get a NumPy matrix of plain strings instead of byte strings?
def res(data):
    M = np.zeros(data.shape).astype(dtype='|S20')
    lines,columns = M.shape
    for l in range(lines):
        M[l][0] = data[l][1]
        M[l][1] = data[l][2]
        M[l][2] = data[l][3]
    return M
**result python2.7**
[['Ann' '38.72' '-9.133']
['John' '55.68' '12.566']
['Richard' '52.52' '13.411']
['Alex' '40.42' '-3.703']]
**result python3.4**
[[b'Ann' b'38.72' b'-9.133']
[b'John' b'55.68' b'12.566']
[b'Richard' b'52.52' b'13.411']
[b'Alex' b'40.42' b'-3.703']]
In Python 3.4, how can I get my matrix as plain strings, like in the Python 2.7 example? Byte strings are a problem because I have functions that expect str values, not bytes.
Any help would be great. Thanks.
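In my case the solution was simply to change dtype='|S20' to dtype=str. In Python 3 the 'S' dtype holds bytes, whereas in Python 2 str was itself a byte string, which is why the output looked plain there. I hope this helps.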
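A sketch of the function with a Unicode dtype: 'U20' is the str counterpart of '|S20' and keeps the same 20-character limit. The input rows are hypothetical, shaped like the question's data (an id column followed by name, latitude and longitude). It also shows `np.char.decode` for converting an existing byte-string array after the fact:

```python
import numpy as np

# Hypothetical input rows: id, name, latitude, longitude.
data = np.array([
    [1, 'Ann', '38.72', '-9.133'],
    [2, 'John', '55.68', '12.566'],
], dtype=object)


def res(data):
    # 'U20' is a Unicode (str) dtype with up to 20 characters per element;
    # np.zeros fills a string array with empty strings.
    M = np.zeros(data.shape, dtype='U20')
    for l in range(M.shape[0]):
        M[l][0] = data[l][1]
        M[l][1] = data[l][2]
        M[l][2] = data[l][3]
    return M


M = res(data)
print(M)

# An existing byte-string array can also be decoded into str elements.
legacy = np.array([b'Ann', b'John'], dtype='S20')
decoded = np.char.decode(legacy, 'utf-8')
```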
