Use specific columns and rows of a numpy array - python

I have a matrix where the first row and the first column are strings and the rest is numeric:
[["City","Score1","Score2","Score3"],
["Berkley",23,432,321],
["Ohio",3,432,54],
["Columbia",123,432,53]]
I just need to make another matrix to store the floats.
It would look like this:
[[23,432,321],
[3,432,54],
[123,432,53]]

Using numpy:
import numpy as np
arr = np.array([["City","Score1","Score2","Score3"],
["Berkley",23,432,321],
["Ohio",3,432,54],
["Columbia",123,432,53]])
new_arr = arr[1:, 1:].astype(float)
NOTE: In your example those are ints not floats, but I've still used floats here
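As a quick sanity check (and to keep the labels around in case you need them later), the same slicing also recovers the row and column headers:

```python
import numpy as np

arr = np.array([["City", "Score1", "Score2", "Score3"],
                ["Berkley", 23, 432, 321],
                ["Ohio", 3, 432, 54],
                ["Columbia", 123, 432, 53]])

scores = arr[1:, 1:].astype(float)   # the numeric block
cities = arr[1:, 0]                  # row labels
headers = arr[0, 1:]                 # column labels
```

Note that the mixed array stores everything as strings, which is why the astype(float) cast is needed at all.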

Related

delete row from numpy array based on partial string in python

I have a very large numpy array that looks similar to my example. The partial string that I'm trying to detect is "F_H" and it's usually on column 0 of my array.
a = np.array([['#define', 'bad_stringF_H', 'some_value'],
['#define', 'good_string', 'some_value2'],
['#define', 'good_string_2', 'some_value3'],
['#define', 'bad_string2F_H', 'some_value4']])
I just want to delete the whole row if that partial string is detected, so the desired output would look like this:
[['#define' 'good_string' 'some_value2']
['#define' 'good_string_2' 'some_value3']]
You can use NumPy's Boolean indexing to create a new array that only includes the rows that do not contain the string 'F_H':
import numpy as np
a = np.array([['#define', 'bad_stringF_H', 'some_value'],
['#define', 'good_string', 'some_value2'],
['#define', 'good_string_2', 'some_value3'],
['#define', 'bad_string2F_H', 'some_value4']])
mask = np.array(['F_H' not in x[1] for x in a])
print(mask)
new_a = a[mask]
print(new_a)
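An alternative sketch that avoids the Python-level loop: np.char.find applies str.find elementwise and returns -1 wherever the substring is absent:

```python
import numpy as np

a = np.array([['#define', 'bad_stringF_H', 'some_value'],
              ['#define', 'good_string', 'some_value2'],
              ['#define', 'good_string_2', 'some_value3'],
              ['#define', 'bad_string2F_H', 'some_value4']])

# Keep rows whose name column does not contain 'F_H'
mask = np.char.find(a[:, 1], 'F_H') == -1
new_a = a[mask]

# If the substring could appear in any column, mask the whole array instead
mask_any = (np.char.find(a, 'F_H') == -1).all(axis=1)
```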

How to properly change dtype of numpy array

I have a numpy array that I obtained from pandas dataframe
data_array = df['column_name'].to_numpy()
The resulting array has dtype object, just like the original column, and consists of lists of integer values with shape (2000,). I would like it to be of int32 type. However, when I attempt to use
data_array = data_array.astype(np.int32)
I get the exception:
setting an array element with a sequence.
All elements in array are lists with same number of integers (a hundred or so).
The general format is:
[[1,0,1,0],[0,0,0,0],[1,0,0,1]]
Is there something obvious I'm missing? Or is there another, better way, to convert pandas dataframes into numpy arrays of desired type?
Because it seems to me I'm running out of options.
EDIT
I figured it out, although the approach was a bit hacky.
data_array = np.array(df['column_name'].to_list(), np.int32)
I'm still not sure why it was needed, but apparently np.array can build an int array from a two-dimensional list of integers (or from a list of numpy arrays), whereas astype cannot cast a one-dimensional object array of lists.
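A sketch of the difference, assuming a column of equal-length lists: astype cannot reinterpret an object array whose elements are lists, but rebuilding from a nested list (or stacking the rows) lets NumPy infer a proper 2-D array:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'column_name': [[1, 0, 1, 0], [0, 0, 0, 0], [1, 0, 0, 1]]})

obj_array = df['column_name'].to_numpy()   # 1-D object array of lists
# obj_array.astype(np.int32) would raise:
# "setting an array element with a sequence"

# Building from a plain nested list lets NumPy infer a 2-D int array
data_array = np.array(df['column_name'].to_list(), np.int32)

# np.stack also works directly on the object array of rows
stacked = np.stack(obj_array).astype(np.int32)
```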

pd.ExcelFile: file contains lists of floats, but the lists are imported as a string

I exported lists of floats to an Excel file which I now want to import again as a dataframe. I get a 10x7 dataframe, which looks all good, except that each of my lists is stored as a single string (which makes sense, I guess, as Excel doesn't know what lists are, right?). I'm using this code to import the file:
pd.ExcelFile('fitness.xlsx')
Using the converter parameter doesn't help. Do you know whether there is an easy way to import my lists as lists containing floats directly? Should I maybe export to a different format in the first place? If so, which one could handle my dataformat? At the moment my entries look like this:
In:
xl_var.loc[xl_var['stimulus'] == -2, 'first spike'].values
Out:
array(['[14.25, 11.649999999999864]'], dtype=object)
This is what my imported dataframe looks like. You can see that my last column even contains 2D lists, which makes converting afterwards a bit messy.
And this is what the Excel file that I want to import looks like.
String representations of lists can be converted to lists by using eval(). For a single column, use .apply() as in:
xl_var['first spike'].apply(eval)
If you want to convert all of the columns you can use .applymap() with a list of the columns to convert:
cols = ['first spike', 'number spikes', 'peak', 'undershoot']
xl_var[cols] = xl_var[cols].applymap(eval)
In the case of your last column, you will either need to use string functions to remove the 'array' so that eval can process them like python lists, or you can import array from numpy to allow eval to convert them to numpy arrays.
from numpy import array
xl_var['average spike shape'].apply(lambda x:eval(x, globals()))
Example:
import pandas as pd
from numpy import array
xl_var = pd.DataFrame({'stimulus': [-2, -1.75],
'first spike': ['[14.25, 11.65]', '[14.15, 13.27]'],
'arrays': ['[array([1,2]),array([3,4])]', '[array([5,6]),array([7,8])]']})
In :
xl_var['first spike'].values
Out:
array(['[14.25, 11.65]', '[14.15, 13.27]'], dtype=object)
In :
xl_var['first spike'].apply(eval)
Out:
0 [14.25, 11.65]
1 [14.15, 13.27]
Name: first spike, dtype: object
In :
xl_var['arrays'].values
Out:
array(['[array([1,2]),array([3,4])]', '[array([5,6]),array([7,8])]'], dtype=object)
In :
xl_var['arrays'].apply(lambda x:eval(x,globals())).values
Out:
array([list([array([1, 2]), array([3, 4])]),
list([array([5, 6]), array([7, 8])])], dtype=object)
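If the cells contain only plain Python literals (like the 'first spike' column), ast.literal_eval is a safer drop-in for eval, since it refuses anything that isn't a literal. It cannot parse the array([...]) strings, though; those still need the eval-with-globals approach above. A minimal sketch:

```python
import ast
import pandas as pd

xl_var = pd.DataFrame({'first spike': ['[14.25, 11.65]', '[14.15, 13.27]']})

# literal_eval parses only Python literals, so a malformed or
# malicious cell raises an error instead of executing code
xl_var['first spike'] = xl_var['first spike'].apply(ast.literal_eval)
```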

Python Numpy: replace values in one array with corresponding values in another array

I am using Python NumPy arrays (specifically, rasters converted to 2D arrays). I have one array with arbitrary dummy values of -999 representing "no data", and I want to replace those values with the corresponding "real" values from a second array of the same size and shape, at the correct locations. I couldn't find a very similar question to this; note that I am a novice with Python and NumPy.
What I want to do is this:
array_a =
([[0.564,-999,-999],
[0.234,-999,0.898],
[-999,0.124,0.687],
[0.478,0.786,-999]])
array_b =
([[0.324,0.254,0.204],
[0.469,0.381,0.292],
[0.550,0.453,0.349],
[0.605,0.582,0.551]])
use the values of array_b to fill in the -999 values in array_a and create a new array:
new_array_a =
([[0.564,0.254,0.204],
[0.234,0.381,0.898],
[0.550,0.124,0.687],
[0.478,0.786,0.551]])
I don't really want to change the shape or dimensions of the array because I am going to convert back out into a raster afterwards so I need the correct values in the correct locations.
What is the best way to do this?
Just do boolean masking:
mask = (array_a == -999)
new_array = np.copy(array_a)
new_array[mask] = array_b[mask]
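An equivalent one-liner, if you are making a copy anyway, is np.where, which selects from array_b wherever the condition holds and from array_a elsewhere:

```python
import numpy as np

array_a = np.array([[0.564, -999, -999],
                    [0.234, -999, 0.898]])
array_b = np.array([[0.324, 0.254, 0.204],
                    [0.469, 0.381, 0.292]])

# Take array_b where array_a is the dummy value, array_a otherwise
new_array_a = np.where(array_a == -999, array_b, array_a)
```

The shape and dimensions are preserved, so converting back to a raster works unchanged.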
All you need to do is:
array_a[array_a==-999]=array_b[array_a==-999]
The boolean condition selects the elements to update, i.e. those equal to -999:
import numpy as np
array_a =np.array([[0.564,-999,-999],
[0.234,-999,0.898],
[-999,0.124,0.687],
[0.478,0.786,-999]])
array_b =np.array([[0.324,0.254,0.204],
[0.469,0.381,0.292],
[0.550,0.453,0.349],
[0.605,0.582,0.551]])
array_a[array_a==-999]=array_b[array_a==-999]
Run this snippet to see the result. Note that this updates array_a in place rather than creating a new array.

How to combine np string array with float array python

I would like to combine an array full of floats with an array full of strings. Is there a way to do this?
(I am also having trouble rounding my floats: insert is changing them to scientific notation, and I am unable to reproduce this with a small example.)
A=np.array([[1/3,257/35],[3,4],[5,6]],dtype=float)
B=np.array([7,8,9],dtype=float)
C=np.insert(A,A.shape[1],B,axis=1)
print(np.around(B,decimals=2))
D=np.array(['name1','name2','name3'])
How do I append D onto the end of C in the same way that I appended B onto A (insert D as the last column of C)?
I suspect that there is a type issue between having strings and floats in the same array. It would also answer my questions if there were a way to change a float (or maybe a scientific number, my numbers are displayed as '5.02512563e-02') to a string with about 4 digits (.0502).
I believe concatenate will not work, because the array dimensions are (3,3) and (,3). D is a 1-D array, D.T is no different than D. Also, when I plug this in I get "ValueError: all the input arrays must have same number of dimensions."
I don't care about accuracy loss due to appending, as this is the last step before I print.
Use dtype=object in your numpy array, like below:
np.array([1, 'a'], dtype=object)
Try making D a numpy array first, then transposing and concatenating with C:
D=np.array([['name1','name2','name3']])
np.concatenate((C, D.T), axis=1)
See the documentation for concatenate for explanation and examples:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.concatenate.html
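If the scientific-notation issue matters, one possible sketch is to format the floats to fixed precision first with np.char.mod (printf-style formatting applied elementwise), then concatenate the resulting string arrays:

```python
import numpy as np

A = np.array([[1/3, 257/35], [3, 4], [5, 6]], dtype=float)
B = np.array([7, 8, 9], dtype=float)
C = np.insert(A, A.shape[1], B, axis=1)

# Format every float to 4 decimal places as a string
C_str = np.char.mod('%.4f', C)

D = np.array([['name1', 'name2', 'name3']])
combined = np.concatenate((C_str, D.T), axis=1)
```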
NumPy arrays support only one data type per array. Changing the floats to str is not a good idea, as it will only give values close to the original.
Try pandas instead; an object-dtype column can hold multiple data types:
import numpy as np
import pandas as pd
np_ar1 = np.array([1.3, 1.4, 1.5])
np_ar2 = np.array(['name1', 'name2', 'name3'])
df1 = pd.DataFrame({'ar1':np_ar1})
df2 = pd.DataFrame({'ar2':np_ar2})
pd.concat([df1.ar1, df2.ar2], axis=0)
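If the goal is instead to have the floats and strings side by side as columns, rather than stacked into one Series, a single DataFrame holds both while each column keeps its own dtype:

```python
import numpy as np
import pandas as pd

np_ar1 = np.array([1.3, 1.4, 1.5])
np_ar2 = np.array(['name1', 'name2', 'name3'])

# One float64 column and one object (string) column, row-aligned
df = pd.DataFrame({'ar1': np_ar1, 'ar2': np_ar2})
```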
