How to combine a NumPy string array with a float array in Python

I would like to combine an array full of floats with an array full of strings. Is there a way to do this?
(I am also having trouble rounding my floats; insert is changing them to scientific notation, and I am unable to reproduce this with a small example.)
import numpy as np

A = np.array([[1/3, 257/35], [3, 4], [5, 6]], dtype=float)
B = np.array([7, 8, 9], dtype=float)
C = np.insert(A, A.shape[1], B, axis=1)  # append B as the last column of A
print(np.around(B, decimals=2))
D = np.array(['name1', 'name2', 'name3'])
How do I append D onto the end of C in the same way that I appended B onto A (insert D as the last column of C)?
I suspect that there is a type issue with having strings and floats in the same array. It would also answer my question if there were a way to convert a float (or a number in scientific notation; my numbers are displayed as '5.02512563e-02') to a string with about 4 digits ('.0502').
I believe concatenate will not work, because the array shapes are (3,3) and (3,). D is a 1-D array, so D.T is no different from D. Also, when I try this I get "ValueError: all the input arrays must have same number of dimensions."
I don't care about accuracy loss due to appending, as this is the last step before I print.

Use dtype=object in your NumPy array, like below:
np.array([1, 'a'], dtype=object)
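A minimal sketch of how that could apply to the arrays from the question (reshaping D into a column is my assumption about the intended layout):
import numpy as np
C = np.array([[0.33, 7.34, 7.0], [3.0, 4.0, 8.0], [5.0, 6.0, 9.0]])
D = np.array(['name1', 'name2', 'name3'])
# Cast both pieces to object so floats and strings can share one array
combined = np.concatenate((C.astype(object), D[:, None].astype(object)), axis=1)
print(combined[0])  # [0.33 7.34 7.0 'name1']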

Try making D a 2-D NumPy array, then transpose it and concatenate with C:
D=np.array([['name1','name2','name3']])
np.concatenate((C, D.T), axis=1)
See the documentation for concatenate for explanation and examples:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.concatenate.html
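Note that concatenating floats with a string column promotes the whole result to a string dtype. Since the question also asks for about 4 digits, one option is to format the floats first; this sketch uses np.char.mod, which is my choice rather than part of the original answer:
import numpy as np
C = np.array([[0.0502512563, 7.34, 7.0], [3.0, 4.0, 8.0], [5.0, 6.0, 9.0]])
D = np.array([['name1', 'name2', 'name3']])
C_str = np.char.mod('%.4f', C)                 # '0.0503', no scientific notation
result = np.concatenate((C_str, D.T), axis=1)  # all-string array, shape (3, 4)
print(result[0])  # ['0.0503' '7.3400' '7.0000' 'name1']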

NumPy arrays support only one data type per array. Converting the floats to strings is not a good idea, as the string values will only be close to the originals.
Try pandas instead; it supports a different data type for each column.
import numpy as np
import pandas as pd
np_ar1 = np.array([1.3, 1.4, 1.5])
np_ar2 = np.array(['name1', 'name2', 'name3'])
df1 = pd.DataFrame({'ar1':np_ar1})
df2 = pd.DataFrame({'ar2':np_ar2})
pd.concat([df1.ar1, df2.ar2], axis=1)  # axis=1 places them side by side as columns
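A quick check of the column types, continuing the snippet above, shows that each column keeps its own dtype:
result = pd.concat([df1.ar1, df2.ar2], axis=1)
print(result.dtypes)  # ar1: float64, ar2: object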

Related

How to properly change dtype of numpy array

I have a numpy array that I obtained from a pandas DataFrame:
data_array = df['column_name'].to_numpy()
The resulting array has dtype object, just like the original column; it has shape (2000,) and each element is a list of integer values. I would like it to be of int32 type. However, when I attempt to use
data_array = data_array.astype(np.int32)
I get the exception:
ValueError: setting an array element with a sequence.
All elements in the array are lists with the same number of integers (a hundred or so).
The general format is:
[[1,0,1,0],[0,0,0,0],[1,0,0,1]]
Is there something obvious I'm missing? Or is there another, better way to convert pandas DataFrames into NumPy arrays of the desired type?
It seems to me I'm running out of options.
EDIT
I figured it out, although the approach was a bit hacky.
data_array = np.array(df['column_name'].to_list(), np.int32)
I'm still not sure why this was needed, but apparently np.array can build an array with the right dtype from a plain two-dimensional list of integers (or from a list of NumPy arrays), while astype on the object array of lists cannot.
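For what it's worth, here is a sketch of an equivalent route that skips to_list(); it assumes, as stated in the question, that every list has the same length:
import numpy as np
import pandas as pd
# Hypothetical DataFrame standing in for df['column_name'] from the question
df = pd.DataFrame({'column_name': [[1, 0, 1, 0], [0, 0, 0, 0], [1, 0, 0, 1]]})
# np.vstack stacks the per-row lists into a 2-D array; astype fixes the dtype
data_array = np.vstack(df['column_name'].to_numpy()).astype(np.int32)
print(data_array.dtype, data_array.shape)  # int32 (3, 4)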

Python Numpy: replace values in one array with corresponding values in another array

I am using Python NumPy arrays (rasters converted to 2D arrays, specifically). I have one array with arbitrary dummy values of -999 representing "no data", and I want to replace those values with the corresponding "real" values from a different array of the same size and shape, in the correct locations. I couldn't find a very similar question to this; note that I am a novice with Python and NumPy.
What I want to do is this:
array_a =
([[0.564,-999,-999],
[0.234,-999,0.898],
[-999,0.124,0.687],
[0.478,0.786,-999]])
array_b =
([[0.324,0.254,0.204],
[0.469,0.381,0.292],
[0.550,0.453,0.349],
[0.605,0.582,0.551]])
use the values of array_b to fill in the -999 values in array_a and create a new array:
new_array_a =
([[0.564,0.254,0.204],
[0.234,0.381,0.898],
[0.550,0.124,0.687],
[0.478,0.786,0.551]])
I don't really want to change the shape or dimensions of the array because I am going to convert back out into a raster afterwards so I need the correct values in the correct locations.
What is the best way to do this?
Just do boolean masking:
mask = (array_a == -999)           # True wherever array_a holds the dummy value
new_array = np.copy(array_a)       # work on a copy so array_a stays intact
new_array[mask] = array_b[mask]    # pull replacements from the same positions
All you need to do is:
array_a[array_a==-999] = array_b[array_a==-999]
The boolean condition selects the elements to update, i.e. those whose value is -999. Note that this modifies array_a in place.
import numpy as np

array_a = np.array([[0.564, -999, -999],
                    [0.234, -999, 0.898],
                    [-999, 0.124, 0.687],
                    [0.478, 0.786, -999]])
array_b = np.array([[0.324, 0.254, 0.204],
                    [0.469, 0.381, 0.292],
                    [0.550, 0.453, 0.349],
                    [0.605, 0.582, 0.551]])
array_a[array_a == -999] = array_b[array_a == -999]
Run this snippet to verify the result.
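If you'd rather not modify array_a in place, np.where gives an equivalent one-liner (a sketch, not part of the original answers):
new_array_a = np.where(array_a == -999, array_b, array_a)  # take from array_b where the condition holds, else keep array_a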

Use specific columns and rows of a numpy array

I have a matrix where the first row and first column are strings and the rest is floats:
[["City","Score1","Score2","Score3"],
["Berkley",23,432,321],
["Ohio",3,432,54],
["Columbia",123,432,53]]
I just need to make another matrix to store the floats.
It would look like this:
[[23,432,321],
[3,432,54],
[123,432,53]]
Using numpy:
import numpy as np
arr = np.array([["City", "Score1", "Score2", "Score3"],
                ["Berkley", 23, 432, 321],
                ["Ohio", 3, 432, 54],
                ["Columbia", 123, 432, 53]])
new_arr = arr[1:, 1:].astype(float)
NOTE: In your example those are ints not floats, but I've still used floats here
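If you also need the labels later, the same slicing idea recovers them (a small addition, not part of the original answer):
cities = arr[1:, 0]   # ['Berkley' 'Ohio' 'Columbia']
headers = arr[0, 1:]  # ['Score1' 'Score2' 'Score3']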

Change dtype for particular values in numpy array?

I have a NumPy array x of shape (20, 4) in which only the first row and first column are real string values (letters), while the rest of the values are numerals stored as strings. I want to change these numeral values to float or integer type.
I have tried some steps:
a. I made copies of first row and column of the array as separate variables:
x_row = x[0]
x_col = x[:,0]
Then I deleted them from the original array x (using the numpy.delete() method) and converted the type of the remaining values with a for loop that iterates over each value. However, when I stack the copied row and column back using numpy.vstack() and numpy.hstack(), everything converts back to string type. I'm not sure why this is happening.
b. Same procedure as point a, except that I used the numpy.insert() method for inserting rows and columns; it does the same thing, converting everything back to string type.
So, is there a way to skip this deleting and stacking mechanism (which isn't working anyway) and change all the values (except the first row and column) of an array to int or float type?
All items in a numpy array have to have the same dtype. That is a fundamental fact about numpy. You could possibly use a numpy recarray, or you could use dtype=object which basically lets all values be anything.
I'd recommend you take a look at pandas, which provides a tabular data structure that allows different columns to have different types. It sounds like what you have is a table with row and column labels, and that's what pandas deals with nicely.
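As a sketch of the pandas route for an array shaped like the one described (the labels here are hypothetical), the first row becomes the column names, the first column becomes the index, and the rest converts cleanly:
import numpy as np
import pandas as pd
x = np.array([['', 'col1', 'col2', 'col3'],
              ['row1', '1.5', '2', '3'],
              ['row2', '4', '5.5', '6']])
df = pd.DataFrame(x[1:, 1:], index=x[1:, 0], columns=x[0, 1:]).astype(float)
print(df.dtypes)  # every data column is float64; the labels stay strings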

Programmatically add column names to numpy ndarray

I'm trying to add column names to a numpy ndarray, then select columns by their names. But it doesn't work. I can't tell if the problem occurs when I add the names, or later when I try to call them.
Here's my code.
data = np.genfromtxt(csv_file, delimiter=',', dtype=float, skip_header=1)
# Add headers
with open(csv_file, 'r') as f:
    csv_names = [s.strip('"') for s in f.readline().strip().split(',')]
data = data.astype(np.dtype([(n, 'float64') for n in csv_names]))
Dimension-based diagnostics match what I expect:
print(len(csv_names))
>> 108
print(data.shape)
>> (1652, 108)
"print data.dtype.names" also returns the expected output.
But when I start calling columns by their field names, screwy things happen. The "column" is still an array with 108 columns...
print(data["EDUC"].shape)
>> (1652, 108)
... and it appears to contain more missing values than there are rows in the data set.
print(np.sum(np.isnan(data["EDUC"])))
>> 27976
Any idea what's going wrong here? Adding headers should be a trivial operation, but I've been fighting this bug for hours. Help!
The problem is that you are thinking in terms of spreadsheet-like arrays, whereas NumPy does use different concepts.
Here is what you must know about NumPy:
NumPy arrays only contain elements of a single type.
If you need spreadsheet-like "columns", this type must be some tuple-like type. Such arrays are called Structured Arrays, because their elements are structures (i.e. tuples).
In your case, NumPy would thus take your 2-dimensional regular array and produce a one-dimensional array whose type is a 108-element tuple (the spreadsheet array that you are thinking of is 2-dimensional).
These choices were probably made for efficiency reasons: all the elements of an array have the same type and therefore have the same size: they can be accessed, at a low-level, very simply and quickly.
Now, as user545424 showed, there is a simple NumPy answer to what you want to do (genfromtxt() accepts a names argument with column names).
If you want to convert your array from a regular NumPy ndarray to a structured array, you can do:
data.view(dtype=[(n, 'float64') for n in csv_names]).reshape(len(data))
(you were close: you used astype() instead of view()).
You can also check the answers to quite a few Stack Overflow questions, including Converting a 2D numpy array to a structured array and how to convert regular numpy array to record array?.
Unfortunately, I don't know what is going on when you try to add the field names, but I do know that you can build the array you want directly from the file via
data = np.genfromtxt(csv_file, delimiter=',', names=True)
EDIT:
It seems that adding field names only works when the input is a list of tuples:
data = np.array(list(map(tuple, data)), dtype=[(n, 'float64') for n in csv_names])
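Either way, a quick check (assuming data was built with one of the snippets above) confirms that named access now returns a single column:
print(data["EDUC"].shape)              # (1652,) -- one value per row
print(np.sum(np.isnan(data["EDUC"])))  # NaN count can now be at most 1652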
