I have the ndarray "diffTemp":
diffTemp = np.diff([df.Temp])
where Temp holds temperature values whose successive differences I compute. In this case, using print() I get:
print(diffTemp) = [[-0.16 -0.05]]
To convert it into a column vector I use:
diffTemp = diffTemp.transpose()
And then convert it from ndarray into Series using:
diffTemp = pd.Series([diffTemp])
(This allows me later to concatenate diffTime with its corresponding Series dates (diffDates).)
Unfortunately this outputs that diffTemp is:
print(diffTemp) = 0 [[-0.16000000000000014], [-0.05000000000000071]]
If I instead omit the square brackets [ ] and write:
diffTemp = pd.Series(diffTemp)
I get the error message:
Exception: Data must be 1-dimensional
Totally new to Python; I have tried googling the last few days without any success. Any help is much appreciated.
The issue here is that you are trying to convert a two-dimensional array into a 1-dimensional series. Notice that there are two brackets around [[-0.16 -0.05]]. You can write the following to get back a series by just grabbing the 1-d array that you want:
diffTemp = pd.Series(diffTemp[0])
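To see the whole round trip, here is a minimal sketch with made-up temperature values standing in for df.Temp:

```python
import numpy as np
import pandas as pd

# Hypothetical temperature readings standing in for df.Temp
temps = np.array([20.00, 19.84, 19.79])

diffTemp = np.diff([temps])      # shape (1, 2): np.diff of a nested list is 2-D
series = pd.Series(diffTemp[0])  # grab the 1-D row, then wrap it

print(series)
```

An alternative is `pd.Series(diffTemp.ravel())`, which flattens an array of any shape down to 1-D before wrapping it.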
I had to merge a lot of files (containing word embeddings and other real-valued vectors) based on some common attributes, so I used a pandas DataFrame and saved the intermediate files as csv.
Currently I have a dataframe whose columns look something like this:
I want to merge the last 4 columns (t-1embedding1a, t-1embedding7b, t-2embedding1a, t-2embedding7b) into a single vector to pass to a neural network.
I planned to iterate over the current dataframe, take 4 temporary tensors with the value of each column, concatenate them, and write to a new dataframe.
However, torch.tensor doesn't work; it says:
torch_tensor = torch.tensor(final['t-1embedding1a'].astype(float).values)
could not convert string to float: '[-6.12873614e-01 -5.58319509e-01 -9.73452032e-01 3.66993636e-01\n
I also tried np.fromstring() but the original values are lost in this case.
Sorry, if the question is unnecessarily complicated, I am a newbie to pytorch. Any help is appreciated!
First of all, the data type of the "t-1embeddingXX" columns is string; the values look like "[-6.12873614e-01 -5.58319509e-01 -9.73452032e-01 3.66993636e-01]". You have to convert them to lists of floats.
final["t-1embeddingXX"] = final["t-1embeddingXX"].apply(lambda s: [float(v) for v in s.strip("[]").split()])
Then, you have to check that each list in final.loc[i, "t-1embeddingXX"] has the same length.
If I'm not mistaken, you want to merge the 4 columns into one vector.
all_values = list(df["t-1embedding1a"]) + list(df["t-1embedding7b"]) + list(df["t-2embedding1a"]) + list(df["t-2embedding7b"])
# there is surely a better way
Then pass to tensor:
torch_tensor = torch.tensor(all_values)
------------
Finally, I advise you to take a look at torch.cat. You can convert each column to a tensor and then use this function to concatenate them together.
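As a sketch of the parsing-and-merging steps above, using a toy two-column frame standing in for the real embedding columns (the final torch.tensor call is left as a comment, since the merged float array drops straight into it):

```python
import numpy as np
import pandas as pd

# Toy frame with two string-encoded embedding columns, stand-ins for the
# real t-1embedding1a / t-2embedding1a columns
df = pd.DataFrame({
    "t-1embedding1a": ["[-6.1e-01 -5.5e-01]", "[1.0e-01 2.0e-01]"],
    "t-2embedding1a": ["[3.6e-01 4.0e-01]",   "[5.0e-01 6.0e-01]"],
})

cols = ["t-1embedding1a", "t-2embedding1a"]
for c in cols:
    # strip the brackets and parse the whitespace-separated floats
    df[c] = df[c].apply(lambda s: [float(v) for v in s.strip("[]").split()])

# Row-wise concatenation: one flat vector per sample
merged = np.array([np.concatenate([row[c] for c in cols])
                   for _, row in df.iterrows()])
print(merged.shape)
# torch.tensor(merged) would now accept this float array directly,
# and torch.cat gives the same result from per-column tensors
```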
Department = input("what dept")
editfile = pd.read_csv('52.csv', encoding='Latin-1')
editfilevalues= editfile.loc[editfile['Customer'].str.contains(Department, na=False), 'May-18\nQty']
editfilevalues = editfilevalues.fillna(int(0))
print(int(editfilevalues) *1.3)
I have looked through Stack Overflow and no answer seems to help me with this problem. I simply want to be able to manipulate data in a series like this, but I get different errors; with this current code I receive this:
"{0}".format(str(converter))) TypeError: cannot convert the series to <class 'int'>
My main issue is converting a series to an int type. I have tried several different ways to do this and none are giving me the results I need.
So a pandas Series is a bit like a list, but with different functions and properties. You can't convert the Series to int using int() because the function wasn't designed to work on list-like objects in that way.
If you need to convert the Series to all integers, this method will work.
int_series = your_series.astype(int)
This will convert your entire series to a fixed-width integer type (typically 'int64', or 'int32' on Windows). Below is a bonus if you want it in a numpy array.
int_array = your_series.values.astype(int)
From here you have a few options to do your calculation.
# where x is a value in your series and lambda is a nameless function
calculated_series = int_series.apply(lambda x: some_number*x)
The output will be another Series object with your rows calculated. Bonus using numpy array below.
calculated_array = int_array * some_number
Edit to show everything at once.
# for series
int_series = your_series.astype(int)
calculated_series = int_series.apply(lambda x: x * some_number)
# for np.array
int_array = your_series.values.astype(int)
calculated_array = int_array * some_number
Either will work, and it is ultimately up to what kind of data structure you want at the end of it all.
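For instance, with a hypothetical quantity column like the one produced by the .loc lookup above:

```python
import pandas as pd

# Hypothetical quantity values, as .loc might return them (floats with NaN filled)
your_series = pd.Series([10.0, 20.0, 0.0])
some_number = 1.3  # the asker's multiplier

int_series = your_series.astype(int)                       # whole series to int
calculated_series = int_series.apply(lambda x: x * some_number)

print(calculated_series.tolist())
```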
I am currently working with a data set that contains a lot of 'INF' values that are messing up my calculations. I've tried to remove these values, but I have been unable to find a way to do this with a 2D array. Some of my code is below:
def date2str(date_str):
date = datetime.strptime(date_str, '%Y-%m-%d %H:%M:%S')
return date.toordinal()
def readfiles(file_list):
data = []
for fname in file_list:
data.append(
np.loadtxt(fname,
usecols=(0,18),
comments='#', # skip comment lines
delimiter='\t',
converters = { 0 : date2str },
dtype=None))
return data
data = readfiles(['soundTransit1_remote_rawMeasurements_15m.txt'])
print data
data = data[np.isfinite(data)]
np.set_printoptions(threshold='nan')
column_0 = np.array(data)[0][:,0]
column_1 = np.array(data)[0][:,1]
thermTemp1_degC = 1/(1.401E-3 + 2.377E-4*np.log(column_1) + 9.730E-8*np.log(column_1)**3)-273.15
I have read in two of the columns (one containing data and the other containing the associated date and time) and separated them so I could manipulate the data (on the last line of my code). I need to be able to skip the lines of my data that contain 'INF'. The current solution I have, 'data = data[np.isfinite(data)]', is not working. I receive the error 'TypeError: only integer arrays with one element can be converted to an index'. Can anyone give me some guidance?
Your function readfiles returns a python list, not a numpy array. Python lists can not be indexed with one of numpy's boolean arrays, which is why you get the error with data[np.isfinite(data)]; data is a python list, but np.isfinite(data) is a numpy array of boolean values.
Either return a numpy array from readfiles with something like
return np.array(data)
or convert the result to an array before you try to use numpy's boolean indexing:
data = np.array(data)
data = data[np.isfinite(data)]
You might have to massage the list a bit to get an array with the desired shape.
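One wrinkle for 2-D data: boolean masking with np.isfinite flattens the array, so to drop whole rows containing inf you want a row-wise mask instead. A minimal sketch with made-up measurements:

```python
import numpy as np

# Hypothetical 2-column array (ordinal date, raw measurement) with one inf row
data = np.array([[735000.0, 3200.0],
                 [735001.0, np.inf],
                 [735002.0, 3150.0]])

# np.isfinite(data) is elementwise; .all(axis=1) keeps only rows where
# every entry is finite, preserving the two-column shape
clean = data[np.isfinite(data).all(axis=1)]
print(clean.shape)
```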
I would like to combine an array full of floats with an array full of strings. Is there a way to do this?
(I am also having trouble rounding my floats; insert is changing them to scientific notation. I am unable to reproduce this with a small example.)
A=np.array([[1/3,257/35],[3,4],[5,6]],dtype=float)
B=np.array([7,8,9],dtype=float)
C=np.insert(A,A.shape[1],B,axis=1)
print(np.around(B,decimals=2))
D=np.array(['name1','name2','name3'])
How do I append D onto the end of C in the same way that I appended B onto A (insert D as the last column of C)?
I suspect that there is a type issue between having strings and floats in the same array. It would also answer my questions if there were a way to change a float (or maybe a scientific number, my numbers are displayed as '5.02512563e-02') to a string with about 4 digits (.0502).
I believe concatenate will not work, because the array dimensions are (3, 3) and (3,). D is a 1-D array, and D.T is no different from D. Also, when I plug this in I get "ValueError: all the input arrays must have same number of dimensions."
I don't care about accuracy loss due to appending, as this is the last step before I print.
Use dtype=object in your numpy array, like below:
np.array([1, 'a'], dtype=object)
Try making D a numpy array first, then transposing and concatenating with C:
D=np.array([['name1','name2','name3']])
np.concatenate((C, D.T), axis=1)
See the documentation for concatenate for explanation and examples:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.concatenate.html
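Combining this with the rounding wish from the question, here is a sketch that formats the floats to 4 decimal places (an assumed display choice, via printf-style '%.4f') before concatenating the name column:

```python
import numpy as np

A = np.array([[1/3, 257/35], [3, 4], [5, 6]], dtype=float)
B = np.array([7, 8, 9], dtype=float)
C = np.insert(A, A.shape[1], B, axis=1)

D = np.array([['name1', 'name2', 'name3']])

# Round first, then format every float as a fixed 4-decimal string so the
# dtypes match before concatenating the name column on the right
C_str = np.char.mod('%.4f', np.round(C, 4))
combined = np.concatenate((C_str, D.T), axis=1)
print(combined[0])
```

This avoids scientific notation entirely, since '%.4f' always produces fixed-point strings.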
numpy arrays support only one data type per array. Changing the floats to str is not a good idea, as it will only give you values close to the original ones.
Try using pandas; it supports a different data type for each column.
import numpy as np
import pandas as pd
np_ar1 = np.array([1.3, 1.4, 1.5])
np_ar2 = np.array(['name1', 'name2', 'name3'])
df1 = pd.DataFrame({'ar1':np_ar1})
df2 = pd.DataFrame({'ar2':np_ar2})
pd.concat([df1.ar1, df2.ar2], axis=0)
I'm trying to add column names to a numpy ndarray, then select columns by their names. But it doesn't work. I can't tell if the problem occurs when I add the names, or later when I try to call them.
Here's my code.
data = np.genfromtxt(csv_file, delimiter=',', dtype=np.float, skip_header=1)
#Add headers
csv_names = [ s.strip('"') for s in file(csv_file,'r').readline().strip().split(',')]
data = data.astype(np.dtype( [(n, 'float64') for n in csv_names] ))
Dimension-based diagnostics match what I expect:
print len(csv_names)
>> 108
print data.shape
>> (1652, 108)
"print data.dtype.names" also returns the expected output.
But when I start calling columns by their field names, screwy things happen. The "column" is still an array with 108 columns...
print data["EDUC"].shape
>> (1652, 108)
... and it appears to contain more missing values than there are rows in the data set.
print np.sum(np.isnan(data["EDUC"]))
>> 27976
Any idea what's going wrong here? Adding headers should be a trivial operation, but I've been fighting this bug for hours. Help!
The problem is that you are thinking in terms of spreadsheet-like arrays, whereas NumPy does use different concepts.
Here is what you must know about NumPy:
NumPy arrays only contain elements of a single type.
If you need spreadsheet-like "columns", this type must be some tuple-like type. Such arrays are called Structured Arrays, because their elements are structures (i.e. tuples).
In your case, NumPy would thus take your 2-dimensional regular array and produce a one-dimensional array whose type is a 108-element tuple (the spreadsheet array that you are thinking of is 2-dimensional).
These choices were probably made for efficiency reasons: all the elements of an array have the same type and therefore have the same size: they can be accessed, at a low-level, very simply and quickly.
Now, as user545424 showed, there is a simple NumPy answer to what you want to do (genfromtxt() accepts a names argument with column names).
If you want to convert your array from a regular NumPy ndarray to a structured array, you can do:
data.view(dtype=[(n, 'float64') for n in csv_names]).reshape(len(data))
(you were close: you used astype() instead of view()).
You can also check the answers to quite a few Stackoverflow questions, including Converting a 2D numpy array to a structured array and how to convert regular numpy array to record array?.
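A minimal sketch of the view() approach, with a small stand-in array and hypothetical column names:

```python
import numpy as np

# Small stand-in for the CSV data: 4 rows, 3 named columns
csv_names = ['AGE', 'EDUC', 'INCOME']
data = np.arange(12, dtype='float64').reshape(4, 3)

# view() reinterprets each 3-float row as one 3-field record; astype()
# would instead convert every individual element, which is what produced
# the (1652, 108)-shaped "columns" above
structured = data.view(dtype=[(n, 'float64') for n in csv_names]).reshape(len(data))

print(structured['EDUC'])  # a true 1-D column, one value per row
```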
Unfortunately, I don't know what is going on when you try to add the field names, but I do know that you can build the array you want directly from the file via
data = np.genfromtxt(csv_file, delimiter=',', names=True)
EDIT:
It seems like adding field names only works when the input is a list of tuples:
data = np.array(map(tuple,data), [(n, 'float64') for n in csv_names])