DataFrame.drop leads to ufunc loop error with numpy.sin - python

Introduction
My code is supposed to import data from .xlsx files and run calculations on it. The problem is that the unit of each column is saved in the second row of the sheet and is imported as the first entry of the data column, resulting in something like this:
import pandas as pd
import numpy as np
data = pd.DataFrame(data = {'alpha' : ['[°]', 180, 180, 180]})
data['sin'] = np.sin(data['alpha'])
Problem
Because the first cell is of str type, the column gets the object dtype. I thought I could solve this by rearranging the DataFrame, adding the following lines between the two statements above:
data = data.drop([0]).reset_index(drop = True)
data.astype({'alpha' : 'float64'})
The DataFrame now looks the way I want it to, and I suppose it should work as intended, but instead I get an AttributeError and a TypeError:
AttributeError: 'float' object has no attribute 'sin'
TypeError: loop of ufunc does not support argument 0 of type float which has no callable sin method
Any insight on why I get these errors and how to solve them would be appreciated!

You can use pandas' conversion function pd.to_numeric, which with errors='coerce' turns the unit string into NaN instead of failing:
data = pd.DataFrame(data = {'alpha' : ['[°]', 180, 180, 180]})
data['alpha'] = pd.to_numeric(data['alpha'], errors='coerce')
# is your alpha degrees or radians?
data['sin'] = np.sin(np.deg2rad(data['alpha']))
Output:
   alpha           sin
0    NaN           NaN
1  180.0  1.224647e-16
2  180.0  1.224647e-16
3  180.0  1.224647e-16
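It is also worth noting why the astype attempt in the question did not help: DataFrame.astype returns a new object rather than converting in place, so the unassigned call left the column as object dtype and np.sin still received an object array. A minimal sketch of the difference:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'alpha': ['[°]', 180, 180, 180]})
data = data.drop([0]).reset_index(drop=True)

data.astype({'alpha': 'float64'})           # returns a NEW DataFrame; result is discarded
print(data['alpha'].dtype)                  # still object -> np.sin would still fail

data = data.astype({'alpha': 'float64'})    # reassign to actually keep the conversion
print(data['alpha'].dtype)                  # float64
data['sin'] = np.sin(np.deg2rad(data['alpha']))  # now works
```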

Related

Cannot resolve stack() due to type mismatch

I have a pyspark code that looks like this:
from pyspark.sql.functions import expr
unpivotExpr = """stack(14, 'UeEnd', UeEnd,
                           'Encedreco', Endereco,
                           'UeSitFun', UeSitFun,
                           'SitacaoEscola', SituacaoEscola,
                           'Creche', Creche,
                           'PreEscola', PreEscola,
                           'FundAnosIniciais', FundAnosIniciais,
                           'FundAnosFinais', FundAnosFinais,
                           'EnsinoMedio', EnsinoMedio,
                           'Profissionalizante', Profissionalizante,
                           'EJA', EJA,
                           'EdEspecial', EdEspecial,
                           'Conveniada', Conveniada,
                           'TipoAtoCriacao', TipoAtoCriacao)
                  as (atributo, valor)"""
unpivotDf = df.select("Id", expr(unpivotExpr))
When I run it I get this Error:
cannot resolve 'stack(14, 'UeEnd', `UeEnd`, 'Encedreco', `Endereco`, 'UeSitFun', `UeSitFun`,
'SitacaoEscola', `SituacaoEscola`, 'Creche', `Creche`, 'PreEscola', `PreEscola`,
'FundAnosIniciais', `FundAnosIniciais`, 'FundAnosFinais', `FundAnosFinais`, 'EnsinoMedio',
`EnsinoMedio`, 'Profissionalizante', `Profissionalizante`, 'EJA', `EJA`, 'EdEspecial',
`EdEspecial`, 'Conveniada', `Conveniada`, 'TipoAtoCriacao', `TipoAtoCriacao`)'
due to data type mismatch: Argument 2 (string) != Argument 6 (bigint); line 1 pos 0;
What might be causing this problem?
When you unpivot a group of columns, all of their values end up in a single column. Because of that, you should first make sure every column you are trying to unpivot has the same data type; otherwise you would get a column mixing different types across rows. The error message points at the mismatch directly: Argument 2 (string) != Argument 6 (bigint) means at least one of the fourteen columns is a bigint while the others are strings, so cast the columns to a common type before calling stack.
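One way to apply that advice without writing out fourteen casts by hand is to build the stack() expression programmatically, casting every column to a common type inside the SQL string. The snippet below only constructs the string (column names taken from the question); whether string is the right common type depends on your data, so treat it as a sketch:

```python
# Build a stack() expression that casts each column to string, so all
# unpivoted values share a single, consistent type.
cols = ["UeEnd", "Endereco", "UeSitFun", "SituacaoEscola", "Creche",
        "PreEscola", "FundAnosIniciais", "FundAnosFinais", "EnsinoMedio",
        "Profissionalizante", "EJA", "EdEspecial", "Conveniada",
        "TipoAtoCriacao"]
pairs = ", ".join(f"'{c}', cast({c} as string)" for c in cols)
unpivotExpr = f"stack({len(cols)}, {pairs}) as (atributo, valor)"
# unpivotDf = df.select("Id", expr(unpivotExpr))  # as in the question
```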

np.log in a for loop, I always get TypeError: 'numpy.float64' object is not callable

I have a really simple dataset with just one column, and I would like to loop over each row of the dataframe so that for each row it calculates the log of current_close_price / first_row_close_price. Whatever I do, it says:
TypeError: 'numpy.float64' object is not callable
import pandas as pd
import numpy as np
price.head()
                 Close
Date
2010-07-19  107.290001
2010-07-20  108.480003
2010-07-21  107.070000
2010-07-22  109.459999
2010-07-23  110.410004
for index, row in price.iterrows():
    first_row_price = price.iloc[0, 0]
    current_price = price.iloc[index, 0]
    log_rt = np.log(current_price / reference_price)
Consider the table in an a.csv file with the two columns Date and Close, and write first_row_price instead of the undefined reference_price in your code:
with open("a.csv", 'r') as a:
    price = pd.read_csv(a, usecols=[1])  # reads only the 'Close' column

for index, row in price.iterrows():
    first_row_price = price.iloc[0, 0]
    current_price = price.iloc[index, 0]
    log_rt = np.log(current_price / first_row_price)
    print(log_rt)
This code prints:
0.0
0.011030393877764241
-0.002052631799009411
0.020023718610826604
0.02866528771045947
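As a side note, the loop is not needed at all: NumPy ufuncs operate on whole columns, so the log returns can be computed in one vectorized expression. A sketch using the prices from the question:

```python
import numpy as np
import pandas as pd

price = pd.DataFrame({'Close': [107.290001, 108.480003, 107.070000,
                                109.459999, 110.410004]})

# log of each price relative to the first row, computed for every row at once
log_rt = np.log(price['Close'] / price['Close'].iloc[0])
print(log_rt)
```

This produces the same series as the loop (0.0 for the first row, then the cumulative log returns), without any explicit iteration.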

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced

I am trying to convert a CSV into a numpy array. In the numpy array, I am replacing a few elements with NaN. Then I want to find the indices of the NaN elements in the array. The code is:
import pandas as pd
import matplotlib.pyplot as plyt
import numpy as np
filename = 'wether.csv'
df = pd.read_csv(filename, header=None)
list = df.values.tolist()
labels = list[0]
wether_list = list[1:]
year = []
month = []
day = []
max_temp = []
for i in wether_list:
    year.append(i[1])
    month.append(i[2])
    day.append(i[3])
    max_temp.append(i[5])
mid = len(max_temp) // 2
temps = np.array(max_temp[mid:])
temps[np.where(np.array(temps) == -99.9)] = np.nan
plyt.plot(temps,marker = '.',color = 'black',linestyle = 'none')
# plyt.show()
print(np.where(np.isnan(temps))[0])
# print(len(pd.isnull(np.array(temps))))
When I execute this, I am getting a warning and an error. The warning is :
wether.py:26: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
temps[np.where(np.array(temps) == -99.9)] = np.nan
The error is :
Traceback (most recent call last):
File "wether.py", line 30, in <module>
print(np.where(np.isnan(temps))[0])
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
This is a part of the dataset which I am using:
83168,2014,9,7,0.00000,89.00000,78.00000, 83.50000
83168,2014,9,22,1.62000,90.00000,72.00000, 81.00000
83168,2014,9,23,0.50000,87.00000,74.00000, 80.50000
83168,2014,9,24,0.35000,82.00000,73.00000, 77.50000
83168,2014,9,25,0.60000,85.00000,75.00000, 80.00000
83168,2014,9,26,0.76000,89.00000,77.00000, 83.00000
83168,2014,9,27,0.00000,89.00000,79.00000, 84.00000
83168,2014,9,28,0.00000,90.00000,81.00000, 85.50000
83168,2014,9,29,0.00000,90.00000,79.00000, 84.50000
83168,2014,9,30,0.50000,89.00000,75.00000, 82.00000
83168,2014,10,1,0.02000,91.00000,75.00000, 83.00000
83168,2014,10,2,0.03000,93.00000,77.00000, 85.00000
83168,2014,10,3,1.40000,93.00000,75.00000, 84.00000
83168,2014,10,4,0.06000,89.00000,75.00000, 82.00000
83168,2014,10,5,0.22000,91.00000,68.00000, 79.50000
83168,2014,10,6,0.00000,84.00000,68.00000, 76.00000
83168,2014,10,7,0.17000,85.00000,73.00000, 79.00000
83168,2014,10,8,0.06000,84.00000,73.00000, 78.50000
83168,2014,10,9,0.00000,87.00000,73.00000, 80.00000
83168,2014,10,10,0.00000,88.00000,80.00000, 84.00000
83168,2014,10,11,0.00000,87.00000,80.00000, 83.50000
83168,2014,10,12,0.00000,88.00000,80.00000, 84.00000
83168,2014,10,13,0.00000,88.00000,81.00000, 84.50000
83168,2014,10,14,0.04000,88.00000,77.00000, 82.50000
83168,2014,10,15,0.00000,88.00000,77.00000, 82.50000
83168,2014,10,16,0.09000,89.00000,72.00000, 80.50000
83168,2014,10,17,0.00000,85.00000,67.00000, 76.00000
83168,2014,10,18,0.00000,84.00000,65.00000, 74.50000
83168,2014,10,19,0.00000,84.00000,65.00000, 74.50000
83168,2014,10,20,0.00000,85.00000,69.00000, 77.00000
83168,2014,10,21,0.77000,87.00000,76.00000, 81.50000
83168,2014,10,22,0.69000,81.00000,71.00000, 76.00000
83168,2014,10,23,0.31000,82.00000,72.00000, 77.00000
83168,2014,10,24,0.71000,79.00000,73.00000, 76.00000
83168,2014,10,25,0.00000,81.00000,68.00000, 74.50000
83168,2014,10,26,0.00000,82.00000,67.00000, 74.50000
83168,2014,10,27,0.00000,83.00000,64.00000, 73.50000
83168,2014,10,28,0.00000,83.00000,66.00000, 74.50000
83168,2014,10,29,0.03000,86.00000,76.00000, 81.00000
83168,2014,10,30,0.00000,85.00000,69.00000, 77.00000
83168,2014,10,31,0.00000,85.00000,69.00000, 77.00000
83168,2014,11,1,0.00000,86.00000,59.00000, 72.50000
83168,2014,11,2,0.00000,77.00000,52.00000, 64.50000
83168,2014,11,3,0.00000,70.00000,52.00000, 61.00000
83168,2014,11,4,0.00000,77.00000,59.00000, 68.00000
83168,2014,11,5,0.02000,79.00000,73.00000, 76.00000
83168,2014,11,6,0.02000,82.00000,75.00000, 78.50000
83168,2014,11,7,0.00000,83.00000,66.00000, 74.50000
83168,2014,11,8,0.00000,84.00000,65.00000, 74.50000
83168,2014,11,9,0.00000,84.00000,65.00000, 74.50000
83168,2014,11,10,1.20000,72.00000,65.00000, 68.50000
83168,2014,11,11,0.08000,77.00000,61.00000, 69.00000
83168,2014,11,12,0.00000,80.00000,61.00000, 70.50000
83168,2014,11,13,0.00000,83.00000,63.00000, 73.00000
83168,2014,11,14,0.00000,83.00000,65.00000, 74.00000
83168,2014,11,15,0.00000,82.00000,64.00000, 73.00000
83168,2014,11,16,0.00000,83.00000,64.00000, 73.50000
83168,2014,11,17,0.07000,84.00000,64.00000, 74.00000
83168,2014,11,18,0.00000,86.00000,71.00000, 78.50000
83168,2014,11,19,0.57000,78.00000,55.00000, 66.50000
83168,2014,11,20,0.05000,72.00000,56.00000, 64.00000
83168,2014,11,21,0.05000,77.00000,63.00000, 70.00000
83168,2014,11,22,0.22000,77.00000,69.00000, 73.00000
83168,2014,11,23,0.06000,79.00000,76.00000, 77.50000
83168,2014,11,24,0.02000,84.00000,78.00000, 81.00000
83168,2014,11,25,0.00000,86.00000,78.00000, 82.00000
83168,2014,11,26,0.07000,85.00000,77.00000, 81.00000
83168,2014,11,27,0.21000,82.00000,55.00000, 68.50000
83168,2014,11,28,0.00000,73.00000,53.00000, 63.00000
83168,2015,1,8,0.00000,80.00000,57.00000,
83168,2015,1,9,0.05000,72.00000,56.00000,
83168,2015,1,10,0.00000,72.00000,57.00000,
83168,2015,1,11,0.00000,80.00000,57.00000,
83168,2015,1,12,0.05000,80.00000,59.00000,
83168,2015,1,13,0.85000,81.00000,69.00000,
83168,2015,1,14,0.05000,81.00000,68.00000,
83168,2015,1,15,0.00000,81.00000,64.00000,
83168,2015,1,16,0.00000,78.00000,63.00000,
83168,2015,1,17,0.00000,73.00000,55.00000,
83168,2015,1,18,0.00000,76.00000,55.00000,
83168,2015,1,19,0.00000,78.00000,55.00000,
83168,2015,1,20,0.00000,75.00000,56.00000,
83168,2015,1,21,0.02000,73.00000,65.00000,
83168,2015,1,22,0.00000,80.00000,64.00000,
83168,2015,1,23,0.00000,80.00000,71.00000,
83168,2015,1,24,0.00000,79.00000,72.00000,
83168,2015,1,25,0.00000,79.00000,49.00000,
83168,2015,1,26,0.00000,79.00000,49.00000,
83168,2015,1,27,0.10000,75.00000,53.00000,
83168,2015,1,28,0.00000,68.00000,53.00000,
83168,2015,1,29,0.00000,69.00000,53.00000,
83168,2015,1,30,0.00000,72.00000,60.00000,
83168,2015,1,31,0.00000,76.00000,58.00000,
83168,2015,2,1,0.00000,76.00000,58.00000,
83168,2015,2,2,0.05000,77.00000,58.00000,
83168,2015,2,3,0.00000,84.00000,56.00000,
83168,2015,2,4,0.00000,76.00000,56.00000,
I am unable to rectify the error. How can I overcome the warning on line 26, and how can this error be solved?
Update :
When I try the same thing a different way, reading the dataset from the file directly instead of going through a DataFrame, I do not get the error. What would be the reason for that? The code is:
weather_filename = 'wether.csv'
weather_file = open(weather_filename)
weather_data = weather_file.read()
weather_file.close()
# Break the weather records into lines
lines = weather_data.split('\n')
labels = lines[0]
values = lines[1:]
n_values = len(values)
# Break the list of comma-separated value strings
# into lists of values.
year = []
month = []
day = []
max_temp = []
j_year = 1
j_month = 2
j_day = 3
j_max_temp = 5
for i_row in range(n_values):
    split_values = values[i_row].split(',')
    if len(split_values) >= j_max_temp:
        year.append(int(split_values[j_year]))
        month.append(int(split_values[j_month]))
        day.append(int(split_values[j_day]))
        max_temp.append(float(split_values[j_max_temp]))
# Isolate the recent data.
i_mid = len(max_temp) // 2
temps = np.array(max_temp[i_mid:])
year = year[i_mid:]
month = month[i_mid:]
day = day[i_mid:]
temps[np.where(temps == -99.9)] = np.nan
# Remove all the nans.
# Trim both ends and fill nans in the middle.
# Find the first non-nan.
i_start = np.where(np.logical_not(np.isnan(temps)))[0][0]
temps = temps[i_start:]
year = year[i_start:]
month = month[i_start:]
day = day[i_start:]
i_nans = np.where(np.isnan(temps))[0]
print(i_nans)
What is wrong in the first code and why the second doesn't even give a warning?
Posting as it might help future users.
As correctly pointed out by others, np.isnan won't work for object or string dtypes. If you're using pandas, as mentioned here you can directly use pd.isnull, which should work in your case.
import pandas as pd
import numpy as np
var1 = ''
var2 = np.nan
>>> type(var1)
<class 'str'>
>>> type(var2)
<class 'float'>
>>> pd.isnull(var1)
False
>>> pd.isnull(var2)
True
Try replacing np.isnan with pd.isna. pandas' isna also supports the category dtype.
What's the dtype of temps? I can reproduce your warning and error with a string dtype:
In [26]: temps = np.array([1,2,'string',0])
In [27]: temps
Out[27]: array(['1', '2', 'string', '0'], dtype='<U21')
In [28]: temps==-99.9
/usr/local/bin/ipython3:1: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
#!/usr/bin/python3
Out[28]: False
In [29]: np.isnan(temps)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-29-2ff7754ed926> in <module>()
----> 1 np.isnan(temps)
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
First, comparing strings with the number gives this future warning.
Second, testing for nan produces the error.
Note that given the dtype, the nan assignment assigns a string value, not a float (np.nan is a float).
In [30]: temps[-1] = np.nan
In [31]: temps
Out[31]: array(['1', '2', 'string', 'nan'], dtype='<U21')
np.isnan(ndarray) fails when the ndarray dtype is object or string.
np.isnan(ndarray.astype(float)) would work, but strings like 'string' cannot be coerced to float.
This is likely the result of an unwanted float-to-string conversion. To repair it, just reverse it with a string-to-float conversion (assuming the data is convertible to numbers) using float or np.float64:
np.isnan(float(str(np.nan)))
True
or
np.isnan(float(str("nan")))
True
rather than:
np.isnan(str(np.nan))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In [164], line 1
----> 1 np.isnan(str(np.nan))
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
Note that if your data is NOT convertible to numbers (floats), you need to use a string-compatible function such as pd.isna instead of np.isnan.
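Back in the question's setting, the fix can be as small as coercing the array to float before the sentinel replacement. A minimal sketch, assuming the values arrived as strings (which is what happens when the text header row forces every DataFrame column to object dtype):

```python
import numpy as np

# values read through an all-object DataFrame come out as strings
max_temp = ['83.5', '81.0', '-99.9', '80.5']

# temps = np.array(max_temp) would give a string dtype, so both the
# == -99.9 comparison and np.isnan would misbehave as in the question.
temps = np.array(max_temp, dtype=float)  # coerce to float first
temps[temps == -99.9] = np.nan           # sentinel replacement now works elementwise
print(np.where(np.isnan(temps))[0])      # [2]
```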
I came across this error when trying to transform my dataset using sklearn.preprocessing.OneHotEncoder. The error was thrown by _check_unknown function defined in sklearn.utils._encode.
This was caused by the fact that, at transform time, one of the columns to be transformed had a type float64 as opposed to object - in my case an entire column was NaN.
The solution was to cast the dataframe to object type before invoking transform:
ohe.transform(data.astype("O"))
Note: This answer is somewhat related to the title of the question because this error prompts when working with Decimal types.
I got the same error when considering Decimal type values. For some reason, one column of the dataframe I'm considering comes as decimal. For example, when calling .unique() on this column I got
[Decimal('0'), Decimal('95'), Decimal('38'), Decimal('25'),
Decimal('42'), Decimal('11'), Decimal('18'), Decimal('22'),
.....Decimal('220'), Decimal('724')]
The traceback showed that it failed when calling some NumPy function. I managed to reproduce the error using the min and max values of the above array:
from decimal import Decimal
xmin, xmax = Decimal('0'), Decimal('724')
np.isnan([xmin, xmax])
which raises the error:
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
The solution in this case was to cast all these values to int:
df.astype({col: int for col in desired_columns_to_convert})

Taking logarithm of column

I'm quite new to programming (in Python) and I would like to create a new variable that is the logarithm of a column (from an imported Excel file). I have tried different solutions from this site, but I keep getting an error. My latest error is AttributeError: 'str' object has no attribute 'log'.
I have already dropped all the values that are not numbers, but I still don't know how to convert the values from strings to numbers (int(neighborhood) doesn't work).
This is the code I have now:
import pandas as pd
import numpy as np
df=pd.read_excel("kwb-2016_del_col_del_row.xls")
df = df[df.m_woz != "."] # drop rows with values "."
neighborhood=df[df.recs=="Neighborhood"]
neighborhood=neighborhood["m_woz"]
print(neighborhood)
np.log(neighborhood)
and this is the error I'm getting:
AttributeError Traceback (most recent call last)
<ipython-input-66-46698de51811> in <module>()
12 print(neighborhood)
13
---> 14 np.log(neighborhood)
AttributeError: 'str' object has no attribute 'log'
Could someone help me please?
Perhaps you are not removing the data you think you are?
Try printing the data types to see what they are.
In a DataFrame, your column might be filled with objects instead of numbers.
print(df.dtypes)
Also, you might want to look at these two pages
Select row from a DataFrame based on the type of the object(i.e. str)
Pandas: convert dtype 'object' to int
Here's an example I constructed and ran interactively that correctly gets the logarithms (don't type >>>):
>>> raw_data = {'m_woz': [42, 'def', 1.23, 45.6, '.xyz'],
...             'recs': ['Neighborhood', 'Neighborhood',
...                      'unknown', 'Neighborhood', 'whatever']}
>>> df = pd.DataFrame(raw_data, columns = ['m_woz', 'recs'])
>>> print(df.dtypes)
m_woz object
recs object
dtype: object
Note that the type is object, not float or int or str
Continuing on, here is what df and neighborhood look like:
>>> df
m_woz recs
0 42 Neighborhood
1 def Neighborhood
2 1.23 unknown
3 45.6 Neighborhood
4 .xyz whatever
>>> neighborhood=df[df.recs=="Neighborhood"]
>>> neighborhood
m_woz recs
0 42 Neighborhood
1 def Neighborhood
3 45.6 Neighborhood
And here are the tricks...
This line selects all rows in neighborhood whose m_woz value is an int or float (be careful to fix the indentation if you copy/paste this):
>>> df_num_strings = neighborhood[neighborhood['m_woz'].
...                               apply(lambda x: type(x) in (int, float))]
>>> df_num_strings
m_woz recs
0 42 Neighborhood
3 45.6 Neighborhood
Almost there... convert the numbers to floating point from string
>>> df_float = df_num_strings['m_woz'].astype(str).astype(float)
>>> df_float
0 42.0
3 45.6
Finally, compute logarithms:
>>> np.log(df_float)
0 3.737670
3 3.819908
Name: m_woz, dtype: float64
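A shorter route to the same result is pd.to_numeric with errors='coerce', which turns the non-numeric entries into NaN so they can be dropped before taking the log. A sketch on the same toy data:

```python
import numpy as np
import pandas as pd

raw_data = {'m_woz': [42, 'def', 1.23, 45.6, '.xyz'],
            'recs': ['Neighborhood', 'Neighborhood',
                     'unknown', 'Neighborhood', 'whatever']}
df = pd.DataFrame(raw_data)

neighborhood = df.loc[df.recs == 'Neighborhood', 'm_woz']
numeric = pd.to_numeric(neighborhood, errors='coerce').dropna()  # 'def' -> NaN, dropped
log_values = np.log(numeric)
print(log_values)  # 3.737670 and 3.819908, matching the answer above
```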

Cannot compare type 'Timestamp' with type 'int'

When running the following code:
for row, hit in hits.iterrows():
    forwardRows = data[data.index.values > row]
I get this error:
TypeError: Cannot compare type 'Timestamp' with type 'int'
If I look into what is being compared here I have these variables:
type(row)
pandas.tslib.Timestamp
row
Timestamp('2015-09-01 09:30:00')
is being compared with:
type(data.index.values[0])
numpy.datetime64
data.index.values[0]
numpy.datetime64('2015-09-01T10:30:00.000000000+0100')
I would like to understand whether this is something that can be easily fixed, or should I upload a subset of my data? Thanks!
Although this isn't a direct answer to your question, I have a feeling that this is what you're looking for: pandas.DataFrame.truncate
You could use it as follows:
for row, hit in hits.iterrows():
    forwardRows = data.truncate(before=row)
Here's a little toy example of how you might use it in general:
import pandas as pd
import numpy as np

# let's create some data to play with
df = pd.DataFrame(
    index=pd.date_range(start='2016-01-01', end='2016-06-01', freq='M'),
    columns=['x'],
    data=np.random.random(5)
)
# example: truncate rows before Mar 1
df.truncate(before='2016-03-01')
# example: truncate rows after Mar 1
df.truncate(after='2016-03-01')
Using .values drops you into the NumPy world, where the index entries are raw numpy.datetime64 values that (in this pandas version) cannot be compared with a pandas Timestamp. Stay in pandas and compare against the index directly:
for row, hit in hits.iterrows():
    forwardRows = data[data.index > row]
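To see the difference concretely: a pandas DatetimeIndex knows how to compare itself with a Timestamp, so the pure-pandas comparison works. A small self-contained sketch (made-up timestamps, standing in for the question's data):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    {'x': np.arange(4)},
    index=pd.date_range('2015-09-01 09:00', periods=4, freq='h'),
)

row = pd.Timestamp('2015-09-01 10:00')
forwardRows = data[data.index > row]  # stays in pandas: Timestamp vs DatetimeIndex
print(forwardRows)                    # the 11:00 and 12:00 rows
```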
