Converting RGB to xyY with colormath - python

With colormath I can convert an RGB value to an xyY value. It works fine for one RGB value, but I can't find the right code to do the conversion for multiple RGB values imported from an Excel file. I use the following code:
from colormath.color_objects import sRGBColor, xyYColor
from colormath.color_conversions import convert_color
import pandas as pd
data = pd.read_excel(r'C:/Users/User/Desktop/Color/Fontane/RGB/FontaneHuco.xlsx')
df = pd.DataFrame(data, columns=['R', 'G', 'B'])
#print(df)
rgb = sRGBColor(df['R'],df['G'],df['B'], is_upscaled=True)
xyz = convert_color(rgb, xyYColor)
print(xyz)
But when I run this code I receive the following error:
Traceback (most recent call last):
File "C:\Users\User\PycharmProjects\pythonProject4\Overige\Chroma.py", line 9, in <module>
lab = sRGBColor(df['R'], df['G'], df['B'])
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\colormath\color_objects.py", line 524, in __init__
self.rgb_r = float(rgb_r)
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\series.py", line 141, in wrapper
raise TypeError(f"cannot convert the series to {converter}")
TypeError: cannot convert the series to <class 'float'>
Does anyone have an idea how to fix this problem?

convert_color is expecting floats and you're giving it DataFrame columns instead. You need to apply the conversion one row at a time, which can be done as follows:
xyz = df.apply(
    lambda row: convert_color(
        sRGBColor(row.R, row.G, row.B, is_upscaled=True), xyYColor
    ),
    axis=1,
)
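For larger tables, the same conversion can also be vectorized with NumPy instead of calling colormath once per row. The following is a sketch of the standard sRGB math (IEC 61966-2-1 matrix, D65 white point), not colormath's own code path, and it assumes 0-255 inputs with no all-black rows (a pure black row would divide by zero):

```python
import numpy as np
import pandas as pd

def rgb_to_xyY(df):
    """Vectorized sRGB (0-255) -> xyY, D65 white point."""
    rgb = df[['R', 'G', 'B']].to_numpy(dtype=float) / 255.0
    # Undo the sRGB gamma to get linear light
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    # Standard linear-sRGB -> XYZ matrix
    M = np.array([[0.4124, 0.3576, 0.1805],
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
    XYZ = linear @ M.T
    s = XYZ.sum(axis=1)  # X + Y + Z, zero only for pure black
    return pd.DataFrame({'x': XYZ[:, 0] / s,
                         'y': XYZ[:, 1] / s,
                         'Y': XYZ[:, 1]}, index=df.index)

df = pd.DataFrame({'R': [255, 30], 'G': [255, 144], 'B': [255, 255]})
print(rgb_to_xyY(df))
```

White (255, 255, 255) should come out near the D65 chromaticity, x ≈ 0.3127 and y ≈ 0.3290.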

Related

Python3 Pandas - handle overflow when casting to number greater than data type int64

I am writing a standard script where I will fetch the data from database, do some manipulation and insert data back into another table.
I am facing an overflow issue while converting a column's type in Dataframe.
Here's an example:
import numpy as np
import pandas as pd
d = {'col1': ['66666666666666666666666666666']}
df = pd.DataFrame(data=d)
df['col1'] = df['col1'].astype('int64')
print(df)
Error:
Traceback (most recent call last):
File "HelloWorld.py", line 6, in <module>
df['col1'] = df['col1'].astype('int64')
File "/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py", line 5548, in astype
new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors,)
File "/usr/local/lib/python3.6/dist-packages/pandas/core/internals/managers.py", line 604, in astype
return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
File "/usr/local/lib/python3.6/dist-packages/pandas/core/internals/managers.py", line 409, in apply
applied = getattr(b, f)(**kwargs)
File "/usr/local/lib/python3.6/dist-packages/pandas/core/internals/blocks.py", line 595, in astype
values = astype_nansafe(vals1d, dtype, copy=True)
File "/usr/local/lib/python3.6/dist-packages/pandas/core/dtypes/cast.py", line 974, in astype_nansafe
return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
File "pandas/_libs/lib.pyx", line 615, in pandas._libs.lib.astype_intsafe
OverflowError: Python int too large to convert to C long
I cannot control the values inside d['col1'] because in the actual code it is being generated by another function.
How can I solve this problem if I want to keep the final data type as 'int64'?
I was thinking of catching the exception and then assigning the largest int64 value to the whole column, but then the rows of the column that are not overflowing would also be changed, which could lead to inconsistent results.
Can you advise me on some elegant solutions here?
Building on your idea, you can use np.iinfo to get the int64 limits and clip to them:
ii64 = np.iinfo(np.int64)
df['col1'] = df['col1'].astype('float128').clip(ii64.min, ii64.max).astype('int64')
print(df)
# Output
col1
0 9223372036854775807
Take care of the limit of float128 too :-D
>>> np.finfo(np.float128)
finfo(resolution=1e-18, min=-1.189731495357231765e+4932, max=1.189731495357231765e+4932, dtype=float128)
>>> np.iinfo('int64')
iinfo(min=-9223372036854775808, max=9223372036854775807, dtype=int64)
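If float128 is not available on your platform (NumPy does not ship it on Windows, for instance), a portable alternative is to clip with Python's arbitrary-precision integers before the lossy cast. A sketch with made-up data:

```python
import numpy as np
import pandas as pd

ii64 = np.iinfo(np.int64)
d = {'col1': ['66666666666666666666666666666', '123']}
df = pd.DataFrame(data=d)
# Python ints never overflow, so clamp first and cast second
df['col1'] = (df['col1']
              .map(lambda v: min(max(int(v), ii64.min), ii64.max))
              .astype('int64'))
print(df)
```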

python - Getting error while taking difference between two dates in columns

This is my code. I am trying to get the business days between two dates; the number of days is saved in a new column 'nd'.
import numpy as np
import pandas as pd

df1 = pd.DataFrame(pd.date_range('2020-01-01',periods=26,freq='D'),columns=['A'])
df2 = pd.DataFrame(pd.date_range('2020-02-01',periods=26,freq='D'),columns=['B'])
df = pd.concat([df1,df2],axis=1)
# Iterate over each row of the DataFrame
for index, row in df.iterrows():
    bc = np.busday_count(row['A'], row['B'])
    df['nd'] = bc
I am getting this error.
Traceback (most recent call last):
File "<input>", line 35, in <module>
File "<__array_function__ internals>", line 5, in busday_count
TypeError: Iterator operand 0 dtype could not be cast from dtype('<M8[us]') to dtype('<M8[D]') according to the rule 'safe'
Is there a way to fix it or another way to get the solution?
busday_count only accepts dates, not datetimes. Change
bc = np.busday_count(row['A'],row['B'])
to
np.busday_count(row['A'].date(), row['B'].date())
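Alternatively, skip the row loop entirely: cast both columns to day precision once and call busday_count on whole arrays. As a side effect this also fixes a second bug in the original loop, where df['nd'] = bc overwrote the entire column with the last scalar on every iteration:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(pd.date_range('2020-01-01', periods=26, freq='D'), columns=['A'])
df2 = pd.DataFrame(pd.date_range('2020-02-01', periods=26, freq='D'), columns=['B'])
df = pd.concat([df1, df2], axis=1)
# datetime64[D] satisfies busday_count's "dates, not datetimes" requirement
df['nd'] = np.busday_count(df['A'].to_numpy().astype('datetime64[D]'),
                           df['B'].to_numpy().astype('datetime64[D]'))
print(df.head())
```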

Pandas not recognizing the `.cat` command when changing column to categorical data

I have a data frame with six categorical columns that I would like to change to categorical codes. I used to use the following:
cat_columns = ['col1', 'col2', 'col3']
df[cat_columns] = df[cat_columns].astype('category')
df[cat_columns] = df[cat_columns].cat.codes
I'm on pandas 1.0.5.
I'm getting the following error:
Traceback (most recent call last):
File "<ipython-input-54-80cc82e5db1f>", line 1, in <module>
train_sample[non_loca_cat_columns].astype('category').cat.codes
File "C:\Users\JORDAN.HOWELL.GITDIR\AppData\Local\Continuum\anaconda3\envs\torch_env\lib\site-packages\pandas\core\generic.py", line 5274, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'cat'
I am not sure how to accomplish what I'm trying to do.
The .cat accessor is not applicable to a DataFrame, so you have to apply it to each column separately as a Series.
You can use .apply() and apply cat as a lambda function
df[cat_columns] = df[cat_columns].apply(lambda x: x.cat.codes)
Or loop through the columns and use the cat accessor on each:
for col in cat_columns:
    df[col] = df[col].cat.codes
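Put together, a minimal runnable sketch (the column names and values are made up):

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'b', 'a'],
                   'col2': ['x', 'y', 'y'],
                   'col3': ['low', 'high', 'low']})
cat_columns = ['col1', 'col2', 'col3']
df[cat_columns] = df[cat_columns].astype('category')
# .cat only exists on a Series, so map over the columns one at a time
df[cat_columns] = df[cat_columns].apply(lambda x: x.cat.codes)
print(df)
```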

Lifelines boolean index in Python did not match indexed array along dimension 0; dimension is 88 but corresponding boolean dimension is 76

This very simple piece of code,
# imports...
from lifelines import CoxPHFitter
import pandas as pd
src_file = "Pred.csv"
df = pd.read_csv(src_file, header=0, delimiter=',')
df = df.drop(columns=['score'])
cph = CoxPHFitter()
cph.fit(df, duration_col='Length', event_col='Status', show_progress=True)
produces an error:
Traceback (most recent call last):
File "C:/Users/.../predictor.py", line 11, in <module>
cph.fit(df, duration_col='Length', event_col='Status', show_progress=True)
File "C:\Users\...\AppData\Local\conda\conda\envs\hrpred\lib\site-packages\lifelines\fitters\coxph_fitter.py", line 298, in fit
self._check_values(df)
File "C:\Users\...\AppData\Local\conda\conda\envs\hrpred\lib\site-packages\lifelines\fitters\coxph_fitter.py", line 323, in _check_values
cols = str(list(X.columns[low_var]))
File "C:\Users\...\AppData\Local\conda\conda\envs\hrpred\lib\site-packages\pandas\core\indexes\base.py", line 1754, in __getitem__
result = getitem(key)
IndexError: boolean index did not match indexed array along dimension 0; dimension is 88 but corresponding boolean dimension is 76
However, when I print df itself, everything looks all right. As you can see, the traceback is entirely inside the library, and the library's examples work fine.
Without knowing what your data look like: I had the same error, and it was resolved when I removed all but the duration, event, and coefficient columns from the pandas df I was using. That is, I had a lot of extra columns in the df that were confusing the Cox PH fitter, since you don't actually specify which coefficients to include as an argument to cph.fit().
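A pandas-only sketch of that trimming step; the column names here are hypothetical, and you would pass the trimmed frame to cph.fit afterwards:

```python
import pandas as pd

# Hypothetical survival data: duration, event flag, one covariate,
# plus extra columns that would confuse the fitter
df = pd.DataFrame({'Length': [5, 12, 9],
                   'Status': [1, 0, 1],
                   'age': [61, 48, 70],
                   'patient_id': ['a1', 'b2', 'c3'],
                   'notes': ['', 'follow-up', '']})
covariates = ['age']  # the coefficients you actually want fitted
trimmed = df[['Length', 'Status'] + covariates]
# cph.fit(trimmed, duration_col='Length', event_col='Status', show_progress=True)
print(trimmed)
```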

pandas tseries conversion not working

I am new to Python and I am trying to build a time series through this. I am trying to convert this CSV data into a time series; based on internet and Stack Overflow research, the result should have
<class 'pandas.tseries.index.DatetimeIndex'>,
but my output is not a converted time series. Why is it not converting? How do I convert it? Thanks for the help in advance.
import pandas as pd
import numpy as np
import matplotlib.pylab as plt
data = pd.read_csv('somedata.csv')
print data.head()
#selecting specific columns by column name
df1 = data[['a','b']]
#converting the data to time series
dates = pd.date_range('2015-01-01', '2015-12-31', freq='H')
dates #preview
results:
DatetimeIndex(['2015-01-01 00:00:00', '2015-01-01 01:00:00',
...
'2015-12-31 23:00:00', '2015-12-31 00:00:00'],
dtype='datetime64[ns]', length=2161, freq='H')
The above works; however, I get the error below:
df1 = Series(df1[:,2], index=dates)
output:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'Series' is not defined
After attempting the pd.Series...
df1 = pd.Series(df1[:,2], index=dates)
Error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/someid/miniconda2/lib/python2.7/site- packages/pandas/core/frame.py", line 1992, in __getitem__
return self._getitem_column(key)
File "/home/someid/miniconda2/lib/python2.7/site- packages/pandas/core/frame.py", line 1999, in _getitem_column
return self._get_item_cache(key)
File "/home/someid/miniconda2/lib/python2.7/site- packages/pandas/core/generic.py", line 1343, in _get_item_cache
res = cache.get(item)
TypeError: unhashable type
You do need to use pd.Series. However, you were also doing a couple of other things wrong. I'm assuming you want all rows of the 2nd column of df1, returned as a pd.Series with an index of dates.
Solution
df1 = pd.Series(df1.iloc[:, 1].values, index=dates)
Explanation
df1.iloc returns a slice of df1 by row/column position.
[:, 1] gets all rows of the 2nd column.
.values extracts the underlying array; if you passed the Series itself together with index=dates, pandas would align it on the old integer index and fill the result with NaN.
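One pitfall to be aware of in that constructor call: when you pass a Series together with index=, pandas aligns the data on the Series' existing index, so a completely different index yields all NaN; extracting the underlying array first sidesteps the alignment. A small sketch:

```python
import pandas as pd

dates = pd.date_range('2015-01-01', periods=3, freq='D')
df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [10.0, 20.0, 30.0]})

# Aligns on df1's 0..2 integer index -> every date is "missing" -> all NaN
bad = pd.Series(df1.iloc[:, 1], index=dates)

# A raw ndarray has no index to align on, so the dates attach positionally
good = pd.Series(df1.iloc[:, 1].values, index=dates)
print(good)
```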
