I tried to write the following code, but I get the error message: "ValueError: Cannot shift with no freq."
I have no idea how to fix it. I googled the error message but couldn't find any case similar to mine.
df is a pandas DataFrame for which I want to create new columns showing the daily change. The code is shown below. How can I fix it to avoid the ValueError?
for column_names in df:
    df[column_names+'%-daily'] = df[column_names].pct_change(freq=1).fillna(0)
The problem was that I had dates as the index. Since only weekdays were present, the delta became incorrect. It worked once I changed freq to periods:
for column_names in list(df.columns.values):
    df[column_names+'%-daily'] = df[column_names].pct_change(periods=1).fillna(0)
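For anyone who wants to try the periods version in isolation, here is a minimal sketch. The column name price and the three sample dates (a Friday, then the following Monday and Tuesday) are made up for illustration; only the loop itself comes from the answer above.

```python
import pandas as pd

# Hypothetical sample data: a weekday-only index with a weekend gap.
idx = pd.to_datetime(["2021-01-01", "2021-01-04", "2021-01-05"])
df = pd.DataFrame({"price": [100.0, 110.0, 99.0]}, index=idx)

# periods=1 compares each row with the previous row, whatever the
# calendar distance between the two timestamps is.
for column_name in list(df.columns):
    df[column_name + "%-daily"] = df[column_name].pct_change(periods=1).fillna(0)

print(df)
```

Iterating over list(df.columns) takes a snapshot of the column names first, so the columns added inside the loop are not visited again.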
Related
I am very new to Python and cannot seem to solve this problem on my own. I have a dataset that I already converted to a pandas DataFrame with a DatetimeIndex in the format yyyy-mm-dd-HH-MM-SS, with one time stamp per minute. The attached figure shows the already interpolated DataFrame.
Now I want to convert the DatetimeIndex to week numbers so that I can plot HVAC Actual, Chiller power, etc. against their week number. The index was already set to the time, but I got an error saying that 'Time' was not found in the columns. I tried to set the index again as in the code below and then create a new column using dt.week:
building_interpolated = building_interpolated.set_index('Time')
building_interpolated['Week number'] = building_interpolated['Time'].dt.week
If I am correct, this should create a new column called Week number containing the week number. However, I still get an error saying that ['Time'] is not in the columns (see figure below).
Anyone who can help me?
Regards, nooby Boaz ;)
df.index = df.index.to_series().dt.isocalendar().week
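To spell out what seems to be going on in the question: set_index('Time') moves Time out of the columns and into the index, so building_interpolated['Time'] can no longer be found. A minimal sketch, assuming made-up sample values and assuming you want to keep the timestamps as the index and add the week as a separate column (the one-liner above replaces the index instead):

```python
import pandas as pd

# Hypothetical sample data: three Mondays, one week apart.
times = pd.date_range("2021-01-04", periods=3, freq="7D")
building_interpolated = pd.DataFrame({"Time": times, "HVAC Actual": [1.0, 2.0, 3.0]})

building_interpolated = building_interpolated.set_index("Time")

# After set_index, 'Time' lives in the index and is no longer a column.
time_is_column = "Time" in building_interpolated.columns

# Read the ISO week number straight from the DatetimeIndex.
building_interpolated["Week number"] = building_interpolated.index.isocalendar().week

print(building_interpolated)
```

As far as I know, isocalendar() needs pandas 1.1 or newer, and the older .dt.week accessor was deprecated in that release and removed in pandas 2.0.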
This is a two-part question. I would like to know more about pd.NaT, and why I got an error about a condition that did not exist. Using pd.NaT was the solution to my problem, but I do not understand the error I got or why the solution works.
First, I would like to say that this error has come up before, and I solved it by following one of the three questions linked below. Resetting the index is the usual solution here.
Concat DataFrame Reindexing only valid with uniquely valued Index objects
changing all dates to standard date time in dataframe
How to convert a given ordinal number (from Excel) to a date
None of these worked this time. I was able to solve my problem, but I could not find the solution online, nor an explanation of why it works.
My problem:
I needed to convert four object Series in my pandas data frame to datetime. I'll call them Date1 to Date4 for this question.
Date1 converted with no problem, but for the other three in the same data frame I got the error in the subject title. After two hours of research and finding nothing to solve my problem, I remembered that I had come across pd.NaT before when solving a past problem of filling in missing values.
To solve it I did the following:
df_both["Date2"] = df_both["Date2"].fillna(pd.NaT)
df_both["Date3"] = df_both["Date3"].fillna(pd.NaT)
df_both["Date4"] = df_both["Date4"].fillna(pd.NaT)
then a normal conversion:
df_both["Date2"] = pd.to_datetime(df_both["Date2"])
df_both["Date3"] = pd.to_datetime(df_both["Date3"])
df_both["Date4"] = pd.to_datetime(df_both["Date4"])
Can someone please explain to me how this works? Also, my index was unique when I tested it: df_both.index.is_unique returned True. So why did I get that specific error?
Side note: Date1 had missing values as well and did not need pd.NaT to make it work.
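For reference, the two steps described above can be sketched on a tiny example. The sample values are made up, and this does not reproduce the original error; it only shows what the fill and the conversion do to an object column that has a missing value.

```python
import pandas as pd

# Hypothetical sample data: an object column of date strings with one gap.
df_both = pd.DataFrame({"Date2": ["2021-01-05", None, "2021-03-17"]}, dtype=object)

# Step 1: replace the missing entry with pd.NaT, the datetime "missing" marker.
df_both["Date2"] = df_both["Date2"].fillna(pd.NaT)

# Step 2: convert the column; NaT entries stay NaT in the datetime column.
df_both["Date2"] = pd.to_datetime(df_both["Date2"])

print(df_both.dtypes)
print(df_both)
```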
I had a big table which I sliced into many smaller tables based on their dates:
dfs={}
for fecha in fechas:
    dfs[fecha]=df[df['date']==fecha].set_index('Hour')
#now I can access the tables like this:
dfs['2019-06-23'].head()
I have made some modifications to the dfs['2019-06-23'] table and now I would like to save it on my computer. I have tried to do this in two ways:
#first try:
dfs['2019-06-23'].to_csv('specific/path/file.csv')
#second try:
test=dfs['2019-06-23']
test.to_csv('test.csv')
Both of them raised this error:
TypeError: get_handle() got an unexpected keyword argument 'errors'
I don't know why I get this error and haven't found any reason for it. I have saved many files this way but never had this happen before.
My goal: to be able to save this dataframe after my modification as csv
If you are getting this error, there are two things to check:
Whether the DataFrame is actually a Series - see (Pandas : to_csv() got an unexpected keyword argument)
Your numpy version. For me, updating to numpy==1.20.1 with pandas==1.2.2 fixed the problem. If you are using Jupyter notebooks, remember to restart the kernel afterwards.
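Both checks can be done in a few lines. This is only a sketch: describe_export_target is a hypothetical helper name, not part of pandas.

```python
import numpy as np
import pandas as pd


def describe_export_target(obj):
    """Report the object's type and the installed library versions."""
    return {
        "is_dataframe": isinstance(obj, pd.DataFrame),
        "is_series": isinstance(obj, pd.Series),
        "pandas": pd.__version__,
        "numpy": np.__version__,
    }


# Hypothetical stand-in for the table you are trying to save.
report = describe_export_target(pd.DataFrame({"a": [1, 2]}))
print(report)
```

If the reported versions differ from what you expect, the notebook kernel may still be holding an older install in memory, which is why the restart matters.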
In the end, what worked was to wrap the table in pd.DataFrame and then export it as follows:
to_export=pd.DataFrame(dfs['2019-06-23'])
to_export.to_csv('my_table.csv')
That surprised me, because when I checked the type of the table at the time of the error, it was already a DataFrame. However, this way it works.
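Here is the whole flow end to end as a rough sketch, from slicing by date to exporting. The sample rows, the value column and the temporary output folder are assumptions for illustration; the slicing and the pd.DataFrame wrap follow the snippets above.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for the big table.
df = pd.DataFrame({
    "date": ["2019-06-22", "2019-06-23", "2019-06-23"],
    "Hour": [10, 10, 11],
    "value": [1.5, 2.5, 3.5],
})

# Slice into one table per date, as in the question.
fechas = df["date"].unique()
dfs = {fecha: df[df["date"] == fecha].set_index("Hour") for fecha in fechas}

# Wrap in pd.DataFrame before exporting, as in the workaround above.
to_export = pd.DataFrame(dfs["2019-06-23"])
out_path = os.path.join(tempfile.mkdtemp(), "my_table.csv")
to_export.to_csv(out_path)

# Read the file back to confirm that the export succeeded.
round_trip = pd.read_csv(out_path, index_col="Hour")
print(round_trip)
```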
I have a dataframe that's the result of importing a csv and then performing a few operations and adding a column that's the difference between two other columns (column 10 - column 9 let's say). I am trying to sort the dataframe by the absolute value of that difference column, without changing its value or adding another column.
I have seen this syntax over and over all over the internet, with indications that it was a success (accepted answers, comments saying "thanks, that worked", etc.). However, I get the error you see below:
df.sort_values(by='Difference', ascending=False, inplace=True, key=abs)
Error:
TypeError: sort_values() got an unexpected keyword argument 'key'
I'm not sure why the syntax that I see working for other people is not working for me. I have a lot more going on in the code with other dataframes, so I don't think it's a pandas import problem.
I have moved on and just made a new column that is the absolute value of the difference column and sorted by that, and exclude that column from my export to worksheet, but I really would like to know how to get it to work the other way. Any help is appreciated.
I'm using Python 3
df.loc[(df.c - df.b).sort_values(ascending = False).index]
This sorts by the difference between "c" and "b" without creating a new column.
I hope this is what you were looking for.
key is an optional argument.
It accepts a Series as input, so maybe you were working with a DataFrame.
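As far as I know, the key argument of sort_values was only added in pandas 1.1.0, so an older installation would raise exactly this TypeError. Below is a minimal sketch, with made-up sample values, of sorting by absolute value without adding a column. The first approach does not need key at all; the second is the syntax from the question and falls back to the first result if key is not supported.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"Difference": [3, -10, 1, -5]})

# Works on old and new pandas: sort the absolute values, then reorder
# the original rows by the resulting index (assumes a unique index).
order = df["Difference"].abs().sort_values(ascending=False).index
by_index = df.loc[order]

# Syntax from the question; key is applied to the column as a Series.
try:
    by_key = df.sort_values(by="Difference", ascending=False, key=abs)
except TypeError:
    by_key = by_index  # key is not available in this pandas version

print(by_index)
```

Neither approach changes the values in the Difference column; only the row order changes.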
I have a data frame like the following:
I am trying to use the reshape function from the pandas package, and it keeps giving me this error:
"the id variables need to uniquely identify each row".
Link to the data: https://pastebin.com/GzujhX3d
This is my code to reshape:
GG_long=pd.wide_to_long(data_GG,stubnames='time_',i=['Customer', 'date'], j='Cons')
The combination of 'Customer' and 'Date' identifies a unique row within my data, so I don't understand why it throws this error or how I can fix it. Any help is appreciated.
I was able to identify the issue. The error was due to two things: first, the column names had ":" in them, and second, the format of the date. For some reason it doesn't like dd-mm-yy; it works with dd/mm/yy instead.
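If someone else hits this message, a quick way to test the uniqueness assumption before reshaping is sketched below. The sample rows and the column names time_1 and time_2 are made up (the real data is behind the link above); the wide_to_long call mirrors the one in the question.

```python
import pandas as pd

# Hypothetical sample data with numeric suffixes after the 'time_' stub.
data_GG = pd.DataFrame({
    "Customer": ["A", "A", "B"],
    "date": ["01/01/20", "02/01/20", "01/01/20"],
    "time_1": [1.0, 2.0, 3.0],
    "time_2": [4.0, 5.0, 6.0],
})
id_cols = ["Customer", "date"]

# Check whether the id columns really identify each row uniquely.
has_duplicates = bool(data_GG.duplicated(subset=id_cols).any())

GG_long = pd.wide_to_long(data_GG, stubnames="time_", i=id_cols, j="Cons")

# Repeating one row makes the id columns ambiguous and triggers the error.
with_repeat = pd.concat([data_GG, data_GG.iloc[[0]]], ignore_index=True)
error_message = None
try:
    pd.wide_to_long(with_repeat, stubnames="time_", i=id_cols, j="Cons")
except ValueError as exc:
    error_message = str(exc)

print(GG_long)
print(error_message)
```

Note that by default wide_to_long only matches numeric suffixes after the stub name, which may be related to the trouble with ":" in the column names, though I have not checked that against the original data.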