I am very new to Python and cannot seem to solve this problem on my own. I have a dataset that I already converted to a pandas DataFrame with a DatetimeIndex in the format yyyy-mm-dd-HH-MM-SS, with one timestamp per minute. The attached figure shows the already interpolated DataFrame.
Now I want to convert the DatetimeIndex to week numbers so I can plot the corresponding HVAC Actual, Chiller power, etc. against their week number. The index was already set to time, but I got an error saying that 'Time' was not recognized in the columns. I tried to set the index again, as in the code below, and from there create a new column using dt.week:
building_interpolated = building_interpolated.set_index('Time')
building_interpolated['Week number'] = building_interpolated['Time'].dt.week
If I am correct, this should create a new column called Week number containing the week number. However, I still get an error saying that ['Time'] is not in the columns (see figure below).
Can anyone help me?
Regards, nooby Boaz ;)
df.index = df.index.to_series().dt.isocalendar().week
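The line above replaces the index with week numbers. If you would rather keep the timestamps and add a separate week column, as the question attempts, a minimal sketch follows. It assumes pandas 1.1 or newer, where DatetimeIndex.isocalendar() exists (Series.dt.week was deprecated and later removed in pandas 2.0). The data, the 15-minute frequency, and the use of a mean per week are hypothetical stand-ins for the interpolated building data.

```python
import pandas as pd

# Hypothetical stand-in for building_interpolated: 3 weeks of 15-minute data
# starting on Monday 2021-01-04 (the first day of ISO week 1 of 2021)
idx = pd.date_range("2021-01-04 00:00", periods=3 * 7 * 96, freq="15min")
df = pd.DataFrame({"HVAC Actual": range(len(idx))}, index=idx)
df.index.name = "Time"

# 'Time' is the index, not a column, so take the ISO week from the index itself
df["Week number"] = df.index.isocalendar().week.astype(int).to_numpy()

# One value per week (the mean here is an illustrative choice), ready to plot
weekly = df.groupby("Week number")["HVAC Actual"].mean()
```

With matplotlib installed, weekly.plot.bar() would then draw one bar per week.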
This is a two-part question. I would like to know more about pd.NaT, and why I got the error in the first place. Filling with pd.NaT solved my problem, but I do not understand the error or why the solution works.
First, this error has come up for me before, and I solved it each time with one of the three questions linked below. Resetting the index is the usual solution here.
Concat DataFrame Reindexing only valid with uniquely valued Index objects
changing all dates to standard date time in dataframe
How to convert a given ordinal number (from Excel) to a date
None of these worked this time. I did manage to solve my problem, but I could not find an online solution for it or an explanation of why it works.
My problem:
I needed to convert four object-dtype columns in my pandas DataFrame to datetime. I'll call them Date1 to Date4 for this question.
Date1 converted without a problem, but for the other three in the same DataFrame I got the error in the title. After two hours of research I had found nothing that solved it. Then I remembered that I had used pd.NaT before to fill in missing values in a past problem.
To solve it I did the following:
df_both["Date2"] = df_both["Date2"].fillna(pd.NaT)
df_both["Date3"] = df_both["Date3"].fillna(pd.NaT)
df_both["Date4"] = df_both["Date4"].fillna(pd.NaT)
Then I did a normal conversion:
df_both["Date2"] = pd.to_datetime(df_both["Date2"])
df_both["Date3"] = pd.to_datetime(df_both["Date3"])
df_both["Date4"] = pd.to_datetime(df_both["Date4"])
Can someone please explain how this works? Also, my index was unique: df_both.index.is_unique returned True. So why did I get that specific error?
Side note: Date1 had missing values as well and did not need pd.NaT to work.
Apologies in advance if I say something that does not make sense. I have been trying to teach myself Python through Udemy courses, so I am not the most knowledgeable.
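To see what the two steps above do on their own, here is a minimal sketch with a hypothetical object column standing in for Date2. It shows that pd.to_datetime already turns None and NaN into NaT without a prior fillna(pd.NaT), so the fill step may matter less than it appears; it does not reproduce or explain the original reindexing error. The errors="coerce" option is shown as an alternative for values that cannot be parsed at all.

```python
import numpy as np
import pandas as pd

# Hypothetical object column with two kinds of missing markers
s = pd.Series(["2021-03-01", None, np.nan, "2021-03-04"], dtype=object)

# The approach from the question: fill missing values with NaT, then convert
filled = pd.to_datetime(s.fillna(pd.NaT))

# Converting directly also maps None/NaN to NaT
direct = pd.to_datetime(s)

# Unparseable strings raise by default; errors="coerce" turns them into NaT instead
messy = pd.Series(["2021-03-01", "not a date"], dtype=object)
coerced = pd.to_datetime(messy, errors="coerce")
```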
I work on an online text game and am looking at ways to tell if two accounts are being run by the same person. One thing I think would be helpful would be comparing the times that the accounts are active. I know there are ways to make better graphs, but for now, I am trying to start simple.
I have a df of the logs. One of the columns is a 'time' column, for example '15:27:33'.
At first, I tried to make a bar chart counting the number of activities in each 15-minute window.
timedf = df.filter(['time'])
timedf.set_index('time', drop=False, inplace=True)
timedf.groupby(pd.Grouper(freq='15Min')).count().plot(kind='bar')
But I got the error: TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'
I tried using the full datetime instead, and it plotted, but since this df spans 60 days the chart was unreadable.
I am just looking to plot the number of logs a user has during each 15-minute window of a 24-hour day.
Any recommendations?
You can try converting the 'time' column to a datetime dtype, so that after setting it as the index you get a DatetimeIndex and can perform such operations:
df['time'] = pd.to_datetime(df['time'])
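Another way to bin by time of day, sketched here under the assumption that 'time' holds "HH:MM:SS" strings as in the question, is to parse it as an offset from midnight with pd.to_timedelta, so no date is attached at all. The sample rows are hypothetical. Each entry is floored to the start of its 15-minute window, and the counts are reindexed over all 96 windows so empty windows show up as zero.

```python
import pandas as pd

# Hypothetical log rows; 'time' holds time-of-day strings as in the question
df = pd.DataFrame({"time": ["15:27:33", "15:29:10", "15:44:59", "00:05:00", "23:59:59"]})

# Parse each time of day as a Timedelta since midnight
offsets = pd.to_timedelta(df["time"])

# Round each entry down to the start of its 15-minute window
windows = offsets.dt.floor("15min")

# Count logs per window, keeping empty windows so the day always has 96 bars
all_windows = pd.timedelta_range(start="0 days", periods=96, freq="15min")
counts = windows.value_counts().reindex(all_windows, fill_value=0)
```

counts.plot(kind="bar") then gives one bar per window over a single 24-hour day, regardless of how many days the logs span.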
I am currently working with a dataset which has two DateTime columns: ACTUAL_SHIPMENT_DTM and SHIPMENT_CONFIRMED_DTM.
I am trying to find the time difference between the two columns. I have tried the following code, but the output gives me the row-to-row difference within one column. Basically, I want a new column populated with the time difference (ACTUAL_SHIPMENT_DTM - SHIPMENT_CONFIRMED_DTM).
Golden['Cycle_TIme'] = Golden.groupby('ACTUAL_SHIPMENT_DTM')['SHIPMENT_CONFIRMED_DTM'].diff().dt.total_seconds()
Can anyone see errors in my code or guide me to proper documentation?
Lol, I underestimated myself and asked the question way too soon. If anyone wants to know how to find the time difference between two columns, here is my example code (Golden is the DataFrame):
Golden['Cycle_TIme'] = Golden["SHIPMENT_CONFIRMED_DTM"] - Golden["ACTUAL_SHIPMENT_DTM"]
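A self-contained sketch of the same row-wise subtraction, with hypothetical sample rows, follows. It assumes the two columns may still be strings, so it converts them with pd.to_datetime first, and then uses .dt.total_seconds() (already used in the question) to turn the Timedelta result into a number. The column name Cycle_seconds is made up for illustration.

```python
import pandas as pd

# Hypothetical rows; the two DTM column names follow the question
golden = pd.DataFrame({
    "ACTUAL_SHIPMENT_DTM": ["2021-05-01 08:00", "2021-05-02 12:30"],
    "SHIPMENT_CONFIRMED_DTM": ["2021-05-01 10:15", "2021-05-02 12:45"],
})

# Make sure both columns are datetime64 before subtracting
for col in ["ACTUAL_SHIPMENT_DTM", "SHIPMENT_CONFIRMED_DTM"]:
    golden[col] = pd.to_datetime(golden[col])

# Row-wise subtraction gives a Timedelta per row; no groupby is needed
golden["Cycle_TIme"] = golden["SHIPMENT_CONFIRMED_DTM"] - golden["ACTUAL_SHIPMENT_DTM"]

# Hypothetical extra column: the same difference as a plain number of seconds
golden["Cycle_seconds"] = golden["Cycle_TIme"].dt.total_seconds()
```

Swap the operands if you want ACTUAL_SHIPMENT_DTM - SHIPMENT_CONFIRMED_DTM as written in the question; the sign of the result flips.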
Hi, hoping someone can help. I have a DataFrame where one of the columns contains names. Some names are repeated, but not all. I am trying to plot a graph where the x-axis shows each name and the y-axis shows the number of times that name appears in the column.
I have used the following to count the number of time each name appears.
df.groupby('name').name.count()
Then I tried to use the following to plot the graph. However, I get a key error message.
df.plot.bar(x='name', y=df.groupby('name').name.count())
Anyone able to tell me what I am doing wrong?
Thanks
I believe you need to plot the Series returned by the count function, using Series.plot.bar:
df.groupby('name').name.count().plot.bar()
Or use value_counts:
df['name'].value_counts().plot.bar()
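The two answers produce the same counts; they differ only in ordering. A small sketch with hypothetical names illustrates this. The error in the question is likely because DataFrame.plot.bar expects x and y to be column labels, not a Series of values, whereas both expressions below already are the Series to plot.

```python
import pandas as pd

# Hypothetical column of names with some repeats
df = pd.DataFrame({"name": ["Ann", "Bob", "Ann", "Cy", "Ann", "Bob"]})

# groupby/count: a Series indexed by name, sorted alphabetically by name
by_group = df.groupby("name").name.count()

# value_counts: the same counts, sorted from most to least frequent
by_counts = df["name"].value_counts()

# Either Series can be plotted directly, e.g. by_counts.plot.bar()
```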
I have tried to write the following code but I get the error message: "ValueError: Cannot shift with no freq."
I have no idea how to fix it. I tried googling the error message but couldn't find any case similar to mine.
df is a pandas DataFrame for which I want to create new columns showing the daily change. The code is shown below. How can I fix it to avoid the ValueError?
for column_names in df:
    df[column_names+'%-daily'] = df[column_names].pct_change(freq=1).fillna(0)
The problem was that I had dates as the index. Since only weekdays were present, the date-based change came out wrong. It worked once I switched from freq to periods:
for column_names in list(df.columns.values):
    df[column_names+'%-daily'] = df[column_names].pct_change(periods=1).fillna(0)
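A minimal sketch of the fixed loop on hypothetical weekday-only data follows. With periods=1, each row is compared with the previous row, so Monday is compared with Friday even though the weekend is missing from the index. Taking list(df.columns) before the loop means the newly added "%-daily" columns are not themselves iterated over.

```python
import pandas as pd

# Hypothetical business-day prices: Fri 2021-01-01, then Mon, Tue, Wed
idx = pd.bdate_range("2021-01-01", periods=4)
df = pd.DataFrame({"A": [100.0, 110.0, 99.0, 99.0]}, index=idx)

# Snapshot the column names first, because new columns are added inside the loop
for column_name in list(df.columns):
    # periods=1: change relative to the previous row, whatever its date
    df[column_name + "%-daily"] = df[column_name].pct_change(periods=1).fillna(0)
```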