Apologies in advance if I say something that does not make sense. I have been trying to teach myself Python through Udemy courses, so I am not the most knowledgeable.
I work on an online text game and am looking at ways to tell if two accounts are being run by the same person. One thing I think would be helpful would be comparing the times that the accounts are active. I know there are ways to make better graphs, but for now, I am trying to start simple.
I have a df of the logs. One of the columns is a 'time' column. Example '15:27:33'
At first, I tried to make a bar chart to count the number of activities that happened every 15 mins.
timedf = df.filter(['time'])
timedf.set_index('time', drop=False, inplace=True)
timedf.groupby(pd.Grouper(freq='15Min')).count().plot(kind='bar')
But I got the error: TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'
I tried making it with DateTime instead and it plotted but since this df spans 60 days it was unreadable.
I am just looking to plot the number of logs a user has during each 15 min window of a 24 hour day.
Any recommendations?
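You can try converting the 'time' column to a datetime data type so that time-based operations work. Note that pd.Grouper(freq=...) without a key groups on the index, so the converted column also has to be set as the index to give it a DatetimeIndex.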
df['time'] = pd.to_datetime(df['time'])
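To collapse all 60 days into a single 24-hour day, one option is to parse only the time of day and round it down to its 15-minute window. This is a minimal sketch, not a definitive solution: the sample rows are made up, and it assumes the 'time' column holds strings like '15:27:33'.

```python
import pandas as pd

# Hypothetical sample standing in for the log DataFrame from the question.
df = pd.DataFrame({"time": ["15:27:33", "15:29:01", "15:44:59", "00:03:10", "23:59:59"]})

# Parse the strings as times; the date part defaults to 1900-01-01 and is not used.
parsed = pd.to_datetime(df["time"], format="%H:%M:%S")

# Round each timestamp down to the start of its 15-minute window.
window_start = parsed.dt.floor("15min")

# Count logs per window, then reindex so that empty windows show up as 0.
all_windows = pd.date_range("1900-01-01", periods=96, freq="15min")
counts = window_start.value_counts().reindex(all_windows, fill_value=0)

# Label each window as HH:MM so the tick labels stay readable.
counts.index = counts.index.strftime("%H:%M")

# counts.plot(kind="bar") would then draw the chart (requires matplotlib).
```

Filtering df to one account before these steps gives one chart per user, which can then be compared side by side.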
Related
I am trying to create a Calendar Heatmap looking at a sport team's Attendance by Date.
Sample DF
The Date column is an 'Object' so I tried a very lazy attempt at using "df['Date'] = pd.to_datetime(df['Date'])" but I would get the error of "Out of bounds nanosecond timestamp: 1-04-07 00:00:00".
I've also tried changing the format of 'Thursday, Apr 7' using something along the lines of "strftime('%m,%d,%Y')" but then I would get errors saying something about unable to work on Series.
I am lost in what to do next. I'm hoping someone can assist me in this.
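The out-of-bounds error appears to come from the strings having no year, so the parser ends up with year 1. A possible approach is to append the year and pass an explicit format. The sketch below is only illustrative: the sample rows and attendance numbers are made up, and SEASON_YEAR = 2022 is an assumption that has to be replaced with the real season year. It also assumes English day and month names.

```python
import pandas as pd

# Hypothetical sample; the real DataFrame was not shown in the question.
df = pd.DataFrame({
    "Date": ["Thursday, Apr 7", "Friday, Apr 8"],
    "Attendance": [30000, 28000],
})

# Assumption: every row belongs to the same season year.
SEASON_YEAR = 2022

# Append the year to each string, then parse with a format that matches
# "Thursday, Apr 7 2022": %A = weekday name, %b = short month, %d = day.
df["Date"] = pd.to_datetime(df["Date"] + f" {SEASON_YEAR}", format="%A, %b %d %Y")
```

The string operations work on the whole Series at once, which avoids calling strftime on a Series.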
I am very new to Python and cannot seem to solve the problem on my own. Currently I have a dataset which I already converted to a DataFrame using pandas which has a datetimeindex according to yyyy-mm-dd-HH-MM-SS with time stamps of minutes. The attached figure shows the already interpolated dataframe.
[Image: the interpolated DataFrame]
Now I want to convert the DatetimeIndex to week numbers so that I can plot HVAC Actual, Chiller power, etc. against their week number. The index was already set to the time, but I got an error saying that 'Time' was not recognized in the columns. I tried to set the index again as in the code below and then create a new column using dt.week:
building_interpolated = building_interpolated.set_index('Time')
building_interpolated['Week number'] = building_interpolated['Time'].dt.week
If I am correct, this should create a new column called 'Week number' containing the week number. However, I still get an error saying that ['Time'] is not in the columns (see figure below).
[Image: the error stating that 'Time' is not in the columns]
Anyone who can help me?
Regards, nooby Boaz ;)
df.index = df.index.to_series().dt.isocalendar().week
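Some background on the error: after set_index('Time'), 'Time' is the index and no longer a column, so building_interpolated['Time'] cannot be found. The week number can be read from the index directly. If you would rather keep the timestamps as the index and add the week as a column, something like the sketch below may work. The sample values are made up, and it uses isocalendar() because Series.dt.week is no longer available in recent pandas versions.

```python
import pandas as pd

# Hypothetical stand-in for the interpolated DataFrame, indexed by time.
idx = pd.date_range("2021-01-04 00:00", periods=4, freq="3D")
building_interpolated = pd.DataFrame({"HVAC Actual": [1.0, 2.0, 3.0, 4.0]}, index=idx)
building_interpolated.index.name = "Time"

# Read the ISO week number from the DatetimeIndex itself; the result is
# aligned on the same index, so it can be assigned as a new column.
building_interpolated["Week number"] = building_interpolated.index.isocalendar().week

# Example use: average HVAC Actual per week number, ready for plotting.
weekly_mean = building_interpolated.groupby("Week number")["HVAC Actual"].mean()
```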
I am at a total loss as to why this is impossible to find but I really just want to be able to groupby and then export to excel. Don't need counts, or sums, or anything else and can only find examples including these functions. Tried removing those functions and the whole code just breaks.
Anyways:
Have a set of monthly metrics - metric name, volumes, date, productivity, and fte need. Simple calcs got the data looking nice, good to go. Currently it is grouped in 1 month sections so all metrics from Jan are one after the other etc. Just want to change the grouping so first section is individual metrics from Jan to Dec and so on for each one.
Initial data I want to export to excel (returns not a dataframe error)
dfcon = pd.concat([PmDf,ReDf])
dfcon['Need'] = dfcon['Volumes'] / (dfcon['Productivity']*21*8*.80)
dfcon[['Date','Current Team','Metric','Productivity','Volumes','Need']]
dfg = dfcon.groupby(['Metric','Date'])
dfg.to_excel(r'S:\FilePATH\GroupBy.xlsx', sheet_name='pandas_group', index = 0)
The error I get here is: 'DataFrameGroupBy' object has no attribute 'to_excel' (I have tried a variety of conversions to dataframes and closest I can get is a correct grouping displaying counts only for each one, which I do not need in the slightest)
I have also tried:
dfcon.sort('Metric').to_excel(r'S:\FILEPATH\Grouped_Output.xlsx', sheet_name='FTE Need', index = 0)
this returns the error: AttributeError: 'DataFrame' object has no attribute 'sort'
Any help you can give to get this to be able to be exported grouped in excel would be great. I am at my wits end here after over an hour of googling. I am also self taught so feel like I may be missing something very, very basic/simple so here I am!
Thank you for any help you can provide!
Ps: I know I can just sort after in excel but would rather learn how to make this work in python!
I am pretty sure sort() doesn't work anymore (DataFrame.sort was removed from pandas); try sort_values() instead, e.g. dfcon.sort_values('Metric').
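On the first error: groupby() returns a DataFrameGroupBy object, which only becomes a DataFrame once something like an aggregation is applied to it, so it has no to_excel. If the goal is just to have the rows ordered by metric and then by date, sorting on both columns may be all that is needed. A small sketch with made-up rows (the column names are taken from the question, the values are hypothetical):

```python
import pandas as pd

# Hypothetical rows in the original "one month after another" order.
dfcon = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-01", "2023-01-01", "2023-02-01", "2023-02-01"]),
    "Metric": ["Calls", "Emails", "Calls", "Emails"],
    "Volumes": [100, 50, 120, 60],
})

# Sort by metric first, then by date within each metric.
ordered = dfcon.sort_values(["Metric", "Date"]).reset_index(drop=True)

# The result is a regular DataFrame, so it can be exported, for example:
# ordered.to_excel("GroupBy.xlsx", sheet_name="pandas_group", index=False)
# (this needs an Excel writer such as openpyxl to be installed).
```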
I have a pandas data frame that is reported as dtype: int64. I convert it to datetime using pd.to_datetime, which gives a date as well as the time of day, but I only want to plot a distribution of the times of day. I have tried many different things and keep running into errors, so I will post the code from my latest attempt:
type(justTime)
This returns 'pandas.core.frame.DataFrame' so I know it is a data frame.
justTime['ACCESS_TIME'].value_counts()
This returns a value_counts list of '2020-08-08 12:44:19.000' type objects, which it calls dtype: int64. Of note is when I do: type(justTime['ACCESS_TIME']) it returns 'pandas.core.series.Series'.
Next, I make it a datetime by doing the following:
justTime['ACCESS_TIME'] = pd.to_datetime(justTime['ACCESS_TIME'])
If I do justTime['ACCESS_TIME'].dt.time, it prints a list of just the times, for example "13:04:41", but shows them as dtype: object.
Therefore, when I try
ax = sns.distplot(justTime['ACCESS_TIME'].dt.time)
I get the error: "TypeError: float() argument must be a string or a number, not 'datetime.time'"
Essentially I have a data frame of datetime object where I want to plot a distribution plot of just the times, no dates. I want to see around what time of the day these access times are clustering, and I have run into so many problems of how to handle this. Any help is appreciated, thank you.
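One way around the TypeError might be to turn each time of day into a plain number, such as hours since midnight, because plotting functions expect numeric input rather than datetime.time objects. The following is a rough sketch with made-up timestamps in the same format as above:

```python
import pandas as pd

# Hypothetical sample in the same string format as the question.
justTime = pd.DataFrame({"ACCESS_TIME": [
    "2020-08-08 12:44:19.000",
    "2020-08-09 13:04:41.000",
    "2020-08-10 06:30:00.000",
    "2020-08-11 12:10:00.000",
]})

access = pd.to_datetime(justTime["ACCESS_TIME"])

# Time of day as fractional hours since midnight (a float column), date ignored.
hours = access.dt.hour + access.dt.minute / 60 + access.dt.second / 3600

# A coarser view: number of accesses in each hour of the day.
per_hour = access.dt.hour.value_counts().sort_index()

# hours can then be passed to a plotting call, for example
# sns.histplot(hours, bins=24); distplot is deprecated in recent seaborn.
```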
With the new pandas update I can't use this function, which I used in a DataCamp learning course (DAYOFWEEK doesn't exist anymore):
days_of_week = pd.get_dummies(dataframe.index.dayofweek,
prefix='weekday',
drop_first=True)
How can I change the syntax of my 'formula' to get the same results?
Sorry about the silly question but spent a lot of time here and I'm stuck...
Thanks in advance!
I already tried just using the dataframe with its index, but get_dummies doesn't return the days of the week.
I also tried DatetimeIndex, but I am getting the formulation wrong as well.
`days_of_week = pd.get_dummies(dataframe.index.dayofweek, prefix='weekday', drop_first=True)`
The dataframe is fairly big, and I need the output to give me the weekdays because I'm dealing with stock prices.
Try weekday instead of dayofweek.
So
days_of_week = pd.get_dummies(dataframe.index.weekday,
prefix='weekday',
drop_first=True)
See docs below:
pandas.Series.dt.weekday
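For reference, here is a small self-contained version of the suggestion, assuming dataframe has a DatetimeIndex (if the index is a plain Index of strings, pd.to_datetime would be needed first). The prices are made up. Wrapping the weekday numbers in a Series is an extra step not in the answer above; it keeps the dummy rows aligned with the original dates.

```python
import pandas as pd

# Hypothetical price data for one trading week, Monday to Friday.
idx = pd.date_range("2023-01-02", periods=5, freq="D")
dataframe = pd.DataFrame({"close": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=idx)

# Weekday numbers (Monday=0 ... Sunday=6), kept on the original date index.
weekday = pd.Series(dataframe.index.weekday, index=dataframe.index)

# One dummy column per weekday; drop_first drops the first level (Monday here).
days_of_week = pd.get_dummies(weekday, prefix="weekday", drop_first=True)
```

With drop_first=True, a Monday row has all dummy columns set to zero/False.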