I have the below dataframe and I am trying to display how many rides there are per day.
But I can see that only one column, "near_penn", is treated as a column, while "Date" is not.
c = df[['start day', 'near_penn', 'Date']]
c = c.loc[c['near_penn'] == 1]
pre_pandemic_df_new = pd.DataFrame()
pre_pandemic_df_new = c.groupby('Date').agg({'near_penn': 'sum'})
print(pre_pandemic_df_new)
print(pre_pandemic_df_new.columns)
Why doesn't it consider "Date" as a column?
How can I make "Date" a column of "pre_pandemic_df_new"?
I feel you can use the to_datetime method.
import pandas as pd
pre_pandemic_df_new["Date"]= pd.to_datetime(pre_pandemic_df_new["Date"])
Hope this works
Why doesn't it consider "Date" as a column?
Because "Date" is the index of your DataFrame: groupby('Date') moves the grouping key into the index rather than keeping it as a column.
How can I make "Date" a column of "pre_pandemic_df_new"?
You can try this:
pre_pandemic_df_new = pre_pandemic_df_new.reset_index(level=['Date'])
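A minimal sketch of the full flow, assuming the same column names as above; passing as_index=False to groupby keeps "Date" as a regular column, so no reset_index is needed afterwards:

import pandas as pd

c = df[['start day', 'near_penn', 'Date']]
c = c.loc[c['near_penn'] == 1]

# as_index=False keeps 'Date' as a regular column instead of the group index
pre_pandemic_df_new = c.groupby('Date', as_index=False).agg({'near_penn': 'sum'})
print(pre_pandemic_df_new.columns)  # Index(['Date', 'near_penn'], dtype='object')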
df[['Date','near_penn']] = df[['Date_new','near_penn_new']]
Once you have created your dataframe, you can try this to add new columns to the end of the dataframe, to test whether it works before you make adjustments.
OR
You can check the value of the first row corresponding to the first "Date" row.
These are the first things that came to mind; hope it helps.
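For instance, a quick way to test it along those lines (a sketch; 'Date_new' and 'near_penn_new' are just the placeholder names from the snippet above):

# Add the new columns at the end of the dataframe, then spot-check the first row
df[['Date', 'near_penn']] = df[['Date_new', 'near_penn_new']]
print(df.iloc[0])  # check the value in the first row, e.g. the first "Date"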
Related
I have a dataframe with the following details: (first df in image)
I want to be able to add new rows to the df that calculate next_apt + days, with the new timestamp at which it was run. So I want it to look like this:
The other columns should be left as is; just add the new next_apt with the newer timestamp at which it was calculated and append the rows to the same df.
Use date_add and cast the result to timestamp.
This should work:
df1.withColumn("newDateWithTimestamp", F.date_add(F.col("next_apt"), F.col("days")).cast("timestamp")).show()
I got a dataframe that looks like this. What I want to do is:
Sort my DF by DateTime
After sorting my DF by date, add a new column that counts and accumulates values for EACH row name in "Cod_atc".
The problem is that every time I add this new column, no matter what I do, I cannot get my DF sorted by DateTime.
This is the code I am using; I am just adding a column called "DateTime" and sorting everything by that column. The problem appears when I add the new column called "count".
df1['DateTime'] = pd.to_datetime(df1['local_date'])
df1.sort_values(by='DateTime')
df1['count']=df1.groupby(['cod_atc']).cumcount() #sort=False
df1
This is the result I get, and the problem is that if I try to sort my DF by DateTime again, it works, but then the "count" column does not make any sense! The "count" column should count and accumulate values for EACH row name in "cod_atc", but following the DateTime order!
Did you forget to add inplace=True when you sorted df1?
Without it, sort_values returns a sorted copy and df1 itself stays unsorted, so you lose the sort step.
df1['DateTime'] = pd.to_datetime(df1['local_date'])
df1.sort_values(by='DateTime', inplace=True)
df1['count'] = df1.groupby(['cod_atc']).cumcount()  # sort=False
df1
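An equivalent sketch without inplace, if you prefer assigning the sorted result back:

df1['DateTime'] = pd.to_datetime(df1['local_date'])
df1 = df1.sort_values(by='DateTime')              # assign the sorted copy back to df1
df1['count'] = df1.groupby('cod_atc').cumcount()  # cumcount now follows the DateTime order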
I realize this is probably a very trivial question, but I have a dataframe of 1000+ rows and I want to create a new column "Date" holding a single date, "2018-01-31". I tried the code below, but Python just returns "Length of values (1) does not match length of index".
I would really appreciate any help!
Date = ['2018-01-31']
for i in range(len(Output)):
    Output['Date'] = Date
Assuming Output is the name of your pandas dataframe with 1000+ rows you can do:
Output['Date'] = "2018-01-31"
or using the datetime library you could do:
from datetime import date
Output["Date"] = date(2018, 1, 31)
to format it as a date object rather than a string. You also do not need to iterate over each row if you want the same value in every row: assigning a single scalar to a new column broadcasts that value to every row.
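If a proper datetime64 column is preferred over a Python date object, a small sketch along the same lines:

import pandas as pd

Output['Date'] = pd.Timestamp('2018-01-31')  # broadcast to every row as datetime64[ns]
print(Output['Date'].dtype)                  # datetime64[ns]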
I have a dataframe that looks like this (df1):
I want to recreate the following dataframe (df2) to look like df1:
The number of years in df2 goes up to 2020.
So, essentially for each row in df2, a new row for each year should be created. Then, new columns should be created for each month. Finally, the value for % in each row should be copied to the column corresponding to the month in the "Month" column.
Any ideas?
Many thanks.
This is a pivot:
(df2.assign(Year=df2.Month.str[:4],
            Month=df2.Month.str[5:])
    .pivot(index='Year', columns='Month', values='%')
)
More details about pivoting a dataframe here.
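A minimal sketch of how this behaves, assuming Month is formatted with a four-digit year, a separator, then the month label (e.g. "2018-Jan"), so the string slices line up:

import pandas as pd

# Hypothetical sample data in the assumed 'YYYY-Mon' format
df2 = pd.DataFrame({'Month': ['2018-Jan', '2018-Feb', '2019-Jan'],
                    '%': [10, 20, 30]})

out = (df2.assign(Year=df2.Month.str[:4],
                  Month=df2.Month.str[5:])
          .pivot(index='Year', columns='Month', values='%'))
print(out)  # one row per Year, one column per month label, % values filled in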
I want to group by two columns: the day of the week and a second column. But I don't know how to do this.
This is my query for one column:
grouped = (df.groupby(df['time'].dt.weekday_name)['id'].count().rename('count'))
Where should I add the second column? For example, the "type" column in my dataframe.
df.groupby() takes a list, like this:
df.groupby([df['time'].dt.weekday_name, df['type']])
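Putting it together with the original query might look like the sketch below. Note that Series.dt.weekday_name has been removed in newer pandas releases; dt.day_name() is the current equivalent, so use whichever your pandas version supports:

grouped = (df.groupby([df['time'].dt.day_name(), df['type']])['id']
             .count()
             .rename('count'))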