I have a dataframe that looks like this (df1):
I want to recreate the following dataframe (df2) to look like df1:
The number of years in df2 goes up to 2020.
So, essentially, for each row in df2 a new row should be created for its year, new columns should be created for each month, and the % value of each row should be copied into the column corresponding to the month in the "Month" column.
Any ideas?
Many thanks.
This is pivot:
(df2.assign(Year=df2.Month.str[:4],
            Month=df2.Month.str[5:])
    .pivot(index='Year', columns='Month', values='%')
)
More details about pivoting a dataframe here.
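For reference, a minimal sketch of how this plays out on made-up data (the Month and % column names follow the question; the sample values are assumed):

import pandas as pd

# Assumed sample of df2: one row per year-month with a % value
df2 = pd.DataFrame({
    'Month': ['2018-01', '2018-02', '2019-01', '2019-02'],
    '%': [1.5, 2.0, 1.7, 2.2],
})

wide = (df2.assign(Year=df2.Month.str[:4],
                   Month=df2.Month.str[5:])
           .pivot(index='Year', columns='Month', values='%'))
print(wide)
# Month   01   02
# Year
# 2018   1.5  2.0
# 2019   1.7  2.2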
I have a dataframe with the following details: (first df in image)
I want to be able to add new rows to the df that calculate next_apt + days, with the new timestamp at which it was run. So I want it to look like this:
The other columns should be left as they are; just add the new next_apt with the newer timestamp at which it was calculated and append the rows to the same df.
Use date_add and cast it to timestamp
This should work:
from pyspark.sql import functions as F
df1.withColumn("newDateWithTimestamp", F.date_add(F.col("next_apt"), F.col("days")).cast("timestamp")).show()
Input
Output
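For reference, a self-contained sketch with made-up rows (next_apt and days follow the question; the id column and values are assumed). Passing a Column as the days argument of date_add needs a recent Spark 3.x release; on older versions, F.expr("date_add(next_apt, days)") is the usual workaround:

import datetime
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample: next_apt as a DateType column, days as an integer
df1 = spark.createDataFrame(
    [("A1", datetime.date(2022, 1, 10), 7),
     ("A2", datetime.date(2022, 2, 1), 14)],
    ["id", "next_apt", "days"],
)

# Shift next_apt forward by `days` and cast the result to a timestamp
df_new = df1.withColumn(
    "newDateWithTimestamp",
    F.date_add(F.col("next_apt"), F.col("days")).cast("timestamp"),
)
df_new.show(truncate=False)

To append the recalculated rows back onto the original frame, unionByName can be used once both sides carry the same columns.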
I have a dataframe 'raw' that looks like this -
It has many rows with duplicate values in each column.
I want to make a new dataframe 'new_df' which has unique customer_code values and the corresponding market_code.
The new_df should look like this -
It sounds like you simply want to create a DataFrame with unique customer_code which also shows market_code. Here's a way to do it:
df = df[['customer_code','market_code']].drop_duplicates('customer_code')
Output:
customer_code market_code
0 Cus001 Mark001
1 Cus003 Mark003
3 Cus004 Mark003
4 Cus005 Mark004
The part reading df[['customer_code','market_code']] gives us a DataFrame containing only the two columns of interest, and the drop_duplicates('customer_code') part eliminates all but the first occurrence of duplicate values in the customer_code column (you could instead keep the last occurrence of each duplicate by passing keep='last').
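For example, on a small made-up frame (values assumed), keep='last' retains the final market_code seen for each customer instead of the first:

import pandas as pd

# Hypothetical raw data with a repeated customer code
raw = pd.DataFrame({
    'customer_code': ['Cus001', 'Cus001', 'Cus003', 'Cus004'],
    'market_code':   ['Mark001', 'Mark002', 'Mark003', 'Mark003'],
})

first = raw[['customer_code', 'market_code']].drop_duplicates('customer_code')
last = raw[['customer_code', 'market_code']].drop_duplicates('customer_code', keep='last')

print(first)  # Cus001 -> Mark001 (first occurrence kept)
print(last)   # Cus001 -> Mark002 (last occurrence kept)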
I have the below dataframe and I am trying to display how many rides there are per day.
But I can see that only "near_penn" is treated as a column; "Date" is not.
c = df[['start day', 'near_penn', 'Date']]
c = c.loc[c['near_penn'] == 1]
pre_pandemic_df_new = pd.DataFrame()
pre_pandemic_df_new = c.groupby('Date').agg({'near_penn': 'sum'})
print(pre_pandemic_df_new)
print(pre_pandemic_df_new.columns)
Why doesn't it consider "Date" as a column?
How can I make Date a column of "pre_pandemic_df_new"?
I feel you can use the to_datetime method.
import pandas as pd
pre_pandemic_df_new["Date"]= pd.to_datetime(pre_pandemic_df_new["Date"])
Hope this works.
Why doesn't it consider "Date" as a column?
Because the date is the index of your DataFrame.
How can I make Date a column of "pre_pandemic_df_new"?
You can try this:
pre_pandemic_df_new.reset_index(level=['Date'])
Once you have created your dataframe, you can try this to add new columns to the end of the dataframe and test whether it works before you make adjustments:
df[['Date','near_penn']] = df[['Date_new','near_penn_new']]
OR
You can check the value of the first row corresponding to the first "Date" row.
These are the first things that came to mind; hope it helps.
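Putting the pieces together, a minimal sketch on made-up data (the Date and near_penn column names follow the question); either reset_index after the groupby, or as_index=False in the groupby itself, leaves Date as a regular column:

import pandas as pd

# Hypothetical ride data
df = pd.DataFrame({
    'start day': ['Wed', 'Wed', 'Thu'],
    'near_penn': [1, 1, 1],
    'Date': ['2020-01-01', '2020-01-01', '2020-01-02'],
})

c = df.loc[df['near_penn'] == 1, ['start day', 'near_penn', 'Date']]

# groupby moves the key into the index; reset_index turns it back into a column
pre_pandemic_df_new = c.groupby('Date').agg({'near_penn': 'sum'}).reset_index()

# Equivalent alternative: keep the key as a column from the start
# pre_pandemic_df_new = c.groupby('Date', as_index=False).agg({'near_penn': 'sum'})

print(pre_pandemic_df_new.columns)  # Index(['Date', 'near_penn'], dtype='object')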
I have a temporary dataframe temp (as shown below) sliced from a larger dataframe.
I would appreciate it if you could help me assign the item_price value of each row to the column associated with its model, as shown below:
Note: the original, larger dataframe contains brands, prices, and models, where some rows share the same brand name with a different model and price, so I slice those related records into the temp dataframe and try to assign the price to the column associated with the model for each record.
Thanks in advance!
If I were you, I would delete the columns 'Sedan', 'Sport' and 'SUV' and use pivot.
In your case you would want to do the following:
Create a new DataFrame called df1 like so:
df1 = df.pivot(index='brand', columns='model', values='item_price')
And then join your original DataFrame df with df1.
df = df.join(df1, on='brand')
This will give you the result you are looking for.
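A rough end-to-end sketch of that approach on assumed sample data (brand, model, and item_price as in the question; the values are made up):

import pandas as pd

# Hypothetical slice of the larger frame
temp = pd.DataFrame({
    'brand': ['Toyota', 'Toyota', 'BMW'],
    'model': ['Sedan', 'SUV', 'Sport'],
    'item_price': [78.0, 120.0, 210.0],
})

# One column per model, one row per brand
wide = temp.pivot(index='brand', columns='model', values='item_price')

# Join the model columns back onto the original rows
result = temp.join(wide, on='brand')
print(result)

Note that pivot requires each brand/model pair to be unique; if the slice contains duplicates, pivot_table with an aggregation function is the usual fallback.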
You can create a method that returns the value based on a condition like this:
I'm using df as the name of the dataframe; you can rename it to temp.
def set_item_price(model):
    if model == "Sedan":
        return 78.00
    return 0

df["item_price"] = [
    set_item_price(a) for a in df['model']
]
I want to group by two columns: day of the week and a second column, but I don't know how to do this.
This is my query for one column:
grouped = (df.groupby(df['time'].dt.weekday_name)['id'].count().rename('count'))
Where should I add the second column? For example, the "type" column in my dataframe.
df.groupby() takes a list, like this:
df.groupby([df['time'].dt.weekday_name, df['type']])
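For example, extending the original count to both keys on made-up data (note that recent pandas versions replace .dt.weekday_name with .dt.day_name(); the sample rows are assumed):

import pandas as pd

# Hypothetical data with a datetime column, a type column, and an id
df = pd.DataFrame({
    'time': pd.to_datetime(['2021-03-01', '2021-03-01', '2021-03-02']),
    'type': ['A', 'B', 'A'],
    'id': [1, 2, 3],
})

# Group by day-of-week name and type, then count ids per group
grouped = (df.groupby([df['time'].dt.day_name(), df['type']])['id']
             .count()
             .rename('count'))
print(grouped)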