I saw this code that combines rows and adds up values in a dataframe, but I want to add the values in the cells for the same day, i.e. sum all the data for a day. How do I modify the code to achieve this?
Check the code below:
import pandas as pd
df = pd.DataFrame({'Price': [10000, 10000, 10000, 10000, 10000, 10000],
                   'Time': ['2012.05', '2012.05', '2012.05', '2012.06', '2012.06', '2012.07'],
                   'Type': ['Q', 'T', 'Q', 'T', 'T', 'Q'],
                   'Volume': [10, 20, 10, 20, 30, 10]})
df.assign(daily_volume=df.groupby('Time')['Volume'].transform('sum'))
Output:
   Price     Time Type  Volume  daily_volume
0  10000  2012.05    Q      10            40
1  10000  2012.05    T      20            40
2  10000  2012.05    Q      10            40
3  10000  2012.06    T      20            50
4  10000  2012.06    T      30            50
5  10000  2012.07    Q      10            10
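If the real Time values carry full dates rather than just the year and month shown here, one way to get per-day totals is to parse them and group on the normalized day instead of the raw string. A minimal sketch, assuming a 'YYYY.MM.DD HH:MM' format (adjust format= to whatever your data actually uses):
import pandas as pd

df = pd.DataFrame({'Price': [10000, 10000, 10000, 10000],
                   'Time': ['2012.05.01 09:30', '2012.05.01 10:00',
                            '2012.05.02 09:30', '2012.05.02 10:00'],
                   'Type': ['Q', 'T', 'Q', 'T'],
                   'Volume': [10, 20, 10, 20]})

# Parse the timestamps, then group by the calendar day (midnight-normalized)
df['Time'] = pd.to_datetime(df['Time'], format='%Y.%m.%d %H:%M')
df = df.assign(daily_volume=df.groupby(df['Time'].dt.normalize())['Volume'].transform('sum'))
print(df)
This mirrors the transform('sum') pattern from the snippet above; only the grouping key changes from the raw Time string to the parsed day.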
The DataFrame that I am working on has a column called "Brand" that contains the value "SEAT " with trailing white space. I managed to drop the white space, but I don't know how to put the new column back into the original DataFrame. I need to do this because I have to filter the original DataFrame by "SEAT" and show those rows.
I tried this:
import pandas as pd
brand_reviewed = df_csv2.Brand.str.rstrip()
brand_correct = 'SEAT'
brand_reviewed.loc[brand_reviewed['Brand'].isin(brand_correct)]
Thank you very much!
As far as I understand, you're trying to return the rows that match the pattern "SEAT".
You are not forced to create a new column. You can directly do the following:
df2 = df_csv2[df_csv2.Brand.str.rstrip() == "SEAT"]
print(df2)
You have done great. I will also mention another way to clean the white space. And also, if you just want to add a new column to your current DataFrame, just write the last line of this code.
import pandas as pd
brand_reviewed = pd.read_csv("df_csv2.csv")
data2 = brand_reviewed["Brand"].str.strip()
brand_reviewed["New Column"] = data2
If you have another query, let me know.
I have the below dataframe and I am trying to display how many rides there are per day.
But I can see that only "near_penn" is treated as a column, while "Date" is not.
c = df[['start day', 'near_penn', 'Date']]
c = c.loc[c['near_penn'] == 1]
pre_pandemic_df_new = pd.DataFrame()
pre_pandemic_df_new = c.groupby('Date').agg({'near_penn': 'sum'})
print(pre_pandemic_df_new)
print(pre_pandemic_df_new.columns)
Why doesn't it consider "Date" as a column?
How can I make Date a column of "pre_pandemic_df_new"?
I feel you can use the to_datetime method.
import pandas as pd
pre_pandemic_df_new["Date"]= pd.to_datetime(pre_pandemic_df_new["Date"])
Hope this works
Why doesn't it consider "Date" as a column?
Because the date is the index of your DataFrame.
How can I make Date a column of "pre_pandemic_df_new"?
you can try this:
pre_pandemic_df_new.reset_index(level=['Date'])
Once you have created your dataframe, you can try this to add new columns to the end of the dataframe, to test that it works before you make further adjustments:
df['Date_new'] = df['Date']
df['near_penn_new'] = df['near_penn']
OR
You can check the value of the first row corresponding to the first "Date" entry.
These are the first things that came to my mind. Hope it helps.
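For completeness, here is a small sketch of the as_index=False route, which keeps Date as an ordinary column from the start (df is the original dataframe from the question):
c = df[['start day', 'near_penn', 'Date']]
c = c.loc[c['near_penn'] == 1]

# as_index=False keeps the grouping key as a regular column instead of the index
pre_pandemic_df_new = c.groupby('Date', as_index=False).agg({'near_penn': 'sum'})
print(pre_pandemic_df_new.columns)  # 'Date' now appears among the columns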
In one of the code snippets, the authors provide the input as:
variants = [ 'rs425277', 'rs1571149', 'rs1240707', 'rs1240708', 'rs873927', 'rs880051', 'rs1878745', 'rs2296716', 'rs2298217', 'rs2459994' ]
However, I have similar values in one of the columns of a CSV file. I would like to know how I can supply that column as input, similar to the example above.
Thanks in advance
First, read your CSV into a pandas DataFrame:
df = pd.read_csv('data.csv')
Then, you can get a list from a pandas dataframe column:
col_one_list = df['column_one'].tolist()
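To tie it back to the snippet in the question, that list can then be passed wherever variants is expected. A short sketch (the column name 'rsid' below is hypothetical, use whatever your CSV calls it):
import pandas as pd

df = pd.read_csv('data.csv')
# 'rsid' is a hypothetical column name holding the rs IDs
variants = df['rsid'].tolist()
print(variants[:10])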
My Python code produces a pandas dataframe that looks as follows:
[screenshot of the dataframe]
I need to transform it into another format: loop through every row in the dataframe and output as many dataframes as there are rows in the table. Each dataframe should have an additional column, timestamp, and be named after the value in the "Type" column. So, for instance, I'd have
[screenshot of the desired output dataframes]
I am struggling with where to start. I hope someone here can advise me.
Here is some code for what you want to achieve. It takes a CSV file like yours, loops through the rows, adds a column with the current time, and saves each row in a separate CSV. Let me know if it works for you.
import pandas as pd
from datetime import datetime

# Give the path to your csv
df = pd.read_csv('C:/Users/username/Downloads/test.csv')

# Iterate over the rows in the dataframe
for index, row in df.iterrows():
    # Add a new column with the current time for this row
    df.loc[index, 'Timestamp'] = datetime.now().strftime('%c')
    print(df.loc[index])
    # Turn the row into a one-row dataframe
    df_new = df.loc[index].to_frame().T
    # Save the dataframe in a separate csv
    df_new.to_csv(f'C:/Users/username/Downloads/test_{index}.csv', index=False)
Pandas' .to_dict(orient='records') is your friend.
import pandas as pd
from datetime import datetime

list_of_final_dataframes = []
for record in df.to_dict(orient='records'):
    # Copy the record, add a timestamp field, and wrap it in a one-row dataframe
    record_with_timestamp = {**record, **{'timestamp': datetime.now()}}
    list_of_final_dataframes.append(pd.DataFrame([record_with_timestamp]))
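Since the question also asks for each one-row frame to be named after its "Type" value, one possible variation is to key them in a dict instead of a list (this reuses pd, datetime, and df from the snippet above, and assumes Type values are unique per row, otherwise later rows overwrite earlier ones):
# Collect the per-row dataframes keyed by their 'Type' value
frames_by_type = {}
for record in df.to_dict(orient='records'):
    record_with_timestamp = {**record, **{'timestamp': datetime.now()}}
    frames_by_type[record['Type']] = pd.DataFrame([record_with_timestamp])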
I am trying to create a dataframe where the column lengths are not equal. How can I do this?
I was trying to use groupby, but I don't think this is the right way.
import pandas as pd
data = {'filename':['file1','file1'], 'variables':['a','b']}
df = pd.DataFrame(data)
grouped = df.groupby('filename')
print(grouped.get_group('file1'))
Above is my sample code, the output of which is:
  filename variables
0    file1         a
1    file1         b
What can I do to just have one entry of 'file1' under 'filename'?
Eventually I need to write this to a csv file.
Thank you
If you only have one entry in a column, the other cells will be NaN. So you could just filter out the NaNs by doing something like df = df.loc[df["filename"].notnull()]
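If the aim is simply that repeated filenames show up only once when you write the CSV, another option is to blank out the duplicates before saving. A minimal sketch based on the sample data above:
import pandas as pd

data = {'filename': ['file1', 'file1'], 'variables': ['a', 'b']}
df = pd.DataFrame(data)

# Keep the first occurrence of each filename and blank out the repeats
df['filename'] = df['filename'].where(~df['filename'].duplicated(), '')
df.to_csv('output.csv', index=False)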