So let's say I have a pandas DataFrame with three columns: Account Number, Date, and Volume.
I want to create a new DataFrame with the same columns, filtered to a prompt I choose (in this case 2022-08-17), covering all accounts.
In reality the sheet is much larger and has a lot of accounts.
See the example below:
Thank you.
IIUC, you need:
(df[df['Prompt'].eq('2022-08-17')]
    .groupby(['Account', 'Prompt'], as_index=False)
    .sum()
)
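A self-contained sketch of the answer above on made-up data. It assumes the columns are literally named Account, Prompt and Volume, as in the answer's code (the question calls them Account Number, Date and Volume, so rename to match your sheet), and that the prompt dates are stored as strings.

```python
import pandas as pd

# Hypothetical sample data; column names follow the answer's code.
df = pd.DataFrame({
    "Account": ["A1", "A1", "A2", "A3", "A2"],
    "Prompt": ["2022-08-17", "2022-08-17", "2022-08-17", "2022-08-18", "2022-08-18"],
    "Volume": [10, 5, 7, 3, 4],
})

result = (
    df[df["Prompt"].eq("2022-08-17")]                # keep only rows for the chosen prompt
    .groupby(["Account", "Prompt"], as_index=False)  # one row per account for that prompt
    .sum()                                           # add up Volume within each account
)
print(result)
```

Accounts with no rows on the chosen prompt (A3 here) simply do not appear in the result.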
I have created a DataFrame with pandas.
It has more than 1000 rows.
I want to merge rows that share the same (overlapping) values in a column, like merged cells.
For convenience, here are example screenshots made in Excel.
I want to produce that layout in Python.
I want to turn the data above into the layout below:
This should be as simple as setting the index.
df = df.set_index('Symbol', append=True).swaplevel(0,1)
Output should be as desired.
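A small illustration on hypothetical data (the Price column and its values are invented). After the call, Symbol is the outer index level; pandas prints a repeated outer label only once (controlled by the display.multi_sparse option), which gives the merged-cell look, and DataFrame.to_excel writes such an index with merged cells by default (merge_cells=True).

```python
import pandas as pd

# Hypothetical data: several rows share the same Symbol.
df = pd.DataFrame({
    "Symbol": ["AAA", "AAA", "BBB", "BBB", "BBB"],
    "Price": [1.0, 1.1, 2.0, 2.1, 2.2],
})

# Append Symbol to the existing RangeIndex, then swap the two levels
# so Symbol becomes the outer level and the row number the inner one.
out = df.set_index("Symbol", append=True).swaplevel(0, 1)
print(out)
```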
This may be a really easy question, but I am having a hard time figuring it out.
I have two pandas DataFrames that I am trying to join; here is a snippet of the information in each:
What I am trying to accomplish is to add the name shown in the first DataFrame to each row of the second; however, when I try pandas.merge, it only matches one row.
Let's suppose that df_1 is the DataFrame in which you have columns named name and networkId, and df_2 is the DataFrame to which you want to attach the name information via the key id.
Then,
merged = df_2.merge(df_1[["networkId","name"]], left_on="id", right_on="networkId", how="left")
print(merged.head())
If this is not in line with your example, please provide some initial data.
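A minimal sketch with invented data, assuming df_1 has one row per networkId. With how="left", every row of df_2 keeps its place and each repeated id receives the name; an id missing from df_1 (103 here) gets NaN.

```python
import pandas as pd

# Hypothetical lookup table: one row per network.
df_1 = pd.DataFrame({"networkId": [101, 102], "name": ["North", "South"]})

# Hypothetical detail table: several rows can share the same id.
df_2 = pd.DataFrame({"id": [101, 101, 102, 103], "value": [5, 6, 7, 8]})

merged = df_2.merge(
    df_1[["networkId", "name"]],
    left_on="id",
    right_on="networkId",
    how="left",  # keep every row of df_2, even ids with no match
)
print(merged)
```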
I have an Excel sheet in the format below.
Note: the values in the Column Name column are dynamic. In the current example, 10 records are shown; another data set can have a different number of column names.
I want to convert the rows into columns as below.
Is there an easy option in Python pandas to handle this scenario?
Thanks @juhat for the suggestion of a pivot table. I was able to achieve the intended result with this code:
import pandas as pd
fsdData = pd.read_csv("py_fsd.csv")
# one row per "msg Srl", one column per distinct "Column Name"
wide = fsdData.pivot(index="msg Srl", columns="Column Name", values="Value")
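A sketch on invented long-format data (the A/B/C column names and the values are placeholders). pivot creates one output column per distinct value in Column Name, so a varying number of column names is handled automatically. Note that pivot raises an error if the same (msg Srl, Column Name) pair occurs twice; pivot_table with an aggfunc is the usual alternative in that case.

```python
import pandas as pd

# Hypothetical long-format data: one row per (message, column name) pair.
fsdData = pd.DataFrame({
    "msg Srl": [1, 1, 1, 2, 2, 2],
    "Column Name": ["A", "B", "C", "A", "B", "C"],
    "Value": ["x1", "y1", "z1", "x2", "y2", "z2"],
})

wide = fsdData.pivot(index="msg Srl", columns="Column Name", values="Value")
wide = wide.reset_index()   # turn "msg Srl" back into an ordinary column
wide.columns.name = None    # drop the leftover "Column Name" axis label
print(wide)
```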
EDIT: Using the advanced search in Excel (under the Data tab), I have been able to create a list of unique company names, and I am now able to SUMIF based on the cell containing the company's name!
Disclaimer: Any Python solutions would be greatly appreciated as well, pandas specifically!
I have 60,000 rows of data containing information about grants awarded to companies.
I am planning to create a Python dictionary to store each unique company name with its total grant $ given (agreemen_2) and its location coordinates. Then I want to display this using Dash (Plotly) on a live Mapbox map of Canada.
First things first: how do I calculate and store the total value that was awarded to each company?
I have seen SUMIF in other solutions, but am unsure how to output the result to a new column, if that makes sense.
One potential solution I thought of was to create a new column of unique company names and, next to it, SUMIF all the matching cells in column D.
PYTHON STUFF SO FAR
With the code below, I take a much messier spreadsheet, drop duplicates, sort by company name, and create a new pandas DataFrame with the relevant columns.
corp_df is the cleaned-up DataFrame that I want to work with.
recipien_4 is the company's unique ID number; as you can see, it repeats with each grant awarded. Folia Biotech in the screenshot shows a duplicate grant, as confirmed by a column I did not include in the screenshot. There are quite a few duplicates, as seen in the screenshot.
import pandas as pd
in_file = '2019-20 Grants and Contributions.csv'
# create dataframe
df = pd.read_csv(in_file)
# sort by company name (recipien_2)
df.sort_values("recipien_2", inplace=True)
# remove rows with a duplicate agreemen_1, keeping the first
df.drop_duplicates(subset='agreemen_1', keep='first', inplace=True)
# full name, id, grant $, longitude, latitude
corp_df = df[['recipien_2', 'recipien_4', 'agreemen_2', 'longitude', 'latitude']]
# creates a dict with only 1 copy of each corporation name, all values 0
corp_dict = {}
for name in corp_df['recipien_2']:
    if name not in corp_dict:
        corp_dict[name] = 0
Any tips or tricks would be greatly appreciated. .itertuples() didn't seem like a good solution, as I am unsure how to filter and compare data with it, or whether datatypes are preserved. But feel free to prove me wrong, haha.
I thought perhaps there was a better way to tackle this problem, such as doing it directly in Excel rather than iterating through the rows of a pandas DataFrame. This is a pretty open question, so thank you for any help or direction you think is best!
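As a side note on the dictionary-building loop in the question: dict.fromkeys keeps a single entry per distinct key (in first-seen order), so it produces the same company-to-0 dictionary in one line. The data below is a made-up stand-in for corp_df.

```python
import pandas as pd

# Hypothetical stand-in for corp_df: company names repeat once per grant.
corp_df = pd.DataFrame({"recipien_2": ["Acme", "Acme", "Folia Biotech", "Zeta"]})

# Duplicate keys collapse automatically, so each company appears once with value 0.
corp_dict = dict.fromkeys(corp_df["recipien_2"], 0)
print(corp_dict)
```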
I can see that you are using pandas to read the CSV file, so you can use the groupby method.
You can create a new DataFrame by grouping on the company name and summing the grant amounts, like this:
dfnew = df.groupby('recipien_2')['agreemen_2'].sum()
dfnew then holds the total grant value for each company.
pandas groupby documentation:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
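A sketch extending the grouped sum toward what the question describes (total grant per company plus coordinates, as a dictionary for the map), using named aggregation. The rows are invented, and taking the "first" longitude/latitude assumes each company has a single location.

```python
import pandas as pd

# Hypothetical sample of corp_df: one row per grant, companies repeat.
corp_df = pd.DataFrame({
    "recipien_2": ["Acme", "Acme", "Folia Biotech"],
    "recipien_4": [1, 1, 2],
    "agreemen_2": [1000.0, 2500.0, 400.0],
    "longitude": [-79.4, -79.4, -123.1],
    "latitude": [43.7, 43.7, 49.3],
})

totals = corp_df.groupby("recipien_2", as_index=False).agg(
    total_grant=("agreemen_2", "sum"),  # total $ per company
    longitude=("longitude", "first"),   # assumes one location per company
    latitude=("latitude", "first"),
)

# The dictionary the question planned to build, keyed by company name.
corp_dict = totals.set_index("recipien_2").to_dict(orient="index")
print(corp_dict)
```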
Using groupby followed by a sum may be the best option for you:
corp_df = df.groupby(by=['recipien_2', 'longitude', 'latitude'])['agreemen_2'].sum()
# if you want to turn the index into columns, you can add this afterwards as well:
corp_df = corp_df.reset_index()
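A small demonstration on invented rows of what reset_index does here, and of a caveat with grouping on coordinates: because longitude and latitude are part of the key, a company whose rows carry even slightly different coordinates (the third Acme row) ends up in separate groups.

```python
import pandas as pd

# Hypothetical rows; the third Acme row has a slightly different longitude.
df = pd.DataFrame({
    "recipien_2": ["Acme", "Acme", "Acme", "Folia Biotech"],
    "longitude": [-79.4, -79.4, -79.5, -123.1],
    "latitude": [43.7, 43.7, 43.7, 49.3],
    "agreemen_2": [1000.0, 2500.0, 300.0, 400.0],
})

# Grouped sum: the three keys become a MultiIndex on the resulting Series.
corp_df = df.groupby(by=["recipien_2", "longitude", "latitude"])["agreemen_2"].sum()

# reset_index turns those index levels back into ordinary columns.
corp_df = corp_df.reset_index()
print(corp_df)
```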
I have two dataframes.
DF1 looks like:
DF2 looks like:
I need to find the mean of Question_3 from DF2 for each ID_1/ID_2 pair, then add it as Question_3_Mean to the row of DF1 with the matching ID_1 and ID_2.
I feel that this is something relatively trivial to do in pandas, but I am not sure what terminology to search for in order to find out how.
What I did originally was create a new sheet in Excel, manually combine the two IDs (with formulas), use a pivot table to get the averages, and then use a VLOOKUP to match the results. I then used that as the DataFrame for my seaborn chart.
I'd like to do all of this in pandas, though, because this "matching" is a task I have to do often and I want to cut out that manual step.
Looks like you can try groupby() then merge:
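# average only Question_3 per (ID_1, ID_2) pair, rename it to Question_3_Mean, then join
df1.merge(df2.groupby(['ID_1', 'ID_2'])[['Question_3']].mean().add_suffix('_Mean'),
          on=['ID_1', 'ID_2'])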
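A runnable sketch of the groupby-then-merge approach on invented data (the Other column and all values are placeholders). It selects only Question_3 before averaging and calls reset_index so the merge is a plain column-to-column join; the answer's version instead relies on merge accepting index level names in on, which pandas supports since version 0.23. how="left" keeps every DF1 row even if a pair has no answers in DF2.

```python
import pandas as pd

# Hypothetical DF1: one row per (ID_1, ID_2) pair.
df1 = pd.DataFrame({"ID_1": [1, 1, 2], "ID_2": ["a", "b", "a"], "Other": [10, 20, 30]})

# Hypothetical DF2: several answers per (ID_1, ID_2) pair.
df2 = pd.DataFrame({
    "ID_1": [1, 1, 1, 2, 2],
    "ID_2": ["a", "a", "b", "a", "a"],
    "Question_3": [4, 2, 5, 1, 3],
})

# Mean of Question_3 per pair; add_suffix renames only the value column.
means = (
    df2.groupby(["ID_1", "ID_2"])[["Question_3"]]
    .mean()
    .add_suffix("_Mean")
    .reset_index()  # ID_1 and ID_2 become ordinary columns again
)

result = df1.merge(means, on=["ID_1", "ID_2"], how="left")
print(result)
```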