How to sum over sheets that have the same index in pandas? - python

I have an Excel file containing multiple sheets.
The data is hourly rainfall values, with longitude as the spatial index in the rows and latitude in the columns.
This is the excel file
I need to sum over all these sheets to get the daily rainfall data.
How can I do this with pandas in Python?

You can use sheets = pd.read_excel('myfile.xlsx', sheet_name=None). With sheet_name=None, this returns a dictionary of DataFrames keyed by sheet name, which you can iterate with for name, sheet in sheets.items(). The rest is the same as the solution @cs95 has provided: concat the sheets and then group by lat/long.
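A minimal sketch of that approach, assuming every sheet shares the same longitude index and latitude columns ('myfile.xlsx' stands in for your file):

import pandas as pd

# read every sheet into a dict of DataFrames keyed by sheet name
sheets = pd.read_excel('myfile.xlsx', sheet_name=None)

# stack the hourly sheets, then sum matching longitude rows;
# the latitude columns align automatically during concat
daily = pd.concat(sheets.values()).groupby(level=0).sum()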

Related

How to iterate through for loop and write each dataframe to different excel sheets

I'm looping through a list of JSONs and storing them in a dataframe. For each iteration I want to write the dataframe into Excel on a different sheet. How do I achieve this?
for item in data:
    # removing empty columns in raw data
    drop_none = lambda path, key, value: key is not None and value is not None
    cleaned = remap(item, visit=drop_none)
    new_data = flatten(cleaned)
    # my_df = new_data.dropna(axis='columns', how='all')  # drops columns with all NA values
    dfFromRDD2 = spark.createDataFrame(new_data)
I want to save the dataframe dfFromRDD2 to Excel, writing to a different sheet on each iteration.
Is there a way to do it using Python?
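One way is to keep a single pd.ExcelWriter open and give each iteration's frame its own sheet. A sketch, assuming data, spark, and the cleaning helpers from the question are in scope, and with 'output.xlsx' and the sheet names as placeholders:

import pandas as pd

with pd.ExcelWriter('output.xlsx') as writer:
    for i, item in enumerate(data):
        cleaned = remap(item, visit=drop_none)
        sdf = spark.createDataFrame(flatten(cleaned))
        # convert the Spark frame to pandas so to_excel can write it,
        # giving each iteration its own sheet
        sdf.toPandas().to_excel(writer, sheet_name=f'sheet_{i}', index=False)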

Split per attribute

I am trying to read a big CSV and then split it into smaller CSV files based on the unique values in the column team.
At first I created a new dataframe for each team, which generated one new txt file for each unique value in the team column.
Code:
import pandas as pd
df = pd.read_csv('combined.csv')
df = df[df.team == 'RED']
df.to_csv('RED.csv')
However I want to start from a single dataframe, read all unique 'teams', and create a .txt file for each team, with headers.
Is it possible?
pandas.DataFrame.groupby, when iterated without an aggregation, yields the group name and the sub-dataframe associated with each group in the groupby column.
The following code will create a file for the data associated with each unique value in the column used to group by.
Use f-strings to create a unique filename for each group.
import pandas as pd
# create the dataframe
df = pd.read_csv('combined.csv')
# groupby the desired column and iterate through the groupby object
for group, dataframe in df.groupby('team'):
    # save the dataframe for each group to a csv
    dataframe.to_csv(f'{group}.txt', sep='\t', index=False)

Extract multiple dataframes from dictionary with Python

I'm using the pandas library in Python.
I've taken an excel file and stored the contents in a data frame by doing the following:
path = r"filepath"
sheets_dict = pd.read_excel(path,sheet_name=None)
As there were multiple sheets, each containing a table of data with identical columns, I used pd.read_excel(path, sheet_name=None). This stored all the individual sheets in a dictionary, with the key for each value/sheet being the sheet name.
I now want to unpack the dictionary and place each sheet into a single data frame. I want the key of each sheet to become either part of a MultiIndex, so I know which key/sheet each table came from, or a new column that gives me the key/sheet name for each subset of the dataframe.
I've tried the following:
for k, df in sheets_dict.items():
    df = pd.concat([pd.DataFrame(df)])
    df['extract'] = k
However I'm not getting the results I want.
Any suggestions?
You can use the keys argument in pd.concat, which will set the keys of your dict as the index:
df = pd.concat(sheets_dict.values(),keys=sheets_dict.keys())
By default, pd.concat(sheets_dict) will also use the dict keys as the index.
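If you want the sheet name as an ordinary column rather than an index level, you can name the key level and then reset it; a small sketch (the level name 'sheet' is an arbitrary choice):

import pandas as pd

# concat with the dict keys as a named outer index level,
# then move that level into a regular column
combined = pd.concat(sheets_dict, names=['sheet'])
combined = combined.reset_index(level='sheet')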

Fill an existing Excel file with data from a Pandas DataFrame

I have a Pandas DataFrame with a bunch of rows and labeled columns.
I also have an Excel file which I prepared with one sheet which contains no data, only labeled columns in row 1, and each column is formatted as it should be: for example, if I expect percentages in one column then that column will automatically convert a raw number to a percentage.
What I want to do is fill the raw data from my DataFrame into that Excel sheet in such a way that row 1 remains intact so the column names remain. The data from the DataFrame should fill the Excel rows starting from row 2, and the pre-formatted columns should take care of converting the raw numbers to their appropriate type; hence filling the data should not override the column format.
I tried using openpyxl but it ended up creating a new sheet and overriding everything.
Any help?
If you're certain that the column order is the same, you can try this after opening the sheet with openpyxl:
df.to_excel(writer, startrow=1, index=False, header=False)
(startrow is zero-based, so startrow=1 starts the data in Excel row 2.)
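A fuller sketch of that approach (assuming pandas >= 1.4 for if_sheet_exists='overlay'; 'template.xlsx' and 'Sheet1' are placeholder names):

import pandas as pd

# open the existing workbook in append mode and overlay the prepared
# sheet, writing the data below the header row in row 1
with pd.ExcelWriter('template.xlsx', engine='openpyxl', mode='a',
                    if_sheet_exists='overlay') as writer:
    df.to_excel(writer, sheet_name='Sheet1', startrow=1,
                index=False, header=False)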
If the number and order of your columns is the same, you may try xlsxwriter, and also mention the sheet name you want to refresh:
df.to_excel('filename.xlsx', engine='xlsxwriter', sheet_name='sheetname', index=False)

How to fix the order of columns in dict

I have a series which I grouped, and now I want to save that series as a csv file with both the index and the values as two columns (index followed by values).
So I first tried to convert the series to a dataframe and then save the dataframe as csv.
s_group_count = df_page_concat.groupby(df_page_concat).count()
df_grouped_values = pd.DataFrame({"page_path": s_group_count.index, "count": s_group_count.values})
The problem is that, since a dict is used to create the dataframe and dicts are not ordered, the count (the values of the series) is added as the first column, while I want the index as the first column and the values (count) as the second.
Any advice on how to fix the order, and on whether this is the most optimal way to create a csv out of a series with the index stored as another column?
Use an OrderedDict from the collections module; this has been particularly helpful for me in groupby.agg operations to enforce column order. So pass in:
from collections import OrderedDict
OrderedDict([("page_path", s_group_count.index), ("count", s_group_count.values)])
