Processing excel sheet in python

I have an excel sheet with a list of experiments, as shown in the picture below. How can I access specific rows and columns to find the mean and standard deviation? I am able to load the excel file and read the data using pandas, but I am not sure where to go from there. Ideally, the code should be able to process sheets with many experiment results listed.
Excel input
For output, I would like a table summarizing the results, as shown in the picture below:
Result Summary

I am not sure if this is an efficient solution, but it will do.
import pandas as pd

# read the sheet without headers and transpose it
df = pd.read_excel("~/Desktop/delete.xlsx", header=None).T
df.dropna(axis=1, inplace=True)                # drop columns containing missing values
df.rename(columns=df.iloc[0], inplace=True)    # use the first row as column names
df.drop(df.index[0], inplace=True)             # and remove that row from the data
cols = df.columns.unique()
# reshape to one row per experiment; 3 rows x 8 columns matches this particular sheet
df1 = pd.DataFrame(df.values.reshape(3, 8), columns=cols)
# mean and std over the measurement columns only
df1['mean'] = df1[df1.columns[~df1.columns.isin(['Test', 'Sample', 'Site'])]].mean(axis=1)
df1['std'] = df1[df1.columns[~df1.columns.isin(['Test', 'Sample', 'Site'])]].std(axis=1)
print(df1[['Sample', 'mean', 'std']])
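Since the question asks for code that scales to sheets with many experiments, the hardcoded reshape(3, 8) could be derived from the data instead. A minimal sketch, assuming every experiment repeats the same block of columns:

n_cols = len(cols)                       # columns per experiment block
n_rows = df.values.size // n_cols        # number of experiments in the sheet
df1 = pd.DataFrame(df.values.reshape(n_rows, n_cols), columns=cols)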

Related

How to read excel data only after a string is found but without using skiprows

I want to read the data after the string "Executed Trade". I want to do that dynamically, not using "skiprows". I know openpyxl can be an option, but I am still struggling to do so. Could you please help me with this, as I have many files like the one shown in the image.
Try:
import pandas as pd

# change the Excel filename and the two mentions of 'col1' to whatever the column is
df = pd.read_excel('dictatorem.xlsx')
# keep only the rows after the one containing the marker string
df = df.iloc[df.col1[df.col1 == 'Executed Trades'].index.tolist()[0] + 1:]
# promote the first remaining row to the header and drop it from the data
df.columns = df.iloc[0]
df = df[1:]
df = df.reset_index(drop=True)
print(df)
Example input/output:
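The question mentions openpyxl as an option; a minimal sketch of the same idea with openpyxl, assuming the marker string sits in the first column, could look like this:

import pandas as pd
from openpyxl import load_workbook

wb = load_workbook('dictatorem.xlsx', read_only=True)
ws = wb.active
rows = list(ws.iter_rows(values_only=True))
# locate the row whose first cell holds the marker string
start = next(i for i, row in enumerate(rows) if row[0] == 'Executed Trades')
# the row after the marker is the header, everything below it is the data
df = pd.DataFrame(rows[start + 2:], columns=rows[start + 1])
print(df)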

Multiindex Dataframe, Pandas

I am trying to manipulate data from an excel file, however it has merged headings for the columns. I managed to transform them in pandas. Please see an example of the original data below.
So I transformed it to this format.
My final goal is to get the format below and plot brand items and their sales quantities and prices over the period, however I don't know how to access info in a multiindex dataframe. Could you please suggest something? Thanks.
My code:
import pandas as pd

# read both header rows as a two-level MultiIndex
df = pd.read_excel('path.xls', sheet_name='data', header=[0, 1])
# blank out the auto-generated 'Unnamed: ...' labels in the top level
a = df.columns.get_level_values(0).to_series()
b = a.mask(a.str.startswith('Unnamed')).fillna('')
df.columns = [b, df.columns.get_level_values(1)]
df.drop(0, inplace=True)
Try pandas groupby or pivot_table. The pivot table takes index, columns, values and aggfunc arguments. It is really nice for summarizing data.
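The answer is terse, so here is a small self-contained sketch of what a pivot_table summary looks like; the data and the column names 'Brand', 'Period' and 'Sales' are made up purely for illustration:

import pandas as pd

# tiny synthetic frame just to illustrate pivot_table's index/columns/values/aggfunc
long_df = pd.DataFrame({
    'Brand':  ['A', 'A', 'B', 'B'],
    'Period': ['Q1', 'Q2', 'Q1', 'Q2'],
    'Sales':  [10, 12, 7, 9],
})
summary = long_df.pivot_table(index='Brand', columns='Period',
                              values='Sales', aggfunc='sum')
print(summary)

# for the MultiIndex frame built above, an individual column is addressed by a tuple
# of labels, e.g. df[('Brand A', 'Sales')] (labels hypothetical)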

Import several sheets from the same excel into one dataframe in pandas

I have one excel file with several identically structured sheets in it (same headers and number of columns) (sheet names: 01, 02, ..., 12).
How can I get this into one dataframe?
Right now I would load them all separately with:
df1 = pd.read_excel('path.xls', sheet_name='01')
df2 = pd.read_excel('path.xls', sheet_name='02')
...
and would then concatenate them.
What is the most pythonic way to do this and get one dataframe with all the sheets directly? Also assuming I do not know every sheet name in advance.
read the file as:
collection = pd.read_excel('path.xls', sheet_name=None)
combined = pd.concat([value.assign(sheet_source=key)
                      for key, value in collection.items()],
                     ignore_index=True)
sheet_name=None ensures all the sheets are read in.
collection is a dictionary, with the sheet name as the key and the actual data as the value. combined uses the pandas concat method to get you one dataframe. I added the extra column sheet_source in case you need to track where the data for each row comes from.
You can read more about it in the pandas documentation.
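For illustration, with the sheet names from the question the dictionary would look roughly like this:

# collection behaves like {'01': df_01, '02': df_02, ..., '12': df_12}
first_sheet = collection['01']                # access one sheet by its name
print(combined['sheet_source'].unique())      # see which sheet each row came from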
you can use:
df_final = pd.concat([pd.read_excel('path.xls', sheet_name="{:02d}".format(sheet)) for sheet in range(1, 13)], axis=0)

Exporting sorted/adjusted data to excel with python

I have a simple dataset that I have sorted in a dataframe based on 'category'.
The sorting has gone well. But now I'd like to export the sorted/adjusted dataset in .xlsx format, that is, the dataset that has been categorized, not the dataset as it was read in from excel.
I have tried the following:
import pandas as pd
df = pd.read_excel("python_sorting_test.xlsx",index_col=[1])
df.head()
print(df.sort_index(level=['Category'], ascending=True))
df.to_excel(r'C:\Users\Laptop\PycharmProjects\untitled8\export_dataframe.xlsx', header=True)
The issue: It doesn't store the sorted/adjusted dataset.
Actually, you don't save the result of sort_index. You can add inplace=True (note that with inplace=True the call returns None, so there is nothing useful to print):
df.sort_index(level=['Category'], ascending=True, inplace=True)
or save the result of df.sort_index:
df = df.sort_index(level=['Category'], ascending=True)
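Putting it together, the export then picks up the sorted frame:

df = df.sort_index(level=['Category'], ascending=True)
df.to_excel(r'C:\Users\Laptop\PycharmProjects\untitled8\export_dataframe.xlsx', header=True)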

Create new dataframe column equal to the value above row = 'Name'

I combined several excel worksheets into a new workbook using pandas, which looks like the following:
Example Excel Workbook
I am now trying to clean up the workbook/dataframe using python (for practice) by creating a new column equal to the table name, which is listed in col[0] above 'Name'. I know how to do it in excel, but am trying to learn how to transform the data using python. There are currently 7051 rows in the dataset, if that helps.
The final outcome would look something like this:
Example Solution
Please let me know if you have any ideas on how to further clean it up using python. I have the excel solution but am really hoping to learn how to do it with python.
Example of code used to combine worksheets:
import pandas as pd
import numpy as np
import os, collections, csv
from os.path import basename

df = []
f = 'ex_DATA.xlsx'
numberOfSheets = 22  # Modify this.
for i in range(1, numberOfSheets + 1):
    data = pd.read_excel(f, sheet_name='TAB_' + str(i), header=None)
    df.append(data)
final = "ex_DATA2.xlsx"  # Path to the file in which the new sheet will be saved.
df = pd.concat(df)
df = df.dropna(axis=0, how='all')
df.to_excel(final, header=None, index=None)
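One possible approach, as a sketch only: it assumes the combined frame still has the default integer column labels (header=None) and that each sub-table's name sits in column 0 on the row immediately above its 'Name' header row:

df = df.reset_index(drop=True)
is_header = df[0].eq('Name')                    # the 'Name' header row of each sub-table
table_name = df[0].shift(1).where(is_header)    # the value sitting just above each header row
df['table_name'] = table_name.ffill()           # carry the name down through each table's rows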
