Using pandas to convert excel sheet with formulas to csv - python

I am new to pandas and am using it to convert an Excel sheet with formulas to CSV.
As expected, I want to copy just the values. However, my CSV contains the header, but every other cell is reported as "0".
import pandas as pd
data_xls = pd.read_excel('my_result.xlsx', 'Dashboard', index_col=None)
data_xls.to_csv('myabc_result.csv', encoding='utf-8',index=False)
The formula in the original Excel sheet looks like this (surprisingly, every cell has a similar formula):
=INDEX(INDIRECT("Results!"&MATCH(INDIRECT(SUBSTITUTE(ADDRESS(1,COLUMN(),4),"1","")&"$1"),Results!$A:$A,0)&":"&MATCH(INDIRECT(SUBSTITUTE(ADDRESS(1,COLUMN(),4),"1","")&"$1"),Results!$A:$A,0)),1,INDIRECT("$A"&ROW())+1)
Thanks,

Related

Reading an excel sheet containing hyperlinks using pythons pandas.read_excel

I made an Excel sheet from a pandas DataFrame to generate text with clickable URLs, using the following code:
import pandas as pd
df = pd.DataFrame({'link':['=HYPERLINK("https://ar.wikipedia.org/wiki/","wikipidia")',
'=HYPERLINK("https://www.google.com", "google")']})
df.to_excel('links.xlsx')
Now I need to read the generated Excel sheet (links.xlsx) using pandas.read_excel, so I tried the following code:
import pandas as pd
excelDf=pd.read_excel('links.xlsx')
print(excelDf)
But this generates a DataFrame with all zeroes in the link column.
Is there another way I can read the Excel file I created, or another way to create an Excel sheet containing clickable links on text using a pandas DataFrame that is readable?
You can do the same with a CSV file, which is cleaner (it avoids the Excel issues).
# %% write the data
import pandas as pd
df = pd.DataFrame({'link':['=HYPERLINK("https://ar.wikipedia.org/wiki/","wikipidia")',
                           '=HYPERLINK("https://www.google.com", "google")']})
# the file is CSV, so give it a .csv extension
df.to_csv('F:\\links.csv')
# %% read the data
import pandas as pd
excelDf = pd.read_csv('F:\\links.csv')
print(excelDf)
result:
   Unnamed: 0                                               link
0           0  =HYPERLINK("https://ar.wikipedia.org/wiki/","w...
1           1     =HYPERLINK("https://www.google.com", "google")
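If the file has to stay an .xlsx workbook, one option is to read the formula text with openpyxl instead of pandas. The following is only a minimal sketch: it assumes openpyxl is installed, that the header sits in the first row of the active sheet, and `read_formula_column` is a hypothetical helper name, not part of any library.

```python
from openpyxl import load_workbook


def read_formula_column(path, header):
    """Return the raw cell contents (formula text included) below `header`.

    Hypothetical helper: assumes the header is in row 1 of the active sheet.
    """
    # data_only=False (the default) returns formula strings such as
    # '=HYPERLINK(...)' rather than cached results.
    wb = load_workbook(path, data_only=False)
    ws = wb.active
    # Scan the columns and pick the one whose first cell matches the header.
    for col in ws.iter_cols(min_row=1, max_row=ws.max_row):
        if col[0].value == header:
            return [cell.value for cell in col[1:]]
    raise KeyError(header)
```

Calling `read_formula_column('links.xlsx', 'link')` should then give back the HYPERLINK strings, which can be placed in a DataFrame if needed.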

Is there a way to export individual sheets in an Excel workbook to separate csv files using pandas?

I have 5 sheets in an Excel workbook. I would like to export each sheet to CSV using Python libraries.
Each sheet shows the sales for one year (for example, 2019), and I have named the sheets according to the year they represent.
I have read the Excel spreadsheet using pandas. I used a for loop because I want to save each CSV file as the_sheet_name.csv. This is my code in a Jupyter notebook:
import pandas as pd
df = pd.DataFrame()
myfile = 'sampledata.xlsx'
xl = pd.ExcelFile(myfile)
for sheet in xl.sheet_names:
    df_tmp = xl.parse(sheet)
    print(df_tmp)
    df = df.append(df_tmp, ignore_index=True, sort=False)
    csvfile = f'{sheet_name}.csv'
    df.to_csv(csvfile, index=False)
Executing the code produces just one CSV file that contains the data from all the sheets. I would like to know if there is a way to change my code so that it produces an individual file per sheet, e.g. sales2011.csv, sales2012.csv and so on.
Using sheet_name=None returns a dictionary of DataFrames, keyed by sheet name:
dfs = pd.read_excel('file.xlsx', sheet_name=None)
for sheet_name, data in dfs.items():
    data.to_csv(f"{sheet_name}.csv")

Using pandas to replace data in excel sheet

I tried to come up with a way to copy data from a sheet in an Excel file, as follows:
import pandas as pd
origionalFile = pd.ExcelFile('AnnualReport-V5.0.xlsx')
Transfers = pd.read_excel(origionalFile, 'Sheet1')
I have another Excel file, named 'AnnualReport-V6.0.xlsx', with existing data in a sheet named 'Transfers'. I want to use the DataFrame I created to replace the data in the 'Transfers' sheet of 'AnnualReport-V6.0.xlsx' starting from column B, leaving column A as it is.
I did a few searches; the closest to what I want is this:
Modifying an excel sheet in a excel book with pandas
but it does not let me keep column A of the original sheet (column A has some formulas that I want to keep). Any idea how to do it? Thanks
Would reading column A and inserting it into the fresh data you want to write solve your problem?
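Another possible route is to open the target workbook with openpyxl and overwrite only the cells from column B onward, so that column A is never touched. This is a rough sketch under several assumptions: openpyxl is installed, row 1 holds headers that should stay as they are, and `overwrite_from_column_b` and its `first_row` parameter are hypothetical names introduced for illustration.

```python
import pandas as pd
from openpyxl import load_workbook


def overwrite_from_column_b(df, workbook_path, sheet_name, first_row=2):
    """Write df into sheet_name starting at column B; column A is left alone.

    Hypothetical helper: assumes headers live in row 1 and are kept as-is.
    Rows below the end of df are not cleared.
    """
    # Loading without data_only keeps formulas as formula text,
    # so the formulas in column A survive the save.
    wb = load_workbook(workbook_path)
    ws = wb[sheet_name]
    for r, row in enumerate(df.itertuples(index=False), start=first_row):
        for c, value in enumerate(row, start=2):  # column 2 is column B
            ws.cell(row=r, column=c, value=value)
    wb.save(workbook_path)
```

With the names from the question, the call would look like `overwrite_from_column_b(Transfers, 'AnnualReport-V6.0.xlsx', 'Transfers')`. Working on a copy of the file first is a sensible precaution, since the workbook is saved in place.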

Loading only one sheet to dataframe

I am trying to read an Excel sheet into a DataFrame using the pandas read_excel method. The Excel file contains 6-7 different sheets, 2-3 of which are very large. I only want to read one sheet from the file.
If I copy the sheet out into its own file and read that, the read time drops by 90%.
I have read that xlrd, which pandas uses, always loads the whole sheet into memory. I cannot change the format of the input.
Can you please suggest a way to improve the performance?
It's quite simple. Just do this.
import pandas as pd
xls = pd.ExcelFile('C:/users/path_to_your_excel_file/Analysis.xlsx')
df1 = pd.read_excel(xls, 'Sheet1')
print(df1)
# etc.
df2 = pd.read_excel(xls, 'Sheet2')
print(df2)
import pandas as pd
df = pd.read_excel('YourFile.xlsx', sheet_name = 'YourSheet_Name')
Just pass the name of the sheet you want to read and the path to your Excel file.
Use openpyxl in read-only mode. See http://openpyxl.readthedocs.io/en/default/pandas.html
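The read-only suggestion could look roughly like the sketch below. It assumes openpyxl is installed, that the first row of the sheet holds the column headers, and that the sheet is not empty; `read_single_sheet` is a hypothetical helper name.

```python
import pandas as pd
from openpyxl import load_workbook


def read_single_sheet(path, sheet_name):
    """Load one worksheet into a DataFrame using openpyxl's read-only mode.

    Hypothetical helper: assumes the first row contains the column headers.
    """
    # read_only=True streams rows instead of building every sheet in memory;
    # data_only=True returns cached formula results rather than formula text.
    wb = load_workbook(path, read_only=True, data_only=True)
    try:
        ws = wb[sheet_name]
        rows = ws.iter_rows(values_only=True)
        header = next(rows)  # first row becomes the column names
        return pd.DataFrame(list(rows), columns=list(header))
    finally:
        # Read-only workbooks keep the file handle open until closed.
        wb.close()
```

Whether this is actually faster than read_excel depends on the file and on the pandas version in use, so it is worth timing both on the real workbook.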

Is it possible to read data from an Excel sheet in Python using Xlsxwriter? If so how?

I'm doing the following calculation.
worksheet.write_formula('E5', '=({} - A2)'.format(number))
I want to print the value in E5 on the console. Can you help me do it? Is it possible with XlsxWriter, or should I use a different library to do the same?
It is not possible to read data from an Excel file using XlsxWriter.
There are some alternatives listed in the documentation.
If you want to use xlsxwriter to manipulate formats and formulas in ways that pandas can't, you can at least import your Excel file into an xlsxwriter object using pandas. Here's how.
import pandas as pd
import xlsxwriter
def xlsx_to_workbook(xlsx_in_file_url, xlsx_out_file_url, sheetname):
    """
    Read EXCEL file into xlsxwriter workbook worksheet
    """
    workbook = xlsxwriter.Workbook(xlsx_out_file_url)
    worksheet = workbook.add_worksheet(sheetname)
    # read my_excel into a pandas DataFrame
    df = pd.read_excel(xlsx_in_file_url)
    # A list of column headers
    list_of_columns = df.columns.values
    for col in range(len(list_of_columns)):
        # write column headers.
        # if you don't have column headers remove the following line and use "row" rather than "row+1" in the if/else statements below
        worksheet.write(0, col, list_of_columns[col])
        for row in range(len(df)):
            # Test for NaN (NaN != NaN), otherwise worksheet.write throws an error.
            if df[list_of_columns[col]][row] != df[list_of_columns[col]][row]:
                worksheet.write(row+1, col, "")
            else:
                worksheet.write(row+1, col, df[list_of_columns[col]][row])
    return workbook, worksheet
# Create a workbook
# read your Excel file into a workbook/worksheet object to be manipulated with xlsxwriter
# this assumes that the EXCEL file has column headers
workbook, worksheet = xlsx_to_workbook("my_excel.xlsx", "my_future_excel.xlsx", "My Sheet Name")
###########################################################
# Do all your fancy formatting and formula manipulation here
###########################################################
# write/close the file my_future_excel.xlsx
workbook.close()
Not answering this specific question, just a suggestion: simply try pandas to read the data from Excel. You can then manipulate the data using the built-in pandas DataFrame methods:
df = pd.read_excel(file_, index_col=None, header=0)
df is a pandas.DataFrame; see the pandas cookbook for what you can do with it. If you are not familiar with this package, you might be surprised by this awesome Python module.
