How to write multiIndex-columns excel with pandas - python

I want to export a DataFrame with MultiIndex columns to Excel.
I read an Excel file (https://drive.google.com/open?id=1G6nE5wiNRf5sip22dQ8dfhuKgxzm4f8E) and exported it with the following code:
import pandas as pd
df = pd.read_excel('sample.xlsx')
df.to_excel('sample2.xlsx', index=False)
However, sample2.xlsx has a different format from sample.xlsx.
For example, there are merged cells in sample.xlsx but not in sample2.xlsx, and the blank cells in sample.xlsx become Unnamed:xx.
You can view sample2.xlsx here.
How can I solve this problem?
Thank you.

Since you are working with xlsx files, the openpyxl package will do the job.
import openpyxl
wb_obj = openpyxl.load_workbook('sample.xlsx')
wb_obj.save('sample2.xlsx')
Further reading on openpyxl
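If the data has to pass through pandas rather than being copied wholesale, one option is to tell read_excel which rows form the header. The sketch below assumes a two-row header; the column labels and the file name multi_header.xlsx are made up for illustration, since the contents of sample.xlsx are not shown here. It keeps the index when writing because some pandas versions refuse index=False together with MultiIndex columns.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical two-level header standing in for the merged cells in sample.xlsx.
columns = pd.MultiIndex.from_tuples(
    [("Sales", "Q1"), ("Sales", "Q2"), ("Costs", "Q1"), ("Costs", "Q2")]
)
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=columns)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "multi_header.xlsx"
    # Keep the index: some pandas versions reject index=False with MultiIndex columns.
    df.to_excel(path)
    # header=[0, 1] marks the first two rows as the column header,
    # so repeated top-level labels are not turned into "Unnamed" columns.
    round_trip = pd.read_excel(path, header=[0, 1], index_col=0)

print(round_trip.columns.nlevels)
print(round_trip.columns.tolist())
```

With the default merge_cells=True, to_excel writes the repeated top-level labels as merged cells.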

Related

Can't see csv file (converted from df) in files

After saving my dataframe to a csv in a specific location, the csv file doesn't appear in the location I saved it to. Is there any reason why it possibly is not showing?
Here is the code to save my dataframe to csv:
df.to_csv(r'C:\Users\gibso\OneDrive\Documents\JOSEPH\export_dataframe.csv', index = False)
Even saving an empty df does not seem to work.
import pandas as pd
olympics={}
df = pd.DataFrame(olympics)
df.to_csv(r'C:\Users\gibso\OneDrive\Documents\JOSEPH\export_dataframe.csv', index = False)
Thanks for the help!
I would rather use the module openpyxl. Example of saving:
import openpyxl
workbook = openpyxl.Workbook()
sheet = workbook.active
# Work on your workbook. Once finished:
workbook.save(file_name) # file_name is a variable you must define
Don't forget to install openpyxl with pip first!
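For the CSV question itself, it may help to confirm from Python where the file actually landed before looking for it in a file browser. This is a minimal sketch: the temporary folder, the "exports" subfolder, and the sample data are placeholders, not the asker's real path or data.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical target folder; substitute the real destination.
    out_path = Path(tmp_dir) / "exports" / "export_dataframe.csv"
    # to_csv does not create missing folders, so make sure the parent exists.
    out_path.parent.mkdir(parents=True, exist_ok=True)

    df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})  # placeholder data
    df.to_csv(out_path, index=False)

    # Print the fully resolved path and check that the file is really there.
    resolved = out_path.resolve()
    file_was_written = resolved.exists()
    print(resolved, file_was_written, resolved.stat().st_size)
```

If the resolved path exists but the file is not visible in the expected folder, the printed path shows where it was really written.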

Edit .xlsx with python

I have absolutely no idea where to start.
I want to edit something like:
To:
I want to save the result in a .txt file.
All I know is how to open and read the file.
code:
import pandas as pd
file = "myfile.xlsx"
f = pd.read_excel(file)
print(f)
I think the colors in the images speak for themselves about how the code has to run. If not, I'll answer any questions.
My go-to for editing Excel spreadsheets is openpyxl.
I don't believe it can turn .csv or .xlsx/.xlsm files into .txt files, but it can read .xlsx/.xlsm files and save them as .csv, and pandas can read csv files, so you can probably go from there.
Quick example:
from openpyxl import load_workbook
wb = load_workbook("foo.xlsx")
sheet = wb["baz"]
sheet["D5"] = "I'm cell D5"
Use openpyxl, and look at this below:
Get cell color from .xlsx
color_in_hex = sh['A2'].fill.start_color.index # this gives you Hexadecimal value of the color (in cell A2)
So you'd have to iterate across your columns/rows checking for a colour match; if it's a match, grab the value and apply it to your new sheet.
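The iteration described above could look roughly like the following sketch. The colour code, the cell contents, and the output name result.txt are assumptions made for the demo; a real script would open the existing file with load_workbook instead of building a workbook in memory.

```python
import tempfile
from pathlib import Path

from openpyxl import Workbook
from openpyxl.styles import PatternFill

TARGET_COLOR = "FFFFFF00"  # assumed ARGB code of the highlight colour (yellow)

# Stand-in workbook; a real script would call load_workbook("myfile.xlsx").
wb = Workbook()
ws = wb.active
highlight = PatternFill(start_color=TARGET_COLOR, end_color=TARGET_COLOR, fill_type="solid")
demo_rows = [("apple", True), ("banana", False), ("cherry", True)]
for row_idx, (value, is_highlighted) in enumerate(demo_rows, start=1):
    cell = ws.cell(row=row_idx, column=1, value=value)
    if is_highlighted:
        cell.fill = highlight

# Collect the values of all non-empty cells whose fill colour matches the target.
matches = [
    cell.value
    for row in ws.iter_rows()
    for cell in row
    if cell.value is not None and cell.fill.start_color.index == TARGET_COLOR
]

# Write one matching value per line to a plain text file.
with tempfile.TemporaryDirectory() as tmp_dir:
    txt_path = Path(tmp_dir) / "result.txt"
    txt_path.write_text("\n".join(str(v) for v in matches), encoding="utf-8")
    written = txt_path.read_text(encoding="utf-8")

print(written)
```

Colours that come from a theme are reported as an index rather than an ARGB string, so the comparison may need adjusting for such files.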

Load sheet in an excel file and save it to another different excel file

Using python 3, I'm trying to append a sheet from an existing excel file to another excel file.
I have conditional formats in this excel file so I can't just use pandas.
from openpyxl import load_workbook
final_wb = load_workbook("my_final_workbook_with_lots_of_sheets.xlsx")
new_wb = load_workbook("workbook_with_one_sheet.xlsx")
worksheet_new = new_wb['Sheet1']
final_wb.create_sheet(worksheet_new)
final_wb.save("my_final_workbook_with_lots_of_sheets.xlsx")
This code doesn't work because the .create_sheet method only makes a new blank sheet. I want to insert the worksheet_new that I loaded from the other file into final_wb.
See this issue on openpyxl discussing this same feature: https://bitbucket.org/openpyxl/openpyxl/issues/171/
It is not currently a builtin function of openpyxl, but that thread has some workarounds available that may suit your needs.
EDIT: Also found a likely duplicate question here: Copy worksheet from one workbook to another one using Openpyxl
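One simple workaround of this kind is to create a blank sheet in the target workbook and copy the cell values across row by row. The sketch below is only an illustration: copy_sheet_values is a hypothetical helper, not part of openpyxl, and it copies values only, so styles and conditional formats are not carried over.

```python
from openpyxl import Workbook


def copy_sheet_values(source_ws, target_wb, title):
    """Create a sheet named `title` in target_wb and copy the values of source_ws into it."""
    target_ws = target_wb.create_sheet(title=title)
    # values_only=True yields plain tuples of cell values, one tuple per row.
    for row in source_ws.iter_rows(values_only=True):
        target_ws.append(row)
    return target_ws


# Stand-in workbooks; real code would use load_workbook(...) on the two files.
new_wb = Workbook()
new_ws = new_wb.active
new_ws.append(["a", 1])
new_ws.append(["b", 2])

final_wb = Workbook()
copied = copy_sheet_values(new_ws, final_wb, "Copied sheet")

print(final_wb.sheetnames)
print(list(copied.values))
```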
I found a solution using pandas.ExcelWriter, which also uses xlrd and openpyxl in the background:
I created two sample excel-files:
test1.xlsx
test2.xlsx
and append the first sheet of test1.xlsx to test2.xlsx:
In [1]: import pandas as pd
In [2]: from pandas import ExcelWriter
In [3]: with ExcelWriter("test2.xlsx", mode="a") as writer:
   ...:     df1 = pd.read_excel("test1.xlsx", sheet_name=0)
   ...:     df1.to_excel(writer, sheet_name="New sheet name")
The important part is mode="a", which switches the writer to append mode.

Read table data from Excel file with python

I currently have an Excel workbook with some graphs (charts?). The graphs are plotted from numerical values. I can access the values in LibreOffice if I right click on the graph and select "Data table". These values are nowhere else in the file.
I would like to access these values programmatically with Python. I tried things like xlrd, but it seems xlrd ignores graphical elements. When I run it on my workbook I only get empty cells back.
Have you ever encountered this issue?
Sadly I cannot provide the file as it is confidential.
import pandas as pd
df = pd.read_excel('path/name_of_your_file.xlsx')
print(df.head())
You should have a dataframe (df) to play with in python!
I have never worked with an Excel file that contains charts, but I used to read normal Excel files with the following code. Have you tried this?
import xlrd
file = 'temp.xls'
book = xlrd.open_workbook(file)
for sheet in book.sheets():
    # skip sheets that have no columns
    if sheet.ncols:
        # row_values is a method, so call it once per row index
        for row_idx in range(sheet.nrows):
            print(sheet.row_values(row_idx))
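Neither snippet above reaches data that lives only inside a chart. If the workbook is an .xlsx file (not a legacy .xls), it is a zip archive, and each chart is stored as an XML part under xl/charts/ that usually carries a cache of the plotted numbers. The sketch below reads those caches with the standard library only; read_chart_caches is a hypothetical helper, and the demo archive is synthetic because the real file is not available.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CHART_NS = {"c": "http://schemas.openxmlformats.org/drawingml/2006/chart"}


def read_chart_caches(xlsx_file):
    """Return {chart part name: [cached numbers for each numCache element]}."""
    caches = {}
    with zipfile.ZipFile(xlsx_file) as archive:
        for name in archive.namelist():
            if not (name.startswith("xl/charts/chart") and name.endswith(".xml")):
                continue
            root = ET.fromstring(archive.read(name))
            series = []
            # Each numCache holds the numbers last plotted for one data reference.
            for cache in root.iter("{%s}numCache" % CHART_NS["c"]):
                values = [
                    float(v.text)
                    for v in cache.findall("c:pt/c:v", CHART_NS)
                    if v.text
                ]
                series.append(values)
            caches[name] = series
    return caches


# Synthetic stand-in for a workbook: a zip with one minimal chart part.
chart_xml = (
    '<c:chartSpace xmlns:c="http://schemas.openxmlformats.org/drawingml/2006/chart">'
    '<c:numCache><c:pt idx="0"><c:v>1.5</c:v></c:pt>'
    '<c:pt idx="1"><c:v>2.5</c:v></c:pt></c:numCache>'
    "</c:chartSpace>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("xl/charts/chart1.xml", chart_xml)
buffer.seek(0)

demo = read_chart_caches(buffer)
print(demo)
```

On a real file, pass the path of the workbook instead of the in-memory buffer.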

Fill in pd data frame into existing excel sheet (using openpyxl v2.3.2)

I want to fill in some pandas data frames into an existing excel file. I followed the instructions in:
How to write to an existing excel file without overwriting data (using pandas)?
using:
from openpyxl import load_workbook
import pandas as pd
import numpy as np
book=load_workbook("excel_proc.xlsx")
writer=pd.ExcelWriter("excel_proc.xlsx", engine="openpyxl")
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
data_df.to_excel(writer, sheet_name="example", startrow=100, startcol=5, index=False)
writer.save()
However, the existing sheets are deleted, a new "example" sheet is generated, and only the df is written at the defined location. What did I do wrong? I want "data_df" written into the existing "example" sheet of the existing Excel file, keeping the other sheets and data.
Thanks
Example df:
data_df=pd.DataFrame(np.arange(12).reshape((2, 6)), index=["Time","Value"])
I resolved the problem on my own. I realised that even load_workbook could not load my file, so I updated the openpyxl package (conda install openpyxl). The version that did not work was v2.3.2 (Python 3.5). The version that works now is v2.4.0.
I do not really know whether that was the cause in the end, but now the data frames are written at the defined locations and the existing data is kept.
You might be interested in learning xlwings, which makes it a lot easier to work with excel files from python.
In any case I would start by reading the existing data in the sheet, combine the data as you wish in python, and finally overwrite the sheet.
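Another way to keep the other sheets is to skip ExcelWriter and write the frame's cells into the loaded workbook with openpyxl directly. The following is a sketch under assumptions: write_frame is a hypothetical helper, the file excel_proc_demo.xlsx and its contents are created only for the demo, and the row and column offsets are 1-based, as in openpyxl.

```python
import tempfile
from pathlib import Path

import pandas as pd
from openpyxl import Workbook, load_workbook


def write_frame(ws, frame, first_row, first_col):
    """Write the header and values of `frame` into ws, starting at a 1-based cell position."""
    rows = [list(frame.columns)] + [list(r) for r in frame.itertuples(index=False, name=None)]
    for r_offset, row in enumerate(rows):
        for c_offset, value in enumerate(row):
            ws.cell(row=first_row + r_offset, column=first_col + c_offset, value=value)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "excel_proc_demo.xlsx"

    # Stand-in for the existing file: two sheets that already contain data.
    wb = Workbook()
    wb.active.title = "example"
    wb["example"]["A1"] = "keep me"
    wb.create_sheet("other")["A1"] = "untouched"
    wb.save(path)

    data_df = pd.DataFrame({"time": [1, 2], "value": [10.5, 20.5]})  # placeholder data

    # Load the existing workbook, write into the existing sheet, and save it back.
    book = load_workbook(path)
    write_frame(book["example"], data_df, first_row=5, first_col=3)
    book.save(path)

    # Reload to confirm that nothing else was lost.
    check = load_workbook(path)
    sheet_names = check.sheetnames
    kept = check["example"]["A1"].value
    other_kept = check["other"]["A1"].value
    header = check["example"]["C5"].value
    first_value = check["example"]["D6"].value

print(sheet_names, kept, other_kept, header, first_value)
```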
