Edit .xlsx with python - python

I have absolutely no idea where to start.
I want to edit something like this:
To this:
I want to save the result in a .txt file.
All I know is how to open and read the file.
Code:
import pandas as pd
file = "myfile.xlsx"
f = pd.read_excel(file)
print(f)
I think the colors in the images speak for themselves about how the code has to run. If not, I'll answer any questions.

My go-to for editing Excel spreadsheets is openpyxl.
I don't believe it can write .txt or .csv files itself, but it can read .xlsx/.xlsm files, and you can then write the values out with Python's csv module or plain file I/O. pandas can also read .csv files, so you can probably go from there.
Quick example:
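from openpyxl import load_workbook

wb = load_workbook("foo.xlsx")  # open an existing workbook
sheet = wb["baz"]               # select the worksheet named "baz"
sheet["D5"] = "I'm cell D5"     # assign a value to cell D5
wb.save("foo.xlsx")             # nothing is written to disk until save() is called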
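Since the goal is a .txt file, one possible route is to read the cell values with openpyxl and write them out yourself. The sketch below is only an illustration: the helper name `sheet_to_txt`, the tab separator, and the file names are assumptions, not something taken from the question.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def sheet_to_txt(xlsx_path, txt_path, sheet_name=None):
    """Write every row of one worksheet to a tab-separated text file."""
    wb = load_workbook(xlsx_path)
    # Fall back to the active sheet when no name is given (an assumption).
    ws = wb[sheet_name] if sheet_name else wb.active
    with open(txt_path, "w", encoding="utf-8") as out:
        for row in ws.iter_rows(values_only=True):
            # Empty cells come back as None; write them as empty strings.
            out.write("\t".join("" if v is None else str(v) for v in row) + "\n")


# Small demo with a throwaway workbook so the sketch runs on its own.
demo_dir = tempfile.mkdtemp()
demo_xlsx = os.path.join(demo_dir, "foo.xlsx")
demo_txt = os.path.join(demo_dir, "foo.txt")

demo_wb = Workbook()
demo_ws = demo_wb.active
demo_ws.append(["name", "qty"])
demo_ws.append(["apple", 3])
demo_ws.append(["pear", None])
demo_wb.save(demo_xlsx)

sheet_to_txt(demo_xlsx, demo_txt)
with open(demo_txt, encoding="utf-8") as fh:
    txt_lines = fh.read().splitlines()
print(txt_lines)
```

Swapping the tab for a comma (or using the csv module) would give a .csv instead, which pandas can then read.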

Use openpyxl, and take a look at this:
Get cell color from .xlsx
color_in_hex = sh['A2'].fill.start_color.index # hexadecimal (ARGB) value of the fill color of cell A2
So you'd have to iterate across your columns/rows checking for a colour match; if it's a match, grab the value and write it to your new sheet.
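The iteration described above could look roughly like the following sketch. The colour code `FFFFFF00`, the helper name `values_with_fill`, and the demo workbook are assumptions made for illustration; in a real file the fill may use a theme or indexed colour, in which case `.index` would not be a plain hex string and the comparison would need adjusting.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook
from openpyxl.styles import PatternFill

TARGET_COLOR = "FFFFFF00"  # assumed ARGB code for yellow; adjust to your file


def values_with_fill(ws, argb):
    """Return the values of all cells whose solid fill colour equals argb."""
    matches = []
    for row in ws.iter_rows():
        for cell in row:
            if cell.fill.fill_type == "solid" and cell.fill.start_color.index == argb:
                matches.append(cell.value)
    return matches


# Build a throwaway workbook with two highlighted cells for the demo.
xlsx_path = os.path.join(tempfile.mkdtemp(), "colors.xlsx")
wb = Workbook()
ws = wb.active
ws.append(["keep me", "skip me"])
ws.append(["skip me too", "keep me too"])
yellow = PatternFill(fill_type="solid", start_color=TARGET_COLOR, end_color=TARGET_COLOR)
ws["A1"].fill = yellow
ws["B2"].fill = yellow
wb.save(xlsx_path)

highlighted = values_with_fill(load_workbook(xlsx_path).active, TARGET_COLOR)

# Write one matching value per line to a .txt file.
txt_path = os.path.splitext(xlsx_path)[0] + ".txt"
with open(txt_path, "w", encoding="utf-8") as out:
    out.write("\n".join(str(v) for v in highlighted))
print(highlighted)
```

Printing `cell.fill.start_color.index` for a known coloured cell first is a quick way to find out which value to match against.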

Related

Finding first Excel column with no data using xlwings

I have a workbook in Excel and I need to find the first column that is empty / has no data in it. I need to keep Excel open at all times, so something like openpyxl won't do.
Here's my code so far:
import xlwings as xw
from pathlib import Path
wbPath = Path('test.xlsx')
wb = xw.Book(wbPath)
sourceSheet = wb.sheets['source']
This can be done using the following; expand("right") stops at the first blank cell in row 1, so adding 1 gives the first empty column:
sourceSheet["A1"].expand("right").last_cell.column + 1
Depending on what exactly you need, the following code may be more robust. Using used_range, it returns the first empty column after the end of the data as an integer, regardless of any empty/blank columns before the last column with data.
a_rng = sourceSheet.used_range[-1].offset(column_offset=1).column
print(a_rng)
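If what you need is instead the first column that holds no data at all, even when later columns are filled, the check can be done in plain Python once the values are in memory. The sketch below assumes the rows were already fetched as a list of lists (for example with `sourceSheet.used_range.options(ndim=2).value`) and that the used range starts in column A; the function name `first_empty_column` is made up for this illustration.

```python
def first_empty_column(rows):
    """Return the 1-based index of the first column that holds no data.

    rows is a list of row lists; None and "" count as empty.
    If every column contains data, the column right after the last one is returned.
    """
    width = max((len(row) for row in rows), default=0)
    for col in range(width):
        if all(row[col] in (None, "") for row in rows if col < len(row)):
            return col + 1
    return width + 1


sample = [
    ["id", "name", None, "score"],
    [1, "Ann", None, 9.5],
    [2, None, None, 7.0],
]
print(first_empty_column(sample))
```

A single blank cell does not count here; a column only qualifies when every row is empty in that position.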

Openpyxl doesn't preserve formatting

I have an Excel file that I want to use as a template. All I need to do is change the values of some cells and save it as another Excel file. The problem is that when I save it, the formatting, styles and some date values are changed.
from openpyxl import load_workbook
wb = load_workbook("test.xlsx")
ws = wb["RDO"]
ws["B8"].value = "MAI MAN"
wb.save("new.xlsx")
The old file:
The new one:
As you can see, the borders and date fields were changed.
I was thinking of just unzipping the Excel file, modifying the XML files, and zipping it back, but this approach has a problem. I will need to make copies of some worksheets, so I thought it would be enough to copy and paste the sheet.xml file and change the workbook.xml file to add the new sheet. When I do this, however, all the cells are cleared, which is weird, because when I copy the sheet in Excel itself the resulting sheet file is exactly the same as the original.
I would like a simple solution if possible, maybe another library or a fix for this XML sheet problem.

How to write multiIndex-columns excel with pandas

I want to export a DataFrame with MultiIndex columns to Excel.
I read an Excel file (https://drive.google.com/open?id=1G6nE5wiNRf5sip22dQ8dfhuKgxzm4f8E) and exported it with the following code:
import pandas as pd
df = pd.read_excel('sample.xlsx')
df.to_excel('sample2.xlsx', index=False)
However, sample2.xlsx has a different format from sample.xlsx.
For example, there are merged cells in sample.xlsx but not in sample2.xlsx, and the blank cells in sample.xlsx become Unnamed:xx.
You can view sample2.xlsx here.
How can I solve this problem?
Thank you.
Since you are working with xlsx files, the openpyxl package will do the job.
import openpyxl
wb_obj = openpyxl.load_workbook('sample.xlsx')
wb_obj.save('sample2.xlsx')
Further reading on openpyxl
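If the aim is to keep a two-level header while going through pandas, rather than copying the file as-is, one option is to tell `read_excel` which rows form the header. The sketch below assumes the sheet has exactly two header rows and a first column that can serve as the index; the labels and the demo data are invented. It keeps the default `index=True` on export, because some pandas versions refuse to write MultiIndex columns with `index=False`.

```python
import os
import tempfile

import pandas as pd

# Invented two-level header: a group label on top, a quarter label below.
columns = pd.MultiIndex.from_product([["sales", "costs"], ["q1", "q2"]])
df = pd.DataFrame(
    [[10, 12, 7, 8], [20, 18, 9, 11]],
    index=["north", "south"],
    columns=columns,
)

out_path = os.path.join(tempfile.mkdtemp(), "sample2.xlsx")

# The default merge_cells=True writes each top-level label as one merged cell.
df.to_excel(out_path)

# header=[0, 1] rebuilds the MultiIndex instead of producing "Unnamed: x" labels.
round_trip = pd.read_excel(out_path, header=[0, 1], index_col=0)
print(round_trip.columns.tolist())
```

For the original file, the equivalent would presumably be `pd.read_excel('sample.xlsx', header=[0, 1], index_col=0)`, adjusted to however many header rows the sheet really has.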

Read table data from Excel file with python

I currently have an Excel workbook with some graphs (charts?). The graphs are plotted from numerical values. I can access the values in LibreOffice if I right click on the graph and select "Data table". These values are nowhere else in the file.
I would like to access these values programmatically with Python. I tried things like xlrd, but it seems xlrd ignores graphical elements. When I run it on my workbook I only get empty cells back.
Have you ever encountered this issue?
Sadly I cannot provide the file as it is confidential.
import pandas as pd
df = pd.read_excel('path/name_of_your_file.xlsx')
print(df.head())
You should have a dataframe (df) to play with in python!
I have never worked with Excel files that contain charts, but I used to read normal Excel files with the following code. Have you tried this?
import xlrd

file = 'temp.xls'
book = xlrd.open_workbook(file)
for sheet in book.sheets():
    # skip sheets that have no columns
    if sheet.ncols:
        # print the values of every row
        for row_index in range(sheet.nrows):
            print(sheet.row_values(row_index))
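Neither snippet above looks inside the chart itself. If the workbook is (or can be saved as) an .xlsx file, one thing that might be worth trying is reading the chart parts directly: an .xlsx file is a zip archive, and chart XML usually carries a cached copy of the plotted numbers. The sketch below is an assumption-laden illustration of that idea using only the standard library; the function name `read_chart_caches` is made up, and the demo archive is a tiny hand-written stand-in rather than a real workbook.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by chart parts in the Office Open XML format.
CHART_NS = {"c": "http://schemas.openxmlformats.org/drawingml/2006/chart"}


def read_chart_caches(xlsx_file):
    """Return {chart part name: [list of cached numeric series]}."""
    result = {}
    with zipfile.ZipFile(xlsx_file) as archive:
        for name in archive.namelist():
            if not (name.startswith("xl/charts/chart") and name.endswith(".xml")):
                continue
            root = ET.fromstring(archive.read(name))
            series = []
            # Each <c:numCache> holds the cached points of one data reference.
            for cache in root.iterfind(".//c:numCache", CHART_NS):
                points = [float(v.text) for v in cache.iterfind("c:pt/c:v", CHART_NS)]
                series.append(points)
            result[name] = series
    return result


# Minimal stand-in archive so the sketch runs without a real workbook.
demo_xml = (
    '<c:chartSpace xmlns:c="http://schemas.openxmlformats.org/drawingml/2006/chart">'
    '<c:numCache><c:ptCount val="2"/>'
    '<c:pt idx="0"><c:v>1.5</c:v></c:pt><c:pt idx="1"><c:v>4</c:v></c:pt>'
    "</c:numCache></c:chartSpace>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("xl/charts/chart1.xml", demo_xml)
buffer.seek(0)

chart_data = read_chart_caches(buffer)
print(chart_data)
```

This does not apply to the older binary .xls format, and text categories (stored in string caches) are ignored here.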

use data from excel file in another python script

I want to write a Python script that takes data from one Excel file and enters it into another Excel file to get the output. For example, if I have input.csv, it takes the data from there, replaces certain cells of output.csv, and gets the value based on the calculation.
import pandas as pd
import numpy as np
data = pd.read_excel("Data.xlsx")
Depth = data["Depth (D):"]
ID = data["Tubing inner diameter (dti):"]
API = data["Oil gravity (API):"]
oilvisc = data["Oil viscosity (cp):"]
This is the script I currently have; these are the inputs.
import xlwt
import xlrd
from xlutils.copy import copy
rb=xlrd.open_workbook("hagedornbrowncorrelation.xls")
wb=copy(rb)
w_sheet=wb.get_sheet(0)
w_sheet.write(4,2,700)
wb.save("hagedornbrowncorrelation.xls")
The workbook "hagedornbrowncorrelation.xls" is my calculator. I am replacing C5 with 700, but when I save it, all the macros and formulas in the workbook disappear and it becomes a useless workbook of plain numbers.
I have done a similar project with the openpyxl module, which can be found here:
https://openpyxl.readthedocs.io/en/stable/
Because I built a UI with Tkinter, I used a file dialog to open the file. You may not want to use a global variable like I did; this was a quick hack.
from tkinter import filedialog

def getFilecurrent():
    global path
    # open a dialog box to select the file
    path = filedialog.askopenfilename(initialdir="/", title="Select file")
Then you can load the selected file using
ref_workbook = openpyxl.load_workbook(path)
Then manipulate the data by selecting the right cell; remember to select the right worksheet first.
weeklyengagement = ws['B18'].value
Afterwards, load the template file that the data will be pasted into, like
template = openpyxl.load_workbook("Section12Grades.xlsx") # Add file name
temp_sheet = template["Sheet1"] # Add sheet name
Lastly, copy the range and paste it using loops. There are so many resources out there that I'm not going to paste my code, as it has some custom setup and it would only confuse you.
Edit: if you wish to save the workbook with its macros, you can do:
wb = openpyxl.load_workbook(filename='filename.xlsm', read_only=False, keep_vba=True)
Formulas are stored as strings. If you wish to keep a formula, write it to the cell as a string (for example "=SUM(A1:A3)") and save.
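As a rough check of the point about formulas, the sketch below changes one input cell and confirms that the formula next to it survives the save. It uses a small throwaway .xlsx file as a stand-in for the calculator (openpyxl does not read the old .xls format, so the real workbook would first have to be saved as .xlsx or .xlsm); the cell addresses and the formula are assumptions for the demo.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

calc_path = os.path.join(tempfile.mkdtemp(), "calculator.xlsx")

# Stand-in for the calculator workbook: one input cell and one formula cell.
wb = Workbook()
ws = wb.active
ws["C5"] = 100
ws["C6"] = "=C5*2"  # a string starting with "=" is stored as a formula
wb.save(calc_path)

# Reopen, change only the input, and save again.
wb = load_workbook(calc_path)  # add keep_vba=True for a .xlsm file with macros
ws = wb.active
ws["C5"] = 700
wb.save(calc_path)

# With the default data_only=False, the formula text is what gets loaded.
reloaded = load_workbook(calc_path).active
formula_after = reloaded["C6"].value
input_after = reloaded["C5"].value
print(formula_after, input_after)
```

openpyxl does not evaluate formulas itself; the new result is calculated when the file is next opened in Excel.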
