Finding first Excel column with no data using xlwings - python

I have a workbook in Excel and I need to find the first column that is empty / has no data in it. I need to keep Excel open at all times, so something like openpyxl won't do.
Here's my code so far:
import xlwings as xw
from pathlib import Path
wbPath = Path('test.xlsx')
wb = xw.Book(wbPath)
sourceSheet = wb.sheets['source']

This can be done using:
sourceSheet["A1"].expand("right").last_cell.column

This gives the last column of the contiguous block of data starting at A1; add 1 to get the first empty column.

Depending on what you need exactly, this might be the more robust approach. Using used_range, the code gives you, as an integer, the first empty column after the very end of the data, regardless of any empty/blank columns before the last column with data.
a_rng = sourceSheet.used_range[-1].offset(column_offset=1).column
print(a_rng)
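Putting both suggestions together with the setup from the question, a minimal sketch (assuming the data starts in A1 of the 'source' sheet and the workbook is already open in Excel) might look like this:

import xlwings as xw
from pathlib import Path

wbPath = Path('test.xlsx')
wb = xw.Book(wbPath)  # attaches to the already-open workbook, Excel stays open
sourceSheet = wb.sheets['source']

# Last column of the contiguous block starting at A1, plus 1 -> first empty column
first_empty_contiguous = sourceSheet["A1"].expand("right").last_cell.column + 1

# First column after the used range, ignoring blank columns inside the data
first_empty_after_used = sourceSheet.used_range[-1].offset(column_offset=1).column

print(first_empty_contiguous, first_empty_after_used)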

Related

What is the fastest way to retrieve header names from excel files using pandas

I have some big Excel files from which I'm organizing the column names into a unique list.
The code below works, but it takes ~9 minutes!
Does anyone have suggestions for speeding it up?
import pandas as pd
import os
get_col = list(pd.read_excel("E:\DATA\dbo.xlsx",nrows=1, engine='openpyxl').columns)
print(get_col)
Using pandas to extract just the column names of a large excel file is very inefficient.
You can use openpyxl for this:
from openpyxl import load_workbook
wb = load_workbook(r"E:\DATA\dbo.xlsx", read_only=True)
for sheet in wb.worksheets:
    # read only the first (header) row of each sheet
    for value in sheet.iter_rows(min_row=1, max_row=1, values_only=True):
        columns = value
Assuming you only have one sheet, you will get a tuple of column names here.
If you want faster reading, then I suggest you use another file type. Excel files, while convenient, are binary files, so for pandas to read and correctly parse one it must load the full file. Using nrows or skipfooter to work with less data only takes effect after the full data is loaded, and therefore shouldn't really affect the waiting time. By contrast, when working with a .csv file, given its plain-text format and the lack of significant metadata, you can extract just the first rows as an iterable using the chunksize parameter in pd.read_csv().
Other than that, calling list() on a DataFrame already returns a list of its columns, so my only suggestion for the code you use is:
get_col = list(pd.read_excel("E:\DATA\dbo.xlsx",nrows=1, engine='openpyxl'))
The stronger suggestion is to change the data type if you specifically want to address this issue.
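For illustration, a rough sketch of the CSV route (the dbo.csv path is an assumption, i.e. the same data exported to CSV):

import pandas as pd

# With a CSV, only the first chunk is read from disk, so the header comes back almost instantly
reader = pd.read_csv(r"E:\DATA\dbo.csv", chunksize=1)  # hypothetical CSV export of the data
first_chunk = next(iter(reader))
get_col = list(first_chunk.columns)
print(get_col)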

Edit .xlsx with python

I completely have no idea where to start.
I want to edit something like this (first screenshot):
into this (second screenshot):
I want to save the result in a .txt file.
All I know is how to open and read the file.
code:
import pandas as pd
file = "myfile.xlsx"
f = pd.read_excel(file)
print(f)
I think the image colors speak for themselves as to how the code has to work. If not, I'll answer any questions.
My go-to for editing Excel spreadsheets is openpyxl
I don't believe it can write .txt files directly, but it can read and edit .xlsx/.xlsm files, and pandas can read Excel files and write them out as CSV/text, so you can probably go from there.
Quick example:
from openpyxl import load_workbook
wb = load_workbook("foo.xlsx")
sheet = wb["baz"]
sheet["D5"] = "I'm cell D5"  # write a value into cell D5
wb.save("foo.xlsx")          # persist the change
Use openpyxl, and look at this below:
Get cell color from .xlsx
color_in_hex = sh['A2'].fill.start_color.index # this gives you Hexadecimal value of the color (in cell A2)
So you'd have to iterate across your columns/rows checking for a colour match, then if it's a match, grab the value and apply it to your new sheet.
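A rough sketch of that iteration, assuming openpyxl; the file name, sheet name, and target colour are all placeholders:

from openpyxl import load_workbook, Workbook

src = load_workbook("myfile.xlsx")["Sheet1"]  # assumed source file/sheet
dst_wb = Workbook()
dst = dst_wb.active

TARGET_HEX = "FFFFFF00"  # assumed colour to match (yellow)

for row in src.iter_rows():
    for cell in row:
        if cell.fill.start_color.index == TARGET_HEX:
            dst.append([cell.value])  # copy each matching value into the new sheet

dst_wb.save("matched_values.xlsx")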

How to output dataframe values to an Excel file? [Python]

For the past few days I've been trying to do a relatively simple task but I'd always encounter some errors so I'd really appreciate some help on this. Here goes:
I have an Excel file which contains a specific column (Column F) that has a list of IDs.
What I want to do is for the program to read this excel file and allow the user to input any of the IDs they would like.
When the user types in one of the IDs, I want the program to return all the IDs that contain the text the user entered, and then export those IDs to a new, separate Excel file where they are displayed in one column, one ID per row.
Here's my code so far, I've tried using arrays and stuff but nothing seems to be working for me :/
import pandas as pd
import numpy as np
import re
import xlrd
import os.path
import xlsxwriter
import openpyxl as xl;
from pandas import ExcelWriter
from openpyxl import load_workbook
# LOAD EXCEL TO DATAFRAME
xls = pd.ExcelFile('N:/TEST/TEST UTILIZATION/IA 2020/Dev/SCS-FT-IE-Report.xlsm')
df = pd.read_excel(xls, 'FT')
# GET USER INPUT (USE AD1852 AS EXAMPLE)
value = input("Enter a Part ID:\n")
print(f'You entered {value}\n\n')
i = 0
x = df.loc[i, "MFG Device"]
df2 = np.array(['', 'MFG Device', 'Loadboard Group','Socket Group', 'ChangeKit Group'])
for i in range(17367):
    # x = df.loc[i, "MFG Device"]
    if value in x:
        df = np.array[x]
        df2.append(df)
    i += 1
print(df2)
# create excel writer object
writer = pd.ExcelWriter('N:/TEST/TEST UTILIZATION/IA 2020/Dev/output.xlsx')
# write dataframe to excel
df2.to_excel(writer)
# save the excel
writer.save()
print('DataFrame is written successfully to Excel File.')
Any help would be appreciated, thanks in advance! :)
It looks like you're doing much more than you need to do. Rather than monkeying around with xlsxwriter, pandas.DataFrame.to_excel is your friend.
Just do
df2.to_excel("output.xlsx")
You don't need xlsxwriter. Simply df.to_excel() would work. In your code df2 is a numpy array; first convert it into a pandas DataFrame (with the index and columns you need) before writing it to Excel.
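A hedged sketch of that approach; the paths and column names are taken from the question, while the substring match on "MFG Device" is an assumption about the intended lookup:

import pandas as pd

df = pd.read_excel('N:/TEST/TEST UTILIZATION/IA 2020/Dev/SCS-FT-IE-Report.xlsm', sheet_name='FT')

value = input("Enter a Part ID:\n")

# keep the rows whose "MFG Device" contains the entered text
matches = df[df["MFG Device"].astype(str).str.contains(value, na=False)]

# write just the relevant columns to a new workbook, one ID per row
cols = ["MFG Device", "Loadboard Group", "Socket Group", "ChangeKit Group"]
matches[cols].to_excel('N:/TEST/TEST UTILIZATION/IA 2020/Dev/output.xlsx', index=False)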

How do I execute this python code automatically in Excel cells?

I need to extract the domain (for example: http://www.example.com/example-page, http://test.com/test-page) from a list of websites in an Excel sheet and modify that domain to give its URL (example.com, test.com). I have the code part figured out, but I still need to get these commands to work automatically on the Excel sheet cells in a column.
here's_the_code
I think you should read in the data as a pandas DataFrame (pd.read_excel), make a function from your code, then apply it to the DataFrame (df.apply). Then it is easy to save to Excel with df.to_excel().
Of course, you will need pandas to be installed.
Something like:
import pandas as pd
dframe = pd.read_excel(io='' , sheet_name='')
dframe['domains'] = dframe['urls col name'].apply(your function)
dframe.to_excel('your path')
Best
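For instance, a minimal sketch; the file names, the 'urls col name' column, and the extract_domain helper are all placeholders/assumptions:

import pandas as pd
from urllib.parse import urlparse

def extract_domain(url):
    # strip the scheme and any leading "www." to leave just the domain
    netloc = urlparse(str(url)).netloc
    return netloc[4:] if netloc.startswith("www.") else netloc

dframe = pd.read_excel(io='websites.xlsx', sheet_name='Sheet1')    # assumed file/sheet names
dframe['domains'] = dframe['urls col name'].apply(extract_domain)  # assumed column name
dframe.to_excel('websites_with_domains.xlsx', index=False)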

Read table data from Excel file with python

I currently have an Excel workbook with some graphs (charts?). The graphs are plotted from numerical values. I can access the values in LibreOffice if I right click on the graph and select "Data table". These values are nowhere else in the file.
I would like to access these values programmatically with Python. I tried things like xlrd, but it seems xlrd ignores graphical elements. When I run it on my workbook I only get empty cells back.
Have you ever encountered this issue?
Sadly I cannot provide the file as it is confidential.
import pandas as pd
df = pd.read_excel('path/name_of_your_file.xlsx')
print(df.head())
You should have a dataframe (df) to play with in python!
I have never worked with graphical Excel files, but I use the following code to read normal Excel files. Have you tried this?
import xlrd
file = 'temp.xls'
book = xlrd.open_workbook(file)
for sheet in book.sheets():
    # check that the sheet has columns
    if sheet.ncols:
        # print the values of each row
        for rowx in range(sheet.nrows):
            print(sheet.row_values(rowx))
