Write an xls cell in an existing file without overwriting the rest of the file - Python

I'm trying to update a single cell in an existing excel file.
Here is my code:
file=(r'C:/Users/user/Desktop/test.xls')
df=pd.read_excel(file)
code=input('Patiste Kodiko:')
size=0
sizeint=int(input('Patiste Noumero:'))
given=int(input('Posa efigan?:'))
oldstock=(df[size].where(df['ΚΩΔΙΚΟΣ']==code))
oldstock=oldstock.dropna()
oldstock=oldstock.values[0]
oldstock = int(oldstock)
newstock = oldstock - given
x=(df['Α/Α'].where(df['ΚΩΔΙΚΟΣ']==code)+2)
x=x.dropna()
x = int(x)
dffin=df.at[x,size] = newstock
dffin.to_excel(file)
close()
After running this code, I receive an empty .xls file with only one cell written and everything else empty.
What am I missing here?
Thanks in advance.

You should be able to use df.at directly if you know the row and column positions or labels.
import pandas as pd
file_location = r'TestExcelsheet.xlsx'
# Read the sheet, keeping the 'NimikeNro' column as text
excel = pd.read_excel(file_location, converters={'NimikeNro': str})
print(excel.dtypes)
print(excel.index)
print(excel.head())
# Set the cell at row label 1, column 'One'
excel.at[1, 'One'] = 444
print(excel)
# Write the whole DataFrame back, not just the changed cell
excel.to_excel(file_location, index=False)
DataFrame.at is what you need to set a single cell; use a for loop to update more than one cell.
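In the question, `dffin = df.at[x, size] = newstock` binds `dffin` to the integer `newstock`, not to the DataFrame, so it is `df` itself that has to be written back. Below is a minimal sketch of that idea. The helper names `decrement_stock` and `update_file` and the column name "38" are hypothetical, and the sketch assumes an .xlsx path because recent pandas releases, as far as I know, no longer write the legacy .xls format.

```python
import pandas as pd


def decrement_stock(df, code, size_col, given, code_col="ΚΩΔΙΚΟΣ"):
    """Subtract `given` from the stock cell of the row whose code matches."""
    mask = df[code_col] == code
    if not mask.any():
        raise KeyError(f"code {code!r} not found")
    df.loc[mask, size_col] = df.loc[mask, size_col] - given
    return df


def update_file(path, code, size_col, given):
    """Read the sheet, change one cell, and write the whole frame back."""
    df = pd.read_excel(path)
    decrement_stock(df, code, size_col, given)
    # Write df itself (not the assigned value) so every other cell is kept.
    df.to_excel(path, index=False)


# In-memory demonstration with made-up data (no Excel file needed).
stock = pd.DataFrame({"ΚΩΔΙΚΟΣ": ["A1", "B2"], "38": [10, 5]})
decrement_stock(stock, "B2", "38", 2)
print(stock)
```

Note that to_excel rebuilds the file from the DataFrame, so cell formatting and any other sheets in the original workbook are not preserved.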

Related

python xlwings won't save the Excel file in each iteration

This is a bit complex to explain, please ask if there are any doubts.
I have two Excel files, named initial and updated. The updated file always has more sheets, and may have more rows in each sheet or changed values. I am trying to compare each sheet that exists in both the initial and updated files, and to write and highlight the changes in a new Excel file.
This is the code that I have.
from pathlib import Path
import pandas as pd
import numpy as np
import xlwings as xw
initial_version = Path.cwd() / "ConfigurationReport_TEST.xlsx"
updated_version = Path.cwd() / "ConfigurationReport_DEV2.xlsx"
excel1 = pd.ExcelFile(initial_version)
excel2 = pd.ExcelFile(updated_version)
lesser_sheetnames_dict = {}
greater_sheetnames_dict = {}
for idx, value in enumerate(excel1.sheet_names if len(excel1.sheet_names) < len(excel2.sheet_names) else excel2.sheet_names):
    lesser_sheetnames_dict[idx] = value
for idx, value in enumerate(excel1.sheet_names if len(excel1.sheet_names) > len(excel2.sheet_names) else excel2.sheet_names):
    greater_sheetnames_dict[idx] = value
print(lesser_sheetnames_dict)
print(len(lesser_sheetnames_dict))
print(len(greater_sheetnames_dict))
for sheetnum,sheetname in lesser_sheetnames_dict.items():
    if sheetname not in greater_sheetnames_dict.values():
        continue
    else:
        df1 = pd.read_excel(initial_version,sheet_name=sheetname)
        df2 = pd.read_excel(updated_version,sheet_name=sheetname)
        df1 = df1.fillna('')
        df2 = df2.fillna('')
        df2 = df2.reset_index()
        df3 = pd.merge(df1,df2,how='outer',indicator='Exist')
        df3 = df3.query("Exist != 'both'")
        df_highlight_right = df3.query("Exist == 'right_only'")
        df_highlight_left = df3.query("Exist == 'left_only'")
        highlight_rows_right = df_highlight_right['index'].tolist()
        highlight_rows_right = [int(row) for row in highlight_rows_right]
        first_row_in_excel = 2
        highlight_rows_right = [x + first_row_in_excel for x in highlight_rows_right]
        with xw.App(visible=False) as app:
            updated_wb = app.books.open(updated_version)
            print(updated_wb.sheets([x+1 for x in greater_sheetnames_dict.keys() if greater_sheetnames_dict[x] == sheetname][0]))
            updated_ws = updated_wb.sheets([x+1 for x in greater_sheetnames_dict.keys() if greater_sheetnames_dict[x] == sheetname][0])
            rng = updated_ws.used_range
            print(f"Used Range: {rng.address}")
            # Highlight the rows in Excel
            for row in rng.rows:
                if row.row in highlight_rows_right:
                    row.color = (255, 71, 76) # light red
            updated_wb.save(Path.cwd() / "Difference_Highlighted.xlsx")
The problem that I'm facing is in the with block. Ideally this code should run for each sheet that exists in both files, highlight the changes, and save them into a new file.
In practice it does run for each sheet that exists in both files, but only the last sheet ends up highlighted and saved.
This being my first interaction with the xlwings library, I have very little idea of how that block works. Any assistance will be much appreciated.
I feel stupid to post this question now. The error was caused by the scope of the with block. Since it was inside the if block, it opened the workbook again on every iteration, highlighted the changes of the sheet currently being iterated on, then saved it. So during the last iteration (sheet) it opened the original file again, highlighted the changes, and overwrote the previously saved file.
To avoid this, I moved the with block's opening statement to before if block, and now it works perfectly as intended.
with xw.App(visible=False) as app:
    for sheetnum,sheetname in lesser_sheetnames_dict.items():
        if sheetname not in greater_sheetnames_dict.values():
            continue
        else:
            ...  # code
    updated_wb.save(Path.cwd() / "Difference_Highlighted.xlsx")
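The fix comes down to opening the workbook once, highlighting every shared sheet, and saving once at the end. A minimal sketch of that shape follows. It assumes xlwings-style objects (`workbook.sheets[name]`, `sheet.used_range.rows`, `row.row`, `row.color`); the helper `highlight_rows` and the `rows_by_sheet` mapping are hypothetical names that do not appear in the original code.

```python
def highlight_rows(workbook, rows_by_sheet, color=(255, 71, 76)):
    """Color the listed Excel row numbers on each named sheet.

    `workbook` is assumed to behave like an xlwings Book, and
    `rows_by_sheet` maps a sheet name to the row numbers to highlight.
    Returns the number of rows that were colored.
    """
    colored = 0
    for sheet_name, row_numbers in rows_by_sheet.items():
        wanted = set(row_numbers)
        sheet = workbook.sheets[sheet_name]
        for row in sheet.used_range.rows:
            if row.row in wanted:
                row.color = color
                colored += 1
    return colored


# Intended use with xlwings (needs Excel installed, so it is left commented out):
#
# import xlwings as xw
# with xw.App(visible=False) as app:
#     wb = app.books.open(updated_version)      # open once
#     highlight_rows(wb, {"Sheet1": [3, 7]})    # every sheet in one pass
#     wb.save("Difference_Highlighted.xlsx")    # save once, after the loop
```

Keeping the open and save calls outside the per-sheet loop means each sheet's highlighting lands in the same in-memory workbook before anything is written to disk.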

Issues with the delimiter when trying to read a comma separated file (Python, Pandas & .csv)

The problem:
I am trying to reproduce results from one of Keith Galli's YouTube courses.
import pandas as pd
import os
import csv
input_loc = "./SalesAnalysis/Sales_Data/"
output_loc = "./SalesAnalysis/korbi_output/"
fileList = os.listdir(input_loc)
all_months_data = pd.DataFrame()
problem probably starts here:
for file in fileList:
    if file.endswith(".csv"):
        df = pd.read_csv(input_loc+file)
        all_months_data = all_months_data.append(df)
all_months_data.to_csv(output_loc+"all_months_data.csv")
all_months_data.head()
This is my output, and I don't want row 1 to be displayed because it contains no data:
The issue seems to be line 3 in one of my csv files. Row 3 is empty except for commas:
So I go to the csv file and delete cell A3, run the code again, and get this:
instead of this:
What do I have to do to remove the cells without value and to still display everything correctly?
I did not understand why these weird problems occurred, but I figured out a workaround to clean the data and save everything in a new csv file:
# dropna() returns a new DataFrame without the rows that contain missing values
all_months_data_cleaned = all_months_data.dropna()
all_months_data_cleaned.reset_index(drop=True, inplace=True)
all_months_data_cleaned.to_csv(output_loc+"all_months_data_cleaned.csv")
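A line that contains only commas is parsed as a row in which every value is missing (NaN), which is where the empty row comes from. The sketch below shows one way to combine the files and drop only those fully empty rows. It uses `pd.concat`, since `DataFrame.append` is no longer available in pandas 2.x; the function name `combine_months` and the sample data are made up for illustration.

```python
import io

import pandas as pd


def combine_months(csv_sources):
    """Read several CSV sources and stack them, dropping fully empty rows."""
    frames = [pd.read_csv(src) for src in csv_sources]
    combined = pd.concat(frames, ignore_index=True)
    # how="all" removes a row only when every column in it is missing.
    return combined.dropna(how="all").reset_index(drop=True)


# Two in-memory "files"; the second data line of the first one is only a comma.
january = io.StringIO("Order ID,Product\n1,Cable\n,\n2,Phone\n")
february = io.StringIO("Order ID,Product\n3,Monitor\n")
all_months = combine_months([january, february])
print(all_months)
```

Compared with a plain `dropna()`, `how="all"` keeps rows that are only partly filled in, which may or may not be what you want for this data.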

Retrieve the headers in excel using Python

I want to retrieve the headers of this Excel file (only A, B, C) and store them in a list using Python. I opened my file but I am unable to retrieve them.
import xlrd
file_location= "C:/Users/Desktop/Book1.xlsx"
workbook= xlrd.open_workbook(file_location)
sheet=workbook.sheet_by_index(0)
Can anyone please help me with that? I am new to Python.
Thank you for your help.
You could also try NumPy's loadtxt:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
It gives you an array of the file's contents, and you can restrict it to specific columns so that you only get A, B, C as you want. Note that loadtxt reads plain text, so the workbook has to be saved as CSV first; the file name Book1.csv below assumes such an export. The headers are then the first row of the array.
import numpy as np
# dtype=str keeps the header text; usecols limits the read to columns A, B, C
data = np.loadtxt("Book1.csv", delimiter=",", dtype=str, usecols=(0, 1, 2), ndmin=2)
headers = [h for h in data[0] if h != ""]
With the xlrd sheet from the question, the first row can be read directly:
headers = [str(cell.value) for cell in sheet.row(0)[:3] if cell.value != '']
Hope that helps...

Using generators outside of a loop

Relatively new to python so please excuse the newbie question, but google isn't helpful at this time.
I have 100 very large xlsx files from which I need to extract the first row (specifically cell A2). I found this gem of a tool called openpyxl which will iterate through my data files without loading everything in memory. It uses a generator to get the relevant row on each call.
The thing that I can't get is how to initialize a generator outside of a loop. Right now my code is:
from openpyxl import load_workbook
wb = load_workbook(filename = "merged01.xlsx", use_iterators= True)
sheetName = wb.get_sheet_names()
ws = wb.get_sheet_by_name(name = sheetName[0])
row = ws.iter_rows() #row is a generator
for cell in row:
    break
print (cell[1].internal_value) # A2
But there has to be a better way of doing this such as:
...
row = ws.iter_rows() #row is a generator
cell = row.first # line I'm trying to KISS
print (cell[1].internal_value) # A2
cell = next(row)
The next function retrieves the next value from any iterator.
You're looking for next().
cell = next(row)
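To see how next() behaves without opening a workbook, here is a small sketch in which a hand-written generator stands in for ws.iter_rows(). The generator `rows` and its values are made up for illustration.

```python
def rows():
    """Stand-in for ws.iter_rows(): yields one tuple of cell values per row."""
    yield ("A1", "B1")
    yield ("A2", "B2")


row_iter = rows()
first = next(row_iter)           # first row, no loop needed
second = next(row_iter)          # each call advances the generator by one row
missing = next(row_iter, None)   # the default avoids StopIteration when exhausted
print(first, second, missing)
```

Passing a default as the second argument is handy when a sheet might be empty, since a bare next() on an exhausted generator raises StopIteration.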

Python to delete a row in excel spreadsheet

I have a really large Excel file and I need to delete about 20,000 rows, contingent on meeting a simple condition, and Excel won't let me delete such a complex range when using a filter. The condition is:
If the first column contains the value, X, then I need to be able to delete the entire row.
I'm trying to automate this using Python and xlwt, but am not quite sure where to start. Seeking some code snippets to get me started...
Grateful for any help that's out there!
Don't delete. Just copy what you need.
read the original file
open a new file
iterate over rows of the original file (if the first column of the row does not contain the value X, add this row to the new file)
close both files
rename the new file into the original file
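The steps above can be sketched with the standard csv module, assuming the sheet has first been exported to CSV (an assumption; the question's file is a native Excel workbook). The function name `drop_rows` is hypothetical.

```python
import csv
import os
import tempfile


def drop_rows(path, unwanted="X"):
    """Rewrite the CSV at `path` without rows whose first column equals `unwanted`.

    Returns the number of rows removed.
    """
    removed = 0
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then swap it in.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".csv")
    with os.fdopen(fd, "w", newline="") as dst, open(path, newline="") as src:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            if row and row[0] == unwanted:
                removed += 1
                continue
            writer.writerow(row)
    # Rename the new file over the original once both files are closed.
    os.replace(tmp_path, path)
    return removed
```

Because rows are streamed one at a time, this never holds the whole file in memory, and the original is only replaced after the filtered copy has been written completely.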
I like using COM objects for this kind of fun:
import win32com.client
from win32com.client import constants
f = r"h:\Python\Examples\test.xls"
DELETE_THIS = "X"
exc = win32com.client.gencache.EnsureDispatch("Excel.Application")
exc.Visible = 1
exc.Workbooks.Open(Filename=f)
row = 1
while True:
    exc.Range("B%d" % row).Select()
    data = exc.ActiveCell.FormulaR1C1
    exc.Range("A%d" % row).Select()
    condition = exc.ActiveCell.FormulaR1C1
    if data == '':
        break
    elif condition == DELETE_THIS:
        exc.Rows("%d:%d" % (row, row)).Select()
        exc.Selection.Delete(Shift=constants.xlUp)
    else:
        row += 1
# Before
#
#      a
#      b
# X    c
#      d
#      e
# X    d
#      g
#
# After
#
#      a
#      b
#      d
#      e
#      g
I usually record snippets of Excel macros and glue them together with Python as I dislike Visual Basic :-D.
You can try using the csv reader:
http://docs.python.org/library/csv.html
You can use
sh.Range(sh.Cells(1,1),sh.Cells(20000,1)).EntireRow.Delete()
which deletes rows 1 to 20,000 in an open Excel spreadsheet. So, to delete a row only when its first column holds X:
if sh.Cells(1,1).Value == 'X':
    sh.Cells(1,1).EntireRow.Delete()
If you just need to delete the data (rather than 'getting rid of' the row, i.e. it shifts rows) you can try using my module, PyWorkbooks. You can get the most recent version here:
https://sourceforge.net/projects/pyworkbooks/
There is a pdf tutorial to guide you through how to use it. Happy coding!
I have achieved this using Pandas package....
import pandas as pd
#Read from Excel
xl= pd.ExcelFile("test.xls")
#Parsing Excel Sheet to DataFrame
dfs = xl.parse(xl.sheet_names[0])
#Update DataFrame as per requirement
#(Here Removing the row from DataFrame having blank value in "Name" column)
dfs = dfs[dfs['Name'] != '']
#Updating the excel sheet with the updated DataFrame
dfs.to_excel("test.xls",sheet_name='Sheet1',index=False)
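For the condition in the question (drop a row when its first column holds X), the same pandas approach could look like the sketch below. The function name `drop_matching_rows` and the sample data are hypothetical. Also note that read_excel normally loads blank cells as NaN rather than as empty strings, so a test for blanks may need `dropna(subset=[...])` instead of a comparison with ''.

```python
import pandas as pd


def drop_matching_rows(df, unwanted="X"):
    """Return a copy of `df` without rows whose first column equals `unwanted`."""
    first_column = df.iloc[:, 0]
    return df[first_column != unwanted].reset_index(drop=True)


# In-memory demonstration; None plays the role of a blank cell.
sheet = pd.DataFrame(
    {"flag": [None, "X", None, "X"], "value": ["a", "b", "c", "d"]}
)
kept = drop_matching_rows(sheet)
print(kept)
```

The filtered frame can then be written out with to_excel, as in the answer above.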
