Python to delete a row in excel spreadsheet - python

I have a really large Excel file and I need to delete about 20,000 rows, contingent on meeting a simple condition; Excel won't let me delete such a complex range when using a filter. The condition is:
If the first column contains the value X, then I need to delete the entire row.
I'm trying to automate this using Python and xlwt, but am not quite sure where to start. I'm looking for some code snippets to get me started.
Grateful for any help that's out there!

Don't delete. Just copy what you need.
read the original file
open a new file
iterate over rows of the original file (if the first column of the row does not contain the value X, add this row to the new file)
close both files
rename the new file into the original file
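The steps above can be sketched as follows. This is only a minimal sketch: it assumes an .xlsx file and the openpyxl package (the question mentions xlwt, which targets the older .xls format), and the function name and file names are hypothetical. It copies cell values only, not formatting.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def copy_without_value(src_path, dst_path, unwanted="X"):
    """Copy the active sheet of src_path to dst_path, skipping rows whose
    first cell equals `unwanted`. Values only; styles are not copied."""
    src = load_workbook(src_path, read_only=True)   # streams rows, suits large files
    dst = Workbook(write_only=True)
    out = dst.create_sheet(title=src.active.title)
    kept = 0
    for row in src.active.iter_rows(values_only=True):
        if row and row[0] == unwanted:
            continue                                 # drop this row
        out.append(row)
        kept += 1
    dst.save(dst_path)
    src.close()
    return kept


# Small demonstration on a throwaway file
tmp_dir = tempfile.mkdtemp()
original = os.path.join(tmp_dir, "original.xlsx")
filtered = os.path.join(tmp_dir, "filtered.xlsx")

wb = Workbook()
for values in [("a", 1), ("X", 2), ("b", 3), ("X", 4)]:
    wb.active.append(values)
wb.save(original)

kept_rows = copy_without_value(original, filtered)
result_rows = list(load_workbook(filtered).active.iter_rows(values_only=True))
os.replace(filtered, original)  # the new file takes the original's name
```

Reading in read-only mode and writing in write-only mode keeps memory use low, which matters for a sheet with tens of thousands of rows.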

I like using COM objects for this kind of fun:
import win32com.client
from win32com.client import constants
f = r"h:\Python\Examples\test.xls"
DELETE_THIS = "X"
exc = win32com.client.gencache.EnsureDispatch("Excel.Application")
exc.Visible = 1
exc.Workbooks.Open(Filename=f)
row = 1
while True:
    exc.Range("B%d" % row).Select()
    data = exc.ActiveCell.FormulaR1C1
    exc.Range("A%d" % row).Select()
    condition = exc.ActiveCell.FormulaR1C1
    if data == '':
        break
    elif condition == DELETE_THIS:
        exc.Rows("%d:%d" % (row, row)).Select()
        exc.Selection.Delete(Shift=constants.xlUp)
    else:
        row += 1
# Before
#
# a
# b
# X c
# d
# e
# X d
# g
#
# After
#
# a
# b
# d
# e
# g
I usually record snippets of Excel macros and glue them together with Python as I dislike Visual Basic :-D.

You can try using the csv reader:
http://docs.python.org/library/csv.html
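A minimal sketch of that idea, assuming the sheet has first been saved as CSV from Excel (that export step is an assumption, not something the answer spells out). The function name is hypothetical, and in-memory strings stand in for the real files.

```python
import csv
import io


def filter_rows(reader, writer, unwanted="X"):
    """Write every row from reader to writer unless its first field equals unwanted."""
    removed = 0
    for row in reader:
        if row and row[0] == unwanted:
            removed += 1
            continue
        writer.writerow(row)
    return removed


# In-memory stand-ins for the exported file and the cleaned copy
source = io.StringIO("a,1\nX,2\nb,3\nX,4\n", newline="")
target = io.StringIO(newline="")
removed_count = filter_rows(csv.reader(source), csv.writer(target, lineterminator="\n"))
cleaned = target.getvalue()
```

With real files, the two StringIO objects would be replaced by files opened with `newline=""`, as the csv module documentation recommends.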

You can use:
sh.Range(sh.Cells(1,1),sh.Cells(20000,1)).EntireRow.Delete()
This deletes rows 1 to 20,000 in an open Excel spreadsheet. So, to delete a single row only when its first cell contains X:
if sh.Cells(1,1).Value == 'X':
    sh.Cells(1,1).EntireRow.Delete()

If you just need to delete the data (rather than 'getting rid of' the row, i.e. it shifts rows) you can try using my module, PyWorkbooks. You can get the most recent version here:
https://sourceforge.net/projects/pyworkbooks/
There is a pdf tutorial to guide you through how to use it. Happy coding!

I have achieved this using the pandas package:
import pandas as pd
#Read from Excel
xl= pd.ExcelFile("test.xls")
#Parsing Excel Sheet to DataFrame
dfs = xl.parse(xl.sheet_names[0])
#Update DataFrame as per requirement
#(Here Removing the row from DataFrame having blank value in "Name" column)
dfs = dfs[dfs['Name'] != '']
#Updating the excel sheet with the updated DataFrame
dfs.to_excel("test.xls",sheet_name='Sheet1',index=False)
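Adapted to the condition in the question (drop rows whose first column is X), the same filtering idea might look like the sketch below. The DataFrame is a made-up stand-in for the parsed sheet, and the column names are hypothetical. Note that pandas reads blank cells as NaN rather than as an empty string, so blanks are better tested with `notna()`.

```python
import pandas as pd

# Stand-in for the sheet; with a real file this would come from pd.read_excel(...)
df = pd.DataFrame({"flag": ["a", "X", "b", "X", None], "value": [1, 2, 3, 4, 5]})

# Keep rows whose first column is not "X" (selected by position, not by name)
first_col = df.iloc[:, 0]
kept = df[first_col != "X"].reset_index(drop=True)

# Blank cells arrive as NaN, so filter them with notna() instead of != ''
non_blank = kept[kept.iloc[:, 0].notna()]
```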

Related

Write an xls cell in an existing file without overwriting the rest of the file

I'm trying to update a single cell in an existing excel file.
Here is my code:
file=(r'C:/Users/user/Desktop/test.xls')
df=pd.read_excel(file)
code=input('Patiste Kodiko:')
size=0
sizeint=int(input('Patiste Noumero:'))
given=int(input('Posa efigan?:'))
oldstock=(df[size].where(df['ΚΩΔΙΚΟΣ']==code))
oldstock=oldstock.dropna()
oldstock=oldstock.values[0]
oldstock = int(oldstock)
newstock = oldstock - given
x=(df['Α/Α'].where(df['ΚΩΔΙΚΟΣ']==code)+2)
x=x.dropna()
x = int(x)
dffin=df.at[x,size] = newstock
dffin.to_excel(file)
close()
After running this code, I receive an empty .xls file with only one cell written and everything else empty.
What am I missing here?
Thanks in advance.
You should be able to do a quick update with df.at if you have the row and column coordinates or labels.
import pandas as pd
fileLocation = r'TestExcelsheet.xlsx'
excel = pd.read_excel(fileLocation, converters={'NimikeNro': str})
print(excel.dtypes)
print(excel.index)
print(excel.head())
# Set a single cell: row label 1, column 'One'
excel.at[1, 'One'] = 444
print(excel)
# Write the whole DataFrame back to the file
excel.to_excel('TestExcelsheet.xlsx')
DataFrame.at is what you need to set a single cell; use a for loop to set more than one cell.
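One likely cause of the problem in the question is the chained assignment `dffin=df.at[x,size] = newstock`: in Python this binds dffin to the number newstock, not to the DataFrame, so it is `df` itself that has to be written back. The sketch below shows the update on a small made-up table; the column names and the helper function are hypothetical.

```python
import pandas as pd

# Hypothetical stock table standing in for the sheet loaded with pd.read_excel
df = pd.DataFrame({"code": ["A1", "B2", "C3"], "stock": [10, 20, 30]})


def reduce_stock(frame, code, given, column="stock"):
    """Subtract `given` from the stock of the row whose code matches, in place."""
    matches = frame.index[frame["code"] == code]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one row for {code!r}, found {len(matches)}")
    row_label = matches[0]
    frame.at[row_label, column] = frame.at[row_label, column] - given
    return frame


reduce_stock(df, "B2", 5)
# With a real file, write the whole DataFrame back afterwards, e.g.
# df.to_excel("stock.xlsx", index=False)
```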

How can I find the last non-empty row of excel using openpyxl 3.03?

How can I find the number of the last non-empty row of a whole xlsx sheet using Python and openpyxl?
The file can have empty rows between the cells, and the empty rows at the end could have had content that has since been deleted. Furthermore, I don't want to specify a particular column; I want to check the whole table.
For example the last non-empty row in the picture is row 13.
I know the subject has been extensively discussed but I haven't found an exact solution on the internet.
from openpyxl import load_workbook
# Open file with openpyxl
to_be = load_workbook(FILENAME_xlsx)
s = to_be.active
last_row = len(list(s.rows))
print(last_row)
## Output: 13
s.rows is a generator; converting it to a list gives one tuple of cells per row.
If you are looking for the last non-empty row of a whole xlsx sheet using Python and openpyxl, try this:
import openpyxl
def last_active_row():
    # input_file and sheet_name are expected to be defined elsewhere
    workbook = openpyxl.load_workbook(input_file)
    wp = workbook[sheet_name]
    # Walk up from the bottom until a row contains a non-empty cell
    for row in range(wp.max_row, 0, -1):
        for col in range(1, wp.max_column + 1):
            if wp.cell(row, col).value is not None:
                print("The last active row is: ", row)
                return row
    return 0  # no non-empty cell found
if __name__ == '__main__':
    last_active_row()
This should help.
openpyxl's Worksheet class has the attribute max_row.
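Because a cell whose content was cleared can still exist in the worksheet, max_row alone may overshoot. A minimal sketch of a check over the whole table, using an in-memory workbook so it can be tried without a file (the function name is hypothetical, and cells holding an empty string are treated as non-empty here):

```python
from openpyxl import Workbook


def last_non_empty_row(ws):
    """Return the 1-based number of the last row with any non-empty cell, or 0."""
    last = 0
    for row_number, values in enumerate(ws.iter_rows(values_only=True), start=1):
        if any(v is not None for v in values):
            last = row_number
    return last


# In-memory demonstration: data up to row 13 with gaps, then a cleared cell in row 15
wb = Workbook()
ws = wb.active
ws["A1"] = "header"
ws["C7"] = 42
ws["B13"] = "last"
ws["A15"] = "deleted later"
ws["A15"] = None  # the cell object remains, so max_row may still count this row

result = last_non_empty_row(ws)
```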

python pandas automatic excel lookup system

I know this is a lot of code and there is a lot to do, but I am really stuck and don't know how to continue now that the program can match identical entries in the two files. I am pretty sure you know how a lookup in Excel works; this program does basically the same. I have commented the important parts and hope you can give me some help on how to continue this project. Thank you very much!
import pandas as pd
import xlrd
File1 = pd.read_excel("Excel_test.xlsx", usecols=[0], header=None, index=False) #the two excel files with the columns that should be compared
File2 = pd.read_excel("Excel_test02.xlsx", usecols=[0], header=None, index=False)
fullFile1 = pd.read_excel("Excel_test.xlsx", header=None, index=False)#the full excel files
fullFile2 = pd.read_excel("Excel_test02.xlsx", header=None, index=False)
i = 0
writer = pd.ExcelWriter("output.xlsx")
def loadingTime(): #just a loader that shows the percentage of the matching process
    global i
    loading = (i / len(File1)) * 100
    loading = round(loading, 2)
    print(str(loading) + "%/100%")
def matcher():
    global i
    while(i < len(File1)):#goes in column that should be compared and goes on higher if there is a match found in second file
        for o in range(len(File2)):#runs through the column in second file
            matching = File1.iloc[i].str.lower() == File2.iloc[o].str.lower() #matches the column contents of the two files
            if matching.bool() == True:
                print("Match")
                """
                df.append(File1.iloc[i])#the whole row of the matched column should be appended in Dataframe with the arrangement of excel file
                df.append(File2.iloc[o])#the whole row of the matched column should be appended in Dataframe with the arrangement of excel file
                """
        i += 1
matcher()
df.to_excel(writer, "Sheet")
writer.save() #After the two files have been compared to each other, now a file containing both excel contents and is also arranged correctly
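One possible way to continue, sketched under assumptions: instead of comparing row by row, pandas can do the lookup with a merge on a lower-cased key column. The two DataFrames below are made-up stand-ins for the two Excel files, and the column names are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for the two sheets
file1 = pd.DataFrame({"name": ["Apple", "pear", "Kiwi"], "qty": [1, 2, 3]})
file2 = pd.DataFrame({"name": ["APPLE", "kiwi", "plum"], "colour": ["red", "green", "purple"]})

# Build a lower-cased key so the comparison ignores case, as in the question
left = file1.assign(key=file1["name"].str.lower())
right = file2.assign(key=file2["name"].str.lower())

# An inner merge keeps only rows whose key appears in both files,
# with the full rows from both sides placed next to each other
matches = left.merge(right, on="key", suffixes=("_file1", "_file2"))
matched_keys = matches["key"].tolist()
```

The resulting `matches` frame could then be written out with `matches.to_excel("output.xlsx", index=False)`.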

how to edit a csv in python and add one row after the 2nd row that will have the same values in all columns except 1

I'm new to Python and I'm facing a small challenge that I haven't been able to figure out so far.
I receive a csv file with around 30-40 columns and 5-50 rows with various details in each cell. The 1st row of the csv has the title of each column, and the item values start in the 2nd row.
What I want to do is create a Python script that reads the csv file and does the following every time:
Add a row after the first item row (literally after the 2nd row, because the 1st row holds the titles). That new 3rd row should contain the same information as the row above it, with one difference only: in the column "item_subtotal" I want to add the value from the column "discount_total".
All the rows below should remain as they are, and the modified csv should be saved as a new file with the word "edited" added to the file name.
I could really use some help because so far I've only managed to open the csv file with the Python script I'm developing; I haven't been able to copy the contents of the row above into the newly created row and replace that specific value.
Looking forward to any help.
Thank you
Here I'm attaching the CSV, with some values changed for privacy reasons.
order_id,order_number,date,status,shipping_total,shipping_tax_total,fee_total,fee_tax_total,tax_total,discount_total,order_total,refunded_total,order_currency,payment_method,shipping_method,customer_id,billing_first_name,billing_last_name,billing_company,billing_email,billing_phone,billing_address_1,billing_address_2,billing_postcode,billing_city,billing_state,billing_country,shipping_first_name,shipping_last_name,shipping_address_1,shipping_address_2,shipping_postcode,shipping_city,shipping_state,shipping_country,shipping_company,customer_note,item_id,item_product_id,item_name,item_sku,item_quantity,item_subtotal,item_subtotal_tax,item_total,item_total_tax,item_refunded,item_refunded_qty,item_meta,shipping_items,fee_items,tax_items,coupon_items,order_notes,download_permissions_granted,admin_custom_order_field:customer_type_5
15001_TEST_2,,"2017-10-09 18:53:12",processing,0,0.00,0.00,0.00,5.36,7.06,33.60,0.00,EUR,PayoneCw_PayPal,"0,00",0,name,surname,,name.surname#gmail.com,0123456789,"address 1",,41541_TEST,location,,DE,name,surname,address,01245212,14521,location,,DE,,,1328,302,"product title",103,1,35.29,6.71,28.24,5.36,0.00,0,,"id:1329|method_id:free_shipping:3|method_title:0,00|total:0.00",,id:1330|rate_id:1|code:DE-MWST-1|title:MwSt|total:5.36|compound:,"id:1331|code:#getgreengent|amount:7.06|description:Launchcoupon for friends","text string",1,
You can also use pandas to manipulate the data from the csv like this:
import os
import pandas
import copy
Read the csv file into a pandas dataframe:
df = pandas.read_csv(filename)
Make a deepcopy of the first row of data (index 0, because the title row becomes the column names) and add the discount total to the item subtotal:
new_row = copy.deepcopy(df.loc[0])
new_row['item_subtotal'] += new_row['discount_total']
Concatenate the first row with the new row and then everything after that:
df = pandas.concat([df.loc[:0], new_row.to_frame().T, df.loc[1:]], ignore_index=True)
Change the filename and write out the new csv file:
filename = os.path.splitext(filename)[0] + 'edited.csv'
df.to_csv(filename, index=False)
I hope this helps! Pandas is great for cleanly handling massive amounts of data, but may be overkill for what you are trying to do. Then again, maybe not. It would help to see an example data file.
The first step is to turn that .csv into something that is a little easier to work with. Fortunately, Python has the 'csv' module, which makes it easy to turn your .csv file into a much nicer list of lists. The code below gives you a way both to turn your .csv into a list of lists and to turn the modified data back into a .csv file.
import csv
import copy
def csv2list(ifile):
    """
    ifile = the path of the csv to be converted into a list of lists
    """
    olist = []
    # newline='' lets the csv module handle line endings itself (Python 3)
    with open(ifile, newline='') as f:
        c = csv.reader(f, dialect='excel')
        for line in c:
            olist.append(line) #and update the outer array
    return olist
#------------------------------------------------------------------------------
def list2csv(ilist,ofile):
    """
    ilist = the list of lists to be converted
    ofile = the output path for your csv file
    """
    with open(ofile, 'w', newline='') as csvfile:
        csvwriter = csv.writer(csvfile, delimiter=',',
                               quotechar='"', quoting=csv.QUOTE_MINIMAL)
        csvwriter.writerows(ilist)
Now you can simply copy ilist[1] (the first row of data) and change the appropriate element to reflect your summed value. Here n is the index of the item_subtotal column and n-x the index of the discount_total column; the csv module reads every value as a string, so convert to float before adding:
listTemp = copy.deepcopy(ilist[1])
listTemp[n] = str(float(listTemp[n]) + float(listTemp[n-x]))
ilist.insert(2,listTemp)
As for how to change the file name, just use:
import os
newFileName = os.path.splitext(oldFileName)[0] + "edited" + os.path.splitext(oldFileName)[1]
Hopefully this will help you out!
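Put together, the approach can be sketched as below. This is a minimal sketch, not a tested solution for the real file: it uses csv.DictReader so the columns can be addressed by name instead of by index, a cut-down in-memory sample in place of the attached CSV, and hypothetical helper names. It reads "add the value" as a numeric sum, following the answers above.

```python
import csv
import io
import os


def add_discount_row(rows):
    """Return rows with a copy of the first item row inserted after it, where
    item_subtotal has discount_total added to it."""
    if not rows:
        return []
    extra = dict(rows[0])
    total = float(extra["item_subtotal"]) + float(extra["discount_total"])
    extra["item_subtotal"] = f"{total:.2f}"
    return [rows[0], extra] + rows[1:]


def edited_name(path):
    """Insert '_edited' before the file extension."""
    root, ext = os.path.splitext(path)
    return f"{root}_edited{ext}"


# Cut-down, in-memory stand-in for the real file
raw = "order_id,discount_total,item_subtotal\n15001,7.06,35.29\n15002,0.00,10.00\n"
reader = csv.DictReader(io.StringIO(raw, newline=""))
new_rows = add_discount_row(list(reader))

out = io.StringIO(newline="")
writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(new_rows)
```

With real files, the StringIO objects would be replaced by `open(path, newline="")` for reading and `open(edited_name(path), "w", newline="")` for writing.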

In python removing rows from a excel file using xlrd, xlwt, and xlutils

Hello everyone and thank you in advance.
I have a python script where I am opening a template excel file, adding data (while preserving the style) and saving again. I would like to be able to remove rows that I did not edit before saving out the new xls file. My template xls file has a footer so I want to delete the extra rows before the footer.
Here is how I am loading the xls template:
self.inBook = xlrd.open_workbook(file_path, formatting_info=True)
self.outBook = xlutils.copy.copy(self.inBook)
self.outBookCopy = xlutils.copy.copy(self.inBook)
I then write the info to outBook while grabbing the style from outBookCopy and applying it to each row that I modify in outBook.
So how do I delete rows from outBook before writing it? Thanks everyone!
I achieved this using the pandas package:
import pandas as pd
#Read from Excel
xl= pd.ExcelFile("test.xls")
#Parsing Excel Sheet to DataFrame
dfs = xl.parse(xl.sheet_names[0])
#Update DataFrame as per requirement
#(Here Removing the row from DataFrame having blank value in "Name" column)
dfs = dfs[dfs['Name'] != '']
#Updating the excel sheet with the updated DataFrame
dfs.to_excel("test.xls",sheet_name='Sheet1',index=False)
xlwt does not provide a simple interface for doing this, but I've had success with a somewhat similar problem (inserting multiple copies of a row into a copied workbook) by directly changing the worksheet's rows attribute and the row numbers on the row and cell objects.
The rows attribute is a dict, indexed on row number, so iterating a row range takes a little care and you can't slice it.
Given the number of rows you want to delete and the initial row number of the first row you want to keep, something like this might work:
rows_indices_to_move = range(first_kept_row, worksheet.last_used_row + 1)
max_used_row = 0
for row_index in rows_indices_to_move:
    new_row_number = row_index - number_to_delete
    # worksheet.rows is a dict, so test membership on it directly
    if row_index in worksheet.rows:
        row = worksheet.rows[row_index]
        row._Row__idx = new_row_number
        for cell in row._Row__cells.values():
            if cell:
                cell.rowx = new_row_number
        worksheet.rows[new_row_number] = row
        max_used_row = new_row_number
    else:
        # There's no row in the block we're trying to slide up at this index, but there might be a row already present to clear out.
        if new_row_number in worksheet.rows:
            del worksheet.rows[new_row_number]
# now delete any remaining rows (a dict can't be sliced, so collect the keys first)
for stale_index in [r for r in worksheet.rows if r > new_row_number]:
    del worksheet.rows[stale_index]
# and update the internal marker for the last remaining row
if max_used_row:
    worksheet.last_used_row = max_used_row
There may well be bugs in that code: it's untested and relies on direct manipulation of the underlying data structures, but it should show the general idea. Modify the row and cell objects and adjust the rows dictionary so that the indices are correct.
Do you have merged ranges in the rows you want to delete, or below them? If so you'll also need to run through the worksheet's merged_ranges attribute and update the rows for them. Also, if you have multiple groups of rows to delete you'll need to adjust this answer - this is specific to the case of having a block of rows to delete and shifting everything below up.
As a side note - I was able to write text to my worksheet and preserve the predefined style thus:
def write_with_style(ws, row, col, value):
    if ws.rows[row]._Row__cells[col]:
        old_xf_idx = ws.rows[row]._Row__cells[col].xf_idx
        ws.write(row, col, value)
        ws.rows[row]._Row__cells[col].xf_idx = old_xf_idx
    else:
        ws.write(row, col, value)
That might let you skip having two copies of your spreadsheet open at once.
For those of us still stuck with xlrd/xlwt/xlutils, here's a filter you could use:
from xlutils.filter import BaseFilter
class RowFilter(BaseFilter):
    rows_to_exclude: "Iterable[int]"
    _next_output_row: int
    def __init__(
        self,
        rows_to_exclude: "Iterable[int]",
    ):
        self.rows_to_exclude = rows_to_exclude
        self._next_output_row = -1
    def _should_include_row(self, rdrowx):
        return rdrowx not in self.rows_to_exclude
    def row(self, rdrowx, wtrowx):
        if self._should_include_row(rdrowx):
            # Proceed with writing out the row to the output file
            self._next_output_row += 1
            self.next.row(
                rdrowx, self._next_output_row,
            )
    # After `row()` has been called, `cell()` is called for each cell of the row
    def cell(self, rdrowx, rdcolx, wtrowx, wtcolx):
        if self._should_include_row(rdrowx):
            self.next.cell(
                rdrowx, rdcolx, self._next_output_row, wtcolx,
            )
Then put it to use with e.g.:
from xlrd import open_workbook
from xlutils.filter import DirectoryWriter, XLRDReader, process
process(
    # XLRDReader takes the opened workbook and the name to give the output file
    XLRDReader(open_workbook("input_filename.xls"), "output_filename.xls"),
    RowFilter([3, 4, 5]),
    DirectoryWriter("output_dir"),
)
