Python Not Creating Excel File with xlsxwriter

I have an Excel file with items and descriptions, and I'm trying to compare the descriptions for similarity; if two are similar, I put the items in a new Excel file. The items also have catalog numbers, and I'm comparing those as well: if two catalog numbers are nothing like each other but the items come from the same vendor (buy_line), those should also go into the new Excel file. When I run the script it takes far too long, and after I leave it running and come back, Spyder has closed and there is no new file. So this is a two-part question: is there a way to make the code faster, and why is no file created? Thank you in advance. My code is below.
import xlrd
import xlsxwriter
from fuzzywuzzy import fuzz

AllItems = xlrd.open_workbook('2-18All_Items-CleanUp.xlsx','rb')
sheets = AllItems.sheet_names()
item = []
base = []
kit = []
buy_line = []
catalogs = []
descriptions = []
similar_desc_item = []
similar_desc = []
diff_catalog_samebuyline = []
sh = AllItems.sheet_by_index(0)

def readexcelfunc():
    for rownum in range(sh.nrows):
        row_values = sh.row_values(rownum)
        item.append((row_values[0]))
        base.append((row_values[1]))
        kit.append((row_values[2]))
        buy_line.append((row_values[6]))
        catalogs.append((row_values[8]))
        descriptions.append((row_values[12]))

def check_similar_desc():
    for i,k in enumerate(descriptions):
        for j,l in enumerate(descriptions):
            ratio1 = fuzz.token_sort_ratio(k,l)
            if ratio1 > 95 and k != l and base[i] != base[j] and kit[i] == "No":
                similar_desc_item.append(item[i])

def check_notmatching_catalog():
    for x,a in enumerate(catalogs):
        for y,b in enumerate(catalogs):
            ratio2 = fuzz.token_sort_ratio(a,b)
            if ratio2 < 10 and buy_line[x] == buy_line[y]:
                diff_catalog_samebuyline.append(catalogs[x])

def Create_ExcelFile():
    NewWorkbook = xlsxwriter.Workbook('Sim_Desc.xlsx')
    worksheet = NewWorkbook.add_worksheet()
    row1 = 0
    row2 = 0
    for items in similar_desc_item:
        worksheet.write(row1,0,items)
        row1 += 1
    for catalognumb in diff_catalog_samebuyline:
        worksheet.write(row2,3,catalognumb)
        NewWorkbook.save()
        NewWorkbook.close()

readexcelfunc()
check_similar_desc()
print(similar_desc_item)
check_notmatching_catalog()
Create_ExcelFile()
print("Finished")

There are a few issues in the Create_ExcelFile() function. The first is that xlsxwriter workbooks have no save() method. Also, you aren't incrementing row2, so the second write() will always write to the first row and overwrite whatever else is there. Most importantly, though, the close() call is at the wrong level, so you are closing the file too early. Something like this should work:
def Create_ExcelFile():
    NewWorkbook = xlsxwriter.Workbook('Sim_Desc.xlsx')
    worksheet = NewWorkbook.add_worksheet()
    row1 = 0
    row2 = 0
    for items in similar_desc_item:
        worksheet.write(row1,0,items)
        row1 += 1
    for catalognumb in diff_catalog_samebuyline:
        worksheet.write(row2,3,catalognumb)
        row2 += 1  # increment so each catalog number lands on its own row
    NewWorkbook.close()
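On the speed question: check_similar_desc() and check_notmatching_catalog() both score every ordered pair of rows, so n rows mean roughly 2·n² calls to fuzz.token_sort_ratio, which is by far the expensive step. Scoring each unordered pair once halves the work while producing the same matches, and running the cheap comparisons before the fuzzy one skips pairs that could never qualify. A minimal sketch of that idea for the description check (using the same module-level lists as the question's code):

import itertools
from fuzzywuzzy import fuzz

def check_similar_desc():
    # visit each unordered pair (i, j) with i < j exactly once
    for i, j in itertools.combinations(range(len(descriptions)), 2):
        k, l = descriptions[i], descriptions[j]
        if k == l or base[i] == base[j]:
            continue                              # cheap filters first
        if fuzz.token_sort_ratio(k, l) > 95:      # expensive fuzzy score, once per pair
            if kit[i] == "No":
                similar_desc_item.append(item[i])
            if kit[j] == "No":
                similar_desc_item.append(item[j])

The same pattern applies to check_notmatching_catalog(). Installing the optional python-Levenshtein package also speeds up fuzzywuzzy considerably; without it the library falls back to a slow pure-Python matcher.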

Related

Skip First Column in CSV File with Pandas

I have a CSV file that is generated with some extra information in the first line. I'm trying to skip that line, but nothing seems to work. I tried using skiprows, and I looked at several other suggestions and examples, including:
Pandas drop first columns after csv read
https://datascientyst.com/pandas-read-csv-file-read_csv-skiprows/
Nothing I tried worked the way I wanted; when I did get something to work, it deleted an entire row of data.
Here is a sample of the code:
# Imports the Pandas module. It must be installed to run this script.
import pandas as pd

# Source file path.
source_file = 'Csvfile.csv'

# Reads the CSV file with a compatible encoding.
dataframe = pd.read_csv(source_file, encoding='latin1')
df = pd.DataFrame({'User': dataframe.User, 'Pages': dataframe.Pages, 'Copies': dataframe.Copies,
                   'Color': dataframe.Grayscale, 'Duplex': dataframe.Duplex, 'Printer': dataframe.Printer})

# Formats data so that it can be used to count duplex and color pages.
df.loc[df["Duplex"] == "DUPLEX", "Duplex"] = dataframe.Pages
df.loc[df["Duplex"] == "NOT DUPLEX", "Duplex"] = 0
df.loc[df["Color"] == "NOT GRAYSCALE", "Color"] = dataframe.Pages
df.loc[df["Color"] == "GRAYSCALE", "Color"] = 0
df.sort_values(by=['User', 'Pages'])
file = df.to_csv('PrinterLogData.csv', index=False)

# Opens the parsed CSV file.
output_source = "PrinterLogData.csv"
dataframe = pd.read_csv(output_source, encoding='latin1')

# Creates a new DataFrame.
df = pd.DataFrame({'User': dataframe.User, 'Pages': dataframe.Pages, 'Copies': dataframe.Copies,
                   'Color': dataframe.Color, 'Duplex': dataframe.Duplex, 'Printer': dataframe.Printer})

# Groups data by user and printer sums.
Report1 = df.groupby(['User'], as_index=False).sum().sort_values('Pages', ascending=False)
Report2 = df.groupby(['Printer'], as_index=False).sum().sort_values('Pages', ascending=False)
(Sample data and the desired output were attached as images.)
This is an early draft of what you appear to want for your program (based on the simulated print-log.csv):
import csv
import itertools
import operator
import pathlib

CSV_FILE = pathlib.Path('print-log.csv')
EXTRA_COLUMNS = ['Pages', 'Grayscale', 'Color', 'Not Duplex', 'Duplex']


def main():
    with CSV_FILE.open('rt', newline='') as file:
        iterator = iter(file)
        next(iterator)  # skip the first (extra) line
        reader = csv.DictReader(iterator)
        table = list(reader)
    create_report(table, 'Printer')
    create_report(table, 'User')


def create_report(table, column_name):
    key = operator.itemgetter(column_name)
    table.sort(key=key)
    field_names = [column_name] + EXTRA_COLUMNS
    with pathlib.Path(f'{column_name} Report').with_suffix('.csv').open(
        'wt', newline=''
    ) as file:
        writer = csv.DictWriter(file, field_names)
        writer.writeheader()
        report = []
        for key, group in itertools.groupby(table, key):
            report.append({column_name: key} | analyze_group(group))
        report.sort(key=operator.itemgetter('Pages'), reverse=True)
        writer.writerows(report)


def analyze_group(group):
    summary = dict.fromkeys(EXTRA_COLUMNS, 0)
    for row in group:
        pages = int(row['Pages']) * int(row['Copies'])
        summary['Pages'] += pages
        summary['Grayscale'] += pages if row['Grayscale'] == 'GRAYSCALE' else 0
        summary['Color'] += pages if row['Grayscale'] == 'NOT GRAYSCALE' else 0
        summary['Not Duplex'] += pages if row['Duplex'] == 'NOT DUPLEX' else 0
        summary['Duplex'] += pages if row['Duplex'] == 'DUPLEX' else 0
    return summary


if __name__ == '__main__':
    main()
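If you'd still rather stay in pandas, the skiprows approach from the question does work for this layout: skiprows=1 drops the first physical line of the file, and the header is then read from the line that follows. A minimal sketch, assuming the junk line is immediately followed by the normal header row (file name as in the question):

import pandas as pd

# skiprows=1 skips the first physical line; the header is then read
# from the next line, so the real column names survive.
dataframe = pd.read_csv('Csvfile.csv', encoding='latin1', skiprows=1)
print(dataframe.columns)  # expect: User, Pages, Copies, Grayscale, Duplex, Printer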

Cannot save modifications made in xlsx file

I read a .xlsx file and update it, but I'm not able to save my changes.
from xml.dom import minidom as md

[... some code ....]

sheet = workDir + '/xl/worksheets/sheet'
sheet1 = sheet + '1.xml'
importSheet1 = open(sheet1, 'r')
whole_file = importSheet1.read()
data_Sheet = md.parseString(whole_file)

[... some code ....]

self.array_mem_name = []
y = 1
x = 5  # first useful row
day = int(day)
found = 0
while x <= len_array_shared:
    readrow = data_Sheet.getElementsByTagName('row')[x]
    c_data = readrow.getElementsByTagName('c')[0]
    c_attrib = c_data.getAttribute('t')
    if c_attrib == 's':
        vName = c_data.getElementsByTagName('v')[0].firstChild.nodeValue
        #if int(vName) != broken:
        mem_name = self.array_shared[int(vName)]
        if mem_name != '-----':
            if mem_name == old:
                c_data = readrow.getElementsByTagName('c')[day]
                c_attrib = c_data.getAttribute('t')
                if c_attrib == 's':
                    v_Attrib = c_data.getElementsByTagName('v')[0].firstChild.nodeValue
                    if v_Attrib != '':
                        #loc = self.array_shared[int(v_Attrib)]
                        index = self.array_shared.index('--')
                        c_data.getElementsByTagName('v')[0].firstChild.nodeValue = index

with open(sheet1, 'w') as f:
    f.write(whole_file)
As you can see, I call f.write(whole_file), but whole_file does not contain the change made with index.
Checking in the debugger I can see that the new value has been added to the node, but I can't save sheet1 with the modified value.
I switched to using openpyxl instead, as Lei Yang suggested in a comment, and found it worked much better for this job. With openpyxl, reading cell values is much easier than with xml.dom.minidom.
My only concern is that openpyxl seems noticeably slower than the DOM at loading the workbook; perhaps memory pressure was the cause. But I was more interested in something simpler, so this minor performance issue was acceptable.
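For the record, the original write failed because whole_file still holds the text exactly as it was read from disk; the edits were made only to the data_Sheet DOM, so persisting them via minidom would have required writing the serialized DOM, e.g. f.write(data_Sheet.toxml()), instead of the untouched string. The openpyxl equivalent is far shorter; here is a minimal sketch, with the file name and cell coordinates as placeholder assumptions:

from openpyxl import load_workbook

wb = load_workbook('schedule.xlsx')            # hypothetical workbook name
ws = wb.worksheets[0]                          # first sheet, i.e. sheet1.xml above
ws.cell(row=5, column=2).value = 'new value'   # cells are read and written directly
wb.save('schedule.xlsx')                       # save() writes the workbook back out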

Writing columns to a CSV file

I would like to update a column called Score for a specific row in a CSV file. When a button is pressed, I would like the code to search the CSV file until the row with the specified name is found (the name is stored in the variable name and pulled at random from the file by an earlier function, NameGenerator()), and increment the relevant cell in the Score column by 1.
Please note I am using an Excel file saved as a .csv for this.
Any ideas how to do this? The code below does not work. Any help would be appreciated.
def Correct():
    writer = csv.writer(namelist_file)
    score = 0
    for row in writer:
        if row[0] == name:
            score = score + 1
            writer.writerow([col[1]] = score)
(A screenshot of the CSV file was attached.)
So, for example, if the name Tom is selected elsewhere in the code (and stored in the variable name), his score of 3 should be incremented by 1, becoming 4.
Here is what the function which pulls a random name from the csv file looks like:
def NameGenerator():
    namelist_file = open('StudentNames&Questions.csv')
    reader = csv.reader(namelist_file)
    rownum = 0
    global array
    array = []
    for row in reader:
        if row[0] != '':
            array.append(row[0])
        rownum = rownum + 1
    length = len(array) - 1
    i = random.randint(1, length)
    name = array[i]
    return name
Can you please check if this works:
import sys
import random
import csv

def update(cells):
    # re-join the cells of a row into a comma-separated line
    return ",".join(str(cell) for cell in cells)

def update_score(name):
    with open('StudentNames&Questions.csv', 'r') as file:
        data = file.readlines()
    name_index = -1
    score_index = -1
    headers = data[0]
    for index, header in enumerate(headers.split(",")):
        if header.strip() == 'Names':
            name_index = index
        if header.strip() == 'Score':
            score_index = index
    if name_index == -1 or score_index == -1:
        print("Headers not found")
        sys.exit()
    for index, row in enumerate(data):
        cells = row.rstrip("\n").split(",")
        if cells[name_index] == name:
            cells[score_index] = int(cells[score_index]) + 1
            data[index] = update(cells) + "\n"
    # write the updated rows back to the file
    with open('StudentNames&Questions.csv', 'w') as file:
        file.writelines(data)

def NameGenerator():
    namelist_file = open('StudentNames&Questions.csv')
    reader = csv.reader(namelist_file)
    rownum = 0
    global array
    array = []
    for row in reader:
        if row[0] != '':
            array.append(row[0])
        rownum = rownum + 1
    length = len(array) - 1
    i = random.randint(1, length)
    name = array[i]
    return name

random_name = NameGenerator()
update_score(random_name)
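A variant of the same idea using the csv module instead of manual string splitting (same file and header names as above) might look like the sketch below; the module handles quoting and line endings for you, which plain split(",") gets wrong for quoted fields:

import csv

def update_score(name, filename='StudentNames&Questions.csv'):
    # read all rows, bump the Score cell on the matching row, write back
    with open(filename, newline='') as f:
        rows = list(csv.reader(f))
    header = rows[0]
    name_index = header.index('Names')
    score_index = header.index('Score')
    for row in rows[1:]:
        if row[name_index] == name:
            row[score_index] = str(int(row[score_index]) + 1)
    with open(filename, 'w', newline='') as f:
        csv.writer(f).writerows(rows)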

Cannot get proper output when writing to a .csv file - the comma-separated text is being split across multiple columns, but I want only two columns

def classify(self):
    for i in self.tweets:
        tw = self.tweets[i]
        count = 0
        res = {}
        for t in tw:
            label = self.classifier.classify(self.helper.extract_features(t.split()))
            if(label == 'positive'):
                self.pos_count[i] += 1
            elif(label == 'negative'):
                self.neg_count[i] += 1
            elif(label == 'neutral'):
                self.neut_count[i] += 1
            result = {'text': t, 'tweet': self.origTweets[i][count], 'label': label}
            res[count] = result
            count += 1
        #end inner loop
        self.results[i] = res

def writeOutput(self, filename, writeOption='wb'):
    fp = open(filename, writeOption)
    if(fp):
        for i in self.results:
            res = self.results[i]
            for j in res:
                item = res[j]
                text = item['text'].strip()
                label = item['label']
                writeStr = label + " , " + text + "\n"
                pickle.dump(writeStr, fp)
I am writing 'label' and 'text' to the CSV file filename, but the text is getting split across multiple columns, which is causing further errors in my work.
You need to use Python's csv library. Its writer takes a list of values and writes it to a file, automatically adding suitable delimiters and quoting to produce a proper comma-separated file. In your case the text contains commas, so those fields will automatically have " correctly added around them. (Note also that pickle.dump writes Python's binary pickle format, not plain text, so it isn't suitable for producing a CSV at all.) This can be done roughly as follows:
import csv

def writeOutput(self, filename, writeOption='wb'):
    # on Python 3, use writeOption='w' and open(..., newline='') instead
    with open(filename, writeOption) as f_output:
        csv_output = csv.writer(f_output)
        for i in self.results:
            res = self.results[i]
            for j in res:
                item = res[j]
                csv_output.writerow([item['label'], item['text'].strip()])
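To make the quoting behaviour concrete, here is a tiny standalone demonstration (the file name is illustrative): a field containing a comma comes out wrapped in double quotes, so spreadsheet software reads it back as exactly two columns:

import csv

with open('demo.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['positive', 'great food, great service'])

# demo.csv now contains: positive,"great food, great service"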

Append Function Nested Inside IF Statement Body Not Working

I am fairly new to Python (I just started learning in the last two weeks) and am trying to write a script that parses a CSV file and extracts some of the fields into a list:
from string import Template
import csv
import string

site1 = 'D1'
site2 = 'D2'
site3 = 'D5'
site4 = 'K0'
site5 = 'K1'
site6 = 'K2'
site7 = '0'
site8 = '0'
site9 = '0'
lbl = 1
portField = 'y'
sw = 5
swpt = 6
cd = 0
pt = 0
natList = []

with open(name=r'C:\Users\dtruman\Documents\PROJECTS\SCRIPTING - NATAERO DEPLOYER\NATAERO DEPLOYER V1\nataero_deploy.csv') as rcvr:
    for line in rcvr:
        fields = line.split(',')
        Site = fields[0]
        siteList = [site1, site2, site3, site4, site5, site6, site7, site8, site9]
        while Site in siteList == True:
            Label = fields[lbl]
            Switch = fields[sw]
            if portField == 'y':
                Switchport = fields[swpt]
                natList.append([Switch, Switchport, Label])
            else:
                Card = fields[cd]
                Port = fields[pt]
                natList.append([Switch, Card, Port, Label])
print natList
Even if I strip the else clause away and break into my code right after the if clause, I can verify that Switchport (the first statement in the if clause) is successfully being populated with a string from my CSV file, as are Switch and Label. However, natList is not being appended with the fields parsed from each line of my CSV for some reason. Python returns no errors; it just never appends to natList.
This will eventually be a function (once I get the code itself to work), but for now I am simply setting the function parameters as global variables so I can run it in an IPython console without having to call the function.
The lbl, sw, swpt, cd, and pt variables refer to column numbers in my CSV (the finished function will let the user enter values for these).
I assume I am running into some issue with natList scope, but I have tried moving the natList = [] statement to various places in my code to no avail.
Strangely, I can run the above in a console and then run natList.append([Switch, Switchport, Label]) separately, and that works.
Thanks for any assistance!
The while condition needs an extra pair of parentheses: write while (Site in siteList) == True:, or, much cleaner (as Padraic suggested), while Site in siteList:.
The reason is Python's comparison chaining: Site in siteList == True is evaluated as (Site in siteList) and (siteList == True), and since a list never compares equal to True, the whole condition is always False.
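You can verify that chaining behaviour directly in the interpreter:

>>> 'a' in ['a'] == True    # chained: ('a' in ['a']) and (['a'] == True)
False
>>> ('a' in ['a']) == True  # what the original code intended
True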
Change
while Site in siteList == True:
to
if Site in siteList:
An if is the right construct here anyway: nothing in the loop body reassigns Site, so a while whose condition was ever true would never terminate.
You might want to look into the csv module, which aims to make reading and writing CSV files simpler, e.g.:

import csv

with open('<file>') as fp:
    ...
    reader = csv.reader(fp)
    if portfield == 'y':
        natlist = [[row[i] for i in [sw, swpt, lbl]] for row in reader if row[0] in sitelist]
    else:
        natlist = [[row[i] for i in [sw, cd, pt, lbl]] for row in reader if row[0] in sitelist]
print natlist
Or, alternatively, using a csv.DictReader, which takes the first row as the field names and then returns dictionaries:

import csv

with open('<file>') as fp:
    ...
    reader = csv.DictReader(fp)
    if portfield == 'y':
        fields = ['Switch', 'card/port', 'Label']
    else:
        fields = ['Switch', '??', '??', 'Label']
    natlist = [[row[f] for f in fields] for row in reader if row['Building/Site'] in sitelist]
print natlist
