Newer to Python, brand new to openpyxl - I have a large text file from which I used regexes to extract the data I need. Now I need to write that data to an Excel file with openpyxl. The text file is a network address translation (NAT) table, which looks like
src_adtr:10.20.30.40 dst_adtr:185.50.40.50 src_adtr_translated:70.60.50.40 dst_adtr_translated:99.44.55.66
These four elements would be one row of the excel file (I have about 500 rows that need to be written).
When I run the code, I do not get any errors, but it also does nothing - how can I correct this?
import re
import openpyxl

with open("NAT-table.txt", "r") as f:
    text = f.read()

source = re.findall(r':src_adtr\s+.*\s+.*\s+:Name\s+[\(](.*)', text)
dest = re.findall(r':dst_adtr\s+.*\s+.*\s+:Name\s+[\(](.*)', text)
source_adtr = re.findall(r':src_adtr_translated.*\s+.*\s+.*\s+.*\s+.*\s+.*\s+:Name\s+[\(](.*)', text)
dest_adtr = re.findall(r':dst_adtr_translated\s+.*\s+.*\s+.*\s*\s+.*\s+.*\s.*\s+:Name\s+[\(](.*)', text)
firewall = re.findall(r'\w+LINT\d+(?!_)', text)  # '\w' includes the '_' (underscore) char

natRules = list(zip(source, dest, source_adtr, dest_adtr, firewall))

wb = openpyxl.load_workbook('NAT-table.xlsx')
sheet = wb.active

for i in range(2, sheet.max_row):
    for k in range(1, 5):
        for t in natRules:
            sheet.cell(row=i, column=k).value = natRules[i][k]
            #sheet.cell(row=rowNum, column=1).value = i

wb.save('NAT-table.xlsx')
Your sample data does not match your regular expressions, so natRules is empty and there is nothing to write. I think. I say this because it is difficult to tell from your question.
So, since there is no working sample data, it is hard to say exactly where you went wrong. I will point out that this part:

for t in natRules:
    sheet.cell(row=i, column=k).value = natRules[i][k]

iterates over natRules but makes no use of t, and in fact sets the same cell multiple times.
Test Code:
Here is a chunk of test code which uses your basic loop and succeeds in writing an xlsx file. I suggest you slowly modify this to look like your code, but with your data, and see where it stops working.
import openpyxl

data = [list('abcd'), list('efgh')]

wb = openpyxl.load_workbook('test.xlsx')
sheet = wb.active
for i, line in enumerate(data):
    for k, val in enumerate(line):
        sheet.cell(row=i+2, column=k+1).value = val
wb.save('test.xlsx')
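Once the regexes actually match and natRules holds tuples, the triple loop from the question can collapse into a single pass with enumerate. A minimal sketch, using made-up rows in place of the real natRules data and a fresh workbook instead of loading an existing one:

```python
import openpyxl

# Hypothetical rows standing in for the question's natRules list.
natRules = [
    ("10.20.30.40", "185.50.40.50", "70.60.50.40", "99.44.55.66", "FW_A"),
    ("10.0.0.1", "8.8.8.8", "172.16.0.1", "9.9.9.9", "FW_B"),
]

wb = openpyxl.Workbook()
sheet = wb.active
for i, rule in enumerate(natRules, start=2):   # data starts on row 2
    for k, field in enumerate(rule, start=1):  # columns A..E
        sheet.cell(row=i, column=k).value = field
wb.save('NAT-table-demo.xlsx')
```

Note there is exactly one cell write per (row, column) pair, which is the fix for the triple loop.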
Related
I'm reading data from in.txt and writing specific lines from that to Sample.xlsx. I'm grepping data between lines containing start and end and I set Flag when I'm parsing this section of input data. When Flag is set, whenever I encounter NAME: and AGE: in lines, it needs to be written to C and D columns respectively (Extra info: input file has the following pattern: first line contains NAME, next line contains AGE followed by an empty line and this pattern is repeated).
start is here
NAME:Abe
AGE:40
NAME:John
AGE:20
...
end
Input is similar to above. Now the problem is that I've around 1000 such lines, so roughly 333 NAMES, AGE. When I open excel sheet after running the code, I see that C2 has NAME:Abe repeated 21 times. D2 has AGE:40 repeated 21 times too. I reduced input to 100 lines, and the repetition reduced to 3. I can't seem to figure out why this is happening. When I change to 10 lines, ie just 3 name and age, this problem doesn't happen. C2 just has one name, C3 also one name.
from openpyxl import Workbook, load_workbook

fin = open('in.txt')
fout1 = open('name.txt', 'w')
fout2 = open('age.txt', 'w')

wb = Workbook()
ws = wb.active

i = 2
Flag = False
for lines in fin:
    if 'start' in lines:
        Flag = True
        continue
    if Flag and 'end' in lines:
        break
    if Flag:
        if 'NAME:' in lines:
            fout1.write(lines)
            ws['C'+str(i)] = lines
        elif 'AGE:' in lines:
            fout2.write(lines)
            ws['D'+str(i)] = lines
            i += 1

wb.save(filename = 'Sample.xlsx')
Apologies for the long write-up. But please let me know what I'm doing wrong here.
Thanks for reading.
______________________________________ Edit-1 ________________________________
I just tried basic writing from text file to excel cells using the following minimal code.
for line in fin:
    ws['C'+str(i)] = line
    i += 1
This also produces the same error: the line gets written multiple times inside a cell, and the number of repetitions increases with the number of lines in the input text file.
__________________________________ Edit-2__________________________________
I seem to have fixed the issue, but I still don't know why it got fixed. Since the strings were getting printed without any issue, I removed the last character from each line, which should be the newline character. And everything is working as expected now. I'm not sure if this is a proper solution or why this is even happening. Anyway, the code below seems to resolve the issue.
for line in fin:
    ws['C'+str(i)] = line[:-1]
    i += 1
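The trailing-newline explanation can be verified directly: every line read from a text file keeps its '\n', and slicing it off (with [:-1] as above, or rstrip('\n'), which is also safe when the last line has no newline) is what changed. A small self-contained sketch:

```python
import io

# io.StringIO stands in for the open text file.
fin = io.StringIO("NAME:Abe\nAGE:40\n")
lines = [line for line in fin]
print(repr(lines[0]))                           # the newline is part of the string
cleaned = [line.rstrip('\n') for line in lines]
print(cleaned)
```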
It's possible, and advisable, to avoid using a manual counter in Python. The following code is more expressive and maintainable.
from openpyxl import Workbook

fin = open('in.txt', 'r')
wb = Workbook()
ws = wb.active
ws.append([None, None, "NAME", "AGE"])

Flag = False
for line in fin.readlines():
    if line.startswith("start"):
        Flag = True
        row = [None, None, None, None]
    elif line.startswith("end"):
        break
    elif Flag:
        if line.startswith('NAME:'):
            row[2] = line[5:].strip()   # strip the trailing newline
        elif line.startswith('AGE:'):
            row[3] = int(line[4:])
            ws.append(row)

wb.save(filename = 'Sample.xlsx')
fin.close()
I was hoping someone might be able to point me in the right direction, or give an example of how I can get the following script's output into an Excel spreadsheet using xlwt. My script prints the following text on screen as required, but I was hoping to put that output into an Excel file, in two columns of time and value. Here's the printed output:
07:16:33.354 1
07:16:33.359 1
07:16:33.364 1
07:16:33.368 1
My script so far is below.
import re

f = open("C:\Results\16.txt", "r")
searchlines = f.readlines()
searchstrings = ['Indicator']
timestampline = None
timestamp = None
f.close()

a = 0
tot = 0

while a < len(searchstrings):
    for i, line in enumerate(searchlines):
        for word in searchstrings:
            if word in line:
                timestampline = searchlines[i-33]
                for l in searchlines[i:i+1]: #print timestampline,l,
                    #print
                    for i in line:
                        str = timestampline
                        match = re.search(r'\d{2}:\d{2}:\d{2}.\d{3}', str)
                        if match:
                            value = line.split()
                            print '\t', match.group(), '\t', value[5],
                            print
                print
                tot = tot + 1
                break
    print 'total count for', '"', searchstrings[a], '"', 'is', tot
    tot = 0
    a = a + 1
I have had a few goes using xlwt or the CSV writer, but each time I hit a wall and revert back to my script above and try again. I am hoping to print match.group() and value[5] into two different columns on an Excel worksheet.
Thanks for your time...
MikG
What kind of problems do you have with xlwt? Personally, I find it very easy to use, remembering the basic workflow:

import xlwt

Create your spreadsheet using e.g.

my_xls = xlwt.Workbook(encoding=your_char_encoding)

which returns a spreadsheet handle to use for adding sheets and saving the whole file.

Add a sheet to the created spreadsheet with e.g.

my_sheet = my_xls.add_sheet("sheet name")

Now, having the sheet object, you can write on its cells using sheet_name.write(row, column, value):

my_sheet.write(0, 0, "First column title")
my_sheet.write(0, 1, "Second column title")

Save the whole thing using spreadsheet.save('file_name.xls'):

my_xls.save("results.xls")
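Assembled into one runnable script, those steps look like this (a minimal sketch; the sheet name, file name, and sample values are arbitrary):

```python
import xlwt

my_xls = xlwt.Workbook(encoding="utf-8")     # create the spreadsheet
my_sheet = my_xls.add_sheet("Results")       # add a sheet
my_sheet.write(0, 0, "First column title")   # write(row, column, value)
my_sheet.write(0, 1, "Second column title")
my_sheet.write(1, 0, "07:16:33.354")         # sample time from the question
my_sheet.write(1, 1, "1")                    # sample value
my_xls.save("results.xls")                   # save the whole file
```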
That's the simplest of working examples; your code should of course use sheet.write(row, column, value) within the loop that prints the data, e.g.:
import re
import xlwt

f = open("C:\Results\VAMOS_RxQual_Build_Update_Fri_04-11.08-16.txt", "r")
searchlines = f.readlines()
searchstrings = ['TSC Set 2 Indicator']
timestampline = None
timestamp = None
f.close()

a = 0
tot = 0

my_xls = xlwt.Workbook(encoding="utf-8")     # begin your whole mighty xls thing
my_sheet = my_xls.add_sheet("Results")       # add a sheet to it
row_num = 0                                  # let it be the current row number
my_sheet.write(row_num, 0, "match.group()")  # here go column headers,
my_sheet.write(row_num, 1, "value[5]")       # change them to your needs
row_num += 1                                 # let's move to the next row

while a < len(searchstrings):
    for i, line in enumerate(searchlines):
        for word in searchstrings:
            if word in line:
                timestampline = searchlines[i-33]
                for l in searchlines[i:i+1]: #print timestampline,l,
                    #print
                    for i in line:
                        str = timestampline
                        match = re.search(r'\d{2}:\d{2}:\d{2}.\d{3}', str)
                        if match:
                            value = line.split()
                            print '\t', match.group(), '\t', value[5],
                            # here goes the cell writing:
                            my_sheet.write(row_num, 0, match.group())
                            my_sheet.write(row_num, 1, value[5])
                            row_num += 1
                            # and that's it...
                            print
                print
                tot = tot + 1
                break
    print 'total count for', '"', searchstrings[a], '"', 'is', tot
    tot = 0
    a = a + 1

# don't forget to save your file!
my_xls.save("results.xls")
A catch:
- writing native date/time data to .xls was a nightmare for me, as Excel doesn't store date/time data internally as such (or I couldn't figure it out),
- be careful about the data types you're writing into cells. For simple reporting, at the beginning, it's enough to pass everything as a string,
- later you should find the xlwt documentation quite useful.
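On the date/time catch specifically: Excel stores dates as serial numbers, so the cell needs a number format attached or it displays a bare number. A hedged sketch using xlwt's easyxf (the format string and file name are arbitrary choices):

```python
import datetime
import xlwt

date_style = xlwt.easyxf(num_format_str='YYYY-MM-DD HH:MM:SS')

wb = xlwt.Workbook(encoding="utf-8")
sheet = wb.add_sheet("dates")
# Without date_style, the cell would display Excel's raw serial number.
sheet.write(0, 0, datetime.datetime(2011, 4, 8, 7, 16, 33), date_style)
wb.save("dates.xls")
```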
Happy XLWTing!
So, working on a program in Python 3.3.2. New to it all, but I've been getting through it. I have an app I made that takes 5 inputs. 3 of those inputs are comboboxes, two are entry widgets. I then created a button event that saves those 5 inputs into a text file and a csv file. Opening each file, everything looks proper. For example, saved info would look like this:
Brad M.,Mike K.,Danny,Iconnoshper,Strong Wolf Lodge
I then followed a csv demo and copied this...
import csv

ifile = open('myTestfile.csv', "r")
reader = csv.reader(ifile)

rownum = 0
for row in reader:
    # Save header row.
    if rownum == 0:
        header = row
    else:
        colnum = 0
        for col in row:
            print('%-15s: %s' % (header[colnum], col))
            colnum += 1
    rownum += 1

ifile.close()
and that ends up printing beautifully as:
rTech: Brad M.
pTech: Mike K.
cTech: Danny
proNam: ohhh
jobNam: Yeah
rTech: Damien
pTech: Aaron
so on and so on. What I'm trying to figure out is if I've named my headers via
if rownum == 0:
header = row
is there a way to pull a specific row / col combo and print what is held there??
I have figured out that, after the program has run, I could do

print(col)

or

print(col[0:10])

and I am able to print the last col printed, or the letters from the last printed col. But I can't go any farther back than that last printed col.
My ultimate goal is to be able to assign variables so I could, in turn, have a label in another program get its information from the csv file.
rTech for job is???
look in Jobs csv at row 1, column 1, and return value for rTech
do I need to create a dictionary that is loaded with the information then call the dictionary?? Thanks for any guidance
Thanks for the direction. So I've been trying a few different things, and one that I'm really liking is the following...
import csv

labels = ['rTech', 'pTech', 'cTech', 'productionName', 'jobName']
fn = 'my file.csv'
cameraTech = 'Danny'

f = open(fn, 'r')
reader = csv.DictReader(f, labels)
jobInformation = [(item["productionName"],
                   item["jobName"],
                   item["pTech"],
                   item["rTech"]) for item in reader
                  if item['cTech'] == cameraTech]
f.close()

print("Camera Tech: %s\n" % (cameraTech))
print("\n".join(["Production Name: %s \nJob Name: %s \nPrep Tech: %s \nRental Agent: %s\n" % (item) for item in jobInformation]))
That shows me that I could create a variable through cameraTech and as long as that matched what was loaded into the reader that holds the csv file and that if cTech column had a match for cameraTech then it would fill in the proper information. 95% there WOOOOOO..
So now what I'm curious about is calling each item. The plan is in a window I have a listbox that is populated with items from a .txt file with "productionName" and "jobName". When I click on one of those items in the listbox a new window opens up and the matching information from the .csv file is then filled into the appropriate labels.
Thoughts??? Thanks again :)
I think that reading the CSV file into a dictionary might be a working solution for your problem.
The Python CSV package has built-in support for reading CSV files into a Python dictionary using DictReader, have a look at the documentation here: http://docs.python.org/2/library/csv.html#csv.DictReader
Here is an (untested) example using DictReader that reads the CSV file and prints the contents of the first row. Note that DictReader is an iterator, so the first row is fetched with next() rather than by indexing:

import csv

csv_data = csv.DictReader(open("myTestfile.csv"))
print(next(csv_data))
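DictReader only supports a single forward pass; if you want to pull an arbitrary row/column combo, one option is to materialize it into a list first. A sketch using an in-memory stand-in for myTestfile.csv with the headers from the question:

```python
import csv
import io

# In-memory stand-in for myTestfile.csv.
csv_text = (
    "rTech,pTech,cTech,proNam,jobNam\n"
    "Brad M.,Mike K.,Danny,Iconnoshper,Strong Wolf Lodge\n"
    "Damien,Aaron,Danny,ohhh,Yeah\n"
)

rows = list(csv.DictReader(io.StringIO(csv_text)))
print(rows[0]["rTech"])      # row 0, column "rTech"
print(rows[1]["jobNam"])     # any row/column combo by index and header name
```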
Okay so I was able to put this together after seeing the following (https://gist.github.com/zstumgoren/911615)
That showed me how to give each header a variable I could call. From there I could then create a function that would allow for certain variables to be called and compared and if that matched I would be able to see certain data needed. So the example I made to show myself it could be done is as follows:
import csv

source_file = open('jobList.csv', 'r')
for line in csv.DictReader(source_file, delimiter=','):
    pTech = line['pTech']
    cTech = line['cTech']
    rAgent = line['rTech']
    prodName = line['productionName']
    jobName = line['jobName']
    if prodName == 'another':
        print(pTech, cTech, rAgent, jobName)
However, I just noticed something: while my .csv file has one line, this works great! But with my proper .csv file, I am only able to print information from the last line read. Grrrrr.... Getting closer though.... I'm still searching, but if someone understands my issue, I would love some light.
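The "only the last line" symptom is because each loop iteration overwrites pTech, cTech, etc., so after the loop only the final row's values remain. Collecting the matches into a list as you go keeps every row. A sketch, with an in-memory stand-in for jobList.csv:

```python
import csv
import io

# In-memory stand-in for jobList.csv.
csv_text = (
    "rTech,pTech,cTech,productionName,jobName\n"
    "Brad M.,Mike K.,Danny,another,Strong Wolf Lodge\n"
    "Damien,Aaron,Danny,another,Yeah\n"
)

matches = []
for line in csv.DictReader(io.StringIO(csv_text)):
    if line['productionName'] == 'another':
        # Append instead of overwriting loop variables.
        matches.append((line['pTech'], line['cTech'], line['rTech'], line['jobName']))

for match in matches:
    print(match)
```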
I have a txt file which has some 'Excel formulas' in it, and I have converted it to a csv file using the Python csv reader/writer. Now I want to read the values of the csv file and do some calculations, but when I try to access a particular column of the .csv file, it still returns the 'Excel formula' instead of the actual value, even though when I open the csv file in Excel the formulas are shown as values.
Any ideas?
Here is the code
Code to convert txt to csv
import csv

def parseFile(filepath):
    file = open(filepath, 'r')
    content = file.read()
    file.close()
    lines = content.split('\n')
    csv_filepath = filepath[:(len(filepath)-4)] + '_Results.csv'
    csv_out = csv.writer(open(csv_filepath, 'a'), delimiter=',', lineterminator='\n')
    for line in lines:
        data = line.split('\t')
        csv_out.writerow(data)
    return csv_filepath
Code to do some calculation in csv file
def csv_cal(csv_filepath):
    r = csv.reader(open(csv_filepath))
    lines = [l for l in r]
    counter = [0]*(len(lines[4])+6)
    if lines[4][4] == 'Last Test Pass?':
        print ' i am here'
        for i in range(0, 3):
            print lines[6][4]  ### RETURNS FORMULA ??
    return 0
I am new to python, any help would be appreciated!
Thanks,
You can Paste Special in Excel with the Values only option selected: select all, paste into another sheet, and save. This would save you from having to implement some kind of parser in Python. Or, you could evaluate some simple arithmetic with eval.
edit:
I've heard of xlrd, which can be downloaded from PyPI. It loads .xls files.
It sounded like you just wanted the final data, which Paste Special can provide.
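A sketch of the xlrd route, assuming the data has been saved as a .xls workbook; for formula cells, xlrd returns the result Excel cached at save time, not the formula text. (Here a tiny file is first written with xlwt so the snippet is self-contained; a real file would come from Excel.)

```python
import xlrd
import xlwt

# Build a tiny .xls so the read-back below has something to open.
wb = xlwt.Workbook()
out_sheet = wb.add_sheet("Sheet1")
out_sheet.write(0, 0, "Last Test Pass?")
out_sheet.write(1, 0, 42.0)
wb.save("demo_results.xls")

book = xlrd.open_workbook("demo_results.xls")
sheet = book.sheet_by_index(0)
# cell_value returns the stored value (for formula cells, the cached result).
print(sheet.cell_value(1, 0))
```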
I have a few .xy files (2 columns with x and y values). I have been trying to read all of them and paste the "y" values into a single Excel file (the "x" values are the same in all these files). The code I have so far reads the files one by one, but it's extremely slow (about 20 seconds per file). I have quite a few .xy files, so the time adds up considerably. The code I have so far is:
import os, fnmatch, linecache, csv
from openpyxl import Workbook

wb = Workbook()
ws = wb.worksheets[0]
ws.title = "Sheet1"

def batch_processing(file_name):
    row_count = sum(1 for row in csv.reader(open(file_name)))
    try:
        for row in xrange(1, row_count):
            data = linecache.getline(file_name, row)
            print data.strip().split()[1]
            print data
            ws.cell("A"+str(row)).value = float(data.strip().split()[0])
            ws.cell("B"+str(row)).value = float(data.strip().split()[1])
            print file_name
            wb.save(filename = os.path.splitext(file_name)[0]+".xlsx")
    except IndexError:
        pass

workingdir = "C:\Users\Mine\Desktop\P22_PC"
os.chdir(workingdir)
for root, dirnames, filenames in os.walk(workingdir):
    for file_name in fnmatch.filter(filenames, "*_Cs.xy"):
        batch_processing(file_name)
Any help is appreciated. Thanks.
I think your main issue is that you're writing to Excel and saving on every single line in the file, for every single file in the directory. I'm not sure of how long it takes to actually write the value to Excel, but just moving the save out of the loop and saving only once everything has been added should cut a little time. Also, how large are these files? If they are massive, then linecache may be a good idea, but assuming they aren't overly large then you can probably do without it.
def batch_processing(file_name):
    # Using 'with' is a better way to open files - it ensures they are
    # properly closed, etc. when you leave the code block
    with open(file_name, 'rb') as f:
        reader = csv.reader(f)
        # row_count = sum(1 for row in csv.reader(open(file_name)))
        # ^^^ You actually don't need to do this at all (though it is clever :)
        # You are using it now to govern the loop, but the more Pythonic way is
        # to do it as follows
        for line_no, line in enumerate(reader):
            # Unpack the line into two variables that will hold val1 and val2
            val1, val2 = line
            print val1, val2  # You can also remove this - printing takes time too
            ws.cell("A"+str(line_no+1)).value = float(val1)
            ws.cell("B"+str(line_no+1)).value = float(val2)
    # Doing this here will save the file after you process an entire file.
    # You could save a bit more time and move this to after your walk statement -
    # that way, you are only saving once after everything has completed
    wb.save(filename = os.path.splitext(file_name)[0]+".xlsx")
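Putting that together: since .xy files appear to be whitespace-separated (the original code used data.strip().split()), splitting each line directly avoids csv's comma assumption, and saving once at the end avoids the per-line save. A sketch under those assumptions, with a tiny generated input file so it runs standalone:

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.title = "Sheet1"

def batch_processing(file_name):
    with open(file_name) as f:
        for line_no, line in enumerate(f, start=1):
            parts = line.split()          # .xy files look whitespace-separated
            if len(parts) < 2:
                continue                  # skip blank or short lines
            ws.cell(row=line_no, column=1).value = float(parts[0])
            ws.cell(row=line_no, column=2).value = float(parts[1])

# Hypothetical input file, generated here so the sketch is self-contained.
with open("sample_Cs.xy", "w") as f:
    f.write("1.0 10.5\n2.0 11.5\n")

batch_processing("sample_Cs.xy")
wb.save("combined.xlsx")                  # save once, after all processing
```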