I was hoping someone could point me in the right direction, or give an example of how I can get the following script output into an Excel spreadsheet using xlwt. My script prints the text below on screen as required, but I would like to write that output to an Excel file in two columns: time and value. Here's the printed output:
07:16:33.354 1
07:16:33.359 1
07:16:33.364 1
07:16:33.368 1
My script so far is below.
import re

f = open(r"C:\Results\16.txt", "r")  # raw string: "\1" in a plain string is an escape
searchlines = f.readlines()
searchstrings = ['Indicator']
timestampline = None
timestamp = None
f.close()
a = 0
tot = 0

while a < len(searchstrings):
    for i, line in enumerate(searchlines):
        for word in searchstrings:
            if word in line:
                timestampline = searchlines[i-33]
                for l in searchlines[i:i+1]:  # print timestampline, l,
                    # print
                    for i in line:
                        str = timestampline
                        match = re.search(r'\d{2}:\d{2}:\d{2}\.\d{3}', str)
                        if match:
                            value = line.split()
                            print '\t', match.group(), '\t', value[5],
                            print
                        print
                        tot = tot + 1
                        break
    print 'total count for', '"', searchstrings[a], '"', 'is', tot
    tot = 0
    a = a + 1
I have had a few goes using xlwt or the CSV writer, but each time I hit a wall and revert back to my script above and try again. I am hoping to print match.group() and value[5] into two different columns of an Excel worksheet.
Thanks for your time...
MikG
What kind of problems do you have with xlwt? Personally, I find it very easy to use once you remember the basic workflow:
import xlwt
Create your spreadsheet with e.g.
my_xls = xlwt.Workbook(encoding=your_char_encoding)
which returns a spreadsheet handle you then use for adding sheets and saving the whole file.
Add a sheet to the created spreadsheet with e.g.
my_sheet = my_xls.add_sheet("sheet name")
Now, with the sheet object, you can write to its cells using my_sheet.write(row, column, value):
my_sheet.write(0, 0, "First column title")
my_sheet.write(0, 1, "Second column title")
Save the whole thing using my_xls.save('file_name.xls'):
my_xls.save("results.xls")
That's the simplest of working examples; your code should of course call sheet.write(row, column, value) inside the loop that prints the data, e.g.:
import re
import xlwt

f = open(r"C:\Results\VAMOS_RxQual_Build_Update_Fri_04-11.08-16.txt", "r")
searchlines = f.readlines()
searchstrings = ['TSC Set 2 Indicator']
timestampline = None
timestamp = None
f.close()
a = 0
tot = 0

my_xls = xlwt.Workbook(encoding="utf-8")     # begin your whole mighty xls thing
my_sheet = my_xls.add_sheet("Results")       # add a sheet to it
row_num = 0                                  # let it be the current row number
my_sheet.write(row_num, 0, "match.group()")  # here go column headers,
my_sheet.write(row_num, 1, "value[5]")       # change them to your needs
row_num += 1                                 # move to the next row

while a < len(searchstrings):
    for i, line in enumerate(searchlines):
        for word in searchstrings:
            if word in line:
                timestampline = searchlines[i-33]
                for l in searchlines[i:i+1]:  # print timestampline, l,
                    # print
                    for i in line:
                        str = timestampline
                        match = re.search(r'\d{2}:\d{2}:\d{2}\.\d{3}', str)
                        if match:
                            value = line.split()
                            print '\t', match.group(), '\t', value[5],
                            # here goes the cell writing:
                            my_sheet.write(row_num, 0, match.group())
                            my_sheet.write(row_num, 1, value[5])
                            row_num += 1
                            # and that's it...
                            print
                        print
                        tot = tot + 1
                        break
    print 'total count for', '"', searchstrings[a], '"', 'is', tot
    tot = 0
    a = a + 1

# don't forget to save your file!
my_xls.save("results.xls")
A catch: writing native date/time data to xls was a nightmare for me, as I couldn't figure out how Excel stores date/time data internally. So be careful about the data types you write into cells. For simple reporting it's enough at the beginning to pass everything as a string; later you should find the xlwt documentation quite useful.
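One way around the date/time headache, sketched below on the assumption that the timestamps look like the asker's "07:16:33.354" strings, is to parse them into real datetime.time objects before writing; xlwt can then store them with a number format instead of plain text (the xlwt part is only sketched in a comment):

```python
from datetime import datetime

def parse_timestamp(text):
    # "07:16:33.354" -> datetime.time(7, 16, 33, 354000)
    # %f accepts 1-6 fractional digits and scales to microseconds
    return datetime.strptime(text, "%H:%M:%S.%f").time()

t = parse_timestamp("07:16:33.354")
# With xlwt you would then write the value with a time format, e.g.:
#   style = xlwt.easyxf(num_format_str="HH:MM:SS.000")
#   my_sheet.write(row_num, 0, t, style)
```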
Happy XLWTing!
Related
Newer to Python, brand new to openpyxl. I have a large text file from which I used regexes to extract the data I need; now I need to write that data to an Excel file with openpyxl. The text file is a network address translation (NAT) table which looks like:
src_adtr:10.20.30.40 dst_adtr:185.50.40.50 src_adtr_translated:70.60.50.40 dst_adtr_translated:99.44.55.66
These four elements would be one row of the excel file (I have about 500 rows that need to be written).
When I run the code, I do not get any errors, but it also does nothing - how can I correct this?
import re
import openpyxl

with open("NAT-table.txt", "r") as f:
    text = f.read()

source = re.findall(r':src_adtr\s+.*\s+.*\s+:Name\s+[\(](.*)', text)
dest = re.findall(r':dst_adtr\s+.*\s+.*\s+:Name\s+[\(](.*)', text)
source_adtr = re.findall(r':src_adtr_translated.*\s+.*\s+.*\s+.*\s+.*\s+.*\s+:Name\s+[\(](.*)', text)
dest_adtr = re.findall(r':dst_adtr_translated\s+.*\s+.*\s+.*\s*\s+.*\s+.*\s.*\s+:Name\s+[\(](.*)', text)
firewall = re.findall(r'\w+LINT\d+(?!_)', text)  # '\w' includes the '_' (underscore) char

natRules = list(zip(source, dest, source_adtr, dest_adtr, firewall))

wb = openpyxl.load_workbook('NAT-table.xlsx')
sheet = wb.active
for i in range(2, sheet.max_row):
    for k in range(1, 5):
        for t in natRules:
            sheet.cell(row=i, column=k).value = natRules[i][k]
            # sheet.cell(row=rowNum, column=1).value = i
wb.save('NAT-table.xlsx')
Your sample data does not match your regular expressions, so natRules is empty and there is nothing to write. I think. I say this because it is difficult to tell from your question.
Since there is no working sample data, it is hard to say exactly where you went wrong. I will point out that this part:
for t in natRules:
    sheet.cell(row=i, column=k).value = natRules[i][k]
iterates on natRules, but makes no use of t, and in fact sets the same cell multiple times.
Test Code:
Here is a chunk of test code which uses your basic loop and succeeds in writing an xlsx file. I suggest you slowly modify this to look like your code, but with your data, and see where it stops working.
import openpyxl

data = [list('abcd'), list('efgh')]

wb = openpyxl.load_workbook('test.xlsx')
sheet = wb.active
for i, line in enumerate(data):
    for k, val in enumerate(line):
        sheet.cell(row=i+2, column=k+1).value = val
wb.save('test.xlsx')
I'm reading data from in.txt and writing specific lines from it to Sample.xlsx. I'm grepping data between the lines containing start and end, and I set Flag while I'm parsing this section of the input. When Flag is set, whenever I encounter NAME: and AGE: in lines, they need to be written to columns C and D respectively. (Extra info: the input file has the following pattern: the first line contains NAME, the next line contains AGE, followed by an empty line, and this pattern repeats.)
start is here
NAME:Abe
AGE:40
NAME:John
AGE:20
...
end
Input is similar to above. Now the problem is that I've around 1000 such lines, so roughly 333 NAMES, AGE. When I open excel sheet after running the code, I see that C2 has NAME:Abe repeated 21 times. D2 has AGE:40 repeated 21 times too. I reduced input to 100 lines, and the repetition reduced to 3. I can't seem to figure out why this is happening. When I change to 10 lines, ie just 3 name and age, this problem doesn't happen. C2 just has one name, C3 also one name.
from openpyxl import Workbook, load_workbook

fin = open('in.txt')
fout1 = open('name.txt', 'w')
fout2 = open('age.txt', 'w')

wb = Workbook()
ws = wb.active

i = 2
Flag = False
for lines in fin:
    if 'start' in lines:
        Flag = True
        continue
    if Flag and 'end' in lines:
        break
    if Flag:
        if 'NAME:' in lines:
            fout1.write(lines)
            ws['C' + str(i)] = lines
        elif 'AGE:' in lines:
            fout2.write(lines)
            ws['D' + str(i)] = lines
            i += 1
wb.save(filename='Sample.xlsx')
Apologies for the long write-up, but please let me know what I'm doing wrong here.
Thanks for reading.
______________________________________ Edit-1 ________________________________
I just tried basic writing from text file to excel cells using the following minimal code.
for line in fin:
    ws['C' + str(i)] = line
    i += 1
This also produces the same error: the line gets written multiple times inside a cell, and the number of repetitions increases with the number of lines in the input text file.
__________________________________ Edit-2__________________________________
I seem to have fixed the issue, but still don't know why it got fixed. Since plain strings were getting printed without any issue, I removed the last character from each line, which should be the newline character, and everything is working as expected now. I'm not sure if this is a proper solution or why this is even happening. Anyway, the code below seems to resolve the issue.
for line in fin:
    ws['C' + str(i)] = line[:-1]
    i += 1
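For what it's worth, the slice helps because each line read from a file keeps its trailing newline, and the worksheet stores it verbatim; rstrip('\n') is the slightly safer spelling, since the last line of a file may not end with a newline. A small illustration with made-up lines:

```python
lines = ["NAME:Abe\n", "AGE:40\n", "NAME:John"]  # last line has no newline

# line[:-1] chops the newline, but also eats the last character
# of a line that doesn't end with one:
assert lines[0][:-1] == "NAME:Abe"
assert lines[2][:-1] == "NAME:Joh"   # oops

# rstrip('\n') removes a trailing newline only if it is there:
assert lines[0].rstrip("\n") == "NAME:Abe"
assert lines[2].rstrip("\n") == "NAME:John"
```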
It's possible, and advisable, to avoid a manual counter in Python. The following code is more expressive and maintainable.
from openpyxl import Workbook

fin = open('in.txt', 'r')
wb = Workbook()
ws = wb.active

ws.append([None, None, "NAME", "AGE"])
Flag = False
for line in fin.readlines():
    if line.startswith("start"):
        Flag = True
        row = [None, None, None, None]
    elif line.startswith("end"):
        break
    elif Flag:
        if line.startswith('NAME:'):
            row[2] = line[5:].strip()  # strip the trailing newline
        elif line.startswith('AGE:'):
            row[3] = int(line[4:])
            ws.append(row)

wb.save(filename='Sample.xlsx')
fin.close()
I have a string of the form
var beforeDate = new Date('2015-08-21')
Here I do not know the value between the parentheses (). I want to replace this date with any other date. How can I do it in Python?
I thought of opening the file and then using the language's standard replace function, but since the value between () is not known, that won't be possible.
There is a lot of code before and after this snippet, so replacing the whole line with a new one would not work: it would overwrite the code that surrounds the snippet.
How about using a regex? Example:
temp.txt
print "I've got a lovely bunch of coconuts"
var beforeDate = new Date('2015-08-21') #date determined by fair die roll
print "Here they are, standing in a row"
main.py
import re

new_value = "'1999-12-31'"

with open("temp.txt") as infile:
    data = infile.read()

data = re.sub(r"(var beforeDate = new Date\().*?(\))", "\\1" + new_value + "\\2", data)

with open("output.txt", "w") as outfile:
    outfile.write(data)
output.txt after running:
print "I've got a lovely bunch of coconuts"
var beforeDate = new Date('1999-12-31') #date determined by fair die roll
print "Here they are, standing in a row"
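If it helps, the substitution can be sanity-checked on a single line before touching any files (same pattern and replacement as above):

```python
import re

new_value = "'1999-12-31'"
line = "var beforeDate = new Date('2015-08-21') #date determined by fair die roll"

# lazy .*? stops at the first closing parenthesis; the two groups
# are put back around the new value via \1 and \2
result = re.sub(r"(var beforeDate = new Date\().*?(\))",
                r"\1" + new_value + r"\2", line)
```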
Personally, I usually find re.split() simpler to use than re.sub(). This reuses Kevin's code and captures everything it does (plus the middle group), then replaces the middle group:
import re

new_value = "'1999-12-31'"

with open("temp.txt") as infile:
    data = infile.read()

data = re.split(r"(var beforeDate = new Date\()(.*?)(\))", data)
# data[0] is everything before the first capture
# data[1] is the first capture
# data[2] is the second capture -- the one we want to replace
data[2] = new_value

with open("output.txt", "w") as outfile:
    outfile.write(''.join(data))
You could blow off capturing the middle group, but then you're inserting text into the list; it's easier just to do a replace.
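A quick illustration of why the pieces land at those indexes: when the pattern passed to re.split() contains capturing groups, the captured text is kept in the result list between the unmatched chunks (the sample input here is made up):

```python
import re

text = "x = 1\nvar beforeDate = new Date('2015-08-21')\ny = 2\n"
parts = re.split(r"(var beforeDate = new Date\()(.*?)(\))", text)
# parts[0]: text before the match, parts[1..3]: the three captures,
# parts[4]: text after the match
parts[2] = "'1999-12-31'"
rebuilt = "".join(parts)
```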
OTOH, this particular problem might be small enough not to require the re hammer. Here is the same code without re:
new_value = "'1999-12-31'"

with open("temp.txt") as infile:
    data = infile.read()

data = list(data.partition('var beforeDate = new Date('))
data += data.pop().partition(')')
data[2] = new_value

with open("output.txt", "w") as outfile:
    for stuff in data:
        outfile.write(stuff)
I'm new to Python, but I would like to do some data analysis on some csv files. I'd like to print only the lines from a csv file that include certain keywords. I use the first block to print all valid lines; from these lines I would like to print the ones including keywords. Thanks for your help.
csv.field_size_limit(sys.maxsize)
invalids = 0
valids = 0

for f in ['1.csv']:
    reader = csv.reader(open(f, 'rU'), delimiter='|', quotechar='\\')
    for row in reader:
        try:
            print row[2]
            valids += 1
        except:
            invalids += 1

print 'parsed %s records. ignored %s' % (valids, invalids)
With keywords:
for w in ['ford', 'hyundai','honda', 'jeep', 'maserati','audi','jaguar', 'volkswagen','chevrolet','chrysler']:
I guess I need to filter my top code with an if statement, but I've been struggling with this for hours and can't seem to get it to work.
Your guess is correct. All you need to do is filter the lines with an if statement, checking whether each field matches a keyword. Here is how you do it (I've also made some improvements to your code and explained them in the comments):
# First, create a set of the keywords. Sets are faster than a list for
# checking if they contain an element. The curly brackets create a set.
keywords = {'ford', 'hyundai', 'honda', 'jeep', 'maserati', 'audi', 'jaguar',
            'volkswagen', 'chevrolet', 'chrysler'}

csv.field_size_limit(sys.maxsize)
invalids = 0
valids = 0

for filename in ['1.csv']:
    # The with statement in Python makes sure that your file is properly
    # closed (automatically) when an error occurs. This is a common idiom.
    # In addition, CSV files should be opened only in 'rb' mode.
    with open(filename, 'rb') as f:
        reader = csv.reader(f, delimiter='|', quotechar='\\')
        for row in reader:
            try:
                print row[2]
                valids += 1
            # Don't use bare except clauses. They will catch
            # exceptions you don't want or intend to catch.
            except IndexError:
                invalids += 1

            # The filtering is done here.
            for field in row:
                if field in keywords:
                    print row
                    break

# Prefer the str.format() method over the old style string formatting.
print 'parsed {0} records. ignored {1}'.format(valids, invalids)
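The membership test itself can be tried in isolation. Here is a tiny sketch with made-up rows (the field values are invented for illustration), using any() as a compact variant of the inner loop above:

```python
keywords = {'ford', 'hyundai', 'honda', 'jeep'}

# hypothetical rows, shaped like what csv.reader would yield
rows = [
    ['1', 'blue', 'ford'],
    ['2', 'red', 'toyota'],
    ['3', 'green', 'jeep'],
]

# keep a row when any of its fields is exactly one of the keywords
matching = [row for row in rows if any(field in keywords for field in row)]
```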
I have been working on a Python script to parse a single delimited column in a csv file. However, the column has multiple different delimiters and I can't figure out how to do this.
I have another script that works on similar data, but can't get this one to work. The data below is in a single column on the row. I want to have the script parse these out and add tabs in between each. Then I want to append this data into a list with only the unique items. Typically I am dealing with several hundred rows of this data and would like to parse the entire file and then return only the unique items in two columns (one for IP and other for URL).
Data to parse: 123.123.123.123::url.com,url2.com,234.234.234.234::url3.com (note ":" and "," are used as delimiters on the same line)
Script I am working with:
import sys
import csv

csv_file = csv.DictReader(open(sys.argv[1], 'rb'), delimiter=':')

uniq_rows = []
for column in csv_file:
    X = column[' IP'].split(':')[-1]
    row = X + '\t'
    if row not in uniq_rows:
        uniq_rows.append(row)

for row in uniq_rows:
    print row
Does anyone know how to accomplish what I am trying to do?
Change the list (uniq_rows = []) to a set (uniq_rows = set()):
csv_file = csv.DictReader(open(sys.argv[1], 'rU'), delimiter=':')

uniq_rows = set()
for column in csv_file:
    X = column[' IP'].split(':')[-1]
    row = X + '\t'
    uniq_rows.add(row)

for row in list(uniq_rows):
    print row
If you need further help, leave a comment
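Not the asker's exact column layout, but as a hedged sketch of the overall idea: re.split() can handle both delimiters in one pass, and a regex can then separate the IPs from the URLs (the IP pattern below is a rough shape check, not a full validator):

```python
import re

line = "123.123.123.123::url.com,url2.com,234.234.234.234::url3.com"

# split on '::' or ',' in one pass; drop empty strings
tokens = [t for t in re.split(r"::|,", line) if t]

ips = set()
urls = set()
ip_pattern = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")
for token in tokens:
    if ip_pattern.match(token):
        ips.add(token)
    else:
        urls.add(token)
```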
You can also just use replace to change your input lines (not overly pythonic, I guess, but a standard builtin):
>>> a = "123.123.123.123::url.com,url2.com,234.234.234.234::url3.com"
>>> a = a.replace(',','\t')
>>> a = a.replace(':','\t')
>>> print (a)
123.123.123.123 url.com url2.com 234.234.234.234 url3.com
>>>
As mentioned in the comment, here is a simple text manipulation to get you (hopefully) the right output prior to removing duplicates:
import sys

read_raw_file = open('D:filename.csv')  # open current file
read_raw_text = read_raw_file.read()
read_raw_file.close()

new_text = read_raw_text.strip()
new_text = new_text.replace(',', '\t')
# new_text = new_text.replace('::', '\t')  # optional, if you want a double : to produce only one column
new_text = new_text.replace(':', '\t')

text_list = new_text.split('\n')
unique_items = []
for row in text_list:
    if row not in unique_items:
        unique_items.append(row)

new_file = 'D:newfile.csv'
with open(new_file, 'w') as write_output_file:  # generate new file
    for item in unique_items:
        write_output_file.write(item + '\n')
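One note on the duplicate check above: row not in unique_items scans the whole list each time. For larger files, a set alongside the list keeps the original order while making the membership test cheap (a generic sketch, not tied to the asker's file):

```python
def unique_in_order(rows):
    # order-preserving deduplication: the set gives O(1) lookups,
    # the list preserves first-seen order
    seen = set()
    out = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out
```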