I'm reading data from in.txt and writing specific lines from it to Sample.xlsx. I'm extracting the data between the lines containing start and end, and I set Flag while parsing that section of the input. While Flag is set, whenever I encounter NAME: or AGE: in a line, it should be written to column C or D respectively. (Extra info: the input file has the following pattern: one line contains NAME, the next contains AGE, followed by an empty line, and this pattern repeats.)
start is here
NAME:Abe
AGE:40
NAME:John
AGE:20
...
end
The input is similar to the above. Now the problem: I have around 1000 such lines, so roughly 333 NAME/AGE pairs. When I open the Excel sheet after running the code, I see that C2 has NAME:Abe repeated 21 times, and D2 has AGE:40 repeated 21 times too. When I reduced the input to 100 lines, the repetition dropped to 3. I can't figure out why this is happening. With 10 lines, i.e. just 3 name/age pairs, the problem doesn't happen: C2 has just one name, C3 also one name.
from openpyxl import Workbook, load_workbook

fin = open('in.txt')
fout1 = open('name.txt', 'w')
fout2 = open('age.txt', 'w')
wb = Workbook()
ws = wb.active
i = 2
Flag = False
for lines in fin:
    if 'start' in lines:
        Flag = True
        continue
    if Flag and 'end' in lines:
        break
    if Flag:
        if 'NAME:' in lines:
            fout1.write(lines)
            ws['C' + str(i)] = lines
        elif 'AGE:' in lines:
            fout2.write(lines)
            ws['D' + str(i)] = lines
            i += 1
wb.save(filename='Sample.xlsx')
Apologies for the long write-up, but please let me know what I'm doing wrong here. Thanks for reading.
______________________________________ Edit-1 ________________________________
I just tried basic writing from text file to excel cells using the following minimal code.
for line in fin:
    ws['C' + str(i)] = line
    i += 1
This also reproduces the same error: the line gets written multiple times inside a cell, and the number of repetitions increases with the number of lines in the input text file.
__________________________________ Edit-2__________________________________
I seem to have fixed the issue, but I still don't know why it got fixed. Since the strings printed without any issue, I removed the last character from each line, which should be the newline character, and now everything works as expected. I'm not sure whether this is a proper solution, or why this happens at all. Anyway, the code below resolves the issue.
for line in fin:
    ws['C' + str(i)] = line[:-1]
    i += 1
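As a side note, a sketch of an arguably safer variant: line[:-1] drops whatever the last character happens to be, so if the final line of a file has no trailing newline it loses a real character, whereas str.rstrip('\n') only removes newlines. The sample strings below are made up for illustration:

```python
# A made-up list standing in for lines read from in.txt; note that the
# last line has no trailing newline, as is common at the end of a file.
lines = ["NAME:Abe\n", "AGE:40\n", "NAME:John"]

# line[:-1] would turn "NAME:John" into "NAME:Joh"; rstrip('\n') only
# removes newline characters and leaves lines without one untouched.
cleaned = [line.rstrip('\n') for line in lines]
print(cleaned)  # ['NAME:Abe', 'AGE:40', 'NAME:John']
```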
It's possible, and advisable, to avoid using a manual counter in Python. The following code is more expressive and maintainable.
from openpyxl import Workbook

fin = open('in.txt', 'r')
wb = Workbook()
ws = wb.active
ws.append([None, None, "NAME", "AGE"])
Flag = False
for line in fin.readlines():
    if line.startswith("start"):
        Flag = True
        row = [None, None, None, None]
    elif line.startswith("end"):
        break
    elif Flag:
        if line.startswith('NAME:'):
            row[2] = line[5:].strip()  # strip the trailing newline
        elif line.startswith('AGE:'):
            row[3] = int(line[4:])
            ws.append(row)
wb.save(filename='Sample.xlsx')
fin.close()
Columns with Null Byte
How can I replace the NULL byte with a '0' value after opening a CSV file?
My code is as follows, but it doesn't work:
try:
    # open source file
    with open(dataFile, 'r') as csvfile:
        sourceDF = csv.reader(csvfile)
        replaced = [sourceDF.replace(b'\0', b'0') for sourceDF in replaced]
        print(replaced)
        first_line = True
        selHeaders = []
        # read each row in source file
        for dataRow in sourceDF:
            # check if first line of file
            if first_line == True:
                first_line = False
                first_row = dataRow
                # check if first file in compile list
                if first_run == 0:
                    result.append(list())
Hyperlink to the CSV files attached for your reference:
https://drive.google.com/drive/folders/1bPbE3hnO7ZAQEVTQ4prHUkEkqlDa7QTi?usp=share_link
Best regards
I tried the following code, but it doesn't work:
replaced = [sourceDF.replace(b'\0',b'0') for sourceDF in replaced]
print(replaced)
There are multiple problems in these two lines:
sourceDF = csv.reader(csvfile)
replaced = [sourceDF.replace(b'\0',b'0') for sourceDF in replaced]
The first line creates a CSV reader over the file. replaced has not been defined at this point.
The second line then tries to iterate over replaced, which doesn't exist yet, so it fails. But even if it did work, it would immediately rebind sourceDF to the things it iterates over, thus making the CSV data disappear.
I'm not sure whether replacing \0 with 0 is a good idea. In my 25 years of coding, I have never replaced \0 with a visible character; only with nothing, a space, or \n. But I can't really judge your case, because I don't understand what kind of data you have or what the numbers mean.
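If replacing the NUL bytes really is what's needed, one way to sketch it is to feed csv.reader a generator that cleans each line before the csv module parses it. The sample data below is made up:

```python
import csv
import io

# made-up CSV data containing a stray NUL byte
raw = "a,b\x00c\r\n1,2,3\r\n"

# replace NUL bytes line by line before the csv module sees them;
# io.StringIO stands in for an open file here
cleaned = (line.replace('\0', '0') for line in io.StringIO(raw))
rows = list(csv.reader(cleaned))
print(rows)  # [['a', 'b0c'], ['1', '2', '3']]
```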
New to coding, and trying to figure out how to fix a broken CSV file so that I can work with it properly.
So the file has been exported from a case management system and contains fields for username, casenr, time spent, notes and date.
The problem is that occasional notes have newlines in them and when exporting the csv the tooling does not contain quotation marks to define it as a string within the field.
see below example:
user;case;hours;note;date;
tnn;123;4;solved problem;2017-11-27;
tnn;124;2;random comment;2017-11-27;
tnn;125;3;I am writing a comment
that contains new lines
without quotation marks;2017-11-28;
HJL;129;8;trying to concatenate lines to re form the broken csv;2017-11-29;
I would like to concatenate lines 3,4 and 5 to show the following:
tnn;125;3;I am writing a comment that contains new lines without quotation marks;2017-11-28;
Since every line starts with a username (always 3 letters) I thought I would be able to iterate the lines to find which lines do not start with a username and concatenate that with the previous line.
It is not really working as expected though.
This is what I have got so far:
import re

with open('Rapp.txt', 'r') as f:
    for line in f:
        previous = line  # keep current line in variable to join next line
        if not re.match(r'^[A-Za-z]{3}', line):  # regex to match 3 letters
            print(previous.join(line))
The script shows no output, it just finishes silently. Any thoughts?
I think I would go a slightly different way:
import re

all_the_data = ""
with open('Rapp.txt', 'r') as f:
    for line in f:
        if not re.search(r"\d{4}-\d{1,2}-\d{1,2};\n", line):
            line = re.sub(r"\n", "", line)
        all_the_data = "".join([all_the_data, line])
print(all_the_data)
There are several ways to do this, each with pros and cons, but I think this keeps it simple.
Loop over the file as you have done, and if the line doesn't end in a date and a ;, take off the newline and stuff the line into all_the_data. That way you don't have to play with looking back 'up' the file. Again, there are lots of ways to do this. If you would rather use the 'starts with 3 letters and a ;' logic and look back, this works:
import re

all_the_data = ""
with open('Rapp.txt', 'r') as f:
    for line in f:
        if not re.search(r"^[A-Za-z]{3};", line):
            all_the_data = re.sub(r"\n$", "", all_the_data)
        all_the_data = "".join([all_the_data, line])
print("results:")
print(all_the_data)
Pretty much what was asked for. The logic: if the current line doesn't start right, take the previous line's newline out of all_the_data.
If you need help playing with the regex itself, this site is great: http://regex101.com
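As a sketch, the first approach above can be checked against the sample lines from the question (inlined here instead of read from Rapp.txt); the one change is replacing the stripped newline with a space, so the rejoined words keep their separation:

```python
import re

# sample lines from the question, inlined instead of read from Rapp.txt
lines = [
    "tnn;123;4;solved problem;2017-11-27;\n",
    "tnn;125;3;I am writing a comment\n",
    "that contains new lines\n",
    "without quotation marks;2017-11-28;\n",
]

all_the_data = ""
for line in lines:
    # if the line doesn't end in a date and ';', turn its newline into a
    # space so the next physical line is glued onto it
    if not re.search(r"\d{4}-\d{1,2}-\d{1,2};\n", line):
        line = re.sub(r"\n", " ", line)
    all_the_data = "".join([all_the_data, line])
print(all_the_data)
```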
The regex in your code matches every line (each string in the txt finds a valid match to the pattern). The if condition is therefore never true, and hence nothing prints.
with open('./Rapp.txt', 'r') as f:
    join_words = []
    for line in f:
        line = line.strip()
        if len(line) > 3 and ";" in line[0:4] and len(join_words) > 0:
            print(';'.join(join_words))
            join_words = []
            join_words.append(line)
        else:
            join_words.append(line)
    print(";".join(join_words))
I've tried not to use regex here, to keep it a little clearer if possible. But regex is a better option.
A simple way would be to use a generator that acts as a filter on the original file. That filter concatenates a line to the previous one if it does not have a semicolon (;) in its 4th column. Code could be:
def preprocess(fd):
    previous = next(fd)
    for line in fd:
        if line[3] == ';':
            yield previous
            previous = line
        else:
            previous = previous.strip() + " " + line
    yield previous  # don't forget the last line!
You could then use:
import csv

with open('test.txt') as fd:
    rd = csv.DictReader(preprocess(fd), delimiter=';')
    for row in rd:
        ...
The trick here is that the csv module only requires an object that returns a line each time the next function is applied to it, so a generator is appropriate.
But this is only a workaround and the correct way would be that the previous step directly produces a correct CSV file.
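As a self-contained sketch of the generator in action, the sample rows from the question are inlined with io.StringIO instead of being read from test.txt, and delimiter=';' is passed since the data is semicolon-separated:

```python
import csv
import io

# sample data from the question, inlined instead of read from test.txt
raw = (
    "user;case;hours;note;date;\n"
    "tnn;123;4;solved problem;2017-11-27;\n"
    "tnn;125;3;I am writing a comment\n"
    "that contains new lines\n"
    "without quotation marks;2017-11-28;\n"
)

def preprocess(fd):
    # glue a physical line onto the previous one unless its 4th character
    # is the ';' that follows a 3-letter username
    previous = next(fd)
    for line in fd:
        if line[3] == ';':
            yield previous
            previous = line
        else:
            previous = previous.strip() + " " + line
    yield previous  # don't forget the last line!

reader = csv.DictReader(preprocess(io.StringIO(raw)), delimiter=';')
for row in reader:
    print(row['case'], row['note'])
```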
Newer to Python, brand new to openpyxl - I have a large text file from which I used regexes to extract the data I need - now I need to write that data to an Excel file with openpyxl. The text file is a network address translation (NAT) table, which looks like:
src_adtr:10.20.30.40 dst_adtr:185.50.40.50 src_adtr_translated:70.60.50.40 dst_adtr_translated:99.44.55.66
These four elements would be one row of the excel file (I have about 500 rows that need to be written).
When I run the code, I do not get any errors, but it also does nothing - how can I correct this?
import re
import openpyxl

with open("NAT-table.txt", "r") as f:
    text = f.read()

source = re.findall(r':src_adtr\s+.*\s+.*\s+:Name\s+[\(](.*)', text)
dest = re.findall(r':dst_adtr\s+.*\s+.*\s+:Name\s+[\(](.*)', text)
source_adtr = re.findall(r':src_adtr_translated.*\s+.*\s+.*\s+.*\s+.*\s+.*\s+:Name\s+[\(](.*)', text)
dest_adtr = re.findall(r':dst_adtr_translated\s+.*\s+.*\s+.*\s*\s+.*\s+.*\s.*\s+:Name\s+[\(](.*)', text)
firewall = re.findall(r'\w+LINT\d+(?!_)', text)  # '\w' includes the '_' (underscore) char
natRules = list(zip(source, dest, source_adtr, dest_adtr, firewall))

wb = openpyxl.load_workbook('NAT-table.xlsx')
sheet = wb.active
for i in range(2, sheet.max_row):
    for k in range(1, 5):
        for t in natRules:
            sheet.cell(row=i, column=k).value = natRules[i][k]
            # sheet.cell(row=rowNum, column=1).value = i
wb.save('NAT-table.xlsx')
Your sample data does not match your regular expressions, so natRules is empty and there is nothing to write. I think - I say this because it is difficult to tell from your question.
So, since there is no working sample data, it is hard to say exactly where you went wrong. I will point out that this part:
for t in natRules:
    sheet.cell(row=i, column=k).value = natRules[i][k]
iterates over natRules but makes no use of t, and in fact sets the same cell multiple times.
Test Code:
Here is a chunk of test code, which uses your basic loop, and succeeds in writing an xlsx file. Suggest you could slowly modify this to look like your code, but with your data and see where it stops working.
import openpyxl

data = [list('abcd'), list('efgh')]

wb = openpyxl.load_workbook('test.xlsx')
sheet = wb.active
for i, line in enumerate(data):
    for k, val in enumerate(line):
        sheet.cell(row=i + 2, column=k + 1).value = val
wb.save('test.xlsx')
Hopefully this is an easy fix. I'm trying to edit one field of a file we use for import, however when I run the following code it leaves the file blank and 0kb. Could anyone advise what I'm doing wrong?
import re  # import regex so we can use the commands

name = raw_input("Enter filename:")  # prompt for file name, press enter to just open test.nhi
if len(name) < 1:
    name = "test.nhi"
count = 0
fhand = open(name, 'w+')
for line in fhand:
    words = line.split(',')  # obtain individual words by using split
    words[34] = re.sub(r'\D', "", words[34])  # remove non-numeric chars from string using regex
    if len(words[34]) < 1:
        continue  # if the 34th field is blank, go to the next line
    elif len(words[34]) == 2:
        "{0:0>3}".format([words[34]])  # add leading zeroes depending on the length of the field
    elif len(words[34]) == 3:
        "{0:0>2}".format([words[34]])
    elif len(words[34]) == 4:
        "{0:0>1}".format([words[34]])
    fhand.write(words)  # write the line
fhand.close()  # close the file after the loop ends
I have taken the text below in 'a.txt' as input and modified your code. Please check whether it works for you.
# Initial content of a.txt
This,program,is,Java,program
This,program,is,12Python,programs
The modified code is as follows:
import re

# Reading from the file and updating values
fhand = open('a.txt', 'r')
tmp_list = []
for line in fhand:
    # Split the line using ','
    words = line.split(',')
    # Remove non-numeric chars from the 3rd string using regex
    words[3] = re.sub(r'\D', "", words[3])
    # Update the 3rd string
    if len(words[3]) < 1:
        # If the 3rd field is blank, go to the next line.
        # Removed 'continue' from here; we need to reconstruct the
        # original line and write it to the file.
        print "Field empty. Continue..."
    elif len(words[3]) >= 1 and len(words[3]) < 5:
        # format won't add leading zeros here; zfill(5) adds the required
        # number of leading zeros depending on the length of words[3].
        words[3] = words[3].zfill(5)
    # After updating the 3rd value in the words list, create a line out of it again.
    tmp_str = ",".join(words)
    tmp_list.append(tmp_str)
fhand.close()

# Writing to the same file
whand = open("a.txt", 'w')
for val in tmp_list:
    whand.write(val)
whand.close()
File content after running code
This,program,is,,program
This,program,is,00012,programs
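As a side note on the padding itself: the original question's format calls never padded anything, because they discarded their result and wrapped the value in a one-element list. On a plain string, both str.zfill and str.format pad as expected. A quick sketch with made-up values:

```python
# zfill pads the string with leading zeros up to the requested width
print("12".zfill(5))           # 00012

# str.format can do the same, provided the string itself is passed,
# not a one-element list as in the original code
print("{0:0>5}".format("12"))  # 00012
```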
The file mode 'w+' truncates your file to 0 bytes, so you'll only be able to read lines that you've written.
Look at Confused by python file mode "w+" for more information.
An idea would be to read the whole file first, close it, and then re-open it in write mode.
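A minimal sketch of that read-first-then-rewrite pattern; the file name and its contents are made up, and tempfile is used only to keep the example self-contained:

```python
import os
import tempfile

# a throwaway file standing in for test.nhi (made-up content)
path = os.path.join(tempfile.mkdtemp(), "test.nhi")
with open(path, "w") as f:
    f.write("a,b,c\nd,e,f\n")

# read the whole file first and close it...
with open(path, "r") as f:
    lines = f.read().splitlines()

# ...then re-open it for writing and write the modified lines back
with open(path, "w") as f:
    for line in lines:
        f.write(line.upper() + "\n")  # some arbitrary modification

with open(path) as f:
    print(f.read())  # A,B,C then D,E,F
```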
Not sure which OS you're on, but I think reading from and writing to the same file has undefined behaviour.
I guess internally the file object holds the position (try fhand.tell() to see where it is). You could probably adjust it back and forth as you go using fhand.seek(last_read_position), but really that's asking for trouble.
Also, I'm not sure how the script would ever end, as it would end up reading the stuff it had just written (in a sort of infinite loop).
Best bet is to read the entire file first:
with open(name, 'r') as f:
    lines = f.read().splitlines()

with open(name, 'w') as f:
    for l in lines:
        # ....
        f.write(something)
For 'printing to a file via Python' you can use:

ofile = open("test.txt", "w")  # open for writing; 'r' mode would fail here
print("Some text...", file=ofile)
ofile.close()
I was hoping someone might be able to point me in the right direction, or give an example of how I can put the following script output into an Excel spreadsheet using xlwt. My script prints the following text on screen as required, but I was hoping to put this output into Excel in two columns of time and value. Here's the printed output:
07:16:33.354 1
07:16:33.359 1
07:16:33.364 1
07:16:33.368 1
My script so far is below.
import re

f = open("C:\Results\16.txt", "r")
searchlines = f.readlines()
searchstrings = ['Indicator']
timestampline = None
timestamp = None
f.close()
a = 0
tot = 0
while a < len(searchstrings):
    for i, line in enumerate(searchlines):
        for word in searchstrings:
            if word in line:
                timestampline = searchlines[i-33]
                for l in searchlines[i:i+1]:  # print timestampline, l,
                    # print
                    for i in line:
                        str = timestampline
                        match = re.search(r'\d{2}:\d{2}:\d{2}.\d{3}', str)
                        if match:
                            value = line.split()
                            print '\t', match.group(), '\t', value[5],
                            print
                        print
                tot = tot + 1
                break
    print 'total count for', '"', searchstrings[a], '"', 'is', tot
    tot = 0
    a = a + 1
I have had a few goes using xlwt or the CSV writer, but each time I hit a wall and revert back to the script above and try again. I am hoping to print match.group() and value[5] into two different columns of an Excel worksheet.
Thanks for your time...
MikG
What kind of problems do you have with xlwt? Personally, I find it very easy to use, remembering the basic workflow:

import xlwt

Create your spreadsheet using e.g.

my_xls = xlwt.Workbook(encoding=your_char_encoding)

which returns a spreadsheet handle to use for adding sheets and saving the whole file.

Add a sheet to the created spreadsheet with e.g.

my_sheet = my_xls.add_sheet("sheet name")

Now, having the sheet object, you can write to its cells using my_sheet.write(row, column, value):

my_sheet.write(0, 0, "First column title")
my_sheet.write(0, 1, "Second column title")

Save the whole thing using my_xls.save('file_name.xls'):

my_xls.save("results.xls")
It's the simplest of working examples; your code should of course use sheet.write(row, column, value) within the loop that prints the data, e.g.:
import re
import xlwt

f = open("C:\Results\VAMOS_RxQual_Build_Update_Fri_04-11.08-16.txt", "r")
searchlines = f.readlines()
searchstrings = ['TSC Set 2 Indicator']
timestampline = None
timestamp = None
f.close()
a = 0
tot = 0

my_xls = xlwt.Workbook(encoding="utf-8")     # begin your whole mighty xls thing
my_sheet = my_xls.add_sheet("Results")       # add a sheet to it
row_num = 0                                  # let it be current row number
my_sheet.write(row_num, 0, "match.group()")  # here go column headers,
my_sheet.write(row_num, 1, "value[5]")       # change it to your needs
row_num += 1                                 # let's change to next row

while a < len(searchstrings):
    for i, line in enumerate(searchlines):
        for word in searchstrings:
            if word in line:
                timestampline = searchlines[i-33]
                for l in searchlines[i:i+1]:  # print timestampline, l,
                    # print
                    for i in line:
                        str = timestampline
                        match = re.search(r'\d{2}:\d{2}:\d{2}.\d{3}', str)
                        if match:
                            value = line.split()
                            print '\t', match.group(), '\t', value[5],
                            # here goes cell writing:
                            my_sheet.write(row_num, 0, match.group())
                            my_sheet.write(row_num, 1, value[5])
                            row_num += 1
                            # and that's it...
                            print
                        print
                tot = tot + 1
                break
    print 'total count for', '"', searchstrings[a], '"', 'is', tot
    tot = 0
    a = a + 1

# don't forget to save your file!
my_xls.save("results.xls")
A catch:
- Writing native date/time data to xls was a nightmare for me, as I couldn't figure out how Excel stores date/time data internally.
- Be careful about the data types you're writing into cells. For simple reporting at the beginning it's enough to pass everything as a string; later you should find the xlwt documentation quite useful.
Happy XLWTing!