Python - rearrange and write strings (line splitting, unwanted newline)

I have a script that:
Reads in each line of a file
Finds the '*' character in each line and splits the line here
Rearranges the 3 parts (first to last, and last to first)
Writes the rearranged strings to a .txt file
Problem is, it's finding some new line character or something, and isn't outputting how it should. Have tried stripping newline chars, but there must be something I'm missing.
Thanks in advance for any help!
the script:
## Import packages
import time
import csv
## Make output file
file_output = open('output.txt', 'w')
## Open file and iterate over, rearranging the order of each string
with open('input.csv', 'rb') as f:
    ## Jump to next line (skips file headers)
    next(f)
    ## Split each line, rearrange, and write the new line
    for line in f:
        ## Strip newline chars
        line = line.strip('\n')
        ## Split original string
        category, star, value = line.rpartition("*")
        ## Make new string
        new_string = value+star+category+'\n'
        ## Write new string to file
        file_output.write(new_string)
file_output.close()
## Require input (stops program from immediately quitting)
k = input(" press any key to exit")
Input file (input.csv):
Category*Hash Value
1*FB1124FF6D2D4CD8FECE39B2459ED9D5
1*FB1124FF6D2D4CD8FECE39B2459ED9D5
1*FB1124FF6D2D4CD8FECE39B2459ED9D5
1*34AC061CCCAD7B9D70E8EF286CA2F1EA
Output file (output.txt)
FB1124FF6D2D4CD8FECE39B2459ED9D5
*1
FB1124FF6D2D4CD8FECE39B2459ED9D5
*1
FB1124FF6D2D4CD8FECE39B2459ED9D5
*1
34AC061CCCAD7B9D70E8EF286CA2F1EA
*1
EDIT: Answered. Thanks everyone! Looks all good now! :)

The file output.txt should exist.
The following works with Python 2 on Debian:
## Import packages
import time
import csv
## Make output file
file_output = open('output.txt', 'w')
## Open file and iterate over, rearranging the order of each string
with open('input.csv', 'rb') as f:
    ## Jump to next line (skips file headers)
    next(f)
    ## Split each line, rearrange, and write the new line
    for line in f:
        ## Split original string
        category, star, value = line.rpartition("*")
        ## Make new string
        new_string = value.strip()+star+category+'\n'
        ## Write new string to file
        file_output.write(new_string)
file_output.close()
## Require input (stops program from immediately quitting)
k = input(" press any key to exit")
I strip() the value, which contains the \n, in order to sanitize it. You used strip('\n'), which misses other trailing characters such as '\r'; calling the method without arguments strips all surrounding whitespace and does the job.
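Why strip('\n') was not enough is easy to demonstrate: if the file has Windows-style '\r\n' line endings (likely here, since it was opened in 'rb' mode), the '\r' survives and ends up in the middle of the rearranged string. A quick sketch, using a hash value from the sample input:

```python
# Demo: strip('\n') vs strip() on a line with a Windows-style ending.
line = "1*FB1124FF6D2D4CD8FECE39B2459ED9D5\r\n"

# strip('\n') only removes the '\n'; the '\r' stays attached to the hash.
category, star, value = line.strip('\n').rpartition("*")
print(repr(value))

# strip() removes all surrounding whitespace, including the '\r'.
category, star, value = line.strip().rpartition("*")
print(repr(value))
```

The first repr shows the stray carriage return that caused the broken output; the second is clean.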

Use a DictWriter
import csv
with open('aster.csv') as f, open('out.txt', 'w') as fout:
reader = csv.DictReader(f, delimiter='*')
writer = csv.DictWriter(fout, delimiter='*', fieldnames=['Hash Value','Category'])
#writer.writeheader()
for line in reader:
writer.writerow(line)
Without csv library
with open('aster.csv') as f:
    next(f)
    lines = [line.strip().split('*') for line in f]
with open('out2.txt', 'w') as fout:
    for line in lines:
        fout.write('%s*%s\n' % (line[1], line[0]))

Related

Trying to remove multiple space in txt file using python [duplicate]

So I have this crazy long text file made by my crawler, and for some reason it added some spaces in between the links, like this:
https://example.com/asdf.html (note the spaces)
https://example.com/johndoe.php (again)
I want to get rid of that, but keep the newlines. Keep in mind that the text file is 4,000+ lines long. I tried to do it myself but figured that I have no idea how to loop through the lines in a file.
It seems you can't directly edit a file in place while reading it, so here is my suggestion:
# first get all lines from file
with open('file.txt', 'r') as f:
    lines = f.readlines()
# remove spaces
lines = [line.replace(' ', '') for line in lines]
# finally, write lines in the file
with open('file.txt', 'w') as f:
    f.writelines(lines)
You can open the file and read it line by line, removing whitespace:
Python 3.x:
with open('filename') as f:
    for line in f:
        print(line.strip())
Python 2.x:
with open('filename') as f:
    for line in f:
        print line.strip()
It will remove the leading and trailing whitespace from each line and print it.
Hope it helps!
Read text from file, remove spaces, write text to file:
with open('file.txt', 'r') as f:
    txt = f.read().replace(' ', '')
with open('file.txt', 'w') as f:
    f.write(txt)
In @Leonardo Chirivì's solution it's unnecessary to create a list to store the file contents when a string is sufficient and more memory efficient. The .replace(' ', '') operation is only called once on the string, which is more efficient than iterating through a list and calling replace on each line individually.
To avoid opening the file twice:
with open('file.txt', 'r+') as f:
    txt = f.read().replace(' ', '')
    f.seek(0)
    f.write(txt)
    f.truncate()
It would be more efficient to only open the file once. This requires moving the file pointer back to the start of the file after reading, as well as truncating any content possibly left over after you write back to the file. A drawback to this solution, however, is that it is not as easily readable.
I had something similar that I'd been dealing with.
This is what worked for me (Note: This converts from 2+ spaces into a comma, but if you read below the code block, I explain how you can get rid of ALL whitespaces):
import re
# read the file
with open('C:\\path\\to\\test_file.txt') as f:
    read_file = f.read()
print(type(read_file))  # to confirm that it's a string
read_file = re.sub(r'\s{2,}', ',', read_file)  # find/convert 2+ whitespace into ','
# write the file
with open('C:\\path\\to\\test_file.txt', 'w') as f:
    f.write(read_file)
This helped me then send the updated data to a CSV, which suited my need, but it can help for you as well, so instead of converting it to a comma (','), you can convert it to an empty string (''), and then [or] use a read_file.replace(' ', '') method if you don't need any whitespaces at all.
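If the goal is instead to remove all spaces and tabs while keeping the newlines (as in the original question), a variant of the same re.sub approach uses a character class that excludes '\n'. A sketch, with made-up sample text:

```python
import re

# Hypothetical crawler output: links with stray trailing spaces.
text = "https://example.com/asdf.html   \nhttps://example.com/johndoe.php  \n"

# [^\S\n] matches any whitespace character that is NOT a newline,
# so runs of spaces/tabs are removed while line breaks are preserved.
cleaned = re.sub(r'[^\S\n]+', '', text)
print(cleaned)
```

This avoids the need to restore the newlines afterwards, since they are never touched.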
Let's not forget to add back the \n to go to the next row. A complete version (assuming bl_right and bl_left are boolean flags selecting which side to strip) would be:
with open(str_path, 'r') as file:
    str_lines = file.readlines()
# remove spaces
if bl_right is True:
    str_lines = [line.rstrip() + '\n' for line in str_lines]
elif bl_left is True:
    str_lines = [line.lstrip() + '\n' for line in str_lines]
else:
    str_lines = [line.strip() + '\n' for line in str_lines]
# Write the file out again
with open(str_path, 'w') as file:
    file.writelines(str_lines)

Parsing Logs with Regular Expressions Python

Fairly new to coding and Python :)
I've gotta iterate through some logfiles and pick out the ones that say ERROR. Boom, done, got that. What I've gotta do now is figure out how to grab the following 10 lines containing the details of the error. It's gotta be some combo of an if statement and a for/while loop, I presume. Any help would be appreciated.
import os
import re
# Regex used to match
line_regex = re.compile(r"ERROR")
# Output file, where the matched loglines will be copied to
output_filename = os.path.normpath("NodeOut.log")
# Overwrites the file, ensure we're starting out with a blank file
#TODO Append this later
with open(output_filename, "w") as out_file:
    out_file.write("")
# Open output file in 'append' mode
with open(output_filename, "a") as out_file:
    # Open input file in 'read' mode
    with open("MXNode1.stdout", "r") as in_file:
        # Loop over each log line
        for line in in_file:
            # If log line matches our regex, print remove later, and write > file
            if (line_regex.search(line)):
                # for i in range():
                print(line)
                out_file.write(line)
There is no need for regex to do this, you can just use the in operator ("ERROR" in line).
Also, to clear the content of the file without opening it in w mode, you can simply place the cursor at the beginning of the file and truncate.
import os
output_filename = os.path.normpath("NodeOut.log")
with open(output_filename, 'a') as out_file:
    out_file.seek(0, 0)
    out_file.truncate(0)
    with open("MXNode1.stdout", 'r') as in_file:
        line = in_file.readline()
        while line:
            if "ERROR" in line:
                out_file.write(line)
                for i in range(10):
                    out_file.write(in_file.readline())
            line = in_file.readline()
We use a while loop to read lines one by one using in_file.readline(). The advantage is that you can easily read the next line using a for loop.
See the doc:
f.readline() reads a single line from the file; a newline character (\n) is left at the end of the string, and is only omitted on the last line of the file if the file doesn’t end in a newline. This makes the return value unambiguous; if f.readline() returns an empty string, the end of the file has been reached, while a blank line is represented by '\n', a string containing only a single newline.
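The behaviour described in that quote can be verified with a small sketch, using io.StringIO to stand in for a real log file:

```python
import io

# Simulate a file with a blank line in the middle and no trailing newline.
f = io.StringIO("ERROR: boom\n\nlast line")

print(repr(f.readline()))  # a normal line, newline kept
print(repr(f.readline()))  # a blank line is '\n', not the empty string
print(repr(f.readline()))  # last line without a trailing newline
print(repr(f.readline()))  # '' signals end of file
```

This is what makes `while line:` a reliable loop condition: a blank line is truthy ('\n'), and only true EOF yields the falsy empty string.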
Assuming you would only want to always grab the next 10 lines, then you could do something similar to:
with open("MXNode1.stdout", "r") as in_file:
    # Loop over each log line
    lineCount = 11
    for line in in_file:
        # If log line matches our regex, print remove later, and write > file
        if (line_regex.search(line)):
            # for i in range():
            print(line)
            lineCount = 0
        if (lineCount < 11):
            lineCount += 1
            out_file.write(line)
The second if statement will help you always grab the line. The magic number of 11 is so that you grab the next 10 lines after the initial line that the ERROR was found on.
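As an alternative to the counter, itertools.islice can consume exactly the next 10 lines from the same file iterator. This is a different technique from the answers above, sketched here with an in-memory stand-in for the log file:

```python
import io
from itertools import islice

# Stand-in for the log file: one ERROR line followed by 12 detail lines.
log = io.StringIO(
    "INFO ok\n"
    "ERROR something broke\n"
    + "".join(f"detail {i}\n" for i in range(12))
)

matched = []
for line in log:
    if "ERROR" in line:
        matched.append(line)
        # islice pulls the next 10 lines from the SAME iterator,
        # so the outer for-loop resumes after them.
        matched.extend(islice(log, 10))

print("".join(matched))
```

Because the outer loop and islice share one iterator, no line is read twice and overlapping ERROR blocks are handled naturally.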

Python prints two lines in the same line when merging files

I am new to Python and I'm getting this result and I am not sure how to fix it efficiently.
I have n files, let's say for simplicity just two, with some info with this format:
1.250484649 4.00E-02
2.173737246 4.06E-02
... ...
This continues up to m lines. I'm trying to append all the m lines from the n files in a single file. I prepared this code:
import glob
outfile = open('temp.txt', 'w')
for inputs in glob.glob('*.dat'):
    infile = open(inputs, 'r')
    for row in infile:
        outfile.write(row)
It reads all the .dat files (the ones I am interested in) and it does what I want but it merges the last line of the first file and the first line of the second file into a single line:
1.250484649 4.00E-02
2.173737246 4.06E-02
3.270379524 2.94E-02
3.319202217 6.56E-02
4.228424345 8.91E-03
4.335169497 1.81E-02
4.557886098 6.51E-02
5.111075901 1.50E-02
5.547288248 3.34E-02
5.685118615 3.22E-03
5.923718239 2.86E-02
6.30299944 8.05E-03
6.528018125 1.25E-020.704223685 4.98E-03
1.961058114 3.07E-03
... ...
I'd like to fix this in a smart way. I could fix it by introducing a blank line between each data line and then removing all the blank lines at the end, but this seems suboptimal.
Thank you!
There's no newline on the last line of each .dat file, so you'll need to add it:
import glob
with open('temp.txt', 'w') as outfile:
    for inputs in glob.glob('*.dat'):
        with open(inputs, 'r') as infile:
            for row in infile:
                if not row.endswith("\n"):
                    row = f"{row}\n"
                outfile.write(row)
Also using with (context managers) to automatically close the files afterwards.
To avoid a trailing newline - there's a few ways to do this, but the simplest one that comes to mind is to load all the input data into memory as individual lines, then write it out in one go using "\n".join(lines). This puts "\n" between each line but not at the end of the last line in the file.
import glob
lines = []
for inputs in glob.glob('*.dat'):
    with open(inputs, 'r') as infile:
        lines += [line.rstrip('\n') for line in infile.readlines()]
with open('temp.txt', 'w') as outfile:
    outfile.write('\n'.join(lines))
[line.rstrip('\n') for line in infile.readlines()] - this is a list comprehension. It makes a list of each line in an individual input file, with the '\n' removed from the end of the line. It can then be += appended to the overall list of lines.
While we're here - let's use logging to give status updates:
import glob
import logging

logging.basicConfig(level=logging.INFO)  # without this, INFO messages are not shown

OUT_FILENAME = 'test.txt'
lines = []
for inputs in glob.glob('*.dat'):
    logging.info(f'Opening {inputs} to read...')
    with open(inputs, 'r') as infile:
        lines += [line.rstrip('\n') for line in infile.readlines()]
    logging.info(f'Finished reading {inputs}')
logging.info(f'Opening {OUT_FILENAME} to write...')
with open(OUT_FILENAME, 'w') as outfile:
    outfile.write('\n'.join(lines))
logging.info(f'Finished writing {OUT_FILENAME}')

How do I split each line into two strings and print without the comma?

I'm trying to have the output be without commas, and to separate each line into two strings and print them.
My code so far yields:
173,70
134,63
122,61
140,68
201,75
222,78
183,71
144,69
But I'd like it to print out without the comma, with the values on each line separated as strings.
if __name__ == '__main__':
    # Complete main section of code
    file_name = "data.txt"
    # Open the file for reading here
    my_file = open('data.txt')
    lines = my_file.read()
    with open('data.txt') as f:
        for line in f:
            lines.split()
            lines.replace(',', ' ')
    print(lines)
In your sample code, lines contains the full content of the file as a str.
my_file = open('data.txt')
lines = my_file.read()
You then later re-open the file to iterate the lines:
with open('data.txt') as f:
    for line in f:
        lines.split()
        lines.replace(',', ' ')
Note, however, str.split and str.replace do not modify the existing value, as strs in python are immutable. Also note you are operating on lines there, rather than the for-loop variable line.
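The immutability point can be seen directly in a short sketch (values borrowed from the question's data):

```python
s = "173,70"

s.replace(",", " ")      # returns a NEW string; here the result is discarded
print(s)                 # the original string is unchanged

t = s.replace(",", " ")  # assign the result to actually use it
print(t)
```

The first print still shows the comma; only the assigned result t carries the replacement.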
Instead, you'll need to assign the result of those functions into new values, or give them as arguments (E.g., to print). So you'll want to open the file, iterate over the lines and print the value with the "," replaced with a " ":
with open("data.txt") as f:
    for line in f:
        print(line.replace(",", " "))
Or, since you are operating on the whole file anyway:
with open("data.txt") as f:
    print(f.read().replace(",", " "))
Or, as your file appears to be CSV content, you may wish to use the csv module from the standard library instead:
import csv
with open("data.txt", newline="") as csvfile:
    for row in csv.reader(csvfile):
        print(*row)
with open('data.txt', 'r') as f:
    for line in f:
        for value in line.split(','):
            print(value)
While Python offers several ways to open files, this is the preferred one for working with them: the file is read lazily (especially important for large files), and after exiting the with scope (indentation block) the file handle is closed automatically by the system.
Here we open the file in read mode. Files follow the iterator protocol, so we can iterate over them like lists; each line is a true line in the file and is a string.
After getting the line in the line variable, we split (see str.split()) the line into 2 tokens, one before the comma and the other after it. split returns a newly constructed list of strings. If you need to omit unwanted characters, you can use the str.strip() method; strip and split are usually combined.
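Combining strip and split as just described looks like this (sample value borrowed from the question's data):

```python
line = " 173,70 \n"

# strip() removes the surrounding whitespace and newline first,
# then split(',') breaks the cleaned line into its two fields.
fields = line.strip().split(',')
print(fields)
```

Stripping before splitting keeps the trailing newline out of the last field.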
elegant and efficient file reading - method 1
with open("data.txt", 'r') as io:
    for line in io:
        sl = line.strip().split(',')  # now sl is a list of strings.
        print("{} {}".format(sl[0], sl[1]))  # use format to print the results on the screen.
non elegant, but efficient file reading - method 2
fp = open("data.txt", 'r')
# the walrus operator (Python 3.8+) assigns and tests in one expression;
# when readline() returns an empty string, EOF has been reached
while (line := fp.readline()) != '':
    sl = line.strip().split(',')
    print("{} {}".format(sl[0], sl[1]))
fp.close()

Insert text in between file lines in python

I have a file that I am currently reading from using
fo = open("file.txt", "r")
Then by doing
file = open("newfile.txt", "w")
file.write(fo.read())
file.write("Hello at the end of the file")
fo.close()
file.close()
I basically copy the file to a new one, but also add some text at the end of the newly created file. How would I be able to insert that line say, in between two lines separated by an empty line? I.e:
line 1 is right here
<---- I want to insert here
line 3 is right here
Can I tokenize different sentences by a delimiter like \n for new line?
First you should load the file using the open() method and then apply the .readlines() method, which splits on "\n" and returns a list. Then update the list of strings by inserting a new string at the desired position, and finally write the contents of the list to the new file using new_file.write("".join(updated_list)) (readlines() keeps each line's trailing "\n", so join with the empty string).
NOTE: This method will only work for files which can be loaded in the memory.
with open("filename.txt", "r") as prev_file, open("new_filename.txt", "w") as new_file:
    prev_contents = prev_file.readlines()
    # Now prev_contents is a list of strings and you may add the new line at any position.
    # readlines() keeps the trailing '\n' on each line, so join with ''.
    prev_contents.insert(4, "This is a new line\n")
    new_file.write("".join(prev_contents))
readlines() is not recommended because it reads the whole file into memory. It is also not needed because you can iterate over the file directly.
The following code will insert the text 'Hello at line 2' at line 2:
with open('file.txt', 'r') as f_in:
    with open('file2.txt', 'w') as f_out:
        for line_no, line in enumerate(f_in, 1):
            if line_no == 2:
                f_out.write('Hello at line 2\n')
            f_out.write(line)
Note the use of the with open('filename','w') as filevar idiom. This removes the need for an explicit close() because it closes the file automatically at the end of the block, and better, it does this even if there is an exception.
For a large file (iterate over the file directly instead of readlines(), so it is never loaded into memory all at once):
with open("s.txt", "r") as inp, open("s1.txt", "w") as ou:
    for a, d in enumerate(inp):
        if a == 2:
            ou.write("hi there\n")
        ou.write(d)
You could use a marker:
#FILE1
line 1 is right here
<INSERT_HERE>
line 3 is right here
#FILE2
some text
with open("FILE1") as file:
    original = file.read()
with open("FILE2") as input:
    myinsert = input.read()
newfile = original.replace("<INSERT_HERE>", myinsert)
with open("FILE1", "w") as replaced:
    replaced.write(newfile)
#FILE1
line 1 is right here
some text
line 3 is right here
