I have the following code.
import fileinput
map_dict = {'*':'999999999', '**':'999999999'}
for line in fileinput.FileInput("test.txt",inplace=1):
for old, new in map_dict.iteritems():
line = line.replace(old, new)
sys.stdout.write(line)
I have a txt file
1\tab*
*1\tab**
Then running the python code generates
1\tab999999999
9999999991\tab999999999
However, I want to replace "cell" (sorry if this is not standard terminology in python. I am using the terminology of Excel) not string.
The second cell is
*
So I want to replace it.
The third cell is
1*
This is not *. So I don't want to replace it.
My desired output is
1\tab999999999
*1\tab999999999
How should I make this? The user will tell this program which delimiter I am using. But the program should replace only the cell not string..
And also, how to have a separate output txt rather than overwriting the input?
Open a file for writing, and write to it.
Since you want to replace the exact complete values (for example not touch 1*), do not use replace. However, to analyze each value split your lines according to the tab character ('\t').
You must also remove end of line characters (as they may prevent matching last cells in a row).
Which gives
import fileinput
MAPS = (('*','999999999'),('**','999999999'))
with open('output.txt','w') as out_file:
for line in open("test.txt",'r'):
out_list = []
for inp_cell in line.rstrip('\n').split('\t'):
out_cell = inp_cell
for old, new in MAPS:
if out_cell == old:
out_cell = new
out_list.append(out_cell)
out_file.write( "\t".join(out_list) + "\n" )
There are more condensed/compact/optimized ways to do it, but I detailed each step on purpose, so that you may adapt to your needs (I was not sure this is exactly what you ask for).
the csv module can help:
#!python3
import csv
map_dict = {'*':'999999999','**':'999999999'}
with open('test.txt',newline='') as inf, open('test2.txt','w',newline='') as outf:
w = csv.writer(outf,delimiter='\t')
for line in csv.reader(inf,delimiter='\t'):
line = [map_dict[item] if item in map_dict else item for item in line]
w.writerow(line)
Notes:
with will automatically close files.
csv.reader parses and splits lines on a delimiter.
A list comprehension translates line items in the dictionary into a new line.
csv.writer writes the line back out.
Related
I have a csv which contains text like
AAABBBBCCCDDDDDDD
EEEFFFRRRTTTHHHYY
when I run the code like below:
rows = csv.reader(csvfile)
for row in rows:
print(" ".join('%s' %row for row in rows))
it will project as follow:
['AAABBBBCCCDDDDDDD']
['EEEFFFRRRTTTHHHYY']
But I want to display as a series of words like below:
AAABBBBCCCDDDDDDDEEEFFFRRRTTTHHHYY
Is there anything wrong in the code?
Your example looks like you simply need
with open(csvfile) as inputfile: # misnomer; not really proper CSV
for row in inputfile:
print(row.rstrip('\n'), end='')
The example you provided doesn't look like a csv file. It looks like a simple text file. The you could have something as simple as :
Input.txt
AAABBBBCCCDDDDDDD
EEEFFFRRRTTTHHHYY
Solution.py
input_filename = "Input.txt"
with open(input_filename) as input_file:
print("".join(x.rstrip('\n') for x in input_file))
This is taking advantage of:
A file object can be iterated on. This will give you a new line from each iteration
Every line received from the file will have newline character at its end. Since you seem to not want it we use the method .rstrip() to remove it
The .join() method can accept any iterable even a...
Generator expression which will help us create an iterable that will accepted by .join() using .rstrip() to format every line coming from the input file.
EDIT: OK let's decompose further my answer:
When you open a file you can iterate over it. In the most simple way to explain it, let's say that it means that you do a loop over it (for line in input_file: ...).
But not only that, but with an iterator you can create another iterator by transforming each element. This is what a list comprehension or, in the case I have chosen, a generator expression does. So the expression (x.rstrip() for x in input_file) will be a iterator that takes every element of input_file and applies to it .rstrip()
The string method .join() will glue together the elements provided by an iterator using that string as a separator. Since I use here an empty string there won't be a seperator. I have used the iterator defined before for this.
I then print() the string provided by the .join() operation explained before.
I did a minor correction on my answer because there is the edge case that if there are space or tab characters at the end of a line in the input file they would have been removed if I use x.rstrip() instead of x.rstrip('\n')
You could start with an empty string, and for every row read from the csv file, remove the newline at the end and add the contents to the empty string.
joined = ""
with open(csvfile) as f:
for row in f:
joined = joined + row.replace("\n","")
print(joined)
Output:
>> AAABBBBCCCDDDDDDDEEEFFFRRRTTTHHHYY
I have a file of configs. I am trying to get my python code to search for two different strings in a text file, copy (Cut would make my life so much easier) and paste them into a text file without duplicates. My code is working for just one string and every time I try to make it do two it will either not work or only find the lines with both strings.
What am I doing wrong?
import sys
with open("ns-batch.bak.txt") as f:
lines = f.readlines()
lines = [l for l in lines if "10.42.88.192"
in l]
with open("Py_parse2.txt", "w") as f1:
f1.writelines(lines)
Okay, here's my take on things.
Assuming that you are looking for certain strings within each line, and then want to "copy" those lines to another file to see in which lines those strings were found, this, for example, should work:
lines = list()
with open("ns-batch.bak.txt", "r") as orig_file:
for line in orig_file:
if ("12.32.45.1" in line) or ("27.82.1.0" in line): #if "12.32.45.1" in line:
lines.append(line)
with open("Py_parse2.txt", "x") as new_file:
for line in lines:
new_file.write(line + '\n')
Depending on how many strings you are looking for on each line, you can either add or remove in statements on line 5 of my example code (I also provided an example line of code on the same line that demonstrates only needing to find one string on a line, which I commented out). The import sys statement does absolutely nothing in this case; the sys module/package is not needed to do this work, so do not include that import statement. If you want to learn more about file I/O, check out this link ( https://docs.python.org/3/tutorial/inputoutput.html?highlight=write ) and go to section "7.2 Reading and Writing Files".
From other posts, I've learned that '\n' signifies a new line when adding to a txt file. I'm trying to do just this, but I can't figure out the right syntax when an attribute is right before the new line.
My code I'm trying is like this:
for item in list:
with open("file.txt", "w") as att_file:
att_file.write(variable\n)
As you can probably see, I'm trying to add the variable for each item in the list to a new line in a txt file. What's the correct way to do this?
You just need to specify the newline character as a string:
Eg:
with open("file.txt", "w") as att_file:
for item in list:
att_file.write(attribute + "\n")
try this:
att_file.write(attribute+"\n")
note :attribute must be some variable and must be string type
your code will look like this:
with open("file.txt", "w") as att_file:
for item in list:
att_file.write(item+"\n")
with should be before for, else every time you are opening file with write mode, it will omit the
previous write
file7 = open('test_list7_file.txt','a')
file7.write(var7 + '\n')
will work fine, BUT IF YOU APPEND TO EXISTING txt file the MOUSE CURSOR
needs to be ONE SPACE AFTER THE LAST ENTRY when you save the file for use in your program.
~no space and it joins the last entry,
~on the next line already when you create you file to be added to, it adds an empty line first
I am trying to write a python script to read in a large text file from some modeling results, grab the useful data and save it as a new array. The text file is output in a way that has a ## starting each line that is not useful. I need a way to search through and grab all the lines that do not include the ##. I am used to using grep -v in this situation and piping to a file. I want to do it in python!
Thanks a lot.
-Tyler
I would use something like this:
fh = open(r"C:\Path\To\File.txt", "r")
raw_text = fh.readlines()
clean_text = []
for line in raw_text:
if not line.startswith("##"):
clean_text.append(line)
Or you could also clean the newline and carriage return non-printing characters at the same time with a small modification:
for line in raw_text:
if not line.startswith("##"):
clean_text.append(line.rstrip("\r\n"))
You would be left with a list object that contains one line of required text per element. You could split this into individual words using string.split() which would give you a nested list per original list element which you could easily index (assuming your text has whitespaces of course).
clean_text[4][7]
would return the 5th line, 8th word.
Hope this helps.
[Edit: corrected indentation in loop]
My suggestion would be to do the following:
listoflines = [ ]
with open(.txt, "r") as f: # .txt = file, "r" = read
for line in f:
if line[:2] != "##": #Read until the second character
listoflines.append(line)
print listoflines
If you're feeling brave, you can also do the following, CREDITS GO TO ALEX THORNTON:
listoflines = [l for l in f if not l.startswith('##')]
The other answer is great as well, especially teaching the .startswith function, but I think this is the more pythonic way and also has the advantage of automatically closing the file as soon as you're done with it.
I have a program that writes a list to a file.
The list is a list of pipe delimited lines and the lines should be written to the file like this:
123|GSV|Weather_Mean|hello|joe|43.45
122|GEV|temp_Mean|hello|joe|23.45
124|GSI|Weather_Mean|hello|Mike|47.45
BUT it wrote them line this ahhhh:
123|GSV|Weather_Mean|hello|joe|43.45122|GEV|temp_Mean|hello|joe|23.45124|GSI|Weather_Mean|hello|Mike|47.45
This program wrote all the lines into like one line without any line breaks.. This hurts me a lot and I gotta figure-out how to reverse this but anyway, where is my program wrong here? I thought write lines should write lines down the file rather than just write everything to one line..
fr = open(sys.argv[1], 'r') # source file
fw = open(sys.argv[2]+"/masked_"+sys.argv[1], 'w') # Target Directory Location
for line in fr:
line = line.strip()
if line == "":
continue
columns = line.strip().split('|')
if columns[0].find("#") > 1:
looking_for = columns[0] # this is what we need to search
else:
looking_for = "Dummy#dummy.com"
if looking_for in d:
# by default, iterating over a dictionary will return keys
new_line = d[looking_for]+'|'+'|'.join(columns[1:])
line_list.append(new_line)
else:
new_idx = str(len(d)+1)
d[looking_for] = new_idx
kv = open(sys.argv[3], 'a')
kv.write(looking_for+" "+new_idx+'\n')
kv.close()
new_line = d[looking_for]+'|'+'|'.join(columns[1:])
line_list.append(new_line)
fw.writelines(line_list)
This is actually a pretty common problem for newcomers to Python—especially since, across the standard library and popular third-party libraries, some reading functions strip out newlines, but almost no writing functions (except the log-related stuff) add them.
So, there's a lot of Python code out there that does things like:
fw.write('\n'.join(line_list) + '\n')
(writing a single string) or
fw.writelines(line + '\n' for line in line_list)
Either one is correct, and of course you could even write your own writelinesWithNewlines function that wraps it up…
But you should only do this if you can't avoid it.
It's better if you can create/keep the newlines in the first place—as in Greg Hewgill's suggestions:
line_list.append(new_line + "\n")
And it's even better if you can work at a higher level than raw lines of text, e.g., by using the csv module in the standard library, as esuaro suggests.
For example, right after defining fw, you might do this:
cw = csv.writer(fw, delimiter='|')
Then, instead of this:
new_line = d[looking_for]+'|'+'|'.join(columns[1:])
line_list.append(new_line)
You do this:
row_list.append(d[looking_for] + columns[1:])
And at the end, instead of this:
fw.writelines(line_list)
You do this:
cw.writerows(row_list)
Finally, your design is "open a file, then build up a list of lines to add to the file, then write them all at once". If you're going to open the file up top, why not just write the lines one by one? Whether you're using simple writes or a csv.writer, it'll make your life simpler, and your code easier to read. (Sometimes there can be simplicity, efficiency, or correctness reasons to write a file all at once—but once you've moved the open all the way to the opposite end of the program from the write, you've pretty much lost any benefits of all-at-once.)
The documentation for writelines() states:
writelines() does not add line separators
So you'll need to add them yourself. For example:
line_list.append(new_line + "\n")
whenever you append a new item to line_list.
As others have noted, writelines is a misnomer (it ridiculously does not add newlines to the end of each line).
To do that, explicitly add it to each line:
with open(dst_filename, 'w') as f:
f.writelines(s + '\n' for s in lines)
writelines() does not add line separators. You can alter the list of strings by using map() to add a new \n (line break) at the end of each string.
items = ['abc', '123', '!##']
items = map(lambda x: x + '\n', items)
w.writelines(items)
As others have mentioned, and counter to what the method name would imply, writelines does not add line separators. This is a textbook case for a generator. Here is a contrived example:
def item_generator(things):
for item in things:
yield item
yield '\n'
def write_things_to_file(things):
with open('path_to_file.txt', 'wb') as f:
f.writelines(item_generator(things))
Benefits: adds newlines explicitly without modifying the input or output values or doing any messy string concatenation. And, critically, does not create any new data structures in memory. IO (writing to a file) is when that kind of thing tends to actually matter. Hope this helps someone!
Credits to Brent Faust.
Python >= 3.6 with format string:
with open(dst_filename, 'w') as f:
f.writelines(f'{s}\n' for s in lines)
lines can be a set.
If you are oldschool (like me) you may add f.write('\n') below the second line.
As we have well established here, writelines does not append the newlines for you. But, what everyone seems to be missing, is that it doesn't have to when used as a direct "counterpart" for readlines() and the initial read persevered the newlines!
When you open a file for reading in binary mode (via 'rb'), then use readlines() to fetch the file contents into memory, split by line, the newlines remain attached to the end of your lines! So, if you then subsequently write them back, you don't likely want writelines to append anything!
So if, you do something like:
with open('test.txt','rb') as f: lines=f.readlines()
with open('test.txt','wb') as f: f.writelines(lines)
You should end up with the same file content you started with.
As we want to only separate lines, and the writelines function in python does not support adding separator between lines, I have written the simple code below which best suits this problem:
sep = "\n" # defining the separator
new_lines = sep.join(lines) # lines as an iterator containing line strings
and finally:
with open("file_name", 'w') as file:
file.writelines(new_lines)
and you are done.