I have a file which contains the following row:
//hva_SaastonJakaumanMuutos/printData/reallocationAssignment/changeUser/firstName>
I want to add "John" at the end of the line.
I have written the following code, but for some reason it is not working:
def add_text_to_file(self, file, rowTitle, inputText):
    f = open("check_files/"+file+".txt", "r")
    fileList = list(f)
    f.close()
    j = 0
    for row in fileList :
        if fileList[j].find(rowTitle) > 0 :
            fileList[j]=fileList[j].replace("\n","")+inputText+"\n"
            break
        j = j+1
    f = open("check_files/"+file+".txt", "w")
    f.writelines(fileList)
    f.close()
Do you see what I am doing wrong?
str.find returns the index at which the match begins, or -1 if there is no match. It therefore returns 0 when the text you are searching for is found at the very beginning of the line, and a "> 0" test treats that as a miss.
So your condition should be:
if fileList[j].find(rowTitle) >= 0 :
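The difference between the two conditions can be checked on a sample string. This is only a small sketch; the row below is made up for illustration and is not the row from the question.

```python
# str.find returns the index of the first match, or -1 when there is no match.
row = "//changeUser/firstName>\n"   # hypothetical sample row

print(row.find("//changeUser"))   # match at the very start: index 0
print(row.find("firstName"))      # match further in: a positive index
print(row.find("lastName"))       # no match: -1

# 0 > 0 is False, so the original test skips a match at the start of the line.
found_with_gt = row.find("//changeUser") > 0
found_with_ge = row.find("//changeUser") >= 0
```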
Edit:
The correction above would save the day, but it's better if you do things the right way, the pythonic way.
If you are looking for a substring in a text, you can use the foo in bar comparison. It will be True if foo can be found in bar and False otherwise.
You rarely need a counter in Python. The enumerate built-in is your friend here.
You can combine the iteration and writing and eliminate an unnecessary step.
strip or rstrip is better than replace in your case.
For Python 2.6+, it is better to use the with statement when dealing with files. It takes care of closing the file the right way. For Python 2.5, you need from __future__ import with_statement
Refer to PEP8 for commonly preferred naming conventions.
Here is a cleaned up version:
def add_text_to_file(self, file, row_title, input_text):
    with open("check_files/" + file + ".txt", "r") as infile:
        file_list = infile.readlines()
    with open("check_files/" + file + ".txt", "w") as outfile:
        for row in file_list:
            if row_title in row:
                row = row.rstrip() + input_text + "\n"
            outfile.write(row)
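The points about in, rstrip and enumerate can also be tried without touching any file. The following is a minimal sketch that works on a list of strings; the function name append_to_matching_rows and the sample rows are assumptions made for illustration.

```python
def append_to_matching_rows(rows, row_title, input_text):
    """Return a new list where rows containing row_title get input_text appended."""
    result = []
    for row in rows:
        if row_title in row:                        # substring test, no find() needed
            row = row.rstrip() + input_text + "\n"  # rstrip drops the trailing newline
        result.append(row)
    return result

# Hypothetical sample data standing in for the file contents.
rows = ["first line\n", "//printData/changeUser/firstName>\n", "last line\n"]
updated = append_to_matching_rows(rows, "firstName>", "John")

# enumerate yields (index, item) pairs, so no manual counter is required.
for number, row in enumerate(updated, start=1):
    print(number, row, end="")
```

Like the cleaned up version, this appends the text to every matching row rather than stopping at the first one.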
You are not giving much information, so even though I wouldn't use the following code myself (I'm sure there are better ways), it might help to clear up your problem.
import os.path

def add_text_to_file(self, filename, row_title, input_text):
    # filename should have the .txt extension in it
    filepath = os.path.join("check_files", filename)
    with open(filepath, "r") as f:
        content = f.readlines()
    # range() is needed here: an int such as len(content) is not iterable
    for j in range(len(content)):
        if row_title in content[j]:
            # rstrip() removes only trailing whitespace, including the newline
            content[j] = content[j].rstrip() + input_text + "\n"
            break
    with open(filepath, "w") as f:
        f.writelines(content)
I have a file with data like this:
>1_DL_2021.1123
>2_DL_2021.1206
>3_DL_2021.1202
>3_DL_2021.1214
>4_DL_2021.1214
>4_DL_2021.1214
>6_DL_2021.1214
>7_DL_2021.1214
>8_DL_2021.1214
As you can see, the data is not numbered properly and needs to be renumbered.
What I'm aiming for is this:
>1_DL_2021.1123
>2_DL_2021.1206
>3_DL_2021.1202
>4_DL_2021.1214
>5_DL_2021.1214
>6_DL_2021.1214
>7_DL_2021.1214
>8_DL_2021.1214
>9_DL_2021.1214
The file has a lot of other content between the lines starting with the > sign. I want only the lines starting with > to be affected.
Could someone please help me out with this?
There are 563 such lines, so doing it manually is out of the question.
Assuming the input data file is "input.txt", you can achieve what you want with this:
import re

with open("input.txt", "r") as f:
    a = f.readlines()
regex = re.compile(r"^>\d+_DL_2021\.\d+\n$")
counter = 1
for i, line in enumerate(a):
    if regex.match(line):
        tokens = line.split("_")
        tokens[0] = f">{counter}"
        a[i] = "_".join(tokens)
        counter += 1
with open("input.txt", "w") as f:
    f.writelines(a)
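What this does: it looks for lines matching the regex ^>\d+_DL_2021\.\d+\n$, splits each matching line on _, replaces the first (0th) token with > followed by the current counter, joins the tokens back together, and increments the counter by 1. Once all lines have been processed, it writes the updated lines back to "input.txt". Note that the regex requires a trailing newline, so a final header line without one would be skipped.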
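The same renumbering idea can be sketched on an in-memory list, which makes it easy to try before rewriting a real file. This variant is an assumption-laden sketch, not the answer's method: it uses a looser pattern (any line starting with > and a number followed by _) and re.sub instead of split and join; the name renumber and the sample lines are hypothetical.

```python
import re

# Matches the ">", the old number and the "_" at the start of a header line.
header = re.compile(r"^>\d+_")

def renumber(lines):
    """Renumber header lines 1, 2, 3, ... and leave every other line untouched."""
    counter = 1
    result = []
    for line in lines:
        if header.match(line):
            # Replace only the leading ">N_" prefix with the running counter.
            line = header.sub(">%d_" % counter, line, count=1)
            counter += 1
        result.append(line)
    return result

# Hypothetical sample: header lines with other content in between.
sample = [">1_DL_2021.1123\n", "ACGT\n", ">3_DL_2021.1202\n", ">3_DL_2021.1214\n"]
renumbered = renumber(sample)
```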
sudden_appearance already provided a good answer.
In case you don't like regex too much, you can use this code instead:
new_lines = []
with open('test_file.txt', 'r') as f:
    c = 1
    for line in f:
        if line[0] == '>':
            after_dash = line.split('_',1)[1]
            new_line = '>' + str(c) + '_' + after_dash
            c += 1
            new_lines.append(new_line)
        else:
            new_lines.append(line)
with open('test_file.txt', 'w') as f:
    f.writelines(new_lines)
Also you can have a look at this split tutorial for more information about how to use split.
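As a small illustration of the split call used above, the second argument limits how many splits are made. The sample line is taken from the question's data.

```python
line = ">3_DL_2021.1214\n"

# Without a limit, every underscore is a split point.
print(line.split("_"))      # ['>3', 'DL', '2021.1214\n']

# With a limit of 1, only the first underscore is used.
prefix, rest = line.split("_", 1)
print(prefix)               # >3
print(repr(rest))           # 'DL_2021.1214\n'
```

Keeping everything after the first underscore in one piece is what lets the code above put a new number in front without touching the rest of the line.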
I want to achieve this specific task. I have 2 files, the first one with emails and credentials:
xavier.desprez#william.com:Xavier
xavier.locqueneux#william.com:vocojydu
xaviere.chevry#pepe.com:voluzigy
Xavier.Therin#william.com:Pussycat5
xiomara.rivera#william.com:xrhj1971
xiomara.rivera#william-honduras.william.com:xrhj1971
and the second one, with emails and location:
xavier.desprez#william.com:BOSNIA
xaviere.chevry#pepe.com:ROMANIA
Whenever an email from the first file is found in the second file, I want the row to be replaced by EMAIL:CREDENTIAL:LOCATION; when it is not found, the row should become EMAIL:CREDENTIAL:BLANK.
So the final file must look like this:
xavier.desprez#william.com:Xavier:BOSNIA
xavier.locqueneux#william.com:vocojydu:BLANK
xaviere.chevry#pepe.com:voluzigy:ROMANIA
Xavier.Therin#william.com:Pussycat5:BLANK
xiomara.rivera#william.com:xrhj1971:BLANK
xiomara.rivera#william-honduras.william.com:xrhj1971:BLANK
I have made several attempts in Python, but they are not worth posting because I am not really close to a solution.
Regards !
EDIT:
This is what I tried:
import os
import sys

with open("test.txt", "r") as a_file:
    for line_a in a_file:
        stripped_email_a = line_a.strip().split(':')[0]
        with open("location.txt", "r") as b_file:
            for line_b in b_file:
                stripped_email_b = line_b.strip().split(':')[0]
                location = line_b.strip().split(':')[1]
                if stripped_email_a == stripped_email_b:
                    a = line_a + ":" + location
                    print(a.replace("\n",""))
                else:
                    b = line_a + ":BLANK"
                    print (b.replace("\n",""))
This is the result I get:
xavier.desprez#william.com:Xavier:BOSNIA
xavier.desprez#william.com:Xavier:BLANK
xaviere.chevry#pepe.com:voluzigy:BLANK
xaviere.chevry#pepe.com:voluzigy:ROMANIA
xavier.locqueneux#william.com:vocojydu:BLANK
xavier.locqueneux#william.com:vocojydu:BLANK
Xavier.Therin#william.com:Pussycat5:BLANK
Xavier.Therin#william.com:Pussycat5:BLANK
xiomara.rivera#william.com:xrhj1971:BLANK
xiomara.rivera#william.com:xrhj1971:BLANK
xiomara.rivera#william-honduras.william.com:xrhj1971:BLANK
xiomara.rivera#william-honduras.william.com:xrhj1971:BLANK
I am very close but I get duplicates ;)
Regards
The duplication comes from reading the two files in a nested way: for every line read from test.txt, you open location.txt again and process all of its lines, printing one result per location line. With two lines in location.txt, each line of test.txt is therefore printed twice.
Instead, read all the necessary data from location.txt once, say into a dictionary, and then use it while reading test.txt:
email_loc_dict = {}
with open("location.txt", "r") as b_file:
    for line_b in b_file:
        splits = line_b.strip().split(':')
        email_loc_dict[splits[0]] = splits[1]
with open("test.txt", "r") as a_file:
    for line_a in a_file:
        line_a = line_a.strip()
        stripped_email_a = line_a.split(':')[0]
        if stripped_email_a in email_loc_dict:
            a = line_a + ":" + email_loc_dict[stripped_email_a]
            print(a)
        else:
            b = line_a + ":BLANK"
            print(b)
Output:
xavier.desprez#william.com:Xavier:BOSNIA
xavier.locqueneux#william.com:vocojydu:BLANK
xaviere.chevry#pepe.com:voluzigy:ROMANIA
Xavier.Therin#william.com:Pussycat5:BLANK
xiomara.rivera#william.com:xrhj1971:BLANK
xiomara.rivera#william-honduras.william.com:xrhj1971:BLANK
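The dictionary approach can be written a little more compactly with dict.get, which returns a default when the key is missing. This is a sketch under stated assumptions: it works on lists of lines instead of files, and the name merge_locations and its default parameter are made up for illustration.

```python
def merge_locations(credential_lines, location_lines, default="BLANK"):
    """Append the location for each email, or the default when none is known."""
    # Build the lookup table once: email -> location.
    lookup = {}
    for line in location_lines:
        email, location = line.strip().split(":", 1)
        lookup[email] = location

    merged = []
    for line in credential_lines:
        line = line.strip()
        email = line.split(":", 1)[0]
        # dict.get returns the default instead of raising KeyError.
        merged.append(line + ":" + lookup.get(email, default))
    return merged

# Two rows from the question's first file and one from the second.
credentials = ["xavier.desprez#william.com:Xavier\n",
               "xavier.locqueneux#william.com:vocojydu\n"]
location_rows = ["xavier.desprez#william.com:BOSNIA\n"]
result = merge_locations(credentials, location_rows)
```

Because the lookup table is built once, each credential line is handled exactly once, which is what removes the duplicates.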
I am very new to Python so please excuse ignorant questions or overly complicated code. :)
I am very thankful for any help.
The code I have so far opens and reads one or several text files, searches the lines for keywords,
and then writes new text files that leave out the lines in which a keyword was found. This is to clean the files (newspaper articles) of information I do not want before analysing the remaining text. The problem is that I am only able to search for single words. However, sometimes I would like to search for a specific combination of words, i.e. not just "Rechte", but "Alle Rechte vorbehalten".
If I add this phrase to my delword list, it doesn't work (I think because part in line.split only checks single words).
Any help is very much appreciated!
import os

delword = ['Quelle:', 'Ressort:', 'Ausgabe:', 'Dokumentnummer:', 'Rechte', 'Alle Rechte vorbehalten']
path = r'C:\files'
pathnew = r'C:\files\new'
dir = []
for f in os.listdir(path):
    if f.endswith(".txt"):
        #print(os.path.join(path, f))
        print(f)
        if f not in dir:
            dir.append(f)
for f in dir:
    fpath = os.path.join(path, f)
    print (fpath)
    fopen = open(fpath, encoding="utf-8", errors='ignore')
    printline = True
    #print(fopen.read())
    fnew = 'clean' + f
    fpathnew = os.path.join(pathnew, fnew)
    with open(fpath, encoding="utf-8", errors='ignore') as input:
        with open(fpathnew, "w", errors='ignore') as output:
            for line in input:
                printline = True
                for part in line.split():
                    for i in range(len(delword)):
                        if delword [i] in part:
                            #line = " ".join((line).split())
                            printline = False
                            #print('Found: ', line)
                if printline == False:
                    output.write('\n')
                if printline == True:
                    output.write(line)
    input.close()
    output.close()
    fopen.close()
For this particular case you don't need to split the line. You can run a similar check directly against the whole line:
for line in input:
    for word in delword:
        if word in line: ...
As a side note: more generic or complex problems are usually solved with regular expressions, a tool created for this kind of processing.
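To illustrate the regular expression route mentioned above, here is a minimal sketch that builds one pattern from the keyword list and tests each whole line against it. It works on an in-memory list of lines; the function name clean, the shortened delword list and the sample lines are assumptions for illustration.

```python
import re

delword = ['Quelle:', 'Ressort:', 'Alle Rechte vorbehalten']

# re.escape keeps special characters literal; "|" means "any of these".
pattern = re.compile("|".join(re.escape(word) for word in delword))

def clean(lines):
    """Replace every line containing a keyword or phrase with an empty line."""
    return ["\n" if pattern.search(line) else line for line in lines]

# Hypothetical sample lines from an article.
sample = ["Ein Artikel\n", "Alle Rechte vorbehalten.\n", "Quelle: Zeitung\n"]
cleaned = clean(sample)
```

Since the pattern is searched in the whole line, multi-word phrases match just like single words.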
I would like to write code that opens and reads a text file and tells me how many "." (full stops) it contains.
I have something like this, but I don't know what to do next:
f = open( "mustang.txt", "r" )
a = []
for line in f:
with open('mustang.txt') as f:
    s = sum(line.count(".") for line in f)
This assumes there is no danger of the file being so large that reading it all at once exhausts your computer's memory (in a production environment where users can select arbitrary files, you may not wish to use this method):
f = open("mustang.txt", "r")
count = f.read().count('.')
f.close()
print(count)
More properly:
with open("mustang.txt", "r") as f:
    count = f.read().count('.')
print(count)
I'd do it like so:
with open('mustang.txt', 'r') as handle:
    count = handle.read().count('.')
If your file isn't too big, just load it into memory as a string and count the dots.
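If the file might be too large to load in one go, one option is to read it in fixed-size chunks and add up the counts. This is a sketch, not something proposed in the answers above; the function name count_char and the chunk size are assumptions, and io.StringIO stands in for an open file so the example is self-contained.

```python
import io

def count_char(stream, char=".", chunk_size=64 * 1024):
    """Count occurrences of char by reading the stream in fixed-size chunks."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:          # read() returns an empty string at end of file
            break
        total += chunk.count(char)
    return total

# io.StringIO stands in for an open text file here.
sample = io.StringIO("One. Two. Three...\n")
dots = count_char(sample, chunk_size=4)
```

Counting a single character is safe across chunk boundaries; counting longer substrings this way would need extra care at the boundaries.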
with open('mustang.txt') as f:
    fullstops = 0
    for line in f:
        fullstops += line.count('.')
This will work:
with open('mustangused.txt') as inf:
    count = 0
    for line in inf:
        count += line.count('.')
print('found %d periods in file.' % count)
You can even use a regular expression:
import re
with open('filename.txt','r') as f:
    # r'\.' matches one dot at a time; '\.+' would count a run such as "..." only once
    c = re.findall(r'\.', f.read())
if c:
    print(len(c))
How do I check for EOF in Python? I found a bug in my code where the last block of text after the separator isn't added to the return list. Or maybe there's a better way of expressing this function?
Here's my code:
def get_text_blocks(filename):
    text_blocks = []
    text_block = StringIO.StringIO()
    with open(filename, 'r') as f:
        for line in f:
            text_block.write(line)
            print line
            if line.startswith('-- -'):
                text_blocks.append(text_block.getvalue())
                text_block.close()
                text_block = StringIO.StringIO()
    return text_blocks
You might find it easier to solve this using itertools.groupby.
def get_text_blocks(filename):
    import itertools
    with open(filename,'r') as f:
        groups = itertools.groupby(f, lambda line:line.startswith('-- -'))
        return [''.join(lines) for is_separator, lines in groups if not is_separator]
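To see what groupby does here, it helps to run it on a small list of lines instead of a file. This is only an illustrative sketch; the sample lines and the helper name is_separator are made up.

```python
import itertools

# Hypothetical sample: two separator lines splitting five text lines.
lines = ["a\n", "b\n", "-- - sep\n", "c\n", "-- - sep\n", "d\n", "e\n"]

def is_separator(line):
    return line.startswith("-- -")

blocks = []
for separator_run, group in itertools.groupby(lines, is_separator):
    # groupby yields one group per run of consecutive lines with the same key.
    if not separator_run:
        blocks.append("".join(group))
```

The separator lines themselves are dropped, and the last block is collected like any other, so no end-of-file check is needed.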
Another alternative is to use a regular expression to match the separators:
def get_text_blocks(filename):
    import re
    separator = re.compile(r'^-- -.*', re.M)
    with open(filename,'r') as f:
        return re.split(separator, f.read())
The end-of-file condition holds as soon as the for statement terminates. That seems the simplest way to fix this code with a minimal change: after the loop, call text_block.getvalue() and append the result (checking first that it is not empty, if you want).
This is the standard problem with emitting buffers.
You don't detect EOF -- that's needless. You write the last buffer.
def get_text_blocks(filename):
    text_blocks = []
    text_block = StringIO.StringIO()
    with open(filename, 'r') as f:
        for line in f:
            text_block.write(line)
            print line
            if line.startswith('-- -'):
                text_blocks.append(text_block.getvalue())
                text_block.close()
                text_block = StringIO.StringIO()
    ### At this moment, you are at EOF
    # StringIO objects do not support len(), so test the buffered text instead
    last_block = text_block.getvalue()
    if last_block:
        text_blocks.append(last_block)
    ### Now your final block (if any) is appended.
    return text_blocks
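The "write the last buffer" pattern is independent of StringIO and of files. A minimal sketch with a plain list as the buffer, assuming the same '-- -' marker; the function name split_blocks and the sample lines are hypothetical.

```python
def split_blocks(lines, marker="-- -"):
    """Group lines into blocks; a line starting with marker ends the current block."""
    blocks = []
    buffer = []
    for line in lines:
        buffer.append(line)
        if line.startswith(marker):
            blocks.append("".join(buffer))   # emit a completed block
            buffer = []
    if buffer:                               # loop ended: flush what is left
        blocks.append("".join(buffer))
    return blocks

# Hypothetical sample: one separator followed by a trailing block.
sample = ["a\n", "-- - 1\n", "b\n", "c\n"]
result = split_blocks(sample)
```

The check after the loop is the whole fix: whatever is still buffered when the input runs out becomes the final block.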
Why do you need StringIO here?
def get_text_blocks(filename):
    text_blocks = [""]
    with open(filename, 'r') as f:
        for line in f:
            if line.startswith('-- -'):
                text_blocks.append(line)
            else:
                text_blocks[-1] += line
    return text_blocks
EDIT: Fixed the function. Other suggestions might be better; I just wanted to write a function similar to the original one.
EDIT: I had assumed the file starts with "-- -". Adding an empty string to the list "fixes" the IndexError that occurs otherwise, or you could use this one:
def get_text_blocks(filename):
    text_blocks = []
    with open(filename, 'r') as f:
        for line in f:
            if line.startswith('-- -'):
                text_blocks.append(line)
            else:
                if len(text_blocks) != 0:
                    text_blocks[-1] += line
    return text_blocks
But both versions look a bit ugly to me; the regex version is much cleaner.
This is a fast way to check whether you are at the end of a file (an empty file is at EOF right away):
if f.read(1) == '':
    print("EOF")
f.close()
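The behaviour this check relies on, read returning an empty string once nothing is left, can be seen with an in-memory stream. This is a small sketch; io.StringIO stands in for an open text file.

```python
import io

stream = io.StringIO("ab")   # stands in for an open text file

first = stream.read(1)       # 'a'
second = stream.read(1)      # 'b'
at_end = stream.read(1)      # '' because nothing is left to read

if at_end == "":
    print("EOF")
```

Keep in mind that each read(1) consumes a character, so when the result is not empty that character has to be handled, not discarded.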