Reading a file, making some changes and writing the results back - python

I have an input file (File A) as shown below:
Start of the program
This is my first program ABCDE
End of the program
I receive the program name 'PYTHON' as input, and I need to replace 'ABCDE' with it. So I read the file to find the word 'program' and then replace the string after it as shown below. I have done that in my program. Then, I would like to write the updated string to the original file without changing lines 1 or 3 - just line 2.
Start of the program
This is my first program PYTHON
End of the program
My code:
fileName1 = open(filePath1, "r")
search = "program"
for line in fileName1:
    if search in line:
        line = line.split(" ")
        update = line[5].replace(line[5], input)
        temp = " ".join(line[:5]) + " " + update
        fileName1 = open(filePath1, "r+")
        fileName1.write(temp)
        fileName1.close()
    else:
        fileName1 = open(filePath1, "w+")
        fileName1.write(line)
        fileName1.close()
I am sure this can be done in an elegant way, but I got a little confused with reading and writing as I experimented with the above code. The output is not as expected. What is wrong with my code?

You can do this with a simple replace:
file_a.txt
Start of the program
This is my first program ABCDE
End of the program
code:
with open('file_a.txt', 'r') as file_handle:
    file_content = file_handle.read()
orig_str = 'ABCDE'
rep_str = 'PYTHON'
result = file_content.replace(orig_str, rep_str)
# print(result)
with open('file_a.txt', 'w') as file_handle:
    file_handle.write(result)
If replacing every occurrence of ABCDE is too broad (it may appear in other parts of the file as well), you can use a more specific pattern or even a regular expression to target the replacement more accurately.
For example, here we replace ABCDE only if it comes directly after program:
with open('file_a.txt', 'r') as file_handle:
    file_content = file_handle.read()
orig_str = 'ABCDE'
rep_str = 'PYTHON'
result = file_content.replace('program {}'.format(orig_str),
                              'program {}'.format(rep_str))
# print(result)
with open('file_a.txt', 'w') as file_handle:
    file_handle.write(result)
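If the old value (ABCDE) is not known in advance, the regular-expression route mentioned above could look roughly like this. It is only a sketch: the helper name replace_word_after is made up for illustration, and it assumes the value to replace is the single word that follows "program " on the same line.

```python
import re

def replace_word_after(text, keyword, new_word):
    """Replace the single word that follows `keyword ` with `new_word`.

    Hypothetical helper for illustration; not part of the original answer.
    """
    pattern = re.compile(r"(\b" + re.escape(keyword) + r" )\S+")
    # Using a function as the replacement avoids surprises if new_word
    # contains backslashes or group references.
    return pattern.sub(lambda m: m.group(1) + new_word, text)

sample = (
    "Start of the program\n"
    "This is my first program ABCDE\n"
    "End of the program\n"
)
updated = replace_word_after(sample, "program", "PYTHON")
print(updated)
```

Lines 1 and 3 are left alone because "program" is followed there by a newline rather than a space and a word. The result can then be written back with open('file_a.txt', 'w') as in the answer above.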

Related

How to write a line to read a certain sentence in a text file?

I am currently learning how to code in Python and I need help with something.
Is there a way to make the script read only the lines that start with Text = .. ?
The text file contains a lot of other sentences, but I only want the program to focus on the lines that start with Text = .. and print them, ignoring the other lines in the text file.
For example,
in the text file:
min = 32.421
text = " Hello I am Robin and I am hungry"
max = 233341.42
This is the output I want:
Hello I am Robin and I am hungry
I want the output to be just the sentence, without the quotation marks and without text =
This is my code so far, after reading through the comments:
import os
import sys
import glob
from english_words import english_words_set
try:
    print('Finding file...')
    file = glob.glob(sys.argv[1])
    print("Found " + str(len(file)) + " file!")
    print('LOADING NOW...')
    with open(file) as f:
        lines = f.read()
        for line in lines:
            if line.startswith('Text = '):
                res = line.split('"')[1]
                print(res)
You can read the text file line by line like so:
# open file
with open('text_file.txt') as f:
    # store the list of lines contained in file
    lines = f.readlines()
for line in lines:
    # find match
    if line.startswith('text ='):
        # store the string inside double quotes, without surrounding spaces
        res = line.split('"')[1].strip()
        print(res)
This should print your expected output.
You can open the file, check whether a line begins with the word "text", and then extract the value like this:
with open("file.txt", "r") as file:  # change file.txt to the file's path
    for line in file:  # for each line in the file
        if line.startswith("text"):  # checks whether the line begins with "text"
            text = line.strip()  # removes surrounding whitespace from the line
            text = text.replace("text = \"", "")  # removes the part before the string
            text = text.replace("\"", "")  # removes the closing quote
            print(text)
Or you could convert the file from free-form text to a structured format such as TOML or YAML. TOML can be read with the standard-library tomllib module in Python 3.11+, while YAML needs a third-party package such as PyYAML. Either way the file layout stays about the same, and the parsed data is stored in a dictionary instead of a string.
List comprehensions in python:
https://www.youtube.com/watch?v=3dt4OGnU5sM
Using list comprehension with files:
https://www.youtube.com/watch?v=QHFWb_6fHOw
First learn list comprehensions, then the idea is this:
listOutput = ['''min = 32.421
text = "Hello I am Robin and I am hungry"
max = 233341.42''']
myText = ''.join(listOutput)
indexFirst= myText.find("text") + 8 # add 8 to this index to discard {text = "}
indexLast = myText.find('''"''', indexFirst) # locate next quote since indexFirst position
print(myText[indexFirst:indexLast])
Output:
Hello I am Robin and I am hungry
with open(file) as f:
    lines = f.read().split("\n")
prefix = "text = "
for line in lines:
    if line.startswith(prefix):
        # replaces the first occurrence of prefix and assigns it to result
        result = line.replace(prefix, '', 1)
        print(result)
Alternatively, you could use result = line.removeprefix(prefix), but removeprefix is only available in Python 3.9 and later.
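For Python versions before 3.9, slicing gives the same effect as removeprefix. The following is a small sketch rather than a drop-in solution: strip_prefix is a hypothetical helper, the lines list stands in for the file contents, and the extra strip calls are there because the question asks for the sentence without the quotes.

```python
def strip_prefix(line, prefix):
    """Return line without a leading prefix (hypothetical helper, works before Python 3.9)."""
    if line.startswith(prefix):
        return line[len(prefix):]
    return line

# Stand-in for the lines read from the text file in the question.
lines = [
    'min = 32.421',
    'text = " Hello I am Robin and I am hungry"',
    'max = 233341.42',
]
prefix = "text = "

# Keep only matching lines, drop the prefix, then the quotes and stray spaces.
results = [
    strip_prefix(line, prefix).strip('"').strip()
    for line in lines
    if line.startswith(prefix)
]
print(results)
```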

Find matches and add a column

I want to achieve a specific task. I have two files. The first one contains emails and credentials:
xavier.desprez#william.com:Xavier
xavier.locqueneux#william.com:vocojydu
xaviere.chevry#pepe.com:voluzigy
Xavier.Therin#william.com:Pussycat5
xiomara.rivera#william.com:xrhj1971
xiomara.rivera#william-honduras.william.com:xrhj1971
The second one contains emails and locations:
xavier.desprez#william.com:BOSNIA
xaviere.chevry#pepe.com:ROMANIA
Whenever an email from the first file is found in the second file, I want the row to be replaced by EMAIL:CREDENTIAL:LOCATION. When it is not found, the row should become EMAIL:CREDENTIAL:BLANK.
So the final file must look like this:
xavier.desprez#william.com:Xavier:BOSNIA
xavier.locqueneux#william.com:vocojydu:BLANK
xaviere.chevry#pepe.com:voluzigy:ROMANIA
Xavier.Therin#william.com:Pussycat5:BLANK
xiomara.rivera#william.com:xrhj1971:BLANK
xiomara.rivera#william-honduras.william.com:xrhj1971:BLANK
I have made several attempts in Python, but they are not worth posting because I am not really close to a solution.
Regards!
EDIT:
This is what I tried:
import os
import sys
with open("test.txt", "r") as a_file:
    for line_a in a_file:
        stripped_email_a = line_a.strip().split(':')[0]
        with open("location.txt", "r") as b_file:
            for line_b in b_file:
                stripped_email_b = line_b.strip().split(':')[0]
                location = line_b.strip().split(':')[1]
                if stripped_email_a == stripped_email_b:
                    a = line_a + ":" + location
                    print(a.replace("\n",""))
                else:
                    b = line_a + ":BLANK"
                    print (b.replace("\n",""))
This is the result I get:
xavier.desprez#william.com:Xavier:BOSNIA
xavier.desprez#william.com:Xavier:BLANK
xaviere.chevry#pepe.com:voluzigy:BLANK
xaviere.chevry#pepe.com:voluzigy:ROMANIA
xavier.locqueneux#william.com:vocojydu:BLANK
xavier.locqueneux#william.com:vocojydu:BLANK
Xavier.Therin#william.com:Pussycat5:BLANK
Xavier.Therin#william.com:Pussycat5:BLANK
xiomara.rivera#william.com:xrhj1971:BLANK
xiomara.rivera#william.com:xrhj1971:BLANK
xiomara.rivera#william-honduras.william.com:xrhj1971:BLANK
xiomara.rivera#william-honduras.william.com:xrhj1971:BLANK
I am very close but I get duplicates ;)
Regards
The duplication comes from reading the two files in a nested way. Once a line from test.txt is read, you open location.txt and process every line in it, printing one result per location line. Then you read the second line from test.txt, re-open location.txt, and process it all over again.
Instead, load the necessary data from location.txt first, say into a dictionary, and then look it up while reading test.txt:
email_loc_dict = {}
with open("location.txt", "r") as b_file:
    for line_b in b_file:
        splits = line_b.strip().split(':')
        email_loc_dict[splits[0]] = splits[1]
with open("test.txt", "r") as a_file:
    for line_a in a_file:
        line_a = line_a.strip()
        stripped_email_a = line_a.split(':')[0]
        if stripped_email_a in email_loc_dict:
            a = line_a + ":" + email_loc_dict[stripped_email_a]
            print(a)
        else:
            b = line_a + ":BLANK"
            print(b)
Output:
xavier.desprez#william.com:Xavier:BOSNIA
xavier.locqueneux#william.com:vocojydu:BLANK
xaviere.chevry#pepe.com:voluzigy:ROMANIA
Xavier.Therin#william.com:Pussycat5:BLANK
xiomara.rivera#william.com:xrhj1971:BLANK
xiomara.rivera#william-honduras.william.com:xrhj1971:BLANK
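The same dictionary lookup can be written a little more compactly with dict.get and a default value. This is a sketch under a few assumptions: merge_locations is a made-up name, the inputs are passed as lists of lines instead of open files so the example is self-contained, and blank lines are skipped.

```python
def merge_locations(credential_lines, location_lines, default="BLANK"):
    """Append the location (or a default) to each EMAIL:CREDENTIAL line.

    Hypothetical helper for illustration only.
    """
    email_to_location = {}
    for line in location_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        email, _, location = line.partition(":")
        email_to_location[email] = location

    merged = []
    for line in credential_lines:
        line = line.strip()
        if not line:
            continue
        email = line.split(":", 1)[0]
        # dict.get returns the default when the email has no known location.
        merged.append(f"{line}:{email_to_location.get(email, default)}")
    return merged

credential_rows = [
    "xavier.desprez#william.com:Xavier\n",
    "xavier.locqueneux#william.com:vocojydu\n",
    "xaviere.chevry#pepe.com:voluzigy\n",
]
location_rows = [
    "xavier.desprez#william.com:BOSNIA\n",
    "xaviere.chevry#pepe.com:ROMANIA\n",
]
merged = merge_locations(credential_rows, location_rows)
print("\n".join(merged))
```

Because open file objects are iterable line by line, the two file handles from the answer above could be passed in place of the lists.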

How can I find a line by two consecutive words in a text file?

I am very new to Python so please excuse ignorant questions or overly complicated code. :)
I am very thankful for any help.
The code I have so far opens and reads one or several text files, searches the lines for keywords, and then writes new text files that leave out the lines containing those keywords. This is to clean the files (newspaper articles) of information I do not want before analysing the remaining text. The problem is that I am only able to search for single words. However, sometimes I would like to search for a specific combination of words, i.e. not just "Rechte", but "Alle Rechte vorbehalten".
If I save this phrase into my delword list, it doesn't work (I think because the part in line.split() loop only checks single words).
Any help is very much appreciated!
import os
delword = ['Quelle:', 'Ressort:', 'Ausgabe:', 'Dokumentnummer:', 'Rechte', 'Alle Rechte vorbehalten']
path = r'C:\files'
pathnew = r'C:\files\new'
dir = []
for f in os.listdir(path):
    if f.endswith(".txt"):
        #print(os.path.join(path, f))
        print(f)
        if f not in dir:
            dir.append(f)
for f in dir:
    fpath = os.path.join(path, f)
    print (fpath)
    fopen = open(fpath, encoding="utf-8", errors='ignore')
    printline = True
    #print(fopen.read())
    fnew = 'clean' + f
    fpathnew = os.path.join(pathnew, fnew)
    with open(fpath, encoding="utf-8", errors='ignore') as input:
        with open(fpathnew, "w", errors='ignore') as output:
            for line in input:
                printline = True
                for part in line.split():
                    for i in range(len(delword)):
                        if delword [i] in part:
                            #line = " ".join((line).split())
                            printline = False
                            #print('Found: ', line)
                if printline == False:
                    output.write('\n')
                if printline == True:
                    output.write(line)
    input.close()
    output.close()
    fopen.close()
For this particular case you don't need to split the line. You can run the same check against the whole line, which also matches multi-word phrases:
for line in input:
    for word in delword:
        if word in line: ...
As a side note: more generic or complex matching problems are usually solved with regular expressions, a tool created for this kind of processing.
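A regular-expression version of the same filter might look like the sketch below. It makes some assumptions that are not in the question: the standalone 'Rechte' entry is left out so that only the full phrase removes a line, matching lines are dropped entirely instead of being replaced by an empty line, and clean_lines and the sample lines are invented for the demonstration.

```python
import re

# Keywords and phrases from the question; 'Rechte' on its own is omitted here
# (an assumption) so that only the whole phrase triggers removal.
delword = ['Quelle:', 'Ressort:', 'Ausgabe:', 'Dokumentnummer:', 'Alle Rechte vorbehalten']

# re.escape makes every entry match literally; "|" joins them into one alternation.
pattern = re.compile("|".join(re.escape(word) for word in delword))

def clean_lines(lines):
    """Return only the lines that contain none of the keywords or phrases."""
    return [line for line in lines if not pattern.search(line)]

# Made-up sample lines for illustration.
sample = [
    "Quelle: Tageszeitung\n",
    "Der Artikel beginnt hier.\n",
    "Alle Rechte vorbehalten\n",
    "Die Rechte der Leser\n",
]
cleaned = clean_lines(sample)
print("".join(cleaned), end="")
```

Compiling one combined pattern means each line is scanned once, instead of once per keyword.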

How does readlines() work in Python?

I don't understand how readlines() works in Python, in this case Python 2. Let me explain: I have the following function in a file that I use from other files via "import", like a package.
def openFileForReading(filePath):
    if not fileExists(filePath):
        print 'The file, ' + filePath + 'does not exist - cannot read it.'
        return ''
    else:
        fileHandle = open(filePath, 'r')
        return fileHandle
In my new program, I do this:
openFileRead = openFileForReading("Orden.txt")
lineList = openFileRead.readlines()
print lineList
And the output it gives me is:
[]
But if I do this, directly in the file, without using my package function, it works:
fileHandle = open("Orden.txt", 'r')
lineList = fileHandle.readlines()
print lineList
Why does it work when I do this directly, but not when I go through a function from a package?
P.S.: The "Orden.txt" file is not empty; it has two lines:
Orden.txt
Line number 1
Line number 2

Python using re module to parse an imported text file

def regexread():
    import re
    result = ''
    savefileagain = open('sliceeverfile3.txt','w')
    #text=open('emeverslicefile4.txt','r')
    text='09,11,14,34,44,10,11, 27886637, 0\n561, Tue, 5,Feb,2013, 06,25,31,40,45,06,07, 19070109, 0\n560, Fri, 1,Feb,2013, 05,21,34,37,38,01,06, 13063500, 0\n559, Tue,29,Jan,2013,'
    pattern='\d\d,\d\d,\d\d,\d\d,\d\d,\d\d,\d\d'
    #with open('emeverslicefile4.txt') as text:
    f = re.findall(pattern,text)
    for item in f:
        print(item)
    savefileagain.write(item)
    #savefileagain.close()
The above function as written parses the text and returns sets of seven numbers. I have three problems.
Firstly, the 'read' file, which contains exactly the same text as text='09,...etc', raises a TypeError (expected string or buffer), which I cannot solve even after reading some of the posts.
Secondly, when I try to write the results to the 'write' file, nothing is written.
Thirdly, I am not sure how to get the same output in the file that I get with the print statement, which is three lines of seven numbers each. That is the output I want.
This should do the trick:
import re
filename = 'sliceeverfile3.txt'
# Raw string so the backslashes reach the regex engine unchanged
pattern = r'\d\d,\d\d,\d\d,\d\d,\d\d,\d\d,\d\d'
new_file = []
# Make sure file gets closed after being iterated
with open(filename, 'r') as f:
    # Read the file contents and generate a list with each line
    lines = f.readlines()
# Iterate each line
for line in lines:
    # Regex applied to each line
    match = re.search(pattern, line)
    if match:
        # Make sure to add \n to display correctly when we write it back
        new_line = match.group() + '\n'
        print(new_line)
        new_file.append(new_line)
with open(filename, 'w') as f:
    # go to start of file
    f.seek(0)
    # actually write the lines
    f.writelines(new_file)
You're sort of on the right track...
You'll iterate over the file:
How to iterate over the file in python
and apply the regex to each line. The link above should really answer all 3 of your questions once you realize you're writing 'item' outside of that loop, where it holds only the last match.
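Putting the three points together, one possible shape is sketched below. The TypeError most likely comes from passing the file object itself to re.findall, which expects a string, so the sketch calls read() first. extract_to_file and the temporary file names are invented for the example; the sample text is the one from the question.

```python
import os
import re
import tempfile

# Seven two-digit numbers separated by commas (same meaning as the question's pattern).
PATTERN = re.compile(r"\d\d(?:,\d\d){6}")

def extract_to_file(src_path, dst_path):
    """Copy every seven-number set from src_path to dst_path, one per line."""
    with open(src_path) as src:
        text = src.read()  # a str, which is what re.findall needs
    matches = PATTERN.findall(text)
    # Leaving the with block closes the file, which flushes the data to disk.
    with open(dst_path, "w") as dst:
        dst.writelines(match + "\n" for match in matches)
    return matches

SAMPLE = (
    "09,11,14,34,44,10,11, 27886637, 0\n"
    "561, Tue, 5,Feb,2013, 06,25,31,40,45,06,07, 19070109, 0\n"
    "560, Fri, 1,Feb,2013, 05,21,34,37,38,01,06, 13063500, 0\n"
    "559, Tue,29,Jan,2013,"
)

# Demonstration with temporary files so nothing on disk is overwritten.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "input.txt")
    dst_path = os.path.join(tmp, "output.txt")
    with open(src_path, "w") as f:
        f.write(SAMPLE)
    matches = extract_to_file(src_path, dst_path)
    with open(dst_path) as f:
        written = f.read()

print(written, end="")
```

Writing each match with its own "\n" is what gives the file the same three-line layout as the printed output.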
