I have a test.txt file that contains:
yellow.blue.purple.green
red.blue.red.purple
And I'd like to have in output.txt just the second and third part of each line, like this:
blue.purple
blue.red
Here is my Python code:
with open('test.txt', 'r') as file1, open('output.txt', 'w') as file2:
    for line in file1:
        file2.write(line.partition('.')[2] + '\n')
but the result is:
blue.purple.green
blue.red.purple
How is it possible to take only the second and third part of each line?
Thanks
You may want
with open('test.txt', 'r') as file1, open('output.txt', 'w') as file2:
    for line in file1:
        file2.write(".".join(line.split('.')[1:3]) + '\n')
When you apply split('.') to a line, e.g. yellow.blue.purple.green, you get a list of values:
["yellow", "blue", "purple", "green"]
By slicing [1:3], you get the second and third items.
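To see the slicing on its own, outside the file loop:

```python
line = "yellow.blue.purple.green"
parts = line.split('.')            # ['yellow', 'blue', 'purple', 'green']
second_and_third = parts[1:3]      # ['blue', 'purple']
print(".".join(second_and_third))  # prints: blue.purple
```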
First I created a .txt file with the same data you entered in your original .txt file. I used the 'w' mode to write the data, as you already know. I also created an empty list that we will use to store the data and later write to the output.txt file.
output_write = []

with open('test.txt', 'w') as file_object:
    file_object.write('yellow.' + 'blue.' + 'purple.' + 'green')
    file_object.write('\nred.' + 'blue.' + 'red.' + 'purple')
Next I opened the text file that I created and used 'r' mode to read it, as you already know. To get the output you wanted, I read each line of the file in a for loop, and split each line on the period ('.') to create a list of the items in that line. Since you want the second and third items in each line, I stored the slice covering index [1] and index [2] in a variable called new_read. That gives the two pieces of data you want to write to your output file; I joined them with a period into a variable called output_data, and lastly appended the output data to the empty list we created earlier.
with open('test.txt', 'r') as file_object:
    for line in file_object:
        read = line.split('.')
        new_read = read[1:3]
        output_data = new_read[0] + '.' + new_read[1]
        output_write.append(output_data)
Lastly we can write this data to a file called 'output.txt' as you noted earlier.
with open('output.txt', 'w') as file_object:
    file_object.write(output_write[0])
    file_object.write('\n' + output_write[1])

print(output_write[0])
print(output_write[1])
Lastly I print the data just to check the output:
blue.purple
blue.red
Related
I have a script that pulls data and writes it into a TXT file. Then, in the same script, I have a for loop that changes the format by replacing single quotes with double quotes and concatenates the result with some text in another new file.
with open('myfile.txt', 'w') as f:
    print(response['animals']['mammals'], file=f)

fout = open("mynewfile.txt", "wt")
f = open('myfile.txt', 'r')
for line in f:
    x = str(line).replace("'", '"')
    fout.write(f"mammals = {x}")
f.close()
fout.close()
The result is basically that everything in myfile.txt with single quotes, e.g. ['dog', 'cat'], is edited and written to mynewfile.txt as mammals = ["dog", "cat"], which is cool. But I also want to manually add some other text to mynewfile.txt, and every time I need to update that data and run the script, the text I entered manually is deleted because of the for loop.
Is there a way to overwrite just that line without touching the rest of the lines in the file?
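One approach, sketched here under the assumption that the generated line always starts with `mammals = ` and that the file is small enough to read into memory: read all the lines, replace only the matching one, and write everything back, leaving the manually added lines untouched.

```python
# Hypothetical starting file: one generated line plus a manual note.
with open("mynewfile.txt", "w") as f:
    f.write('mammals = ["dog"]\n')
    f.write("my manual note\n")

new_value = '["dog", "cat"]'  # hypothetical updated data

with open("mynewfile.txt", "r") as f:
    lines = f.readlines()

with open("mynewfile.txt", "w") as f:
    for line in lines:
        if line.startswith("mammals = "):
            f.write(f"mammals = {new_value}\n")  # overwrite just this line
        else:
            f.write(line)                        # keep manual lines as-is
```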
Get the number of strings in a list and start indexing from last_word + 1
I have 5 files from which I want to take each line (24 lines in total) and save it to a new file. I managed to find code that will do this, but as it is, every time I have to manually change the number of the appropriate original file, of the file I want to save to, and of each line.
The code:
import numpy as np

x1 = np.loadtxt("x_p2_40.txt")
x2 = np.loadtxt("x_p4_40.txt")
x3 = np.loadtxt("x_p6_40.txt")
x4 = np.loadtxt("x_p8_40.txt")
x5 = np.loadtxt("x_p1_40.txt")

with open("x_p1_40.txt", "r") as file:
    content = file.read()
first_line = content.split('\n', 1)[0]

with open("1_p_40_x.txt", "a") as f:
    f.write("\n")
with open("1_p_40_x.txt", "a") as fa:
    fa.write(first_line)

print(first_line)
I am a beginner at Python and I'm not sure how to write a loop for this, because I assume I need a loop?
Thank you!
Since you have multiple files here, you could define their names in a list, and use a list comprehension to open file handles to them all:
input_files = ["x_p2_40.txt", "x_p4_40.txt", "x_p6_40.txt", "x_p8_40.txt", "x_p1_40.txt"]
file_handles = [open(f, "r") for f in input_files]
Since each of these file handles is an iterator that yields a single line every time you iterate over it, you could simply zip() all these file handles to iterate over them simultaneously. Also throw in an enumerate() to get the line numbers:
for line_num, files_lines in enumerate(zip(*file_handles), 1):
    out_file = f"{line_num}_p_40.txt"
    # Remove trailing whitespace on all lines, then add a newline
    files_lines = [f.rstrip() + "\n" for f in files_lines]
    with open(out_file, "w") as of:
        of.writelines(files_lines)
With three files:
x_p2_40.txt:
2_1
2_2
2_3
2_4
x_p4_40.txt:
4_1
4_2
4_3
4_4
x_p6_40.txt:
6_1
6_2
6_3
6_4
I get the following output:
1_p_40.txt:
2_1
4_1
6_1
2_p_40.txt:
2_2
4_2
6_2
3_p_40.txt:
2_3
4_3
6_3
4_p_40.txt:
2_4
4_4
6_4
Finally, since we didn't use a context manager to open the original file handles, remember to close them after we're done:
for fh in file_handles:
    fh.close()
If you have files with an unequal number of lines and you want to create files for all lines, consider using itertools.zip_longest() instead of zip().
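A quick sketch of the difference, using plain lists in place of the file handles:

```python
from itertools import zip_longest

a = ["2_1\n", "2_2\n", "2_3\n"]
b = ["4_1\n", "4_2\n"]  # one line shorter

# zip() would stop after two pairs; zip_longest() pads the shorter
# input with fillvalue, so a third output file would still be created.
for pair in zip_longest(a, b, fillvalue=""):
    print([s.rstrip() for s in pair])
# prints:
# ['2_1', '4_1']
# ['2_2', '4_2']
# ['2_3', '']
```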
In order to read each of your input files, you can store their names in a list and iterate over it with a for loop. Then we add every line to a single list with extend():
inputFiles = ["x_p2_40.txt", "x_p4_40.txt", "x_p6_40.txt", "x_p8_40.txt", "x_p1_40.txt"]
outputFile = "outputfile.txt"

lines = []
for filename in inputFiles:
    with open(filename, 'r') as f:
        lines.extend(f.readlines())
        lines[-1] += '\n'
Finally you can write all the lines to your output file:
with open(outputFile, 'w') as f:
    f.write(''.join(lines))
I have two files.
One file has two columns (let's call it db), and the other one has one column (let's call it in).
The second column in db is the same type as the column in in, and both files are sorted by this column.
db for example:
RPL24P3 NG_002525
RPLP1P1 NG_002526
RPL26P4 NG_002527
VN2R11P NG_006060
VN2R12P NG_006061
VN2R13P NG_006062
VN2R14P NG_006063
in for example:
NG_002527
NG_006062
I want to read through these files and get the output as follows:
NG_002527: RPL26P4
NG_006062: VN2R13P
Meaning that I'm iterating over the lines of in and trying to find the matching line in db.
The code I have written for that is:
with open(db_file, 'r') as db, open(sortIn, 'r') as inF, open(out_file, 'w') as outF:
    for line in inF:
        for dbline in db:
            if len(dbline) > 1:
                dbline = dbline.split('\t')
                if line.rstrip('\n') == dbline[db_specifications[0]]:
                    outF.write(dbline[db_specifications[0]] + ': ' + dbline[db_specifications[1]] + '\n')
                    break
*db_specifications isn't relevant to this problem, hence I didn't copy the code related to it; the problem doesn't lie there.
The current code finds a match and writes it as planned for the first line of in, but won't find any matches for the other lines. I suspect it has to do with break, but I can't figure out what to change.
Since the data in db_file is sorted by the second column, you can use this code to read the file.
with open("xyz.txt", "r") as db_file, open("abc.txt", "r") as sortIn, open("out.txt", 'w') as outF:
    # first read the sortIn file as a list
    i_list = [line.strip() for line in sortIn.readlines()]
    # for each record read from the file, split the values into key and value
    for line in db_file:
        t_key, t_val = line.strip().split(' ')
        # if value is in i_list, then write to the output file
        if t_val in i_list:
            outF.write(t_val + ': ' + t_key + '\n')
        # if value has reached the max value in the sort list,
        # then you don't need to read the db_file anymore
        if t_val == i_list[-1]:
            break
The output file will have the following items:
NG_002527: RPL26P4
NG_006062: VN2R13P
In the above code, we read the sortIn list first, then read each line of db_file. i_list[-1] holds the max value of the sortIn file, since the sortIn file is also sorted in ascending order.
The above code does less I/O than the one below.
===========
previous answer submission:
Based on how the data has been stored in db_file, it looks like we have to read the entire file to check against the sortIn file. If the values in db_file were sorted by the second column, we could stop reading the file once the last item in sortIn was found.
With the assumption that we need to read all records from the files, see if the below code works for you.
with open("xyz.txt", "r") as db_file, open("abc.txt", "r") as sortIn, open("out.txt", 'w') as outF:
    # read the db_file and convert it into a dictionary
    d_list = dict([line.strip().split(' ') for line in db_file.readlines()])
    # read the sortIn file as a list
    i_list = [line.strip() for line in sortIn.readlines()]
    # check if the value of each entry in d_list is one of the items in i_list
    out_list = [v + ': ' + k for k, v in d_list.items() if v in i_list]
    # out_list is your final list that needs to be written into a file
    # now read out_list and write each item into the file
    for i in out_list:
        outF.write(i + '\n')
The output file will have the following items:
NG_002527: RPL26P4
NG_006062: VN2R13P
To help you, I have also printed the contents of d_list, i_list, and out_list.
The contents in d_list will look like this:
{'RPL24P3': 'NG_002525', 'RPLP1P1': 'NG_002526', 'RPL26P4': 'NG_002527', 'VN2R11P': 'NG_006060', 'VN2R12P': 'NG_006061', 'VN2R13P': 'NG_006062', 'VN2R14P': 'NG_006063'}
The contents in i_list will look like this:
['NG_002527', 'NG_006062']
The contents that get written into the outF file from out_list will look like this:
['NG_002527: RPL26P4', 'NG_006062: VN2R13P']
I was able to solve the problem by inserting the following line:
line = next(inF)
before the break statement.
Let's say I have a file with 48,222 lines, and I give an index value, say 21,000.
Is there any way in Python to "move" the contents of the file starting from index 21,000, so that I end up with two files, the original one and a new one, where the original now has 21,000 lines and the new one 27,222 lines?
I read this post, which uses partition and quite closely describes what I want:
with open("inputfile") as f:
    contents1, sentinel, contents2 = f.read().partition("Sentinel text\n")
with open("outputfile1", "w") as f:
    f.write(contents1)
with open("outputfile2", "w") as f:
    f.write(contents2)
Except that (1) it uses "Sentinel text" as a separator, and (2) it creates two new files and requires me to delete the old one. As of now, this is how I do it:
for r in result.keys():  # the filenames are in my dictionary, don't bother with that
    f = open(r)
    lines = f.readlines()
    f.close()
    with open("outputfile1.txt", "w") as fn:
        for line in lines[0:21000]:
            fn.write(line)  # write each line
    with open("outputfile2.txt", "w") as fn:
        for line in lines[21000:]:
            fn.write(line)  # write each line
That is quite a lot of manual work. Is there a built-in or more efficient way?
You can also use writelines() and dump the sliced list of lines from 0 to 20,999 into one file and another slice from 21,000 to the end into another file.
with open("inputfile") as f:
    content = f.readlines()

content1 = content[:21000]
content2 = content[21000:]

with open("outputfile1.txt", "w") as fn1:
    fn1.writelines(content1)
with open('outputfile2.txt', 'w') as fn2:
    fn2.writelines(content2)
I have a file that I am currently reading from using
fo = open("file.txt", "r")
Then by doing
file = open("newfile.txt", "w")
file.write(fo.read())
file.write("Hello at the end of the file")
fo.close()
file.close()
I basically copy the file to a new one, but also add some text at the end of the newly created file. How would I be able to insert that line, say, in between two lines separated by an empty line? I.e.:
line 1 is right here
<---- I want to insert here
line 3 is right here
Can I tokenize different sentences by a delimiter like \n for new line?
First load the file using the open() method and apply the .readlines() method, which splits on "\n" and returns a list; note that each element keeps its trailing newline. Then update the list of strings by inserting a new string at the desired position, and simply write the contents of the list to the new file using new_file.write("".join(updated_list)) (joining with "\n" would double the newlines, since each line already ends with one).
NOTE: This method will only work for files which can be loaded into memory.
with open("filename.txt", "r") as prev_file, open("new_filename.txt", "w") as new_file:
    prev_contents = prev_file.readlines()
    # Now prev_contents is a list of strings and you may add the new line at any position
    prev_contents.insert(4, "This is a new line\n")
    new_file.write("".join(prev_contents))
readlines() is not recommended here because it reads the whole file into memory. It is also not needed, because you can iterate over the file directly.
The following code will insert the text Hello at line 2 as line number 2:
with open('file.txt', 'r') as f_in:
    with open('file2.txt', 'w') as f_out:
        for line_no, line in enumerate(f_in, 1):
            if line_no == 2:
                f_out.write('Hello at line 2\n')
            f_out.write(line)
Note the use of the with open('filename', 'w') as filevar idiom. This removes the need for an explicit close(), because it closes the file automatically at the end of the block; better still, it does this even if there is an exception.
For a large file (iterating over the file directly rather than loading it all with readlines(), and counting from 1 so the insert lands at line 2):
with open("s.txt", "r") as inp, open("s1.txt", "w") as ou:
    for a, d in enumerate(inp, 1):
        if a == 2:
            ou.write("hi there\n")
        ou.write(d)
You could use a marker:
#FILE1
line 1 is right here
<INSERT_HERE>
line 3 is right here
#FILE2
some text
with open("FILE1") as file:
    original = file.read()
with open("FILE2") as input:
    myinsert = input.read()

newfile = original.replace("<INSERT_HERE>", myinsert)

with open("FILE1", "w") as replaced:
    replaced.write(newfile)
#FILE1
line 1 is right here
some text
line 3 is right here