Replace keys with values and write to a new file in Python

I've got a problem trying to replace the dictionary keys that appear in a file with their corresponding values. More details: an input file called event_from_picks looks like:
EVENT2593
EVENT2594
EVENT2595
EVENT41025
EVENT2646
EVENT2649
Also, my dictionary, created by reloc_event_coords_dic(), looks like:
{'EVENT23595': ['36.9828 -34.0538 138.1554'], 'EVENT2594': ['41.2669 -33.0179 139.2269'], 'EVENT2595': ['4.7500 -32.7926 138.1523'], 'EVENT41025': ['16.2453 -32.9552 138.2604'], 'EVENT2646': ['5.5949 -32.4923 138.1866'], 'EVENT2649': ['7.9533 -31.8304 138.6966']}
What I'd like to end up with, is a new file with the values instead of the keys. In this case, a new file called receiver.in which will look like:
36.9828 -34.0538 138.1554
41.2669 -33.0179 139.2269
4.7500 -32.7926 138.1523
16.2453 -32.9552 138.2604
5.5949 -32.4923 138.1866
7.9533 -31.8304 138.6966
My broken function so far (I know I must have a problem with the loops, but I can't figure out what) is:
def converted_lines():
    file_out = open('receiver.in', 'w')
    converted_lines = []
    event_dict = reloc_event_coords_dic()
    data_line = event_dict.items()  # Takes data as ('EVENT31933', ['10.1230 -32.8294 138.1718'])
    for element in data_line:
        for item in element:
            event_number = element[0]  # Gets event number
            coord_line = event_dict.get(event_number, None)
    with open('event_from_picks', 'r') as file_in:
        for line in file_in:
            if line.startswith(" "):
                continue
            if event_number:
                converted_lines.append("%s" % coord_line)
    file_out.writelines(converted_lines)
Thanks for reading!

just do the following:
with open('receiver.in', 'w') as f:
    f.writelines(v[0] + '\n' for v in reloc_event_coords_dic().itervalues())

Your first loop just leaves the last pair in the coord_line variable.
Better do
event_dict = reloc_event_coords_dic()
with open('event_from_picks', 'r') as file_in:
    with open('receiver.in', 'w') as file_out:
        for in_line in file_in:
            file_out.write(event_dict[in_line.strip()][0] + '\n')
(untested, but you should get the logic).
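Putting those corrections together, a minimal sketch of the whole task could look like the function below (untested against the real data files; it assumes the dictionary values are one-element lists of coordinate strings, as in the question, and simply skips event names missing from the dictionary):

```python
def write_receiver_file(event_dict, picks_path="event_from_picks",
                        out_path="receiver.in"):
    """Look up each event name from the picks file in event_dict and
    write its coordinate string to out_path, one line per event."""
    with open(picks_path) as file_in, open(out_path, "w") as file_out:
        for line in file_in:
            event_number = line.strip()
            coords = event_dict.get(event_number)  # None if the key is missing
            if coords:
                file_out.write(coords[0] + "\n")  # values are one-element lists
```

Using .get() instead of indexing means a pick with no dictionary entry (like EVENT2593 vs. the EVENT23595 key above) is skipped instead of raising a KeyError.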

Related

nested file read doesn't loop through all of the primary loop

I have two files.
One file has two columns (let's call it db), and the other has one column (let's call it in).
The second column of db is of the same type as the column in in, and both files are sorted by this column.
db for example:
RPL24P3 NG_002525
RPLP1P1 NG_002526
RPL26P4 NG_002527
VN2R11P NG_006060
VN2R12P NG_006061
VN2R13P NG_006062
VN2R14P NG_006063
in for example:
NG_002527
NG_006062
I want to read through these files and get the output as follows:
NG_002527: RPL26P4
NG_006062: VN2R13P
Meaning that I'm iterating over the lines of in and trying to find the matching line in db.
The code I have written for that is:
with open(db_file, 'r') as db, open(sortIn, 'r') as inF, open(out_file, 'w') as outF:
    for line in inF:
        for dbline in db:
            if len(dbline) > 1:
                dbline = dbline.split('\t')
                if line.rstrip('\n') == dbline[db_specifications[0]]:
                    outF.write(dbline[db_specifications[0]] + ': ' + dbline[db_specifications[1]] + '\n')
                    break
*db_specification isn't relevant for this problem, hence I didn't copy the relevant code for it - the problem doesn't lie there.
The current code will find a match and write it as I planned just for the first line in in but won't find any matches for the other lines. I have a suspicion it has to do with break but I can't figure out what to change.
Since the data in the db_file is sorted by second column, you can use this code to read the file.
with open("xyz.txt", "r") as db_file, open("abc.txt", "r") as sortIn, open("out.txt", 'w') as outF:
    # first read the sortIn file as a list
    i_list = [line.strip() for line in sortIn.readlines()]
    # for each record read from the file, split the line into key and value
    for line in db_file:
        t_key, t_val = line.strip().split(' ')
        # if the value is in i_list, write it to the output file
        if t_val in i_list:
            outF.write(t_val + ': ' + t_key + '\n')
        # once the value reaches the last (largest) item in the sorted list,
        # you don't need to read the db_file any more
        if t_val == i_list[-1]:
            break
The output file will have the following items:
NG_002527: RPL26P4
NG_006062: VN2R13P
In the above code, we read the sortIn list first, then read each line of the db_file. i_list[-1] holds the largest value of the sortIn file, since the sortIn file is also sorted in ascending order.
The above code does less I/O than the one below.
===========
previous answer submission:
Based on how the data has been stored in the db_file, it looks like we have to read the entire file to check against the sortIn file. If the values in the db_file were sorted by the second column, we could stop reading the file once the last item in sortIn was found.
With the assumption that we need to read all records from the files, see if the below code works for you.
with open("xyz.txt", "r") as db_file, open("abc.txt", "r") as sortIn, open("out.txt", 'w') as outF:
    # read the db_file and convert it into a dictionary
    d_list = dict([line.strip().split(' ') for line in db_file.readlines()])
    # read the sortIn file as a list
    i_list = [line.strip() for line in sortIn.readlines()]
    # keep each db entry whose value is one of the items in i_list
    out_list = [v + ': ' + k for k, v in d_list.items() if v in i_list]
    # out_list is the final list that needs to be written to a file
    for i in out_list:
        outF.write(i + '\n')
The output file will have the following items:
NG_002527: RPL26P4
NG_006062: VN2R13P
To help you, I have also printed the contents of d_list, i_list, and out_list.
The contents in d_list will look like this:
{'RPL24P3': 'NG_002525', 'RPLP1P1': 'NG_002526', 'RPL26P4': 'NG_002527', 'VN2R11P': 'NG_006060', 'VN2R12P': 'NG_006061', 'VN2R13P': 'NG_006062', 'VN2R14P': 'NG_006063'}
The contents in i_list will look like this:
['NG_002527', 'NG_006062']
The contents that get written into the outF file from out_list will look like this:
['NG_002527: RPL26P4', 'NG_006062: VN2R13P']
I was able to solve the problem by inserting the following line:
line = next(inF)
before the break statement.
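For reference, the dictionary-based idea from the second answer can also be written as a small standalone function. This is only a sketch: the name match_ids is mine, and it splits on whitespace, which covers both the tab delimiter in the question's code and the spaces shown in the sample data.

```python
def match_ids(db_lines, wanted_ids):
    """Map each wanted id to its name, given db lines of the form 'NAME ID'."""
    # Build an id -> name dictionary once, so each lookup is O(1).
    id_to_name = {}
    for line in db_lines:
        parts = line.split()
        if len(parts) == 2:
            name, ident = parts
            id_to_name[ident] = name
    return ["%s: %s" % (i, id_to_name[i]) for i in wanted_ids if i in id_to_name]
```

Because the lookup is a dictionary, this version does not depend on either file being sorted.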

Sorting the string lines in a txt file and then overwriting it with the sorted lines

So basically I have a file that is something like this:
hi
my
name
is
and I want it to be (get sorted alphabetically):
hi
is
my
name
My code below stores the lines from the txt file into a list and then sorts it. It does what I want, but...:
with open("3ex.txt", "r") as f:
    new = []
    for line in f:
        stripped = line.strip("\n")
        new.append(stripped)
    new.sort()  # sorts by letter
    print(new)
with open("3ex.txt", "w") as file:
    for k in new:
        file.write(file"[k]\n")
It doesn't overwrite it.
I tried first reading the file then writing in it. But I keep getting errors.
Yes, I know it's a bad thing to do, but that's what I'm asked to do.
Define the new list outside the with statement, and also correct the file.write argument.
Here is the corrected code
new = []
with open("3ex.txt", "r") as f:
    for line in f:
        stripped = line.strip("\n")
        new.append(stripped)
new.sort()  # sorts by letter
with open("3ex.txt", "w") as file:
    for k in new:
        file.write(k + "\n")
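Since sorted() accepts any iterable, the read-and-sort step can also be collapsed into one line. Here is a compact sketch of the same behavior, wrapped in a function (the function name is mine) so the path isn't hard-coded:

```python
def sort_file_lines(path):
    """Read all lines from path, sort them alphabetically, and
    overwrite the file with the sorted lines."""
    with open(path) as f:
        lines = sorted(line.strip("\n") for line in f)
    with open(path, "w") as f:
        for line in lines:
            f.write(line + "\n")
```

Reading completes before the second open() truncates the file, so overwriting in place is safe here.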

Replace character in line inside a file

I have these different lines with values in a text file
sample1:1
sample2:1
sample3:0
sample4:15
sample5:500
and I want the number after the ":" to be updated sometimes
I know I can split the name by ":" and get a list with 2 values.
f = open("test.txt","r")
lines = f.readlines()
lineSplit = lines[0].split(":",1)
lineSplit[1] #this is the value I want to change
I'm not quite sure how to update the lineSplit[1] value using the write functions.
You can use the fileinput module, if you're trying to modify the same file:
>>> strs = "sample4:15"
Take the advantage of sequence unpacking to store the results in variables after splitting.
>>> sample, value = strs.split(':')
>>> sample
'sample4'
>>> value
'15'
Code:
import fileinput

for line in fileinput.input(filename, inplace=True):
    sample, value = line.split(':')
    value = int(value)  # convert value to int for calculation purposes
    if some_condition:
        # do some calculations on sample and value
        # modify sample, value if required
        pass
    # now write the data (either modified or still the old one) back to the file
    print "{}:{}".format(sample, value)
Strings are immutable, meaning, you can't assign new values inside them by index.
But you can split up the whole file into a list of lines, and change individual lines (strings) entirely. This is what you're doing in lineSplit[1] = A_NEW_INTEGER
with open(filename, 'r') as f:
    lines = f.read().splitlines()

for i, line in enumerate(lines):
    if condition:
        lineSplit = line.split(':')
        lineSplit[1] = str(new_integer)
        lines[i] = ':'.join(lineSplit)

with open(filename, 'w') as f:
    f.write('\n'.join(lines))
Maybe something like this (assuming that the first element before each : is indeed a key):
from collections import OrderedDict

with open('fin') as fin:
    samples = OrderedDict(line.split(':', 1) for line in fin)

samples['sample3'] = 'something else'

with open('output', 'w') as fout:
    lines = (':'.join(el) + '\n' for el in samples.iteritems())
    fout.writelines(lines)
Another option is to use csv module (: is a column delimiter in your case).
Assuming there is a test.txt file with the following content:
sample1:1
sample2:1
sample3:0
sample4:15
sample5:500
And you need to increment each value. Here's how you can do it:
import csv

# read the file
with open('test.txt', 'r') as f:
    reader = csv.reader(f, delimiter=":")
    lines = [line for line in reader]

# write the file
with open('test.txt', 'w') as f:
    writer = csv.writer(f, delimiter=":")
    for line in lines:
        # edit the data here
        # e.g. increment each value
        line[1] = int(line[1]) + 1
    writer.writerows(lines)
The contents of test.txt now is:
sample1:2
sample2:2
sample3:1
sample4:16
sample5:501
But, anyway, fileinput sounds more logical to use in your case (editing the same file).
Hope that helps.
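As a plain-Python alternative to the csv version, the same read-modify-write pattern can be sketched with split and string formatting alone (the function name and the increment are just for illustration):

```python
def bump_values(path, delta=1):
    """Read 'name:value' lines, add delta to each value, and rewrite the file."""
    with open(path) as f:
        # split only on the first ':' so names containing ':' would survive
        lines = [line.rstrip("\n").split(":", 1) for line in f if line.strip()]
    with open(path, "w") as f:
        for name, value in lines:
            f.write("%s:%d\n" % (name, int(value) + delta))
```

As in the csv answer, the whole file is read into memory before it is reopened for writing, which is fine for small files like this one.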

Reading from a file into a list and have each element go through the program

Sorry if the title is confusing. What I want to do is have a text file (keywords.txt) read and then split into a list. So, for example, if the file contained "iphone, keys, wallet, pen, folder", I would want the list to be [iphone, keys, wallet, pen, folder].
Is there any way to set one variable to work for each element? Say the variable is query. Is there any way for query to be each of the elements, so it can go through the program and work for each element? Below is the code I have; it obviously doesn't work, but that is what I want to happen, if possible.
The reason I want to do this for each element is that eventually the script will write a new text file for each of the elements and name it based on what the element is, and the only way I know how to do that is by having one variable.
data = [line.strip() for line in open('keywords.txt', 'r')]
try:
    query = sys.argv[1]
except IndexError:
    query = item in data
Here is the rest of the code that I will be performing. It will take what is in the list that is created and create a new textfile and a csv file.
newFile = open("%s.txt" % query, 'w').write(txt.encode('utf8'))
with open("%s.txt" % query, 'rb') as input_file:
    reader = csv.reader(input_file, delimiter='\n', quoting=csv.QUOTE_NONE)
    with open("%s.csv" % query, 'wb') as output_file:
        writer = csv.writer(output_file)
        for row in reader:
            writer.writerow(row)
Turn the query value taken from the command line into a list instead, then loop over the query list:
try:
    query = [sys.argv[1]]
except IndexError:
    query = data

for q in query:
    pass  # do something with q
def process_keywords_in_file(file_name):
    with open(file_name) as f:
        for line in f:
            process(line.strip())

def process(keyword):
    pass  # your code
if you want to write a new file with the name of the keyword:
with open('%s.txt' % keyword, 'w') as fw:
    fw.write('content')
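The argv-or-file fallback from the first answer can be sketched as a small helper. The function name and the optional argv parameter (which makes it testable without touching the real command line) are my additions:

```python
import sys

def keywords_to_process(path, argv=None):
    """Return the single keyword given on the command line, or every
    keyword from the file (one per line) if none was given."""
    argv = sys.argv if argv is None else argv
    if len(argv) > 1:
        return [argv[1]]
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]
```

The caller then loops over the returned list, so the per-keyword code is written once and works for both the one-argument and the whole-file case.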

Iterate file name with counter

I'm splitting a file based on a string, and would like to have the output file names be numbered.
This is what I have so far:
outputfile = open("output.seq")
outputfileContent = outputfile.read()
outputfileList = outputfileContent.split(">")
for count, line in enumerate(f):
    for items in outputfileList:
        seqInfoFile = open('%f.dat', 'w')
        seqInfoFile.write(str(items))
I'm not sure where to define f.
Thanks for any help!
Assuming I haven't misunderstood you, the loop goes right where you already have it:
outputfile = open("output.seq")
outputfileContent = outputfile.read()
outputfileList = outputfileContent.split(">")

for count, content in enumerate(outputfileList, 1):
    with open("output_%s.dat" % count, "w") as output:
        output.write(content)
It would seem that if you want to associate every item in the output file list with a file titled as its index, you should do something like this:
for i in range(len(outputfileList)):
    seqInfoFile = open(str(i) + '.dat', 'w')
    seqInfoFile.write(str(outputfileList[i]))
It's not quite as elegant as an iterator, but the other option is to determine the number by making a call to outputfileList.index(items) each time.
Open output.seq, write its first line (split at >) into the file 1.dat, the second one into 2.dat, and so on:
with open("output.seq") as fi:
    for count, line in enumerate(fi, 1):
        with open('{0}.dat'.format(count), 'w') as fo:
            fo.writelines(line.split('>'))
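The common thread in these answers is enumerate(..., 1), which pairs each chunk with a counter starting at 1. A tiny sketch that separates the numbering from the writing (the function name and template are illustrative):

```python
def numbered_names(chunks, template="{0}.dat"):
    """Pair each chunk with a filename numbered from 1,
    in the order enumerate yields them."""
    return [(template.format(count), chunk)
            for count, chunk in enumerate(chunks, 1)]
```

Keeping the naming in a pure function makes it easy to check the filenames before any files are actually created.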
