I am trying to print values (a list) from a dictionary to the third column of another file that contains the dictionary key in the first column. I would like the list of values to print in the third column of the output file with a space separating each value. I know my problem lies somewhere in the fact that Python can't write things that aren't strings and that the list is separated by a "," but I am new to programming and am not sure how to accomplish this. Any help is much appreciated, thanks!
The GtfFile.txt is a 10-column file (sep = '\t') which I generate the dictionary from, using the Term (functional category) as the key and the Gene names as the values. Several genes have more than one Term attributed to them and are repeated as new lines for each term. There are varying numbers of genes associated with each Term as well, and thus I generate a list as the value for each Term. THIS PART OF MY SCRIPT APPEARS TO BE WORKING AS I WOULD LIKE IT TO!
The FuncEnr_terms.txt is a 2-column file (sep = '\t') which consists of a Term in the first column and a description of the term in the second column. My desired output would duplicate this file with a third column that contains the Genes associated with the Term, separated by spaces. WRITING THIS TO THE OUTPUT FILE IS WHERE MY PROBLEM LIES.
Below is my code:
#!/usr/bin/env python
import sys
from collections import defaultdict
if len(sys.argv) != 4 :
    print("Usage: GeneSetFileGen.py <GtfFile.txt> <FuncEnr_terms.txt> <OutputFile.txt>")
    sys.exit(0)
OutFileName = sys.argv[3]
OutFile = open(OutFileName, 'w')
TermGeneDic = defaultdict(list)
with open(sys.argv[1], 'r') as f :
    for line in f :
        line = line.strip()
        line = line.split('\t')
        Term = line[8]
        Gene = line[0]
        TermGeneDic[Term].append(Gene)
#write output file
with open(sys.argv[2], 'r') as f :
    for line in f :
        line = line.strip()
        Term, Des = line.split('\t')
        OutFile.write(Term + '\t' + Des + '\t' + str(TermGeneDic[Term]) + '\n')
OutFile.close
If I understand what you require correctly then what you need is to replace this expression:
str(TermGeneDic[Term])
with something like:
" ".join(TermGeneDic[Term])
A couple of pointers on your code: it will be much harder for anyone else to read if you don't follow PEP 8 conventions fairly closely. This means no CamelCase except for class names.
Secondly, reusing a variable is generally bad, and a sign that you should just chain up those method calls. It's especially bad when you have a variable like line whose type you actually change.
Thirdly, brackets (parentheses) are mandatory for calling a method or function. OutFile.close on its own does nothing; it has to be OutFile.close().
Fourthly, you join the elements of a list into a string with a separator, e.g. ' '.join(termgenes[term]) for the space-separated list you asked for.
Finally, use templating to generate long strings - it ends up being easier to work with.
Your code should look like:
import sys
from collections import defaultdict
if len(sys.argv) != 4:
    print("Usage: GeneSetFileGen.py <GtfFile.txt> <FuncEnr_terms.txt> <OutputFile.txt>")
    sys.exit(0)
progname, gtffilename, funcencrfilename, outfilename = sys.argv
termgenes = defaultdict(list)
with open(gtffilename, 'r') as gtf:
    for line in gtf:
        linefields = line.strip().split('\t')
        term, gene = linefields[8], linefields[0]
        termgenes[term].append(gene)
# write output file
with open(funcencrfilename, 'r') as funcencrfile, open(outfilename, 'w') as outfile:
    for line in funcencrfile:
        term, des = line.strip().split('\t')
        # the % operator needs a tuple; the genes are joined with spaces, as requested
        outfile.write('%s\t%s\t%s\n' % (term, des, ' '.join(termgenes[term])))
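To make the key change easier to see in isolation, here is a minimal sketch that compares str() with join and builds one output row with the % operator. The term ID, description and gene names are made up for illustration; they are not taken from the real files.

```python
from collections import defaultdict

# Hypothetical sample data: one term mapped to several genes.
termgenes = defaultdict(list)
termgenes["GO:0001"].extend(["geneA", "geneB", "geneC"])

# str() gives the list's repr, with brackets, quotes and commas.
as_repr = str(termgenes["GO:0001"])

# " ".join() gives the plain values separated by single spaces.
as_joined = " ".join(termgenes["GO:0001"])

# With the % operator, several values must be passed as one tuple.
row = "%s\t%s\t%s\n" % ("GO:0001", "made-up description", as_joined)

# A term with no genes gives an empty third column rather than an error.
missing = " ".join(termgenes["GO:9999"])

print(as_repr)
print(as_joined)
```

Note that join only accepts strings, so a list holding numbers would need something like " ".join(str(x) for x in values).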
I've attempted to find this but I end up finding code that replaces that one specific word.
I've been using this code:
phrase = open(club, 'a')
for line in phrase:
    if line.contains(name):
        #what do i put here?
    else:
        pass
else:
    pass
So let's say I have a text file which contains sam 10 and I want to replace this with sam 5. How would I do this with the above code? The name will stay the same but the number will not. Since the number is different for each name, I can't search for the number, which is why I'm searching for the name. I was thinking of using line.replace, but that only changes the one phrase, whereas I want the whole line to change.
Edit: This would be made under the assumption that the text file has multiple names of different people with different numbers. I would want it to search for that specific name and replace the whole line.
Thanks!
You can go through the file line by line, modify the matching line, and then write everything back to the original file to save the changes:
f = open('sample.txt', 'r')
lines = f.readlines()
f.close()
name = 'ali'
# range(len(lines)), so that the last line is checked too
for i in range(len(lines)):
    line = lines[i]
    #Find 'name' index (-1 if not found)
    index = line.find(name)
    # If the wanted name is found
    if index != -1:
        #Split line by spaces to get name and number starting from name
        words = line[index:].split()
        # Name and number should be the first two elements in words list
        # (this assumes they are separated by a single space)
        old_name_number = words[0] + ' ' + words[1]
        name_number = words[0] + ' ' + '5'
        #
        # Do some thing with name_number
        #
        # Swap the old "name number" pair for the new one, keeping the rest of the line
        line = line[:index] + line[index:].replace(old_name_number, name_number, 1)
        # Save result
        lines[i] = line
output = open('sample.txt', 'w')
for line in lines:
    output.write(line)
output.close()
In general, there are many ways, one of them would be:
Using built-in method:
with open(club, 'r') as in_file, open('new_file', 'w') as out_file:
    for line in in_file:
        if name in line:
            out_file.write(new_line)
        else:
            out_file.write(line)
But this has the effect of creating a new file.
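If the goal is still to end up with the original file updated, one option is to write to a temporary file and then swap it in. The sketch below is one way to do that under some assumptions: the function name replace_line_for_name is made up, and it assumes the name is the first word on its line, so that sam does not also match samuel.

```python
import os
import tempfile


def replace_line_for_name(path, name, new_line):
    """Rewrite `path`, replacing every line whose first word is `name`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then swap it in.
    with open(path, "r") as in_file, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as out_file:
        for line in in_file:
            words = line.split()
            if words and words[0] == name:
                out_file.write(new_line + "\n")
            else:
                out_file.write(line)
    # Both files are closed here; replace the original with the new copy.
    os.replace(out_file.name, path)
```

Called as replace_line_for_name(club, "sam", "sam 5"), this would leave every other line untouched.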
EDIT:
If you want to do in-place replacement, you can use the fileinput module this way:
import fileinput
for line in fileinput.input(club, inplace=True):
    # each line already ends with a newline, hence end='' (Python 3 print)
    if name in line:
        line = 'NEW LINE\n' #Construct your new line here
        print(line, end='') #This will print it into the file
    else:
        print(line, end='') #No changes to be made, print the line into the file as it is
Quoting from docs:
class fileinput.FileInput(files=None, inplace=False, backup='', bufsize=0, mode='r', openhook=None)
...
Optional in-place filtering:
if the keyword argument inplace=True is passed to fileinput.input() or
to the FileInput constructor, the file is moved to a backup file and
standard output is directed to the input file (if a file of the same
name as the backup file already exists, it will be replaced silently).
This makes it possible to write a filter that rewrites its input file
in place.
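Putting the quoted behaviour to use, the following is a small Python 3 sketch that also keeps the backup file around by passing backup='.bak'. The function name set_number is hypothetical, and it assumes each line has the form "name number" with the name as the first word.

```python
import fileinput


def set_number(path, name, number):
    """Replace the line for `name` in place, keeping a .bak copy of the original."""
    with fileinput.input(files=[path], inplace=True, backup=".bak") as lines:
        for line in lines:
            words = line.split()
            if words and words[0] == name:
                line = "{} {}\n".format(name, number)
            # Lines keep their trailing newline, so suppress print's own.
            print(line, end="")
```

For example, set_number("club.txt", "sam", 5) would rewrite club.txt and leave the previous contents in club.txt.bak.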
Okay, so I have a file that contains an ID number followed by a name, like this:
10 alex de souza
11 robin van persie
9 serhat akin
I need to read this file and break each record up into 2 fields the id, and the name. I need to store the entries in a dictionary where ID is the key and the name is the satellite data. Then I need to output, in 2 columns, one entry per line, all the entries in the dictionary, sorted (numerically) by ID. dict.keys and list.sort might be helpful (I guess). Finally the input filename needs to be the first command-line argument.
Thanks for your help!
I have this so far, but I can't get any further.
fin = open("ids","r") #Read the file
for line in fin: #Split lines
    string = str.split()
    if len(string) > 1: #Seperate names and grades
        id = map(int, string[0]
        name = string[1:]
        print(id, name) #Print results
We need sys.argv to get the command line argument (careful, the name of the script is always the 0th element of the returned list).
Now we open the file (no error handling, you should add that) and read in the lines individually. Now we have 'number firstname secondname'-strings for each line in the list "lines".
Then create an empty dictionary out and loop over the individual strings in lines, splitting each one on whitespace and storing the result in the temporary variable tmp (which is now a list of strings: ['number', 'firstname', 'secondname']).
Following that we just fill the dictionary, using the number as key and the space-joined rest of the names as value.
To print the dictionary sorted, just loop over the list of keys returned by sorted(out), using the key=int option for numerical sorting. Then print the id (the number) followed by the corresponding value, which you get by indexing the dictionary with the id (the keys are stored as strings).
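import sys
try:
    infile = sys.argv[1]
except IndexError:
    # raw_input, not input: on Python 2, input() would try to evaluate the typed text
    infile = raw_input('Enter file name: ')
with open(infile, 'r') as file:
    lines = file.readlines()
out = {}
for fullstr in lines:
    tmp = fullstr.split()
    out[tmp[0]] = ' '.join(tmp[1:])
for id in sorted(out, key=int):
    # the keys are already strings, so no str() conversion is needed
    print id, out[id]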
This works for python 2.7 with ASCII-strings. I'm pretty sure that it should be able to handle other encodings as well (German Umlaute work at least), but I can't test that any further. You may also want to add a lot of error handling in case the input file is somehow formatted differently.
Just a suggestion, this code is probably simpler than the other code posted:
import sys
with open(sys.argv[1], "r") as handle:
    lines = handle.readlines()
data = dict([i.strip().split(' ', 1) for i in lines])
for idx in sorted(data, key=int):
    print idx, data[idx]
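For readers on Python 3, the same idea can be sketched as follows. This is not either answer's code: the function name read_ids is made up, it stores the IDs as integers so that a plain sorted() is already numeric, and it skips blank or malformed lines instead of failing on them.

```python
def read_ids(lines):
    """Map each numeric ID to the name that follows it on the line."""
    entries = {}
    for line in lines:
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:          # skip blank or malformed lines
            entries[int(parts[0])] = parts[1].strip()
    return entries


# Sample lines from the question, plus one blank line.
sample = ["10 alex de souza\n", "11 robin van persie\n", "\n", "9 serhat akin\n"]
entries = read_ids(sample)
for id_number in sorted(entries):
    print(id_number, entries[id_number])
```

Because an open file iterates over its lines, read_ids can be handed the file object directly, e.g. inside with open(sys.argv[1]) as handle.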
My script will process a few files from different paths, and I want to write the output in CSV format in Python.
For example:
%> script_name <file_name>
In every file, I have different options to be checked.
For example : file1:
Best_friend : Riky
Mutual_friend : Anuj
Family_friend : Jamie
For example : file2:
Best_friend : Anjelina
Mutual_friend : Mythe
For example : file3:
Best_friend : Mahira
Mutual_friend : Shyna
Dear_frind : Kisty
I want to create CSV in the format
File,Best_friend, Mutual_friend
File1,Riky,Anuj
File2,Anjelina,Mythe
File3,Mahira,shyna
Please help
Well, there are several parts to your question.
You want to be passed several files, read some values from each of them, then output the values into a CSV file.
It helps if you decompose your problem into several successive steps.
First, you need to know how to read the best and mutual friend in a given file. You can do that in a function:
def get_best_mutual(filename):
    # some code
    return (best_friend, mutual_friend)
Then, you can just iterate over all your files to write the values while you collect them:
for filename in list_of_filenames:
    best_friend, mutual_friend = get_best_mutual(filename)
    # write filename, best_friend, mutual_friend in output file
Writing into the file should be easy, I'll not go into the details.
The problem might be to actually get the values from the input files.
When you read a text file, you typically read it line by line. Then you can just look at your line to decide what to do: if it defines either best or mutual friend, save the definition, otherwise do nothing.
Concretely, it might look like:
def get_best_mutual(filename):
    for line in open(filename): # read each line of the file
        key, value = line.split(':', 1) # split the line along the first :
        if key.startswith('Best'):
            best_friend = value
        if key.startswith('Mutual'):
            mutual_friend = value
    return (best_friend, mutual_friend)
Obviously, you'd have to make the code a bit more defensive, in case for example a line doesn't have a ':' in it. You might also notice that the value starts with a space and ends with a '\n': you can use value.strip() to solve that. The same goes for the key: if a line starts with a space, the code above will not recognize it.
You also need to decide what to do if a file doesn't have a best_friend, for example.
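A more defensive version along those lines might look like the sketch below. It makes a few assumptions that are not in the original answer: the helper parse_friends is a made-up name, keys are matched exactly after stripping (rather than with startswith), and a missing entry is reported as None.

```python
def parse_friends(lines):
    """Return (best_friend, mutual_friend); a missing entry is None."""
    best_friend = mutual_friend = None
    for line in lines:
        if ":" not in line:
            continue  # ignore lines that are not "key : value"
        key, value = line.split(":", 1)
        key, value = key.strip(), value.strip()
        if key == "Best_friend":
            best_friend = value
        elif key == "Mutual_friend":
            mutual_friend = value
    return best_friend, mutual_friend


def get_best_mutual(filename):
    with open(filename) as handle:  # the file is closed when the block ends
        return parse_friends(handle)
```

Splitting the parsing from the file handling keeps the logic easy to try out on a plain list of strings.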
Using csv.DictReader/csv.DictWriter is a more convenient way of handling CSV files.
Hope this will solve your problem:
import sys
import csv
def create_csv(files):
    headers = ['File', 'Best Friend', 'Mutual Friend']
    list1 = []
    for file in files:
        with open(file,'r') as file_obj:
            dict_temp = {}
            dict_temp['File'] = file
            for line in file_obj:
                if line.split(':')[0] == 'Best_friend ':
                    dict_temp['Best Friend'] = line.split(':')[1].strip()
                if line.split(':')[0] == 'Mutual_friend ':
                    dict_temp['Mutual Friend'] = line.split(':')[1].strip()
            # one dictionary per input file
            list1.append(dict_temp)
    print list1
    # 'wb' is the right mode for the csv module on Python 2
    csv_result = open('result.csv','wb')
    writer = csv.DictWriter(csv_result, delimiter=',', fieldnames=headers, quoting=csv.QUOTE_NONE)
    writer.writeheader()
    for entry in list1:
        writer.writerow(entry)
    csv_result.close()
if __name__ == "__main__":
    create_csv(sys.argv[1:])
You can add/remove the columns in csv just by adding it in dictionary with appropriate key.
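As a rough Python 3 illustration of how DictWriter copes with extra or missing keys, consider the sketch below. The rows are hypothetical (the second one deliberately has no Mutual_friend), and an in-memory buffer stands in for the real output file.

```python
import csv
import io

rows = [
    {"File": "File1", "Best_friend": "Riky", "Mutual_friend": "Anuj",
     "Family_friend": "Jamie"},
    {"File": "File2", "Best_friend": "Anjelina"},  # no Mutual_friend entry
]

buffer = io.StringIO()  # stands in for open("result.csv", "w", newline="")
writer = csv.DictWriter(
    buffer,
    fieldnames=["File", "Best_friend", "Mutual_friend"],
    extrasaction="ignore",  # drop keys that are not in fieldnames
    restval="",             # fill missing columns with an empty string
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Without extrasaction="ignore", a dictionary containing a key outside fieldnames makes writerow raise a ValueError.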
How would I go about creating a python program to store a range of values (ie. apples=3, bananas=5, carrots=12.5) in an external file where I can later use those values for calculations later on?
Currently, I am able to print to and read from text files using fileinput and fileoutput, but I am unable to use those values in calculations later on.
with open("values.txt","w") as fileinput:
    fileinput.write(value)
An example of what I am looking for is first being able to type a variable name (eg. Apples), then type a number or other value (eg. 3.3) and then print those values to the values.txt. That way, a separate program could view values.txt to be able to use the value of apples=3.3 in a calculation (eg. apples*3=9.9)
Assuming that you want to store fruit and values pairs into a dictionary (but you can adapt it to any data structure):
Writing to file:
fruits = {"apple":12, "banana":4}
with open("test.txt", 'w') as out:
    # items() works on both Python 2 and 3 (iteritems() is Python 2 only)
    for fruit, number in fruits.items():
        line = fruit + '=' + str(number) + '\n'
        out.write(line)
Parsing from the file into a dictionary:
emptyDict = {}
with open("test.txt", 'r') as infile:
    for line in infile:
        tokens = line.strip().split('=')
        fruit = tokens[0]
        number = tokens[1]
        emptyDict[fruit] = float(number)
What do you mean by "but I am unable to use those values in calculations later on"? In any case, you could create a dictionary mapping the strings to the values, for example:
mydata = {}
with open("values.txt", "r") as infile:
    for line in infile:
        tmp = line.split('=')
        # float() ignores the trailing newline
        mydata[tmp[0]] = float(tmp[1])
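To show the whole round trip, including a calculation on the values read back, here is a minimal Python 3 sketch. The function names save_values and load_values are made up, the file is written to a temporary directory purely so the example is self-contained, and every value is assumed to be numeric.

```python
import os
import tempfile


def save_values(path, values):
    """Write one name=value pair per line."""
    with open(path, "w") as out:
        for name, number in values.items():
            out.write("{}={}\n".format(name, number))


def load_values(path):
    """Read name=value lines back into a dict of floats."""
    values = {}
    with open(path) as infile:
        for line in infile:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            name, _, number = line.partition("=")
            values[name.strip()] = float(number)
    return values


path = os.path.join(tempfile.mkdtemp(), "values.txt")
save_values(path, {"apples": 3.3, "bananas": 5, "carrots": 12.5})
values = load_values(path)
total = values["apples"] * 3  # the values are floats again, so arithmetic works
print(round(total, 2))
```

The rounding in the last line is only there because floating-point multiplication may not give exactly 9.9.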
I have an odd CSV file that has header values and their corresponding data laid out as below:
,,,Completed Milling Job,,,,,, # row 1
,,,,Extended Report,,,,,
,,Job Spec numerical control,,,,,,,
Job Number,3456,,,,,, Operator Id,clipper,
Coder Machine Name,Caterpillar,,,,,,Job Start time,3/12/2013 6:22,
Machine type,Stepper motor,,,,,,Job end time,3/12/2013 9:16,
I need to extract the data from this structure and create another CSV file with the structure below:
Status,Job Number,Coder Machine Name,Machine type, Operator Id,Job Start time,Job end time,,, # header
Completed Milling Job,3456,Caterpillar,Stepper motor,clipper,3/12/2013 6:22,3/12/2013 9:16,,, # data row
If you notice, there is a new header column added called 'Status', but its value is in the first row of the CSV file. The rest of the column names in the output file are extracted from the original file.
Any thoughts will be greatly appreciated - thanks
Assuming the files are all exactly like that (at least in terms of caps) this should work, though I can only guarantee it on the exact data you have supplied:
#!/usr/bin/python
import glob
from sys import argv
g=open(argv[2],'w')
g.write("Status,Job Number,Coder Machine Name,Machine type, Operator Id,Job Start time,Job end time\n")
for fname in glob.glob(argv[1]):
    with open(fname) as f:
        status=f.readline().strip().strip(',')
        f.readline()#extended report not needed
        f.readline()#job spec numerical control not needed
        s=f.readline()
        job_no=s.split('Job Number,')[1].split(',')[0]
        op_id=s.split('Operator Id,')[1].strip().strip(',')
        s=f.readline()
        machine_name=s.split('Coder Machine Name,')[1].split(',')[0]
        start_t=s.split('Job Start time,')[1].strip().strip(',')
        s=f.readline()
        machine_type=s.split('Machine type,')[1].split(',')[0]
        end_t=s.split('Job end time,')[1].strip().strip(',')
        g.write(",".join([status,job_no,machine_name,machine_type,op_id,start_t,end_t])+"\n")
g.close()
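It takes a glob argument (like Job*.data) and an output filename and should construct what you need. Just save it as 'so.py' or something and run it as python so.py <data_files_wildcarded> output.csv. Quote the wildcard (e.g. 'Job*.data') so that the shell passes the pattern to the script instead of expanding it into several arguments.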
Here is a solution that should work on any CSV files that follow the same pattern as what you showed. That is a seriously nasty format.
I got interested in the problem and worked on it during my lunch break. Here's the code:
COMMA = ','
NEWLINE = '\n'
def _kvpairs_from_line(line):
    line = line.strip()
    values = [item.strip() for item in line.split(COMMA)]
    i = 0
    while i < len(values):
        if not values[i]:
            i += 1 # advance past empty value
        else:
            # yield pair of values
            yield (values[i], values[i+1])
            i += 2 # advance past pair
def kvpairs_by_column_then_row(lines):
    """
    Given a series of lines, where each line is comma-separated values
    organized as key/value pairs like so:
        key_1,value_1,key_n+1,value_n+1,...
        key_2,value_2,key_n+2,value_n+2,...
        ...
        key_n,value_n,key_n+n,value_n+n,...
    Yield up key/value pairs taken from the first column, then from the second column
    and so on.
    """
    pairs = [_kvpairs_from_line(line) for line in lines]
    done = [False for _ in pairs]
    while not all(done):
        for i in range(len(pairs)):
            if not done[i]:
                try:
                    key_value_tuple = next(pairs[i])
                    yield key_value_tuple
                except StopIteration:
                    done[i] = True
STATUS = "Status"
columns = [STATUS]
d = {}
with open("data.csv", "rt") as f:
    # get an iterator that lets us pull lines conveniently from file
    itr = iter(f)
    # pull first line and collect status
    line = next(itr)
    lst = line.split(COMMA)
    d[STATUS] = lst[3]
    # pull next lines and make sure the file is what we expected
    line = next(itr)
    assert "Extended Report" in line
    line = next(itr)
    assert "Job Spec numerical control" in line
    # pull all remaining lines and save in a list
    lines = [line.strip() for line in f]
for key, value in kvpairs_by_column_then_row(lines):
    columns.append(key)
    d[key] = value
with open("output.csv", "wt") as f:
    # write column headers line
    line = COMMA.join(columns)
    f.write(line + NEWLINE)
    # write data row
    line = COMMA.join(d[key] for key in columns)
    f.write(line + NEWLINE)
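For comparison, here is a shorter sketch built on the standard csv module instead of manual splitting. It is an alternative to the code above, not a rewrite of it, and it leans on stronger assumptions: the function name extract_record is made up, every data row is assumed to hold exactly two key/value pairs, and no value is empty (an empty value would shift the pairs). The sample text is the data from the question with the "# row 1" annotation left out.

```python
import csv
import io

SAMPLE = """\
,,,Completed Milling Job,,,,,,
,,,,Extended Report,,,,,
,,Job Spec numerical control,,,,,,,
Job Number,3456,,,,,, Operator Id,clipper,
Coder Machine Name,Caterpillar,,,,,,Job Start time,3/12/2013 6:22,
Machine type,Stepper motor,,,,,,Job end time,3/12/2013 9:16,
"""


def extract_record(handle):
    """Return (header, data) lists for one report read from `handle`."""
    rows = list(csv.reader(handle))
    # The status is the only non-empty cell in the first row.
    status = next(cell.strip() for cell in rows[0] if cell.strip())
    left, right = [], []
    for row in rows[3:]:  # skip the three title rows
        cells = [cell.strip() for cell in row if cell.strip()]
        if len(cells) < 4:
            continue  # not a row with two key/value pairs
        left.append((cells[0], cells[1]))
        right.append((cells[2], cells[3]))
    # Left-hand pairs first, then right-hand pairs, as in the desired output.
    pairs = [("Status", status)] + left + right
    return [key for key, _ in pairs], [value for _, value in pairs]


header, data = extract_record(io.StringIO(SAMPLE))
print(header)
print(data)
```

The two lists can then be written out with csv.writer(out_file).writerows([header, data]), which also takes care of quoting any value that contains a comma.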