I want to create a dictionary in Python using a for loop, where each key ('CUI' in my case) is associated with an array of values, but the output I get is a dictionary where each key yields just one of the values in my list. Here is my code:
import numpy as np
data2 = open('pathways.dat', 'r', errors = 'ignore')
pathways = data2.readlines()
special_line_indexes = []
PWY_ID = []
line_cont = []
L_PRMR = [] #Left primary
dict_comp = dict()
#i is the line number (first element of enumerate), while line is the line content (2nd elem of enumerate)
for CUI in just_compound_id:
    for i,line in enumerate(pathways):
        if '//' in line:
            #find the indexes of the lines containing //
            special_line_indexes = i+1
        elif 'REACTION-LAYOUT -' in line:
            if CUI in line:
                PWY_ID.append(special_line_indexes)
                dict_comp[CUI] = special_line_indexes
print(PWY_ID)
You need to move the dictionary assignment out of the inner for loop and assign the PWY_ID list to it:
import numpy as np
data2 = open('pathways.dat', 'r', errors = 'ignore')
pathways = data2.readlines()
special_line_indexes = []
line_cont = []
L_PRMR = [] #Left primary
dict_comp = dict()
#i is the line number (first element of enumerate), while line is the line content (2nd elem of enumerate)
for CUI in just_compound_id:
    PWY_ID = []
    for i,line in enumerate(pathways):
        if '//' in line:
            #find the indexes of the lines containing //
            special_line_indexes = i+1
        elif 'REACTION-LAYOUT -' in line:
            if CUI in line:
                PWY_ID.append(special_line_indexes)
    dict_comp[CUI] = PWY_ID
    print(PWY_ID)
print(dict_comp)
EDIT
The reason is that you are overwriting the value stored under the dictionary key (CUI) every time with a single value (special_line_indexes) instead of a list of values. What you need is to build the list in the inner for loop (with PWY_ID.append(...)), adding one element on each matching iteration, and once the inner loop has finished, assign that list to the dictionary (dict_comp[CUI] = PWY_ID).
The PWY_ID = [] line before the inner for gives you a fresh, empty list for each CUI.
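As a rough sketch of the same idea, the lists can also be collected in a single pass over the lines instead of re-scanning the file for every CUI. The sample lines and compound IDs below are invented for illustration (the real pathways.dat is not shown), the function name is hypothetical, and the start index defaults to 0 before the first // is seen, which differs from the empty-list default in the code above.

```python
# Hypothetical sample data: the real pathways.dat content is not shown in the question.
pathways = [
    "UNIQUE-ID - PWY-1\n",
    "REACTION-LAYOUT - (RXN-1 (:LEFT-PRIMARIES CPD-A))\n",
    "//\n",
    "UNIQUE-ID - PWY-2\n",
    "REACTION-LAYOUT - (RXN-2 (:LEFT-PRIMARIES CPD-A CPD-B))\n",
    "//\n",
]
just_compound_id = ["CPD-A", "CPD-B"]


def collect_record_starts(lines, compound_ids):
    """Map each compound ID to the list of record-start indexes that mention it."""
    result = {cui: [] for cui in compound_ids}  # one fresh list per key
    record_start = 0  # index of the line right after the most recent '//'
    for i, line in enumerate(lines):
        if '//' in line:
            record_start = i + 1
        elif 'REACTION-LAYOUT -' in line:
            for cui in compound_ids:
                if cui in line:
                    result[cui].append(record_start)
    return result


dict_comp = collect_record_starts(pathways, just_compound_id)
print(dict_comp)
```

Because every key gets its own list up front, each match is appended rather than overwriting the previous value.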
Related
I'm a beginner programmer, and I'm trying to figure out how to create a 2d nested list (grid) from a particular text file. For example, the text file would look like this:
3
3
150
109
80
892
123
982
0
98
23
The first two lines in the text file would be used to create the grid, meaning that it is 3x3. The next 9 lines would be used to populate the grid, with the first 3 making up the first row, the next 3 making up the middle row, and the final 3 making up the last row. So the nested list would look like this:
[[150, 109, 80], [892, 123, 982], [0, 98, 23]]
How do I go about doing this? I was able to make a list of all of the contents, but I can't figure out how to use the first 2 lines to define the size of the inner lists within the outer list:
lineContent = []
innerList = ?
for lines in open('document.txt','r'):
    value = int(lines)
    lineContent.append(value)
From here, where do I go to turn it into a nested list using the given values on the first 2 lines?
Thanks in advance.
You can make this quite neat using list comprehension.
def txt_grid(your_txt):
    with open(your_txt, 'r') as f:
        # Find columns and rows
        columns = int(f.readline())
        rows = int(f.readline())
        your_list = [[f.readline().strip() for i in range(rows)] for j in range(columns)]
    return your_list
print(txt_grid('document.txt'))
strip() just clears the newline characters (\n) from each line before storing them in the list.
Edit: A modified version that handles a text file without enough lines for the declared dimensions. The remaining lines are read once into a list, because iterating over the file to count them would leave nothing for readline() to return afterwards.
def txt_grid(your_txt):
    with open(your_txt, 'r') as f:
        # Find columns and rows
        columns = int(f.readline())
        rows = int(f.readline())
        dimensions = columns * rows
        # Read the remaining non-empty lines once; the first two lines have already been consumed
        values = [line.strip() for line in f if line.strip()]
    # Test to see if there are enough lines, creating the grid if there are
    if len(values) < dimensions:
        # Either raise an error
        # raise ValueError("Insufficient non-empty rows in text file for given dimensions")
        # Or return something that's not a list
        your_list = None
    else:
        # Creating grid by slicing the stored values
        your_list = [values[j * rows:(j + 1) * rows] for j in range(columns)]
    return your_list
print(txt_grid('document.txt'))
def parse_txt(filepath):
    lineContent = []
    with open(filepath, 'r') as txt: # The with statement closes the txt file after it's been used
        nrows = int(txt.readline())
        ncols = int(txt.readline())
        for i in range(nrows): # For each row
            row = []
            for j in range(ncols): # Grab each value in the row
                row.append(int(txt.readline()))
            lineContent.append(row)
    return lineContent
grid_2d = parse_txt('document.txt')
lineContent = []
innerList = []
for lines in open('testQuestion.txt', 'r'):
    value = int(lines)
    lineContent.append(value)
rowSz = lineContent[0] # row size
colSz = lineContent[1] # column size
del lineContent[0], lineContent[0] # makes line contents just the values in the matrix, could also just start currentLine at 2, notice 0 index is repeated because 1st element was deleted
assert rowSz * colSz == len(lineContent), 'not enough values for array' # to ensure there are enough entries to complete array of rowSz * colSz elements
arr = []
currentLine = 0
for x in range(rowSz):
    arr.append([])
    for y in range(colSz):
        arr[x].append(lineContent[currentLine])
        currentLine += 1
print(arr)
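The grid-building step in the answers above can also be written with list slicing once all values are in a flat list. This is only a sketch: build_grid is a hypothetical helper name, and it assumes the first value is the number of rows and the second the number of columns (the question's 3x3 example does not distinguish the two).

```python
def build_grid(values):
    """Turn [nrows, ncols, v1, v2, ...] into nrows lists of ncols values each."""
    nrows, ncols, *cells = values
    if len(cells) != nrows * ncols:
        raise ValueError(
            "expected %d values, got %d" % (nrows * ncols, len(cells))
        )
    # Row r occupies the slice [r * ncols, (r + 1) * ncols) of the flat list
    return [cells[r * ncols:(r + 1) * ncols] for r in range(nrows)]


# Values from the question's example file, already converted to int
flat = [3, 3, 150, 109, 80, 892, 123, 982, 0, 98, 23]
grid = build_grid(flat)
print(grid)
```

Slicing avoids the manual currentLine counter, and the length check reports both too few and too many values.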
I'm new to programming and Python, and I'm looking for a way to distinguish between two input formats in the same input text file. For example, let's say I have an input file like the following, where values are comma-separated:
5
Washington,A,10
New York,B,20
Seattle,C,30
Boston,B,20
Atlanta,D,50
2
New York,5
Boston,10
Where the format is N followed by N lines of Data1, and M followed by M lines of Data2. I tried opening the file, reading it line by line and storing it into one single list, but I'm not sure how to go about to produce 2 lists for Data1 and Data2, such that I would get:
Data1 = ["Washington,A,10", "New York,B,20", "Seattle,C,30", "Boston,B,20", "Atlanta,D,50"]
Data2 = ["New York,5", "Boston,10"]
My initial idea was to iterate through the list until I found an integer i, remove the integer from the list and continue for the next i iterations all while storing the subsequent values in a separate list, until I found the next integer and then repeat. However, this would destroy my initial list. Is there a better way to separate the two data formats in different lists?
You could use itertools.islice and a list comprehension:
from itertools import islice
string = """
5
Washington,A,10
New York,B,20
Seattle,C,30
Boston,B,20
Atlanta,D,50
2
New York,5
Boston,10
"""
result = [[x for x in islice(parts, idx + 1, idx + 1 + int(line))]
          for parts in [string.split("\n")]
          for idx, line in enumerate(parts)
          if line.isdigit()]
print(result)
This yields
[['Washington,A,10', 'New York,B,20', 'Seattle,C,30', 'Boston,B,20', 'Atlanta,D,50'], ['New York,5', 'Boston,10']]
For a file, you need to change it to:
with open("testfile.txt", "r") as f:
    result = [[x for x in islice(parts, idx + 1, idx + 1 + int(line))]
              for parts in [f.read().split("\n")]
              for idx, line in enumerate(parts)
              if line.isdigit()]
print(result)
You're definitely on the right track.
If you want to preserve the original list here, you don't actually have to remove integer i; you can just go on to the next item.
Code:
originalData = []
formattedData = []
with open("data.txt", "r") as f :
    f = list(f)
    originalData = f
    i = 0
    while i < len(f): # Iterate through every line
        try:
            n = int(f[i]) # See if line can be cast to an integer
            originalData[i] = n # Change string to int in original
            formattedData.append([])
            for j in range(n):
                i += 1
                item = f[i].replace('\n', '')
                originalData[i] = item # Remove newline char in original
                formattedData[-1].append(item)
        except ValueError:
            print("File has incorrect format")
        i += 1
print(originalData)
print(formattedData)
The following code will produce a list results which is equal to [Data1, Data2].
The code assumes that each count matches the number of entries that actually follow it. For a file like this one, where three entries follow a count of 2, it will not work:
2
New York,5
Boston,10
Seattle,30
The code:
# get the data from the text file
with open('filename.txt', 'r') as file:
    lines = file.read().splitlines()
results = []
index = 0
while index < len(lines):
    # Find the start and end values.
    start = index + 1
    end = start + int(lines[index])
    # Everything from the start up to and excluding the end index gets added
    results.append(lines[start:end])
    # Update the index
    index = end
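If the assumption about exact counts might not hold, the same slicing loop can check each header line before using it. The following is a minimal sketch, not part of the original answer; split_counted_blocks is a hypothetical name, and the sample list stands in for the lines read from the file.

```python
def split_counted_blocks(lines):
    """Split ['2', 'a', 'b', '1', 'c'] into [['a', 'b'], ['c']], checking each count."""
    results = []
    index = 0
    while index < len(lines):
        header = lines[index]
        if not header.isdigit():
            raise ValueError(
                "line %d: expected a count, got %r" % (index + 1, header)
            )
        start = index + 1
        end = start + int(header)
        if end > len(lines):
            raise ValueError(
                "line %d: count %s exceeds the remaining lines" % (index + 1, header)
            )
        results.append(lines[start:end])
        index = end
    return results


sample = [
    "5", "Washington,A,10", "New York,B,20", "Seattle,C,30",
    "Boston,B,20", "Atlanta,D,50",
    "2", "New York,5", "Boston,10",
]
data1, data2 = split_counted_blocks(sample)
print(data1)
print(data2)
```

With the four-line file shown earlier, the loop would reach "Seattle,30" where it expects a count and raise a ValueError instead of failing inside int().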
I have a text file containing multiple columns. I can successfully print all the items in the 2 columns I am interested in by using this code:
with open(file) as catalog:
    for line in catalog:
        column = line.split()
        if not line.startswith('#'): #skipping column labels
            x = float(column[3])
            y = float(column[4])
Now if I add a print(x) command inside the 'if not' block, it prints all of the x values. But if I put print(x) outside of the loop, it only prints the last item. What I want is to be able to access the full array of x and y values anywhere in my code. I also need to be able to access the x/y array items individually, so I can say x[2] and it will give me the third value in the x array. I cannot get this part to work even inside the 'if not' block. Thanks for any help, I have only been using Python for a couple of weeks.
Save your Xs and Ys in a list:
X_list = []
Y_list = []
with open(file) as catalog:
    for line in catalog:
        column = line.split()
        if not line.startswith('#'): #skipping column labels
            x = float(column[3])
            y = float(column[4])
            X_list.append(x)
            Y_list.append(y)
#then print the lists if you wish
print(X_list)
print(Y_list)
It sounds like you'll need to build a list of values.
with open(file) as catalog:
    x_values = []
    y_values = []
    for line in catalog:
        column = line.split()
        if not line.startswith('#'): #skipping column labels
            x_values.append(float(column[3]))
            y_values.append(float(column[4]))
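To show that the lists stay usable after the loop, here is a small self-contained sketch of the same approach. The catalog text is made up for illustration (the real file layout is not shown in the question), io.StringIO stands in for the opened file, and read_xy is a hypothetical helper name.

```python
import io

# Hypothetical catalog contents: a label line starting with '#', then data rows
catalog_text = """# id name flag x y
1 a 0 10.5 20.0
2 b 1 11.5 21.0
3 c 0 12.5 22.0
"""


def read_xy(catalog):
    """Return (x_values, y_values) taken from columns 3 and 4 of each data line."""
    x_values, y_values = [], []
    for line in catalog:
        if line.startswith('#') or not line.strip():
            continue  # skip column labels and blank lines
        column = line.split()
        x_values.append(float(column[3]))
        y_values.append(float(column[4]))
    return x_values, y_values


x, y = read_xy(io.StringIO(catalog_text))
print(x)     # the whole list is available outside the loop
print(x[2])  # the third x value
```

Returning the two lists from a function makes them available anywhere the function is called, and individual items can be indexed as usual.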
I am having a problem updating values in a dictionary in Python. I am trying to update a nested value (either an int or a list) for a single first-level key, but instead I update the values for all first-level keys.
I start by creating the dictionary:
kmerdict = {}
innerdict = {'endcover':0, 'coverdict':{}, 'coverholder':[], 'uncovered':0, 'lowstart':0,'totaluncover':0, 'totalbases':0}
for kmer in kmerlist: # build kmerdict
    kmerdict [kmer] = {}
    for chrom in fas: #open file and read line
        chromnum = chrom[3:-3]
        kmerdict [kmer][chromnum] = innerdict
Then I walk through chromosomes (as plain text files) from a list (fas, not shown), taking 7-mer strings (k=7) as the key. If that key is in the list of keys I am looking for (kmerlist), I try to use it to reference a single value nested in the dictionary:
for chrom in fas: #open file and read line
    chromnum = chrom[3:-3]
    p = 0 #chromosome position counter
    thisfile = "/var/store/fa/" + chrom
    thischrom = open(thisfile)
    thischrom.readline()
    thisline = thischrom.readline()
    thisline = string.strip(thisline.lower())
    l=0 #line counter
    workline = thisline
    while(thisline):
        if len(workline) > k-1:
            thiskmer = ''
            thiskmer = workline[0:k] #read five bases
            if thiskmer in kmerlist:
                thisuncovered = kmerdict[thiskmer][chromnum]['uncovered']
                thisendcover = kmerdict[thiskmer][chromnum]['endcover']
                thiscoverholder = kmerdict[thiskmer][chromnum]['coverholder']
                if p >= thisendcover:
                    thisuncovered += (p - thisendcover)
                    thisendcover = ((p+k) + ext)
                    thiscoverholder.append(p)
                elif p < thisendcover:
                    thisendcover = ((p+k) + ext)
                    thiscoverholder.append(p)
                print kmerdict[thiskmer]
            p += 1
            workline = workline[1:]
        else:
            thisline = thischrom.readline()
            thisline = string.strip(thisline.lower())
            workline = workline+thisline
            l+=1
print kmerdict
But when I print the dictionary, all "thiskmer" levels are getting updated with the same values. I'm not very good with dictionaries, and I can't see the error of my ways, but they are profound! Can anyone enlighten me?
Hope I've been clear enough. I've been tinkering with this code for too long now :(
Confession: I haven't spent the time to figure out all of your code, only the first part. The first problem you have is in the setup:
kmerdict = {}
innerdict = {'endcover':0, 'coverdict':{}, 'coverholder':[], 'uncovered':0,
             'lowstart':0,'totaluncover':0, 'totalbases':0}
for kmer in kmerlist: # build kmerdict
    kmerdict [kmer] = {}
    for chrom in fas: #open file and read line
        chromnum = chrom[3:-3]
        kmerdict [kmer][chromnum] = innerdict
You create innerdict once and then proceed to use the same dictionary over and over again. In other words, every kmerdict[kmer][chromnum] refers to the same object. Perhaps changing the last line to:
kmerdict [kmer][chromnum] = copy.deepcopy(innerdict)
would help (with an appropriate import of copy at the top of your file)? Alternatively, you could just move the creation of innerdict into the inner loop as pointed out in the comments:
def get_inner_dict():
    return {'endcover':0, 'coverdict':{}, 'coverholder':[], 'uncovered':0,
            'lowstart':0,'totaluncover':0, 'totalbases':0}
kmerdict = {}
for kmer in kmerlist: # build kmerdict
    kmerdict [kmer] = {}
    for chrom in fas: #open file and read line
        chromnum = chrom[3:-3]
        kmerdict [kmer][chromnum] = get_inner_dict()
-- I decided to use a function to make it easier to read :).
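The difference between sharing one dictionary and giving each key its own copy can be seen in a small standalone sketch. The keys and the reduced template below are invented for illustration and are not taken from the question's data.

```python
import copy


def make_template():
    """Return a fresh, reduced version of the inner dictionary."""
    return {'endcover': 0, 'coverholder': []}


# Every key refers to the same dict object
template = make_template()
shared = {key: template for key in ('AAA', 'CCC')}
shared['AAA']['coverholder'].append(5)
print(shared['CCC']['coverholder'])    # the change is visible under 'CCC' too
print(shared['AAA'] is shared['CCC'])  # same object

# Every key gets an independent deep copy, including the nested list
template = make_template()
independent = {key: copy.deepcopy(template) for key in ('AAA', 'CCC')}
independent['AAA']['coverholder'].append(5)
print(independent['CCC']['coverholder'])  # unaffected
```

Note that a shallow copy (template.copy()) would give each key its own outer dictionary but still share the nested 'coverholder' list, which is why deepcopy or a factory function is the safer choice here.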
We have a large Excel file, and I'm trying to create a list that holds the maximum and minimum values of each column. There are 13 columns, which is why the while loop should stop once it hits 14. The problem is that once the counter is increased, the for loop does not seem to run again. More explicitly, the while loop only goes through the for loop once, yet it does keep looping in that it increases the counter by 1 each time and stops at 14. Note that the rows in the input file are strings of numbers, which is why I convert them to tuples and then check whether the value in the given position is greater than column_max or smaller than column_min. If so, I reassign either column_max or column_min. Once this is completed, column_max and column_min are appended to a list (l) and the counter (position) is increased to repeat the process for the next column. Any help will be appreciated.
input_file = open('names.csv','r')
l= []
column_max = 0
column_min = 0
counter = 0
while counter<14:
    for row in input_file:
        row = row.strip()
        row = row.split(',')
        row = tuple(row)
        if (float(row[counter]))>column_max:
            column_max = float(row[counter])
        elif (float(row[counter]))<column_min:
            column_min = float(row[counter])
        else:
            column_min=column_min
            column_max = column_max
    l.append((column_max,column_min))
    counter = counter + 1
I think you want to switch the order of your for and while loops.
Note that there is a slightly better way to do this:
with open('yourfile') as infile:
    #read first row. Set column min and max to values in first row
    data = [float(x) for x in infile.readline().split(',')]
    column_maxs = data[:]
    column_mins = data[:]
    #read subsequent rows getting new min/max
    for line in infile:
        data = [float(x) for x in line.split(',')]
        for i,d in enumerate(data):
            column_maxs[i] = max(d,column_maxs[i])
            column_mins[i] = min(d,column_mins[i])
If you have enough memory to hold the file in memory at once, this becomes even easier:
with open('yourfile') as infile:
    data = [[float(x) for x in line.split(',')] for line in infile]
data_transpose = list(zip(*data))  # a list, so it can be iterated twice on Python 3
col_mins = [min(x) for x in data_transpose]
col_maxs = [max(x) for x in data_transpose]
Once you have consumed the file, it has been consumed. Thus iterating over it again won't produce anything.
>>> for row in input_file:
...     print(row)
1,2,3,...
4,5,6,...
etc.
>>> for row in input_file:
...     print(row)
>>> # Nothing gets printed, the file is consumed
That is the reason why your code is not working.
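The exhaustion can be reproduced without a real file. In this sketch io.StringIO stands in for the opened CSV and the two rows are made up; real file objects opened in text mode behave the same way with respect to iteration and seek(0).

```python
import io

# In-memory stand-in for open('names.csv')
input_file = io.StringIO("1,2,3\n4,5,6\n")

first_pass = [row.strip() for row in input_file]
second_pass = [row.strip() for row in input_file]  # already at end of file

input_file.seek(0)  # rewind to the beginning
third_pass = [row.strip() for row in input_file]

print(first_pass)
print(second_pass)
print(third_pass)
```

Rewinding with seek(0) before each pass would make the original while loop see the rows again, at the cost of re-reading the file for every column.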
You then have three main approaches:
Read the file each time (inefficient in I/O operations);
Load it into a list (inefficient for large files, as it stores the whole file in memory);
Rework the logic to operate line by line (quite feasible and efficient, though not as brief in code as loading it all into a two-dimensional structure and transposing it and using min and max may be).
Here is my technique for the third approach:
maxima = [float('-inf')] * 13
minima = [float('inf')] * 13
with open('names.csv') as input_file:
    for row in input_file:
        # enumerate supplies the column index alongside each value
        for col, value in enumerate(row.split(',')):
            value = float(value)
            maxima[col] = max(maxima[col], value)
            minima[col] = min(minima[col], value)
# This gets the value you called ``l``
combined_max_and_min = list(zip(maxima, minima))