python: recursive dictionary of dictionary

I need help with a fairly simple exercise; I'm just a bit lost on the syntax.
Basically, I read in a very brief text file containing 15 lines of 3 elements each (essentially 2 keys and a value)
and put those elements into a dictionary composed of dictionaries.
The outer dictionary is keyed by location, and each inner dictionary maps the type of an item to how much it costs. For example:
gymnasium weights 15
market cereal 5
gymnasium shoes 50
saloon beer 3
saloon whiskey 10
market bread 5
which would result in this
{
    'gymnasium': {
        'weights': 15,
        'shoes': 50
    },
    'saloon': {
        'beer': 3,
        'whiskey': 10
    }
}
and so on for the other keys.
Basically, I need to loop through this file, but I'm struggling to read the contents in as a dict of dicts.
Moreover, without that part I can't figure out how to add to the inner dictionary when its key already occurs in the outer dictionary.
I would like to do this recursively.
location_dict = {} #row #name day weight temp
item_dict = {}
for line in file:
    line = line.strip()
    location_dict[item_dict['location']] = item_dict

This is a good use for setdefault (or defaultdict):
data = {}
for line in file:
    key1, key2, value = line.split()
    data.setdefault(key1, {})[key2] = int(value)  # convert the cost from text to a number
print(data)
Or, based on your comment:
from collections import defaultdict
data = defaultdict(lambda: defaultdict(int))
for line in file:
    key1, key2, value = line.split()
    data[key1][key2] += int(value)  # value is read as a string, so convert before adding
print(data)
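To try the setdefault approach without a file on disk, here is a minimal self-contained sketch. It assumes whitespace-separated lines like those in the question; the SAMPLE text and the function name build_nested are made up for illustration. No recursion is needed, since the nesting is only two levels deep.

```python
import io

# Sample data copied from the question; io.StringIO stands in for the real file.
SAMPLE = """\
gymnasium weights 15
market cereal 5
gymnasium shoes 50
saloon beer 3
saloon whiskey 10
market bread 5
"""

def build_nested(lines):
    """Build {location: {item: cost}} from 'location item cost' lines."""
    data = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        location, item, cost = parts
        # setdefault returns the existing inner dict, or inserts a new one
        data.setdefault(location, {})[item] = int(cost)
    return data

nested = build_nested(io.StringIO(SAMPLE))
print(nested["gymnasium"])
```

With a real file, passing the open file object to build_nested should work the same way, because iterating a file yields its lines.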

Here is another solution.
locationDict = {}
with open("yourFile.txt", "r") as yourFile:
    for line in yourFile:
        if not line.strip():
            continue  # skip blank lines, such as a trailing newline
        k1, k2, v = line.split()
        if k1 not in locationDict:
            locationDict[k1] = {}
        # add the item the first time it is seen, otherwise accumulate its cost
        if k2 not in locationDict[k1]:
            locationDict[k1][k2] = int(v)
        else:
            locationDict[k1][k2] += int(v)
print(locationDict)
Hope it helps!

Related

Turning text file into dictionary when the same keys appear multiple times

I have a text file that looks like this:
tomato 7000
potato and pear 8000
prunes 892
tomato 8
carrot 600
prunes 3
To turn it into a dictionary that ignores the lines where there are more words (which is what I want, so potato and pear are ignored, which is fine), I wrote:
with open("C:\\path\\food.txt", encoding="utf-8") as f_skipped:
    result = {}
    for line in f_skipped:
        try:
            k, v = line.split()
        except ValueError:
            pass
        else:
            result[k] = v
But since there can't be duplicate keys, it takes the value that appears later, so tomato and prunes have the values 8 and 3, respectively. Is there any way of taking only the first appearance and ignoring the later ones?
I thought of keeping my code and just reversing the text (sounds a bit silly) or detecting whether there are duplicate words (the latter is a bit risky, since there are lots of rows with many words that I simply want to ignore anyway).
Try this: the dictionary's .get(key) method returns None if the key doesn't exist and otherwise returns the value for that key, so you can use it in an if condition.
I hope this is what you want, going by your question.
filename = "text.txt"
with open(filename, encoding="utf-8") as f_skipped:
    result = {}
    for line in f_skipped:
        try:
            k, v = line.split()
        except ValueError:
            pass
        else:
            if result.get(k) is None:
                result[k] = v
print(result)
Output
py code.py
{'tomato': '7000', 'prunes': '892', 'carrot': '600'}
Try this:
with open('food.txt') as food:
    D = {}
    for line in food:
        t = line.rsplit(' ', 1)
        k = t[0]
        if k not in D:
            D[k] = t[1].split()
print(D)
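As a further variant of the "keep only the first appearance" idea, dict.setdefault stores a value only when the key is not yet present. The sketch below is self-contained and uses the sample lines from the question in place of the real file; the function name first_values is made up for illustration.

```python
import io

# Sample data copied from the question; io.StringIO stands in for the real file.
SAMPLE = """\
tomato 7000
potato and pear 8000
prunes 892
tomato 8
carrot 600
prunes 3
"""

def first_values(lines):
    """Map each single-word key to the value from its FIRST appearance."""
    result = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # ignore lines with more (or fewer) than two fields
        key, value = parts
        result.setdefault(key, value)  # does nothing if key is already stored
    return result

first = first_values(io.StringIO(SAMPLE))
print(first)  # {'tomato': '7000', 'prunes': '892', 'carrot': '600'}
```

Unlike a check against .get(k) being None, this does not depend on None never being a stored value, although that makes no difference for string values read from a file.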

sort a dictionary by the first letter of its keys

I wrote code that reads a whole DNA genome and returns a dictionary of all the 8-primers with their locations. I want to loop through this dictionary and sort these codons into 4 other dictionaries based on the letter they start with: A, T, G or C.
But I couldn't figure out how to check the first letter of each key.
This is my code:
"""
Generating all the possible 8-codon primers.
saving them in a text file with their locations.
"""
import csv
##MAIN FUNCTION:
def k_mer(Text, k):
    dictionary = {}
    for i in range(len(Text) - k + 1):
        if Text[i: i+k] in dictionary:
            dictionary[Text[i: i+k]].append(i)
        else:
            dictionary[Text[i: i+k]] = [i]
    return dictionary
##INPUT:
# open the file with the original sequence
myfile = open('Vibrio_cholerae.txt')
# set the file to the variable Text to read and scan
Text = myfile.read()
result = k_mer(Text.strip(), 8)
with open("result.txt","w") as f:
    from collections import Counter
    wr = csv.writer(f,delimiter=":")
    wr.writerows(Counter(result).items())
Given your dictionary, it's just this. It's not a complicated problem.
groupby = {'A': {}, 'C': {}, 'G': {}, 'T': {}}
for k, v in dictionary.items():
    groupby[k[0]][k] = v
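For a version that can be run on its own, here is a small sketch that builds the k-mer dictionary from a short made-up sequence and then groups it by first letter. The sequence "ATGCATGA", k = 3 and the function name group_by_first_letter are assumptions for illustration only. The membership check guards against keys that start with something other than A, C, G or T (for example a newline or an N in the genome file), which would otherwise raise a KeyError.

```python
from collections import defaultdict

def k_mer(text, k):
    """Map every length-k substring of text to the list of its start positions."""
    positions = defaultdict(list)
    for i in range(len(text) - k + 1):
        positions[text[i:i + k]].append(i)
    return dict(positions)

def group_by_first_letter(kmers, alphabet="ACGT"):
    """Split a k-mer dictionary into one sub-dictionary per starting letter."""
    groups = {letter: {} for letter in alphabet}
    for kmer, locations in kmers.items():
        first = kmer[0]  # indexing a string key gives its first character
        if first in groups:
            groups[first][kmer] = locations
    return groups

kmers = k_mer("ATGCATGA", 3)
groups = group_by_first_letter(kmers)
print(groups["A"])  # {'ATG': [0, 4]}
```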

Reading a Tuple Assignment (e.g. written as d1: p, m, h, = 20, 15, 22) from a Text File and Performing Calculations with Each Variable (e.g. p*h)

I'm reading a text file with several hundred lines of data in Python. The text file contains data written as a tuple assignment. For example, the data looks exactly like this in the text file:
d1: p,h,t,m= 74.15 18 6 0.1 ign: 0.0003
d2: p,h,t,m= 54. 378 -0.14 0.1 ign: 0.0009
How can I separate the data as such:
p = 20
t = 15
etc.
Then, how can I perform calculations on the tuple assignment? For example calculate:
p*p = 20*15?
I am not sure whether I should convert the tuple assignment to an array; I tried, but was not successful. In addition, I do not know how to get rid of the d1: and d2: prefixes, which are there to identify which data set I am looking at.
I have read the data and picked out the lines that contain it (ignoring the "First Set" line and the "Data Given as" line).
The results that I need would be:
p (from first set of data d1)*p(from first set of data d2) = 20*15 = 300
p (from second set of data d1)*p(from second set of data d2) = 12*5 = 60
I believe I would need to do this over some kind of loop so that I can separate the data in all the lines in the file.
I would appreciate any help on this! I couldn't find anything pertaining to my question. I would only find how to deal with tuples in the simplest manner but nothing on how to extract variables and performing calculations on a tuple assignment contained in a text file.
EDIT:
After looking at the answer to this question given by @JArunMani, I went back to see whether I could understand each line of code. I understand that we need to create a dictionary that fills in the respective values for p, q, etc.
When I try to rewrite the code the way I understand it, I have:
with open("d.txt") as fp: # Opens the file
    # The database kinda thing here
    line = fp.readline() # Read the file's first line
    number, _,cont = line.partition(":")#separates m1 from p, m, h, n =..."
    print(cont)
    data, _,ignore = cont.partition("int") #separates int from p, m, h, n =..."
    print(data) #prints tuple assignment needed
    keys, _,values = data.partition("=")
    print(keys) #prints p, m, h, n
    print(values) #prints values (all numbers after =)
    thisdict = {} #creating an empty dictionary to fill with keys and values
    thisdict[keys] = values
    print(thisdict)
    if "m" in thisdict:
        print("Yes")
print(thisdict) gives me the output: {' p,m,h,n': ' 76 6818 2.2 1 '}
However, if "m" in thisdict: did not print anything. I do not understand why m is not in the dictionary, even though print(thisdict) shows that thisdict has been filled. Also, is it necessary to add the for loop from the answer given below?
Thank you.
EDIT 2
I am now on my second attempt at this problem. I am combining both answers to write the code, using what I understand from each:
def DataExtract(self):
    with open("muonsdata.txt") as fp: # Opens the file
        line = fp.readline() # Read the file's first line
        number, _,cont = line.partition(":")#separates m1 from pt, eta, phi, m =..."
        print(cont)
        data, _,ignore = cont.partition("dptinv") #separates dptinv from pt, eta, phi, m =..."
        print(data) #prints tuple assignment needed
        keys, _,values = data.partition("=")
        print(keys) #prints pt, eta, phi, m
        print(values) #prints values (all numbers after =)
        key = [k for k in keys.split(",")]
        value = [v for v in values.strip().split(" ")]
        print(key)
        print(value)
        thisdict = {}
        data = {}
        for k, v in zip(key, value): #creating an empty dictionary to fill with keys and values
            thisdict[k] = v
        print(thisdict)
        if "m" in thisdict:
            print("Yes")
x = DataExtract("C:/Users/username/Desktop/data.txt")
mul_p = x['m1']['p'] * x['d2']['p']
print(mul_p)
However, this gives me the error:
Traceback (most recent call last):
  File "read.py", line 29, in <module>
    mul_p = x['d1']['p'] * x['d2']['p']
TypeError: 'NoneType' object is not subscriptable
EDIT 3
I have code made from a combination of answers 1 and 2, BUT...
the code is written and runs, so why doesn't the while loop go on until we reach the end of the file? I only get one answer, calculated from the values in the first two lines, but what about the remaining lines? Also, it seems like it is not reading the d2 data lines (or line = fp.readline() is not doing anything), because when I try to calculate m, I get the error:
Traceback (most recent call last):
  File "read.py", line 37, in <module>
    m = math.cosh(float(data[" m2"]["eta"])) * float(data["m1"][" pt"])
KeyError: ' m2'
Here is my code that I have:
import math
with open("d.txt") as fp: # Opens the file
    data ={} #final dictionary
    line = fp.readline() # Read the file's first line
    while line: #continues to end of file
        name, _,cont = line.partition(":")#separates d1 from p, m, h, t =..."
        #print(cont)
        numbers, _,ignore = cont.partition("ign") #separates ign from p, m, h, t =..."
        #print(numbers) #prints tuple assignment needed
        keys, _,values = numbers.partition("=")
        #print(keys) #prints p, m, h, t
        #print(values) #prints values (all numbers after =)
        key = [k for k in keys.split(",")]
        value = [v for v in values.strip().split(" ")]
        #print(key) #prints pt, eta, phi, m
        #print(value)
        thisdict = {}
        for k, v in zip(key, value): #creating an empty dictionary to fill with keys and values
            #thisdict[k] = v
            #print(thisdict)
            #data[name]=thisdict
            line = fp.readline()#read next lines, not working I think
            thisdict[k] = v
            data[name]=thisdict
            print(thisdict)
        #if " m2" in thisdict:
        #print("Yes")
    #print(data)
    #mul_p = float(data["d1"][" p"])*float(data["d1"]["m"])
    m = math.cosh(float(data[" d2"]["m"])) * float(data["m1"][" p"])
    #m1 = float(data["d1"][" p"]) * float(2)
    print(m)
    #print(mul_p)
If I replace the d2's with d1, the code runs fine, except that it skips the last d1. I do not know what I am doing wrong. I would appreciate any input or guidance.
So the following function returns a dictionary with values of 'p', 'q' and other variables. But I leave it to you to find out how to multiply or perform operations on them ^^
def DataExtract(path):  # 'path' is the path to the data file
    data = {}  # The database kinda thing here
    with open(path) as fp:  # Opens the file and closes it when done
        for line in fp:  # This goes on till we reach end of file (EOF)
            if ":" not in line:
                continue  # Skip blank or header lines
            name, _, cont = line.partition(":")  # So this gives 'd1', ':', ' p,h,t,m= ...'
            cont, _, _ = cont.partition("ign")  # Drop the trailing 'ign: ...' part
            keys, _, values = cont.partition("=")  # Now we split the text into LHS and RHS
            keys = keys.split(",")  # Split the variables by ',' as separator
            values = values.split()  # Split the values on whitespace
            temp_d = {}  # Dict for variables
            for key, val in zip(keys, values):
                temp_d[key.strip()] = float(val)  # Strip spaces from the key, store the value as a number
            data[name.strip()] = temp_d  # Store the temp_d itself in main dict
    return data  # Return the data
I used simple methods, to make it easy for you. Now to access data, you have to do something like this:
x = DataExtract("your_file_path")
mul_p = x['d1']['p'] * x['d2']['p']
print(mul_p) # Tadaaa !
Feel free to comment...
This answer is quite similar to @JArunMani's, but it's a bit shorter and should run successfully.
The idea is to return your data as a dictionary.
lines = "d1: p,h,t,m= 74.15 18 6 0.1 ign: 0.0003\nd2: p,h,t,m= 54. 378 -0.14 0.1 ign: 0.0009".split("\n") # lines=open("d.txt",'r').read().split("\n")
data = {}
for line in lines:
    l = line.split("ign")[0] # remove "ign:.."
    name_dict, vals_dict = l.split(":") #['d1',' p,h,t,m= 74.15 18 6 0.1']
    keys_str, values_str = vals_dict.split("=") #[' p,h,t,m',' 74.15 18 6 0.1']
    keys = [k for k in keys_str.strip().split(',')] #['p','h','t','m']
    values = [float(v) for v in values_str.strip().split(' ')] #[74.15, 18, 6, 0.1]
    sub_dict = {}
    for k, v in zip(keys, values):
        sub_dict[k] = v
    data[name_dict] = sub_dict
Result:
>>>data
{'d1': {'p': 74.15, 'h': 18.0, 't': 6.0, 'm': 0.1}, 'd2': {'p': 54.0, 'h': 378.0, 't': -0.14, 'm': 0.1}}
>>>data['d1']['p']*data['d2']['p']
4004.1000000000004
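The question's first attempt stored the whole string ' p,m,h,n' as a single key, which is why "m" in thisdict was False, and the later KeyError: ' m2' comes from a key that still carries its leading space. The sketch below is one way to avoid both, assuming every data line has the exact shape shown in the question; the function name parse_line and the variable names are made up for illustration.

```python
def parse_line(line):
    """Return (name, {variable: float}) for a line such as
    'd1: p,h,t,m= 74.15 18 6 0.1 ign: 0.0003'."""
    name, _, rest = line.partition(":")
    assignment, _, _ = rest.partition("ign")     # drop the trailing 'ign: ...'
    keys, _, values = assignment.partition("=")
    names = [k.strip() for k in keys.split(",")]  # one stripped key per variable
    numbers = [float(v) for v in values.split()]  # split() handles repeated spaces
    return name.strip(), dict(zip(names, numbers))

# The two sample lines from the question stand in for the real file.
text = """\
d1: p,h,t,m= 74.15 18 6 0.1 ign: 0.0003
d2: p,h,t,m= 54. 378 -0.14 0.1 ign: 0.0009
"""

records = dict(parse_line(line) for line in text.splitlines() if ":" in line)
product = records["d1"]["p"] * records["d2"]["p"]
print("m" in records["d1"], product)
```

Because the result is keyed by d1 and d2, a file with several d1/d2 pairs would keep only the last pair. For that case, one option is to collect the parsed lines in a list and process them two at a time.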

Python: Dictionary of Lists error

I have this code:
def PlayerStats(self,LSTPlayers,actualround):
    count = 0
    statDict = defaultdict(list)
    Stats = CStats.CStats('Cricket',self.Logs) #create instance
    statDict = Stats.WritetoDict()
    for objplayer in LSTPlayers:
        if Stats.FindPlayerIndex(statDict,"Player",objplayer.PlayerName) == -1: #if there is no player in the stats csv file
            self.Logs.Log("ERROR","Cannot find {} in the stats file! I will add them.".format(objplayer.PlayerName))
            for key in statDict: count += 1
            self.Logs.Log("ERROR","Count of keys is: {}".format(count))
            statDict[count+1].append(objplayer.PlayerName,objplayer.GetTotalTouch,actualround,Stats.MPR(objplayer.PlayerName))
        else:
            self.Logs.Log("DEBUG","Updating {}'s stats!".format(objplayer.PlayerName))
And I keep getting the error: IndexError: list assignment index out of range.
In line 5, statDict is being updated with a defaultdict(list) from another class that reads a csv file and returns the dictionary of lists.
Line 7 looks for the player name in the dictionary; if it doesn't find it, it returns -1, otherwise it returns the index of the player.
I was looking to add a new list to the dictionary of lists with information if the player wasn't in there already.
Update: here is my code for the WritetoDict function in the CStats file:
def WritetoDict(self):
    with open(self.pathfile) as f:
        self.PlayerStatDict = [{k: str(v) for k, v in row.items()}
                               for row in csv.DictReader(f, skipinitialspace=True)]
    return self.PlayerStatDict
PlayerStatDict is defined as a defaultdict(list) in the CStats file.
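One possible reading of the problem, offered as a guess because the CStats class and the full traceback are not shown: the list comprehension in WritetoDict rebinds PlayerStatDict to a plain list of dicts, so the caller no longer holds a defaultdict(list), and indexing a list past its end raises an IndexError instead of creating a new entry. Separately, list.append accepts exactly one argument, so several values have to be passed as one tuple or list. The sketch below illustrates both points with made-up player data.

```python
from collections import defaultdict

# With a real defaultdict(list), a missing key creates an empty list on access.
stats = defaultdict(list)
# append takes ONE object, so the four values are wrapped in a tuple.
stats[1].append(("alice", 42, 3, 2.1))

# A list comprehension over csv.DictReader rows produces a plain list of dicts.
rows = [{"Player": "alice"}]
try:
    rows[len(rows) + 1]  # indexing past the end of a list
    outcome = "no error"
except IndexError:
    outcome = "IndexError"

# A list of row dicts grows with append rather than by index.
rows.append({"Player": "bob"})
print(outcome, len(rows))
```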

How do I merge two csv files based on the words in a given list?

I have two csv files, each of them are in this format,
file1
zip name score
23431 david 12
23231 rob 45
33441 hary 23
98901 rgrg 55
file2
zip1 name1 score1
23433 david 12
23245 stel 45
33478 hary 23
98988 rob 55
12121 jass 33
and I have a list that has the names, like this
lista = ['harry', 'rob', 'wine', 'david', 'jass']
The final csv file should look like this:
name zip score zip1 score1
harry x x x x
rob 23231 45 98988 55
wine x x x x
david 23431 12 23433 12
jass x x 12121 33
That means, if any name from the list appears in either of the csv files, then we should include it in the new csv file along with its zip and score. Otherwise we should print 'x' in its place.
This is what I have done so far:
import csv
with open('file1.csv', 'r') as input1, open('file2.csv', 'r') as input2, open('merge_final.csv', 'w') as output:
    writer = csv.writer(output)
    reader1 = csv.reader(input1)
    reader2 = csv.reader(input2)
    lista = ['harry', 'rob', 'wine', 'david', 'jass']
    writer.writerow(['name','zip','score','zip1','score'])
    for i in lista:
        for row in list(reader1):
            rev = row[1]
            if i in rev:
                score = row[2]
                zip = row[0]
            else:
                score = 'x'
                zip = 'x'
        for row in list(reader2):
            rev = row[1]
            if i in rev:
                score1 = row[2]
                zip1 = row[0]
            else:
                score1 = 'x'
                zip1 = 'x'
        writer.writerow([i, score, zip, score1, zip1])
This code is not working as expected. This is the output I got using this code.
name zip score zip1 score1
harry x x x x
rob x x x x
wine x x x x
david x x x x
jass x x x x
Even though there are many common words, only 'x' gets printed in the final merged csv file. I think the problem is with the loops, but I can't seem to figure out the issue.
First, the first call of list(readerX) exhausts the iterator that is the file handle.
Secondly, rev is supposed to be the name already, so check for equality not contains: if name == rev.
Thirdly, you'd mostly get 'x's except for the last names in each file since you iterate the files to the end and only the last row will really matter. You should break the inner loops as soon as you find a name, but set the default values only after you iterated the entire file without finding the name.
Also, it is very bad performance-wise to repeatedly iterate both files. You better load the two files into a permanent data structure with faster lookup like a nested dict with names as keys:
d1 = {row[1]: {'zip': row[0], 'score': row[2]} for row in reader1}
d2 = {row[1]: {'zip': row[0], 'score': row[2]} for row in reader2}
# e.g. d1 == {'david': {'zip': '23431', 'score': '12'}, ...}
for name in lista:
    if name in d1 or name in d2:
        writer.writerow([
            name,
            d1.get(name, {}).get('zip', 'x'),
            d1.get(name, {}).get('score', 'x'),
            d2.get(name, {}).get('zip', 'x'),
            d2.get(name, {}).get('score', 'x'),
        ])
To make your own approach work, change as follows, but note that this has terrible performance for larger data because of the nested loops:
# next(reader1)  # skip the header line if necessary
lst1 = list(reader1)  # load all the data into a list beforehand ...
for i in lista:
    for row in lst1:  # ... that you can repeatedly iterate
        rev = row[1]
        if i == rev:  # compare for equality
            score = row[2]
            zip = row[0]
            break  # <- you found the name, so end the loop!
    else:  # note the indentation: this is a for-else-loop, not an if-else
        # the else-part is only executed if the for loop was NOT break'ed
        score = 'x'
        zip = 'x'
Don't you think it would be better to read the two files into a single nested dictionary, where the names would be the keys and the values would be dictionaries with the keys 'zip', 'zip1', 'score' and 'score1'?
{'hary':
    {'zip1': 33478,
     'zip': 33441,
     'score': 23,
     'score1': 23}
}
Then iterate through the list and print 'x' for whichever keys are not present.
The errors in the above code:
The first call of list(reader1) exhausts the iterator that is the file handle, so when the next iteration over lista starts, reader1 is empty.
Instead of repeatedly iterating over the file handle, store the data in a list or dictionary.
When i matches rev in if (i in rev), you continue to iterate over the rest of the file, which resets zip and score to 'x', because on the next row "i in rev" is False. To rectify this, remove the else part and instead set zip and score to 'x' just after for i in lista:
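Putting the points from both answers together, a complete run might look like the sketch below. It is only an illustration under stated assumptions: the two files are assumed to be comma-separated with a header row, in-memory strings (FILE1, FILE2) stand in for file1.csv and file2.csv, and the helper name load is made up. Each file is read once into a dictionary keyed by name, and every name in lista gets a row, with 'x' wherever a file has no entry for it.

```python
import csv
import io

# Stand-ins for file1.csv and file2.csv (assumed comma-separated with a header).
FILE1 = "zip,name,score\n23431,david,12\n23231,rob,45\n33441,hary,23\n98901,rgrg,55\n"
FILE2 = ("zip1,name1,score1\n23433,david,12\n23245,stel,45\n"
         "33478,hary,23\n98988,rob,55\n12121,jass,33\n")
lista = ['harry', 'rob', 'wine', 'david', 'jass']

def load(text):
    """Read one file's rows into {name: (zip, score)}, skipping the header."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return {row[1]: (row[0], row[2]) for row in reader}

d1, d2 = load(FILE1), load(FILE2)
missing = ('x', 'x')  # placeholder for names absent from a file

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(['name', 'zip', 'score', 'zip1', 'score1'])
for name in lista:
    # exact-match dictionary lookups replace the nested loops over both files
    writer.writerow([name, *d1.get(name, missing), *d2.get(name, missing)])

merged = out.getvalue().splitlines()
print("\n".join(merged))
```

With real files, opening them with newline='' (as the csv module documentation recommends) and passing the file objects to csv.reader and csv.writer should give the same rows.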
