Python: Dictionary of Lists error - python

I have this code:
def PlayerStats(self,LSTPlayers,actualround):
    count = 0
    statDict = defaultdict(list)
    Stats = CStats.CStats('Cricket',self.Logs) #create instance
    statDict = Stats.WritetoDict()
    for objplayer in LSTPlayers:
        if Stats.FindPlayerIndex(statDict,"Player",objplayer.PlayerName) == -1: #if there is no player in the stats csv file
            self.Logs.Log("ERROR","Cannot find {} in the stats file! I will add them.".format(objplayer.PlayerName))
            for key in statDict: count += 1
            self.Logs.Log("ERROR","Count of keys is: {}".format(count))
            statDict[count+1].append(objplayer.PlayerName,objplayer.GetTotalTouch,actualround,Stats.MPR(objplayer.PlayerName))
        else:
            self.Logs.Log("DEBUG","Updating {}'s stats!".format(objplayer.PlayerName))
And I keep getting the error: IndexError: list assignment index out of range.
In line 5, statDict is being updated with a defaultdict(list) from another class that reads a CSV file and returns the dictionary of lists.
Line 7 looks for the player name in the dictionary; if it doesn't find the name it returns -1, otherwise it returns the index of the player.
I was looking to add a new list to the dictionary of lists with the player's information if the player wasn't in there already.
Update: here is my code for the WritetoDict function in the CStats file:
def WritetoDict(self):
    with open(self.pathfile) as f:
        self.PlayerStatDict = [{k: str(v) for k, v in row.items()}
                               for row in csv.DictReader(f, skipinitialspace=True)]
    return self.PlayerStatDict
PlayerStatDict is defined as a defaultdict(list) in the CStats file.
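A minimal sketch of what the update seems to imply: the list comprehension in WritetoDict rebinds PlayerStatDict to a plain list of row dictionaries, so indexing it at count+1 points past the end of the list. The sample CSV text, the column names, and the add_player helper below are assumptions made for illustration; they are not part of the original code.

```python
import csv
import io

# Hypothetical CSV content standing in for the stats file.
sample_csv = "Player,Touches,Round,MPR\nAlice,12,3,1.5\n"

# Same comprehension shape as WritetoDict: the result is a list, one dict per row.
rows = [{k: str(v) for k, v in row.items()}
        for row in csv.DictReader(io.StringIO(sample_csv), skipinitialspace=True)]

print(type(rows).__name__)  # list


def add_player(rows, name, touches, round_no, mpr):
    """Append a new row dict if no existing row has this player name (hypothetical helper)."""
    if not any(row["Player"] == name for row in rows):
        rows.append({"Player": name, "Touches": str(touches),
                     "Round": str(round_no), "MPR": str(mpr)})
    return rows


add_player(rows, "Bob", 7, 3, 2.0)    # new player, so a row is appended
add_player(rows, "Alice", 9, 3, 1.0)  # already present, so nothing changes
```

Appending to the list grows it by one element, which avoids indexing a position that does not exist yet.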

Related

TF-IDF calculation KeyError

I want to calculate document frequencies of text documents. First I created the term dictionary and calculated the term frequencies. I have no problems in these steps, but when I try to use the function below it gives an error:
def computeDF(docList):
    df = {}
    df = dict.fromkeys(docList[0].keys(), 0)
    for doc in docList:
        for word, val in doc.items():
            if val > 0:
                df[word] += 1
    for word, val in df.items():
        df[word] = float(val)
    return df
Called the function like this:
dictList = []
for i in range(N):
    # creating dictionary for all documents
    tokens = processed_text[i]
    dictionary = dict.fromkeys(tokens,0)
    # calculation of term frequencies for all documents
    for word in tokens:
        dictionary[word] += 1
    tf = termFreq(dictionary, tokens)
    dictList.append(dictionary)
df = computeDF(dictList)
I called the function with a list of 10 dictionaries, because it expects a list object.
N = 10 (number of documents)
Error:
line 155, in <module> df = computeDF(dictList)
line 134, in computeDF df[word] += 1
KeyError: 'flagstaff'
It works when I try the function in a different Python file with the same object types. I don't understand what the problem is. How can I solve this?
Where you have df = dict.fromkeys(docList[0].keys(), 0) you need something like
keys = set()
for doc in docList:
    keys = keys.union(set(doc.keys()))
df = dict.fromkeys(keys, 0)
That way you have keys for all your docs, not just the first one. If you want to do it in one line, you can build the key set like this:
keys = set().union(*[set(doc.keys()) for doc in docList])
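Putting the fix together, a small self-contained sketch might look like the following. The function name compute_df and the two sample documents are made up for illustration, and the final float conversion from the question is left out for brevity.

```python
def compute_df(doc_list):
    """Count, for each word, how many documents contain it at least once."""
    # Build the vocabulary from every document, not only the first one.
    vocabulary = set().union(*(doc.keys() for doc in doc_list))
    df = dict.fromkeys(vocabulary, 0)
    for doc in doc_list:
        for word, count in doc.items():
            if count > 0:
                df[word] += 1
    return df


# Hypothetical term-count dictionaries for two documents.
docs = [
    {"grand": 2, "canyon": 1},
    {"flagstaff": 1, "canyon": 3},
]
result = compute_df(docs)
print(result)
```

Because 'flagstaff' only appears in the second document, seeding the keys from docList[0] alone is what triggers the KeyError in the question.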

sort a dictionary by the first letter of its keys

I wrote code that reads a whole DNA genome and returns a dictionary of all the 8-primers with their locations. I want to loop through this dictionary and sort these codons into 4 other dictionaries based on the letter they start with: A, T, G or C.
But I couldn't figure out how to check the first letter of each key.
This is my code:
"""
Generating all the possible 8-codon primers.
saving them in a text file with their locations.
"""
import csv
##MAIN FUNCTION:
def k_mer(Text, k):
    dictionary = {}
    for i in range (len(Text) - k + 1):
        if(Text[i: i+k] in dictionary):
            dictionary[Text[i: i+k]].append(i)
        else:
            dictionary[Text[i: i+k]] = [i]
    return dictionary
##INPUT:
# open the file with the original sequence
myfile = open('Vibrio_cholerae.txt')
# set the file to the variable Text to read and scan
Text = myfile.read()
result = k_mer(Text.strip(), 8)
with open("result.txt","w") as f:
    from collections import Counter
    wr = csv.writer(f,delimiter=":")
    wr.writerows(Counter(result).items())
Given your dictionary, it's just this. It's not a complicated problem.
groupby = {'A':{}, 'C':{}, 'G':{}, 'T':{} }
for k,v in dictionary.items():
    groupby[k[0]][k] = v
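As a rough end-to-end illustration, the grouping can be tried on a short made-up sequence. The k_mer variant below uses setdefault instead of the if/else from the question, the function name group_by_first_base is hypothetical, and the sketch assumes the sequence contains only the upper-case letters A, C, G and T (anything else would raise a KeyError).

```python
def k_mer(text, k):
    """Map every k-length substring of text to the list of positions where it starts."""
    positions = {}
    for i in range(len(text) - k + 1):
        positions.setdefault(text[i:i + k], []).append(i)
    return positions


def group_by_first_base(kmers):
    """Split a k-mer dictionary into four dictionaries keyed by the first letter."""
    groups = {base: {} for base in "ACGT"}
    for kmer, locations in kmers.items():
        groups[kmer[0]][kmer] = locations
    return groups


kmers = k_mer("ATGATGC", 3)  # short toy sequence instead of the genome file
groups = group_by_first_base(kmers)
print(groups["A"])
print(groups["T"])
```

The key point is that kmer[0] gives the first letter of each key, which is all that is needed to choose the target dictionary.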

Sorting algorithm help in python

I've been playing around with a program that takes in information from two files and then writes the information out to a single file in sorted order.
So what I did was store each line of the file as an element in a list. I created another function that splits each element into a 2D array where I can easily access the name variables. From there I want to create a nested for loop that, as it iterates, checks for the highest value in the array, removes that value from the list, and appends it to a new list until there's a sorted list.
I think I am about 90% of the way there, but I am having trouble wrapping my head around the logic of sorting algorithms. It seems like the problem just keeps getting more complex, and I keep wanting to use pointers. If someone could help shine some light on the subject I would greatly appreciate it.
import os
from http.cookiejar import DAYS
from macpath import split
# This program reads a given input file and finds its longest line.
class Employee:
    def __init__(self, EmployeeID, name, wage, days):
        self.EmployeeID = EmployeeID
        self.name = name
        self.wage = wage
        self.days = days
def Extraction(file,file2):
    employList = []
    while True:
        line1 = file.readline().strip()
        line2 = file2.readline().strip()
        #print(type(line1))
        employList.append(line1)
        #print(line1)
        employList.append(line2)
        #print(line2)
        if line1 == '' or line2 == '':
            break
    return employList
def Sort(mylist):
    splitlist = []
    sortedlist = []
    print(len(mylist))
    for items in range(len(mylist)):
        #print(mylist[items].split())
        splitlist.append(mylist[items].split())
    print(splitlist)
    #print(splitlist[1][1])
    #print(splitlist[1][2])
    highest = "z"
    print(highest)
    sortingLength = len(splitlist)
    for i in range(10):
        for items in range(len(splitlist)-2):
            if highest > splitlist[items][2]:
                istrue = highest < splitlist[items][2]
                highest = splitlist[items][1]
                print(items)
                print(istrue)
                print('marker')
                print(splitlist[items][2])
            if items == (len(splitlist)-2):
                print("End of list",splitlist[items][2])
                print(highest)
                print(splitlist.index(highest))
    print(splitlist[len(splitlist)-1][2])
    print(sortingLength)
fPath = 'C:/Temp'
fileName = 'payroll1.txt'
fullFileName = os.path.join(fPath,fileName)
fileName2 = 'payroll2.txt'
fullFileName2 = os.path.join(fPath,fileName2)
f = open(fullFileName,'r')
f2 = open(fullFileName2, 'r')
employeeList = Extraction(f,f2)#pulling out each line in the file and placing into a list
Sort(employeeList)
ReportName= "List of Employees:"
marker = '-'* len(ReportName)
print (ReportName + ' \n' + marker)
total = 0
f.close()
I am having trouble, once I have the highest value, with appending that value to sortedlist, removing it from splitlist, and rerunning the code.
Using the built-in sorted function is much easier, per Joran's suggestion. I've edited your reading function so that it builds two lists of tuples, each holding the line number, the length of the line, and a marker for the source file. sorted returns a new list ordered by the key (line length) in descending order (reverse=True).
import os
from operator import itemgetter
class Employee:
    def __init__(self, EmployeeID, name, wage, days):
        self.EmployeeID = EmployeeID
        self.name = name
        self.wage = wage
        self.days = days
def Extraction(file,file2):
    employList = []
    mylines = [(i, len(l.strip()), 'file1') for i,l in enumerate(file.readlines())]
    mylines2 = [(i, len(l.strip()), 'file2') for i,l in enumerate(file2.readlines())]
    employList = [*mylines, *mylines2]
    return employList
fPath = 'C:/Temp'
fileName = 'payroll1.txt'
fullFileName = os.path.join(fPath,fileName)
fileName2 = 'payroll2.txt'
fullFileName2 = os.path.join(fPath,fileName2)
f = open(fullFileName,'r')
f2 = open(fullFileName2, 'r')
employeeList = Extraction(f,f2)#pulling out each line in the file and placing the line_number and length into a list
f.close()
f2.close()
# Itemgetter will sort on the second element of the tuple, len(line)
# and reverse will put it in descending order
ReportName = sorted(employeeList, key=itemgetter(1), reverse=True)
EDIT: I've added markers in the tuples so that you can keep track of what lines came from what file. Might be a bit confusing without them
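For comparison, the approach described in the question (find the highest value, remove it, append it to a new list, repeat) can be sketched next to sorted with a key. The payroll line layout "ID name wage days", the sample lines, and the helper name selection_sort_desc are assumptions for illustration, since the input files are not shown.

```python
def selection_sort_desc(rows, column):
    """Repeatedly move the row with the highest value in `column` to a new list."""
    remaining = list(rows)  # work on a copy so the caller's list is untouched
    ordered = []
    while remaining:
        highest = max(remaining, key=lambda row: row[column])
        remaining.remove(highest)
        ordered.append(highest)
    return ordered


# Hypothetical payroll lines: employee ID, name, wage, days.
lines = ["101 Smith 15.50 5", "102 Adams 12.00 4", "103 Young 20.00 3"]
rows = [line.split() for line in lines]

by_hand = selection_sort_desc(rows, 1)  # sort on the name column
built_in = sorted(rows, key=lambda row: row[1], reverse=True)

print([row[1] for row in by_hand])
```

Both produce the same ordering here; the hand-written loop is mainly useful for understanding the algorithm, while sorted is shorter and faster.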

python: recursive dictionary of dictionary

I need help with a pretty simple exercise I am trying to execute; I'm just a bit lost syntactically.
Basically, I read in a very brief text file containing 15 lines of 3 elements (essentially 2 keys and a value)
and put those elements into a dictionary composed of dictionaries.
The outer dictionary is keyed by location, and each inner dictionary maps the type of item to how much it costs. For example:
gymnasium weights 15
market cereal 5
gymnasium shoes 50
saloon beer 3
saloon whiskey 10
market bread 5
which would result in this
{
    'gymnasium': {
        'weights': 15,
        'shoes': 50
    },
    'saloon': {
        'beer': 3,
        'whiskey': 10
    }
}
and so on for the other keys
Basically, I need to loop through this file, but I'm struggling to read in the contents as a dict of dicts.
Moreover, without that portion I can't figure out how to add to the inner dictionary when its key already exists in the outer dictionary.
I would like to do this recursively.
location_dict = {} #row #name day weight temp
item_dict = {}
for line in file:
    line = line.strip()
    location_dict[item_dict['location'] = item_dict
This is a good use for setdefault (or defaultdict):
data = {}
for line in file:
    key1,key2,value = line.split()
    data.setdefault(key1,{})[key2] = value
print(data)
or, based on your comment,
from collections import defaultdict
data = defaultdict(lambda:defaultdict(int))
for line in file:
    key1,key2,value = line.split()
    data[key1][key2] += int(value)  # convert to int so costs can be summed
print(data)
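To see the nested defaultdict idea on the sample lines from the question, here is a small sketch. It reads from an in-memory list rather than a file, and the function name build_nested is made up for illustration.

```python
from collections import defaultdict

# The six sample lines quoted in the question.
sample_lines = [
    "gymnasium weights 15",
    "market cereal 5",
    "gymnasium shoes 50",
    "saloon beer 3",
    "saloon whiskey 10",
    "market bread 5",
]


def build_nested(lines):
    """Build {location: {item: cost}} from whitespace-separated lines."""
    data = defaultdict(lambda: defaultdict(int))
    for line in lines:
        location, item, cost = line.split()
        data[location][item] += int(cost)
    # Convert to plain dicts so the result prints like an ordinary nested dict.
    return {location: dict(items) for location, items in data.items()}


nested = build_nested(sample_lines)
print(nested["gymnasium"])
```

No recursion is needed for two fixed levels; a single loop over the lines is enough.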
Here is another solution.
yourFile = open("yourFile.txt", "r")
yourText = yourFile.read()
yourFile.close()
textLines = yourText.splitlines()
locationDict = {}
for line in textLines:
    if not line.strip():
        continue  # skip blank lines
    k1, k2, v = line.split()
    if k1 not in locationDict:
        locationDict[k1] = {}
    # Add the value whether or not the location was just created
    if k2 not in locationDict[k1]:
        locationDict[k1][k2] = int(v)
    else:
        locationDict[k1][k2] += int(v)
print(locationDict)
Hope it helps!

python loop dictionary value references updating all values

I am having a problem updating values in a dictionary in Python. I am trying to update a nested value (either an int or a list) for a single first-level key, but instead I update the values for all first-level keys.
I start by creating the dictionary:
kmerdict = {}
innerdict = {'endcover':0, 'coverdict':{}, 'coverholder':[], 'uncovered':0, 'lowstart':0,'totaluncover':0, 'totalbases':0}
for kmer in kmerlist: # build kmerdict
    kmerdict [kmer] = {}
    for chrom in fas: #open file and read line
        chromnum = chrom[3:-3]
        kmerdict [kmer][chromnum] = innerdict
Then I walk through the chromosomes (plain text files) listed in fas (not shown), taking 7-mer strings (k=7) as keys. If a key is in the list of keys I am looking for (kmerlist), I try to use it to reference a single value nested in the dictionary:
for chrom in fas: #open file and read line
    chromnum = chrom[3:-3]
    p = 0 #chromosome position counter
    thisfile = "/var/store/fa/" + chrom
    thischrom = open(thisfile)
    thischrom.readline()
    thisline = thischrom.readline()
    thisline = string.strip(thisline.lower())
    l=0 #line counter
    workline = thisline
    while(thisline):
        if len(workline) > k-1:
            thiskmer = ''
            thiskmer = workline[0:k] #read five bases
            if thiskmer in kmerlist:
                thisuncovered = kmerdict[thiskmer][chromnum]['uncovered']
                thisendcover = kmerdict[thiskmer][chromnum]['endcover']
                thiscoverholder = kmerdict[thiskmer][chromnum]['coverholder']
                if p >= thisendcover:
                    thisuncovered += (p - thisendcover)
                    thisendcover = ((p+k) + ext)
                    thiscoverholder.append(p)
                elif p < thisendcover:
                    thisendcover = ((p+k) + ext)
                    thiscoverholder.append(p)
                print kmerdict[thiskmer]
            p += 1
            workline = workline[1:]
        else:
            thisline = thischrom.readline()
            thisline = string.strip(thisline.lower())
            workline = workline+thisline
            l+=1
print kmerdict
But when I print the dictionary, all "thiskmer" levels are getting updated with the same values. I'm not very good with dictionaries, and I can't see the error of my ways, but they are profound! Can anyone enlighten me?
Hope I've been clear enough. I've been tinkering with this code for too long now :(
Confession: I haven't spent the time to figure out all of your code, only the first part. The first problem you have is in the setup:
kmerdict = {}
innerdict = {'endcover':0, 'coverdict':{}, 'coverholder':[], 'uncovered':0,
             'lowstart':0,'totaluncover':0, 'totalbases':0}
for kmer in kmerlist: # build kmerdict
    kmerdict [kmer] = {}
    for chrom in fas: #open file and read line
        chromnum = chrom[3:-3]
        kmerdict [kmer][chromnum] = innerdict
You create innerdict once and then proceed to use the same dictionary over and over again. In other words, every kmerdict[kmer][chromnum] refers to the same object. Perhaps changing the last line to:
kmerdict [kmer][chromnum] = copy.deepcopy(innerdict)
would help (with an appropriate import of copy at the top of your file)? Alternatively, you could just move the creation of innerdict into the inner loop as pointed out in the comments:
def get_inner_dict():
    return {'endcover':0, 'coverdict':{}, 'coverholder':[], 'uncovered':0,
            'lowstart':0,'totaluncover':0, 'totalbases':0}
kmerdict = {}
for kmer in kmerlist: # build kmerdict
    kmerdict [kmer] = {}
    for chrom in fas: #open file and read line
        chromnum = chrom[3:-3]
        kmerdict [kmer][chromnum] = get_inner_dict()
-- I decided to use a function to make it easier to read :).
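The aliasing can be reproduced in a few lines. This is a stripped-down sketch with a two-field inner dictionary and made-up keys, not the full structure from the question.

```python
import copy


def make_inner():
    """Return a fresh inner dictionary (reduced to two fields for the demo)."""
    return {"uncovered": 0, "coverholder": []}


keys = ["aaaaaaa", "ccccccc"]  # hypothetical k-mers

# Every key points at the SAME dictionary object.
template = make_inner()
shared = {key: template for key in keys}
shared["aaaaaaa"]["coverholder"].append(5)
shared_leak = shared["ccccccc"]["coverholder"]   # also sees the 5

# Every key gets its own deep copy, so updates stay separate.
fresh_template = make_inner()
copied = {key: copy.deepcopy(fresh_template) for key in keys}
copied["aaaaaaa"]["coverholder"].append(5)
copied_other = copied["ccccccc"]["coverholder"]  # still empty

print(shared_leak, copied_other)
```

A deep copy matters here because the inner dictionary itself contains a list and a dict; a shallow copy would still share those nested objects between keys.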
