Search keys and their values using defaultdict - Python

I am new to Python, so correct me if this is not the best/fastest way of doing this. I have created a dictionary with multiple values assigned to each key. In codonDict I have included only one key with a few of its values (there will be a lot more). I also have a file, which I have called calls here. What I want to do is find the key that corresponds to the #CHROM in the file and then search through that key's values to see whether they contain the corresponding POS.
codonDict = defaultdict(<type 'list'>, {'HE667775': [106690, 106692, 106694, 106696, 106698, 106700, 106702, 106704, 106706, 106708, 106710, 106712, 106714, 106716, 106718, 106720, 106722, 106724, 106726, 106728, 106730, 106732, 106734, 106736, 106738, 106740, 106742, 106744, 106746, 106748, 106750, 106752, 106754, 106756, 106758, 106760, 106762, 106764, 106766, 106768, 106770, 106772, 106774, 106776, 106778, 106780, 106782, 106784, 106786, 106788, 106790, 106792, 106794, 106796, 106798, 106800, 106802, 106804, 106806, 106808, 106810, 106812, 106814, 106816, 106818, 106820, 106822, 106824, 106826, 106828, 106830, 106832, 106834, 106836]})
calls file:
#CHROM POS
HE667775 106824
HE667775 24
So from this sample data the desired output would be the line HE667775 106824, which gets appended to test.
What I have tried:
test = []
line = calls.readline()
while len(line) > 1:
    #for line in calls:
    objects = line.split()
    pos = int(objects[1])
    chrom = objects[0]
    #if scaf in codonDict and pos associated with that key
    for scaf, position in codonDict.itervalues():
        if pos == position and chrom in scaf:
            test.append(line)
print test
Error:
ValueError: too many values to unpack
Edit:
This is the complete error traceback. The line numbers differ, however, so line 28 corresponds, I believe, to pos = int(objects[1]) in the code above.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 28, in main
ValueError: too many values to unpack

To check whether the pos from your file is in codonDict no loop is required; you can use Python's in operator to test for membership:
pos in codonDict[chrom]
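A minimal sketch of that membership test applied to the sample data from the question. The shortened position list and the in-memory calls_lines list (standing in for the opened calls file) are assumptions for illustration, and the snippet is written to run on Python 3.

```python
from collections import defaultdict

# Stand-in for the question's codonDict (assumed data: even positions only).
codonDict = defaultdict(list)
codonDict['HE667775'].extend(range(106690, 106838, 2))

# Stand-in for the lines of the calls file.
calls_lines = ["#CHROM POS", "HE667775 106824", "HE667775 24"]

test = []
for line in calls_lines:
    if line.startswith('#'):      # skip the header row
        continue
    chrom, pos = line.split()
    # .get avoids creating empty entries for unknown chromosomes,
    # which codonDict[chrom] would do on a defaultdict.
    if int(pos) in codonDict.get(chrom, ()):
        test.append(line)

print(test)
```

If the position lists get long, storing each one as a set instead of a list makes every lookup constant time rather than a scan of the list.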

I don't know exactly what your code is doing, but I'm pretty sure you get the ValueError because of this line of code:
for scaf, position in codonDict.itervalues():
itervalues gives you an iterator over the values of your dictionary. In your case each value is a list with many elements, so it cannot be unpacked into the two variables scaf and position.
Try it this way and there should be no ValueError anymore:
for val in codonDict.itervalues():
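The unpacking failure can be reproduced in isolation. This is a small sketch with a shortened, assumed dictionary; it uses values() and items(), the Python 3 spellings of itervalues() and iteritems().

```python
codon_dict = {'HE667775': [106690, 106692, 106694]}

# Each value is a whole list, so unpacking it into two names fails
# as soon as the list does not have exactly two elements.
try:
    for scaf, position in codon_dict.values():
        pass
    error = None
except ValueError as exc:
    error = str(exc)

# items() yields (key, value) pairs, which do unpack into two names.
pairs = [(scaf, len(positions)) for scaf, positions in codon_dict.items()]

print(error)
print(pairs)
```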

To check if chrom is in codonDict just use in like dm03514 wrote before. I could imagine something like this with codonDict as an ordinary dictionary:
def find(chrom, pos):
    if chrom in codonDict:
        values = codonDict[chrom]
        if pos in values:
            print "%i in %s" % (pos, chrom)
        else:
            print "%i not in %s" % (pos, chrom)
    else:
        print "chrom %s not found" % chrom

Related

Code works for a small selection but not the entire database

I have a similar problem to this one.
I am working in QGIS. To speed things up, I've created a small selection of my map on which I test my code. It works great. Here is the section that causes the problem later:
layer = qgis.utils.iface.activeLayer()
iter = layer.getFeatures()
dict = {}
#iterate over features
for feature in iter:
    #print feature.id()
    geom = feature.geometry()
    coord = geom.asPolyline()
    ### GET FIRST AND LAST POINTS OF POLY + N ORIENTATION###
    # Get Objective Orientation
    d=QgsDistanceArea()
    d.setEllipsoidalMode(True)
    points=geom.asPolyline()
    #second way to get Endpoints
    first = points[0]
    last = points[-1]
    r=d.bearing(first, last)
    b= "NorthOrientation= %s" %(math.degrees(r))
    # Assemble Features
    dict[feature.id() ]= [first, last]
### KEY = INTERSECTION, VALUES = COMMONPOINTS###
dictionary = {}
a = dict
for i in a:
    for j in a:
        c = set(a[i]).intersection(set(a[j]))
        if len(c) == 1:
            d = set(a[i]).difference(c)
            c = list(c)[0]
            value = list(d)[0] #This is where the problem is
            if c in dictionary and value not in dictionary[c]:
                dictionary[c].append(value)
            elif c not in dictionary:
                dictionary.setdefault(c, [])
                dictionary[c].append(value)
            else: pass
print dictionary
This code works for the 10 polylines of my small selection (which I've stored in a separate shapefile). But when I try to run it through the 40 000 lines of my original database, I get the following error:
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "c:/users/16116/appdata/local/temp/tmp96wd24.py", line 47, in <module>
value = list(d)[0]
IndexError: list index out of range
A few things:
This code stems from a first question that you can find here. I'm still pretty new to Python, so to be honest I have a hard time understanding how this exact part of the code works, but I know it does (at least for the small dataset).
The structure of the small "test selection" is identical to that of the entire database. Only the length has changed.
If anyone has had the same experience or knows why this problem occurs, I would be very grateful for any pointers.
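The traceback says that d was empty when list(d)[0] ran. One way that can happen, offered here as an assumption about the data rather than a confirmed diagnosis, is a polyline whose first and last points coincide (a closed or degenerate line): its endpoint set then holds a single point, so removing the shared point leaves nothing. The sketch below uses plain tuples as hypothetical stand-ins for QGIS points and runs on Python 3.

```python
# Hypothetical endpoints keyed by feature id; tuples stand in for QGIS points.
endpoints = {
    1: [(0, 0), (1, 0)],   # ordinary line
    2: [(1, 0), (1, 0)],   # closed or degenerate line: first == last
}

failures = []
for i in endpoints:
    for j in endpoints:
        common = set(endpoints[i]).intersection(endpoints[j])
        if len(common) == 1:
            rest = set(endpoints[i]).difference(common)
            if not rest:
                # list(rest)[0] would raise IndexError for this pair
                failures.append((i, j))

print(failures)
```

If this is what happens in the full database, checking that the difference is non-empty before indexing it would avoid the exception; whether such features should be skipped or handled differently depends on the data.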

KeyError with Python dictionary

I've been practicing on a ><> (Fish) interpreter and am stuck on an error I'm getting. The problematic code seems to be here:
import sys
from random import randint
file = sys.argv[1]
code = open(file)
program = code.read()
print(str(program))
stdin = sys.argv[2]
prgmlist = program.splitlines()
length = len(prgmlist)
prgm = {}
for x in range(0,length-1):
    prgm[x+1] = list(prgmlist[x])
The goal here was to take the code and put it into a sort of grid, so that each command could be taken and computed separately. By grid, I mean a map to a list:
{line1:["code","code","code"]
line2:["code","code","code"]
line3:...}
and so on.
However, when I try to retrieve a command using cmd = prgm[y][x] it gives me KeyError: 0.
Any help is appreciated.
Here's a traceback:
Traceback (most recent call last):
File "/Users/abest/Documents/Python/><>_Interpreter.py", line 270, in <module>
cmd = prgm[cmdy][cmdx]
KeyError: 0
And a pastebin of the entire code.
The input is the hello world program from the wiki page:
!v"hello, world"r!
>l?!;o
Few issues -
You are not considering the last line, since your loop is for x in range(0,length-1): and the stop argument of range is exclusive, so x never reaches length-1. You do not actually need the length or range at all; you can simply use for i, x in enumerate(prgmlist): . In each iteration enumerate() returns the index as well as the current element.
for i, x in enumerate(prgmlist, 1):
    prgm[i] = list(x)
Secondly, from your actual code it seems you are defining cmdx initially as 0, but in your for loop (as given above) the dictionary keys start from 1. So you should start that index at 1. Example -
stacks, str1, str2, cmdx, cmdy, face, register, cmd = {"now":[]}, 0, 0, 1, 0, "E", 0, None
And you should start cmdy from 0. It seems you had the two reversed.
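A short sketch contrasting the two loops on the hello world program from the question (as it appears above). It runs on Python 3; the names short and row are introduced here for illustration only.

```python
program = '!v"hello, world"r!\n>l?!;o'
prgmlist = program.splitlines()

# range(0, length - 1) stops one short, so the last line is never stored.
short = {}
for x in range(0, len(prgmlist) - 1):
    short[x + 1] = list(prgmlist[x])

# enumerate(..., 1) visits every line and numbers the rows from 1.
prgm = {}
for i, row in enumerate(prgmlist, 1):
    prgm[i] = list(row)

print(sorted(short))   # keys stored by the original loop
print(sorted(prgm))    # keys stored by the enumerate loop
print(0 in prgm)       # key 0 is never created, hence KeyError: 0
print(prgm[1][0])      # dictionary keys start at 1, list indexes at 0
```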
You'll want to use something like
cmd = prgm[x][y]
The first part, prgm[x], accesses the list that is the value for key x in the dictionary; then [y] pulls the y-th element from that list.

Default Dict append Attribute Error 'float' object has no attribute 'append'

I have read all the script from default dict and all the posts on here. I believe my syntax is correct.
influenceDict = defaultdict(list)
to fill with all tags from all tweets
Later, I am appending a lot of float values, 1000+ list entries for a majority of dictionary keys. I get my error on line 47, marked below.
def addInfluenceScores(hashtagArr,numFollowers,numRetweets, influenceDict):
    influenceScore = float(0)
    if numFollowers == 0 and numRetweets != 0:
        influenceScore = numRetweets + 1
    elif numFollowers == 0 and numRetweets == 0:
        influenceScore = 0
    else:
        influenceScore = numRetweets / numFollowers
    print "Adding influence score %f to individual hashtags" % (influenceScore)
    for tag in hashtagArr:
        tID = tag2id_map[tag]
        print "Appending ",tID,tag
        # if not influenceDict.has_key(tID):
        #     influenceDict[tID] = list()
        #     influenceDict[tID].append(influenceScore)
        # else:
        #     influenceDict[tID].append(influenceScore)
        influenceDict[tID].append(influenceScore) # LINE 47: I GET THE ERROR HERE
    for i in range(len(hashtagArr)):
        for j in range(i+1, len(hashtagArr)):
            tID1 = tag2id_map[hashtagArr[i]]
            tID2 = tag2id_map[hashtagArr[j]]
            if(tID2 < tID1): #ensure alpha order to avoid duplicating (a,b) and (b,a)
                temp = tID1
                tID1 = tID2
                tID2 = temp
            print "Appending ",tID1, hashtagArr[j],tID2,hashtagArr[i]
            # if not influenceDict.has_key((tID1, tID2)):
            #     influenceDict[(tID1, tID2)] = list()
            #     influenceDict[(tID1, tID2)].append(influenceScore)
            # else:
            #     influenceDict[(tID1, tID2)].append(influenceScore)
            influenceDict[(tID1, tID2)].append(influenceScore)
The program runs for a while, and it actually does append values (or so I think) and then I get this error:
Traceback (most recent call last):
File "./scripts/make_id2_influencescore_maps.py", line 158, in <module>
processTweets(tweets, influenceDict)
File "./scripts/make_id2_influencescore_maps.py", line 127, in processTweets
addInfluenceScores(hashtags, numFollowers,numRetweets, influenceDict)
File "./scripts/make_id2_influencescore_maps.py", line 47, in addInfluenceScores
influenceDict[tID].append(influenceScore)
AttributeError: 'float' object has no attribute 'append'
I am thinking that the list is just maxed out in memory. Maybe you guys can see something I don't. I am trying to loop through a file of tweets, and every time I see a hashtag I want to append a score to the list associated with it. That way I can just take the average of all the scores in the list when I am completely done reading the file. Thanks in advance.
"I am thinking that the list is just maxed out in memory."
I can assure you that's not the case if your error is
AttributeError: 'float' object has no attribute 'append'
The problem is not in the code you have shown here. Since influenceDict is a parameter, you have evidently set one of its keys to a float value elsewhere in your code. Being a defaultdict(list) doesn't prevent this from happening.
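A minimal sketch of how a defaultdict(list) can end up holding a float. The keys tag_a and tag_b are hypothetical, and the direct assignment stands in for whatever code elsewhere in the program stores a bare number under a key.

```python
from collections import defaultdict

influence = defaultdict(list)

# A missing key gets a fresh list from the default factory, then append works.
influence['tag_a'].append(0.5)

# Plain assignment bypasses the factory and stores the float itself.
influence['tag_b'] = 0.25

try:
    influence['tag_b'].append(0.75)
    error = None
except AttributeError as exc:
    error = str(exc)

print(error)
```

The default factory only runs when a missing key is looked up; it never checks or converts values that are assigned explicitly.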

TypeError: coercing to Unicode: need string or buffer, Cell found

I am trying to use xlrd to load some records from Excel and check their relationships.
Please refer to my amateur code for more details:
import xlrd
feature_list_file = xlrd.open_workbook('FeatureList.xls')
feature_table = feature_list_file.sheet_by_index(0)
num_feature_rows = feature_table.nrows
num_feature_cols = feature_table.ncols
feature_list = []
for i in range(num_feature_rows):
    feature_list.append(feature_table.cell(i, 1))
# print feature_list
issue_list_file = xlrd.open_workbook('IssueList.xls')
issue_table = issue_list_file.sheet_by_index(0)
num_issue_rows = issue_table.nrows
num_issue_cols = issue_table.ncols
epic_list = []
for i in range(num_issue_rows):
    if issue_table.cell(i, 0).value == 'Epic':
        epic_list.append(issue_table.cell(i, 1).value)
# print epic_list
def check_link(actual_link, parent_list):
    result = True
    for i in range(parent_list.__len__()):
        count = 0
        if parent_list[i] in actual_link:
            count += 1
        if count > 1:
            result = False
            break
    return result
invalid_list = []
for i in range(num_issue_rows):
    if issue_table.cell(i, 10).value == '':
        invalid_list.append(issue_table.cell(i, 1).value)
    else:
        if issue_table.cell(i, 0).value == 'Story':
            if check_link(issue_table.cell(i, 10).value, epic_list):
                invalid_list.append(issue_table.cell(i, 1).value)
        if issue_table.cell(i, 0).value == 'Epic':
            if check_link(issue_table.cell(i, 10).value, feature_list):
                invalid_list.append(issue_table.cell(i, 1).value)
print invalid_list
However, it always fails with the following traceback:
Traceback (most recent call last):
File "/Users/sut/PycharmProjects/ItemChecker/JiraItemChecker.py", line 54, in <module>
if check_link(issue_table.cell(i, 10).value, feature_list):
File "/Users/sut/PycharmProjects/ItemChecker/JiraItemChecker.py", line 36, in check_link
if parent_list[i] in actual_link:
TypeError: coercing to Unicode: need string or buffer, Cell found
How could I resolve this issue?
Thanks
You're trying to do an in on an xlrd Cell object. I think you need to change
if parent_list[i] in actual_link:
to
if parent_list[i].value in actual_link:
Adding some more details to my answer based on your comment:
Your call is passing what appears to be a string and a list of cells.
The string is fine, but in your function you iterate over the list and try to compare each Cell instance to the correctly passed string, and there's your problem.
Hope this helps (and I'm not misreading the code!)
And some more details:
Here's the crux of the problem: when you create feature_list and epic_list you do it in two different ways. For the first you append a cell to the list; for the second you append a value. But then you use the same check_link function on both. So either you need to extend check_link to handle both types, or you need to be consistent and pick one.
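The "handle both types" option could be sketched as below. Everything here is illustrative: the Cell class is a hypothetical stand-in that models only the .value attribute of an xlrd cell, and cell_text and count_links are made-up helper names, not part of xlrd or of the original check_link.

```python
class Cell(object):
    """Hypothetical stand-in for an xlrd cell; only .value is modelled."""
    def __init__(self, value):
        self.value = value


def cell_text(item):
    # Accept either a cell-like object or an already extracted value.
    return getattr(item, 'value', item)


def count_links(actual_link, parent_list):
    """Count how many parents are mentioned in actual_link."""
    return sum(1 for parent in parent_list if cell_text(parent) in actual_link)


feature_list = [Cell('FEAT-1'), Cell('FEAT-2')]   # cells, as in the question
epic_list = ['EPIC-1', 'EPIC-2']                   # plain values

features_found = count_links('relates to FEAT-2', feature_list)
epics_found = count_links('relates to EPIC-1 and EPIC-2', epic_list)
print(features_found, epics_found)
```

The simpler alternative is the "be consistent" route: append the cell's .value when building feature_list, exactly as is already done for epic_list, so that check_link only ever sees strings.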

Int Object Is Not Iterable

I have come across a problem that I don't know how to resolve involving Dijkstra's algorithm - here is my code:
infinity = 1000000
invalid_node = -1
#startNode = 0
class Node:
    distFromSource = infinity
    previous = invalid_node
    visited = False
def populateNodeTable():
    nodeTable = []
    f = open("twoDArray.txt", "r")
    for line in f.readlines(): #get number of nodes from file
        nodeTable.append(line.split(',')) # Create matrix of weights
    numNodes = len(nodeTable) # Count nodes
    print numNodes
    #for all nodes in text file, set visited to false, distFromSource to infinity & predecessor to none
    **for i in numNodes:
        nodeTable.append(Node(i))**
    #nodeTable.append(Node())
    nodeTable[startNode].distFromSource = 0
    print nodeTable
if __name__ == "__main__":
    populateArray()
    populateNodeTable()
When I run this code I get the following error:
Traceback (most recent call last):
File "2dArray.py", line 63, in <module>
populateNodeTable()
File "2dArray.py", line 18, in populateNodeTable
for i in numNodes:
TypeError: 'int' object is not iterable
I am not sure how to rectify this error (the section between the asterisks). What I am trying to do is read my text file, which is just a series of integers separated by commas, and count the number of nodes in it.
Each node will then be assigned the values in the Node class.
Try this:
for i in nodeTable:
Why are you trying to iterate over numNodes? You defined it just one line above as the length of the table.
But appending to the same table in the loop doesn't make sense, and it does not work together with the code that reads the file. Also, the Node class isn't usable at all ...
How about for i in range(numNodes)? numNodes is just a number, not a sequence of numbers, which is what you are after.
If you want to iterate over the element indexes, use for i, _ in enumerate(nodeTable)
If you want to access the element itself, too, use a real name instead of _
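The suggestions above can be compared side by side. This sketch runs on Python 3 and uses a small, assumed weight matrix in place of the contents of twoDArray.txt.

```python
# Assumed stand-in for the matrix read from the text file.
node_table = [['0', '2', '5'], ['2', '0', '1'], ['5', '1', '0']]
num_nodes = len(node_table)

# Looping over the integer itself reproduces the error from the question.
try:
    for i in num_nodes:
        pass
    error = None
except TypeError as exc:
    error = str(exc)

# range() turns the count into a sequence of indexes.
indexes = [i for i in range(num_nodes)]

# enumerate() gives the index and the row together.
rows = [(i, row[0]) for i, row in enumerate(node_table)]

print(error)
print(indexes)
print(rows)
```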
