Coercing Dict to Read List as Tuples - python

I have an existing dict that has keys but no values. I would like to populate the values by iterating over two lists at the same time like so:
for (pair,name) in enumerate(zip([[0,1],[0,2],[0,3],[1,2],[1,3],[2,3]], ['pair1','pair2','pair3','pair4','pair5','pair6'])):
    my_dict[tuple(name)] = pair
However I get the error: unhashable type: list.
So it seems my attempt to cast the list to a tuple doesn't work. I chose a tuple because, according to other posts I have read, it is the better way to go.
Can someone adjust this method to work as desired? I'm also open to other solutions.
Update
I will take the blame for not putting my whole function in the post. I thought being more concise would make things easier to understand, but in the end some important details were overlooked. Sorry for that. I'm working with numpy and sklearn. Here is my whole function:
pair_names = ['pair1','pair2','pair3','pair4','pair5','pair6']
pair_dict = {p:[] for p in pair_names}
for (pair,key) in zip([[0,1],[0,2],[0,3],[1,2],[1,3],[2,3]], ['pair1','pair2','pair3','pair4','pair5','pair6']):
    x = iris.data[:,pair]
    y = iris.target
    clf = DecisionTreeClassifier().fit(x,y)
    decision_boundaries = decision_areas(clf,[0,7,0,3])
    pair_dict[key] = decision_boundaries
Following the suggestions in the answers so far, I removed enumerate and simply used zip. Unfortunately, the line clf = DecisionTreeClassifier().fit(x,y) now raises an error: number of samples does not match number of labels. I find that odd, because I didn't change the sample size at all. My only guess is that it has something to do with enumerate or zip, because that is the only difference from the original function in the documentation example.

Maybe what you want is:
{ tuple(x):y for (x,y) in zip([[0,1],[0,2],[0,3],[1,2],[1,3],[2,3]], ['pair1','pair2','pair3','pair4','pair5','pair6'])}
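For context on why the original loop fails: enumerate wraps each zipped item, so the loop receives (index, (pair, name)) and the second variable is a tuple that still contains a list. A minimal sketch in Python 3, using only the two lists from the question (the variable names by_name and by_pair are made up for illustration):

```python
pairs = [[0, 1], [0, 2], [0, 3], [1, 2], [1, 3], [2, 3]]
names = ['pair1', 'pair2', 'pair3', 'pair4', 'pair5', 'pair6']

# enumerate(zip(...)) yields (index, (pair, name)): the second item is itself a tuple
index, item = next(enumerate(zip(pairs, names)))
# item is ([0, 1], 'pair1'); tuple(item) still holds a list, hence "unhashable type: list"

# Keyed by name, as in the question's update
by_name = {name: pair for pair, name in zip(pairs, names)}

# Keyed by the pair converted to a tuple, as in the answer above
by_pair = {tuple(pair): name for pair, name in zip(pairs, names)}
```

Which of the two dictionaries is wanted depends on whether lookups will be done by name or by column pair.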

Related

Iterating over array and slicing or making changes in Python

I'm about to pull my hair out on this. I'm not sure why the index in my array is not being implemented in the second column.
I created this array - project_information :
project_information.append([proj_id,project_text])
When I print this out, I get the rows and columns. It contains about 40 rows.
When I iterate through it to print out the contents, everything comes out fine. I am using this:
for i in range(0,len(project_information)):
    project_id = project_information[i][0]
    project_text = project_information[i][1]
    print(project_id)
    print(project_text)
The project_text column contains text, while the project_id column contains integers. It prints out perfectly, and the index changes for both project_id and project_text.
However, I need to use the project_text in a different way, and I am really struggling with this. I need to slice the text to a shorter text for reuse. To do this, I tried:
for i in range(0,len(project_information)):
    project_id = project_information[i][0]
    project_text = project_information[i][1]
    print(project_id)
    print(project_text)
    if len(project_text) > 5000:
        trunc_proj_text = project_text[:1000]
    else:
        trunc_proj_text = project_text
    print(project_id)
    print(trunc_proj_text)
The problem I'm having here is that though the project_id column is being iterated through properly, the project_text is not. What I am getting is just the text in the first row for the project_text, sliced, and repeated for as many times as the length of the array.
I have tried different ways, and also a while loop, but it is still not working.
I've also looked at these answers for reference: "Slicing, indexing and iterating over 2D Numpy arrays", "Efficient iteration over slice in Python", and "iteration over list slices", but I can't see how they can be applied to my problem.
I'm not well-versed in using Numpy, so is this something that it could help with? I'm well aware this might be simple and I'm missing it because I've been working on various aspects of this project for the past weeks, so I would appreciate a bit of consideration in this.
Thanks in advance.
The problem was with the input list, so the slicing code above does in fact work. The original code that built the input list was concatenating the strings for each entry, so each project_text had a different ending but the same beginning, which was hard to see on a console. The code that creates the input list has now been fixed.
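The code that built the list is not shown in the post, so the following is only a guess at the kind of bug described: a string that keeps growing across rows. The names raw_rows and accumulated, and the sample texts, are hypothetical.

```python
# Hypothetical input rows; the real data source is not shown in the post
raw_rows = [(1, "alpha text"), (2, "beta text"), (3, "gamma text")]

# Buggy build: the text accumulates across rows, so every entry starts the same way
buggy = []
accumulated = ""
for proj_id, text in raw_rows:
    accumulated += text
    buggy.append([proj_id, accumulated])

# Fixed build: each row keeps only its own text
fixed = [[proj_id, text] for proj_id, text in raw_rows]

# Slicing the first 5 characters exposes the difference
buggy_prefixes = [text[:5] for _, text in buggy]
fixed_prefixes = [text[:5] for _, text in fixed]
print(buggy_prefixes)
print(fixed_prefixes)
```

With the buggy build every sliced text is identical even though the ids differ, which matches the symptom in the question.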

Convert numpy.ndarray to list (Python)

I am running a function developed by Esri to get a list of values in an integer column of a spatial table (however, the same behaviour is observed even when running the function on a non-spatial table). According to the help, I should get a NumPy structured array. After running the function, I have a numpy array. I run print in this format:
in_table = r"C:\geodb101#server.sde\DataTable" #
data = arcpy.da.TableToNumPyArray(in_table, "Field3")
print data
Which gives me back this in IDE (copy/pasted from IDE interpreter):
[(20130825,) (20130827,) (20130102,)]
I am running:
allvalues = data.tolist()
and getting:
[(20130825,), (20130827,), (20130102,)]
Same result when running data.reshape(len(data)).tolist() as suggested in comments.
Running type() lets me know that in the first case it is <type 'numpy.ndarray'> and in the second case <type 'list'>. I am expecting to get my output list in another format [20130825, 20130827, 20130102]. What am I doing wrong or what else should I do to get the output list in the specified format?
I have a possible approach, but I'm not 100% sure it will work, as I can't figure out how you got tuples into an array (when I tried to create an array of tuples, it looks like the tuples got converted to arrays). In any case, give this a shot:
my_list = map(lambda x: x[0], my_np_array_with_tuples_in_it)
This assumes you're dealing specifically with the single element tuples you describe above. And like I said, when I tried to recreate your circumstances, numpy did some conversion moves that I don't fully understand (not really a numpy expert).
Hope that helps.
Update: Just saw the new edits. Not sure if my answer applies anymore.
Update 2: Glad that worked, here's a bit of elaboration.
Lambda is basically just an inline function, and is a construct common in a lot of languages. It's essentially a temporary, anonymous function. You could have just as easily done something like this:
def my_main_func():
    def extract_tuple_value(tup):
        return tup[0]
    my_list = map(extract_tuple_value, my_np_array_with_tuples_in_it)
But as you can see, the lambda version is more concise. The "x" in my initial example is the equivalent of "tup" in the more verbose example.
Lambda expressions are generally limited to very simple operations, basically one line of logic, which is what is returned (there is no explicit return statement).
Update 3: After chatting with a buddy and doing some research, list comprehension is definitely the way to go (see Python List Comprehension Vs. Map).
From acushner's comment below, you can definitely go with this instead:
my_list = [tup[0] for tup in my_np_array_with_tuples_in_it]
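For readers without arcpy, the behaviour can be reproduced with a hand-built structured array. This is a sketch under assumptions: the array below is a stand-in for the TableToNumPyArray result, and the field name and dtype are guesses based on the question. Note also that in Python 3 map returns an iterator, so the map versions above would need to be wrapped in list().

```python
import numpy as np

# Stand-in for the arcpy result: a structured array with one integer field
data = np.array([(20130825,), (20130827,), (20130102,)],
                dtype=[('Field3', '<i4')])

# tolist() on the whole array keeps the one-element tuples
as_tuples = data.tolist()

# Selecting the field by name first gives a plain integer array
flat = data['Field3'].tolist()

# The list comprehension from the answer; int() converts NumPy scalars to Python ints
flat_lc = [int(row[0]) for row in data]
```

Selecting the field by name avoids the per-row Python loop, which may matter for large tables.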

Python- How to find the height of standard binary search tree?

I've a json file of type [{"score": 68},{"score": 78}]
I need to find the height of standard binary search tree that is made using the scores of all the objects. How can I do it?
This is what I'm doing. I'm first getting all the scores and storing inside the json file and then applying the formula.
import ijson
import math
f = open ('data_large')
content = ijson.items(f, 'item')
n = len(list(i['score'] for i in content))
height = math.ceil(math.log((n+1),2)-1)
print height
Well, this does give me the correct answer, but I wanted to know two things:
1) Is this formula still valid when there are duplicates in the list? I need to build a BST that can contain duplicates as well.
2) I think n = len(list(i['score'] for i in content)) is wasteful, because I don't need the node values to calculate the height of the BST, only the length of the list. Is there a way to count the total number of entries in the json file directly, so that I can omit this line and still get n?
I also want to find the unique scores in the file. This is how I'm doing it: print set(i['score'] for i in content). However, it takes 201 seconds to execute because the file is so large (256MB, which is why I used ijson) and contains many entries. Can I make this statement more time-efficient? If yes, how?
1) Yes and no. If you add a property to each node that counts the number of times that value has been inserted into the tree, then you still have a BST and the answer is yes.
If you actually want duplicate nodes, then you need to modify the BST property. The unmodified property says that items smaller than X go left and items greater than X go right. If you instead say that items greater than or equal to X go right, then it's easy to see that you can make the tree arbitrarily high by adding many duplicate items, and the answer is no.
2) Have you tried list(content)? Of course, since you cannot build a standard BST without somehow removing duplicate nodes, the raw count cannot be used to calculate the height of the BST. You need to remove duplicate items first. This leads to your third question.
3) As regards the print set(i['score']...: you shouldn't bundle separate questions together like this, as it will lead you down the dark path to having different answers addressing different parts of your question. However, the code you've written certainly does have something Pythonic about it. So you've got to ask yourself if it's really worth your time (which is often the only time that really matters) to try to find a more convoluted, but quicker solution.
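To make point 1) concrete: the formula in the question gives the smallest height any binary tree with n nodes can have, while the height of a plain (unbalanced) BST depends on insertion order and on how duplicates are handled. The sketch below is not the asker's code; bst_height is a hypothetical helper that inserts values one by one, sends duplicates to the right, and measures height in edges.

```python
def bst_height(values):
    """Height in edges of an unbalanced BST built by inserting values in order.

    Duplicates go to the right subtree (the modified property from the answer).
    """
    root = None          # each node is a list: [value, left, right]
    height = -1          # height of an empty tree
    for value in values:
        if root is None:
            root = [value, None, None]
            height = 0
            continue
        node, depth = root, 0
        while True:
            side = 1 if value < node[0] else 2   # 1 = left child, 2 = right child
            depth += 1
            if node[side] is None:
                node[side] = [value, None, None]
                break
            node = node[side]
        height = max(height, depth)
    return height
```

For example, bst_height([2, 1, 3]) matches the formula, while sorted input such as [1, 2, 3] or repeated values such as [5, 5, 5, 5] degenerate into a chain whose height is n - 1.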

How to create a function in Python that makes objects (i.e lists)

I couldn't find a guide that would help me out in this area. So I was hoping somebody could help me explain this kind of programming in Python. I am trying to write a code that goes something like this:
def Runner():
    for G in range(someRange):
        makeListObjectcalled 'ListNumber'+'G'
        ListNumberG.append(G*500000 or whatever)
        print ListNumberG
        #so I would have a someRange amount of lists
        #named 0,1,2,3...(up to someRange) I could look through
I think it can be done with classes (in fact I'm guessing that's what they're for...) but I'm not sure. Could someone clarify this for me, please?
It looks like what you really want is a list of lists.
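def Runner(someRange):
    Lists = []
    for G in range(someRange):
        # append a new empty list; assigning to Lists[G] on an empty list raises IndexError
        Lists.append([])
        Lists[G].append(G*500000)  # or whatever value you need
        print(Lists[G])
    return Lists
#This way, you have Lists[0], Lists[1], ..., Lists[someRange-1]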
You want to dynamically create variables of type lists that store an array of values.
An easier and better approach (than juggling unknown variable names) is to use a dictionary to keep your lists in, so you can look them up by name/key:
(pseudo code, don't have my Python interpreter with me)
# create a dictionary to store your ListNumberG's
dict_of_lists = {}
# down the line in your loop, add each generated list to the dict:
dict_of_lists['ListNumberG'] = ListNumberG
Later you can find a list by its name/key via
print(dict_of_lists['ListNumberG'])
or loop through them
for idx in range(bestguess):
print(dict_of_lists['ListNumber%s' % (idx,)])
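The pseudo code above can be turned into something runnable along these lines. It is only a sketch: build_lists and some_range are hypothetical names, and the multiplier is the placeholder value from the question.

```python
def build_lists(some_range):
    """Return a dict mapping 'ListNumber0', 'ListNumber1', ... to lists."""
    dict_of_lists = {}
    for g in range(some_range):
        # The multiplier is a placeholder value, as in the question
        dict_of_lists['ListNumber%s' % g] = [g * 500000]
    return dict_of_lists

lists = build_lists(3)
for name in sorted(lists):
    print(name, lists[name])
```

Building the key from the loop counter gives each list a distinct name, where the literal string 'ListNumberG' would overwrite a single entry on every pass.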

How to Sort Arrays in Dictionary?

I'm currently writing a program in Python to track statistics on video games. An example of the dictionary I'm using to track the scores :
ten = 1
sec = 9
fir = 10
thi5 = 6
sec5 = 8
games = {
    'adom': [ten+fir+sec+sec5, "Ancient Domain of Mysteries"],
    'nethack': [fir+fir+fir+sec+thi5, "Nethack"]
}
Right now, I'm going about this the hard way, making a big long list of nested ifs, but I don't think that's the proper way to go about it. I was trying to figure out a way to sort the dictionary by the arrays and then display the first ten that pop up, instead of having to work deep in the if statements.
So, basically, my question is: do you have any ideas I could use to make this easier, instead of way, way harder?
===== EDIT ====
The ten+fir produces numbers. I want to find a way to sort the lists (I lack the knowledge of proper terminology) by that number; basically, whichever ones have the highest number in the first part of the array go first.
Here's an example of my current way of going about it (though it's incomplete, due to it being very tiresome): Example Nests (paste2) (let's try this one?)
==== SECOND EDIT ====
In case someone doesn't see my comment below :
ten, fir, et cetera - these are just variables for scores. Basically, it goes from a top ten list into a variable number.
ten = 1, nin = 2, fir = 10, fir5 = 10, sec5 = 8, sec = 9...
so : 'adom': [ten+fir+sec+sec5, "Ancient Domain of Mysteries"] actually registers as : 'adom': [1+10+9+8, "Ancient Domain of Mysteries"] , which ends up looking like :
'adom': [28, "Ancient Domain of Mysteries"]
So, basically, if I ended up doing the "top two" out of my example, it'd be :
((1)) Nethack (48)
((2)) ADOM (28)
I'd write an actual number, but I'm thinking of changing a few things up, so the numbers might be a touch different, and I wouldn't want to rewrite it.
== THIRD (AND HOPEFULLY THE FINAL) EDIT ==
Fixed my original code example.
How about something like this:
scores = sorted(games.items(), key=lambda item: item[1][0], reverse=True)
return scores[:10]
This will return the 10 highest-scoring items, sorted in descending order by the first item in each array.
I'm not sure if this is what you want though, please update the question (and fix the example link) if you need something else...
import heapq
return heapq.nlargest(10, games.items(), key=lambda item: item[1][0])
is the most direct way to get the top ten key / value pairs, sorted by the first item of each "value" list. If you can define more precisely what output you want (just the names, the name / value pairs, or what else?) and the sorting criterion, this is easy to adjust, of course.
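A self-contained version of the heapq approach, using the dictionary from the question, might look like the sketch below. The helper name top_games and the output format are assumptions; the key function takes one argument because each item is a single (key, value) tuple.

```python
import heapq

ten, sec, fir, thi5, sec5 = 1, 9, 10, 6, 8
games = {
    'adom': [ten + fir + sec + sec5, "Ancient Domain of Mysteries"],
    'nethack': [fir + fir + fir + sec + thi5, "Nethack"],
}

def top_games(games, n=10):
    """Return up to n (key, [score, title]) pairs, highest score first."""
    return heapq.nlargest(n, games.items(), key=lambda item: item[1][0])

for rank, (key, (score, title)) in enumerate(top_games(games), start=1):
    print("((%d)) %s (%d)" % (rank, title, score))
```

For a dictionary this small, sorted(..., reverse=True)[:n] gives the same result; nlargest mainly helps when n is much smaller than the number of entries.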
Wim's solution is good, but I'd say that you should probably go the extra mile and push this work off onto a database, rather than relying on Python. Python interfaces well with most types of databases, where much of what you're exploring is already a solved problem.
For example, instead of worrying about shifting your dictionaries to various other data types in order to properly sort them, you can simply get all the data for each pertinent entry pre-sorted based on the criteria of your query. There goes the need for convoluted sorting and resorting right there.
While dictionaries are tempting to use, because they give the illusion of database-like abilities to access data based on its attributes, I still think they stumble quite a bit with respect to implementation. I don't really have any numbers to throw at you, but just from personal experience, anything you do in Python when it comes to manipulating large amounts of data, you can do much faster and more efficiently, both in code and computation, with something like MySQL.
I'm not sure what you have planned as far as the structure of your data goes, but along with adding data, changing its structure is a lot easier using a database, too.
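As a rough illustration of the database route, here is a sketch using SQLite through Python's built-in sqlite3 module, swapped in for MySQL only because it needs no server. The table and column names are made up, and the scores are the ones computed from the question's variables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute(
    "CREATE TABLE games (short_name TEXT PRIMARY KEY, title TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO games (short_name, title, score) VALUES (?, ?, ?)",
    [("adom", "Ancient Domain of Mysteries", 28), ("nethack", "Nethack", 45)],
)

# The database returns rows already sorted and limited to the top ten
top_ten = conn.execute(
    "SELECT title, score FROM games ORDER BY score DESC LIMIT 10"
).fetchall()
conn.close()
```

The sorting criterion lives in the query, so changing it means editing the ORDER BY clause with no need to restructure Python data.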
