Replace dictionary values with their indices - python

I have a dictionary dico like this :
id_OS (keys) : List of pages(values)
0 : [A, B]
1 : [C, D, E]
2 : [F, B]
3 : [G, A, B]
I would like to change it to this form
id_OS : List of index id_pages
0 : [0, 1]
1 : [2, 3, 4]
2 : [5, 1]
3 : [6, 0, 1]
I tried this code, but I didn't get the correct indices of the values:
dico = dict(zip(range(len(dico)), zip(range(len(dico.values())))))
Any idea how to do this, please?
Thanks

This should work:
letters = {0: ['A', 'B'], 1: ['C', 'Z']}
for key in letters:
    new_list = []
    for i in letters[key]:
        i = i.lower()
        new_list.append(ord(i) - 97)
    letters[key] = new_list
I subtracted 97 instead of 96 (the reason 96 is usually subtracted is well explained in this post: Convert alphabet letters to number in Python) because it seems you want everything shifted so that A is 0 rather than 1, as it would usually be.
Output:
{0: [0, 1], 1: [2, 25]}
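If the page names are not single letters, the ord() trick no longer applies. A minimal sketch of a more general approach, assuming each page should get the next free index the first time it is seen (page_index and indexed are names made up for this example):

```python
# Data from the question.
dico = {0: ['A', 'B'], 1: ['C', 'D', 'E'], 2: ['F', 'B'], 3: ['G', 'A', 'B']}

# Assign each page the next free index the first time it appears.
page_index = {}
for pages in dico.values():
    for page in pages:
        # setdefault only stores the new index if the page is not known yet.
        page_index.setdefault(page, len(page_index))

# Rebuild the dictionary with indices instead of page names.
indexed = {key: [page_index[p] for p in pages] for key, pages in dico.items()}
print(indexed)
```

This relies on dictionaries keeping insertion order (Python 3.7+), so the indices follow the order in which the keys of dico were added.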

Looking at your previous question, I see that you could simplify your task. I would convert the data['PageId'] column to a categorical type and pass the category codes to the dictionary. Something like this:
data['codes'] = data['PageId'].astype('category').cat.codes
then change this line in your code:
dico[tup].append(row['PageId'])
into this:
dico[tup].append(row['codes'])

Related

Create an adjacency matrix using a dictionary with letter values converted to numbers in python

So I have a dictionary with letter values and keys and I want to generate an adjacency matrix using digits (0 or 1). But I don't know how to do that.
Here is my dictionary:
g = { "a" : ["c","e","b"],
"b" : ["f","a"]}
And I want an output like this :
import numpy as np
new_dic = {'a':[0,1,1,0,1,0],'b':(1,0,0,0,0,1)}
rows_names = ['a','b'] # I use a list because dictionaries don't memorize the positions
adj_matrix = np.array([new_dic[i] for i in rows_names])
print(adj_matrix)
Output :
[[0 1 1 0 1 0]
[1 0 0 0 0 1]]
So it's an adjacency matrix: column/row 1 represents A, column/row 2 represents B, and so on.
Thank you !
I don't know if it helps but here is how I convert all letters to numbers using ascii :
for key, value in g.items():
    nums = [str(ord(x) - 96) for x in value if x.lower() >= 'a' and x.lower() <= 'z']
    g[key] = nums
print(g)
Output :
{'a': ['3', '5', '2'], 'b': ['6', '1']}
So a == 1 b == 2 ...
So my problem is: if I take the key a with its first value "e", how can I make that e land in column 5 of row 1 rather than column 2 of row 1, and replace it with 1?
Using comprehensions:
g = {'a': ['c', 'e', 'b'], 'b': ['f', 'a']}
vals = 'a b c d e f'.split() # Column values
new_dic = {k: [1 if x in v else 0 for x in vals] for k, v in g.items()}
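An alternative sketch that places each 1 directly at the column given by the letter's position in the alphabet, which is what the question asks about for "e". It assumes all node names are single lowercase letters and that the matrix has six columns (a to f); n_columns and matrix are names made up for this example:

```python
g = {'a': ['c', 'e', 'b'], 'b': ['f', 'a']}
n_columns = 6  # assumption: columns a..f

matrix = {}
for node, neighbours in g.items():
    row = [0] * n_columns
    for neighbour in neighbours:
        # 'a' -> column 0, 'b' -> column 1, ..., 'f' -> column 5
        row[ord(neighbour) - ord('a')] = 1
    matrix[node] = row

print(matrix)
```

The comprehension above does a membership test for every column, whereas this version writes only the cells that are actually set; for small graphs either is fine.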

A column in my dataframe does not seem to correspond to the input List (python)

I want to assign one of the columns of my dataframe to a list. I used the code below.
listone = [['a', 'b', 'c'], ['m', 'g'], ['h'], ['y', 't', 'r']]
df['Letter combinations'] = listone
The 'Letter Combinations' column in the dataframe doesn't correspond to the list, instead seems to assign random elements to each row in the column. I was wondering if this method indexes the elements differently causing a change in the order or if there is something wrong with my code. Any help would be appreciated!
Edit: Here is my complete code
listone = [[a, b, c], [m, g], [h], [y, t, r]]
numbers = [1, 2, 3, 4]
my_matrix = {'Numbers': numbers}
sample = pd.DataFrame(my_matrix)
sample['Letter combinations'] = listone
sample
My output looks like:
```
Numbers Letter combination
0 1 [b]
1 2 [m, g]
2 3 []
3 4 [r]
```
You need to convert listone to a Series, i.e.:
sample['Letter combinations'] = pd.Series(listone)
sample
Numbers Letter combinations
0 1 [a, b, c]
1 2 [m, g]
2 3 [h]
3 4 [y, t, r]

Get column numbers of dataframe where given condition holds

I have the following code:
raw_data = [[1, 2, 3], [4, 5, 6]]
df = pd.DataFrame(data=raw_data,
columns=["cA", "cB", "cC"])
wrong_indexes = df.loc[df['cA'] > 2 ]
print(wrong_indexes)
This prints:
cA cB cC
1 4 5 6
Instead of this, I would like to only get a list of indexes at which this condition holds, like so:
[1]
Any idea how I can do that?
wrong_indexes = df.loc[df['cA'] > 2 ].index.tolist()

Why is the index only changing when I use different values?

I just started programming in Python, and I can't figure out how to make the index change if I want the values in the list to be the same. What I want is for the index to change, so it will print 0, 1, 2, but all I get is 0, 0, 0. I tried to change the values of the list so that they were different, and then I got the output I wanted. But I don't understand why it matters what kind of values I use, why would the index care about what is in the list?
a = 0
b = 0
c = 0
d = 0
e = 0
f = 0
justTesting = [[a, b], [c, d], [e, f]]
for item in justTesting:
    something = justTesting.index(item)
    print (something)
I'm using Python 3.6.1, if that matters.
Because each list (designated 'item' in your loop) is [0, 0] this means the line:
something = justTesting.index(item)
will look for the first instance of the list [0, 0] in the list for each 'item' during the iteration. As every item in the list is [0, 0] the first instance is at position 0.
I have prepared an alternative example to illustrate the point
a = 1
b = 2
c = 3
d = 4
e = 5
f = 6
justTesting = [[a, b], [c, d], [e, f]]
for item in justTesting:
    print(item)
    something = justTesting.index(item)
    print(something)
This results in the following:
[1, 2]
0
[3, 4]
1
[5, 6]
2
It's because your list only contains [0, 0]!
So basically, if we replace all the variables with their values, we get:
justTesting = [[0, 0], [0, 0], [0, 0]]
And using .index(item) will return the first occurrence of item if any. Since item is always [0, 0] and it first appears at justTesting[0], you will always get 0! Try changing up the values in each list and try again. For example, this works:
b = [1, 2, 3, 4, 5, 6, 7, 8, 9]
for item in b:
    print(b.index(item))
Which prints the following (shown here on a single line):
0, 1, 2, 3, 4, 5, 6, 7, 8
Read the documentation: by default, index returns the first occurrence. You need to pass the start parameter as well, updating it as you go, so that each search covers only the part of the list after the most recent find (initialise something to -1 before the loop):
something = justTesting.index(item, something+1)
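A short sketch of both ways to get 0, 1, 2 for the list in the question, assuming you only need the position of each item as you loop (positions and found are names made up for this example):

```python
justTesting = [[0, 0], [0, 0], [0, 0]]

# Option 1: enumerate hands you the position directly, with no searching.
positions = [i for i, item in enumerate(justTesting)]

# Option 2: index() with a moving start, as described above.
found = []
something = -1  # so that the first search starts at position 0
for item in justTesting:
    something = justTesting.index(item, something + 1)
    found.append(something)

print(positions)
print(found)
```

enumerate is usually the simpler choice, since it does not rescan the list on every iteration.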
That's because you are iterating over a list of lists.
Every item is actually a list, and you are executing list.index() method which returns the index of the element in the list.
This is a little tricky. Since you actually have three lists that are all [0, 0], they compare equal when tested for equality, even though they are distinct objects:
>>> a = 0
>>> b = 0
>>> c = 0
>>> d = 0
>>> ab = [a, b]
>>> cd = [c, d]
>>>
>>> ab is cd
False
>>> ab == cd
True
>>>
Now when you run list.index(obj) you are looking for the 1st index that matches the object. Your code actually runs list.index([0, 0]) 3 times and returns the first match, which is at index 0.
Put different values in the inner lists and it will work as you expect.
Your code:
a = 0
b = 0
c = 0
d = 0
e = 0
f = 0
justTesting = [[a, b], [c, d], [e, f]]
for item in justTesting:
    something = justTesting.index(item)
    print (something)
is equivalent to:
a = 0
b = 0
c = 0
d = 0
e = 0
f = 0
ab = [a, b]
cd = [c, d]
ef = [e, f]
justTesting = [ab, cd, ef]
# Note that ab == cd is True and cd == ef is True
# so all elements of justTesting are equal (though not the same object).
#
# for item in justTesting:
#     something = justTesting.index(item)
#     print (something)
#
# is essentially equivalent to:
item = justTesting[0] # = ab = [0, 0]
something = justTesting.index(item) # = 0 First occurrence of [0, 0] in justTesting
# is **always** at index 0
item = justTesting[1] # = cd = [0, 0]
something = justTesting.index(item) # = 0
item = justTesting[2] # = ef = [0, 0]
something = justTesting.index(item) # = 0
justTesting does not change as you iterate and the first position in justTesting at which [0,0] is found is always 0.
But I don't understand why it matters what kind of values I use, why would the index care about what is in the list?
Possibly what is confusing you is the fact that index() does not search for occurrences of the item "in abstract" but it looks at the values of items in a list and compares those values with a given value of item. That is,
[ab, cd, ef].index(cd)
is equivalent to
[[0,0],[0,0],[0,0]].index([0,0])
and the first occurrence of [0,0] value (!!!) is at 0 index of the list for your specific values for a, b, c, d, e, and f.

Trimming numpy arrays: what is the best method?

Consider the following code:
a = np.arange (1,6)
b = np.array(["A", "B", "C", "D", "E"])
c = np.arange (21, 26)
a,b,c = a[a> 3],b[a>3], c[a >3]
print a,b,c
The output is: [4 5] ['D' 'E'] [24 25]
I can't figure out why this output is different from the following:
a = np.arange (1,6)
b = np.array(["A", "B", "C", "D", "E"])
c = np.arange (21, 26)
a = a[a>3]
b = b[a>3]
c = c[a>3]
print a,b,c
output:
[4 5] ['A' 'B'] [21 22]
Any idea?
In the first part, when you do:
a, b, c = a[a> 3], b[a>3], c[a >3]
all three masks are computed from the original a = np.arange(1, 6): the right-hand side is evaluated in full before any name is rebound, so a is only modified after all the indexing has been done.
In the second part, by contrast, you are filtering b and c with an already filtered and modified array a, because it happens after you have done:
a = a[a>3]
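Therefore, the following lines are filtered against array a, which is now equal to [4, 5], so the mask is [True, True]. In the NumPy version used in the question this selects the first two items; recent NumPy versions instead raise an IndexError because the mask length does not match the array.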
b = b[a>3] # <-- over a = [4, 5] gives values at index 0 and 1
c = c[a>3] # <-- over a = [4, 5] gives values at index 0 and 1
In the second case, you could use a temporary array to hold the filtered values of a.
temp = a[a>3]
b = b[a>3]
c = c[a>3]
a = temp
or, as suggested in the comments by @hpaulj, evaluate and store the mask in a variable first, then use it as many times as needed without having to redo the work:
mask = a > 3
a = a[mask]
b = b[mask]
c = c[mask]
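The evaluation-order point in this answer is a property of Python assignment rather than of NumPy, so it can be reproduced with plain lists. A minimal sketch, where select is a hypothetical helper standing in for values[mask] and the *_seq names are made up for the sequential variant:

```python
def select(values, mask):
    """Keep the values whose mask entry is True (stand-in for values[mask])."""
    return [v for v, keep in zip(values, mask) if keep]

# Tuple assignment: the whole right-hand side is evaluated before a or b is rebound.
a = [1, 2, 3, 4, 5]
b = ["A", "B", "C", "D", "E"]
a, b = select(a, [x > 3 for x in a]), select(b, [x > 3 for x in a])
print(a, b)

# Sequential assignment: the second mask is built from the already filtered list.
a_seq = [1, 2, 3, 4, 5]
b_seq = ["A", "B", "C", "D", "E"]
a_seq = select(a_seq, [x > 3 for x in a_seq])  # a_seq is now [4, 5]
b_seq = select(b_seq, [x > 3 for x in a_seq])  # mask is [True, True]
print(a_seq, b_seq)
```

Here zip stops at the shorter sequence, which mimics the old NumPy behaviour of a too-short mask selecting only the leading items. Computing the mask once and reusing it avoids the problem in both styles.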
A simple fix is to trim your "a" array last, not first!
b=b[a>3]
c=c[a>3]
a=a[a>3]
If you plan to perform multiple trimmings, consider saving the a > 3 mask to a variable first (as suggested in the other answer), which avoids recomputing it each time.
