Compare two dicts and update one of them - python

I have two dictionaries like the following:
dict1 =
{'a': [67.0, 24.0, 45.0, 45.0, 45.0, 23.0, 21.0, 45.0],
'b': [0.9, 0.5, 9.0, 4.5, 54.0, 0.0, 0.0, 0.0],
'c': [1.0, 5.0, 40.0, 30.0, 20.0, 0.0, 10.0, 50.0],
'd': [60.0, 80.0, 56.0, 34.0, 78.0, 13.0, 0.0, 70.0]}
dict2 =
{'a': 0.897,'c': 3.4, 'd': 34.567}
I want all the values in dict1 to be shifted right by one position. The keys of dict1 and dict2 are compared. If dict2 contains a value for a key, that value becomes the first element of the corresponding list in dict1. If dict2 has no value for the key, the first element is 0.0. For example:
When the two dictionaries are compared, dict2 contains values for the key 'a', 'c', 'd'. So the values for these keys are put as the first element in the value of dict1 (which is a list) while shifting the other elements of the list to right. The size of the list is maintained. For the keys which do not contain a value in dict2, a value of 0.0 is put as the first element in the list as shown below
dict1 =
{'a': [0.897, 67.0, 24.0, 45.0, 45.0, 45.0, 23.0, 21.0],
'b': [0.0, 0.9, 0.5, 9.0, 4.5, 54.0, 0.0, 0.0],
'c': [3.4, 1.0, 5.0, 40.0, 30.0, 20.0, 0.0, 10.0],
'd': [34.567, 60.0, 80.0, 56.0, 34.0, 78.0, 13.0, 0.0]}

You can iterate over dict1 and, for each key, insert the value from dict2 at index 0 of the list. dict.get supplies 0.0 as the default when the key is missing from dict2.
for k, v in dict1.items():
    v.pop()  # remove the last element so the list keeps its length
    v.insert(0, dict2.get(k, 0.0))  # insert the value from dict2 at index 0, or 0.0 if the key is not in dict2
print(dict1)
{
'a': [0.897, 67.0, 24.0, 45.0, 45.0, 45.0, 23.0, 21.0],
'b': [0.0, 0.9, 0.5, 9.0, 4.5, 54.0, 0.0, 0.0],
'c': [3.4, 1.0, 5.0, 40.0, 30.0, 20.0, 0.0, 10.0],
'd': [34.567, 60.0, 80.0, 56.0, 34.0, 78.0, 13.0, 0.0]
}
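If mutating dict1 in place is not desired, the same shift can be sketched with slicing. The function name shift_right and its fill parameter are hypothetical (they are not part of the answer above); the sketch builds a new dictionary and leaves both inputs untouched.

```python
def shift_right(values_by_key, new_firsts, fill=0.0):
    """Return a new dict whose lists are shifted right by one position.

    The last element of each list is dropped and the first slot is filled
    from new_firsts, or with `fill` when the key is missing there.
    """
    return {
        key: [new_firsts.get(key, fill)] + values[:-1]
        for key, values in values_by_key.items()
    }


dict1 = {'a': [67.0, 24.0, 45.0], 'b': [0.9, 0.5, 9.0]}
dict2 = {'a': 0.897}

shifted = shift_right(dict1, dict2)
print(shifted)  # {'a': [0.897, 67.0, 24.0], 'b': [0.0, 0.9, 0.5]}
```

Slicing copies each list, so this costs O(n) per key, the same as pop plus insert on a list.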

If this operation is done repeatedly, I suggest you use a deque: appending to the left of a deque is O(1), whereas inserting at the front of a list is O(n). With maxlen set, the deque also discards the last element automatically when a new one is appended on the left.
from collections import deque
dict1 = {k: deque(v, maxlen=len(v)) for k, v in dict1.items()}
for key, value in dict1.items():
    value.appendleft(dict2.get(key, 0.0))
print(dict1)
Output
{'a': deque([0.897, 67.0, 24.0, 45.0, 45.0, 45.0, 23.0, 21.0], maxlen=8),
'b': deque([0.0, 0.9, 0.5, 9.0, 4.5, 54.0, 0.0, 0.0], maxlen=8),
'c': deque([3.4, 1.0, 5.0, 40.0, 30.0, 20.0, 0.0, 10.0], maxlen=8),
'd': deque([34.567, 60.0, 80.0, 56.0, 34.0, 78.0, 13.0, 0.0], maxlen=8)}
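To illustrate the repeated case, here is a small sketch that applies several updates in a row to bounded deques. The names history and updates and the sample values are made up for illustration.

```python
from collections import deque

# Each deque is bounded, so appendleft drops the oldest (rightmost) value.
history = {'a': deque([3.0, 2.0, 1.0], maxlen=3),
           'b': deque([6.0, 5.0, 4.0], maxlen=3)}

# Hypothetical stream of updates; a missing key means "push 0.0".
updates = [{'a': 10.0}, {'b': 20.0}, {'a': 30.0, 'b': 40.0}]

for update in updates:
    for key, window in history.items():
        window.appendleft(update.get(key, 0.0))

result = {key: list(window) for key, window in history.items()}
print(result)  # {'a': [30.0, 0.0, 10.0], 'b': [40.0, 20.0, 0.0]}
```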

Related

How can I put the result of a calculation in a list?

while seed != 1.0:
    if seed % 2 == 0:
        seed = seed / 2
    else:
        seed = seed * 3 + 1
I want to put the result of each calculation in a list.
Could I use return?
If yes, how?
You can use the list method append:
seed = 50
results = []
while seed != 1.0:
    if seed % 2 == 0:
        seed = seed / 2
    else:
        seed = seed * 3 + 1
    results.append(seed)  # store the value produced by this iteration
print(results)
Output for seed = 50: [25.0, 76.0, 38.0, 19.0, 58.0, 29.0, 88.0, 44.0, 22.0, 11.0, 34.0, 17.0, 52.0, 26.0, 13.0, 40.0, 20.0, 10.0, 5.0, 16.0, 8.0, 4.0, 2.0, 1.0]
Note: the return keyword can only be used inside a function. It is only useful in this example if the code is wrapped in a function, with return results at the end instead of print(results):
def three_n_plus_one(seed):
    results = []
    while seed != 1.0:
        if seed % 2 == 0:
            seed = seed / 2
        else:
            seed = seed * 3 + 1
        results.append(seed)
    return results
You can call the function like this:
print(three_n_plus_one(50))
It gives the same output - [25.0, 76.0, 38.0, 19.0, 58.0, 29.0, 88.0, 44.0, 22.0, 11.0, 34.0, 17.0, 52.0, 26.0, 13.0, 40.0, 20.0, 10.0, 5.0, 16.0, 8.0, 4.0, 2.0, 1.0]
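The floats in the output come from the / operator, which always returns a float in Python 3. As a variation (not part of the answer above), the following sketch uses floor division so the values stay integers; the function name collatz_steps and the input check are assumptions added for illustration.

```python
def collatz_steps(seed):
    """Return the values visited after `seed` until the sequence reaches 1."""
    if seed < 1:
        raise ValueError("seed must be a positive integer")
    results = []
    while seed != 1:
        if seed % 2 == 0:
            seed = seed // 2  # floor division keeps the value an int
        else:
            seed = seed * 3 + 1
        results.append(seed)
    return results


print(collatz_steps(6))  # [3, 10, 5, 16, 8, 4, 2, 1]
```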

Problem with for loop and creating list of list

I have a numpy array called expected, which is a list of a list of a list.
expected = [[[45.0, 10.0, 10.0], [110.0, 10.0, 8.0], [60.0, 10.0, 5.0], [170.0, 10.0, 4.0]], [[-80.0, 20.0, 10.0], [97.0, 15.0, 12.0], [5.0, 20.0, 8.0], [93.0, 10.0, 8.0], [12.0, 5.0, 15.0], [-88.0, 10.0, 10.0], [176.0, 10.0, 8.0]]]
I want to put it through a loop without hardcoding anything, so that it's applicable to lists of different lengths.
When the loop runs for the first time, I want it to compute this:
horizontal_exp = expected[0][0][1]*expected[0][0][2]*np.cos(np.deg2rad(expected[0][0][0]))
Then the next iteration should look like this:
horizontal_exp = expected[1][1][1]*expected[1][1][2]*np.cos(np.deg2rad(expected[1][1][0]))
And the following iteration like this:
horizontal_exp = expected[2][2][1]*expected[2][2][2]*np.cos(np.deg2rad(expected[2][2][0]))
and so on until it has finished all the sections of rows.
I don't understand why the 'i' never worked.
In the end I want horizontal_expected to be a list of lists,
e.g.
expected = [ [12,21,23,34], [12,32,54,65,76,87,65] ] # These are not the values I'm just giving an example
where the [12,21,23,34] corresponds to the [[45.0, 10.0, 10.0], [110.0, 10.0, 8.0], [60.0, 10.0, 5.0], [170.0, 10.0, 4.0]]
and the [12,32,54,65,76,87,65] corresponds to the [[-80.0, 20.0, 10.0], [97.0, 15.0, 12.0], [5.0, 20.0, 8.0], [93.0, 10.0, 8.0], [12.0, 5.0, 15.0], [-88.0, 10.0, 10.0], [176.0, 10.0, 8.0]]
I'm unsure how to do this. I know you have to append inside a for loop, but how do you separate the results into a list of lists?
horizontal_expected = []
for i in list(range(len(expected[i]))):
    horizontal_exp = expected[i][i][1]*expected[i][i][2]*np.cos(np.deg2rad(expected[i][i][0]))
    horizontal_expected.append(horizontal_exp)
print(horizontal_expected)
The reason you don't see the desired output is that expected is a nested list, but you iterate over it with a single loop and a single index. You first need to iterate over the outer list and then, inside that loop, over each nested list:
import numpy as np
expected = [ [[45.0, 10.0, 10.0], [110.0, 10.0, 8.0], [60.0, 10.0, 5.0], [170.0, 10.0, 4.0]], [[-80.0, 20.0, 10.0], [97.0, 15.0, 12.0], [5.0, 20.0, 8.0], [93.0, 10.0, 8.0], [12.0, 5.0, 15.0], [-88.0, 10.0, 10.0], [176.0, 10.0, 8.0]] ]
horizontal_expected = []
for i in range(len(expected)):
    tmp_list = []
    for j in range(len(expected[i])):
        horizontal_exp = expected[i][i][1]*expected[i][i][2]*np.cos(np.deg2rad(expected[i][i][0]))
        tmp_list.append(horizontal_exp)
    horizontal_expected.append(tmp_list)
print(horizontal_expected)
The output of that is a list of lists:
>>> print(horizontal_expected)
[[70.71067811865476, 70.71067811865476, 70.71067811865476, 70.71067811865476], [-21.936481812926527, -21.936481812926527, -21.936481812926527, -21.936481812926527, -21.936481812926527, -21.936481812926527, -21.936481812926527]]
As you can see, it holds a value for each of the lists in the input, but the value is the same within each sublist. That is because the equation indexes expected[i][i] on every pass of the inner loop and never uses j.
You want the indices to be updated based on the level of the loop:
horizontal_exp = expected[i][j][1]*expected[i][j][2]*np.cos(np.deg2rad(expected[i][j][0]))
The full working code would look like this:
import numpy as np
expected = [ [[45.0, 10.0, 10.0], [110.0, 10.0, 8.0], [60.0, 10.0, 5.0], [170.0, 10.0, 4.0]], [[-80.0, 20.0, 10.0], [97.0, 15.0, 12.0], [5.0, 20.0, 8.0], [93.0, 10.0, 8.0], [12.0, 5.0, 15.0], [-88.0, 10.0, 10.0], [176.0, 10.0, 8.0]] ]
horizontal_expected = []
for i in range(len(expected)):
    tmp_list = []
    for j in range(len(expected[i])):
        horizontal_exp = expected[i][j][1]*expected[i][j][2]*np.cos(np.deg2rad(expected[i][j][0]))
        tmp_list.append(horizontal_exp)
    horizontal_expected.append(tmp_list)
print(horizontal_expected)
And the output:
>>> print(horizontal_expected)
[[70.71067811865476, -27.361611466053496, 25.000000000000007, -39.39231012048832], [34.72963553338608, -21.936481812926527, 159.39115169467928, -4.186876499435507, 73.36107005503543, 3.489949670250108, -79.80512402078594]]
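The index bookkeeping can be avoided altogether by iterating over the elements directly. The sketch below assumes that each innermost list holds an angle in degrees followed by two factors to multiply; the names angle, a, b and group are hypothetical, and the sample data is shortened. It uses the standard math module instead of numpy only to stay self-contained.

```python
import math

expected = [
    [[45.0, 10.0, 10.0], [60.0, 10.0, 5.0]],
    [[0.0, 20.0, 10.0], [180.0, 15.0, 2.0], [90.0, 1.0, 1.0]],
]

# Outer comprehension walks the groups, inner one walks the triples,
# so the result keeps the same list-of-lists shape as the input.
horizontal_expected = [
    [a * b * math.cos(math.radians(angle)) for angle, a, b in group]
    for group in expected
]
print(horizontal_expected)
```

Unpacking each triple in the for clause means no index can be mixed up between the two loop levels.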

Python: What is an efficient way to sort a nested array by repeated values?

data is a list, where each entry is a list of floats.
L is the number of buckets: for each i in range(L), every entry of data whose first element (converted to int) equals i is stored at index i of c.
def sort(data, L):
    c = []
    d = []
    for i in range(L):
        for seq in data:
            if int(seq[0]) == i:
                d.append(seq)
        c.append(d)
        d = []
    return c
>>> data = [[4.0, 0.0, 15.0, 67.0], [3.0, 0.0, 15.0, 72.0], [4.0, 0.0, 15.0, 70.0], [1.0, -0.0, 15.0, 90.0], [3.0, -0.0, 15.0, 75.0], [2.0, -0.0, 15.0, 83.0], [3.0, 0.0, 15.0, 74.0], [4.0, 0.0, 15.0, 69.0], [4.0, 0.0, 14.0, 61.0], [3.0, 0.0, 15.0, 74.0], [3.0, 0.0, 15.0, 75.0], [4.0, 0.0, 15.0, 67.0], [5.0, 0.0, 14.0, 45.0], [6.0, 0.0, 13.0, 30.0], [3.0, 0.0, 15.0, 74.0], [4.0, 0.0, 15.0, 55.0], [7.0, 0.0, 13.0, 22.0], [6.0, 0.0, 13.0, 25.0], [1.0, -0.0, 15.0, 83.0], [7.0, 0.0, 13.0, 18.0]]
>>> sort(data,7)
[[], [[1.0, -0.0, 15.0, 90.0], [1.0, -0.0, 15.0, 83.0]], [[2.0, -0.0, 15.0, 83.0]], [[3.0, 0.0, 15.0, 72.0], [3.0, -0.0, 15.0, 75.0], [3.0, 0.0, 15.0, 74.0], [3.0, 0.0, 15.0, 74.0], [3.0, 0.0, 15.0, 75.0], [3.0, 0.0, 15.0, 74.0]], [[4.0, 0.0, 15.0, 67.0], [4.0, 0.0, 15.0, 70.0], [4.0, 0.0, 15.0, 69.0], [4.0, 0.0, 14.0, 61.0], [4.0, 0.0, 15.0, 67.0], [4.0, 0.0, 15.0, 55.0]], [[5.0, 0.0, 14.0, 45.0]], [[6.0, 0.0, 13.0, 30.0], [6.0, 0.0, 13.0, 25.0]]]
len(data) is on the order of 2 Million
L is on the order of 8000.
I need a way to speed this up ideally!
Optimization attempt
I assume you want to sort your sublists into buckets according to the first value of each sublist.
For simplicity, I use the following to generate random numbers for testing:
import random
L = 10
data = [[round(random.random() * 10.0, 2) for _ in range(3)] for _ in range(10)]
First, about your code, just to make sure that I understood your intention correctly:
c = []
d = []
for i in range(L):              # Loop over all buckets
    for e in data:              # Loop over entire data
        if int(e[0]) == i:      # If first float of sublist falls into i-th bucket
            d.append(e)         # Append entire sublist to current bucket
    c.append(d)                 # Append current bucket to list of buckets
    d = []                      # Reset
This is inefficient because you loop over the full data set for each of your buckets. With, as you say, about 8000 buckets and 2 000 000 lists of floats, you essentially perform 16 000 000 000 (16 billion) comparisons. Additionally, you populate your bucket lists from scratch instead of reusing the existing lists in your data variable, so one reference is copied for every sublist.
Thus, you should think about working with your data's indices, e.g.:
bidx = [int(e[0]) for e in data]  # Calculate bucket indices for all sublists
buck = []
for i in range(L):  # Loop over all buckets
    lidx = [k for k, b in enumerate(bidx) if b == i]  # Get sublist indices for this bucket
    buck.append([data[l] for l in lidx])  # Collect list references
print(buck)
This iterates over your data only once to compute the bucket indices. The loop over the buckets still scans bidx once per bucket (the double loop remains, but comparing precomputed integers may be a bit faster), which leaves lidx holding the positions of the sublists in data that fall into the current bucket. Finally, the list references are collected in the bucket's list and stored.
The last step can be costly though, because it contains a lot of reference copying. You should consider storing only the indices in each bucket, not the entire data, e.g.
lidx = ...
buck.append(lidx)
However, optimizing performance only in-code with large data has limits.
If your data is large, all linear iterations will be costly. You can try to reduce them as far as possible, but there is a lower bound defined by the data size itself!
If you have to perform more operations on millions of records, you should think about changing to another data representation or format. For example, if you need to perform frequent operations within one script, you may want to think about trees (e.g. b-trees). If you want to store it for further processing, you may want to think about a database with proper indexes.
Running in Python 3, I get two orders of magnitude better performance than jbndlr with this algorithm:
rl = range(L)                            # Range of valid bucket values
buck = [[] for _ in rl]                  # Create all the buckets
for seq in data:                         # Loop over entire data
    try:
        idx = rl.index(int(seq[0]))      # Find the bucket index
        buck[idx].append(seq)            # Append current data in its bucket
    except ValueError:
        pass                             # There is no bucket for that value
Comparing the algorithms with:
L = 1000
data = [[round(random.random() * 1200.0, 2) for _ in range(3)] for _ in range(100000)]
I get:
yours: 26.66 sec
jbndlr: 6.78 sec
mine: 0.07 sec
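Because the buckets are numbered 0 to L-1, the truncated first value can also serve directly as the list index, with a bounds check in place of the try/except. This is only a sketch of that single-pass idea; the function name bucket_by_first and the small sample data are made up, and no timing is claimed for it.

```python
def bucket_by_first(data, num_buckets):
    """Group sublists by int(first element) in one pass over data."""
    buckets = [[] for _ in range(num_buckets)]
    for seq in data:
        idx = int(seq[0])              # same truncation as the original code
        if 0 <= idx < num_buckets:     # skip values that have no bucket
            buckets[idx].append(seq)   # stores a reference, not a copy
    return buckets


data = [[4.0, 67.0], [3.0, 72.0], [1.0, 90.0], [3.5, 75.0], [9.0, 1.0]]
print(bucket_by_first(data, 5))
# [[], [[1.0, 90.0]], [], [[3.0, 72.0], [3.5, 75.0]], [[4.0, 67.0]]]
```

Each sublist is visited once, so the work grows with len(data) rather than with L times len(data).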

Missed values when creating a dictionary with two values

I have two lists as follows.
count = (1, 0, 0, 2, 0, 0, 1, 1, 1, 2)
bins = [[2.0, 3.0], [3.0, 4.0], [4.0, 5.0], [5.0, 6.0], [6.0, 7.0], [7.0, 8.0], [8.0, 9.0], [9.0, 10.0], [10.0, 11.0], [11.0, 12.0], [12.0]]
I tried to create a dictionary using the following:
dictionary = dict(itertools.izip(count, bins))
And it gives me {"0": [7.0, 8.0], "1": [10.0, 11.0], "2": [11.0, 12.0]}
It keeps only one entry per unique key, but I need to get all the pairs, as below.
{"0": [3.0, 4.0],"0": [4.0, 5.0],"0": [6.0, 7.0],"0": [7.0, 8.0], "1": [2.0, 3.0],"1": [8.0, 9.0], "1": [9.0, 10.0], "1": [10.0, 11.0], "2": [5.0, 6.0] ,"2": [11.0, 12.0]}
Interchanging the keys and values in the above dictionary is also acceptable (because keys should be unique).
How can I do that?
You can't use a list as a dictionary key because it is mutable (and therefore unhashable).
You could convert the list to a tuple:
>>> count = (1, 0, 0, 2, 0)
>>> bins = [[2.0, 3.0], [3.0, 4.0], [4.0, 5.0], [5.0, 6.0], [6.0, 7.0], [7.0, 8.0]]
>>> {tuple(key): value for (key, value) in zip(bins, count)}
{(4.0, 5.0): 0,
(3.0, 4.0): 0,
(5.0, 6.0): 2,
(2.0, 3.0): 1,
(6.0, 7.0): 0}
If you want to serialise to json, the keys need to be strings. You could convert the bins to strings instead:
>>> {str(key): value for (key, value) in zip(bins, count)}
{'[2.0, 3.0]': 1, '[4.0, 5.0]': 0, '[6.0, 7.0]': 0, '[5.0, 6.0]': 2, '[3.0, 4.0]': 0}
>>> import json
>>> json.dumps(_)
'{"[2.0, 3.0]": 1, "[4.0, 5.0]": 0, "[6.0, 7.0]": 0, "[5.0, 6.0]": 2, "[3.0, 4.0]": 0}'
Alternatively, just serialise the pairs, and make the dictionary on the receiving end:
>>> list(zip(bins, count))
[([2.0, 3.0], 1), ([3.0, 4.0], 0), ([4.0, 5.0], 0), ([5.0, 6.0], 2), ([6.0, 7.0], 0)]
>>> import json
>>> json.dumps(_)
'[[[2.0, 3.0], 1], [[3.0, 4.0], 0], [[4.0, 5.0], 0], [[5.0, 6.0], 2], [[6.0, 7.0], 0]]'
{"0": [3.0, 4.0],"0": [4.0, 5.0]} is not a valid dictionary, as the keys in a dictionary have to be unique. If you really want the entries in count to be your keys, the best thing I can think of is to make a list of values for each key:
count = (1, 0, 0, 2, 0, 0, 1, 1, 1, 2)
bins = [[2.0, 3.0], [3.0, 4.0], [4.0, 5.0], [5.0, 6.0], [6.0, 7.0], [7.0, 8.0], [8.0, 9.0], [9.0, 10.0], [10.0, 11.0], [11.0, 12.0], [12.0]]
answer = {}
for c, b in zip(count, bins):
    if c not in answer:
        answer[c] = []
    answer[c].append(b)
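The same grouping can be written with collections.defaultdict, which creates the empty list on first access. This is a sketch on a shortened version of the data; the name grouped is made up for illustration.

```python
from collections import defaultdict

count = (1, 0, 0, 2, 0)
bins = [[2.0, 3.0], [3.0, 4.0], [4.0, 5.0], [5.0, 6.0], [6.0, 7.0], [7.0, 8.0]]

grouped = defaultdict(list)
for c, b in zip(count, bins):  # zip stops at the shorter input, so [7.0, 8.0] is unused
    grouped[c].append(b)

print(dict(grouped))
# {1: [[2.0, 3.0]], 0: [[3.0, 4.0], [4.0, 5.0], [6.0, 7.0]], 2: [[5.0, 6.0]]}
```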

Is there a simpler way for finding a number

I'm writing a python script.
I have a list of numbers:
b = [55.0, 54.0, 54.0, 53.0, 52.0, 51.0, 50.0, 49.0, 48.0, 47.0,
45.0, 45.0, 44.0, 43.0, 41.0, 40.0, 39.0, 39.0, 38.0, 37.0, 36.0, 35.0, 34.0, 33.0, 32.0, 31.0, 30.0, 28.0, 27.0, 27.0, 26.0, 25.0, 24.0, 23.0, 22.0, 22.0, 20.0, 19.0, 18.0, 17.0, 16.0, 15.0, 14.0, 13.0, 11.0, 11.0, 10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
I need to parse the list and see whether it contains 50. If it does not, I have to search for the next lower number, 49. If that is not there, I have to look for 48. I can go down as far as 47.
In Python, is there a one-liner for this, or can I use a lambda?
You could use min() and abs():
>>> b = [55.0, 54.0, 54.0, 53.0, 52.0, 51.0, 50.0, 49.0, 48.0, 47.0, 45.0, 45.0, 44.0, 43.0, 41.0, 40.0, 39.0, 39.0, 38.0, 37.0, 36.0, 35.0, 34.0, 33.0, 32.0, 31.0, 30.0, 28.0, 27.0, 27.0, 26.0, 25.0, 24.0, 23.0, 22.0, 22.0, 20.0, 19.0, 18.0, 17.0, 16.0, 15.0, 14.0, 13.0, 11.0, 11.0, 10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
>>> min(b, key=lambda x:abs(x-50))
50.0
>>> min(b, key=lambda x:abs(x-20.1))
20.0
max(i for i in b if i <= 50)
It will raise a ValueError if there are no elements that match the condition.
max(filter(lambda i: i<=50, b))
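or, to handle a list with all elements above 50 (this relies on Python 2, where filter returns a list; in Python 3 the filter object is always truthy, so the fallback is never used):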
max(filter(lambda i: i<=50, b) or [None])
You can do this with a generator expression and max.
max(n for n in b if n >= 47 and n <= 50)
highestValue = max(b)
lowestValue = min(b)
if 50 in b:
    pass
These are three different ways of examining the numbers: the highest, the lowest, and whether 50 is in the mix.
And if you need to check whether multiple numbers are in your huge list, say 50, 30 and 40:
set(b).issuperset(set([50, 40, 30]))
One-liner without any lambda (raises ValueError if no value is found):
max((x for x in b if 46 < x <= 50))
or a version that returns None in this case (Python 2 only; in Python 3, max raises TypeError as soon as None is compared with a number):
from itertools import chain
max(chain((x for x in b if 46 < x <= 50), (None,)))
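In Python 3.4 and later, max accepts a default keyword that is returned when the iterable is empty, which covers the "nothing found" case without mixing None into the comparison. A minimal sketch, where the function name highest_in_range and the shortened list are assumptions for illustration:

```python
def highest_in_range(values, low, high):
    """Return the largest value with low <= value <= high, or None."""
    return max((v for v in values if low <= v <= high), default=None)


b = [55.0, 52.0, 49.0, 48.0, 45.0, 10.0]
print(highest_in_range(b, 47, 50))    # 49.0
print(highest_in_range(b, 46, 46.5))  # None
```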
