How to add dictionary values to a NumPy array in Python

I'm trying to create an npz file with NumPy whose data keys are positions and metadata. Its output should look like this:
#sample output
['positions', 'metadata'] #data keys when I print file_name.keys()
{'num_pos': 10, 'keypoints': [[4, 5, 6, 10, 11, 12], [1, 2, 3, 13, 14, 15]]} #values of metadata in dictionary when I print file_name['metadata']
I want the same output as above. Below is my Python code for generating the required npz file.
#code sample
import numpy as np
positions = [] #this step works and its values are saved in the npz file, so I'm skipping it here; my problem is with the metadata key below
metadata = {
    'num_pos': 10,
    'keypoints': [[4, 5, 6, 10, 11, 12], [1, 2, 3, 13, 14, 15]]
}
positions = np.array(positions).astype(np.float32)
np.savez_compressed('file_name.npz', positions=positions, metadata=metadata)
With the above code I get an npz file containing the positions values but not the metadata values. When I print file_name.keys(), the output is ['positions', 'metadata'], which is fine, but when I print file_name['metadata'] I get the following error:
ValueError: unsupported pickle protocol: 3
Any suggestions would be appreciated.
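A minimal round-trip sketch, assuming NumPy 1.16.3 or later (where np.load defaults to allow_pickle=False) and placeholder positions data, since the real values are not shown. np.savez_compressed stores a dict as a 0-d object array, which is pickled; reading it back needs allow_pickle=True, and .item() unwraps the dict. The specific message "unsupported pickle protocol: 3" is what Python 2's pickle raises when it meets a pickle written by Python 3, so it may also be worth checking that the file is read with the same Python 3 interpreter that wrote it.

```python
import numpy as np

# Placeholder positions; the real values are not shown in the question
positions = np.zeros((10, 3), dtype=np.float32)
metadata = {
    'num_pos': 10,
    'keypoints': [[4, 5, 6, 10, 11, 12], [1, 2, 3, 13, 14, 15]],
}

# The dict is wrapped in a 0-d object array and pickled inside the .npz file
np.savez_compressed('file_name.npz', positions=positions, metadata=metadata)

# Object arrays can only be read back with allow_pickle=True
with np.load('file_name.npz', allow_pickle=True) as file_name:
    keys = list(file_name.keys())
    loaded_positions = file_name['positions']
    # .item() unwraps the 0-d object array back into the original dict
    loaded_metadata = file_name['metadata'].item()

print(keys)              # ['positions', 'metadata']
print(loaded_metadata)
```

If pickling is undesirable altogether, another option is to store num_pos and keypoints as their own plain numeric arrays under separate keys.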

Related

How to save a large dict with tuples as keys?

I have large dict which has 3-tuples of integers as keys. I would like to save it to disk so I can read it in quickly. Sadly it seems I can't save it as a JSON file (which would let me use a fast JSON module such as orjson). What are my options other than pickle?
A tiny example would be:
my_dict = {
    (1, 2, 3): [4, 5, 6],
    (4, 5, 6): [7, 8, 9],
    (7, 8, 9): [10, 11, 12]
}
I have about 500,000 keys and each value list is of length 500.
I will create this data once and it will not be modified afterwards. my_dict will only ever be used as a lookup table.
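For reference, the JSON route fails only because JSON object keys must be strings, so json.dumps raises a TypeError on tuple keys. A minimal sketch, assuming the keys are always tuples of integers, encodes each key as a comma-separated string on save and decodes it on load. The helpers encode_key and decode_key are hypothetical names, and the sketch uses the standard-library json module.

```python
import json

my_dict = {
    (1, 2, 3): [4, 5, 6],
    (4, 5, 6): [7, 8, 9],
    (7, 8, 9): [10, 11, 12],
}

try:
    json.dumps(my_dict)  # tuple keys are rejected by JSON
except TypeError as exc:
    print("plain json.dumps fails:", exc)

def encode_key(key):
    # (1, 2, 3) -> "1,2,3" so that it is a valid JSON object key
    return ",".join(map(str, key))

def decode_key(text):
    # "1,2,3" -> (1, 2, 3), the reverse of encode_key
    return tuple(int(part) for part in text.split(","))

serialized = json.dumps({encode_key(k): v for k, v in my_dict.items()})
restored = {decode_key(k): v for k, v in json.loads(serialized).items()}
print(restored == my_dict)  # True
```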
You can try the pprint module. The code below saves the dictionary as a Python module, which you can then import either as a whole module or just as the dictionary object:
import pprint
my_dict = {
    (1, 2, 3): [4, 5, 6],
    (4, 5, 6): [7, 8, 9],
    (7, 8, 9): [10, 11, 12]
}
obj_str = pprint.pformat(my_dict, indent=4, compact=False)
message = f'my_dict = {obj_str}\n'
with open('data.py', 'w') as f:
    f.write(message)
Of course, you don't have to save it as a Python module; you can also save it as text or binary data and read it back into your program as an object, for example with eval if you save it as text.
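If you go the text route, ast.literal_eval is a safer choice than eval, because it only evaluates Python literals (tuples, lists, numbers and so on) and never runs arbitrary code. A minimal sketch, with data.txt as a hypothetical filename; on the real 500,000-key data it is worth timing first, since parsing a very large literal can be slow and memory-hungry.

```python
import ast
import pprint

my_dict = {
    (1, 2, 3): [4, 5, 6],
    (4, 5, 6): [7, 8, 9],
    (7, 8, 9): [10, 11, 12],
}

# Write the dict as a Python literal (data.txt is a hypothetical filename)
with open('data.txt', 'w') as f:
    f.write(pprint.pformat(my_dict))

# literal_eval parses tuples, lists and ints without executing arbitrary code
with open('data.txt') as f:
    loaded = ast.literal_eval(f.read())

print(loaded == my_dict)  # True
```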
EDIT
Just saw that you edited the question. This approach might still be enough for 500,000 keys with 500 items each.

How to (log) transform *args arguments without losing structure

I am attempting to apply statistical tests to some datasets with variable numbers of groups. This causes a problem when I try to perform a log transformation for said groups while maintaining the ability to perform the test function (in this case scipy's kruskal()), which takes a variable number of arguments, one for each group of data.
The code below is an idea of what I want. Naturally stats.kruskal([np.log(i) for i in args]) does not work, as kruskal() does not expect a list of arrays, but one argument for each array. How do I perform log transformation (or any kind of alteration, really), while still being able to use the function?
import scipy.stats as stats
import numpy as np
def t(*args):
    test = stats.kruskal([np.log(i) for i in args])
    return test
a = [11, 12, 4, 42, 12, 1, 21, 12, 6]
b = [1, 12, 4, 3, 14, 8, 8, 6]
c = [2, 2, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8]
print(t(a, b, c))
If I understand correctly, putting * in front of the list you build when calling kruskal should do the trick:
test = stats.kruskal(*[np.log(i) for i in args])
The asterisk unpacks the list and passes each entry as a separate argument to the function being called, i.e. kruskal here.
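A self-contained version of the fix, as a sketch: the transform keyword argument is a hypothetical addition so that other element-wise transformations can be passed in. Each group is converted to a float array and transformed, and the resulting list is unpacked into kruskal.

```python
import numpy as np
import scipy.stats as stats

def t(*args, transform=np.log):
    # Transform every group first (transform is a hypothetical extra parameter)
    transformed = [transform(np.asarray(group, dtype=float)) for group in args]
    # The * unpacks the list so kruskal gets one positional argument per group
    return stats.kruskal(*transformed)

a = [11, 12, 4, 42, 12, 1, 21, 12, 6]
b = [1, 12, 4, 3, 14, 8, 8, 6]
c = [2, 2, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8]

result = t(a, b, c)
print(result.statistic, result.pvalue)
```

Because the Kruskal-Wallis test works on ranks and the logarithm is strictly increasing, the log transform leaves this particular statistic unchanged. The same unpacking pattern applies to other functions that take one argument per group, such as scipy.stats.f_oneway, where the transform does change the result.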

How do I print some numbers using .sample() from the built-in random module in Python?

I'm working on a problem where I'm supposed to generate ten random but unique numbers ranging from 1 to 15 inclusive. The catch is that I'm supposed to write everything in one line and also get this output:
[2, 4, 6, 7, 8, 9, 11, 12, 13, 15]
Below is some code I wrote, but it doesn't produce the output I want. What am I doing wrong, and could I see a solution with a breakdown so I know how to do this going forward?
import random
print(sorted(random.sample(range(1,16),15)))
Output:
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
The output I want is:
[2,4,6,7,8,9,11,12,13,15]
How do I get this in one line of code?
>>> help(random.sample)
sample(population, k): method of random.Random instance
Chooses k unique random elements from a population sequence or set.
I'm supposed to write everything in one line and to also get this output:
[2, 4, 6, 7, 8, 9, 11, 12, 13, 15]
>>> sorted(__import__('random').Random(4225).sample(range(1, 16), 10))
[2, 4, 6, 7, 8, 9, 11, 12, 13, 15]
If you want to generate ten numbers in range 1-15, change
print(sorted(random.sample(range(1,16),15)))
to
print(sorted(random.sample(range(1,16),10)))
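A short sketch of the points above: sample's second argument k is the number of distinct values to draw, and range(1, 16) supplies 1 to 15 because the end is exclusive. Without a seed the result changes on every run; an exact list like the one in the question can only be reproduced with a fixed seed, which is what the random.Random(4225) answer relies on. The sketch only checks that the same seed repeats the same draw; it makes no claim about which list that seed produces.

```python
import random

# range(1, 16) covers 1..15 (the end value is exclusive); k=10 asks for ten distinct values
numbers = sorted(random.sample(range(1, 16), 10))
print(numbers)  # different on every run

# A dedicated Random instance with a fixed seed makes the draw repeatable
first = sorted(random.Random(4225).sample(range(1, 16), 10))
second = sorted(random.Random(4225).sample(range(1, 16), 10))
print(first == second)  # True
```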
# From the documentation:
# random.sample(population, k)
import random
population = range(1, 16)
how_many_sample = 10
random.sample(population, how_many_sample)
# Now in one line
random.sample(range(1, 16), 10)

Strange output from ndarray.shape

I am trying to convert the values of a dictionary to a 1-D array using np.asarray(dict.values()), but when I print the shape of the resulting array, I get something unexpected.
The values look like this:
dict_values([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26])
but the output of array.shape is:
()
whereas I was expecting (27, 1) or (27,).
After I changed the code to np.asarray(dict.values()).flatten(), the output of array.shape became
(1,)
I have read the documentation for numpy.ndarray.shape, but it gives no hint as to why the outputs look like this. Can someone explain it to me? Thanks.
This must be Python 3.
From the docs:
The objects returned by dict.keys(), dict.values() and dict.items()
are view objects. They provide a dynamic view on the dictionary’s
entries, which means that when the dictionary changes, the view
reflects these changes.
The issue is that dict.values() returns a dynamic view of the dictionary's values rather than a list, which leads to the behaviour you see.
import numpy as np
dict_a = {'1': 1, '2': 2}
res = np.array(dict_a.values())
res.shape #()
res
#Output:
array(dict_values([1, 2]), dtype=object)
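Notice that NumPy isn't resolving the view object into the actual integers; it simply wraps the view in a 0-d array with dtype=object. That is also why .flatten() gives shape (1,): flattening a 0-d array produces a 1-D array holding its single element, which here is the view itself.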
To avoid this issue, consume the view to get a list, as follows:
dict_a = {'1': 1, '2': 2}
res = np.array(list(dict_a.values()))
res.shape #(2,)
res #array([1, 2])
res.dtype #dtype('int64') on most platforms; dtype('int32') on Windows with NumPy < 2
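A sketch comparing the approaches on a hypothetical 27-entry dictionary like the one in the question. np.fromiter is another standard NumPy way to consume the view directly; it needs an explicit dtype, assumed to be np.int64 here.

```python
import numpy as np

# Hypothetical dict with 27 integer values, like the one in the question
d = {str(i): i for i in range(27)}

wrapped = np.asarray(d.values())          # the view itself becomes a 0-d object array
as_list = np.asarray(list(d.values()))    # consuming the view first gives a real 1-D array
from_iter = np.fromiter(d.values(), dtype=np.int64, count=len(d))

print(wrapped.shape, wrapped.dtype)       # () object
print(wrapped.flatten().shape)            # (1,)
print(as_list.shape, from_iter.shape)     # (27,) (27,)
```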

How to remove varying multiple strings from a string extracted from a csv file

I am quite new to programming and have a string with embedded list values. I am trying to isolate the numerical values in the string so I can use them later.
I have tried splitting the string, changing it back to a list, and removing the EU labels with a loop. The function below produces the indexes of the duplicates and writes them in a list/string format that I am trying to change.
Here is an example extract from the CSV file:
Country,Population,Number,code,area
,,,,
Canada,8822267,83858,EU15,central
Denmark,11413058,305010,EU6,west
Southafrica,705034,110912,EU6,south
We are trying to add up the populations of repeated EU codes.
def duplicates(listed, number):
    return [i for i,x in enumerate(listed) if x == number]
a=list((x, duplicates(EUlist, x)) for x in set(EUlist) if EUlist.count(x) > 1)
str1 = ''.join(str(e) for e in a)
for x in range (6,27):
    str2=str1.replace("EUx","")
#split=str1.split("EUx")
#Here is where I tried to split it as a list. Changing str1 back to a list. str1= [x for x in split]
This is what the code produces:
('EU6', [1, 9, 10, 14, 17, 19])('EU12', [21, 25])('EU25', [4, 5, 7, 12, 15, 16, 18, 20, 23, 24])('EU27', [2, 22])('EU9', [6, 13])('EU15', [0, 8, 26])
I am trying to isolate the numbers in the square brackets so it prints:
[1, 9, 10, 14, 17, 19]
[21, 25]
[4, 5, 7, 12, 15, 16, 18, 20, 23, 24]
[2, 22]
[6, 13]
[0, 8, 26]
This will allow me to isolate the indexes for further use.
I'm not sure without example data but I think this might do the trick:
def duplicates(listed, number):
    return [i for i,x in enumerate(listed) if x == number]
a=list((x, duplicates(EUlist, x)) for x in set(EUlist) if EUlist.count(x) > 1)
for item in a:
    print(item[1])
At least I think this should print what you asked for in the question.
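If you don't need the intermediate string at all, the indexes and the population totals can be collected straight from the CSV rows. A minimal sketch using only the standard library, with the extract above inlined as CSV_TEXT (a hypothetical stand-in for reading the real file). It assumes rows with an empty code, such as the line of commas, should be skipped, so the indexes count data rows only.

```python
import csv
import io
from collections import defaultdict

# The extract from the question, inlined so the sketch runs on its own
CSV_TEXT = """Country,Population,Number,code,area
,,,,
Canada,8822267,83858,EU15,central
Denmark,11413058,305010,EU6,west
Southafrica,705034,110912,EU6,south
"""

# Skip rows with an empty code, such as the line of commas
rows = [row for row in csv.DictReader(io.StringIO(CSV_TEXT)) if row["code"]]

indexes = defaultdict(list)   # code -> row indexes
totals = defaultdict(int)     # code -> summed population
for i, row in enumerate(rows):
    indexes[row["code"]].append(i)
    totals[row["code"]] += int(row["Population"])

# Print only the codes that appear more than once
for code, idx in indexes.items():
    if len(idx) > 1:
        print(code, idx, totals[code])
```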
As an alternative, you can use the pandas module and save some typing. Remove the line of four commas (the second line of the file) and then:
import pandas as pd
csvfile = r'C:\Test\pops.csv'
df = pd.read_csv(csvfile)
df.groupby('membership')['Population'].sum()
Will output (on the full data set, where the EU code column is named membership rather than code):
membership
Brexit 662307
EU10 10868
EU12 569219
EU15 8976639
EU25 17495803
EU27 900255
EU28 41053
EU6 13694963
EU9 105449
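Instead of editing the file by hand, read_csv can skip the line of commas with skiprows=[1] (row positions count the header as 0). A sketch on the extract from the question, grouping by its code column; groupby(...).groups also returns the row indexes per code, which covers the original question.

```python
import io
import pandas as pd

# The extract from the question, inlined so the sketch runs on its own
CSV_TEXT = """Country,Population,Number,code,area
,,,,
Canada,8822267,83858,EU15,central
Denmark,11413058,305010,EU6,west
Southafrica,705034,110912,EU6,south
"""

# skiprows=[1] drops the line of commas (the header is row 0)
df = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=[1])

totals = df.groupby('code')['Population'].sum()
print(totals)

# Row indexes for each code, as asked for in the original question
indexes = {code: list(idx) for code, idx in df.groupby('code').groups.items()}
print(indexes)
```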
