I wrote the following bit of code:
...
for x in range(len(coeff)): coeff[x].insert(0,names[x])
coeff.insert(0,['Center','c1','c2','c3'])
print_matrix(coeff)
...
The print_matrix function just prints a nice matrix from a list of lists [[row1],[row2],...].
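(print_matrix itself isn't shown; for context, a minimal stand-in that produces boxes like the ones below could look like this. This is purely an assumption on my part, not the real function:)

def print_matrix(rows):
    # hypothetical stand-in: pad each cell to its column width, then box the rows
    widths = [max(len(str(row[i])) for row in rows) for i in range(len(rows[0]))]
    lines = ["  ".join(str(cell).ljust(w) for cell, w in zip(row, widths)) for row in rows]
    border = "+-" + "-" * len(lines[0]) + "-+"
    print(border)
    for line in lines:
        print("| " + line + " |")
    print(border)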
my coeff = [[1,2,3],[4,5,6]] and my names = ['A','B'].
The first time I run the function I get:
coeff = [['Center','c1','c2','c3'],['A',1,2,3],['B',4,5,6]]
+----------------------+
| Center c1 c2 c3 |
| A 1 2 3 |
| B 4 5 6 |
+----------------------+
which is exactly what I want. The problem starts when I run THE SAME (copied and pasted) script just after the first one, to print another list, basis = [[7,8,9],[10,11,12]], in a similar fashion:
...
del x
for x in range(len(basis)): basis[x].insert(0,names[x])
basis.insert(0,['Center','A1','A2','A3'])
print_matrix(basis)
...
I then get:
basis = [['Center','A1','A2','A3'],['A','B',7,8,9],['A','B',10,11,12]]
and an error from the print_matrix function, since it doesn't get a list with equal-length rows. Why?
OK, I worked it out. What happened was that the way basis was constructed in the first place affected the result. I just gave random numbers as an example of basis, but in fact it was (deep in the code):
coordinates = [...,[1,2,3],...]
coordinates[7] = [1,2,3] # Or something like that
basis = []
basis.append(coordinates[7])
...
basis.append(coordinates[7])
so that when I did insert(0, something) on basis[0], it also inserted the element into basis[1]: both entries were references to the same list object.
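In other words, a minimal reproduction of the aliasing (with made-up values):

row = [1, 2, 3]
basis = [row, row]       # both entries are the same list object
basis[0].insert(0, 'A')  # mutating one entry mutates "both"
print(basis)             # [['A', 1, 2, 3], ['A', 1, 2, 3]]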
Here is a snippet of code that works:
...
basis_clone = [[y for y in basis[x]] for x in range(len(basis))]
for y, name in zip(basis_clone,orbital_center_names): y.insert(0,name)
basis_clone.insert(0,['Center','A1','A2','A3'])
print_matrix(basis_clone) ; sleep(0.1)
...
None of the methods given here worked, so I had to clone the basis the way I did. I'm open to suggestions for a better way to do that, though.
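For reference, copy.deepcopy from the standard library does the same per-element cloning in one call:

import copy

basis_clone = copy.deepcopy(basis)  # every nested list is copied independently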
P.S.: Thank you to #Lattyware for help on good syntax.
I need to cross-validate R code in Python. My code contains lots of pseudo-random number generation, so, for easier comparison, I decided to use rpy2 to generate those values in my Python code "from R".
As an example, in R, I have:
set.seed(1234)
runif(4)
[1] 0.1137034 0.6222994 0.6092747 0.6233794
In python, using rpy2, I have:
import rpy2.robjects as robjects
set_seed = robjects.r("set.seed")
runif = robjects.r("runif")
set_seed(1234)
print(runif(4))
[1] 0.1137034 0.6222994 0.6092747 0.6233794
as expected (the values are identical). However, I face strange behavior with the R sample function (the equivalent of numpy.random.choice).
As the simplest reproducible example, I have in R:
set.seed(1234)
sample(5)
[1] 1 3 2 4 5
while in python I have:
sample = robjects.r("sample")
set_seed(1234)
print(sample(5))
[1] 4 5 2 3 1
The results are different. Could anyone explain why this happens and/or provide a way to get similar values in R and python using the R sample function?
If you print the value of the R function RNGkind() in both situations, I suspect you won't get the same answer. The Python result looks like the default output, while your R result looks like the old buggy output.
For example, in R:
set.seed(1234, sample.kind = "Rejection")
sample(5)
#> [1] 4 5 2 3 1
set.seed(1234, sample.kind = "Rounding")
#> Warning in set.seed(1234, sample.kind = "Rounding"): non-uniform 'Rounding'
#> sampler used
sample(5)
#> [1] 1 3 2 4 5
set.seed(1234, sample.kind = "default")
sample(5)
#> [1] 4 5 2 3 1
Created on 2021-01-15 by the reprex package (v0.3.0)
So it looks to me as though you are still using the old "Rounding" method in your R session. You probably saved a workspace a long time ago, and have reloaded it since. Don't do that, start with a clean workspace each session.
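If you need to reproduce the old sequence from rpy2, one option (a sketch, evaluating plain R code through robjects.r) is to request the old sampler explicitly:

import rpy2.robjects as robjects

# ask R for the pre-3.6.0 "Rounding" sampler before seeding (this emits a warning)
robjects.r('set.seed(1234, sample.kind = "Rounding")')
print(robjects.r('sample(5)'))  # should print 1 3 2 4 5, matching the old session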
Maybe give this a shot (stackoverflow answer from here). Quoting the answer: "The p argument corresponds to the prob argument in the sample() function."
import numpy as np

# signature: np.random.choice(a, size=None, replace=True, p=None)
print(np.random.choice(5, size=5, replace=False))  # like R's sample(5), but 0-based
I'm writing a function which is supposed to compare lists (significant genes for a test) and list the common elements (genes) for all possible combinations of the selected lists.
These results are to be used for a Venn diagram thingy...
The number of tests and genes is flexible.
The input JSON file looks something like this:
| test            | genes                                             |
|-----------------|---------------------------------------------------|
| p-7trt_1/0con_1 | [ENSMUSG00000000031, ENSMUSG00000000049, ENSMU... |
| p-7trt_2/0con_1 | [ENSMUSG00000000031, ENSMUSG00000000037, ENSMU... |
| p-7trt_1/0con_2 | [ENSMUSG00000000037, ENSMUSG00000000049, ENSMU... |
| p-7trt_2/0con_2 | [ENSMUSG00000000028, ENSMUSG00000000031, ENSMU... |
| p-7trt_1/0con_3 | [ENSMUSG00000000088, ENSMUSG00000000094, ENSMU... |
| p-7trt_2/0con_3 | [ENSMUSG00000000028, ENSMUSG00000000031, ENSMU... |
So the function is as follows:
import pandas as pd

def get_venn_compiled_data(dir_loc):
    """return json of compiled data for the venn thing
    """
    data_frame = pd.read_json(dir_loc + "/venn.json", orient="records")
    number_of_tests = data_frame.shape[0]
    venn_data = []
    venn_data_point = {"tests": [], "genes": []}  # list of genes which are common across listed tests
    binary = lambda x: bin(x)[2:]  # to directly get the binary number
    for dec_number in range(1, 2 ** number_of_tests):
        # resetting
        venn_data_point["tests"] = []
        venn_data_point["genes"] = []
        # using a binary number to get all the cases
        for index, state in enumerate(binary(dec_number)):
            if state == "0":
                continue
            # putting in all the genes from the first test
            if venn_data_point["tests"] == []:
                venn_data_point["genes"] = data_frame["data"][index].copy()
            # removing the ones which are not common in current genes state and this test
            else:
                for gene_index, gene in enumerate(venn_data_point["genes"]):
                    if gene not in data_frame["data"][index]:
                        venn_data_point["genes"].pop(gene_index)
            # putting the test in the tests list
            venn_data_point["tests"].append(data_frame["name"][index])
        venn_data.append(venn_data_point.copy())
    return venn_data
I'm basically abusing the fact that binary numbers generate all possible combinations of 1s and 0s. I correspond every place of the binary number with a test, and for every binary number, if a 0 is present then the list corresponding to that test is not taken for the comparison.
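To illustrate with three tests (a quick demo of the same binary lambda):

binary = lambda x: bin(x)[2:]
for dec_number in range(1, 2 ** 3):
    print(dec_number, binary(dec_number))
# 1 1
# 2 10
# 3 11
# 4 100
# ... note the missing leading zeros - this turns out to matter below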
I tried my best to explain, please ask in the comments if I was not clear.
After running the function I am getting output in which, at random places, test sets are repeated.
This is the test input file, and this is what came out as the output.
Any help is highly appreciated. Thank you.
I realized what error I was making:
I assumed that the binary function would magically always generate the string with the number of places I needed, which it doesn't.
After updating the binary function to add those leading zeros, things are working fine.
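For example, bin(1)[2:] is just '1', so with six tests the inner loop only ever saw one digit; str.zfill would also do the padding in one call:

print(bin(1)[2:])           # '1' - too short when there are 6 tests
print(bin(1)[2:].zfill(6))  # '000001'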
import pandas as pd

def get_venn_compiled_data(dir_loc):
    """return json of compiled data for the venn thing
    """
    # internal variables
    data_frame = pd.read_json(dir_loc + "/venn.json", orient="records")
    number_of_tests = data_frame.shape[0]
    venn_data = []

    # defining internal function
    def binary(dec_no, length=number_of_tests):
        """Just to convert decimal number to binary of specified length
        """
        bin_number = bin(dec_no)[2:]
        if len(bin_number) < length:
            bin_number = "0" * (length - len(bin_number)) + bin_number
        return bin_number

    # list of genes which are common across listed tests
    venn_data_point = {
        "tests": [],
        "genes": [],
    }
    for dec_number in range(1, 2 ** number_of_tests):
        # resetting
        venn_data_point["tests"] = []
        venn_data_point["genes"] = []
        # using a binary number to get all the cases
        for index, state in enumerate(binary(dec_number)):
            if state == "0":
                continue
            # putting in all the genes from the first test
            if venn_data_point["tests"] == []:
                venn_data_point["genes"] = data_frame["data"][index].copy()
            # removing the ones which are not common in current genes state and this test
            else:
                for gene_index, gene in enumerate(venn_data_point["genes"]):
                    if gene not in data_frame["data"][index]:
                        venn_data_point["genes"].pop(gene_index)
            # putting the test in the tests list
            venn_data_point["tests"].append(data_frame["name"][index])
        venn_data.append(venn_data_point.copy())
    return venn_data
If anyone else has a more optimized algorithm for this, help is appreciated.
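For what it's worth, here is a set-based sketch (assuming the same "name" and "data" columns as above). It computes each intersection directly and avoids popping from a list while iterating over it, which the inner loop above does:

from itertools import combinations

import pandas as pd

def get_venn_compiled_data(dir_loc):
    """Return the common genes for every non-empty combination of tests."""
    data_frame = pd.read_json(dir_loc + "/venn.json", orient="records")
    names = list(data_frame["name"])
    gene_sets = [set(genes) for genes in data_frame["data"]]
    venn_data = []
    for r in range(1, len(names) + 1):
        for combo in combinations(range(len(names)), r):
            # intersect the gene sets of every test in this combination
            common = set.intersection(*(gene_sets[i] for i in combo))
            venn_data.append({"tests": [names[i] for i in combo],
                              "genes": sorted(common)})
    return venn_data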
I have a little problem in Python. I have a two-dimensional dictionary; let's call it d[x][y], where x and y are integers. I want to select only the key-value pairs whose keys lie between 4 bounds. The function should look like this:
def search(topleft_x, topleft_y, bottomright_x, bottomright_y):
For example, search(20, 40, 200000000, 300000000) should return all dictionary items that match:
20 < x < 200000000
AND 40 < y < 300000000
Most of the key-value pairs in this huge matrix are not set (see the picture; this is why I can't just iterate over the full ranges).
The function should return a filtered dictionary. In the example shown in the picture, it would be a new dictionary with the 3 green-circled values. Is there any simple solution to realize this?
Until now I used two for loops. In this example they would look like this:
def search():
    for x in range(20, 200000000):
        for y in range(40, 300000000):
            try:
                # do something with the item
                pass
            except KeyError:
                # well, the item just doesn't exist
                pass
Of course this is highly inefficient. So my question is: how do I speed up this simple thing in Python? In C# I used LINQ for stuff like this... what should I use in Python?
Thanks for the help!
Example Picture
You don't go over random number ranges and ask four million times for forgiveness; you use two number ranges to specify your "filters" and go only over the keys that actually exist in the dictionary and fall into those ranges:
# get fancy on filtering if you like; I used explicit conditions and continues for clarity
def search(d: dict, r1: range, r2: range) -> dict:
    d2 = {}
    for x in d:  # only use existing keys in d - not the millions that might be in range
        if x not in r1:  # skip it - not in range r1
            continue
        d2[x] = {}
        for y in d[x]:  # only use existing keys in d[x]
            if y not in r2:  # skip it - not in range r2
                continue
            d2[x][y] = "found: " + d[x][y][:]  # take it, it's in both ranges
    return d2

d = {}
d[20] = {99: "20", 999: "200", 9999: "2000", 99999: "20000"}
d[9999] = {70: "70", 700: "700", 7000: "7000", 70000: "70000"}

print(search(d, range(10, 30), range(40, 9000)))
Output:
{20: {99: 'found: 20', 999: 'found: 200'}}
It might be useful to take a look at modules providing sparse matrices.
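For instance, scipy.sparse.dok_matrix stores only the entries that were explicitly set, so the same existing-keys-only filtering applies (a sketch, assuming numeric values):

from scipy.sparse import dok_matrix

m = dok_matrix((100000, 100000))  # only explicitly set entries are stored
m[20, 99] = 1.0
m[9999, 70] = 2.0

# iterate only over the stored entries and keep those inside the bounds
hits = {(x, y): v for (x, y), v in m.items() if 10 < x < 30 and 40 < y < 9000}
print(hits)  # {(20, 99): 1.0}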
I have obtained my data using Python for a project in MATLAB. I have 3 different matrices of dimensions m x n, m x (n+1) and m x (n+2). I used this command in Python: scipy.io.savemat('set1.mat', mdict={'abc1': abc1}). Each row of the matrix should actually be a row of row vectors (of length p), not scalars, so that the matrices are actually m x n*p, m x (n+1)*p and m x (n+2)*p.
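(For context, the save step on the Python side looked like this; the matrix contents here are just a placeholder:)

import numpy as np
import scipy.io

abc1 = np.ones((4, 15))  # placeholder for one of the m x n*p matrices
scipy.io.savemat('set1.mat', mdict={'abc1': abc1})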
As an example, I have defined the following at the top of the MATLAB file in both cases:
A = ones(1,5)
B = 2*ones(1,5)
C = 3*ones(1,5)
Now directly in MATLAB I can write:
abc1 = [A B C]
which, strange though it may seem, gives me the output I want.
abc1 =
Columns 1 through 14
1 1 1 1 1 2 2 2 2 2 3 3 3 3
Column 15
3
Now if I import my data using load I can grab abc1(1,:). This gives me:
ans = A B C
or I could take:
abc1(1,1)
ans = A
How can I get it to recognise that A is the name of a vector?
From what I understand of your question it sounds like you have (in matlab):
A = ones(1,5);
B = 2*ones(1,5);
C = 3*ones(1,5);
load('set1.mat');
And then you want to do something like:
D = [abc1];
and have the result be, for abc1 = 'A B C', the equivalent of [A B C].
There are a number of options for doing this. The first and possibly simplest is to use eval, though I shudder to mention it, since most consider eval to be evil.
In your case this would look like:
D = eval(['[' abc1 ']']);
A nicer solution would be to exploit the dynamic field names trick that can be done with structures:
foo.A = ones(1,5);
foo.B = 2*ones(1,5);
foo.C = 3*ones(1,5);
load('set1.mat');
D = [foo.(abc1(1,1)) foo.(abc1(1,2)) foo.(abc1(1,3))];
Or, if you need to concatenate more than just 3 columns, you could do so iteratively, using the cat function, e.g.:
D = [];
for idx = 1:3
    D = cat(2, D, foo.(abc1(1,idx)));
end
Or, if you know the length of D before you have created it you can use a slightly more efficient version:
D = zeros(1, num_elements);
ins_idx = 1;
for idx = 1:3
    temp_num = length(foo.(abc1(1,idx)));
    D(ins_idx:(ins_idx+temp_num-1)) = foo.(abc1(1,idx));
    ins_idx = ins_idx + temp_num;
end
Load the data into a structure and use dynamic field indexing:
s = load('yourfile');
s.(abc1(1,1))
However, if you keep structuring your project in the above-mentioned way, you're probably going to end up reaching for eval(), which I always suggest avoiding.
I'm not sure what to call what I'm looking for, so if I failed to find this question elsewhere, I apologize. In short, I am writing Python code that will interface directly with the Linux kernel. It's easy to get the required values from the include header files and write them into my source:
IFA_UNSPEC = 0
IFA_ADDRESS = 1
IFA_LOCAL = 2
IFA_LABEL = 3
IFA_BROADCAST = 4
IFA_ANYCAST = 5
IFA_CACHEINFO = 6
IFA_MULTICAST = 7
It's easy to use these values when constructing structs to send to the kernel. However, they are of almost no help when resolving the values in the responses from the kernel.
If I put the values into a dict, I presume I would have to scan all the values in the dict to look up the key for each item in each struct from the kernel. There must be a simpler, more efficient way.
How would you do it? (Feel free to retitle the question if it's way off.)
If you want to use two dicts, you can try this to create the inverted dict:
b = {v: k for k, v in a.items()}  # a.iteritems() on Python 2
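A quick usage sketch (with a couple of the constants from the question):

a = {'IFA_UNSPEC': 0, 'IFA_ADDRESS': 1}
b = {v: k for k, v in a.items()}
print(b[1])  # 'IFA_ADDRESS'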
Your solution leaves a lot of repeated work to the person maintaining the file, and that is a source of error (you actually have to write each name three times). If you have a file where you need to update those values from time to time (like when new kernel releases come out), you are destined to include an error sooner or later. That was just a long way of saying: your solution violates DRY.
I would change your solution to something like this:
IFA_UNSPEC = 0
IFA_ADDRESS = 1
IFA_LOCAL = 2
IFA_LABEL = 3
IFA_BROADCAST = 4
IFA_ANYCAST = 5
IFA_CACHEINFO = 6
IFA_MULTICAST = 7
__IFA_MAX = 8
values = {globals()[x]: x for x in dir() if x.startswith('IFA_') or x.startswith('__IFA_')}
This way the values dict is generated automatically. You might want to (or have to) change the condition in the if statement there, according to whatever else is in that file. Maybe something like the following. That version would take away the need to list prefixes in the if statement, but it would fail if you had other stuff in the file.
values = {globals()[x]:x for x in dir() if not x.endswith('__')}
You could of course do something more sophisticated there, e.g. check for accidentally repeated values.
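For example, a small sanity check along those lines (a sketch):

ifa_names = [x for x in dir() if x.startswith('IFA_')]
ifa_values = [globals()[x] for x in ifa_names]
assert len(set(ifa_values)) == len(ifa_values), "duplicate IFA_* values"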
What I ended up doing is leaving the constant values in the module and creating a dict. The module is if_addr.py (the values are from linux/if_addr.h), so when constructing structs to send to the kernel I can use if_addr.IFA_LABEL, and I can resolve responses with if_addr.values[2]. I'm hoping this is the most straightforward approach, so that when I have to look at this again in a year+ it's easy to understand :p
IFA_UNSPEC = 0
IFA_ADDRESS = 1
IFA_LOCAL = 2
IFA_LABEL = 3
IFA_BROADCAST = 4
IFA_ANYCAST = 5
IFA_CACHEINFO = 6
IFA_MULTICAST = 7
__IFA_MAX = 8
values = {
    IFA_UNSPEC: 'IFA_UNSPEC',
    IFA_ADDRESS: 'IFA_ADDRESS',
    IFA_LOCAL: 'IFA_LOCAL',
    IFA_LABEL: 'IFA_LABEL',
    IFA_BROADCAST: 'IFA_BROADCAST',
    IFA_ANYCAST: 'IFA_ANYCAST',
    IFA_CACHEINFO: 'IFA_CACHEINFO',
    IFA_MULTICAST: 'IFA_MULTICAST',
    __IFA_MAX: '__IFA_MAX',
}
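An alternative worth considering (a sketch, not the module described above): Python 3's enum.IntEnum gives both directions without maintaining the dict by hand:

from enum import IntEnum

class IFA(IntEnum):
    UNSPEC = 0
    ADDRESS = 1
    LOCAL = 2
    LABEL = 3
    BROADCAST = 4
    ANYCAST = 5
    CACHEINFO = 6
    MULTICAST = 7

print(int(IFA.LABEL))  # 3 - members behave like ints when packing structs
print(IFA(2).name)     # 'LOCAL' - resolves values from kernel responses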