I need to map tuples of integers, of arbitrary (but fixed) length, onto RGB values. It would be especially nice if I could have them ordered more or less by magnitude, with some standard way of choosing the sub-ordering between, say, (0,1) and (1,0).
Here's how I'm doing this now:
I have a long list of RGB values of colors.
colors = [(0,0,0),(255,255,255),...]
I take the hash of the tuple mod the number of colors, and use this as the index.
def tuple_to_rgb(atuple):
    index = hash(atuple) % len(colors)
    return colors[index]
This works OK, but I'd like it to work more like a heatmap value, where (5,5,5) has a larger value than (0,0,0), so that adjacent colors make some sense, maybe getting "hotter" as the values get larger.
I know how to map integers onto RGB values, so perhaps it would work if I just had a decent way of generating a unique integer from a tuple, sorting first by the magnitude of the tuple and then by the interior values.
I could simply write my own sort comparator, generate the entire list of possible tuples in advance, and use each tuple's position in that list as the unique integer, but it would be much easier if I didn't have to generate all of the possible tuples in advance.
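For concreteness, the ordering I have in mind corresponds to a sort key like this (a sketch of the ordering only, not a solution):

key = lambda t: (sum(t), t)  # magnitude first, lexicographic tie-break
print(sorted([(1, 0), (0, 0), (0, 1)], key=key))  # [(0, 0), (0, 1), (1, 0)]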
Does anyone have any suggestions? This seems like something do-able, and I'd appreciate any hints to push me in the right direction.
For those who are interested, I'm trying to visualize predictions of electron occupations of quantum dots, like those in Fig 1b of this paper, but with an arbitrary number of dots (and thus an arbitrary tuple length). The tuple length is fixed in a given application of the code, but I don't want the code to be specific to double dots or triple dots. It probably won't get much bigger than quadruple dots, but experimentalists dream up some pretty wild systems.
Here's an alternative method. Since the dots I've generated so far only have a subset of the possible occupations, the color maps were skewed one way, and didn't look as good. This method requires a list of possible states to be passed in, and thus these must be generated in advance, but the resulting colormaps look much nicer.
class Colormapper2:
    """
    Like Colormapper, but uses a list of possible occupations to
    generate the maps, rather than generating all possible occupations.
    The difference is that the systems I've explored only have a subset
    of the possible states occupied, and the colormaps look better
    this way.
    """
    def __init__(self, occs, **kwargs):
        import matplotlib.pyplot as plt
        colormap = kwargs.get('colormap', 'hot')
        self.occs = sorted(list(occs), key=sum)
        self.n = float(len(self.occs))
        self.cmap = plt.get_cmap(colormap)

    def __call__(self, occ):
        # Scale the occupation's rank in the sorted list to a 0-255 colormap index.
        ind255 = int(255 * self.occs.index(occ) / self.n)
        return self.cmap(ind255)
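A minimal usage sketch (this occupation list is made up for illustration):

from itertools import product
occs = list(product(range(3), repeat=2))  # hypothetical double-dot occupations
cmap = Colormapper2(occs, colormap='hot')
print(cmap((1, 1)))  # RGBA tuple from the 'hot' colormap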
Here's an example of the resulting image:
You can see the colors are better separated than the other version.
Here's the code I came up with:
class Colormapper:
    """
    Create a colormap to map tuples onto RGBA values produced by matplotlib's
    cmap function.
    Arguments are the maximum value of each place in the tuple. The dimension
    of the tuple is inferred from the length of the args array.
    """
    def __init__(self, *args):
        from itertools import product
        import matplotlib.pyplot as plt
        self.occs = sorted(list(product(*[range(arg + 1) for arg in args])), key=sum)
        self.n = float(len(self.occs))
        self.hotmap = plt.get_cmap('hot')

    def __call__(self, occ):
        # Scale the occupation's rank in the sorted list to a 0-255 colormap index.
        ind255 = int(255 * self.occs.index(occ) / self.n)
        return self.hotmap(ind255)
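Hypothetical usage for a double dot where each dot holds at most two electrons:

cmap = Colormapper(2, 2)  # occupations (0,0) through (2,2)
print(cmap((0, 0)))  # coolest end of the 'hot' colormap
print(cmap((2, 2)))  # hottest occupation in the ordering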
Here's an example of the result of this code:
Related
I want to generate a number of random points in a hexagon. To do so, I generate random points in a square and then try to use conditions to drop the unsuitable pairs. I tried solutions like this:
import scipy.stats as sps
import numpy as np
size=100
kx = 1/np.sqrt(3)*sps.uniform.rvs(loc=-1,scale=2,size=size)
ky = 2/3*sps.uniform.rvs(loc=-1,scale=2,size=size)
pairs = [(i, j) for i in kx for j in ky]
def conditions(pair):
    return (-1/np.sqrt(3) < pair[0] < 1/np.sqrt(3)) & (-2/3 < pair[1] < 2/3)
mask = np.apply_along_axis(conditions, 1, pairs)
hex_pairs = np.extract(mask, pairs)
L=len(hex_pairs)
print(L)
In this example I try to construct a logical mask for later use with np.extract to extract the needed values, applying the conditional function to all pairs in the list. But it seems I am misunderstanding something, because with this mask the output of this code is:
10000
That means that no pairs were dropped and all boolean values in the mask were True. Can anyone suggest how to correct this solution, or maybe approach it another way (with a set of randomly distributed points in a hexagon as the result)?
The reason why none of your pairs gets eliminated is that they are created such that the condition is fulfilled (all x-values are in [-1/sqrt(3), 1/sqrt(3)], and similarly for the y-values).
I think an intuitive and easy way to get there is to create a hexagonal polygon, generate uniformly distributed random numbers within a square that encloses this hexagon, and then apply the respective method from one of the existing polygon libraries, such as shapely. See e.g. https://stackoverflow.com/a/36400130/7084566
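A minimal sketch of that approach, assuming shapely is installed (the hexagon size, orientation, and point count are arbitrary choices here):

import numpy as np
from shapely.geometry import Point, Polygon

# Regular hexagon of circumradius 1 centered at the origin.
angles = np.arange(6) * np.pi / 3
hexagon = Polygon(list(zip(np.cos(angles), np.sin(angles))))

rng = np.random.default_rng()
minx, miny, maxx, maxy = hexagon.bounds
points = []
while len(points) < 100:
    # Sample uniformly in the bounding box; keep only points inside the hexagon.
    x, y = rng.uniform(minx, maxx), rng.uniform(miny, maxy)
    if hexagon.contains(Point(x, y)):
        points.append((x, y))
print(len(points))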
I have a set of strings which are several million characters each. I want to split them into substrings of random length, and this I can do with no particular issue.
However, my question is: how can I apply some sort of weight to the choice of substring length? My code runs in Python 3, so I would like a pythonic solution. In detail, my aim is to:
split the strings into substrings that range in length between 1e04 and 8e06 characters;
make it so that the script chooses a short length (1e04) more often than a long length (8e06) for the newly generated substrings, like a descending length-likelihood gradient.
Thanks for the help!
NumPy supplies lots of random sampling functions. Have a look through the various distributions available.
If you're looking for something that is weighted towards the lower end of the scale, maybe the exponential distribution would work?
With matplotlib you can plot a histogram of the values, so you can get a better idea of whether the distribution fits what you want.
So something like this:
import numpy as np
import matplotlib.pyplot as plt
# desired range of values
mn = 1e04
mx = 8e06
# random values following exp distribution
values = np.random.exponential(scale=1, size=2000)
# scale the values to the desired range
values = ((mx-mn)*values/np.max(values)) + mn
# plot the distribution of values
plt.hist(values)
plt.grid()
plt.show()
plt.close()
There are probably many ways to do this. I would do it as follows:
Take a random number rand in the interval [0,1]:
import random
rand = random.random()
Use an operation on that number to make smaller numbers more likely, while staying in the range [0,1]. Which operation you use depends on what you want your likelihood distribution to look like. A simple choice would be the square:
rand = rand**2
Scale the number space [0,1] up to [1e04, 8e06] and round to the nearest integer:
subStringLen = round(rand*(8e06-1e04)+1e04)
Get the substring of length subStringLen from your string and check how many characters are left.
If there are more than 8e06 characters left go to step 1.
If there are between 1e04 and 8e06, use them as your last substring.
If there are less than 1e04, you need to decide whether you want to throw the rest away or allow substrings shorter than 1e04 in this special case.
I'm sure a lot of improvement is possible in terms of efficiency; this is just to give you an idea of the method.
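Putting those steps together, a minimal sketch (the function name and the choice to keep a short final piece are my own):

import random

MIN_LEN = int(1e04)
MAX_LEN = int(8e06)

def split_weighted(s):
    """Split s into substrings whose lengths favor the short end of the range."""
    pieces = []
    pos = 0
    while len(s) - pos > MAX_LEN:
        rand = random.random() ** 2  # squaring biases toward small values
        length = round(rand * (MAX_LEN - MIN_LEN) + MIN_LEN)
        pieces.append(s[pos:pos + length])
        pos += length
    # The remainder is at most MAX_LEN characters; keep it as the last
    # substring (this sketch allows it to be shorter than MIN_LEN).
    if pos < len(s):
        pieces.append(s[pos:])
    return pieces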
I'm starting to use numpy. I get the slice notations and element-wise computations, but I can't understand this:
for i, (I, J) in enumerate(zip(data_list[0], data_list[1])):
    joint_hist[int(np.floor(I/self.bin_size))][int(np.floor(J/self.bin_size))] += 1
Variables:
data_list contains two np.array().flatten() images (eventually more)
joint_hist[] is the joint histogram of those two images, it's displayed later with plt.imshow()
bin_size is the number of slots in the histogram
I can't understand why the coordinate in the final histogram is I,J. So it's not just that the value at a position in joint_hist[] is the result of some slicing/element-wise computation. I need to take the result of that computation and use THAT as the indices in joint_hist...
EDIT:
I indeed do not use the i in the loop - it's a leftover from previous iterations, and I simply hadn't noticed I didn't need it anymore.
I do want to remain in control of the bin sizes and the details of how this is done, so I'm not particularly looking to use histogram2d. I will later be using this for further image processing, so I'd rather have the flexibility to adapt my approach than have to figure out if/how to do particular things with built-in functions.
You can indeed gussy up that for loop using some numpy notation. Assuming you don't actually need i (since it isn't used anywhere):
for I, J in (data_list.T // self.bin_size).astype(int):
    joint_hist[I, J] += 1
Explanation
data_list.T flips data_list on its side. Each row of data_list.T will contain the data for the pixels at a particular coordinate.
data_list.T // self.bin_size will produce the same result as np.floor(I/self.bin_size), only it will operate on all of the pixels at once, instead of one at a time.
.astype(int) does the same thing as int(...), but again operates on the entire array instead of a single element.
When you iterate over a 2D array with a for loop, the rows are returned one at a time. Thus, the for I,J in arr syntax will give you back one pair of pixels at a time, just like your zip statement did originally.
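A tiny demonstration of that row-wise unpacking:

import numpy as np
arr = np.array([[1, 2], [3, 4]])
for I, J in arr:
    print(I, J)  # prints "1 2", then "3 4"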
Alternative
You could also just use histogramdd to calculate joint_hist, in place of your for loop. For your application it would look like:
import numpy as np
joint_hist,edges = np.histogramdd(data_list.T)
This would have different bins than the ones you specified above, though (numpy would determine them automatically).
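If you do want control over the binning, histogramdd also accepts a bins argument (the value 256 here is an assumption; pick whatever matches your bin_size scheme):

joint_hist, edges = np.histogramdd(data_list.T, bins=256)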
If I understand correctly, your goal is to make a histogram of correlated values in your images? To get the right bin index, the computation that you used is not valid. Instead of np.floor(I/self.bin_size), use np.floor(I/(I_max/bin_size)).astype(int). You want to divide I and J by their respective resolutions. The result you will get is a diagonal matrix for joint_hist if both data_list[0] and data_list[1] are the same flattened image.
So all put together:
I_max = data_list[0].max()+1
J_max = data_list[1].max()+1
joint_hist = np.zeros((I_max, J_max))
bin_size = 256
for i, (I, J) in enumerate(zip(data_list[0], data_list[1])):
    joint_hist[np.floor(I / (I_max / bin_size)).astype(int),
               np.floor(J / (J_max / bin_size)).astype(int)] += 1
I have a list of tuples, containing floats, e.g.:
myList = [(1.0,2.0), (1.0,0.5), (2.0,1.0), (3.0,2.0), (3.0,0.0)]
The lexicographic order of the tuples is:
mySortedList = [(1.0,0.5), (1.0,2.0), (2.0,1.0), (3.0,0.0), (3.0,2.0)]
I.e. one tuple is smaller than another if its first entry is smaller, or if the first entries are equal and its second entry is smaller.
Now I want to make a histogram that shows the distribution of data ordered lexicographically like mySortedList. Is there any way to do so with a built-in function in Python? plt.hist works only for one-dimensional lists. Btw, is a histogram a good approach at all to show the density in this case? (My statistics skills are rather limited, sorry.)
In this case:
print(sorted(myList,key=sum))
would work.
Output:
[(1.0,0.5), (1.0,2.0), (2.0,1.0), (3.0,0.0), (3.0,2.0)]
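Worth noting: Python compares tuples lexicographically by default, so plain sorted() also reproduces mySortedList here:

print(sorted(myList))  # [(1.0, 0.5), (1.0, 2.0), (2.0, 1.0), (3.0, 0.0), (3.0, 2.0)]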
I have a list of np.array, mya = [a0, ..., an] (all of which have the same shape and dtype). Say ai = array([xi0, xi1, ..., xim]). I want to get
[max((a[i] for a in mya)) for i in range(m)]
. For example, let x=np.array([3,4,5]), y=np.array([2,50,-1]) and z=np.array([30,0,3]) then for mya = [x,y,z], I want [30,50,5] (or np.array equivalent).
With m = len(mya[0]), my code above does work, but it seems way too tedious. What are the suggested ways to achieve this?
In numpy, numpy.amax(myarray) gives you the maximum of myarray. If you want the maximum along a particular dimension, you can also set the axis argument. In this case, it should be:
x=np.array([3,4,5])
y=np.array([2,50,-1])
z=np.array([30,0,3])
mya = [x,y,z]
maximum = np.amax(mya, axis=0)
# maximum holds the element-wise maximum across x, y and z -> [30, 50, 5]
See docs
As #Ruben_Bermudez suggested, np.amax was just what I was looking for.
np.amax, scipy documentation provided here, accepts array-like data as input and returns "the maximum of an array or maximum along an axis." Among its optional parameters is axis, which specifies the axis along which to find the maximum.
By default, input is flattened, so
np.amax(mya) # => 50
Specifying axis=0
np.amax(mya,axis=0) # np.array([30,50,5])
and this was what I wanted.
Sorry for the mess.