Can morphology.remove_small_objects remove big objects? - python

I was wondering if morphology.remove_small_objects could be used to remove big objects. I am using this tool to detect the objects shown in the figure.
However, there are big objects, as seen on the left. Is there any way I could use morphology.remove_small_objects as a threshold, for example:
mask = morphology.remove_small_objects(maske, 30)
Could I use a range instead, e.g. between 30 and 200, so that I can ignore the red detection in the image?
Otherwise, I will just count the white pixels of each object and remove the ones with the highest counts.

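This might be a good contribution to the scikit-image library itself, but for now you need to roll your own. As suggested by Christoph, you can remove the large objects by subtracting the result of a second remove_small_objects call (which keeps only the large objects) from the result of the first. Something like:
from skimage.morphology import remove_small_objects

def filter_objects_by_size(label_image, min_size=0, max_size=None):
    # Drop objects smaller than min_size
    small_removed = remove_small_objects(label_image, min_size)
    if max_size is None:
        return small_removed
    # Only objects of at least max_size pixels survive this second call
    large_only = remove_small_objects(small_removed, max_size)
    # Take the large ones away again, leaving min_size <= size < max_size.
    # NumPy does not allow subtracting boolean arrays, so use logical ops for masks.
    if small_removed.dtype == bool:
        return small_removed & ~large_only
    return small_removed - large_only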
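If you'd rather do a single pass instead of calling remove_small_objects twice, one option is to label the mask once (for example with skimage.measure.label) and look up each label's pixel count with np.bincount. A minimal sketch, assuming an integer label image where 0 is background; keep_sizes_in_range is just an illustrative name:

```python
import numpy as np


def keep_sizes_in_range(labels, min_size, max_size):
    """Boolean mask of the objects whose area satisfies min_size <= area <= max_size.

    `labels` is assumed to be a non-negative integer label image with 0 as
    background, e.g. the output of skimage.measure.label(mask).
    """
    labels = np.asarray(labels)
    # sizes[k] is the number of pixels carrying label k
    sizes = np.bincount(labels.ravel())
    keep = (sizes >= min_size) & (sizes <= max_size)
    keep[0] = False  # never keep the background
    # Index the per-label lookup table with the label image itself
    return keep[labels]
```

For the question's case that would be something like keep_sizes_in_range(measure.label(maske), 30, 200). Note the bounds here are inclusive on both ends, whereas remove_small_objects removes objects strictly smaller than the size you pass.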


Divide geometry bounding box into equal smaller boxes

I have a bounding box with coordinates:
bottom_left = [10.7510994291,106.5517721598]
bottom_right = [10.7510994291,106.7500970722]
top_right = [10.9005609767,106.7500970722]
top_left = [10.9005609767,106.5517721598]
I'm trying to divide it into smaller boxes of equal area using Python.
I'm able to create two lists using this code:
import numpy as np
cols = np.linspace(bottom_left[1], bottom_right[1], num=15)
rows = np.linspace(bottom_left[0], top_left[0], num=15)
Here is the result:
[106.55177216 106.56593822 106.58010429 106.59427036 106.60843642
106.62260249 106.63676855 106.65093462 106.66510068 106.67926675
106.69343281 106.70759888 106.72176494 106.73593101 106.75009707]
[10.75109943 10.76177525 10.77245108 10.7831269 10.79380273 10.80447855
10.81515438 10.8258302 10.83650603 10.84718185 10.85785768 10.8685335
10.87920933 10.88988515 10.90056098]
I'm trying to combine the lat/long values to create boxes; here is an example of two small boxes:
[[106.55177216,10.75109943],[106.55177216,10.76177525],[106.56593822,10.75109943],[106.56593822,10.76177525]]
[[106.55177216,10.75109943],[106.55177216,10.76177525],[106.580104,10.751099],[106.580104,10.761775]]
I know a loop can handle this case, but I'm trying to find a better way. Any help is appreciated. Many thanks.
PS: I'm new to Python and don't know much about the libraries in the Python ecosystem.
So, if I now understand correctly, you want to take items from each list two at a time and generate the four points that make up the coordinates of a box.
In this case the solution is itertools.product():
from itertools import product

out = []
for x in range(0, len(lats), 2):
    out.append(list(product(lats[x:x+2], longs[x:x+2])))
print(out)
[[(106.55177216, 10.75109943), (106.55177216, 10.76177525), (106.56593822, 10.75109943), (106.56593822, 10.76177525)], [(106.58010429, 10.77245108), (106.58010429, 10.7831269), (106.59427036, 10.77245108), (106.59427036, 10.7831269)], [(106.60843642, 10.79380273), (106.60843642, 10.80447855), (106.62260249, 10.79380273), (106.62260249, 10.80447855)], [(106.63676855, 10.81515438), (106.63676855, 10.8258302), (106.65093462, 10.81515438), (106.65093462, 10.8258302)], [(106.66510068, 10.83650603), (106.66510068, 10.84718185), (106.67926675, 10.83650603), (106.67926675, 10.84718185)], [(106.69343281, 10.85785768), (106.69343281, 10.8685335), (106.70759888, 10.85785768), (106.70759888, 10.8685335)], [(106.72176494, 10.87920933), (106.72176494, 10.88988515), (106.73593101, 10.87920933), (106.73593101, 10.88988515)], [(106.75009707, 10.90056098)]]
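Note: I'm assuming lats and longs are lists (judging by the output, lats holds the 106.x values and longs the 10.x values). Since we take them two at a time and each list has 15 items, the 15th item of each list ends up alone in the degenerate one-point entry at the end of the output. Also note that this pairs the n-th pair of lats with the n-th pair of longs, so it only yields the boxes along the diagonal of the grid, not every box.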
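If the goal is every cell of the grid, one option is to pair consecutive break points with zip and loop over both axes. A sketch, assuming the question's [lat, lon] corner format; split_bbox is a hypothetical helper name, and "equal" here means equal in degrees, which for a box this small (about 0.15° across, near 10.8°N) is very close to equal area:

```python
import numpy as np


def split_bbox(bottom_left, top_right, n_rows, n_cols):
    """Split a box into n_rows x n_cols cells of equal size in degrees.

    Corners are [lat, lon] as in the question. Each cell is a list of four
    (lon, lat) corners: bottom-left, bottom-right, top-right, top-left.
    """
    # tolist() gives plain Python floats; linspace hits both endpoints exactly
    lats = np.linspace(bottom_left[0], top_right[0], n_rows + 1).tolist()
    lons = np.linspace(bottom_left[1], top_right[1], n_cols + 1).tolist()
    cells = []
    # zip(a[:-1], a[1:]) yields consecutive (start, end) pairs
    for lat0, lat1 in zip(lats[:-1], lats[1:]):
        for lon0, lon1 in zip(lons[:-1], lons[1:]):
            cells.append([(lon0, lat0), (lon1, lat0), (lon1, lat1), (lon0, lat1)])
    return cells
```

split_bbox(bottom_left, top_right, 14, 14) uses the same 15 break points per axis as np.linspace(..., num=15) in the question and returns 196 cells, ordered row by row from the bottom-left; neighbouring cells share their edge corners.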

Convert LineString / MultiLineString geometries to lat lon

I am using this Mapillary endpoint: https://tiles.mapillary.com/maps/vtp/mly1_public/2/{zoom_level}/{x}/{y}?access_token={} and getting responses like the one in the photo. Also, here is the Mapillary documentation.
It is not clear to me what the nested coordinate lists in the response represent. At first I thought they might be pixel coordinates, but judging by the context (the API documentation) and the endpoint I am using, I don't think that is the case. I am also not sure whether the JSON response in the picture is valid GeoJSON; some online formatters did not accept it.
I would like to find the bounding box of the "sequence". For context, that would be the minimal-area rectangle, defined by two (lat, lon) positions, that fully encompasses the geometry of the so-called "sequence". A "sequence" is basically a series of photos taken during a vehicle or on-foot trip, together with the metadata associated with the photos (the metadata is available from another endpoint, but that is just for context).
My question is: is it possible to turn the coordinates in the picture into (lat, lon)? With those, it would be easy for me to find the bounding box of the sequence. If so, how? Also note that some of the nested lists are of type LineString while others are MultiLineString (I read about the difference on help.arcgis.com).
Minimal reproducible code snippet:
import json
import requests
import mercantile
import mapbox_vector_tile as mvt

ACCESS_TOKEN = 'XXX'  # can be obtained from https://www.mapillary.com/dashboard/developers

z6_tiles = list(mercantile.tiles(  # us_west_coast_bbox
    west=-125.066423,
    south=42.042594,
    east=-119.837770,
    north=49.148042,
    zooms=6
))
# pprint(z6_tiles)

vector_tiles_url = 'https://tiles.mapillary.com/maps/vtp/mly1_public/2/{}/{}/{}?access_token={}'
for tile in z6_tiles:
    res = requests.get(vector_tiles_url.format(tile.z, tile.x, tile.y, ACCESS_TOKEN))
    res_json = mvt.decode(res.content)
    with open('idea.json', 'w+') as f:
        json.dump(res_json, f, indent=4)
I think this get_normalized_coordinates function is the solution I was looking for. Please take it with a grain of salt, as I have not fully tested it yet; I will update my answer once I have. Also be cautious: for tiles closer to the South or North Pole, the Z14_TILE_DMD_HEIGHT constant will not be the one you see here, but something more like 0.0018958715374282065 (the width in degrees of longitude is the same for every tile at a given zoom level).
import mercantile

Z14_TILE_DMD_WIDTH = 0.02197265625
Z14_TILE_DMD_HEIGHT = 0.018241950298914844

def get_normalized_coordinates(bbox: mercantile.LngLatBbox,
                               target_lat: int,
                               target_lon: int,
                               extent: int = 4096):  # 4096 is Mapillary's default
    """
    Returns a (lon, lat) tuple representing the real position on the world map of a map feature.
    """
    min_lon, min_lat, _, _ = bbox
    # The parentheses keep both values in one tuple; a bare line break after the
    # comma would return a 1-tuple and leave the second line unreachable.
    return (min_lon + target_lon / extent * Z14_TILE_DMD_WIDTH,
            min_lat + target_lat / extent * Z14_TILE_DMD_HEIGHT)
If you are wondering how I came up with those constants: I simply iterated over the list of tiles I am interested in and checked that they all have the same width/height (this might not have been the case, given what I mentioned above about tiles closer to one of the poles; I think this is called "distortion", but I'm not sure). For context, the tiles I iterated over are within this bbox: (-125.024414, 31.128199, -108.896484, 49.152970) (min_lon, min_lat, max_lon, max_lat; US west coast), which I believe is also why all the tiles have the same width/height.
set_test = set()
for tile in relevant_tiles_set:
    curr_bbox = mercantile.bounds(tile)
    dm_width_diff: float = curr_bbox.east - curr_bbox.west
    dm_height_diff: float = curr_bbox.north - curr_bbox.south
    set_test.add((dm_width_diff, dm_height_diff))
print(set_test)
output:
{(0.02197265625, 0.018241950298914844)}
UPDATE: I forgot to mention that you do not actually need to compute those WIDTH and HEIGHT constants. Just replace them with (max_lon - min_lon) and (max_lat - min_lat) of the tile's bbox, respectively. The constants were for testing purposes only.
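For reference, the standard Web Mercator ("slippy map") tile math gives an exact conversion instead of a linear interpolation inside the tile bbox; the difference matters more for taller tiles, since latitude is not linear in tile y. A sketch in plain Python, assuming XYZ tiles (as mercantile uses), an extent of 4096, and that the decoder returned y measured upwards from the bottom edge of the tile, which I believe is mapbox_vector_tile's default (pass y_up=False if your coordinates are y-down). It also assumes each decoded feature's 'geometry' is a GeoJSON-style dict with 'type' and 'coordinates'. tile_point_to_lnglat and geometry_bbox are illustrative names:

```python
import math


def tile_point_to_lnglat(z, x, y, px, py, extent=4096, y_up=True):
    """Convert tile-local coordinates (0..extent) in XYZ tile (z, x, y) to (lon, lat)."""
    if y_up:
        # Flip so py is measured downwards from the top edge, like XYZ tile rows
        py = extent - py
    n = 2 ** z
    # Position of the point as a fraction (0..1) of the whole Web Mercator world
    fx = (x + px / extent) / n
    fy = (y + py / extent) / n
    lon = fx * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * fy))))
    return lon, lat


def geometry_bbox(z, x, y, geometry, extent=4096, y_up=True):
    """(min_lon, min_lat, max_lon, max_lat) of a LineString/MultiLineString geometry dict."""
    if geometry["type"] == "LineString":
        lines = [geometry["coordinates"]]
    elif geometry["type"] == "MultiLineString":
        lines = geometry["coordinates"]
    else:
        raise ValueError("unsupported geometry type: %s" % geometry["type"])
    points = [tile_point_to_lnglat(z, x, y, px, py, extent, y_up)
              for line in lines for px, py in line]
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return min(lons), min(lats), max(lons), max(lats)
```

Inside the question's loop this would be called per feature, e.g. geometry_bbox(tile.z, tile.x, tile.y, feature['geometry'], extent=layer['extent']); the bounding box of a whole sequence is then the min/max over all of its features, possibly across several tiles.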

Using astropy.fits and numpy to apply coincidence corrections to SWIFT fits image

This question may be a little specialized, but hopefully someone can help. I normally use IDL, but for developing a pipeline I'm looking to use Python to improve running times.
My fits file handling setup is as follows:
import numpy
from astropy.io import fits

#Directory: /Users/UCL_Astronomy/Documents/UCL/PHASG199/M33_UVOT_sum/UVOTIMSUM/M33_sum_epoch1_um2_norm.img
with fits.open('...') as ima_norm_um2:
    #Open UVOTIMSUM file once and close it after extracting the relevant values:
    ima_norm_um2_hdr = ima_norm_um2[0].header
    ima_norm_um2_data = ima_norm_um2[0].data

#Individual dimensions for number of x pixels and number of y pixels:
nxpix_um2_ext1 = ima_norm_um2_hdr['NAXIS1']
nypix_um2_ext1 = ima_norm_um2_hdr['NAXIS2']

#Compute the size of the images (you can also do this manually rather than calling these keywords from the header):
#Call the header and data from the UVOTIMSUM file with the relevant keyword extensions:
corrfact_um2_ext1 = numpy.zeros((ima_norm_um2_hdr['NAXIS2'], ima_norm_um2_hdr['NAXIS1']))
coincorr_um2_ext1 = numpy.zeros((ima_norm_um2_hdr['NAXIS2'], ima_norm_um2_hdr['NAXIS1']))

#Check that the dimensions are all the same:
print(corrfact_um2_ext1.shape)
print(coincorr_um2_ext1.shape)
print(ima_norm_um2_data.shape)

# Make a new image file to save the correction factors:
hdu_corrfact = fits.PrimaryHDU(corrfact_um2_ext1, header=ima_norm_um2_hdr)
fits.HDUList([hdu_corrfact]).writeto('.../M33_sum_epoch1_um2_corrfact.img')

# Make a new image file to save the corrected image to:
hdu_coincorr = fits.PrimaryHDU(coincorr_um2_ext1, header=ima_norm_um2_hdr)
fits.HDUList([hdu_coincorr]).writeto('.../M33_sum_epoch1_um2_coincorr.img')
I'm looking to then apply the following corrections:
# Define the variables from Poole et al. (2008) "Photometric calibration of the Swift ultraviolet/optical telescope":
alpha = 0.9842000
ft = 0.0110329
a1 = 0.0658568
a2 = -0.0907142
a3 = 0.0285951
a4 = 0.0308063
for i in range(nxpix_um2_ext1 - 1): #do begin
    for j in range(nypix_um2_ext1 - 1): #do begin
        if (numpy.less_equal(i, 4) | numpy.greater_equal(i, nxpix_um2_ext1-4) | numpy.less_equal(j, 4) | numpy.greater_equal(j, nxpix_um2_ext1-4)): #then begin
            #UVM2
            corrfact_um2_ext1[i,j] == 0
            coincorr_um2_ext1[i,j] == 0
        else:
            xpixmin = i-4
            xpixmax = i+4
            ypixmin = j-4
            ypixmax = j+4
            #UVM2
            ima_UVM2sum = total(ima_norm_um2[xpixmin:xpixmax,ypixmin:ypixmax])
            xvec_UVM2 = ft*ima_UVM2sum
            fxvec_UVM2 = 1 + (a1*xvec_UVM2) + (a2*xvec_UVM2*xvec_UVM2) + (a3*xvec_UVM2*xvec_UVM2*xvec_UVM2) + (a4*xvec_UVM2*xvec_UVM2*xvec_UVM2*xvec_UVM2)
            Ctheory_UVM2 = - alog(1-(alpha*ima_UVM2sum*ft))/(alpha*ft)
            corrfact_um2_ext1[i,j] = Ctheory_UVM2*(fxvec_UVM2/ima_UVM2sum)
            coincorr_um2_ext1[i,j] = corrfact_um2_ext1[i,j]*ima_sk_um2[i,j]
This snippet is where things go wrong, as it is a mixture of IDL and Python syntax. I'm not sure how to convert certain parts of the IDL to Python; for example, I don't know how to handle ima_UVM2sum = total(ima_norm_um2[xpixmin:xpixmax,ypixmin:ypixmax]).
I think I'm also missing the part that updates the correction-factor and coincidence-correction image files. If anyone has the patience to go over it with a fine-tooth comb and suggest the necessary changes, that would be excellent.
The original normalised image can be downloaded here; replace ... in the code above with this file.
One very important thing about numpy is that it applies every mathematical or comparison function element-wise, so you probably don't need to loop through the arrays.
So maybe start by convolving your image with a sum filter. For 2D images this can be done with astropy.convolution.convolve or scipy.ndimage.uniform_filter.
I'm not sure exactly what you want, but I think you want a 9x9 sum filter (IDL's [i-4:i+4] includes both ends). uniform_filter returns the mean over the window, so multiply it by the 81 pixels in the window to get the sum:
from scipy.ndimage import uniform_filter
# astype(float) avoids truncating the mean if the data happen to be integers
ima_UVM2sum = uniform_filter(ima_norm_um2_data.astype(float), size=9) * 81
Since you want to discard the pixels at the borders (4 pixels wide), you can simply slice them away:
ima_UVM2sum_valid = ima_UVM2sum[4:-4, 4:-4]
This ignores the first and last 4 rows and the first and last 4 columns (the last ones are dropped by making the stop value negative).
Now you can calculate the corrections:
import numpy as np
xvec_UVM2 = ft*ima_UVM2sum_valid
fxvec_UVM2 = 1 + (a1*xvec_UVM2) + (a2*xvec_UVM2**2) + (a3*xvec_UVM2**3) + (a4*xvec_UVM2**4)
# IDL's alog is the natural logarithm, i.e. np.log
Ctheory_UVM2 = - np.log(1-(alpha*ima_UVM2sum_valid*ft))/(alpha*ft)
These are all arrays, so you still don't need to loop.
But then you want to fill your two images. Be careful, because the corrected region is smaller (we ignored the first and last 4 rows/columns), so you have to take the same region in the correction images and in ima_sk_um2:
corrfact_um2_ext1[4:-4, 4:-4] = Ctheory_UVM2*(fxvec_UVM2/ima_UVM2sum_valid)
coincorr_um2_ext1[4:-4, 4:-4] = corrfact_um2_ext1[4:-4, 4:-4]*ima_sk_um2[4:-4, 4:-4]
Still no loop, just NumPy's mathematical functions. This means it is much faster (MUCH faster!) and does the same thing.
I may have forgotten some slicing somewhere; that would show up as a broadcasting error (ValueError: operands could not be broadcast together), so if it does, please report back.
Just a note about your loop: numpy's first axis is the second FITS axis (NAXIS2), and its second axis is the first FITS axis (NAXIS1). So if you do need to loop over the axes, bear that in mind so you don't end up with IndexErrors or unexpected results.
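Putting those pieces together, here is a sketch of the whole correction as one function. To stay dependency-free it computes the 9x9 window sum with a summed-area table in plain NumPy; scipy.ndimage.uniform_filter(data, size=9) * 81 should give the same interior values. Assumptions: ima_sk_um2 is not defined in the question, so it is treated as a second image of the same shape (the sk_image parameter), and pixels whose window sum is 0 get a correction factor of 0 instead of NaN. box_sum and coincidence_correct are hypothetical names:

```python
import numpy as np


def box_sum(image, half=4):
    """Sum over the (2*half+1)^2 window around each interior pixel.

    Returns an array shaped like image[half:-half, half:-half].
    """
    w = 2 * half + 1
    # Summed-area table with a leading row and column of zeros
    sat = np.zeros((image.shape[0] + 1, image.shape[1] + 1))
    sat[1:, 1:] = image.cumsum(axis=0).cumsum(axis=1)
    return sat[w:, w:] - sat[:-w, w:] - sat[w:, :-w] + sat[:-w, :-w]


def coincidence_correct(norm_image, sk_image, half=4, alpha=0.9842000, ft=0.0110329,
                        a=(0.0658568, -0.0907142, 0.0285951, 0.0308063)):
    """Return (corrfact, coincorr) using the Poole et al. (2008) constants from the question.

    A border of `half` pixels (half >= 1) is left at 0, as in the original IDL loop.
    """
    norm_image = np.asarray(norm_image, dtype=float)
    corrfact = np.zeros_like(norm_image)
    coincorr = np.zeros_like(norm_image)
    inner = (slice(half, -half), slice(half, -half))

    raw = box_sum(norm_image, half)  # IDL: total(ima[i-4:i+4, j-4:j+4])
    x = ft * raw
    fx = 1 + a[0] * x + a[1] * x**2 + a[2] * x**3 + a[3] * x**4
    c_theory = -np.log(1 - alpha * raw * ft) / (alpha * ft)  # IDL alog -> np.log
    # Divide only where the window sum is non-zero; elsewhere the factor stays 0
    corrfact[inner] = np.divide(c_theory * fx, raw, out=np.zeros_like(raw), where=raw != 0)
    coincorr[inner] = corrfact[inner] * np.asarray(sk_image, dtype=float)[inner]
    return corrfact, coincorr
```

With the question's variables this would be corrfact, coincorr = coincidence_correct(ima_norm_um2_data, ima_sk_um2), and only then would the files be written, e.g. with fits.PrimaryHDU(corrfact, header=ima_norm_um2_hdr).writeto(..., overwrite=True), rather than writing the still-empty arrays up front. Windows bright enough that alpha*raw*ft >= 1 give NaN or inf from the logarithm.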

how to properly read a PPM file using Python

Here are my overall instructions:
Write a Color class that represents an RGB color using integer values in the range 0 to 255. Your class must:
Be placed in image.py
Provide a constructor that accepts the values of the red, green, and blue channels from the client and stores those values
Provide public methods that return the values of the red, green, and blue channels
Write a PortablePixmap class that represents a PPM image. Your class must:
Be placed in image.py
Provide a constructor that accepts the magic number, width, height, maximum color value, and pixel data from the client and stores those values
Store the pixel data as a list of (or list of lists of) Color objects
Provide a public method that returns a string representation of the PPM image
Write a read_ppm function that opens a PPM image file, reads its contents, and returns a PortablePixmap object that holds its contents. Your function must:
Be placed in image.py
Read the contents of a PPM image file
Not be sensitive to the formatting of the PPM image file
Exit with an error if the numbers of expected and provided pixels differ
Write a main function that tests your read_ppm function. Your function must be placed in main.py
This is what I have so far:
class Color:
    # constructor takes in values from client and stores them
    def __init__(self, red, green, blue):
        # checks that type of arg == int: raises exception otherwise
        if (isinstance(red, int) and isinstance(green, int) and isinstance(blue, int)):
            print("good stuff, indeed integers")
        else:
            raise TypeError("Argument must be an integer.")
        # checks if values are between 0 and 225
        if red < 0 or red > 225:
            print("0 < rgb values < 225")
        elif green < 0 or green > 225:
            print("0 < rgb values < 225")
        elif blue < 0 or blue > 225:
            print("0 < rgb values < 225")
        # instance variables (RGB values)
        self._red = red
        self._green = green
        self._blue = blue

    # methods that return RGB values
    def returnRed(self):
        return self._red

    def returnGreen(self):
        return self._green

    def returnBlue(self):
        return self._blue

'''class that represents a PPM image'''
class PortablePixmap:
    def __init__(self, magic_number, width, height, max_color_value, pixel_data):
        self._magic_number = magic_number
        self._width = width
        self._height = height
        self._max_color_value = max_color_value
        self._pixel_data = pixel_data

    def __str__(self):
        s = self._magic_number
        s += '\n' + str(self._width)
        s += ' ' + str(self._height)
        s += '\n' + str(self._max_color_value)
        for pixel in self._pixel_data:
            s += ' ' + str(pixel[0])
            s += ' ' + str(pixel[1])
            s += ' ' + str(pixel[2])
        return s
I have a few questions for clarification:
1. Did I go about creating the Color class correctly?
2. Do I even need to raise any exceptions in that class specifically? We will ultimately be reading from a file that contains everything in order, but not necessarily each value on its own line.
I really just want to know whether I am going about this correctly. The instructions seem stepwise, but I don't really understand how everything connects, so I'm afraid I'm doing either too much or too little.
Thanks in advance
It is not clear from the specification that you need to check the values, and your checks raise an exception only in some cases and otherwise just cause a side effect (printing); from a reuse perspective, I'd prefer to have only exceptions, if anything. Also note that your range checks use 225 where the spec says 255. Aside from the indentation errors (which I assume are only here, not in your source), the Color class seems to cover the requirements, although the accessor methods are quite unpythonic; probably whoever wrote the spec was trained on Java.
The docstring should be inside the PortablePixmap class, not above it.
Most remarkable is the combination of requirements that read_ppm not be sensitive to the formatting of the PPM file and that pixels be stored as 8-bit unsigned RGB. This makes it impossible to support all PPMs, since the format allows 16-bit values (note the maxval field in the PPM format).
Your PortablePixmap class also doesn't use the Color class, although the spec says "Store the pixel data as a list of (or list of lists of) Color objects". That requirement forces a rather inefficient implementation, but the whole thing is an exercise, I suppose. You'll need to extract the RGB triplets from the pixel data string. That's also where the one specified check belongs: verifying that there are exactly the right number of pixels. One would expect a ValueError exception if that fails.
If I were writing this sort of thing, I might use __slots__ to reduce memory use for classes like Color, arrays to handle the large number of limited-range numeric values, and possibly properties to make storage transparent without unwieldy getter methods. split and join would make it easier to handle the collection of pixels.
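To show how the pieces can connect, here is a sketch of read_ppm for the plain-text P3 variant only (binary P6 is not handled), with Color raising exceptions instead of printing and using __slots__. It assumes the file fits in memory and that maxval is at most 255, per the Color requirement. The compact Color and PortablePixmap classes are stand-ins for yours; the accessor names simply mirror the ones in the question:

```python
class Color:
    """An RGB colour with integer channels in the range 0..255."""
    __slots__ = ("_red", "_green", "_blue")

    def __init__(self, red, green, blue):
        for value in (red, green, blue):
            if not isinstance(value, int):
                raise TypeError("RGB channels must be integers")
            if not 0 <= value <= 255:
                raise ValueError("RGB channels must be in the range 0..255")
        self._red, self._green, self._blue = red, green, blue

    def returnRed(self):
        return self._red

    def returnGreen(self):
        return self._green

    def returnBlue(self):
        return self._blue


class PortablePixmap:
    """A PPM image whose pixel data is a flat list of Color objects."""

    def __init__(self, magic_number, width, height, max_color_value, pixel_data):
        self._magic_number = magic_number
        self._width = width
        self._height = height
        self._max_color_value = max_color_value
        self._pixel_data = pixel_data

    def __str__(self):
        values = " ".join("%d %d %d" % (p.returnRed(), p.returnGreen(), p.returnBlue())
                          for p in self._pixel_data)
        return "%s\n%d %d\n%d\n%s\n" % (self._magic_number, self._width, self._height,
                                        self._max_color_value, values)


def read_ppm(path):
    """Read a plain (P3) PPM file, ignoring line layout and '#' comments."""
    with open(path) as f:
        # Drop comments, then split on any whitespace, so layout doesn't matter
        tokens = [tok for line in f for tok in line.split("#", 1)[0].split()]
    if len(tokens) < 4 or tokens[0] != "P3":
        raise ValueError("not a plain P3 PPM file")
    width, height, max_value = (int(t) for t in tokens[1:4])
    if max_value > 255:
        raise ValueError("maxval > 255 cannot be stored in Color")
    values = [int(t) for t in tokens[4:]]
    expected = width * height
    if len(values) != 3 * expected:
        raise ValueError("expected %d pixels, found %d values" % (expected, len(values)))
    # Group the flat value list into RGB triplets
    pixels = [Color(*values[i:i + 3]) for i in range(0, len(values), 3)]
    return PortablePixmap(tokens[0], width, height, max_value, pixels)
```

A main in main.py could then call read_ppm inside try/except ValueError and call sys.exit(str(err)) in the except branch, which covers "exit with an error if the numbers of expected and provided pixels differ".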

put stockprices into groups when they are within 0.5% of each other

Thanks for the answers. I have not used StackOverflow before, so I was surprised by the number of answers and the speed of them; it's fantastic.
I have not been through the answers properly yet, but thought I should add some information to the problem specification. See the image below.
I can't post an image here because I don't have enough points, but you can see the image at http://journal.acquitane.com/2010-01-20/image003.jpg
This image may describe more closely what I'm trying to achieve. The horizontal lines across the page are price points on the chart. Where lines cluster within 0.5% of each other, this is considered a good thing, which is why I want to identify those clusters automatically. You can see on the chart that there are clusters at S2 & MR1 and at R2 & WPP1.
So every day I produce these price points and can manually identify those that are within 0.5% of each other, but the purpose of this question is how to do it with a Python routine.
I have reproduced the list again (see below) with labels. Just be aware that the price points in the list don't match those in the image, because they are from two different days.
[YR3,175.24,8]
[SR3,147.85,6]
[YR2,144.13,8]
[SR2,130.44,6]
[YR1,127.79,8]
[QR3,127.42,5]
[SR1,120.94,6]
[QR2,120.22,5]
[MR3,118.10,3]
[WR3,116.73,2]
[DR3,116.23,1]
[WR2,115.93,2]
[QR1,115.83,5]
[MR2,115.56,3]
[DR2,115.53,1]
[WR1,114.79,2]
[DR1,114.59,1]
[WPP,113.99,2]
[DPP,113.89,1]
[MR1,113.50,3]
[DS1,112.95,1]
[WS1,112.85,2]
[DS2,112.25,1]
[WS2,112.05,2]
[DS3,111.31,1]
[MPP,110.97,3]
[WS3,110.91,2]
[50MA,110.87,4]
[MS1,108.91,3]
[QPP,108.64,5]
[MS2,106.37,3]
[MS3,104.31,3]
[QS1,104.25,5]
[SPP,103.53,6]
[200MA,99.42,7]
[QS2,97.05,5]
[YPP,96.68,8]
[SS1,94.03,6]
[QS3,92.66,5]
[YS1,80.34,8]
[SS2,76.62,6]
[SS3,67.12,6]
[YS2,49.23,8]
[YS3,32.89,8]
I did make a mistake with the original list in that Group C is wrong and should not be included. Thanks for pointing that out.
Also, the 0.5% is not fixed; this value will change from day to day, but I have just used 0.5% as an example to specify the problem.
Thanks Again.
Mark
PS. I will get cracking on checking the answers now.
Hi:
I need to do some manipulation of stock prices. I have just started using Python, (but I think I would have trouble implementing this in any language). I'm looking for some ideas on how to implement this nicely in python.
Thanks
Mark
Problem:
I have a list of lists (FloorLevels, see below) where each sublist has two items (stockprice, weight). I want to put the stock prices into groups when they are within 0.5% of each other. A group's strength will be determined by its total weight. For example:
Group-A
115.93,2
115.83,5
115.56,3
115.53,1
-------------
TotalWeight:12
-------------
Group-B
113.50,3
112.95,1
112.85,2
-------------
TotalWeight:6
-------------
FloorLevels[
[175.24,8]
[147.85,6]
[144.13,8]
[130.44,6]
[127.79,8]
[127.42,5]
[120.94,6]
[120.22,5]
[118.10,3]
[116.73,2]
[116.23,1]
[115.93,2]
[115.83,5]
[115.56,3]
[115.53,1]
[114.79,2]
[114.59,1]
[113.99,2]
[113.89,1]
[113.50,3]
[112.95,1]
[112.85,2]
[112.25,1]
[112.05,2]
[111.31,1]
[110.97,3]
[110.91,2]
[110.87,4]
[108.91,3]
[108.64,5]
[106.37,3]
[104.31,3]
[104.25,5]
[103.53,6]
[99.42,7]
[97.05,5]
[96.68,8]
[94.03,6]
[92.66,5]
[80.34,8]
[76.62,6]
[67.12,6]
[49.23,8]
[32.89,8]
]
I suggest a repeated use of k-means clustering -- let's call it KMC for short. KMC is a simple and powerful clustering algorithm... but it needs to "be told" how many clusters, k, you're aiming for. You don't know that in advance (if I understand you correctly) -- you just want the smallest k such that no two items "clustered together" are more than X% apart from each other. So, start with k equal 1 -- everything bunched together, no clustering pass needed;-) -- and check the diameter of the cluster (a cluster's "diameter", from the use of the term in geometry, is the largest distance between any two members of a cluster).
If the diameter is > X%, set k += 1, perform KMC with k as the number of clusters, and repeat the check, iteratively.
In pseudo-code:
def markCluster(items, threshold):
    k = 1
    clusters = [items]
    maxdist = diameter(items)
    while maxdist > threshold:
        k += 1
        clusters = Kmc(items, k)
        maxdist = max(diameter(c) for c in clusters)
    return clusters
assuming of course we have suitable diameter and Kmc Python functions.
Does this sound like the kind of thing you want? If so, then we can move on to show you how to write diameter and Kmc (in pure Python if you have a relatively limited number of items to deal with, otherwise maybe by exploiting powerful third-party add-on frameworks such as numpy) -- but it's not worthwhile to go to such trouble if you actually want something pretty different, whence this check!-)
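Since the prices are one-dimensional, that "smallest k" goal can also be reached without k-means. After sorting, a single greedy sweep that starts a new group whenever a price is more than X% above the lowest price of the current group yields the fewest groups in which every pair of members is within X% of each other; this is the standard greedy for covering points on a line with fixed-width intervals, applied to ratios because the tolerance is relative. A sketch, assuming (price, weight) pairs; group_prices is an illustrative name:

```python
def group_prices(levels, tolerance=0.005):
    """Greedy 1-D grouping of (price, weight) pairs.

    Within each group max/min - 1 <= tolerance, so every pair of members is
    within tolerance; sweeping up from the lowest price gives the fewest groups.
    """
    groups = []
    for price, weight in sorted(levels):
        # groups[-1][0][0] is the lowest price of the current group
        if groups and price <= groups[-1][0][0] * (1 + tolerance):
            groups[-1].append((price, weight))
        else:
            groups.append([(price, weight)])
    return groups
```

Each group's strength is then sum(w for _, w in group). On the three-value example (0.996, 1, 1.004) discussed further down, this gives "ab, c"; sweeping down from the highest price instead would give "a, bc", so the sweep direction acts as the tie-breaking rule.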
A stock s belongs in a group G if, for each stock t in G, s * 1.05 >= t and s / 1.05 <= t, right? (For the 0.5% in the question the factor would be 1.005; 5% just keeps the numbers below round.)
How do we add the stocks to each group? If we have the stocks 95, 100, 101, and 105, and we start a group with 100 and then add 101, we will end up with {100, 101, 105}. If we added 95 after 100, we'd end up with {100, 95}.
Do we just need to consider all possible permutations? If so, your algorithm is going to be inefficient.
You need to specify your problem in more detail. Just what does "put the stockprices into groups when they are within 0.5% of each other" mean?
Possibilities:
(1) each member of the group is within 0.5% of every other member of the group
(2) sort the list and split it where the gap is more than 0.5%
Note that 116.23 is within 0.5% of 115.93 -- abs((116.23 / 115.93 - 1) * 100) < 0.5 -- but you have put one number in Group A and one in Group C.
Simple example: a, b, c = (0.996, 1, 1.004) ... Note that a and b fit, b and c fit, but a and c don't fit. How do you want them grouped, and why? Is the order in the input list relevant?
Possibility (1) produces ab,c or a,bc ... tie-breaking rule, please
Possibility (2) produces abc (no big gaps, so only one group)
You won't be able to classify them into hard "groups". If you have prices (1.0,1.05, 1.1) then the first and second should be in the same group, and the second and third should be in the same group, but not the first and third.
A quick, dirty way to do something that you might find useful:
def make_group_function(tolerance=0.05):
    from math import log10, floor
    # I forget why this works.
    tolerance_factor = -1.0/(-log10(1.0 + tolerance))
    # well ... since you might ask
    # we want: log(x)*tf - log(x*(1+t))*tf = -1,
    # so every 5% change has a different group. The minus is just so groups
    # are ascending .. it looks a bit nicer.
    #
    # tf = -1/(log(x)-log(x*(1+t)))
    # tf = -1/(log(x/(x*(1+t))))
    # tf = -1/(log(1/(1*(1+t)))) # solved .. but let's just be more clever
    # tf = -1/(0-log(1*(1+t)))
    # tf = -1/(-log(1+t))
    def group_function(value):
        # don't just use int(): it truncates toward zero, rounding negative
        # values up; floor() always rounds down
        return int(floor(log10(value)*tolerance_factor))
    return group_function
Usage:
group_function = make_group_function()
import random
groups = {}
for i in range(50):
    v = random.random()*500+1000
    group = group_function(v)
    if group in groups:
        groups[group].append(v)
    else:
        groups[group] = [v]
for group in sorted(groups):
    print('Group', group)
    for v in sorted(groups[group]):
        print(v)
    print()
For a given set of stock prices, there is probably more than one way to group stocks that are within 0.5% of each other. Without some additional rules for grouping the prices, there's no way to be sure an answer will do what you really want.
Apart from the proper way to pick which values fit together, this is a problem where a little object orientation can make things a lot easier to deal with.
I made two classes here, with a minimum of desirable behaviors, but they can make the classification a lot easier: you get a single point to play with it, on the Group class.
I can see the code below is incorrect, in the sense that the limits for group inclusion vary as new members are added. Even if the separation criteria remain the same, you'd have to rewrite the get_groups method to use a multi-pass approach. It should not be hard, but the code would be too long to be helpful here, and I think this snippet is enough to get you going:
from copy import copy

class Group(object):
    def __init__(self, data=None, name=""):
        if data:
            self.data = data
        else:
            self.data = []
        self.name = name

    def get_mean_stock(self):
        return sum(item[0] for item in self.data) / len(self.data)

    def fits(self, item):
        if 0.995 < abs(item[0]) / self.get_mean_stock() < 1.005:
            return True
        return False

    def get_weight(self):
        return sum(item[1] for item in self.data)

    def __repr__(self):
        return "Group-%s\n%s\n---\nTotalWeight: %d\n\n" % (
            self.name,
            "\n".join("%.02f, %d" % tuple(item) for item in self.data),
            self.get_weight())

class StockGrouper(object):
    def __init__(self, data=None):
        if data:
            self.floor_levels = data
        else:
            self.floor_levels = []

    def get_groups(self):
        groups = []
        floor_levels = copy(self.floor_levels)
        name_ord = ord("A") - 1
        while floor_levels:
            seed = floor_levels.pop(0)
            name_ord += 1
            group = Group([seed], chr(name_ord))
            groups.append(group)
            to_remove = []
            for i, item in enumerate(floor_levels):
                if group.fits(item):
                    group.data.append(item)
                    to_remove.append(i)
            for i in reversed(to_remove):
                floor_levels.pop(i)
        return groups
testing:
floor_levels = [ [stock, weight] ,... <paste the data above> ]
s = StockGrouper(floor_levels)
s.get_groups()
For the grouping element, could you use itertools.groupby()? As the data is sorted, a lot of the work of grouping it is already done, and then you could test if the current value in the iteration was different to the last by <0.5%, and have itertools.groupby() break into a new group every time your function returned false.
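A sketch of that idea, which implements interpretation (2) from the earlier answer: split the sorted prices wherever the gap to the previous price exceeds the tolerance. The key function keeps a counter that it bumps on every large gap, so groupby starts a new group exactly there. split_on_gaps is a hypothetical name; note that a chain of small gaps can make a single group span more than 0.5% overall:

```python
from itertools import groupby


def split_on_gaps(levels, tolerance=0.005):
    """Group (price, weight) pairs, starting a new group at each gap > tolerance."""
    prev = None
    group_id = 0

    def key(item):
        # groupby calls this once per item, in order, so state carries over
        nonlocal prev, group_id
        price = item[0]
        if prev is not None and price > prev * (1 + tolerance):
            group_id += 1
        prev = price
        return group_id

    return [list(g) for _, g in groupby(sorted(levels), key=key)]
```

On the (0.996, 1, 1.004) example this returns a single group "abc", matching possibility (2) above.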
