Python Class / Instance / Object Organization Question - python
The following is a simplified example for something I'm trying to do in python (with pygame, but that's probably irrelevant).
I have a list of 8x8 pixel jpgs, each depicting an English letter:
[a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z]
I want to arrange a 4x4 grid of these, in any pattern I want, as a larger 32x32 picture:
gmmg
gppg
gppg
gmmg
But that pattern is only a single frame of an animation.
For example, I might want 4 animation frames where b's and n's flash side to side alternately while an f moves southwest:
bbnf nnbb bbnn nnbb
bbnn nnfb bbnn nnbb
bbnn nnbb bfnn nnbb
bbnn nnbb bbnn fnbb
I want control over the letter value of each square in each frame to make any animation, so I guess essentially there are 64 separate variables (for a 4-frame animation like the one shown above). Each square also has an [x,y] list position variable and an RGB color.
My question is how to organize this all with classes (I'm trying to learn OOP). Logically, it seems that each frame contains squares, and each square contains variables like position, letter and color. But I suppose you could even think of it as each square 'contains' 4 frames...
My guess is make a frame class, and put 4 instances of it in a list (if there's 4 frames) and somehow make each frame instance contain a list of 16 square instances. Maybe usable like frames[2].squares[5].letter = f (Just a fuzzy idea; I'm too new at OOP to know if that's remotely correct or a good idea). But it would be helpful to see how someone who knows what they're doing would organize all this.
Thanks!
Since the size of a frame is fixed, and the number of frames is not, then making a class Frame seems like a good first choice. Each Frame would contain a member grid which could be a list of four lists of four letters. A list of four strings wouldn't work as well since strings are immutable, though having it be a single 16-character string might perform better. You'd have to profile to be sure. For this answer, I'll assume you're going with a list of lists of characters.
Then make a class Animation that has a frames member, which is a list of frames. Then you'll write code that looks like:
myAnimation.frames[10].grid[2][3] = 'f'
I can provide more detail if desired.
EXAMPLE: (Haven't tested this yet, but it should be close. The doc comments should hopefully work with doctest.)
class Frame(object):
    """This is a single frame in an animation."""

    def __init__(self, fill=None, color=None):
        """Initializes the frame.

        >>> f = Frame()
        >>> f.print()
        aaaa
        aaaa
        aaaa
        aaaa
        >>> g = Frame(fill='c', color=(0, 255, 0))
        >>> g.print()
        cccc
        cccc
        cccc
        cccc
        """
        if fill is None:
            fill = 'a'  # Or whatever default you want
        self.letterGrid = []
        for row in range(4):
            self.letterGrid.append([fill for col in range(4)])
        if color is None:
            color = (0, 0, 0)
        # One color per square, laid out like letterGrid
        self.colorGrid = []
        for row in range(4):
            self.colorGrid.append([color for col in range(4)])

    def set_grid(self, row, col, letter=None, color=None):
        """Sets the letter and/or color at the given grid.

        >>> f = Frame()
        >>> f.set_grid(1, 1, 'b', (255, 0, 0))
        >>> f.print()
        aaaa
        abaa
        aaaa
        aaaa
        >>> f.set_grid(1, 3, letter='x')
        >>> f.print()
        aaaa
        abax
        aaaa
        aaaa
        >>> f.set_grid(3, 3, color=(255, 0, 0))
        """
        if letter is not None:
            self.letterGrid[row][col] = letter
        if color is not None:
            self.colorGrid[row][col] = color

    def position(self, row, col):
        return (row * 16, col * 16)

    def print(self):  # "print" is only a legal method name in Python 3
        """Simple routine to print a frame as text."""
        for row in self.letterGrid:
            print(''.join(row))


class Animation(object):
    def __init__(self, frames=None):
        """Initializes an animation."""
        self.frames = frames or []
Hope this gets you started.
The alternative approach would be to come up with a suitable generic data structure made up solely of dictionaries, lists, sets and so on, and then write library functions to manipulate that data. That doesn't sound very classical OOP, and it isn't, but I've found that way easier to handle and easier to 'get right'. You can clearly separate the two concerns of building data containers on the one hand and defining suitable data manipulation code on the other.
As earlier posters suggested, the animation could be modeled as a list of frames; each frame then either contains 32 lists with 32 elements each, or 8 lists with 8 elements each where each element again models the 4x4 grid shown above. Of course, whether you actually precompute (or simply define) each frame beforehand, or whether you manipulate the data of a single frame 'live' during the animation, depends on further considerations.
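A minimal sketch of that data-first style, assuming each frame is a 4x4 list of plain dicts. The names `new_frame`, `set_square`, and `render` are made up for illustration and do not come from any library:

```python
# Data-first sketch: plain lists and dicts hold the animation,
# free functions manipulate it. All names here are illustrative.

def new_frame(fill="a", color=(0, 0, 0), size=4):
    """Return a size x size grid of independent square dicts."""
    return [[{"letter": fill, "color": color} for _ in range(size)]
            for _ in range(size)]

def set_square(animation, frame, row, col, **changes):
    """Update one square of one frame, e.g. letter='f'."""
    animation[frame][row][col].update(changes)

def render(frame):
    """Return the frame's letters as text, one row per line."""
    return "\n".join("".join(sq["letter"] for sq in row) for row in frame)

animation = [new_frame("b") for _ in range(4)]
set_square(animation, 2, 2, 1, letter="f", color=(0, 0, 255))
print(render(animation[2]))
```

Because every square is created inside the comprehension, changing one square leaves all the others untouched.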
@Mike
(replying to you above was limited to 600 characters so I guess I'll show my reply here)
This is my attempt at the Frame class so far. I don't know if I should define one class inside another or whether or how to send a list of instances to the Animation class or something. Each square can have a unique letter, position, and color (position because I intend for the columns or rows to be positionally shiftable). So that's why I put 3 types of grids in there (not sure if that's a good idea, or whether an individual square should have its own class too or something).
class Frame(object):
    def __init__(self, letterGrid, positionGrid, colorGrid):
        self.letterGrid = letterGrid
        self.positionGrid = positionGrid
        self.colorGrid = colorGrid

class Animation(object):
    def __init__(self, frames):
        self.frames = frames

frames = []
frames.append(Frame( [
['b','b','n','f'],
['b','b','n','n'],
['b','b','n','n'],
['b','b','n','n'] ],
[
[[0,0],[16,0],[32,0],[48,0]],
[[0,16],[16,16],[32,16],[48,16]],
[[0,32],[16,32],[32,32],[48,32]],
[[0,48],[16,48],[32,48],[48,48]] ],
[
[[0,0,255],[0,0,0],[0,0,0],[0,0,0]],
[[0,0,255],[0,0,0],[0,0,0],[0,0,0]],
[[0,0,255],[0,0,0],[0,0,0],[0,0,0]],
[[0,0,255],[0,0,0],[0,0,0],[0,0,0]] ]
))
frames.append(Frame( [
['n','n','b','b'],
['n','n','f','b'],
['n','n','b','b'],
['n','n','b','b'] ],
[
[[0,0],[16,0],[32,0],[48,0]],
[[0,16],[16,16],[32,16],[48,16]],
[[0,32],[16,32],[32,32],[48,32]],
[[0,48],[16,48],[32,48],[48,48]] ],
[
[[0,0,0],[0,0,255],[0,0,0],[0,0,0]],
[[0,0,0],[0,0,255],[0,0,0],[0,0,0]],
[[0,0,0],[0,0,255],[0,0,0],[0,0,0]],
[[0,0,0],[0,0,255],[0,0,0],[0,0,0]] ]
))
frames.append(Frame( [
['b','b','n','n'],
['b','b','n','n'],
['b','f','n','n'],
['b','b','n','n'] ],
[
[[0,0],[16,0],[32,0],[48,0]],
[[0,16],[16,16],[32,16],[48,16]],
[[0,32],[16,32],[32,32],[48,32]],
[[0,48],[16,48],[32,48],[48,48]] ],
[
[[0,0,0],[0,0,0],[0,0,255],[0,0,0]],
[[0,0,0],[0,0,0],[0,0,255],[0,0,0]],
[[0,0,0],[0,0,0],[0,0,255],[0,0,0]],
[[0,0,0],[0,0,0],[0,0,255],[0,0,0]] ]
))
frames.append(Frame( [
['n','n','b','b'],
['n','n','b','b'],
['n','n','b','b'],
['n','n','b','b'] ],
[
[[0,0],[16,0],[32,0],[48,0]],
[[0,16],[16,16],[32,16],[48,16]],
[[0,32],[16,32],[32,32],[48,32]],
[[0,48],[16,48],[32,48],[48,48]] ],
[
[[0,0,0],[0,0,0],[0,0,0],[0,0,255]],
[[0,0,0],[0,0,0],[0,0,0],[0,0,255]],
[[0,0,0],[0,0,0],[0,0,0],[0,0,255]],
[[0,0,0],[0,0,0],[0,0,0],[0,0,255]] ]
))
print("3rd frame's colorGrid:\n", frames[2].colorGrid)
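One way to cut down the repetition in the frame definitions above is to generate the three grids from a compact description. The sketch below is only an illustration: `make_frame`, `TILE`, and `patterns` are hypothetical names, the 16-pixel spacing is read off the position grids above, the letter patterns are the four frames from the question, and colors are stored as tuples rather than lists so that sharing one color value between squares is harmless.

```python
TILE = 16                              # spacing used in the position grids above
BLUE, BLACK = (0, 0, 255), (0, 0, 0)   # tuples, so sharing them is safe

class Frame(object):
    def __init__(self, letterGrid, positionGrid, colorGrid):
        self.letterGrid = letterGrid
        self.positionGrid = positionGrid
        self.colorGrid = colorGrid

def make_frame(rows, blue_col):
    """Build a Frame from equal-length strings; one column is tinted blue."""
    size = len(rows)
    letters = [list(row) for row in rows]
    positions = [[[c * TILE, r * TILE] for c in range(size)]
                 for r in range(size)]
    colors = [[BLUE if c == blue_col else BLACK for c in range(size)]
              for r in range(size)]
    return Frame(letters, positions, colors)

# The four frames from the question, one string per row.
patterns = [
    ["bbnf", "bbnn", "bbnn", "bbnn"],
    ["nnbb", "nnfb", "nnbb", "nnbb"],
    ["bbnn", "bbnn", "bfnn", "bbnn"],
    ["nnbb", "nnbb", "nnbb", "fnbb"],
]
frames = [make_frame(rows, i) for i, rows in enumerate(patterns)]
print(frames[2].colorGrid[0])
```

Individual squares stay editable afterwards, e.g. `frames[2].letterGrid[0][0] = 'x'`, because each row is its own list.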
Related
win32com LineStyle Excel
Luckily I found this site: https://www.linuxtut.com/en/150745ae0cc17cb5c866/ (there are many line types defined in the Excel enum XlLineStyle):
xlContinuous = 1
xlDashDot = 4
xlDashDotDot = 5
xlSlantDashDot = 13
xlDash = -4115
xlDot = -4118
xlDouble = -4119
xlLineStyleNone = -4142
I ran a loop with try/except over roughly +/- 100,000 values to set the line style, because I thought the [index] number for the line in my picture had to be somewhere in that range, but it wasn't. Why not? How can I set this line? Why are some of the line indexes in such a huge negative range and not just 1, 2, 3...? How can I discover things like the "number" needed for something like this? Why is it even possible to send data to particular places in an application? I want to go a little deeper into that; where can I learn more about it?
(1) You can't find the medium dashed line in the lineStyle enum because there is none. The line that is drawn as a border is a combination of lineStyle and Weight. The lineStyle is xlDash, the weight is xlThin for value 03 in your table and xlMedium for value 08.
(2) To figure out how to set something like this in VBA, use the Macro recorder. It will reveal that lineStyle, Weight (and color) are set when setting a border.
(3) There are a lot of pages describing all the constants, e.g. have a look at the one @FaneDuru linked to in the comments. They can also be found at Microsoft itself: https://learn.microsoft.com/en-us/office/vba/api/excel.xllinestyle and https://learn.microsoft.com/en-us/office/vba/api/excel.xlborderweight. It seems that someone translated them to Python constants on the linuxTut page.
(4) Don't ask why the enums are not continuous values. I assume that especially the constants with negative numbers serve more than one purpose. Just never use the values directly; always use the defined constants.
(5) You can assume that numeric values that have no defined constant may work, but the results are rather unpredictable. It's unlikely that there are values without a constant that result in something "new" (e.g. a different border style).
As you can see in the following table, not all combinations give different borders. Setting the weight to xlHairline will ignore the lineStyle. Setting it to xlThick will also ignore the lineStyle, except for xlDouble. On the other hand, xlDouble will be ignored when the weight is not xlThick.
Sub border()
    With ThisWorkbook.Sheets(1)
        With .Range("A1:J18")
            .Clear
            .Interior.Color = vbWhite
        End With

        Dim lStyles(), lWeights(), lStyleNames(), lWeightNames
        lStyles() = Array(xlContinuous, xlDash, xlDashDot, xlDashDotDot, xlDot, xlDouble, xlLineStyleNone, xlSlantDashDot)
        lStyleNames() = Array("xlContinuous", "xlDash", "xlDashDot", "xlDashDotDot", "xlDot", "xlDouble", "xlLineStyleNone", "xlSlantDashDot")
        lWeights = Array(xlHairline, xlThin, xlMedium, xlThick)
        lWeightNames = Array("xlHairline", "xlThin", "xlMedium", "xlThick")

        Dim x As Long, y As Long
        For x = LBound(lStyles) To UBound(lStyles)
            Dim row As Long
            row = x * 2 + 3
            .Cells(row, 1) = lStyleNames(x) & vbLf & "(" & lStyles(x) & ")"
            For y = LBound(lWeights) To UBound(lWeights)
                Dim col As Long
                col = y * 2 + 3
                If x = 1 Then .Cells(1, col) = lWeightNames(y) & vbLf & "(" & lWeights(y) & ")"
                With .Cells(row, col).Borders
                    .LineStyle = lStyles(x)
                    .Weight = lWeights(y)
                End With
            Next
        Next
    End With
End Sub
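On the Python side, the same idea (pick a lineStyle and a Weight by name, never by raw number) could be sketched as follows. The numeric values are the ones listed in Microsoft's XlLineStyle and XlBorderWeight enumerations; `set_border` and the two dictionaries are hypothetical helper names, and the win32com usage in the comment is an untested assumption about how it would be wired up.

```python
from types import SimpleNamespace

# Values from Microsoft's XlLineStyle and XlBorderWeight enumerations.
XL_LINE_STYLE = {
    "xlContinuous": 1, "xlDash": -4115, "xlDashDot": 4, "xlDashDotDot": 5,
    "xlDot": -4118, "xlDouble": -4119, "xlLineStyleNone": -4142,
    "xlSlantDashDot": 13,
}
XL_BORDER_WEIGHT = {"xlHairline": 1, "xlThin": 2, "xlMedium": -4138, "xlThick": 4}

def set_border(borders, line_style, weight):
    """Set style first, then weight, on any object exposing both properties."""
    borders.LineStyle = XL_LINE_STYLE[line_style]
    borders.Weight = XL_BORDER_WEIGHT[weight]

# With Excel driven through pywin32 this might look like (untested here):
#   import win32com.client
#   excel = win32com.client.Dispatch("Excel.Application")
#   set_border(excel.ActiveSheet.Range("B2").Borders, "xlDash", "xlMedium")

# Stand-in object so the sketch runs without Excel installed.
fake_borders = SimpleNamespace()
set_border(fake_borders, "xlDash", "xlMedium")
print(fake_borders.LineStyle, fake_borders.Weight)
```

Looking names up in a dictionary means a misspelled constant fails with a `KeyError` instead of silently passing an arbitrary number to Excel.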
Maya Python Create joints Hierarchy
I'm trying to create a hierarchy of joints for a skeleton in Maya Python, and I'm doing this:

def makeSkelet(args):
    helperSkelet('Root_Locator', 'root_Joint')
    helperSkelet('Pelvis_Locator', 'pelvis_Joint')
    helperSkelet('Spine_Locator', 'spine_Joint')
    helperSkelet('Spine01_Locator', 'spine01_Joint')
    helperSkelet('Spine02_Locator', 'spine02_Joint')
    helperSkelet('Neck_Locator', 'neck_Joint')
    helperSkelet('Head_Locator', 'head_Joint')
    mc.select(cl=True)
    helperSkelet('ArmL_Locator', 'armL_joint')
    helperSkelet('ElbowL_Locator', 'elbowL_Joint')
    helperSkelet('HandL_Locator', 'handL_Joint')
    mc.select(cl=True)
    helperSkelet('ArmR_Locator', 'armR_joint')
    helperSkelet('ElbowR_Locator', 'elbowR_Joint')
    helperSkelet('HandR_Locator', 'handR_Joint')
    mc.select(cl=True)
    helperSkelet('HipL_Locator', 'hipL_joint')
    helperSkelet('KneeL_Locator', 'kneeL_Joint')
    helperSkelet('AnkleL_Locator', 'ankleL_Joint')
    helperSkelet('FootL_Locator', 'footL_Joint')
    mc.select(cl=True)
    helperSkelet('HipR_Locator', 'hipR_joint')
    helperSkelet('KneeR_Locator', 'kneeR_Joint')
    helperSkelet('AnkleR_Locator', 'ankleR_Joint')
    helperSkelet('FootR_Locator', 'footR_Joint')

Now this works fine, because the joints must be created in this order. (helperSkelet is a function where I create the joint with reference to a locator's position.) I was wondering if there is a more optimized way to do this, considering the order of creation must be kept. Thank you
If by "optimize" you mean getting better performance, I agree with what @downshift said. If what you meant was instead making your code "cleaner" (more general or scalable or simply more pythonic), here's another way you can do the same, which is a bit more compact (and separates the logic from your input):

def helperSkeletGroup(group, symmetric=False):
    # quick workaround to capitalize a word, leaving the following letters unchanged
    capitalize = lambda s: s[:1].upper() + s[1:]
    symmetric_group = []
    for elem in group:
        if symmetric:
            symmetric_group.append('{0}R'.format(elem))
            elem = '{0}L'.format(elem)
        # format locators and joints
        loc, joint = '{0}_Locator'.format(capitalize(elem)), '{0}_Joint'.format(elem)
        helperSkelet(loc, joint)
    cmds.select(cl=True)
    if symmetric_group:
        helperSkeletGroup(symmetric_group)

helperSkeletGroup(['root', 'pelvis', 'spine', 'spine01', 'spine02', 'neck', 'head'])
helperSkeletGroup(['arm', 'elbow', 'hand'], True)
helperSkeletGroup(['hip', 'knee', 'ankle', 'foot'], True)

This comes with a few advantages:
- it handles symmetry for you
- the code doesn't grow too much as the number of joints increases
- if at some point you want to change the naming convention for locators and joints, you can do it by changing a single line
Alternatively, you could go with an OOP approach.
Here's an example:

class Skeleton:
    def __init__(self):
        self.joint_groups = []

    def add_joint_group(self, group, symmetric=False):
        # quick workaround to capitalize a word, leaving the following letters unchanged
        capitalize = lambda s: s[:1].upper() + s[1:]
        processed, processed_symmetric = [], []
        for elem in group:
            if symmetric:
                processed_symmetric.append('{0}R'.format(elem))
                elem = '{0}L'.format(elem)
            processed.append(('{0}_Locator'.format(capitalize(elem)), '{0}_Joint'.format(elem)))
        self.joint_groups.append(processed)
        if processed_symmetric:
            self.add_joint_group(processed_symmetric)

    def helper_skelet(self, loc, joint):
        # your helper logic goes here
        print(loc, joint)

    def build(self):
        for group in self.joint_groups:
            for loc, joint in group:
                self.helper_skelet(loc, joint)
            cmds.select(cl=True)

skeleton = Skeleton()
skeleton.add_joint_group(['root', 'pelvis', 'spine', 'spine01', 'spine02', 'neck', 'head'])
skeleton.add_joint_group(['arm', 'elbow', 'hand'], True)
skeleton.add_joint_group(['hip', 'knee', 'ankle', 'foot'], True)

from pprint import pformat
print(pformat(skeleton.joint_groups))
skeleton.build()

Here the code is a bit longer, but it is all contained in a single object, where you could store additional data that you get only at construction time and might need later on.

EDIT (to answer @Giakaama's question in the comment): If you save the class in a separate file skeleton_class.py, you can import the class in your main.py (or whatever you want to call it), as such:

from skeleton_class import Skeleton

where the lower-case skeleton_class refers to your module (read: file) and Skeleton is the class itself. Once you've done that, you can do the same as above:

skeleton = Skeleton()
skeleton.add_joint_group(['root', 'pelvis', 'spine', 'spine01', 'spine02', 'neck', 'head'])
skeleton.add_joint_group(['arm', 'elbow', 'hand'], True)
skeleton.build()
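To try the naming and ordering logic outside Maya, the name generation can be separated from the Maya calls. This is only a sketch: `joint_name_groups` and `build` are hypothetical names, and the two callbacks stand in for the asker's `helperSkelet` and for `cmds.select(cl=True)`.

```python
def joint_name_groups(spec):
    """spec: list of (names, symmetric) pairs -> name groups in creation order."""
    capitalize = lambda s: s[:1].upper() + s[1:]
    groups = []
    for names, symmetric in spec:
        # A symmetric chain is emitted twice: left side first, then right.
        for side in (("L", "R") if symmetric else ("",)):
            groups.append([("{0}{1}_Locator".format(capitalize(n), side),
                            "{0}{1}_Joint".format(n, side)) for n in names])
    return groups

def build(groups, make_joint, clear_selection):
    """Create every joint in order, clearing the selection after each chain."""
    for group in groups:
        for loc, joint in group:
            make_joint(loc, joint)
        clear_selection()

spec = [
    (["root", "pelvis", "spine", "spine01", "spine02", "neck", "head"], False),
    (["arm", "elbow", "hand"], True),
    (["hip", "knee", "ankle", "foot"], True),
]
groups = joint_name_groups(spec)

# Dry run: record the calls instead of touching a Maya scene.
calls = []
build(groups,
      lambda loc, joint: calls.append((loc, joint)),
      lambda: calls.append("clear"))
print(calls[:3])
```

Inside Maya the last call would presumably become `build(groups, helperSkelet, lambda: cmds.select(cl=True))`.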
How to make this code less repetitive? (openpyxl)
I'm a teacher and I'm making a program to help me catalog my students' grades. This isn't so much of a problem; the deal is, I'm doing it mostly to practice programming.

# Name the first line
sheet['A1'] = 'Index'
sheet['B1'] = 'Name'
sheet['C1'] = 'Grade'
# Changes the style of the first line to bold
sheet['A1'].font = font_bold
sheet['B1'].font = font_bold
sheet['C1'].font = font_bold
# Widens the columns
sheet.column_dimensions['A'].width = 10
sheet.column_dimensions['B'].width = 30
sheet.column_dimensions['C'].width = 30
# Aligns to center
sheet.cell('A1').alignment = Alignment(horizontal='center', vertical='center')
sheet.cell('B1').alignment = Alignment(horizontal='center', vertical='center')
sheet.cell('C1').alignment = Alignment(horizontal='center', vertical='center')
# Freeze the first row
sheet.freeze_panes = 'A2'
# Index number of the lines
i = 2
-- --
# function to calculate the grade
def grade():

As you may notice, it is all, to some extent, repetitive. The code functions exactly as I want it to, but I would like to know some other way to make it more... succinct. It is important to keep the variables' reach (scope) in mind, because a function starts right after this, followed shortly by a while loop. The irrelevant parts of the code for this question have been omitted with a ---. If they are somehow needed I'll edit them in, but to my knowledge, they are not. Thank you in advance.
Objects and DRY (Don't Repeat Yourself) Principle
In general, whenever you have some parallel arrays of the same length that keep some related attribute, it is a good indication that you can combine those attributes and make an object out of them. For your case, I suggest defining an object Sheet with the following attributes:
title
font
column_dimensions
cell
...
I'd take the whole thing and break it down into functions.

def name_column(cell, name):
    sheet[cell] = name

def style_name_column(cell, style):
    sheet[cell].font = style

def change_width(column, width):
    sheet.column_dimensions[column].width = width

def align_column(cell):
    sheet.cell(cell).alignment = Alignment(horizontal='center', vertical='center')

Then use some sort of data structure to loop over and do this stuff.

indexes_and_names = [['A1', 'Index'],
                     ['B1', 'Name'],
                     ['C1', 'Grade']]
for item in indexes_and_names:
    name_column(item[0], item[1])

and then repeat for the other functions, or use a bigger data structure, like Jack's dictionary. Your code will be readable and easily maintainable.
Here's a good start:

items = [
    {"sheet": "A1", "column": "A", "column_width": 10, "title": "Index"},
    {"sheet": "B1", "column": "B", "column_width": 30, "title": "Name"},
    {"sheet": "C1", "column": "C", "column_width": 30, "title": "Grade"},
]
for item in items:
    sheet[item["sheet"]] = item["title"]
    sheet[item["sheet"]].font = font_bold  # always bold
    sheet.column_dimensions[item["column"]].width = item["column_width"]
    sheet.cell(item["sheet"]).alignment = Alignment(horizontal='center', vertical='center')  # shared value
openpyxl provides all the necessary functions to do this quickly and easily. The best way to do this is to use named styles. You can create a single style for the header cells and apply it using a loop.
Create the first row:
    ws.append(['Index', 'Name', 'Grade'])
Create the relevant style:
    header = NamedStyle(…)
Apply the style to the first row:
    for cell in ws[1]:
        cell.style = header
python class property trouble
I have the following code:

class tile:
    def __init__(self, value):
        self.value = value

class map_2d:
    def __init__(self, xsize, ysize):
        self.dimx = xsize
        self.dimy = ysize
        self.canvas = [[tile(0)] * xsize for i in range(ysize)]
        for yc in range(ysize):
            for xc in range(xsize):
                self.canvas[yc][xc].x = xc
                self.canvas[yc][xc].y = yc
                #CHECKPOINT

#TEST:
mymap = map_2d(10, 10)
for line in mymap.canvas:
    print(' | '.join('%d:%d' % (cell.x, cell.y) for cell in line))

I expect to have a map_2d instance with a .canvas property that is a 2d array of tile instances, with x and y properties corresponding to the tile coordinates. Like 0:0, 1:0, 2:0, ...
The problem is, in the end ALL my tiles have an x property of xsize-1, 9 in the test above. It is utterly confusing, since at the moment marked by #CHECKPOINT everything is right and all tiles have their actual coordinates as x and y properties. Nothing is wrong with my visualization method either.
I would welcome any hints to help with this mystery. Any suggestions about achieving my goal (which is assigning coordinates to cells) more efficiently will be appreciated as well.
Moreover, if anyone reading this feels like "what the hell is this guy doing", I'd be grateful for any sound advice on how to deal with simple map generation, which is my ultimate goal in this case. I did all this to have a way of addressing tiles adjacent to another tile by coordinates, but my approach feels quite suboptimal.
This line doesn't do what you expect:

    self.canvas = [[tile(0)] * xsize for i in range(ysize)]

Even though it seems to create a list of lists of separate tiles, each row is actually a list that contains xsize references to the same tile(0) object. So when you modify canvas[0][0], you're also modifying canvas[0][1], canvas[0][2] and so on. For example:

>>> [tile(0)] * 5
[<__main__.tile instance at 0x10200eea8>, <__main__.tile instance at 0x10200eea8>, <__main__.tile instance at 0x10200eea8>, <__main__.tile instance at 0x10200eea8>, <__main__.tile instance at 0x10200eea8>]

Each object has the same memory address, so it's a list of five elements which are actually all the same object. You can solve this by explicitly creating new objects:

    self.canvas = [[tile(0) for j in range(xsize)] for i in range(ysize)]
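The difference between the two constructions can be checked directly with identity tests. This is a small standalone sketch; the `Tile` class here is a stand-in for the asker's `tile`:

```python
class Tile(object):
    def __init__(self, value):
        self.value = value

shared_row = [Tile(0)] * 3                 # three references to ONE Tile
fresh_row = [Tile(0) for _ in range(3)]    # three separate Tiles

# Tag only the first element of each row.
shared_row[0].x = 7
fresh_row[0].x = 7

print(shared_row[0] is shared_row[2])      # True: same object
print(hasattr(shared_row[2], "x"))         # True: the tag "leaked"
print(fresh_row[0] is fresh_row[2])        # False: distinct objects
print(hasattr(fresh_row[2], "x"))          # False: untouched
```

This is why every cell in a row ended up with the x of the last column written: each assignment overwrote the attribute on the one shared object.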
put stockprices into groups when they are within 0.5% of each other
Thanks for the answers. I have not used StackOverflow before, so I was surprised by the number of answers and their speed - it's fantastic. I have not been through the answers properly yet, but thought I should add some information to the problem specification. See the image below. I can't post an image here because I don't have enough points, but you can see it at http://journal.acquitane.com/2010-01-20/image003.jpg
This image may describe more closely what I'm trying to achieve. The horizontal lines across the page are price points on the chart. Where you get a clustering of lines within 0.5% of each other, this is considered to be a good thing, which is why I want to identify those clusters automatically. You can see on the chart that there is a cluster at S2 & MR1, R2 & WPP1.
So every day I produce these price points and then I can manually identify those that are within 0.5%, but the purpose of this question is how to do it with a Python routine.
I have reproduced the list again (see below) with labels. Just be aware that the price points in the list don't match the price points in the image because they are from two different days.
[YR3,175.24,8] [SR3,147.85,6] [YR2,144.13,8] [SR2,130.44,6]
[YR1,127.79,8] [QR3,127.42,5] [SR1,120.94,6] [QR2,120.22,5]
[MR3,118.10,3] [WR3,116.73,2] [DR3,116.23,1] [WR2,115.93,2]
[QR1,115.83,5] [MR2,115.56,3] [DR2,115.53,1] [WR1,114.79,2]
[DR1,114.59,1] [WPP,113.99,2] [DPP,113.89,1] [MR1,113.50,3]
[DS1,112.95,1] [WS1,112.85,2] [DS2,112.25,1] [WS2,112.05,2]
[DS3,111.31,1] [MPP,110.97,3] [WS3,110.91,2] [50MA,110.87,4]
[MS1,108.91,3] [QPP,108.64,5] [MS2,106.37,3] [MS3,104.31,3]
[QS1,104.25,5] [SPP,103.53,6] [200MA,99.42,7] [QS2,97.05,5]
[YPP,96.68,8] [SS1,94.03,6] [QS3,92.66,5] [YS1,80.34,8]
[SS2,76.62,6] [SS3,67.12,6] [YS2,49.23,8] [YS3,32.89,8]
I did make a mistake with the original list in that Group C is wrong and should not be included. Thanks for pointing that out.
Also, the 0.5% is not fixed; this value will change from day to day, but I have just used 0.5% as an example for spec'ing the problem. Thanks again. Mark
PS. I will get cracking on checking the answers now.
Hi: I need to do some manipulation of stock prices. I have just started using Python (but I think I would have trouble implementing this in any language). I'm looking for some ideas on how to implement this nicely in Python. Thanks Mark
Problem: I have a list of lists (FloorLevels, see below) where each sublist has two items (stockprice, weight). I want to put the stockprices into groups when they are within 0.5% of each other. A group's strength will be determined by its total weight. For example:
Group-A
115.93,2
115.83,5
115.56,3
115.53,1
-------------
TotalWeight:12
-------------
Group-B
113.50,3
112.95,1
112.85,2
-------------
TotalWeight:6
-------------
FloorLevels[
[175.24,8] [147.85,6] [144.13,8] [130.44,6]
[127.79,8] [127.42,5] [120.94,6] [120.22,5]
[118.10,3] [116.73,2] [116.23,1] [115.93,2]
[115.83,5] [115.56,3] [115.53,1] [114.79,2]
[114.59,1] [113.99,2] [113.89,1] [113.50,3]
[112.95,1] [112.85,2] [112.25,1] [112.05,2]
[111.31,1] [110.97,3] [110.91,2] [110.87,4]
[108.91,3] [108.64,5] [106.37,3] [104.31,3]
[104.25,5] [103.53,6] [99.42,7] [97.05,5]
[96.68,8] [94.03,6] [92.66,5] [80.34,8]
[76.62,6] [67.12,6] [49.23,8] [32.89,8]
]
I suggest a repeated use of k-means clustering -- let's call it KMC for short. KMC is a simple and powerful clustering algorithm... but it needs to "be told" how many clusters, k, you're aiming for. You don't know that in advance (if I understand you correctly) -- you just want the smallest k such that no two items "clustered together" are more than X% apart from each other. So, start with k equal 1 -- everything bunched together, no clustering pass needed;-) -- and check the diameter of the cluster (a cluster's "diameter", from the use of the term in geometry, is the largest distance between any two members of a cluster).
If the diameter is > X%, set k += 1, perform KMC with k as the number of clusters, and repeat the check, iteratively.
In pseudo-code:

def markCluster(items, threshold):
    k = 1
    clusters = [items]
    maxdist = diameter(items)
    while maxdist > threshold:
        k += 1
        clusters = Kmc(items, k)
        maxdist = max(diameter(c) for c in clusters)
    return clusters

assuming of course we have suitable diameter and Kmc Python functions.
Does this sound like the kind of thing you want? If so, then we can move on to show you how to write diameter and Kmc (in pure Python if you have a relatively limited number of items to deal with, otherwise maybe by exploiting powerful third-party add-on frameworks such as numpy) -- but it's not worthwhile to go to such trouble if you actually want something pretty different, whence this check!-)
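For a small list, the two missing helpers might be sketched in pure Python roughly as below. This is one possible reading, not the answerer's code: `diameter` is taken to be the relative spread (max minus min, divided by min), `kmc` is a plain one-dimensional Lloyd's iteration that starts from evenly spaced members of the sorted list instead of random centroids, and only the prices (not the weights) are clustered.

```python
def diameter(prices):
    """Largest relative spread between any two prices in one cluster."""
    return (max(prices) - min(prices)) / min(prices)

def kmc(prices, k, iterations=100):
    """1-D k-means (Lloyd's algorithm) with deterministic starting centroids."""
    ordered = sorted(prices)
    n = len(ordered)
    # Start from k evenly spaced members of the sorted list.
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        buckets = [[] for _c in centroids]
        for p in ordered:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            buckets[nearest].append(p)
        clusters = [b for b in buckets if b]      # drop empty clusters
        updated = [sum(b) / len(b) for b in clusters]
        if updated == centroids:                  # assignments are stable
            break
        centroids = updated
    return clusters

def mark_cluster(prices, threshold):
    """Raise k until every cluster's diameter is within the threshold."""
    k = 1
    clusters = [list(prices)]
    while max(diameter(c) for c in clusters) > threshold:
        k += 1
        clusters = kmc(prices, k)
    return clusters

prices = [115.93, 115.83, 115.56, 115.53, 113.50, 112.95, 112.85]
for cluster in mark_cluster(prices, 0.005):
    print(cluster)
```

The loop always terminates for positive prices, since at k equal to the number of items every cluster holds a single value. k-means minimizes squared distance to the centroids rather than cluster diameter, so it can end up with more clusters than strictly necessary.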
A stock s belongs in a group G if for each stock t in G, s * 1.05 >= t and s / 1.05 <= t, right? How do we add the stocks to each group? If we have the stocks 95, 100, 101, and 105, and we start a group with 100, then add 101, we will end up with {100, 101, 105}. If we did 95 after 100, we'd end up with {100, 95}. Do we just need to consider all possible permutations? If so, your algorithm is going to be inefficient.
You need to specify your problem in more detail. Just what does "put the stockprices into groups when they are within 0.5% of each other" mean?
Possibilities:
(1) each member of the group is within 0.5% of every other member of the group
(2) sort the list and split it where the gap is more than 0.5%
Note that 116.23 is within 0.5% of 115.93 -- abs((116.23 / 115.93 - 1) * 100) < 0.5 -- but you have put one number in Group A and one in Group C.
Simple example: a, b, c = (0.996, 1, 1.004) ... Note that a and b fit, b and c fit, but a and c don't fit. How do you want them grouped, and why? Is the order in the input list relevant?
Possibility (1) produces ab,c or a,bc ... tie-breaking rule, please
Possibility (2) produces abc (no big gaps, so only one group)
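The two readings can be compared on the a, b, c example with a short sketch. The function names are illustrative, "within" is assumed to mean a difference of at most `tol` relative to the smaller price, and the version of possibility (1) shown here is a greedy one that breaks the tie by always filling groups from the lowest price upward.

```python
def within(a, b, tol):
    """True if a and b differ by at most tol, relative to the smaller one."""
    lo, hi = sorted((a, b))
    return hi / lo - 1 <= tol

def group_by_gap(prices, tol):
    """Possibility (2): sort, then split wherever neighbours are too far apart."""
    groups = []
    for p in sorted(prices):
        if groups and within(groups[-1][-1], p, tol):
            groups[-1].append(p)
        else:
            groups.append([p])
    return groups

def group_by_span(prices, tol):
    """Possibility (1), greedy: every member must be within tol of the
    group's lowest price, which bounds the spread of the whole group."""
    groups = []
    for p in sorted(prices):
        if groups and within(groups[-1][0], p, tol):
            groups[-1].append(p)
        else:
            groups.append([p])
    return groups

a, b, c = 0.996, 1.0, 1.004
print(group_by_gap([a, b, c], 0.005))    # one group: abc
print(group_by_span([a, b, c], 0.005))   # two groups: ab, c
```

Filling groups from the highest price downward instead would give a,bc, which is exactly the tie-breaking question raised above.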
You won't be able to classify them into hard "groups". If you have prices (1.0, 1.05, 1.1) then the first and second should be in the same group, and the second and third should be in the same group, but not the first and third.
A quick, dirty way to do something that you might find useful:

def make_group_function(tolerance = 0.05):
    from math import log10, floor
    # I forget why this works.
    tolerance_factor = -1.0/(-log10(1.0 + tolerance))
    # well ... since you might ask
    # we want: log(x)*tf - log(x*(1+t))*tf = -1,
    # so every 5% change has a different group. The minus is just so groups
    # are ascending .. it looks a bit nicer.
    #
    # tf = -1/(log(x)-log(x*(1+t)))
    # tf = -1/(log(x/(x*(1+t))))
    # tf = -1/(log(1/(1*(1+t)))) # solved .. but let's just be more clever
    # tf = -1/(0-log(1*(1+t)))
    # tf = -1/(-log(1+t))
    def group_function(value):
        # don't just use int - it rounds up below zero, and down above zero
        return int(floor(log10(value)*tolerance_factor))
    return group_function

Usage:

group_function = make_group_function()
import random
groups = {}
for i in range(50):
    v = random.random()*500+1000
    group = group_function(v)
    if group in groups:
        groups[group].append(v)
    else:
        groups[group] = [v]
for group in sorted(groups):
    print('Group', group)
    for v in sorted(groups[group]):
        print(v)
    print()
For a given set of stock prices, there is probably more than one way to group stocks that are within 0.5% of each other. Without some additional rules for grouping the prices, there's no way to be sure an answer will do what you really want.
Apart from the proper way to pick which values fit together, this is a problem where a little object orientation can make it a lot easier to deal with. I made two classes here, with a minimum of desirable behaviors, which can make the classification a lot easier -- you get a single point to play with it in the Group class.
I can see the code below is incorrect, in the sense that the limits for group inclusion vary as new members are added -- even if the separation criteria remain the same, you have to rewrite the get_groups method to use a multi-pass approach. It should not be hard -- but the code would be too long to be helpful here, and I think this snippet is enough to get you going:

from copy import copy

class Group(object):
    def __init__(self, data=None, name=""):
        if data:
            self.data = data
        else:
            self.data = []
        self.name = name

    def get_mean_stock(self):
        return sum(item[0] for item in self.data) / len(self.data)

    def fits(self, item):
        if 0.995 < abs(item[0]) / self.get_mean_stock() < 1.005:
            return True
        return False

    def get_weight(self):
        return sum(item[1] for item in self.data)

    def __repr__(self):
        return "Group-%s\n%s\n---\nTotalWeight: %d\n\n" % (
            self.name,
            "\n".join("%.02f, %d" % tuple(item) for item in self.data),
            self.get_weight())

class StockGrouper(object):
    def __init__(self, data=None):
        if data:
            self.floor_levels = data
        else:
            self.floor_levels = []

    def get_groups(self):
        groups = []
        floor_levels = copy(self.floor_levels)
        name_ord = ord("A") - 1
        while floor_levels:
            seed = floor_levels.pop(0)
            name_ord += 1
            group = Group([seed], chr(name_ord))
            groups.append(group)
            to_remove = []
            for i, item in enumerate(floor_levels):
                if group.fits(item):
                    group.data.append(item)
                    to_remove.append(i)
            for i in reversed(to_remove):
                floor_levels.pop(i)
        return groups

Testing:

floor_levels = [ [stock, weight], ... <paste the data above> ]
s = StockGrouper(floor_levels)
s.get_groups()
For the grouping element, could you use itertools.groupby()? As the data is sorted, a lot of the work of grouping it is already done, and then you could test if the current value in the iteration was different to the last by <0.5%, and have itertools.groupby() break into a new group every time your function returned false.
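A sketch of that idea follows. `itertools.groupby` only compares keys, so the "different from the last value" test has to live in a key function that remembers the previous price and bumps a group counter when the gap is too large. The helper names and the choice to measure the gap relative to the larger of the two prices are assumptions, and the sample rows are a slice of the FloorLevels list from the question.

```python
from itertools import groupby

def gap_key(tol):
    """Return a key function that starts a new group id whenever the price
    differs from the previous one by more than tol (relative)."""
    state = {"prev": None, "group": 0}
    def key(item):
        price = item[0]
        prev = state["prev"]
        if prev is not None and abs(prev - price) / max(prev, price) > tol:
            state["group"] += 1
        state["prev"] = price
        return state["group"]
    return key

def group_levels(levels, tol):
    """levels: [price, weight] pairs sorted by price -> (prices, total weight)."""
    result = []
    for _, members in groupby(levels, key=gap_key(tol)):
        members = list(members)
        result.append(([m[0] for m in members], sum(m[1] for m in members)))
    return result

levels = [[116.73, 2], [116.23, 1], [115.93, 2], [115.83, 5],
          [115.56, 3], [115.53, 1], [114.79, 2], [114.59, 1]]
for prices, weight in group_levels(levels, 0.005):
    print(prices, "TotalWeight:", weight)
```

Note that this compares neighbours only, so a run of closely spaced prices chains into one group even when its first and last members are more than 0.5% apart, as happens with 116.73 and 115.53 here.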