What data structure should I use to define states at 3600 points?

What data structure should I use to define states at 3600 points? - python

I am working on a personal project that analyzes hockey player shot data. One thing that I would like to investigate is the effects of different game-states (5v5, power play, short handed). The problem that I have is that I am not sure how to structure this part of my program.
My initial thought is to define a dictionary with 3600 sub-dictionaries as follows:
game_state = {}
game_state[time] = {}
game_state[time][home] = []
game_state[time][away] = []
I can then use the time for each shot event that I am interested in to lookup each teams' game-state. This, however, seems like an inefficient way of doing things.
As I am writing this question up it occurs to me that most of the game is 5v5 for both teams. Perhaps I could set up a similar dictionary but only use the times that the game-state is not 5v5 to generate the keys, and then when looking up play data assume that no entry means a 5v5 game-state.
My Question: Is there something better suited than a dictionary for this kind of application?
Edit:
To #Karl Knechtel's point, I do not need to save any of this information beyond one iteration of a for loop in my main file. In the main file I loop through game_data (a pickled JSON file) and collect x, y coordinates for all shots to later be binned and plotted. I am trying to refine the shot data to consider only a specific game state by introducing an additional check into my data parsing loop.

This sounds like the perfect use case for a relational database like SQLite or Postgres. Without getting too much into the nitty gritties, you could define a relation called Shot with time as a primary key. This would allow you to also look up more interesting questions like "How many shots are made short handed when it's a powerplay?" You could potentially also have a relation called Game which allows you to know which shots happened in which game.
If you want a less labor intensive solution, I think grouping the data into a class would be good idea. For example,
class Shot:
def __init__(self, time: int, short_hand: bool, num_players: int):
self.time = time
self.short_hand = short_hand
self.num_players = num_players
You could then have a dictionary that maps time to Shot instances.
shots: dict[int, Shot] = {}
shots[100] = Shot(100, False, 10)
shots[150] = Shot(150, True, 9)
...
NB: I highly suggest the first option since it sounds like it will be more useful for your case.

Related

how to compare two lists in different functions in Python?

I am making a Django web Application and i am facing a problem in comparing two different lists in different functions
def test(request,slug):
n=request.user
choice=TestOptions.objects.filter(choice=slug).first()
que=questions.objects.filter(Subject=choice)
question=[]
un=['a','b','c','d','e']
for q in que:
if q not in question:
question.append(q)
else:
continue
sampling = random.sample(question, 5)
print("--------------SamPLING111")
print(sampling)
print("--------------SamPLING111")
correctAnswers=[]
for j in sampling:
correctAnswers.append(j.answer)
marks(correctAnswers)
d = dict(zip(un,sampling))
return render(request,"code/test.html",{'questions':d})
def acceptAnswer(request):
answers=[]
if request.method=="POST":
answers.append(request.POST['a'])
answers.append(request.POST['b'])
answers.append(request.POST['c'])
answers.append(request.POST['d'])
answers.append(request.POST['e'])
score(answers)
return render(request,"code/dub.html")
def marks(correct):
list1=[]
l1=correct
def score(and):
list2=[]
l2=ans
function test is passing a list and function acceptAnswer is passing another list my job is to compare those two lists
how can I compare l1 and l2?

I am not 100 percent what you are trying to do with these lists, but in order to compare them I would just return them. Here is a quick example:
def marks(correct):
list1 = []
l1 = correct
return l1
def score(answer):
list2 = []
l2 = answer
return l2
numbers = [1,2,3]
numbers2 = [1,2,3]
numbers3 = [3,4,5]
print(marks(numbers) == score(numbers2)) # True
print(marks(numbers2) == score(numbers3)) # False
Hopefully this helps!

Rather than continue with comments I figured I'd elaborate in an answer though it isn't an exact answer to your question I think it is the real answer.
You really have two issues. One is a design issue ie how to make your program work correctly and the other is an implementation issue about the scope of variables and how to deal with it.
I can see in your profile you're a university student and given the nature of the code it seems very likely you're writing your web app for the purposes of learning maybe even an assignment.
If you were doing this outside of a university I'd expect you were seeking practitioner type skills in which case I'd suggest the answer would be to design your application the way Django expects you to, which I would say would translate into storing state information in a database.
If this is a lab however you may not have covered databases yet. Labs sometimes have students doing silly things because they can't teach everything at the same time. So your Prof may not expect you to use a database yet.
Inside a web application you have to consider that the web is request response and that you can get requests from a lot of different sources so you have state management concerns that classical desktop applications don't have. Who is supposed to see these tests and who is supposed to see the marks and what is the order these things happen? Should anyone be able to create a test? Should anyone be able to take a test? You might not care yet, eventually you'll want to care about sessions. If people are taking their own tests you could store data in a user session but then other people wouldn't see those tests. Generally the correct way to store this sort of state is in a database where you can access it according to what you know about the current request. If this is some sort of basic intro app your Prof may be happy with you doing something kludgy for now.

Is there a way to create nested class attributes in Python?

I've been struggling with creating a class for my image processing code in Python.
The code requires a whole bunch of different parameters (set in a params.txt file) which can easily be grouped into different categories. For example, some are paths, some are related to the experimental geometry, some are just switches for turning certain image processing features on/off etc etc.
If my "structure" (not sure how I should create it yet) is created as P, I would like to have something like,
P = my_param_object()
P.load_parameters('path/to/params.txt')
and then, from the main code, I can access whatever elements I need like so,
print(P.paths.basepath())
'/some/path/to/data'
print(P.paths.year())
2019
print(P.filenames.lightfield())
'andor_lightfield.dat'
print(P.geometry.dist_lens_to_sample())
1.5
print(P.settings.subtract_background())
False
print(P.settings.apply_threshold())
True
I already tried creating my own class to do this but everything is just in one massive block. I don't know how to create nested parts for the class. So for example, I have a setting and a function called "load_background". This makes sense because the load_background function always loads a specific filename in a specific location but should only do so if the load_background parameter is set to True
From within the class, I tried doing something like
self.setting_load_background = False
def method_load_background(self):
myutils.load_dat(self.background_fname())
but that's very ugly. It would be nicer to have,
if P.settings.load_background() == True:
P.load_background()
else:
P.generate_random_background()

What is the best way for allowing for dynamic object creation and removal in a Plotly Dash dashboard?

Currently, I have a class which stores a dictionary of Card elements, each of which is unique. The class can also generate these cards and append them to the dictionary or remove a card from a dictionary. However, I am not sure how to best allow for this action through a callback function since the ID for a card doesn't exist until the card is made, and the functionality isn't directly within the Dash framework since a dictionary object acts as an intermediary where the objects are stored.
Basically, I am wondering what the best way to dynamically create and destroy objects with a callback is?
Thank you!

Assuming you want to avoid extra computation for building cards you wont use, I'd suggest create a which creates each card and store those functions in a dictionary. (You can also create a universal function with params that allow specificity)
my_card_functions = {
'id_1': make_id_1,
'id_2': make_id_2,
}
Creating a card could be done as such:
my_id = 'id_1'
f = my_card_functions[my_id] # will break if id isn't registered
my_card = f()
You can store the cards you want to create in a dcc.store object. Here's an example of some code you might consider:
# Pretend these are structured properly
dcc.Store(id='cards_data')
html.Div(id='my_cards',children=[])
#app.callback(
Output('my_cards','children'),
[Input('cards_data','data')],
[State('my_cards','children')]
)
def make_cards(data, children):
"""Makes cards for each id"""
if not data:
raise PreventUpdate
# The following code is not correct but serves as a demonstrative example
# Some data structure manipulation is necessary to access
# the ids from the children elements
children_ids = [x['id'] for x in children]
# Assuming your data looks something like this:
# {'add':['id_1','id_2']}
for x in data['add']:
if x not in children_ids:
f = my_card_functions[my_id]
my_card = f()
# Store values
children.append(my_card)
return children
Note, this approach does not resolve removal of cards. That could easily be done but would probably require a more dynamic use of caching.

Just on the basis of your question, I have some immediate suggestions (since there is no working code that you have posted).
1. Generate all card elements by default. They can be generated, but not 'displayed'
2. Add your callbacks to toggle the display/rendering of the cards dynamically based on the use case. That way you will have card element ids to play around with in the callbacks.
Hope this helps.

Is it appropriate to use a class for the purpose of organizing functions that share inputs?

To provide a bit of context, I am building a risk model that pulls data from various different sources. Initially I wrote the model as a single function that when executed read in the different data sources as pandas.DataFrame objects and used those objects when necessary. As the model grew in complexity, it quickly became unreadable and I found myself copy an pasting blocks of code often.
To cleanup the code I decided to make a class that when initialized reads, cleans and parses the data. Initialization takes about a minute to run and builds my model in its entirety.
The class also has some additional functionality. There is a generate_email method that sends an email with details about high risk factors and another method append_history that point-in-times the risk model and saves it so I can run time comparisons.
The thing about these two additional methods is that I cannot imagine a scenario where I would call them without first re-calibrating my risk model. So I have considered calling them in init() like my other methods. I haven't only because I am trying to justify having a class in the first place.
I am consulting this community because my project structure feels clunky and awkward. I am inclined to believe that I should not be using a class at all. Is it frowned upon to create classes merely for the purpose of organization? Also, is it bad practice to call instance methods (that take upwards of a minute to run) within init()?
Ultimately, I am looking for reassurance or a better code structure. Any help would be greatly appreciated.
Here is some pseudo code showing my project structure:
class RiskModel:
def __init__(self, data_path_a, data_path_b):
self.data_path_a = data_path_a
self.data_path_b = data_path_b
self.historical_data = None
self.raw_data = None
self.lookup_table = None
self._read_in_data()
self.risk_breakdown = None
self._generate_risk_breakdown()
self.risk_summary = None
self.generate_risk_summary()
def _read_in_data(self):
# read in a .csv
self.historical_data = pd.read_csv(self.data_path_a)
# read an excel file containing many sheets into an ordered dictionary
self.raw_data = pd.read_excel(self.data_path_b, sheet_name=None)
# store a specific sheet from the excel file that is used by most of
# my class's methods
self.lookup_table = self.raw_data["Lookup"]
def _generate_risk_breakdown(self):
'''
A function that creates a DataFrame from self.historical_data,
self.raw_data, and self.lookup_table and stores it in
self.risk_breakdown
'''
self.risk_breakdown = some_dataframe
def _generate_risk_summary(self):
'''
A function that creates a DataFrame from self.lookup_table and
self.risk_breakdown and stores it in self.risk_summary
'''
self.risk_summary = some_dataframe
def generate_email(self, recipient):
'''
A function that sends an email with details about high risk factors
'''
if __name__ == "__main__":
risk_model = RiskModel(data_path_a, data_path_b)
risk_model.generate_email(recipient#generic.com)

In my opinion it is a good way to organize your project, especially since you mentioned the high rate of re-usability of parts of the code.
One thing though, I wouldn't put the _read_in_data, _generate_risk_breakdown and _generate_risk_summary methods inside __init__, but instead let the user call this methods after initializing the RiskModel class instance.
This way the user would be able to read in data from a different path or only to generate the risk breakdown or summary, without reading in the data once again.
Something like this:
my_risk_model = RiskModel()
my_risk_model.read_in_data(path_a, path_b)
my_risk_model.generate_risk_breakdown(parameters)
my_risk_model.generate_risk_summary(other_parameters)
If there is an issue of user calling these methods in an order which would break the logical chain, you could throw an exception if generate_risk_breakdown or generate_risk_summary are called before read_in_data. Of course you could only move the generate... methods out, leaving the data import inside __init__.
To advocate more on exposing the generate... methods out of __init__, consider a case scenario, where you would like to generate multiple risk summaries, changing various parameters. It would make sense, not to create the RiskModel every time and read the same data, but instead change the input to generate_risk_summary method:
my_risk_model = RiskModel()
my_risk_model.read_in_data(path_a, path_b)
for parameter in [50, 60, 80]:
my_risk_model.generate_risk_summary(parameter)
my_risk_model.generate_email('test#gmail.com')

Clearest way to initialize module-level lists in Python

I've got a situation where a module needs to do some simple, but slightly time consuming initialization. One of the end conditions is a pair of lists which will be filled out by the initialization; what's bugging me is that there's a conflict between the role of the lists - which are basically intended to be constants - and the need to actually initialize them.
I feel uneasy writing code like this:
CONSTANT_LIST = []
DIFFERENT_LIST = []
for item in get_some_data_slowly():
if meets_criteria_one(item):
CONSTANT_LIST.append(item)
continue
if meets_criteria_two(item):
DIFFERENT_LIST.append(item)
Since the casual reader will see those lists in the position usually occupied by constants and may expect them to be empty.
OTOH, I'd be OK with the same under-the-hood facts if I could write this as a list comprehension:
CONSTANT_LIST = [i for i in some_data() if criterion(i)]
and so on... except that I need two lists drawn from the same (slightly time consuming) source, so two list comprehensions will make the code noticeably slower.
To make it worse, the application is such that hiding the constants behind methods:
__private_const_list = None
__other_private_list = None
def public_constant_list():
if __private_const_list: return __private_const_list
# or do the slow thing now and fill out both lists...
# etc
def public_other_const_list():
# same thing
is kind of silly since the likely use frequency is basically 1 per session.
As you can see it's not a rocket science issue but my Python sense is not tingling at all. Whats the appropriate pythonic pattern here?

The loop is quite clear. Don't obfuscate it by being too clever. Just use comments to help explain
CONSTANT_LIST = [] # Put a comment here to tell the reader that these
DIFFERENT_LIST = [] # are constants that are filled in elsewhere
"""
Here is an example of what CONSTANT_LIST looks like
...
Here is an example of what DIFFERENT_LIST looks like
...
"""
for item in get_some_data_slowly():
if meets_criteria_one(item):
CONSTANT_LIST.append(item)
elif meets_criteria_two(item):
DIFFERENT_LIST.append(item)
Maybe use elif instead of continue/if

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.