I'm implementing a Kalman filter on two types of measurements. I have a GPS measurement every second (1 Hz) and 100 acceleration measurements per second (100 Hz).
So basically I have two huge tables that have to be fused at some point. My aim is to write readable and maintainable code.
My first approach was: there is a class for each of the two data tables (so an object is a data table), and I do bulk calculations in the class methods (so almost all of my methods contain a for loop) until I get to the actual filter. I found this approach a bit too rigid: it works, but it involves a lot of data-type conversion and it is just not that convenient.
Now I want to change my code. If I wanted to stick to OOP, my second try would be: every single measurement is an object of either the GPS_measurement or the acceleration_measurement class. This approach seems better, but this way thousands of objects would be created.
My third try would be a data-driven design, but I'm not really familiar with this approach.
Which paradigm should I use? Or perhaps some mixture of the above paradigms? Or should I just use procedural programming with pandas DataFrames?
It sounds like you would want to use pandas. OOP is a concept, by the way, not something you have to apply rigidly. Generally speaking, you only want to define your own classes if you plan on extending them or encapsulating certain features. pandas and NumPy are two libraries that already do almost everything you could ask for with regard to data, and they are faster in execution.
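For example, assuming both streams carry a timestamp (the column names and random data below are made up for illustration, not from your setup), the two tables can be aligned with an asof join before running the filter. A minimal sketch:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 10 s of 1 Hz GPS fixes and 100 Hz accelerometer samples.
gps = pd.DataFrame({
    "t": pd.to_datetime(np.arange(0, 10, 1.0), unit="s"),
    "lat": np.random.randn(10),
    "lon": np.random.randn(10),
})
accel = pd.DataFrame({
    "t": pd.to_datetime(np.arange(0, 10, 0.01), unit="s"),
    "ax": np.random.randn(1000),
    "ay": np.random.randn(1000),
})

# Pair each accelerometer sample with the most recent GPS fix (asof join),
# producing one wide table to feed into the filter loop.
fused = pd.merge_asof(accel.sort_values("t"), gps.sort_values("t"),
                      on="t", direction="backward")
print(fused.head())
```

This keeps the bulk data in two DataFrames and leaves the filter itself as ordinary procedural code iterating over the fused table.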
I'm writing a z3 Python program that is running a bit slow. The reason it runs slowly is that one part of the program adds many harder constraints in nested for loops. My instructor told us that adding equality constraints would make the program run faster by limiting the possibilities the solver goes through when the line Solver().check() is reached.
I'm wondering whether I should be adding the equality constraints before the "harder" constraints to make it go faster, or whether the equality constraints should go after the "harder" constraints?
I would want the equality constraints to be checked first to limit the possibilities for the harder constraints, so I assume s.add(x == y), or something like it, should be added first so that it is checked first?
These sorts of questions come up often, and the honest answer is that the solver's performance on any given problem depends on many factors. Changing the order of constraints should, in general, have no effect, but in practice it often does. (See https://github.com/Z3Prover/z3/issues/5559 as an example.) Even renaming variables (something you'd think would have no effect) can change performance. (See, for instance, https://github.com/Z3Prover/z3/issues/5147.)
If you're having performance problems, it's best to treat it as a modeling issue: i.e., how can you formulate your problem so it's "easier" to solve, instead of playing the never-ending guessing game of reordering constraints? I suggest you post your actual encoding and ask for specific advice about that problem. As stated, your question is unanswerable, in the sense that there is no single strategy that works well in all cases.
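To illustrate why ordering is not a semantic question (toy constraints below, not your encoding): both solvers assert the same set of constraints, just in a different order, so any timing difference comes purely from solver heuristics.

```python
from z3 import Solver, Ints

x, y = Ints("x y")

# Same constraint set, added in two different orders.
s1 = Solver()
s1.add(x == y)              # equality first
s1.add(x * x + y * y == 8)  # "harder" nonlinear constraint second

s2 = Solver()
s2.add(x * x + y * y == 8)  # harder constraint first
s2.add(x == y)              # equality second

# Logically both solvers see the same assertions; both report sat here.
print(s1.check(), s2.check())
```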
I have been working on a program to solve a Rubik's cube, and I found myself unsure of a couple of things. Firstly, I have a bunch of functions with general applications (e.g., clearing the screen), and I don't know whether I should turn them into methods inside a class. Part of my code is OOP and the other part is procedural. Does this violate PEP 8 / is it bad practice? And in broader terms, is it a bad idea to mix two styles of programming (OOP, functional, procedural, etc.)?
I would say it's not bad at all, if the problem can be solved more easily that way and your language supports it.
Programming paradigms are just different approaches to solving problems:
Procedural says "What are the steps that need to be performed to solve this problem?"
Functional says "What values should be transformed and how should they be transformed to solve this problem?"
OOP says "What objects need to interact with one another and what messages are needed to be sent between them to solve this problem?"
When you divide up your problem into smaller ones, you might find that some parts can be solved more easily in a functional way and other parts in a procedural way. It's entirely possible to have such a problem.
Python is mainly procedural and quite object-oriented, and it has functional features (notably the functools module). It's not as functional as Haskell and not as OOP as C#, but it lets you use those paradigms to some extent.
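As a contrived illustration (none of this is from your program), one small script can happily mix all three styles:

```python
from functools import reduce

# Procedural: a free-standing helper with a general purpose.
def clear_screen():
    print("\033[2J\033[H", end="")

# OOP: state and behaviour that belong together live in a class.
class Cube:
    def __init__(self, moves=None):
        self.moves = list(moves or [])

    def apply(self, move):
        self.moves.append(move)
        return self

# Functional: transform values without mutating them.
def total_turns(moves):
    return reduce(lambda acc, m: acc + (2 if m.endswith("2") else 1), moves, 0)

cube = Cube().apply("R").apply("U2").apply("F'")
clear_screen()
print(total_turns(cube.moves))  # 4
```

The general-purpose helper stays a plain function; only the state that belongs to the cube lives in the class.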
As mentioned in PEP 484:
Using type hints for performance optimizations is left as an exercise for the reader.
Assuming one would be interested in doing this exercise, how hard would it be to undertake, even partially? Is there prior art for using type-hinting in an interpreted language to improve execution speed or is this only possible by going with a JIT compiler?
I should also note that I understand this is a non-goal and that:
Python will remain a dynamically typed language, and the authors have no desire to ever make type hints mandatory, even by convention.
Consequently, I understand that efforts to improve speed this way would go against that by encouraging type hints by convention. However, I'm still curious about how difficult the task would be.
Update: Although this question is too broad for this site, it is partially answered in the PyPy FAQ:
... the speed benefits would be extremely minor. There are several reasons for why. One of them is that annotations are at the wrong level (e.g. a PEP 484 “int” corresponds to Python 3’s int type, which does not necessarily fits inside one machine word; even worse, an “int” annotation allows arbitrary int subclasses). Another is that a lot more information is needed to produce good code (e.g. “this f() called here really means this function there, and will never be monkey-patched” – same with len() or list(), btw). The third reason is that some “guards” in PyPy’s JIT traces don’t really have an obvious corresponding type (e.g. “this dict is so far using keys which don’t override __hash__ so a more efficient implementation was used”). Many guards don’t even have any correspondence with types at all (“this class attribute was not modified”; “the loop counter did not reach zero so we don’t need to release the GIL”; and so on).
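A quick illustration of why the hints alone don't help the interpreter: CPython only stores annotations as metadata and never checks or uses them at runtime.

```python
def add(x: int, y: int) -> int:
    return x + y

# The hints are stored but not enforced or used for optimization by CPython.
print(add.__annotations__)  # {'x': <class 'int'>, 'y': <class 'int'>, 'return': <class 'int'>}
print(add("a", "b"))        # 'ab' -- no runtime error, despite the int hints
```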
I have two CSV files, and I would like to validate (find the differences and similarities between) the data in these two files.
I am retrieving this data from Vertica, and because the data is so large I would like to do the validation at the CSV level.
csvdiff allows you to compare the semantic contents of two CSV files, ignoring things like row and column ordering in order to get to what’s actually changed. This is useful if you’re comparing the output of an automatic system from one day to the next, so that you can look at just what’s changed.
I don't think you can directly compare sheets using openpyxl without manually looping over each row and writing your own validation code.
Whether that matters depends on your performance goals: if speed is not a requirement, then why not, but it will require some additional work.
Instead, I would use pandas DataFrames for any CSV validation needs. If you can add this dependency, it becomes much easier to compare files while keeping performance high.
Here is a link to complete example:
http://pbpython.com/excel-diff-pandas.html
However, use read_csv() instead of read_excel() to read data from your files.
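If it helps, here is a small self-contained sketch of the idea using an outer merge with an indicator column (the file names, and the assumption that both files share the same columns, are made up for illustration):

```python
import pandas as pd

# Hypothetical file names; adjust to your exports.
old = pd.read_csv("export_day1.csv")
new = pd.read_csv("export_day2.csv")

# Outer-join on all shared columns and flag where each row came from.
diff = old.merge(new, how="outer", indicator=True)
only_in_old = diff[diff["_merge"] == "left_only"]
only_in_new = diff[diff["_merge"] == "right_only"]
in_both     = diff[diff["_merge"] == "both"]

print(len(only_in_old), "rows only in the first file")
print(len(only_in_new), "rows only in the second file")
print(len(in_both), "rows present in both files")
```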
Perhaps this is not the correct place to ask this question, and part of me thinks that there is no real answer to it, but I'm interested to see what experienced Python users have to say on the subject:
For maximum readability, concision, and utility, what is a range for an optimal length of a Python function? (Assuming that this function will be used in combination with other functions to do something useful.)
I recognize that this is incredibly dependent on the task at hand, but as a sophomore computer science major, one of the most consistent instructions from professors is to write programs composed of short functions that break the work into "simple", discrete tasks.
I've done a bit of digging, including through the Python style guide, but I haven't come up with a good answer. If any experienced Python users would like to weigh in on this subject, I would appreciate the insight. Thanks.
I'm sure a lot of people have strong opinions about this, but for new programmers a good rule of thumb is to try to keep a function below 10-20 lines. A better rule of thumb is that a function should do one thing and do it well. If it becomes really long, it is likely doing more than one thing and can be broken down into several functions.
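For illustration (a contrived sketch, not taken from your program): the same work written as one longer function, then split into short functions that each do one thing.

```python
# One function doing several things: reading, parsing, and reporting.
def process(path):
    with open(path) as f:
        lines = [line.strip() for line in f]
    values = [float(x) for x in lines if x]
    print(f"{len(values)} values, total {sum(values)}")

# The same work split into short, single-purpose functions.
def read_lines(path):
    with open(path) as f:
        return [line.strip() for line in f]

def parse_values(lines):
    return [float(x) for x in lines if x]

def report(values):
    print(f"{len(values)} values, total {sum(values)}")

def process_split(path):
    report(parse_values(read_lines(path)))
```

Each small function is easy to name, test, and reuse, which matters more than hitting any particular line count.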