Importing code from one script into another script - python

I'm new to Python so I searched for beginner projects in order to practice my skills. I came across a project on Edureka where you have to program a simple word game called Hangman (https://www.edureka.co/blog/python-projects/#hangman). The whole code consists of different scripts, and part of one script is then imported into another, like in this case (Words.py):
import random
WORDLIST = 'wordlist.txt'
def get_random_word(min_word_length):
...
and then (Hangman.py):
from string import ascii_lowercase
from words import get_random_word
So they first created a function in one script, Words.py, and then imported it into another script, Hangman.py. My question is: why is code sometimes separated into several scripts, with parts of one imported into another? Can't one script just contain everything?
Thank you

Using multiple files to create sub-modules helps keep the code organised and makes reusing code between projects/functions much easier.
Functions and variables defined within a module are importable into other modules, and this lets you scope your function and variable names without worrying about conflicts.
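For illustration, here is a minimal sketch of how that split works in practice, modelled on the tutorial's file names; the body of get_random_word below is a hypothetical completion, not the tutorial's actual code, and it assumes wordlist.txt holds one word per line:
# words.py
import random

WORDLIST = 'wordlist.txt'

def get_random_word(min_word_length):
    """Pick a random word with at least min_word_length letters."""
    with open(WORDLIST) as wordlist:
        words = [w.strip() for w in wordlist if len(w.strip()) >= min_word_length]
    return random.choice(words)

# hangman.py
from words import get_random_word

secret_word = get_random_word(5)   # reuse the function without redefining it
print(len(secret_word), "letters to guess")
Both files can define or import whatever names they need without ever clashing with names in the other file.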

It is basic organisation. Imagine a library that glued every new book onto a stack of the old ones. Lord of the Rings would turn from a door stopper into a door. After defeating Sauron, the reader would smoothly transition to The Little Mermaid, before plunging into 50 Shades of Grey.
Small files are easier to stomach. If every file serves only one topic, you know quickly where to look. You also immediately know what does not belong to the topic, without having to read through commentary.
Multiple files allow for non-linear organisation. The building blocks of a program rarely follow a single, linear chain of interactions. Loosely coupled components are easily represented by individual files, and folders allow you to add external structure.
Distinct files are easier to reorganise. As complexity grows, components move to sub packages, and sometimes you just need to clean up. A file can simply be moved as a whole. Copy/Pasting to migrate code is more work, especially if you have tacked on all the structuring manually.
Basically, a single file is easy to write. Multiple files are much easier to read, maintain and manage. For software development, the latter is much more important. Even if you are working alone, future-you does not know everything that past-you has done.

Related

Visualize class dependencies in Python

I have started a new job and there are 100k lines of code written in Python 2.7 across four different repos.
The code is sometimes quite nested, with many library imports and a complex class structure, and no documentation.
I want to create a graph of the dependencies in order to understand the code better.
I have not found anything on the internet except https://pypi.org/project/pydeps/ but that is not working for some unknown reason.
The solution should either query all python files in the four repos automatically, or it should take a single python file with some function call I have saved, and then go through all dependencies and graphically display them.
A good solution would also display which arguments (or keyword arguments) are passed, or how often a function is used within the 100k lines of code, to understand which methods are more important, etc. This is not a strong requirement, however.
If someone could post one or more python libraries (or VSCode extensions) that would be much appreciated.

Structure of a python project that is not a package

If you search the internet for Python project structures, you will find articles about Python package structure. Based on that, what I want to know is whether there are any guidelines for structuring Python projects that aren't packages, that is, projects where the code is the end product itself.
For example, I created a package that handles requests to some specific endpoints. This package will serve the main code, which will handle the data fetched by the package. The main code is not a package, that is, it doesn't have classes or __init__.py files, because in this software layer there is no need for code reuse. Instead, the main code relates directly to the end goal itself.
Are there any guidelines for this?
It would be good to see the structure itself instead of reading a description of it - it would help to visualize the problem and answer your case properly 😉
projects that aren't packages, that is, projects where the code is the end product itself
In general, I would say you should always structure your code! And by that, I mean exactly this work with modules/packages. It is needed mostly to separate responsibilities and to introduce things that can be reused. It also makes it possible to find things more easily and quickly, instead of digging through tons of unstructured code.
Of course, as I said, this is a general thought, and as you gain experience you can experiment with the structure to find the best one for the project you are working on. But without any structure, you won't survive in a bigger project (or life will be harder than you want).
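As a rough sketch of what that can look like for a project whose code is the end product (all names here are made up): the reusable part stays a package, and the main code is just a plain script that imports it.
# main.py -- plain top-level script, not a package
from endpoints_client import fetch_data   # hypothetical reusable package

def run():
    data = fetch_data()                    # the records fetched by the package
    print("fetched", len(data), "records")

if __name__ == "__main__":
    run()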

Keep files (modules) as "big" as possible [duplicate]

I'm working on a Python web application in which I have some small modules that serve very specific functions: session.py, logger.py, database.py, etc. And by "small" I really do mean small; each of these files currently includes around 3-5 lines of code, or maybe up to 10 at most. I might have a few imports and a class definition or two in each. I'm wondering, is there any reason I should or shouldn't merge these into one module, something like misc.py?
My thoughts are that having separate modules helps with code clarity, and later on, if by some chance these modules grow to more than 10 lines, I won't feel so bad about having them separated. But on the other hand, it just seems like such a waste to have a bunch of files with only a few lines in each! And is there any significant difference in resource usage between the multi-file vs. single-file approach? (Of course I'm nowhere near the point where I should be worrying about resource usage, but I couldn't resist asking...)
I checked around to see whether this had been asked before and didn't see anything specific to Python, but if it's in fact a duplicate, I'd appreciate being pointed in the right direction.
My thoughts are that having separate modules helps with code clarity, and later on, if by some chance these modules grow to more than 10 lines, I won't feel so bad about having them separated.
This. Keep it the way you have it.
As a user of modules, I greatly prefer when I can include the entire module via a single import. Don't make a user of your package do multiple imports unless there's some reason to allow for importing different alternates.
BTW, there's no reason a single module can't consist of multiple source files. The simplest case is to use an __init__.py file to load all the other code into the module's namespace.
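For example, assuming the asker's session.py, logger.py and database.py live in a hypothetical package directory mypkg/, the __init__.py could look like this (the names Session, get_logger and connect are made up):
# mypkg/__init__.py -- pulls the pieces into one namespace
from .session import Session
from .logger import get_logger
from .database import connect
Users of the package then need only a single import, such as from mypkg import Session, get_logger.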
Personally I find it easier to keep things like this in a single file, just for the practicality of editing a smaller number of files in my editor.
The important thing to do is treat the different pieces of code as though they were in separate files, so you ensure that you can trivially separate them later, for the reasons you cite. So for instance, don't introduce dependencies between the different pieces that will make it hard to disentangle them later.
For command-line scripts there will most likely not be much difference, unless each invocation imports all the files in the module, in which case there will be a slight performance cost as n files need to be opened instead of one.
For mod_python there most likely will be no difference as byte-compiled modules stay alive for the duration of the apache process.
For google app engine though there will be a performance hit unless the service is constantly used and is "hot" as each cold start would require opening all files.
Of course you can have as many modules as you like.
But now let us think a little about what happens when we put every small code snippet into its own file.
We will end up with hundreds of import statements in any non-trivial module. And of course you might gain a little by having everything explicit in separate files. But guess what: nobody can remember that many module names, and you might end up searching for the right file anyway...
I try to put things that belong together in one single file (unless it becomes too big!). But when I have small functions or classes that do not belong to other components in my system, I have "util" modules or the like. I also try to group these, for example according to my application layering, or separate them by other means. One separation criterion could be: utilities that are used for the UI and those that are not.
Small.

Tkinter with or without OOP

I'm studying Tkinter and I've only found tutorials on Tkinter without OOP, but looking at the Python.org documentation it looks like it's all in OOP. What's the benefit of using classes? It seems like more work, and the syntax looks night and day different from what I've learned so far.
This is going to be a really generic answer, and most of the answers to this will be opinionated anyway. Speaking of which, the answer will likely be downvoted and closed because of this.
Anyway... Let's say you have a big GUI with a bunch of complicated logic. Sure, you could write one huge file with hundreds, if not thousands, of lines, proxy a bunch of stuff through different functions, and make it work. But the logic is messy.
What if you could compartmentalize different sections of the GUI and all the logic surrounding them, then take those components and aggregate them into the sum which makes up the GUI?
This is exactly what you can use classes for in Tkinter. More generally, this is essentially what you use classes for: abstracting things into reusable objects (instances) which provide a useful utility.
Example:
An app I built ages ago with Tkinter, when I first learned it, was a file-moving program. It let you select the source/destination directory, and had logging capabilities, search functions, monitoring of processes for when downloads complete, regex renaming options, unzipping of archives, etcetera. Basically, everything I could think of for moving files.
So, what I did was split the app up like this (at a high level):
1) Have a main window which is the aggregate of the components forming the main GUI.
The aggregates were essentially a sidebar, buttons/labels for selecting various options split into their own sections as needed, and a scrolled text area for operation logging + search.
So, the main components were split like this:
2) A sidebar which had the following components:
A section which contained the options for monitoring processes
A section which contained options for custom regular expressions, or premade ones, for renaming files
A section for various flags such as unpacking
3) A logging / text area section with search functionality built in, plus the options to dump (save) log files or view them.
That's a high-level description of the "big" components, which were themselves composed of smaller components that were their own classes. So, by using classes I was able to wrap the complicated logic up into small, self-contained pieces.
Granted, you can do the same thing with functions, but this way you have "pieces" of the GUI which you can consider objects (classes) that fit together. So, it just makes for cleaner code / logic.
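A minimal sketch of that idea (the widget names and layout here are invented, not the actual app described above):
import tkinter as tk

class Sidebar(tk.Frame):
    """One self-contained GUI section: source/destination selection."""
    def __init__(self, master):
        super().__init__(master, borderwidth=1, relief="sunken")
        tk.Button(self, text="Choose source...").pack(fill="x")
        tk.Button(self, text="Choose destination...").pack(fill="x")

class LogArea(tk.Frame):
    """Another section: a text area for operation logging."""
    def __init__(self, master):
        super().__init__(master)
        self.text = tk.Text(self, height=10)
        self.text.pack(fill="both", expand=True)

class App(tk.Tk):
    """The main window is just the aggregate of the components."""
    def __init__(self):
        super().__init__()
        Sidebar(self).pack(side="left", fill="y")
        LogArea(self).pack(side="right", fill="both", expand=True)

if __name__ == "__main__":
    App().mainloop()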
Like what pythonista just said...
OOP makes your GUI code more organized, and if you need to create new windows, e.g. Toplevel(), you will find it extremely useful because you won't need to write all that code again and again and again... Plus, if you have to use variables that are inside another function, you won't need to declare them as global. OOP with Tkinter is the best approach.
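For instance, a window defined once as a (hypothetical) Toplevel subclass can be reused every time it is needed:
import tkinter as tk

class MessageWindow(tk.Toplevel):
    """A reusable pop-up window; define it once, open it as often as you like."""
    def __init__(self, master, message):
        super().__init__(master)
        self.title("Message")
        tk.Label(self, text=message).pack(padx=20, pady=20)

root = tk.Tk()
tk.Button(root, text="Open window",
          command=lambda: MessageWindow(root, "Hello again")).pack()
root.mainloop()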

Best practices for turning jupyter notebooks into python scripts

Jupyter (IPython) notebook is deservedly known as a good tool for prototyping code and doing all kinds of machine learning work interactively. But when I use it, I inevitably run into the following:
the notebook quickly becomes too complex and messy to be maintained and improved further as a notebook, and I have to make python scripts out of it;
when it comes to production code (e.g. one that needs to be re-run every day), the notebook again is not the best format.
Suppose I've developed a whole machine learning pipeline in jupyter that includes fetching raw data from various sources, cleaning the data, feature engineering, and training models after all. Now what's the best logic to make scripts from it with efficient and readable code? I used to tackle it several ways so far:
Simply convert .ipynb to .py and, with only slight changes, hard-code all the pipeline from the notebook into one python script.
'+': quick
'-': dirty, non-flexible, not convenient to maintain
Make a single script with many functions (approximately one function for each one or two cells), trying to capture the stages of the pipeline as separate functions, and name them accordingly. Then specify all parameters and global constants via argparse.
'+': more flexible usage; more readable code (if you properly transformed the pipeline logic to functions)
'-': oftentimes, the pipeline is NOT splittable into logically complete pieces that could become functions without any quirks in the code. All these functions typically need to be called only once in the script rather than many times inside loops, maps, etc. Furthermore, each function typically takes the output of all the functions called before it, so one has to pass many arguments to each function.
The same thing as point (2), but now wrap all the functions inside a class. Now all the global constants, as well as the outputs of each method, can be stored as class attributes.
'+': you needn't pass many arguments to each method -- all the previous outputs are already stored as attributes
'-': the overall logic of the task is still not captured -- it is a data and machine learning pipeline, not just a class. The only goal of the class is to be created, have all its methods called sequentially one by one, and then be removed. On top of this, classes take quite a while to implement. (A rough sketch of this approach follows the list below.)
Convert the notebook into a python module with several scripts. I didn't try this out, but I suspect this is the longest way to deal with the problem.
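For illustration, a minimal sketch of approach (3); the class name, file path and "model" are placeholders, the point is only that each method stores its output as an attribute for the next step to pick up:
class TrainingPipeline:
    def __init__(self, raw_path):
        self.raw_path = raw_path

    def load(self):
        # read the raw data and keep it as an attribute
        with open(self.raw_path) as f:
            self.raw = f.read().splitlines()

    def clean(self):
        # later steps read earlier outputs from attributes, no arguments needed
        self.rows = [r.strip().lower() for r in self.raw if r.strip()]

    def train(self):
        self.model = {"n_samples": len(self.rows)}   # stand-in for a real model

    def run(self):
        self.load()
        self.clean()
        self.train()
        return self.model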
I suppose this overall setting is very common among data scientists, but surprisingly I cannot find any useful advice on it.
Folks, please, share your ideas and experience. Have you ever encountered this issue? How have you tackled it?
Life saver: as you're writing your notebooks, incrementally refactor your code into functions, writing some minimal assert tests and docstrings.
After that, refactoring from notebook to script is natural. Not only that, but it makes your life easier when writing long notebooks, even if you have no plans to turn them into anything else.
Basic example of a cell's content with "minimal" tests and docstrings:
import os
import zipfile
from contextlib import closing

def zip_count(f):
    """Given a zip filename, return the number of files inside.
    str -> int"""
    with closing(zipfile.ZipFile(f)) as archive:
        num_files = len(archive.infolist())
    return num_files

zip_filename = 'data/myfile.zip'
# Make sure `myfile` always has three files
assert zip_count(zip_filename) == 3
# And total zip size is under 2 MB
assert os.path.getsize(zip_filename) / 1024**2 < 2
print(zip_count(zip_filename))
Once you've exported it to bare .py files, your code will probably not be structured into classes yet. But it is worth the effort to refactor your notebook to the point where it has a set of documented functions, each with a set of simple assert statements that can easily be moved into tests.py for testing with pytest, unittest, or whatever you prefer. If it makes sense, bundling these functions into methods for your classes is dead easy after that.
If all goes well, all you need to do after that is to write your if __name__ == '__main__': block and its "hooks": if you're writing a script to be called from the terminal, you'll want to handle command-line arguments; if you're writing a module, you'll want to think about its API and the __init__.py file, etc.
It all depends on what the intended use case is, of course: there's quite a difference between converting a notebook to a small script vs. turning it into a full-fledged module or package.
Here are a few ideas for a notebook-to-script workflow:
Export the Jupyter Notebook to Python file (.py) through the GUI.
Remove the "helper" lines that don't do the actual work: print statements, plots, etc.
If need be, bundle your logic into classes. The only extra refactoring work required should be to write your class docstrings and attributes.
Write your script's entry points with if __name__ == '__main__'.
Separate your assert statements for each of your functions/methods, and flesh out a minimal test suite in tests.py.
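As a rough sketch of the entry-point step, assuming the zip_count function from the cell above now lives in the exported .py file (the argument name here is made up):
import argparse

def main():
    parser = argparse.ArgumentParser(description="Count the files inside a zip archive")
    parser.add_argument("zip_path", help="path to the .zip file")
    args = parser.parse_args()
    print(zip_count(args.zip_path))   # zip_count as defined above

if __name__ == '__main__':
    main()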
We are having a similar issue. However, we are using several notebooks for prototyping outcomes, which should eventually become several python scripts as well.
Our approach is to put aside the code that seems to repeat across those notebooks. We put it into a python module, which is imported by each notebook and also used in production. We continuously and iteratively improve this module and add tests for what we find during prototyping.
The notebooks then become more like configuration scripts (which we just plainly copy into the resulting python files) plus assorted prototyping checks and validations, which we do not need in production.
Most of all we are not afraid of the refactoring :)
I made a module recently (NotebookScripter) to help address this issue. It allows you to invoke a jupyter notebook via a function call. It's as simple to use as:
from NotebookScripter import run_notebook
run_notebook("./path/to/Notebook.ipynb", some_param="Provided Exteranlly")
Keyword parameters can be passed to the function call. It's easy to adapt a notebook to be parameterizable externally.
Within a .ipynb cell
from NotebookScripter import receive_parameter
some_param = receive_parameter(some_param="Returns this value by default when a matching keyword is not provided by the external caller")
print("some_param={0} within the invocation".format(some_param))
run_notebook() supports .ipynb files or .py files -- allowing one to easily use .py files such as those generated by nbconvert or VS Code's ipython. You can keep your code organized in a way that makes sense for interactive use, and also reuse/customize it externally when needed.
You should break down the logic into small steps; that way your pipeline will be easier to maintain. Since you already have a working codebase, you want to keep your code running, so make small changes, test, and repeat.
I'd go this way:
Add some tests to your pipeline. For ML pipelines this is a bit hard, but if your notebook trains a model, you can use performance metrics to test whether your pipeline still works (your test can be accuracy = 0.8, but make sure you define a tolerable range, since the number will hardly be exactly the same on each run; a sketch of such a check follows this list)
Break apart your single notebook into smaller ones; the output from one should be the input for the next. As soon as you create a split, make sure you add a few tests for each notebook individually. To manage this sequential execution, you can use papermill to execute your notebooks, or a workflow management tool such as ploomber, which integrates with papermill, is able to resolve complex dependencies, and has a hook to run tests upon notebook execution (Disclaimer: I'm ploomber's author)
Once you have a pipeline composed of several notebooks that passes all your tests, you can decide whether you want to keep using the ipynb format or not. My recommendation would be to keep as notebooks only the tasks that have rich output (such as tables or plots); the rest can be refactored into Python functions, which are more maintainable.
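A minimal sketch of the tolerance-based check mentioned in the first point above (the expected value and margin are placeholders):
def check_accuracy(accuracy, expected=0.80, tolerance=0.05):
    """Fail if the pipeline's metric drifts outside the accepted range."""
    assert abs(accuracy - expected) <= tolerance, (
        "accuracy {:.3f} outside {} +/- {}".format(accuracy, expected, tolerance))

check_accuracy(0.82)   # e.g. the accuracy produced by the training notebook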
