Process suddenly accumulating memory when running in a Docker container - python

I wrote a script that collects a time series covering the last year via an API call (requests library) and then calculates its average. It runs an infinite loop (while True:) with a sleep of a couple of minutes: each iteration requests the new data, cuts off everything older than one year, concatenates the new data, and recalculates the average. The current time series (a whole year) and the average are stored in a class object created before the loop starts. So it's basically:
import time

Obj = Obj_Class()        # holds the current one-year time series and its average

while True:
    updateAverage()      # fetch new data, drop points older than one year, recompute the average
    postAverage()        # publish the current average
    time.sleep(60)
This all runs fine and the memory required is relatively constant at around 60-80MB.
Now, for deployment, it has been put in a Docker container and run on the server. For that, the API calls were changed to grab the data directly from the Influx database hosted on the same server, and postAverage() no longer just prints the data but also writes it into the Influx database.
That is all that changed, yet now the memory (RAM) grows continuously (I killed the process after it reached 1 GB). I have no clue why this is happening. Does anyone have an idea what the reason could be, or where I should look? I know it is probably impossible to tell without seeing my code, but I figured someone here might have run into something similar and could offer advice.
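Not an answer in itself, but a hedged way to narrow it down: Python's built-in tracemalloc can show which call sites keep accumulating allocations from one loop iteration to the next. The sketch below is generic; updateAverage/postAverage are just the names from the question.

import time
import tracemalloc

tracemalloc.start()
baseline = tracemalloc.take_snapshot()

while True:
    updateAverage()
    postAverage()

    # Compare current allocations against the baseline and print the ten
    # call sites whose memory footprint has grown the most since startup.
    snapshot = tracemalloc.take_snapshot()
    for stat in snapshot.compare_to(baseline, "lineno")[:10]:
        print(stat)

    time.sleep(60)

If the growth shows up in the database client or in the concatenation step, a common culprit after a change like this is creating a new client or HTTP session on every iteration without closing it, so checking that the Influx client is created once and reused is also worth doing.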

Related

How can I trigger a function in Django and read its output without impacting that function's execution time?

I've been trying to find a solution to my issue for some time now, but I haven't come across anything intuitive enough to feel like the "right" solution.
I'm building an electron app that uses django as the backend. The backend is responsible for running some long processes that are time critical. For example, I have a loop that continuously takes data for about 5 seconds. When I run that function standalone, it takes a data point about every 10 ms, however, when I run it through django it takes a data point anywhere from 10 ms to 70 ms or even longer. This makes sense to me intuitively because django is sharing thread time to keep responding to the frontend. However, this is unacceptable for my application.
The delays seem to be related to returning data to the frontend. Basically, there's a static variable in the view class that's a container for the result data, then the measurement is triggered from the front end and the measurement populates that static variable with data. When the measurement is running, the front end queries django once a second for updated data so it can plot it for the user to track progress.
I first tried using threading.Thread to create a new thread to run the measurement. This thread gets triggered by the django view, but that doesn't fix the issue. Ok, maybe this makes sense too because the thread is still sharing processing time with the main thread?
The next step seems to be creating an entirely new subprocess. However, I'd still like to be passing data back to the front end while the script runs, so short of dropping a file, I don't know how to do that.
Is there an officially supported way to run a function from Django whose execution time won't be impacted by the fact that it's being triggered from Django?
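Not an official Django mechanism, but one common pattern is exactly the subprocess route described above, with a multiprocessing.Queue carrying results back so nothing has to be written to a file. A minimal sketch under that assumption; read_data_point() and the helper functions are hypothetical names:

import multiprocessing as mp
import time
from queue import Empty

def measurement_worker(queue):
    # Runs in its own process, so load on the Django workers cannot slow it down.
    start = time.monotonic()
    while time.monotonic() - start < 5.0:
        queue.put(read_data_point())   # hypothetical acquisition call
        time.sleep(0.01)               # ~10 ms sampling interval

result_queue = mp.Queue()

def start_measurement():
    # Called from the Django view that triggers the measurement.
    mp.Process(target=measurement_worker, args=(result_queue,), daemon=True).start()

def drain_results():
    # Called from the view the frontend polls once a second.
    points = []
    try:
        while True:
            points.append(result_queue.get_nowait())
    except Empty:
        pass
    return points

One caveat: this only works if the polling requests land on the same Django process that started the measurement, so it assumes a single-worker deployment; with multiple workers the queue would have to be replaced by something external such as Redis or a database table.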

Python script is using all my RAM - is there a way to reset it?

I have been working on an MMO bot for fun, and the script stores previous data points of where my character was so it can continue from there. After a few hours I came back to my machine showing a memory error, with nothing on the computer responding, forcing a restart. Is there any command I can give the script to reset the memory it has cached up?
With the bot, I don't need to keep this data for more than a few seconds, or minutes at most; the stored-up data does nothing for me. Does anyone have a way to wipe the stored memory after a given time and start fresh?
You can use this script:
import subprocess

# Runs the macOS "purge" command, which flushes the operating system's disk cache.
subprocess.call(["purge"])
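Since the question says the old position data is only needed for seconds to minutes, a more direct fix may be to bound the history itself rather than trying to reclaim memory afterwards. A minimal sketch assuming the positions live in a list-like container; the names here are illustrative:

import time
from collections import deque

# Keep only the most recent 1,000 positions; older entries are discarded
# automatically, so the container can never grow without bound.
recent_positions = deque(maxlen=1000)

def record_position(x, y):
    recent_positions.append((x, y))

# Or, for a time-based window, drop everything older than a couple of minutes:
timed_positions = deque()

def record_position_timed(x, y, max_age_seconds=120):
    now = time.time()
    timed_positions.append((now, x, y))
    while timed_positions and now - timed_positions[0][0] > max_age_seconds:
        timed_positions.popleft()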

How to store BIG DATA as global variables in Dash Python?

I have a problem with my Dash application deployed on a remote office server. Two users running the app interfere with each other because of a table import followed by table pricing (the pricing code is around 10,000 lines and produces 8 tables). Looking around on the internet, I saw that the usual solution is to store the data in a hidden html.Div after converting the dataframes to JSON. However, that is not workable here because I have to store 9 tables totaling 200,000 rows and 500 columns. So I looked into the caching solution. That option doesn't produce errors, but it increases the program's execution time considerably: going from a table of 20,000 vehicles to 200,000, the compute time increases by almost a factor of 1,000, and it is painful every time I change the graph settings.
I use a filesystem cache, following example 4 from https://dash.plotly.com/sharing-data-between-callbacks. By timing things, I noticed that accessing the cache is not the problem (about 1 second); converting the JSON tables back to dataframes is (almost 60 seconds per callback). Sixty seconds is also roughly the pricing time, so calling the cache in a callback costs as much as re-running the pricing in a callback.
1/ Is there a way to cache a dataframe directly, not JSON, whether via a cache, a technique like the hidden html.Div, a cookie system, or any other method?
2/ With Redis or Memcached, do the callbacks have to return JSON?
3/ If so, how do I set it up, following example 4 from the link above? I get the error "redis.exceptions.ConnectionError: Error 10061 connecting to localhost:6379. No connection could be made because the target machine actively refused it."
4/ Do you also know whether shutting down the application automatically deletes the cache, regardless of the default_timeout?
I think your issue can be solved using dash_extensions, specifically its server-side callback caches; it might be worth a shot to implement.
https://community.plotly.com/t/show-and-tell-server-side-caching/42854
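On question 1/ specifically: independently of dash-extensions, the JSON round trip can be avoided with the Flask-Caching setup that example 4 already uses, because the filesystem backend pickles whatever the memoized function returns, so the function can return the DataFrame itself. A rough sketch; load_and_price() is a hypothetical stand-in for the import-plus-pricing step:

import dash
from flask_caching import Cache

app = dash.Dash(__name__)
cache = Cache(app.server, config={
    "CACHE_TYPE": "filesystem",
    "CACHE_DIR": "cache-directory",
    "CACHE_DEFAULT_TIMEOUT": 3600,
})

@cache.memoize()
def get_priced_table(settings_key):
    # Expensive import + pricing step (hypothetical). It runs once per
    # settings_key; afterwards the pickled DataFrame is read straight back
    # from disk, with no to_json()/read_json() conversion in the callbacks.
    return load_and_price(settings_key)

# Inside a callback:
#     df = get_priced_table(settings_key)

Unpickling a 200,000-row table is not free either, but it is usually far cheaper than a JSON round trip, so it is worth timing on the real data before committing to either approach.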

Ansible runner runs too long

When I use Ansible's Python API to run a script on remote machines (thousands of them), the code is:
import ansible.runner

runner = ansible.runner.Runner(
    module_name='script',
    module_args='/tmp/get_config.py',
    pattern='*',
    forks=30
)
Then I run:
datastructure = runner.run()
This takes too long. I want to insert each host's stdout from the returned datastructure into MySQL. What I want is: as soon as one machine has returned data, insert its data into MySQL, then the next, until all the machines have returned.
Is this a good idea, or is there a better way?
The runner call will not complete until every machine has either returned data, proven unreachable, or had its SSH session time out. Given that this targets thousands of machines and you're only running 30 in parallel (forks=30), it's going to take roughly time_to_run_script * num_machines / 30 to complete. Does that align with your expectation?
You could raise the number of forks to a much higher value so the runner completes sooner. I've pushed this into the hundreds without much issue.
If you want maximum visibility into what's going on, or aren't sure whether a single machine is holding you up, you could run through the hosts serially in your Python code, as sketched below.
FYI - this module and class are completely gone in Ansible 2.0, so you might want to make the jump now to avoid having to rewrite the code later.
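A rough sketch of that serial, per-host approach, assuming the Ansible 1.x result layout (a dict with 'contacted' and 'dark' keys, each mapping host names to result dicts) and a hypothetical insert_into_mysql() helper; verify both against your version:

import ansible.runner

hosts = ["host1", "host2"]  # replace with your inventory host names

for host in hosts:
    runner = ansible.runner.Runner(
        module_name='script',
        module_args='/tmp/get_config.py',
        pattern=host,   # target a single host per run
        forks=1,
    )
    result = runner.run()

    # 'contacted' holds results from reachable hosts, 'dark' the unreachable ones.
    for hostname, data in result.get('contacted', {}).items():
        insert_into_mysql(hostname, data.get('stdout', ''))  # hypothetical helper
    for hostname, error in result.get('dark', {}).items():
        print("unreachable: %s -> %s" % (hostname, error))

Running one host at a time gives up all the parallelism, so a middle ground is to split the inventory into smaller batches, use each batch as the pattern, and insert that batch's results before moving on.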

Postgres database: when does a job get killed?

I am using a Postgres database with SQLAlchemy and Flask. I have a couple of jobs that have to run through the entire database to update entries. When I do this on my local machine I see very different behavior compared to the server.
For example, there seems to be an upper limit on how many entries I can fetch from the database.
On my local machine I just query all elements, while on the server I have to query 2000 entries at a time.
If I have too many entries the server gives me the message 'Killed'.
I would like to know
1. Who is killing my jobs (sqlalchemy, postgres)?
2. Since this does seem to behave differently on my local machine there must be a way to control this. Where would that be?
thanks
carl
Just the message "Killed" appearing in the terminal window usually means the kernel ran out of memory and killed the process as an emergency measure.
Most libraries that connect to PostgreSQL read the entire result set into memory by default. But some libraries have a way to tell them to process the results row by row, so they aren't all read into memory at once. I don't know if Flask has this option or not.
Perhaps your local machine has more available RAM than the server (or fewer demands on the RAM it does have), or perhaps your local machine is configured to read from the database row by row rather than all at once.
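For SQLAlchemy specifically (which the question uses), row-by-row processing is available; the sketch below shows the idea, with Entry and update_entry() as placeholders for the real model and update logic, and with the option names worth double-checking against your SQLAlchemy version:

# Stream rows from Postgres instead of loading the whole result set into memory.
# yield_per() fetches rows in batches, and stream_results asks the driver to use
# a server-side cursor so it does not buffer the entire result client-side.
query = (
    session.query(Entry)                       # Entry: placeholder mapped model
    .execution_options(stream_results=True)
    .yield_per(2000)
)

for entry in query:
    update_entry(entry)                        # hypothetical per-row update
session.commit()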
Most likely the kernel is killing your Python script. Python can have horrible memory usage.
I have a feeling you are running these 2000-entry batches in a loop inside one Python process. Python does not release all the memory it has used back to the OS, so memory usage grows until the process gets killed. (You can watch this with the top command.)
You should try adapting your script to process 2000 records per run and then quit; if you run it multiple times, it should continue where it left off. Or, a better option, use multiprocessing and run each job in a separate worker. Run the jobs serially and let the workers die when they finish; that way they release their memory back to the OS when they exit, as sketched below.
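A minimal sketch of that worker-per-batch idea; run_update_job() and the row count are illustrative stand-ins for whatever the update jobs actually do:

import multiprocessing as mp

BATCH_SIZE = 2000
TOTAL_ROWS = 1_000_000            # illustrative; use the real row count

def process_batch(offset):
    # Open the DB connection inside the worker so nothing accumulates in the
    # parent: fetch BATCH_SIZE entries starting at `offset`, update, commit.
    run_update_job(offset, BATCH_SIZE)   # hypothetical job function

if __name__ == "__main__":
    offsets = range(0, TOTAL_ROWS, BATCH_SIZE)
    # maxtasksperchild=1 gives every batch a fresh worker process, so all the
    # memory a batch used is returned to the OS as soon as that worker exits.
    with mp.Pool(processes=1, maxtasksperchild=1) as pool:
        pool.map(process_batch, offsets)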
