PyCharm reindexing when project is stored on a NAS - python

I have been working with PyCharm for quite some time now, and I recently upgraded my storage system with a NAS.
Everything is working fine except for one thing: PyCharm rescans my files to reindex them very often. This makes me lose a lot of time waiting for it to finish.
When the reindexing occurs:
When a script ends
When a debugging session ends
When PyCharm loses focus, i.e. when I switch to another application
So it happens basically ALL the time, taking quite a long time (several minutes sometimes).
Misc.:
Windows 10
PyCharm Community Edition 2018.1
Netgear - ReadyNas 422
Do you have any ideas on how to solve this issue?

So I contacted IntelliJ support and here is their response:
Working with network drives/folders is not supported officially yet.
Using remote development features is recommended (remote interpreter,
deployment etc). Here is more detailed answer
https://intellij-support.jetbrains.com/hc/en-us/community/posts/207069145/comments/207464249.
What I ended up doing, which is really not ideal, is creating a local copy of my project environment and syncing it with a folder on my NAS. To do so I used the SyncBackPro software.
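For simple cases, a scriptable alternative to a dedicated sync tool is a small one-way mirror written in Python. This is only an illustrative sketch, not what I actually use; the local and NAS paths below are placeholders:

from pathlib import Path
import filecmp
import shutil

LOCAL = Path(r"C:\Projects\my_project")            # placeholder local working copy
NAS = Path(r"\\READYNAS\backup\my_project")         # placeholder NAS share

def mirror(src: Path, dst: Path) -> None:
    """Copy new or changed files from src to dst, recursing into subfolders."""
    dst.mkdir(parents=True, exist_ok=True)
    for item in src.iterdir():
        target = dst / item.name
        if item.is_dir():
            mirror(item, target)
        elif not target.exists() or not filecmp.cmp(item, target, shallow=True):
            shutil.copy2(item, target)   # preserves timestamps

if __name__ == "__main__":
    mirror(LOCAL, NAS)

It only pushes changes one way (local to NAS) and never deletes anything on the NAS, which is usually the safer default for a backup-style sync.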

I'm using PyCharm both at home and at work with code stored on a Samba share (using its remote interpreter feature). I don't encounter constant reindexing, but by default such a setup does not support file system notifications, so PyCharm cannot be told when a file has changed.
However, as a programmer this shouldn't discourage you! You can drop in your own file system notifier that connects to your remote system (assuming your NAS runs Linux and supports SSH) and thus avoid the performance drop.
I actually wrote such a proxy to run the fsnotifier on a remote system a few years ago and I'm still using it. If you are interested, check out https://github.com/ThiefMaster/fsnotifier-remote
Some things in the repo are outdated (JetBrains removed that stupid file size check, for example), but it should still give you a good basis to start from if you are interested in using it.
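For illustration, the core idea behind such a proxy boils down to relaying fsnotifier's line-based stdin/stdout protocol over SSH. The sketch below makes several assumptions (the NAS host name "nas", the remote fsnotifier path, and how the IDE is pointed at the script), and it omits the path translation between the local mount and the remote filesystem that a real setup needs; see the linked repo for a working implementation:

import subprocess
import sys
import threading

def pump(src, dst):
    # Relay protocol lines between the IDE and the remote fsnotifier.
    for line in iter(src.readline, b""):
        dst.write(line)
        dst.flush()

def main():
    # Assumed: the NAS is reachable as "nas" over SSH and has an fsnotifier binary.
    remote = subprocess.Popen(
        ["ssh", "nas", "/usr/local/bin/fsnotifier"],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE,
    )
    # IDE -> remote fsnotifier (commands such as the watched roots)
    threading.Thread(
        target=pump, args=(sys.stdin.buffer, remote.stdin), daemon=True
    ).start()
    # remote fsnotifier -> IDE (change events)
    pump(remote.stdout, sys.stdout.buffer)

if __name__ == "__main__":
    main()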

Related

Memory leak debugging in Python

I have a Python script that works fine on my main computer without problems, but when I uploaded it to an Ubuntu server it started crashing. I spent a long time trying to figure out what the problem was and looked at the system logs. It turned out that Ubuntu automatically force-terminates the script due to lack of memory (the server has 512 MB of RAM). How can I profile the program's memory consumption under different workloads?
Have a look at something like Guppy3, which includes heapy, a 'heap analysis toolset' that can help you find where the memory's being used/held. Some links to information on how to use it are in the project's README.
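A minimal sketch of what using guppy3/heapy might look like, assuming `pip install guppy3`; the list comprehension below is just a stand-in for your real workload:

from guppy import hpy

heap = hpy()
heap.setrelheap()          # measure only allocations made from this point on

data = [str(i) * 100 for i in range(10_000)]   # stand-in for the real workload

stats = heap.heap()        # snapshot of objects allocated since setrelheap()
print(stats)               # summary grouped by type, largest consumers first
print(stats.byrcs)         # same objects grouped by referrer, to see what holds them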
If you have a core dump, consider using https://github.com/vmware/chap, which will allow you to look at both Python and native allocations.
Once you have opened the core, probably "summarize used" is a good place to start.

Developing Python software that will run in a different environment

For the last six months I have been working on a Python GUI application that I will use at work. Specifically, my GUI will run on a couple of supercomputer clusters that I use for work.
However, I mostly develop the software on my personal computer, where I do not have direct access to the commands my GUI will call, since the GUI uses subprocess to run commands that are only available on the computing cluster.
So, in order to develop the program efficiently, I often have to copy the directory containing all files related to the GUI to the cluster. Then I test my current version there, locate all my bugs, fix them by editing the files on the cluster, and finally copy all files back to my computer, overwriting the old version.
This just seems like a bad way of doing it, but I have to be able to test my software in the environment it is made for in order to find my bugs.
Surely this is a common problem in software development... What do actual programmers do (as opposed to hobby programmers such as myself)?
Edit:
Examples of commands that are only available on the computing cluster, that I make heavy use of, are squeue, sacct, and scontrol (SLURM related commands).
Edit2:
I could mention that I tested using SSH connections from Python, but it slowed down the commands significantly, since an SSH connection had to be established for each command I wanted to run. Unless I can set up a lasting SSH session, as in logging in once when my program opens, I don't think the SSH approach will work.
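For what it's worth, a lasting session is exactly what SSH libraries such as paramiko provide: connect once when the program starts and reuse that connection for every command. The following is only a sketch under that assumption; the host name, user, and commands are placeholders:

import paramiko

class ClusterShell:
    """Keeps one SSH connection open and reuses it for every SLURM command."""

    def __init__(self, host="login.cluster.example", user="me"):
        self.client = paramiko.SSHClient()
        self.client.load_system_host_keys()
        # Convenience for a sketch; verify host keys properly in real use.
        self.client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        self.client.connect(host, username=user)   # one connection, reused below

    def run(self, command):
        # Each call reuses the existing transport instead of reconnecting.
        stdin, stdout, stderr = self.client.exec_command(command)
        return stdout.read().decode()

    def close(self):
        self.client.close()

if __name__ == "__main__":
    shell = ClusterShell()
    print(shell.run("squeue -u $USER"))
    print(shell.run("sacct --starttime today"))
    shell.close()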
Explore the concepts that make Vagrant a popular choice for developers
Vagrant is a tool for building and managing virtual machine
environments in a single workflow. With an easy-to-use workflow and
focus on automation, Vagrant lowers development environment setup
time, increases production parity, and makes the "works on my machine"
excuse a relic of the past.
Your use case is covered by a couple of Vagrant boxes that create a SLURM cluster for development purposes. A good starting point might be
Example slurm cluster on your laptop (multiple VMs via vagrant)
If you understand and can set up your development environment with tools like Vagrant, you might next explore which options modern code editors or integrated development environments (IDEs) offer for remote development. Remote development covers some other use cases that might fit into your developer toolbox as well.
A "good enough", free and open source code editor for Python development is Visual Studio Code. According to the docs it has powerful features for remote development.
Visual Studio Code Remote Development allows you to use a container, remote machine, or the Windows Subsystem for Linux (WSL) as a full-featured development environment.
Read the docs
VS Code Remote Development

How does Django detect file changes

I'm trying to watch for changes to .py files in a directory.
I went through existing solutions.
I'm curious how the Django library solves this problem: the development server is restarted on file changes.
The code can be found in django.utils.autoreload. The autoreloader uses a separate thread that watches any python module that has been imported, and any translation file.
If inotify is available, Django uses that to listen to change events. Otherwise, it checks the timestamps of every file every second. If there are any changes, the process is restarted.
Django's autoreloader may not be the best source of inspiration. Better options may be Watchman (with the appropriate python bindings) or the pure-python alternative Watchdog.
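For example, a minimal Watchdog-based watcher for .py files might look like this sketch (the watched path and the reaction to a change are placeholders):

import time
from watchdog.observers import Observer
from watchdog.events import PatternMatchingEventHandler

class PyFileHandler(PatternMatchingEventHandler):
    """Reacts only to changes in .py files."""

    def __init__(self):
        super().__init__(patterns=["*.py"], ignore_directories=True)

    def on_modified(self, event):
        print(f"changed: {event.src_path}")   # restart/reload logic would go here

if __name__ == "__main__":
    observer = Observer()
    observer.schedule(PyFileHandler(), path=".", recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()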
Fast forward to April 2019:
With Django 2.2, pywatchman (the Python bindings for Watchman) is supported and pyinotify (unmaintained since mid-2015) has been dropped:
If you're using Linux or MacOS and install both pywatchman and the
Watchman service, kernel signals will be used to autoreload the server
(rather than polling file modification timestamps each second). This offers
better performance on large projects, reduced response time after code changes,
more robust change detection, and a reduction in power usage.
source: django-admin
When using Watchman with a project that includes large non-Python
directories like node_modules, it's advisable to ignore this directory
for optimal performance.
See the watchman documentation for information on how to do this.

Creating a remote project with PyDev

I'm new to Eclipse/PyDev and have what's probably a really basic question. I want to use it to edit and debug python files on a remote system. I am able to do this using RSE and pydevd, but what I'm doing doesn't really seem integrated with the IDE. That is, I can go to the RSE perspective and edit the files. I can then launch the script on the remote system and step through it in the debugger. But the files are not part of a project that Eclipse maintains for me. It's all fairly disjointed. Is there a way to make remote files part of an Eclipse project? I can drag the files into the project, but that makes a local copy. Am I just approaching this wrong?
Thanks,
Jerry
OK, it turns out to be not only simple but rather obvious once you find it. From the RSE perspective, right-click the folder containing your source files and select "Create Remote Project." This seems to work fairly well, but I'm still having one problem: It seems the debugger wants a local copy of the file I am debugging, and does not consider the RSE copy local enough. So now I have to copy the file from the remote server to my workstation before I start debugging. It kind of defeats the purpose of the integration.
Is there a better way? I'm looking at SSH filesystems, but really don't want to have to do that. It feels like I'm so close.
Edit 2011-11-09:
This has recently been addressed by the PyDev developers. As of today, installing the nightly PyDev update adds an option to fetch source from the remote server. Details here.
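For reference, the pydevd-based remote debugging mentioned in the question usually boils down to a settrace call like the following sketch; the host, port, and workload are placeholders, and the pydevd module on the remote machine should match your PyDev version:

import pydevd

# Connect back to the PyDev debug server running in Eclipse on your workstation.
pydevd.settrace("my-workstation.example", port=5678,
                stdoutToServer=True, stderrToServer=True, suspend=True)

def main():
    total = sum(range(10))   # execution pauses above; step through from here
    print(total)

if __name__ == "__main__":
    main()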
I ran into this issue a while back; I answered this question in the link below. Unfortunately, with Eclipse you cannot set up a remote interpreter with the RSE package. I use PyCharm (JetBrains' Python IDE), and it has been working great for me for about a year now. You do have to pay for it, but it's a nominal amount and worth it.
https://stackoverflow.com/a/15360958/1702186

Odd message and processing hangs

I have a large project that runs on an application server. It does pipelined processing of large batches of data and works fine on one Linux system (the old production environment) and one windows system (my dev environment).
However, we're upgrading our infrastructure and moving to a new Linux system for production, based on the same image used for the existing production system (we use AWS). The Python version (2.7) and libraries should therefore be identical; we are also verifying this ourselves using file hashes.
Our issue is that when we attempt to start processing on the new server, we receive a very strange message written to standard out, "Removing descriptor: [some number]", followed by the server hanging. I cannot duplicate this on the dev machine.
Has anyone ever encountered behavior like this in Python before? Besides modules in the Python standard library we are also using eventlet and BeautifulSoup. In the standard library we lean heavily on urllib2, re, cElementTree, and multiprocessing (mostly the pools).
wberry was correct in his comment: I was running into a max-descriptors-per-process issue. This seems highly dependent on the operating system. Reducing the size of the batches each worker handled to below the process's file descriptor limit solved the problem.
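For anyone hitting the same wall on Linux, the per-process descriptor limit can be inspected (and, up to the hard limit, raised) from the standard library; this is just a sketch and the target value of 4096 is arbitrary:

import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"file descriptor limit: soft={soft}, hard={hard}")

# The soft limit can be raised up to the hard limit without root privileges;
# alternatively, keep each batch comfortably below `soft`, as described above.
target = min(4096, hard)
if soft < target:
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))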
