I have several large files stored on a remote server that I would like to process with a local Python script. I am aware that I can use paramiko in Python to open them and read through them that way, but it is much too slow. FTPing the files over is not an option either, since they are too large. Does anyone have any suggestions?
I need to get a complete list of files in a folder and all its subfolders regularly (daily or weekly) to check for changes. The folder is located on a server that I access as a network share.
This folder currently contains about 250,000 subfolders and will continue to grow in the future.
I do not have any access to the server other than the ability to mount the filesystem R/W.
The way I currently retrieve the list of files is by walking the folder with Python's os.walk() function. This is limited by the latency of the internet connection and currently takes about 4.5 hours to complete.
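For reference, a minimal sketch of what I am doing now (the share path below is just a placeholder):

    import os

    SHARE_ROOT = r"\\server\share\folder"  # placeholder for the mounted network share

    def list_files(root):
        # Walk the whole tree and collect every file path; every directory
        # visited costs at least one round trip over the slow link.
        paths = []
        for dirpath, dirnames, filenames in os.walk(root):
            for name in filenames:
                paths.append(os.path.join(dirpath, name))
        return paths

    if __name__ == "__main__":
        for path in list_files(SHARE_ROOT):
            print(path)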
A faster way to do this would be to create a file server-side containing the whole list of files, then transfer this file to my computer.
Is there a way to request such a recursive listing of the files from the client side?
A python solution would be perfect, but I am open to other solutions as well.
My script is currently run on Windows, but will probably move to a Linux server in the future; an OS-agnostic solution would be best.
You have provided the answer to your question:
I do not have any access to the server other than the ability to mount the filesystem R/W.
Nothing has to be added after that, since any server-side processing requires the ability to (directly or indirectly) launch a process on the server.
If you can collaborate with the server admins, you could ask them to periodically run a server-side script that builds a compressed archive (for example a zip file) containing the files you need and moves it to a specific location when done. Then you would only download that compressed archive, saving a lot of network bandwidth.
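As a rough sketch of what such a server-side job could look like (all paths are placeholders, and this assumes the admins are willing to schedule it with cron or similar), this variant simply dumps the recursive listing you asked about into a compressed file and moves it to a pickup location once it is complete:

    import gzip
    import os
    import shutil

    SOURCE_ROOT = "/data/shared/folder"      # placeholder: the folder to index
    TMP_LIST = "/tmp/filelist.txt.gz"        # placeholder: temporary output file
    PICKUP = "/data/shared/filelist.txt.gz"  # placeholder: where the client downloads it

    def build_listing():
        # Write the full recursive listing, one path per line, gzip-compressed.
        with gzip.open(TMP_LIST, "wt", encoding="utf-8") as out:
            for dirpath, dirnames, filenames in os.walk(SOURCE_ROOT):
                for name in filenames:
                    out.write(os.path.join(dirpath, name) + "\n")
        # Move it into place only once the listing is complete.
        shutil.move(TMP_LIST, PICKUP)

    if __name__ == "__main__":
        build_listing()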
You can approach this in multiple ways. I would do it by running a script over SSH, like:
ssh xys@server 'bash -s' < local_script_togetfilenames.sh
If you prefer Python, you can run a similar Python script the same way by adding a #!python shebang, assuming Python is installed on the server.
If you want to stick to pure Python, you should explore Python RPC (Remote Procedure Call).
You can use the RPyC library; see its documentation for details.
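As a rough sketch of the RPyC route (assuming you can start RPyC's classic server, rpyc_classic.py, on the remote machine; the hostname and path below are placeholders), the client can drive the server's own os module so the directory walk happens server-side:

    import rpyc  # requires RPyC on both ends; the server runs rpyc_classic.py

    conn = rpyc.classic.connect("server.example.com")  # placeholder hostname
    remote_os = conn.modules.os                        # proxy to the os module on the server

    paths = []
    for dirpath, dirnames, filenames in remote_os.walk("/data/shared/folder"):  # placeholder path
        for name in filenames:
            paths.append(dirpath + "/" + name)

    print(len(paths), "files found")
    conn.close()

Note that each proxied call still crosses the network, so for a very large tree it can pay off to run the whole walk in a single remote call (for example with conn.execute) and ship the finished list back in one go.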
With this question I would like to gain some insights/verify that I'm on the right track with my thinking.
The request is as follows: I would like to create a database on a server. This database should be updated periodically by adding information that is present in a certain folder, on a different computer. Both the server and the computer will be within the same network (I may be running into some firewall issues).
So the method I am thinking of using is as follows. Create a tunnel between the two systems. I will run a script that periodically (hourly or daily) searches through the specified directory, converts the files to data and adds it to the database. I am planning to use Python, which I am fairly familiar with.
Note: I don't think I will be able to install Python on the PC with the files.
Is this at all doable? Is my approach solid? Please let me know if additional information is required.
Create a tunnel between the two systems.
If you mean setting up the firewall between the two machines to allow the connection, then yeah. Just open the PostgreSQL port. Check postgresql.conf for the port number in case it isn't the default (5432). Also put the correct permissions in pg_hba.conf so the computer's IP can connect to it.
I will run a script that periodically (hourly or daily) searches through the specified directory, converts the files to data and adds it to the database. I am planning to use Python, which I am fairly familiar with.
Yeah, that's pretty standard. No problem.
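A rough sketch of such a script, assuming PostgreSQL (as discussed above), CSV-like source files and the psycopg2 driver; the table, columns, paths and credentials are all placeholders:

    import csv
    import os

    import psycopg2  # assumption: psycopg2 as the PostgreSQL driver

    WATCH_DIR = r"C:\data\incoming"  # placeholder: folder containing the source files

    def load_files():
        conn = psycopg2.connect(host="dbserver", dbname="mydb",
                                user="loader", password="secret")  # placeholders
        try:
            with conn, conn.cursor() as cur:
                for name in os.listdir(WATCH_DIR):
                    if not name.endswith(".csv"):
                        continue
                    with open(os.path.join(WATCH_DIR, name), newline="") as f:
                        for row in csv.reader(f):
                            # placeholder table and columns; adapt to the real schema
                            cur.execute(
                                "INSERT INTO measurements (source_file, value) VALUES (%s, %s)",
                                (name, row[0]),
                            )
        finally:
            conn.close()

    if __name__ == "__main__":
        load_files()  # schedule this hourly/daily with Task Scheduler or cron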
Note: I don't think I will be able to install Python on the PC with the files.
On Windows you can install Anaconda for all users or just the current user. The latter doesn't require admin privileges, so that may help.
If you can't install Python, you can use tools such as PyInstaller to turn your Python program into an executable that bundles all the libraries, so you just have to drop that into a folder on the computer and execute it.
If you absolutely cannot install anything or execute any program, then you'll have to create a scheduled task that copies the data over the network to a computer that has Python, and run the Python script there, but that's extra complication.
If the source computer is automatically backed up to a server, you can also use the backup as a data source, but there will be a delay depending on how often it runs.
I need to read a CSV file, which is saved on my local computer, from code within an "Execute R/Python Script" module in an Azure Machine Learning Studio experiment. I am not supposed to upload the data the usual way, i.e. from Datasets -> New -> Load from local file or with an Import Data module; I must do it with code. In principle this is not possible, neither from an experiment nor from a notebook, and in fact I always get an error. But I'm confused because the documentation for the Execute Python Script module says (among other things):
Limitations
The Execute Python Script currently has the following limitations:
Sandboxed execution. The Python runtime is currently sandboxed and, as a result, does not allow access to the network or to the local file system in a persistent manner. All files saved locally are isolated and deleted once the module finishes. The Python code cannot access most directories on the machine it runs on, the exception being the current directory and its subdirectories.
According to the highlighted text, it should be possible to access and load a file from the current directory, using for instance the pandas read_csv function. But in practice it doesn't work. Is there some trick to accomplish this?
Thanks.
You need to remember that Azure ML Studio is an online tool, and it's not running any code on your local machine.
All the work is being done in the cloud, including running the Execute Python Script module, and that is what the text you've highlighted refers to: the directories and subdirectories of the cloud machine running your machine learning experiment, not your own local computer.
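In practice that means the CSV has to reach the experiment through the module's input ports (as a dataset uploaded to the workspace or pulled in with Import Data), and your script then receives it as a pandas DataFrame rather than reading a local path. Roughly, the module's entry point looks like this (a sketch, assuming the classic Studio interface):

    # Entry point required by the Execute Python Script module; dataframe1 and
    # dataframe2 are whatever datasets are wired to the module's input ports.
    def azureml_main(dataframe1=None, dataframe2=None):
        # dataframe1 arrives as a pandas DataFrame; no read_csv of a path on
        # your local machine is needed (or possible).
        print(dataframe1.head())
        return dataframe1,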
I have a little headless LAMP web server running and I am also using it for downloading files from the internet. At the moment I have to log in via SSH and start the download via wget. Some of the files are really large (exceeding 4 GB).
A nice solution would be to use a Python CGI script to add a link to a queue and let Python do the rest. I already know how to download files from the net (like here: Download file via python) and I know how to write the Python CGI (or WSGI) part. The thing is that the script needs to run continuously, which would mean keeping the connection alive - which would be pretty useless. Therefore I think I need some kind of background solution.
Help or hints would be much appreciated.
Thanks in advance & best regards!
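One common pattern for that kind of background solution, as a rough sketch (paths are placeholders): the CGI script only appends the URL to a queue file, and a separate worker started by cron picks the URLs up and downloads them, so no HTTP connection has to stay open:

    #!/usr/bin/env python3
    # Background worker: drains a queue file of URLs and downloads them one by one.
    # Rough sketch only; run it from cron while the CGI script merely appends
    # one URL per line to QUEUE_FILE.
    import os
    import urllib.request

    QUEUE_FILE = "/var/spool/downloads/queue.txt"  # placeholder
    TARGET_DIR = "/var/www/downloads"              # placeholder

    def drain_queue():
        if not os.path.exists(QUEUE_FILE):
            return
        work_file = QUEUE_FILE + ".work"
        # Atomically move the queue aside so URLs added from now on land in a fresh file.
        os.replace(QUEUE_FILE, work_file)
        with open(work_file) as f:
            urls = [line.strip() for line in f if line.strip()]
        for url in urls:
            filename = os.path.join(TARGET_DIR, url.rsplit("/", 1)[-1] or "download")
            # urlretrieve streams to disk, so multi-GB files never sit fully in memory.
            urllib.request.urlretrieve(url, filename)
        os.remove(work_file)

    if __name__ == "__main__":
        drain_queue()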
I have edited about 100 HTML files locally, and now I want to push them to my live server, which I can only access via FTP.
The HTML files are in many different directories, but the directory structure on the remote machine is the same as on the local machine.
How can I recursively descend from my top-level directory ftp-ing all of the .html files to the corresponding directory/filename on the remote machine?
Thanks!
If you want to do it in Python (rather than using other pre-packaged existing tools), you can use os.walk to read everything in the local subtree, and ftplib to perform all the FTP operations. In particular, storbinary is the method you'll usually use to transfer entire files without line-end conversions (storlines if you do want line-end conversions, for files that are text, not binary, and that you know need such treatment).
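A rough sketch of that combination (the host, credentials and remote root are placeholders):

    import os
    from ftplib import FTP, error_perm

    LOCAL_ROOT = "."              # top-level local directory
    REMOTE_ROOT = "/public_html"  # placeholder: matching remote directory

    ftp = FTP("ftp.example.com")       # placeholder host
    ftp.login("username", "password")  # placeholder credentials

    for dirpath, dirnames, filenames in os.walk(LOCAL_ROOT):
        rel = os.path.relpath(dirpath, LOCAL_ROOT)
        remote_dir = REMOTE_ROOT if rel == "." else REMOTE_ROOT + "/" + rel.replace(os.sep, "/")
        try:
            ftp.mkd(remote_dir)        # directory may already exist on the server
        except error_perm:
            pass
        for name in filenames:
            if not name.endswith(".html"):
                continue
            with open(os.path.join(dirpath, name), "rb") as f:
                # storbinary transfers the file as-is, with no line-end conversion
                ftp.storbinary("STOR " + remote_dir + "/" + name, f)

    ftp.quit()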
Umm, maybe by pressing F5 in mc on Linux or Total Commander on Windows?
After searching PyPI, I found ftptool (http://pypi.python.org/pypi/ftptool/0.4.2). Its mirror_to_remote method could be what you need. I don't have an FTP server handy, though, so I couldn't test it.
If you have a Mac, you can try Cyberduck. It's good for syncing remote directory structures via FTP.