Request recursive list of server files from client - python

I need to get a complete list of files in a folder and all its subfolders regularly (daily or weekly) to check for changes. The folder is located on a server that I access as a network share.
This folder currently contains about 250,000 subfolders and will continue to grow in the future.
I do not have any access to the server other than the ability to mount the filesystem R/W.
The way I currently retrieve the list of files is by walking the folder with Python's os.walk() function. This is limited by the latency of the internet connection and currently takes about 4.5 h to complete.
A faster way to do this would be to create a file server-side containing the whole list of files, then transfer this file to my computer.
Is there a way to request such a recursive listing of the files from the client side?
A python solution would be perfect, but I am open to other solutions as well.
My script is currently run on Windows, but will probably move to a Linux server in the future; an OS-agnostic solution would be best.

You have provided the answer to your question:
I do not have any access to the server other than the ability to mount the filesystem R/W.
Nothing needs to be added after that, since any server-side processing requires the ability to (directly or indirectly) launch a process on the server.
If you can collaborate with the server admins, you could ask them to periodically start a server-side script that builds a compressed archive (for example a zip file) containing the files you need and moves it to a specific location when done. Then you would only download that compressed archive, saving a lot of network bandwidth.
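For illustration, here is a minimal sketch of what such a server-side script might look like, assuming Python is available on the server and that a compressed recursive file listing is what you actually need; all paths are placeholders:

# Hypothetical server-side job: write a gzip-compressed recursive file
# listing that the client can then download in a single transfer.
import gzip
import os

ROOT = r"D:\shared\data"                 # placeholder: folder to index
OUT = r"D:\shared\file_list.txt.gz"      # placeholder: where the client fetches it

with gzip.open(OUT, "wt", encoding="utf-8") as out:
    for dirpath, dirnames, filenames in os.walk(ROOT):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
                mtime = os.path.getmtime(path)
            except OSError:
                continue                 # file vanished or unreadable; skip it
            out.write("%s\t%d\t%f\n" % (path, size, mtime))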

You can approach this in multiple ways. I would do it by running a script over SSH, like
ssh xys@server 'bash -s' < local_script_togetfilenames.sh
If you prefer Python, you can run a similar Python script by adding a #!/usr/bin/env python shebang, assuming Python is installed on the server.
If you want to stick to pure Python, you should explore Python RPC (remote procedure call).
You can use the RPyC library; the documentation is here.
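The original question only mentions a mounted share, so SSH access is an extra assumption; but if it were available, a minimal client-side sketch could ask the server to produce the listing and stream back only the text (host, user and remote path are placeholders):

# Hypothetical client-side sketch: have the server run `find` over SSH
# and save the listing locally, so only text crosses the network.
import subprocess

HOST = "xys@server"             # placeholder: SSH user and host
REMOTE_ROOT = "/srv/share"      # placeholder: folder to list

result = subprocess.run(
    ["ssh", HOST, "find", REMOTE_ROOT, "-type", "f"],
    capture_output=True, text=True, check=True,
)

with open("file_list.txt", "w", encoding="utf-8") as out:
    out.write(result.stdout)

print("Received %d file paths" % len(result.stdout.splitlines()))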

Related

What strategy should I use to periodically extract information from a specific folder

With this question I would like to gain some insights/verify that I'm on the right track with my thinking.
The request is as follows: I would like to create a database on a server. This database should be updated periodically by adding information that is present in a certain folder, on a different computer. Both the server and the computer will be within the same network (I may be running into some firewall issues).
So the method I am thinking of using is as follows. Create a tunnel between the two systems. I will run a script that periodically (hourly or daily) searches through the specified directory, converts the files to data and adds them to the database. I am planning to use Python, which I am fairly familiar with.
Note: I don't think I will be able to install Python on the PC with the files.
Is this at all doable? Is my approach solid? Please let me know if additional information is required.
Create a tunnel between the two systems.
If you mean setting up the firewall between the two machines to allow a connection, then yeah. Just open the PostgreSQL port. Check postgresql.conf for the port number in case it isn't the default. Also put the correct permissions in pg_hba.conf so the computer's IP can connect to it.
I will run a script that periodically (hourly or daily) searches through the specified directory, convert the files to data and add it to the database. I am planning to use python, which I am fairly familiar with.
Yeah, that's pretty standard. No problem.
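As a rough illustration of that periodic job (not part of the original answer), here is a sketch using psycopg2 with a hypothetical file_records table; the connection details, folder path and column names are all placeholders:

# Hypothetical periodic job: scan a folder, read each file and insert it
# into PostgreSQL. Run it from cron or the Windows Task Scheduler.
import os
import psycopg2

WATCH_DIR = r"\\filehost\shared\incoming"     # placeholder: folder to scan

conn = psycopg2.connect(
    host="dbserver", dbname="filedata",        # placeholders
    user="loader", password="secret",
)

with conn, conn.cursor() as cur:
    for name in os.listdir(WATCH_DIR):
        path = os.path.join(WATCH_DIR, name)
        if not os.path.isfile(path):
            continue
        with open(path, "r", encoding="utf-8", errors="replace") as f:
            content = f.read()
        # ON CONFLICT keeps reruns idempotent (assumes a unique index on filename)
        cur.execute(
            "INSERT INTO file_records (filename, content) VALUES (%s, %s) "
            "ON CONFLICT (filename) DO NOTHING",
            (name, content),
        )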
Note: I don't think I will be able to install Python on the PC with the files.
On Windows you can install Anaconda for all users or just for the current user. The latter doesn't require admin privileges, so that may help.
If you can't install Python, then you can use some Python tools to turn your program into an executable that contains all the libraries, so you just have to drop that into a folder on the computer and execute it.
If you absolutely cannot install anything or execute any program, then you'll have to create a scheduled task to copy the data to a computer that has python over the network, and run the python script there, but that's extra complication.
If the source computer is automatically backed up to a server, you can also use the backup as a data source, but there will be a delay depending on how often it runs.

List all files from outside Windows 7 VM

I am running Win7 in a VirtualBox VM and my goal is to list the files that are inside the Win7 VM from outside the VM; for example, I want to use a Python client. I have network access to the VM. Is the best practice to share all the files and folders using Samba and access them through the network with a Python client? Any more suggestions? I also want to be able to download the files. (The client will run on OSX/Linux.)
You can use WinSCP - https://winscp.net/eng/download.php
This will help you to access the files with a nice GUI. Make sure you select the Commander option while installing WinSCP. This will give you two panes - one for your host and one for your VM.
If you are planning to make the files downloadable for users on a private network, then you can install a XAMPP server inside your VM, place the files to be downloaded inside "C:\xampp\htdocs\dashboard\", and share the URL (e.g. 192.168.10.2:5000/dashboard) with the users on the same network so that they can download the required files.
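Since the question asks about a Python client against a Samba share, here is a minimal sketch using the third-party pysmb package; the share name, credentials and addresses are placeholders:

# Hypothetical sketch with pysmb: list the files on the VM's Samba share
# and download one of them. All names and credentials are placeholders.
from smb.SMBConnection import SMBConnection

conn = SMBConnection("user", "password", "my-client", "WIN7-VM", use_ntlm_v2=True)
conn.connect("192.168.10.2", 139)

# List everything in the root of the shared folder
for entry in conn.listPath("shared", "/"):
    if entry.filename not in (".", ".."):
        kind = "dir " if entry.isDirectory else "file"
        print(kind, entry.filename)

# Download a single file from the share
with open("report.txt", "wb") as local_file:
    conn.retrieveFile("shared", "/report.txt", local_file)

conn.close()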

Run over remote files quickly using python?

I have several large files stored on a remote server that I would like to run over using a local Python script. I am aware that I can use paramiko in Python to open them and run over them that way, but it is much too slow. Also, ftping the files over is not an option since they are too large. Does anyone have any suggestions?

Best practice for watching and reliable uploading files in Python?

I'm building a desktop application for Windows in Python 2.7. The primary function of this application is to watch a folder for new files. Whenever a new file appears in this folder the app uploads it to remote server. The process on the remote server creates a db record for the file and stores remote file path in that record.
Currently I'm using watchdog to monitor directory and httplib for file upload.
What approach should I take to ensure that a new file will be uploaded reliably regardless of a network condition or internet connection loss?
Update: What I mean by reliable upload is that the app will upload the file even if the app restarts. Like Dropbox. Some files are quite big (> 100 MB), so simple solutions like wrapping the code in try/except and restarting the upload from scratch are not very efficient. I know Dropbox uses librsync, but it might be overkill in this case.
What if the source file has been changed during the upload? Should I stop the upload and start over?
You could maintain a file or database of file names, timestamps and information about their upload status. Based on that data you will know which files were already sent and what to upload after any restart of the application or the computer.
Checking timestamps tells you that a file has been modified and that the upload process should be started over.
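A minimal sketch of such a state store using sqlite3; the table layout and helper names are hypothetical, and the actual upload call is left out:

# Hypothetical upload-state tracker with sqlite3: record each file's
# modification time and status so uploads can resume after an app restart.
import os
import sqlite3

db = sqlite3.connect("upload_state.db")
db.execute(
    "CREATE TABLE IF NOT EXISTS uploads ("
    " path TEXT PRIMARY KEY,"
    " mtime REAL,"
    " status TEXT)"                      # 'pending' or 'done'
)

def mark_pending(path):
    """Register a new or modified file so it gets (re)uploaded."""
    db.execute(
        "INSERT OR REPLACE INTO uploads (path, mtime, status) VALUES (?, ?, 'pending')",
        (path, os.path.getmtime(path)),
    )
    db.commit()

def pending_files():
    """Everything not yet confirmed uploaded, e.g. after a restart."""
    rows = db.execute("SELECT path FROM uploads WHERE status = 'pending'")
    return [row[0] for row in rows]

def mark_done(path):
    db.execute("UPDATE uploads SET status = 'done' WHERE path = ?", (path,))
    db.commit()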

How to upload all .html files to a remote server using FTP and preserving file structure?

I have edited about 100 html files locally, and now I want to push them to my live server, which I can only access via ftp.
The HTML files are in many different directories, but the directory structure on the remote machine is the same as on the local machine.
How can I recursively descend from my top-level directory ftp-ing all of the .html files to the corresponding directory/filename on the remote machine?
Thanks!
If you want to do it in Python (rather than using other pre-packaged existing tools), you can use os.walk to read everything in the local subtree, and ftplib to perform all the FTP operations. In particular, storbinary is the method you'll usually use to transfer entire files without line-end conversions (storlines if you do want line-end conversions, for files that are text, not binary, and that you know need such treatment).
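A minimal sketch of that os.walk plus ftplib approach; the host, login and directory names are placeholders, and it assumes missing remote directories simply need to be created as it goes:

# Hypothetical sketch: mirror all local .html files to an FTP server,
# preserving the relative directory structure.
import ftplib
import os

LOCAL_ROOT = "site"                      # placeholder: local top-level directory
REMOTE_ROOT = "/public_html"             # placeholder: matching remote directory

ftp = ftplib.FTP("ftp.example.com")
ftp.login("user", "password")

for dirpath, dirnames, filenames in os.walk(LOCAL_ROOT):
    rel = os.path.relpath(dirpath, LOCAL_ROOT).replace(os.sep, "/")
    remote_dir = REMOTE_ROOT if rel == "." else REMOTE_ROOT + "/" + rel
    try:
        ftp.mkd(remote_dir)              # directory may already exist on the server
    except ftplib.error_perm:
        pass
    for name in filenames:
        if not name.endswith(".html"):
            continue
        with open(os.path.join(dirpath, name), "rb") as f:
            ftp.storbinary("STOR " + remote_dir + "/" + name, f)

ftp.quit()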
Umm, maybe by pressing F5 in mc (Midnight Commander) on Linux, or Total Commander on Windows?
After searching PyPI, I found ftptool (http://pypi.python.org/pypi/ftptool/0.4.2). Its mirror_to_remote method could be what you need. I don't have an FTP server handy, though, so I couldn't test it.
If you have a Mac, you can try Cyberduck. It's good for syncing remote directory structures via FTP.
