Read big files from SFTP server with Python 3

I want to read multiple big files that exist on a CentOS server with Python. I wrote simple code for that, and it works, but the entire file comes into a paramiko object (paramiko.sftp_file.SFTPFile) before I can process the lines. That does not perform well, and I want to process the file and write it to CSV piece by piece, because processing the entire file hurts performance. Is there a way to solve the problem?
import paramiko

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect(host, port, username, password)
sftp_client = ssh.open_sftp()
remote_file = sftp_client.open(r'/root/bigfile.csv')
try:
    for line in remote_file:
        pass  # process the line here
finally:
    remote_file.close()

This should solve your problem:
import shutil

def lazy_loading_ftp_file(sftp_host_conn, filename):
    """
    Lazily download an SFTP file when a simple sftp.get call fails.
    :param sftp_host_conn: callable that returns an open SFTP connection
    :param filename: filename to be downloaded
    :return: status dict; the file is downloaded to the current directory
    """
    try:
        with sftp_host_conn() as host:
            sftp_file_instance = host.open(filename, 'r')
            with open(filename, 'wb') as out_file:
                # copy in fixed-size chunks instead of loading everything into memory
                shutil.copyfileobj(sftp_file_instance, out_file)
            return {"status": "success",
                    "msg": "successfully downloaded file: {}".format(filename)}
    except Exception as ex:
        return {"status": "failed",
                "msg": "Exception in lazy reading too: {}".format(ex)}
This will avoid reading the whole thing into memory at once.
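For example, a hypothetical call could look like this (the connection details are placeholders, not from the question; the first argument just has to be a callable that returns an open paramiko SFTPClient):
import paramiko

def make_sftp():
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect('example.com', 22, 'user', 'password')  # placeholder credentials
    return ssh.open_sftp()

result = lazy_loading_ftp_file(make_sftp, 'bigfile.csv')
print(result["msg"])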

Reading in chunks will help you here:
import pandas as pd

chunksize = 1000000
for chunk in pd.read_csv(filename, chunksize=chunksize):
    process(chunk)
Update:
Yes, I'm aware my answer is based on a local file; I was just giving an example of reading a file in chunks.
To answer the question itself, check out paramiko.sftp_client.SFTPClient.putfo and Functions for working with remote files using pandas and paramiko (SFTP/SSH), and pass the chunk size as mentioned above.
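A rough sketch of combining the two, assuming pd.read_csv will accept the paramiko file object directly (the connection details and the per-chunk handling below are placeholders, not from the question):
import paramiko
import pandas as pd

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect('example.com', 22, 'user', 'password')  # placeholder credentials
sftp_client = ssh.open_sftp()

with sftp_client.open('/root/bigfile.csv', 'rb') as remote_file:
    remote_file.prefetch()  # pipeline read requests to speed up sequential reads
    for chunk in pd.read_csv(remote_file, chunksize=1000000):
        # placeholder per-chunk processing: append each chunk to a local CSV
        chunk.to_csv('out.csv', mode='a', index=False, header=False)

ssh.close()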

Related

How to put data into a tempfile and post as CSV on SFTP

Goal
Create a temporary CSV file filled with data and upload it to an SFTP server. The data to fill it with is TheList, which is a list.
What I am able to achieve
Create the connection to the SFTP
Push a file to the SFTP
What happens with the code below
A file is created/put on the SFTP, but the file is empty (0 bytes).
Question
How can I get a CSV file onto the SFTP with the content of TheList?
import paramiko
import tempfile
import csv
# code part to make and open sftp connection
TheList = [['name', 'address'], [ 'peter', 'london']]
csvfile = tempfile.NamedTemporaryFile(suffix='.csv', mode='w', delete=False)
filewriter = csv.writer(csvfile)
filewriter.writerows(TheList)
sftp.put(csvfile.name, SftpPath + "anewfile.csv")
# code part to close sftp connection
You do not need to create a temporary file. You can use csv.writer to write the rows directly to the SFTP server via a file-like object opened with SFTPClient.open:
with sftp.open(SftpPath + "anewfile.csv", mode='w', bufsize=32768) as csvfile:
    writer = csv.writer(csvfile, delimiter=',')
    writer.writerows(TheList)
See also pysftp putfo creates an empty file on SFTP server but not streaming the content from StringIO
To answer your literal question: I believe you need to flush the temporary file before trying to upload it:
filewriter.flush()
See How to use tempfile.NamedTemporaryFile() in Python
Though a better option would be to use Paramiko SFTPClient.putfo to upload the NamedTemporaryFile object, rather than referring to the temporary file by its name (which reportedly would not work on Windows anyway):
csvfile.seek(0)
sftp.putfo(csvfile, SftpPath + "anewfile.csv")
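Put together, a minimal sketch of that approach could look like this (sftp is the connection from the question, SftpPath is a placeholder; note the temporary file is opened in 'w+' so that putfo can read it back):
import csv
import tempfile

TheList = [['name', 'address'], ['peter', 'london']]
SftpPath = '/upload/'  # placeholder remote directory

with tempfile.NamedTemporaryFile(suffix='.csv', mode='w+', newline='') as csvfile:
    csv.writer(csvfile).writerows(TheList)
    csvfile.flush()  # make sure the buffered rows are written out
    csvfile.seek(0)  # rewind so putfo streams from the beginning
    sftp.putfo(csvfile, SftpPath + "anewfile.csv")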

Append to existing file on SFTP server via pysftp

I have a file named Account.txt on an SFTP server, and I'm trying to append a line to it. This is my effort:
from io import StringIO
from pysftp import Connection, CnOpts

cnopts = CnOpts()
cnopts.hostkeys = None
with Connection('ftpserver.com',
                username='username',
                password='password',
                cnopts=cnopts
                ) as sftp:
    with sftp.cd('MY_FOLDER'):
        f = sftp.open('Account.txt', 'ab')
        data = 'google|33333|Phu|Wood||true|2018-09-21|2018-09-21|google'
        f.write(data + '\n')
When I run the above code, the file is overwritten instead of appended to. How can I append a new line but still keep the old lines in the file?
For example:
Account.txt file:
facebook|11111|Jack|Will||true|2018-09-21|2018-09-21|facebook
facebook|22222|Jack|Will||true|2018-09-21|2018-09-21|facebook
And now I want to add line "google|33333|Phu|Wood||true|2018-09-21|2018-09-21|google" to the file.
The result I'm expecting:
Account.txt file
facebook|11111|Jack|Will||true|2018-09-21|2018-09-21|facebook
facebook|22222|Jack|Will||true|2018-09-21|2018-09-21|facebook
google|33333|Phu|Wood||true|2018-09-21|2018-09-21|google
Hope you guys can understand. Leave a comment if you don't. Thank you.
Your code works for me with OpenSSH SFTP server.
Maybe it's a bug in Core FTP server.
You can instead try manually seeking file write pointer to the end of the file:
import os

with sftp.open('Account.txt', 'r+b') as f:
    f.seek(0, os.SEEK_END)
    data = 'google|33333|Phu|Wood||true|2018-09-21|2018-09-21|google'
    f.write(data + '\n')
An addition to Martin's answer:
When using r+b, it will fail if the file does not exist yet. Use a+ instead if you want the file to be created when it does not exist, similar to the modes of the built-in open function (see Difference between modes a, a+, w, w+, and r+ in built-in open function?).
Then no f.seek(0, os.SEEK_END) will be needed:
with sftp.open('test.txt', 'a+') as f:
    f.write('hello')

Pytsk - Sending files to a server from a disk image

I am trying to send each file from a disk image to a remote server using paramiko.
import paramiko

class Server:
    def __init__(self):
        self.ssh = paramiko.SSHClient()
        self.ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        self.ssh.connect('xxx', username='xxx', password='xxx')

    def send_file(self, i_node, name):
        sftp = self.ssh.open_sftp()
        serverpath = '/home/paul/Testing/'
        try:
            sftp.chdir(serverpath)
        except IOError:
            sftp.mkdir(serverpath)
            sftp.chdir(serverpath)
        serverpath = '/home/Testing/' + name
        sftp.putfo(fs.open_meta(inode=i_node), serverpath)
However when I run this I get an error saying that "pytsk.File has no attribute read".
Is there any other way of sending this file to the server?
After a quick investigation I think I found what your problem is. Paramiko's sftp.putfo expects a Python file object as the first parameter. The file object of Pytsk3 is a completely different thing. Your sftp object tries to perform "read" on this, but Pytsk3 file object does not have a method "read", hence the error.
You could in theory try extending the Pytsk3.File class and adding this method, but I would not hold my breath that it actually works.
I would just read the file to a temporary one and send that. Something like this (you would need to make temp file name handling more clever and delete the file afterwards but you will get the idea):
serverpath = '/home/Testing/' + name
tmp_path = "/tmp/xyzzy"
file_obj = fs.open_meta(inode = i_node)
# Add here tests to confirm this is actually a file, not a directory
tha = open(tmp_path, "wb")
tha.write(file_obj.read_random(0, file_obj.info.meta.size))
tha.close()
rha = open(tmp_path, "rb")
sftp.putfo(rha, serverpath)
rha.close()
# Delete temp file here
Hope this helps. Note that this reads the whole file into memory from the fs image before writing it to the temp file, so if the file is massive you could run out of memory.
To work around that, read the file in chunks by looping over read_random (its parameters are the start offset and the amount of data to read), building the temp file in chunks of, for example, a couple of megabytes, as sketched below.
This is just a simple example to illustrate your problem.
Hannu
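A possible sketch of that chunked variant (not part of the original answer; it assumes the fs, i_node, sftp, serverpath and tmp_path names from the code above):
CHUNK = 2 * 1024 * 1024  # read 2 MiB at a time

file_obj = fs.open_meta(inode=i_node)
total = file_obj.info.meta.size

with open(tmp_path, "wb") as tha:
    offset = 0
    while offset < total:
        data = file_obj.read_random(offset, min(CHUNK, total - offset))
        if not data:
            break
        tha.write(data)
        offset += len(data)

with open(tmp_path, "rb") as rha:
    sftp.putfo(rha, serverpath)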

Write a file over network in odoo with authentication

I have Odoo 8 running on my Linux server and I need to copy a file from this server to a Windows 10 shared folder with authentication.
I tried to do it programmatically like this:
full_path = "smb://hostname/shared_folder/other_path"
if not os.path.exists(full_path):
os.makedirs(full_path)
full_path = os.path.join(full_path, file_name)
bin_value = stream.decode('base64')
if not os.path.exists(full_path):
try:
with open(full_path, 'wb') as fp:
fp.write(bin_value)
fp.close()
return True
except IOError:
_logger.exception("stream_save writing %s", full_path)
but even though no exception is raised, the folders are not created and the file is not written.
Then I tried removing the "smb:" part from the URI, and it raised an exception regarding authentication.
I'd like to fix the problem using only Python, possibly avoiding os.system calls or external scripts, but if there is no other way, any suggestion is welcome.
I also tried with
"//user:password#hostname"
and
"//domain;user:password#hostname"
both with and without smb
Well, I found a way myself using SMB:
First you need to install pysmb (pip install pysmb), then:
from smb.SMBConnection import SMBConnection
conn = SMBConnection(user, password, "my_name", server, domain=domain, use_ntlm_v2 = True)
conn.connect(ip_server)
conn.createDirectory(shared_folder, sub_directory)
file_obj = open(local_path_file,'rb')
conn.storeFile(shared_folder, sub_directory+"/"+filename, file_obj)
file_obj.close()
In my case sub_directory is a whole path, so I need to create each folder one by one (createDirectory only works that way), and each time I need to check whether the directory already exists, because otherwise createDirectory raises an exception; see the sketch below.
I hope my solution can be useful for others.
If anybody finds a better solution, please answer...
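A rough sketch of that directory-creation loop (not part of the original answer; it assumes the conn and shared_folder objects from the code above, and that pysmb raises smb.smb_structs.OperationFailure when the directory already exists):
from smb.smb_structs import OperationFailure

def make_dirs(conn, shared_folder, sub_directory):
    path = ''
    for part in sub_directory.strip('/').split('/'):
        path = path + '/' + part if path else part
        try:
            conn.createDirectory(shared_folder, path)
        except OperationFailure:
            pass  # this level probably exists already

make_dirs(conn, shared_folder, sub_directory)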

Reading appengine backup_info file gives EOFError

I'm trying to inspect my appengine backup files to work out when a data corruption occurred. I used gsutil to locate and download the file:
gsutil ls -l gs://my_backup/ > my_backup.txt
gsutil cp gs://my_backup/LongAlphaString.Mymodel.backup_info file://1.backup_info
I then created a small python program, attempting to read the file and parse it using the appengine libraries.
#!/usr/bin/python
APPENGINE_PATH = '/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/'
ADDITIONAL_LIBS = [
    'lib/yaml/lib'
]
import sys
sys.path.append(APPENGINE_PATH)
for l in ADDITIONAL_LIBS:
    sys.path.append(APPENGINE_PATH + l)
import logging
from google.appengine.api.files import records
import cStringIO

def parse_backup_info_file(content):
    """Returns entities iterator from a backup_info file content."""
    reader = records.RecordsReader(cStringIO.StringIO(content))
    version = reader.read()
    if version != '1':
        raise IOError('Unsupported version')
    return (datastore.Entity.FromPb(record) for record in reader)

INPUT_FILE_NAME = '1.backup_info'
f = open(INPUT_FILE_NAME, 'rb')
f.seek(0)
content = f.read()
records = parse_backup_info_file(content)
for r in records:
    logging.info(r)
f.close()
The code for parse_backup_info_file was copied from
backup_handler.py
When I run the program, I get the following output:
./view_record.py
Traceback (most recent call last):
  File "./view_record.py", line 30, in <module>
    records = parse_backup_info_file(content)
  File "./view_record.py", line 19, in parse_backup_info_file
    version = reader.read()
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/api/files/records.py", line 335, in read
    (chunk, record_type) = self.__try_read_record()
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/api/files/records.py", line 307, in __try_read_record
    (length, len(data)))
EOFError: Not enough data read. Expected: 24898 but got 2112
I've tried with a half a dozen different backup_info files, and they all show the same error (with different numbers.)
I had noticed that they all have the same expected length, but I was reviewing different versions of the same model when I made that observation; it's not true when I view the backup files of other modules.
EOFError: Not enough data read. Expected: 24932 but got 911
EOFError: Not enough data read. Expected: 25409 but got 2220
Is there anything obviously wrong with my approach?
I guess the other option is that the appengine backup utility is not creating valid backup files.
Anything else you can suggest would be very welcome.
Thanks in Advance
There are multiple metadata files created when an AppEngine Datastore backup is run:
LongAlphaString.backup_info is created once. This contains metadata about all of the entity types and backup files that were created in datastore backup.
LongAlphaString.[EntityType].backup_info is created once per entity type. This contains metadata about the specific backup files created for [EntityType], along with schema information for the [EntityType].
Your code works for interrogating the file contents of LongAlphaString.backup_info, but it seems that you are trying to interrogate the file contents of LongAlphaString.[EntityType].backup_info. Here's a script that will print the contents in a human-readable format for each file type:
import cStringIO
import os
import sys

sys.path.append('/usr/local/google_appengine')

from google.appengine.api import datastore
from google.appengine.api.files import records
from google.appengine.ext.datastore_admin import backup_pb2

ALL_BACKUP_INFO = 'long_string.backup_info'
ENTITY_KINDS = ['long_string.entity_kind.backup_info']

def parse_backup_info_file(content):
    """Returns entities iterator from a backup_info file content."""
    reader = records.RecordsReader(cStringIO.StringIO(content))
    version = reader.read()
    if version != '1':
        raise IOError('Unsupported version')
    return (datastore.Entity.FromPb(record) for record in reader)

print "*****" + ALL_BACKUP_INFO + "*****"
with open(ALL_BACKUP_INFO, 'r') as myfile:
    parsed = parse_backup_info_file(myfile.read())
    for record in parsed:
        print record

for entity_kind in ENTITY_KINDS:
    print os.linesep + "*****" + entity_kind + "*****"
    with open(entity_kind, 'r') as myfile:
        backup = backup_pb2.Backup()
        backup.ParseFromString(myfile.read())
        print backup
