Using Python 3.4
I am generating files named like 'Report_XXXXXX.xlsx', with X being a unique customer number. I have a server with folders named 'XXXXXX.CustomerName'. I am trying to loop through each report and upload it to the matching folder based on the customer number. I have something working in my small test environment, but I don't think it works the way I want. It uploads the files, but I am also trying to catch any file that doesn't find a match, and currently the if statement fails for every file. I think I am looping too many times, or over the wrong items.
import os
import ftplib

creds = [line.rstrip('\n') for line in open('C:\\folder\\credentials.txt')]

ftp = ftplib.FTP_TLS("server.com")
ftp.login(creds[0], creds[1])
ftp.prot_p()

src_dir = 'C:\\Reports\\'
src_files = os.listdir('C:\\Reports\\')

for folder_name in ftp.nlst():
    for file_name in src_files:
        if folder_name[0:6] == file_name[7:-5]:
            ftp.cwd('/' + folder_name)
            open_file = open(src_dir + file_name, 'rb')
            ftp.storbinary('STOR ' + file_name, open_file)
            open_file.close()
        else:
            print('Folder ' + folder_name + ' Not Found')

ftp.quit()
So for example the source directory has 3 files: 'Report_100002.xlsx', 'Report_100003.xlsx', 'Report_100007.xlsx'. The server has matching folders plus a few extra folders. The files upload, and the output looks like this:
Folder 100000.CustomerName Not Found
Folder 100000.CustomerName Not Found
Folder 100000.CustomerName Not Found
Folder 100002.CustomerName Not Found
Folder 100002.CustomerName Not Found
Folder 100003.CustomerName Not Found
Folder 100003.CustomerName Not Found
Folder 100007.CustomerName Not Found
Folder 100007.CustomerName Not Found
I am trying to get to a state where I can properly log each item: whether it was a success, what folder it landed in, and so on.
In your inner for loop you compare all 3 file names in src_dir with folder_name, but at most one satisfies the condition in your if statement. The other 2 or 3 files that don't match produce the output you are seeing, for every folder on the FTP server. You could use a flag to keep track of whether a match was found and print your output based on that flag.
Another thing: you should iterate over src_files first and then find matching folder names by iterating over ftp.nlst() (you are interested in source files that don't have a matching folder, not the other way around). So something like this (assuming a source file is allowed to end up in multiple folders):
....
folder_names = ftp.nlst()
for file_name in src_files:
    folder_found = False
    for folder_name in folder_names:
        if folder_name[0:6] == file_name[7:-5]:
            folder_found = True
            ftp.cwd('/' + folder_name)
            open_file = open(src_dir + file_name, 'rb')
            ftp.storbinary('STOR ' + file_name, open_file)
            open_file.close()
    if not folder_found:
        print('No destination folder found for ' + file_name)
ftp.quit()
(the folder_names = ftp.nlst() is there so you don't repeatedly list the directories on the server)
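For the logging goal, one further option is to key the folder list by customer number in a dict, so each file needs only a single lookup. This is just a sketch of the matching logic with invented folder and file names, and no FTP calls:

```python
# Map customer number -> folder name, e.g. '100002' -> '100002.CustomerName'
folders = ['100000.CustomerA', '100002.CustomerB', '100003.CustomerC']
files = ['Report_100002.xlsx', 'Report_100003.xlsx', 'Report_100007.xlsx']

folder_by_number = {name[0:6]: name for name in folders}

log = []
for file_name in files:
    number = file_name[7:-5]              # 'Report_XXXXXX.xlsx' -> 'XXXXXX'
    folder = folder_by_number.get(number)
    if folder:
        # here you would cwd into folder and storbinary the file
        log.append((file_name, 'uploaded to ' + folder))
    else:
        log.append((file_name, 'no matching folder'))

for entry in log:
    print(entry)
```

With real data the `log` list could be written to a file at the end of the run, giving one line per report instead of one line per folder.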
I would like to upload files that users drop into a shared folder to an FTP site. Only certain files must be uploaded, based on a pattern in the filename, and that part works. I would also like to avoid uploading files that have been uploaded in the past. A simple solution would be to move the files to a subdirectory once uploaded, but the users wish for the files to remain where they are.
I was thinking of writing each filename to a text file as the loop uploads it. Populating the text file works.
Excluding directories with os.walk is mentioned in many articles and I can get that to work fine, but excluding a list of filenames seems to be a bit more obscure.
This is what I have so far:
import ftplib
import os
import os.path
import fnmatch

## set local path variables
dir = 'c:/Temp'
hist_path = 'C:/Temp/hist.txt'
pattern = '*SomePattern*'

## make the ftp connection and set appropriate working directory
ftp = ftplib.FTP('ftp.someserver.com')
ftp.login('someuser', 'somepassword')
ftp.cwd('somedirectory')

## make a list of previously uploaded files
hist_list = open(hist_path, 'r')
hist_content = hist_list.read()
# print(hist_content)

## loop through the files and upload them to the FTP as above
for root, dirs, files in os.walk(dir):
    for fname in fnmatch.filter(files, pattern):  # this filters for filenames that include the pattern
        ## upload each file to the ftp
        os.chdir(dir)
        full_fname = os.path.join(root, fname)
        ftp.storbinary('STOR ' + fname, open(full_fname, 'rb'))
        ## add an entry for each file into the historical uploads log
        f = open(hist_path, 'a')
        f.write(fname + '\n')
        f.close()
Any help would be appreciated
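One possible approach for the skip-list, sketched with a throwaway history file rather than the real C:/Temp/hist.txt: load the log into a set and test membership before uploading. The FTP call is represented by a list append here, and the filenames are invented:

```python
import os
import tempfile

# Simulate a history file with one previously uploaded filename per line
hist_path = os.path.join(tempfile.mkdtemp(), 'hist.txt')
with open(hist_path, 'w') as f:
    f.write('old_SomePattern_report.csv\n')

# Load the history into a set for fast membership tests
with open(hist_path) as f:
    uploaded = set(line.strip() for line in f)

to_upload = []
for fname in ['old_SomePattern_report.csv', 'new_SomePattern_report.csv']:
    if fname in uploaded:
        continue                      # already sent on a previous run, skip it
    to_upload.append(fname)           # here you would call ftp.storbinary(...)
    with open(hist_path, 'a') as f:
        f.write(fname + '\n')         # record it so the next run skips it

print(to_upload)
```

Loading the file once into a set, instead of calling `read()` and searching the string, also avoids false matches between filenames that are substrings of each other.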
How do I avoid searching a folder? This script goes through every folder and searches it for your file. How do I avoid searching Applications, or only search the folders I tell it to? I've been trying for at least 3 hours.
import getpass
import os

user_path = "/Users/" + getpass.getuser()
FileName = input("file name, please, including the extension: ")
print("working?")
for folder, sub_folder, files in os.walk(user_path):
    print(f"folder is {folder}")
    for sub_fold in sub_folder:
        print(f"sub folder is {sub_fold}")
    for f in files:
        print(f"file: {f}")
        if FileName == f:
            print("file found")
            print(os.path.abspath(os.path.join(folder, f)))
Create an array with the folders to be excluded. When the loop enters a folder, check whether that folder name is in the array created above; if it is, just ignore it.
I made a sample code. Please check and respond to me.
import os

ext = ["a", "b", "c"]  # I assume these are unnecessary folders.

for folder, sub_folder, files in os.walk(user_path):
    print(f"folder is {folder}")
    for sub_fold in sub_folder:
        if sub_fold in ext:
            continue
        print(f"sub folder is {sub_fold}")
    for f in files:
        print(f"file: {f}")
        if FileName == f:
            print("file found")
            print(os.path.abspath(os.path.join(folder, f)))
os.walk walks the entire directory tree, presenting the current directory, its immediate subfolders and its immediate files on each iteration. As long as you are walking top-down (the default) you can stop a subfolder from being iterated by removing it from the folders list. In this example, I made the blacklist a canned list in the source, but you could prompt for it if you'd like. On each folder iteration all you need to do is see if the wanted filename is in the list of file names in that iteration.
import getpass
import os

# blacklist folder paths relative to user_path
blacklist = ["Applications", "Downloads"]

# get user root and fix blacklist
# user_path = ("/Users/" + getpass.getuser())
user_path = os.path.expanduser("~")
blacklist = [os.path.join(user_path, name) for name in blacklist]

FileName = input("file name, please, including the extension: ")
print("working?")

for folder, sub_folders, files in os.walk(user_path):
    # eliminate top-level folders and their subfolders with in-place
    # removal of subfolders
    if folder in blacklist:
        del sub_folders[:]
        continue
    # in-place removal of blacklisted folders below top level
    for sub_folder in sub_folders[:]:
        if os.path.join(folder, sub_folder) in blacklist:
            sub_folders.remove(sub_folder)
    if FileName in files:
        print("file found")
        print(os.path.abspath(os.path.join(folder, FileName)))
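The in-place pruning is easy to verify without touching a real home folder. This self-contained sketch builds a throwaway tree (the folder names are invented) and shows that os.walk never descends into a pruned directory:

```python
import os
import tempfile

# Build a small tree: root/keep/wanted.txt and root/Applications/secret.txt
root = tempfile.mkdtemp()
for sub in ('keep', 'Applications'):
    os.makedirs(os.path.join(root, sub))
open(os.path.join(root, 'keep', 'wanted.txt'), 'w').close()
open(os.path.join(root, 'Applications', 'secret.txt'), 'w').close()

blacklist = {os.path.join(root, 'Applications')}
seen = []
for folder, sub_folders, files in os.walk(root):
    # in-place removal: os.walk will not descend into the removed entries
    sub_folders[:] = [s for s in sub_folders
                      if os.path.join(folder, s) not in blacklist]
    seen.extend(files)

print(seen)
```

The slice assignment `sub_folders[:] = ...` is what matters: rebinding the name with `sub_folders = ...` would not change the list os.walk holds internally.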
I am doing a school assignment where I have to take input from a user and save it to a text file.
My file structure will be something like:
- Customer register
    - Customer ID
        - .txt files 1-5
It can be saved in the python folder and I can make the folders like this:
os.makedirs("Customer register/Customer ID")
My question is: how do I set the path where the text files are stored when I don't know the directory, so that no matter where the program is run, they are saved in the "Customer ID" folder I create (on whatever computer the program is run on)?
Also, how do I make this work on both Windows and Mac?
I also want the program to be able to be executed several times, checking whether the folder is there and saving to the "Customer ID" folder if it already exists. Is there a way to do that?
EDIT:
This is the code I am trying to use:
try:
    dirs = os.makedirs("Folder")
    path = os.getcwd()
    os.chdir(path + "/Folder")
    print(os.getcwd())
except:
    if os.path.exists:
        path = os.getcwd()

unique_filename = str(uuid.uuid4())
customerpath = os.getcwd()
os.chdir(customerpath + "/Folder/" + unique_filename)
I am able to create a folder and change the directory (everything in "try" works as I want).
When this folder is created I want to create a second folder with a randomly generated name (used for saving customer files). I can't get this to work the same way.
Error:
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'C:\Users\48736\PycharmProjects\tina/Folder/979b9026-b2f6-4526-a17a-3b53384f60c4'
EDIT 2:
try:
    os.makedirs("Folder")
    path = os.getcwd()
    os.chdir(path + "/Folder")
    print(os.getcwd())
except:
    if os.path.exists:
        path = os.getcwd()
        os.chdir(os.path.join(path, 'Folder'))
        print(os.getcwd())

def userId(folderid):
    try:
        if not os.path.exists(folderid):
            os.makedirs(folderid)
    except:
        if os.path.exists(folderid):
            os.chdir(path + "/Folder/" + folderid)

userId(str(uuid.uuid4()))
print(os.getcwd())
So I can now create a folder, change directory to the folder I have created and create a new folder with a unique filename within that folder.
But I can't change the directory again to the folder with the unique filename.
Any suggestions?
I have tried:
os.chdir(path + "/Folder/" + folderid)
os.chdir(path, 'Folder', folderid)
os.chdir(os.path.join(path, 'Folder', folderid))
But it still just stays in: C:\Users\47896\PycharmProjects\tina\Folder
You can use relative paths in your create-directory command, i.e.
os.makedirs("./Customer register/Customer ID")
to create the folder in the project root (where the primary caller is located), or
os.makedirs("../Customer register/Customer ID")
to create it in the parent directory.
You can, of course, traverse the file tree as you need.
For the specific options mentioned in your question, please see the makedirs documentation in the Python 3 docs.
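A sketch of the whole flow using os.path.join (which handles the Windows/Mac separator difference) and makedirs with exist_ok=True (which covers running the program repeatedly). A temporary directory stands in for the real project folder here:

```python
import os
import tempfile

# In the real program you would anchor on the script's own location:
#   base = os.path.dirname(os.path.abspath(__file__))
# For this self-contained demo, a throwaway directory stands in for it.
base = tempfile.mkdtemp()

target = os.path.join(base, 'Customer register', 'Customer ID')
os.makedirs(target, exist_ok=True)   # creates all levels, no error if present
os.makedirs(target, exist_ok=True)   # safe to run again on a later execution

with open(os.path.join(target, 'customer1.txt'), 'w') as f:
    f.write('some customer data\n')

print(sorted(os.listdir(target)))
```

With exist_ok=True (available since Python 3.2) there is no need for the try/except around makedirs at all, and no need to chdir anywhere: just build the full file path with os.path.join and open it directly.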
Here is a solution:
import os
import shutil
import uuid

path_on_system = os.getcwd()  # directory where you want to save data
path = r'Folder'              # your working directory
dir_path = os.path.join(path_on_system, path)

if not os.path.exists(dir_path):
    os.makedirs(dir_path)

file_name = str(uuid.uuid4())  # file which you have created
if os.path.exists(file_name) and os.path.exists(dir_path):
    shutil.move(file_name, os.path.join(dir_path, file_name))
else:
    print("{} does not exist".format(file_name))
I'm trying to create a Python script that will find all the files in a working directory with a certain name pattern.
I have stored all the file names in a list and then tried applying the re.findall method to the list to obtain only the files with that name pattern.
I have written this code:
# Create the regex object that we will use to find our files
fileRegex = re.compile(r'A[0-9]*[a-z]*[0-9]*.*')
all_files = []

# Recursively read the contents of the working_dir/Main folder:
for folderName, subfolders, filenames in os.walk(working_directory + "/Main"):
    for filename in filenames:
        all_files.append(filename)

found_files = fileRegex.findall(all_files)
I get this error at the last line of the code:
TypeError: expected string or bytes-like object
I have also tried re.findall(all_files) instead of using the 'fileRegex' created prior to that line. Same error. Please tell me what I am doing wrong. Thank you so much for reading my post!
Edit(second question):
I have followed the answers and it's now working fine. Now I'm trying to create an archive with the files that match the pattern. The archive is created, but the way I wrote the code, the whole path to each file gets included in the archive (all the folders from / up to the file). I just want the file itself in the final .zip, not the whole chain of directories and subdirectories leading to it.
Here is the code; the generation of the .zip file is at the bottom. Please give me a tip on how to solve this; I've tried many things but none worked. Thanks:
# Project properties:
# Recursively read the contents of the 'Main' folder, which contains files with different names.
# Select only the files whose name begins with the letter A and contains digits. Use regexes for this.
# Archive these files in a folder named 'Created_Archive' in the project directory. Give the archive a name of your choosing.
# Files that you should find are:
# Aerials3.txt, Albert0512.txt, Alberto1341.txt
########################################################################################################################################
import os
import re
import zipfile
from pathlib import Path

# Get to the proper working directory
working_directory = os.getcwd()
if working_directory != "/home/paul/Desktop/Python_Tutorials/Projects/Files_And_Archive":
    working_directory = "/home/paul/Desktop/Python_Tutorials/Projects/Files_And_Archive"
    os.chdir(working_directory)

check_archive = Path(os.getcwd() + "/" + "files.zip")
if check_archive.is_file():
    print("Yes. Deleting it and creating it.")
    os.unlink(os.getcwd() + "/" + "files.zip")
else:
    print("No. Creating it.")

# Create the regex object that we will use to find our files
fileRegex = re.compile(r'A[0-9]*[a-z]*[0-9]+.*')
found_files = []

# Create the zipfile object that we will use to create our archive
fileArchive = zipfile.ZipFile('files.zip', 'a')

# Recursively read the contents of the working_dir/Main folder:
for folderName, subfolders, filenames in os.walk(working_directory + "/Main"):
    for filename in filenames:
        if fileRegex.match(filename):
            found_files.append(folderName + "/" + filename)

# Check all files have been found and create the archive. If the archive
# already exists, delete it.
for file in found_files:
    print(file)
    fileArchive.write(file, compress_type=zipfile.ZIP_DEFLATED)
fileArchive.close()
re.findall works on strings, not on lists, so it's better to use fileRegex.match on each element of the list to keep the ones that actually match:
found_files = [s for s in all_files if fileRegex.match(s)]
Regexes work on strings, not lists. The following works:
import re
import os

# Create the regex object that we will use to find our files
# fileRegex = re.compile(r'A[0-9]*[a-z]*[0-9]*.*')
fileRegex = re.compile(r'.*\.py')

all_files = []
found_files = []
working_directory = r"C:\Users\michael\PycharmProjects\work"

# Recursively read the contents of the working_dir/Main folder:
for folderName, subfolders, filenames in os.walk(working_directory):
    for filename in filenames:
        all_files.append(filename)
        if fileRegex.search(filename):
            found_files.append(filename)

print('all files\n', all_files)
print('\nfound files\n', found_files)
re.findall doesn't take a list of strings. You need re.match.
# Create the regex object that we will use to find our files
fileRegex = re.compile(r'A[0-9]*[a-z]*[0-9]*.*')
all_files = []

# Recursively read the contents of the working_dir/Main folder:
for folderName, subfolders, filenames in os.walk(working_directory + "/Main"):
    for filename in filenames:
        all_files.append(filename)

found_files = [file_name for file_name in all_files if fileRegex.match(file_name)]
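On the edit about the archive containing the whole directory tree: ZipFile.write accepts an arcname argument that sets the name stored inside the archive, so passing just the base name flattens the layout. A self-contained sketch with throwaway files (the nested path and file name are invented):

```python
import os
import tempfile
import zipfile

# Create a file inside a nested directory, like working_dir/Main/sub/Aerials3.txt
root = tempfile.mkdtemp()
nested = os.path.join(root, 'Main', 'sub')
os.makedirs(nested)
file_path = os.path.join(nested, 'Aerials3.txt')
with open(file_path, 'w') as f:
    f.write('data')

zip_path = os.path.join(root, 'files.zip')
with zipfile.ZipFile(zip_path, 'w') as archive:
    # arcname controls the path stored in the archive: keep only the file's own name
    archive.write(file_path,
                  arcname=os.path.basename(file_path),
                  compress_type=zipfile.ZIP_DEFLATED)

with zipfile.ZipFile(zip_path) as archive:
    print(archive.namelist())
```

Note that if two matching files in different subfolders share a name, flattening with basename will store duplicate entries, so a prefix or counter may be needed in that case.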
I need to get the file with the biggest size in each of several folders, rename it to the name of the folder it belongs to, and save it to a new folder. I have something like this and I got stuck:
import os

# Core settings
rootdir = 'C:\\Users\\X\\Desktop\\humps'
to_save = 'C:\\Users\\X\\Desktop\\new'

for root, dirs, files in os.walk(rootdir):
    new_list = []
    for file in files:
        if file.endswith(".jpg"):
            try:
                print(file)
                os.chdir(to_save)
                add_id = root.split("humps\\")[1]
                add_id = add_id.split("\\")[0]
                file_name = os.path.join(root, file)
                new_list.append(file_name)
                bigfile = max(new_list, key=lambda x: x.stat().st_size)
            except:
                pass
To make it clearer: say the name of the sub-folder is "elephant", and there are various elephant photos and subfolders inside that elephant folder. I want to go through those photos and subfolders, find the elephant photo with the biggest size, name it "elephant", and save it to my target folder. Then repeat for the other sub-folders such as lion, puma, etc.
How could I achieve what I want?
To find the biggest file and save it to another location:
import os
import shutil

f_list = []
root = "path/to/directory"
root = os.path.abspath(root)

for folder, subfolders, files in os.walk(root):
    for file in files:
        filePath = os.path.join(folder, file)
        f_list.append(filePath)

biggest_file = max(f_list, key=os.path.getsize)

new_path = "path/where/you/want/to/save"
shutil.copy(biggest_file, new_path)
If you want only images, then add one more condition in the loop:
for folder, subfolders, files in os.walk(root):
    for file in files:
        if file.endswith(".jpg"):
            filePath = os.path.join(folder, file)
            f_list.append(filePath)
To get every folder's biggest file:
root = "demo"
root = os.path.abspath(root)

def test(path):
    big_files = []
    all_paths = [x[0] for x in os.walk(path)]
    for paths in all_paths:
        f_list = [os.path.join(paths, f) for f in os.listdir(paths)
                  if os.path.isfile(os.path.join(paths, f))]
        if len(f_list) > 0:
            big_files.append((paths, max(f_list, key=os.path.getsize)))
    return big_files

print(test(root))
How to get the files with the biggest size in the folders, change their name and save to a different folder
Basically you already have a good description of what you need to do. You just need to follow it step by step:
get all files in some search directory
filter for relevant files ("*.jpg")
get their sizes
find the maximum
copy to new directory with name of search directory
IMO it's an important skill to be able to break down a task into smaller tasks. Then, you just need to implement the smaller tasks and combine:
def iterate_files_recursively(directory="."):
    for entry in os.scandir(directory):
        if entry.is_dir():
            for file in iterate_files_recursively(entry.path):
                yield file
        else:
            yield entry

files = iterate_files_recursively(subfolder_name)
I'd use os.scandir because it avoids building up a (potentially) huge list of files in memory and instead allows me (via a generator) to work one file at a time. Note that starting with 3.6 you can use the result of os.scandir as a context manager (with syntax).
images = itertools.filterfalse(lambda f: not f.path.endswith('.jpg'), files)
Filtering is relatively straightforward except for the IMO strange choice of itertools.filterfalse to only keep elements for which its predicate returns False.
biggest = max(images, key=(lambda img: img.stat().st_size))
This is two steps in one: Get the maximum with the builtin max function, and use the file size as "key" to establish an order. Note that this raises a ValueError if you don't have any images ... so you might want to supply default=None or handle that exception.
shutil.copy(biggest.path, os.path.join(target_directory, subfolder_name + '.jpg'))
shutil.copy copies the file and some metadata. Instead of hardcoding path separators, please use os.path.join!
Now all of this assumes that you know the subfolder_name. You can scan for those easily, too:
def iterate_directories(directory='.'):
    for entry in os.scandir(directory):
        if entry.is_dir():
            yield entry
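Stitched together, the fragments above run end to end. This sketch uses a temporary tree and invented file names in place of real folders, and supplies default=None to max as suggested:

```python
import itertools
import os
import shutil
import tempfile

def iterate_files_recursively(directory="."):
    # recursively yield os.DirEntry objects for every file under directory
    for entry in os.scandir(directory):
        if entry.is_dir():
            for file in iterate_files_recursively(entry.path):
                yield file
        else:
            yield entry

# Throwaway tree standing in for a real search directory
root = tempfile.mkdtemp()
target_directory = tempfile.mkdtemp()
subfolder = os.path.join(root, 'elephant')
os.makedirs(subfolder)
with open(os.path.join(subfolder, 'small.jpg'), 'wb') as f:
    f.write(b'x' * 10)
with open(os.path.join(subfolder, 'big.jpg'), 'wb') as f:
    f.write(b'x' * 500)

files = iterate_files_recursively(subfolder)
images = itertools.filterfalse(lambda f: not f.path.endswith('.jpg'), files)
biggest = max(images, key=lambda img: img.stat().st_size, default=None)
if biggest:
    # copy under the subfolder's name, as described above
    shutil.copy(biggest.path, os.path.join(target_directory, 'elephant.jpg'))

print(os.listdir(target_directory))
```

Because everything is a generator until max is called, only one DirEntry at a time is held in memory, which is the point of preferring os.scandir over building a full list.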
Here's some code that does what you want. Instead of using the old os.walk function, it uses modern pathlib functions.
The heart of this code is the recursive biggest function. It scans all the files and directories in folder, saving the matching file names to the files list, and recursively searching any directories it finds. It then returns the path of the largest file that it finds, or None if no matching files are found.
from pathlib import Path
import shutil

def filesize(path):
    return path.stat().st_size

def biggest(folder, pattern):
    ''' Find the biggest file in folder that matches pattern.
        Search recursively in all subdirectories.
    '''
    files = []
    for f in folder.iterdir():
        if f.is_file():
            if f.match(pattern):
                files.append(f)
        elif f.is_dir():
            found = biggest(f, pattern)
            if found:
                files.append(found)
    if files:
        return max(files, key=filesize)

def copy_biggest(src, dest, pattern):
    ''' Find the biggest file in each folder in src that matches pattern
        and copy it to dest, using the folder's name as the new file name.
    '''
    for path in src.iterdir():
        if path.is_dir():
            found = biggest(path, pattern)
            if found:
                newname = dest / path.name
                print(path, ':', found, '->', newname)
                shutil.copyfile(found, newname)
You can call it like this:
rootdir = r'C:\Users\X\Desktop\humps'
to_save = r'C:\Users\X\Desktop\new'
copy_biggest(Path(rootdir), Path(to_save), '*.jpg')
Note that the copied files will have the same name as the top-level folder in rootdir that they were found in, with no file extension. If you want to give them a .jpg extension, you can change
newname = dest / path.name
to
newname = (dest / path.name).with_suffix('.jpg')
The shutil module on older versions of Python 3 doesn't understand pathlib paths. But that's easy enough to remedy. In the copy_biggest function, replace
shutil.copyfile(found, newname)
with
shutil.copyfile(str(found), str(newname))
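A quick self-contained check of the copy-the-biggest idea, using temporary directories and invented file names, and pathlib's rglob in place of the explicit recursion above:

```python
import shutil
import tempfile
from pathlib import Path

# Build humps/elephant with two jpgs of different sizes, plus a target folder
src = Path(tempfile.mkdtemp())
dest = Path(tempfile.mkdtemp())
(src / 'elephant').mkdir()
(src / 'elephant' / 'small.jpg').write_bytes(b'x' * 10)
(src / 'elephant' / 'big.jpg').write_bytes(b'x' * 100)

for sub in src.iterdir():
    if sub.is_dir():
        jpgs = list(sub.rglob('*.jpg'))          # recursive search of the subfolder
        if jpgs:
            biggest = max(jpgs, key=lambda p: p.stat().st_size)
            # copy under the folder's name, as copy_biggest does above
            shutil.copyfile(str(biggest), str(dest / (sub.name + '.jpg')))

print(sorted(p.name for p in dest.iterdir()))
```

rglob trades the explicit recursion for a one-liner at the cost of always building a generator over the whole subtree; for very deep trees the hand-written recursion gives more control.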