Skip list of filenames from txt file with os.walk - python

I would like to upload files that users drop into a shared folder to an FTP site. Only certain files should be uploaded, based on a pattern in the filename, and that part works. I would also like to avoid re-uploading files that have been uploaded in the past. A simple solution would be to move the files to a subdirectory once uploaded, but users wish for the files to remain where they are.
I was thinking of writing each filename to a text file as the loop uploads it. Populating the text file works.
Excluding directories with os.walk is mentioned in many articles and I can get that to work fine, but excluding a list of filenames seems to be a bit more obscure.
This is what I have so far:
import ftplib
import os
import os.path
import fnmatch
## set local path variables
dir = 'c:/Temp'
hist_path = 'C:/Temp/hist.txt'
pattern = '*SomePattern*'
## make the ftp connection and set appropriate working directory
ftp = ftplib.FTP('ftp.someserver.com')
ftp.login('someuser', 'somepassword')
ftp.cwd('somedirectory')
## make a list of previously uploaded files
hist_list = open(hist_path, 'r')
hist_content = hist_list.read()
# print(hist_content)
## loop through the files and upload them to the FTP as above
for root, dirs, files in os.walk(dir):
    for fname in fnmatch.filter(files, pattern):  # this filters for filenames that include the pattern
        ## upload each file to the ftp
        os.chdir(dir)
        full_fname = os.path.join(root, fname)
        ftp.storbinary('STOR ' + fname, open(full_fname, 'rb'))
        ## add an entry for each file into the historical uploads log
        f = open(hist_path, 'a')
        f.write(fname + '\n')
        f.close()
Any help would be appreciated.
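One way to get the exclusion working, sketched here as a suggestion rather than a tested answer: read hist.txt into a set up front and skip any filename already in it. The variables reuse the names from the script above; the set-based lookup and the with-blocks are my additions.
import ftplib
import fnmatch
import os

dir = 'c:/Temp'
hist_path = 'C:/Temp/hist.txt'
pattern = '*SomePattern*'

# read previously uploaded filenames into a set for fast membership tests
with open(hist_path, 'r') as hist_file:
    uploaded = set(line.strip() for line in hist_file)

ftp = ftplib.FTP('ftp.someserver.com')
ftp.login('someuser', 'somepassword')
ftp.cwd('somedirectory')

for root, dirs, files in os.walk(dir):
    for fname in fnmatch.filter(files, pattern):
        if fname in uploaded:
            continue  # already uploaded on a previous run
        full_fname = os.path.join(root, fname)
        with open(full_fname, 'rb') as fobj:
            ftp.storbinary('STOR ' + fname, fobj)
        # record the upload so future runs skip this file
        with open(hist_path, 'a') as hist_file:
            hist_file.write(fname + '\n')
        uploaded.add(fname)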

Related

How to modify this script so that all of my files are not deleted when trying to delete files that do not have XML files with them?

I am trying to delete all .JPG files that do not have .xml files with the same name attached to them. However, when I run this script, all of my files are deleted in my directory and not just the desired images. How can I change this script so that I can just delete the images without corresponding .xml files?
Note: The only files I have in the directory are .JPG and .XML
import os
from tqdm import tqdm
path = 'C:\\users\\my_username\\path_to_directory_with_xml_and_jpg_images'
files = os.listdir(path)
for file in tqdm(files):
    filename, filetype = file.split('.')
    if filetype == 'xml':
        continue
    imgfile = os.path.join(path, file)
    xmlfile = os.path.join(path, filename + '.xml')
    if not os.path.exists(xmlfile):
        print('{} deleted.'.format(imgfile))
        os.remove(imgfile)
It's hard to tell why your code doesn't work as we don't know the exact contents of the directory. But a simpler way to do what you want could be to use the amazing pathlib library (Python >= 3.4). The method Path.with_suffix() will make the task quite easy, together with Path.glob():
from pathlib import Path
path = Path('C:\\users\\my_username\\path_to_directory_with_xml_and_jpg_images')
for imgfile in path.glob("*.jpg"):
    xmlfile = imgfile.with_suffix(".xml")
    if not xmlfile.exists():
        imgfile.unlink()
        print(imgfile, 'deleted.')

Iterate over files over several directories to extract data

I have a series of files that are nested as shown in the attached image. For each "inner" folder (e.g. like the 001717528 one), I want to extract a row of data from each of the FITS files and create a CSV file that contains all the rows, and name that CSV file after the name of the "inner" folder (e.g. 001717528.csv containing data from the 18 fits files). The data-extracting part is easy, but I have trouble coding the iteration.
I don't really know how to iterate over both the outer folders such as the 0017 and inner folders, and name the csv files as I want.
My code is looking like this:
for subdir, dirs, files in os.walk('../kepler'):
    for file in files:
        filepath = subdir + os.sep + file
        if filepath.endswith(".fits"):
            # extract data
            # write to csv file
Apparently this will iterate over all files in the kepler folder so it doesn't work.
If you need to keep track of how far you've walked into the directory structure, you can count the file path delimiter (os.sep). In your case it's / because you're on a Mac.
for path, dirs, _ in os.walk("../kepler"):
    if path.count(os.sep) == 2:
        # path should be ../kepler/0017
        for dir in dirs:
            filename = dir + ".csv"
            data_files = os.listdir(path + os.sep + dir)
            for file in data_files:
                if file.endswith(".fits"):
                    # Extract data
                    # Write to CSV file
As far as I can tell this meets your requirements, but let me know if I've missed something.
Try this code; it should print the file path of all your ".fits" files:
#!/usr/bin/python
import os
base_dir = './test'
for root, dirs, files in os.walk(base_dir, topdown=False):
    for name in files:
        if name.endswith(".fits"):
            file_path = os.path.join(root, name)  # path of the file
            print(file_path)
            # do your treatment on file_path
All you have to do is add your specific treatment.
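Putting the two answers together, here is a minimal sketch of the per-folder CSV naming; the extract_row() helper is a hypothetical placeholder for the data-extraction step, not something from the question:
import csv
import os

def extract_row(fits_path):
    # placeholder: replace with the real FITS-reading logic
    # (e.g. astropy.io.fits) that returns one row of values
    return [fits_path]

base_dir = '../kepler'
for root, dirs, files in os.walk(base_dir):
    fits_files = sorted(f for f in files if f.endswith('.fits'))
    if not fits_files:
        continue  # outer folders hold no FITS files, so they are skipped
    inner_name = os.path.basename(root)  # e.g. 001717528
    csv_path = os.path.join(root, inner_name + '.csv')
    with open(csv_path, 'w', newline='') as out:
        writer = csv.writer(out)
        for fname in fits_files:
            writer.writerow(extract_row(os.path.join(root, fname)))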

Pulling out the files from directories by connecting to ftp using python

Task: I need to connect to the client's FTP, where we have many directories, and each directory may or may not have .csv files in it. I need to go into each directory, open the files in all the directories, and if a file matches the given format, dump it to a server.
Presently I'm able to connect to the FTP and do this much:
I'm able to get the list of directories, but not the files inside each directory.
from ftplib import FTP
from sqlalchemy import create_engine
import os
import sys
import os.path
ftp = FTP('host')
ftp.login('user', 'pwd')
for files in ftp.dir():
    filenames = ftp.nlst(files)
    ftp.retrbinary("RETR " + a, file.write)
    file.close()
ftp.close()  # CLOSE THE FTP CONNECTION
print "FTP connection closed. Goodbye"
I know that is not at all up to the mark.
Looks like you are looking for a way to get a list of the files in a given directory. Here is a function I often use to solve this task on Unix systems (macOS included). It should be a good starting point, if not the final solution you are looking for.
import glob, os
def list_of_files(path, extension, recursive=False):
    '''
    Yield the filepath of each file in path with the target extension.
    If recursive, it will loop over subfolders as well.
    '''
    if not recursive:
        for file_path in glob.iglob(path + '/*.' + extension):
            yield file_path
    else:
        for root, dirs, files in os.walk(path):
            for file_path in glob.iglob(root + '/*.' + extension):
                yield file_path
Also, you can use ftp.cwd('..') to change directory and ftp.retrlines('LIST') to just get the list of files of that directory.
Check the docs for some useful code snippets.
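For the FTP part specifically, here is a rough sketch of the download loop (the host, credentials, and local target folder are placeholders, and the try/except around cwd() is one common way to tell directories from files when the server only returns a flat name list):
import os
from ftplib import FTP, error_perm

ftp = FTP('host')
ftp.login('user', 'pwd')
local_dir = 'downloads'
os.makedirs(local_dir, exist_ok=True)

for entry in ftp.nlst():
    try:
        ftp.cwd(entry)  # succeeds only if entry is a directory
    except error_perm:
        continue        # not a directory, skip it
    for name in ftp.nlst():
        if name.lower().endswith('.csv'):
            local_path = os.path.join(local_dir, name)
            with open(local_path, 'wb') as f:
                ftp.retrbinary('RETR ' + name, f.write)
    ftp.cwd('..')       # back to the top-level directory

ftp.quit()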

moving files from an unknown folder to other

I am extracting .tar.gz files, inside of which there are folders (with files of many extensions). I want to move all the .txt files from those folders to another folder, but I don't know the folders' names.
.txt files location ---> my_path/extracted/?unknown_name_folder?/file.txt
I want to do ---> my_path/extracted/file.txt
My code:
os.mkdir('extracted')
t = tarfile.open('xxx.tar.gz', 'r')
for member in t.getmembers():
    if ".txt" in member.name:
        t.extract(member, 'extracted')
I would try extracting the tar file first (See here)
import tarfile
tar = tarfile.open("xxx.tar.gz")
tar.extractall()
tar.close()
and then use the os.walk() method (See here)
import os
txt_files = []
for root, dirs, files in os.walk('.\\xxx\\'):
    txt_files += [os.path.join(root, name) for name in files if name[-4:] == '.txt']
OR use the glob package to gather the txt files, as suggested by @alper in the comments below:
txt_files = glob.glob('./**/*.txt', recursive=True)
This is untested, but should get you pretty close
And obviously move them once you get the list of text files
new_path = ".\\extracted\\"
for path in txt_files:
    name = path[path.rfind('\\') + 1:]
    os.rename(path, new_path + name)

UploadWriteFailed(reason=WriteError('disallowed_name', None)

I'm trying to upload a whole folder to Dropbox, but only the files get uploaded. Should I create a folder programmatically, or can the folder upload be solved more simply? Thanks
import os
import dropbox
access_token = '***********************'
dbx = dropbox.Dropbox(access_token)
dropbox_destination = '/live'
local_directory = 'C:/Users/xoxo/Desktop/man'
for root, dirs, files in os.walk(local_directory):
    for filename in files:
        local_path = root + '/' + filename
        print("local_path", local_path)
        relative_path = os.path.relpath(local_path, local_directory)
        dropbox_path = dropbox_destination + '/' + relative_path
        # upload the file
        with open(local_path, 'rb') as f:
            dbx.files_upload(f.read(), dropbox_path)
error:
dropbox.exceptions.ApiError: ApiError('xxf84e5axxf86', UploadError('path', UploadWriteFailed(reason=WriteError('disallowed_name', None), upload_session_id='xxxxxxxxxxx')))
[Cross-linking for reference: https://www.dropboxforum.com/t5/API-support/UploadWriteFailed-reason-WriteError-disallowed-name-None/td-p/245765 ]
There are a few things to note here:
In your sample, you're only iterating over files, so you won't get dirs uploaded/created.
The /2/files/upload endpoint only accepts file uploads, not folders. If you want to create folders, use /2/files/create_folder_v2. You don't need to explicitly create folders for any parent folders in the path for files you upload via /2/files/upload though. Those will be automatically created with the upload.
Per the /2/files/upload documentation, disallowed_name means:
Dropbox will not save the file or folder because of its name.
So, it's likely you're getting this error because you're trying to upload an ignored file, e.g., ".DS_Store". You can find more information on those in this help article under "Ignored files".
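A small sketch of how the walk loop above could skip names Dropbox refuses; the ignore list here is a partial assumption based on the help article, not an official constant from the SDK, and the path join also normalizes Windows backslashes since Dropbox paths always use forward slashes:
import os
import dropbox

IGNORED_NAMES = {'.ds_store', 'desktop.ini', 'thumbs.db'}  # partial list, see the "Ignored files" article

dbx = dropbox.Dropbox('***********************')
dropbox_destination = '/live'
local_directory = 'C:/Users/xoxo/Desktop/man'

for root, dirs, files in os.walk(local_directory):
    for filename in files:
        if filename.lower() in IGNORED_NAMES:
            continue  # skip files that Dropbox rejects with 'disallowed_name'
        local_path = os.path.join(root, filename)
        relative_path = os.path.relpath(local_path, local_directory)
        dropbox_path = dropbox_destination + '/' + relative_path.replace(os.sep, '/')
        with open(local_path, 'rb') as f:
            dbx.files_upload(f.read(), dropbox_path)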
