file renaming using python - python

I am trying to rename a file that is auto-generated by some local modules, and I was wondering whether using os.listdir is the only way to filter the directory down to this file.
Each time, this file is generated and then removed before the code generates the next one (in the same directory) based on the next item in a list.
Whenever this file is generated, it has a path like the following:
/user_data/.tmp/tempLibFiles/2015_03_16_192212_182096.con
I only want to rename 2015_03_16_192212_182096 to connectionFile while keeping the rest of the path the same.

You can also use the glob module to narrow down the list of files to the one that matches a particular pattern. For example:
import glob
files = glob.glob('/user_data/.tmp/tempLibFiles/*.con')
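Building on that glob call, here is a minimal sketch of the rename itself. It assumes at most one timestamped .con file exists in the directory at a time (the question says each file is removed before the next is generated); the helper name rename_con_file is hypothetical.

```python
import glob
import os


def rename_con_file(directory, new_stem="connectionFile"):
    """Rename the single auto-generated *.con file in `directory` to <new_stem>.con.

    Hypothetical helper: assumes at most one timestamped .con file at a time.
    Returns the new path, or None if there is nothing to rename.
    """
    target = os.path.join(directory, new_stem + ".con")
    # glob narrows the directory listing down to .con files only
    matches = glob.glob(os.path.join(directory, "*.con"))
    # Ignore a file that already carries the target name
    matches = [m for m in matches if os.path.abspath(m) != os.path.abspath(target)]
    if not matches:
        return None
    if len(matches) > 1:
        raise RuntimeError("expected one .con file, found %d" % len(matches))
    # Same directory and extension; only the timestamp part of the name changes
    os.rename(matches[0], target)
    return target
```

For the path in the question this would be called as rename_con_file('/user_data/.tmp/tempLibFiles'). Note that if a previous connectionFile.con is still present, os.rename overwrites it on POSIX systems but raises an error on Windows.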

Related

Duplicate in list created from filenames (python)

I'm trying to create a list of the Excel files saved in a specific directory, but when the list is generated it contains a duplicate entry for one of the file names (I am absolutely certain there is not actually a duplicate of the file).
import glob
# get data file names
path =r'D:\larvalSchooling\data'
filenames = glob.glob(path + "/*.xlsx")
output:
>>> filenames
['D:\\larvalSchooling\\data\\copy.xlsx', 'D:\\larvalSchooling\\data\\Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx', 'D:\\larvalSchooling\\data\\Raw data-SF_Sat_70dpf_GroupA_n5_20200808_1015-Trial 1.xlsx', 'D:\\larvalSchooling\\data\\Raw data-SF_Sat_84dpf_GroupABCD_n5_20200822_1440-Trial 1.xlsx', 'D:\\larvalSchooling\\data\\~$Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx']
You'll note that 'D:\larvalSchooling\data\Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx' is listed twice.
Rather than removing duplicates after the fact, I was hoping to figure out why it's happening in the first place.
I'm using Python 3.7 on Windows 10 Pro.
If you wrote the code to remove duplicates (which can be as simple as filenames = set(filenames)), you'd see that both names are still there. Print them one above the other to make a visual comparison easier:
'D:\\larvalSchooling\\data\\Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx',
'D:\\larvalSchooling\\data\\~$Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx'
The second one has a leading ~$ prefix (probably a temporary file created by Excel).
Whenever you open an Excel file, Excel creates a hidden temporary file next to it (an owner or lock file that marks the workbook as in use). In this case:
Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx
~$Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx
This means the workbook is currently open in some program (usually Excel), and glob is picking up that temporary file, which is normally hidden in File Explorer as well.
Find the program that has the file open and close it. If this is something you want to avoid in general, you should also add a check so that files matching "~$*.xlsx" are ignored.
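A minimal sketch of that check applied to the question's glob call, assuming the temporary files always carry the ~$ prefix seen in the listing above (list_workbooks is a hypothetical name):

```python
import glob
import os


def list_workbooks(directory):
    """Return the .xlsx paths in `directory`, skipping Excel's ~$ temporary files."""
    paths = glob.glob(os.path.join(directory, "*.xlsx"))
    # Excel names its temporary companion file "~$" + the workbook's name
    return [p for p in paths if not os.path.basename(p).startswith("~$")]
```

Used as filenames = list_workbooks(r'D:\larvalSchooling\data'), this returns full paths just like the original glob call, minus the ~$ entries.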
You can use os.path.splitext to get the file extension and loop through the directory using os.listdir. The open Excel files can be skipped using the following code:
import os
filenames = []
for file in os.listdir(r'D:\larvalSchooling\data'):
    filename, file_extension = os.path.splitext(file)
    # keep only .xlsx files, skipping Excel's "~$" temporary files
    if file_extension == '.xlsx':
        if not file.startswith('~$'):
            filenames.append(file)
Note: this might not be the best solution, but it'll get the job done :)

Evaluating File Paths in Excel

I have an ever-increasing list of file paths in Excel (around 5000 records now). More specifically, I have a unique identifier in column A and, in column B, a file path that leads to a picture for that identifier.
Adding the file paths is a very manual process and mistakes sometimes occur. So I want to write code that goes through each of these file paths and, if a path doesn't open or returns an error, stores it in a list so that I can go directly to those entries and fix the paths.
I was thinking of writing Python code that checks each file path in the Google Chrome URL bar (I have found that works better than clicking the hyperlink in Excel), but it's been a while since I last used Python and I don't know where to start.
Any recommendations or ideas on how to achieve this?
Thank you,
Ricardo G.
To read Excel files, I prefer the pandas library, specifically its read_excel function. You can check whether a file path points to an existing file in your filesystem using the os.path module: os.path.isfile returns True if the provided path points to an actual file, so you can use a list comprehension with a condition to keep only the file paths for which it returns False.
import os
import pandas as pd
df = pd.read_excel('path/to/excel')  # path to your workbook
# keep only the paths that do not point to an existing file
bad_files = [fp for fp in df['filepath_column'] if not os.path.isfile(fp)]
I'm not sure what you mean by checking with Google Chrome, but if you're talking about local files, this should work well for you.
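Since the question also wants to know which record each bad path belongs to, here is a small sketch that keeps the identifier alongside the path. It uses only the standard library and takes any iterable of (identifier, path) pairs; with pandas you could pass zip(df['id_column'], df['filepath_column']), where both column names are hypothetical placeholders.

```python
import os


def find_broken_paths(rows):
    """Return the (identifier, path) pairs whose path is not an existing file.

    `rows` is any iterable of (identifier, path) pairs.
    """
    broken = []
    for identifier, path in rows:
        # Non-string cells (e.g. NaN for an empty cell in pandas) count as broken
        if not isinstance(path, str) or not os.path.isfile(path):
            broken.append((identifier, path))
    return broken
```

The returned list points straight at the rows in column A that need fixing.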

Search for file names that contain words from a list and have a certain file extension

I'm a Python beginner. I'm trying to search users' folders for illegal content saved in them. I want to find all files whose names contain one or more of the words in the list below and that also have one of the listed extensions.
I can filter the files using file.endswith, but I don't know how to add the word condition.
I've looked through the site and have only come across how to search for a single word, not a list of words.
Thank you in advance
import os
L = ['720p', 'aac', 'ac3', 'bdrip', 'brrip', 'demonoid', 'disc', 'hdtv', 'dvdrip',
     'edition', 'sample', 'torrent', 'www', 'x264', 'xvid']
# a string literal cannot end with a lone backslash, so the trailing one is dropped
for root, dirs, files in os.walk(r"Y:\User Folders"):
    for file in files:
        # endswith takes literal suffixes (no '*' wildcards) given as a tuple
        if file.endswith(('.7z', '.3gp', '.alb', '.ape', '.avi', '.cbr', '.cbz', '.cue', '.divx', '.epub', '.flac',
                          '.flv', '.idx', '.iso', '.m2ts', '.m2v', '.m3u', '.m4a', '.m4b', '.m4p', '.m4v', '.md5',
                          '.mkv', '.mobi', '.mov', '.mp3', '.mp4', '.mpeg', '.mpg', '.mta', '.nfo', '.ogg', '.ogm',
                          '.pla', '.rar', '.rm', '.rmvb', '.sfap0', '.sfk', '.sfv', '.sls', '.smfmf', '.srt', '.sub',
                          '.torrent', '.vob', '.wav', '.wma', '.wmv', '.wpl', '.zip')):
            print(os.path.join(root, file))
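One way to add the word condition, sketched under the assumption that a case-insensitive substring match on the file name is what you want (the function name find_matching_files is hypothetical): str.endswith already accepts a tuple of suffixes, and any() covers the list of words.

```python
import os


def find_matching_files(top, words, extensions):
    """Yield paths under `top` whose file name contains any of `words`
    and ends with any of `extensions` (both checks are case-insensitive)."""
    words = [w.lower() for w in words]
    extensions = tuple(e.lower() for e in extensions)
    for root, dirs, files in os.walk(top):
        for name in files:
            lower = name.lower()
            # Extension check first (cheap), then look for any listed word
            if lower.endswith(extensions) and any(w in lower for w in words):
                yield os.path.join(root, name)
```

With the question's names this would be called as find_matching_files(r"Y:\User Folders", L, extensions), where extensions is the tuple of suffixes from the loop above.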
Perhaps it would be better to do a reverse search and display a warning about files that DON'T match the file types you want. For instance, you could do this:
if file.endswith((".txt", ".py")):  # several suffixes must be passed as one tuple
    print("File is ok!")
else:
    print("File is not ok!")
Using py.path.local from py package
The py package (install it with $ pip install py) offers a very nice interface for working with files.
from py.path import local
def isbadname(path):
    bad_extensions = [".pyc", ".txt"]  # path.ext includes the leading dot
    bad_names = ["code", "xml"]
    return (path.ext in bad_extensions) or (path.purebasename in bad_names)
for path in local(".").visit(isbadname):
    print(path.strpath)
Explained:
Import
from py.path import local
py.path.local creates "objectified" file names. To keep my code short, I import it this way so I can use just local to objectify file name strings.
Create an objectified path to the local directory:
local(".")
The created object is not a string but an object with many useful properties and methods.
Listing all files within a directory tree (visit walks subdirectories recursively):
local(".").visit("*.txt")
returns a generator that yields all paths to files with the extension ".txt".
An alternative way to select files is to pass a function that takes a path argument (an objectified file name) and returns True if the file should be included, False otherwise.
The function isbadname serves exactly this purpose.
If you want to search for more information, use "py path local" (searching for just "py" does not give good results).
For more see https://py.readthedocs.io/en/latest/path.html
Note that if you use the pytest package, py.path comes with it (older pytest releases install py as a dependency, newer ones bundle a copy of py.path), for good reason: it makes tests related to file names much more readable and shorter.
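For comparison, here is a rough standard-library equivalent using pathlib instead of py.path (a swapped-in technique, not part of the answer above). Path.suffix and Path.stem play the roles of path.ext and path.purebasename, and rglob("*") stands in for visit(); unlike visit, this sketch keeps only regular files, not directories.

```python
from pathlib import Path


def isbadname(path):
    """Same test as the py.path version: bad extension or bad base name."""
    bad_extensions = [".pyc", ".txt"]
    bad_names = ["code", "xml"]
    return path.suffix in bad_extensions or path.stem in bad_names


def find_bad_files(top="."):
    # rglob("*") walks the tree recursively; keep regular files that match
    return [p for p in Path(top).rglob("*") if p.is_file() and isbadname(p)]
```

Printing is then for p in find_bad_files("."): print(p).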

Deleting Contents of a folder selectively with python

I would like to periodically delete the contents of a Windows directory, which includes files and subdirectories that contain more files. However, there is one specific file that I do not want to remove (it is the same file every time). I am using shutil.rmtree to delete the contents of the folder, but that also deletes the file I wish to keep. How can I make an exception that prevents the removal of that file, and is shutil the best tool for this?
To get a list of pathnames in a directory, use glob.
Say you only want to find the .gif files and delete them:
import glob
gif_list = glob.glob('your/path/name/'+'*.gif')
For your path, this finds all the files that end in .gif. You can then delete them, copy them, or do whatever you want with them.
Or, use a list comprehension:
import glob
# just get a list of all the files/folders in your path:
path_list = glob.glob('your/path/name/' + '*')
gif_list = [x for x in path_list if x.endswith('.gif')]
Glob will also find all folders, of course. You can filter out the files and/or folders you want to keep and delete the rest (os.remove for files, shutil.rmtree for folders). For example, say you wanted to keep the file called keep.me:
import glob
path_list = glob.glob('your/path/name/'+'*')
del_list = [x for x in path_list if x != 'your/path/name/'+'keep.me']
Something like this should work.
Try it out!
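The snippet above only builds del_list. A minimal sketch of the actual deletion step, assuming the file to keep sits directly in the directory; clear_directory is a hypothetical name, and note that glob's "*" does not match names starting with a dot, so hidden dotfiles are left alone.

```python
import glob
import os
import shutil


def clear_directory(directory, keep_name="keep.me"):
    """Delete every top-level entry in `directory` except the file `keep_name`."""
    keep_path = os.path.join(directory, keep_name)
    for path in glob.glob(os.path.join(directory, "*")):
        if path == keep_path:
            continue
        # rmtree only works on directories; plain files need os.remove
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
```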
rmtree does not appear to have any kind of filtering mechanism that you could use; further, since part of its functionality is to remove the directory itself, and not just its contents, it wouldn't make sense to.
If you could do something to the file so that rmtree's attempt to delete it fails, you can have rmtree ignore such errors (ignore_errors=True), thus leaving your file but deleting the others. If you cannot, you could resort to os.walk to loop over the contents of your directory and decide for yourself which items to remove.
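A minimal sketch of the os.walk route, assuming the kept file may be anywhere in the tree and that empty subdirectories should be removed too (delete_all_except is a hypothetical name). Walking bottom-up means each directory is visited after its contents, so emptied directories can be removed with os.rmdir, while the directories on the path to the kept file stay because they are not empty.

```python
import os


def delete_all_except(top, keep_path):
    """Remove files and empty subdirectories under `top`, sparing `keep_path`.

    `top` itself is never removed.
    """
    keep_path = os.path.abspath(keep_path)
    # topdown=False yields each directory after everything inside it
    for root, dirs, files in os.walk(top, topdown=False):
        for name in files:
            path = os.path.join(root, name)
            if os.path.abspath(path) != keep_path:
                os.remove(path)
        for name in dirs:
            path = os.path.join(root, name)
            # A directory still holding the kept file is not empty; leave it
            if not os.listdir(path):
                os.rmdir(path)
```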
Is it just one file or a lot of files?
Because if it's just this one file, you could right-click it, go to Properties > Security and check 'Deny' for every permission.
http://www.technorms.com/27407/prevent-files-from-being-deleted-windows

Most efficient/fastest way to get a single file from a directory

What is the most efficient and fastest way to get a single file from a directory using Python?
More details on my specific problem:
I have a directory containing a lot of pregenerated files, and I just want to pick a random one. Since there's no really efficient way to pick a random file from a directory other than listing all the files first, my files are generated with random names, so they are already in random order and I just need to pick the first file from the folder.
So my question is: how can I pick the first file from my folder without having to load the whole list of files in the directory (or having the OS do that)? My ideal would be to make the OS return just a single file and then stop.
Note: I have a lot of files in my directory, hence why I would like to avoid listing all the files to just pick one.
Note2: each file is only picked once, then deleted to ensure that only new files are picked the next time (thus ensuring some kind of randomness).
SOLUTION
I finally chose to use an index file that will store:
the index of the current file to be picked (e.g. 1 for file1.ext, 2 for file2.ext, etc.)
the index of the last file generated (e.g. 1999 for file1999.ext)
Of course, this means that my files are no longer generated with random names but with a deterministic, incrementing pattern (e.g. "file%s.ext" % ID).
Thus both of my main operations take near-constant time:
Accessing the next file in the folder
Counting the number of files that are left (so that I can generate new files in a background thread when needed).
This is a specific solution for my problem, for more generic solutions, please read the accepted answer.
You might also be interested in these two other solutions I've found for optimizing file access and directory walking in Python:
os.walk optimized
Python FAM (File Alteration Monitor)
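Along the same lines as optimized directory walking, the standard library's os.scandir (Python 3.5+; the with form needs 3.6+) returns a lazy iterator, so you can stop after the first regular file instead of building the full list that os.listdir returns. This is a sketch rather than a guarantee of a single read: the OS may still fetch directory entries in batches internally, and the order is whatever the file system returns.

```python
import os


def first_file(directory):
    """Return the path of the first regular file os.scandir yields, or None."""
    with os.scandir(directory) as entries:
        for entry in entries:
            # Skip subdirectories and other non-file entries
            if entry.is_file():
                return entry.path
    return None
```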
Don't keep a lot of pregenerated files in one directory. Divide them over subdirectories if there are more than 'n' files in a directory.
When creating the files, add the name of the newest file to a list stored in a text file. When you want to read/process/delete a file:
Open the text file
Set filename to the name on the top of the list.
Delete the name from the top of the list
Close the text file
Process filename.
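The steps above can be sketched as a small first-in, first-out queue kept in a text file, one name per line. The list file's name and both function names are hypothetical, and popping rewrites the whole list file, which is cheap compared with listing a huge directory but is not safe if several processes use the file at once.

```python
def append_name(list_file, filename):
    """Record a newly generated file name at the end of the list file."""
    with open(list_file, "a") as f:
        f.write(filename + "\n")


def pop_first_name(list_file):
    """Remove and return the name at the top of the list file, or None if empty."""
    with open(list_file) as f:
        names = f.read().splitlines()
    if not names:
        return None
    # Write back everything except the first name
    with open(list_file, "w") as f:
        f.writelines(name + "\n" for name in names[1:])
    return names[0]
```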
Just use random.choice() on the os.listdir() result:
import random
import os
randomfilename = random.choice(os.listdir(path_to_directory))
os.listdir() returns results in the order given by the OS. Using random filenames does not change that ordering; only adding items to or removing items from the directory can influence it.
If you fear that you'll have too many files, do not use a single directory. Instead, set up a tree of directories with pre-generated names, pick one of those at random, then pick a file from there.
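A minimal sketch of that two-level pick, assuming `top` contains only the pre-generated subdirectories and each subdirectory contains only files (random_file_from_tree is a hypothetical name). Picking a directory first gives every file an equal chance only if the subdirectories hold roughly the same number of files.

```python
import os
import random


def random_file_from_tree(top):
    """Pick a random subdirectory of `top`, then a random file inside it.

    Returns None if the chosen subdirectory is empty; raises IndexError
    if `top` has no subdirectories at all.
    """
    subdirs = [e.path for e in os.scandir(top) if e.is_dir()]
    chosen = random.choice(subdirs)
    # Only this one (small) subdirectory is listed, never the whole tree
    names = os.listdir(chosen)
    if not names:
        return None
    return os.path.join(chosen, random.choice(names))
```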
