I tried to delete a file, but the code always runs the else branch even though the file exists.
I also tried to delete other files, but I got the same result.
Here is the code:
def deletees():
    if os.path.exists("C:\X-Folder\plugins\autorun"):
        shutil.rmtree("C:\X-Folder\plugins\autorun")
    else:
        print("error: does not exists")
deletees()
If you are trying to remove a single file:
os.remove(r"C:\X-Folder\plugins\autorun")
# or
os.remove("C:\\X-Folder\\plugins\\autorun")
If you are trying to remove a directory or directory tree:
shutil.rmtree(r"C:\X-Folder\plugins\autorun")
# or
shutil.rmtree("C:\\X-Folder\\plugins\\autorun")
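Notice that a raw (r) string is used so that \ characters are not treated as the start of an escape sequence. In a normal string, the \a in \autorun is read as the bell character, so the path Python checks is not the one you typed.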
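To see the effect of the escape sequence, here is a minimal sketch that compares a normal and a raw string. It uses only the last path component, and the variable names are illustrative, not taken from the question.

```python
# "\a" is the escape sequence for the ASCII bell character (0x07),
# so a normal string silently swallows the backslash and the letter "a".
normal = "\autorun"
raw = r"\autorun"
escaped = "\\autorun"

print(repr(normal))           # '\x07utorun'
print(len(normal), len(raw))  # 7 8
print(raw == escaped)         # True
```

The raw string and the doubled-backslash string are the same text; only the normal string differs.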
So your specific example would look like this:
Uncomment the line that is most appropriate for your situation.
def deletees():
    if os.path.exists("C:\\X-Folder\\plugins\\autorun"):
        shutil.rmtree("C:\\X-Folder\\plugins\\autorun")  # uncomment me for a directory
        # os.remove("C:\\X-Folder\\plugins\\autorun")  # uncomment me for a file
    else:
        print("error: does not exists")
deletees()
Finally, there are also os.rmdir and os.removedirs, but they only work on empty directories, and I would not recommend using either of them.
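If you do not know in advance whether the path is a file or a directory, one option is to let the code decide. The following is a rough sketch; delete_path is a hypothetical helper name, not a function from os or shutil.

```python
import os
import shutil


def delete_path(path):
    """Delete a file or a directory tree; return True if something was removed.

    delete_path is a hypothetical helper, not part of os or shutil.
    """
    if os.path.isdir(path) and not os.path.islink(path):
        shutil.rmtree(path)   # directory, including everything inside it
        return True
    if os.path.lexists(path):
        os.remove(path)       # regular file or symbolic link
        return True
    return False              # nothing exists at that path
```

With the path from the question it could be called as delete_path(r"C:\X-Folder\plugins\autorun").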
I have around 2000 JSON files which I'm trying to run through a Python program. A problem occurs when a JSON file is not in the correct format. (Error: ValueError: No JSON object could be decoded) In turn, I can't read it into my program.
I am currently doing something like the below:
for files in folder:
    with open(files) as f:
        data = json.load(f)  # It causes an error at this part
I know there are offline methods for validating and formatting JSON files, but is there a programmatic way to check and format these files? If not, is there a free/cheap alternative for fixing all of these files offline, i.e. I just run the program on the folder containing all the JSON files and it formats them as required?
SOLVED using #reece's comment:
import os
import simplejson

invalid_json_files = []
read_json_files = []

def parse():
    for files in os.listdir(os.getcwd()):
        with open(files) as json_file:
            try:
                simplejson.load(json_file)
                read_json_files.append(files)
            except ValueError as e:
                print("JSON object issue: %s" % e)
                invalid_json_files.append(files)
    print(invalid_json_files, len(read_json_files))
It turns out that I was saving a file that is not in JSON format in my working directory, which was the same place I was reading data from. Thanks for the helpful suggestions.
The built-in JSON module can be used as a validator:
import json
def parse(text):
    try:
        return json.loads(text)
    except ValueError as e:
        print('invalid json: %s' % e)
        return None  # or: raise
You can make it work with files by using:
with open(filename) as f:
    return json.load(f)
instead of json.loads, and you can include the filename in the error message as well.
On Python 3.3.5, for {test: "foo"}, I get:
invalid json: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
and on 2.7.6:
invalid json: Expecting property name: line 1 column 2 (char 1)
This is because the correct json is {"test": "foo"}.
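The position in the message is also available programmatically. As a small sketch, assuming Python 3.5 or later (where json.JSONDecodeError exists), the line, column and character offset can be read from the exception; describe_json_error is a hypothetical helper name.

```python
import json


def describe_json_error(text):
    """Return None for valid JSON, else (line, column, char offset, message).

    describe_json_error is a hypothetical helper. json.JSONDecodeError needs
    Python 3.5+; on older versions catch ValueError instead.
    """
    try:
        json.loads(text)
    except json.JSONDecodeError as e:
        return (e.lineno, e.colno, e.pos, e.msg)
    return None


print(describe_json_error('{"test": "foo"}'))  # None
print(describe_json_error('{test: "foo"}'))
```

The exact wording of the message can differ between Python versions, so it is safer to rely on the numeric fields.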
When handling the invalid files, it is best to not process them any further. You can build a skipped.txt file listing the files with the error, so they can be checked and fixed by hand.
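The skipped.txt idea could look roughly like the sketch below. The function name, the .json extension filter and the UTF-8 encoding are assumptions made for illustration.

```python
import json
import os


def write_skipped(folder, report_name="skipped.txt"):
    """List the .json files in `folder` that fail to parse, one per line.

    write_skipped is a hypothetical helper; the report is written into
    the same folder and the collected lines are also returned.
    """
    skipped = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not (os.path.isfile(path) and name.endswith(".json")):
            continue  # assumption: only *.json files are of interest
        try:
            with open(path, encoding="utf-8") as f:
                json.load(f)
        except ValueError as e:  # covers JSON errors and bad UTF-8
            skipped.append("%s: %s" % (name, e))
    with open(os.path.join(folder, report_name), "w", encoding="utf-8") as out:
        out.write("\n".join(skipped))
    return skipped
```

The invalid files are only listed, not modified, so they can be checked and fixed by hand afterwards.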
If possible, you should check the site/program that generated the invalid json files, fix that and then re-generate the json file. Otherwise, you are going to keep having new files that are invalid JSON.
Failing that, you will need to write a custom json parser that fixes common errors. With that, you should be putting the original under source control (or archived), so you can see and check the differences that the automated tool fixes (as a sanity check). Ambiguous cases should be fixed by hand.
Yes, there are ways to validate that a JSON file is valid. One way is to use a JSON parsing library that will throw exceptions if the input you provide is not well-formatted.
try:
    load_json_file(filename)      # placeholder for your library's loader
except InvalidDataException:      # or something; Python's json module raises ValueError
    pass                          # oops, guess it's not valid
Of course, if you want to fix it, you naturally cannot use a JSON loader since, well, it's not valid JSON in the first place. Unless the library you're using will automatically fix things for you, in which case you probably wouldn't even have this question.
One way is to load the file manually and tokenize it and attempt to detect errors and try to fix them as you go, but I'm sure there are cases where the error is just not possible to fix automatically and would be better off throwing an error and asking the user to fix their files.
I have not written a JSON fixer myself so I can't provide any details on how you might go about actually fixing errors.
However, I am not sure whether it would be a good idea to fix all errors, since then you'd have to assume your fixes are what the user actually wants. If it's a missing comma or they have an extra trailing comma, then that might be OK, but there may be cases where it is ambiguous what the user wants.
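For the trailing-comma case only, a very naive repair could be sketched as follows. This is an illustration under stated assumptions, not a general JSON fixer: load_lenient is a hypothetical name, and the regular expression does not understand JSON strings.

```python
import json
import re

# Naive pattern: a comma, optional whitespace, then a closing brace or bracket.
_TRAILING_COMMA = re.compile(r",\s*([}\]])")


def load_lenient(text):
    """Parse JSON, retrying once with trailing commas removed.

    Caveat: the regex also rewrites a ",}" or ",]" sequence that occurs
    inside a string value, so keep the original and review the changes.
    """
    try:
        return json.loads(text)
    except ValueError:
        repaired = _TRAILING_COMMA.sub(r"\1", text)
        return json.loads(repaired)  # still raises if other errors remain
```

Anything beyond this one pattern still raises an error, which matches the point above that ambiguous cases are better left to a person.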
Here is a full python3 example for the next novice python programmer that stumbles upon this answer. I was exporting 16000 records as json files. I had to restart the process several times so I needed to verify that all of the json files were indeed valid before I started importing into a new system.
I am no python programmer so when I tried the answers above as written, nothing happened. Seems like a few lines of code were missing. The example below handles files in the current folder or a specific folder.
verify.py
import json
import os
import sys
from os.path import isfile, join

# check if a folder name was specified
if len(sys.argv) > 1:
    folder = sys.argv[1]
else:
    folder = os.getcwd()

# array to hold invalid and valid files
invalid_json_files = []
read_json_files = []

def parse():
    # loop through the folder
    for files in os.listdir(folder):
        # check if the combined path and filename is a file
        if isfile(join(folder, files)):
            # open the file
            with open(join(folder, files)) as json_file:
                # try reading the json file using the json interpreter
                try:
                    json.load(json_file)
                    read_json_files.append(files)
                except ValueError as e:
                    # if the file is not valid, print the error
                    # and add the file to the list of invalid files
                    print("JSON object issue: %s" % e)
                    invalid_json_files.append(files)
    print(invalid_json_files)
    print(len(read_json_files))

parse()
Example:
python3 verify.py
or
python3 verify.py somefolder
tested with python 3.7.3
It was not clear to me how to provide the path to the folder containing the files, so I'd like to provide an answer with this option.
import glob
import pandas as pd

path = r'C:\Users\altz7\Desktop\your_folder_name'  # use your path
all_files = glob.glob(path + "/*.json")
data_list = []
invalid_json_files = []
for filename in all_files:
    try:
        df = pd.read_json(filename)
        data_list.append(df)
    except ValueError:
        invalid_json_files.append(filename)
print("Files in correct format: {}".format(len(data_list)))
print("Not readable files: {}".format(len(invalid_json_files)))
# df = pd.concat(data_list, axis=0, ignore_index=True)  # will create a pandas dataframe from the readable files, if you like
I have a simple JSON file that I was supposed to use as a configuration file. It contains the default directories for whoever is running the script on their MacBook:
{
"main_sheet_path": "/Users/jammer/Documents/Studios/CAT/000-WeeklyReports/2020/",
"reference_sheet_path": "/Users/jammer/Documents/DownloadedFiles/"
}
I read the JSON file and obtain the values using this code:
with open('reportconfig.json', 'r') as j:
    config_data = json.load(j)
main_sheet_path = str(config_data.get('main_sheet_path'))
reference_sheet_path = str(config_data.get('reference_sheet_path'))
I use the path to check for a source file's existence before doing anything with it:
source_file = 'source.xlsx'
source_file = main_sheet_path + filename
if not os.path.isfile(source_file):
    print('ERROR: Source file \'' + source_file + '\' NOT FOUND!')
    return
Note that the filename is inputted as a parameter when the script is run (there are multiple files, the script has to know which one to target).
The file is there for sure, but the script never seems to "see" it, so I get the "ERROR" that I print in the above code. Why do I think there are invisible characters? Because when I copy and paste what was printed in the "error" notice above into the terminal, the last few characters of the file name always get substituted by some invisible characters, and hitting backspace erases characters where the cursor isn't supposed to be.
How do I know for sure that the file is there and that my problem is with reading the JSON file and not in the Directory names or anywhere else in the code? Because I finally gave up on using a JSON config file and went with a configuration file like this instead:
#!/usr/local/bin/python3.7
# -*- coding: utf-8 -*-
file_paths = { "main_sheet_path": "/Users/jammer/Documents/Studios/CAT/000-WeeklyReports/2020/",
"reference_sheet_path": "/Users/jammer/Documents/DownloadedFiles/"
}
I then just import the file and obtain the values like this:
import reportconfig as cfg
main_sheet_path = cfg.file_paths['main_sheet_path']
reference_sheet_path = cfg.file_paths['reference_sheet_path']
...
This workaround works perfectly — I don't get the "error" that the file isn't there when it is and the rest of the script is executed as expected. When the file isn't there, I get the proper "error" I expect and copying-and-pasting the full path and filename from the "error message" gives me the complete file name and hitting the backspace erases the right characters (no funny behavior, no invisible characters).
But could anyone please tell me how to read the JSON file properly without getting those pesky invisible characters? I've spent hours trying to figure it out, including searching seemingly related questions on stackoverflow, but couldn't find the answer. TIA!
I think there is just a typo in this code:
source_file = 'source.xlsx'
source_file = main_sheet_path + filename
Maybe filename is set to some other file that is not present, which is why you get the error.
Try setting filename = 'source.xlsx'.
Maybe it will help.
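Whether invisible characters are really involved can be checked by printing the path with repr(), which shows control characters as escape sequences. The sketch below assumes, purely for illustration, that the filename picked up a trailing carriage return; the actual cause in the question is not known, and find_invisible is a hypothetical helper.

```python
def find_invisible(text):
    """Return (index, repr(char)) for every non-printable character in text.

    find_invisible is a hypothetical debugging helper.
    """
    return [(i, repr(ch)) for i, ch in enumerate(text) if not ch.isprintable()]


# Hypothetical value: a filename that picked up a trailing carriage return.
filename = "source.xlsx\r"
print(repr(filename))          # 'source.xlsx\r'
print(find_invisible(filename))
print(repr(filename.strip()))  # 'source.xlsx'
```

If find_invisible returns an empty list for the full path, the problem is more likely in how the path is assembled than in hidden characters.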
I have a python script that I need to run from the windows command line. The line
for filename in os.listdir(os.getcwd() + "\\sampdirectory1\\sampdirectory2"):
    if filename.startswith("sample.csv"):
        os.remove("sample.csv")
keeps giving me the error
The system cannot find the file specified 'sample.csv'
Well, the file doesn't exist yet; it's created by the script the first time and then edited by the script every time after that. What I don't understand is why it's trying to do os.remove on sample.csv when the if statement should fail, meaning the remove shouldn't be reached.
You can't delete it while holding on to it because "On Windows, attempting to remove a file that is in use causes an exception to be raised"
https://docs.python.org/2/library/os.html#os.remove
There are two things to notice here:
First: the folder you list and the folder you delete from are different. You should be using
os.remove(os.path.join(os.getcwd(), "sampdirectory1", "sampdirectory2", "sample.csv"))
Second: a more elegant solution would be
try:
    os.remove(os.path.join(os.getcwd(), "sampdirectory1", "sampdirectory2", "sample.csv"))
except OSError:
    print('no such file/directory')
There might be a file .\sampdirectory1\sampdirectory2\sample.csv, so the condition is valid. But you're trying to delete a file .\sample.csv (sample.csv in current directory) that doesn't exist and you're getting the error.
Moreover, there might be a file .\sampdirectory1\sampdirectory2\sample.csvSOMETHING, so the condition is still valid and you're getting the error.
You need to do os.remove(filename) instead of os.remove("sample.csv")
First, sample.csv is not necessarily the file whose name you checked before removing it. And even if the filename is sample.csv, you need to give the full path of the file, because os.listdir returns bare names relative to the listed directory.
And since you are iterating over the directory listing, you don't need to check whether the file exists.
So if you want to remove files whose names start with sample.csv, the code should be as below:
directory = os.path.join(os.getcwd(), "sampdirectory1", "sampdirectory2")
for filename in os.listdir(directory):
    if filename.startswith("sample.csv"):
        os.remove(os.path.join(directory, filename))
But if you want to remove only sample.csv, then you don't need any loop. Just do
filename = os.path.join(os.getcwd(), "sampdirectory1\\sampdirectory2\\sample.csv")
if os.path.exists(filename):
    os.remove(filename)
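The same idea can be written with pathlib, where each directory entry already carries its full path. This is only a sketch; remove_matching is a hypothetical helper name.

```python
from pathlib import Path


def remove_matching(directory, prefix):
    """Delete files in `directory` whose names start with `prefix`.

    Returns the sorted names that were removed. remove_matching is a
    hypothetical helper, not a standard library function.
    """
    removed = []
    for entry in Path(directory).iterdir():
        if entry.is_file() and entry.name.startswith(prefix):
            entry.unlink()  # entry already includes the directory part
            removed.append(entry.name)
    return sorted(removed)
```

For the question's layout it could be called as remove_matching(Path.cwd() / "sampdirectory1" / "sampdirectory2", "sample.csv").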
I want to detect if a file is being written to by another process before I start to read the contents of that file.
This is on Windows and I am using Python (2.7.x).
(By the way, the Python script is providing a service where it acts on files that are placed in a specified folder. It acts on the files as soon as they are detected and it deletes the files after having acted on them. So I don't want to start acting on a file that is only partially written.)
I have found empirically that trying to rename the file to the same name will fail if the file is being written to (by another process) and will succeed (as a null-op) if the file is not in use by another process.
Something like this:
def isFileInUse(filePath):
    try:
        os.rename(filePath, filePath)
        return False
    except Exception:
        return True
I haven't seen anything documented about the behaviour of os.rename when source and destination are the same.
Does anyone know of something that might go wrong with what I am doing above?
I emphasize that I am looking for a solution that works in Windows,
and I note that os.access doesn't seem to work - even with os.W_OK it returns True even if the file is being written by another process.
One thing that is nice about the above solution (renaming to the same name) is that it is atomic - which is not true if I try to rename to a temp name, then rename back to the original name.
Since you only want to read the file - why not just try to do it? Since this is the operation you are trying to do:
try:
    with open("file.txt", "r") as handle:
        content = handle.read()
except IOError as msg:
    pass  # error handling
This will try to read the content, and fail if the file is locked, or unreadable.
I see no reason to check if the file is locked if you just want to read from it - just try reading and see if that throws an exception.
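Because the service in the question picks files up as soon as they appear, the "just try it" approach might be combined with a few retries. The sketch below is written for Python 3 (the question uses 2.7), and the function name, attempt count and delay are assumptions. Note that a successful read does not prove the writer has finished; whether opening fails at all depends on how the other process opened the file.

```python
import time


def read_when_ready(path, attempts=5, delay=0.5):
    """Try to read `path`, retrying if opening or reading fails.

    Returns the content, or None if every attempt failed.
    read_when_ready is a hypothetical helper.
    """
    for attempt in range(attempts):
        try:
            with open(path, "r") as handle:
                return handle.read()
        except OSError:  # IOError is an alias of OSError in Python 3
            if attempt < attempts - 1:
                time.sleep(delay)  # give the writer time to finish
    return None
```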
I have the following code:
with open('EcoDocs TK pdfs.csv', 'rb') as pdf_in:
    pdflist = csv.reader(pdf_in, quotechar='"')
    for row in pdflist:
        if row[1].endswith(row[2]):  # check if file type is appended to file name
            pathname = ''.join(row[0:2])
        else:
            pathname = ''.join(row)
        if os.path.isfile(pathname):
            filehash = md5.md5(file(pathname).read()).hexdigest()
It reads in file paths, file names and file types from a csv file. It then checks whether the file type is already appended to the file name before joining the file path and file name. It then checks whether the file exists before doing something with it. There are about 5000 file names in the csv file, but isfile only returns True for about half of them. I've manually checked that some of the files for which isfile returns False do exist. As all the data is read in, there shouldn't be any problems with escape characters or single backslashes, so I'm a bit stumped. Any ideas? An example of the csv file format is below, as well as an example of some of the pathnames that isfile can't find.
csv file-
c:\2dir\a. dir\d dir\lo dir\fu dir\wdir\5dir\,5_l B.xls,.xls
c:\2dir\a. dir\d dir\lo dir\fu dir\wdir\5dir\,5_l A.pdf,.pdf
pathname created-
c:\2dir\a. dir\d dir\lo dir\fu dir\wdir\5dir\5_l B.xls
c:\2dir\a. dir\d dir\lo dir\fu dir\wdir\5dir\5_l A.pdf
Thanks.
You can safely assume that os.path.isfile() works correctly. Here is my process to debug issues like this:
Add a print(pathname) before I use it.
Eyeball the output. Does anything look suspicious?
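Copy the output to the clipboard, press Win+R, type cmd and press Return, then in the new command prompt type dir, a space and a double quote ("), paste the path, type a closing double quote and press Return.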
That checks whether the path is really correct (finds slight mistakes that eyeballing will miss). It also helps to validate the insane DOS naming conventions which are still enforced even on Windows.
If this also works, the next step is to check file and folder permissions: make sure the user that runs the script actually has permission to see and read the file.
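The manual checks above can be complemented by a small script that walks the path from the root down and reports the first component that does not exist, which narrows down where the mismatch is. This is a sketch; first_missing_component is a hypothetical helper name.

```python
from pathlib import Path


def first_missing_component(pathname):
    """Return the shortest leading part of `pathname` that does not exist.

    Returns None if the whole path exists. first_missing_component is a
    hypothetical debugging helper.
    """
    path = Path(pathname)
    # path.parents runs from the immediate parent up to the root, so reverse it
    for candidate in list(reversed(path.parents)) + [path]:
        if not candidate.exists():
            return candidate
    return None
```

Printing repr() of the returned value also exposes stray spaces or control characters in that component.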
EDIT Paths on Windows are ... complicated. An important detail, for example, is that "." is a very, very special character. The name "a.something very long" isn't valid in the command prompt because it demands that you have at most three characters after the last "." in a file name! You're just lucky that it doesn't demand that the name before the last dot is at most 8 characters.
Conclusion: You must be very, very, very careful with "strange characters" in file names and paths on Windows. The only characters which are safe are listed in this document.