I am having some issues with os.path.join and a Windows system. I have created a script that recursively reads files containing unstructured JSON data, creates a directory named "converted_json", and prints the content of each unstructured JSON file in a structured format into a new file within the "converted_json" directory.
I have tested the script below on macOS and upon execution, the structured JSON data is printed to new files and the new files are output to the "converted_json" directory. However, when I execute the script on a Windows system, the JSON data is printed to new files, but the files are not output to the "converted_json" directory.
Essentially, the os.path.join call in the following line does not appear to be working on Windows:
conv_json = open(os.path.join(converted_dir, str(file_name[-1]) + '_converted'), 'wb')
The files are created, however they are not stored within the "converted_json" directory that is specified by the converted_dir variable.
The following output is from printing the "conv_json" variable:
open file 'C:\Users\test\Desktop\test\file_name.json.gz.json_converted', mode 'wb' at 0x0000000002617930
As seen above, the file path contained in the "conv_json" variable does not include the "converted_json" directory (it should, since the path is built with os.path.join and the converted_dir variable).
Any assistance as to how to get the structured data to output to the "converted_json" directory would be greatly appreciated.
Code below:
import argparse
import cStringIO
import datetime
import json
import os
import pprint

argparser = argparse.ArgumentParser()
argparser.add_argument('-d', '--d', dest='dir_path', type=str, default=None, required=True, help='Directory path to Archive/JSON files')
args = argparser.parse_args()
dir_path = args.dir_path
converted_dir = os.path.join(dir_path, 'converted_json')
os.mkdir(converted_dir, 0777)
for subdir1, dirs1, files1 in os.walk(dir_path):
    for file in files1:
        try:
            if file.endswith(".json"):
                file = open(os.path.join(subdir1, file))
                file_name = str.split(file.name, '/')
                conv_json = open(os.path.join(converted_dir, str(file_name[-1]) + '_converted'), 'wb')
                conv_json.write('#################################################################################################################################')
                conv_json.write('\n')
                conv_json.write('File Name: ' + file_name[-1])
                conv_json.write('\n')
                conv_json.write('#################################################################################################################################')
                conv_json.write('\n')
                parsed_json = json.load(file)
                s = cStringIO.StringIO()
                pprint.pprint(parsed_json, s)
                conv_json.write(s.getvalue())
                conv_json.close()
        except:
            print 'JSON Files Not Found'
print 'JSON Processing Completed: ' + str(datetime.datetime.now())
I think that this line is bad on Windows:
file_name = str.split(file.name, '/')
On Windows the split on '/' will not split at all, because the paths there use backslashes; you should split on os.path.sep instead.
I think os.path.join then behaves so confusingly because the second part you try to join is already a full, absolute path (since the split failed). When the second argument is absolute, os.path.join discards everything before it, so converted_dir is dropped and you get the original path back with only '_converted' appended.
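For illustration, here is a minimal sketch of that part of the loop with the split replaced by os.path.basename; the path value is a placeholder based on the question's output, and the basename fix is my suggestion rather than the original author's code:
import os

converted_dir = os.path.join('C:\\Users\\test\\Desktop\\test', 'converted_json')
source_path = 'C:\\Users\\test\\Desktop\\test\\file_name.json.gz.json'

# os.path.basename splits on the platform's own separator, so it returns
# just the file name, and the second argument to os.path.join is no longer
# an absolute path that would override converted_dir.
file_name = os.path.basename(source_path)
conv_json_path = os.path.join(converted_dir, file_name + '_converted')
print(conv_json_path)
# expected on Windows: C:\Users\test\Desktop\test\converted_json\file_name.json.gz.json_converted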
A related question:
I'm trying to copy all pictures from one directory (including its subdirectories) to another target directory. Whenever the exact picture name is found in one of the XML files, the tool should grab all the information (attributes in the parent and child nodes), create subdirectories based on that node information, and also rename the picture file.
The part where it extracts all the information from the nodes is already done.
from bs4 import BeautifulSoup as bs

path_xml = r"path\file.xml"

content = []
with open(path_xml, "r") as file:
    content = file.readlines()
    content = "".join(content)

def get_filename(_content):
    bs_content = bs(_content, "html.parser")
    # some code
    picture_path = f'{pm_1}{pm_2}\{pm_3}\{pm_4}\{pm_5}_{pm_6}_{pm_7}\{pm_8}\{pm_9}.jpg'

get_filename(content)
So in the end I get a string value with the directory path and the file name I want.
Now I'm struggling to open all the XML files in a directory instead of just one file. I tried this:
import os

dir_xml = r"path"

res = []
for path in os.listdir(dir_xml):
    if os.path.isfile(os.path.join(dir_xml, path)):
        res.append(path)

with open(res, "r") as file:
    content = file.readlines()
but it gives me this error: TypeError: expected str, bytes or os.PathLike object, not list
How can I read through all the XML files instead of just one? I have hundreds of XML files, so doing them one at a time would take a while. :D
And another question: how can I create directories based on a string?
Let's say the value of picture_path is AB\C\D\E_F_G\H\I.jpg.
I would need another directory path for the destination of the created folders and a function that somehow creates folders based on that string. How can I do that?
To read all XML files in a directory, you can modify your code as follows:
import os

dir_xml = r"path"

for path in os.listdir(dir_xml):
    if path.endswith(".xml"):
        with open(os.path.join(dir_xml, path), "r") as file:
            content = file.readlines()
            content = "".join(content)
            get_filename(content)
This code uses the os.listdir() function to get a list of all files in the directory specified by dir_xml. It then uses a for loop to iterate over the list of files, checking if each file ends with the .xml extension. If it does, it opens the file, reads its content, and passes it to the get_filename function.
To create directories based on a string, you can use the os.makedirs function. For example:
import os
picture_path = r'AB\C\D\E_F_G\H\I.jpg'
dest_path = r'path_to_destination'
os.makedirs(os.path.join(dest_path, os.path.dirname(picture_path)), exist_ok=True)
In this code, os.path.join is used to combine the dest_path and the directory portion of picture_path into a full path. os.path.dirname is used to extract the directory portion of picture_path. The os.makedirs function is then used to create the directories specified by the path, and the exist_ok argument is set to True to allow the function to succeed even if the directories already exist.
Finally, you can use the shutil library to copy the picture file to the destination and rename it, like this:
import shutil
src_file = os.path.join(src_path, picture_path)
dst_file = os.path.join(dest_path, picture_path)
shutil.copy(src_file, dst_file)
Here, src_file is the full path to the source picture file and dst_file is the full path to the destination. The shutil.copy function is then used to copy the file from the source to the destination.
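Putting the pieces together, a minimal end-to-end sketch could look like this; src_path, dest_path, original_name, and the exact output of get_filename are placeholders I'm assuming for illustration, not values from the question:
import os
import shutil

# hypothetical locations; adjust to the real source and destination
src_path = r"path_to_pictures"
dest_path = r"path_to_destination"

# assume get_filename(content) returned the new relative path and name,
# while original_name is the name the picture currently has under src_path
original_name = "I_original.jpg"
picture_path = r"AB\C\D\E_F_G\H\I.jpg"

# create the target directory tree, then copy the picture under its new name
os.makedirs(os.path.join(dest_path, os.path.dirname(picture_path)), exist_ok=True)
shutil.copy(os.path.join(src_path, original_name),
            os.path.join(dest_path, picture_path))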
You can use os.walk() for a recursive search of files:
import os

dir_xml = r"path"

for root, dirs, files in os.walk(dir_xml):  # topdown=False
    for names in files:
        if names.endswith(".xml"):
            print(f"file path: {root}\n XML-Files: {names}")
            # join the directory with the file name, otherwise open()
            # only finds files in the current working directory
            with open(os.path.join(root, names), 'r') as file:
                content = file.readlines()
My English is very poor and I used Google Translate, so I am sorry for that. :)
Unable to save a file: the error indicates that the directory does not exist, but the directory does exist.
1. You can manually create the file in File Explorer --> so the file name is legal.
2. You can manually create the directory in File Explorer --> so the directory name is legal.
3. You can save other file names such as aaa.png to this directory, i.e. other files can be written to this directory --> so the path is legal, there is no permission problem, and there is no problem with the writing method.
4. The file can be written to the upper-level directory download_pictures --> so it's not a file name problem.
thank you!!!
import os

path = 'download_pictures\\landscape[or]no people[or]nature[OrderBydata]\\'
download_name = '[6]772803-2500x1459-genshin+impact-lumine+(genshin+impact)-arama+(genshin+impact)-aranara+(genshin+impact)-arabalika+(genshin+impact)-arakavi+(genshin+impact).png'
filename = path + download_name
print('filename = ', filename)

# Create the folder to make sure the path exists
if not os.path.exists(path):
    os.makedirs(path)

try:
    with open(filename, 'w') as f:
        f.write('test')
except Exception as e:
    print('\n【error!】First save file, failed, caught exception:', e)
print(filename)

filename = path + 'aaa.png'
with open(filename, 'w') as f:
    print('\nThe second save file, changed the file name aaa.png, the path remains unchanged')
    f.write('test')
print(filename)

path = 'download_pictures\\'
filename = path + download_name
with open(filename, 'w') as f:
    print('\nThe third save file, the file name is unchanged, but the directory has changed')
    f.write('test')
Console output:
filename = download_pictures\landscape[or]no people[or]nature[OrderBydata]\[6]772803-2500x1459-genshin+impact-lumine+(genshin+impact)-arama+(genshin+impact)-aranara+(genshin+impact)-arabalika+(genshin+impact)-arakavi+(genshin+impact).png
【error!】First save file, failed, caught exception: [Errno 2] No such file or directory: 'download_pictures\\landscape[or]no people[or]nature[OrderBydata]\\[6]772803-2500x1459-genshin+impact-lumine+(genshin+impact)-arama+(genshin+impact)-aranara+(genshin+impact)-arabalika+(genshin+impact)-arakavi+(genshin+impact).png'
download_pictures\landscape[or]no people[or]nature[OrderBydata]\[6]772803-2500x1459-genshin+impact-lumine+(genshin+impact)-arama+(genshin+impact)-aranara+(genshin+impact)-arabalika+(genshin+impact)-arakavi+(genshin+impact).png
The second save file, changed the file name aaa.png, the path remains unchanged
download_pictures\landscape[or]no people[or]nature[OrderBydata]\aaa.png
The third save file, the file name is unchanged, but the directory has changed
Process finished with exit code 0
I couldn't replicate your error (I'm using Linux and I think you are on a Windows system), but in any case you should not try to join paths manually. Instead, use os.path.join to join multiple parts into one valid path. This also ensures that the correct path separator for your operating system is used (forward slash on Unix, backslash on Windows).
I have adapted your code up to the first saving attempt accordingly, and it writes the file correctly. The code also gets cleaner this way, and it's easier to see the separate folder names.
import os

if __name__ == '__main__':
    path = os.path.join('download_pictures', 'landscape[or]no people[or]nature[OrderBydata]')
    download_name = '[6]772803-2500x1459-genshin+impact-lumine+(genshin+impact)-arama+(genshin+impact)-aranara+(genshin+impact)-arabalika+(genshin+impact)-arakavi+(genshin+impact).png'
    filename = os.path.join(path, download_name)
    print('filename = ', filename)

    # Create the folder to make sure the path exists
    if not os.path.exists(path):
        os.makedirs(path)

    try:
        with open(filename, 'w') as f:
            f.write('test')
    except Exception as e:
        print('\n【error!】First save file, failed, caught exception:', e)

    print(filename)
I hope this helps. I think the issue with your approach is related to the path separators \\ under Windows.
I have a Python script that creates a PDF and saves it in a subfolder of the folder where the script is located. I have the following code that saves the file to the subfolder:
outfilename = "Test" + ".pdf" #in real code there is a var that holds the name of the file
outfiledir = 'C:/Users/JohnDoe/Desktop/dev/PARENTFOLDER/SUBFOLDER/' #parent folder is where the script is - subfolder is where the PDFs get saved to
outfilepath = os.path.join(outfiledir, outfilename)
Is there a way I can save the PDFs to the subfolder without having to specify the full path? Let's say I wanted to make this script an exe that multiple computers could use; how would I write the path so that the PDFs are just saved in the subfolder?
Thanks!
Try this:
import os

dir_name = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'subdir')
path = os.path.join(dir_name, 'filename')
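Applied to the variables in the question, a minimal sketch could look like the following; the 'SUBFOLDER' name and the os.makedirs call are assumptions on my part (the original code seems to expect the subfolder to already exist):
import os

outfilename = "Test" + ".pdf"  # in the real code a variable holds the file name

# resolve the subfolder relative to the script's own location
# instead of hard-coding an absolute path
script_dir = os.path.dirname(os.path.abspath(__file__))
outfiledir = os.path.join(script_dir, 'SUBFOLDER')
os.makedirs(outfiledir, exist_ok=True)  # create it if it is missing (assumption)

outfilepath = os.path.join(outfiledir, outfilename)
If the script is later frozen into an exe (e.g. with PyInstaller), os.path.dirname(sys.executable) is commonly used instead of __file__ to locate the folder the executable runs from.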
I am trying to rename a set of files in a directory using Python. The files are currently labelled with a Pool number, AR number, and S number (e.g. Pool1_AR001_S13__fw_paired.fastq.gz). Each file refers to a specific plant sequence name. I would like to rename these files by removing the 'Pool_AR_S' part and replacing it with the sequence name, e.g. 'Lbienne_dor5_GS1', while leaving the suffix (e.g. fw_paired.fastq.gz, rv_unpaired.fastq.gz) intact. I am trying to read the files into a dictionary, but I am stuck as to what to do next. I have a .txt file containing the necessary information in the following format:
Pool1_AR010_S17 - Lbienne_lla10_GS2
Pool1_AR011_S18 - Lbienne_lla10_GS3
Pool1_AR020_S19 - Lcampanulatum_borau4_T_GS1
The code I have so far is:
from optparse import OptionParser
import csv
import os

parser = OptionParser()
parser.add_option("-w", "--wanted", dest="w")
parser.add_option("-t", "--trimmed", dest="t")
parser.add_option("-d", "--directory", dest="working_dir", default="./")
(options, args) = parser.parse_args()
wanted_file = options.w
trimmomatic_output = options.t

#Read the wanted file and create a dictionary of index vs species identity
with open(wanted_file, 'rb') as species_sequence:
    species_list = list(csv.DictReader(species_sequence, delimiter='-'))
print species_list

#Rename the Trimmomatic Output files according to the dictionary
for trimmed_sequence in os.listdir(trimmomatic_output):
    os.rename(os.path.join(trimmomatic_output, trimmed_sequence),
              os.path.join(trimmomatic_output, trimmed_sequence.replace(species_list[0], species_list[1])))
Please can you help me replace the first half of each file name? I'm very new to Python and to Stack Overflow, so I am sorry if this question has been asked before or if I have asked it in the wrong place.
First job is to get rid of all those modules. They may be nice, but for a job like yours they are very unlikely to make things easier.
Create a .py file in the directory where those .gz files reside.
import os

files = os.listdir()  # files is of list type

# 'txt_file' is the path of your .txt file containing those conversions
dic = parse_txt(txt_file)  # body of parse_txt() omitted; it should return a dictionary built by parsing that .txt file

for f in files:
    pre, suf = f.split('__')  # e.g. "Pool1_AR001_S13__fw_paired.fastq.gz"
    # assuming prefix and suffix are divided by a double underscore
    pre = dic[pre]
    os.rename(f, pre + '__' + suf)
If you need help with parse_txt() function, let me know.
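Since the answer leaves parse_txt() out, here is a minimal sketch of what it could look like, assuming the .txt file uses the 'Pool1_AR010_S17 - Lbienne_lla10_GS2' format shown in the question; the body below is my assumption, not code from the answer:
def parse_txt(txt_file):
    """Build a {old_prefix: new_prefix} dictionary from the mapping file."""
    mapping = {}
    with open(txt_file) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            # each line looks like: "Pool1_AR010_S17 - Lbienne_lla10_GS2"
            old, new = (part.strip() for part in line.split('-', 1))
            mapping[old] = new
    return mapping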
Here is a solution that I tested with Python 2. It's fine if you use your own logic instead of the get_mappings function. Refer to the comments in the code for an explanation.
import os

def get_mappings():
    mappings_dict = {}
    with open('wanted_file.txt', 'r') as f:
        for line in f:
            # if you have "Pool1_AR010_S17 - Lbienne_lla10_GS2"
            # it becomes a list, i.e. ['Pool1_AR010_S17 ', ' Lbienne_lla10_GS2']
            # note that there may be spaces before/after the names as shown above
            text = line.split('-')
            # strip() is used to remove the spaces around the names
            mappings_dict[text[0].strip()] = text[1].strip()
    return mappings_dict

# PROGRAM EXECUTION STARTS FROM HERE
# assuming all files are in the current directory;
# if not, replace the dot (.) with the path of the directory where you have the files
files = os.listdir('.')
wanted_names_dict = get_mappings()

for filename in files:
    try:
        # prefix='Pool1_AR010_S17', suffix='fw_paired.fastq.gz'
        prefix, suffix = filename.split('__')
        new_filename = wanted_names_dict[prefix] + '__' + suffix
        os.rename(filename, new_filename)
        print 'renamed', filename, 'to', new_filename
    except:
        print 'No new name defined for file:' + filename
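As a quick, hypothetical illustration using the three sample lines from the question, get_mappings() would produce a dictionary like the one below, and a matching file would be renamed as shown:
wanted_names_dict = {
    'Pool1_AR010_S17': 'Lbienne_lla10_GS2',
    'Pool1_AR011_S18': 'Lbienne_lla10_GS3',
    'Pool1_AR020_S19': 'Lcampanulatum_borau4_T_GS1',
}

filename = 'Pool1_AR010_S17__fw_paired.fastq.gz'
prefix, suffix = filename.split('__')
print(wanted_names_dict[prefix] + '__' + suffix)
# Lbienne_lla10_GS2__fw_paired.fastq.gz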
My script is downloading files from URLs located in a text file, saving them temporarily to a given location, and then adding them to an already existing zip file in the same directory. The files are being downloaded successfully, and no errors are raised when adding to the zip files, but for some reason, most of the resulting zip files are un-openable by the OS, and when I z.printdir() on them, they do not contain all the expected files.
relevant code:
for root, dirs, files in os.walk(os.path.join(downloadsdir, dir_dictionary['content']), False):
    if "artifacts" in root:
        solution_name = root.split('/')[-2]
        with open(os.path.join(root, 'non-local-files.txt')) as file:
            for line in file:
                if "string" in line:
                    print('\tDownloading ' + urllib.unquote(urllib.unquote(line.rstrip())))
                    file_name = urllib.unquote(urllib.unquote(line.rstrip())).split('/')[-1]
                    r = requests.get(urllib.unquote(urllib.unquote(line.rstrip())))
                    with open(os.path.join(root, file_name), 'wb') as temp_file:
                        temp_file.write(r.content)
                    z = zipfile.ZipFile(os.path.join(root, solution_name + '.zip'), 'a')
                    z.write(os.path.join(root, file_name), os.path.join('Dropoff', file_name))
I guess my question is: am I doing something inherently wrong in the code, or do I have to look at the actual files being added to the zip files? The files are all OS-readable and appear normal as far as I can tell. Kind of at a loss as to how to proceed.
for root, dirs, files in os.walk(os.path.join(downloadsdir, dir_dictionary['content']), False):
    if "artifacts" in root:
        solution_name = root.split('/')[-2]
        with open(os.path.join(root, 'non-local-files.txt')) as file:
            for line in file:
                if "string" in line:
                    print('\tDownloading ' + urllib.unquote(urllib.unquote(line.rstrip())))
                    file_name = urllib.unquote(urllib.unquote(line.rstrip())).split('/')[-1]
                    r = requests.get(urllib.unquote(urllib.unquote(line.rstrip())))
                    with open(os.path.join(root, file_name), 'wb') as temp_file:
                        temp_file.write(r.content)
                    z = zipfile.ZipFile(os.path.join(root, solution_name + '.zip'), 'a')
                    try:
                        z.write(os.path.join(root, file_name), os.path.join('Dropoff', file_name))
                    finally:
                        # closing the archive writes the zip's central directory;
                        # without it the resulting file is incomplete and unreadable
                        z.close()
PS:
https://docs.python.org/2/library/zipfile.html
Note
Archive names should be relative to the archive root, that is, they should not start with a path separator.
There is no official file name encoding for ZIP files. If you have unicode file names, you must convert them to byte strings in your desired encoding before passing them to write(). WinZip interprets all file names as encoded in CP437, also known as DOS Latin.
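To illustrate both notes, here is a minimal sketch (with placeholder names, not taken from the question) that keeps the archive member name relative with forward slashes and relies on a with block so the archive is always closed and its central directory written:
import os
import zipfile

root = 'some_dir'            # placeholder directory containing the downloaded file
file_name = 'example.bin'    # placeholder file name

# the with block guarantees the archive is closed (and its central
# directory written) even if z.write() raises an exception
with zipfile.ZipFile(os.path.join(root, 'solution.zip'), 'a') as z:
    # member names are relative to the archive root; a plain forward-slash
    # path keeps that explicit and portable across platforms
    z.write(os.path.join(root, file_name), 'Dropoff/' + file_name)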