How to read json files in subfolders? - python

I have a file path like '/mnt/extract'. Inside this extract folder, I have the 3 subfolders below:
subfolder1
subfolder2
subfolder3 (it has one .json file inside it)
The json in subfolder3 looks like this -
{
    "x": "/mnt/extract/p",
    "y": "/mnt/extract/r"
}
I want to read the above JSON file from subfolder3 and concatenate the value /mnt/extract/p for the key 'x' with one more string, 'data', so that the final path becomes '/mnt/extract/p/data', where I want to finally export some data. I tried the approach below, but it's not working.
import os

for root, dirs, files in os.walk(path):
    for name in files:
        print(os.path.join(root, name))

Using the built-in Python glob module, you can read files in folders and subfolders.
Try this:
import glob

files = glob.glob('/mnt/extract/**/*.json', recursive=True)
The files list will contain the paths of all JSON files under the extract directory.
Try this:
import glob
import json

final_paths = []
extract_path = '/mnt/extract'
files = glob.glob(extract_path + '/**/*.json', recursive=True)
for file in files:
    with open(file, 'r') as f:
        json_file = json.load(f)
        output_path = json_file['x'] + '/' + 'data'
        final_paths.append(output_path)
The final_paths list will contain the output paths built from every JSON file in the folder structure; for the sample JSON above it would be ['/mnt/extract/p/data'].

import glob
import json

extract_path = '/mnt/extract'
files = glob.glob(extract_path + '/**/*.json', recursive=True)
if len(files) != 0:
    with open(files[0], 'r') as f:
        config = json.load(f)  # renamed from `dict`, which shadows the builtin
        final_output_path = config['x'] + '/' + 'data'
In the above code, glob returns a list containing the JSON file as its only element. To make sure we pass a file path to the open method and not a list, I took files[0], which picks the JSON file out of the list, and it was then parsed easily. If anyone has another suggestion for handling the list returned by glob, feel free to answer with a cleaner way to handle it.
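One cleaner option, to answer the request above: pathlib can replace the glob list entirely. Path.rglob returns an iterator, so next() takes the first match without any list indexing. A minimal sketch, assuming the same /mnt/extract layout from the question:
from pathlib import Path
import json

extract_path = Path('/mnt/extract')
# next() pulls the first .json found anywhere under extract_path;
# the None default avoids a StopIteration when nothing matches
json_file = next(extract_path.rglob('*.json'), None)
if json_file is not None:
    config = json.loads(json_file.read_text())
    final_output_path = config['x'] + '/data'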

Related

Copy and rename pictures based on xml nodes

I'm trying to copy all pictures from one directory (including subdirectories) to another target directory. Whenever the exact picture name is found in one of the XML files, the tool should grab all the information (attributes in the parent and child nodes) and create subdirectories based on that node information; it should also rename the picture file.
The part that extracts all the information from the nodes is already done.
from bs4 import BeautifulSoup as bs

path_xml = r"path\file.xml"

content = []
with open(path_xml, "r") as file:  # open the single file defined above
    content = file.readlines()
content = "".join(content)

def get_filename(_content):
    bs_content = bs(_content, "html.parser")
    # some code
    picture_path = f'{pm_1}{pm_2}\\{pm_3}\\{pm_4}\\{pm_5}_{pm_6}_{pm_7}\\{pm_8}\\{pm_9}.jpg'

get_filename(content)
So in the end I get a string value with the directory path and the file name I want.
Now I am struggling to open all the XML files in a directory instead of just one. I tried this:
import os

dir_xml = r"path"
res = []
for path in os.listdir(dir_xml):
    if os.path.isfile(os.path.join(dir_xml, path)):
        res.append(path)
with open(res, "r") as file:
    content = file.readlines()
but it gives me this error: TypeError: expected str, bytes or os.PathLike object, not list
How can I read through all the XML files instead of just one? I have hundreds of XML files, so that would take a while :D
And another question: how can I create directories based on a string?
Let's say the value of picture_path is AB\C\D\E_F_G\H\I.jpg
I would need another directory path for the destination of the created folders, and a function that somehow creates folders based on that string. How can I do that?
To read all XML files in a directory, you can modify your code as follows:
import os

dir_xml = r"path"
for path in os.listdir(dir_xml):
    if path.endswith(".xml"):
        with open(os.path.join(dir_xml, path), "r") as file:
            content = file.readlines()
            content = "".join(content)
            get_filename(content)
This code uses the os.listdir() function to get a list of all files in the directory specified by dir_xml. It then uses a for loop to iterate over the list of files, checking if each file ends with the .xml extension. If it does, it opens the file, reads its content, and passes it to the get_filename function.
To create directories based on a string, you can use the os.makedirs function. For example:
import os
picture_path = r'AB\C\D\E_F_G\H\I.jpg'
dest_path = r'path_to_destination'
os.makedirs(os.path.join(dest_path, os.path.dirname(picture_path)), exist_ok=True)
In this code, os.path.join is used to combine the dest_path and the directory portion of picture_path into a full path. os.path.dirname is used to extract the directory portion of picture_path. The os.makedirs function is then used to create the directories specified by the path, and the exist_ok argument is set to True to allow the function to succeed even if the directories already exist.
Finally, you can use the shutil library to copy the picture file to the destination and rename it, like this:
import shutil

src_file = os.path.join(src_path, picture_path)  # src_path is the folder the pictures come from
dst_file = os.path.join(dest_path, picture_path)
shutil.copy(src_file, dst_file)
Here, src_file is the full path to the source picture file and dst_file is the full path to the destination. The shutil.copy function is then used to copy the file from the source to the destination.
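To also rename the picture while copying, as the question asks, you only need to change the final path component; a minimal sketch, where new_name is a hypothetical name you would build from the XML node values:
import os
import shutil

new_name = "I_renamed.jpg"  # hypothetical; derive this from your XML nodes
# shutil.copy accepts a full destination file name, so copying and
# renaming happen in a single step
dst_file = os.path.join(dest_path, os.path.dirname(picture_path), new_name)
shutil.copy(src_file, dst_file)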
You can use os.walk() for a recursive search of files:
import os

dir_xml = r"path"
for root, dirs, files in os.walk(dir_xml):  # topdown=False
    for name in files:
        if name.endswith(".xml"):
            print(f"file path: {root}\nXML file: {name}")
            # os.walk yields bare filenames, so join them with root before opening
            with open(os.path.join(root, name), 'r') as file:
                content = file.readlines()

How to read pairwise csv and json files having same names inside a folder using python?

Consider my folder structure having files in these fashion:-
abc.csv
abc.json
bcd.csv
bcd.json
efg.csv
efg.json
and so on, i.e. pairs of csv and json files with the same names. I have to read each same-named pair, perform the same operation, and proceed to the next pair of files. How do I go about this?
Basically, what I have in mind as pseudocode is:
for files in folder_name:
    df_csv = pd.read_csv('abc.csv')
    df_json = pd.read_json('abc.json')
    # some script to execute
    # now read the next pair and repeat for all files
Did you think of something like this?
import os

# collect the filenames in the folder
filelist = os.listdir()

# iterate through the filenames in the folder
for file in filelist:
    # pair each .csv file with its .json file
    if file.endswith(".csv"):
        with open(file) as csv_file:
            pre, ext = os.path.splitext(file)
            secondfile = pre + ".json"
            with open(secondfile) as json_file:
                # do something
                pass
You can use the glob module to extract the file names matching a pattern:
import glob
import os.path

for csvfile in glob.iglob('*.csv'):
    jsonfile = csvfile[:-3] + 'json'
    # optionally check that the paired file exists
    if not os.path.exists(jsonfile):
        # show error message
        ...
        continue
    # do something with csvfile
    # do something else with jsonfile
    # and proceed to the next pair
If the directory structure is consistent, you could do the following:
import os
import pandas as pd

dir_path = './path/to/dir'
for f_name in {x.split('.')[0] for x in os.listdir(dir_path)}:
    df_csv = pd.read_csv(os.path.join(dir_path, f"{f_name}.csv"))
    df_json = pd.read_json(os.path.join(dir_path, f"{f_name}.json"))
    # execute the rest
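One caveat with split('.')[0]: it breaks for names like report.v2.csv. Here is a sketch of a pathlib variant that pairs on the full stem instead, under the same assumption of a flat folder of matching .csv/.json pairs:
from pathlib import Path
import pandas as pd

folder = Path('./path/to/dir')
for csv_file in folder.glob('*.csv'):
    # with_suffix swaps only the last extension, keeping any dots in the name
    json_file = csv_file.with_suffix('.json')
    if json_file.exists():
        df_csv = pd.read_csv(csv_file)
        df_json = pd.read_json(json_file)
        # execute the rest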

Create a loop to open subfolders within a folder, read the json files and output as csv

I am trying to create a loop in python which will allow me to open a folder, iterate through the subfolders within it, read the json files and output them as a csv. Then repeat the loop for each subfolder.
My directory looks like this:
Main folder = "Exports"
Subfolder = "Folder1" , "Folder2" etc..
Files within subfolder = "file1.json" , "file2.json" etc...
Currently I am running the following code within a subfolder (for example "Folder1") to create an output file:
import pandas as pd
import os

path = os.getcwd()
frame = pd.DataFrame()
for filename in os.listdir(os.getcwd()):
    root, ext = os.path.splitext(filename)
    if ext == '.json':
        tmp_frame = pd.read_json(filename)
        # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
        frame = pd.concat([frame, tmp_frame], ignore_index=True)
frame.to_csv(path + ".csv")
My question is: how do I run that loop from within the main folder, so that it opens each subfolder, runs the loop there, and outputs a csv file for each subfolder?
Thanks
Let's try pathlib and defaultdict from the standard library.
We can build a dictionary with the subfolders as keys and lists of their files as values.
from pathlib import Path
from collections import defaultdict

your_path = 'target_directory'
file_dict = defaultdict(list)
for each_file in Path(your_path).rglob('*.csv'):  # change this to `.json`
    file_dict[each_file.parent].append(each_file)

print(file_dict)
Your dictionary will hold lists of pathlib objects and will vaguely resemble the structure below; each key is the subfolder (only the names are printed here):
{Notebooks: [test.csv,
             test_file.csv,
             test_file_edited.csv],
 test_csv: [File20200610.csv,
            File20201012 - Copy.csv,
            File20201012.csv]}
Then we can just loop over the dictionary and save each subfolder's combined data to your target folder:
import pandas as pd

for each_sub_folder, files in file_dict.items():
    dfs = []
    for each_file in files:
        j = pd.read_json(each_file)  # your read method
        dfs.append(j)  # append to list
    df = pd.concat(dfs)
    # target_path is the folder where the per-subfolder CSVs should land
    df.to_csv(Path(target_path).joinpath(each_sub_folder.name + '.csv'), index=False)

How can I open multiple json files in Python with a for loop?

For a data challenge at school we need to open a lot of json files with python. There are too many to open manually. Is there a way to open them with a for loop?
This is the way I open one of the json files and make it a dataframe (it works).
file_2016091718 = '/Users/thijseekelaar/Downloads/airlines_complete/airlines-1474121577751.json'
json_2016091718 = pd.read_json(file_2016091718, lines=True)
Here is a screenshot of the folder the data is in (click here).
Yes. You can use os.listdir to list all the json files in your directory, construct the full path of each one with os.path.join, and open the json file from that full path:
import os
import pandas as pd

base_dir = '/Users/thijseekelaar/Downloads/airlines_complete'

# get all files in the directory
data_list = []
for file in os.listdir(base_dir):
    # if the file is a json, construct its full path, open it,
    # and append the json data to the list
    if file.endswith('.json'):
        json_path = os.path.join(base_dir, file)
        json_data = pd.read_json(json_path, lines=True)
        data_list.append(json_data)

print(data_list)
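If you want one combined DataFrame rather than a list of them, pandas can concatenate the list in a single call (assuming the files share the same columns):
import pandas as pd

# stack the per-file frames on top of each other, renumbering the index
df_all = pd.concat(data_list, ignore_index=True)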
Try this:
import os

# note: os.walk makes no guarantee about the order of traversal
for root, subdirs, files in os.walk('your/json/dir/'):
    for file in files:
        # join with root, since os.walk yields names relative to each directory
        with open(os.path.join(root, file), 'r') as f:
            # your stuff here
            pass

How can I extract all .zip files in a folder without retaining the directory structure using python?

Here is my code. I don't know how to loop over every .zip in a folder, please help me: I want the contents of all 5 zip files extracted into one folder, without keeping their internal directory names.
import os
import shutil
import zipfile

my_dir = r"C:\Users\Guest\Desktop\OJT\scanner\samples_raw"
my_zip = r"C:\Users\Guest\Desktop\OJT\samples\001-100.zip"

with zipfile.ZipFile(my_zip) as zip_file:
    zip_file.setpassword(b"virus")
    for member in zip_file.namelist():
        filename = os.path.basename(member)
        # skip directories
        if not filename:
            continue
        # copy file (taken from zipfile's extract)
        source = zip_file.open(member)
        # `file(...)` is Python 2; use open() in Python 3
        target = open(os.path.join(my_dir, filename), "wb")
        with source, target:
            shutil.copyfileobj(source, target)
Duplicate question; please refer to the link below.
How to extract zip file recursively in Python
What you are looking for is glob, which can be used like this:
#<snip>
import glob

# assuming all your zip files are in the directory below
for my_zip in glob.glob(r"C:\Users\Guest\Desktop\OJT\samples\*.zip"):
    with zipfile.ZipFile(my_zip) as zip_file:
        zip_file.setpassword(b"virus")
        for member in zip_file.namelist():
            #<snip> rest of your code here
            pass
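Putting the two snippets together, here is a minimal self-contained sketch of the whole task: flat extraction of every zip in the folder, skipping directory entries. The paths and the b"virus" password are taken from the question:
import glob
import os
import shutil
import zipfile

my_dir = r"C:\Users\Guest\Desktop\OJT\scanner\samples_raw"

for my_zip in glob.glob(r"C:\Users\Guest\Desktop\OJT\samples\*.zip"):
    with zipfile.ZipFile(my_zip) as zip_file:
        zip_file.setpassword(b"virus")
        for member in zip_file.namelist():
            filename = os.path.basename(member)
            if not filename:
                # namelist() entries ending in '/' are directories; skip them
                continue
            # write the member straight into my_dir, dropping its internal path
            with zip_file.open(member) as source, \
                 open(os.path.join(my_dir, filename), "wb") as target:
                shutil.copyfileobj(source, target)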
