I am trying to open a directory that contains multiple JSON files and make a data frame from the data in each of them. I tried this:
for file in os.listdir('Datasets/'):
    json_read = pd.read_json(file)
However, it gives me an error:
ValueError: Expected object or value
When I inspect the type of the files, it says they are of class str. When I open a single file in the directory with read_json, it does work correctly, as the file is recognized as JSON. I am not quite sure why the files are turned into strings, nor how to solve it. Do you have any tips?
Thanks in advance!
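os.listdir returns bare file names, not full paths, so pd.read_json(file) receives a string like 'data1.json' that doesn't resolve to a file from your working directory; pandas then falls back to parsing the string itself as JSON and raises that ValueError. Construct the full path before reading: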
import os
import pandas as pd

base_dir = '/path/to/dir'

# Get all files in the directory
data_list = []
for file in os.listdir(base_dir):
    # If the file is a json, construct its full path, read it, and append the data to the list
    if file.endswith('json'):
        json_path = os.path.join(base_dir, file)
        json_data = pd.read_json(json_path, lines=True)
        data_list.append(json_data)
print(data_list)
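If you want a single DataFrame rather than a list of them, the collected frames can be stacked afterwards. A minimal sketch using pandas.concat, assuming the files share the same columns and data_list is non-empty:

import pandas as pd

# Stack the per-file frames into one DataFrame
combined = pd.concat(data_list, ignore_index=True)
print(combined.shape)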
You probably need to build a list of DataFrames. You may not be able to process every file in the given directory, so try this:
import pandas as pd
from glob import glob
from os.path import join

BASEDIR = 'Datasets'

dataframes = []
for file in glob(join(BASEDIR, '*.json')):
    try:
        dataframes.append(pd.read_json(file))
    except ValueError:
        print(f'Unable to process {file}')
print(f'Successfully constructed {len(dataframes)} dataframes')
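For comparison, a pathlib-based sketch of the same loop (pathlib is in the standard library, and pandas accepts Path objects):

from pathlib import Path
import pandas as pd

dataframes = []
for path in Path('Datasets').glob('*.json'):
    try:
        dataframes.append(pd.read_json(path))
    except ValueError:
        print(f'Unable to process {path}')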
import os
import json

# here's some information about getting the list of files in a folder: https://www.geeksforgeeks.org/python-list-files-in-a-directory/
# here's some information about how to open a json file: https://www.geeksforgeeks.org/read-json-file-using-python/

path = "./data"
file_list = os.listdir(path)  # gets the names of the files in the directory at path

for i in range(len(file_list)):  # loop over the list by index
    current = open(path + "/" + file_list[i])  # open the file at the current index
    data = json.load(current)  # load the data from the file
    for k in data['01']:  # iterate over the entries under the '01' key
        print(k)
Output:
main.json :
{'name': 'Nava', 'Surname': 'No need for to that'}
data.json :
{'name': 'Nava', 'watchs': 'Anime'}
Here's a link to run the code online: https://replit.com/#Nava10Y/opening-open-multiple-json-files-in-a-directory
Related
I have 7 VCF files present in 2 directories.
I want to concatenate all the files present in both folders and then read them through Python.
I am trying this code:
# Import Modules
import os
import pandas as pd
import vcf

# Folder Path
path1 = "C://Users//USER//Desktop//Anas/VCFs_1/"
path2 = "C://Users//USER//Desktop//Anas/VCFs_2/"
#os.chdir(path1)

def read(f1, f2):
    reader = vcf.Reader(open(f1, f2))
    df = pd.DataFrame([vars(r) for r in reader])
    out = df.merge(pd.DataFrame(df.INFO.tolist()),
                   left_index=True, right_index=True)
    return out

# Read text File
def read_text_file(file_path1, file_path2):
    with open(file_path1, 'r') as f:
        with open(file_path2, 'r') as f:
            print(read(path1, path2))

# iterate through all files
for file in os.listdir():
    # Check whether the file is in vcf format or not
    if file.endswith(".vcf"):
        file_path1 = f"{path1}\{file}"
        file_path2 = f"{path2}\{file}"
        print(file_path1, "\n\n", file_path2)
        # call read text file function
        #data = read_text_file(path1,path2)
        print(read_text_file(path1, path2))
But it's giving me a permission error. I know we get this error when we try to read folders instead of files. But how can I read the files present in the folders? Any suggestions?
You may need to run your Python code with Administrator privileges if you are trying to access another user's files.
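Another possibility: on Windows, passing a directory to open() raises PermissionError, and read_text_file here is called with path1 and path2, which are folders rather than files. A minimal sketch that globs each folder for .vcf files and opens each file individually, reusing the question's paths and its PyVCF-style vcf.Reader usage:

import glob
import os
import pandas as pd
import vcf

path1 = "C://Users//USER//Desktop//Anas/VCFs_1/"
path2 = "C://Users//USER//Desktop//Anas/VCFs_2/"

frames = []
for folder in (path1, path2):
    for vcf_path in glob.glob(os.path.join(folder, "*.vcf")):
        # open() receives a file path here, never a folder path
        reader = vcf.Reader(open(vcf_path, 'r'))
        frames.append(pd.DataFrame([vars(r) for r in reader]))

# Concatenate the records from both folders into one DataFrame
combined = pd.concat(frames, ignore_index=True)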
I have a file path like this: '/mnt/extract'. Inside this extract folder, I have the below 3 subfolders -
subfolder1
subfolder2
subfolder3 (it has one .json file inside it)
The json in subfolder3 looks like this -
{
    "x": "/mnt/extract/p",
    "y": "/mnt/extract/r"
}
I want to read the above json file from subfolder3 and concatenate the value /mnt/extract/p for the key 'x' with one more string, 'data', so that the final path becomes '/mnt/extract/p/data', where I want to finally export some data. I tried the below approach, but it's not working.
import os

for root, dirs, files in os.walk(path):
    for name in files:
        print(os.path.join(root, name))
Using the built-in Python glob module, you can find files in folders and sub-folders.
Try this:
import glob
files = glob.glob('./mnt/extract/**/*.json', recursive=True)
The files list will contain paths to all json files in the extract directory.
Try this:
import glob
import json

final_paths = []
extract_path = './mnt/extract'
files = glob.glob(extract_path + '/**/*.json', recursive=True)
for file in files:
    with open(file, 'r') as f:
        json_file = json.load(f)
        output_path = json_file['x'] + '/' + 'data'
        final_paths.append(output_path)
The final_paths list will contain the output paths built from all the json files in the folder structure.
import glob
import json

extract_path = '/mnt/extract'
files = glob.glob(extract_path + '/**/*.json', recursive=True)
if len(files) != 0:
    with open(files[0], 'r') as f:
        data = json.load(f)
    final_output_path = data['x'] + '/' + 'data'
In the above code, the files object is a list containing the JSON file path as its only element. To make sure we pass a file path to the open method and not a list, I took files[0], which picks the json file element from the list; it is then parsed easily. If anyone has another suggestion for handling the list object returned by glob, feel free to answer with a cleaner way of handling it.
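For what it's worth, one cleaner way to handle the list is to iterate over it: the loop body simply never runs when the list is empty, so the length check and the indexing both disappear. A sketch under the same assumptions as above:

import glob
import json

extract_path = '/mnt/extract'
final_output_paths = []
for json_path in glob.glob(extract_path + '/**/*.json', recursive=True):
    with open(json_path, 'r') as f:
        data = json.load(f)
    # os.path.join(data['x'], 'data') would also work here
    final_output_paths.append(data['x'] + '/data')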
I want to read a txt file that sits in a folder within a zipped folder as a pandas data frame.
I've looked at how to read in a txt file and how to access a file from within a zipped folder: Load data from txt with pandas and Download Returned Zip file from URL, respectively.
The problem is that I get a KeyError message with my code.
I think it's because my txt file sits in a folder within a folder?
Thanks for any help!
# MWE
import requests
import pandas as pd
from zipfile import ZipFile
from io import BytesIO
txt_raw = 'hcc-data.txt'
zip_raw = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00423/hcc-survival.zip'
r = requests.get(zip_raw)
files = ZipFile(BytesIO(r.content))
df_raw = pd.read_csv(files.open(txt_raw), sep=",", header=None)
# ERROR
KeyError: "There is no item named 'hcc-data.txt' in the archive"
You need to add the full path to the file within the archive:
txt_raw = 'hcc-survival/hcc-data.txt'
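If the internal layout of an archive isn't known in advance, ZipFile.namelist() returns every member path, which makes the right string easy to find. A small sketch building on the question's variables:

# Inspect the archive to see each member's full internal path
print(files.namelist())

txt_raw = 'hcc-survival/hcc-data.txt'
df_raw = pd.read_csv(files.open(txt_raw), sep=",", header=None)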
For a data challenge at school we need to open a lot of json files with python. There are too many to open manually. Is there a way to open them with a for loop?
This is the way I open one of the json files and make it a dataframe (it works).
file_2016091718 = '/Users/thijseekelaar/Downloads/airlines_complete/airlines-1474121577751.json'
json_2016091718 = pd.read_json(file_2016091718, lines=True)
Here is a screenshot of the folder where the data is (click here)
Yes. You can use os.listdir to list all the json files in your directory, construct the full path for each of them with os.path.join, and then open the json file from that full path:
import os
import pandas as pd

base_dir = '/Users/thijseekelaar/Downloads/airlines_complete'

# Get all files in the directory
data_list = []
for file in os.listdir(base_dir):
    # If the file is a json, construct its full path, read it, and append the data to the list
    if 'json' in file:
        json_path = os.path.join(base_dir, file)
        json_data = pd.read_json(json_path, lines=True)
        data_list.append(json_data)
print(data_list)
Try this:
import os

# note: os.walk gives no guarantee about traversal order
for root, subdirs, files in os.walk('your/json/dir/'):
    for file in files:
        # join the root onto the bare file name before opening
        with open(os.path.join(root, file), 'r') as f:
            pass  # your stuff here
I'm learning Python and can't seem to get pandas data frames to save. I'm not getting any errors; the file just doesn't appear in the folder.
I'm using a Windows 10 machine, Python 3, Jupyter Notebook, and saving to a local Google Drive folder.
Any ideas?
import feedparser
import pandas as pd

rawrss = [
    'http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml',
    'https://www.yahoo.com/news/rss/',
    'http://www.huffingtonpost.co.uk/feeds/index.xml',
    'http://feeds.feedburner.com/TechCrunch/',
]

posts = []
for url in rawrss:
    feed = feedparser.parse(url)
    for post in feed.entries:
        posts.append((post.title, post.link, post.summary))

df = pd.DataFrame(posts, columns=['title', 'link', 'summary'])  # pass data to init
df.to_csv('c:\\Users\\username\\Documents\\myfilename.csv', index=False)
The file should be saved in the current working directory. You can check where that is with:
import os
cwd = os.getcwd()
print(cwd)
1. In the last line of your code, change it to this:
df.to_csv('C://myfilename.csv', index=False)
Now your file is saved in the C drive. You can change the path as you wish, for example:
df.to_csv('C://Folder//myfilename.csv', index=False)
2. Alternatively, if you want to locate where your file is stored:
import os
print(os.getcwd())
This gives you the directory where the files are stored.
You can also change your working directory as you wish. Just put this at the beginning of your code:
import os
os.chdir("path_to_folder")
In this case there is no need to specify a path when saving to CSV.
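A short sketch of that flow ("path_to_folder" is a placeholder, and df is the DataFrame from your code):

import os

os.chdir("path_to_folder")  # all relative paths now resolve inside this folder
df.to_csv('myfilename.csv', index=False)  # the file lands in path_to_folder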
You can write a function that saves the file and returns a boolean value, as in the following. df.to_csv returns None when writing to a path, so success is checked by testing that the file exists afterwards:

import os

def save_data(path, file, df):
    full_path = os.path.join(path, file + '.csv')
    df.to_csv(full_path, index=False)
    # to_csv returns None on success, so check for the file instead
    return os.path.exists(full_path)
But you have to provide the right path though.
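Usage might look like this (the folder and file name are placeholders):

saved = save_data('c:\\Users\\username\\Documents', 'myfilename', df)
print('saved' if saved else 'failed')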
Add this code to the bottom of your file:
import os
print(os.getcwd())
That's where your file is
Try writing a simple file with a new script ('your_path_with_filename' is a placeholder):
F = open('your_path_with_filename', 'w')
F.write("hello")
F.close()