I have a number of JSON files saved in a folder. I would like to parse each JSON file, flatten it with the flatten library, and save it as a separate JSON file.
I have managed to do this with one JSON file, but I'm struggling to parse several JSON files at once and save them without merging the data.
I think I need a loop that loads a JSON file, flattens it, and saves it until there are no more JSON files in the folder. Is this possible?
This still seems to parse only one JSON file.
path_to_json = 'json_test/'
for file in [file for file in os.listdir(path_to_json) if file.endswith('.json')]:
    with open(path_to_json + file) as json_file:
        data1 = json.load(json_file)
Any help would be much appreciated, thanks!
On every loop iteration, 'data1' is reassigned to the next JSON file, so only the last result survives.
Instead, append each file's data to a list.
import os
import json

# The flatten library couldn't be installed via pip on 3.8.3 here,
# so only the read/write loop is shown; flatten each dict where noted.
path = 'X:test folder/'
file_list = [p for p in os.listdir(path) if p.endswith('.json')]

flattened = []
for file in file_list:
    with open(path + file) as json_file:
        # flatten the loaded JSON here
        flattened.append(json.load(json_file))

for file, flat_json in zip(file_list, flattened):
    # strip the .json extension so the output isn't name.json_flattened.json
    out_name = os.path.splitext(file)[0] + '_flattened.json'
    with open(path + out_name, 'w') as out_file:
        json.dump(flat_json, out_file, indent=2)
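If the flatten library does install in your environment, a minimal per-file sketch might look like the following. This assumes the PyPI package flatten_json, whose flatten() turns a nested dict into a single-level dict, and that each file holds a single JSON object; the folder path is a placeholder:

import os
import json
from flatten_json import flatten  # pip install flatten_json

path_to_json = 'json_test/'  # placeholder folder

for name in os.listdir(path_to_json):
    if not name.endswith('.json'):
        continue
    with open(os.path.join(path_to_json, name)) as json_file:
        data = json.load(json_file)
    flat = flatten(data)  # nested keys become e.g. 'a_b_c'
    out_name = os.path.splitext(name)[0] + '_flattened.json'
    with open(os.path.join(path_to_json, out_name), 'w') as out_file:
        json.dump(flat, out_file, indent=2)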
# Could you try this out? Note that it merges the files into one, rather than saving each one separately:
# https://stackoverflow.com/questions/23520542/issue-with-merging-multiple-json-files-in-python
import glob

read_files = glob.glob("*.json")

# open in text mode ("r"/"w"): the original "rb"/"wb" mixes bytes and str on Python 3
with open("merged_file.json", "w") as outfile:
    outfile.write('[{}]'.format(
        ','.join([open(f, "r").read() for f in read_files])))
The following is my code to convert a single file; how can I make it work when there are multiple XML files?
import xmltodict
import json

with open("demo.xml") as xml_file:
    data_dict = xmltodict.parse(xml_file.read())

json_data = json.dumps(data_dict)

with open("datas.json", "w") as json_file:
    json_file.write(json_data)
I'm getting the correct output for a single file, but how do I execute this for many files?
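One way is to wrap the same conversion in a loop over the folder. A minimal sketch, assuming all the XML files sit in the current directory and each output JSON should reuse the XML file's base name:

import glob
import os
import json
import xmltodict

# convert every .xml file in the current directory to a matching .json file
for xml_path in glob.glob("*.xml"):
    with open(xml_path) as xml_file:
        data_dict = xmltodict.parse(xml_file.read())
    json_path = os.path.splitext(xml_path)[0] + ".json"
    with open(json_path, "w") as json_file:
        json.dump(data_dict, json_file)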
I'm using TweetScraper to scrape tweets with certain keywords. Right now, each tweet gets saved to a separate JSON file in a set folder, so I end up with thousands of JSON files. Is there a way to make each new tweet append to one big JSON file? If not, how do I process/work with thousands of small JSON files in Python?
Here's the part of settings.py that handles saving data:
# settings for where to save data on disk
SAVE_TWEET_PATH = './Data/tweet/'
SAVE_USER_PATH = './Data/user/'
I would read all the files, put the data in a list, and save it again as a single JSON file.
import os
import json

folder = '.'
all_tweets = []

# --- read ---
for filename in sorted(os.listdir(folder)):
    if filename.endswith('.json'):
        fullpath = os.path.join(folder, filename)
        with open(fullpath) as fh:
            tweet = json.load(fh)
        all_tweets.append(tweet)

# --- save ---
with open('all_tweets.json', 'w') as fh:
    json.dump(all_tweets, fh)
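As for appending each new tweet to one big file as it arrives: a plain JSON array can't be appended to safely, but the JSON Lines format (one JSON object per line) can. A minimal sketch of that idea — the tweet variable and file name here are placeholders, not TweetScraper settings:

import json

def append_tweet(tweet, path='all_tweets.jsonl'):
    # JSON Lines: each tweet is one JSON object on its own line,
    # so appending never corrupts what is already in the file
    with open(path, 'a') as fh:
        fh.write(json.dumps(tweet) + '\n')

def read_tweets(path='all_tweets.jsonl'):
    # read the whole file back into a list of dicts
    with open(path) as fh:
        return [json.loads(line) for line in fh]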
I have a set of 200 JSON files in a folder. I have written code to take each file from the folder, convert the JSON to a data-frame, do the necessary steps, and finally save the data-frame as a CSV file. The problem I face is saving the CSV file: I want to save it under the name of the JSON file it came from.
Since I am taking the folder and processing the files one by one, how can I do that?
I tried this form:
df.to_csv(filename)
but I have to give the filename.
Assuming you are not accessing the file by manually calling its name:
    with open('whatever.json', 'rb') as file:
and are instead using something like glob, I would do something like this:

import os

# File = whatever variable name you have assigned to the opened JSON file
filename = os.path.basename(File.name)
filename = filename.split('.')[0]
filename += '.csv'

Then, as requested:

# csv_data stands in for whatever CSV text you have built
with open(filename, 'w') as file:
    file.write(csv_data)
# no explicit close() needed; the with block closes the file
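Putting it together, a minimal sketch of the whole loop, assuming pandas and that pd.read_json can load your files; the folder path is a placeholder, and os.path.splitext is a bit more robust than split('.') for names containing extra dots:

import os
import glob
import pandas as pd

folder = 'json_folder'  # placeholder

for json_path in glob.glob(os.path.join(folder, '*.json')):
    df = pd.read_json(json_path)  # or however you build the data-frame
    # ... do the necessary steps on df here ...
    base = os.path.splitext(os.path.basename(json_path))[0]
    df.to_csv(os.path.join(folder, base + '.csv'), index=False)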
There's a CSV file in an S3 bucket that I want to parse and turn into a dictionary in Python. Using Boto3, I called the s3.get_object(<bucket_name>, <key>) function, which returns a dictionary that includes a "Body": StreamingBody() key-value pair that apparently contains the data I want.
In my Python file I've added import csv. In the examples I see online for reading a CSV file, you pass the file name, such as:

with open(<csv_file_name>, mode='r') as file:
    reader = csv.reader(file)

However, I'm not sure how to retrieve the CSV file name from the StreamingBody, if that's even possible. If not, is there a better way for me to read the CSV file in Python? Thanks!
Edit: Wanted to add that I'm doing this in AWS Lambda and there are documented issues with using pandas in Lambda, so this is why I wanted to use the csv library and not pandas.
csv.reader does not require a file. It can use anything that iterates through lines, including files and lists.
So you don't need a filename. Just pass the lines from response['Body'] directly into the reader. One way to do that is
lines = response['Body'].read().decode('utf-8').splitlines(True)  # Body gives bytes, so decode first
reader = csv.reader(lines)
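In context, a minimal Lambda-style sketch of that idea; the bucket and key here are placeholders, and in practice they would likely come from the event:

import csv
import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # placeholder bucket/key; real handlers often read these from the event
    response = s3.get_object(Bucket='my-bucket', Key='data.csv')
    lines = response['Body'].read().decode('utf-8').splitlines(True)
    for row in csv.reader(lines):
        print(row)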
To retrieve and read a CSV file from an S3 bucket, you can use the following code:

import csv
import boto3
from django.conf import settings

bucket_name = "your-bucket-name"
file_name = "your-file-name-exists-in-that-bucket.csv"

s3 = boto3.resource('s3', aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
                    aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY)
bucket = s3.Bucket(bucket_name)
obj = bucket.Object(key=file_name)
response = obj.get()

lines = response['Body'].read().decode('utf-8').splitlines(True)
reader = csv.DictReader(lines)
for row in reader:
    # csv_header_key1/2 are the header names defined in your CSV's header row
    print(row['csv_header_key1'], row['csv_header_key2'])
This problem may be tricky.
I want to create a CSV file from a list in Python; this CSV file does not exist beforehand. I then want to export it to some local directory, where no such file exists either. We just create a new CSV file and put it in some local directory.
I found that StringIO.StringIO can generate the CSV content from a list in Python, but what are the next steps?
Thank you.
And I found the following code can do it:
import os
import os.path
import StringIO  # Python 2; on Python 3 use io.StringIO
import csv

out_dir = r"C:\Python27"
if not os.path.exists(out_dir):
    os.mkdir(out_dir)

my_list = [[1, 2, 3], [4, 5, 6]]

with open(os.path.join(out_dir, "filename" + '.csv'), "w") as f:
    csvfile = StringIO.StringIO()
    csvwriter = csv.writer(csvfile)
    for l in my_list:
        csvwriter.writerow(l)
    # write the buffered CSV text out in one go instead of character by character
    f.write(csvfile.getvalue())
Did you read the docs?
https://docs.python.org/2/library/csv.html
Lots of examples on that page of how to read / write CSV files.
One of them:
import csv

with open('some.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(someiterable)
import csv

with open('/path/to/location', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(youriterable)
https://docs.python.org/2/library/csv.html#examples
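Those examples target Python 2 (hence 'wb' and the /2/ docs links). A minimal sketch of the same idea on Python 3, where the csv module wants text mode with newline='':

import csv

my_list = [[1, 2, 3], [4, 5, 6]]

# Python 3: open in text mode and pass newline='' so the csv module
# controls line endings itself
with open('filename.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(my_list)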