I have a number of JSON files saved in a folder. I would like to parse each JSON file, flatten it with the flatten library, and save it as a separate JSON file.
I have managed to do this with one JSON file, but I'm struggling to parse several JSON files at once and save them without merging the data.
I think I need a loop that loads a JSON file, flattens it, and saves it until there are no more JSON files in the folder. Is this possible?
This still seems to parse only one JSON file.
path_to_json = 'json_test/'
for file in [file for file in os.listdir(path_to_json) if file.endswith('.json')]:
    with open(path_to_json + file) as json_file:
        data1 = json.load(json_file)
Any help would be much appreciated, thanks!
On every loop iteration, 'data1' is reassigned to the next JSON file, so only the last result survives.
Instead, append each file's data to a list.
import os
import json

# The flatten library couldn't be installed via pip on 3.8.3 here,
# so only the read/write loop is shown; flatten each dict where noted.
path = 'X:test folder/'
file_list = [p for p in os.listdir(path) if p.endswith('.json')]

flattened = []
for file in file_list:
    with open(path + file) as json_file:
        # flatten the loaded JSON here
        flattened.append(json.load(json_file))

for file, flat_json in zip(file_list, flattened):
    # strip the .json extension so the output isn't name.json_flattened.json
    out_name = os.path.splitext(file)[0] + '_flattened.json'
    with open(path + out_name, 'w') as out_file:
        json.dump(flat_json, out_file, indent=2)
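If the flatten library does install in your environment, a minimal per-file sketch might look like the following. This assumes the PyPI package flatten_json, whose flatten() turns a nested dict into a single-level dict, and that each file holds a single JSON object; the folder path is a placeholder:

import os
import json
from flatten_json import flatten  # pip install flatten_json

path_to_json = 'json_test/'  # placeholder folder

for name in os.listdir(path_to_json):
    if not name.endswith('.json'):
        continue
    with open(os.path.join(path_to_json, name)) as json_file:
        data = json.load(json_file)
    flat = flatten(data)  # nested keys become e.g. 'a_b_c'
    out_name = os.path.splitext(name)[0] + '_flattened.json'
    with open(os.path.join(path_to_json, out_name), 'w') as out_file:
        json.dump(flat, out_file, indent=2)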
# Could you try this out? Note that it merges the files into one, rather than saving each one separately:
# https://stackoverflow.com/questions/23520542/issue-with-merging-multiple-json-files-in-python
import glob

read_files = glob.glob("*.json")

# open in text mode ("r"/"w"): the original "rb"/"wb" mixes bytes and str on Python 3
with open("merged_file.json", "w") as outfile:
    outfile.write('[{}]'.format(
        ','.join([open(f, "r").read() for f in read_files])))
The following is my code to convert a single file; how can I make it work when there are multiple XML files?
import xmltodict
import json

with open("demo.xml") as xml_file:
    data_dict = xmltodict.parse(xml_file.read())

json_data = json.dumps(data_dict)

with open("datas.json", "w") as json_file:
    json_file.write(json_data)
I'm getting the correct output for a single file, but how do I execute this for many files?
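One way is to wrap the same conversion in a loop over the folder. A minimal sketch, assuming all the XML files sit in the current directory and each output JSON should reuse the XML file's base name:

import glob
import os
import json
import xmltodict

# convert every .xml file in the current directory to a matching .json file
for xml_path in glob.glob("*.xml"):
    with open(xml_path) as xml_file:
        data_dict = xmltodict.parse(xml_file.read())
    json_path = os.path.splitext(xml_path)[0] + ".json"
    with open(json_path, "w") as json_file:
        json.dump(data_dict, json_file)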
I'm using TweetScraper to scrape tweets with certain keywords. Right now, each tweet gets saved to a separate JSON file in a set folder, so I end up with thousands of JSON files. Is there a way to make each new tweet append to one big JSON file? If not, how do I process/work with thousands of small JSON files in Python?
Here's the part of settings.py that handles saving data:
# settings for where to save data on disk
SAVE_TWEET_PATH = './Data/tweet/'
SAVE_USER_PATH = './Data/user/'
I would read all the files, put the data in a list, and save it again as a single JSON file.
import os
import json

folder = '.'
all_tweets = []

# --- read ---
for filename in sorted(os.listdir(folder)):
    if filename.endswith('.json'):
        fullpath = os.path.join(folder, filename)
        with open(fullpath) as fh:
            tweet = json.load(fh)
        all_tweets.append(tweet)

# --- save ---
with open('all_tweets.json', 'w') as fh:
    json.dump(all_tweets, fh)
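As for appending each new tweet to one big file as it arrives: a plain JSON array can't be appended to safely, but the JSON Lines format (one JSON object per line) can. A minimal sketch of that idea — the tweet variable and file name here are placeholders, not TweetScraper settings:

import json

def append_tweet(tweet, path='all_tweets.jsonl'):
    # JSON Lines: each tweet is one JSON object on its own line,
    # so appending never corrupts what is already in the file
    with open(path, 'a') as fh:
        fh.write(json.dumps(tweet) + '\n')

def read_tweets(path='all_tweets.jsonl'):
    # read the whole file back into a list of dicts
    with open(path) as fh:
        return [json.loads(line) for line in fh]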
I have a set of 200 JSON files in a folder. I have written code to take each file from the folder, convert the JSON to a data-frame, do the necessary steps, and finally save the data-frame as a CSV file. The problem I face is saving the CSV file: I want to save it under the name of the JSON file it came from.
Since I am taking the folder and processing the files one by one, how can I do that?
I tried this form:
df.to_csv(filename)
but I have to give the filename.
Assuming you are not accessing the file by manually calling its name:
    with open('whatever.json', 'rb') as file:
and are instead using something like glob, I would do something like this:

import os

# File = whatever variable name you have assigned to the opened JSON file
filename = os.path.basename(File.name)
filename = filename.split('.')[0]
filename += '.csv'

Then, as requested:

# csv_data stands in for whatever CSV text you have built
with open(filename, 'w') as file:
    file.write(csv_data)
# no explicit close() needed; the with block closes the file
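Putting it together, a minimal sketch of the whole loop, assuming pandas and that pd.read_json can load your files; the folder path is a placeholder, and os.path.splitext is a bit more robust than split('.') for names containing extra dots:

import os
import glob
import pandas as pd

folder = 'json_folder'  # placeholder

for json_path in glob.glob(os.path.join(folder, '*.json')):
    df = pd.read_json(json_path)  # or however you build the data-frame
    # ... do the necessary steps on df here ...
    base = os.path.splitext(os.path.basename(json_path))[0]
    df.to_csv(os.path.join(folder, base + '.csv'), index=False)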
There's a CSV file in an S3 bucket that I want to parse and turn into a dictionary in Python. Using Boto3, I called the s3.get_object(<bucket_name>, <key>) function, which returns a dictionary that includes a "Body": StreamingBody() key-value pair that apparently contains the data I want.
In my Python file I've added import csv. In the examples I see online for reading a CSV file, you pass the file name, such as:

with open(<csv_file_name>, mode='r') as file:
    reader = csv.reader(file)

However, I'm not sure how to retrieve the CSV file name from the StreamingBody, if that's even possible. If not, is there a better way for me to read the CSV file in Python? Thanks!
Edit: Wanted to add that I'm doing this in AWS Lambda and there are documented issues with using pandas in Lambda, so this is why I wanted to use the csv library and not pandas.
csv.reader does not require a file. It can use anything that iterates through lines, including files and lists.
So you don't need a filename. Just pass the lines from response['Body'] directly into the reader. One way to do that is
lines = response['Body'].read().decode('utf-8').splitlines(True)  # Body gives bytes, so decode first
reader = csv.reader(lines)
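In context, a minimal Lambda-style sketch of that idea; the bucket and key here are placeholders, and in practice they would likely come from the event:

import csv
import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # placeholder bucket/key; real handlers often read these from the event
    response = s3.get_object(Bucket='my-bucket', Key='data.csv')
    lines = response['Body'].read().decode('utf-8').splitlines(True)
    for row in csv.reader(lines):
        print(row)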
To retrieve and read a CSV file from an S3 bucket, you can use the following code:

import csv
import boto3
from django.conf import settings

bucket_name = "your-bucket-name"
file_name = "your-file-name-exists-in-that-bucket.csv"

s3 = boto3.resource('s3', aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
                    aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY)
bucket = s3.Bucket(bucket_name)
obj = bucket.Object(key=file_name)
response = obj.get()

lines = response['Body'].read().decode('utf-8').splitlines(True)
reader = csv.DictReader(lines)
for row in reader:
    # csv_header_key1/2 are the header names defined in your CSV's header row
    print(row['csv_header_key1'], row['csv_header_key2'])
This problem may be tricky.
I want to create a CSV file from a list in Python; this CSV file does not exist beforehand. I then want to export it to some local directory, where no such file exists either. We just create a new CSV file and put it in some local directory.
I found that StringIO.StringIO can generate the CSV content from a list in Python, but what are the next steps?
Thank you.
And I found the following code can do it:
import os
import os.path
import StringIO  # Python 2; on Python 3 use io.StringIO
import csv

out_dir = r"C:\Python27"
if not os.path.exists(out_dir):
    os.mkdir(out_dir)

my_list = [[1, 2, 3], [4, 5, 6]]

with open(os.path.join(out_dir, "filename" + '.csv'), "w") as f:
    csvfile = StringIO.StringIO()
    csvwriter = csv.writer(csvfile)
    for l in my_list:
        csvwriter.writerow(l)
    # write the buffered CSV text out in one go instead of character by character
    f.write(csvfile.getvalue())
Did you read the docs?
https://docs.python.org/2/library/csv.html
Lots of examples on that page of how to read / write CSV files.
One of them:
import csv

with open('some.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(someiterable)
import csv

with open('/path/to/location', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(youriterable)
https://docs.python.org/2/library/csv.html#examples
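Those examples target Python 2 (hence 'wb' and the /2/ docs links). A minimal sketch of the same idea on Python 3, where the csv module wants text mode with newline='':

import csv

my_list = [[1, 2, 3], [4, 5, 6]]

# Python 3: open in text mode and pass newline='' so the csv module
# controls line endings itself
with open('filename.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(my_list)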