I have a JSON file containing a list of 9000 dictionaries.
Sample:
[{"auftrags_id":348667,"vertrags_id":11699,"ursprungsauftrag":"",
"umsatz":0.28,"brutto":0.33,"vertrauensschadenhaftpflicht":"",
"stornoreserve":"","umsatzsteuer_betrag":"0.05","netto":0.28,
"steuerpflichtig":"0","art_der_rechnung":"Rechnung","vp_nummer":538},
{"auftrags_id":348668,"vertrags_id":11699,"ursprungsauftrag":"",
"umsatz":0.28,"brutto":0.33,"vertrauensschadenhaftpflicht":"",
"stornoreserve":"","umsatzsteuer_betrag":"0.05","netto":0.28,
"steuerpflichtig":"0","art_der_rechnung":"Rechnung","vp_nummer":538},
{"auftrags_id":349210,"vertrags_id":24894,"ursprungsauftrag":"X",
"umsatz":0.87,"brutto":1.04,"vertrauensschadenhaftpflicht":"X",
"stornoreserve":"X","umsatzsteuer_betrag":"0.17","netto":0.87,
"steuerpflichtig":"0","art_der_rechnung":"Rechnung","vp_nummer":538}]
To upload the JSON file to PostgreSQL, I need to replace the X with a value accepted as a float. I think replacing the value 'X' with '0.0001' in every dictionary would do it. Then there are values with no content, "". I don't know how to handle them; maybe I should also replace them with '0.0001', just for the purpose of uploading.
Desired output:
[{"auftrags_id":348667,"vertrags_id":11699,"ursprungsauftrag":0.0001,
"umsatz":0.28,"brutto":0.33,"vertrauensschadenhaftpflicht":0.0001,
"stornoreserve":0.0001,"umsatzsteuer_betrag":0.05,"netto":0.28,
"steuerpflichtig":0.0001,"art_der_rechnung":"Rechnung","vp_nummer":538},
{"auftrags_id":348668,"vertrags_id":11699,"ursprungsauftrag":0.0001,
"umsatz":0.28,"brutto":0.33,"vertrauensschadenhaftpflicht":0.0001,
"stornoreserve":0.0001,"umsatzsteuer_betrag":0.05,"netto":0.28,
"steuerpflichtig":0.0001,"art_der_rechnung":"Rechnung","vp_nummer":538},
{"auftrags_id":349210,"vertrags_id":24894,"ursprungsauftrag":0.0001,
"umsatz":0.87,"brutto":1.04,"vertrauensschadenhaftpflicht":0.0001,
"stornoreserve":0.0001,"umsatzsteuer_betrag":0.17,"netto":0.87,
"steuerpflichtig":"0","art_der_rechnung":"Rechnung","vp_nummer":538}]
I already have code to upload the file, but I need to clean the JSON file for PostgreSQL to accept it. I appreciate any help!
You could use sed, as suggested in the comment under your question; it is a stream-editing command for the Linux command line (shell).
The Python solution:
#!/usr/bin/python3
import json  # load the builtin JSON module

JSON_FILE_NAME = "dictionaries.json"  # the name of your file with those dictionaries
RESULT_FILE_NAME = "result.json"  # the name of the file that will be created

# Load the file contents into the variable dictionaries
with open(JSON_FILE_NAME, "r", encoding="utf8") as file:
    dictionaries = json.load(file)

result = []
for dictionary in dictionaries:  # loop over the dictionaries
    for key, value in dictionary.items():  # loop over the key and value pairs in the dictionary
        if value in ("", "X"):
            # if the value is an empty string or "X", change it to 0.0001
            dictionary[key] = 0.0001
    # append the dictionary to the result list
    result.append(dictionary)

# save the result to a file
with open(RESULT_FILE_NAME, "w", encoding="utf8") as file:
    json.dump(result, file)
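Note that in the desired output the quoted numbers such as "0.05" also become real floats, which the loop above does not do. If the PostgreSQL column types need that as well, here is a sketch that additionally coerces numeric strings (an extension of the code above, not part of the original answer):

def clean_value(value):
    # placeholders become the float stand-in
    if value in ("", "X"):
        return 0.0001
    # numeric strings such as "0.05" become floats
    if isinstance(value, str):
        try:
            return float(value)
        except ValueError:
            pass  # non-numeric strings such as "Rechnung" stay as they are
    return value

result = [{key: clean_value(value) for key, value in dictionary.items()}
          for dictionary in dictionaries]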
Related
I am trying to extract some data from JSON files, which all have the same structure, and then write the chosen data into a new JSON file. My goal is to create a new JSON file which is more or less a list of each JSON file in my folder, with the data:
Filename, triggerdata, velocity {imgVel, trigVel}, coordinates.
In a further step of my programme, I will need this new splitTest1 for analysing the data of the different files.
I have the following code:
base_dir = 'mypath'

def createJsonFile():
    splitTest1 = {}
    splitTest1['20mm PSL'] = []
    for file in os.listdir(base_dir):
        # If file is a json, construct its full path and open it, append all json data to list
        if 'json' in file:
            json_path = os.path.join(base_dir, file)
            json_data = pd.read_json(json_path, lines=True)
            if splitTest1[file]['20mm PSL'] == to_find:
                splitTest1['20mm PSL'].append({
                    'filename': os.path.basename(base_dir),
                    'triggerdata': ['rawData']['adcDump']['0B'],
                    'velocity': {
                        'imgVel': ['computedData']['particleProperties']['imgVelocity'],
                        'trigVel': ['computedData']['img0Properties']['coordinates']},
                    'coordinates': ['computedData']['img1Properties']['coordinates']})
    print(len(splitTest1))
When I run the code, I get this error:
'triggerdata': ['rawData']['adcDump']['0B'],
TypeError: list indices must be integers or slices, not str
What is wrong with the code? How do I fix this?
This is my previous code, showing how I accessed that data without saving it in another JSON file:
with open('myJsonFile.json') as f0:
    d0 = json.load(f0)

y00B = d0['rawData']['adcDump']['0B']
x = np.arange(0, (2048 * 0.004), 0.004)  # in ms, 2048 Samples, 4us

def getData():
    return y00B, x

def getVel():
    imgV = d0['computedData']['particleProperties']['imgVelocity']
    trigV = d0['computedData']['trigger']['trigVelocity']
    return imgV, trigV
Basically, I am trying to put this last code snippet into a loop that reads all the JSON files in my folder and makes a new JSON file with a list of the names of these files and some other chosen data (like ['rawData']['adcDump']['0B'], etc.).
I hope this helps to make my problem clearer.
I assume what you want to do is to take some data from several json files, compile it into a list, and write that into a new json file.
In order to get the data from your current json file, you'll need to add a "reference" to it in front of the indices (otherwise the code has no idea where it should take that data from). Like so:
base_dir = 'mypath'

def createJsonFile():
    splitTest1 = {}
    splitTest1['20mm PSL'] = []
    for file in os.listdir(base_dir):
        # If file is a json, construct its full path and open it, append all json data to list
        if 'json' in file:
            json_path = os.path.join(base_dir, file)
            json_data = pd.read_json(json_path, lines=True)
            if splitTest1[file]['20mm PSL'] == to_find:
                splitTest1['20mm PSL'].append({
                    'filename': os.path.basename(base_dir),
                    'triggerdata': json_data['rawData']['adcDump']['0B'],
                    'velocity': {
                        'imgVel': json_data['computedData']['particleProperties']['imgVelocity'],
                        'trigVel': json_data['computedData']['img0Properties']['coordinates']},
                    'coordinates': json_data['computedData']['img1Properties']['coordinates']})
    print(len(splitTest1))
So basically, what you need to do is add "json_data" in front of the indices.
Also, I suggest writing the variable "json_path" rather than "base_dir" into the 'filename' field.
I found the solution with the help of the post from Mattu475.
I had to add the reference in front of the indices and also change how the files found in my folder are opened, with the following code:
with open(json_path) as f0:
    json_data = json.load(f0)
instead of pd.read_json(...)
Here is the full code:
def createJsonFile():
    splitTest1 = {}
    splitTest1['20mm PSL'] = []
    for file in os.listdir(base_dir):
        # If file is a json, construct its full path and open it, append all json data to list
        if 'json' in file:
            print("filename: ", file)  # file is only the file name, the path not included
            json_path = os.path.join(base_dir, file)
            print("path : ", json_path)
            with open(json_path) as f0:
                json_data = json.load(f0)
            splitTest1['20mm PSL'].append({
                'filename': os.path.basename(json_path),
                'triggerdata': json_data['rawData']['adcDump']['0B'],
                #'imgVel': json_data['computedData']['particleProperties']['imgVelocity'],
                'trigVel': json_data['computedData']['trigger']['trigVelocity'],
                #'coordinatesImg0': json_data['computedData']['img0Properties']['coordinates'],
                #'coordinatesImg1': json_data['computedData']['img1Properties']['coordinates']
            })
    return splitTest1
A few lines (the ones commented out) do not work 100% yet, but the rest works; a guarded-lookup idea for those is sketched below.
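If those commented-out lookups fail because some files are missing the nested keys (an assumption on my part, since the error isn't shown), a guarded version using dict.get would let the loop continue:

computed = json_data.get('computedData', {})
img_vel = computed.get('particleProperties', {}).get('imgVelocity')   # None if absent
coords_img0 = computed.get('img0Properties', {}).get('coordinates')   # None if absent
coords_img1 = computed.get('img1Properties', {}).get('coordinates')   # None if absent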
Thank you for your help!
The issue is with this line
'imgVel': ['computedData']['particleProperties']['imgVelocity'],
And the two lines that come after it. What's happening there is that you're creating a list with the string 'computedData' as its only element, and then trying to look it up with the index 'particleProperties', which doesn't make sense: you can only index a list with integers. I can't really give you a "solution", but if you want imgVel to just be a list of those strings, then you would do
'imgVel': ['computedData', 'particleProperties', 'imgVelocity']
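For illustration, here is the original failing expression reproduced in isolation:

>>> ['rawData']['adcDump']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers or slices, not str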
Your dict value isn't legal Python.
'triggerdata': ['rawData']['adcDump']['0B']
The value doesn't make any sense: you make a list of a single string, then you try to index it with another string. You asked for element "adcDump" of the list ['rawData'], and a list cannot be indexed with a string.
You cannot store arbitrary source code (your partial expression) as if it were a data value.
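If the intent was to keep the key path as data and apply it to a loaded dictionary later, one common pattern is functools.reduce with operator.getitem; a sketch, assuming json_data is the already-loaded dictionary:

from functools import reduce
from operator import getitem

path = ['rawData', 'adcDump', '0B']
triggerdata = reduce(getitem, path, json_data)  # same as json_data['rawData']['adcDump']['0B']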
If you want help constructing a particular reference, then please post a focused question. Please review How to Ask from the intro tour.
I have the following txt file (I've saved each line as a dictionary):
"{'03/01/20': ['luiana','macarena']}\n"
"{'03/01/21': ['juana','roberta','mariana']}\n"
"{'03/01/24': ['pedro','jose','mario','luis']}\n"
"{'03/01/24': ['pedro','jose','mario','luis']}\n"
"{'03/01/22': ['emanuel']}\n"
The problem is that I want to open it as a dictionary, but I don't know how to do it. I've tried:
f = open ('usuarios.txt','r')
lines=f.readlines()
whip=eval(str(lines))
but it's not working... My idea is, for example, to take just the dictionaries that have the next day, 03/01/24, as their key.
If you want to have only one dict with all the saved dictionaries, you can use:
import ast

my_dict = {}
with open('your_file.txt', 'r') as fp:
    for line in fp.readlines():
        new_dict = ast.literal_eval(line)
        for key, value in new_dict.items():
            if key in my_dict:
                my_dict[key].extend(value)
            else:
                my_dict[key] = value
print(my_dict)
output:
{'03/01/20': ['luiana', 'macarena'], '03/01/21': ['juana', 'roberta', 'mariana'], '03/01/24': ['pedro', 'jose', 'mario', 'luis', 'pedro', 'jose', 'mario', 'luis'], '03/01/22': ['emanuel']}
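From there, pulling out the entries for one date, as the question asks for 03/01/24, is a plain lookup:

next_day = my_dict.get('03/01/24', [])  # [] if the date is missing
print(next_day)
# ['pedro', 'jose', 'mario', 'luis', 'pedro', 'jose', 'mario', 'luis']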
If you could change the format you are saving the strings in, from
"{'03/01/20': ['luiana','macarena']}\n"
to
'{"03/01/20": ["luiana","macarena"]}\n'
Then you could just do the following:
import json
line = '{"03/01/20": ["luiana","macarena"]}\n'
d = json.loads(line)
The result would be a dictionary d with dates as keys:
{'03/01/20': ['luiana', 'macarena']}
Then, it would be just a matter of looping over the lines of your file and adding them to your dictionary, as sketched below.
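A minimal sketch of that loop, assuming every line of usuarios.txt has been rewritten in the double-quoted JSON form shown above:

import json

my_dict = {}
with open('usuarios.txt', 'r') as fp:
    for line in fp:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        new_dict = json.loads(line)
        for key, value in new_dict.items():
            my_dict.setdefault(key, []).extend(value)
print(my_dict)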
An alternative approach would be to use pickle to save your dictionary instead of the .txt, then use it to load the data from disk.
I have just started to learn Python, and I have the task of converting a JSON file to a CSV file with a semicolon as the delimiter and with three constraints.
My JSON is:
{"_id": "5cfffc2dd866fc32fcfe9fcc",
"tuple5": ["system1/folder", "system3/folder"],
"tuple4": ["system1/folder/text3.txt", "system2/folder/text3.txt"],
"tuple3": ["system2/folder/text2.txt"],
"tuple2": ["system2/folder"],
"tuple1": ["system1/folder/text1.txt", "system2/folder/text1.txt"],
"tupleSize": 3}
The output CSV should be in a form:
system1 ; system2 ; system3
system1/folder ; ~ ; system3/folder
system1/folder/text3.txt ; system2/folder/text3.txt ; ~
~ ; system2/folder/text2.txt ; ~
~ ; system2/folder ; ~
system1/folder/text1.txt ; system2/folder/text1.txt ; ~
So the three constraints are: the tupleSize indicates the number of rows; the first part of the array elements, i.e., system1, system2 and system3, provides the row headers; and only those elements belonging to a particular system get values in the CSV file (the rest is ~).
I found a few posts regarding the conversion in Python, but none of them had any constraints related to these, and I am unable to figure out how to approach this.
Can someone help?
EDIT: I should mention that the array elements are dynamic and thus the row headers may vary in the CSV file.
What you want to do is fairly substantial, so if it's just a Python learning exercise, I suggest you begin with more elementary tasks.
I also think you've got what most folks call rows and columns reversed — so be warned that everything below, including the code, is using them in the opposite sense to the way you used them in your question.
Anyway, the code below first preprocesses the data to determine what the columns or fieldnames of the CSV file are going to be and to make sure there are the right number of them as specified by the 'tupleSize' key.
Assuming that constraint is met, it then iterates through the data a second time and extracts the column/field values from each key's value, putting them into a dictionary whose contents represent a row to be written to the output file, and then writes everything out when finished.
Updated
Modified to remove all keys that start with "_id" in the JSON object dictionary.
import csv
import json
import re
SEP = '/' # Value sub-component separator.
id_regex = re.compile(r"_id\d*")
json_string = '''
{"_id1": "5cfffc2dd866fc32fcfe9fc1",
"_id2": "5cfffc2dd866fc32fcfe9fc2",
"_id3": "5cfffc2dd866fc32fcfe9fc3",
"tuple5": ["system1/folder", "system3/folder"],
"tuple4": ["system1/folder/text3.txt", "system2/folder/text3.txt"],
"tuple3": ["system2/folder/text2.txt"],
"tuple2": ["system2/folder"],
"tuple1": ["system1/folder/text1.txt", "system2/folder/text1.txt"],
"tupleSize": 3}
'''
data = json.loads(json_string) # Convert JSON string into a dictionary.
# Remove non-path items from dictionary.
tupleSize = data.pop('tupleSize')
_ids = {key: data.pop(key)
        for key in tuple(data.keys()) if id_regex.search(key)}
#print(f'_ids: {_ids}')
max_columns = int(tupleSize)  # Used to check a constraint.
# Determine how many columns are present and what they are.
columns = set()
for key in data:
    paths = data[key]
    if not paths:
        raise RuntimeError('key with no paths')
    for path in paths:
        comps = path.split(SEP)
        if len(comps) < 2:
            raise RuntimeError('component with no subcomponents')
        columns.add(comps[0])
if len(columns) > max_columns:
    raise RuntimeError('too many columns - conversion aborted')
# Create CSV file.
with open('converted_json.csv', 'w', newline='') as file:
    writer = csv.DictWriter(file, delimiter=';', restval='~',
                            fieldnames=sorted(columns))
    writer.writeheader()
    for key in data:
        row = {}
        for path in data[key]:
            column, *_ = path.split(SEP, maxsplit=1)
            row[column] = path
        writer.writerow(row)
print('Conversion complete')
I have a text file in this format:
key:object,
key2:object2,
key3:object3
How can I convert this into a dictionary in Python for the following process?
Open it
Check if string s = any key in the dictionary
If it is, then string s = the object linked to the aforementioned key.
If not, nothing happens
File closes.
I've tried the following code for splitting them at the commas, but the output was incorrect. It turned the whole key:object combination from the text file into both a key and an object, effectively duplicating it:
Code:
file = open("foo.txt","r")
dict = {}
for line in file:
x = line.split(",")
a = x[0]
b = x[0]
dict[a] = b
Incorrect output:
key:object, key:object
key2:object2, key2:object2
key3:object3, key3:object3
Thank you
m = {}
with open("foo.txt") as file:
    for line in file:
        x = line.strip().replace(",", "")  # remove the trailing newline and comma if present
        y = x.split(':')  # split key and value
        m[y[0]] = y[1]
# -*- coding:utf-8 -*-
key_dict = {"key": '', "key5": '', "key10": ''}
File = open('/home/wangxinshuo/KeyAndObject', 'r')
List = File.readlines()
File.close()
key = []
for i in range(0, len(List)):
    for j in range(0, len(List[i])):
        if List[i][j] == ':':
            if List[i][0:j] in key_dict:
                for final_num, final_result in enumerate(List[i][j:].split(',')):
                    if final_result != '\n':
                        key_dict["%s" % List[i][0:j]] = final_result
print(key_dict)
I am using your file in "/home/wangxinshuo/KeyAndObject"
You can convert the content of your file to a dictionary with a one-liner similar to the one below:
result = {k:v for k,v in [line.strip().replace(",","").split(":") for line in f if line.strip()]}
In case you want the dictionary values to be stripped, just add v.strip()
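With the dictionary built, the check-and-replace process described in the question (steps 2 to 4) reduces to a single lookup; a minimal sketch using the result dictionary from the one-liner above:

s = "key2"
s = result.get(s, s)  # the mapped object if s is a key, otherwise s stays unchanged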
I have 2 Python files. file1.py has only 1 dictionary, and I would like to read from and write to that dictionary from file2.py. Both files are in the same directory.
I'm able to read from it using import file1, but how do I write to that file?
Snippet:
file1.py (nothing additional in file1, apart from following data)
dict1 = {
    'a' : 1, # value is integer
    'b' : '5xy', # value is string
    'c' : '10xy',
    'd' : '1xy',
    'e' : 10,
}
file2.py
import file1
import json
print file1.dict1['a'] #this works fine
print file1.dict1['b']
# Now I want to update the value of a & b, something like this:
dict2 = json.loads(data)
file1.dict1['a'] = dict2['some_int'] #int value
file1.dict1['b'] = dict2['some_str'] #string value
The main reason why I'm using a dictionary and not a text file is that the new values to be updated come from JSON data, and converting that to a dictionary is simpler, saving me from string parsing each time I want to update dict1.
The problem is: when I update the values from dict2, I want those values to be written to dict1 in file1.
Also, the code runs on a Raspberry Pi, and I SSH into it from an Ubuntu machine.
Can someone please help me with how to do this?
EDIT:
file1.py could be saved in any other format, like .json or .txt. It was just my assumption that saving the data as a dictionary in a separate file would allow easy updates.
file1.py has to be a separate file; it is a configuration file, so I don't want to merge it into my main file.
The data for dict2 mentioned above comes from a socket connection, at
dict2 = json.loads(data)
I want to update file1 with the data that comes from the socket connection.
If you are attempting to print the dictionary back to the file, you could use something like...
outFile = open("file1.py","w")
outFile.writeline("dict1 = " % (str(dict2)))
outFile.close()
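One caveat, in case the running program also needs to see the new values: Python caches imported modules, so after rewriting file1.py the module has to be reloaded. A sketch:

import importlib
import file1

importlib.reload(file1)  # re-executes file1.py so file1.dict1 reflects the rewritten values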
You might be better off having a json file, then loading the object from it and writing the object back to that file. You could then manipulate the json object in memory and serialize it simply.
I think you want to save the data from file1 into a separate .json file, then read the .json file in your second file. Here is what you can do:
file1.py
import json
dict1 = {
    'a' : 1, # value is integer
    'b' : '5xy', # value is string
    'c' : '10xy',
    'd' : '1xy',
    'e' : 10,
}

with open("filepath.json", "w+") as f:
    json.dump(dict1, f)
This will dump the dictionary dict1 into a json file which is stored at filepath.json.
Then, in your second file:
file2.py
import json

with open("filepath.json") as f:
    dict1 = json.load(f)

# dict1 is now:
# {
#     'a' : 1, # value is integer
#     'b' : '5xy', # value is string
#     'c' : '10xy',
#     'd' : '1xy',
#     'e' : 10,
# }

dict1['a'] = dict2['some_int'] #int value
dict1['b'] = dict2['some_str'] #string value
Note: This will not change the values in your first file. However, if you need to access the changed values, you can dump your data into another json file, then load that json file again whenever you need the data.
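For the round trip the note describes, a small sketch that writes the updated dictionary back out and reloads it later (using the same filepath.json as above):

with open("filepath.json", "w") as f:
    json.dump(dict1, f)

# later, or from another script:
with open("filepath.json") as f:
    dict1 = json.load(f)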
You should use the pickle library to save and load the dictionary: https://wiki.python.org/moin/UsingPickle
Here is the basic usage of pickle:
# Save a dictionary into a pickle file.
import pickle

favorite_color = { "lion": "yellow", "kitty": "red" }

pickle.dump( favorite_color, open( "save.p", "wb" ) )

# Load the dictionary back from the pickle file.
import pickle

favorite_color = pickle.load( open( "save.p", "rb" ) )
# favorite_color is now { "lion": "yellow", "kitty": "red" }
Finally, as Zaren suggested, I used a json file instead of a dictionary in a Python file.
Here's what I did:
I changed file1.py to file1.json and stored the data with appropriate formatting.
From file2.py, I opened file1.json when needed, instead of using import file1, and used json.dump and json.load on file1.json.