I am new to Python so please bear with me. I am trying to figure out how to loop through a set of values in a YAML file. The file is parsed using PyYAML and then would need to be fed into the loop. Here is some YAML for example:
dohicky.yml
---
#Example file
dohicky:
  "1":
    Stuff:
      - Data
      - Data
    Morestuff:
      - Data
      - Data
  "2":
    Stuff:
      - Data
      - Data
    Morestuff:
      - Data
      - Data
  "n":
    - Etc
First, I am pulling the contents of the YAML out.
import yaml
f = open('dohicky.yml')
dohicky = yaml.safe_load(f)
f.close()
Now I just need either a for loop or a while loop to iterate through each numbered "id" under "dohicky".
for x in xrange(1, 3):
So obviously this would work, but it is statically defined for only 2 elements. I am not sure how to do something like:
"do while" id = dohicky["dohicky"]["x"] is true. #Not code, just concept!
The other problem I am immediately running into is how to then create an object inside this loop. For example:
id(x) = dohicky(Pass other info from YAML to class) #Not code, just concept!
Unfortunately I am not familiar enough with Python (or PyYAML) yet to understand the syntax. Any help is MUCH appreciated!
* UPDATE *
This is kind of pseudo-code, but you should at least understand what I am trying to do.
import yaml
f = open('dohicky.yml')
dohicky = yaml.safe_load(f)
f.close()
for x in dohicky["dohicky"]["x"]:
    test = dohicky["dohicky"]["1"]["Stuff"]
    print test
In this test, I am just printing the output of "Stuff", but in reality, I need to create an object using that data.
You can use the following code to iterate through each numbered "id" under "dohicky":
for dohicky_id in dohicky['dohicky']:
    stuff = dohicky['dohicky'][dohicky_id]['Stuff']
In this case, stuff is a list of Data entries. If you can't, for whatever reason, work with the dictionary directly and you want to convert the entries into objects, take a look at the following question: Convert Python dict to object?
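For example, here is a minimal sketch of building objects inside that loop, assuming a hypothetical Dohicky class that just stores the two lists (the class name and attributes are made up for illustration):
class Dohicky(object):
    def __init__(self, stuff, morestuff):
        self.stuff = stuff
        self.morestuff = morestuff

dohickies = {}
for dohicky_id, entry in dohicky['dohicky'].items():
    # the "n" entry in the example YAML is a plain list, so skip anything
    # that isn't a mapping of Stuff/Morestuff
    if not isinstance(entry, dict):
        continue
    dohickies[dohicky_id] = Dohicky(entry.get('Stuff', []),
                                    entry.get('Morestuff', []))

print(dohickies['1'].stuff)  # ['Data', 'Data']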
Related
I am running a script which takes, say, an hour to generate the data I want. I want to be able to save all of the relevant variables to some external file so I can fiddle with them later without having to run the hour-long calculation over again. Is there an easy way I can save all of the variables I need into one convenient file?
In Matlab I would just contain all of the results of the calculation in a single structure so that later I could just load results.mat and I would have everything I need stored as results.output1, results.output2 or whatever. What is the Python equivalent of this?
In particular, the data that I would like to save includes arrays of complex numbers, which seems to present difficulties for using things like json.
I suggest taking a look at the built-in shelve module, which provides a persistent, dictionary-like object and generally works with all native Python types, so you can do the following:
Write a complex number to a file (in my example it is named mydata) under the key n (keep in mind that keys must be strings):
import shelve
my_number = 2+7j
with shelve.open('mydata') as db:
    db['n'] = my_number
Later, retrieve that number from the same file:
import shelve
with shelve.open('mydata') as db:
    my_number = db['n']
print(my_number) # (2+7j)
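Since the question mentions arrays of complex numbers: shelve pickles whatever you store, so (for example, assuming numpy is installed) a complex array round-trips without any special handling:
import shelve
import numpy as np

results = np.array([1+2j, 3-4j, 0.5+0.5j])   # hypothetical results array

with shelve.open('results') as db:
    db['output1'] = results                  # stored via pickle under the hood

with shelve.open('results') as db:
    print(db['output1'])                     # the same complex array comes back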
You can use the pickle module in Python: the dump function writes all your data into a file, and you can load it back later with load. I suggest you read more about pickle.
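For instance, here is a minimal sketch along those lines (the file name and the contents of results are just placeholders):
import pickle

# Hypothetical results from the hour-long calculation.
results = {'output1': [1+2j, 3-4j], 'output2': 42}

# Dump everything to one file (binary mode is required for pickle).
with open('results.pkl', 'wb') as f:
    pickle.dump(results, f)

# Later, load it back without re-running the calculation.
with open('results.pkl', 'rb') as f:
    results = pickle.load(f)

print(results['output1'])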
I would recommend a JSON file. With JSON you can map values to keys, just like dictionaries in stock Python. The json module is part of the standard library, so there is nothing extra to install.
import json
data = {"var1": "abcde", "var2": "fghij"}
with open(path, "w") as file:
    json.dump(data, file, indent=2, ensure_ascii=False)
You can also load this from a file using the same api:
with open(path, "r") as file:
    text = file.read()
data = json.loads(text)
Edit: JSON handles the basic Python data types (dicts, lists, strings, numbers, booleans and None), so if you want to save a list you can just put it in the dict:
data = {"list1": ["ab", "cd", "ef"]}
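Complex numbers, however, have no JSON representation, which is the difficulty mentioned in the question. A minimal sketch of one workaround is to store each complex value as a [real, imag] pair (the file name and key below are just placeholders):
import json

# Hypothetical array of complex results.
values = [1+2j, 3-4j]

# Encode each complex number as a [real, imag] pair.
with open('results.json', 'w') as f:
    json.dump({'output1': [[z.real, z.imag] for z in values]}, f)

# Rebuild the complex numbers when loading.
with open('results.json', 'r') as f:
    loaded = json.load(f)
values = [complex(re, im) for re, im in loaded['output1']]
print(values)  # [(1+2j), (3-4j)]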
Assuming I have a configuration txt file with this content:
{"Mode":"Classic","Encoding":"UTF-8","Colors":3,"Blue":80,"Red":90,"Green":160,"Shortcuts":[],"protocol":"2.1"}
How can I change a specific value like "Red":90 to "Red":110 in the file without changing its original format?
I have tried with configparser and configobj, but as they are designed for .INI files I couldn't figure out how to make them work with this custom config file. I also tried splitting the lines and searching for the keywords whose values I wanted to change, but I couldn't save the file the same way it was before. Any ideas how to solve this? (I'm very new to Python)
this looks like json so you could:
import json
obj = json.load(open("/path/to/jsonfile","r"))
obj["Blue"] = 10
json.dump(obj,open("/path/to/mynewfile","w"))
but be aware that a JSON dict does not have a guaranteed order.
So the order of the elements is not guaranteed (and normally it's not needed); JSON lists do have an order, though.
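If keeping the compact one-line layout of the original file matters, a small variation on the above may work (a sketch, assuming Python 3.7+, where the loaded dict also keeps its key order):
import json

with open("config.txt", "r") as f:
    obj = json.load(f)

obj["Red"] = 110

# separators=(',', ':') reproduces the compact style with no spaces,
# and writing the same dict back preserves the original key order.
with open("config.txt", "w") as f:
    json.dump(obj, f, separators=(',', ':'))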
Here's how you can do it:
import json
d = {} # store your data here
with open('config.txt','r') as f:
    d = json.loads(f.readline())

d['Red'] = 14
d['Green'] = 15
d['Blue'] = 20

result = "{\"Mode\":\"%s\",\"Encoding\":\"%s\",\"Colors\":%s,\
\"Blue\":%s,\"Red\":%s,\"Green\":%s,\"Shortcuts\":%s,\
\"protocol\":\"%s\"}" % (d['Mode'], d['Encoding'], d['Colors'],
                         d['Blue'], d['Red'], d['Green'],
                         d['Shortcuts'], d['protocol'])

with open('config.txt','w') as f:
    f.write(result)

print result
The game is about tamaguchis, and I want the tamaguchi to remember its last size and its 3 last actions the next time it plays. I also want the date to matter, like if you don't play with it for a week it shrinks in size. So the first step, I thought, was to save all the relevant data to a text file, and each time the game starts the code searches through the text file and extracts the relevant data again! But I can't even get step 1 working :( I mean, I don't get why this doesn't work??
file = open("Tamaguchis.txt","w")
date = time.strftime("%c")
dictionary = {"size":tamaguchin.size,"date":date,"order":lista}
file.write(dictionary)
It says that it can't write dictionaries, only strings, to a text file. But that's not correct, is it? I thought you were supposed to be able to put dictionaries in text files? :o
If anyone also has an idea on how to calculate the difference between the current date and the date saved in the text file, that'd be much appreciated :)
Sorry if this is a noob question, and thanks a lot!
If your dictionary consists only of simple Python objects, you can use the json module to serialize it and write it into a file.
import json
with open("Tamaguchis.txt","w") as file:
    date = time.strftime("%c")
    dictionary = {"size":tamaguchin.size,"date":date,"order":lista}
    file.write(json.dumps(dictionary))
The same can then be read back with loads.
import json
with open("Tamaguchis.txt","r") as file:
    dictionary = json.loads(file.read())
If your dictionary may contain more complex objects, you can either define a JSON serializer for them, or use the pickle module. Note that the latter might make it possible to invoke arbitrary code if not used properly.
You need to convert the dict to a string:
file.write(str(dictionary))
... though you might want to use pickle, json or yaml for the task - reading back is easier/safer then.
Oh, and for date and time calculations you might want to check out the datetime module and its timedelta class.
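For example, here is a small sketch of that date arithmetic, assuming the date was saved with the "%c" format used above and the default locale (the saved string below is just a placeholder):
from datetime import datetime

saved = "Mon Jan  1 12:00:00 2024"        # hypothetical value read back from the file
then = datetime.strptime(saved, "%c")     # parse the saved "%c" string
elapsed = datetime.now() - then           # a timedelta object

if elapsed.days >= 7:
    print("More than a week since last play - shrink the tamaguchi")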
import pickle
a = {'a':1, 'b':2}
with open('temp.txt', 'wb') as writer:
    data = pickle.dumps(a)
    writer.write(data)

with open('temp.txt', 'rb') as reader:
    data2 = pickle.loads(reader.read())
    print data2
    print type(data2)
Output:
{'a': 1, 'b': 2}
<type 'dict'>
If you care about efficiency, ujson or cPickle is faster.
I have some JSON files of 500MB each.
If I use the "trivial" json.load() to load their content all at once, it will consume a lot of memory.
Is there a way to read the file partially? If it were a plain text, line-delimited file, I would be able to iterate over the lines. I am looking for an analogy to that.
There was a duplicate to this question that had a better answer. See https://stackoverflow.com/a/10382359/1623645, which suggests ijson.
Update:
I tried it out, and ijson is to JSON what SAX is to XML. For instance, you can do this:
import ijson
for prefix, the_type, value in ijson.parse(open(json_file_name)):
    print prefix, the_type, value
where prefix is a dot-separated index in the JSON tree (what happens if your key names have dots in them? I guess that would be bad for Javascript, too...), the_type describes a SAX-like event, one of 'null', 'boolean', 'number', 'string', 'map_key', 'start_map', 'end_map', 'start_array', 'end_array', and value is the value of the object, or None if the_type is an event like starting/ending a map/array.
The project has some docstrings, but not enough global documentation. I had to dig into ijson/common.py to find what I was looking for.
So the problem is not that each file is too big, but that there are too many of them, and they seem to be adding up in memory. Python's garbage collector should be fine, unless you are keeping around references you don't need. It's hard to tell exactly what's happening without any further information, but some things you can try:
Modularize your code. Do something like:
for json_file in list_of_files:
    process_file(json_file)
If you write process_file() in such a way that it doesn't rely on any global state, and doesn't change any global state, the garbage collector should be able to do its job.
Deal with each file in a separate process. Instead of parsing all the JSON files at once, write a program that parses just one, and pass each one in from a shell script, or from another Python process that calls your script via subprocess.Popen (see the sketch below). This is a little less elegant, but if nothing else works, it will ensure that you're not holding on to stale data from one file to the next.
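A rough sketch of that second approach, assuming a hypothetical process_one.py script that parses a single file:
import subprocess
import sys

json_files = ["a.json", "b.json", "c.json"]   # hypothetical list of files

for json_file in json_files:
    # Each file is parsed in a fresh interpreter, so all of its memory is
    # released when the child process exits.
    subprocess.check_call([sys.executable, "process_one.py", json_file])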
Hope this helps.
Yes.
You can use jsonstreamer, a SAX-like push parser that I have written, which will allow you to parse arbitrarily sized chunks; you can get it here and check out the README for examples. It's fast because it uses the 'C' yajl library.
It can be done by using ijson. How ijson works has been very well explained by Jim Pivarski in the answer above. The code below will read a file and print each JSON object from the list. For example, the file content is as below:
[{"name": "rantidine", "drug": {"type": "tablet", "content_type": "solid"}},
{"name": "nicip", "drug": {"type": "capsule", "content_type": "solid"}}]
You can print every element of the array using the below method
import ijson

def extract_json(filename):
    with open(filename, 'rb') as input_file:
        jsonobj = ijson.items(input_file, 'item')
        jsons = (o for o in jsonobj)
        for j in jsons:
            print(j)
Note: 'item' is the prefix ijson uses for the elements of a top-level array.
If you want to access only specific objects based on a condition, you can do it in the following way.
def extract_tabtype(filename):
    with open(filename, 'rb') as input_file:
        # the sample data above uses the key "drug", so the prefix is 'item.drug'
        objects = ijson.items(input_file, 'item.drug')
        tabtype = (o for o in objects if o['type'] == 'tablet')
        for prop in tabtype:
            print(prop)
This will print only those entries whose drug type is tablet.
On your mention of running out of memory I must question if you're actually managing memory. Are you using the "del" keyword to remove your old object before trying to read a new one? Python should never silently retain something in memory if you remove it.
Update
See the other answers for advice.
Original answer from 2010, now outdated
Short answer: no.
Properly dividing a json file would take intimate knowledge of the json object graph to get right.
However, if you have this knowledge, then you could implement a file-like object that wraps the json file and spits out proper chunks.
For instance, if you know that your json file is a single array of objects, you could create a generator that wraps the json file and returns chunks of the array.
You would have to do some string content parsing to get the chunking of the json file right.
I don't know what generates your JSON content. If possible, I would consider generating a number of manageable files, instead of one huge file.
Another idea is to try loading it into a document-store database like MongoDB.
It deals with large blobs of JSON well. Although you might run into the same problem loading the JSON, you can avoid it by loading the files one at a time.
If that path works for you, then you can interact with the JSON data via their client and potentially not have to hold the entire blob in memory.
http://www.mongodb.org/
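A rough sketch of that idea, assuming pymongo is installed, a local MongoDB instance is running, and the file, database and collection names below are placeholders:
import json
from pymongo import MongoClient

client = MongoClient()                       # connects to localhost:27017 by default
collection = client["reports_db"]["reports"]

# Load and insert one file at a time so only one blob is in memory at once.
for path in ["report1.json", "report2.json"]:
    with open(path) as f:
        doc = json.load(f)
        # insert_one expects a single document; use insert_many for a list.
        collection.insert_one(doc)

# Query through the client instead of holding everything in memory.
print(collection.count_documents({}))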
"the garbage collector should free the memory"
Correct.
Since it doesn't, something else is wrong. Generally, the problem with infinite memory growth is global variables.
Remove all global variables.
Make all module-level code into smaller functions.
In addition to @codeape:
I would try writing a custom JSON parser to help you figure out the structure of the JSON blob you are dealing with. Print out the key names only, etc. Make a hierarchical tree and decide (yourself) how you can chunk it. This way you can do what @codeape suggests - break the file up into smaller chunks, etc.
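For example, here is a small sketch of such an exploration pass using ijson's event stream (rather than a hand-written parser) to print only the key names with their nesting depth; the file name is hypothetical:
import ijson

# Print each key indented by its depth in the JSON tree, so you can see
# the structure and decide where to chunk the file.
with open("huge.json", "rb") as f:
    depth = 0
    for prefix, event, value in ijson.parse(f):
        if event in ("start_map", "start_array"):
            depth += 1
        elif event in ("end_map", "end_array"):
            depth -= 1
        elif event == "map_key":
            print("  " * depth + value)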
You can convert the JSON file to a CSV file and then process it line by line:
import ijson
import csv
def convert_json(file_path):
    did_write_headers = False
    headers = []
    row = []
    iterable_json = ijson.parse(open(file_path, 'r'))
    with open(file_path + '.csv', 'w') as csv_file:
        csv_writer = csv.writer(csv_file, delimiter=',', quotechar='"',
                                quoting=csv.QUOTE_MINIMAL)
        for prefix, event, value in iterable_json:
            if event == 'end_map':
                if not did_write_headers:
                    csv_writer.writerow(headers)
                    did_write_headers = True
                csv_writer.writerow(row)
                row = []
            if event == 'map_key' and not did_write_headers:
                headers.append(value)
            if event == 'string':
                row.append(value)
Simply using json.load() on the whole file will take a lot of time. Instead, you can load the JSON data line by line, build a dictionary of key/value pairs for each line, add it to one final dictionary, and convert that to a pandas DataFrame, which will help you with further analysis.
import json
import pandas as pd

def get_data():
    with open('Your_json_file_name', 'r') as f:
        for line in f:
            yield line

data = get_data()
data_dict = {}
for i, line in enumerate(data):
    each = {}
    # k and v are the key and value pair
    for k, v in json.loads(line).items():
        #print(f'{k}: {v}')
        each[f'{k}'] = f'{v}'
    data_dict[i] = each

# data_dict now holds one dict per line; build a DataFrame from it.
Data = pd.DataFrame(data_dict)
# The DataFrame comes out in transposed form, so finally transpose it back:
Data_1 = Data.T
I'm really new to Python, but I've picked a problem that actually pertains to work and I think as I figure out how to do it I'll learn along the way.
I have a directory full of JSON-formatted files. I've gotten as far as importing everything in the directory into a list, and iterating through the list to do a simple print that verifies I got the data.
I'm trying to figure out how to actually work with a given JSON object in Python. In Javascript, it's as simple as:
var x = {'asd':'bob'}
alert( x.asd ) //alerts 'bob'
Accessing the various properties on an object is simple dot notation. What's the equivalent for Python?
So this is my code that is doing the import. I'd like to know how to work with the individual objects stored in the list.
#! /usr/local/bin/python2.6
import os, json
#define path to reports
reportspath = "reports/"
# Gets all json files and imports them
dir = os.listdir(reportspath)
jsonfiles = []
for fname in dir:
    with open(reportspath + fname,'r') as f:
        jsonfiles.append( json.load(f) )

for i in jsonfiles:
    print i #prints the contents of each file stored in jsonfiles
What you get when you json.load a file containing the JSON form of a Javascript object such as {'abc': 'def'} is a Python dictionary (normally and affectionately called a dict) (which in this case happens to have the same textual representation as the Javascript object).
To access a specific item, you use indexing, mydict['abc'], while in Javascript you'd use attribute-access notation, myobj.abc. What you get with attribute-access notation in Python are methods that you can call on your dict, for example mydict.keys() would give ['abc'], a list with all the key values that are present in the dictionary (in this case, only one, and it's a string).
Dictionaries are extremely rich in functionality, with a wealth of methods that will make your head spin plus strong support for many Python language structures (for example, you can loop on a dict, for k in mydict:, and k will step through the dictionary's keys, iteratively and sequentially).
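For a quick illustration with the object from the question:
import json

x = json.loads('{"asd": "bob"}')   # parse the JSON text into a dict

print(x['asd'])         # 'bob' -- indexing, the equivalent of x.asd in Javascript
print(list(x.keys()))   # ['asd']
print('asd' in x)       # True -- membership test against the keys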
To access all properties, try the eval() statement before appending to the list.
like:
import os
#define path to reports
reportspath = "reports/"
# Gets all json files and imports them
files = os.listdir(reportspath)

for fname in files:
    data = eval(open(reportspath + fname).read())
    # now, data is a normal python object
    print data
    # list all properties...
    print dir(data)