I have a Python script that reads a JSON file and writes selected variables to a CSV. The problem I am having is that latitude and longitude appear twice in the JSON, so when I write the row, those variables come back with duplicate values in the output.
import csv, json, sys

def find_deep_value(d, key):
    # Modified from https://stackoverflow.com/questions/48568649/convert-json-to-csv-using-python/48569129#48569129
    if key in d:
        yield d[key]
    for k in d.keys():
        if isinstance(d[k], dict):
            for j in find_deep_value(d[k], key):
                yield j
inputFile = open("pywu.cache.json", 'r') # open json file
outputFile = open("CurrentObs.csv", 'w') # load csv file
data = json.load(inputFile) # load json content
inputFile.close() # close the input file
output = csv.writer(outputFile) # create a csv.write
# Gives you latitude coordinates from within the json
lat = list(find_deep_value(data, "latitude"))
# Gives you longitude coordinates from within the json
lon = list(find_deep_value(data, "longitude"))
# Gives you a list of weather from within the json
weather = list(find_deep_value(data, "weather"))
# Gives you a list of temperature_strings from within the json
temp = list(find_deep_value(data, "temperature_string"))

output.writerow(lat + lon + weather + temp)
outputFile.close()
Is there a way to only list them once?
You need to use return rather than yield. Yield is for generators. Once you fix that, you'll also need to change
list(find_deep_value(data, "latitude"))
to
find_deep_value(data, "latitude")
for each of those lines. And finally, change
output.writerow(lat + lon + weather + temp)
to
output.writerow([lat, lon, weather, temp])
What's happening (you might want to read up on generators first) is that when the key is not in the top-level dictionary, you start looping through the sub-dictionaries, and when the first 'latitude' is reached, the yield keyword returns a generator object. You have that generator wrapped in list(), which immediately unpacks the entire generator into a list. So if you have more than one sub-dictionary containing the given key, you're going to end up finding every single one.
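Putting those changes together, here is a minimal sketch of the corrected function. Note that the inner recursion also has to change, since the recursive call no longer returns an iterable; the None check is my addition:

def find_deep_value(d, key):
    # Return the first value found for `key`, searching nested dicts depth-first.
    if key in d:
        return d[key]
    for k in d:
        if isinstance(d[k], dict):
            result = find_deep_value(d[k], key)
            if result is not None:
                return result
    return None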
X values are being taken from a Y.json file. The Y file's values can change between runs. I want to save all of X's values without overwriting the previously saved ones.
# Initialize new dictionary/JSON for X
X = dict()
with open('X.json', 'w') as f:
    f.write(json.dumps(X))
If these cycles do not represent any value or meaning you'd like to include in the filename, you could time-encode the filenames. That way you end up with file names that include the time each file was saved at.
from datetime import datetime

now = datetime.now()
time = now.strftime("%y%m%d%M%S")  # choose any format you like (note this one omits the hour, %H)
filename = time + '_X.json'

X = dict()
with open(filename, 'w') as f:
    f.write(json.dumps(X))
For example, files created every 4 seconds get the following filenames:
2106265848_X.json
2106265852_X.json
2106265856_X.json
2106265900_X.json
2106265904_X.json
2106265908_X.json
2106265912_X.json
2106265916_X.json
2106265920_X.json
However, if the cycles (or whatever experiment you are doing) do matter, I would strongly recommend including them in the file name,
e.g.
filename = f"{time}_X_c{cycle}.json"
to end up with results like this:
2106260547_X_c0.json
2106260551_X_c1.json
2106260555_X_c2.json
2106260559_X_c3.json
2106260603_X_c4.json
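If you later need to read all the saved snapshots back, here is a minimal sketch using glob (assuming the timestamped naming scheme above):

import glob
import json

snapshots = {}
for name in sorted(glob.glob('*_X.json')):
    # sorted() is roughly chronological here, since the yymmddMMSS
    # prefix above omits the hour
    with open(name) as f:
        snapshots[name] = json.load(f)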
I'm working on CS50's pset6, DNA, and I want to read a CSV file that looks like this:
name,AGATC,AATG,TATC
Alice,2,8,3
Bob,4,1,5
Charlie,3,2,5
But the problem is that a dictionary maps one key to one value, so I don't know how to structure this. What I currently have is this piece of code:
import csv
import sys

with open(sys.argv[1]) as data_file:
    data_reader = csv.DictReader(data_file)
My CSV file has multiple columns and rows, with a header, and the first column indicates the name of the person. I don't know how to do this, and I will later need to access an individual amount, say, Alice's value for AATG.
I'm also importing sys for the command-line arguments, and csv for DictReader and reader.
You can always try to create the function on your own.
You can use my code here:

def csv_to_dict(csv_file):
    # save the keys from the header line
    key_list = csv_file[:csv_file.index('\n')].split(',')
    data = {}  # dictionary for the current row
    info = []  # list of dictionaries
    # for each line after the header
    for line in csv_file[csv_file.index('\n') + 1:].split('\n'):
        if not line:
            continue  # skip trailing blank lines
        # pair each comma-separated value with its key from key_list
        for count, value in enumerate(line.split(',')):
            data[key_list[count]] = value
        info.append(data)  # after filling the row dictionary, append it to the list
        data = {}  # reset to an empty dictionary for the next row
    print(info)

### Be aware that this function prints a list of dictionaries.
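For comparison, here is a minimal sketch using the standard library's csv.DictReader, which handles the header row for you (taking the filename from sys.argv as in your snippet):

import csv
import sys

with open(sys.argv[1]) as data_file:
    people = list(csv.DictReader(data_file))

# DictReader yields one dict per row, keyed by the header names;
# values are read as strings, so convert counts as needed:
alice = next(row for row in people if row['name'] == 'Alice')
print(int(alice['AATG']))  # 8, given the sample CSV above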
In this function I'm reading from a .txt file and storing the values in a dictionary. I want to be able to pass this dictionary to another function for further calculations and sorting.
I can manage to print all rows from the .txt file, but that's it.
Using return breaks the loop and only gives the first row.
Global variables and nested functions are bad form.
I have tried using yield (for the first time), but that only prints "<generator object get_all_client_id at 0x03369A20>".
file_with_client_info = open("C:\\Users\\clients.txt", "r")

def get_all_client_id():
    client_details = {}
    for line in file_with_client_info:
        element = line.split(",")
        while element:
            client_details['client_id'] = element[0]
            client_details['coordinates'] = {}
            client_details['coordinates']['lat'] = element[1]
            client_details['coordinates']['long'] = element[2]
            break
        print(client_details)
There are a few errors in your code.
- Use a return statement to output the dictionary.
- The while loop does not actually loop, since you break on the first iteration. Use an if statement to check whether the line is empty instead.
- The entries in the client_details dict are overwritten on each iteration. Create a new entry per client instead, using the client_id as the key.
- It is recommended to open your file with a with context manager.
- It is preferable to pass the file name to your function and let it open the file, rather than relying on a globally opened file.
Here is a fixed version of your code.

def get_all_client_id(file):
    client_details = {}
    with open(file, 'r') as f:
        for line in f:
            element = line.strip().split(',')
            if len(element) >= 3:  # skip empty or incomplete lines
                client_id, lat, long, *more = element
                client_details[client_id] = {'lat': lat, 'long': long}
    return client_details
clients_dict = get_all_client_id("C:\\Users\\clients.txt")
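A quick usage sketch building on that call (the file contents here are hypothetical):

# clients.txt might contain lines like:
#   101,59.33,18.07
#   102,40.71,-74.01
for client_id, coords in clients_dict.items():
    print(client_id, coords['lat'], coords['long'])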
I am using the following set of generators to parse XML into CSV:
import xml.etree.cElementTree as ElementTree
from xml.etree.ElementTree import XMLParser
import csv
def flatten_list(aList, prefix=''):
    for i, element in enumerate(aList, 1):
        eprefix = "{}{}".format(prefix, i)
        if element:
            # treat like dict
            if len(element) == 1 or element[0].tag != element[1].tag:
                yield from flatten_dict(element, eprefix)
            # treat like list
            elif element[0].tag == element[1].tag:
                yield from flatten_list(element, eprefix)
        elif element.text:
            text = element.text.strip()
            if text:
                yield eprefix.rstrip('.'), text
def flatten_dict(parent_element, prefix=''):
    prefix = prefix + parent_element.tag
    if parent_element.items():
        for k, v in parent_element.items():
            yield prefix + k, v
    for element in parent_element:
        eprefix = element.tag
        if element:
            # treat like dict - we assume that if the first two tags
            # in a series are different, then they are all different.
            if len(element) == 1 or element[0].tag != element[1].tag:
                yield from flatten_dict(element, prefix=prefix)
            # treat like list - we assume that if the first two tags
            # in a series are the same, then the rest are the same.
            else:
                # here, we put the list in a dictionary; the key is the
                # tag name the list elements all share in common, and
                # the value is the list itself
                yield from flatten_list(element, prefix=eprefix)
            # if the tag has attributes, add those to the dict
            if element.items():
                for k, v in element.items():
                    yield eprefix + k, v
        # this assumes that if you've got an attribute in a tag,
        # you won't be having any text. This may or may not be a
        # good idea -- time will tell. It works for the way we are
        # currently doing XML configuration files...
        elif element.items():
            for k, v in element.items():
                yield eprefix + k, v
        # finally, if there are no child tags and no attributes, extract
        # the text
        else:
            yield eprefix, element.text
def makerows(pairs):
    headers = []
    columns = {}
    for k, v in pairs:
        if k in columns:
            columns[k].extend((v,))
        else:
            headers.append(k)
            columns[k] = [k, v]
    m = max(len(c) for c in columns.values())
    for c in columns.values():
        c.extend(' ' for i in range(len(c), m))
    L = [columns[k] for k in headers]
    rows = list(zip(*L))
    return rows
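# For intuition, a tiny hypothetical driver for makerows:
#   pairs = [('a', 1), ('b', 2), ('a', 3)]
#   makerows(iter(pairs))  ->  [('a', 'b'), (1, 2), (3, ' ')]
# Each key becomes a column headed by its own name, and shorter columns
# are padded with ' ' so every row has the same width.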
def main():
    with open('2-Response_duplicate.xml', 'r', encoding='utf-8') as f:
        xml_string = f.read()
    xml_string = xml_string.replace('&', '')  # optional: strip ampersands the parser would choke on
    root = ElementTree.XML(xml_string)
    # for key, value in flatten_dict(root):
    #     key = key.rstrip('.').rsplit('.', 1)[-1]
    #     print(key, value)
    writer = csv.writer(open("try5.csv", 'wt'))
    writer.writerows(makerows(flatten_dict(root)))

if __name__ == "__main__":
    main()
One column of the CSV, when opened in Excel, looks like this:
ObjectGuid
2adeb916-cc43-4d73-8c90-579dd4aa050a
2e77c588-56e5-4f3f-b990-548b89c09acb
c8743bdd-04a6-4635-aedd-684a153f02f0
1cdc3d86-f9f4-4a22-81e1-2ecc20f5e558
2c19d69b-26d3-4df0-8df4-8e293201656f
6d235c85-6a3e-4cb3-9a28-9c37355c02db
c34e05de-0b0c-44ee-8572-c8efaea4a5ee
9b0fe8f5-8ec4-4f13-b797-961036f92f19
1d43d35f-61ef-4df2-bbd9-30bf014f7e10
9cb132e8-bc69-4e4f-8f29-c1f503b50018
24fd77da-030c-4cb7-94f7-040b165191ce
0a949d4f-4f4c-467e-b0a0-40c16fc95a79
801d3091-c28e-44d2-b9bd-3bad99b32547
7f355633-426d-464b-bab9-6a294e95c5d5
This is because there are 14 tags named ObjectGuid. For example, one of these tags looks like this:
<ObjectGuid>2adeb916-cc43-4d73-8c90-579dd4aa050a</ObjectGuid>
My question: is there an efficient way to enumerate the headers (the keys), so that each duplicated key is numbered alongside its corresponding value (the text in the XML data structure)?
It would be displayed in Excel as follows:
ObjectGuid_1 ObjectGuid_2 ObjectGuid_3 etc.
Please let me know if there is any other information that you need from me (such as sample XML). Thank you for your help.
It is a mistake to add an element, attribute, or annotative descriptor to the data set itself for the purpose of identity. Normalizing the data should only be done if you own that data and know with 100% certainty that doing so will not have any negative effect on other consumers (for example, ones relying on attribute order to manipulate the DOM). Besides, what is the point of using a dict or nested dicts if the efficiency of the hashed lookup is taken right back by O(n) checks for this new attribute? The point of hashing is random lookup.
If it's simply structured (key, value) pairs you need, which makes sense here, why not use some other contiguous data structure but treat it like a dictionary, say a namedtuple?
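A minimal sketch of that idea (the Point name and fields are hypothetical):

from collections import namedtuple

# Ordered like a tuple, readable like a dict:
Point = namedtuple('Point', ['x', 'y'])
p = Point(1.5, 2.2)
print(p.x, p[1])      # field access by name or by position -> 1.5 2.2
print(p._asdict())    # a dict view when one is really needed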
A second solution, if you want to add additional state, is to wrap your generator in a class:

class Order:
    def __init__(self, lines):
        self.lines = lines
        self.order = []  # records (index, line) pairs as the lines stream past

    def __iter__(self):
        for i, line in enumerate(self.lines, 1):
            self.order.append((i, line))
            yield line

with open('somefile.csv') as f:
    lines = Order(f)
Is messing with the data a harmless conversion? For example, suppose we create a conversion table (see below). That's fine, until one of the values is blank.

field_types = [('x', float),
               ('y', float)]

with open('some.csv') as f:
    for row in csv.DictReader(f):
        row.update((key, conversion(row[key]))
                   for key, conversion in field_types)

That gives you {'x': 1.1, 'y': 2.2} -- that is, until there is an empty data point: float('') raises a ValueError. Kaboom.
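One possible guard (my assumption, not part of the original suggestion) is to make the conversion tolerate blanks:

def safe_float(value):
    # Hypothetical helper: blank strings become None instead of raising
    value = value.strip()
    return float(value) if value else None

field_types = [('x', safe_float), ('y', safe_float)]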
So my suggestion would be not to change or add to the data, but to change the algorithm that deals with it. If the problem is order, why not treat a tuple as a namedtuple, similar to a dictionary? The caveat is immutability, which actually fits uniform data.
(I don't understand the nested dictionary -- is that for the header values, i.e. key -> (key: value)? If so, you could just skip the first row.)
So just skip the first row:

reader = csv.reader(f)
next(reader)  # header consumed; problem solved
for line in reader:
    print(line)
*** Notables
- To iterate over multiple sequences in parallel:

h = ['a', 'b', 'c']
x = [1, 2, 3]
for i in zip(h, x):
    print(i)

('a', 1)
('b', 2)
('c', 3)
- Chaining:

from itertools import chain

a = [1, 2, 3]
b = ['a', 'b', 'c']
for x in chain(a, b):
    print(x)
I have a document containing a dict of results. What I want to do is loop through the document and save each result that matches.
This is my current code, which works fine but only returns the first result:
# Fetch router descriptors based on a given flag
def getHSDirFlag():
    for r in router.itervalues():
        if 'HSDir' in r['flags']:
            return r
    return None
I have tried:
def getHSDirFlag():
    HSDirList = ()
    for r in router.itervalues():
        if 'HSDir' in r['flags']:
            HSDirList += r
            return HSDirList
    return None
but I get the error TypeError: can only concatenate tuple (not "dict") to tuple.
What is the best data type to collect the dicts in, and how can I loop through the document finding every result?
First, why would you call a variable HSDirList and then make it a tuple, not a list?
Second, why return the "list" inside the for loop, then tack a return None (which will never be reached) onto the end of the function?
Try:
def getHSDirFlag(router):
    HSDirList = []  # an actual list
    for r in router.itervalues():
        if 'HSDir' in r['flags']:
            HSDirList.append(r)  # add to the list
    return HSDirList  # return the list
Note that the return is outside the for loop, so it doesn't happen until you've iterated over all of itervalues. Also, router is now an argument to the function, rather than relying on scope.
Finally, you should read and consider implementing the Python style guide, PEP 8.
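A quick usage sketch (the router data here is hypothetical, in the Python 2 style of the question):

router = {
    'r1': {'flags': ['HSDir', 'Running']},
    'r2': {'flags': ['Running']},
    'r3': {'flags': ['HSDir']},
}
print(len(getHSDirFlag(router)))  # 2 -- every match, not just the first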
You can save your dictionaries in a JSON file! In your code you have a tuple and you are trying to concatenate dictionaries onto it, but I suggest you use JSON for saving dicts.
This code saves a JSON file:
import json

with open('data.json', 'w') as fp:
    json.dump(data, fp)
and this one loads it back:

with open('data.json') as fp:
    data = json.load(fp)
Read more at https://docs.python.org/2/library/json.html
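For completeness, a quick round trip combining those two snippets (the data dict is hypothetical):

import json

data = {'r1': {'flags': ['HSDir']}}  # hypothetical
with open('data.json', 'w') as fp:
    json.dump(data, fp)
with open('data.json') as fp:
    assert json.load(fp) == data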