I have the following JSON structure, accessed in a Python script like this:
print("Producers: ", metadata['plist']['dict']['array'][2]['dict']['string'])
The problem is that this field doesn't hold a single entry; it holds multiple ones.
Please also see the RAW JSON here: https://pastebin.com/rtTgmwvn
How can I pull out these entries as a comma separated string for [2] which is the producers field?
Thanks in advance
You're almost there. You can do something like this:
print("Producers: ", ", ".join(i["string"] for i in metadata['plist']['dict']['array'][2]['dict']))
To break down the solution: the "dict" element in your JSON is actually a list of dicts, so you can simply iterate over this list:
metadata['plist']['dict']['array'][2]['dict']
where each element is an actual dict with a "string" key.
Update
The JSON format is such that in some cases this element is a list, and in other cases it is a single dict. To cover both, I would suggest writing a small function (or using an if statement) that handles each situation:
def get_csv(element):
    if isinstance(element, dict):
        return element["string"]
    return ", ".join(i["string"] for i in element)
# and you would use it like so:
print("Producers: ", get_csv(metadata['plist']['dict']['array'][2]['dict']))
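To see both branches of get_csv at work, here is a minimal self-contained sketch. The sample values are invented to mimic the two shapes described above (a single dict versus a list of dicts); they are not copied from the linked JSON.

```python
def get_csv(element):
    # A single entry arrives as one dict, several entries as a list of dicts.
    if isinstance(element, dict):
        return element["string"]
    return ", ".join(entry["string"] for entry in element)

# Hypothetical sample values, shaped like the "dict" element discussed above.
single = {"string": "David Heyman"}
several = [
    {"string": "David Heyman"},
    {"string": "David Barron"},
    {"string": "Tim Lewis"},
]

print("Producers: ", get_csv(single))
print("Producers: ", get_csv(several))
```

If the field can also be missing entirely, an extra check for that case would be needed; the sketch assumes it is always present.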
The following should do the trick:
def get_producer_csv(data):
    producers = []
    # Avoid naming the variable "dict", which would shadow the built-in type.
    entries = data["plist"]["dict"]["array"][2]["dict"]
    for entry in entries:
        producers.append(entry["string"])
    return ",".join(producers)
For your example, it returns the following: "David Heyman,David Barron,Tim Lewis"
I am trying to add some file operation capabilities to an example program, and I am struggling with reading from the file. Here is the modified code.
def read(fn):
    fileout=open(f"{fn}","a+")
    fileout.seek(0,0)
    s=fileout.readlines()
    if s==[]:
        print("the file specified does not appear to exists or is empty. if the file does not exist, it will be created")
    else:
        last=s[-1]
        print(last)
        print(type(last))
        convert(last)
def find(last):
    tup=last.partition(".")
    fi=tup[0:1]
    return fi[0]
def convert(last):
    tup=last.partition(".")
    part=tup[2:]
    print(part)
    part=part[0]
    print(part)
    part=part.split("\n")
    print(part)
    part=part[0]
    print(part)
    print(type(part))
#__main__
file(fn)
The write functionality writes lines in the form
(fileindex number).[(planned campaign)][(conducted campaign)]
Example: some random data written to the file by the program (the first two numbers are dates):
0.['12hell']['12hh']
1.['12hell']['12hh']
2.['121341']['132324']
But I am struggling to write the read function; I don't understand how I could convert the data back.
With the current read function I get back
['121341']['132324']
as a string. I have brainstormed many ideas but could not figure out how to convert the string to lists (they need to be 2 separate lists).
Edit: the flaw was actually in the format that I was writing in. I added a , between the two lists and used eval as suggested in an answer. Thanks.
Insert a ',' between the brackets, then use eval. This will return a tuple of lists.
strLists = "['121341']['132324']['abcdf']"
strLists = strLists.replace('][', '],[')
evalLists = eval(strLists)
for each in evalLists:
    print(each)
Output:
['121341']
['132324']
['abcdf']
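Since eval will execute any Python expression found in the file, a more cautious variant is ast.literal_eval, which only accepts literals such as strings, numbers, and lists. The sketch below is one possible way to parse a whole line in the format shown in the question; the function name parse_line is made up for illustration, and it assumes the stored strings never contain the sequence "][" themselves.

```python
import ast

def parse_line(line):
    # Split "2.['121341']['132324']\n" into the index and the list text.
    index, _, rest = line.strip().partition(".")
    # Turn "[...][...]" into "[[...],[...]]" so it parses as a list of lists.
    # Assumption: the stored strings never contain "][" themselves.
    lists = ast.literal_eval("[" + rest.replace("][", "],[") + "]")
    return int(index), lists

index, (planned, conducted) = parse_line("2.['121341']['132324']\n")
print(index, planned, conducted)
```

Wrapping the text in an outer pair of brackets means the result is always a list of lists, even when a line holds only one list.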
I have strings in a nested list structure. Can someone give me a tip on how to modify the strings in a for loop?
For example, I am trying to delete the last few characters of each string, namely: /CLG-MAXFLOW
If I do
example = 'slipstream_internal/slipstream_hq/36/CLG-MAXFLOW'
print(example[0:36])
This is what I am looking for:
'slipstream_internal/slipstream_hq/36'
But how can I apply this to strings inside nested lists?
devices = [['slipstream_internal/slipstream_hq/36/CLG-MAXFLOW'],
['slipstream_internal/slipstream_hq/38/CLG-MAXFLOW',
'slipstream_internal/slipstream_hq/31/CLG-MAXFLOW'],
['slipstream_internal/slipstream_hq/21/CLG-MAXFLOW',
'slipstream_internal/slipstream_hq/29/CLG-MAXFLOW'],
['slipstream_internal/slipstream_hq/25/CLG-MAXFLOW',
'slipstream_internal/slipstream_hq/9/CLG-MAXFLOW',
'slipstream_internal/slipstream_hq/6/CLG-MAXFLOW'],
['slipstream_internal/slipstream_hq/13/CLG-MAXFLOW',
'slipstream_internal/slipstream_hq/14/CLG-MAXFLOW',
'slipstream_internal/slipstream_hq/30/CLG-MAXFLOW'],
['slipstream_internal/slipstream_hq/19/CLG-MAXFLOW',
'slipstream_internal/slipstream_hq/8/CLG-MAXFLOW',
'slipstream_internal/slipstream_hq/26/CLG-MAXFLOW',
'slipstream_internal/slipstream_hq/24/CLG-MAXFLOW'],
['slipstream_internal/slipstream_hq/34/CLG-MAXFLOW',
'slipstream_internal/slipstream_hq/11/CLG-MAXFLOW',
'slipstream_internal/slipstream_hq/27/CLG-MAXFLOW',
'slipstream_internal/slipstream_hq/20/CLG-MAXFLOW',
'slipstream_internal/slipstream_hq/23/CLG-MAXFLOW'],
['slipstream_internal/slipstream_hq/15/CLG-MAXFLOW',
'slipstream_internal/slipstream_hq/37/CLG-MAXFLOW',
'slipstream_internal/slipstream_hq/39/CLG-MAXFLOW',
'slipstream_internal/slipstream_hq/10/CLG-MAXFLOW']]
Not exactly the best solution but worth giving a try:
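def check_about(lists: list):
    suffix = '/CLG-MAXFLOW'
    for i, j in enumerate(lists):
        if isinstance(j, list):
            check_about(j)
        elif j.endswith(suffix):
            # str.strip() removes a set of characters, not a suffix, so slice instead
            lists[i] = j[:-len(suffix)]
    return lists
print(check_about(devices))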
If your output must be the same structure given by devices variable, but with the string changed, then you can do this:
for row in devices:
    for index, string in enumerate(row):
        row[index] = '/'.join(string.split('/')[:-1])
Output:
[['slipstream_internal/slipstream_hq/36'],
['slipstream_internal/slipstream_hq/38',
'slipstream_internal/slipstream_hq/31'],
['slipstream_internal/slipstream_hq/21',
'slipstream_internal/slipstream_hq/29'],
['slipstream_internal/slipstream_hq/25',
'slipstream_internal/slipstream_hq/9',
'slipstream_internal/slipstream_hq/6'],
['slipstream_internal/slipstream_hq/13',
'slipstream_internal/slipstream_hq/14',
'slipstream_internal/slipstream_hq/30'],
['slipstream_internal/slipstream_hq/19',
'slipstream_internal/slipstream_hq/8',
'slipstream_internal/slipstream_hq/26',
'slipstream_internal/slipstream_hq/24'],
['slipstream_internal/slipstream_hq/34',
'slipstream_internal/slipstream_hq/11',
'slipstream_internal/slipstream_hq/27',
'slipstream_internal/slipstream_hq/20',
'slipstream_internal/slipstream_hq/23'],
['slipstream_internal/slipstream_hq/15',
'slipstream_internal/slipstream_hq/37',
'slipstream_internal/slipstream_hq/39',
'slipstream_internal/slipstream_hq/10']]
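If you would rather build a new structure than modify devices in place, a nested list comprehension with rsplit is one option. This is only a sketch on a shortened copy of the data, and it assumes every string contains at least one '/' and that the part after the last '/' is always what should be dropped.

```python
# Shortened copy of the devices list from the question.
devices = [
    ['slipstream_internal/slipstream_hq/36/CLG-MAXFLOW'],
    ['slipstream_internal/slipstream_hq/38/CLG-MAXFLOW',
     'slipstream_internal/slipstream_hq/31/CLG-MAXFLOW'],
]

# rsplit('/', 1) splits once from the right; element 0 is everything
# before the last '/'.
trimmed = [[path.rsplit('/', 1)[0] for path in row] for row in devices]
print(trimmed)
```

The original devices list is left untouched, which can be handy if the full paths are still needed later.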
Summary of issue: I'm trying to create a nested Python dictionary, with keys defined by pre-defined variables and strings. And I'm populating the dictionary from regular expressions outputs. This mostly works. But I'm getting an error because the nested dictionary - not the main one - doesn't like having the key set to a string, it wants an integer. This is confusing me. So I'd like to ask you guys how I can get a nested python dictionary with string keys.
Below I'll walk you through the steps of what I've done. What is working, and what isn't. Starting from the top:
# Regular expressions module
import re
# Read text data from a file
file = open("dt.cc", "r")
dtcc = file.read()
# Create a list of stations from regular expression matches
stations = sorted(set(re.findall(r"\n(\w+)\s", dtcc)))
The result is good, and is something like this:
stations = ['AAAA','BBBB','CCCC','DDDD']
# Initialize a new dictionary
rows = {}
# Loop over each station in the station list, and start populating
for station in stations:
    rows[station] = re.findall("%s\s(.+)" %station, dtcc)
The result is good, and is something like this:
rows['AAAA'] = ['AAAA 0.1132 0.32 P',...]
However, when I try to create a sub-dictionary with a string key:
for station in stations:
    rows[station] = re.findall("%s\s(.+)" %station, dtcc)
    rows[station]["dt"] = re.findall("%s\s(\S+)" %station, dtcc)
I get the following error.
"TypeError: list indices must be integers, not str"
It doesn't seem to like that I'm specifying the second dictionary key as "dt". If I give it a number instead, it works just fine. But then my dictionary key name is a number, which isn't very descriptive.
Any thoughts on how to get this working?
The issue is that by doing
rows[station] = re.findall(...)
you are creating a dictionary with the station names as keys and the return values of re.findall as values, which are lists. So when you index them again with
rows[station]["dt"] = re.findall(...)
rows[station] on the left-hand side is a list, which can only be indexed by integers; that is what the TypeError is complaining about. With rows[station][0], for example, you would get the first match from the regex. You said you want a nested dictionary, so you could do
rows[station] = dict()
rows[station]["dt"] = re.findall(...)
To make it a bit nicer, a data structure that you could use instead is a defaultdict from the collections module.
A defaultdict is a dictionary that takes a factory for its default values. You pass the type constructor as its argument. For example, dictlist = defaultdict(list) defines a dictionary whose values are lists. Doing dictlist[key].append(item1) straight away is then legal, because the list is created automatically the first time a missing key is accessed.
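A tiny illustration of that behaviour, with made-up key and item names:

```python
from collections import defaultdict

dictlist = defaultdict(list)

# No need to create the list first: accessing a missing key builds it.
dictlist["key"].append("item1")
dictlist["key"].append("item2")

print(dict(dictlist))
```

Note that merely reading a missing key also creates it, so membership tests with "in" are the safer way to check whether a key exists.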
In your case you could do
from collections import defaultdict
rows = defaultdict(dict)
for station in stations:
    rows[station]["bulk"] = re.findall(r"%s\s(.+)" % station, dtcc)
    rows[station]["dt"] = re.findall(r"%s\s(\S+)" % station, dtcc)
Note that you have to assign the first regex result to a new key, "bulk" here, but you can call it whatever you like. Hope this helps.
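Put together as something you can run on its own, the approach might look like the sketch below. The contents of dtcc are a hypothetical stand-in for the dt.cc file, whose real layout is not shown in the question, and re.escape is added as a precaution in case a station name ever contains regex metacharacters.

```python
import re
from collections import defaultdict

# Hypothetical sample standing in for the contents of dt.cc.
dtcc = "\nAAAA 0.1132 0.32 P\nAAAA 0.2001 0.50 S\nBBBB 0.3050 0.10 P\n"

stations = sorted(set(re.findall(r"\n(\w+)\s", dtcc)))

rows = defaultdict(dict)
for station in stations:
    name = re.escape(station)
    # Everything after the station name on each matching line.
    rows[station]["bulk"] = re.findall(name + r"\s(.+)", dtcc)
    # Only the first whitespace-delimited field after the station name.
    rows[station]["dt"] = re.findall(name + r"\s(\S+)", dtcc)

print(rows["AAAA"]["dt"])
```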
I'm trying to display data in a structure that is a dictionary contained in a defaultdict:
dictionary = defaultdict(dict)
Example
defaultdict = {key1} :
{subkey1}:
(val1,
val2,
val3)
{subkey2}:
(val4,
val5,
val6)
{key2} :
{subkey3}:
(val7,
val8,
val9),
{subkey4}:
(val10,
val11,
val12)
I tried to do
for key in dictionary.iterkeys():
    print key # This will return me the key
    for items in dictionary[key]:
        print items # This will return me the subkey
        for values in dictionary[key][items]:
            print values #this return the values for each subkey)
The problem is that I just get printed out a flat list of items; which is almost impossible to follow when you have too many items and keys.
How do you properly print such complex structures, to present them in a way that does not make you rip your eyes out? I tried pprint and json.dumps, but neither made the situation better. Ideally I would like to have it printed as in my example, but I can't see a simple way to do so without going through complex string manipulation to format the print output.
Python has the pprint ("pretty print") module just for this purpose. Note that defaultdicts won't print nicely, but if you convert back to a regular dict first it'll do fine.
from pprint import pprint
pprint(dict(dictionary))
Use indentation to print them out so your eye can visually see the structure.
for key in dictionary.iterkeys():
    print key  # This will return me the key
    for items in dictionary[key]:
        print("    %s" % items)  # This will return me the subkey
        for values in dictionary[key][items]:
            print("        %s" % values)  # this returns the values for each subkey
You can also replace the space characters with \t to use a tab character; both will work fine. You may also have to use repr(values) or str(values) to explicitly get a string representation, if Python complains that the objects cannot be formatted as a string.
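For Python 3, where iterkeys() and the print statement no longer exist, the same indentation idea could be sketched as follows. The function name format_nested and the sample keys are made up for illustration, and the sketch assumes exactly two levels of keys with an iterable of values underneath.

```python
from collections import defaultdict

def format_nested(mapping, indent="    "):
    # Build one line per key, subkey, and value, indenting each level.
    lines = []
    for key, sub in mapping.items():
        lines.append(str(key))
        for subkey, values in sub.items():
            lines.append(indent + str(subkey))
            for value in values:
                lines.append(indent * 2 + str(value))
    return "\n".join(lines)

# Hypothetical sample data in the shape described in the question.
dictionary = defaultdict(dict)
dictionary["key1"]["subkey1"] = ("val1", "val2", "val3")
dictionary["key1"]["subkey2"] = ("val4", "val5", "val6")
dictionary["key2"]["subkey3"] = ("val7", "val8", "val9")

print(format_nested(dictionary))
```

Returning a string rather than printing directly makes it easy to write the same output to a file or a log.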
I have a dictionary that looks like this; the DNA sequences are the keys and the quality values are the values:
{'TTTGTTCTTTTTGTAATGGGGCCAGATGTCACTCATTCCACATGTAGTATCCAGATTGAAATGAAATGAGGTAGAACTGACCCAGGCTGGACAAGGAAGG\n':
'eeeecdddddaaa`]eceeeddY\\cQ]V[F\\\\TZT_b^[^]Z_Z]ac_ccd^\\dcbc\\TaYcbTTZSb]Y]X_bZ\\a^^\\S[T\\aaacccBBBBBBBBBB\n',
'ACTTATATTATGTTGACACTCAAAAATTTCAGAATTTGGAGTATTTTGAATTTCAGATTTTCTGATTAGGGATGTACCTGTACTTTTTTTTTTTTTTTTT\n':
'dddddd\\cdddcdddcYdddd`d`dcd^dccdT`cddddddd^dddddddddd^ddadddadcd\\cda`Y`Y`b`````adcddd`ddd_dddadW`db_\n',
'CTGCCAGCACGCTGTCACCTCTCAATAACAGTGAGTGTAATGGCCATACTCTTGATTTGGTTTTTGCCTTATGAATCAGTGGCTAAAAATATTATTTAAT\n':
'deeee`bbcddddad\\bbbbeee\\ecYZcc^dd^ddd\\\\`]``L`ccabaVJ`MZ^aaYMbbb__PYWY]RWNUUab`Y`BBBBBBBBBBBBBBBBBBBB\n'}
I want to write a function so that if I query a DNA sequence, it returns a tuple of this DNA sequence and its corresponding quality value
I wrote the following function, but it gives me an error message that says list indices must be integers, not str
def query_sequence_id(self, dna_seq=''):
    """Overrides the query_sequence_id so that it optionally returns both the sequence and the quality values.
    If DNA sequence does not exist in the class, return a string error message"""
    list_dna = []
    for t in self.__fastqdict.keys():
        list_dna.append(t.rstrip('\n'))
    self.dna_seq = dna_seq
    if self.dna_seq in list_dna:
        return (self.dna_seq,self.__fastqdict.values()[self.dna_seq + "\n"])
    else:
        return "This DNA sequence does not exist"
so I want something like if I print
query_sequence_id("TTTGTTCTTTTTGTAATGGGGCCAGATGTCACTCATTCCACATGTAGTATCCAGATTGAAATGAAATGAGGTAGAACTGACCCAGGCTGGACAAGGAAGG"),
I would get
('TTTGTTCTTTTTGTAATGGGGCCAGATGTCACTCATTCCACATGTAGTATCCAGATTGAAATGAAATGAGGTAGAACTGACCCAGGCTGGACAAGGAAGG',
'eeeecdddddaaa`]eceeeddY\\cQ]V[F\\\\TZT_b^[^]Z_Z]ac_ccd^\\dcbc\\TaYcbTTZSb]Y]X_bZ\\a^^\\S[T\\aaacccBBBBBBBBBB')
I want to get rid of "\n" for both keys and values, but my code failed. Can anyone help me fix my code?
The newline characters aren't your problem, though they are messy. You're trying to index the result of dict.values() with a string. In Python 2 that result is a plain list, which is why the error says list indices must be integers; in Python 3 it is a view, which can't be indexed at all. Either way, that's not what you want, and it defeats the whole purpose of using a dictionary in the first place. Just look up the value in the dictionary, the normal way:
return (self.dna_seq, self.__fastqdict[self.dna_seq + "\n"])
As for the newlines, why not just take them out when you build the dictionary in the first place?
To modify the dictionary you can just do the following:
myNewDict = {}
for var in myDict:
    myNewDict[var.strip()] = myDict[var].strip()
You can remove those pesky newlines from your dictionary's keys and values like this (assuming your dictionary was stored in a variable named dna):
dna = {k.rstrip(): v.rstrip() for k, v in dna.iteritems()}
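On Python 3, iteritems() is gone and items() does the same job. The sketch below combines the clean-up with a direct dictionary lookup; the short sequences and the standalone function query_sequence are invented for illustration and stand in for the class method from the question.

```python
# Hypothetical short records; real keys and values would be much longer.
raw = {"ACGT\n": "eeee\n", "TTGA\n": "dd`b\n"}

# Strip the trailing newlines once, when building the dictionary.
fastq = {seq.rstrip("\n"): qual.rstrip("\n") for seq, qual in raw.items()}

def query_sequence(fastq, dna_seq):
    # A plain dictionary lookup replaces the list of keys and values().
    if dna_seq in fastq:
        return (dna_seq, fastq[dna_seq])
    return "This DNA sequence does not exist"

print(query_sequence(fastq, "ACGT"))
print(query_sequence(fastq, "GGGG"))
```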