I have a list of dictionaries. Sample data (the real list has n entries like these):
datas = [{"_id":"1234as", "Total students":"123,321", "TotalPresent":"321,345"},
{"_id":"1234asas","TotalStudents":"343,431","TotalPresent":"541,656"}]
I tried
for data in datas:
    for i in data.values():
        re.sub('[^A-Za-z0-9]+', '', i)
        datas.append(i)
I just want to remove comma(,) from TotalStudents and TotalPresent and replace the value in datas.
Edit 1
In my list of dictionary I also have value as::
datas = [{"_id":"1234as","Totalstudents":"123,321","TotalPresent":"321,345"},
{"_id":"1234asas","TotalStudents":"343,431","TotalPresent":"541,656"},
{"_id":"9934 asas","TotalStudents":"NA","TotalPresent":""}]
Here the value of TotalStudents is "NA" and the value of TotalPresent is "". Is there a way to replace "NA" or "" with "0" wherever they appear?
If you want to replace the values of specific keys, make sure that the keys are the same because the first dict in your example has Total Students but the second has TotalStudents.
Try this:
datas = [{"_id": "1234as", "Total Students": "123,321", "TotalPresent": "321,345"},
{"_id": "1234asas", "Total Students": "343,431", "TotalPresent": "541,656"}]
for d in datas:
    d["Total Students"] = d["Total Students"].replace(",", "")
    d["TotalPresent"] = d["TotalPresent"].replace(",", "")
print(datas)
# output: [{'_id': '1234as', 'Total Students': '123321', 'TotalPresent': '321345'}, {'_id': '1234asas', 'Total Students': '343431', 'TotalPresent': '541656'}]
If you want to remove commas from the values of all the keys, you can try this (but bear in mind that in this case all the values of your dicts must be strings):
datas = [{"_id": "1234as", "Total Students": "123,321", "TotalPresent": "321,345"},
{"_id": "1234asas", "Total Students": "343,431", "TotalPresent": "541,656"}]
for d in datas:
    for k in d:
        d[k] = d[k].replace(",", "")
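If some values might not be strings, a small isinstance guard avoids the AttributeError that .replace would raise on them. This is only a minimal sketch: the function name strip_commas and the integer field "age" are made up for illustration and are not part of the original data.

```python
def strip_commas(records):
    """Remove commas from every string value in place; leave other types alone."""
    for record in records:
        for key, value in record.items():
            if isinstance(value, str):
                record[key] = value.replace(",", "")
    return records


# "age" is a hypothetical non-string field added only to show the guard.
datas = [{"_id": "1234as", "Total Students": "123,321", "TotalPresent": "321,345", "age": 12}]
strip_commas(datas)
print(datas)
```

Assigning to an existing key while looping over items() is safe here because the dictionary never changes size.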
You can iterate over the key,value pairs in the dictionaries. And after removing the comma replace the value for that key.
import re
datas = [{"_id": "1234as", "Total Students": "123,321", "TotalPresent": "321,345"},
{"_id": "1234asas", "TotalStudents": "343,431", "TotalPresent": "541,656"}]
for data in datas:
    for key, value in data.items():
        print(key, value)
        value = re.sub('[^A-Za-z0-9]+', '', value)
        data[key] = value
print(datas)
Result
_id 1234as
Total Students 123,321
TotalPresent 321,345
_id 1234asas
TotalStudents 343,431
TotalPresent 541,656
[{'_id': '1234as', 'Total Students': '123321', 'TotalPresent': '321345'},
{'_id': '1234asas', 'TotalStudents': '343431', 'TotalPresent': '541656'}]
This is one way to make your code work; it always replaces every value. Add your own checks if you need it to be smarter.
EDIT
To catch the "NA" and "" values I have added some if statements. It's simple and stays close to your own code.
import re
datas = [{"_id":"1234as","TotalStudents":"123,321","TotalPresent":"321,345"},
{"_id":"1234asas","TotalStudents":"343,431","TotalPresent":"541,656"},
{"_id":"9934 asas","TotalStudents":"NA","TotalPresent":""}]
for data in datas:
    print(data)
    for key, value in data.items():
        if key == "TotalStudents":
            if value == "NA":
                value = "0"
            else:
                value = re.sub('[^A-Za-z0-9]+', '', value)
        elif key == "TotalPresent":
            if not value:
                value = "0"
            else:
                value = re.sub('[^A-Za-z0-9]+', '', value)
        data[key] = value
print()
for data in datas:
    print(data)
Result
{'_id': '1234as', 'TotalStudents': '123321', 'TotalPresent': '321345'}
{'_id': '1234asas', 'TotalStudents': '343431', 'TotalPresent': '541656'}
{'_id': '9934 asas', 'TotalStudents': '0', 'TotalPresent': '0'}
To make the code a little more direct you can assign the new values straight into data. In this case "_id" is no longer replaced with its own value.
import re
datas = [{"_id":"1234as","TotalStudents":"123,321","TotalPresent":"321,345"},
{"_id":"1234asas","TotalStudents":"343,431","TotalPresent":"541,656"},
{"_id":"9934 asas","TotalStudents":"NA","TotalPresent":""}]
for data in datas:
    print(data)
    for key, value in data.items():
        if key == "TotalStudents":
            if value == "NA":
                data[key] = "0"
            else:
                data[key] = re.sub('[^A-Za-z0-9]+', '', value)
        elif key == "TotalPresent":
            if not value:
                data[key] = "0"
            else:
                data[key] = re.sub('[^A-Za-z0-9]+', '', value)
print()
for data in datas:
    print(data)
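The two branches above do almost the same thing, so the rule could be pulled into one helper. The following is just a sketch under the assumption that "NA" and "" are the only placeholders and that only TotalStudents and TotalPresent hold counts; clean_count and COUNT_KEYS are names I made up.

```python
def clean_count(value, missing=("NA", "")):
    """Return "0" for placeholder values, otherwise the value without commas."""
    value = value.strip()
    if value in missing:
        return "0"
    return value.replace(",", "")


# Keys assumed to hold counts; "_id" is deliberately left untouched.
COUNT_KEYS = ("TotalStudents", "TotalPresent")

datas = [{"_id": "1234as", "TotalStudents": "123,321", "TotalPresent": "321,345"},
         {"_id": "1234asas", "TotalStudents": "343,431", "TotalPresent": "541,656"},
         {"_id": "9934 asas", "TotalStudents": "NA", "TotalPresent": ""}]

for data in datas:
    for key in COUNT_KEYS:
        if key in data:
            data[key] = clean_count(data[key])

for data in datas:
    print(data)
```

Using str.replace instead of re.sub only removes commas, so a stray space or letter in a count would survive; whether that is preferable depends on your data.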
re.sub does not work in place; it returns the altered str. More generally, because strs are immutable, functions that process them never work in place. A solution using re.sub might look like this:
import re
datas = [{"_id":"1234as","Total Students":"123,321","TotalPresent":"321,345"},
{"_id":"1234asas","TotalStudents":"343,431","TotalPresent":"541,656"}]
cleandatas = []
for data in datas:
    cleandatas.append({k:re.sub('[^A-Za-z0-9]+', '', v) for k,v in data.items()})
print(cleandatas)
Output:
[{'_id': '1234as', 'Total Students': '123321', 'TotalPresent': '321345'}, {'_id': '1234asas', 'TotalStudents': '343431', 'TotalPresent': '541656'}]
I used a dict comprehension to create new, cleaned dicts.
I have a list in ['key', 'value'] format, which also contains sub-lists. How can I convert this nested list to JSON format in Python?
[[' key ', ' 1542633482511430199'
],
['value=>>>BasicData',
[['isConfirmAndOrder', '0'],['brmRequestId', 'BR-2018-0000124'],
['requestType','batch'],['projectId', 'PRJ-2018-0000477'],
['createdOn', 'Mon Nov 19 18:48:02 IST 2018']]
],
['createdBy=>>>BasicData',
[['userId', '999996279'], ['email', 'ITEST275#ITS.JNJ.com'],
['firstName', 'Iris'], ['lastName', 'TEST275'],
['ntId', 'itest275'], ['region', 'NA'],
[' LastAccessTime ', ' 1542639905785 ']]
]
]
Expected format is
{
"key": "1542633482511430199",
"value=>>>BasicData": {
"isConfirmAndOrder": "0",
"brmRequestId": "BR-2018-0000124"
.
},
"createdBy=>>>BasicData": {
"userId": "999996279",
"email": "ITEST275#ITS.JNJ.com"
.
}
.
}
The actual format of the large data is:
[
[
['key11','value11']
['key12',['key13','value13']]
['key14',['key15','value15']]
]
[
['key21','value21']
['key22',['key23','value23']]
['key24',['key25','value25']]
]
]
You can write a simple recursive function for this:
def to_dict_recursive(x):
    d = {}
    for key, value in x:
        if isinstance(value, list):
            value = to_dict_recursive(value)
        else:
            value = value.strip() # get rid of unnecessary whitespace
        d[key.strip()] = value
    return d
to_dict_recursive(x)
# {'createdBy=>>>BasicData': {'displayName': 'Iris TEST275',
# 'email': 'ITEST275#ITS.JNJ.com',
# 'firstName': 'Iris',
# 'lastName': 'TEST275',
# 'ntId': 'itest275',
# 'region': 'NA',
# 'roles': '[0]CG510_DHF_AP_Role',
# 'userId': '999996279'},
# 'formulaDetails=>>>BasicData': {'CreationTime': '1542633482512',
# 'LastAccessTime': '1542639905785',
# 'batchSizeUnits': 'kg<<<<<<',
# 'hitCount': '1',
# 'version': '1'},
# 'key': '1542633482511430199',
# 'value=>>>BasicData': {'brmRequestId': 'BR-2018-0000124',
# 'createdMonth': 'Nov',
# 'createdOn': 'Mon Nov 19 18:48:02 IST 2018',
# 'department': 'Global Packaging',
# 'gxp': '1',
# 'id': '1542633482511430199',
# 'isConfirmAndOrder': '0',
# 'isFilling': 'false',
# 'projectId': 'PRJ-2018-0000477',
# 'projectName': 'Automation_Product_By_Admin',
# 'requestType': 'batch',
# 'status': 'New',
# 'statusDescription': 'Batch request created',
# 'updatedOn': 'Mon Nov 19 18:48:02 IST 2018'}}
(I ran this in Python 3.6 so the order of the keys in the dictionary representation is different than insertion order. In Python 3.7+ this would be different.)
You can even make this into a dict comprehension:
def to_dict_recursive(x):
    return {key.strip(): to_dict_recursive(value) if isinstance(value, list)
                         else value.strip()
            for key, value in x}
Since apparently some elements in your object are not a two-element list of key and value, you can add a simple guard against that:
def to_dict_recursive(x):
    d = {}
    try:
        for key, value in x:
            if isinstance(value, list):
                value = to_dict_recursive(value)
            else:
                value = value.strip()
            d[key.strip()] = value
    except ValueError:
        return x
    return d
x = [[' key ', ' 1542633482511430199'],
["test", ["a", "b", "c"]]
]
to_dict_recursive(x)
# {'key': '1542633482511430199', 'test': ['a', 'b', 'c']}
Note that if mylist is a key-value pair list, then dict(mylist) simply returns a dictionary version of it. The tricky part is traversing deep into those nested lists to replace them with dictionaries. Here's a recursive function that does that:
# Where <kv> is your giant list-of-lists.
def kv_to_dict(kv):
    if isinstance(kv, list):
        kv = dict(kv)
    for k in kv:
        if isinstance(kv[k], list):
            kv[k] = kv_to_dict(kv[k])
    return kv
newdict = kv_to_dict(kvpairs)
Once you have things converted to a dictionary, you can just use json.dumps() to format it as JSON:
import json
as_json = json.dumps(newdict, indent=4)
print(as_json)
I see though that you've tried something similar and got an error. Are you sure that all of the lists in your data are really key-value pairs, and not for example a list of 3 strings?
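If some lists really are not key-value pairs, one option is to check the shape before converting. Here is a rough sketch of that idea; pairs_to_dict is a name I made up, and the 'tags' entry in the sample is hypothetical, added only to show a list that should stay a list.

```python
import json


def pairs_to_dict(obj):
    """Turn a list of [key, value] pairs into a dict, recursing into the values.

    Lists that are not made up entirely of [str, value] pairs stay lists.
    """
    if isinstance(obj, str):
        return obj.strip()
    if isinstance(obj, list):
        is_pair_list = bool(obj) and all(
            isinstance(item, list) and len(item) == 2 and isinstance(item[0], str)
            for item in obj
        )
        if is_pair_list:
            return {key.strip(): pairs_to_dict(value) for key, value in obj}
        return [pairs_to_dict(item) for item in obj]
    return obj


kvpairs = [
    [' key ', ' 1542633482511430199'],
    ['value=>>>BasicData', [['isConfirmAndOrder', '0'], ['requestType', 'batch']]],
    ['tags', ['a', 'b', 'c']],  # hypothetical entry that is not a pair list
]

newdict = pairs_to_dict(kvpairs)
print(json.dumps(newdict, indent=4))
```

The shape check is a heuristic: a list that happens to contain only two-element lists with string first items will be treated as pairs even if it was meant as plain data.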
I'm having a hard time filtering multiple JSON objects. I need to know the type of each object and, if the type corresponds to a fruit, print the values under that element's fields key. See the comments in the Python example for a better explanation.
Here's what the JSON looks like :
#json.items()
{
'type': 'apple',
'fields': {
'protein': '18g',
'glucide': '3%',
}
},
{
'type': 'banana',
'fields': {
'protein': '22g',
'glucide': '8%',
}
},
Here's what I tried to do :
for key, value in json.items(): #access json dict.
    if key == 'type': #access 'type' key
        if value == 'apple': #check the fruit
            if key == 'fields': #ERROR !!! Now I need to access the 'fields' key datas of this same fruit. !!!
                print('What a good fruit, be careful on quantity!')
                print('more :' + value['protein'] + ', ' + value['glucid'])
        if value == 'banana': #if not apple check for bananas
            print('One banana each two days keeps you healthy !')
            print('more:' + value['protein'] + ', ' + value['glucid'])
Is there a way I can achieve this ?
What you have seems to be a list of dicts.
You can check that the keys type and fields exist in each dictionary before checking their values, like this:
for d in data: # d is a dict
    if 'type' in d and 'fields' in d:
        if d['type'] == 'apple':
            ... # some print statements
        elif d['type'] == 'banana':
            ... # some more print statements
Based on your representation of the JSON, it appears that is actually a list, not a dictionary. So in order to iterate through it, you could try something like this:
for item in json:
    fields = item['fields']
    if item['type'] == 'banana':
        print('Bananas have {} of protein and {} glucide'.format(fields['protein'], fields['glucide']))
    elif item['type'] == 'apple':
        print('Apples have {} of protein and {} glucide'.format(fields['protein'], fields['glucide']))
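If the number of fruit types grows, the if/elif chain could be replaced by a lookup table. This is only a sketch: MESSAGES, describe and the "stone" entry are made up for illustration, and the field name is assumed to be 'glucide' as in the sample data (the question's code reads 'glucid', which would raise a KeyError).

```python
fruits = [
    {"type": "apple", "fields": {"protein": "18g", "glucide": "3%"}},
    {"type": "banana", "fields": {"protein": "22g", "glucide": "8%"}},
    {"type": "stone", "fields": {}},  # hypothetical non-fruit entry
]

# One message per known fruit type; unknown types are skipped.
MESSAGES = {
    "apple": "What a good fruit, be careful on quantity!",
    "banana": "One banana each two days keeps you healthy!",
}


def describe(items):
    """Return the lines to print for every item whose type is a known fruit."""
    lines = []
    for item in items:
        message = MESSAGES.get(item.get("type"))
        if message is None:
            continue
        fields = item.get("fields", {})
        lines.append(message)
        lines.append("more: {}, {}".format(fields.get("protein", "?"),
                                           fields.get("glucide", "?")))
    return lines


for line in describe(fruits):
    print(line)
```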
Given the following data received from a web form:
for key in request.form.keys():
    print key, request.form.getlist(key)
group_name [u'myGroup']
category [u'social group']
creation_date [u'03/07/2013']
notes [u'Here are some notes about the group']
members[0][name] [u'Adam']
members[0][location] [u'London']
members[0][dob] [u'01/01/1981']
members[1][name] [u'Bruce']
members[1][location] [u'Cardiff']
members[1][dob] [u'02/02/1982']
How can I turn it into a dictionary like this? It's eventually going to be used as JSON but as JSON and dictionaries are easily interchanged my goal is just to get to the following structure.
event = {
    group_name : 'myGroup',
    notes : 'Here are some notes about the group',
    category : 'social group',
    creation_date : '03/07/2013',
    members : [
        {
            name : 'Adam',
            location : 'London',
            dob : '01/01/1981'
        },
        {
            name : 'Bruce',
            location : 'Cardiff',
            dob : '02/02/1982'
        }
    ]
}
Here's what I have managed so far. Using the following list comprehension I can easily make sense of the ordinary fields:
event = [ (key, request.form.getlist(key)[0]) for key in request.form.keys() if key[0:7] != "members" ]
but I'm struggling with the members list. There can be any number of members. I think I need to separately create a list for them and add that to a dictionary with the non-iterative records. I can get the member data like this:
tmp_members = [(key, request.form.getlist(key)) for key in request.form.keys() if key[0:7]=="members"]
Then I can pull out the list index and field name:
members_arr = []
members_orig = [ (key, request.form.getlist(key)[0]) for key in request.form.keys() if key[0:7] == "members" ]
for i in members_orig:
    p1 = i[0].index('[')
    p2 = i[0].index(']')
    members_index = i[0][p1+1:p2]
    p1 = i[0].rfind('[')
    members_field = i[0][p1+1:-1]
But how do I add this to my data structure. The following won't work because I could be trying to process members[1][name] before members[0][name].
members_arr[int(members_index)] = {members_field : i[1]}
This seems very convoluted. Is there a simpler way of doing this, and if not, how can I get this working?
You could store the data in a dictionary and then use the json library.
import json
json_data = json.dumps(my_dict)  # my_dict is the dictionary you built
print(json_data)
This will print a JSON string.
See the documentation for the json module in the Python standard library.
Yes, convert it to a dictionary, then use json.dumps(), with some optional parameters, to print out the JSON in the format you need:
eventdict = {
'group_name': 'myGroup',
'notes': 'Here are some notes about the group',
'category': 'social group',
'creation_date': '03/07/2013',
'members': [
{'name': 'Adam',
'location': 'London',
'dob': '01/01/1981'},
{'name': 'Bruce',
'location': 'Cardiff',
'dob': '02/02/1982'}
]
}
import json
print json.dumps(eventdict, indent=4)
The order of the key:value pairs is not always consistent, but if you're just looking for pretty-looking JSON that can be parsed by a script, while remaining human-readable, this should work. You can also sort the keys alphabetically, using:
print json.dumps(eventdict, indent=4, sort_keys=True)
The following python functions can be used to create a nested dictionary from the flat dictionary. Just pass in the html form output to decode().
def get_key_name(str):
    first_pos = str.find('[')
    return str[:first_pos]

def get_subkey_name(str):
    '''Used with lists of dictionaries only'''
    first_pos = str.rfind('[')
    last_pos = str.rfind(']')
    return str[first_pos:last_pos+1]

def get_key_index(str):
    first_pos = str.find('[')
    last_pos = str.find(']')
    return str[first_pos:last_pos+1]

def decode(idic):
    odic = {} # Initialise an empty dictionary
    # Scan all the top level keys
    for key in idic:
        # Nested entries have [] in their key
        if '[' in key and ']' in key:
            if key.rfind('[') == key.find('[') and key.rfind(']') == key.find(']'):
                print key, 'is a nested list'
                key_name = get_key_name(key)
                key_index = int(get_key_index(key).replace('[','',1).replace(']','',1))
                # Append can't be used because we may not get the list in the correct order.
                try:
                    odic[key_name][key_index] = idic[key][0]
                except KeyError: # List doesn't yet exist
                    odic[key_name] = [None] * (key_index + 1)
                    odic[key_name][key_index] = idic[key][0]
                except IndexError: # List is too short
                    odic[key_name] = odic[key_name] + ([None] * (key_index - len(odic[key_name]) + 1 ))
                    # TO DO: This could be a function
                    odic[key_name][key_index] = idic[key][0]
            else:
                key_name = get_key_name(key)
                key_index = int(get_key_index(key).replace('[','',1).replace(']','',1))
                subkey_name = get_subkey_name(key).replace('[','',1).replace(']','',1)
                try:
                    odic[key_name][key_index][subkey_name] = idic[key][0]
                except KeyError: # Dictionary doesn't yet exist
                    print "KeyError"
                    # The dictionaries must not be bound to the same object
                    odic[key_name] = [{} for _ in range(key_index+1)]
                    odic[key_name][key_index][subkey_name] = idic[key][0]
                except IndexError: # List is too short
                    # The dictionaries must not be bound to the same object
                    odic[key_name] = odic[key_name] + [{} for _ in range(key_index - len(odic[key_name]) + 1)]
                    odic[key_name][key_index][subkey_name] = idic[key][0]
        else:
            # This can be added to the output dictionary directly
            print key, 'is a simple key value pair'
            odic[key] = idic[key][0]
    return odic
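For the specific members[0][name] shape in the question, a regular expression plus an intermediate dict keyed by index sidesteps the ordering problem, since nothing is placed in a list until all keys have been seen. The following is a Python 3 sketch under assumptions: the input is a plain mapping of field name to list of values (the shape request.form.getlist produces per key), and KEY_PATTERN and build_event are names I made up.

```python
import re

# Matches keys such as "members[0][name]" -> ("members", "0", "name").
KEY_PATTERN = re.compile(r"^(\w+)\[(\d+)\]\[(\w+)\]$")


def build_event(form):
    """Build a nested dict from a flat mapping of field name -> list of values."""
    event = {}
    grouped = {}  # list name -> {index -> {field -> value}}
    for key, values in form.items():
        match = KEY_PATTERN.match(key)
        if match:
            name, index, field = match.groups()
            grouped.setdefault(name, {}).setdefault(int(index), {})[field] = values[0]
        else:
            event[key] = values[0]
    # Only now turn each index mapping into a list, ordered by index.
    for name, by_index in grouped.items():
        event[name] = [by_index[i] for i in sorted(by_index)]
    return event


form = {
    "members[1][name]": ["Bruce"],  # deliberately out of order
    "group_name": ["myGroup"],
    "members[0][name]": ["Adam"],
    "members[0][location]": ["London"],
    "members[1][location]": ["Cardiff"],
}
print(build_event(form))
```

Note that missing indices are simply skipped, so members[0] and members[2] without members[1] would produce a two-element list.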
I'm wondering if anyone has a hacky or cool solution to this problem. I have a text file like so:
NAME:name
ID:id
PERSON:person
LOCATION:location
NAME:name
morenamestuff
ID:id
PERSON:person
LOCATION:location
JUNK
So I have some blocks that all contain lines that can be split into a dict, and some that cannot. How can I take lines without the : character and join them to the previous line? Here's what I'm currently doing
# loop through chunk
# the first element of dat is a Title, so skip that
key_map = dict(x.split(':') for x in dat[1:])
But I of course get an error because the second chunk has a line without the : character. So I wanted my dict to look something like this after correctly splitting it:
# there will be a key_map for each chunk of data
key_map['NAME'] == 'name morenamestuff' # 3rd line appended to previous
key_map['ID'] == 'id'
key_map['PERSON'] == 'person'
key_map['LOCATION'] == 'location'
Solution
EDIT: Here's my final solution on github, and the full code here:
parseScript.py
import re
import string
bad_chars = '(){}"<>[] ' # characters we want to strip from the string
key_map = []
# parse file
with open("dat.txt") as f:
    data = f.read()
    data = data.strip('\n')
    data = re.split('}|\[{', data)
# format file
with open("format.dat") as f:
    formatData = [x.strip('\n') for x in f.readlines()]
data = filter(len, data)
# strip and split each station
for dat in data[1:-1]:
    # perform black magic, don't even try to understand this
    dat = dat.translate(string.maketrans("", "", ), bad_chars).split(',')
    key_map.append(dict(x.split(':') for x in dat if ':' in x ))
    # a second element without ':' is a continuation of NAME for the dict just added
    if ':' not in dat[1]: key_map[-1]['NAME'] += dat[1]
for station in range(0, len(key_map)):
    for opt in formatData:
        print opt,":",key_map[station][opt]
    print ""
dat.txt
View raw here
format.dat
NAME
STID
LONGITUDE
LATITUDE
ELEVATION
STATE
ID
out.dat
View raw here
When in doubt, write your own generator.
Add in itertools.groupby to chunk by groups of text delimited by whitespace breaks.
def chunker(s):
    it = iter(s)
    out = [next(it)]
    for line in it:
        if ':' in line or not line:
            yield ' '.join(out)
            out = []
        out.append(line)
    if out:
        yield ' '.join(out)
usage:
from itertools import groupby
[dict(x.split(':') for x in g) for k,g in groupby(chunker(lines), bool) if k]
Out[65]:
[{'ID': 'id', 'LOCATION': 'location', 'NAME': 'name', 'PERSON': 'person'},
{'ID': 'id',
'LOCATION': 'location',
'NAME': 'name morenamestuff',
'PERSON': 'person'}]
(if those fields are always the same, I'd go with something like creating some namedtuples instead of a bunch of dicts)
from collections import namedtuple
Thing = namedtuple('Thing', 'ID LOCATION NAME PERSON')
[Thing(**dict(x.split(':') for x in g)) for k,g in groupby(chunker(lines), bool) if k]
Out[76]:
[Thing(ID='id', LOCATION='location', NAME='name', PERSON='person'),
Thing(ID='id', LOCATION='location', NAME='name morenamestuff', PERSON='person')]
Here is something that addresses all your requirements. It handles joining of multiple lines, ignoring blank lines, and ignoring junk lines that do not appear within a block. It is implemented as a generator that yields each dictionary as it is completed.
def parser(data):
    d = {}
    for line in data:
        line = line.strip()
        if not line:
            if d:
                yield d
                d = {}
        else:
            if ':' in line:
                key, value = line.split(':')
                d[key] = value
            else:
                if d:
                    d[key] = '{} {}'.format(d[key], line)
    if d:
        yield d
When run with this data:
ignore me
NAME:name1
ID:id1
PERSON:person1
LOCATION:location1
NAME:name2
morenamestuff
ID:id2
PERSON:person2
LOCATION:location2
junk
and
other
stuff
NAME:name3
morenamestuff
and more
ID:id3
PERSON:person3
more person stuff
LOCATION:location3
JUNK
MORE JUNK
>>> for d in parser(open('data')):
... print d
{'PERSON': 'person1', 'LOCATION': 'location1', 'NAME': 'name1', 'ID': 'id1'}
{'PERSON': 'person2', 'LOCATION': 'location2', 'NAME': 'name2 morenamestuff', 'ID': 'id2'}
{'PERSON': 'person3 more person stuff', 'LOCATION': 'location3', 'NAME': 'name3 morenamestuff and more', 'ID': 'id3'}
You can grab the lot as a list:
>>> results = list(parser(open('data')))
>>> results
[{'PERSON': 'person1', 'LOCATION': 'location1', 'NAME': 'name1', 'ID': 'id1'}, {'PERSON': 'person2', 'LOCATION': 'location2', 'NAME': 'name2 morenamestuff', 'ID': 'id2'}, {'PERSON': 'person3 more person stuff', 'LOCATION': 'location3', 'NAME': 'name3 morenamestuff and more', 'ID': 'id3'}]
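One thing to watch with line.split(':') is a value that itself contains a colon, which makes the two-name unpacking fail with a ValueError. Passing a maxsplit of 1 to str.split avoids that. Below is a small list-returning sketch of the same idea in Python 3; parse_blocks and the TIME field are assumptions of mine, not part of the question's data.

```python
def parse_blocks(lines, separator=":"):
    """Group key:value lines into dicts, one per blank-line-delimited block.

    Lines without the separator are appended to the previous key's value;
    such lines appearing before any key in a block are ignored as junk.
    """
    blocks, current, key = [], {}, None
    for raw in lines:
        line = raw.strip()
        if not line:
            if current:
                blocks.append(current)
            current, key = {}, None
        elif separator in line:
            # maxsplit=1 keeps any further separators inside the value
            key, value = line.split(separator, 1)
            current[key] = value
        elif key is not None:
            current[key] += " " + line
    if current:
        blocks.append(current)
    return blocks


sample = [
    "ignore me",
    "NAME:name2",
    "morenamestuff",
    "TIME:18:48:02",  # hypothetical field whose value contains colons
    "",
    "JUNK",
    "",
    "NAME:name3",
]
print(parse_blocks(sample))
```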
I don't find itertools or regex particularly nice to work with, so here's a pure-Python solution:
separator = ':'
output = []
chunk = None
with open('/tmp/stuff.txt') as f:
    for line in (x.strip() for x in f):
        if not line:
            # we are between 'chunks'
            chunk, key = None, None
            continue
        if chunk is None:
            # we are at the beginning of a new 'chunk'
            chunk, key = {}, None
            output.append(chunk)
        if separator in line:
            key, val = line.split(separator)
            chunk[key] = val
        else:
            chunk[key] += line
Not as elegant as you requested, but this works:
dat=[['NAME:name',
'ID:id',
'PERSON:person',
'LOCATION:location'],
['NAME:name',
'morenamestuff',
'ID:id',
'PERSON:person',
'LOCATION:location']]
k = 1
key_map = dict(x.split(':') for x in dat[k] if ':' in x)
if ':' not in dat[k][1]:
    key_map['NAME'] += dat[k][1]
key_map
{'ID': 'id',
'LOCATION': 'location',
'NAME': 'namemorenamestuff',
'PERSON': 'person'}
Just add something to lines with no ":".
if line.find(':') == -1:
line=line+':None'
Then you won't get an error.
{'action_name':'mobile signup',
'functions':[{'name':'test_signUp',
'parameters':{'username':'max#getappcard.com',
'password':'12345',
'mobileLater':'123454231',
'mobile':'1e2w1e2w',
'card':'1232313',
'cardLater':'1234321234321'}}],
'validations':[
{'MOB_header':'My stores'},
{'url':"/stores/my"}]}
I want to get all the keys and values of this dict as a flat list, including those nested inside values that are themselves dicts or arrays.
print result should be like this:
action name = mobile signup
name = test_signUp
username : max#getappcard.com
password : 12345
mobileLater: 123454231
mobile : 1e2w1e2w
card : 1232313
cardLater : 1234321234321
MOB_header : My stores
You might want to use a recursive function to extract all the key, value pairs.
def extract(dict_in, dict_out):
    for key, value in dict_in.iteritems():
        if isinstance(value, dict): # If value itself is dictionary
            extract(value, dict_out)
        elif isinstance(value, unicode):
            # Write to dict_out
            dict_out[key] = value
    return dict_out
Something of this sort. I come from C++ background so I had to google for all the syntaxes.
I modified an existing recursive approach slightly to get all keys and values in a nested dict that contains dicts and lists of dicts:
def recursive_items(dictionary):
    for key, value in dictionary.items():
        if type(value) is dict:
            yield (key, value)
            yield from recursive_items(value)
        elif type(value) is list:
            yield (key, value)
            for i in value:
                if type(i) is dict:
                    yield from recursive_items(i)
        else:
            yield (key, value)
for i in recursive_items(your_dict):
    print(i) #print out tuple of (key, value)
Output:
('action_name', 'mobile signup')
('functions', [{'name': 'test_signUp', 'parameters': {'username':
'max#getappcard.com', 'password': '12345', 'mobileLater': '123454231', 'mobile':
'1e2w1e2w', 'card': '1232313', 'cardLater': '1234321234321'}}])
('name', 'test_signUp')
('parameters', {'username': 'max#getappcard.com', 'password': '12345',
'mobileLater': '123454231', 'mobile': '1e2w1e2w', 'card': '1232313',
'cardLater': '1234321234321'})
('username', 'max#getappcard.com')
('password', '12345')
('mobileLater', '123454231')
('mobile', '1e2w1e2w')
('card', '1232313')
('cardLater', '1234321234321')
('validations', [{'MOB_header': 'My stores'}, {'url': '/stores/my'}])
('MOB_header', 'My stores')
('url', '/stores/my')
A little late, but on Python 3.3+ you can use yield from:
def dict2items(d):
    for k, v in d.items():
        yield k
        if isinstance(v, dict):
            yield from dict2items(v)
        else:
            yield v
all_items = list(dict2items(d))
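To get only the leaf pairs the question asks for (no 'functions' or 'parameters' entries) while also descending into lists, a variant along these lines might work. It is a sketch for Python 3.3+; iter_leaves is a name I made up, and the sample is a trimmed copy of the question's dict.

```python
def iter_leaves(obj):
    """Yield (key, value) for every value that is not itself a dict or list."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if isinstance(value, (dict, list)):
                yield from iter_leaves(value)
            else:
                yield key, value
    elif isinstance(obj, list):
        for item in obj:
            yield from iter_leaves(item)


data = {
    'action_name': 'mobile signup',
    'functions': [{'name': 'test_signUp',
                   'parameters': {'username': 'max#getappcard.com',
                                  'password': '12345'}}],
    'validations': [{'MOB_header': 'My stores'}, {'url': '/stores/my'}],
}

for key, value in iter_leaves(data):
    print(key, '=', value)
```

Plain values sitting directly inside a list have no key of their own, so this sketch skips them.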