I'm trying to seed a database in a Django app. I have a CSV file that I converted to JSON, and now I need to reformat it to match the format that Django's serialization framework requires.
This is what the JSON needs to look like to be acceptable to Django (it looks an awful lot like a dictionary with 3 keys, the third having a value which is itself a dictionary):
[
{
"pk": "4b678b301dfd8a4e0dad910de3ae245b",
"model": "sessions.session",
"fields": {
"expire_date": "2013-01-16T08:16:59.844Z",
...
}
}
]
My json data looks like this after converting it from csv with pandas:
[{'model': 'homepage.territorymanager', 'pk': 1, 'Name': 'Aaron ##', 'Distributor': 'National Energy', 'State': 'BC', 'Brand': 'Trane', 'Cell': '778-###-####', 'email address': None, 'Notes': None, 'Unnamed: 9': None}, {'model': 'homepage.territorymanager', 'pk': 2, 'Name': 'Aaron Martin ', 'Distributor': 'Pierce ###', 'State': 'PA', 'Brand': 'Bryant/Carrier', 'Cell': '267-###-####', 'email address': None, 'Notes': None, 'Unnamed: 9': None},...]
I am using this function to try and reformat
def re_serialize_reg_json(d, jsonFilePath):
    for i in d:
        d2 = {'Name': d[i]['Name'], 'Distributor' : d[i]['Distributor'], 'State' : d[i]['State'], 'Brand' : d[i]['Brand'], 'Cell' : d[i]['Cell'], 'EmailAddress' : d[i]['email address'], 'Notes' : d[i]['Notes']}
        d[i] = {'pk': d[i]['pk'],'model' : d[i]['model'], 'fields' : d2}
    print(d)
and it returns this error which doesn't make any sense because the format that django requires has a dictionary as the value of the third key:
d2 = {'Name': d[i]['Name'], 'Distributor' : d[i]['Distributor'], 'State' : d[i]['State'], 'Brand' : d[i]['Brand'], 'Cell' : d[i]['Cell'], 'EmailAddress' : d[i]['email address'], 'Notes' : d[i]['Notes']}
TypeError: list indices must be integers or slices, not dict
Any help appreciated!
Here is what I did to get d:
df = pandas.read_csv('/Users/justinbenfit/territorymanagerpython/territory managers - Sheet1.csv')
df.to_json('/Users/justinbenfit/territorymanagerpython/territorymanagers.json', orient='records')
jsonFilePath = '/Users/justinbenfit/territorymanagerpython/territorymanagers.json'
def load_file(file_path):
    with open(file_path) as f:
        d = json.load(f)
    return d
d = load_file(jsonFilePath)
print(d)
d is actually a list of dictionaries, so for i in d hands you each dictionary itself rather than its index, and d[i] then fails with the TypeError above. To make it work, change the for i in d part to: for i in range(len(d)).
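As an alternative to indexing with range(len(d)), here is a minimal sketch that iterates over the records directly and builds a new list. The function name to_django_fixture is made up for illustration, and the field names are simply copied from the question's code, so they are assumed to match the model's fields.

```python
import json

def to_django_fixture(records):
    """Return a new list in the model / pk / fields layout shown in the question."""
    fixture = []
    for record in records:  # each record is a dict, not an index
        fixture.append({
            'model': record['model'],
            'pk': record['pk'],
            'fields': {
                'Name': record['Name'],
                'Distributor': record['Distributor'],
                'State': record['State'],
                'Brand': record['Brand'],
                'Cell': record['Cell'],
                'EmailAddress': record['email address'],
                'Notes': record['Notes'],
            },
        })
    return fixture

# One record shaped like the pandas output in the question.
sample = [{'model': 'homepage.territorymanager', 'pk': 1, 'Name': 'Aaron ##',
           'Distributor': 'National Energy', 'State': 'BC', 'Brand': 'Trane',
           'Cell': '778-###-####', 'email address': None, 'Notes': None,
           'Unnamed: 9': None}]
fixture = to_django_fixture(sample)
fixture_json = json.dumps(fixture, indent=2)
```

Building a new list instead of overwriting d in place keeps the original data available for inspection, and the stray 'Unnamed: 9' column is dropped because it is never copied.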
I have a dictionary like this:
{"Topic":"text","title":"texttitle","abstract":"textabs","year":"textyear","authors":"authors"}
I want to create another list as follows:
[{"label":{"Topic":"text","title":"texttitle","abstract":"textabs","year":"textyear","authors":"authors"},"value":
{"Topic":"text","title":"texttitle","abstract":"textabs","year":"textyear","authors":"authors"}}]
I have tried some methods with .items() but none of them gives the desired result.
Is that what you want?
dict_ = {"Topic":"text","title":"texttitle","abstract":"textabs","year":"textyear","authors":"authors"}
output = [{"label": dict_ , "value": dict_ }]
print(output)
[{"label":{"Topic":"text","title":"texttitle","abstract":"textabs","year":"textyear","authors":"authors"},"value":
{"Topic":"text","title":"texttitle","abstract":"textabs","year":"textyear","authors":"authors"}}] == [{"label": dict_ , "value": dict_ }]
Gives True
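One thing worth knowing about this answer: "label" and "value" refer to the same dictionary object, so a later change through one key is visible through the other. If independent copies are wanted (an assumption; the question does not say), a sketch using copy.deepcopy could look like this:

```python
import copy

dict_ = {"Topic": "text", "title": "texttitle", "abstract": "textabs",
         "year": "textyear", "authors": "authors"}

# Both keys point at the very same dict object.
shared = [{"label": dict_, "value": dict_}]

# Each key gets its own independent copy.
independent = [{"label": copy.deepcopy(dict_), "value": copy.deepcopy(dict_)}]

# Change the year through "label" only.
shared[0]["label"]["year"] = "2020"
independent[0]["label"]["year"] = "2020"

shared_value_year = shared[0]["value"]["year"]            # changed as well
independent_value_year = independent[0]["value"]["year"]  # untouched
```

For a flat dictionary of strings like this one, dict(dict_) or dict_.copy() would be enough; deepcopy only matters once the values are themselves mutable.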
Following up on my comment, this is the code I would use, assuming these keys and this output:
# The keys could come from somewhere else
vals = ["1","2","3","4"]
# Probably also coming from an external source
example_op = {"Topic":"text","title":"texttitle","abstract":"textabs","year":"textyear","authors":"authors"}
# Global list
item_list = []
temp_dict = {}
for key in vals:
    temp_dict[key] = example_op
item_list.append(temp_dict)
Final output of the list would be as:
Out[9]:
[{'1': {'Topic': 'text',
'title': 'texttitle',
'abstract': 'textabs',
'year': 'textyear',
'authors': 'authors'},
'2': {'Topic': 'text',
'title': 'texttitle',
'abstract': 'textabs',
'year': 'textyear',
'authors': 'authors'},
'3': {'Topic': 'text',
'title': 'texttitle',
'abstract': 'textabs',
'year': 'textyear',
'authors': 'authors'},
'4': {'Topic': 'text',
'title': 'texttitle',
'abstract': 'textabs',
'year': 'textyear',
'authors': 'authors'}}]
I want to convert this nested JSON into a DataFrame.
I tried different functions, but none of them works correctly.
The encoding that worked for me was:
encoding = "utf-8-sig"
[{'replayableActionOperationState': 'SKIPPED',
'replayableActionOperationGuid': 'RAO_1037351',
'failedMessage': 'Cannot replay action: RAO_1037351: com.ebay.sd.catedor.core.model.DTOEntityPropertyChange; local class incompatible: stream classdesc serialVersionUID = 7777212484705611612, local class serialVersionUID = -1785129380151507142',
'userMessage': 'Skip all mode',
'username': 'gfannon',
'sourceAuditData': [{'guid': '24696601-b73e-43e4-bce9-28bc741ac117',
'operationName': 'UPDATE_CATEGORY_ATTRIBUTE_PROPERTY',
'creationTimestamp': 1563439725240,
'auditCanvasInfo': {'id': '165059', 'name': '165059'},
'auditUserInfo': {'id': 1, 'name': 'gfannon'},
'externalId': None,
'comment': None,
'transactionId': '0f135909-66a7-46b1-98f6-baf1608ffd6a',
'data': {'entity': {'guid': 'CA_2511202',
'tagType': 'BOTH',
'description': None,
'name': 'Number of Shelves'},
'propertyChanges': [{'propertyName': 'EntityProperty',
'oldEntity': {'guid': 'CAP_35',
'name': 'DisableAsVariant',
'group': None,
'action': 'SET',
'value': 'true',
'tagType': 'SELLER'},
'newEntity': {'guid': 'CAP_35',
'name': 'DisableAsVariant',
'group': None,
'action': 'SET',
'value': 'false',
'tagType': 'SELLER'}}],
'entityChanges': None,
'primary': True}}],
'targetAuditData': None,
'conflictedGuids': None,
'fatal': False}]
This is what I have tried so far. There were more attempts, but this one got me the closest.
with open(r"Desktop\Ann's json parsing\report.tsv", encoding='utf-8-sig') as data_file:
    data = json.load(data_file)
df = json_normalize(data)
print(df)
pd.DataFrame(df)  # The nested lists are shown as a whole column; I'm trying to parse those columns: 'failedMessage' and 'sourceAuditData'
I also tried json.loads/json(df), but the output isn't correct.
pd.DataFrame.from_dict(a['sourceAuditData'][0]['data']['propertyChanges'][0])  # This line retrieves one of the outputs I need, but I don't know how to apply it to the whole file.
The expected result should be a CSV/XLSX file with a column and a value for each row.
For your particular example:
def unroll_dict(d):
    data = []
    for k, v in d.items():
        if isinstance(v, list):
            data.append((k, ''))
            data.extend(unroll_dict(v[0]))
        elif isinstance(v, dict):
            data.append((k, ''))
            data.extend(unroll_dict(v))
        else:
            data.append((k,v))
    return data
And given the data in your question is stored in the variable example:
df = pd.DataFrame(unroll_dict(example[0])).set_index(0).transpose()
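Note that unroll_dict only looks at the first element of each list and repeats key names such as 'guid' and 'name'. As a rough alternative sketch (not the answerer's method), the function below flattens every level into dotted keys so that each column name is unique, and then writes one CSV row with the standard library. The name flatten and the trimmed sample record are illustrative assumptions.

```python
import csv
import io

def flatten(value, prefix=''):
    """Flatten nested dicts and lists into one dict with dotted keys."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f'{prefix}{key}.'))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f'{prefix}{index}.'))
    else:
        # Leaf value: drop the trailing dot from the accumulated prefix.
        flat[prefix[:-1]] = value
    return flat

# A trimmed-down record with the same nesting as the data in the question.
record = {
    'username': 'gfannon',
    'sourceAuditData': [{
        'operationName': 'UPDATE_CATEGORY_ATTRIBUTE_PROPERTY',
        'data': {'entity': {'guid': 'CA_2511202', 'name': 'Number of Shelves'}},
    }],
    'fatal': False,
}
row = flatten(record)

# Write a header line and one data row; io.StringIO stands in for a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()
```

Empty dicts and empty lists produce no column at all in this sketch. With several records, the flattened rows can also be passed to pd.DataFrame as a list of dicts.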
I have a text file filled with place data provided by the Twitter API. Here are two sample lines:
{'country': 'United Kingdom', 'full_name': 'Dorridge, England', 'id': '31fe56e2e7d5792a', 'country_code': 'GB', 'name': 'Dorridge', 'attributes': {}, 'contained_within': [], 'place_type': 'city', 'bounding_box': {'coordinates': [[[-1.7718518, 52.3635912], [-1.7266702, 52.3635912], [-1.7266702, 52.4091167], [-1.7718518, 52.4091167]]], 'type': 'Polygon'}, 'url': 'https://api.twitter.com/1.1/geo/id/31fe56e2e7d5792a.json'}
{'country': 'India', 'full_name': 'New Delhi, India', 'id': '317fcc4b21a604d5', 'country_code': 'IN', 'name': 'New Delhi', 'attributes': {}, 'contained_within': [], 'place_type': 'city', 'bounding_box': {'coordinates': [[[76.84252, 28.397657], [77.347652, 28.397657], [77.347652, 28.879322], [76.84252, 28.879322]]], 'type': 'Polygon'}, 'url': 'https://api.twitter.com/1.1/geo/id/317fcc4b21a604d5.json'}
I want the 'country', 'name' and 'coordinates' fields of each line. To do this, I need to iterate over the file line by line, so I append each line to a list:
data = []
with open('place.txt','r') as f:
    for line in f:
        data.append(line)
When I check the data type, it shows 'str' instead of 'dict':
type(data[0])
str
data[0].keys()
AttributeError: 'str' object has no attribute 'keys'
How can I fix this so that the data is loaded as a list of dictionaries?
Originally, the tweets were encoded and decoded with the following code:
f.write(jsonpickle.encode(tweet._json, unpicklable=False) + '\n') # encoded and saved to a .txt file
tweets.append(jsonpickle.decode(line)) # decoding
The place data file was saved with the following code:
fName = "place.txt"
newLine = "\n"
with open(fName, 'a', encoding='utf-8') as f:
    for i in range(len(tweets)):
        f.write('{}'.format(tweets[i]['place']) +'\n')
In your case you should use json to parse the data. But if that is not possible (which should be rare, since we are talking about an API), then in general you can convert a string to a dictionary like this:
>>> import ast
>>> x = "{'country': 'United Kingdom', 'full_name': 'Dorridge, England', 'id': '31fe56e2e7d5792a', 'country_code': 'GB', 'name': 'Dorridge', 'attributes': {}, 'contained_within': [], 'place_type': 'city', 'bounding_box': {'coordinates': [[[-1.7718518, 52.3635912], [-1.7266702, 52.3635912], [-1.7266702, 52.4091167], [-1.7718518, 52.4091167]]], 'type': 'Polygon'}, 'url': 'https://api.twitter.com/1.1/geo/id/31fe56e2e7d5792a.json'}"
>>> d = ast.literal_eval(x)
>>> d
d is now a dictionary instead of a string.
But again, if your data is in JSON format, Python has a built-in library to handle it, and it is better and safer to use json than ast.
For example, if you get a response, say resp, you could simply do:
response = json.loads(resp)
and now you can work with response as a dictionary.
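Applied to the file from the question, the ast approach can be sketched as below. The helper name parse_place_lines and the shortened sample line are illustrative assumptions. With the real file, the lines would come from open('place.txt', encoding='utf-8').

```python
import ast

def parse_place_lines(lines):
    """Parse lines that each hold a Python dict literal into a list of dicts."""
    places = []
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            places.append(ast.literal_eval(line))
    return places

# A shortened line in the same style as the sample data in the question.
sample_lines = [
    "{'country': 'India', 'name': 'New Delhi', 'bounding_box': "
    "{'coordinates': [[[76.84252, 28.397657], [77.347652, 28.397657]]], "
    "'type': 'Polygon'}}\n",
]
places = parse_place_lines(sample_lines)

# Pick out the three fields the question asks for.
summary = [
    (p['country'], p['name'], p['bounding_box']['coordinates'])
    for p in places
]
```

If some tweets have no place, the writing code in the question would have stored the text None for them, which literal_eval turns back into None. Such entries would need to be filtered out before indexing.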
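Note: Single quotes are not valid JSON.
I have never tried the Twitter API, but it looks like your data is not valid JSON. Here is a simple preprocessing step that replaces ' (single quote) with " (double quote). Note that it breaks if any value itself contains an apostrophe, or if the data contains Python literals such as None, True or False: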
data = "{'country': 'United Kingdom', ... }"
json_data = data.replace('\'', '\"')
dict_data = json.loads(json_data)
dict_data.keys()
# [u'full_name', u'url', u'country', ... ]
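A more robust route, if the file can be regenerated, is to write valid JSON in the first place, one object per line, instead of formatting the dict with '{}'.format. The sketch below uses made-up sample data and io.StringIO in place of the real file, so it is an illustration under those assumptions rather than the original pipeline.

```python
import io
import json

# Made-up sample data; the apostrophe would break a naive quote replacement.
places = [
    {'country': 'United Kingdom', 'name': "King's Lynn", 'attributes': {}},
    {'country': 'India', 'name': 'New Delhi', 'attributes': {}},
]

# Writing: one JSON document per line. io.StringIO stands in for the file.
buffer = io.StringIO()
for place in places:
    buffer.write(json.dumps(place) + '\n')

# Reading: every non-empty line parses straight back into a dict.
buffer.seek(0)
loaded = [json.loads(line) for line in buffer if line.strip()]
```

Because json.dumps handles the quoting and escaping, no string replacement is needed when reading the data back.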
You should use Python's json library to parse the data and get the values.
In Python it's quite easy.
import json
x = '{"country": "United Kingdom", "full_name": "Dorridge, England", "id": "31fe56e2e7d5792a", "country_code": "GB", "name": "Dorridg", "attributes": {}, "contained_within": [], "place_type": "city", "bounding_box": {"coordinates": [[[-1.7718518, 52.3635912], [-1.7266702, 52.3635912], [-1.7266702, 52.4091167], [-1.7718518, 52.4091167]]], "type": "Polygon"}, "url": "https://api.twitter.com/1.1/geo/id/31fe56e2e7d5792a.json"}'
y = json.loads(x)
print(y["country"],y["name"],y["bounding_box"]["coordinates"])
You can build a list of the keys like this:
mlist = list()
for i in ndata.keys():
    mlist.append(i)
So I have a list of dictionaries like so:
data = [ {
'Organization' : '123 Solar',
'Phone' : '444-444-4444',
'Email' : '',
'website' : 'www.123solar.com'
}, {
'Organization' : '123 Solar',
'Phone' : '',
'Email' : 'joey#123solar.com',
'Website' : 'www.123solar.com'
}, {
etc...
} ]
Of course, this is not the exact data. But (maybe) from my example here you can catch my problem. I have many records with the same "Organization" name, but not one of them has the complete information for that record.
Is there an efficient method for searching over the list, sorting the list based on the dictionary's first entry, and finally merging the data from duplicates to create a unique entry? (Keep in mind these dictionaries are quite large)
You can make use of itertools.groupby:
from itertools import groupby
from operator import itemgetter
from pprint import pprint
data = [ {
'Organization' : '123 Solar',
'Phone' : '444-444-4444',
'Email' : '',
'website' : 'www.123solar.com'
}, {
'Organization' : '123 Solar',
'Phone' : '',
'Email' : 'joey#123solar.com',
'Website' : 'www.123solar.com'
},
{
'Organization' : '234 test',
'Phone' : '111',
'Email' : 'a#123solar.com',
'Website' : 'b.123solar.com'
},
{
'Organization' : '234 test',
'Phone' : '222',
'Email' : 'ac#123solar.com',
'Website' : 'bd.123solar.com'
}]
data = sorted(data, key=itemgetter('Organization'))
result = {}
for key, group in groupby(data, key=itemgetter('Organization')):
    result[key] = [item for item in group]
pprint(result)
prints:
{'123 Solar': [{'Email': '',
'Organization': '123 Solar',
'Phone': '444-444-4444',
'website': 'www.123solar.com'},
{'Email': 'joey#123solar.com',
'Organization': '123 Solar',
'Phone': '',
'Website': 'www.123solar.com'}],
'234 test': [{'Email': 'a#123solar.com',
'Organization': '234 test',
'Phone': '111',
'Website': 'b.123solar.com'},
{'Email': 'ac#123solar.com',
'Organization': '234 test',
'Phone': '222',
'Website': 'bd.123solar.com'}]}
UPD:
Here's what you can do to group items into single dict:
# Assumes every record uses the key 'Website'; the first sample record
# spells it 'website', which would raise a KeyError here.
for key, group in groupby(data, key=itemgetter('Organization')):
    result[key] = {'Phone': [],
                   'Email': [],
                   'Website': []}
    for item in group:
        result[key]['Phone'].append(item['Phone'])
        result[key]['Email'].append(item['Email'])
        result[key]['Website'].append(item['Website'])
then, in result you'll have:
{'123 Solar': {'Email': ['', 'joey#123solar.com'],
'Phone': ['444-444-4444', ''],
'Website': ['www.123solar.com', 'www.123solar.com']},
'234 test': {'Email': ['a#123solar.com', 'ac#123solar.com'],
'Phone': ['111', '222'],
'Website': ['b.123solar.com', 'bd.123solar.com']}}
Is there an efficient method for searching over the list, sorting the list based on the dictionary's first entry, and finally merging the data from duplicates to create a unique entry?
Yes, but there's an even more efficient method without searching and sorting. Just build up a dictionary as you go along:
datadict = {}
for thingy in data:
    organization = thingy['Organization']
    datadict[organization] = merge(thingy, datadict.get(organization, {}))
Now you're making a linear pass over the data, doing a constant-time lookup for each record. So it's better than any sorting-based solution by a factor of O(log N). It's also one pass instead of multiple passes, and it will probably have lower constant overhead besides.
It's not clear exactly what you want to do to merge the entries, and there's no way anyone can write the code without knowing what rules you want to use. But here's a simple example:
def merge(d1, d2):
    for key, value in d2.items():
        if not d1.get(key):
            d1[key] = value
    return d1
In other words, for each item in d2, if d1 already has a truthy value (like a non-empty string), leave it alone; otherwise, add it.
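Putting the two pieces together, a self-contained sketch might look like this. The wrapper merge_by_organization is a made-up name, the sample records use a consistent 'Website' key, and each record is copied first so that the input list is not modified (the snippet above merges into the original record).

```python
def merge(d1, d2):
    """Fill empty or missing fields of d1 from d2 and return d1."""
    for key, value in d2.items():
        if not d1.get(key):
            d1[key] = value
    return d1

def merge_by_organization(records):
    """Build one merged dict per organization in a single pass."""
    merged = {}
    for record in records:
        organization = record['Organization']
        # Copy so that the input records are not mutated.
        merged[organization] = merge(dict(record), merged.get(organization, {}))
    return merged

sample = [
    {'Organization': '123 Solar', 'Phone': '444-444-4444', 'Email': '',
     'Website': 'www.123solar.com'},
    {'Organization': '123 Solar', 'Phone': '', 'Email': 'joey#123solar.com',
     'Website': 'www.123solar.com'},
]
result = merge_by_organization(sample)
```

With this argument order, a later record wins whenever both records have a non-empty value for the same key. Swapping the arguments to merge would make the earliest value win instead.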
Dear Stackoverflow Members,
I have this JSON array, and it consists of the following items (basically):
[
    {
        'Name': 'x',
        'Id': 'y',
        'Unsusedstuff' : 'unused',
        'Unsusedstuff2' : 'unused2',
        'Children': []
    },
    {
        'Name' : 'xx',
        'Id': 'yy',
        'Unsusedstuff' : 'unused',
        'Unsusedstuff2' : 'unused2',
        'Children': [{
            'Name': 'xyx',
            'Id' : 'yxy',
            'Unsusedstuff' : 'unused',
            'Unsusedstuff2' : 'unused2',
            'Children': []
        }]
    }
]
You get the basic idea. I want to emulate this (and just grab the id and the name and the structure) in a Python-list using the following code:
names = []
def parseNames(col):
    for x in col:
        if(len(x['Children'])> 0):
            names.append({'Name' : x['Name'], 'Id' : x['Id'], 'Children' : parseNames(x['Children'])})
        else:
            return {'Name' : x['Name'], 'Id' : x['Id']}
But, it only seems to return the first 'root' and the first nested folder, but doesn't loop through them all.
How would I be able to fix this?
Greetings,
Mats
The way I read this, you're trying to convert this tree into a tree of nodes which only have Id, Name and Children. In that case, the way I'd think of it is as cleaning nodes.
To clean a node:
Create a node with the Name and Id of the original node.
Set the new node's Children to be the cleaned versions of the original node's children. (This is the recursive call.)
In code, that would be:
def clean_node(node):
    return {
        'Name': node['Name'],
        'Id': node['Id'],
        # A list comprehension also works on Python 3, where map() returns a lazy iterator
        'Children': [clean_node(child) for child in node['Children']],
    }
>>> print([clean_node(node) for node in data])
[{'Name': 'x', 'Id': 'y', 'Children': []}, {'Name': 'xx', 'Id': 'yy', 'Children': [{'Name': 'xyx', 'Id': 'yxy', 'Children': []}]}]
I find it's easier to break recursive problems down like this - trying to use global variables turns simple things very confusing very quickly.
Check this
def parseNames(col):
    for x in col:
        if(len(x['Children'])> 0):
            a = [{
                'Name' : x['Name'],
                'Id' : x['Id'],
                'Children' : x['Children'][0]['Children']
            }]
            parseNames(a)
        names.append({'Name' : x['Name'], 'Id' : x['Id']})
    return names
Output I get is
[{'Name': 'x', 'Id': 'y'}, {'Name': 'xx', 'Id': 'yy'}, {'Name': 'xx', 'Id': 'yy'}]
You can parse a Json object with this:
import json
response = json.loads(my_string)
Now response is a dictionary if the JSON text holds an object, or a list if it holds an array.
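A small sketch of that difference, using made-up JSON strings:

```python
import json

# A JSON object becomes a dict, and a JSON array becomes a list.
as_object = json.loads('{"Name": "x", "Children": []}')
as_array = json.loads('[{"Name": "x"}, {"Name": "xx"}]')

# A list has to be iterated before the keys of its items can be used.
names = [item["Name"] for item in as_array]
```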