Parsing a jsonl file into a useful structure - python

I am importing a jsonl file from my hard drive and trying to get it into a usable format. Here is how I'm importing the data.
import json

train_data = []
with open("Documents/data/train.jsonl", 'r', encoding='utf-8') as j:
    for line in j:
        train_data.append(json.loads(line))
This produces data structured like this:
train_data[1]
Out[59]:
{'id': 46971,
'img': 'img/46971.png',
'label': 1,
'text': 'text'}
Basically, I would like to convert this data to a dictionary where the key is the "id" and the rest of the data is the value associated with that key. I believe it would look something like the following, but I'm pretty new to Python, so I may be displaying this incorrectly.
print(dict_ex)
{46971: ['img/46971.png', 1, 'text']}

You can create a dictionary and add new elements from the train_data list one by one:
di = dict()
for o in train_data:
    di[o['id']] = [o['img'], o['label'], o['text']]
print(di)
# {46971: ['img/46971.png', 1, 'text'], ...}

dict_ex = {}
for data in train_data:
    # dict[key] = value
    dict_ex[data['id']] = [data['img'], data['label'], data['text']]
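If you'd rather keep the field names instead of relying on list positions, a dict comprehension can map each id to a dict of the remaining fields. This is a minimal sketch; the one-record train_data below is made up to mirror the sample record in the question.

```python
# Hypothetical sample mirroring the record shown in the question.
train_data = [
    {'id': 46971, 'img': 'img/46971.png', 'label': 1, 'text': 'text'},
]

# Map each id to a dict of its remaining fields, keeping the field names.
by_id = {
    record['id']: {k: v for k, v in record.items() if k != 'id'}
    for record in train_data
}
print(by_id)
# {46971: {'img': 'img/46971.png', 'label': 1, 'text': 'text'}}
```

Looking up a field then reads as by_id[46971]['label'] rather than by_id[46971][1].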

Try this:
result = {}
for d in train_data:
    key = d["id"]
    result[key] = []
    for k, v in d.items():
        if k != "id":
            # Collect every non-id value under this record's id
            result[key].append(v)
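Reading and reshaping can also be done in a single pass over the file. The sketch below assumes one JSON object per line, each with an "id" field; the helper name index_jsonl_by_id is hypothetical. It accepts any iterable of lines, so an open file works as well as the in-memory sample used here.

```python
import json

def index_jsonl_by_id(lines):
    """Map each record's 'id' to a list of its remaining values (in key order)."""
    result = {}
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines, e.g. a trailing newline at end of file
            continue
        record = json.loads(line)
        record_id = record.pop("id")  # remove 'id' so only the other values remain
        result[record_id] = list(record.values())
    return result

# In-memory stand-in for the contents of train.jsonl
sample = [
    '{"id": 46971, "img": "img/46971.png", "label": 1, "text": "text"}\n',
    '\n',
]
print(index_jsonl_by_id(sample))  # {46971: ['img/46971.png', 1, 'text']}
```

With the real file this becomes: with open("Documents/data/train.jsonl", encoding='utf-8') as f: di = index_jsonl_by_id(f).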

Related

How to create a number of dictionaries based on a given number in python?

I have an HTML page that handles several forms.
When making a POST request (i.e. submitting the values of the fields of the different forms), I receive this dictionary in Django/Python:
form-0-Name:Dupont
form-0-Town:Paris
form-1-Name:Macron
form-1-Town:Marseille
From this dictionary, how can I create a number of dictionaries based on the number of forms that I receive?
In this example, I would like to create two dictionaries named form_0 and form_1 such that form_0 is {Name: Dupont, Town: Paris} and form_1 is {Name: Macron, Town: Marseille}.
For a dict like this:
a = {'form-0-Name':'Dupont', 'form-0-Town':'Paris', 'form-1-Name':'Macron','form-1-Town':'Marseille'}
you can proceed like this:
final = {}
for k, v in a.items():
    # 'form-0-Name' -> group key 'form_0', field name 'Name'
    key = '_'.join(k.split('-')[:2])
    subkey = k.split('-')[-1]
    final.setdefault(key, {}).update({subkey: v})
Output:
final = {'form_0': {'Name': 'Dupont', 'Town': 'Paris'},
'form_1': {'Name': 'Macron', 'Town': 'Marseille'}}
Make an auxiliary function:
def get_form_dicts(form_dict):
    # Group keys like 'form-<n>-<field>' by their form number n
    groups = {}
    for key, value in form_dict.items():
        _, index, field = key.split('-', 2)
        groups.setdefault(int(index), {})[field] = value
    # Return one dict per form, ordered by form number
    return tuple(groups[i] for i in sorted(groups))
Now, presuming that you know how many forms you are going to get:
# Naming django_dict the dict that is created in your application
form_0, form_1 = get_form_dicts(django_dict)
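If you don't know the number of forms in advance, or the POST data also contains keys that don't follow the form-<n>-<field> pattern (Django formsets, for example, also post management fields such as form-TOTAL_FORMS), returning a list is more robust than unpacking into named variables. This is a sketch; the function name group_form_fields and the regex are assumptions, not part of Django.

```python
import re

# Matches keys shaped like 'form-<number>-<field>'; anything else is skipped.
FORM_KEY = re.compile(r"^form-(\d+)-(.+)$")

def group_form_fields(post_data):
    """Return a list of per-form dicts, ordered by form number."""
    grouped = {}
    for key, value in post_data.items():
        match = FORM_KEY.match(key)
        if match is None:
            continue  # e.g. management keys such as 'form-TOTAL_FORMS'
        index, field = int(match.group(1)), match.group(2)
        grouped.setdefault(index, {})[field] = value
    return [grouped[i] for i in sorted(grouped)]

a = {'form-0-Name': 'Dupont', 'form-0-Town': 'Paris',
     'form-1-Name': 'Macron', 'form-1-Town': 'Marseille',
     'form-TOTAL_FORMS': '2'}
forms = group_form_fields(a)
print(forms)
# [{'Name': 'Dupont', 'Town': 'Paris'}, {'Name': 'Macron', 'Town': 'Marseille'}]
```

forms[0] and forms[1] then play the role of form_0 and form_1, and the same code handles ten forms without changes.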

Add values to an existing dictionary key in Python

I need to append dictionary values to an already existing JSON file. How can I do that?
My details.json file:
{"name": "someName"}
Dictionary generated by my Python script:
list1 = {"name": "someOthername"}
with open("details.json") as r:
    data = json.load(r)
desirableDict = data.append(list1)  # It has to be something like this
print(desirableDict)
Desirable Output: {"name": ["someName", "someOthername"]}
You can check all keys in a for loop and put the values of the JSON file and list1 inside a list like this:
import json

list1 = {"name": "someOthername"}
with open("details.json") as file:
    data = json.load(file)

desirableDict = data.copy()
for key in data:
    if key in list1:
        if isinstance(data[key], list):
            # Already a list: build a new list with the extra value appended
            desirableDict[key] = data[key] + [list1[key]]
        else:
            # Single value: combine both values into a new list
            desirableDict[key] = [data[key], list1[key]]
print(desirableDict)
It seems like you need deep merging of structures. I'd recommend the deepmerge library: https://pypi.org/project/deepmerge/.
Its documentation has many examples similar to what you want to achieve.
from deepmerge import always_merger

base = {"foo": ["bar"]}
other = {"foo": ["baz"]}
expected_result = {'foo': ['bar', 'baz']}
# always_merger appends lists and merges dicts; note that it modifies base in place
result = always_merger.merge(base, other)
assert expected_result == result
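If you'd rather not add a dependency, the behaviour asked for (a colliding scalar becomes a list of both values) is short to write by hand. This is a standard-library sketch; the helper names merge_into_lists and append_to_json_file are hypothetical, and keys present only in the new dict are simply added.

```python
import json

def merge_into_lists(base, extra):
    """Return a new dict where colliding keys hold a list of all values."""
    merged = dict(base)  # shallow copy, so base itself is not modified
    for key, value in extra.items():
        if key not in merged:
            merged[key] = value
        elif isinstance(merged[key], list):
            merged[key] = merged[key] + [value]
        else:
            merged[key] = [merged[key], value]
    return merged

def append_to_json_file(path, extra):
    """Load the JSON object in path, merge extra into it, and write it back."""
    with open(path) as f:
        data = json.load(f)
    merged = merge_into_lists(data, extra)
    with open(path, "w") as f:
        json.dump(merged, f)
    return merged

print(merge_into_lists({"name": "someName"}, {"name": "someOthername"}))
# {'name': ['someName', 'someOthername']}
```

Calling append_to_json_file("details.json", list1) repeatedly keeps extending the list, since an existing list value is extended rather than nested.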

How to write multiple dictionary assignment statements in a single line in Python

I want to write the code below in one line. I have a lot of data, so the code goes on and on, and I want to shrink it. How can I do this? I know it is possible in Python. Any suggestions would be appreciated.
data['url']=url
data['user agent']=userAgent
data['browser']=browser
data['uniqueId']=uniqueId
data['ip']=ip
data['language']=language
and it goes on.
I tried this but it fails.
data['url','user agent','browser'...] = url,useragent,browser....
keys = ("url", "ip", "language")
values = ("http://example.com", "93.184.216.34", "en")
# if you want to update an existing dict:
data = {}
data.update(zip(keys, values))
# if you just want to create a dict:
data = dict(zip(keys, values))
If you want to set all the values at once, you could do something like this:
data = { 'url': url, 'user agent': userAgent, ... }
If data already has... data, you could update it with:
data.update({ 'url': url, 'user agent': userAgent, ... })
You can use dict comprehension:
keys = ['url','user agent','browser']
vals = [url, userAgent, browser]
data = {key:val for key,val in zip(keys,vals)}
You could also use a for loop:
Example:
for key, value in zip(keys, vals):
    data[key] = value
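dict.update also accepts keyword arguments alongside a positional mapping, which can shorten the call when most keys are valid Python identifiers. Keys containing spaces, like 'user agent', still have to go in the mapping. The sample values below are made up for illustration.

```python
# Hypothetical sample values standing in for the variables in the question.
url = "http://example.com"
userAgent = "Mozilla/5.0"
browser = "Firefox"
ip = "93.184.216.34"

data = {}
# Identifier-like keys can be passed as keyword arguments;
# 'user agent' contains a space, so it goes in the positional dict.
data.update({"user agent": userAgent}, url=url, browser=browser, ip=ip)
print(data)
```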

Formatting dicts and nested dicts

Amazon's DynamoDB requires specially formatted JSON when inserting items into the database.
I have a function that takes a dictionary and transforms values into a nested dict formatted for insertion; the value is transformed into a nested dict where the nested key is the value's data type.
For example, input like {'id':1, 'firstName':'joe'} would be transformed to {'id': {'N':1}, 'firstName': {'S':'joe'}}
This currently works with the following function:
type_map = {
    str: 'S', unicode: 'S', dict: 'M',
    float: 'N', int: 'N', bool: 'BOOL'
}

def format_row(self, row):
    """ Accepts a dict, formats for DynamoDB insertion. """
    formatted = {}
    for k, v in row.iteritems():
        type_dict = {}
        type_dict[self.type_map[type(v)]] = v
        formatted[k] = type_dict
    return formatted
I need to modify this function to handle values that might be dicts.
So, for example:
{
    'id': 1,
    'a': {'x': 'hey', 'y': 1},
    'b': {'x': 1}
}
Should be transformed to:
{
    'id': {'N': 1},
    'a': {'M': {'x': {'S': 'hey'}, 'y': {'N': 1}}},
    'b': {'M': {'x': {'N': 1}}}
}
I'm thinking the correct way to do this is to call the function from within itself (recursively), right?
Note: I'm using Python 2.7
What ultimately ended up working for me was the following function:
def format_row(self, row):
    """ Accepts a dict, formats for DynamoDB insertion. """
    formatted = {}
    for k, v in row.iteritems():
        if type(v) == dict:
            # Nested dict: format it recursively and tag it as a map ('M')
            formatted[k] = {'M': self.format_row(v)}
        else:
            formatted[k] = {self.type_map[type(v)]: v}
    return formatted
If anyone has a better way of doing this, please let me know!
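For Python 3, a standalone recursive version might look like the sketch below. Two things differ from the function above: bool is checked before int (bool is a subclass of int, which matters once isinstance is used), and numbers are converted to strings, because DynamoDB's low-level API transmits Number values as strings (e.g. {"N": "1"}). The name to_dynamodb is hypothetical. If you use boto3, its boto3.dynamodb.types.TypeSerializer class performs a similar conversion and may be worth checking before rolling your own.

```python
def to_dynamodb(value):
    """Wrap one Python value in a DynamoDB-style type descriptor."""
    # bool must be tested before int/float: bool is a subclass of int.
    if isinstance(value, bool):
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        # The low-level DynamoDB API sends numbers as strings.
        return {"N": str(value)}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, dict):
        # Recurse into nested dicts and tag the result as a map.
        return {"M": format_row(value)}
    raise TypeError(f"unsupported type: {type(value).__name__}")

def format_row(row):
    """Format every value of a top-level item dict."""
    return {key: to_dynamodb(value) for key, value in row.items()}

item = {'id': 1, 'a': {'x': 'hey', 'y': 1}, 'b': {'x': 1}}
print(format_row(item))
# {'id': {'N': '1'}, 'a': {'M': {'x': {'S': 'hey'}, 'y': {'N': '1'}}}, 'b': {'M': {'x': {'N': '1'}}}}
```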

removing json items from array if value is duplicate python

I am incredibly new to python.
I have an array full of JSON objects, some of which contain duplicate values. The array looks like this:
[{"id":"1","name":"Paul","age":"21"},
{"id":"2","name":"Peter","age":"22"},
{"id":"3","name":"Paul","age":"23"}]
What I am trying to do is remove an item if its name is the same as that of an earlier JSON object, keeping the first one in the array.
So in this case I should be left with:
[{"id":"1"."name":"Paul","age":"21"},
{"id":"2","name":"Peter","age":"22"}]
The code I currently have can be seen below and is largely based on this answer:
import json
ds = json.loads('python.json') #this file contains the json
unique_stuff = { each['name'] : each for each in ds }.values()
all_ids = [ each['name'] for each in ds ]
unique_stuff = [ ds[ all_ids.index(text) ] for text in set(texts) ]
print unique_stuff
I am not even sure that the line ds = json.loads('python.json') (this file contains the JSON) is working, because when I try to print ds nothing shows up in the console.
You may be overcomplicating your approach. I would rewrite the list as a dictionary with "name" as the key and then fetch the values:
ds = [{"id":"1","name":"Paul","age":"21"},
{"id":"2","name":"Peter","age":"22"},
{"id":"3","name":"Paul","age":"23"}]
{elem["name"]:elem for elem in ds}.values()
Out[2]:
[{'age': '23', 'id': '3', 'name': 'Paul'},
{'age': '22', 'id': '2', 'name': 'Peter'}]
Of course, the items within the dictionary and the list may not be ordered, but I don't see that as much of a concern. If it is, let us know and we can think it over.
If you need to keep the first instance of "Paul" in your data, a dictionary comprehension gives you the opposite result, because later duplicates overwrite earlier ones.
A simple solution could be the following:
new = []
seen = set()
for record in old:  # old is your original list of records
    name = record['name']
    if name not in seen:
        seen.add(name)
        new.append(record)
del seen
First of all, your JSON snippet has an invalid format: some keys are separated by a dot instead of a comma.
You can solve your problem using a dictionary with names as keys:
import json

with open('python.json') as fp:
    ds = json.load(fp)  # json.load reads from a file object; json.loads parses a string

mem = {}
for record in ds:
    name = record["name"]
    # Only keep the first record seen for each name
    if name not in mem:
        mem[name] = record
print(list(mem.values()))
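The same "keep the first occurrence" idea can be wrapped in a reusable helper that takes the field name as a parameter. This is a sketch; dedupe_by is a hypothetical name, and the sample is the question's data with the dot typo corrected, parsed with json.loads because it is a string rather than a file.

```python
import json

def dedupe_by(records, field):
    """Keep the first record for each value of `field`, preserving order."""
    seen = set()
    unique = []
    for record in records:
        value = record[field]
        if value not in seen:
            seen.add(value)
            unique.append(record)
    return unique

raw = '''[{"id":"1","name":"Paul","age":"21"},
{"id":"2","name":"Peter","age":"22"},
{"id":"3","name":"Paul","age":"23"}]'''
ds = json.loads(raw)  # json.loads parses a string; use json.load(fp) for a file
print(dedupe_by(ds, "name"))
# [{'id': '1', 'name': 'Paul', 'age': '21'}, {'id': '2', 'name': 'Peter', 'age': '22'}]
```

Using a set for the seen values keeps each membership check constant-time on average, so the whole pass is linear in the number of records.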
