Parsing a jsonl file into a useful structure

I am importing a jsonl file from my hard drive and trying to get it into a usable format. Here is how I'm importing the data.
import json

train_data = []
with open("Documents/data/train.jsonl", 'r', encoding='utf-8') as j:
    for line in j:
        train_data.append(json.loads(line))
Which produces data structured like this.
train_data[1]
Out[59]:
{'id': 46971,
'img': 'img/46971.png',
'label': 1,
'text': 'text'}
Basically, I would like to convert this data to a dictionary where the key is the "id" and the rest of the data is the value associated with that key. I believe it would look something like the following, but I'm pretty new to Python, so I may be displaying this incorrectly.
print(dict_ex)
{46971: ['img/46971.png', 1, 'text']}

You can create a dictionary and add new elements from train_data list one by one:
di = dict()
for o in train_data:
    di[o['id']] = [o['img'], o['label'], o['text']]
print(di)
>>> {46971: ['img/46971.png', 1, 'text']}

dict_ex = {}
for data in train_data:
    # dict[key] = value
    dict_ex[data['id']] = [data['img'], data['label'], data['text']]

Try this,
result = {}
for d in train_data:
    # Key on the record's id and collect every other value in a list.
    result[d["id"]] = [v for k, v in d.items() if k != "id"]
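
The loop-and-assign idea from the answers above can also be folded into the file-reading step. This is only a minimal sketch, assuming every line holds one JSON object with the id, img, label and text keys shown in the question. The helper name load_jsonl_by_id and the second sample record are made up for illustration, and io.StringIO stands in for the real file.

```python
import io
import json

def load_jsonl_by_id(lines):
    """Build {id: [img, label, text]} from an iterable of JSONL lines."""
    by_id = {}
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines, which json.loads would reject
            continue
        record = json.loads(line)
        by_id[record["id"]] = [record["img"], record["label"], record["text"]]
    return by_id

# Stand-in for open("Documents/data/train.jsonl", encoding="utf-8");
# the second record is invented sample data.
sample = io.StringIO(
    '{"id": 46971, "img": "img/46971.png", "label": 1, "text": "text"}\n'
    '{"id": 42953, "img": "img/42953.png", "label": 0, "text": "other"}\n'
)
train_dict = load_jsonl_by_id(sample)
print(train_dict[46971])
```

If two lines share an id, the later one overwrites the earlier one, so this shape only fits data where ids are unique.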

Related

How to create a number of dictionaries based on a given number in python?

I have an HTML page that handles several forms.
When making a POST request (i.e. submitting the values of the fields of the different forms), I receive this dictionary in Django/Python:
form-0-Name:Dupont
form-0-Town:Paris
form-1-Name:Macron
form-1-Town:Marseille
From this dictionary, how can I create a number of dictionaries based on the number of forms that I receive?
In this example, I would like to create two dictionaries named form_0 and form_1, such that form_0 is
{Name: Dupont, Town: Paris} and form_1 is {Name: Macron, Town: Marseille}.

For a dict like this:
a = {'form-0-Name':'Dupont', 'form-0-Town':'Paris', 'form-1-Name':'Macron','form-1-Town':'Marseille'}
you can proceed like this:
final = {}
for k, v in a.items():
    key = '_'.join(k.split('-')[:2])   # 'form-0-Name' -> 'form_0'
    subkey = k.split('-')[-1]          # 'form-0-Name' -> 'Name'
    final.setdefault(key, {}).update({subkey: v})
Output:
final = {'form_0': {'Name': 'Dupont', 'Town': 'Paris'},
         'form_1': {'Name': 'Macron', 'Town': 'Marseille'}}

Make an auxiliary function:
def get_form_dicts(form_dict):
    dictionaries = []
    # Distinct 'form-N' prefixes, sorted so the result order is deterministic.
    form_strings = sorted(set(x.rsplit('-', 1)[0] for x in form_dict.keys()))
    for i in form_strings:
        dictionaries.append({x.split('-')[2]: v for x, v in form_dict.items() if x.startswith(i + '-')})
    return tuple(dictionaries)
Now, presuming that you know how many forms you are going to get:
# Naming django_dict the dict that is created in your application
form_0, form_1 = get_form_dicts(django_dict)
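
When the number of forms is not known in advance, one option is to keep the per-form dictionaries inside a single container instead of unpacking them into separate variables. The following is a rough sketch, assuming the keys always follow the form-<number>-<field> pattern from the question. The function name group_forms is hypothetical.

```python
import re
from collections import defaultdict

def group_forms(flat):
    """Group 'form-<n>-<field>' keys into {n: {field: value}}."""
    pattern = re.compile(r"^form-(\d+)-(.+)$")
    forms = defaultdict(dict)
    for key, value in flat.items():
        match = pattern.match(key)
        if match is None:  # ignore keys that are not form fields
            continue
        index, field = int(match.group(1)), match.group(2)
        forms[index][field] = value
    return dict(forms)

posted = {
    'form-0-Name': 'Dupont', 'form-0-Town': 'Paris',
    'form-1-Name': 'Macron', 'form-1-Town': 'Marseille',
}
forms = group_forms(posted)
print(forms[0])
print(forms[1])
```

Keying on the integer index means form 10 sorts after form 2, which a string prefix such as 'form-10' would not.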

Add values to an existing dictionary key in Python

I need to append dictionary values to an already existing JSON file. How can I do that?
My details.json file:
{"name": "someName"}
Dictionary generated by my Python script:
list1 = {"name": "someOthername"}
with open("details.json") as r:
    data = json.load(r)
    desirableDict = data.append(list1) # It has to be something like this
    print(desirableDict)
Desirable output: {"name": ["someName", "someOthername"]}

You can check all keys in a for loop and put the values from the JSON file and from list1 into a list, like this:
import json
list1 = {"name": "someOthername"}
with open("details.json") as file:
    data = json.load(file)
desirableDict = data.copy()
for key in data:
    if key in list1:
        if isinstance(data[key], list):
            # Already a list: build a new list with the extra value appended.
            desirableDict[key] = data[key] + [list1[key]]
        else:
            # Single value: wrap the old and new values in a list.
            desirableDict[key] = [data[key], list1[key]]
print(desirableDict)

It seems like you need deep merging of structures. I would recommend the deepmerge library: https://pypi.org/project/deepmerge/.
Its documentation has many examples close to what you want to achieve.
from deepmerge import always_merger
base = {"foo": ["bar"]}
nxt = {"foo": ["baz"]}  # renamed from "next" to avoid shadowing the built-in
expected_result = {'foo': ['bar', 'baz']}
result = always_merger.merge(base, nxt)
assert expected_result == result
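
For the exact shape asked for in the question (two plain values under the same key becoming a list), a small standard-library helper may be enough. This is a sketch under that assumption. The name append_values is made up, and json.loads on a literal string stands in for reading details.json.

```python
import json

def append_values(existing, new):
    """Return a copy of `existing`; keys shared with `new` hold a list of both values."""
    merged = dict(existing)
    for key, value in new.items():
        if key not in merged:
            merged[key] = value
        elif isinstance(merged[key], list):
            merged[key] = merged[key] + [value]  # new list, input stays untouched
        else:
            merged[key] = [merged[key], value]
    return merged

data = json.loads('{"name": "someName"}')  # stand-in for json.load() on details.json
result = append_values(data, {"name": "someOthername"})
print(json.dumps(result))
```

To persist the result, the file would have to be reopened in write mode and the merged dictionary written with json.dump, since JSON files cannot be appended to in place.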

how to write multiple line statement to single line python dictionary

I want to write the code below in one line. I have a lot of data, so the assignments go on for pages and I want to shrink them. How can I do this? I know it is possible in Python. Please help me with some solutions.
data['url']=url
data['user agent']=userAgent
data['browser']=browser
data['uniqueId']=uniqueId
data['ip']=ip
data['language']=language
and so on.
I tried this, but it fails:
data['url','user agent','browser'...] = url,useragent,browser....

keys = ("url", "ip", "language")
values = ("http://example.com", "93.184.216.34", "en")
# if you want to update an existing dict:
data = {}
data.update(zip(keys, values))
# if you just want to create a dict:
data = dict(zip(keys, values))

If you want to set all the values at once, you could do something like this:
data = { 'url': url, 'user agent': userAgent, ... }
If data already has... data, you could update it with:
data.update({ 'url': url, 'user agent': userAgent, ... })

You can use dict comprehension:
keys = ['url','user agent','browser']
vals = [url, userAgent, browser]
data = {key:val for key,val in zip(keys,vals)}

You could use a for loop:
Example:
for key, value in Data.items():
    finalData[key] = value
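
One caveat applies to the zip-based answers above: zip() stops at the shorter input, so a missing value is dropped silently rather than raising an error. A small sketch with an explicit length check follows. The user agent and browser strings are placeholder values, not taken from the question.

```python
keys = ("url", "user agent", "browser")
values = ("http://example.com", "ExampleAgent/1.0", "ExampleBrowser")  # placeholders

# zip() truncates to the shorter sequence, so check the lengths up front.
if len(keys) != len(values):
    raise ValueError("keys and values must have the same length")

data = {"ip": "93.184.216.34"}   # an already populated dict
data.update(zip(keys, values))    # adds the three pairs and keeps "ip"
print(data)
```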

Formatting dicts and nested dicts

Amazon's DynamoDB requires specially formatted JSON when inserting items into the database.
I have a function that takes a dictionary and transforms values into a nested dict formatted for insertion; the value is transformed into a nested dict where the nested key is the value's data type.
For example, input like {'id':1, 'firstName':'joe'} would be transformed to {'id': {'N':1}, 'firstName': {'S':'joe'}}
This is currently successful with this function:
type_map = {
    str:'S', unicode:'S', dict:'M',
    float:'N', int:'N', bool:'BOOL'
}
def format_row(self, row):
    """ Accepts a dict, formats for DynamoDB insertion. """
    formatted = {}
    for k,v in row.iteritems():
        type_dict = {}
        type_dict[ self.type_map[type(v)] ] = v
        formatted[k] = type_dict
    return formatted
I need to modify this function to handle values that might be dicts.
So, for example:
{
    'id':1,
    'a':{'x':'hey', 'y':1},
    'b':{'x':1}
}
Should be transformed to:
{
    'id': {'N':1},
    'a':{'M': {'x': {'S':'hey'}, 'y':{'N':1}}},
    'b': {'M': {'x': {'N':1}}}
}
I'm thinking the correct way to do this is to call the function from within itself, right?
Note: I'm using Python 2.7

What ultimately ended up working for me was the following function:
def format_row(self, row):
    """ Accepts a dict, formats for DynamoDB insertion. """
    formatted = {}
    for k,v in row.iteritems():
        if type(v) == dict:
            v = self.format_row(v)
            type_dict = {}
            type_dict['M'] = v
            formatted[k] = type_dict
        else:
            type_dict = {}
            type_dict[ self.type_map[type(v)] ] = v
            formatted[k] = type_dict
    return formatted
If anyone has a better way of doing this, please let me know!
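
For readers on Python 3, the same recursion can be written more compactly. This is only a sketch that mirrors the output shape shown in the question. It is written as a standalone function with a module-level TYPE_MAP rather than a method, which is an assumption about how it would be used, and it does not cover lists, sets or None.

```python
# Python 3 has no `unicode` type and uses items() instead of iteritems().
TYPE_MAP = {str: "S", float: "N", int: "N", bool: "BOOL"}

def format_row(row):
    """Accept a dict and wrap each value as {type_code: value}, recursing into dicts."""
    formatted = {}
    for key, value in row.items():
        if isinstance(value, dict):
            formatted[key] = {"M": format_row(value)}
        else:
            # Exact type lookup, so True maps to 'BOOL' rather than 'N'.
            formatted[key] = {TYPE_MAP[type(value)]: value}
    return formatted

row = {'id': 1, 'a': {'x': 'hey', 'y': 1}, 'b': {'x': 1}}
print(format_row(row))
```

A value of an unmapped type raises KeyError, which at least makes unsupported input visible instead of passing it through silently.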

removing json items from array if value is duplicate python

I am incredibly new to python.
I have an array full of json objects. Some of the json objects contain duplicated values. The array looks like this:
[{"id":"1","name":"Paul","age":"21"},
{"id":"2","name":"Peter","age":"22"},
{"id":"3","name":"Paul","age":"23"}]
What I am trying to do is to remove an item if the name is the same as another json object, and leave the first one in the array.
So in this case I should be left with
[{"id":"1"."name":"Paul","age":"21"},
{"id":"2","name":"Peter","age":"22"}]
The code I currently have can be seen below and is largely based on this answer:
import json
ds = json.loads('python.json') #this file contains the json
unique_stuff = { each['name'] : each for each in ds }.values()
all_ids = [ each['name'] for each in ds ]
unique_stuff = [ ds[ all_ids.index(text) ] for text in set(texts) ]
print unique_stuff
I am not even sure that the line ds = json.loads('python.json') is working, because when I try to print ds nothing shows up in the console.

You may have overcomplicated your approach. I would rewrite the list as a dictionary with "name" as the key and then fetch the values:
ds = [{"id":"1","name":"Paul","age":"21"},
{"id":"2","name":"Peter","age":"22"},
{"id":"3","name":"Paul","age":"23"}]
{elem["name"]:elem for elem in ds}.values()
Out[2]:
[{'age': '23', 'id': '3', 'name': 'Paul'},
{'age': '22', 'id': '2', 'name': 'Peter'}]
Of course, the items within the dictionary and the resulting list may not be ordered, but I do not see that as much of a concern. If it is, let us know and we can think it over.

If you need to keep the first instance of "Paul" in your data, a dictionary comprehension gives you the opposite result, because later records overwrite earlier ones.
A simple solution could be the following, where old is the original list:
new = []
seen = set()
for record in old:
    name = record['name']
    if name not in seen:
        seen.add(name)
        new.append(record)
del seen

First of all, your JSON snippet is not valid: there is a dot instead of a comma separating some keys.
Also, json.loads parses a JSON string, not a file name; to read from a file, open it and use json.load.
You can solve your problem using a dictionary with names as keys:
import json
with open('python.json') as fp:
    ds = json.load(fp) #this file contains the json
mem = {}
for record in ds:
    name = record["name"]
    if name not in mem:
        mem[name] = record
print mem.values()
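
The answers above use Python 2 syntax. A Python 3 version of the keep-the-first-record idea might look like the sketch below. It relies on dictionaries preserving insertion order (guaranteed since Python 3.7), and the helper name dedupe_by is hypothetical.

```python
def dedupe_by(records, key):
    """Keep the first record seen for each distinct value of record[key]."""
    first_seen = {}
    for record in records:
        # setdefault only stores the record if the value is not already a key
        first_seen.setdefault(record[key], record)
    return list(first_seen.values())

ds = [{"id": "1", "name": "Paul", "age": "21"},
      {"id": "2", "name": "Peter", "age": "22"},
      {"id": "3", "name": "Paul", "age": "23"}]
unique = dedupe_by(ds, "name")
print(unique)
```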
