Convert multiline "tabular" string to dictionary - python

I have a string that looks something like this:
name1 pass blue n/a
name-6t56-yt6 fail red n/a
name-45 pass blue n/a
name-6t567-yt6 fail red n/a
I want to extract data from the first 2 columns and would ideally store it in a dictionary in the following manner:
[{'type': 'name1', 'status': 'pass'}, {'type': 'name-6t56-yt6', 'status': 'fail'}, {'type': 'name-45', 'status': 'pass'}, {'type': 'name-6t567-yt6', 'status': 'fail'}]
Any ideas of how to approach this?
Note that this is a multi-line string (in UTF-8 format).

Assuming you want a list:
Setup:
>>> s = '''name1 pass blue n/a
... name-6t56-yt6 fail red n/a
... name-45 pass blue n/a
... name-6t567-yt6 fail red n/a'''
Construct result:
>>> [dict(zip(('type', 'status'), line.split(maxsplit=2)[:2])) for line in s.splitlines()]
[{'type': 'name1', 'status': 'pass'}, {'type': 'name-6t56-yt6', 'status': 'fail'}, {'type': 'name-45', 'status': 'pass'}, {'type': 'name-6t567-yt6', 'status': 'fail'}]
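If the input can contain blank lines or rows with fewer than two columns, the comprehension above would yield empty or incomplete dictionaries for those lines. A minimal sketch of a more defensive variant follows; the function name parse_rows and the decision to skip short lines are assumptions made for this example, not part of the original answer:

```python
def parse_rows(text, keys=("type", "status")):
    """Return one dict per usable line, built from the leading columns."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < len(keys):
            continue  # assumption: blank or short lines are simply skipped
        # zip stops at the shorter sequence, so extra columns are ignored
        rows.append(dict(zip(keys, fields)))
    return rows


s = """name1 pass blue n/a

name-45 pass blue n/a
broken"""
print(parse_rows(s))
```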

Your code uses a set of dictionaries, which is not the best idea (dictionaries are not hashable, so they cannot be members of a set). Here I am using a list of dictionaries:
s = """name1 pass blue n/a
name-6t56-yt6 fail red n/a
name-45 pass blue n/a
name-6t567-yt6 fail red n/a"""
d = []
for line in s.split('\n'):
    type, status = line.split()[0:2]
    d.append({'type': type, 'status': status})
Content of d:
[{'type': 'name1', 'status': 'pass'},
{'type': 'name-6t56-yt6', 'status': 'fail'},
{'type': 'name-45', 'status': 'pass'},
{'type': 'name-6t567-yt6', 'status': 'fail'}]

from pprint import pprint
with open('file.txt') as f:
    data = f.readlines()
result = []
for line in data:
    result.append({
        'type': line[0:line.index(' ')],
        'status': 'pass' if 'pass' in line else 'fail'
    })
pprint(result)
# [{'status': 'pass', 'type': 'name1'},
# {'status': 'fail', 'type': 'name-6t56-yt6'},
# {'status': 'pass', 'type': 'name-45'},
# {'status': 'fail', 'type': 'name-6t567-yt6'}]
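Note that the `'pass' in line` check above matches the word anywhere in the line, so a row whose name contains "pass" would be reported as passing even when its status column says fail. A sketch that reads the status from the second column instead; io.StringIO stands in for the file here, and the sample contents and the name read_statuses are made up for illustration:

```python
import io


def read_statuses(handle):
    """Build the list of dicts from the first two whitespace-separated columns."""
    result = []
    for line in handle:
        fields = line.split()
        if len(fields) >= 2:  # ignore blank or one-column lines
            result.append({"type": fields[0], "status": fields[1]})
    return result


fake_file = io.StringIO("name1 pass blue n/a\nbypass-1 fail red n/a\n")
print(read_statuses(fake_file))
```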

With the text input defined as a multiline string, text, you can read its content into the desired structure (a list of dictionaries) like this:
from pprint import pprint as pp
text = """name1 pass blue n/a
name-6t56-yt6 fail red n/a
name-45 pass blue n/a
name-6t567-yt6 fail red n/a"""
d = []
for line in text.split("\n"):
    type, status = line.split()[0:2]
    d.append({"type": type, "status": status})
pp(d)
Which will output:
[{'status': 'pass', 'type': 'name1'},
 {'status': 'fail', 'type': 'name-6t56-yt6'},
 {'status': 'pass', 'type': 'name-45'},
 {'status': 'fail', 'type': 'name-6t567-yt6'}]

Related

Convert an uneven nested dictionary into tabular form

res = {'Head': {'Ide': 'GLE', 'ID': '7b', 'Source': 'CARS', 'Target': 'TULUM', 'Country': 'GL'},
'Load': {'Stat': {'Code': '21', 'Reason': 'invalid'}, 'SrcFilePath': '/path.xls'}}
res is the nested dictionary that needs to be converted into a tabular form.
With the following columns and respective values:
Ide ID Source Target Country Code Reason SrcFilePath
Code:
for col, data in res.items():
    final_data = dict(data.items())
    df = pd.DataFrame(final_data)
    print(df)
Error:
ValueError: If using all scalar values, you must pass an index
You can try:
pd.DataFrame.from_dict(res, orient='index')
You could try using:
pd.json_normalize(res)
The column names come out dotted (for example Head.Ide and Load.Stat.Code), which can look a bit ugly, but it works.
I assume that res isn't the only record and there's data like:
data = [
{'Head': {'Ide': 'GLE', 'ID': '7b', 'Source': 'CARS', 'Target': 'TULUM', 'Country': 'GL'}, 'Load': {'Stat': {'Code': '21', 'Reason': 'invalid'}, 'SrcFilePath': '/path.xls'}}
, {'Head': {'Ide': 'ABC', 'ID': '8b', 'Source': 'CARS', 'Target': 'TULUM', 'Country': 'AB'}, 'Load': {'Stat': {'Code': '21', 'Reason': 'invalid'}, 'SrcFilePath': '/path.xls'}}
, {'Head': {'Ide': 'EFG', 'ID': '9b', 'Source': 'CARS', 'Target': 'TULUM', 'Country': 'EF'}, 'Load': {'Stat': {'Code': '21', 'Reason': 'invalid'}, 'SrcFilePath': '/path.xls'}}
]
So we have to write a procedure to flatten records and apply it by map to the data before transforming records into a frame:
def flatten_dict(d: dict) -> dict:
    res = {}
    for k, v in d.items():
        if type(v) is dict:
            res.update(flatten_dict(v))
        else:
            res[k] = v
    return res
output = pd.DataFrame(map(flatten_dict, data))
The output:
   Ide  ID Source Target Country Code   Reason SrcFilePath
0  GLE  7b   CARS  TULUM      GL   21  invalid   /path.xls
1  ABC  8b   CARS  TULUM      AB   21  invalid   /path.xls
2  EFG  9b   CARS  TULUM      EF   21  invalid   /path.xls
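One caveat: flatten_dict keeps only the innermost key, so two branches that share a key name would overwrite each other. A sketch of a variant that joins the full path into the key; the name flatten_with_path and the "." separator are choices made for this example, and the sample record is shortened:

```python
def flatten_with_path(d, parent="", sep="."):
    """Flatten nested dicts, using the joined key path as the new key."""
    flat = {}
    for key, value in d.items():
        path = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten_with_path(value, path, sep))
        else:
            flat[path] = value
    return flat


res = {'Head': {'Ide': 'GLE', 'ID': '7b'},
       'Load': {'Stat': {'Code': '21', 'Reason': 'invalid'}, 'SrcFilePath': '/path.xls'}}
print(flatten_with_path(res))
```

The flattened records can be passed to pd.DataFrame in the same way as before; the columns are then named by path rather than by leaf key.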

How to convert json into a pandas dataframe?

I'm trying to convert an API response from JSON to a dataframe in pandas. The problem I am having is that the data is nested in the JSON and I am not getting the right columns in my dataframe.
The data is collected from an API in the following format:
{'tickets': [{'url': 'https...',
'id': 1,
'external_id': None,
'via': {'channel': 'web',
'source': {'from': {}, 'to': {}, 'rel': None}},
'created_at': '2020-05-01T04:16:33Z',
'updated_at': '2020-05-23T03:02:49Z',
'type': 'incident',
'subject': 'Subject',
'raw_subject': 'Raw subject',
'description': 'Hi, this is the description',
'priority': 'normal',
'status': 'closed',
'recipient': None,
'requester_id': 409467360874,
'submitter_id': 409126461453,
'assignee_id': 409126461453,
'organization_id': None,
'group_id': 360009916453,
'collaborator_ids': [],
'follower_ids': [],
'email_cc_ids': [],
'forum_topic_id': None,
'problem_id': None,
'has_incidents': False,
'is_public': True,
'due_at': None,
'tags': ['tag_1',
'tag_2',
'tag_3',
'tag_4'],
'custom_fields': [{'id': 360042034433, 'value': 'value of the first custom field'},
{'id': 360041487874, 'value': 'value of the second custom field'},
{'id': 360041489414, 'value': 'value of the third custom field'},
{'id': 360040980053, 'value': 'correo_electrónico'},
{'id': 360040980373, 'value': 'suscribe_newsletter'},
{'id': 360042046173, 'value': None},
{'id': 360041028574, 'value': 'product'},
{'id': 360042103034, 'value': None}],
'satisfaction_rating': {'score': 'unoffered'},
'sharing_agreement_ids': [],
'comment_count': 2,
'fields': [{'id': 360042034433, 'value': 'value of the first custom field'},
{'id': 360041487874, 'value': 'value of the second custom field'},
{'id': 360041489414, 'value': 'value of the third custom field'},
{'id': 360040980053, 'value': 'correo_electrónico'},
{'id': 360040980373, 'value': 'suscribe_newsletter'},
{'id': 360042046173, 'value': None},
{'id': 360041028574, 'value': 'product'},
{'id': 360042103034, 'value': None}],
'followup_ids': [],
'ticket_form_id': 360003608013,
'deleted_ticket_form_id': 360003608013,
'brand_id': 360004571673,
'satisfaction_probability': None,
'allow_channelback': False,
'allow_attachments': True},
What I have already tried is the following: I converted the JSON into a dict as follows:
x = response.json()
df = pd.DataFrame(x['tickets'])
But I'm struggling with the output. I don't know how to get a correct, ordered, normalized dataframe.
(I'm new in this :) )
Let's suppose you get your request data with r = requests.get(url, auth=auth)
The structure of your data isn't clear yet, so let's build a dataframe from it: data = pd.read_json(json.dumps(r.json(), ensure_ascii=False))
But you will probably get a dataframe with a single row.
When I faced a problem like this, I wrote this function to get the full data:
listParam = []
def listDict(entry):
    if type(entry) is dict:
        listParam.append(entry)
    elif type(entry) is list:
        for ent in entry:
            listDict(ent)
Because your data is wrapped in a dict ({'tickets': ...}), you will need to get the information like this:
listDict(data.iloc[0][0])
And then,
pd.DataFrame(listParam)
I can't show the results because you didn't post the complete data or say where I can find data to test with, but this will probably work.
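The function above collects into a module-level list, so calling it twice accumulates the results of both calls. A sketch of the same traversal written as a generator, which keeps no global state (the name iter_dicts and the sample data are hypothetical):

```python
def iter_dicts(entry):
    """Yield every dict found at the top level or inside nested lists."""
    if isinstance(entry, dict):
        yield entry
    elif isinstance(entry, list):
        for item in entry:
            yield from iter_dicts(item)
    # anything else (strings, numbers, None) is ignored, as in listDict


nested = [{'id': 1}, [{'id': 2}, [{'id': 3}]], 'ignored']
print(list(iter_dicts(nested)))
```

The resulting list can be handed to pd.DataFrame just like listParam.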
You have to convert the json to dictionary first and then convert the dictionary value for key 'tickets' into dataframe.
with open('file.json') as f:
    ticketDictionary = json.load(f)
df = pd.DataFrame(ticketDictionary['tickets'])
'file.json' contains your data here.
df now contains your dataFrame in this format.
For the lists within the response you can have separate dataframes if required:
for field in df['fields']:
    fields_df = pd.DataFrame(field)
For each ticket, fields_df will then look like this:
   id            value
0  360042034433  value of the first custom field
1  360041487874  value of the second custom field
2  360041489414  value of the third custom field
3  360040980053  correo_electrónico
4  360040980373  suscribe_newsletter
5  360042046173  None
6  360041028574  product
7  360042103034  None
This can be one way to structure as you haven't mentioned the exact expected format.
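If one row per ticket is preferred over a separate frame per ticket, each 'fields' list can be turned into a flat dict keyed by field id first. A sketch in plain Python; the helper name fields_to_columns, the field_ column prefix and the shortened sample ticket are assumptions for this example:

```python
def fields_to_columns(ticket, prefix="field_"):
    """Return a flat row: the ticket id plus one entry per custom field."""
    row = {"id": ticket["id"]}
    for field in ticket.get("fields", []):
        row[f"{prefix}{field['id']}"] = field["value"]
    return row


tickets = [
    {"id": 1, "fields": [{"id": 360041028574, "value": "product"},
                         {"id": 360042103034, "value": None}]},
]
rows = [fields_to_columns(t) for t in tickets]
print(rows)
```

The rows list can then be passed to pd.DataFrame to get one column per custom field.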

Python: Iterating over list of dictionaries and output to a list

I am looping over the json object and output of the object is in the form of lists of dictionaries.
[{'Status': 'active', 'id': '0f1fb86da9c7ee380'}]
[{'Status': 'active', 'id': '0d6b330e4960c3382'}, {'Status': 'active', 'id': '033cfb634e595ccfa'}]
[{'Status': 'active', 'id': '0457f623cbb9f7c95'}]
[{'Status': 'active', 'id': '01b69eb6a3048f749'}, {'Status': 'active', 'id': '0f7ce44a9a5fc82f5'}, {'Status': 'active', 'id': '05417e161acf3ec5d'}]
[{'Status': 'active', 'id': '033cfb634e595ccfa'}, {'Status': 'active', 'id': '01eab32f9808acf19'}]
I have tried something like this so far but it prints in the form of string. I tried using list and appending it but it gives me a weird output. If I don't use list then it gives me the list of all in the form of strings.
Current output:
0f1fb86da9c7ee380
0d6b330e4960c3382
033cfb634e595ccfa
0457f623cbb9f7c95
01b69eb6a3048f749
0f7ce44a9a5fc82f5
05417e161acf3ec5d
0f373f123dc8221de
05417e161acf3ec5d
My code:
for i in data['DBI']:
    t = 0
    while t < len(i['Groups']):
        print(i['Groups'][t]['id'])
        t += 1
Expected output: I am looking for the output in the form of lists like this: ['0f1fb86da9c7ee380'] ['0d6b330e4960c3382','033cfb634e595ccfa'] ['0457f623cbb9f7c95'] ['01b69eb6a3048f749','0f7ce44a9a5fc82f5','05417e161acf3ec5d'] ['033cfb634e595ccfa','01eab32f9808acf19']
What you want is something like this:
data = [
[{'Status': 'active', 'id': '0f1fb86da9c7ee380'}],
[{'Status': 'active', 'id': '0d6b330e4960c3382'}, {'Status': 'active', 'id': '033cfb634e595ccfa'}],
[{'Status': 'active', 'id': '0457f623cbb9f7c95'}],
[{'Status': 'active', 'id': '01b69eb6a3048f749'}, {'Status': 'active', 'id': '0f7ce44a9a5fc82f5'}, {'Status': 'active', 'id': '05417e161acf3ec5d'}],
[{'Status': 'active', 'id': '033cfb634e595ccfa'}, {'Status': 'active', 'id': '01eab32f9808acf19'}],
]
new_data = []
for l in data:
    current_ids = []
    for d in l:
        current_ids.append(d["id"])
    new_data.append(current_ids)
new_data
output:
[['0f1fb86da9c7ee380'],
['0d6b330e4960c3382', '033cfb634e595ccfa'],
['0457f623cbb9f7c95'],
['01b69eb6a3048f749', '0f7ce44a9a5fc82f5', '05417e161acf3ec5d'],
['033cfb634e595ccfa', '01eab32f9808acf19']]
You can also use list comprehension to achieve this.
output = [[item["id"] for item in items] for items in data]
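Applied to the structure in the question, where each entry of data['DBI'] appears to hold a 'Groups' list (an assumption based on the loop in the question), the same comprehension might look like the sketch below. The sample data is shortened:

```python
data = {
    "DBI": [
        {"Groups": [{"Status": "active", "id": "0f1fb86da9c7ee380"}]},
        {"Groups": [{"Status": "active", "id": "0d6b330e4960c3382"},
                    {"Status": "active", "id": "033cfb634e595ccfa"}]},
    ]
}

# One inner list of ids per entry in data["DBI"]
ids_per_entry = [[group["id"] for group in entry["Groups"]] for entry in data["DBI"]]
for ids in ids_per_entry:
    print(ids)
```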
data = json.loads(v_string)
for id in data['DBI']:
    datalist = id['Groups']
    i = 0
    data_list = []  # <-- This was outside the main for loop above.
    while i < len(datalist):
        for dic in datalist[i]:
            # keep every value except the 'active' status string
            if 'active' not in datalist[i][dic]:
                data_list.append(datalist[i][dic])
        i += 1
    print(data_list)  # one list of ids per entry

Iterate through a json list and get specific key and value with Python

I have a JSON list like this (it was a JSON response, the below is after i did json.loads)
[{'status': 'ok', 'slot': None, 'name': 'blah', 'index': 0, 'identify': 'off',
'details': None, 'speed': None, 'temperature': None}, {'status': 'ok', 'slot':
None, 'name': 'blah0', 'index': 0, 'identify': 'off', 'details': None,
'speed': None, 'temperature': None}, {'status': 'ok', 'slot': None, 'name':
'blah1', 'index': 1, 'identify': 'off', 'details': None, 'speed': None,
'temperature': None}, {'status': 'ok', 'slot': None, 'name': 'blah2',
'index': 2, 'identify': 'off', 'details': None, 'speed': None, 'temperature':
None}, {'status': 'ok', 'slot': None, 'name': 'blah3', 'index': 3,
'identify': 'off', 'details': None, 'speed': None, 'temperature': None}]
I want to get both name and the status of the list, if name='blah' or 'blah0' or 'blah1' or 'blah2' or 'blah3'
Essentially, for all the matches i want to store all the name and status in separate variables to use it elsewhere. (can be dynamically creating variables or statically assigning them will also work for me)
I tried this, but doesn't seems to work the way i want.
for value in data:
    if value['name'] in ['blah', 'blah0', 'blah1', 'blah2', 'blah3']:
        print(value['name'], value['status'])
This prints out the name and status as a string one line below the other. But i want each name and status to be assigned to a variable so i can use it later. Any help is much appreciated!
EDITED
Try something like:
new_data = []
# Extract all the data and map them by name and status
for value in data:
    name = value.get("name")
    status = value.get("status")
    if name in ['blah', 'blah0', 'blah1', 'blah2', 'blah3']:
        new_data.append(dict(
            name=name,
            status=status))
Option 1
# loop through the new data
for data in new_data:
    print(data)
# OUTPUT:
{'name': 'blah', 'status': 'ok'}
{'name': 'blah0', 'status': 'ok'}
{'name': 'blah1', 'status': 'ok'}
{'name': 'blah2', 'status': 'ok'}
{'name': 'blah3', 'status': 'ok'}
Option 2
for data in new_data:
    for key, value in data.items():
        print(key, value)
#OUTPUT:
name blah
status ok
name blah0
status ok
name blah1
status ok
name blah2
status ok
name blah3
status ok
Option 3
for data in new_data:
    print(data['name'], data['status'])
#OUTPUT
blah ok
blah0 ok
blah1 ok
blah2 ok
blah3 ok
You don't really want dynamic variables, but you can use a list comprehension. You should also take advantage of constant-cost set membership test:
keep = set(['blah', 'blah0', 'blah1', 'blah2', 'blah3'])
result = [(value['name'], value['status']) for value in data if value['name'] in keep]
print(result)
Output:
[('blah', 'ok'),
('blah0', 'ok'),
('blah1', 'ok'),
('blah2', 'ok'),
('blah3', 'ok')]
If you want a dictionary:
keep = set(['blah', 'blah0', 'blah1', 'blah2', 'blah3'])
result = {value['name']: value['status'] for value in data if value['name'] in keep}
print(result)
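With the dictionary form, a status can be looked up by name wherever it is needed later, which covers what the dynamic variables were meant for. A short sketch with shortened sample data (the variable name status_by_name is just an illustration):

```python
data = [
    {'status': 'ok', 'name': 'blah', 'index': 0},
    {'status': 'ok', 'name': 'blah0', 'index': 0},
    {'status': 'ok', 'name': 'other', 'index': 1},
]
keep = {'blah', 'blah0', 'blah1'}
status_by_name = {item['name']: item['status'] for item in data if item['name'] in keep}

print(status_by_name['blah'])                  # direct lookup by name
print(status_by_name.get('blah1', 'missing'))  # default for names absent from the data
```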

Accessing YAML data in Python

I have a YAML file that parses into an object, e.g.:
{'name': [{'proj_directory': '/directory/'},
{'categories': [{'quick': [{'directory': 'quick'},
{'description': None},
{'table_name': 'quick'}]},
{'intermediate': [{'directory': 'intermediate'},
{'description': None},
{'table_name': 'intermediate'}]},
{'research': [{'directory': 'research'},
{'description': None},
{'table_name': 'research'}]}]},
{'nomenclature': [{'extension': 'nc'},
{'handler': 'script'},
{'filename': [{'id': [{'type': 'VARCHAR'}]},
{'date': [{'type': 'DATE'}]},
{'v': [{'type': 'INT'}]}]},
{'data': [{'time': [{'variable_name': 'time'},
{'units': 'minutes since 1-1-1980 00:00 UTC'},
{'latitude': [{'variable_n...
I'm having trouble accessing the data in python and regularly see the error TypeError: list indices must be integers, not str
I want to be able to access all elements corresponding to 'name' so to retrieve each data field I imagine it would look something like:
import yaml
settings_stream = open('file.yaml', 'r')
settingsMap = yaml.safe_load(settings_stream)
yaml_stream = True
print 'loaded settings for: ',
for project in settingsMap:
    print project + ', ' + settingsMap[project]['project_directory']
and I would expect each element would be accessible via something like ['name']['categories']['quick']['directory']
and something a little deeper would just be:
['name']['nomenclature']['data']['latitude']['variable_name']
or am I completely wrong here?
The brackets, [], indicate that you have lists of dicts, not just a dict.
For example, settingsMap['name'] is a list of dicts.
Therefore, you need to select the correct dict in the list using an integer index, before you can select the key in the dict.
So, given your current data structure, you'd need to use:
settingsMap['name'][1]['categories'][0]['quick'][0]['directory']
Or, revise the underlying YAML data structure.
For example, if the data structure looked like this:
settingsMap = {
    'name': {
        'proj_directory': '/directory/',
        'categories': {
            'quick': {'directory': 'quick',
                      'description': None,
                      'table_name': 'quick'},
            'intermediate': {'directory': 'intermediate',
                             'description': None,
                             'table_name': 'intermediate'},
            'research': {'directory': 'research',
                         'description': None,
                         'table_name': 'research'}},
        'nomenclature': {
            'extension': 'nc',
            'handler': 'script',
            'filename': {'id': {'type': 'VARCHAR'},
                         'date': {'type': 'DATE'},
                         'v': {'type': 'INT'}},
            'data': {'time': {'variable_name': 'time',
                              'units': 'minutes since 1-1-1980 00:00 UTC'}}}}}
then you could access the same value as above with
settingsMap['name']['categories']['quick']['directory']
# quick
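If the YAML file itself cannot be changed, the parsed lists of single-key dicts can be merged into plain nested dicts after loading. A sketch, assuming every list of dicts in the structure is meant to be read as one mapping; the helper name collapse is hypothetical, lists of other values are left untouched, and the sample is a shortened copy of the parsed object from the question:

```python
def collapse(node):
    """Merge lists of dicts into a single dict, recursively."""
    if isinstance(node, list) and node and all(isinstance(item, dict) for item in node):
        merged = {}
        for item in node:
            for key, value in item.items():
                merged[key] = collapse(value)  # later duplicates overwrite earlier ones
        return merged
    if isinstance(node, dict):
        return {key: collapse(value) for key, value in node.items()}
    return node


parsed = {'name': [{'proj_directory': '/directory/'},
                   {'categories': [{'quick': [{'directory': 'quick'},
                                              {'description': None},
                                              {'table_name': 'quick'}]}]}]}
settings = collapse(parsed)
print(settings['name']['categories']['quick']['directory'])
```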
