Creating Pandas DataFrame from SmartSheet API (nested, awkward, JSON)

Creating Pandas DataFrame from SmartSheet API (nested, awkward, JSON) - python

I'm trying to connect to my office's SmartSheet API via Python to create some performance tracking dashboards that utilize data outside of SmartSheet. All I want to do is create a simple DataFrame where fields reflect columnId and cell values reflect the displayValue key in the Smartsheet dictionary. I am doing this using a standard API requests.get rather than SmartSheet's API documentation because I've found the latter less easy to work with.
The table (sample) is set up as:
Number Letter Name
1 A Joe
2 B Jim
3 C Jon
The JSON syntax from the sheet GET request is:
{'id': 339338304219012,
'name': 'Sample Smartsheet',
'version': 1,
'totalRowCount': 3,
'accessLevel': 'OWNER',
'effectiveAttachmentOptions': ['GOOGLE_DRIVE',
'EVERNOTE',
'DROPBOX',
'ONEDRIVE',
'LINK',
'FILE',
'BOX_COM',
'EGNYTE'],
'ganttEnabled': False,
'dependenciesEnabled': False,
'resourceManagementEnabled': False,
'cellImageUploadEnabled': True,
'userSettings': {'criticalPathEnabled': False, 'displaySummaryTasks': True},
'userPermissions': {'summaryPermissions': 'ADMIN'},
'hasSummaryFields': False,
'permalink': 'https://app.smartsheet.com/sheets/5vxMCJQhMV7VFFPMVfJgg2hX79rj3fXgVGG8fp61',
'createdAt': '2020-02-13T16:32:02Z',
'modifiedAt': '2020-02-14T13:15:18Z',
'isMultiPicklistEnabled': True,
'columns': [{'id': 6273865019090820,
'version': 0,
'index': 0,
'title': 'Number',
'type': 'TEXT_NUMBER',
'primary': True,
'validation': False,
'width': 150},
{'id': 4022065205405572,
'version': 0,
'index': 1,
'title': 'Letter',
'type': 'TEXT_NUMBER',
'validation': False,
'width': 150},
{'id': 8525664832776068,
'version': 0,
'index': 2,
'title': 'Name',
'type': 'TEXT_NUMBER',
'validation': False,
'width': 150}],
'rows': [{'id': 8660990817003396,
'rowNumber': 1,
'expanded': True,
'createdAt': '2020-02-14T13:15:18Z',
'modifiedAt': '2020-02-14T13:15:18Z',
'cells': [{'columnId': 6273865019090820, 'value': 1.0, 'displayValue': '1'},
{'columnId': 4022065205405572, 'value': 'A', 'displayValue': 'A'},
{'columnId': 8525664832776068, 'value': 'Joe', 'displayValue': 'Joe'}]},
{'id': 498216492394372,
'rowNumber': 2,
'siblingId': 8660990817003396,
'expanded': True,
'createdAt': '2020-02-14T13:15:18Z',
'modifiedAt': '2020-02-14T13:15:18Z',
'cells': [{'columnId': 6273865019090820, 'value': 2.0, 'displayValue': '2'},
{'columnId': 4022065205405572, 'value': 'B', 'displayValue': 'B'},
{'columnId': 8525664832776068, 'value': 'Jim', 'displayValue': 'Jim'}]},
{'id': 5001816119764868,
'rowNumber': 3,
'siblingId': 498216492394372,
'expanded': True,
'createdAt': '2020-02-14T13:15:18Z',
'modifiedAt': '2020-02-14T13:15:18Z',
'cells': [{'columnId': 6273865019090820, 'value': 3.0, 'displayValue': '3'},
{'columnId': 4022065205405572, 'value': 'C', 'displayValue': 'C'},
{'columnId': 8525664832776068, 'value': 'Jon', 'displayValue': 'Jon'}]}]}
Here are the two ways I've approached the problem:
INPUT:
from pandas.io.json import json_normalize
samplej = sample.json()
s_rows = json_normalize(data=samplej['rows'], record_path='cells', meta=['id', 'rowNumber'])
s_rows
OUTPUT:
DataFrame with columnId, value, disdlayValue, id, and rowNumber as their own fields.
If I could figure out how to transpose this data in the right way I could probably make it work, but that seems incredibly complicated.
INPUT:
samplej = sample.json()
cellist = []
def get_cells():
srows = samplej['rows']
for s_cells in srows:
scells = s_cells['cells']
cellist.append(scells)
get_cells()
celldf = pd.DataFrame(cellist)
celldf
OUTPUT:
This returns a DataFrame with the correct number of columns and rows, but each cell is populated with a dictionary that looks like
In [14]:
celldf.loc[1,1]
Out [14]:
{'columnId': 4022065205405572, 'value': 'B', 'displayValue': 'B'}
If there was a way to remove everything except the value corresponding to the displayValue key in every cell, this would probably solve my problem. Again, though, it seems weirdly complicated.
I'm fairly new to Python and working with API's, so there may be a simple way to address the problem I'm overlooking. Or, if you have a suggestion for approaching the possible solutions I outlined above I'm all ears. Thanks for your help!

You must make use of the columns field:
colnames = {x['id']: x['title'] for x in samplej['columns']}
columns = [x['title'] for x in samplej['columns']]
cellist = [{colnames[scells['columnId']]: scells['displayValue']
for scells in s_cells['cells']} for s_cells in samplej['rows']]
celldf = pd.DataFrame(cellist, columns=columns)
This gives as expected:
Number Letter Name
0 1 A Joe
1 2 B Jim
2 3 C Jon
If some cells could contain only a columnId but no displayValue field, scells['displayValue'] should be replaced in above code with scells.get('displayValue', defaultValue), where defaultValue could be None, np.nan or any other relevant default.

Related

How to convert json into a pandas dataframe?

I'm trying to covert an api response from json to a dataframe in pandas. the problem I am having is that de data is nested in the json format and I am not getting the right columns in my dataframe.
The data is collect from a api with the following format:
{'tickets': [{'url': 'https...',
'id': 1,
'external_id': None,
'via': {'channel': 'web',
'source': {'from': {}, 'to': {}, 'rel': None}},
'created_at': '2020-05-01T04:16:33Z',
'updated_at': '2020-05-23T03:02:49Z',
'type': 'incident',
'subject': 'Subject',
'raw_subject': 'Raw subject',
'description': 'Hi, this is the description',
'priority': 'normal',
'status': 'closed',
'recipient': None,
'requester_id': 409467360874,
'submitter_id': 409126461453,
'assignee_id': 409126461453,
'organization_id': None,
'group_id': 360009916453,
'collaborator_ids': [],
'follower_ids': [],
'email_cc_ids': [],
'forum_topic_id': None,
'problem_id': None,
'has_incidents': False,
'is_public': True,
'due_at': None,
'tags': ['tag_1',
'tag_2',
'tag_3',
'tag_4'],
'custom_fields': [{'id': 360042034433, 'value': 'value of the first custom field'},
{'id': 360041487874, 'value': 'value of the second custom field'},
{'id': 360041489414, 'value': 'value of the third custom field'},
{'id': 360040980053, 'value': 'correo_electrónico'},
{'id': 360040980373, 'value': 'suscribe_newsletter'},
{'id': 360042046173, 'value': None},
{'id': 360041028574, 'value': 'product'},
{'id': 360042103034, 'value': None}],
'satisfaction_rating': {'score': 'unoffered'},
'sharing_agreement_ids': [],
'comment_count': 2,
'fields': [{'id': 360042034433, 'value': 'value of the first custom field'},
{'id': 360041487874, 'value': 'value of the second custom field'},
{'id': 360041489414, 'value': 'value of the third custom field'},
{'id': 360040980053, 'value': 'correo_electrónico'},
{'id': 360040980373, 'value': 'suscribe_newsletter'},
{'id': 360042046173, 'value': None},
{'id': 360041028574, 'value': 'product'},
{'id': 360042103034, 'value': None}],
'followup_ids': [],
'ticket_form_id': 360003608013,
'deleted_ticket_form_id': 360003608013,
'brand_id': 360004571673,
'satisfaction_probability': None,
'allow_channelback': False,
'allow_attachments': True},
What I already tried is the following: I have converted the JSON format into a dict as following:
x = response.json()
df = pd.DataFrame(x['tickets'])
But I'm struggling with the output. I don't know how to get a correct, ordered, normalized dataframe.
(I'm new in this :) )

Let's supose you get your request data by this code r = requests.get(url, auth)
Your data ins't clear yet, so let's get a dataframe of it data = pd.read_json(json.dumps(r.json, ensure_ascii = False))
But, probably you will get a dataframe with one single row.
When I faced a problem like this, I wrote this function to get the full data:
listParam = []
def listDict(entry):
if type(entry) is dict:
listParam.append(entry)
elif type(entry) is list:
for ent in entry:
listDict(ent)
Because your data looks like a dict because of {'tickets': ...} you will need to get the information like that:
listDict(data.iloc[0][0])
And then,
pd.DataFrame(listParam)
I can't show the results because you didn't post the complete data nor told where I can find the data to test, but this will probably work.

You have to convert the json to dictionary first and then convert the dictionary value for key 'tickets' into dataframe.
file = open('file.json').read()
ticketDictionary = json.loads(file)
df = pd.DataFrame(ticketDictionary['tickets'])
'file.json' contains your data here.
df now contains your dataFrame in this format.
For the lists within the response you can have separate dataframes if required:
for field in df['fields']:
df = pd.DataFrame(field)
It will give you this for lengths:
id value
0 360042034433 value of the first custom field
1 360041487874 value of the second custom field
2 360041489414 value of the third custom field
3 360040980053 correo_electrónico
4 360040980373 suscribe_newsletter
5 360042046173 None
6 360041028574 product
7 360042103034 None
This can be one way to structure as you haven't mentioned the exact expected format.

How can I reformat JSON / Dictionaries in Python

I have the below list -
[{'metric': 'sales', 'value': '100', 'units': 'dollars'},
{'metric': 'instock', 'value': '95.2', 'units': 'percent'}]
I would like to reformat it like the below in Python -
{'sales': '100', 'instock': '95.2'}
I did the below -
a = [above list]
for i in a:
print({i['metric']: i['value']})
But it outputs like this -
{'sales': '100'}
{'instock': '95.2'}
I would like these 2 lines to be a part of the same dictionary

d = [{'metric': 'sales', 'value': '100', 'units': 'dollars'},
{'metric': 'instock', 'value': '95.2', 'units': 'percent'}]
new_d = {e["metric"]: e["value"] for e in d}
# output: {'sales': '100', 'instock': '95.2'}
I believe that it's best to try it first by yourself, and then post a question in case you don't succeed. You should consider posting your attempts next time.

Python, iterate through multiple lists in a dictionary object to find a specific value

Pulling data over a RESTapi, I'm left with a dictionary object containing multiple lists. I'm looking for a very specific data point within one of the lists, however the actual amount of lists vary with each item in the dictionary.
I've tried to manually pull this field using indexing, etc..., but because the list is not always in the same place, I'm banging my head on the wall. The API results look something like this..
b = [
{'internal': False, 'protocol_parameters': [{'name': 'identifier', 'id': 1, 'value': 'x.x.x.x'}]},
{'internal': False, 'protocol_parameters': [{'name': 'identifier', 'id': 0, 'value': 'y.y.y.y'}, {'name': 'incomingPayloadEncoding', 'id': 1, 'value': 'UTF-8'}]},
{'internal': False, 'protocol_parameters': [{'name': 'incomingPayloadEncoding', 'id': 1, 'value': 'UTF-8'}, {'name': 'identifier', 'id': 0, 'value': 'z.z.z.z'}]}]
for a in b:
c = (a['protocol_parameters'])[0].get('value')
print(c)
This of course will not parse correctly because the list is not in a consistent place so I'm curious if I can parse all lists in the dictionary looking for a specific string. My end goal would be as shown below regardless of the list position.
x.x.x.x
y.y.y.y
z.z.z.z
In this example, find all lists that contain "identifier". Apologies if this is a noob mistake :) and thank you for your time.

From what I understand you need to pick the value field from the item of protocol_parameters that contains name=identifier.
You may use next() to find the first item from the protocol_parameters list that matches that criteria. See below:
records = [{'internal': False, 'protocol_parameters': [{'name': 'identifier', 'id': 1, 'value': 'x.x.x.x'}]},
{'internal': False, 'protocol_parameters': [{'name': 'identifier', 'id': 0, 'value': 'y.y.y.y'}, {'name': 'incomingPayloadEncoding', 'id': 1, 'value': 'UTF-8'}]},
{'internal': False, 'protocol_parameters': [{'name': 'incomingPayloadEncoding', 'id': 1, 'value': 'UTF-8'}, {'name': 'identifier', 'id': 0, 'value': 'z.z.z.z'}]}
]
for record in records:
identifier_param = next((prot_param for prot_param in record['protocol_parameters'] if prot_param['name']=='identifier'), None)
if identifier_param:
print(identifier_param['value'])
prints
x.x.x.x
y.y.y.y
z.z.z.z

This should work:
b = [{'internal': False, 'protocol_parameters': [{'name': 'identifier', 'id': 1, 'value': 'x.x.x.x'}]},
{'internal': False, 'protocol_parameters': [{'name': 'identifier', 'id': 0, 'value': 'y.y.y.y'}, {'name': 'incomingPayloadEncoding', 'id': 1, 'value': 'UTF-8'}]},
{'internal': False, 'protocol_parameters': [{'name': 'incomingPayloadEncoding', 'id': 1, 'value': 'UTF-8'}, {'name': 'identifier', 'id': 0, 'value': 'z.z.z.z'}]}]
for a in b:
c = next(param.get('value') for param in a['protocol_parameters'] if param.get('name')=="identifier")
print(c)

remove points from excel file line chart using python package xlsxwriter

I have this chart, created using python package xlsxwriter and I wanna remove all the points in the middle and only keep the first one and last one.
before :
Final results wanted.:
I tried the attribute points but unfortunate, it didn't work for me.
line_chart.add_series(
{
'values': '='+worksheet_name+'!$C$'+str(row_range+1)+':$I'+str(row_range+1),
'marker': {'type': 'diamond'},
'data_labels': {'value': True, 'category': True, 'position': 'center', 'leader_lines': True},
'points': [
{'fill': {'color': 'green'}},
None,
None,
None,
None,
None,
{'fill': {'color': 'red'}}
],
'trendline': {
'type': 'polynomial',
'name': 'My trend name',
'order': 2,
'forward': 0.5,
'backward': 0.5,
'line': {
'color': 'red',
'width': 1,
'dash_type': 'long_dash'
}
}
}
)
I also tried:
line_chart.set_x_axis({
'major_gridlines': {'visible': False},
'minor_gridlines': {'visible': False}
'delete_series': [1, 6]
})
no luck.
Anobody can help me please?
Thanks in advance!

Accessing YAML data in Python

I have a YAML file that parses into an object, e.g.:
{'name': [{'proj_directory': '/directory/'},
{'categories': [{'quick': [{'directory': 'quick'},
{'description': None},
{'table_name': 'quick'}]},
{'intermediate': [{'directory': 'intermediate'},
{'description': None},
{'table_name': 'intermediate'}]},
{'research': [{'directory': 'research'},
{'description': None},
{'table_name': 'research'}]}]},
{'nomenclature': [{'extension': 'nc'}
{'handler': 'script'},
{'filename': [{'id': [{'type': 'VARCHAR'}]},
{'date': [{'type': 'DATE'}]},
{'v': [{'type': 'INT'}]}]},
{'data': [{'time': [{'variable_name': 'time'},
{'units': 'minutes since 1-1-1980 00:00 UTC'},
{'latitude': [{'variable_n...
I'm having trouble accessing the data in python and regularly see the error TypeError: list indices must be integers, not str
I want to be able to access all elements corresponding to 'name' so to retrieve each data field I imagine it would look something like:
import yaml
settings_stream = open('file.yaml', 'r')
settingsMap = yaml.safe_load(settings_stream)
yaml_stream = True
print 'loaded settings for: ',
for project in settingsMap:
print project + ', ' + settingsMap[project]['project_directory']
and I would expect each element would be accessible via something like ['name']['categories']['quick']['directory']
and something a little deeper would just be:
['name']['nomenclature']['data']['latitude']['variable_name']
or am I completely wrong here?

The brackets, [], indicate that you have lists of dicts, not just a dict.
For example, settingsMap['name'] is a list of dicts.
Therefore, you need to select the correct dict in the list using an integer index, before you can select the key in the dict.
So, giving your current data structure, you'd need to use:
settingsMap['name'][1]['categories'][0]['quick'][0]['directory']
Or, revise the underlying YAML data structure.
For example, if the data structure looked like this:
settingsMap = {
'name':
{'proj_directory': '/directory/',
'categories': {'quick': {'directory': 'quick',
'description': None,
'table_name': 'quick'}},
'intermediate': {'directory': 'intermediate',
'description': None,
'table_name': 'intermediate'},
'research': {'directory': 'research',
'description': None,
'table_name': 'research'},
'nomenclature': {'extension': 'nc',
'handler': 'script',
'filename': {'id': {'type': 'VARCHAR'},
'date': {'type': 'DATE'},
'v': {'type': 'INT'}},
'data': {'time': {'variable_name': 'time',
'units': 'minutes since 1-1-1980 00:00 UTC'}}}}}
then you could access the same value as above with
settingsMap['name']['categories']['quick']['directory']
# quick

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Creating Pandas DataFrame from SmartSheet API (nested, awkward, JSON) - python

Related

How to convert json into a pandas dataframe?

How can I reformat JSON / Dictionaries in Python

Python, iterate through multiple lists in a dictionary object to find a specific value

remove points from excel file line chart using python package xlsxwriter

Accessing YAML data in Python

Categories

Resources