I am trying to parse data from an API call to an ERP system. I want to load this data into a pandas DataFrame so that I can work with it. Every attempt to parse it with json_string / json_dumps or DataFrame.from_dict() has failed.
My raw data file looks like:
Type: dict
String form: {0: [{'productID': 144194, 'name': 'XXXtentacion, ?, LP', 'code': '1039907', 'code2': '1210672', <...> Field4Title': 'Format Notes', 'extraField4ID': 0, 'extraField4Code': '', 'extraField4Name': ''}]}
Length: 1
Docstring:
dict() -> new empty dictionary
dict(mapping) -> new dictionary initialized from a mapping object's
(key, value) pairs
dict(iterable) -> new dictionary initialized as if via:
d = {}
for k, v in iterable:
d[k] = v
dict(**kwargs) -> new dictionary initialized with the name=value pairs
in the keyword argument list. For example: dict(one=1, two=2)
The closest I get is calling:
pd.DataFrame.from_dict(data)
which returns:
0
0 {'productID': 144194, 'name': 'XXXtentacion, ?...
1 {'productID': 131605, 'name': 'Sufjan Stevens ...
2 {'productID': 143699, 'name': 'Sufjan Stevens ...
3 {'productID': 134277, 'name': 'Sufjan Stevens ...
4 {'productID': 135151, 'name': 'Sufjan Stevens ...
5 {'productID': 145844, 'name': 'Spearhead, Home...
but what I want is for the keys to be column headers (i.e. 'productID' should be the first column header).
I'm just starting out with Python so any help is greatly appreciated. I've looked around on similar topics and can't seem to find the solution.
Assuming your data is structured as Dict(key1: List(Dict(...)), key2: ...)
Try
data = {d:data[d][0] for d in data}
pd.DataFrame.from_dict(data)
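If the goal is one row per product with the dict keys as column headers, the list of records can also be passed to the DataFrame constructor directly. This is only a minimal sketch, assuming `data` maps each key to a list of flat dicts as in the question. The sample records are shortened, and the second one is a made-up placeholder, not real API output.

```python
import pandas as pd

# Shortened stand-in for the API response: one key mapping to a list of records.
# The second record is a hypothetical placeholder for illustration only.
data = {0: [{'productID': 144194, 'name': 'XXXtentacion, ?, LP', 'code': '1039907'},
            {'productID': 131605, 'name': 'placeholder name', 'code': 'placeholder'}]}

# Chain the lists of records together, then let pandas use the dict keys as columns.
records = [record for batch in data.values() for record in batch]
df = pd.DataFrame(records)
print(df.columns.tolist())
```

Each dict becomes one row, and keys that are missing from a record show up as NaN in that row.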
Related
I have a nested dictionary as below. I'm trying to convert the below to a dataframe with the columns iid, Invnum, #type, execId, CId, AId, df, type. What’s the best way to go about it?
data = {'A': {'B1': {'iid': 'B1', 'Invnum': {'B11': {'#type': '/test_data', 'execId': 42, 'CId': 42, 'AId': 'BAZ'}, 'B12': {'#type': '/test_data', 'CId': 8, 'AId': '123'}}}}, 'B2': {'iid': 'B2', 'Invnum': {'B21': {'#type': '/test_data', 'execId': 215, 'CId': 253,'df': [], 'type': 'F'}, 'B22': {'#type': '/test_data', 'execId': 10,'df': [], 'type': 'F'}}}}
for key1 in data['A'].keys():
    for key2 in data['A'][key1]['Invnum']:
        print(key1, key2)
Expected output:
As indicated in the comments, your input data is ambiguous, which makes it hard for us to know what we can safely assume. For my solution I will assume at least the following, based on the example you provide:
Somewhere in the dictionary there is an entry that contains iid and Invnum as keys at the same level.
The Invnum key is the only key whose value is iterable (besides df), and each item it iterates over must map to the innermost dictionary. In other words, below an Invnum entry (e.g. B11) there is only one more dict, whose keys are the remaining fields (#type, execId, CId, AId, df, type), if they exist.
If there is a df value, it will hold a list.
import pandas as pd

# This is a placeholder dictionary, so we can create entries that all have the same pattern.
entry = {'#type': '', 'execId': '', 'CId': '', 'AId': '', 'df': '', 'type': ''}
# This will hold all the (properly) formatted entries for the df.
items = []

def flatten(data):
    if isinstance(data, dict):
        match data:
            # We are searching for a level that contains both an `iid` and `Invnum` key.
            case {'iid': id, 'Invnum': remainder}:
                for each in remainder:
                    entry_row = dict(**entry, iid=id, Invnum=each)
                    entry_row.update(remainder[each])
                    items.append(entry_row)
            case _:
                # No match at this level, so search one level deeper.
                for key, value in data.items():
                    flatten(value)

# We flatten the data, such that the `items` variable will hold consistent entries.
flatten(data)
# Transfer to a pandas dataframe, and reorder the columns for easy comparison.
df = pd.DataFrame(items)
df = df[['iid', 'Invnum', '#type', 'execId', 'CId', 'AId', 'df', 'type']]
print(df.to_string(index=False))
Output:
iid Invnum      #type execId CId AId df type
 B1    B11 /test_data     42  42 BAZ
 B1    B12 /test_data          8 123
 B2    B21 /test_data    215 253     []    F
 B2    B22 /test_data     10         []    F
Note:
Missing values are shown as empty strings, since I am using '' as the placeholder. As a result, columns such as execId and CId have object dtype rather than a numeric one.
I rely heavily on the assumptions made above; if they are incorrect, the answer will not match your expectation.
I am using structural pattern matching, which was introduced in Python 3.10.
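For Python versions before 3.10, the same traversal can be written without the match statement. The following is a rough sketch under the same assumptions as above. The generator name `iter_rows` is hypothetical, and missing fields are left as NaN here instead of '', which is a deliberate change from the answer's approach.

```python
import pandas as pd

# Same nested input as in the question.
data = {'A': {'B1': {'iid': 'B1', 'Invnum': {
            'B11': {'#type': '/test_data', 'execId': 42, 'CId': 42, 'AId': 'BAZ'},
            'B12': {'#type': '/test_data', 'CId': 8, 'AId': '123'}}}},
        'B2': {'iid': 'B2', 'Invnum': {
            'B21': {'#type': '/test_data', 'execId': 215, 'CId': 253, 'df': [], 'type': 'F'},
            'B22': {'#type': '/test_data', 'execId': 10, 'df': [], 'type': 'F'}}}}

def iter_rows(node):
    """Yield one flat dict per Invnum entry found anywhere in the nesting."""
    if not isinstance(node, dict):
        return
    if 'iid' in node and 'Invnum' in node:
        for invnum, fields in node['Invnum'].items():
            yield {'iid': node['iid'], 'Invnum': invnum, **fields}
    else:
        # Not the level we are looking for, so search one level deeper.
        for value in node.values():
            yield from iter_rows(value)

columns = ['iid', 'Invnum', '#type', 'execId', 'CId', 'AId', 'df', 'type']
# reindex fixes the column order and adds any column that no row supplied;
# fields missing from a row become NaN instead of ''.
df = pd.DataFrame(list(iter_rows(data))).reindex(columns=columns)
print(df.to_string(index=False))
```

Keeping NaN for missing values lets pandas treat execId and CId as numeric columns, at the cost of showing them as floats.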
I have the following dataframe that I extracted from an API. One of its columns holds the data I need to extract, but the data inside that column is structured as a list of dictionaries:
I could get the data that I care about from that dictionary using this chunk of code:
for k, v in d.items():
    for i, j in v.items():
        if isinstance(j, list):
            for l in range(len(j)):
                for k in j[l]:
                    print(j[l])
I get a structure like this one, so I'd need to get each of the 'values' lists inside the list of dictionaries
and then organize them in a dataframe, like for example the first item in the list of dictionaries:
Once I get to the point of having the above structure, how could I make a dataframe like the one in the image?
Raw data:
data = {'rows': [{'values': ['Tesla Inc (TSLA)', '$1056.78', '$1199.78', '13.53%'], 'children': []}, {'values': ['Taiwan Semiconductor Manufacturing Company Limited (TSM)', '$120.31', '$128.80', '7.06%'], 'children': []}]}
You can use pandas. First cast your data to pd.DataFrame, then use apply(pd.Series) to expand lists inside 'values' column to separate columns and set_axis method to change column names:
import pandas as pd
data = {'rows': [{'values': ['Tesla Inc (TSLA)', '$1056.78', '$1199.78', '13.53%'], 'children': []}, {'values': ['Taiwan Semiconductor Manufacturing Company Limited (TSM)', '$120.31', '$128.80', '7.06%'], 'children': []}]}
out = pd.DataFrame(data['rows'])['values'].apply(pd.Series).set_axis(['name','price','price_n','pct'], axis=1)
Output:
                                                name     price   price_n     pct
0                                   Tesla Inc (TSLA)  $1056.78  $1199.78  13.53%
1  Taiwan Semiconductor Manufacturing Company Lim...   $120.31   $128.80   7.06%
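As an alternative to apply(pd.Series), the 'values' lists can be collected with a list comprehension and handed straight to the constructor. This is just a sketch using the same raw data and the same column names as above. The numeric conversion at the end is an optional extra that was not asked for in the question.

```python
import pandas as pd

data = {'rows': [{'values': ['Tesla Inc (TSLA)', '$1056.78', '$1199.78', '13.53%'], 'children': []},
                 {'values': ['Taiwan Semiconductor Manufacturing Company Limited (TSM)',
                             '$120.31', '$128.80', '7.06%'], 'children': []}]}

# Collect each row's 'values' list and name the columns in the constructor.
out = pd.DataFrame([row['values'] for row in data['rows']],
                   columns=['name', 'price', 'price_n', 'pct'])

# Optional: strip the '$' and '%' signs so the columns become numeric.
for col in ['price', 'price_n']:
    out[col] = out[col].str.lstrip('$').astype(float)
out['pct'] = out['pct'].str.rstrip('%').astype(float)
print(out)
```

Building the frame from a plain list of lists avoids creating one Series per row, which tends to matter once there are many rows.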
I have the following pandas dataframe which looks like,
code comp name
0 A292340 디비자산운용 마이티 200커버드콜ATM레버리지
1 A291630 키움투자자산운용 KOSEF 코스닥150선물레버리지
2 A278240 케이비자산운용 KBSTAR 코스닥150선물레버리지
3 A267770 미래에셋자산운용 TIGER 200선물레버리지
4 A267490 케이비자산운용 KBSTAR 미국장기국채선물레버리지(합성 H)
I would like to make a dictionary out of this, which will look like:
{'20180408' :{'A292340' : {comp : 디비자산운용}, {name : 마이티 200커버드콜ATM 레버리지}}}
Sorry that the data is in a language foreign to you, but please let me ask.
What I tried is:
values = [comp, name]
names = ['comp', 'name']
tmp = {names:values for names, values in zip(names, values)}
tpm = {code:values for values in zip(*tmp)}
aaaa = {date:c for c in zip(*tpm)}
print(aaaa)
aaaa is what I am trying to get, and date is just a simple list of dates, from a prior date up to today. But when I run this, I get the error
TypeError: unhashable type: 'numpy.ndarray'
Thank you in advance for your answer.
Is this what you want? First, set the "code" column as the index. Then use to_dict with orient="index".
df.set_index("code").to_dict("index")
{'A267490': {'comp': '케이비자산운용', 'name': 'KBSTAR 미국장기국채선물레버리지(합성 H)'},
'A267770': {'comp': '미래에셋자산운용', 'name': 'TIGER 200선물레버리지'},
'A278240': {'comp': '케이비자산운용', 'name': 'KBSTAR 코스닥150선물레버리지'},
'A291630': {'comp': '키움투자자산운용', 'name': 'KOSEF 코스닥150선물레버리지'},
'A292340': {'comp': '디비자산운용', 'name': '마이티 200커버드콜ATM레버리지'}}
Using the argument "index" will give the layout:
{index -> {columnName -> valueOfTheColumn}}
Here since we set code as the index, we have
code -> {"comp" -> comp's value, "name" -> name's value}
'A267490': {'comp': '케이비자산운용', 'name': 'KBSTAR 미국장기국채선물레버리지(합성 H)'}
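To get the outer date key from the question, the result of to_dict can be wrapped in another dict. This is a small sketch, assuming a single date string (the question mentions a list of dates, so this may need a loop in practice) and using only the first two rows of the sample data.

```python
import pandas as pd

# First two rows of the dataframe from the question.
df = pd.DataFrame({'code': ['A292340', 'A291630'],
                   'comp': ['디비자산운용', '키움투자자산운용'],
                   'name': ['마이티 200커버드콜ATM레버리지', 'KOSEF 코스닥150선물레버리지']})

# Assumption: one date string for the whole frame.
date = '20180408'
nested = {date: df.set_index('code').to_dict('index')}
print(nested)
```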
I am doing a research project and trying to pull thousands of quarterly results for companies from the SEC EDGAR API.
Each result is a list of dictionaries structured as follows:
[{'field': 'othercurrentliabilities', 'value': 6886000000.0},
{'field': 'otherliabilities', 'value': 13700000000.0},
{'field': 'propertyplantequipmentnet', 'value': 15789000000.0}...]
I want each result to be a row of a pandas dataframe. The issue is that the results may not all have the same fields, depending on the data available. I would like to check whether each column (field) of the dataframe is present among a result's fields and, if it is, add the result's value to the row. If not, I would like to add an np.NaN. How would I go about doing this?
A list/dict comprehension ought to work here:
In [11]: s
Out[11]:
[[{'field': 'othercurrentliabilities', 'value': 6886000000.0},
{'field': 'otherliabilities', 'value': 13700000000.0},
{'field': 'propertyplantequipmentnet', 'value': 15789000000.0}],
[{'field': 'othercurrentliabilities', 'value': 6886000000.0}]]
In [12]: pd.DataFrame([{d["field"]: d["value"] for d in row} for row in s])
Out[12]:
   othercurrentliabilities  otherliabilities  propertyplantequipmentnet
0             6.886000e+09      1.370000e+10               1.578900e+10
1             6.886000e+09               NaN                        NaN
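If the set of columns should be fixed in advance rather than inferred from whatever fields happen to appear, reindex can be applied afterwards. This is only a sketch: the `wanted` list is a hypothetical example, and 'goodwill' is included on purpose as a field that none of the sample results contain.

```python
import pandas as pd

results = [
    [{'field': 'othercurrentliabilities', 'value': 6886000000.0},
     {'field': 'otherliabilities', 'value': 13700000000.0}],
    [{'field': 'othercurrentliabilities', 'value': 6886000000.0}],
]
# Hypothetical fixed column list; 'goodwill' appears in no result on purpose.
wanted = ['othercurrentliabilities', 'otherliabilities', 'goodwill']

# One dict per result, mapping field name to value.
df = pd.DataFrame([{d['field']: d['value'] for d in row} for row in results])
# Fields absent from a result become NaN; fields not in `wanted` are dropped.
df = df.reindex(columns=wanted)
print(df)
```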
Make a list of df.result.rows[x]['values'], like below:
s = []
for x in range(df.result.totalrows[0]):
    s.append(df.result.rows[x]['values'])
    print(x)
df1 = pd.DataFrame([{d["field"]: d["value"] for d in row} for row in s])
df1
This will give you the result.
I'm trying to perform operations on a nested dictionary (data retrieved from a yaml file):
data = {'services': {'web': {'name': 'x'}}, 'networks': {'prod': 'value'}}
I'm trying to modify the above using the inputs like:
{'services.web.name': 'new'}
I converted the above to a list of keys ['services', 'web', 'name'], but I'm not sure how to perform the operation below in a loop:
data['services']['web']['name'] = new
That way I can modify the data dict. There are other values I plan to change in the above dictionary (it is an extensive one), so I need a solution that also works in cases where I have to change, e.g.:
data['services2']['web2']['networks']['local'].
Is there an easy way to do this? Any help is appreciated.
You may iterate over the keys while moving a reference:
data = {'networks': {'prod': 'value'}, 'services': {'web': {'name': 'x'}}}
modification = {'services.web.name': 'new'}
for key, value in modification.items():
    keyparts = key.split('.')
    to_modify = data
    for keypart in keyparts[:-1]:
        to_modify = to_modify[keypart]
    to_modify[keyparts[-1]] = value
print(data)
Giving:
{'networks': {'prod': 'value'}, 'services': {'web': {'name': 'new'}}}
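The same idea can be wrapped in a small function so it can be reused for deeper paths. This is a sketch only: `set_by_path` and its `create_missing` flag are hypothetical names, and creating missing intermediate dicts is an assumption about what is wanted for paths such as services2.web2.networks.local.

```python
def set_by_path(target, dotted_key, value, create_missing=False):
    """Set target[a][b][c] = value for dotted_key 'a.b.c' (hypothetical helper)."""
    *parents, last = dotted_key.split('.')
    node = target
    for part in parents:
        # With create_missing=True, absent intermediate dicts are created;
        # otherwise a missing key raises KeyError, as in the loop above.
        node = node.setdefault(part, {}) if create_missing else node[part]
    node[last] = value

data = {'services': {'web': {'name': 'x'}}, 'networks': {'prod': 'value'}}
set_by_path(data, 'services.web.name', 'new')
set_by_path(data, 'services2.web2.networks.local', 'value2', create_missing=True)
print(data)
```

Note that keys which themselves contain a dot cannot be addressed this way, since the path is split on every '.'.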