In Python, would the following be considered a list or a dict?
temp = [{'lat': 39.7612992, 'lon': -86.1519681},
{'lat': 39.762241, 'lon': -86.158436 },
{'lat': 39.7622292, 'lon': -86.1578917}]
I have a pandas dataframe that I am trying to convert to look like the above but I am not certain what I should be converting it to.
Yes, it is a list. More precisely, it is a list object, containing a sequence of dict objects. You can run type(temp) to know the type of that object.
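A quick way to confirm this, using the data from the question (the variable names outer_type and inner_type are only illustrative):

```python
temp = [{'lat': 39.7612992, 'lon': -86.1519681},
        {'lat': 39.762241, 'lon': -86.158436},
        {'lat': 39.7622292, 'lon': -86.1578917}]

outer_type = type(temp)      # type of the outer container
inner_type = type(temp[0])   # type of each element inside it
print(outer_type, inner_type)
```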
I am new to Python, so every tip will be helpful :)
I have a pandas dataframe with multiple columns and I need to convert it to a new list of objects. Of all the dataframe's columns, there are two (lat, lon) that I want as attributes in my new objects.
index  city    lat    lon
0      London  42.33  55.44
1      Rome    92.44  88.11
My new list of objects will need to look something like this:
[
{'lat': 42.33, 'lon': 55.44},
{'lat': 92.44, 'lon': 88.11}
]
More specifically I need this for Machine Learning with ML Studio.
Thanks!
Use pandas.DataFrame.to_dict(orient) to convert a DataFrame into a dictionary or a list of dictionaries. There are several orientations; for your case, use orient='records'.
You also want to only select the lat & lon columns, like this:
df[['lat','lon']].to_dict(orient='records')
This will give you your result:
[{'lat': 42.33, 'lon': 55.44}, {'lat': 92.44, 'lon': 88.11}]
Here are some other orientations you could try out:
'dict' (default) : dict like {column -> {index -> value}}
'list' : dict like {column -> [values]}
'series' : dict like {column -> Series(values)}
'split' : dict like {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}
'records' : list like [{column -> value}, ... , {column -> value}]
'index' : dict like {index -> {column -> value}}
You can choose the columns you want and then use to_dict with orient='records' to get the required result
df[["lat", "lon"]].to_dict(orient='records')
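As a minimal sketch, assuming the dataframe from the question is rebuilt by hand (the names records and columns are only illustrative), the difference between orient='records' and orient='list' looks like this:

```python
import pandas as pd

# Rebuild the small example frame from the question.
df = pd.DataFrame({'city': ['London', 'Rome'],
                   'lat': [42.33, 92.44],
                   'lon': [55.44, 88.11]})

# One dict per row, restricted to the two wanted columns.
records = df[['lat', 'lon']].to_dict(orient='records')

# One list per column, for comparison.
columns = df[['lat', 'lon']].to_dict(orient='list')

print(records)
print(columns)
```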
I am trying to extract only the values and turn them into a dataframe
My code:
miss = pd.DataFrame({'currency': x, 'balance': miss.values()})
Output:
balance
currency
SNIP (0.007275)
TEM (15.97)
1WO (6.51)
The output is not correct as it seems that it is still in dict type as demonstrated by the output into an excel sheet where it is written dict_values([0.02706]) in each excel cell.
Could you help me to create the correct dataframe please.
Use a list comprehension if miss.values() contains one-element lists:
miss = pd.DataFrame({'currency': x, 'balance': [v[0] for v in miss.values()]})
Or a pandas alternative, indexing with str[0]:
miss = pd.DataFrame({'currency': x, 'balance': miss.values()})
miss['balance'] = miss['balance'].str[0]
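A self-contained sketch of the list-comprehension variant. The shape of miss is an assumption (each balance wrapped in a one-element list), and the currency names are taken from the dict keys here because the question does not show how x is built:

```python
import pandas as pd

# Assumed input shape: every value is a one-element list.
miss = {'SNIP': [0.007275], 'TEM': [15.97], '1WO': [6.51]}

balances = pd.DataFrame({
    'currency': list(miss.keys()),               # stand-in for x in the question
    'balance': [v[0] for v in miss.values()],    # unwrap each one-element list
})
print(balances)
```

If the values turn out to be some other container, such as dict views, they would need to be converted first (for example with list(v)[0]).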
Currently I have the following:
a data file called "world_bank_projects.json"
projects = json.load(open('data/world_bank_projects.json'))
From which I built a dataframe from the "mjtheme_namecode" column
proj_norm = json_normalize(projects, 'mjtheme_namecode')
After which I removed the duplicated entries
proj_norm_no_dup = proj_norm.drop_duplicates()
However, when I tried to sort the dataframe by the 'code' column, it somehow doesn't work:
proj_norm_no_dup.sort_values(by = 'code')
My question is: why doesn't the sort function put 10 and 11 at the bottom of the dataframe? It sorted everything else correctly.
Edit1: mjtheme_namecode is a list of dictionaries containing the keys 'code' and 'name'. Example: 'mjtheme_namecode': [{'code': '5', 'name': 'Trade and integration'}, {'code': '4', 'name': 'Financial and private sector development'}]
After normalization, the 'code' column is a series type.
type(proj_norm_no_dup['code'])
pandas.core.series.Series
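One likely explanation, given that the example in Edit1 shows the codes stored as strings ('5', '4'): strings sort character by character, so '10' and '11' land right after '1'. A minimal sketch with made-up sample codes, assuming every code is a clean integer string:

```python
import pandas as pd

# Made-up sample; the real frame comes from json_normalize.
codes = pd.DataFrame({'code': ['5', '11', '4', '10', '1']})

# Sorting the string column compares text, not numbers.
as_text = codes.sort_values(by='code')['code'].tolist()

# Converting to integers first gives the numeric order.
numeric = codes.assign(code=codes['code'].astype(int))
as_numbers = numeric.sort_values(by='code')['code'].tolist()

print(as_text)
print(as_numbers)
```

Note that sort_values returns a new dataframe rather than sorting in place, so the result has to be assigned to a name to be kept.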
I have a dataframe (called msg_df) that has a column called "messages". This column has, for each row, a list of dictionaries as values
(example:
msg_df['messages'][0]
output:
[{'id': 1, 'date': '2018-12-04T16:26:13Z', 'type': 'b'},
{'id': 2, 'date': '2018-12-11T15:28:49Z', 'type': 'i'},
{'id': 3, 'date': '2018-12-04T16:26:13Z', 'type': 'c'}] )
What I need to do is to create a new column, let's call it "filtered_messages", which only contains the dictionaries that have 'type': 'b' and 'type': 'i'.
The problem is, when I apply a list comp to a single value, it works, for example:
test = msg_df['messages'][0]
keys_list = ['b','i']
filtered = [d for d in test if d['type'] in keys_list]
filtered
output:
[{'id': 1, 'date': '2018-12-04T16:26:13Z', 'type': 'b'},
{'id': 2, 'date': '2018-12-11T15:28:49Z', 'type': 'i'}]
The output is the filtered list. However, I am not able to:
1. apply the same concept to the whole column, row by row
2. obtain a new column with the values being the filtered list
New to Python, really need some help over here.
PS: Working on Jupyter, have pandas, numpy, etc.
As a general remark, this looks like an unusual pandas structure. The underlying containers in pandas are numpy arrays, which means pandas is very good at numeric processing and can also store elements of other types. Storing containers in pandas cells is not what it is designed for...
That being said, you can use apply to run a Python function on every element of a pandas Series, in other words on a DataFrame column:
keys_list = ['b','i']
msg_df['filtered_messages'] = msg_df['messages'].apply(lambda x:
    [d for d in x if d['type'] in keys_list])
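A runnable sketch of the same idea, where the function passed to apply filters the list it receives for each row. The first row is the example from the question; the second row and the helper name keep_wanted are invented for illustration:

```python
import pandas as pd

msg_df = pd.DataFrame({'messages': [
    [{'id': 1, 'date': '2018-12-04T16:26:13Z', 'type': 'b'},
     {'id': 2, 'date': '2018-12-11T15:28:49Z', 'type': 'i'},
     {'id': 3, 'date': '2018-12-04T16:26:13Z', 'type': 'c'}],
    [{'id': 4, 'date': '2018-12-12T10:00:00Z', 'type': 'c'}],  # invented row
]})
keys_list = ['b', 'i']

def keep_wanted(messages):
    """Return only the dicts whose 'type' is in keys_list."""
    return [d for d in messages if d['type'] in keys_list]

# apply calls keep_wanted once per row, passing that row's list.
msg_df['filtered_messages'] = msg_df['messages'].apply(keep_wanted)
print(msg_df['filtered_messages'])
```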
I have a list of nested dictionaries in python.
I tried to convert it into a dataframe using:
data=pd.DataFrame(list_of_dicts)
This converts most of the dicts into columns. However there is still the first column which consists of another list of dicts. Data looks like this:
FIS mid LI DE PBT
4182 L234 L3133 2020-02-13T09:50:53Z
The FIS column still contains dictionaries. The first row of FIS looks like this:
[{'FI': [{'TMC': {'PC': 6671, 'DE': 'Pohlheim-Dorf-Güll', 'QD': '+', 'LE': 0.04984}, 'SHP': [], 'CF': [{'TY': 'TR', 'SP': 30.0, 'SU': 30.0, 'FF': 30.0, 'JF': 0.0, 'CN': 0.7}]}
I tried to apply the method described above again on the FIS column, but this doesn't write the dicts into new columns.
So my question is: How can I convert this list of dicts to a dataframe?
I extracted the data from the HERE API (https://developer.here.com/documentation/traffic/dev_guide/topics/examples.html)
Thank you in advance!
It works
list_of_dicts = [{"qwerty":[1,2,3]}, {"lol":["l","o","l"]}]
df = pd.concat([pd.DataFrame(e) for e in list_of_dicts], axis=1)
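For the nested FIS cell from the question, a different technique that may be worth trying is pd.json_normalize with record_path. This is only a sketch: the sample below is a hand-closed copy of the truncated row shown in the question (the closing brackets are an assumption), and fis_cell and flat are illustrative names.

```python
import pandas as pd

# Hand-closed copy of the truncated FIS row from the question (assumed shape).
fis_cell = [{'FI': [{
    'TMC': {'PC': 6671, 'DE': 'Pohlheim-Dorf-Güll', 'QD': '+', 'LE': 0.04984},
    'SHP': [],
    'CF': [{'TY': 'TR', 'SP': 30.0, 'SU': 30.0, 'FF': 30.0, 'JF': 0.0, 'CN': 0.7}],
}]}]

# Each entry of the 'FI' list becomes one row; nested dicts such as 'TMC'
# are flattened into dotted column names, while lists stay as cell values.
flat = pd.json_normalize(fis_cell, record_path='FI')
print(flat.columns.tolist())
```

The 'CF' column still holds a list of dicts after this step, so it would need a second pass of the same kind if its fields are wanted as columns too.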