Extract information from a list of dictionaries - python

I have a list of dictionaries:
movies['genres'].head()
where each line looks like:
0 [{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}]
1 [{'id': 12, 'name': 'Adventure'}, {'id': 14, 'name': 'Fantasy'}, {'id': 10751, 'name': 'Family'}]
2 [{'id': 10749, 'name': 'Romance'}, {'id': 35, 'name': 'Comedy'}]
3 [{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
4 [{'id': 35, 'name': 'Comedy'}]
Name: genres, dtype: object
I would like to save it in a data frame with two columns: 'id', holding the id values, and 'name', holding the name values. I tried:
pd.DataFrame(movies['genres'])
However when I ran it I obtained:
genres
0 [{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}]
1 [{'id': 12, 'name': 'Adventure'}, {'id': 14, 'name': 'Fantasy'}, {'id': 10751, 'name': 'Family'}]
2 [{'id': 10749, 'name': 'Romance'}, {'id': 35, 'name': 'Comedy'}]
Could you help me?
Regards

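Calling pd.DataFrame.from_dict(movies["genres"]) on its own leaves the nested lists as they are, because each cell holds a whole list rather than a single dictionary. Flatten the lists first, then build the data frame from the resulting list of dictionaries (this assumes each cell is already a Python list, not a string):
df = pd.DataFrame([genre for row in movies["genres"] for genre in row])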
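If the cells might be strings rather than real lists (as can happen when the column is read back from a CSV file), a flattening step along these lines may help. This is only a minimal sketch: `flatten_genres` and the `row` key are hypothetical names, and the small `genres` list stands in for `movies['genres']`.

```python
import ast

def flatten_genres(cells):
    """Turn a sequence of lists of dicts into one flat list of dicts.

    Each output dict keeps 'id' and 'name' and records the position of the
    source row under the (hypothetical) key 'row'.
    """
    rows = []
    for row_number, cell in enumerate(cells):
        # A cell read from CSV is a string; parse it into a real list first.
        if isinstance(cell, str):
            cell = ast.literal_eval(cell)
        for genre in cell:
            rows.append({"row": row_number, "id": genre["id"], "name": genre["name"]})
    return rows

# Stand-in for movies['genres']; the second cell is a string on purpose.
genres = [
    [{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}],
    "[{'id': 10749, 'name': 'Romance'}]",
]

flat = flatten_genres(genres)
for record in flat:
    print(record)
```

The flat list of dictionaries can then be passed to pd.DataFrame, which builds one column per key.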

Related

Extract data from JSON format using python

df= [{'id': 16, 'name': 'Animation'},
{'id': 35, 'name': 'Comedy'},
{'id': 10751, 'name': 'Family'}]
How could I extract data from this format?
The output should look like this:
id Name
16 Animation
35 Comedy
10751 Family
Not sure what the actual problem is here, but are you looking for something like this?
data = [{'id': 16, 'name': 'Animation'},
        {'id': 35, 'name': 'Comedy'},
        {'id': 10751, 'name': 'Family'}]
print("Id Name")
for i in data:
    print(i['id'], i['name'])
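If the columns should line up as in the requested output, the same loop can pad the id column to a fixed width. The following is a small sketch under that assumption; `format_table` is a hypothetical helper name, not something from the question.

```python
data = [{'id': 16, 'name': 'Animation'},
        {'id': 35, 'name': 'Comedy'},
        {'id': 10751, 'name': 'Family'}]

def format_table(records):
    """Return the records as text lines with a left-aligned 'id' column."""
    # Width of the widest id, or of the header if that is wider.
    width = max([len("id")] + [len(str(r['id'])) for r in records])
    lines = [f"{'id':<{width}} Name"]
    for r in records:
        lines.append(f"{r['id']:<{width}} {r['name']}")
    return lines

print("\n".join(format_table(data)))
```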

list of dicts- get the number of duplications [closed]

I have a list of dicts (same format) like this :
L = [
{'id': 1, 'name': 'john', 'age': 34},
{'id': 1, 'name': 'john', 'age': 34},
{'id': 2, 'name': 'hanna', 'age': 30},
{'id': 2, 'name': 'hanna', 'age': 30},
{'id': 3, 'name': 'stack', 'age': 40}
]
I want to remove the duplicates and record how many times each entry occurred, like this:
[
{'id': 1, 'name': 'john', 'age': 34, 'duplication': 2},
{'id': 2, 'name': 'hanna', 'age': 30, 'duplication': 2},
{'id': 3, 'name': 'stack', 'age': 40, 'duplication': 1}
]
I already managed to remove the duplicates by using a set, but I can't get the number of duplicates.
My code:
no_duplication = [dict(s) for s in set(frozenset(d.items()) for d in L)]
no_duplication = [
{'id': 1, 'name': 'john', 'age': 34},
{'id': 2, 'name': 'hanna', 'age': 30},
{'id': 3, 'name': 'stack', 'age': 40}
]
Here is a solution you can try, using collections.Counter:
from collections import Counter
print([
    {**dict(k), "duplicated": v}
    for k, v in Counter(frozenset(i.items()) for i in L).items()
])
[{'age': 34, 'duplicated': 2, 'id': 1, 'name': 'john'},
{'age': 30, 'duplicated': 2, 'id': 2, 'name': 'hanna'},
{'age': 40, 'duplicated': 1, 'id': 3, 'name': 'stack'}]
ar = [
{'id': 1, 'name': 'john', 'age': 34},
{'id': 1, 'name': 'john', 'age': 34},
{'id': 2, 'name': 'hanna', 'age': 30},
{'id': 2, 'name': 'hanna', 'age': 30},
{'id': 3, 'name': 'stack', 'age': 40}
]
br = []
cnt = []
for i in ar:
    if i not in br:
        br.append(i)
        cnt.append(1)
    else:
        cnt[br.index(i)] += 1
for i in range(len(br)):
    br[i]['duplication'] = cnt[i]
The desired output is contained in br as:
[
{'id': 1, 'name': 'john', 'age': 34, 'duplication': 2},
{'id': 2, 'name': 'hanna', 'age': 30, 'duplication': 2},
{'id': 3, 'name': 'stack', 'age': 40, 'duplication': 1}
]
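Both approaches above can be combined into a variant that keeps the original key order and leaves the input dictionaries untouched. This is a sketch, assuming all dictionaries list their keys in the same order and hold only hashable values; `count_duplicates` and its `count_key` parameter are hypothetical names.

```python
from collections import Counter

def count_duplicates(records, count_key="duplication"):
    """Collapse identical dicts and attach how often each one occurred."""
    # tuple(d.items()) is hashable and keeps the key order of each dict.
    counts = Counter(tuple(d.items()) for d in records)
    # Counter remembers first-seen order (Python 3.7+), so results follow the input.
    return [{**dict(items), count_key: n} for items, n in counts.items()]

L = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
    {'id': 2, 'name': 'hanna', 'age': 30},
    {'id': 3, 'name': 'stack', 'age': 40},
]

result = count_duplicates(L)
for record in result:
    print(record)
```

Because new dictionaries are built for the result, the entries of `L` do not gain a 'duplication' key, unlike in the loop-based version, which modifies the dictionaries it was given.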

How do I stack back the data according to their row? Python

I have a large data set of about 30,000 records. I would like to extract words such as "Animation", "Comedy" and "Family". I managed to extract the words and delete the id, but I do not know how to group the words back according to their original row.
My code currently:
import ast, json
import pandas as pd
from csv import reader
file_name = 'xx.csv'
data = []
with open(file_name, 'r', encoding='unicode_escape') as read_obj:
    csv_reader = reader(read_obj)
    headings = next(csv_reader)
    for i in csv_reader:
        data.extend(ast.literal_eval(i[7]))
df = pd.DataFrame(data)
del df["id"]
print(df)
And it would produce result:
name
0 Animation
1 Comedy
2 Family
3 Adventure
4 Fantasy
...
40060 Drama
40061 Thriller
40062 Action
40063 Drama
40064 Thriller
The data set is in CSV format, but each cell in this column holds a JSON-like list of dictionaries.
Sample data:
[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}]
[{'id': 12, 'name': 'Adventure'}, {'id': 14, 'name': 'Fantasy'}, {'id': 10751, 'name': 'Family'}]
[{'id': 10749, 'name': 'Romance'}, {'id': 35, 'name': 'Comedy'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 28, 'name': 'Action'}, {'id': 80, 'name': 'Crime'}, {'id': 18, 'name': 'Drama'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 28, 'name': 'Action'}, {'id': 80, 'name': 'Crime'}, {'id': 18, 'name': 'Drama'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 28, 'name': 'Action'}, {'id': 80, 'name': 'Crime'}, {'id': 18, 'name': 'Drama'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}, {'id': 18, 'name': 'Drama'}, {'id': 10751, 'name': 'Family'}]
I think this does everything you need (it assumes the CSV has a column named 'name'):
import json
import pandas as pd
file_name = 'xx.csv'
df = pd.read_csv(file_name, encoding='unicode_escape', usecols=['name'])
result = df.to_json(orient='records')
parsed = json.loads(result)
print(json.dumps(parsed, indent=4))
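To keep the names grouped by their original row, one option is to append one list of names per CSV row instead of extending a single flat list. The sketch below follows the question's own reader loop; `names_per_row` and its `column` parameter are hypothetical, and the in-memory `sample` text stands in for the real file, with the genre column at index 1 rather than 7.

```python
import ast
import csv
import io

def names_per_row(csv_file, column=7):
    """Return one list of genre names for each data row of the CSV."""
    csv_reader = csv.reader(csv_file)
    next(csv_reader)  # skip the heading row
    grouped = []
    for row in csv_reader:
        genres = ast.literal_eval(row[column])  # string -> list of dicts
        grouped.append([g['name'] for g in genres])
    return grouped

# Tiny stand-in for the real file: two columns, genres in column 1.
sample = io.StringIO(
    """title,genres
A,"[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}]"
B,"[{'id': 35, 'name': 'Comedy'}]"
"""
)

grouped = names_per_row(sample, column=1)
print(grouped)
```

Each inner list corresponds to one row, so it can be stored as a single cell or joined into one string with ', '.join(...).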

How to get values from list of dictionaries?

This is my data set, this is the column I separated from the csv file.
0 [{'id': 16, 'name': 'Animation'}, {'id': 35, '...
1 [{'id': 12, 'name': 'Adventure'}, {'id': 14, '...
2 [{'id': 10749, 'name': 'Romance'}, {'id': 35, ...
3 [{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...
4 [{'id': 35, 'name': 'Comedy'}]
How to get just a list with the content ['Animation', 'Adventure', 'Romance', 'Comedy', 'Comedy'] as output?
I guess you want something like this:
list_of_items = [[{'id': 16, 'name': 'Animation'}, {'id': 16, 'name': 'Animation2'}],[{'id': 16, 'name': 'Animation3'}, {'id': 16, 'name': 'Animation4'}]]
output_list = []
for item in list_of_items:
    for entry in item:  # avoid naming the loop variable dict, which shadows the built-in
        output_list.append(entry['name'])
Output:
>>> print(output_list)
['Animation', 'Animation2', 'Animation3', 'Animation4']
I don't know if you made a typo but you have some errors with the ' in what you wrote.
But nevertheless from what I can see you have a list with dictionaries. So we loop through that list to access each dictionary and select what in the dictionary we want and append it to the list you created:
d = [{'id': 10749, 'name': 'Romance'}, {'id': 35, 'name': 'Comedy'}]
list_1 = []
for el in d:
    list_1.append(el['name'])
print(list_1)
The output will be: ['Romance', 'Comedy']
It's unclear if you have a list of lists or just one list.
For a single list you can use a list comprehension:
dict_list = [{'id': 10749, 'name': 'Romance'}, {'id': 35, 'name': 'Comedy'}]
[dict_item['name'] for dict_item in dict_list]
Otherwise, you can flatten the nested list directly in a single list comprehension:
dict_list = [[{'id': 1, 'name': 'Animation'}, {'id': 2, 'name': 'Comedy'}],[{'id': 3, 'name': 'Romance'}, {'id': 4, 'name': 'Comedy'}]]
[dict_item['name'] for sublist in dict_list for dict_item in sublist]
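The output requested in the question, ['Animation', 'Adventure', 'Romance', 'Comedy', 'Comedy'], looks like the first name of each row rather than all names. If that reading is right, a sketch like the following would do it; `rows` is shortened sample data standing in for the real column.

```python
# Shortened stand-in for the column: one list of dicts per row.
rows = [
    [{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}],
    [{'id': 12, 'name': 'Adventure'}, {'id': 14, 'name': 'Fantasy'}],
    [{'id': 10749, 'name': 'Romance'}, {'id': 35, 'name': 'Comedy'}],
    [{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}],
    [{'id': 35, 'name': 'Comedy'}],
]

# Take the first dictionary of each row; rows that are empty lists are skipped.
first_names = [row[0]['name'] for row in rows if row]
print(first_names)
```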

How to order list of dictionaries in python

I have list of dictionaries as follows:
[
{'id': 16419, 'name': 'Audi'},
{'id': 13, 'name': 'BMW'},
{'id': 31, 'name': 'Honda'},
{'id': 50060, 'name': 'KTM'},
{'id': 54, 'name': 'Opel'},
{'id': 55, 'name': 'Peugeot'},
{'id': 50083, 'name': 'PGO'},
{'id': 16350, 'name': 'Skoda'},
{'id': 68, 'name': 'Suzuki'},
{'id': 2120, 'name': 'Triumph'},
{'id': 16328, 'name': 'Others'},
{'id': 16396, 'name': 'Seat'},
{'id': 14979, 'name': 'Opel'},
{'id': 6, 'name': 'Volkswagen'}
]
I want to order it so that dictionaries with certain name values appear at the beginning of the list.
For example, Volkswagen, Audi, BMW, Opel and Peugeot should come first, in that order.
Thus the wanted result should be something like this:
[
{'id': 6, 'name': 'Volkswagen'},
{'id': 16419, 'name': 'Audi'},
{'id': 13, 'name': 'BMW'},
{'id': 54, 'name': 'Opel'},
{'id': 55, 'name': 'Peugeot'},
{'id': 31, 'name': 'Honda'},
{'id': 50060, 'name': 'KTM'},
{'id': 50083, 'name': 'PGO'},
{'id': 16350, 'name': 'Skoda'},
{'id': 68, 'name': 'Suzuki'},
{'id': 2120, 'name': 'Triumph'},
{'id': 16328, 'name': 'Others'},
{'id': 16396, 'name': 'Seat'},
{'id': 14979, 'name': 'Opel'},
]
Any idea how to do that?
You can use an appropriate key function for your sorting. This one orders by the given names first (in the given order). All other brands come after that, in their original relative order, because sorted() is stable:
>>> rank = {x: i for i, x in enumerate(['Volkswagen', 'Audi', 'BMW', 'Opel', 'Peugeot'])}
# {'Volkswagen': 0, 'Audi': 1, ...}
>>> sorted(lst, key=lambda x: rank.get(x['name'], len(rank)))
[{'id': 6, 'name': 'Volkswagen'},
{'id': 16419, 'name': 'Audi'},
{'id': 13, 'name': 'BMW'},
{'id': 54, 'name': 'Opel'},
{'id': 14979, 'name': 'Opel'},
{'id': 55, 'name': 'Peugeot'},
{'id': 31, 'name': 'Honda'},
{'id': 50060, 'name': 'KTM'},
{'id': 50083, 'name': 'PGO'},
{'id': 16350, 'name': 'Skoda'},
{'id': 68, 'name': 'Suzuki'},
{'id': 2120, 'name': 'Triumph'},
{'id': 16328, 'name': 'Others'},
{'id': 16396, 'name': 'Seat'}]
You can use a dictionary to define a custom sorting order.
dicts = [
{'id': 16419, 'name': 'Audi'},
{'id': 13, 'name': 'BMW'},
{'id': 31, 'name': 'Honda'},
{'id': 50060, 'name': 'KTM'},
{'id': 54, 'name': 'Opel'},
{'id': 55, 'name': 'Peugeot'},
{'id': 50083, 'name': 'PGO'},
{'id': 16350, 'name': 'Skoda'},
{'id': 68, 'name': 'Suzuki'},
{'id': 2120, 'name': 'Triumph'},
{'id': 16328, 'name': 'Others'},
{'id': 16396, 'name': 'Seat'},
{'id': 14979, 'name': 'Opel'},
{'id': 6, 'name': 'Volkswagen'}
]
brand_order = ['Volkswagen', 'Audi', 'BMW', 'Opel', 'Peugeot']
order = dict(zip(brand_order, range(len(brand_order))))
dicts_sorted = sorted(dicts, key=lambda d: order.get(d['name'], float('inf')))
print(dicts_sorted)
Output:
[{'id': 6, 'name': 'Volkswagen'},
{'id': 16419, 'name': 'Audi'},
{'id': 13, 'name': 'BMW'},
{'id': 54, 'name': 'Opel'},
{'id': 14979, 'name': 'Opel'},
{'id': 55, 'name': 'Peugeot'},
{'id': 31, 'name': 'Honda'},
{'id': 50060, 'name': 'KTM'},
{'id': 50083, 'name': 'PGO'},
{'id': 16350, 'name': 'Skoda'},
{'id': 68, 'name': 'Suzuki'},
{'id': 2120, 'name': 'Triumph'},
{'id': 16328, 'name': 'Others'},
{'id': 16396, 'name': 'Seat'}]
Falling back to float('inf') ensures that whatever is not in order comes last.
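If the remaining brands should also follow a defined order instead of keeping their input order, the key can return a tuple, so that ties on the rank are broken by a second criterion. The sketch below sorts the unranked brands alphabetically; that tie-breaker is an assumption and was not asked for in the question, and `sort_key` is a hypothetical name.

```python
dicts = [
    {'id': 16419, 'name': 'Audi'},
    {'id': 13, 'name': 'BMW'},
    {'id': 31, 'name': 'Honda'},
    {'id': 50060, 'name': 'KTM'},
    {'id': 54, 'name': 'Opel'},
    {'id': 55, 'name': 'Peugeot'},
    {'id': 50083, 'name': 'PGO'},
    {'id': 16350, 'name': 'Skoda'},
    {'id': 68, 'name': 'Suzuki'},
    {'id': 2120, 'name': 'Triumph'},
    {'id': 16328, 'name': 'Others'},
    {'id': 16396, 'name': 'Seat'},
    {'id': 14979, 'name': 'Opel'},
    {'id': 6, 'name': 'Volkswagen'},
]

brand_order = ['Volkswagen', 'Audi', 'BMW', 'Opel', 'Peugeot']
order = {name: i for i, name in enumerate(brand_order)}

def sort_key(d):
    # First element: position in brand_order, or len(order) for all other brands.
    # Second element: the name itself, which orders the other brands alphabetically.
    return (order.get(d['name'], len(order)), d['name'])

dicts_sorted = sorted(dicts, key=sort_key)
print([d['name'] for d in dicts_sorted])
```

The two Opel entries share the same key, so the stable sort keeps them in their input order.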
