Extract data from JSON format using Python

df= [{'id': 16, 'name': 'Animation'},
{'id': 35, 'name': 'Comedy'},
{'id': 10751, 'name': 'Family'}]
How could I extract data from this format?
The output should look like this:
id Name
16 Animation
35 Comedy
10751 Family

I'm not sure what the actual problem is, but are you looking for something like this?
data = [{'id': 16, 'name': 'Animation'},
{'id': 35, 'name': 'Comedy'},
{'id': 10751, 'name': 'Family'}]
print("Id Name")
for i in data:
    print(i['id'], i['name'])
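If the columns should line up as in the requested output, one possible variation is to pad the id column to a fixed width. This is only a sketch; the names id_width and lines are made up for the example.

```python
# Sample rows from the question.
data = [{'id': 16, 'name': 'Animation'},
        {'id': 35, 'name': 'Comedy'},
        {'id': 10751, 'name': 'Family'}]

# Width of the id column: the longest id, or the header if that is longer.
id_width = max(len('id'), *(len(str(row['id'])) for row in data))

# Build the header and one left-aligned line per row.
lines = [f"{'id':<{id_width}} Name"]
for row in data:
    lines.append(f"{row['id']:<{id_width}} {row['name']}")

print("\n".join(lines))
```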

Related

How do I stack back the data according to their row? Python

I have a large data set of about 30,000 records. I would like to extract words like "Animation", "Comedy", and "Family". I managed to extract the words and delete the id, but I do not know how to group the words back by the row they came from.
My code currently:
import ast, json
import pandas as pd
from csv import reader
file_name = 'xx.csv'
data = []
with open(file_name, 'r', encoding= 'unicode_escape') as read_obj:
    csv_reader = reader(read_obj)
    headings = next(csv_reader)
    for i in csv_reader:
        data.extend(ast.literal_eval(i[7]))
df = pd.DataFrame(data)
del df["id"]
print(df)
It produces this result:
name
0 Animation
1 Comedy
2 Family
3 Adventure
4 Fantasy
...
40060 Drama
40061 Thriller
40062 Action
40063 Drama
40064 Thriller
The large data set is in CSV format, but the cells in this column hold JSON-like lists of dictionaries.
Sample data:
[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}]
[{'id': 12, 'name': 'Adventure'}, {'id': 14, 'name': 'Fantasy'}, {'id': 10751, 'name': 'Family'}]
[{'id': 10749, 'name': 'Romance'}, {'id': 35, 'name': 'Comedy'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 35, 'name': 'Comedy'}]
[{'id': 28, 'name': 'Action'}, {'id': 80, 'name': 'Crime'}, {'id': 18, 'name': 'Drama'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 28, 'name': 'Action'}, {'id': 80, 'name': 'Crime'}, {'id': 18, 'name': 'Drama'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 28, 'name': 'Action'}, {'id': 80, 'name': 'Crime'}, {'id': 18, 'name': 'Drama'}, {'id': 53, 'name': 'Thriller'}]
[{'id': 35, 'name': 'Comedy'}, {'id': 10749, 'name': 'Romance'}]
[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}, {'id': 18, 'name': 'Drama'}, {'id': 10751, 'name': 'Family'}]
I think this does everything you need:
import json
import pandas as pd
df = pd.read_csv(file_name, encoding='unicode_escape', usecols=['name'])
result = df.to_json(orient='records')
parsed = json.loads(result)
print(json.dumps(parsed, indent=4))
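If the goal is to keep the names grouped by the row they came from, a minimal sketch is to parse each cell and append one list per row instead of extending a single flat list. The cells list and the names_per_row helper are hypothetical stand-ins for the values read from column 7 of the CSV.

```python
import ast

# Hypothetical cell values, as they would be read from the genres column.
cells = [
    "[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}]",
    "[{'id': 12, 'name': 'Adventure'}, {'id': 14, 'name': 'Fantasy'}, {'id': 10751, 'name': 'Family'}]",
    "[{'id': 10749, 'name': 'Romance'}, {'id': 35, 'name': 'Comedy'}]",
]

def names_per_row(cell_strings):
    """Return one list of genre names per input row."""
    rows = []
    for cell in cell_strings:
        genres = ast.literal_eval(cell)  # parse the Python-literal string
        # append (not extend) keeps each row's names together
        rows.append([genre['name'] for genre in genres])
    return rows

rows = names_per_row(cells)
for row in rows:
    print(row)
```

In the original loop, this would correspond to collecting ast.literal_eval(i[7]) row by row and calling data.append rather than data.extend.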

Extract information from a list of dictionaries

I have a list of dictionaries:
movies['genres'].head()
where each line looks like:
0 [{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}]
1 [{'id': 12, 'name': 'Adventure'}, {'id': 14, 'name': 'Fantasy'}, {'id': 10751, 'name': 'Family'}]
2 [{'id': 10749, 'name': 'Romance'}, {'id': 35, 'name': 'Comedy'}]
3 [{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
4 [{'id': 35, 'name': 'Comedy'}]
Name: genres, dtype: object
I would like to store it in a data frame with one column 'id' holding the id values and another column 'name' holding the name values. I tried:
pd.DataFrame(movies['genres'])
However, when I ran it I got:
genres
0 [{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}]
1 [{'id': 12, 'name': 'Adventure'}, {'id': 14, 'name': 'Fantasy'}, {'id': 10751, 'name': 'Family'}]
2 [{'id': 10749, 'name': 'Romance'}, {'id': 35, 'name': 'Comedy'}]
Could you help me?
Regards
You should use the .from_dict() constructor:
df = pd.DataFrame.from_dict(movies["genres"])
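As an alternative sketch, assuming each entry of the column is already a list of dictionaries (not a string), the nested lists can be flattened into one record per genre before building a data frame. The genres_column list below is a hypothetical stand-in for movies['genres'], and the 'row' key is an added assumption to remember where each record came from.

```python
# Hypothetical stand-in for movies['genres']: one list of dicts per movie.
genres_column = [
    [{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}],
    [{'id': 12, 'name': 'Adventure'}, {'id': 14, 'name': 'Fantasy'}, {'id': 10751, 'name': 'Family'}],
    [{'id': 10749, 'name': 'Romance'}, {'id': 35, 'name': 'Comedy'}],
]

# Flatten into one record per genre entry, remembering the source row.
records = [
    {'row': row_index, 'id': genre['id'], 'name': genre['name']}
    for row_index, genres in enumerate(genres_column)
    for genre in genres
]

for record in records[:4]:
    print(record)
```

Passing records to pd.DataFrame should then give a frame with 'row', 'id', and 'name' columns.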

How to get values from list of dictionaries?

This is my data set, this is the column I separated from the csv file.
0 [{'id': 16, 'name': 'Animation'}, {'id': 35, '...
1 [{'id': 12, 'name': 'Adventure'}, {'id': 14, '...
2 [{'id': 10749, 'name': 'Romance'}, {'id': 35, ...
3 [{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...
4 [{'id': 35, 'name': 'Comedy'}]
How to get just a list with the content ['Animation', 'Adventure', 'Romance', 'Comedy', 'Comedy'] as output?
I guess you want something like this:
list_of_items = [[{'id': 16, 'name': 'Animation'}, {'id': 16, 'name': 'Animation2'}],[{'id': 16, 'name': 'Animation3'}, {'id': 16, 'name': 'Animation4'}]]
output_list = []
for item in list_of_items:
    for entry in item:  # avoid naming the variable dict, which shadows the built-in
        output_list.append(entry['name'])
Output:
>>> print(output_list)
['Animation', 'Animation2', 'Animation3', 'Animation4']
I don't know if you made a typo, but there are some errors with the ' characters in what you wrote.
Nevertheless, from what I can see you have a list of dictionaries. So we loop through that list to access each dictionary, pick the value we want from it, and append it to a new list:
d = [{'id': 10749, 'name': 'Romance'}, {'id': 35, 'name': 'Comedy'}]
list_1 = []
for el in d:
    list_1.append(el['name'])
print(list_1)
The output will be: ['Romance', 'Comedy']
It's unclear if you have a list of lists or just one list.
For a single list you can use a list comprehension:
dict_list = [{'id': 10749, 'name': 'Romance'}, {'id': 35, 'name': 'Comedy'}]
[dict_item['name'] for dict_item in dict_list]
Otherwise, you can unnest the outer list within the same list comprehension:
dict_list = [[{'id': 1, 'name': 'Animation'}, {'id': 2, 'name': 'Comedy'}],[{'id': 3, 'name': 'Romance'}, {'id': 4, 'name': 'Comedy'}]]
[dict_item['name'] for sublist in dict_list for dict_item in sublist]
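The output requested in the question appears to contain only the first name of each row. A minimal sketch for that reading follows; the rows list uses illustrative sample data, since the column shown in the question is truncated.

```python
# Illustrative sample rows: one list of dicts per row.
rows = [
    [{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}],
    [{'id': 12, 'name': 'Adventure'}, {'id': 14, 'name': 'Fantasy'}],
    [{'id': 10749, 'name': 'Romance'}, {'id': 35, 'name': 'Comedy'}],
    [{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}],
    [{'id': 35, 'name': 'Comedy'}],
]

# Take the name of the first dictionary in each row; skip empty rows.
first_names = [row[0]['name'] for row in rows if row]
print(first_names)
```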

creating dataframe from csv file having lists as entries in one of the columns

I have a csv file which looks like this -
id genres
1 [{'id': 35, 'name': 'Comedy'}]
2 [{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 10751, 'name': 'Family'}, {'id': 10749, 'name': 'Romance'}]
3 [1,2,3]
4 [{'id':31, 'name':'Comedy'}]
When I import the csv as dataframe, the lists in genres column are loaded as strings. For example - "[{'id': 35, 'name': 'Comedy'}]"
How do I load the lists without the quotes?
Use:
import ast, json
df['genres'] = df['genres'].apply(ast.literal_eval)
Or, if the cells hold valid JSON (double-quoted strings, unlike the single-quoted sample above):
df['genres'] = df['genres'].apply(json.loads)
Also using strip()+split(), which yields a list of string fragments rather than parsed dictionaries:
df['genres']= [x.strip("[]").split(',') for x in df['genres']]
or,
df['genres']= df['genres'].apply(lambda x: x.strip("[]").split(','))
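To illustrate the difference between the two parsers on a single cell, here is a small sketch using one of the sample strings. The variable names are made up for the example.

```python
import ast
import json

# One cell as it is loaded from the CSV: a string with single-quoted keys.
cell = "[{'id': 35, 'name': 'Comedy'}]"

# ast.literal_eval accepts Python literals, including single-quoted strings.
parsed = ast.literal_eval(cell)
print(parsed[0]['name'])

# json.loads expects double-quoted strings, so this cell is rejected.
try:
    json.loads(cell)
    json_ok = True
except json.JSONDecodeError:
    json_ok = False
print(json_ok)
```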

How to order list of dictionaries in python

I have list of dictionaries as follows:
[
{'id': 16419, 'name': 'Audi'},
{'id': 13, 'name': 'BMW'},
{'id': 31, 'name': 'Honda'},
{'id': 50060, 'name': 'KTM'},
{'id': 54, 'name': 'Opel'},
{'id': 55, 'name': 'Peugeot'},
{'id': 50083, 'name': 'PGO'},
{'id': 16350, 'name': 'Skoda'},
{'id': 68, 'name': 'Suzuki'},
{'id': 2120, 'name': 'Triumph'},
{'id': 16328, 'name': 'Others'},
{'id': 16396, 'name': 'Seat'},
{'id': 14979, 'name': 'Opel'},
{'id': 6, 'name': 'Volkswagen'}
]
I want to order it so that dictionaries with certain name values appear at the beginning of the list.
For example, Volkswagen, Audi, BMW, Opel, and Peugeot should come first.
The wanted result should look something like this:
[
{'id': 6, 'name': 'Volkswagen'},
{'id': 16419, 'name': 'Audi'},
{'id': 13, 'name': 'BMW'},
{'id': 54, 'name': 'Opel'},
{'id': 55, 'name': 'Peugeot'},
{'id': 31, 'name': 'Honda'},
{'id': 50060, 'name': 'KTM'},
{'id': 50083, 'name': 'PGO'},
{'id': 16350, 'name': 'Skoda'},
{'id': 68, 'name': 'Suzuki'},
{'id': 2120, 'name': 'Triumph'},
{'id': 16328, 'name': 'Others'},
{'id': 16396, 'name': 'Seat'},
{'id': 14979, 'name': 'Opel'},
]
Any idea how to do that?
You can use an appropriate key function for your sorting. This one orders by the given names first (in the given order). All other brands come after that and, because sorted() is stable, keep their original relative order:
>>> rank = {x: i for i, x in enumerate(['Volkswagen', 'Audi', 'BMW', 'Opel', 'Peugeot'])}
# {'Volkswagen': 0, 'Audi': 1, ...}
>>> sorted(lst, key=lambda x: rank.get(x['name'], len(rank)))
[{'id': 6, 'name': 'Volkswagen'},
{'id': 16419, 'name': 'Audi'},
{'id': 13, 'name': 'BMW'},
{'id': 54, 'name': 'Opel'},
{'id': 14979, 'name': 'Opel'},
{'id': 55, 'name': 'Peugeot'},
{'id': 31, 'name': 'Honda'},
{'id': 50060, 'name': 'KTM'},
{'id': 50083, 'name': 'PGO'},
{'id': 16350, 'name': 'Skoda'},
{'id': 68, 'name': 'Suzuki'},
{'id': 2120, 'name': 'Triumph'},
{'id': 16328, 'name': 'Others'},
{'id': 16396, 'name': 'Seat'}]
You can use a dictionary to define a custom sorting order.
dicts = [
{'id': 16419, 'name': 'Audi'},
{'id': 13, 'name': 'BMW'},
{'id': 31, 'name': 'Honda'},
{'id': 50060, 'name': 'KTM'},
{'id': 54, 'name': 'Opel'},
{'id': 55, 'name': 'Peugeot'},
{'id': 50083, 'name': 'PGO'},
{'id': 16350, 'name': 'Skoda'},
{'id': 68, 'name': 'Suzuki'},
{'id': 2120, 'name': 'Triumph'},
{'id': 16328, 'name': 'Others'},
{'id': 16396, 'name': 'Seat'},
{'id': 14979, 'name': 'Opel'},
{'id': 6, 'name': 'Volkswagen'}
]
brand_order = ['Volkswagen', 'Audi', 'BMW', 'Opel', 'Peugeot']
order = dict(zip(brand_order, range(len(brand_order))))
dicts_sorted = sorted(dicts, key=lambda d: order.get(d['name'], float('inf')))
print(dicts_sorted)
Output:
[{'id': 6, 'name': 'Volkswagen'},
{'id': 16419, 'name': 'Audi'},
{'id': 13, 'name': 'BMW'},
{'id': 54, 'name': 'Opel'},
{'id': 14979, 'name': 'Opel'},
{'id': 55, 'name': 'Peugeot'},
{'id': 31, 'name': 'Honda'},
{'id': 50060, 'name': 'KTM'},
{'id': 50083, 'name': 'PGO'},
{'id': 16350, 'name': 'Skoda'},
{'id': 68, 'name': 'Suzuki'},
{'id': 2120, 'name': 'Triumph'},
{'id': 16328, 'name': 'Others'},
{'id': 16396, 'name': 'Seat'}]
Falling back to float('inf') ensures that whatever is not in order comes last.
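If the remaining brands should also be ordered, for example alphabetically, one possible extension is a tuple key whose second element breaks ties. This is a sketch on a shortened sample list; sort_key is a hypothetical helper name.

```python
# Shortened sample of the list from the question.
brands = [
    {'id': 31, 'name': 'Honda'},
    {'id': 54, 'name': 'Opel'},
    {'id': 16396, 'name': 'Seat'},
    {'id': 16419, 'name': 'Audi'},
    {'id': 50083, 'name': 'PGO'},
    {'id': 6, 'name': 'Volkswagen'},
    {'id': 16328, 'name': 'Others'},
]
preferred = ['Volkswagen', 'Audi', 'BMW', 'Opel', 'Peugeot']
rank = {name: position for position, name in enumerate(preferred)}

def sort_key(entry):
    # First element: position in the preferred list, or len(rank) otherwise.
    # Second element: the name, which breaks ties alphabetically.
    return (rank.get(entry['name'], len(rank)), entry['name'])

ordered = sorted(brands, key=sort_key)
print([entry['name'] for entry in ordered])
```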
