In a large pandas DataFrame, I have three columns (fruit, vegetable, and first_name). The value in each row of these columns is a list.
From those lists, I want to create one new column holding a list of dictionaries for each row of the DataFrame.
First row of my dataframe:
import pandas as pd
df = pd.DataFrame({
    "fruit": [["Apple", "Banana", "Pear", "Grape", "Pineapple"]],
    "vegetable": [["Celery", "Onion", "Potato", "Broccoli", "Sprouts"]],
    "first_name": [["Sam", "Beth", "John", "Daisy", "Jane"]]
})
How do I transform the three columns to one column and have the value look like this instead?
[
{"fruit": "Apple", "vegetable":"Celery", "first_name":"Sam"},
{"fruit": "Banana", "vegetable":"Onion", "first_name":"Beth"},
{"fruit": "Pear", "vegetable":"Potato", "first_name":"John"},
{"fruit": "Grape", "vegetable":"Broccoli", "first_name":"Daisy"},
{"fruit": "Pineapple", "vegetable":"Sprouts", "first_name":"Jane"}
]
IIUC you can do it with (1) .explode() and (2) .to_dict()
df.apply(pd.Series.explode).to_dict(orient='records')
#output:
[{'fruit': 'Apple', 'vegetable': 'Celery', 'first_name': 'Sam'},
{'fruit': 'Banana', 'vegetable': 'Onion', 'first_name': 'Beth'},
{'fruit': 'Pear', 'vegetable': 'Potato', 'first_name': 'John'},
{'fruit': 'Grape', 'vegetable': 'Broccoli', 'first_name': 'Daisy'},
{'fruit': 'Pineapple', 'vegetable': 'Sprouts', 'first_name': 'Jane'}]
You can also create the exploded DataFrame using to_dict and then calling pd.DataFrame. It will be a bit faster for smaller lists, but is essentially the same once you have 10,000+ items.
pd.DataFrame(df.iloc[0].to_dict()).to_dict('records')
[{'fruit': 'Apple', 'vegetable': 'Celery', 'first_name': 'Sam'},
{'fruit': 'Banana', 'vegetable': 'Onion', 'first_name': 'Beth'},
{'fruit': 'Pear', 'vegetable': 'Potato', 'first_name': 'John'},
{'fruit': 'Grape', 'vegetable': 'Broccoli', 'first_name': 'Daisy'},
{'fruit': 'Pineapple', 'vegetable': 'Sprouts', 'first_name': 'Jane'}]
The main task is to flatten the nested list held in each column. A rather manual implementation is:
columns = ["fruit", "vegetable", "first_name"]
flat_lists = []
for col in columns:
    # each cell holds a list, so flatten the whole column into one list
    flat_list = [item for sublist in df[col] for item in sublist]
    flat_lists.append(flat_list)
list_of_dic = []
for i in range(len(flat_lists[0])):
    dic = {}
    dic["fruit"] = flat_lists[0][i]
    dic["vegetable"] = flat_lists[1][i]
    dic["first_name"] = flat_lists[2][i]
    list_of_dic.append(dic)
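The answers above convert the single example row. Since the question asks for a list of dictionaries per row, here is a minimal sketch of that per-row step in plain Python. It assumes the lists within a row all have the same length, and the helper name records_from_lists is hypothetical, not part of pandas:

```python
def records_from_lists(row):
    """Turn {"col": [v1, v2], ...} into [{"col": v1, ...}, {"col": v2, ...}]."""
    keys = list(row)
    # zip(*...) walks the parallel lists in lockstep, one tuple per position
    return [dict(zip(keys, values)) for values in zip(*(row[k] for k in keys))]


row = {
    "fruit": ["Apple", "Banana"],
    "vegetable": ["Celery", "Onion"],
    "first_name": ["Sam", "Beth"],
}
records = records_from_lists(row)
print(records)
```

With a DataFrame, one option would be df["records"] = [records_from_lists(r) for r in df.to_dict("records")], because each element of df.to_dict("records") is a dict mapping column names to that row's lists. Note that zip stops at the shortest list, so uneven lists would be truncated silently.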
Related
I have two lists.
List A :
A = ["apple","cherry","pear","mango","banana","grape","kiwi","orange","pineapple"]
List B :
B = [{"offset":0, "xx":789},{"offset":3, "xx":921},{"offset":6, "xx":89}]
The idea is to use the offset from each item in B as an index offset for setting the xx values in our results array.
For instance, this would be the expected result:
C=[
{"fruit":"apple","xx":789},
{"fruit":"cherry","xx":789},
{"fruit":"pear","xx":789},
{"fruit":"mango","xx":921},
{"fruit":"banana","xx":921},
{"fruit":"grape","xx":921},
{"fruit":"kiwi","xx":89},
{"fruit":"orange","xx":89},
{"fruit":"pineapple","xx":89},
]
For example, B[0] has an "offset" of 0. This means that the items of C with index >= 0 will have an "xx" value of B[0]['xx']. Then B[1]['offset'] of 3 sets a new "xx" value for the C items with index >= 3, and so on.
I am able to achieve a similar result using pandas DataFrames. But since the pandas library is quite heavy, I have been asked to do it without pandas.
What about using a simple loop?
# rework B in a better format
dic = {d['offset']:d['xx'] for d in B}
# {0: 789, 3: 921, 6: 89}
C = []
v = None
for i, a in enumerate(A):
    v = dic.get(i, v) # if we reached a threshold, update the value
    C.append({'fruit':a, 'xx': v})
print(C)
Output:
[{'fruit': 'apple', 'xx': 789},
{'fruit': 'cherry', 'xx': 789},
{'fruit': 'pear', 'xx': 789},
{'fruit': 'mango', 'xx': 921},
{'fruit': 'banana', 'xx': 921},
{'fruit': 'grape', 'xx': 921},
{'fruit': 'kiwi', 'xx': 89},
{'fruit': 'orange', 'xx': 89},
{'fruit': 'pineapple', 'xx': 89}]
If the structure of B is required to be this way, you can do this:
A = ["apple","cherry","pear","mango","banana","grape","kiwi","orange","pineapple"]
B = [{"offset":0, "xx":789},{"offset":3, "xx":921},{"offset":6, "xx":89}]
C = []
B_iter = 0
for i, fruit in enumerate(A):
    # move to the next entry of B when one exists and its offset starts at this index
    if B_iter + 1 < len(B) and B[B_iter + 1]["offset"] == i:
        B_iter += 1
    C.append({"fruit": fruit, "xx": B[B_iter]["xx"]})
print(C)
Output:
[{'fruit': 'apple', 'xx': 789},
{'fruit': 'cherry', 'xx': 789},
{'fruit': 'pear', 'xx': 789},
{'fruit': 'mango', 'xx': 921},
{'fruit': 'banana', 'xx': 921},
{'fruit': 'grape', 'xx': 921},
{'fruit': 'kiwi', 'xx': 89},
{'fruit': 'orange', 'xx': 89},
{'fruit': 'pineapple', 'xx': 89}]
If the offsets are always consecutive multiples of 3 (0, 3, 6, ...), you can simply use integer division to map each index to its entry in B.
A = ["apple","cherry","pear","mango","banana","grape","kiwi","orange","pineapple"]
B = [{"offset": 0, "xx": 789}, {"offset": 3, "xx": 921}, {"offset": 6, "xx": 89}]
C = [{"fruit": fruit, "xx": B[idx // 3]["xx"]} for idx, fruit in enumerate(A)]
Output:
[{'fruit': 'apple', 'xx': 789},
{'fruit': 'cherry', 'xx': 789},
{'fruit': 'pear', 'xx': 789},
{'fruit': 'mango', 'xx': 921},
{'fruit': 'banana', 'xx': 921},
{'fruit': 'grape', 'xx': 921},
{'fruit': 'kiwi', 'xx': 89},
{'fruit': 'orange', 'xx': 89},
{'fruit': 'pineapple', 'xx': 89}]
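When the offsets are not evenly spaced, the same index-to-entry mapping can be done with a binary search. This is a minimal sketch using the standard-library bisect module; it assumes B is sorted by "offset" and that the first offset is 0:

```python
from bisect import bisect_right

A = ["apple", "cherry", "pear", "mango", "banana", "grape", "kiwi", "orange", "pineapple"]
B = [{"offset": 0, "xx": 789}, {"offset": 3, "xx": 921}, {"offset": 6, "xx": 89}]

# Assumption: B is sorted by "offset" and its first offset is 0.
offsets = [d["offset"] for d in B]

C = []
for i, fruit in enumerate(A):
    # bisect_right counts how many offsets are <= i; the last of those applies
    segment = bisect_right(offsets, i) - 1
    C.append({"fruit": fruit, "xx": B[segment]["xx"]})

print(C)
```

If the first offset were greater than 0, segment would be -1 for the leading items and silently pick the last entry of B, so that case would need an explicit check.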
x=[['apple', 'banana', 'carrot'],
['apple', 'banana'],
['banana', 'carrot']]
I want the result to look like this:
dict = {'banana': 3,'apple': 2, 'carrot': 2}
You can "flatten" your x, and then just count, while iterating over unique elements of the (flat) x:
x = [['apple', 'banana', 'carrot'],
['apple', 'banana'],
['banana', 'carrot']]
flat_x = [a for b in x for a in b]
res = {a: sum(b == a for b in flat_x) for a in set(flat_x)}
print(res)
# {'carrot': 2, 'apple': 2, 'banana': 3}
Another way to flatten-out the list-of-lists is to use chain from itertools, so you can use:
from itertools import chain
flat_x = list(chain(*x))
One-liner solution (if you don’t mind the import):
import collections
lists = [
['apple', 'banana', 'carrot'],
['apple', 'banana'],
['banana', 'carrot']
]
print(collections.Counter(
item for sublist in lists for item in sublist).most_common())
Documentation: Counter.most_common, generator expressions.
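Counter.most_common() returns a list of (item, count) tuples rather than a dict. If a plain dict ordered by count is wanted, as in the question, a minimal sketch (relying on dicts preserving insertion order, which holds in Python 3.7+) could look like this:

```python
from collections import Counter
from itertools import chain

x = [['apple', 'banana', 'carrot'],
     ['apple', 'banana'],
     ['banana', 'carrot']]

# chain.from_iterable flattens the list of lists lazily
counts = Counter(chain.from_iterable(x))

# most_common() sorts by count, highest first; dict() keeps that order
result = dict(counts.most_common())
print(result)
# {'banana': 3, 'apple': 2, 'carrot': 2}
```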
I need to look for a specific value within a variable.
market = {'fruit': [{'fruit_id': '25', 'fruit': 'banana', 'weight': 1.00}, {'fruit_id': '15', 'fruit': 'apple', 'weight': 1.50}, {'fruit_id': '5', 'fruit': 'pear', 'weight': 2.00}]}
#print(type(market))
<class 'dict'>
How can I find the fruit whose fruit_id is '15'?
You can iterate over the list stored under the 'fruit' key of the market dict and search it:
for fr in market['fruit']:
    if fr['fruit_id'] == '15':
        ans = fr
        print(ans)
# {'fruit_id': '15', 'fruit': 'apple', 'weight': 1.5}
In addition to the comments, you can create a function to search in your market dictionary:
market = {'fruit': [{'fruit_id': '25', 'fruit': 'banana', 'weight': 1.0},
{'fruit_id': '15', 'fruit': 'apple', 'weight': 1.5},
{'fruit_id': '5', 'fruit': 'pear', 'weight': 2.0}]}
def search_by_id(fruit_id):
    for fruit in market['fruit']:
        if fruit['fruit_id'] == fruit_id:
            return fruit['fruit']
How to use it:
>>> search_by_id('15')
'apple'
>>> search_by_id('5')
'pear'
If you will need to search for fruits by id more than once, consider creating a dict where the id is the key:
>>> market = {'fruit': [
... {'fruit_id': '25', 'fruit': 'banana', 'weight': 1.00},
... {'fruit_id': '15', 'fruit': 'apple', 'weight': 1.50},
... {'fruit_id': '5', 'fruit': 'pear', 'weight': 2.00}
... ]}
>>>
>>> fruits_by_id = {f['fruit_id']: f for f in market['fruit']}
>>> fruits_by_id['15']
{'fruit_id': '15', 'fruit': 'apple', 'weight': 1.5}
Once you have a dict keyed on a particular piece of data, looking an item up by that key is easy, both for you and the computer: a dict lookup by key takes constant time on average, whereas searching through a list takes time proportional to the length of the list.
If you aren't constrained in how market is defined, and your program is going to be looking up items by their id most of the time, it might make more sense to simply make market['fruit'] a dict up front (keyed on id) rather than having it be a list. Consider the following representation:
>>> market = {'fruit': {
... 25: {'name': 'banana', 'weight': 1.00},
... 15: {'name': 'apple', 'weight': 1.50},
... 5: {'name': 'pear', 'weight': 2.00}
... }}
>>> market['fruit'][15]
{'name': 'apple', 'weight': 1.5}
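For a one-off lookup on the original list-based structure, here is a minimal sketch that also handles a missing id explicitly. The function name find_fruit is hypothetical, and it uses next() with a default value:

```python
market = {'fruit': [{'fruit_id': '25', 'fruit': 'banana', 'weight': 1.0},
                    {'fruit_id': '15', 'fruit': 'apple', 'weight': 1.5},
                    {'fruit_id': '5', 'fruit': 'pear', 'weight': 2.0}]}


def find_fruit(market, fruit_id):
    # next() stops at the first match; None is returned when nothing matches
    return next((f for f in market['fruit'] if f['fruit_id'] == fruit_id), None)


print(find_fruit(market, '15'))
print(find_fruit(market, '99'))
```

This still scans the list, so for repeated lookups the keyed dict shown above remains the better fit.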
I have a DataFrame with entries that have the same meaning, and I'd like to combine them into the same row (and column).
My mock df:
my = pd.DataFrame(
{'fruit': ['Apple', 'Banana', 'Pomme', 'aeble', 'Banan', 'Orange', 'Apelsin'],
'bites': [1, 2, 3, 1, 2, 3, 4]})
and what I would like it to be:
The closest I've gotten is
my.loc['Apple'] +=my.loc['Pomme'] += my.loc['aeble']
But I am wondering if there is an easier way.
If you had some dict mapping all fruit values to one language, you could use groupby and map with agg functions join and sum:
d = {'Apple': 'Apple',
'Banana': 'Banana',
'Pomme': 'Apple',
'aeble': 'Apple',
'Banan': 'Banana',
'Orange': 'Orange',
'Apelsin': 'Orange'
}
my.groupby(my['fruit'].map(d)).agg({'fruit': lambda x: ', '.join(x),
                                    'bites': 'sum'})
[out]
                      fruit  bites
fruit
Apple   Apple, Pomme, aeble      5
Banana        Banana, Banan      4
Orange      Orange, Apelsin      7
One way to help generate your mapping dict could be to use the googletrans package:
from googletrans import Translator
translator = Translator()
d = {x.origin: x.text for x in translator.translate(my['fruit'].unique().tolist())}
[out]
{'Apple': 'Apple',
'Banana': 'Banana',
'Pomme': 'Apple',
'aeble': 'aeble',
'Banan': 'Banana',
'Orange': 'Orange',
'Apelsin': 'Orange'}
As you can see, it's not perfect ('aeble' was left untranslated), but it will give you a head start over creating the mapping entirely by hand.
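To make the map-then-aggregate step concrete, here is a minimal plain-Python sketch of the same computation, using the hand-written mapping d from this answer. It only illustrates the logic and is not how pandas implements groupby:

```python
from collections import defaultdict

fruits = ['Apple', 'Banana', 'Pomme', 'aeble', 'Banan', 'Orange', 'Apelsin']
bites = [1, 2, 3, 1, 2, 3, 4]
d = {'Apple': 'Apple', 'Banana': 'Banana', 'Pomme': 'Apple', 'aeble': 'Apple',
     'Banan': 'Banana', 'Orange': 'Orange', 'Apelsin': 'Orange'}

names = defaultdict(list)   # canonical name -> original spellings
totals = defaultdict(int)   # canonical name -> summed bites
for fruit, n in zip(fruits, bites):
    key = d[fruit]          # the "map" step
    names[key].append(fruit)
    totals[key] += n        # the "sum" aggregation

summary = {key: (', '.join(names[key]), totals[key]) for key in names}
print(summary)
```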
An alternative could be to create a third column to identify your fruit and then do a groupby :
my = pd.DataFrame(
{'fruit': ['Apple', 'Banana', 'Pomme', 'aeble', 'Banan', 'Orange', 'Apelsin'],
'bites': [1, 2, 3, 1, 2, 3, 4]})
#Create new column
my['Type Fruit'] = ['Apple', 'Banana', 'Apple', 'Apple', 'Banana', 'Orange', 'Orange']
# Group by fruit type
fruit_type = my.groupby(['Type Fruit'])['bites'].agg('sum')
In [1] : print(fruit_type)
Out[1] : Type Fruit
Apple     5
Banana    4
Orange    7
The idea from @Chris of using Google Translate could also be used with this method to create the third column:
from googletrans import Translator
translator = Translator()
# translate every row (not only the unique values) so the result has one entry per row
my['Type Fruit'] = [x.text for x in translator.translate(my['fruit'].tolist())]
# Group by fruit type
fruit_type = my.groupby(['Type Fruit'])['bites'].agg('sum')
I have a list of fruits: [{'name': 'apple', 'qty': 233}, {'name': 'orange', 'qty': '441'}]
When I filter the list for orange using a lambda, list(filter(lambda x: x['name']=='orange', fruits)), I get the right dict, but I cannot get the dict's index in the original list. The index should be 1, not 0.
How do I get the right index of the filtered item?
You can use a list comprehension and enumerate() instead:
>>> fruits = [{'name': 'apple', 'qty': 233}, {'name': 'orange', 'qty': '441'}]
>>> [(idx, fruit) for idx, fruit in enumerate(fruits) if fruit['name'] == 'orange']
[(1, {'name': 'orange', 'qty': '441'})]
As @ChrisRands posted in the comments, you could also use filter() on an enumerate object created from your fruits list:
>>> list(filter(lambda fruit: fruit[1]['name'] == 'orange', enumerate(fruits)))
[(1, {'name': 'orange', 'qty': '441'})]
>>>
Here are some timings for the two methods:
>>> from timeit import timeit
>>> setup = \
"fruits = [{'name': 'apple', 'qty': 233}, {'name': 'orange', 'qty': '441'}]"
>>> listcomp = \
"[(idx, fruit) for idx, fruit in enumerate(fruits) if fruit['name'] == 'orange']"
>>> filter_lambda = \
"list(filter(lambda fruit: fruit[1]['name'] == 'orange', enumerate(fruits)))"
>>>
>>> timeit(setup=setup, stmt=listcomp)
1.0297133629997006
>>> timeit(setup=setup, stmt=filter_lambda)
1.6447856079998928
>>>
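If only the index of the first match is needed, rather than a list of (index, dict) pairs, here is a minimal sketch using next() over enumerate(), with None as an assumed fallback for the no-match case:

```python
fruits = [{'name': 'apple', 'qty': 233}, {'name': 'orange', 'qty': '441'}]

# next() stops at the first matching item, so the rest of the list is not scanned
index = next((i for i, fruit in enumerate(fruits) if fruit['name'] == 'orange'), None)
print(index)  # 1

# No match: the default (None) is returned instead of raising StopIteration
missing = next((i for i, fruit in enumerate(fruits) if fruit['name'] == 'kiwi'), None)
print(missing)  # None
```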