I have two lists.
List A :
A = ["apple","cherry","pear","mango","banana","grape","kiwi","orange","pineapple"]
List B :
B = [{"offset":0, "xx":789},{"offset":3, "xx":921},{"offset":6, "xx":89}]
The idea is to use the offset from each item in B as an index offset for setting the xx values in our results array.
For instance, this would be the expected result:
C=[
{"fruit":"apple","xx":789},
{"fruit":"cherry","xx":789},
{"fruit":"pear","xx":789},
{"fruit":"mango","xx":921},
{"fruit":"banana","xx":921},
{"fruit":"grape","xx":921},
{"fruit":"kiwi","xx":89},
{"fruit":"orange","xx":89},
{"fruit":"pineapple","xx":89},
]
For example, B[0] has an "offset" of 0. This means that the items of C with index >= 0 get an "xx" value of B[0]['xx']. Then B[1] has an "offset" of 3, which sets a new "xx" value on the C items with index >= 3, and so on.
I am able to achieve a similar result using pandas DataFrames. But since the pandas library is quite heavy, I have been asked to do it without pandas.
What about using a simple loop?
# rework B in a better format
dic = {d['offset']:d['xx'] for d in B}
# {0: 789, 3: 921, 6: 89}
C = []
v = None
for i, a in enumerate(A):
    v = dic.get(i, v) # if we reached a threshold, update the value
    C.append({'fruit':a, 'xx': v})
print(C)
Output:
[{'fruit': 'apple', 'xx': 789},
{'fruit': 'cherry', 'xx': 789},
{'fruit': 'pear', 'xx': 789},
{'fruit': 'mango', 'xx': 921},
{'fruit': 'banana', 'xx': 921},
{'fruit': 'grape', 'xx': 921},
{'fruit': 'kiwi', 'xx': 89},
{'fruit': 'orange', 'xx': 89},
{'fruit': 'pineapple', 'xx': 89}]
If the structure of B is required to be this way, you can do this:
A = ["apple","cherry","pear","mango","banana","grape","kiwi","orange","pineapple"]
B = [{"offset":0, "xx":789},{"offset":3, "xx":921},{"offset":6, "xx":89}]
C = []
B_iter = 0
for i, fruit in enumerate(A):
    # check if not the last element and next element is start of new range
    if B_iter < len(B) - 1 and B[B_iter+1]["offset"] == i:
        B_iter += 1
    C.append({"fruit": fruit, "xx": B[B_iter]["xx"]})
print(C)
Output:
[{'fruit': 'apple', 'xx': 789},
{'fruit': 'cherry', 'xx': 789},
{'fruit': 'pear', 'xx': 789},
{'fruit': 'mango', 'xx': 921},
{'fruit': 'banana', 'xx': 921},
{'fruit': 'grape', 'xx': 921},
{'fruit': 'kiwi', 'xx': 89},
{'fruit': 'orange', 'xx': 89},
{'fruit': 'pineapple', 'xx': 89}]
If the offset is always a multiple of 3, you can simply use integer division to map the actual index to the matching entry of B.
A = ["apple","cherry","pear","mango","banana","grape","kiwi","orange","pineapple"]
B = [{"offset": 0, "xx": 789}, {"offset": 3, "xx": 921}, {"offset": 6, "xx": 89}]
C = [{"fruit": fruit, "xx": B[idx // 3]["xx"]} for idx, fruit in enumerate(A)]
Output:
[{'fruit': 'apple', 'xx': 789},
{'fruit': 'cherry', 'xx': 789},
{'fruit': 'pear', 'xx': 789},
{'fruit': 'mango', 'xx': 921},
{'fruit': 'banana', 'xx': 921},
{'fruit': 'grape', 'xx': 921},
{'fruit': 'kiwi', 'xx': 89},
{'fruit': 'orange', 'xx': 89},
{'fruit': 'pineapple', 'xx': 89}]
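If the offsets are not evenly spaced, another option is a binary search over the offsets. This is only a sketch: the helper name assign_by_offset is made up for illustration, and it assumes that items before the first offset should get None.

```python
from bisect import bisect_right

A = ["apple", "cherry", "pear", "mango", "banana", "grape", "kiwi", "orange", "pineapple"]
B = [{"offset": 0, "xx": 789}, {"offset": 3, "xx": 921}, {"offset": 6, "xx": 89}]

def assign_by_offset(items, ranges):
    # hypothetical helper: sort the ranges so the binary search is valid
    ranges = sorted(ranges, key=lambda d: d["offset"])
    offsets = [d["offset"] for d in ranges]
    result = []
    for i, item in enumerate(items):
        # position of the last offset that is <= i (or -1 if there is none)
        pos = bisect_right(offsets, i) - 1
        xx = ranges[pos]["xx"] if pos >= 0 else None  # assumption: None before the first offset
        result.append({"fruit": item, "xx": xx})
    return result

C = assign_by_offset(A, B)
print(C)
```

Each lookup is logarithmic in the length of B, which only starts to matter when B is large.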
In a large pandas DataFrame, I have three columns (fruit, vegetable, and first_name). The values in these columns are lists.
From these lists, I want to create one new column holding a list of dictionaries for each row of the DataFrame.
First row of my dataframe:
import pandas as pd
df = pd.DataFrame({
    "fruit": [["Apple", "Banana","Pear","Grape","Pineapple"]],
    "vegetable": [["Celery","Onion","Potato","Broccoli","Sprouts"]],
    "first_name": [["Sam", "Beth", "John", "Daisy", "Jane"]]
})
How do I transform the three columns to one column and have the value look like this instead?
[
{"fruit": "Apple", "vegetable":"Celery", "first_name":"Sam"},
{"fruit": "Banana", "vegetable":"Onion", "first_name":"Beth"},
{"fruit": "Pear", "vegetable":"Potato", "first_name":"John"},
{"fruit": "Grape", "vegetable":"Broccoli", "first_name":"Daisy"},
{"fruit": "Pineapple", "vegetable":"Sprouts", "first_name":"Jane"}
]
IIUC you can do it with (1) .explode() and (2) .to_dict()
df.apply(pd.Series.explode).to_dict(orient='records')
#output:
[{'fruit': 'Apple', 'vegetable': 'Celery', 'first_name': 'Sam'},
{'fruit': 'Banana', 'vegetable': 'Onion', 'first_name': 'Beth'},
{'fruit': 'Pear', 'vegetable': 'Potato', 'first_name': 'John'},
{'fruit': 'Grape', 'vegetable': 'Broccoli', 'first_name': 'Daisy'},
{'fruit': 'Pineapple', 'vegetable': 'Sprouts', 'first_name': 'Jane'}]
You can also create the exploded DataFrame using to_dict and then calling pd.DataFrame. It will be a bit faster for smaller lists, but is essentially the same once you have 10,000+ items.
pd.DataFrame(df.iloc[0].to_dict()).to_dict('records')
[{'fruit': 'Apple', 'vegetable': 'Celery', 'first_name': 'Sam'},
{'fruit': 'Banana', 'vegetable': 'Onion', 'first_name': 'Beth'},
{'fruit': 'Pear', 'vegetable': 'Potato', 'first_name': 'John'},
{'fruit': 'Grape', 'vegetable': 'Broccoli', 'first_name': 'Daisy'},
{'fruit': 'Pineapple', 'vegetable': 'Sprouts', 'first_name': 'Jane'}]
The main thing to take care of is flattening the lists in each column. A rather manual implementation is:
columns = []
for i in ["fruit","vegetable","first_name"]:
    flat_list = [item for sublist in df[i] for item in sublist]
    columns.append(flat_list)
list_of_dic = []
for i in range(5):
    dic = {}
    dic["fruit"] = columns[0][i]
    dic["vegetable"] = columns[1][i]
    dic["first_name"] = columns[2][i]
    list_of_dic.append(dic)
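The manual loop above can probably be shortened with zip. A minimal sketch without pandas, assuming one row is available as a plain dict of equally long lists (the names row and records are only for illustration):

```python
# one row of the DataFrame, written as a plain dict of lists (assumed shape)
row = {
    "fruit": ["Apple", "Banana", "Pear", "Grape", "Pineapple"],
    "vegetable": ["Celery", "Onion", "Potato", "Broccoli", "Sprouts"],
    "first_name": ["Sam", "Beth", "John", "Daisy", "Jane"],
}

keys = list(row)
# zip(*row.values()) walks the three lists in parallel, one position at a time
records = [dict(zip(keys, values)) for values in zip(*row.values())]

for record in records:
    print(record)
```

Note that zip stops at the shortest list, so lists of unequal length would be truncated silently.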
I am working with a nested structure like this:
l = [
    [['apple', 'bannana', 'pear', 'watermelon'], ['watermelon', 'pear', 'bannana', 'grapes']],
    [['apple', 'bannana', 'pear', 'watermelon'], ['watermelon', 'pear', 'strawberry', 'strawberry', 'strawberry', 'apricot', 'avocado']],
    [['apple', 'bannana', 'pear', 'watermelon'], ['watermelon', 'pear', 'strawberry', 'strawberry', 'strawberry', 'tomato']],
    [['apple', 'bannana', 'pear', 'watermelon'], ['watermelon','pear','strawberry', 'strawberry', 'strawberry', 'strawberry', 'strawberry', 'strawberry', 'apricot', 2]]
]
How can I keep only a given number of elements from each element (pair of sublists) of the nested list, counting across both sublists? For example, say I want to keep the first 5 elements. The expected output should be:
[
[['apple', 'bannana', 'pear', 'watermelon'], ['watermelon']],
[['apple', 'bannana', 'pear', 'watermelon'], ['watermelon']],
[['apple', 'bannana', 'pear', 'watermelon'], ['watermelon']],
[['apple', 'bannana', 'pear', 'watermelon'], ['watermelon']]
]
Or 9:
[
[['apple', 'bannana', 'pear', 'watermelon'], ['watermelon', 'pear', 'bannana', 'grapes']],
[['apple', 'bannana', 'pear', 'watermelon'], ['watermelon', 'pear', 'strawberry', 'strawberry']],
[['apple', 'bannana', 'pear', 'watermelon'], ['watermelon', 'pear', 'strawberry', 'strawberry']],
[['apple', 'bannana', 'pear', 'watermelon'], ['watermelon','pear','strawberry', 'strawberry', 'strawberry']]
]
Or 11:
[
[['apple', 'bannana', 'pear', 'watermelon'], ['watermelon', 'pear', 'bannana', 'grapes']],
[['apple', 'bannana', 'pear', 'watermelon'], ['watermelon', 'pear', 'strawberry', 'strawberry', 'strawberry', 'apricot', 'avocado']],
[['apple', 'bannana', 'pear', 'watermelon'], ['watermelon', 'pear', 'strawberry', 'strawberry', 'strawberry', 'tomato']],
[['apple', 'bannana', 'pear', 'watermelon'], ['watermelon','pear','strawberry', 'strawberry', 'strawberry', 'strawberry', 'strawberry']]
]
Alternatively, consider this list:
l2 = [
[['apple'], ['watermelon']],
[['apple', 'bannana', 'pear', 'watermelon'], ['watermelon', 'pear', 'strawberry', 'strawberry', 'strawberry', 'apricot', 'avocado']],
[['apple', 'bannana', 'pear', 'watermelon'], ['watermelon', 'pear', 'strawberry', 'strawberry', 'strawberry', 'tomato']],
[['apple', 'tomato'], ['watermelon','pear','strawberry', 'strawberry', 'strawberry', 'strawberry', 'strawberry', 'strawberry', 'apricot', 2]]
]
If I want 4, the output should look like this:
[
[['apple'], ['watermelon']],
[['apple', 'bannana', 'pear', 'watermelon'],[]],
[['apple', 'bannana', 'pear', 'watermelon'],[]],
[['apple', 'tomato'], ['watermelon','pear']]
]
I could iterate over the list and join each pair of sublists. However, if I do that I might break up the inner lists. Any idea how to efficiently remove a number of elements without losing the [[],[]] structure?
Using for loop:
res = []
n = 4
for li, lj in l2:
    res.append([li[:n], lj[:max(0,n-len(li))]])
res
Output:
[[['apple'], ['watermelon']],
[['apple', 'bannana', 'pear', 'watermelon'], []],
[['apple', 'bannana', 'pear', 'watermelon'], []],
[['apple', 'tomato'], ['watermelon', 'pear']]]
With l and n=5:
res = []
n = 5
for li, lj in l:
    res.append([li[:n], lj[:max(0,n-len(li))]])
res
Output:
[[['apple', 'bannana', 'pear', 'watermelon'], ['watermelon']],
[['apple', 'bannana', 'pear', 'watermelon'], ['watermelon']],
[['apple', 'bannana', 'pear', 'watermelon'], ['watermelon']],
[['apple', 'bannana', 'pear', 'watermelon'], ['watermelon']]]
To cut the input list in-place (using Python's list.clear() method):
import pprint
def cut_list(lst, n):
    for i, (l1, l2) in enumerate(lst):
        if len(l1 + l2) > n:  # check if there are items to cut
            if len(l1) >= n:  # if the 1st sublist covers the limit
                lst[i][0] = l1[:n]
                lst[i][1].clear()  # clear the 2nd sublist in-place
            else:  # cut the 2nd sublist leaving the 1st one intact
                lst[i][1] = l2[:n - len(l1)]
lst = [
[['apple'], ['watermelon']],
[['apple', 'bannana', 'pear', 'watermelon'],
['watermelon', 'pear', 'strawberry', 'strawberry', 'strawberry', 'apricot', 'avocado']],
[['apple', 'bannana', 'pear', 'watermelon'],
['watermelon', 'pear', 'strawberry', 'strawberry', 'strawberry', 'tomato']],
[['apple', 'tomato'],
['watermelon', 'pear', 'strawberry', 'strawberry', 'strawberry', 'strawberry', 'strawberry', 'strawberry',
'apricot', 2]]
]
cut_list(lst, 4)
pprint.pprint(lst)
The output:
[[['apple'], ['watermelon']],
[['apple', 'bannana', 'pear', 'watermelon'], []],
[['apple', 'bannana', 'pear', 'watermelon'], []],
[['apple', 'tomato'], ['watermelon', 'pear']]]
Here's one that works for arbitrarily-sized inner lists:
def truncate_inner(it, keep):
    for x in it:
        yield x[:max(0, keep)]
        keep -= len(x)
Usage for a 3d list such as l2:
for row in [list(truncate_inner(x, 3)) for x in l2]:
    print(row)
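To see what the generator yields, here is a self-contained run on a shortened sample (the sample list is made up for illustration and is not the l2 from the question):

```python
def truncate_inner(it, keep):
    # yield each inner list cut to the remaining budget, then shrink the budget
    for x in it:
        yield x[:max(0, keep)]
        keep -= len(x)

# shortened, made-up sample with the same [[...], [...]] shape
sample = [
    [['apple'], ['watermelon']],
    [['apple', 'bannana', 'pear', 'watermelon'], ['watermelon', 'pear']],
    [['apple', 'tomato'], ['watermelon', 'pear', 'strawberry']],
]

result = [list(truncate_inner(pair, 3)) for pair in sample]
for row in result:
    print(row)
# [['apple'], ['watermelon']]
# [['apple', 'bannana', 'pear'], []]
# [['apple', 'tomato'], ['watermelon']]
```

Because the budget can go negative, max(0, keep) is what keeps the later inner lists empty instead of slicing from the end.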
Try this?
>>> import json
>>> def shrink(b, keep):
...     result = []
...     for bb in b:
...         if keep < 1: break
...         result.append(bb[:keep])
...         keep -= len(bb)
...     return result
...
>>> a = l  # the nested list from the question
>>> print(json.dumps([shrink(b, 6) for b in a], indent=4))
[
    [
        [
            "apple",
            "bannana",
            "pear",
            "watermelon"
        ],
        [
            "watermelon",
            "pear"
        ]
    ],
    [
        [
            "apple",
            "bannana",
            "pear",
            "watermelon"
        ],
        [
            "watermelon",
            "pear"
        ]
    ],
    [
        [
            "apple",
            "bannana",
            "pear",
            "watermelon"
        ],
        [
            "watermelon",
            "pear"
        ]
    ],
    [
        [
            "apple",
            "bannana",
            "pear",
            "watermelon"
        ],
        [
            "watermelon",
            "pear"
        ]
    ]
]
>>>
n = 5  # number of elements to keep in each pair
for arr_2d in l:
    assert len(arr_2d) == 2
    fir_arr = arr_2d[0]
    sec_arr = arr_2d[1]
    arr_2d[1] = sec_arr[0:max(0, n-len(fir_arr))]
It works. I have tested it.
The same code with comments:
for arr_2d in l:  # iterate over each pair of lists inside l
    assert len(arr_2d) == 2  # make sure the current pair has exactly 2 elements
    fir_arr = arr_2d[0]  # first list of the pair
    sec_arr = arr_2d[1]  # second list of the pair
    # cut the second list based on the number of items in the first;
    # max(0, ...) avoids a negative slice end when the first list already has n or more items
    arr_2d[1] = sec_arr[0:max(0, n-len(fir_arr))]
Output:
l
Out[50]:
[[['apple', 'bannana', 'pear', 'watermelon'], ['watermelon']],
[['apple', 'bannana', 'pear', 'watermelon'], ['watermelon']],
[['apple', 'bannana', 'pear', 'watermelon'], ['watermelon']],
[['apple', 'bannana', 'pear', 'watermelon'], ['watermelon']]]
I have two nested lists, a_lis and b_lis, that contain the same items. However, each sublist of a_lis is the reversed form of the corresponding sublist of b_lis. For example:
['Berries', 'grapes', 'lemon', 'Orange', 'Apple']
and
['Apple', 'Orange', 'lemon', 'grapes', 'Berries']
Here are a_lis and b_lis:
a_lis = [['Berries', 'grapes', 'lemon', 'Orange', 'Apple'],
['Apricots', 'peach', 'grapes', 'lemon', 'Orange', 'Apple'],
[1, 'Melons', 'strawberries', 'lemon', 'Orange', 'Apple'],
['pumpkin', 'avocados', 'strawberries', 'lemon', 'Orange', 'Apple'],
[3, 'Melons', 'strawberries', 'lemon', 'Orange', 'Apple']]
And
b_lis = [['Apple', 'Orange', 'lemon', 'grapes', 'Berries'],
['Apple', 'Orange', 'lemon', 'grapes', 'peach', 'Apricots'],
['Apple', 'Orange', 'lemon', 'strawberries', 'Melons', 1],
['Apple', 'Orange', 'lemon', 'strawberries', 'avocados', 'pumpkin'],
['Apple', 'Orange', 'lemon', 'strawberries', 'Melons', 3]]
How can I align them into a two-dimensional nested list with all the possible alignments, if and only if the lists are different? For example, ['Berries', 'grapes', 'lemon', 'Orange', 'Apple'] and ['Apple', 'Orange', 'lemon', 'grapes', 'Berries'] should not be concatenated because they are the same (i.e. the first one is the reversed version of the other). The expected output should look like this (*):
So far, I first tried to create a function that tells me whether two lists contain the same items regardless of their order:
def sequences_contain_same_items(a, b):
    for item in a:
        try:
            i = b.index(item)
        except ValueError:
            return False
        b = b[:i] + b[i+1:]
    return not b
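For comparison, an order-insensitive check like the one above could also be sketched with collections.Counter. This assumes every item is hashable (true for the strings and integers here); the function name same_items is only for illustration.

```python
from collections import Counter

def same_items(a, b):
    # two sequences hold the same items if each item occurs equally often in both
    return Counter(a) == Counter(b)

forward = ['Berries', 'grapes', 'lemon', 'Orange', 'Apple']
backward = ['Apple', 'Orange', 'lemon', 'grapes', 'Berries']

print(same_items(forward, backward))                 # True
print(same_items([1, 'Melons'], ['Melons', 3]))      # False
```

Unlike the index-based version, this does not rebuild b on every step, so it stays linear in the length of the lists.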
Then I iterated the lists:
lis = []
for f, b in zip(a_lis, b_lis):
    #print(f, b)
    lis.append(f)
    lis.append(b)
print(lis)
However, I do not understand how to produce the aligned output list, or whether product is the right operation to apply here. Any idea how to produce (*)?
a_lis = [['Berries', 'grapes', 'lemon', 'Orange', 'Apple'],
['Apricots', 'peach', 'grapes', 'lemon', 'Orange', 'Apple'],
[1, 'Melons', 'strawberries', 'lemon', 'Orange', 'Apple'],
['pumpkin', 'avocados', 'strawberries', 'lemon', 'Orange', 'Apple'],
[3, 'Melons', 'strawberries', 'lemon', 'Orange', 'Apple']]
reva = [k[-1::-1] for k in a_lis]
m = []
for i, v in enumerate(a_lis):
    for i1, v1 in enumerate(reva):
        if i == i1:
            pass
        else:
            m.append(v)
            m.append(v1)
print(m)
In a more compact way, as a flat list (same result as the loop above):
m = sum([[v, v1] for i, v in enumerate(a_lis) for i1,v1 in enumerate(reva) if i!=i1], [])
Or, keeping each alignment as its own pair:
m = [[v, v1] for i, v in enumerate(a_lis) for i1,v1 in enumerate(reva) if i!=i1]
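Since the question asks whether product applies: the nested comprehension can also be written with itertools.product. This is a sketch under the assumption that the goal is every pairing of a sublist with the reversed sublist of a different row; the name pairs is only for illustration.

```python
from itertools import product

a_lis = [['Berries', 'grapes', 'lemon', 'Orange', 'Apple'],
         ['Apricots', 'peach', 'grapes', 'lemon', 'Orange', 'Apple'],
         [1, 'Melons', 'strawberries', 'lemon', 'Orange', 'Apple'],
         ['pumpkin', 'avocados', 'strawberries', 'lemon', 'Orange', 'Apple'],
         [3, 'Melons', 'strawberries', 'lemon', 'Orange', 'Apple']]
reva = [row[::-1] for row in a_lis]

# product yields every (row, reversed row) combination; skip the ones
# where both come from the same index, because those are the same list reversed
pairs = [
    [v, v1]
    for (i, v), (j, v1) in product(enumerate(a_lis), enumerate(reva))
    if i != j
]
print(len(pairs))  # 5 rows x 4 other rows = 20
```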
My question is about the random.choice function. As we know, when we run random.choice(['apple','banana']), it will return either 'apple' or 'banana' with equal probability. What if I want a biased result, for example, to return 'apple' with probability 0.9 and 'banana' with probability 0.1? How can I implement this?
Luckily, in Python 3.6 and later, you can simply use random.choices, which returns a list of k items:
import random
random.choices(['apple', 'banana'], weights=[0.9, 0.1])
# signature: random.choices(population, weights=None, *, cum_weights=None, k=1)
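A quick way to check the bias is to draw many samples and count them. A small sketch (the seed and the sample size of 10,000 are arbitrary choices, used only to make the run repeatable):

```python
import random
from collections import Counter

random.seed(0)  # fixed seed, only so the run is repeatable

# draw 10,000 fruits, with 'apple' weighted nine times as heavily as 'banana'
draws = random.choices(['apple', 'banana'], weights=[0.9, 0.1], k=10_000)
counts = Counter(draws)
print(counts)

# a single biased pick is just the first element of a k=1 result
fruit = random.choices(['apple', 'banana'], weights=[0.9, 0.1])[0]
print(fruit)
```

The weights are relative, so weights=[9, 1] would behave the same way.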
A basic way would be to draw a random number between 0 and 1 and test it against a threshold:
import random
randNumber = random.random()
if randNumber < 0.9:
    fruit = "apple"
else:
    fruit = "banana"
which can be simplified to: ['apple', 'banana'][random.random()>0.9] (thanks to @falsetru's comment)
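The threshold test generalizes to more than two items by walking through cumulative weights. A minimal sketch, assuming the weights are given as a non-empty dict whose values sum to 1 (the function weighted_pick and its optional r parameter are made up for illustration):

```python
import random

def weighted_pick(weights, r=None):
    """Pick a key of `weights` (item -> probability, assumed to sum to 1)."""
    if r is None:
        r = random.random()  # uniform number in [0, 1)
    cumulative = 0.0
    for item, weight in weights.items():
        cumulative += weight
        if r < cumulative:
            return item
    # fallback for floating-point rounding: return the last item
    return item

weights = {'apple': 0.9, 'banana': 0.1}
print(weighted_pick(weights))          # random, 'apple' about 90% of the time
print(weighted_pick(weights, r=0.5))   # apple
print(weighted_pick(weights, r=0.95))  # banana
```

Passing r explicitly is only there to make the behaviour easy to check by hand.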
The point is to create a new list that contains more or fewer copies of the element you want to bias.
This ought to do it:
import random
a = ['apple','banana']
probability = [0.1,0.9]
def biase(lst,probability):
    zipped = zip(lst,probability)
    # repeat each item in proportion to its probability (round avoids float truncation)
    lst = [[i[0]] * round(i[1]*100) for i in zipped]
    new = [b for i in lst for b in i]
    return new
biased_list = biase(a,probability)
random_word = random.choice(biased_list)
print(random_word)
This code will produce banana in most cases because the string banana appears nine times as often as apple in the new list (90 copies versus 10).
I've added a list called probability and zipped it with the items (Python lists are ordered), but a dictionary is more suitable for this sort of task.
And if you go under the hood and print biased_list you'll see something like:
['apple', 'apple', 'apple', 'apple', 'apple', 'apple', 'apple', 'apple', 'apple', 'apple', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana', 'banana']
Say I have a list
food_list = ['apple', 'pear', 'tomato', 'bean', 'carrot', 'grape']
How would I print the list in rows containing 4 columns, so it would look like:
apple pear tomato bean
carrot grape
food_list = ['apple', 'pear', 'tomato', 'bean', 'carrot', 'grape']
for i in range(0, len(food_list), 4):
    print('\t'.join(food_list[i:i+4]))
Try with this
food_list = ['apple', 'pear', 'tomato', 'bean', 'carrot', 'grape']
size = 4
g = (food_list[i:i+size] for i in range(0, len(food_list), size))
for i in g:
    print(i)
food_list = ['apple', 'pear', 'tomato', 'bean', 'carrot', 'grape']
index = 0
for each_food in food_list:
    if index < 3:
        print(each_food, end=' ')
        index += 1
    else:
        print(each_food)
        index = 0
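If the columns should line up regardless of word length, padding each word to a fixed width may work better than tabs or single spaces. A sketch, assuming a width of 10 characters is enough for every word (format_rows and its parameters are hypothetical names):

```python
food_list = ['apple', 'pear', 'tomato', 'bean', 'carrot', 'grape']

def format_rows(items, columns=4, width=10):
    rows = []
    for i in range(0, len(items), columns):
        chunk = items[i:i + columns]
        # pad every word to the same width, then drop trailing spaces
        rows.append("".join(word.ljust(width) for word in chunk).rstrip())
    return rows

rows = format_rows(food_list)
for row in rows:
    print(row)
```

Words longer than the chosen width would run into the next column, so the width should be at least the longest word plus one.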