Convert a multi-dimensional list into one column in Python pandas

I would like some help "translating" a multi-dimensional list into a single column of a pandas DataFrame.
I found help here for translating a multi-dimensional list into multiple columns, but I need the data in one column.
Suppose I have the following list of lists:
x=[[1,2,3],[4,5,6]]
If I create a DataFrame with
frame=pd.DataFrame(x)
I get
   0  1  2
0  1  2  3
1  4  5  6
But my desired outcome is:
0
1
2
3
4
5
6
with 0 as the column header.
I can of course get the result with a for loop, but that seems slow to me. Is there a Pythonic/pandas way to do it?
Thanks for your help.

You can use np.concatenate:
import numpy as np
import pandas as pd
x=[[1,2,3],[4,5,6]]
frame=pd.DataFrame(np.concatenate(x))
print(frame)
Output:
0
0 1
1 2
2 3
3 4
4 5
5 6
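np.concatenate also copes with inner lists of different lengths, because it only joins the 1-D inner sequences end to end. A minimal sketch, assuming a made-up ragged input (the name ragged is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical ragged input: the inner lists have different lengths
ragged = [[1, 2, 3], [4, 5], [6]]

# np.concatenate joins the 1-D inner sequences end to end,
# so unequal lengths are fine here
frame = pd.DataFrame(np.concatenate(ragged))
print(frame)
```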

First flatten the nested lists, then pass the result to the DataFrame constructor:
df = pd.DataFrame([z for y in x for z in y])
Or:
from itertools import chain
df = pd.DataFrame(list(chain.from_iterable(x)))
print (df)
0
0 1
1 2
2 3
3 4
4 5
5 6
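A pure-pandas alternative, not one of the original answers, is Series.explode (available since pandas 0.25). This is a sketch; the intermediate name s is illustrative:

```python
import pandas as pd

x = [[1, 2, 3], [4, 5, 6]]

# explode turns each inner list into one row per element (object dtype),
# reset_index renumbers the rows 0..n-1, astype restores an integer dtype
s = pd.Series(x).explode().reset_index(drop=True).astype(int)

# to_frame(name=0) gives a one-column DataFrame headed 0, like the answers above
df = s.to_frame(name=0)
print(df)
```

The extra astype step is needed because explode keeps the object dtype of the original list-valued Series.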

If you use NumPy, you can use the ravel() method:
pd.DataFrame(np.array(x).ravel())
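Note that np.array(x).ravel() assumes all inner lists have the same length; with ragged lists, recent NumPy versions raise a ValueError when building the array, so np.concatenate or the list comprehension is the safer choice there. ravel also lets you choose the reading order, as this small sketch shows:

```python
import numpy as np
import pandas as pd

x = [[1, 2, 3], [4, 5, 6]]
arr = np.array(x)  # shape (2, 3); needs equal-length inner lists

# Default order="C" reads row by row: 1, 2, 3, 4, 5, 6
row_major = pd.DataFrame(arr.ravel())

# order="F" reads column by column instead: 1, 4, 2, 5, 3, 6
col_major = pd.DataFrame(arr.ravel(order="F"))
print(row_major[0].tolist(), col_major[0].tolist())
```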

Related

Pandas solution to computing the maximum of two sums of two columns?

So I have a DataFrame with (amongst others) four columns containing numerical values. I want to add a column to the DataFrame that holds the maximum of two sums, each obtained by adding a pair of columns.
My solution so far is:
from pandas import DataFrame
df = DataFrame(data={'text': ['a','b','c'], 'a':[1,2,3],'b':[2,3,4],'c':[5,4,2],'d':[-2,4,1]})
df['sum1'] = df['a'].add(df['b'])
df['sum2'] = df['c'].add(df['d'])
df['maxsum'] = df[['sum1','sum2']].max(axis=1)
which gives the desired result.
I am pretty sure there is a more concise way to do this.
There is nothing wrong with your approach. In fact, it is the approach I would take, if for no other reason than that it is easy to read and to see what you are doing. But if you are looking for another solution, here is one using numpy.ufunc.reduceat:
import pandas as pd
import numpy as np
# sample frame
df = pd.DataFrame(data={'text': ['a','b','c'], 'a':[1,2,3],'b':[2,3,4],'c':[5,4,2],'d':[-2,4,1]})
# we skip the first column and convert to an array - df[df.columns[1:]].values
# we specify the start indices of the column pairs to sum - np.arange(len(df.columns[1:]))[::2]
# then find the max
df['max'] = np.max(np.add.reduceat(df[df.columns[1:]].values,
np.arange(len(df.columns[1:]))[::2],
axis=1),
axis=1)
text a b c d max
0 a 1 2 5 -2 3
1 b 2 3 4 4 8
2 c 3 4 2 1 7
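To see what np.add.reduceat does in the answer above, here is the same step on the numeric part of the sample frame as a plain array (vals, idx and pair_sums are illustrative names):

```python
import numpy as np

# The numeric part of the sample frame (columns a, b, c, d) as a plain array
vals = np.array([[1, 2, 5, -2],
                 [2, 3, 4, 4],
                 [3, 4, 2, 1]])

# Start index of each slice along axis 1: [0, 2] -> slices [0:2] and [2:4]
idx = np.arange(vals.shape[1])[::2]

# Sum each slice: column 0 of the result is a + b, column 1 is c + d
pair_sums = np.add.reduceat(vals, idx, axis=1)

# Row-wise maximum of the two pair sums
row_max = pair_sums.max(axis=1)
print(pair_sums)
print(row_max)
```

Each index marks where a slice starts and the slice runs up to the next index (or the end of the axis), which is why stepping by 2 produces the pairs (a, b) and (c, d).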
It is not much more concise, but instead of your current approach you can do a one-shot assignment:
df = df.assign(sum1=df[['a', 'b']].sum(1), sum2=df[['c', 'd']].sum(1),
maxsum=lambda df: df[['sum1','sum2']].max(1))
text a b c d sum1 sum2 maxsum
0 a 1 2 5 -2 3 3 3
1 b 2 3 4 4 5 8 8
2 c 3 4 2 1 7 3 7
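If the helper columns sum1 and sum2 are not needed, np.maximum can compare the two sums element by element directly. This is a sketch of an alternative, not one of the answers above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'text': ['a', 'b', 'c'], 'a': [1, 2, 3], 'b': [2, 3, 4],
                        'c': [5, 4, 2], 'd': [-2, 4, 1]})

# np.maximum compares the two Series element by element,
# so no intermediate sum1/sum2 columns are created
df['maxsum'] = np.maximum(df['a'] + df['b'], df['c'] + df['d'])
print(df)
```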

Put dataframe rows in front

I have a dataframe like this:
A B
1 2
3 4
5 6
I want to take its rows and place them side by side in a single row, like this:
A B A B A B
1 2 3 4 5 6
Is there any way I can do that?
I tried using iloc but could not figure out how to do this.
One option is to:
1. flatten the values into a single-row NumPy array, using the .values DataFrame property and the array's reshape method;
2. build a new DataFrame whose column names are obtained by repeating the original column list with np.tile.
pd.DataFrame(
df.values.reshape(1, -1),
columns = np.tile(df.columns.values, len(df)).tolist()
)
Output:
A B A B A B
0 1 2 3 4 5 6
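The result above has duplicate column labels, so wide["A"] would return three columns at once. If unique labels are preferable, a variant is to suffix each name with its original row number; the naming scheme (A_0, B_0, ...) is an assumption for illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 5], "B": [2, 4, 6]})

# Flatten row by row into a single row, as in the answer above
flat = df.to_numpy().reshape(1, -1)

# Suffix each name with its original row number so the labels stay unique;
# the loop order (rows outer, columns inner) matches the row-major flattening
names = [f"{col}_{row}" for row in range(len(df)) for col in df.columns]

wide = pd.DataFrame(flat, columns=names)
print(wide)
```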

Pandas add a second level index to the columns using a list

I have a dataframe with column headings (and for my real data multi-level row indexes). I want to add a second level index to the columns based on a list I have.
import pandas as pd
data = {"apple": [7,5,6,4,7,5,8,6],
"strawberry": [3,5,2,1,3,0,4,2],
"banana": [1,2,1,2,2,2,1,3],
"chocolate" : [5,8,4,2,1,6,4,5],
"cake":[4,4,5,1,3,0,0,3]
}
df = pd.DataFrame(data)
food_cat = ["fv","fv","fv","j","j"]
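I want food_cat to become a second level of column headers, so that apple, strawberry and banana sit under "fv" and chocolate and cake sit under "j".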
I tried the approach from "How to add a second level column header/index to dataframe by matching to dictionary values?", but couldn't get it working (and it is not ideal, since I would need to build the dictionary automatically, and I don't have one).
I also tried adding the list as a row of the DataFrame and converting that row into a second-level index, as in this answer, using:
df.loc[len(df)] = food_cat
df = pd.MultiIndex.from_arrays(df.columns, df.iloc[len(df)-1])
but got the error
Check if lengths of all arrays are equal or not,
TypeError: Input must be a list / sequence of array-likes.
I also tried using df = pd.MultiIndex.from_arrays(df.columns, np.array(food_cat)) with import numpy as np but got the same error.
I feel like this should be a simple task (it is for rows), and there are a lot of related questions, but I struggled to find one I could adapt to my data.
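pd.MultiIndex.from_arrays expects a single list (or list-like) of arrays as its first argument; in the attempts above, the two arrays were passed as separate arguments. Wrap both in one list: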
df.columns = pd.MultiIndex.from_arrays([food_cat, df.columns])
df
fv j
apple strawberry banana chocolate cake
0 7 3 1 5 4
1 5 5 2 8 4
2 6 2 1 4 5
3 4 1 2 2 1
4 7 3 2 1 3
5 5 0 2 6 0
6 8 4 1 4 0
7 6 2 3 5 3
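Once the columns have two levels, you can name the levels and select by category. The level names "category" and "food" below are assumptions for illustration, and the per-category totals are computed on the transposed frame so the column level can be grouped like a row level:

```python
import pandas as pd

data = {"apple": [7, 5, 6, 4, 7, 5, 8, 6],
        "strawberry": [3, 5, 2, 1, 3, 0, 4, 2],
        "banana": [1, 2, 1, 2, 2, 2, 1, 3],
        "chocolate": [5, 8, 4, 2, 1, 6, 4, 5],
        "cake": [4, 4, 5, 1, 3, 0, 0, 3]}
df = pd.DataFrame(data)
food_cat = ["fv", "fv", "fv", "j", "j"]

# One list holding both arrays; names labels the two column levels
df.columns = pd.MultiIndex.from_arrays([food_cat, df.columns],
                                       names=["category", "food"])

fv = df["fv"]             # the three "fv" columns
cake = df[("j", "cake")]  # a single column selected by its full tuple key

# Row-wise totals per category: transpose, group the former column level, transpose back
totals = df.T.groupby(level="category").sum().T
print(totals.head(2))
```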

Python Pandas: Passing previous, current and next row to a function

I have a DataFrame and a function that I want to apply to each row, together with its neighbouring rows.
Of course I can iterate over the dataframe but this is very slow and I want to find a quicker way.
My dataframe:
A B C D pattern
4 4 5 6 0
6 4 1 2 0
5 2 2 1 0
5 6 7 9 0
My function takes three rows as an input and returns a value that I want to store in the pattern column of current_row.
def findPattern(previous_row, current_row, next_row):
.....
return "pattern"
How can I apply this function to my dataframe without iterating over it with a for loop?
Thanks for any help :)
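No answer is included here, but a common vectorized approach is to line each row up with its neighbours using shift(1) and shift(-1), then express the rule as column operations. The body of findPattern is not given, so the "peak" rule below is purely hypothetical; if the real logic cannot be written as column operations, some per-row call (and therefore a loop) is still needed:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [4, 6, 5, 5], "B": [4, 4, 2, 6],
                   "C": [5, 1, 2, 7], "D": [6, 2, 1, 9]})

# shift(1) lines each row up with its previous row, shift(-1) with its next row;
# the first/last rows get NaN where no neighbour exists
prev_rows = df.shift(1)
next_rows = df.shift(-1)

# Hypothetical rule standing in for findPattern: label a row "peak"
# when its A value is larger than A in both neighbouring rows
# (comparisons against NaN are False, so edge rows are never peaks)
is_peak = (df["A"] > prev_rows["A"]) & (df["A"] > next_rows["A"])
df["pattern"] = np.where(is_peak, "peak", "none")
print(df)
```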

Pandas Count values across rows that are greater than another value in a different column

I have a pandas dataframe like this:
X a b c
1 1 0 2
5 4 7 3
6 7 8 9
I want to create a column called 'count' that holds, for each row, the number of values greater than the value in the first column ('X' in my case). The output should look like:
X a b c Count
1 1 0 2 2
5 4 7 3 1
6 7 8 9 3
I would like to avoid a lambda function, a for loop or any other kind of looping, since my DataFrame has a large number of rows. I tried something like this, but I couldn't get what I wanted:
df['count']=df [ df.iloc [:,1:] > df.iloc [:,0] ].count(axis=1)
I also tried numpy.where(), but didn't have any luck with that either, so any help is appreciated. My DataFrame also contains NaN values, which I would like to ignore when counting.
Thanks for your help in advance!
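You can use ge (>=) with sum. ge counts values greater than or equal to X, which is what the expected output shows (in row 0, a=1 is counted against X=1). Comparisons with NaN return False, so NaN values are not counted: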
df.iloc[:,1:].ge(df.iloc[:,0],axis = 0).sum(axis = 1)
Out[784]:
0 2
1 1
2 3
dtype: int64
Then assign it back:
df['Count']=df.iloc[:,1:].ge(df.iloc [:,0],axis=0).sum(axis=1)
df
Out[786]:
X a b c Count
0 1 1 0 2 2
1 5 4 7 3 1
2 6 7 8 9 3
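The sketch below checks the NaN behaviour and contrasts ge with gt (strictly greater, as the question's wording says). One value is replaced by NaN purely for illustration, and the value columns are selected by name so that a freshly added count column is not itself compared on a second run:

```python
import numpy as np
import pandas as pd

# Sample data from the question, with one value replaced by NaN for illustration
df = pd.DataFrame({"X": [1, 5, 6], "a": [1, 4, 7],
                   "b": [0, 7, np.nan], "c": [2, 3, 9]})

# Select the value columns by name so later-added count columns are not compared
values = df[["a", "b", "c"]]

# ge counts values >= X (matching the expected output in the question);
# gt counts strictly greater values. Comparisons with NaN are False, so NaN is skipped.
df["Count_ge"] = values.ge(df["X"], axis=0).sum(axis=1)
df["Count_gt"] = values.gt(df["X"], axis=0).sum(axis=1)
print(df)
```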
df['count']=(df.iloc[:,2:5].le(df.iloc[:,0],axis=0).sum(axis=1) + df.iloc[:,2:5].ge(df.iloc[:,1],axis=0).sum(axis=1))
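In case anyone needs this variant (where the first two columns hold thresholds), you can add the outputs of .le and .ge in one line: the expression above counts the values in the columns at positions 2 to 4 that are <= column 0, plus those that are >= column 1. Thanks to @Wen for answering my question!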
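A small check of that combined expression on a hypothetical frame, with two threshold columns (low, high) followed by three value columns so that the positions match iloc[:, 0], iloc[:, 1] and iloc[:, 2:5]; the column names and data are made up:

```python
import pandas as pd

# Hypothetical layout: two threshold columns followed by three value columns
df = pd.DataFrame({"low": [2, 5], "high": [8, 6],
                   "a": [1, 5], "b": [5, 6], "c": [9, 7]})

values = df.iloc[:, 2:5]
# Count of values <= low in each row, plus count of values >= high in each row
df["count"] = (values.le(df["low"], axis=0).sum(axis=1)
               + values.ge(df["high"], axis=0).sum(axis=1))
print(df)
```

Because the two counts are simply added, a value that satisfies both conditions (possible when low >= high) would be counted twice.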
