Put dataframe rows in front - python

I have a dataframe like this:
A B
1 2
3 4
5 6
I want to take its rows and put them in front like this:
A B A B A B
1 2 3 4 5 6
Is there any way I can do that?
I tried using iloc but could not figure out how to do this.

One option is to:
flatten the values into a single row, using the .values property of the dataframe and the array's reshape method
build a new dataframe whose column names are obtained by using np.tile on the original column list
import numpy as np
import pandas as pd
pd.DataFrame(
    df.values.reshape(1, -1),
    columns=np.tile(df.columns.values, len(df)).tolist()
)
Output:
A B A B A B
0 1 2 3 4 5 6
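For reference, here is a self-contained sketch of the same idea, assuming the sample frame from the question holds the values 1 to 6 in columns A and B. The names flat_values, flat_columns and wide are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame assumed from the question: three rows, columns A and B.
df = pd.DataFrame({"A": [1, 3, 5], "B": [2, 4, 6]})

# Flatten the values row by row into a single row of shape (1, 6).
flat_values = df.to_numpy().reshape(1, -1)

# Repeat the column labels once per original row: A, B, A, B, A, B.
flat_columns = np.tile(df.columns.to_numpy(), len(df)).tolist()

wide = pd.DataFrame(flat_values, columns=flat_columns)
print(wide)
```

Note that the result has duplicate column labels, so wide["A"] returns all three "A" columns rather than a single Series.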

Related

Pandas add a second level index to the columns using a list

I have a dataframe with column headings (and for my real data multi-level row indexes). I want to add a second level index to the columns based on a list I have.
import pandas as pd
data = {
    "apple": [7, 5, 6, 4, 7, 5, 8, 6],
    "strawberry": [3, 5, 2, 1, 3, 0, 4, 2],
    "banana": [1, 2, 1, 2, 2, 2, 1, 3],
    "chocolate": [5, 8, 4, 2, 1, 6, 4, 5],
    "cake": [4, 4, 5, 1, 3, 0, 0, 3],
}
df = pd.DataFrame(data)
food_cat = ["fv", "fv", "fv", "j", "j"]
I am wanting something that looks like this:
I tried to use How to add a second level column header/index to dataframe by matching to dictionary values? - however couldn't get it working (and not ideal as I'd need to figure out how to automate the dictionary, which I don't have).
I also tried adding the list as a row in the dataframe and converting that row to a second level index as in this answer using
df.loc[len(df)] = food_cat
df = pd.MultiIndex.from_arrays(df.columns, df.iloc[len(df)-1])
but got the error
Check if lengths of all arrays are equal or not,
TypeError: Input must be a list / sequence of array-likes.
I also tried using df = pd.MultiIndex.from_arrays(df.columns, np.array(food_cat)) with import numpy as np but got the same error.
I feel like this should be a simple task (it is for rows), and there are a lot of questions asked, but I was struggling to find something I could duplicate to adapt to my data.
pd.MultiIndex.from_arrays expects a single list (or list-like) of arrays, one per level, as its first argument. Assign the result to df.columns rather than to df:
df.columns = pd.MultiIndex.from_arrays([food_cat, df.columns])
df
      fv                           j
   apple strawberry banana chocolate cake
0      7          3      1         5    4
1      5          5      2         8    4
2      6          2      1         4    5
3      4          1      2         2    1
4      7          3      2         1    3
5      5          0      2         6    0
6      8          4      1         4    0
7      6          2      3         5    3
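As a runnable sketch using the data from the question: the attempts above failed because the two arrays were passed as separate positional arguments, whereas from_arrays wants them wrapped in one list. The level names "category" and "food" below are my own labels, not something from the question.

```python
import pandas as pd

data = {
    "apple": [7, 5, 6, 4, 7, 5, 8, 6],
    "strawberry": [3, 5, 2, 1, 3, 0, 4, 2],
    "banana": [1, 2, 1, 2, 2, 2, 1, 3],
    "chocolate": [5, 8, 4, 2, 1, 6, 4, 5],
    "cake": [4, 4, 5, 1, 3, 0, 0, 3],
}
df = pd.DataFrame(data)
food_cat = ["fv", "fv", "fv", "j", "j"]

# One list containing both arrays: outer level first, inner level second.
# The level names are illustrative assumptions.
df.columns = pd.MultiIndex.from_arrays(
    [food_cat, df.columns], names=["category", "food"]
)

# Selecting by the new top level returns every column in that group.
fruit = df["fv"]
print(fruit.columns.tolist())
```

Once the second level is in place, a single column is addressed with a tuple, for example df[("j", "cake")].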

Python how to merge two dataframes with multiple columns while preserving row order in each column?

My data is contained within two dataframes. Within each dataframe, the entries are sorted in each column. I want to now merge the two dataframes while preserving row order. For example, suppose I have this:
The first dataframe "A1" looks like this:
index a b c
0 1 4 1
3 2 7 3
5 5 8 4
6 6 10 8
...
and the second dataframe "A2" looks like this (A1 and A2 are the same size):
index a b c
1 3 1 2
2 4 2 5
4 7 3 6
7 8 5 7
...
I want to merge both of these dataframes to get the final dataframe "data":
index a b c
0 1 4 1
1 3 1 2
2 4 2 5
3 2 7 3
...
Here is what I have tried:
data = A1.merge(A2, how='outer', left_index=True, right_index=True)
But I keep getting strange results. I don't even know if this works if you have multiple columns whose row order you need to preserve. I find that some of the entries become NaNs for some reason. I don't know how to fix it. I also tried data.join(A1, A2) but the compiler printed out that it couldn't join these two dataframes.
import pandas as pd
#Create Data Frame df and df1
df = pd.DataFrame({'a':[1,2,3,4],'b':[5,6,7,8],'c':[9,0,11,12]},index=[0,3,5,6])
df1 = pd.DataFrame({'a':[13,14,15,16],'b':[17,18,19,20],'c':[21,22,23,24]},index=[1,2,4,7])
# Concatenate df and df1, then sort by index.
# DataFrame.append was removed in pandas 2.0; pd.concat is its replacement.
df2 = pd.concat([df, df1])
print(df2.sort_index())
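The NaNs in the question come from the outer merge itself: merging on the index places the two frames side by side (columns a_x, a_y and so on), and since the two indexes share no labels, every row is missing one half. Stacking the frames and sorting by index avoids that. Below is a sketch with the values shown in the question, assuming the elided rows ("...") are simply left out.

```python
import pandas as pd

# Values copied from the question; the elided rows are omitted.
A1 = pd.DataFrame(
    {"a": [1, 2, 5, 6], "b": [4, 7, 8, 10], "c": [1, 3, 4, 8]},
    index=[0, 3, 5, 6],
)
A2 = pd.DataFrame(
    {"a": [3, 4, 7, 8], "b": [1, 2, 3, 5], "c": [2, 5, 6, 7]},
    index=[1, 2, 4, 7],
)

# Stack the rows of both frames, then restore the original row order.
data = pd.concat([A1, A2]).sort_index()
print(data)
```

This relies on the two indexes being disjoint; if a label appeared in both frames, both rows would be kept under the same label.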

Convert multi-dim list in one column in python pandas

I would like some help converting a multi-dimensional list into a single column of a pandas DataFrame.
I found help here on converting a multi-dimensional list into a frame with multiple columns, but I need the data in a single column.
Suppose I have the following list of list
x=[[1,2,3],[4,5,6]]
If I create a frame I get
frame=pd.DataFrame(x)
   0  1  2
0  1  2  3
1  4  5  6
But my desired outcome is
0
1
2
3
4
5
6
with the zero as column header.
I can of course get the result with a for loop, but that seems slow to me. Is there a pythonic/pandas way to get it?
Thanks for helping me.
You can use np.concatenate:
import numpy as np
import pandas as pd
x=[[1,2,3],[4,5,6]]
frame=pd.DataFrame(np.concatenate(x))
print(frame)
Output:
0
0 1
1 2
2 3
3 4
4 5
5 6
First, flatten the nested lists, then pass the result to the DataFrame constructor:
df = pd.DataFrame([z for y in x for z in y])
Or:
from itertools import chain
df = pd.DataFrame(list(chain.from_iterable(x)))
print(df)
0
0 1
1 2
2 3
3 4
4 5
5 6
If you use numpy you can utilize the method ravel():
pd.DataFrame(np.array(x).ravel())
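One difference between these options is worth a sketch: np.array(x).ravel() relies on every inner list having the same length, while the pure-Python flattening does not care. The ragged list below is a hypothetical input, not the one from the question.

```python
from itertools import chain

import pandas as pd

# Hypothetical ragged input: the inner lists have different lengths.
x = [[1, 2, 3], [4, 5], [6]]

# chain.from_iterable walks each inner list in turn, whatever its length.
frame = pd.DataFrame(list(chain.from_iterable(x)))
print(frame)
```

For rectangular data such as the list in the question, all of the approaches above give the same single-column frame.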

Python Pandas: Counting the Frequency of unique values over all Columns

I have a question, how does one count the number of unique values that occur within each column of a pandas data-frame?
Say I have a data frame named df that looks like this:
1 2 3 4
a yes f c
b no f e
c yes d h
I am wanting to get output that shows the frequency of unique values within the four columns. The output would be something similar to this:
Column # of Unique Values
1 3
2 2
3 2
4 3
I don't need to know what the unique values are, just how many there are within each column.
I have played around with something like this:
df[all_cols].value_counts()
[all_cols] is a list of all the columns within the data frame. But this is counting how many times the value appears within the column.
Any advice/suggestions would be a great help. Thanks
You could apply Series.nunique:
>>> df.apply(pd.Series.nunique)
1 3
2 2
3 2
4 3
dtype: int64
Or you could do a groupby/nunique on the unstacked version of the frame:
>>> df.unstack().groupby(level=0).nunique()
1 3
2 2
3 2
4 3
dtype: int64
Both of these produce a Series, which you could then use to build a frame with whatever column names you wanted.
You could try df.nunique()
>>> df.nunique()
1 3
2 2
3 2
4 3
dtype: int64
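To get from that Series to a frame with the headers shown in the question, one possible sketch is below. The frame is rebuilt from the question's sample, assuming integer column labels 1 to 4; the names counts and summary are illustrative.

```python
import pandas as pd

# Sample frame from the question, assuming integer column labels.
df = pd.DataFrame(
    {
        1: ["a", "b", "c"],
        2: ["yes", "no", "yes"],
        3: ["f", "f", "d"],
        4: ["c", "e", "h"],
    }
)

# Number of distinct values per column, as a Series indexed by column label.
counts = df.nunique()

# Name the index, then move it into a regular column next to the counts.
summary = counts.rename_axis("Column").reset_index(name="# of Unique Values")
print(summary)
```

By default nunique ignores NaN; pass dropna=False if missing values should count as a distinct value.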

Pandas: Add new columns to DataFrame based on values in columns

Given a DataFrame like this:
>>> df
0 1 2
0 2 3 5
1 3 4 7
and a function that returns multiple results, like this:
def sumprod(x, y, z):
    return x+y+z, x*y*z
I want to add new columns, so the result would be:
>>> df
0 1 2 sum prod
0 2 3 5 10 30
1 3 4 7 14 84
I have been successful with functions that returns one result:
df["sum"] = df.apply(sum, axis=1)
but not if it returns more than one result.
One way to do this is to pass the columns of the DataFrame to the function by unpacking the transpose of the array:
>>> df['sum'], df['prod'] = sumprod(*df.values.T)
>>> df
0 1 2 sum prod
0 2 3 5 10 30
1 3 4 7 14 84
sumprod returns a tuple of columns and, since Python supports multiple assignment, you can assign them to new column labels as above.
You could write df['sum'], df['prod'] = sumprod(df[0], df[1], df[2]) to get the same result. This is clearer and is preferable if you need to pass the columns to the function in a particular order. On the other hand, it's a lot more verbose if you have a lot of columns to pass to the function.
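Both forms above work because sumprod happens to operate on whole columns at once. If the function only makes sense one row at a time, a row-wise apply with result_type="expand" is one alternative. This is a sketch of that swapped-in technique, not part of the answer above, and it will usually be slower on large frames.

```python
import pandas as pd


def sumprod(x, y, z):
    return x + y + z, x * y * z


df = pd.DataFrame([[2, 3, 5], [3, 4, 7]])

# Call sumprod once per row; "expand" turns each returned tuple
# into separate columns of the result.
df[["sum", "prod"]] = df.apply(
    lambda row: sumprod(*row), axis=1, result_type="expand"
)
print(df)
```

Here each row is unpacked into x, y and z in column order, so the function receives scalars rather than whole columns.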
