I have dataframe df:
0
0 a
1 b
2 c
3 d
4 e
O/P should be:
a b c d e
0
1
2
3
4
5
I want the column containing (a, b, c, d, e) to become the header of my dataframe.
Could anyone help?
If your dataframe is a pandas DataFrame named df, you can solve it with pandas:
First convert the initial df content to a list, then create a new dataframe defining its columns with that list.
import pandas as pd
cols = df[0].tolist()  # df[0] is the first column's contents; `cols` avoids shadowing the built-in list
dfSolved = pd.DataFrame([], columns=cols)
You may provide more details, like the index and values of the expected output and the operation you want to perform, so that we can give a solution specific to your case.
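A minimal, self-contained sketch of that approach (building a stand-in for your df inline, since the original data isn't shown):

```python
import pandas as pd

# Stand-in for your df: a single column (labeled 0) holding the future headers
df = pd.DataFrame({0: ["a", "b", "c", "d", "e"]})

cols = df[0].tolist()                      # contents of the first column
dfSolved = pd.DataFrame([], columns=cols)  # empty frame with those headers

print(dfSolved.columns.tolist())  # ['a', 'b', 'c', 'd', 'e']
```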
Here is the solution:
import pandas as pd
import io
import numpy as np
data_string = """ columns_name
0 a
1 b
2 c
3 d
4 e
"""
df = pd.read_csv(io.StringIO(data_string), sep=r'\s+')  # raw string for the regex separator
# Solution
df_result = pd.DataFrame(data=[[np.nan]*5],
                         columns=df['columns_name'].tolist())
I have the following excel sheet:
and want to print the column 1 value when the column 2 value is not null. The output should be [1, 3].
This is the script I created, but it doesn't work:
import xlrd
import pandas as pd
filename='test.xlsx'
dataframe = pd.read_excel(filename)
frame = dataframe.loc[dataframe["col2"] !=" "]
df = frame.iloc[:, 0]
ndarray = df.to_numpy()
print(ndarray)
You can first filter down to the non-NA rows and then take the values of the column you want to show:
dataframe[dataframe['col2'].notna()]['col1'].values
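For example, with a small stand-in frame where the blank cells came in as NaN (hypothetical data matching the sheet, so no Excel file is needed):

```python
import numpy as np
import pandas as pd

# Stand-in for pd.read_excel('test.xlsx'): empty cells are read as NaN
dataframe = pd.DataFrame({"col1": [1, 2, 3, 4],
                          "col2": ["a", np.nan, "b", np.nan]})

result = dataframe[dataframe["col2"].notna()]["col1"].values
print(result)  # [1 3]
```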
If you print the dataframe, you will see that the empty cells are NaN:
Col1 Col2
0 1 a
1 2 NaN
2 3 b
3 4 NaN
So you need to use the notna() method to filter:
Here is your fixed code:
import xlrd
import pandas as pd
filename='test.xlsx'
dataframe = pd.read_excel(filename)
frame = dataframe.loc[dataframe["col2"].notna()]
df = frame.iloc[:, 0]
ndarray = df.to_numpy()
print(ndarray)
I want to fetch the max value according to 2 columns in a pandas dataframe. I managed to do this according to 1 column but not 2.
For 1 column:
import numpy as np
import pandas as pd
df = pd.DataFrame({"name": list("ABABCD"), "value": np.arange(6)})
maxes = df.groupby(["name"]).agg("max")
df["maxvalue"]=df["name"].apply(lambda x: maxes.loc[x])
>>> df
name value maxvalue
0 A 0 2
1 B 1 3
2 A 2 2
3 B 3 3
4 C 4 4
5 D 5 5
For 2 columns, I've tried this but it doesn't work:
import numpy as np
import pandas as pd
df = pd.DataFrame({"name": list("ABABCD"),"name2": list("MNOMNO"), "value": np.arange(6)})
maxes = df.groupby(["name","name2"]).agg("max")
df["maxvalue"]=df[["name","name2"]].apply(lambda x: maxes.loc[x])
How can this be done for multiple columns?
Use transform instead of agg. Using one or two columns works exactly the same way; for two columns it is:
df["maxvalue"] = df.groupby(["name", "name2"])["value"].transform("max")
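A self-contained run of the idea, using name2 values that repeat so the grouping effect is visible (sample data, not the question's exact frame):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": list("ABABCD"),
                   "name2": list("MNMNMN"),
                   "value": np.arange(6)})

# transform broadcasts each group's max back onto every member row,
# so the result aligns with df and can be assigned directly
df["maxvalue"] = df.groupby(["name", "name2"])["value"].transform("max")
print(df["maxvalue"].tolist())  # [2, 3, 2, 3, 4, 5]
```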
Is there a way to create either data frames or series in Python from an Excel sheet that has multiple rows and columns, such as
and get the output all in one column?
I tried different code for data frames and series; none did what I expected, and the series version goes letter by letter with the code I used:
import numpy as np
import pandas as pd
sr = pd.read_excel('eng.xlsx')
s1 = pd.Series(sr, expand=True)  # fails: pd.Series has no `expand` argument
print(s1)
Use DataFrame.stack, then remove the MultiIndex with Series.reset_index(drop=True):
s1 = sr.stack().reset_index(drop=True)
Or convert the values to a numpy array with numpy.ravel or numpy.ndarray.flatten:
s1 = pd.Series(sr.values.ravel())
s1 = pd.Series(sr.values.flatten())
Sample:
sr = pd.DataFrame({
    'A': list('ab'),
    'B': list('cd'),
    'C': list('ef'),
})
print(sr)
A B C
0 a c e
1 b d f
s1 = sr.stack().reset_index(drop=True)
print(s1)
0 a
1 c
2 e
3 b
4 d
5 f
dtype: object
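The numpy variants give the same row-wise order on that sample (note that, unlike stack, they keep NaN values rather than dropping them):

```python
import pandas as pd

sr = pd.DataFrame({'A': list('ab'),
                   'B': list('cd'),
                   'C': list('ef')})

# ravel flattens the underlying 2-D array row by row
s1 = pd.Series(sr.values.ravel())
print(s1.tolist())  # ['a', 'c', 'e', 'b', 'd', 'f']
```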
I want to select columns which contain non-duplicate from a pandas data frame and use these columns to make up a subset data frame. For example, I have a data frame like this:
x y z
a 1 2 3
b 1 2 2
c 1 2 3
d 4 2 3
The columns "x" and "z" have non-duplicate values, so I want to pick them out and create a new data frame like:
x z
a 1 3
b 1 2
c 1 3
d 4 3
This can be realized by the following code:
import pandas as pd
df = pd.DataFrame([[1, 2, 3], [1, 2, 2], [1, 2, 3], [4, 2, 3]],
                  index=['a', 'b', 'c', 'd'], columns=['x', 'y', 'z'])
df0 = pd.DataFrame()
for i in range(df.shape[1]):
    if df.iloc[:, i].nunique() > 1:  # keep columns with more than one distinct value
        df0 = pd.concat([df0, df.iloc[:, i]], axis=1, sort=False)
However, there must be simpler, more direct methods. What are they?
Best regards
Maybe you can try this one-liner:
df[df.columns[(df.nunique() != 1).values]]
Apply nunique, then remove columns where nunique is 1:
nunique = df.apply(pd.Series.nunique)
cols_to_drop = nunique[nunique == 1].index
df = df.drop(cols_to_drop, axis=1)
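Checked against the sample frame from the question (df.nunique() is an equivalent shortcut for the apply call above):

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [1, 2, 2], [1, 2, 3], [4, 2, 3]],
                  index=['a', 'b', 'c', 'd'], columns=['x', 'y', 'z'])

nunique = df.nunique()                      # distinct values per column
cols_to_drop = nunique[nunique == 1].index  # columns holding a single value
df = df.drop(cols_to_drop, axis=1)

print(df.columns.tolist())  # ['x', 'z']
```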
df = df[df.columns[df.nunique() > 1]]
Columns whose values all repeat give nunique == 1, while the others give more than 1, so df.columns[df.nunique() > 1] returns exactly the column names that fulfill the purpose.
A simple one-liner:
df0 = df.loc[:, (df.max() - df.min()) != 0]
or, even better (this one also works for non-numeric columns):
df0 = df.loc[:, df.max() != df.min()]
In Python to check if a value is in a list you can simply do the following:
>>>9 in [1,2,3,6,9]
True
I would like to do the same for a pandas DataFrame, but unfortunately pandas does not recognise that notation:
>>>import pandas as pd
>>>df = pd.DataFrame([[1,2,3,4],[5,6,7,8]],columns=["a","b","c","d"])
a b c d
0 1 2 3 4
1 5 6 7 8
>>>7 in df
False
How would I achieve this using Pandas DataFrame without iterating through each column/row or anything complicated?
Basically you have to check the matrix without the schema, so:
7 in df.values
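To confirm the difference with the question's own frame:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=["a", "b", "c", "d"])

print(7 in df)         # False -- `in df` checks the column labels
print(7 in df.values)  # True  -- `in df.values` checks the underlying data matrix
```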
x in df checks whether x is among the column labels:
for x in df:
    print(x, end=" ")
out: a b c d