I have dataframe df:
0
0 a
1 b
2 c
3 d
4 e
O/P should be:
a b c d e
0
1
2
3
4
5
I want the column containing (a, b, c, d, e) to become the header of my dataframe.
Could anyone help?
If your dataframe is a pandas DataFrame named df, you can solve it with pandas:
First convert the initial df content to a list, then create a new dataframe defining its columns with that list.
import pandas as pd
cols = df[0].tolist()  # df[0] is the first column's contents; `cols` avoids shadowing the built-in list
dfSolved = pd.DataFrame([], columns=cols)
You may provide more details, like the index and values of the expected output and the operation you want to perform, so that we can give a solution specific to your case.
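A minimal, self-contained sketch of that approach (building a stand-in for your df inline, since the original data isn't shown):

```python
import pandas as pd

# Stand-in for your df: a single column (labeled 0) holding the future headers
df = pd.DataFrame({0: ["a", "b", "c", "d", "e"]})

cols = df[0].tolist()                      # contents of the first column
dfSolved = pd.DataFrame([], columns=cols)  # empty frame with those headers

print(dfSolved.columns.tolist())  # ['a', 'b', 'c', 'd', 'e']
```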
Here is the solution:
import pandas as pd
import io
import numpy as np
data_string = """ columns_name
0 a
1 b
2 c
3 d
4 e
"""
df = pd.read_csv(io.StringIO(data_string), sep=r'\s+')  # raw string for the regex separator
# Solution
df_result = pd.DataFrame(data=[[np.nan]*5],
                         columns=df['columns_name'].tolist())
I have the following excel sheet:
and want to print the column 1 value when the column 2 value is not null. The output should be [1, 3].
This is the script I created, but it doesn't work:
import xlrd
import pandas as pd
filename='test.xlsx'
dataframe = pd.read_excel(filename)
frame = dataframe.loc[dataframe["col2"] !=" "]
df = frame.iloc[:, 0]
ndarray = df.to_numpy()
print(ndarray)
You can first filter down to the non-NA rows and then take the values of the column you want to show:
dataframe[dataframe['col2'].notna()]['col1'].values
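For example, with a small stand-in frame where the blank cells came in as NaN (hypothetical data matching the sheet, so no Excel file is needed):

```python
import numpy as np
import pandas as pd

# Stand-in for pd.read_excel('test.xlsx'): empty cells are read as NaN
dataframe = pd.DataFrame({"col1": [1, 2, 3, 4],
                          "col2": ["a", np.nan, "b", np.nan]})

result = dataframe[dataframe["col2"].notna()]["col1"].values
print(result)  # [1 3]
```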
If you print the dataframe, you will see that the empty cells are NaN:
Col1 Col2
0 1 a
1 2 NaN
2 3 b
3 4 NaN
So you need to use the notna() method to filter:
Here is your fixed code:
import xlrd
import pandas as pd
filename='test.xlsx'
dataframe = pd.read_excel(filename)
frame = dataframe.loc[dataframe["col2"].notna()]
df = frame.iloc[:, 0]
ndarray = df.to_numpy()
print(ndarray)
I want to fetch the max value according to 2 columns in a pandas dataframe. I managed to do this according to 1 column but not 2.
For 1 column:
import numpy as np
import pandas as pd
df = pd.DataFrame({"name": list("ABABCD"), "value": np.arange(6)})
maxes = df.groupby(["name"]).agg("max")
df["maxvalue"]=df["name"].apply(lambda x: maxes.loc[x])
>>> df
name value maxvalue
0 A 0 2
1 B 1 3
2 A 2 2
3 B 3 3
4 C 4 4
5 D 5 5
For 2 columns, I've tried this but it doesn't work:
import numpy as np
import pandas as pd
df = pd.DataFrame({"name": list("ABABCD"),"name2": list("MNOMNO"), "value": np.arange(6)})
maxes = df.groupby(["name","name2"]).agg("max")
df["maxvalue"]=df[["name","name2"]].apply(lambda x: maxes.loc[x])
How can this be done for multiple columns?
Use transform instead of agg. Using one or two columns works exactly the same way; for two columns it is:
df["maxvalue"] = df.groupby(["name", "name2"])["value"].transform("max")
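A self-contained run of the idea, using name2 values that repeat so the grouping effect is visible (sample data, not the question's exact frame):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": list("ABABCD"),
                   "name2": list("MNMNMN"),
                   "value": np.arange(6)})

# transform broadcasts each group's max back onto every member row,
# so the result aligns with df and can be assigned directly
df["maxvalue"] = df.groupby(["name", "name2"])["value"].transform("max")
print(df["maxvalue"].tolist())  # [2, 3, 2, 3, 4, 5]
```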
Is there a way to create either data frames or series in Python from an Excel sheet that has multiple rows and columns, such as
and get the output all in one column?
I tried different code for data frames and series; none did what I expected, and the series version goes letter by letter with the code I used:
import numpy as np
import pandas as pd
sr = pd.read_excel('eng.xlsx')
s1 = pd.Series(sr, expand=True)  # fails: pd.Series has no `expand` argument
print(s1)
Use DataFrame.stack, then remove the MultiIndex with Series.reset_index(drop=True):
s1 = sr.stack().reset_index(drop=True)
Or convert the values to a numpy array with numpy.ravel or numpy.ndarray.flatten:
s1 = pd.Series(sr.values.ravel())
s1 = pd.Series(sr.values.flatten())
Sample:
sr = pd.DataFrame({
    'A': list('ab'),
    'B': list('cd'),
    'C': list('ef'),
})
print(sr)
A B C
0 a c e
1 b d f
s1 = sr.stack().reset_index(drop=True)
print(s1)
0 a
1 c
2 e
3 b
4 d
5 f
dtype: object
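The numpy variants give the same row-wise order on that sample (note that, unlike stack, they keep NaN values rather than dropping them):

```python
import pandas as pd

sr = pd.DataFrame({'A': list('ab'),
                   'B': list('cd'),
                   'C': list('ef')})

# ravel flattens the underlying 2-D array row by row
s1 = pd.Series(sr.values.ravel())
print(s1.tolist())  # ['a', 'c', 'e', 'b', 'd', 'f']
```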
I want to select columns which contain non-duplicate from a pandas data frame and use these columns to make up a subset data frame. For example, I have a data frame like this:
x y z
a 1 2 3
b 1 2 2
c 1 2 3
d 4 2 3
The columns "x" and "z" have non-duplicate values, so I want to pick them out and create a new data frame like:
x z
a 1 3
b 1 2
c 1 3
d 4 3
This can be realized by the following code:
import pandas as pd
df = pd.DataFrame([[1, 2, 3], [1, 2, 2], [1, 2, 3], [4, 2, 3]],
                  index=['a', 'b', 'c', 'd'], columns=['x', 'y', 'z'])
df0 = pd.DataFrame()
for i in range(df.shape[1]):
    if df.iloc[:, i].nunique() > 1:  # keep columns with more than one distinct value
        df0 = pd.concat([df0, df.iloc[:, i]], axis=1, sort=False)
However, there must be simpler, more direct methods. What are they?
Best regards
Maybe you can try this one-liner:
df[df.columns[(df.nunique() != 1).values]]
Apply nunique, then remove columns where nunique is 1:
nunique = df.apply(pd.Series.nunique)
cols_to_drop = nunique[nunique == 1].index
df = df.drop(cols_to_drop, axis=1)
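Checked against the sample frame from the question (df.nunique() is an equivalent shortcut for the apply call above):

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [1, 2, 2], [1, 2, 3], [4, 2, 3]],
                  index=['a', 'b', 'c', 'd'], columns=['x', 'y', 'z'])

nunique = df.nunique()                      # distinct values per column
cols_to_drop = nunique[nunique == 1].index  # columns holding a single value
df = df.drop(cols_to_drop, axis=1)

print(df.columns.tolist())  # ['x', 'z']
```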
df = df[df.columns[df.nunique() > 1]]
Columns whose values all repeat give nunique == 1, while the others give more than 1, so df.columns[df.nunique() > 1] returns exactly the column names that fulfill the purpose.
A simple one-liner:
df0 = df.loc[:, (df.max() - df.min()) != 0]
or, even better (this one also works for non-numeric columns):
df0 = df.loc[:, df.max() != df.min()]
In Python to check if a value is in a list you can simply do the following:
>>>9 in [1,2,3,6,9]
True
I would like to do the same for a pandas DataFrame, but unfortunately pandas does not recognise that notation:
>>>import pandas as pd
>>>df = pd.DataFrame([[1,2,3,4],[5,6,7,8]],columns=["a","b","c","d"])
a b c d
0 1 2 3 4
1 5 6 7 8
>>>7 in df
False
How would I achieve this using Pandas DataFrame without iterating through each column/row or anything complicated?
Basically you have to check the matrix without the schema, so:
7 in df.values
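To confirm the difference with the question's own frame:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=["a", "b", "c", "d"])

print(7 in df)         # False -- `in df` checks the column labels
print(7 in df.values)  # True  -- `in df.values` checks the underlying data matrix
```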
x in df checks whether x is among the column labels:
for x in df:
    print(x, end=" ")
out: a b c d