I have the following Excel sheet (blank cells in col2):

col1 col2
1    a
2
3    b
4

I want to print the column 1 value if the column 2 value is not null. The output should be [1, 3].
This is the script I created, but it doesn't work:
import xlrd
import pandas as pd
filename='test.xlsx'
dataframe = pd.read_excel(filename)
frame = dataframe.loc[dataframe["col2"] !=" "]
df = frame.iloc[:, 0]
ndarray = df.to_numpy()
print(ndarray)
You can first filter down to the non-NA rows and then take the values of the column you want:
dataframe[dataframe['col2'].notna()]['col1'].values
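As a minimal, self-contained sketch of that one-liner (the inline data here stands in for the Excel sheet and is assumed from the question):
import pandas as pd

# Inline data standing in for the Excel sheet; None is read as NaN
dataframe = pd.DataFrame({"col1": [1, 2, 3, 4],
                          "col2": ["a", None, "b", None]})
print(dataframe[dataframe["col2"].notna()]["col1"].values)  # [1 3]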
If you print the dataframe, you will see that the empty cells are NaN:
   col1 col2
0     1    a
1     2  NaN
2     3    b
3     4  NaN
So, you need to use the notna() method to filter those rows out.
Here is your fixed code:
import pandas as pd

filename = 'test.xlsx'
dataframe = pd.read_excel(filename)  # note: xlrd is not needed for .xlsx files
# Keep only the rows where col2 is not missing (empty cells are read as NaN)
frame = dataframe.loc[dataframe["col2"].notna()]
df = frame.iloc[:, 0]  # first column
ndarray = df.to_numpy()
print(ndarray)
I have dataframe df:

   0
0  a
1  b
2  c
3  d
4  e
O/P should be:

   a  b  c  d  e
0
1
2
3
4
5
I want the column containing (a, b, c, d, e) to become the header of my dataframe.
Could anyone help?
If your dataframe is a pandas DataFrame named df, try solving it with pandas: first convert the initial df content to a list, then create a new dataframe, defining its columns with that list.
import pandas as pd

cols = df[0].tolist()  # df[0] is the first column; avoid shadowing the built-in list
dfSolved = pd.DataFrame([], columns=cols)
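For instance, with a single-column frame like the one in the question, this produces the expected header (a quick sketch to illustrate, not part of the original answer):
import pandas as pd

df = pd.DataFrame(['a', 'b', 'c', 'd', 'e'])
cols = df[0].tolist()
dfSolved = pd.DataFrame([], columns=cols)
print(dfSolved.columns.tolist())  # ['a', 'b', 'c', 'd', 'e']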
You may provide more details, like the index and values of the expected output and the operation you want to perform, so that we can give a solution specific to your case.
Here is the solution:
import pandas as pd
import io
import numpy as np
data_string = """ columns_name
0 a
1 b
2 c
3 d
4 e
"""
df = pd.read_csv(io.StringIO(data_string), sep=r'\s+')  # raw string for the regex separator

# Solution: use the column's values as the new frame's headers
df_result = pd.DataFrame(data=[[np.nan] * 5],
                         columns=df['columns_name'].tolist())
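Printing df_result then shows a single all-NaN row under the new headers:
print(df_result)
#     a   b   c   d   e
# 0 NaN NaN NaN NaN NaN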
I have the following dataframe:
df = pd.DataFrame([['A', 1],['B', 2],['C', 3]], columns=['index', 'result'])
  index  result
0     A       1
1     B       2
2     C       3
I would like to create a new column, for example by multiplying the column 'result' by two, and I am curious whether there is a way to do it in pandas the way PySpark does it.
In PySpark:
df = df \
    .withColumn("result_multiplied", F.col("result") * 2)
I don't like having to write the name of the dataframe every time I perform an operation, as is done in pandas:
In pandas:
df['result_multiplied'] = df['result']*2
Use DataFrame.assign:
df = df.assign(result_multiplied=df['result'] * 2)
Or, if the column result is itself modified earlier in the chain, a lambda function is necessary so the newly computed values of result are used:
df = df.assign(result_multiplied=lambda x: x['result'] * 2)
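This lambda form also addresses the original complaint about repeating the dataframe name: assign calls can be chained PySpark-style. A small sketch (the second column name here is made up for illustration):
df = (df
      .assign(result_multiplied=lambda x: x['result'] * 2)
      .assign(result_times_four=lambda x: x['result_multiplied'] * 2))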
A sample to see the difference: result_multiplied is computed from the original df['result'], while result_multiplied1 uses the column after mul(2) has been applied:
df = df.mul(2).assign(result_multiplied=df['result'] * 2,
                      result_multiplied1=lambda x: x['result'] * 2)
print (df)
  index  result  result_multiplied  result_multiplied1
0    AA       2                  2                   4
1    BB       4                  4                   8
2    CC       6                  6                  12
I have a dataset imported from a CSV file to a dataframe in Python. I want to remove some specific rows from this dataframe and append them to an empty dataframe. So far I have tried to remove rows 0 and 1 from the "big" dataframe called df and put them into dff using this code:
dff = pd.DataFrame()  # Create empty dataframe
for x in range(0, 2):
    dff = dff.append(df.iloc[x])  # Append the first 2 rows of df to dff
    # How to remove the appended rows from df?
This seems to work; however, the columns are flipped: e.g., if df has the order A, B, C, then dff gets the order C, B, A. Other than that the data is correct. Also, how do I remove a specific row from a dataframe?
If your goal is just to remove the first two rows into another dataframe, you don't need to use a loop, just slice:
import pandas as pd
df = pd.DataFrame({"col1": [1,2,3,4,5,6], "col2": [11,22,33,44,55,66]})
dff = df.iloc[:2]
df = df.iloc[2:]
Will give you:
dff
Out[6]:
col1 col2
0 1 11
1 2 22
df
Out[8]:
col1 col2
2 3 33
3 4 44
4 5 55
5 6 66
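One caveat, hedged: an iloc slice may be a view of the original frame, so if you plan to modify dff or df independently afterwards, taking an explicit copy avoids a SettingWithCopyWarning:
dff = df.iloc[:2].copy()  # independent copy of the first two rows
df = df.iloc[2:].copy()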
If your list of desired rows is more complex than just the first two, per your example, a more generic method could be:
dff = df.iloc[[1, 3, 5]]  # your list of row positions
df = df[~df.index.isin(dff.index)]  # boolean masks go with [] or .loc rather than .iloc
This means that even if the index column isn't sequential integers, any rows that you used to populate dff will be removed from df.
I managed to solve it by doing:
dff = df.iloc[:0]  # zero-row slice: keeps the columns, drops all data
This copies the column structure of df (the column titles, e.g. A, B, C) into dff without copying any rows; append then works as it should, and any row, e.g. row 1150, can be appended to dff and removed from df using:
dff = dff.append(df.iloc[1150])
df = df.drop(df.index[1150])
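A note if you are on a recent pandas: DataFrame.append was deprecated in 1.4 and removed in 2.0, so the same pattern there would use pd.concat:
dff = pd.concat([dff, df.iloc[[1150]]])  # double brackets keep a one-row DataFrame
df = df.drop(df.index[1150])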
I have a dataframe which contains 36 columns and 1,600,000 rows. The data contains "XNA" values, so when I try to count missing values using df.isnull().sum(), the XNA values are not counted. To count them I have to replace the XNA values with NaN. How can I do that?
Just do:
import pandas as pd
import numpy as np

df = pd.DataFrame([[0, 1, 2], ["test", "XNA", "test2"]]).T
df.columns = ["col1", "col2"]
# replace() returns a new Series, so assign the result back
df["col2"] = df["col2"].replace("XNA", np.nan)
to replace all "XNA" values in col2 with missing values in the numpy/pandas format.
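Since the question mentions 36 columns, you probably want to replace across the whole frame rather than one column; a small extension of the same idea:
df = df.replace("XNA", np.nan)
print(df.isnull().sum())  # the former XNA cells now count as null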
I am trying to fillna in a specific column of the dataframe with the mean of the non-null values of the same type (based on the value in another column of the dataframe).
Here is the code to reproduce my issue:
import numpy as np
import pandas as pd

df = pd.DataFrame()
# Create the DataFrame with a column of floats
# and a column of labels (str)
np.random.seed(seed=6)
df['col0'] = np.random.randn(100)
lett = ['a', 'b', 'c', 'd']
df['col1'] = np.random.choice(lett, 100)
# Set some of the floats to NaN for the test.
toz = np.random.randint(0, 100, 25)
df.loc[toz, 'col0'] = np.nan
df[df['col0'].isnull() == False].count()
# Create a DF with the mean for each label.
w_series = df.loc[(~df['col0'].isnull())].groupby('col1').mean()
          col0
col1
a     0.057199
b     0.363899
c    -0.068074
d     0.251979
#This dataframe has our label (a,b,c,d) as the index. Doesn't seem
#to work when I try to df.fillna(w_series). So I try to reindex such
#that the labels (a,b,c,d) become a column again.
#
#For some reason I cannot just do a set_index and expect the
#old index to become column. So I append the new index and
#then reset it.
w_series['col2'] = list(range(w_series.size))
w_frame = w_series.set_index('col2',append=True)
w_frame.reset_index('col1',inplace=True)
#I try fillna() with the new dataframe.
df.fillna(w_frame)
Still no luck:
col0 col1
0 0.057199 b
1 0.729004 a
2 0.217821 d
3 0.251979 c
4 -2.486781 a
5 0.913252 b
6 NaN a
7 NaN b
What am I doing wrong?
How do I fillna the dataframe with the averages of specific rows that match the missing information?
Does the size of the dataframe being filled (df) and the filler dataframe (w_frame) have to match?
Thank you
fillna is based on the index, so you need the same index on your target dataframe and on the dataframe you fill from:
df.set_index('col1')['col0'].fillna(w_frame.set_index('col1').col0).reset_index()
# I only show the first 11 rows
Out[74]:
col1 col0
0 b 0.363899
1 a 0.729004
2 d 0.217821
3 c -0.068074
4 a -2.486781
5 b 0.913252
6 a 0.057199
7 b 0.363899
8 c -0.068074
9 b -0.429894
10 a 2.631281
My way to fillna:
df['col0'] = df.groupby("col1")['col0'].transform(lambda x: x.fillna(x.mean()))  # fill col0, not col1
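A quick check of that one-liner on a tiny frame (column names from the question; the data is made up):
import numpy as np
import pandas as pd

df = pd.DataFrame({"col0": [1.0, np.nan, 3.0, np.nan],
                   "col1": ["a", "a", "b", "b"]})
df["col0"] = df.groupby("col1")["col0"].transform(lambda x: x.fillna(x.mean()))
print(df)  # NaN in group a becomes 1.0, NaN in group b becomes 3.0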