How to drop an empty row in dataframe - python

I have a dataframe with one empty row every 10 rows. It looks like the following:
A B C D E
0
1 a b c d e
2 f g h i j
.....
I would like to drop the empty rows, but the problem is that their cells are not filled with a space string " "; they contain the empty string "".
Because of that, neither df.fillna nor df.dropna works, and I'm not sure how to replace them.
Any suggestion would be helpful! Thank you!

Keep only the rows that contain no empty strings:
df = df[df.ne('').all(axis=1)]

If the empty row really does occur every 10 rows, as you say, you can filter on the index instead:
df_clean = df[df.index % 10 != 0]
This drops every 10th row, starting with the first one (index 0), and assumes the default integer index.
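The choice between all and any matters when a row is only partly empty. A minimal sketch, assuming a small made-up frame (the data and the variable names below are hypothetical, not taken from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 is entirely empty strings, row 2 is only partly empty.
df = pd.DataFrame(
    {"A": ["", "a", "f"], "B": ["", "b", ""], "C": ["", "c", "h"]}
)

# Keep rows that have at least one non-empty cell (drops only fully empty rows).
drop_fully_empty = df[df.ne("").any(axis=1)]

# Keep rows where every cell is non-empty (also drops partly empty rows).
drop_any_empty = df[df.ne("").all(axis=1)]

# Alternative: turn "" into NaN so that dropna works as usual.
via_dropna = df.replace("", np.nan).dropna(how="all")
```

If only the completely blank separator rows should go, the any variant or the replace-then-dropna variant is probably the safer pick.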

Related

Dataframe method to Transpose multiple rows to single column

How can I transpose multiple rows into a single column?
Some of my rows contain the word 'Narrative', and it appears many times.
If the word 'Narrative' is found, I want to move that row into a single column.
Example input: [image: input data]
Needed output: [image: needed output]
[image: original dataframe]
Updated
Find the rows where x == 'narrative' and move their y values into a new column:
idx = df[df['x'] == 'narrative'].index
df1 = df.drop(idx).assign(narrative=df.loc[idx, 'y'].values).reset_index(drop=True)
Output:
>>> df1
x y z narrative
0 a b c A
1 d a b B
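The original input was only posted as an image, so here is a self-contained sketch with an assumed input frame (hypothetical data, chosen so that it reproduces the output shown above). It also assumes each 'narrative' row pairs with exactly one remaining data row, otherwise assign raises a length mismatch.

```python
import pandas as pd

# Hypothetical input: each data row is followed by a 'narrative' row
# whose y value should become a new column.
df = pd.DataFrame(
    {
        "x": ["a", "narrative", "d", "narrative"],
        "y": ["b", "A", "a", "B"],
        "z": ["c", None, "b", None],
    }
)

# Index labels of the rows to fold into a column.
idx = df[df["x"] == "narrative"].index

# Drop those rows, then attach their y values positionally as 'narrative'.
df1 = (
    df.drop(idx)
    .assign(narrative=df.loc[idx, "y"].values)
    .reset_index(drop=True)
)
```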

append or join value from one dataframe to every row in another dataframe in Pandas

I'm normally OK on the joining and appending front, but this one has got me stumped.
I've got one dataframe with only one row in it. I have another with multiple rows. I want to append the value from one of the columns of my first dataframe to every row of my second.
df1:
id Value
1 word
df2:
id data
1 a
2 b
3 c
Output I'm seeking:
df2
id data Value
1 a word
2 b word
3 c word
I figured that this was along the right lines, but it listed out NaN for all rows:
df2 = df2.append(df1[df1['Value'] == 1])
I guess I could just join on the id value and then copy the value to all rows, but I assumed there was a cleaner way to do this.
Thanks in advance for any help you can provide!
Just take the first element of the Value column of df1 and assign it to a new Value column in df2; pandas broadcasts the scalar to every row:
df2['Value'] = df1['Value'].iloc[0]
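A runnable sketch of that assignment, assuming the two frames look like the ones in the question (rebuilt by hand here):

```python
import pandas as pd

# Frames rebuilt from the question's description.
df1 = pd.DataFrame({"id": [1], "Value": ["word"]})
df2 = pd.DataFrame({"id": [1, 2, 3], "data": ["a", "b", "c"]})

# Take the single value by position, so the index label of df1 does not matter.
value = df1["Value"].iloc[0]

# Assigning a scalar to a column repeats it on every row.
df2["Value"] = value
```

Using .iloc[0] rather than .loc[0, ...] keeps this working even if df1 was filtered and its only row no longer carries the label 0.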

Store nth row elements in a list panda dataframe

I am new to Python. Could you help with the following?
I have a dataframe, df1, as follows; a, d, f and g are the column names.
a d f g
20 30 20 20
0 1 NaN NaN
I need to put the second row of df1 into a list without the NaNs, ideally as follows:
x = [0, 1]
Select the second row with df1.iloc[1], remove the NaN values with .dropna(), and finally convert the series into a Python list with .tolist(). The .astype(int) step is there because a row containing NaN is stored as floats.
Use:
x = df1.iloc[1].dropna().astype(int).tolist()
# x = [0, 1]
Check itertuples(). You would have something like this:
for row in df1.itertuples():
    row[0]  # the row's index; the rest of the tuple holds the row's values
You can also use iloc and dropna() like this:
row_2 = df1.iloc[1].dropna().to_list()
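For reference, a self-contained sketch of the iloc/dropna approach, assuming df1 is built from the values shown in the question:

```python
import numpy as np
import pandas as pd

# df1 rebuilt from the question; the NaNs make columns f and g float columns.
df1 = pd.DataFrame(
    {"a": [20, 0], "d": [30, 1], "f": [20, np.nan], "g": [20, np.nan]}
)

# Second row as a Series (upcast to float because of the NaNs).
second_row = df1.iloc[1]

# Drop the NaNs, cast back to int, and convert to a plain Python list.
x = second_row.dropna().astype(int).tolist()
```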

Adding elements to an empty dataframe in pandas

I am new to Python and have a basic question. I have an empty dataframe Resulttable with columns A B and C which I want to keep filling with the answers of some calculations which I run in a loop represented by the loop index n. For ex. I want to store the value 12 in the nth row of column A, 35 in nth row of column B and so on for the whole range of n.
I have tried something like
Resulttable['A'].iloc[n] = 12
Resulttable['B'].iloc[n] = 35
I get an error single positional indexer is out-of-bounds for the first value of n, n=0.
How do I resolve this? Thanks!
You can first create an empty pandas dataframe and then append rows one by one as you calculate. In your range you need to specify one above the highest value you want, i.e. range(0, 13) if you want to iterate over 0-12.
import pandas as pd
df = pd.DataFrame([], columns=["A", "B", "C"])
for i in range(0, 13):
    x = i**1
    y = i**2
    z = i**3
    df_tmp = pd.DataFrame([(x, y, z)], columns=["A", "B", "C"])
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    df = pd.concat([df, df_tmp])
df = df.reset_index()
This will result in a DataFrame as follows:
df.head()
index A B C
0 0 0 0 0
1 0 1 1 1
2 0 2 4 8
3 0 3 9 27
4 0 4 16 64
There is no way of filling an empty dataframe like that. Since there are no entries in your dataframe, something like
Resulttable['A'].iloc[n]
will always result in the IndexError you described.
Instead of trying to fill the dataframe like that, it is better to store the results from your loop in a list, which you could call 'result_list'. Then you can create a dataframe from your list like this:
Resulttable= pd.DataFrame({"A": result_list})
If you've got another list of results you want to store in another column of your dataframe, let's say result_list2, then you can create your dataframe like this:
Resulttable= pd.DataFrame({"A": result_list, "B": result_list2})
If 'Resulttable' has already been created, you can add column B like this:
Resulttable["B"] = result_list2
I hope this helps.
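A variation on the list approach is to collect one dict per loop iteration and build the frame once at the end. This is only a sketch: the values 12 and 35 come from the question, while the loop range, the value stored in column C, and the name rows are made up for illustration.

```python
import pandas as pd

# One dict per iteration; each dict becomes one row.
rows = []
for n in range(3):  # hypothetical range
    rows.append({"A": 12, "B": 35, "C": n})  # C is a placeholder result

# Build the dataframe in a single step after the loop.
Resulttable = pd.DataFrame(rows, columns=["A", "B", "C"])
```

Building the frame once tends to be faster than growing it row by row, because every concatenation copies the existing data.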

Python: create new row based on column names in DataFrame

I would like to know how to make a new row based on the column names row in a python dataframe, and append it to the same dataframe.
example
df = pd.DataFrame(np.random.randn(10, 5),columns=['abx', 'bbx', 'cbx', 'acx', 'bcx'])
I want to create a new row based on the column names that gives:
b | b | b | c | c | by taking the middle character of each column name.
The idea is to use that new row later for multi-indexing the columns.
I'm assuming this is what you want, as you've not responded. We can append a new row by creating a dict from zipping the df columns with a list comprehension of the middle characters (assuming that the column names are 3 characters long):
In [126]:
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
pd.concat([df, pd.DataFrame([dict(zip(df.columns, [col[1] for col in df]))])], ignore_index=True)
Out[126]:
abx bbx cbx acx bcx
0 -0.373421 -0.1005462 -0.8280985 -0.1593167 1.335307
1 1.324328 -0.6189612 -0.743703 0.9419248 1.282682
2 0.3730312 -0.06697892 1.113707 -0.9691056 1.779643
3 -0.6644958 1.379606 -0.3751724 -1.135034 0.3287292
4 0.4406139 -0.5767996 -0.2267589 -1.384412 -0.03038372
5 -1.242734 -0.838923 -0.6724592 1.405247 -0.3716862
6 -1.682637 -1.69309 -1.291833 1.781704 0.6321988
7 -0.5793783 -0.6809975 1.03502 -0.6498381 -1.124236
8 1.589016 1.272961 -1.968225 0.5515182 0.3058628
9 -2.275342 2.892237 2.076253 -0.1422845 -0.09776171
10 b b b c c
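Since the stated goal is multi-indexing the columns, the labels can also go straight into a column MultiIndex without storing strings in a numeric frame. A sketch under that assumption; the level names 'group' and 'column' and the variable names are invented here:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.random.randn(10, 5), columns=["abx", "bbx", "cbx", "acx", "bcx"]
)

# Middle character of each (3-character) column name.
middle = [col[1] for col in df.columns]

# Use the middle characters as the top level of a two-level column index.
df_multi = df.copy()
df_multi.columns = pd.MultiIndex.from_arrays(
    [middle, df.columns], names=["group", "column"]
)

# Selecting a top-level label returns the columns in that group.
b_group = df_multi["b"]
```

This keeps every column numeric, whereas appending a row of letters turns the columns into object dtype.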
iloc lets you read an entire row by position; you just say which row you want. (The older ix indexer was removed in pandas 1.0.)
Then you take that row's values and assign them to the columns.
See the example below.
virData = pd.DataFrame(df)
virData.columns = virData.iloc[1].values
virData.columns
