Store nth row elements of a pandas dataframe in a list - python

I am new to Python. Could you help with the following?
I have a dataframe as follows.
a, d, f and g are the column names. The dataframe is named df1.
    a   d    f    g
   20  30   20   20
    0   1  NaN  NaN
I need to put the second row of df1 into a list, without the NaNs.
Ideally as follows.
x=[0,1]

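Select the second row with df1.iloc[1], remove the NaN values with .dropna(), and finally convert the resulting Series to a Python list with .tolist(). Because the row contained NaN it is stored as float, so .astype(int) turns the remaining values back into integers.
Use:
x = df1.iloc[1].dropna().astype(int).tolist()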
# x = [0, 1]
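For reference, here is a minimal self-contained sketch of the same chain. The frame is rebuilt from the values shown in the question; the name second_row is only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df1 = pd.DataFrame(
    {"a": [20, 0], "d": [30, 1], "f": [20, np.nan], "g": [20, np.nan]}
)

# Position 1 is the second row; dropna() removes the NaN entries.
second_row = df1.iloc[1].dropna()

# The row is float because of the NaNs, so cast back to int before listing.
x = second_row.astype(int).tolist()
print(x)  # [0, 1]
```

Note that astype(int) is only safe after dropna(), since NaN cannot be converted to an integer.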

Check itertuples().
You would have something like this:
for row in df1.itertuples():
    row[0]  # the row's index label; do whatever you want with it, or with the whole row, which is now a tuple
You can also use iloc and dropna() like this:
row_2 = df1.iloc[1].dropna().to_list()
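If you want the NaN-free values of every row rather than just the second one, the itertuples() idea can be sketched as below. This is only an illustration: the frame is rebuilt from the question, and rows_without_nan is a made-up name.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df1 = pd.DataFrame(
    {"a": [20, 0], "d": [30, 1], "f": [20, np.nan], "g": [20, np.nan]}
)

rows_without_nan = []
# index=False leaves the index label out, so each tuple holds only cell values.
for row in df1.itertuples(index=False):
    rows_without_nan.append([value for value in row if pd.notna(value)])

print(rows_without_nan[1])  # [0, 1]
```

For a single row the iloc version is shorter; the loop is mainly useful when each row needs its own list.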

Related

append or join value from one dataframe to every row in another dataframe in Pandas

I'm normally OK on the joining and appending front, but this one has got me stumped.
I've got one dataframe with only one row in it. I have another with multiple rows. I want to append the value from one of the columns of my first dataframe to every row of my second.
df1:
id  Value
1   word
df2:
id  data
1   a
2   b
3   c
Output I'm seeking:
df2
id  data  Value
1   a     word
2   b     word
3   c     word
I figured that this was along the right lines, but it listed out NaN for all rows:
df2 = df2.append(df1[df1['Value'] == 1])
I guess I could just join on the id value and then copy the value to all rows, but I assumed there was a cleaner way to do this.
Thanks in advance for any help you can provide!
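Just get the first element of the Value column of df1 and assign it to a new Value column of df2. A scalar is broadcast to every row. This assumes df1 has the default index, so its only row is labelled 0:
df2['Value'] = df1.loc[0, 'Value']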
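A minimal runnable sketch of that assignment, with both frames rebuilt from the question. It uses iloc[0] instead of loc[0, ...] so that it does not depend on how df1's index is labelled; that substitution is my own choice, not part of the answer above.

```python
import pandas as pd

# Frames rebuilt from the question.
df1 = pd.DataFrame({"id": [1], "Value": ["word"]})
df2 = pd.DataFrame({"id": [1, 2, 3], "data": ["a", "b", "c"]})

# Take the single value by position and assign it; pandas repeats a scalar for every row.
df2["Value"] = df1["Value"].iloc[0]
print(df2)
```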

get value from dataframe based on row values without using column names

I am trying to get a value situated on the third column from a pandas dataframe by knowing the values of interest on the first two columns, which point me to the right value to fish out. I do not know the row index, just the values I need to look for on the first two columns. The combination of values from the first two columns is unique, so I do not expect to get a subset of the dataframe, but only a row. I do not have column names and I would like to avoid using them.
Consider the dataframe df:
a 1 bla
b 2 tra
b 3 foo
b 1 bar
c 3 cra
I would like to get tra from the second row, based on the b and 2 combination that I know beforehand. I've tried subsetting with
df = df.loc['b', :]
which returns all the rows with b on the same column (provided I've read the data with index_col = 0) but I am not able to pass multiple conditions on it without crashing or knowing the index of the row of interest. I tried both df.loc and df.iloc.
In other words, ideally I would like to get tra without even using row indexes, by doing something like:
df[(df[,0] == 'b' & df[,1] == `2`)][2]
Any suggestions? Probably it is something simple enough, but I have the tendency to use the same syntax as in R, which apparently is not compatible.
Thank you in advance
As @anky suggested, one way to do this without knowing the column names or the row index of the value of interest is to read the file into a pandas dataframe with a multi-column index.
For the provided example, knowing at least the column positions, that would be:
df = pd.read_csv(path, sep='\t', index_col=[0, 1])
then, you can use:
df = df.iloc[df.index.get_loc(("b", 2)):]
df.iloc[0]
to get the value of interest.
Thanks again @anky for your help. If you found this question useful, please upvote @anky's comment on the question.
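A self-contained sketch of the multi-column index approach follows. The in-memory string stands in for the file from the question, and header=None is my assumption based on the question saying the data has no column names. It selects the row directly with loc rather than slicing with get_loc.

```python
import io

import pandas as pd

# Tab-separated sample standing in for the file from the question.
raw = "a\t1\tbla\nb\t2\ttra\nb\t3\tfoo\nb\t1\tbar\nc\t3\tcra\n"

# header=None: the data has no header row; the first two columns become a MultiIndex.
df = pd.read_csv(io.StringIO(raw), sep="\t", header=None, index_col=[0, 1])

# Select the single row keyed by ("b", 2), then take its only remaining cell.
value = df.loc[("b", 2), :].iloc[0]
print(value)  # tra
```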
I'd probably use DataFrame.query for that:
import pandas as pd
df = pd.DataFrame(index=['a', 'b', 'b', 'b', 'c'], data={"col1": [1, 2, 3, 1, 3], "col2": ['bla', 'tra', 'foo', 'bar', 'cra']})
df
   col1 col2
a     1  bla
b     2  tra
b     3  foo
b     1  bar
c     3  cra
df.query('index == "b" and col1 == 2')
   col1 col2
b     2  tra
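The question asked for a lookup that uses neither column names nor row labels. A minimal sketch of one way to do that is a boolean mask built from positional column access with iloc. The in-memory string stands in for the asker's file, and header=None is an assumption based on the data having no header row.

```python
import io

import pandas as pd

# Tab-separated sample standing in for the file from the question.
raw = "a\t1\tbla\nb\t2\ttra\nb\t3\tfoo\nb\t1\tbar\nc\t3\tcra\n"
df = pd.read_csv(io.StringIO(raw), sep="\t", header=None)

# Compare the first two columns by position; note the parentheses around each
# comparison, which are required because & binds tighter than ==.
mask = (df.iloc[:, 0] == "b") & (df.iloc[:, 1] == 2)

# The combination is unique, so take the first matching row, third column.
value = df.loc[mask].iloc[0, 2]
print(value)  # tra
```

This is the closest pandas equivalent of the R-style expression in the question.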

How to drop an empty row in dataframe

I have a dataframe with one empty row every 10 rows. It looks like the following:
   A  B  C  D  E
0
1  a  b  c  d  e
2  f  g  h  i  j
.....
I would like to drop the empty rows, but the problem is that they are not filled with a blank string " "; they hold the empty string "".
Therefore df.fillna and df.dropna both do nothing, and I'm not sure how to replace these values.
Any suggestion would be helpful! Thank you guys!
Keep only the rows that contain no empty strings:
df = df[df.ne('').all(axis=1)]
If it really is every 10th row, as you say, you can also filter on the index:
df_clean = df[df.index % 10 != 0]
This drops every 10th row, starting with the first one (index 0, 10, 20, ...), and assumes a default integer index.
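One detail worth checking on your own data: a mask built with ne('').all(axis=1) also drops rows that have only some empty cells. The sketch below contrasts it with a mask that drops only fully empty rows. The small frame and the variable names are made up for illustration.

```python
import pandas as pd

# Small stand-in frame: row 0 is entirely empty strings, row 3 has one empty cell.
df = pd.DataFrame(
    {
        "A": ["", "a", "f", "k"],
        "B": ["", "b", "g", ""],
        "C": ["", "c", "h", "m"],
    }
)

# Keeps rows with no empty cell at all: drops rows 0 and 3.
no_empty_cells = df[df.ne("").all(axis=1)]

# Keeps rows with at least one non-empty cell: drops only row 0.
not_fully_empty = df[~df.eq("").all(axis=1)]

print(no_empty_cells.index.tolist())   # [1, 2]
print(not_fully_empty.index.tolist())  # [1, 2, 3]
```

If the empty rows are always completely empty, both masks give the same result.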

add values to multiple columns in one go with new index - pandas

df = pd.DataFrame(columns=['w','x','y','z'])
I'm trying to insert new index labels row by row and add values to certain columns.
If I were adding one value to a specific column, I could do: df.loc['a','x'] = 2
However, what if I'd like to add values to several columns in one go, like this:
{'x':2, 'z':3}
Is there a way to do this in pandas?
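Reindex and assign, where d is the dictionary of new values:
d = {'x':2, 'z':3}
df.reindex(['a']).assign(**d)
     w  x    y  z
a  NaN  2  NaN  3
Or build a one-row frame from d and fill in the remaining columns with combine_first:
df=pd.DataFrame(d,index=['a']).combine_first(df)
     w  x    y  z
a  NaN  2  NaN  3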
Use loc, but select multiple columns and assign an iterable (such as a list or tuple):
df.loc['a',['x','z']] = [2,3]
Or, as suggested by @jfaccioni, in case the data is a dictionary d:
df.loc['a', list(d.keys())] = list(d.values())
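Put together, the loc approach can be sketched as a small runnable example. It relies on loc creating the row label 'a' when it does not exist yet; the columns not named in d stay NaN.

```python
import pandas as pd

# Empty frame with the columns from the question.
df = pd.DataFrame(columns=["w", "x", "y", "z"])
d = {"x": 2, "z": 3}

# list(d) gives the keys; a dict keeps keys and values in the same order,
# so each value lands in its own column.
df.loc["a", list(d)] = list(d.values())
print(df)
```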

Adding elements to an empty dataframe in pandas

I am new to Python and have a basic question. I have an empty dataframe Resulttable with columns A, B and C, which I want to keep filling with the results of some calculations that I run in a loop with loop index n. For example, I want to store the value 12 in the nth row of column A, 35 in the nth row of column B, and so on for the whole range of n.
I have tried something like
Resulttable['A'].iloc[n] = 12
Resulttable['B'].iloc[n] = 35
I get the error "single positional indexer is out-of-bounds" for the first value of n, n=0.
How do I resolve this? Thanks!
You can first create an empty pandas dataframe and then append rows one by one as you calculate them. The end of a range is exclusive, so specify one above the highest value you want, i.e. range(0, 13) to iterate over 0-12.
import pandas as pd
df = pd.DataFrame([], columns=["A", "B", "C"])
for i in range(0, 13):
    x = i**1
    y = i**2
    z = i**3
    df_tmp = pd.DataFrame([(x, y, z)], columns=["A", "B", "C"])
    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it,
    # and ignore_index=True renumbers the rows 0, 1, 2, ...
    df = pd.concat([df, df_tmp], ignore_index=True)
This will result in a DataFrame as follows:
df.head()
   A   B   C
0  0   0   0
1  1   1   1
2  2   4   8
3  3   9  27
4  4  16  64
There is no way to fill an empty dataframe like that. Since the dataframe has no rows, something like
Resulttable['A'].iloc[n]
will always raise the IndexError you described.
Instead, it is better to store the results from your loop in a list, which you could call result_list. You can then create the dataframe from the list like this:
Resulttable = pd.DataFrame({"A": result_list})
If you have another list of results that you want to store in another column, say result_list2, you can create the dataframe like this:
Resulttable = pd.DataFrame({"A": result_list, "B": result_list2})
If Resulttable has already been created, you can add column B like this:
Resulttable["B"] = result_list2
I hope this helps.
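As a rough sketch of the collect-then-build pattern with all three columns at once, the loop can append one dictionary per iteration and the dataframe can be created a single time at the end. The calculations (powers of n) and the name rows are placeholders, not the asker's actual computation.

```python
import pandas as pd

rows = []
for n in range(5):
    # Placeholder calculations; replace them with the real ones.
    rows.append({"A": n, "B": n**2, "C": n**3})

# Build the frame once, after the loop, instead of growing it row by row.
Resulttable = pd.DataFrame(rows, columns=["A", "B", "C"])
print(Resulttable)
```

Building the frame once is generally cheaper than growing it inside the loop, because every concatenation copies the data collected so far.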
