I know that you can pull out a single column from a dataframe to a list by doing this:
newList = df['column1'].tolist()
and that you can convert all values to a list like this:
newList = df.values.tolist()
But is there a way to convert 2 columns from a dataframe to a list? For example, if the dataframe looks like this:
  Column 1  Column 2
0    apple         9
1    peach        12
then the resulting list should be:
[[apple,9],[peach,12]]
Thanks
As per your example, you can convert a pandas DataFrame to a list with df.values.tolist().
If you want only specific columns, replace df in that code with a DataFrame containing only those columns: df[[column1, column2, ..., columnN]].values.tolist()
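A minimal sketch of the column-selection approach, assuming the two columns from the question. The third column, "Column 3", is hypothetical and is included only to show that unselected columns are left out of the result:

```python
import pandas as pd

# Sample data from the question; "Column 3" is a hypothetical extra column.
df = pd.DataFrame({
    "Column 1": ["apple", "peach"],
    "Column 2": [9, 12],
    "Column 3": [0.5, 0.7],
})

# Select the two columns by name, then convert the rows to a list of lists.
new_list = df[["Column 1", "Column 2"]].values.tolist()
print(new_list)
```

Note the double brackets: the inner list names the columns to keep, and the outer brackets index the DataFrame with that list.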
You can use zip:
[list(i) for i in zip(df['Column 1'], df['Column 2'])]
Output
[[apple,9],[peach,12]]
To convert the entire data frame to a list of lists:
lst = df.to_numpy().tolist()
I have an excel spreadsheet with raw data in:
demo-data:
1
2
3
4
5
6
7
8
9
How do I combine all the numbers into one series so I can start doing math on it? They are all just numbers of the same "kind".
Given your dataframe as df, df.values.flatten() may help: it returns all the values as a one-dimensional array.
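As a rough sketch of the flatten approach, assuming the spreadsheet holds the nine numbers in a 3x3 layout (the question does not show the actual layout, so the frame below is a stand-in):

```python
import pandas as pd

# Hypothetical 3x3 frame standing in for the spreadsheet data.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# flatten() returns a 1-D NumPy array in row-major order;
# wrapping it in a Series gives one column of values to do math on.
series = pd.Series(df.values.flatten())
print(series.sum())
print(series.mean())
```

Wrapping the array in a Series is optional; the NumPy array already supports sum(), mean() and similar operations.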
You can convert your dataframe to a list and iterate through it to extract and put values into a 1D list:
import pandas as pd

df = pd.read_excel("data.xls")
lst = df.to_numpy().tolist()
result = []
# Walk each row, then each value in the row, to build one flat list.
for row in lst:
    for item in row:
        result.append(item)
I am new to Python. Could you help with the following?
I have a dataframe as follows.
a, d, f and g are the column names, and the dataframe is named df1.
 a   d    f    g
20  30   20   20
 0   1  NaN  NaN
I need to put the second row of df1 into a list without the NaNs.
Ideally as follows.
x=[0,1]
Select the second row with df.iloc[1], drop the NaN values with .dropna(), and convert the resulting Series to a Python list with .tolist(). Because the row contains NaN it is stored as floats, so astype(int) is used to get integers back.
Use:
x = df.iloc[1].dropna().astype(int).tolist()
# x = [0, 1]
Check itertuples()
So you would have something like this:
for row in df1.itertuples():
    row[0]  # the index of the row; the whole row is now a tuple you can work with
You can also use iloc and dropna() like this:
row_2 = df1.iloc[1].dropna().to_list()
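The itertuples() suggestion can be sketched as follows. The df1 below is rebuilt from the table in the question, and filtering with pd.notna is one possible way to drop the NaNs while iterating, not necessarily what the answer had in mind:

```python
import numpy as np
import pandas as pd

# df1 reconstructed from the table in the question.
df1 = pd.DataFrame({
    "a": [20, 0],
    "d": [30, 1],
    "f": [20, np.nan],
    "g": [20, np.nan],
})

rows = []
for row in df1.itertuples():
    # row[0] is the index label; row[1:] holds the column values.
    values = [v for v in row[1:] if pd.notna(v)]
    rows.append(values)

x = rows[1]  # second row without NaNs
print(x)
```

If only one row is needed, the iloc version is shorter; iterating makes more sense when every row needs the same treatment.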
How do I append a list of integers as new columns to each row in a dataframe in Pandas?
I have a dataframe which I need to append a 20 column sequence of integers as new columns. The use case is that I'm translating natural text in a cell of the row into a sequence of vectors for some NLP with Tensorflow.
But to illustrate, I create a simple data frame to append:
df = pd.DataFrame([(1, 2, 3),(11, 12, 13)])
df.head()
Which generates the output:
Then, for each row, I need to apply a function that takes the value in column '2' and returns an array of integers that need to be appended as columns in the data frame, not as an array in a single cell:
def foo(x):
    return [x+1, x+2, x+3]
Ideally, to run a function like:
df[3, 4, 5] = df['2'].applyAsColumns(foo)
The only solution I can think of is to create the data frame with 3 blank columns [3, 4, 5], and then use a for loop to iterate through the blank columns and fill in their values.
Is this the best way to do it, or are there any functions built into Pandas that would do this? I've tried checking the documentation, but haven't found anything.
Any help is appreciated!
IIUC,
def foo(x):
    return pd.Series([x+1, x+2, x+3])
df = pd.DataFrame([(1, 2, 3),(11, 12, 13)])
df[[3,4,5]] = df[2].apply(foo)
df
Output:
    0   1   2   3   4   5
0   1   2   3   4   5   6
1  11  12  13  14  15  16
I have the following dataset;
I would like to end up with a column like this one;
Ideally, I would like to convert the columns to the same case and split the strings by spaces and return rows that contain a substring that is found on the other column.
Split the first column with Series.str.split(expand=True) and check the resulting values with DataFrame.isin against the flattened, split values of the second column. DataFrame.any(axis=1) then marks the rows with at least one match. Use that mask for boolean indexing to filter the first column and, if necessary, create a one-column DataFrame with Series.to_frame:
import numpy as np
import pandas as pd

df = pd.DataFrame({'column_a': ['ga lt', 'ka', 'ku', 'na ma', np.nan, np.nan],
                   'column_b': ['se', 'ga', 'ma po', 'na', 'ka ch', 'wa wo']})
# All individual words that appear anywhere in column_b.
vals = [y for x in df['column_b'] for y in x.split()]
mask = df['column_a'].str.split(expand=True).isin(vals).any(axis=1)
df = df.loc[mask, 'column_a'].to_frame('column_a_in_column_b')
print(df)
  column_a_in_column_b
0                ga lt
1                   ka
3                na ma
Given the following list:
list=['a','b','c']
I'd like to create a data frame where the list is the column of values.
I'd like the header to be "header".
Like this:
header
a
b
c
Thanks in advance!
Wouldn't that be:
import pandas as pd

list = ['a', 'b', 'c']
df = pd.DataFrame({'header': list})
  header
0      a
1      b
2      c