How to create a dataframe from a single row of another dataframe? - python

I'd like to create a dataframe from a selected row of another dataframe, e.g.,
import pandas as pd
df = pd.DataFrame({"col1": [1, 2], "col2": [0, 1]})
# I want df1 to be a dataframe built from the first row of df
df1 = df.iloc[0]  # produces a pandas Series, which is not what I want
This one seems to work, but it seems to involve a lot of (unnecessary?) operations:
df1 = pd.DataFrame(df.iloc[0]).T
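One way to avoid the round trip through a Series, sketched here on the question's own data: passing a list of positions (or a one-row slice) to iloc returns a DataFrame directly. The name df1_slice is only illustrative. A side benefit is that the column dtypes are kept, whereas a row pulled out as a Series has to hold all of its values in a single common dtype.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [0, 1]})

# A list of positions (note the double brackets) keeps the result two-dimensional.
df1 = df.iloc[[0]]

# A one-row slice gives the same one-row DataFrame.
df1_slice = df.iloc[:1]

print(type(df1).__name__)
print(df1)
```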

Related

Mapping a multiindex dataframe to another using row ID

I have two dataframes of different shapes.
The 'ANTENNA1' and 'ANTENNA2' columns in the bigger dataframe correspond to the ID column in the smaller dataframe. I want to merge the smaller dataframe into the bigger one so that the bigger dataframe gets '(POSITION, col1)', '(POSITION, col2)', '(POSITION, col3)' wherever ANTENNA1 == ID.
Edit: I tried pd.merge, but it changes the column values of the original dataframe.
Original:
df = pd.merge(df_main, df_sub, left_on='ANTENNA1', right_on ='id', how = 'left')
Result:
I want to keep the original dataframe columns as they are.
Assuming your first dataframe (with positions) is called df1 and the second is called df2, with your loaded data, you could just use pd.merge (the function form of pandas.DataFrame.merge):
df = pd.merge(df1,df2,left_on='id', right_on='ANTENNA1')
Then you can select the columns you need (col1, col2, ...) from df to get the desired result: df[["col1","col2",..]].
Simple example:
import pandas as pd
# creating dataframes as df1 and df2
df1 = pd.DataFrame({'ID': [1, 2, 3, 5, 7, 8],
                    'Name': ['Sam', 'John', 'Bridge',
                             'Edge', 'Joe', 'Hope']})
df2 = pd.DataFrame({'id': [1, 2, 4, 5, 6, 8, 9],
                    'Marks': [67, 92, 75, 83, 69, 56, 81]})
# merging df1 and df2 on ID/id: only the rows whose
# IDs appear in both dataframes get merged, i.e. {1, 2, 5, 8}
df = pd.merge(df1, df2, left_on="ID", right_on="id")
print(df)
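The merge above is an inner join, so rows of df1 without a match disappear. If the goal is to keep every row of the left dataframe, a left join is one option. The sketch below reuses the example data; the variable name left and the step that drops the redundant id column are illustrative choices, not part of the original answer. Note that unmatched rows get NaN, which turns the Marks column into floats.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3, 5, 7, 8],
                    'Name': ['Sam', 'John', 'Bridge', 'Edge', 'Joe', 'Hope']})
df2 = pd.DataFrame({'id': [1, 2, 4, 5, 6, 8, 9],
                    'Marks': [67, 92, 75, 83, 69, 56, 81]})

# how="left" keeps every row of df1; IDs 3 and 7 have no match in df2.
left = pd.merge(df1, df2, left_on="ID", right_on="id", how="left")

# The right-hand key column duplicates ID, so it can be dropped.
left = left.drop(columns="id")
print(left)
```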

Python, unlist array in DataFrame

I have a dataframe df with one column.
Each row of this column holds a np.array object with 3 elements.
Is there a way to unlist this array and create a DataFrame with 3 columns for easy manipulation?
Your question has already been asked and answered elsewhere.
Anyway, let's say your dataframe looks like this:
import pandas as pd, numpy as np
df = pd.DataFrame({'A': [ np.array([1, 2, 3]), np.array([4, 5, 6]), np.array([7, 8, 9])]})
df
Create a new dataframe df1 by converting column A of the original df to a list:
df1 = pd.DataFrame(columns = ['A1', 'A2', 'A3'], data = list(df['A']))
df1
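A variation on the same idea, sketched under the assumption that every array in the column has the same length: stacking the arrays into one 2-D NumPy array and passing the original index along keeps the new rows aligned with df. The index labels 'x', 'y', 'z' are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'A': [np.array([1, 2, 3]), np.array([4, 5, 6]), np.array([7, 8, 9])]},
    index=['x', 'y', 'z'],  # hypothetical labels, to show the index is preserved
)

# Stack the per-row arrays into a single (n_rows, 3) array.
stacked = np.stack(df['A'].tolist())

# Reuse the original index so df1 lines up row by row with df.
df1 = pd.DataFrame(stacked, index=df.index, columns=['A1', 'A2', 'A3'])
print(df1)
```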

Picking out certain indexes from a pandas data frame

I have a pandas data frame with hundreds of entries, and a separate array of indexes. For example:
import pandas as pd
list1 = [13,2,32,34,15,7,19]
list2 = [15,65,95,9,90,88,10]
df1 = pd.DataFrame(list1)
df2 = pd.DataFrame(list2)
cols = [df1, df2]
df1.loc[:, cols]
and I have another array called
M =[1, 2, 5, 6, 9]
whose entries are the indexes of the pandas data frame that I want. Is there a way to create a new table that picks out only the rows whose index appears in the array M?
import pandas as pd
list1 = [13,2,32,34,15,7,19]
df1 = pd.DataFrame(list1)
M =[1, 2, 5, 6]
df1[df1.index.isin(M)]
Note that in your problem statement, cols is a list of dataframes, not a two-column dataframe. I am not sure whether that was intended, as it is not clear from your code and question.
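For completeness, here is a sketch that builds the two-column dataframe the question probably intended and keeps the original M, including the label 9 that does not exist in a 7-row frame. The column names list1 and list2 and the variable names picked, present, and picked_loc are assumptions for illustration. The isin mask simply skips missing labels, while in recent pandas versions df.loc[M] raises a KeyError for them, so the labels are intersected with the index first.

```python
import pandas as pd

list1 = [13, 2, 32, 34, 15, 7, 19]
list2 = [15, 65, 95, 9, 90, 88, 10]

# One dataframe with two columns instead of a list of two dataframes.
df = pd.DataFrame({"list1": list1, "list2": list2})

M = [1, 2, 5, 6, 9]  # 9 is not a valid label for a 7-row frame

# Boolean mask: labels missing from the index are silently ignored.
picked = df[df.index.isin(M)]

# Label-based selection: keep only the labels that actually exist.
present = df.index.intersection(M)
picked_loc = df.loc[present]

print(picked)
```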

Organizing a MultiIndex DataFrame after set_index [duplicate]

When there is a DataFrame like the following:
import pandas as pd
df = pd.DataFrame(1, index=[100, 29, 234, 1, 150], columns=['A'])
How can I sort this dataframe by index with each combination of index and column value intact?
Dataframes have a sort_index method which returns a copy by default. Pass inplace=True to operate in place.
import pandas as pd
df = pd.DataFrame([1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=['A'])
df.sort_index(inplace=True)
print(df.to_string())
Gives me:
     A
1    4
29   2
100  1
150  5
234  3
Slightly more compact:
df = pd.DataFrame([1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=['A'])
df = df.sort_index()
print(df)
Note:
sort has been deprecated and replaced by sort_index for this scenario.
It is preferable not to use inplace, as it is usually harder to read and prevents chaining. See the explanation in the answer to "Pandas: peculiar performance drop for inplace rename after dropna".
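To illustrate the point about chaining, here is a small sketch: because sort_index returns a new dataframe, further method calls can follow it directly. The rename to 'value' and the head(3) call are arbitrary follow-up steps chosen for the example.

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3, 4, 5], index=[100, 29, 234, 1, 150], columns=['A'])

# Each call returns a new object, so the steps read top to bottom.
result = (
    df.sort_index()
      .rename(columns={'A': 'value'})
      .head(3)
)
print(result)

# The original dataframe is left untouched.
print(df.index.tolist())
```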
If the DataFrame index has a name, then you can use sort_values() to sort by that name as well. For example, if the index is named lvl_0, you can sort by this name. This case is common when the dataframe comes from a groupby or a pivot_table operation.
df = df.sort_values('lvl_0')
If the index has a name (or names), you can even sort by both the index and a column value. For example, the following sorts by both the index and the values of column A:
df = df.sort_values(['lvl_0', 'A'])
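A self-contained sketch of the two calls above, using a small made-up dataframe whose index is named lvl_0 (the data and the names by_index_name and by_both are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [3, 1, 2, 0]},
    index=pd.Index([2, 1, 2, 1], name='lvl_0'),
)

# Sort by the named index only.
by_index_name = df.sort_values('lvl_0')

# Sort by the named index first, then by column A within each index value.
by_both = df.sort_values(['lvl_0', 'A'])
print(by_both)
```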
If you have a MultiIndex dataframe, you can sort by index level using the level= parameter. For example, to sort by the second level in descending order and the first level in ascending order, you can use the following code.
df = df.sort_index(level=[1, 0], ascending=[False, True])
If the indices have names, again, you can call sort_values(). For example, the following sorts by indexes 'lvl_1' and 'lvl_2'.
df = df.sort_values(['lvl_1', 'lvl_2'])
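The MultiIndex variants can be tried on a small made-up frame. In this sketch the index levels are named lvl_1 and lvl_2 to match the snippet above; the data and the names by_level and by_name are assumptions for illustration.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [('a', 2), ('b', 1), ('a', 1), ('b', 2)],
    names=['lvl_1', 'lvl_2'],
)
df = pd.DataFrame({'A': [10, 20, 30, 40]}, index=idx)

# Second level (position 1) descending, then first level (position 0) ascending.
by_level = df.sort_index(level=[1, 0], ascending=[False, True])

# Sort by the level names, both ascending.
by_name = df.sort_values(['lvl_1', 'lvl_2'])

print(by_level)
print(by_name)
```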
