How do you take 2 columns from a dataframe and create a series (1 column as index)?
number a
one 1
two 2
three 3
If the above were a dataframe, how would I convert it to a series with the number column as the index?
I tried:
pd.Series(df['a'], index = df.number)
but all the values become NaN.
You need set_index, then select column a:
s = df.set_index('number')['a']
For your own solution, you need to add .values so that a NumPy array is passed, which avoids index alignment:
s = pd.Series(df['a'].values, index = df.number)
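A minimal sketch comparing the two approaches on a hypothetical three-row frame built from the example above. It also shows why the original attempt produced NaN: when pd.Series receives a Series together with an index, it reindexes the data by label, and the labels 'one', 'two', 'three' are not in the default index 0, 1, 2. Here to_numpy() is used as the newer spelling of .values.

```python
import pandas as pd

# Hypothetical frame matching the example in the question.
df = pd.DataFrame({'number': ['one', 'two', 'three'], 'a': [1, 2, 3]})

# set_index moves 'number' into the index; selecting 'a' returns a Series.
s1 = df.set_index('number')['a']

# Passing a Series makes pd.Series reindex it by label: 'one', 'two', 'three'
# are not in df['a'].index (0, 1, 2), so every value becomes NaN.
bad = pd.Series(df['a'], index=df['number'])

# Passing the raw NumPy array skips label alignment and keeps the values.
s2 = pd.Series(df['a'].to_numpy(), index=df['number'])

print(bad.isna().all())              # True
print(s1.tolist() == s2.tolist())    # True
```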
I have two Pandas dataframes. Both contain float values with three decimal places.
Dataframe A is a one-column dataframe with 12 rows. Dataframe B is a one-column dataframe with over 40,000 rows, which contains the 12 values from Dataframe A spread out randomly.
I need to find the indices of the values in Dataframe A within Dataframe B.
I have tried .query(), .index.value() and .where() but am unable to return the indices.
Dataframe A
Row Index   Time
0           148.521
1           112.379
...         ...
12          510.121
Dataframe B
Row Index   Time
0           0.000
1           0.025
...         ...
46871       1171.675
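You can use boolean indexing with df.loc[] and collect the matches:
matches = []
for i in dataframe_A['Time']:
    matches.append(dataframe_B.loc[dataframe_B['Time'] == i])
result = pd.concat(matches)
For each of the twelve values, this collects the matching rows of dataframe B along with their row indices (available as result.index).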
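As an alternative to the loop, Series.isin can build a single boolean mask over dataframe B, so the 40,000 rows are not scanned twelve times. The frames below are small hypothetical stand-ins. Rounding both sides to three decimals is an assumption meant to guard against tiny floating-point differences; it may be unnecessary if the values were copied exactly.

```python
import pandas as pd

# Small hypothetical stand-ins for the two frames in the question.
dataframe_A = pd.DataFrame({'Time': [148.521, 112.379, 510.121]})
dataframe_B = pd.DataFrame({'Time': [0.000, 0.025, 112.379, 7.5, 510.121, 148.521]})

# Boolean mask: True where a (rounded) value in B also appears in A.
mask = dataframe_B['Time'].round(3).isin(dataframe_A['Time'].round(3))

# Row indices of the matching rows in B.
indices = dataframe_B.index[mask]
print(list(indices))  # [2, 4, 5]
```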
I'm working with two dataframes, A and B. Dataframe A is populated with values, while dataframe B is empty except for its header structure.
I want to take the values of a column in dataframe A and append them to the corresponding column in dataframe B.
I've placed the values of the dataframe A column I want to append in a list, and tried setting the destination column equal to that list:
dataframeB['x'] = list(dataframeA['A'])
This yields the following error:
ValueError: Length of values does not match length of index
The result I expect is that Dataframe A's column A transfers over to Dataframe B's column x.
Dataframe A
A B C D
1 2 3 4
1 2 3 4
Dataframe B
x y
- -
Create dataframe B with the data already in it, renaming column A to x:
dataframeB = pd.DataFrame({'x': dataframeA['A']})
Then you can add further columns from dataframe A:
dataframeB['y'] = dataframeA['B']
Result:
x y
1 2
1 2
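If dataframe B does not need to exist beforehand, another option is to select the wanted columns of dataframe A and rename them to B's headers in one step. This sketch assumes the mapping A to x and B to y from the example, and uses the standard DataFrame.rename with a columns dict.

```python
import pandas as pd

# Hypothetical dataframe A matching the example in the question.
dataframeA = pd.DataFrame({'A': [1, 1], 'B': [2, 2], 'C': [3, 3], 'D': [4, 4]})

# Select the source columns, then rename them to B's header names.
dataframeB = dataframeA[['A', 'B']].rename(columns={'A': 'x', 'B': 'y'})
print(dataframeB)
```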
I have a 36 rows x 36 columns pivot-table dataframe, which I build using the code below:
df_pivoted = pd.pivot_table(df,index='From',columns='To',values='count')
df_pivoted.fillna(0,inplace=True)
I transpose the same dataframe using this code:
df_trans = df_pivoted.transpose()
and want to subtract the two dataframes with this code:
new_pivoted = df_pivoted - df_trans
It gives me a 72 rows x 72 columns dataframe with NaN in every cell.
Then I tried this instead:
delta = df_pivoted.subtract(df_trans, fill_value=0)
However, that also yields a 72 rows x 72 columns dataframe.
Please help me find the difference between the original dataframe and the transposed dataframe.
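After transposing your DataFrame (pivot table), you have a new DataFrame in which the columns have become the index and vice versa. When you subtract one DataFrame from another, pandas aligns them on both index and column labels and fills NaN wherever a label is missing from either frame. A 72 x 72 result means the 36 From labels and the 36 To labels have no label in common, so no cell lines up and every value is NaN.
If you need to subtract the values regardless of index and column labels (this returns a NumPy array and assumes rows and columns are in the same order), use: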
delta = df_pivoted.values - df_trans.values
If you want df_trans to keep the index and columns of df_pivoted:
df_trans = pd.DataFrame(data=df_pivoted.transpose().values,
                        index=df_pivoted.index,
                        columns=df_pivoted.columns)
delta = df_pivoted - df_trans
Now simple subtraction works.
Hope that helps!
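When the From and To labels are the same kind of values (same dtype and spelling), a different technique from the answer above is to reindex the pivot table so its index and columns share one label set, and then subtract its transpose. Unlike the .values approach, this keeps the labels and does not depend on rows and columns being in the same order. It also covers labels that appear only as a source or only as a destination. The edge list is hypothetical, and this is a sketch rather than a diagnosis of the 72 x 72 result.

```python
import pandas as pd

# Hypothetical edge list standing in for the asker's df.
df = pd.DataFrame({'From': ['a', 'a', 'b'],
                   'To': ['b', 'c', 'a'],
                   'count': [5, 2, 3]})

df_pivoted = pd.pivot_table(df, index='From', columns='To', values='count')
df_pivoted = df_pivoted.fillna(0)

# One shared, sorted label set for both axes (here: a, b, c).
labels = df_pivoted.index.union(df_pivoted.columns)

# Square matrix: labels missing from an axis get rows/columns filled with 0.
square = df_pivoted.reindex(index=labels, columns=labels, fill_value=0)

# Index and columns now match, so alignment pairs cell (i, j) with (j, i).
delta = square - square.T
print(delta)
```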
I'm trying to create a column of microsatellite motifs in a pandas dataframe. I have one column that gives the length of the motif and another that has the whole microsatellite.
Here's an example of the columns of interest.
motif_len sequence
0 3 ATTATTATTATT
1 4 ATCTATCTATCT
2 3 ATCATCATCATC
I would like to slice the values in sequence using the values in motif_len to get a single repeat (motif) of each microsatellite. I'd then like to add all these motifs as a third column in the dataframe to give something like this.
motif_len sequence motif
0 3 ATTATTATTATT ATT
1 4 ATCTATCTATCT ATCT
2 3 ATCATCATCATC ATC
I've tried a few things with no luck.
df['motif'] = df.sequence.str[:df.motif_len]
df['motif'] = df.sequence.str[:df.motif_len.values]
Both make the motif column but all the values are NaN.
I think I understand why these don't work: I'm passing a whole series/array as the upper bound of the slice rather than a single value from the motif_len column.
I also tried to create a series by iterating through each
Any ideas?
You can call apply on the df, passing axis=1 to apply the function row-wise, and use the column values to slice the string:
In [5]:
df['motif'] = df.apply(lambda x: x['sequence'][:x['motif_len']], axis=1)
df
Out[5]:
motif_len sequence motif
0 3 ATTATTATTATT ATT
1 4 ATCTATCTATCT ATCT
2 3 ATCATCATCATC ATC
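A list comprehension over zip is a common alternative to apply(axis=1) for row-wise string slicing, since it avoids building a Series object for every row. A minimal sketch on the example data:

```python
import pandas as pd

# The example data from the question.
df = pd.DataFrame({
    'motif_len': [3, 4, 3],
    'sequence': ['ATTATTATTATT', 'ATCTATCTATCT', 'ATCATCATCATC'],
})

# Pair each sequence with its motif length and slice in plain Python.
df['motif'] = [seq[:n] for seq, n in zip(df['sequence'], df['motif_len'])]
print(df['motif'].tolist())  # ['ATT', 'ATCT', 'ATC']
```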
Let's say I have a list of 6 integers named ‘base’ and a dataframe of 100,000 rows with 6 columns of integers as well.
I need to create an additional column that shows how often the integers in the list ‘base’ occur in each row of the dataframe.
The order of the integers, both in the list ‘base’ and in the dataframe rows, is to be ignored.
The occurrence frequency can have a value ranging from 0 to 6.
0 means that none of the 6 integers in the list ‘base’ matches any of the 6 columns of a row in the dataframe.
Can anyone shed some light on this please ?
You can try this:
import pandas as pd
# create frame with six columns of ints
df = pd.DataFrame({'a':[1,2,3,4,10],
'b':[8,5,3,2,11],
'c':[3,7,1,8,8],
'd':[3,7,1,8,8],
'e':[3,1,1,8,8],
'f':[7,7,1,8,8]})
# list of ints
base =[1,2,3,4,5,6]
# count how many values in a row y are members of base
def base_count(y):
    return sum(True for x in y if x in base)
# apply the function row wise using the axis =1 parameter
df.apply(base_count, axis=1)
outputs:
0 4
1 3
2 6
3 2
4 0
dtype: int64
Then assign it to a new column:
df['g'] = df.apply(base_count, axis=1)
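For 100,000 rows, a vectorized version may be noticeably faster than apply. DataFrame.isin marks each cell whose value is in base, and sum(axis=1) counts the marked cells per row. Like base_count, this counts repeated values within a row separately (row 0 scores 4 because 3 appears three times). The column names are those of the example frame above.

```python
import pandas as pd

# The example frame and list from the answer above.
df = pd.DataFrame({'a': [1, 2, 3, 4, 10],
                   'b': [8, 5, 3, 2, 11],
                   'c': [3, 7, 1, 8, 8],
                   'd': [3, 7, 1, 8, 8],
                   'e': [3, 1, 1, 8, 8],
                   'f': [7, 7, 1, 8, 8]})
base = [1, 2, 3, 4, 5, 6]

# isin gives a boolean frame (True where a cell's value is in base);
# summing across each row counts the matching cells.
df['g'] = df[['a', 'b', 'c', 'd', 'e', 'f']].isin(base).sum(axis=1)
print(df['g'].tolist())  # [4, 3, 6, 2, 0]
```

If a repeated value should count only once, one option is a per-row set intersection such as len(set(row) & set(base)), at the cost of going back to a row-wise apply.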