I would like to find the numeric difference between two or more columns of two different dataframes.
The following would be the starting table (Table 1).
This table (Table 2) contains the single values that I need to subtract from Table 1.
I would like to get a third table containing the numeric differences between each row of Table 1 and the single row of Table 2. Any help?
Try
df.subtract(df2.values)
with df being your starting table and df2 being Table 2.
Can you try this and see if this is what you need:
import pandas as pd
df = pd.DataFrame({'A':[5, 3, 1, 2, 2], 'B':[2, 3, 4, 2, 2]})
df2 = pd.DataFrame({'A':[1], 'B':[2]})
pd.DataFrame(df.values-df2.values, columns=df.columns)
Out:
A B
0 4 0
1 2 1
2 0 2
3 1 0
4 1 0
You can just do df1 - df2.values, as below. This uses NumPy broadcasting to subtract the single row of df2 from every row of df1, so df2 must have exactly one row.
Example:
import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.arange(15).reshape(-1, 3), columns="A B C".split())
df2 = pd.DataFrame(np.ones(3).reshape(-1, 3), columns="A B C".split())
df1 - df2.values
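Both answers above rely on .values, which matches columns by position, not by name. If Table 2 might list its columns in a different order, a label-aligned variant is a bit safer. The sketch below is an alternative to the answers, not taken from them: it assumes Table 2 has exactly one row, turns that row into a Series with iloc[0], and lets DataFrame.sub align it with df1 by column label. The reordered df2 is made up to show the difference.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(15).reshape(-1, 3), columns=['A', 'B', 'C'])
# Hypothetical Table 2 with the same columns listed in a different order
df2 = pd.DataFrame({'C': [3], 'A': [1], 'B': [2]})

# Positional: subtracts 3 from A, 1 from B and 2 from C (wrong for this df2)
positional = df1 - df2.values

# Label-aligned: the single row becomes a Series indexed by column name,
# and sub(..., axis='columns') matches it to df1's columns by label
aligned = df1.sub(df2.iloc[0], axis='columns')
print(aligned)
```

With the frames above, every row of aligned is the matching row of df1 minus (1, 2, 3) for (A, B, C).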
Related
I have this dataset
In [4]: df = pd.DataFrame({'A':[1, 2, 3, 4, 5]})
In [5]: df
Out[5]:
A
0 1
1 2
2 3
3 4
4 5
I want to add a new column to the dataset based on the previous value of each item, like this:
A  New Column
1
2  1
3  2
4  3
5  4
I tried to use apply with iloc, but it didn't work.
Can you help?
Thank you.
With your shown samples, please try the following. You can use the shift function to build the new column: it moves every element of the given column down by one row, leaving NaN in the first element.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})
df['New_Col'] = df['A'].shift()
OR
If you would like to fill the NaN with zero instead, try the following; the approach is the same as above.
import pandas as pd
df['New_Col'] = df['A'].shift().fillna(0)
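A small variant, assuming pandas 0.24 or later (where shift accepts fill_value): pass the fill value to shift directly. Because no NaN is ever introduced, the new column keeps the integer dtype, whereas shift().fillna(0) goes through float and ends up as 0.0, 1.0, ...

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})

# Each row gets the previous row's value; the first row gets 0 instead of NaN
df['New_Col'] = df['A'].shift(fill_value=0)
print(df)
```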
import pandas as pd
import numpy as np
df=pd.DataFrame(np.array([['M',1, 1, 2, 3],
['F', 2, 4, 5, 6], ['M', 3, 7, 8, 9]]),columns=['SEX','AGE','A','B','C'])
dfm=pd.melt(df,id_vars=('SEX','AGE'),value_vars=list(df.columns[2:]),
var_name='LOCATION',value_name='DEATHS')
With the code above I can create a basic table and melt it from df into dfm, using 'AGE' and 'SEX' as id variables.
Is there a simple way of reverting this table back to its original format?
That is, going from dfm back to df, assuming I no longer have df.
Many thanks.
The pivot_table method should allow you to return to the original dataframe:
# Convert data types from object (strings) to integers
dfm[['AGE', 'DEATHS']] = dfm[['AGE', 'DEATHS']].astype(int)
# Pivot dataframe to "undo melt"
reshaped = dfm.pivot_table(index=['SEX', 'AGE'],columns=['LOCATION'],
values='DEATHS')
# Reset index to flatten dataframe
reshaped.reset_index(inplace=True)
# Change column name attribute to blank
reshaped.columns.rename('',inplace=True)
SEX AGE A B C
0 F 2 4 5 6
1 M 1 1 2 3
2 M 3 7 8 9
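If every (SEX, AGE, LOCATION) combination occurs only once, DataFrame.pivot can do the same job without aggregating; it raises an error on duplicates instead of silently averaging them. This is a sketch of that alternative, assuming pandas 1.1 or later (needed for a list passed as index); setting columns.name to None plays the same role as the rename step above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([['M', 1, 1, 2, 3],
                            ['F', 2, 4, 5, 6],
                            ['M', 3, 7, 8, 9]]),
                  columns=['SEX', 'AGE', 'A', 'B', 'C'])
dfm = pd.melt(df, id_vars=['SEX', 'AGE'], value_vars=['A', 'B', 'C'],
              var_name='LOCATION', value_name='DEATHS')

# np.array turned everything into strings, so convert the numeric columns back
dfm[['AGE', 'DEATHS']] = dfm[['AGE', 'DEATHS']].astype(int)

# One LOCATION column per distinct value; no aggregation is performed
restored = (dfm.pivot(index=['SEX', 'AGE'], columns='LOCATION', values='DEATHS')
               .reset_index())
restored.columns.name = None  # drop the leftover 'LOCATION' label
print(restored)
```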
I have a DataFrame and a Series of the same length as the DataFrame, and I want to assign
that Series to ALL columns of the DataFrame.
What is the natural way to do it?
For example:
df = pd.DataFrame([[1, 2 ], [3, 4], [5 , 6]] )
ser = pd.Series([1, 2, 3 ])
I want all columns of "df" to be equal to "ser".
PS Related:
One way to solve it is via the answer to:
How to assign dataframe[ boolean Mask] = Series - make it row-wise ?
That is, where the mask is True, take values from the same row of the Series (using an all-True mask). But I guess there should be a simpler way.
If I need NOT all, but only SOME columns, the answer is given here:
Assign a Series to several Rows of a Pandas DataFrame
Use to_frame with reindex:
a = ser.to_frame().reindex(columns=df.columns, method='ffill')
print (a)
0 1
0 1 1
1 2 2
2 3 3
But the solution from the comments seems easier. The columns parameter is added so that, with real data, the columns stay in the same order as in the original:
df = pd.DataFrame({c:ser for c in df.columns}, columns=df.columns)
Maybe a different way to look at it:
df = pd.concat([ser] * df.shape[1], axis=1)
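Another option, sketched here as an alternative rather than taken from the answers above, is to repeat the Series' values with NumPy and keep df's own index and column labels. This version is purely positional: it ignores ser's index and assumes ser is in the same row order as df. The dict-based version above, by contrast, takes its row labels from ser.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]])
ser = pd.Series([1, 2, 3])

# Turn ser into a column vector and repeat it once per column of df
values = np.repeat(ser.to_numpy()[:, np.newaxis], df.shape[1], axis=1)
result = pd.DataFrame(values, index=df.index, columns=df.columns)
print(result)
```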
I'm trying to re-insert back into a pandas dataframe a column that I extracted and of which I changed the order by sorting it.
Very simply, I have extracted a column from a pandas df:
col1 = df.col1
This column contains integers, and I used the .sort() method to order it from smallest to largest. Then I did some operations on the data.
col1.sort()
#do stuff that changes the values of col1.
Now the indexes of col1 are the same as the indexes of the overall df, but in a different order.
I was wondering how I can insert the column back into the original dataframe (replacing the col1 that is there at the moment).
I have tried both of the following methods:
1)
df.col1 = col1
2)
df.insert(column_index_of_col1, "col1", col1)
but both methods give me the following error:
ValueError: cannot reindex from a duplicate axis
Any help will be greatly appreciated.
Thank you.
Consider this DataFrame:
df = pd.DataFrame({'A': [1, 2, 3], 'B': [6, 5, 4]}, index=[0, 0, 1])
df
Out:
A B
0 1 6
0 2 5
1 3 4
Assign the second column to b, sort it and square it, for example:
b = df['B']
b = b.sort_values()
b = b**2
Now b is:
b
Out:
1 16
0 25
0 36
Name: B, dtype: int64
Without knowing the exact operation you've done on the column, there is no way to know whether 25 corresponds to the first row of the original DataFrame or the second one. You could apply the inverse of the operation (take the square root and match, for example), but I think that would be unnecessary. If you start with an index that has unique elements (df = df.reset_index()), it is much easier. In that case,
df['B'] = b
should work just fine.
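A minimal sketch of that fix on the example frame above. It uses reset_index(drop=True), a variant of the answer's df.reset_index() that discards the old duplicate labels instead of keeping them as a column. Once the index is unique, assigning the sorted and modified Series puts each value back on its original row by label, whatever order the Series is in.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [6, 5, 4]}, index=[0, 0, 1])
df = df.reset_index(drop=True)  # unique index: 0, 1, 2

b = df['B'].sort_values()  # index order is now 2, 1, 0
b = b ** 2

# Assignment aligns on index labels, not positions, so the order of b doesn't matter
df['B'] = b
print(df)
```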
So I created two dataframes from existing CSV files, both consisting entirely of numbers. The second dataframe has an index from 0 to 8783 and one column of numbers, and I want to add it as a new column to the first dataframe, whose index consists of a month, day and hour. I tried using append, merge and concat, and none of them worked, so I then simply tried:
x1GBaverage['Power'] = x2_cut
where x1GBaverage is the first dataframe and x2_cut is the second. When I did this, it added x2_cut as a new column, but all the values were NaN instead of the numbers they should be. How should I approach this?
x1GBaverage['Power'] = x2_cut.values
problem solved :)
The thing about pandas is that values are implicitly linked to their indices unless you deliberately specify that you only need the values to be transferred over.
If they have the same row count and you just want to tack the column onto the end, either the indexes need to match or you need to pass only the underlying values. In the example below, columns 3 and 5 are the index-matching and values-only versions, and column 4 is what you're running into now:
In [58]: df = pd.DataFrame(np.random.random((3,3)))
In [59]: df
Out[59]:
0 1 2
0 0.670812 0.500688 0.136661
1 0.185841 0.239175 0.542369
2 0.351280 0.451193 0.436108
In [61]: df2 = pd.DataFrame(np.random.random((3,1)))
In [62]: df2
Out[62]:
0
0 0.638216
1 0.477159
2 0.205981
In [64]: df[3] = df2
In [66]: df.index = ['a', 'b', 'c']
In [68]: df[4] = df2
In [70]: df[5] = df2.values
In [71]: df
Out[71]:
0 1 2 3 4 5
a 0.670812 0.500688 0.136661 0.638216 NaN 0.638216
b 0.185841 0.239175 0.542369 0.477159 NaN 0.477159
c 0.351280 0.451193 0.436108 0.205981 NaN 0.205981
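A third way to make "the indexes match", sketched here as an alternative to .values, is to relabel the new column with the target frame's index before assigning it. The frames below are small hypothetical stand-ins for x1GBaverage and x2_cut. Series.set_axis raises a ValueError if the lengths differ, and the relabeling step keeps the row matching explicit in the code.

```python
import pandas as pd

# Hypothetical stand-ins: x1 has a non-default index, x2 has the default 0..n-1 index
x1 = pd.DataFrame({'load': [1.0, 2.0, 3.0]}, index=['a', 'b', 'c'])
x2 = pd.DataFrame({'value': [0.1, 0.2, 0.3]})

# Take x2's single column as a Series and give it x1's index labels (row order is kept)
power = x2.iloc[:, 0].set_axis(x1.index)
x1['Power'] = power  # labels now match, so no NaN
print(x1)
```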
If the row counts differ, you'll need to use df.merge and let it know which columns it should be using to join the two frames.
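A minimal merge sketch for that case. The 'key' column and both frames are hypothetical; in practice you would merge on whatever column identifies matching rows (for example, a timestamp). With how='left', every row of the first frame is kept, and rows without a match get NaN in the new column.

```python
import pandas as pd

# Hypothetical frames with different row counts that share a 'key' column
left = pd.DataFrame({'key': [1, 2, 3, 4], 'x': [10, 20, 30, 40]})
right = pd.DataFrame({'key': [2, 4], 'power': [0.5, 0.7]})

# Keep all rows of left; match rows of right on 'key'
merged = left.merge(right, on='key', how='left')
print(merged)
```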