Indexing pandas dataframe when column names are integers

I can't seem to subset data by integer column names using .loc.
import numpy as np
import pandas as pd

# 6x4 data set with column names x, y, 8, 9
df = pd.DataFrame(np.random.randint(0, 10, (6, 4)),
                  index=('a', 'b', 'c', '1', '2', '3'),
                  columns=['x', 'y', 8, 9])
df2 = df.loc[:, :'x']
df3 = df.loc[:, :'8']
df2 works, but df3 throws an error.

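The column label is the integer 8, not the string '8', so the slice bound :'8' is not found among the columns. You can do either:
df3 = df.loc[:, 8]
to get only column 8, or:
df3 = df.loc[:, df.columns[:list(df.columns).index(8) + 1]]
to get all columns up to and including column 8 (remove the + 1 to exclude column 8).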
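
For reference, here is a minimal runnable sketch of the same idea. It looks up the position of the integer label with Index.get_loc and then slices by position with .iloc, which is an alternative to the list(...).index(8) lookup above rather than the original answer's exact code. The names only_8, pos and up_to_8 are just illustrative.

```python
import numpy as np
import pandas as pd

# Same layout as in the question: two string labels and two integer labels.
df = pd.DataFrame(
    np.random.randint(0, 10, (6, 4)),
    index=["a", "b", "c", "1", "2", "3"],
    columns=["x", "y", 8, 9],
)

# Select a single column by its integer label (returns a Series).
only_8 = df.loc[:, 8]

# Find the position of label 8, then slice by position (inclusive of column 8).
pos = df.columns.get_loc(8)
up_to_8 = df.iloc[:, : pos + 1]

print(list(up_to_8.columns))
```

Slicing by position sidesteps the question of how label slices behave on a column index that mixes strings and integers.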

Related

Combining dataframes and adding values on common date index

I have many dataframes, each with a single column (same name in all) and a date-range index. I want to combine them into one dataframe, summing the values wherever dates overlap. Below is a simplified example:
range1 = pd.date_range('2021-10-01','2021-11-01')
range2 = pd.date_range('2021-11-01','2021-12-01')
df1 = pd.DataFrame(np.random.rand(len(range1),1), columns=['value'], index=range1)
df2 = pd.DataFrame(np.random.rand(len(range2),1), columns=['value'], index=range2)
Here '2021-11-01' appears in both df1 and df2 with different values.
I would like to obtain a single dataframe of 62 rows (32 + 31 - 1) in which the 2021-11-01 row contains the sum of its values in df1 and df2.

We can use pd.concat() on the two dataframes, then df.reset_index() to move the dates into a regular column with a new integer index, rename that column, and finally use df.groupby().sum().
df = pd.concat([df1, df2])  # 63 rows by 1 column; the column holds the values and the dates are the index
df = df.reset_index()  # moves the dates to a column named 'index' and creates a new integer index
df = df.rename(columns={'index': 'Date'})  # renames the column
df = df.groupby('Date').sum()  # 62 rows; the two 2021-11-01 values are summed
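
As a shorter variant of the steps above, the dates can stay in the index and be grouped by index level directly. This is a sketch, not the original answer's code; the name combined is illustrative, and the same call should extend to a longer list of dataframes passed to pd.concat.

```python
import numpy as np
import pandas as pd

range1 = pd.date_range("2021-10-01", "2021-11-01")
range2 = pd.date_range("2021-11-01", "2021-12-01")
df1 = pd.DataFrame(np.random.rand(len(range1), 1), columns=["value"], index=range1)
df2 = pd.DataFrame(np.random.rand(len(range2), 1), columns=["value"], index=range2)

# Stack the frames, then sum rows that share the same index value (date).
combined = pd.concat([df1, df2]).groupby(level=0).sum()

print(len(combined))
```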

Compare values in two dataframes for alerting

I have a dataframe like the one below:
import pandas as pd
# initialize list of lists
data = [[0, 2, 3], [0, 2, 2], [1, 1, 1]]
# create the pandas DataFrame
df1 = pd.DataFrame(data, columns=['10028', '1090', '1058'])
The catch is that the column names are dynamic: sometimes there are 3 columns, sometimes 5, and sometimes just 1.
I also have another dataframe that flags anomalies:
# initialize list of lists
data = [[0, 1, 1]]
# create the pandas DataFrame
df2 = pd.DataFrame(data, columns=['10028', '1090', '1058'])
If any column in df2 has the value 1, that indicates an anomaly and I have to raise an alert. The one exception: if 1090 is 1 in df2, I want to check the value of 1090 in df1, and if it is less than 4, do nothing.
As of now, I am doing it like this:
if df2.any(axis=1).any():
    print("alert")
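
One possible way to express that rule is sketched below. The question does not say which df1 row to check, so this assumes the 1090 alert is suppressed only when every value of 1090 in df1 is below 4. The names THRESHOLD, SPECIAL, flagged and alert_columns are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame([[0, 2, 3], [0, 2, 2], [1, 1, 1]],
                   columns=["10028", "1090", "1058"])
df2 = pd.DataFrame([[0, 1, 1]], columns=["10028", "1090", "1058"])

THRESHOLD = 4     # "less than 4" from the question
SPECIAL = "1090"  # the column with the extra rule

# One boolean per column: True if df2 flags an anomaly (a 1) in that column.
flagged = df2.eq(1).any(axis=0)

# Assumption: suppress the flag when all df1 values in that column are below the threshold.
if SPECIAL in flagged.index and flagged[SPECIAL] and (df1[SPECIAL] < THRESHOLD).all():
    flagged.loc[SPECIAL] = False

alert_columns = flagged[flagged].index.tolist()
if alert_columns:
    print("alert", alert_columns)
```

Because the check works on whatever columns df2 happens to have, it should not matter whether there are 1, 3, or 5 of them.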

Split pandas dataframe rows up to searched column value into new dataframes

I have a dataframe that contains multiple header rows (a combination of multiple csvs). Is there a way to split the dataframe back into individual dataframes without using .iloc? iloc works, but will be time consuming for my workflow.
data = {'A': [1,2,3,'A',4,5,6,'A',7,8,9],
'B': [9,8,7,'B',6,5,4,'B',3,2,1]}
df = pd.DataFrame(data, columns = ['A','B'])
## My current approach:
df1 = df.iloc[:3,]
df2 = df.iloc[4:7,]
df3 = df.iloc[8:,]
Is there a better way to split the data frame by searching for the values in the columns? i.e. something like df1,df2,df3 = df.split(df['A']=='A')

One can use eq to find the header rows, then group by their cumulative sum so that each block between headers gets its own key:
header_rows = df.eq(df.columns).all(axis=1)
dfs = {k: v for k, v in df[~header_rows].groupby(header_rows.cumsum())}
Then, for example, dfs[0] gives:
   A  B
0  1  9
1  2  8
2  3  7
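
Put together as a self-contained sketch, the approach above looks like this. The reset_index and infer_objects calls are optional additions, not part of the original answer: they renumber each piece from 0 and try to restore numeric dtypes, since the mixed columns start out as object.

```python
import pandas as pd

data = {"A": [1, 2, 3, "A", 4, 5, 6, "A", 7, 8, 9],
        "B": [9, 8, 7, "B", 6, 5, 4, "B", 3, 2, 1]}
df = pd.DataFrame(data)

# True for rows that repeat the column names (the embedded header rows).
header_rows = df.eq(df.columns).all(axis=1)

# The running count of header rows labels each block: 0, 1, 2, ...
block_id = header_rows.cumsum()

# Drop the header rows, split by block, and tidy up each piece.
dfs = [
    part.reset_index(drop=True).infer_objects()
    for _, part in df[~header_rows].groupby(block_id)
]

df1, df2, df3 = dfs
print(df2)
```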

Sum of columns from two data frames that contain float values

I have two data frames.
Both data frames have the same column names.
I want to sum the float values of the matching columns across the two data frames.
For that I can use
df3 = df1.add(df2)
However, my data frames also contain two string columns, and these strings get added (concatenated) too.
How can I write the code so that only the floats are added and not the strings?
The two sample data frames are as follows:
df1 = pd.DataFrame(dict(Team=['A','B','C','D'],Value=[1,2,3,4]),index=[0,1,2,3])
df2 = pd.DataFrame(dict(Team=['A','B','C','D'],Value=[3,1,2,4]),index=[0,1,2,3])
When I used df3 = df1.add(df2),
it also added the strings in column "Team", as follows:
  Team  Value
0   AA      4
1   BB      3
2   CC      5
3   DD      8
How can I write the code so that Value is added but Team is not?
Thanks,
Zep

Use the team names as indices instead of integer indices:
In [2]: df1 = pd.DataFrame(dict(Team=['A','B','C','D'],Value=[1,2,3,4])).set_index('Team')
...: df2 = pd.DataFrame(dict(Team=['A','B','C','D'],Value=[3,1,2,4])).set_index('Team')
In [3]: df1 + df2
Out[3]:
      Value
Team
A         4
B         3
C         5
D         8
In case you have multiple other columns, just sum the columns:
total = df1['Value'] + df2['Value']
If, in addition, you need a dataframe of the same shape as df1 and df2 with Value replaced by the sum, you can do
df3 = df1.copy()
df3['Value'] = total
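
If there are several numeric columns and you would rather not list them by name, one option is to select them by dtype. This is a sketch that assumes both dataframes have the same rows in the same order; numeric_cols is an illustrative name.

```python
import pandas as pd

df1 = pd.DataFrame({"Team": ["A", "B", "C", "D"], "Value": [1, 2, 3, 4]})
df2 = pd.DataFrame({"Team": ["A", "B", "C", "D"], "Value": [3, 1, 2, 4]})

# Pick out the numeric columns only; string columns such as Team are skipped.
numeric_cols = df1.select_dtypes(include="number").columns

# Start from a copy of df1 and overwrite just the numeric columns with the sums.
df3 = df1.copy()
df3[numeric_cols] = df1[numeric_cols] + df2[numeric_cols]

print(df3)
```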

How to compare two dataframes cell by cell?

I have two dataframes containing the results of corr() on different parts of a single source (a CSV). Now I want to compare all the values in the two dataframes to check whether they are equal, or whether they fall within a particular range of each other. The pseudocode would be something like:
df1['column1']['row1'] == df2['column1']['row1']
Is there a simple way of doing this in Pandas?

There are many ways to do that. One approach is:
df3 = df2[df1.ne(df2).any(axis=1)]
df3 lists all the rows of df2 in which at least one cell does not match df1.
FYI, ne here stands for "not equal".
Example:
# create df1
data = [['batman', 10], ['joker', 15], ['alfred', 14]]
df1 = pd.DataFrame(data, columns=['Name', 'Age'])
# create df2, which is slightly different from df1
data = [['batman', 10], ['joker', 6], ['alfred', 17]]
df2 = pd.DataFrame(data, columns=['Name', 'Age'])
# extract the rows with at least one unequal cell
df3 = df2[df1.ne(df2).any(axis=1)]
# print the resulting df3
df3
     Name  Age
1   joker    6  # the age is different in df1 and df2 for joker
2  alfred   17  # the age is different in df1 and df2 for alfred
Now, from the resulting dataframe, you can check the range requirements for your use case.
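
For the "within a particular range" part of the question, one option is to compare the absolute difference against a tolerance, cell by cell. The sketch below uses two small made-up correlation-like frames and a hypothetical tolerance of 0.05, and it assumes both frames share the same row and column labels.

```python
import pandas as pd

# Made-up stand-ins for two corr() results with identical labels.
df1 = pd.DataFrame({"a": [1.0, 0.5], "b": [0.5, 1.0]}, index=["a", "b"])
df2 = pd.DataFrame({"a": [1.0, 0.52], "b": [0.58, 1.0]}, index=["a", "b"])

tolerance = 0.05  # hypothetical threshold; choose one that suits your data

# Boolean frame: True where the two cells differ by at most the tolerance.
close = (df1 - df2).abs() <= tolerance

# A single yes/no answer for the whole comparison.
all_close = bool(close.all().all())

print(close)
```

For an exact comparison of the whole frame, df1.equals(df2) returns a single boolean.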
