I want to create sub-dataframes from a main one.
My main dataframe will look more or less like this:
I want to be able to get sub-dataframes like the following:
First sub-dataframe:
Second sub-dataframe:
and all the rest, for example, in another frame. My goal is to correctly split my big dataframe into sub-dataframes.
Any help will be appreciated, thanks :-)
You can take a rectangular section with numeric indices like this:
df.iloc[4:8, 0:8] # four rows, eight columns
Or you can use loc with column and row labels (but your data seem to be numerically labeled).
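For example, here is a minimal sketch of carving one frame into sub-dataframes by row position; the shape of the sample data and the row ranges are assumptions, so adjust them to your actual layout:

import pandas as pd
import numpy as np

# hypothetical main dataframe: 12 rows x 8 columns of sample numbers
df = pd.DataFrame(np.arange(96).reshape(12, 8))

sub1 = df.iloc[0:4, :].copy()   # first sub-dataframe: rows 0-3, all columns
sub2 = df.iloc[4:8, :].copy()   # second sub-dataframe: rows 4-7, all columns
rest = df.iloc[8:, :].copy()    # everything else in another frame

# .copy() makes each piece independent of the original dataframe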
I have created a dataframe with pandas.
There are more than 1000 rows.
I want to merge the rows whose columns overlap.
For convenience, there are example screenshots made in Excel.
I want to produce that form in Python.
I want to turn the data above into the layout below.
This should be as simple as setting the index.
df = df.set_index('Symbol', append=True).swaplevel(0,1)
Output should be as desired.
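For illustration, here is a minimal, self-contained sketch of that line; the Symbol values and the other columns are made up, and only the set_index/swaplevel call comes from the answer above:

import pandas as pd

# hypothetical data with a repeating Symbol column
df = pd.DataFrame({
    'Symbol': ['AAA', 'AAA', 'BBB', 'BBB'],
    'Price': [10, 11, 20, 21],
    'Volume': [100, 150, 200, 250],
})

# move Symbol into the index (keeping the original row index) and
# make it the outer level, so rows sharing a Symbol sit together
df = df.set_index('Symbol', append=True).swaplevel(0, 1)
print(df)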
I have written the following code in three separate cells in my Jupyter notebook and have been able to generate the output I want. However, having this information in one dataframe will make it much easier to read.
How can I combine these separate dataframes into one so that the member_casual column is the index with max_ride_length, avg_ride_length and most_active_day_of_week columns next to it in the same dataframe?
Malo is correct. I will expand a little bit because you can also name the columns when they are aggregated:
df.groupby('member_casual').agg(max_ride_length=('ride_length','max'), avg_ride_length=('ride_length','mean'), most_active_day_of_the_week=('day_of_week',pd.Series.mode))
Per the docs (https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.aggregate.html), agg also accepts a list of functions, as in the example:
df.groupby('A').agg(['min', 'max'])
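Putting it together, here is a minimal sketch with made-up sample data; the column names follow the question, but the values are invented:

import pandas as pd

df = pd.DataFrame({
    'member_casual': ['member', 'member', 'member', 'casual', 'casual', 'casual'],
    'ride_length': [10, 30, 20, 5, 25, 25],
    'day_of_week': ['Mon', 'Mon', 'Tue', 'Tue', 'Tue', 'Wed'],
})

# one row per member_casual value, with named aggregate columns
summary = df.groupby('member_casual').agg(
    max_ride_length=('ride_length', 'max'),
    avg_ride_length=('ride_length', 'mean'),
    most_active_day_of_week=('day_of_week', pd.Series.mode),
)
print(summary)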
I'm attempting to write a script in pandas that will take the top 6 values from a column, transfer them to the bottom of a different column, and shift the values of the original column up. An example is below. Thanks!
The original table is the following:
And I need it to end up looking like this.
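Since the screenshots are not reproduced here, the following is only a hedged sketch of one reading of the task, assuming "top 6 values" means the first six rows of a column A and that a column B is the target; both column names are invented:

import pandas as pd

# hypothetical two-column frame
df = pd.DataFrame({'A': list(range(1, 13)), 'B': list(range(101, 113))})

n = 6
top = df['A'].iloc[:n]                                 # first n values of A
b_new = pd.concat([df['B'], top], ignore_index=True)   # appended to the bottom of B
a_new = df['A'].iloc[n:].reset_index(drop=True)        # remaining A values shifted up

# pandas aligns on the index, so A is padded with NaN to match B's new length
result = pd.DataFrame({'A': a_new, 'B': b_new})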
I think I might be overcomplicating this, but essentially what I am trying to do is take the dataframe below, group by the unique values in the "MATNR_BATCH" column, and create another dataframe with the columns "STORAGE_BIN", "FULL_IND", "PRCNT_UTIL", "MAX_NO_SU_IN_SB" and "NO_SU_IN_SB":
From something like this:
To something like this:
From here, what I would like to do is keep only the "groups" (MATNR_BATCH) that have a mix of "FULL" and "NF" values in the "FULL_IND" column. So basically, I would like to create a dataframe that only has the unique "MATNR_BATCH" groups that contain both "FULL" and "NF".
Can anyone please help me out with this? I have been struggling to come up with a way to do this in Python. Is groupby the right function to use, or should I try a different approach?
As a first pass do
df1 = df[(df.FULL_IND == 'FULL') | (df.FULL_IND == 'NF')]
And then carry on. I can't quite figure out what you want to do with the other columns.
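If the goal is to keep only the MATNR_BATCH groups that contain both values, one common pattern is groupby(...).filter(...); here is a minimal sketch with invented data, where only the column names come from the question:

import pandas as pd

df = pd.DataFrame({
    'MATNR_BATCH': ['A', 'A', 'B', 'B', 'C'],
    'STORAGE_BIN': ['01', '02', '03', '04', '05'],
    'FULL_IND': ['FULL', 'NF', 'FULL', 'FULL', 'NF'],
    'PRCNT_UTIL': [100, 40, 100, 100, 30],
})

# keep only the groups whose FULL_IND values include both 'FULL' and 'NF'
mixed = df.groupby('MATNR_BATCH').filter(
    lambda g: {'FULL', 'NF'}.issubset(set(g['FULL_IND']))
)
print(mixed)   # only batch 'A' survives with this sample data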
I don't know whether this is a very simple question, but I would like to create a conditional statement based on two other columns.
I have two columns, age and SES, and another, empty column (vitality class) that should be filled based on those two. For example, when a person is 65 years old and their socio-economic status is high, the third column gets, say, a value of 1. I have an idea of what I want to achieve, but no idea how to implement it in Python itself. I know I could use a for loop and I know how to write conditions, but because the value written to the empty column depends on two columns at once, I don't know how to express that in a function, nor how to write the result back into the same csv (into the respective empty column).
Use the pandas module to import the csv as a DataFrame object. Then you can use boolean conditions to fill the empty column:
import pandas as pd

df = pd.read_csv('path_to_file.csv')            # load the csv into a DataFrame
df.loc[(df['age'] == 65) & (df['SES'] == 'high'), 'vitality_class'] = 1   # fill only where both conditions hold
df.to_csv('path_to_new_file.csv', index=False)  # write the result to a new csv
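If more age/SES combinations need their own class, additional rules can be added the same way before the df.to_csv call; the cut-offs and labels below are assumptions, not from the question:

df.loc[(df['age'] >= 65) & (df['SES'] == 'high'), 'vitality_class'] = 1
df.loc[(df['age'] >= 65) & (df['SES'] == 'low'), 'vitality_class'] = 2
df.loc[(df['age'] < 65) & (df['SES'] == 'high'), 'vitality_class'] = 3
df.loc[(df['age'] < 65) & (df['SES'] == 'low'), 'vitality_class'] = 4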