Right way to reindex a dataframe? [duplicate] - python

This question already has answers here:
How to reset index in a pandas dataframe? [duplicate]
(3 answers)
Closed 1 year ago.
I have a large dataset which I filtered by location. The end result is something like this:
    column 1  column 2
0          a         1
106        b         2
178        c         3
I guessed that the index values are skipping all over the place because the rows with the same location aren't consecutive in the original data. To reset the index, I did df.reindex(index=np.arange(len(df))), and it ran... but broke everything else. The output is this:
   column 1  column 2
0         a         1
1       NaN       NaN
2       NaN       NaN
I have no idea why this is happening or how to fix it. Thanks for any help provided!

Use reset_index:
>>> df.reset_index(drop=True)
  column 1  column 2
0        a         1
1        b         2
2        c         3
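Note that reset_index returns a new DataFrame by default, so assign the result back (or pass inplace=True) to keep it. A minimal runnable sketch with made-up data mirroring the question:

import pandas as pd

# Hypothetical data with a non-consecutive index, as in the question
df = pd.DataFrame({"column 1": ["a", "b", "c"],
                   "column 2": [1, 2, 3]},
                  index=[0, 106, 178])

# reset_index returns a new object; drop=True discards the old index
# instead of adding it back as a column
df = df.reset_index(drop=True)
print(df)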

Related

Pandas Split String in colum into colums with 0/1 ; get dummies on all characters of a string [duplicate]

This question already has an answer here:
Quickest way to make a get_dummies type dataframe from a column with a multiple of strings
(1 answer)
Closed last year.
Very likely this question has been answered before, but I cannot find a good title to search for. I have this pandas data structure, based on a given Excel sheet:
  other columns  Code
0           ...   ABC
1           ...   CAB
2           ...     R
I want to get this:
  other columns  A  B  C  R
0           ...  1  1  1  0
1           ...  1  1  1  0
2           ...  0  0  0  1
Of course, I could iterate over each row and do this manually, but every idea in my head would either be slow or memory-consuming, or both.
What is the one line solution here?
You can use str.get_dummies with an empty separator to get all letters:
df['Code'].str.get_dummies(sep='')
Joining back to the original data:
df2 = df.drop('Code', axis=1).join(df['Code'].str.get_dummies(sep=''))
output:
  other columns  A  B  C  R
0           ...  1  1  1  0
1           ...  1  1  1  0
2           ...  0  0  0  1
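For reference, a self-contained sketch with made-up data matching the example (the 'other columns' content is just a placeholder); sep='' relies on pandas splitting the codes into individual characters:

import pandas as pd

# Hypothetical data mirroring the question
df = pd.DataFrame({"other columns": ["...", "...", "..."],
                   "Code": ["ABC", "CAB", "R"]})

# One 0/1 indicator column per distinct character appearing in 'Code'
dummies = df["Code"].str.get_dummies(sep="")

df2 = df.drop("Code", axis=1).join(dummies)
print(df2)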

How to use groupby to create repeating index for each group in a Dataframe? [duplicate]

This question already has answers here:
Add a sequential counter column on groups to a pandas dataframe
(4 answers)
Closed 1 year ago.
When using groupby(), how can I create a DataFrame with a new column containing an increasing counter within each group? For example, if I have
df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 2]})
df
   a
0  1
1  1
2  1
3  2
4  2
5  2
How can I get a DataFrame where the counter resets for each new group in the column? The association between a and the index is not important... I just need each row within a group of a to receive a counter starting from 1.
   a  idx
0  1    1
1  1    2
2  1    3
3  2    1
4  2    2
5  2    3
The answer from the comments:
df['idx'] = df.groupby('a').cumcount() + 1
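A minimal runnable sketch of the cumcount approach with the question's data:

import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 2]})

# cumcount numbers the rows within each group starting at 0, so add 1
df['idx'] = df.groupby('a').cumcount() + 1
print(df)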

pandas pivot table and aggregate [duplicate]

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 3 years ago.
So what I have is the following:
test_df = pd.DataFrame({"index":[1,2,3,1],"columns":[5,6,7,5],"values":[9,9,9,9]})
   index  columns  values
0      1        5       9
1      2        6       9
2      3        7       9
I would like the following: the 'index' column as my index, the 'columns' column as the columns, and the values aggregated into their respective cells, like this:
    5    6    7
1  18  NaN  NaN
2 NaN    9  NaN
3 NaN  NaN    9
thank you!!
EDIT: Sorry, I made a mistake. The value column is also categorical, and I need its individual values... so instead of 18 it should be something like [9: 2, 10: 0, 11: 0] (assuming the possible categorical values are 9, 10, 11).
What about?:
test_df.pivot_table(values='values', index='index', columns='columns', aggfunc='sum')
Also: this is really just about reading the manual here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html. I suspect you should take a closer look at the 'aggfunc' param.
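A runnable sketch of the suggested pivot; the second call (with aggfunc=list) is one possible way to keep the individual values mentioned in the edit, not something taken from the answer above:

import pandas as pd

test_df = pd.DataFrame({"index": [1, 2, 3, 1],
                        "columns": [5, 6, 7, 5],
                        "values": [9, 9, 9, 9]})

# Sum the values that land in the same (index, columns) cell
summed = test_df.pivot_table(values='values', index='index',
                             columns='columns', aggfunc='sum')
print(summed)

# Keep the individual values per cell instead of aggregating them
# (one way to address the EDIT; cells with no data become NaN)
listed = test_df.pivot_table(values='values', index='index',
                             columns='columns', aggfunc=list)
print(listed)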

Pandas - "Smart" Way of Summing Up Previous Values at Index 'n' [duplicate]

This question already has answers here:
Cumulative Sum Function on Pandas Data Frame
(2 answers)
Closed 4 years ago.
I'm fairly new to working with pandas, and I've been trying to create a new dataframe where the price value at index n is the sum of the values from indices 0 to n.
For example, if I have:
dataframe df:
   price
0      1
1      2
2      3
3      2
4      4
5      2
The resulting data frame should look like this:
dataframe df:
   price
0      1
1      3
2      6
3      8
4     12
5     14
I can think of a messy way of doing it using nested for loops, but I'm trying to shy away from costly methods and do things in a more "sophisticated" way. I can't really seem to think of a better method, though, and I know there has to be one. What is the smart way of getting this "sum" dataframe? Thank you.
I think what you're looking for is the cumulative sum, for which you can use df.cumsum:
df.cumsum()
Which returns:
   price
0      1
1      3
2      6
3      8
4     12
5     14
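A self-contained sketch with the question's data; note that cumsum returns a new object rather than modifying df in place:

import pandas as pd

df = pd.DataFrame({"price": [1, 2, 3, 2, 4, 2]})

# Running total down each column; assign the result to keep it
result = df.cumsum()
print(result)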

Displaying column names for each row in Pandas dataframe over specific condition [duplicate]

This question already has answers here:
How can I map the headers to columns in pandas?
(5 answers)
Closed 5 years ago.
I am working with a Python pandas DataFrame and trying to print, for each row in my dataset, the list of columns that are set; assume that each column can have a 0 or 1 value. E.g.:
id  A  B  C  D
0   1  1  1  1
1   0  1  0  1
2   1  1  0  0
3   1  0  0  0
Now, I need my output to be:
id  output
0   A,B,C,D
1   B,D
2   A,B
3   A
Please note that I need to prepare a generic function irrespective of column names or number.
You can do:
df = df.assign(output=df.dot(df.columns + ',').str.rstrip(','))
df[['output']]
id  output
0   A,B,C,D
1   B,D
2   A,B
3   A
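A self-contained sketch of the trick: multiplying 0/1 by a column name repeats it zero or one times, so the row-by-columns dot product concatenates the names of the columns that are 1. The comma handling and the 'id' index are assumptions made to match the requested output:

import pandas as pd

# Hypothetical data with 'id' as the index, as in the question
df = pd.DataFrame({"A": [1, 0, 1, 1],
                   "B": [1, 1, 1, 0],
                   "C": [1, 0, 0, 0],
                   "D": [1, 1, 0, 0]},
                  index=pd.Index([0, 1, 2, 3], name="id"))

# 1 * 'A,' == 'A,' and 0 * 'A,' == '', so the dot product concatenates
# the names of the 1-valued columns; strip the trailing comma afterwards
out = df.dot(df.columns + ',').str.rstrip(',')
print(out)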
