Hello, I have a dataset containing 4 columns:
x y z s
1 42.8 157.5 1
1 43.8 13.5 1
1 44.8 152 2
.
.
.
4 7528 157.5 2
4 45.8 13.5 3
8 72.8 152 3
I want to split my dataframe into separate CSV files by the "s" column, but I couldn't figure out a proper way of doing it.
The "s" column has an arbitrary number of labels. We don't know how many 1's or 2's the dataset has. The values go up to 30, but not every number appears in this dataset.
My desired output is:
df1
x y z s
1 42.8 157.5 1
.
1 43.8 13.5 1
df2
1 44.8 152 2
.
4 7528 157.5 2
df3
4 45.8 13.5 3
.
8 72.8 152 3
After I get this split I can easily write each part to a separate CSV file.
The problem I am having is that I don't know how many different "s" values there are, nor how many rows there are for each of them.
Thank you
Just groupby before writing to CSV to do this dynamically:
for i, x in df.groupby('s'):
    x.to_csv(f'df{i}.csv', index=False)
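As a sanity check, here's a minimal self-contained sketch; the sample values are hypothetical placeholders mirroring the question's table, and the file naming follows the desired df1/df2/df3 output:

import pandas as pd

# hypothetical sample shaped like the question's data
df = pd.DataFrame({
    'x': [1, 1, 1, 4, 4, 8],
    'y': [42.8, 43.8, 44.8, 7528, 45.8, 72.8],
    'z': [157.5, 13.5, 152, 157.5, 13.5, 152],
    's': [1, 1, 2, 2, 3, 3],
})

# groupby('s') yields one (label, sub-DataFrame) pair per distinct value of s,
# so this writes df1.csv, df2.csv, df3.csv, ... no matter how many labels exist
for label, group in df.groupby('s'):
    group.to_csv(f'df{label}.csv', index=False)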
I have a pandas data frame with racing results.
Place BibNum Time
0 1 2 5:50
1 2 4 8:09
2 3 7 10:27
3 4 3 11:12
4 5 1 12:13
...
34 1 5 2:03
35 2 9 4:35
36 3 7 5:36
What I would like to know is how I can get a count of how many times each BibNum showed up where the Place was 1, 2, 3, etc.
I know that I can do value_counts, but that counts how many times a value shows up in a single column. I also looked into numpy's where, but that uses a conditional like greater than or less than.
IIUC, this is what you need:
out = df.groupby(['Place','BibNum']).size()
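If you also want those counts as a table with one row per Place and one column per BibNum, a small follow-up sketch (same df as above) is to unstack the result; missing combinations become 0:

counts = df.groupby(['Place', 'BibNum']).size()

# pivot the BibNum level of the index into columns
table = counts.unstack(fill_value=0)
print(table)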
What I am trying to accomplish through Pandas is:
Let's say we have a Pandas DataFrame like this:
transaction_code
1 4373-36
2 3626-68
3 3626-68
4 3281-23
5 4721-44
...
101 6273-56
102 2836-78
103 1657-28
104 3281-23
105 5323-64
I want to create a new column called 'transaction_code_new_index' that contains indexes just like the existing one, but whenever a transaction_code is duplicated (e.g. the code 6273-75 might appear 3 times), I want all of those rows to share the same index (i.e. every row whose transaction_code matches 6273-75 gets the same index).
Example:
transaction_code transaction_code_new_index
1 4373-36 1
2 3626-68 2
3 3626-68 2 (because 3626-68 has already been indexed before)
4 3281-23 3
5 4721-44 4
...
101 6273-56 100
102 2836-78 101
103 1657-28 102
104 3281-23 3 (because 3281-23 has already been indexed before)
105 5323-64 103
Thanks.
You can take the min index of every group. Using transform will assign the results back to the respective rows.
df['new_index'] = df.groupby('transaction_code')['transaction_code'].transform(lambda x: x.index.min())
Output
transaction_code new_index
1 4373-36 1
2 3626-68 2
3 3626-68 2
4 3281-23 4
5 4721-44 5
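Note that this keeps the original index of each group's first row (hence 4 and 5 above rather than the 3 and 4 in the desired output). If you want compact sequential numbers in order of first appearance instead, a hedged alternative is pd.factorize (df.groupby('transaction_code', sort=False).ngroup() gives the same numbering):

# factorize numbers each distinct transaction_code 0, 1, 2, ... in order of first appearance;
# +1 makes the new index start at 1 as in the question's example
df['transaction_code_new_index'] = pd.factorize(df['transaction_code'])[0] + 1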
I have a pandas dataframe df with a column having continuous numerical data.
A
0 1.5
1 15.0
2 12.8
3 23.2
4 9.6
I want to replace the continuous variables with numerical value based on the following rules:
0-10=10
10-20=50
20-100=80
The final dataframe obtained should be like this:
A
0 10
1 50
2 50
3 80
4 10
I tried pandas.cut(df['A'], bins=[0,10,20,100], labels=[10,50,80]), but it returns a Categorical column. I need the output column to be numerical.
Adding to_numeric to your code:
pd.to_numeric(pd.cut(df['A'], bins=[0,10,20,100], labels=[10,50,80]))
Out[54]:
0 10
1 50
2 50
3 80
4 10
Name: A, dtype: int64
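An alternative sketch with the same result, assuming every value of A falls inside one of the bins (so cut produces no NaN), is to cast the Categorical directly, since the labels are already integers:

df['A'] = pd.cut(df['A'], bins=[0, 10, 20, 100], labels=[10, 50, 80]).astype(int)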
I have an ASCII file containing 2 columns as follows:
id value
1 15.1
1 12.1
1 13.5
2 12.4
2 12.5
3 10.1
3 10.2
3 10.5
4 15.1
4 11.2
4 11.5
4 11.7
5 12.5
5 12.2
I want to estimate the average of the "value" column for each id (i.e. group by id).
Is it possible to do that in Python using numpy or pandas?
If you don't know how to read the file, there are several methods you could use (see here), e.g. pd.read_csv().
Once you have read the file, you could try this, using the pandas functions pd.DataFrame.groupby and pd.Series.mean():
df.groupby('id').mean()
#if df['id'] is the index, try this:
#df.reset_index().groupby('id').mean()
Output:
value
id
1 13.566667
2 12.450000
3 10.266667
4 12.375000
5 12.350000
import pandas as pd
filename = "data.txt"
df = pd.read_fwf(filename)
df.groupby(['id']).mean()
Output
value
id
1 13.566667
2 12.450000
3 10.266667
4 12.375000
5 12.350000
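Since the question also asks about numpy, here's a hedged numpy-only sketch of the same per-id mean; it assumes the two columns have already been loaded into arrays (the values are copied from the question):

import numpy as np

ids = np.array([1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5])
values = np.array([15.1, 12.1, 13.5, 12.4, 12.5, 10.1, 10.2, 10.5,
                   15.1, 11.2, 11.5, 11.7, 12.5, 12.2])

# map each id to a 0-based group number, then sum and count per group
unique_ids, inverse = np.unique(ids, return_inverse=True)
means = np.bincount(inverse, weights=values) / np.bincount(inverse)

for uid, mean in zip(unique_ids, means):
    print(uid, mean)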
I have a quick question regarding sorting rows in a CSV file using Pandas. The CSV file I have contains data that looks like:
quarter week Value
5 1 200
3 2 100
2 1 50
2 2 125
4 2 175
2 3 195
3 1 10
5 2 190
I need to sort in the following way: sort by quarter and, within each quarter, by week. So the output should look like the following:
quarter week Value
2 1 50
2 2 125
2 3 195
3 1 10
3 2 100
4 2 175
5 1 200
5 2 190
My attempt:
df = df.sort('quarter', 'week')
But this does not produce the correct result. Any help/suggestions?
Thanks!
New answer, as of 14 March 2019
df.sort_values(by=["COLUMN"], ascending=False)
This returns a new sorted data frame, doesn't update the original one.
Note: You can change the ascending parameter according to your needs; if you don't pass it, it defaults to ascending=True.
Note: sort has been deprecated in favour of sort_values, which you should use in Pandas 0.17+.
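For this question's data, where both columns should drive the order, you pass both to sort_values; a small sketch using the column names from the question:

# sort by quarter first, then by week within each quarter
df = df.sort_values(by=['quarter', 'week'])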
Typing help(df.sort) gives:
sort(self, columns=None, column=None, axis=0, ascending=True, inplace=False) method of pandas.core.frame.DataFrame instance
Sort DataFrame either by labels (along either axis) or by the values in
column(s)
Parameters
----------
columns : object
Column name(s) in frame. Accepts a column name or a list or tuple
for a nested sort.
[...]
Examples
--------
>>> result = df.sort(['A', 'B'], ascending=[1, 0])
[...]
and so you pass the columns you want to sort as a list:
>>> df
quarter week Value
0 5 1 200
1 3 2 100
2 2 1 50
3 2 2 125
4 4 2 175
5 2 3 195
6 3 1 10
7 5 2 190
>>> df.sort(["quarter", "week"])
quarter week Value
2 2 1 50
3 2 2 125
5 2 3 195
6 3 1 10
1 3 2 100
4 4 2 175
0 5 1 200
7 5 2 190
"DataFrame object has no attribute sort" — in recent pandas versions sort has been removed entirely, so use sort_values as shown in the newer answer above.