Selecting the biggest value among data [closed] - python

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
I have the following data.
I'd like to select the biggest value in the 2nd column for each value in the 1st column.
For value 1 in the 1st column, the selected value should be 5.
The 1st column is a time (for example: 06:54:11).
I can use MATLAB, Python, Excel, or bash.

Using Python, you can load your file (assuming it's an Excel file) into a pandas DataFrame, group by the first column, and take the max of the second column:
import pandas as pd
df = pd.read_excel('your_data.xlsx')
output = df.groupby('column1')['column2'].max()
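As a self-contained sketch of the same idea, the snippet below builds a small made-up table in place of the Excel file. The column names `column1` and `column2` and all values are assumptions, since the original data is not shown. It also shows one way to recover the full rows that hold each group's maximum.

```python
import pandas as pd

# Made-up sample standing in for the spreadsheet; values are illustrative only.
df = pd.DataFrame({
    "column1": ["06:54:11", "06:54:11", "06:54:12", "06:54:12", "06:54:12"],
    "column2": [3, 5, 2, 7, 4],
})

# Largest column2 value for each distinct column1 value.
output = df.groupby("column1")["column2"].max()

# Full rows holding each group's maximum, in case other columns are needed too.
max_rows = df.loc[df.groupby("column1")["column2"].idxmax()]

print(output)
print(max_rows)
```

`idxmax` returns the row label of the first maximum in each group, so ties keep only one row per group.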

Using MATLAB, you can get the maximum with the built-in "max" function.
Try using [M,I] = max(data)
and replace data with your matrix name.
M returns the maximum of each column. In your case M(2) will be the maximum of the second column. With the index (I) you can grab the corresponding time out of the first column.
time = data(I(2),1)

Related

Replace the value in Pandas dataframe [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I would like to edit a Pandas dataframe, and you can obtain the dataset from here.
Sample_dataset
As you can see, each "area" has several "category" values, and each "category" has a different "price". I want to unify the "category" within each "area", using the value from the last (bottom) row of that "area". In other words, some values of "category" will change as follows.
Before:
area:A, category:1, price:500
After:
area:A, category:2, price:500
I know that it's possible to edit this dataframe by pivot table as follows. But in this case, I cannot unify and display the values of "category".
pd.pivot_table(df, values="price", index=["area",], aggfunc='sum')
I would appreciate it if you could suggest a way to unify the category values.
You can try this, although it may not be the best option.
After using the code you mentioned:
df_new = pd.pivot_table(df, values="price", index=["area",], aggfunc='sum')
I have created a function that finds the last category for each area (where df is the original DataFrame):
def find_category(cat, list_categories):
    # Append the category from the last row of the given area (uses the global df)
    list_categories.append(df[df['area'] == cat].iloc[-1].category)
Then a for loop looks up the last category for each area, and the results are added as a new category column. You can reorder the columns afterwards if you want:
list_categories = []
for area in df_new.index:
    find_category(area, list_categories)
df_new['category'] = list_categories
df_new = df_new[['category','price']]
The output would be:
      category  price
area
A            2    900
B            1    350
C            4    800
D            1    500
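A possible alternative, not part of the answer above, is to do both steps in one `groupby` with named aggregation. This is only a sketch: the rows below are made up (the linked dataset is not reproduced here) and are chosen so that areas A and B match the output shown.

```python
import pandas as pd

# Made-up rows, not the asker's dataset; A and B mirror the output above.
df = pd.DataFrame({
    "area": ["A", "A", "B"],
    "category": [1, 2, 1],
    "price": [500, 400, 350],
})

# Per area: take the last non-missing category in row order and sum the prices.
df_new = df.groupby("area").agg(
    category=("category", "last"),
    price=("price", "sum"),
)

print(df_new)
```

This avoids the explicit loop and the helper that reads a global `df`, at the cost of relying on the row order within each area.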

How do I get the sum of column from a csv within specified rows using dates in python? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
Date,hrs,Count,Status
2018-01-02,4,15,SFZ
2018-01-03,5,16,ACZ
2018-01-04,3,14,SFZ
2018-01-05,5,15,SFZ
2018-01-06,5,18,ACZ
This is a sample of the data I've been working with. The actual data is in the same format, with around 1000 entries for each date. I am taking start_date and end_date as inputs from the user. In this case, consider:
start_date:2018-01-02
end_date:2018-01-06
So I have to display the totals for hrs and Count within the selected date range in the output. I also want to do it using an @app.callback in Dash (plot.ly). Can someone help, please?
Use Series.between to build a boolean mask for the date range, select the matching rows and the two columns with DataFrame.loc, and then sum:
totals = df.loc[df['Date'].between('2018-01-02','2018-01-06'), ['hrs','Count']].sum()
print(totals)
hrs      22
Count    78
dtype: int64
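To take the range from user input, the same filter can be wrapped in a small function. The sketch below is an assumption-laden illustration: `totals_in_range` is a hypothetical helper name, the CSV text is the sample from the question, and the dates are parsed into timestamps so the comparison does not depend on string ordering. A Dash callback could call a helper like this with the two selected dates.

```python
import io

import pandas as pd

# Sample rows copied from the question.
csv_text = """Date,hrs,Count,Status
2018-01-02,4,15,SFZ
2018-01-03,5,16,ACZ
2018-01-04,3,14,SFZ
2018-01-05,5,15,SFZ
2018-01-06,5,18,ACZ
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])


def totals_in_range(frame, start_date, end_date):
    """Sum hrs and Count for rows whose Date lies in [start_date, end_date]."""
    mask = frame["Date"].between(pd.Timestamp(start_date), pd.Timestamp(end_date))
    return frame.loc[mask, ["hrs", "Count"]].sum()


totals = totals_in_range(df, "2018-01-02", "2018-01-06")
print(totals)
```

`between` is inclusive on both ends by default, so rows dated exactly on the start or end date are counted.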

How to get cell value from pandas data frame [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
So currently, I ask the user for the column name input and the row input but I don't know how to get the cell value.
import pandas as pd
data_column = input("what column do you want to choose")
print(data_column)
data_row = input("What row do you want to choose")
print(data_row)
I have tried with iloc and loc but it doesn't return the cell value.
You should be able to get the value of a specific cell by position using
data_table.iloc[data_row, data_column]
Remember that
input("x")
returns a string, while iloc expects integer positions, so cast both inputs to int before using them:
data_row = int(input("What row do you want to choose"))
data_column = int(input("what column do you want to choose"))
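Since the question mentions asking for a column name, here is a minimal sketch that accepts either a column name or a column position. The table contents and the helper name `get_cell` are made up for illustration, and the helper takes the raw strings as arguments so it can be tried without typing input.

```python
import pandas as pd

# Made-up table; the question does not show the real data.
data_table = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 45]})


def get_cell(table, row_text, column_text):
    """Return one cell given the raw strings a user would type."""
    row = int(row_text)  # input() returns str; iloc needs an integer position
    if column_text in table.columns:
        # Column given by name: select the column, then the row by position.
        return table[column_text].iloc[row]
    # Otherwise treat the column text as an integer position.
    return table.iloc[row, int(column_text)]


print(get_cell(data_table, "1", "age"))
print(get_cell(data_table, "0", "0"))
```

In an interactive script the two strings would come from `input(...)` exactly as in the question.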

Python method to display dataframe rows with least common column string [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I have a dataframe with 3 columns (department, sales, region), and I want to write a method to display all rows that are from the least common region. Then I need to write another method to count the frequency of the departments that are represented in the least common region. No idea how to do this.
Custom functions would be unnecessary here: pandas already has the pieces you need. Suppose I had the following csv file, test.csv...
department,sales,region
sales,26,midwest
finance,45,midwest
tech,69,west
finance,43,east
hr,20,east
sales,34,east
If I'm understanding you correctly, I would obtain a DataFrame representing the least common region like so:
import pandas as pd
df = pd.read_csv('test.csv')
counts = df['region'].value_counts()
least_common = counts[counts == counts.min()].index[0]
least_common_df = df.loc[df['region'] == least_common]
least_common_df is now:
  department  sales region
2       tech     69   west
As for obtaining the department frequency for the least common region, I'll leave that up to you. (I've already shown you how to get the frequency for region.)
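For readers who want to check their own attempt, one possible way to finish the exercise is sketched below. It reads the same sample rows from a string instead of test.csv and uses `idxmin` as a shorter way to pick the least common region; both choices are substitutions made for this sketch, not part of the answer above.

```python
import io

import pandas as pd

# Same rows as test.csv above, read from a string so the sketch is self-contained.
csv_text = """department,sales,region
sales,26,midwest
finance,45,midwest
tech,69,west
finance,43,east
hr,20,east
sales,34,east
"""

df = pd.read_csv(io.StringIO(csv_text))

# idxmin returns the region label with the smallest count (the first one on ties).
least_common = df["region"].value_counts().idxmin()
least_common_df = df[df["region"] == least_common]

# Frequency of each department within that region.
department_counts = least_common_df["department"].value_counts()

print(least_common)
print(department_counts)
```

If several regions tie for the lowest count, only one of them is selected here; filtering with `counts == counts.min()` and `isin` would keep all of them.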

Calculating mean of each row, ignoring 0 values in python [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I have a data frame with 1000 rows and 10 columns.
3 of these columns are 'total_2013', 'total_2014' and 'total_2015'
I would like to create a new column, containing the average of total over these 3 years for each row, but ignoring any 0 values.
If you are using pandas:
Use DataFrame.mean with its skipna parameter.
First replace 0 with NaN. replace returns a new DataFrame rather than modifying df, so keep the result:
import numpy as np
columns = ['total_2013', 'total_2014', 'total_2015']
nonzero = df[columns].replace(0, np.nan)
Then compute the mean:
df["total"] = nonzero.mean(
    axis=1,  # mean across the columns of each row
    skipna=True  # skip NaN values (the default)
)
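A small runnable sketch of this approach follows. The three rows are made up for illustration, and the zeros are replaced with NaN only in a temporary copy so the original columns keep their values.

```python
import numpy as np
import pandas as pd

# Made-up data: one row with a single zero, one all-zero row, one row without zeros.
df = pd.DataFrame({
    "total_2013": [10, 0, 3],
    "total_2014": [0, 0, 6],
    "total_2015": [20, 0, 9],
})
columns = ["total_2013", "total_2014", "total_2015"]

# Zeros become NaN in a temporary copy; mean(axis=1) then skips them per row.
df["total"] = df[columns].replace(0, np.nan).mean(axis=1, skipna=True)

print(df)
```

A row whose three totals are all 0 has nothing left to average, so its result is NaN rather than 0.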
