Identifying consecutive declining values in a column from a data frame [closed] - python

I have a 278 x 2 data frame whose second column is GDP, and I want to find the rows that have 2 consecutive declining values in that column. (A snippet of the data accompanied the original post.)
I'm not sure how to approach this problem. I've searched for ways to identify consecutive declining values in a data frame, but so far I've only found questions about consecutive identical values, which isn't what I'm looking for. I could iterate over the data frame row by row, but I don't believe that's very efficient.
Also, I'm not asking anyone to show me how to code this. I'm simply asking for potential ways I could go about solving this problem on my own, because I'm unsure how to approach the issue.

Use shift to create a temporary column with all values shifted up one row.
Compare the two columns, "GDP" > "shift". This gives you a new column of Boolean values.
Look for consecutive True values in this Boolean column. That identifies two consecutive declining values.
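A minimal sketch of those steps, assuming a pandas DataFrame with a "GDP" column and made-up numbers (it compares "current < previous" directly, which is equivalent to materializing the shifted column):

import pandas as pd

# Made-up stand-in for the 278 x 2 frame; only the GDP column matters here.
df = pd.DataFrame({"GDP": [100, 98, 97, 99, 95, 94, 96]})

# True where this row's value is lower than the previous row's.
declining = df["GDP"] < df["GDP"].shift(1)

# Two consecutive Trues: this row declined and so did the one before it.
two_in_a_row = declining & declining.shift(1, fill_value=False)

print(df[two_in_a_row])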

Related

perform math operation between scalar and pandas dataframe [closed]

In existing code, I used the following to perform an operation on a dataframe column:
df.loc[:, ['test1']] = m/(df.loc[:, ['rh']]*d1)
Here, both m and d1 are scalars; 'test1' and 'rh' are the column names.
Is this the right way, or the best practice, to perform math operations on a dataframe?
Yes, what you have there is fine. If you're looking for ways to improve it, a couple of suggestions:
When accessing entire columns (as you're doing here), you can be more concise by skipping .loc and just writing df["test1"] and df["rh"].
You could alternatively use the .apply() method, which is useful in the more general case of performing an arbitrary operation (anything you can implement in a function, anyway) on a DataFrame column. Here it would look like df["test1"] = df["rh"].apply(lambda rh: m/(rh*d1)), though it is almost certainly not necessary for this simple case.

change individual linestyle when using pandas's plot [closed]

I have many dataframes (df) with varying numbers of columns; the first column is a date, and the rest are the data I'd like to plot. I used df.plot() to plot the lines automatically, and it is simple to use pandas' plot function directly. However, I'd like to change the linewidth of, for example, the first and 4th lines, or even only the first line. How do I do that in pandas? I know how to do it in matplotlib by looping over each column to plot each line, but what about just using pandas' plot function? Thanks
Maybe you can create a list of a fixed length (matching the number of columns in your DataFrame):
list_of_line_width = [1] * len(df.columns)
The rest is just changing the width of the lines you are interested in:
list_of_line_width[index_position] = desired_line_width
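The answer stops short of wiring that list into the plot; one way to apply it, sketched under the assumption that you adjust the matplotlib Axes that df.plot() returns (rather than through a pandas keyword):

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical frame standing in for the question's data columns.
df = pd.DataFrame({"a": [1, 2, 3], "b": [3, 2, 1], "c": [2, 3, 2]})

list_of_line_width = [1] * len(df.columns)
list_of_line_width[0] = 3  # thicken only the first line

# df.plot() returns a matplotlib Axes; set each line's width after plotting.
ax = df.plot()
for line, width in zip(ax.get_lines(), list_of_line_width):
    line.set_linewidth(width)

plt.show()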

In feature selection, I came across a situation where NaNs were filled with the median of the column values [closed]

Why is the median value used for NaN? Why not something else, like the mean? What is the logic behind using the median value?
The process you described is known as imputation. Whether it makes sense to impute missing values with mean or median depends entirely on the dataset and the context of your problem.
Usually, it does not hurt to impute missing values with the mean. However, if there are outliers in the dataset that adversely impact the mean, then it is probably a good idea to impute with the median, as the median is a statistic that is not influenced by the presence of outliers.
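A small illustration of that difference with made-up numbers, using pandas to impute:

import pandas as pd

# Toy column with one outlier and one missing value (made-up numbers).
s = pd.Series([1.0, 2.0, 3.0, 1000.0, None])

print(s.mean())    # 251.5 -- dragged upward by the outlier
print(s.median())  # 2.5   -- unaffected by it

# Impute the NaN with whichever statistic suits the data.
s_imputed = s.fillna(s.median())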

Converge multiple rows of a CSV into one [closed]

I have a CSV in which a student appears on multiple lines. The goal is to obtain a CSV where each student's name appears only once and a "Sports" column is created that collects all the sports practiced by the student, separated by a space. (The original post illustrated this with two images: the input csv and the final csv.)
I'm not going to post a full solution, as this sounds like a homework problem. If this is in fact for a school assignment, please edit your question to say so.
From your description, the problem can be broken into three steps, each of which can be independently written as code in your solution.
1. Parse the CSV file.
2. Create a new data structure that reduces the number of rows and adds a new column.
3. Output the data to a new CSV file.
Steps 1 and 3 are the simplest. You will want to use things like with open('file', 'r'), str.split(','), and ",".join().
For step 2, the problem is easier to understand if you think in terms of dictionaries. If you can turn your original data (which is a list of rows) into a dictionary of rows, then it becomes easier to detect duplicates. Dictionary keys must be unique, and you already know that you have a key (the student name) that you would like to be unique, but isn't.
Your code for step 2 will iterate over the list of rows, adding each one to a dictionary using student_name as the unique key. If that key already exists, then instead of adding a new entry, you will need to modify the existing entry's "sports" field.
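Sticking to step 2 only (steps 1 and 3 are left as an exercise above), a minimal sketch, where rows is an assumed list of (student_name, sport) pairs produced by your parsing step:

# Assumed shape of the parsed data, per the question's layout.
rows = [("Ana", "tennis"), ("Ben", "soccer"), ("Ana", "swimming")]

merged = {}
for student_name, sport in rows:
    if student_name in merged:
        merged[student_name] += " " + sport   # duplicate key: extend the sports field
    else:
        merged[student_name] = sport          # first occurrence: new entry

print(merged)  # {'Ana': 'tennis swimming', 'Ben': 'soccer'}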

Making a sorted list [closed]

I have been given a set of 20,000 entries in Excel. Each entry is a string, and they are all names of events such as: Daytona 500, NASCAR, 3x1 Brand Rep, etc.
Many of the event names are repeated. I would like to make a list, sort it, and find the most common items in the list and how many times each one appears. I am halfway through my first semester of Python and have just learned about lists; I would like to use Python 2.7 for this task, but I am also open to Excel or R if one of those makes more sense.
I'm not sure where to start or how to read such a large list into a program.
In Excel I would use a PivotTable (about 15 seconds to set up). In Python, start by reading the entries into a list:
your_list = ['Daytona 500', 'NASCAR'] # more values of course
Now use a dictionary comprehension to count the occurrences of each unique item:
your_dict = {i:your_list.count(i) for i in set(your_list)}
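Note that the comprehension rescans the list once per unique key; for 20,000 entries, the standard library's collections.Counter (available in Python 2.7) does the same count in one pass and can report the most frequent items directly:

from collections import Counter

your_list = ['Daytona 500', 'NASCAR', 'NASCAR']  # more values of course

counts = Counter(your_list)      # one pass over the whole list
print(counts.most_common(10))    # the ten most common events with their counts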
