Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 years ago.
I have many DataFrames (df), each with a varying number of columns. The first column is the date, and the remaining columns are the data I would like to plot. I use df.plot() to plot the lines automatically, which is simple. However, I would like to change the line width of, for example, the first and fourth lines, or only the first line. How can I do this in pandas? I know how to do it in matplotlib by looping over the columns and plotting each line separately, but can it be done using just pandas' plot function? Thanks.
Maybe you can create a list with a fixed length (depending on the number of columns in your DataFrame):
list_of_line_width = [1] * len(df.columns)
The rest is just changing the width of the lines you are interested in:
list_of_line_width[index_position] = width_of_the_line
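One way to apply such a list is to let pandas draw the plot first and then set the width of each line on the returned matplotlib Axes. This is only a sketch: the DataFrame, its column names ("date", "a" to "d") and the chosen widths are made up for illustration, and the list here has one entry per data column (the date column is excluded, since it is used as the x axis).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import pandas as pd

# Hypothetical example data: a date column followed by four data columns.
df = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=5),
    "a": [1, 2, 3, 4, 5],
    "b": [2, 3, 4, 5, 6],
    "c": [3, 4, 5, 6, 7],
    "d": [4, 5, 6, 7, 8],
})

# One width per plotted line (every column except the date column).
list_of_line_width = [1] * (len(df.columns) - 1)
list_of_line_width[0] = 4  # first line
list_of_line_width[3] = 4  # fourth line

# df.plot returns a matplotlib Axes; its lines are in column order.
ax = df.plot(x="date")
for line, width in zip(ax.get_lines(), list_of_line_width):
    line.set_linewidth(width)

ax.legend()  # rebuild the legend so it reflects the new widths
```

The plotting itself is still done by pandas; only the styling step touches the matplotlib line objects.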
Closed 2 years ago.
I have a 278 x 2 data frame, and I want to find the rows that have 2 consecutive declining values in the second column. Here's a snippet:
I'm not sure how to approach this problem. I've searched how to identify consecutive declining values in a data frame, but so far I've only found questions that pertain to consecutive SAME values, which isn't what I'm looking for. I could iterate over the data frame, but I don't believe that's very efficient.
Also, I'm not asking for someone to show me how to code this problem. I'm simply asking for potential ways I could go about solving this problem on my own because I'm unsure of how to approach the issue.
Use shift to create a temporary column with all values shifted up one row.
Compare the two columns, "GDP" > "shift". This gives you a new column of Boolean values.
Look for consecutive True values in this Boolean column. That identifies two consecutive declining values.
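If it helps to see the three steps together, here is a minimal sketch. The GDP numbers are invented, and the column name "GDP" is taken from the comparison above; everything else (variable names, the sample values) is an assumption.

```python
import pandas as pd

# Hypothetical data standing in for the second column of the real frame.
df = pd.DataFrame({"GDP": [10, 9, 8, 9, 10, 9, 8]})

# Step 1: shift the values up one row, so each row sees the next row's value.
next_value = df["GDP"].shift(-1)

# Step 2: True where the value declines going into the next row.
declines = df["GDP"] > next_value

# Step 3: True where this decline is immediately followed by another decline.
two_declines = declines & declines.shift(-1, fill_value=False)

# Index labels of the rows where a run of two consecutive declines starts.
start_rows = list(df.index[two_declines])
print(start_rows)
```

With this sample data the runs start at rows 0 and 4 (10, 9, 8 in both cases).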
Closed 3 years ago.
I have to do these with Python, Matplotlib, and Pandas.
read a CSV file separated with "," and decimals
count ALL the lines of the file
plot a bar graph with the values of the column year of the same file
find the expected value of ALL the values of a column
find the quartiles (with Python and its libraries).
find the proper sample size.
What I am asking is: what are the best methods/functions to do all these things?
The only thing I have managed to write so far is this:
pd.read_csv('pandas_tutorial_read.csv', delimiter=';')
Here is a problem very similar to what I have to do.
https://www.dropbox.com/sh/sy7vqq2x2740u9d/AACFap-NPA04znDMNX5W9wdza?dl=0
Thank you!
To read in a csv, you can use this code. Delimiter is not required if input file is comma separated.
df = pd.read_csv('path')
To count all rows, use shape attribute of df.
rows = df.shape[0]
To plot a bar graph use this.
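import matplotlib.pyplot as plt
# col1: name of the column for the x axis, col2: name of the column with the bar heights
plt.bar(df[col1], df[col2])
plt.show()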
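If by "expected value" you mean the mean of a column, use df[col].mean(). If you mean filling in missing values, look at scikit-learn's SimpleImputer (formerly Imputer); you can find the documentation online.
Quartiles can be computed like this.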
df[col].quantile([0,0.25,0.5,0.75])
I didn't understand what you mean by "sample size".
There are tonnes of documentation and tutorials out there. All the best!
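Putting the pieces above together, a rough sketch could look like this. The CSV content and the column names "year" and "value" are invented so that the example runs on its own; with a real file you would pass its path to pd.read_csv instead of the StringIO object.

```python
import io
import pandas as pd

# Hypothetical comma-separated data with decimal values.
csv_text = "year,value\n2015,1.5\n2015,2.5\n2016,3.0\n2017,5.0\n"
df = pd.read_csv(io.StringIO(csv_text))

n_rows = df.shape[0]                                  # number of data rows
mean_value = df["value"].mean()                       # sample mean as an estimate of the expected value
quartiles = df["value"].quantile([0.25, 0.5, 0.75])   # first, second and third quartile
year_counts = df["year"].value_counts().sort_index()  # rows per year, e.g. for a bar chart

print(n_rows, mean_value)
print(quartiles)
print(year_counts)
```

year_counts.plot(kind="bar") would then draw the bar chart of the year column, assuming matplotlib is installed.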
Closed 4 years ago.
I have a large data file where each row looks as follows, where each pipe-delimited value represents a consistent variable (i.e. 1517892812 and 1517892086 represent the Unix Timestamp, and the last pipe delimited object will always be UnixTimestamp)
264|2|8|6|1.32235000|1.33070000|1.31400000|1257.89480966|1517892812
399|10|36|2|1.12329614|1.12659227|1.12000000|148194.47200218|1517892086
How can I pull out the values I need to make variables in Python? For example, looking at a row and getting UnixTimestamp=1517892812 (and other variables) out of it.
I want to pull out each relevant variable per line, work with them, and then look at the next line and reevaluate all of the variable values.
Is RegEx what I should be dealing with here?
No need for regex, you can use split():
int(a.strip().split('|')[-1])
If all the variables are numbers and you want a matrix with all your values, you can simply do something like:
[[float(value) for value in line.strip().split('|')] for line in your_data.splitlines()]
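To work with one row at a time, the split approach can be wrapped in a small function. This is just a sketch: parse_line is a hypothetical helper, and since only the last field is known to be the Unix timestamp, the other fields are kept as a plain list of floats.

```python
from datetime import datetime, timezone

# The two sample rows from the question.
your_data = """264|2|8|6|1.32235000|1.33070000|1.31400000|1257.89480966|1517892812
399|10|36|2|1.12329614|1.12659227|1.12000000|148194.47200218|1517892086"""

def parse_line(line):
    """Split one pipe-delimited row into (other values, unix timestamp)."""
    *fields, unix_timestamp = line.strip().split("|")
    return [float(field) for field in fields], int(unix_timestamp)

rows = [parse_line(line) for line in your_data.splitlines()]

for values, unix_timestamp in rows:
    # The variables are re-bound on every iteration, one row at a time.
    when = datetime.fromtimestamp(unix_timestamp, tz=timezone.utc)
    print(unix_timestamp, when.isoformat(), values[:4])
```

For a large file you would iterate over the open file object instead of calling splitlines() on a string, so that only one line is held in memory at a time.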
You can also use a regex with re.search():
import re
int(re.search(r'[^|]+$', text).group())
Closed 4 years ago.
I'm new to Python and I have the following problem.
I have to represent the results of a survey, with the following values: Positive or Negative.
However, I worked with a sample, so I also have to represent the confidence interval of the results.
My idea is a stacked bar chart (in percentages, where the entire bar is 100%) divided into Positive and Negative, with some representation of the confidence interval. Does anyone have working code for this?
For data visualisation using Python I would suggest using the matplotlib library.
For your barchart suggestion maybe have a look at the following barchart example taken from the matplotlib website.
Barchart Example
As a suggestion, if you are looking at showing a comparison of total Positive and Negative values perhaps you should consider using a pie chart. See the following example taken from the matplotlib website again.
Basic Pie Chart Example
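For the stacked bar idea from the question, a minimal matplotlib sketch might look like the following. The counts are invented, and the interval uses the normal approximation for a proportion at roughly 95% confidence, which is an assumption on my part; use whatever interval fits your survey design.

```python
import math
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt

# Hypothetical survey counts.
n_positive, n_negative = 130, 70
n = n_positive + n_negative
p = n_positive / n

# Normal-approximation half-width of a ~95% confidence interval for p (assumed method).
half_width = 1.96 * math.sqrt(p * (1 - p) / n)

fig, ax = plt.subplots()
# Positive share at the bottom, with the error bar drawn at the Positive/Negative boundary.
ax.bar([0], [100 * p], yerr=[100 * half_width], capsize=8, label="Positive")
# Negative share stacked on top, so the whole bar adds up to 100%.
ax.bar([0], [100 * (1 - p)], bottom=[100 * p], label="Negative")
ax.set_xticks([0])
ax.set_xticklabels(["Survey"])
ax.set_ylabel("Share of responses (%)")
ax.set_ylim(0, 100)
ax.legend()
```

The error bar shows how far the boundary between the two segments could move within the interval.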
Closed 7 years ago.
I have a list of CSV files, each containing several columns.
One column contains the length of my test in the format hh:mm:ss.
I need to split this data into two databases based on length: <00:16:00 or >00:16:00
How can I do that?
Thanks for helping and sorry for my bad English.
Brute force:
value = "00:15:47"  # taken from csv
# Plain string comparison works because hh:mm:ss is zero-padded and fixed-width.
if value < "00:16:00":
    pass  # handle smaller values
else:
    pass  # handle bigger values
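Applied to a whole file, the same comparison could be used like this. It is a sketch with made-up data: the column names "test" and "length" are assumptions, and the CSV text is inlined so the example runs without a file. A length of exactly 00:16:00 ends up in the second group here.

```python
import csv
import io

# Hypothetical CSV content; with a real file, use open(path, newline="") instead.
csv_text = "test,length\nA,00:15:47\nB,00:16:30\nC,00:03:10\nD,01:00:00\n"

short_rows, long_rows = [], []
for row in csv.DictReader(io.StringIO(csv_text)):
    # Zero-padded hh:mm:ss strings sort in the same order as the durations.
    if row["length"] < "00:16:00":
        short_rows.append(row)
    else:
        long_rows.append(row)

print([row["test"] for row in short_rows])
print([row["test"] for row in long_rows])
```

This only holds while every value is zero-padded; a value like "0:15:47" would need to be parsed into a duration first.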