Plotting multiple graphs from one DataFrame with a single x-axis - Python
I have tried every solution I could find, but none of them produces plots that I find legible. The solution should also work for potentially hundreds of DataFrame columns, so something loop-based would be preferred.
My DataFrame looks roughly like this:

Time  Pressure  Static Temperature  Stag Temperature
0     100       50                  75
10    105       55                  77
20    110       59                  81
30    106       57                  79
What I would like is 3 separate graphs that plot Pressure, Static Temperature, and Stag Temperature against Time, with Time on the x-axis.
My current code looks like this:
import pandas
data = pandas.read_csv('data.csv')
for header in data:  # iterating over a DataFrame yields its column names
    data.plot(x='Time', y=header)
I think the problem is that data.plot needs y="something in quotes", but I thought that since header is already a string, it should work.
Any solution to get multiple graphs would be absolutely wonderful!
Also I apologize if my formatting is messed up as this is my first time posting!
I think you're looking for this:
>>> data.plot(x="Time")
However, to make this work I had to reformat your data.csv file to replace the whitespace with commas, since the comma is the default separator in a comma-separated values file. If your original file is tab-separated, you need to pass sep='\t' to the read_csv() call.
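A minimal sketch of both points, assuming the file really is tab-separated (an in-memory string stands in for data.csv here). Note that data.plot(x="Time") on its own draws every column on a single set of axes; pandas' subplots=True option gives each column its own axes in one figure with a shared Time x-axis, which is closer to the three separate graphs the question asks for.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import pandas as pd

# Hypothetical tab-separated sample mirroring the table in the question.
raw = (
    "Time\tPressure\tStatic Temperature\tStag Temperature\n"
    "0\t100\t50\t75\n"
    "10\t105\t55\t77\n"
    "20\t110\t59\t81\n"
    "30\t106\t57\t79\n"
)

# sep="\t" keeps "Static Temperature" and "Stag Temperature" as single column names.
data = pd.read_csv(io.StringIO(raw), sep="\t")

# subplots=True draws each non-x column on its own Axes; Time is the shared x-axis.
axes = data.plot(x="Time", subplots=True, legend=False)

# Title each panel with its column name so many panels stay readable.
for ax, col in zip(axes, data.columns.drop("Time")):
    ax.set_title(col)
```

With hundreds of columns, the figsize argument of DataFrame.plot would likely also need enlarging for the panels to stay legible.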
If anyone finds this in the future, I figured out my own problem!
The problem was that an error was thrown every time, and all it said was
KeyError: 'Time'
This happened because 'Time' was my x-axis, but iterating over data also yields 'Time' as one of the headers, so it was passed as y as well. As a result, the loop failed on the very first iteration every time.
To fix this, all I had to do was add a statement that skips the column used as the x-axis:
import pandas
data = pandas.read_csv(r'data.csv')
for header in data:
    if header != "Time":  # skip the x-axis column itself
        data.plot(x='Time', y=header, legend=False)
This skipped the first column and allowed the rest of the headers to be plotted in separate graphs.
If iterating over headers confuses you (as it confused me at first), you can loop over list(data) instead, which returns the same column names:
import pandas
data = pandas.read_csv(r'data.csv')
for i in list(data):
    if i != "Time":
        data.plot(x='Time', y=i, legend=False)
Good luck everyone!
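With potentially hundreds of columns, as the question mentions, one window per column gets unwieldy, and matplotlib warns once more than 20 figures are open at the same time. The sketch below is one variation on the loop above, not the poster's code: it iterates over data.columns.drop("Time") so no if is needed, writes each plot to a PNG file, and closes the figure afterwards. The sample DataFrame and the temporary output directory are assumptions for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend: figures are written to disk, not shown
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data standing in for pandas.read_csv(r'data.csv').
data = pd.DataFrame({
    "Time": [0, 10, 20, 30],
    "Pressure": [100, 105, 110, 106],
    "Static Temperature": [50, 55, 59, 57],
    "Stag Temperature": [75, 77, 81, 79],
})

out_dir = tempfile.mkdtemp()  # assumption: any writable directory works here
saved = []

# columns.drop("Time") removes the x column up front instead of testing each header.
for header in data.columns.drop("Time"):
    fig, ax = plt.subplots()
    data.plot(x="Time", y=header, legend=False, ax=ax)
    ax.set_ylabel(header)
    path = os.path.join(out_dir, f"{header}.png")
    fig.savefig(path)
    plt.close(fig)  # release the figure; matters when there are hundreds of columns
    saved.append(path)
```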
Related
Resample().mean() in Python/Pandas and adding the results to my dataframe when the starting point is missing
I'm pretty new to coding and have a problem resampling my DataFrame with pandas. I need to resample my data ("value") to 10-minute means (13:30, 13:40, etc.). The problem is that the data start around 13:36, and I can't fix this by hand because I need to do it for 143 DataFrames. Resampling puts each mean at the corresponding index (e.g. 13:40 for the second value), but because 13:30 is not one of my index values, that mean gets lost.

I've tried two approaches. First, I tried every option of resample() (offset, origin, convention, ...). Then I tried adding the missing values manually with a loop, which doesn't run properly because I didn't know how to access the correct position in the list. The list does contain all the relevant values, though. I also tried adding a row with 13:30 as the index at the top of the DataFrame, but couldn't get pandas to accept it as a valid timestamp index (this is not in the code).

Sorry for the very rough code; it fails in several places, which is why I'm asking here. If you have a possible solution, please keep in mind that it has to run inside an already long loop, because I have to process many DataFrames. Thank you very much!

df["tenminavg"] = df["value"].resample("10Min").mean()
df["tenminavg"] = df["tenminavg"].ffill()
ls1 = df["value"].resample("10Min").mean()  # my alternative: list the resampled values in order to eventually access the first relevant timespan
for i in df.index:  # this loop doesn't work. It should add the value for the first 10 min
    if df["tenminavg"][i] == "nan":
        if datetime.time(13, 30) <= df.index.time < datetime.time(13, 40):
            df["tenminavg"][i] = ls1.index.loc[i.floor("10Min")]["value"]  # tried to access the corresponding data point in the list
    else:
        continue
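One way around the missing 13:30 value, sketched here rather than taken from the thread: assigning the resample() result back to df aligns on the index, so only rows that sit exactly on 13:40, 13:50, ... receive a mean, and ffill() has nothing to carry into the rows before the first of them. Flooring every timestamp to its 10-minute bin and grouping by that label gives every row the mean of its own bin, including the 13:30 bin. The sample data below is hypothetical, and df is assumed to have a DatetimeIndex as in the question.

```python
import pandas as pd

# Hypothetical data: readings start at 13:36, not on a 10-minute boundary.
idx = pd.to_datetime([
    "2021-06-01 13:36", "2021-06-01 13:38",
    "2021-06-01 13:41", "2021-06-01 13:45",
    "2021-06-01 13:52",
])
df = pd.DataFrame({"value": [1.0, 3.0, 5.0, 7.0, 9.0]}, index=idx)

# Bin label for each row: 13:36 -> 13:30, 13:41 -> 13:40, and so on.
bins = df.index.floor("10min")

# transform("mean") returns one value per original row (the mean of its bin),
# so rows before 13:40 receive the 13:30 mean instead of NaN.
df["tenminavg"] = df["value"].groupby(bins).transform("mean")

# Equivalent route that keeps the question's resample() call:
# look up each row's bin label in the resampled Series.
means = df["value"].resample("10min").mean()
df["from_resample"] = means.reindex(bins).to_numpy()
```

Because nothing here is tied to a particular start time, the same lines should drop into the existing loop over the DataFrames unchanged.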
Swapping dataframe column data without changing the index for the table
While compiling a pandas table to plot certain activity on a tool, I have encountered a rare error in the data that creates 2 extra columns for certain entries. This means that one of my computed columns lands 2 cells further along in the table than in the other rows, which breaks the plot. I was hoping to find a way to take the contents of a single cell in a row and swap it into the cell beside it, which holds irrelevant information in the error case but is used for the plot in all the other rows. I've tried a couple of different ways to swap the data around but keep hitting errors. My attempts include:

for rows in df['server']:
    if '%USERID' in line:
        df['server'] = df[7]  # both versions of this and below
        df['server'].replace(df['server'], df[7])
    else:
        pass

if '%USERID' in df['server']:  # Attempt to fix missing server name
    df['server'] = df[7];
else:
    pass

if '%USERID' in df['server']:
    return row['7'], row['server']
else:
    pass

I'd like the data from column '7' to be copied into 'server' only in the error case, i.e. where the cell contains a string starting with '%USERID'.
Turns out I was overthinking this one. I took a step back, reworked the code a bit, and solved it. Rather than trying to force one-size-fits-all code onto all the data, I wrote a nested loop that built separate lists for the general data and for the 2 exceptions I found, and created 3 DataFrames from them. These were easy enough to manipulate individually and finally concatenate together. All working fine now.
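For comparison, a vectorised alternative to splitting and re-concatenating, sketched under assumptions: the attempts in the question fail partly because '%USERID' in df['server'] tests the Series' index labels, not its values. A boolean mask built with .str.startswith() selects only the error rows, and .loc copies column '7' into 'server' for exactly those rows. The sample values and the string column label '7' are hypothetical (the question uses both 7 and '7').

```python
import pandas as pd

# Hypothetical sample: the second row is an "error" row whose real server
# name ended up in column '7'. Whether that column is labelled '7' or 7
# depends on how the table was built; the string '7' is an assumption here.
df = pd.DataFrame({
    "server": ["srv01", "%USERID=alice", "srv03"],
    "7": ["n/a", "srv02", "n/a"],
})

# Boolean mask of the error rows only; astype(str) guards against non-string cells.
mask = df["server"].astype(str).str.startswith("%USERID")

# Copy column '7' into 'server' for those rows; other rows are left untouched.
df.loc[mask, "server"] = df.loc[mask, "7"]
```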
Plotting Columns without using the column names
OK, so I have aggregated a bunch of data that looks like this:

    X-mean   y-Mean    z-Mean
1   0.3444   2.34987   1.347
2   etc.
3
4
5
6

Except that it is not three columns but 561 of them :-) So it seems like a simple problem to me. I know how to plot the first column against the x column using Mean_f_values.plot(y=y_vals, use_index=True). However, the column names are often gibberish, so I want to make individual plots by referring to columns by their position rather than their name. I want to use some kind of for loop and display several graphs as I try to weed out useless columns. But all I can find so far is that we can only refer to a column by its name, not its position, when plotting. It seems obvious to me that this cannot be true, at least for some simple plotting method. I'm kind of a noob, so what am I missing? Thanks!
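A minimal sketch of selecting columns by position, assuming a DataFrame shaped like the one described (the random data and the col_N names are placeholders): df.columns[i] returns the label at position i, which can then be passed to plot as usual, and df.iloc[:, i] selects the column by position directly, returning a Series whose index becomes the x-axis.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for Mean_f_values: 6 rows, 561 columns with arbitrary names.
rng = np.random.default_rng(0)
Mean_f_values = pd.DataFrame(
    rng.normal(size=(6, 561)),
    index=range(1, 7),
    columns=[f"col_{k}" for k in range(561)],
)

# Option 1: look up the name by position, then plot exactly as before.
for pos in range(3):  # first three columns only, to keep the sketch small
    name = Mean_f_values.columns[pos]
    ax = Mean_f_values.plot(y=name, use_index=True, title=f"column {pos}")
    plt.close(ax.figure)  # close each figure once inspected

# Option 2: select by position with iloc; the Series index is the x-axis.
ax = Mean_f_values.iloc[:, 10].plot(title="column 10")
```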
Pandas dataframe to numpy array [duplicate]
(Marked as a duplicate of "Convert pandas dataframe to NumPy array".)

I am very new to Python and have very little experience. I've managed to get some code working by copying, pasting, and substituting my own data, but I've been looking up how to select data from a DataFrame and can't make enough sense of the examples to substitute my own data.

The overarching goal (if anyone could help me write the entire thing, that would be great, but that's unlikely and probably not allowed): I am trying to use scipy to fit the curve of the temperature change when two chemicals react. There are 40 trials. The model I hope to use is a generalized logistic function with six parameters. All I need are the 40 fitted functions, nothing else. I have no idea how to achieve this yet, but I will ask another question when I get there.

The current issue: I imported 40 .csv files and compiled/shortened the data into 2 files, so that each file holds 20 trials. The data now has 21 columns and 63 rows. The first row contains a title for each column, and the first column is a consistent time interval. However, not every trial lasts that long; only one of them does. So far I've written the following code to load the DataFrame:

import pandas as pd
df = pd.read_csv("~/Truncated raw data hcl.csv")
print(df)

It prints the table, but as expected there are NaNs where there is no data. I would like to know how to arrange it into a workable array with 2 columns, time and one trial, like (x, y) for a graph, for later work with numpy or scipy, such that the rows with no data are excluded.

Part of the .csv file follows:
time,1mnaoh trial 1,1mnaoh trial 2,1mnaoh trial 3,1mnaoh trial 4,2mnaoh trial 1,2mnaoh trial 2,2mnaoh trial 3,2mnaoh trial 4,3mnaoh trial 1,3mnaoh trial 2,3mnaoh trial 3,3mnaoh trial 4,4mnaoh trial 1,4mnaoh trial 2,4mnaoh trial 3,4mnaoh trial 4,5mnaoh trial 1,5mnaoh trial 2,5mnaoh trial 3,5mnaoh trial 4
0.0,23.2,23.1,23.1,23.8,23.1,23.1,23.3,22.0,22.8,23.4,23.3,24.0,23.0,23.8,23.8,24.0,23.3,24.3,24.1,24.1
0.5,23.2,23.1,23.1,23.8,23.1,23.1,23.3,22.1,22.8,23.4,23.3,24.0,23.0,23.8,23.8,24.0,23.4,24.3,24.1,24.1
1.0,23.2,23.1,23.1,23.7,23.1,23.1,23.3,22.3,22.8,23.4,23.3,24.0,23.0,23.8,23.8,24.0,23.5,24.3,24.1,24.1
1.5,23.2,23.1,23.1,23.7,23.1,23.1,23.3,22.4,22.8,23.4,23.3,24.0,23.0,23.8,23.8,23.9,23.6,24.3,24.1,24.1
2.0,23.3,23.2,23.2,24.2,23.6,23.2,24.3,22.5,23.0,23.7,24.4,24.1,23.1,23.9,24.4,24.2,23.7,24.5,24.7,25.1
2.5,24.0,23.5,23.5,25.4,25.3,23.3,26.4,22.7,23.5,25.8,27.9,25.1,23.1,23.9,27.4,26.8,23.8,27.2,26.7,28.1
3.0,25.4,24.4,24.1,26.5,27.8,23.3,28.5,22.8,24.6,28.6,31.2,27.2,23.2,23.9,30.9,30.5,23.9,31.4,29.8,31.3
3.5,26.9,25.5,25.1,27.4,29.9,23.4,30.1,22.9,26.4,31.4,34.0,30.0,23.3,24.2,33.8,34.0,23.9,35.1,33.2,34.4
4.0,27.8,26.5,26.2,27.9,31.4,23.4,31.3,23.1,28.8,34.0,36.1,32.6,23.3,26.6,36.0,36.7,24.0,37.7,35.9,36.8
4.5,28.5,27.3,27.0,28.2,32.6,23.5,32.3,23.1,31.2,36.0,37.5,34.8,23.4,30.0,37.7,38.7,24.0,39.7,38.0,38.7
5.0,28.9,27.9,27.7,28.5,33.4,23.5,33.1,23.2,33.2,37.6,38.6,36.5,23.4,33.2,39.0,40.2,24.0,40.9,39.6,40.2
5.5,29.2,28.2,28.3,28.9,34.0,23.5,33.7,23.3,35.0,38.7,39.4,37.9,23.5,35.6,39.9,41.2,24.0,41.9,40.7,41.0
6.0,29.4,28.5,28.6,29.1,34.4,24.9,34.2,23.3,36.4,39.6,40.0,38.9,23.5,37.3,40.6,42.0,24.1,42.5,41.6,41.2
6.5,29.5,28.8,28.9,29.3,34.7,27.0,34.6,23.3,37.6,40.4,40.4,39.7,23.5,38.7,41.1,42.5,24.1,43.1,42.3,41.7
7.0,29.6,29.0,29.1,29.5,34.9,28.8,34.8,23.5,38.6,40.9,40.8,40.2,23.5,39.7,41.4,42.9,24.1,43.4,42.8,42.3
7.5,29.7,29.2,29.2,29.6,35.1,30.5,35.0,24.9,39.3,41.4,41.1,40.6,23.6,40.5,41.7,43.2,24.0,43.7,43.1,42.9
8.0,29.8,29.3,29.3,29.7,35.2,31.8,35.2,26.9,40.0,41.6,41.3,40.9,23.6,41.1,42.0,43.4,24.2,43.8,43.3,43.3
8.5,29.8,29.4,29.4,29.8,35.3,32.8,35.4,28.9,40.5,41.8,41.4,41.2,23.6,41.6,42.2,43.5,27.0,43.9,43.5,43.6
9.0,29.9,29.5,29.5,29.9,35.4,33.6,35.5,30.5,40.8,41.8,41.6,41.4,23.6,41.9,42.4,43.7,30.8,44.0,43.6,43.8
9.5,29.9,29.6,29.5,30.0,35.5,34.2,35.6,31.7,41.0,41.8,41.7,41.5,23.6,42.2,42.5,43.7,33.9,44.0,43.7,44.0
10.0,30.0,29.7,29.6,30.0,35.5,34.6,35.7,32.7,41.1,41.9,41.8,41.7,23.6,42.4,42.6,43.8,36.2,44.0,43.7,44.1
10.5,30.0,29.7,29.6,30.1,35.6,35.0,35.7,33.3,41.2,41.9,41.8,41.8,23.6,42.6,42.6,43.8,37.9,44.0,43.8,44.2
11.0,30.0,29.7,29.6,30.1,35.7,35.2,35.8,33.8,41.3,41.9,41.9,41.8,24.0,42.9,42.7,43.8,39.3,,43.8,44.3
11.5,30.0,29.8,29.7,30.1,35.8,35.4,35.8,34.1,41.4,41.9,42.0,41.8,26.6,43.1,42.7,43.9,40.2,,43.8,44.3
12.0,30.0,29.8,29.7,30.1,35.8,35.5,35.9,34.3,41.4,42.0,42.0,41.9,30.3,43.3,42.7,43.9,40.9,,43.9,44.3
12.5,30.1,29.8,29.7,30.2,35.9,35.7,35.9,34.5,41.5,42.0,42.0,,33.4,43.4,42.7,44.0,41.4,,43.9,44.3
13.0,30.1,29.8,29.8,30.2,35.9,35.8,36.0,34.7,41.5,42.0,42.1,,35.8,43.5,42.7,44.0,41.8,,43.9,44.4
13.5,30.1,29.9,29.8,30.2,36.0,36.0,36.0,34.8,41.5,42.0,42.1,,37.7,43.5,42.8,44.1,42.0,,43.9,44.4
14.0,30.1,29.9,29.8,30.2,36.0,36.1,36.0,34.9,41.6,,42.2,,39.0,43.5,42.8,44.1,42.1,,,44.4
14.5,,29.9,29.8,,36.0,36.2,36.0,35.0,41.6,,42.2,,40.0,43.5,42.8,44.1,42.3,,,44.4
15.0,,29.9,,,36.0,36.3,,35.0,41.6,,42.2,,40.7,,42.8,44.1,42.4,,,
15.5,,,,,36.0,36.4,,35.1,41.6,,42.2,,41.3,,,,42.4,,,
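To convert a whole DataFrame into a NumPy array, use arr = df.to_numpy() (or the df.values attribute; values is an attribute, not a method, so df.values() raises a TypeError). If I understood you correctly, though, you want a separate array for every trial. This can be done like this:

data = [df.iloc[:, [0, i]].to_numpy() for i in range(1, df.shape[1])]

which makes a list of NumPy arrays, each containing the first column (time) and one of the trial columns. Using df.shape[1] as the upper bound includes the last trial column, which range(1, 20) would miss for a 21-column file.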
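The question also asks for rows without data to be left out. A sketch of one way to do that, using a few rows of the question's own CSV excerpt (trimmed to three trials): select the time column together with one trial column, drop the rows containing NaN, then convert. Keying the arrays by column name rather than list position is a design choice here, not something from the answer.

```python
import io

import pandas as pd

# A few rows from the question's CSV, trimmed to the first three trial columns.
raw = """time,1mnaoh trial 1,1mnaoh trial 2,1mnaoh trial 3
14.0,30.1,29.9,29.8
14.5,,29.9,29.8
15.0,,29.9,
15.5,,,
"""
df = pd.read_csv(io.StringIO(raw))  # empty fields are read as NaN

# One (n, 2) array per trial: column 0 is time, column 1 is that trial's temperature.
# dropna() removes the rows where the trial has no reading.
trials = {
    name: df[["time", name]].dropna().to_numpy()
    for name in df.columns[1:]
}

# For curve fitting, split one trial's array into x and y vectors.
x, y = trials["1mnaoh trial 1"].T
```

x and y can then be passed to scipy.optimize.curve_fit together with whichever model function is chosen; the generalized logistic model itself is not shown here.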
Switching rows and columns in pyplot
I'm new to Python and, after a lot of tinkering, have managed to clean up some .csv data. I now have a bunch of countries as rows and a bunch of dates as columns, and am trying to create a chart showing a line for each country's value over time. The problem is that when I enter df.plot(), it produces a chart with each date as a line. I have melted the data so that the first column is country, the second is date, and the third is value, but all I get is a single blue block growing over time (not multiple lines). How can I fix this?
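You can use pandas' transpose function (DataFrame.transpose(), or its shorthand df.T), so that the dates become the index and each country becomes a column before plotting. Or, instead of df.plot, you can use plot(column, row). As mentioned in the comments, it is always better to provide an example (see importanceofbeingeenest's comment).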
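A short sketch of both routes, with hypothetical country names and values: transposing the wide table so that countries become columns, and, for the melted (country, date, value) form described in the question, pivoting back to one column per country before plotting. Plotting the long table directly likely explains the single growing block, since every value then ends up in one series.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import pandas as pd

dates = pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"])

# Hypothetical wide table: countries as rows, dates as columns (as in the question).
wide = pd.DataFrame([[1, 2, 3], [2, 4, 8]], index=["France", "Japan"], columns=dates)

# Transposing puts dates on the index (x-axis) and countries in the columns,
# so DataFrame.plot draws one line per country.
ax = wide.T.plot()

# Hypothetical melted (long) form: one row per (country, date, value).
long = pd.DataFrame({
    "country": ["France"] * 3 + ["Japan"] * 3,
    "date": list(dates) * 2,
    "value": [1, 2, 3, 2, 4, 8],
})

# pivot turns it back into dates x countries, ready for one line per country.
pivoted = long.pivot(index="date", columns="country", values="value")
ax2 = pivoted.plot()
```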