I am trying to use ggplot to plot this DataFrame:
import pandas as pd
from ggplot import *
df = pd.DataFrame()
df['x'] = [1,2,3,4,5,6]
df['y'] = [1,6,7,2,3,6]
df['id'] = ['a','a','b','b','c','c']
I get the output
x y id
0 1 1 a
1 2 6 a
2 3 7 b
3 4 2 b
4 5 3 c
5 6 6 c
I wish to plot 3 segments with different colors distinguished by 'id'.
ggplot(df,aes(x='x',y='y',colour='id')) + geom_line()
The output contains only the first segment, 'a'.
What is wrong with my code?
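A hedged workaround rather than a diagnosis: it is not clear from the question why the ggplot port draws only the first group. If the goal is simply one colored line per 'id', a minimal sketch with plain pandas and matplotlib (a swapped-in technique, not the ggplot API) could look like this. The Agg backend line is an assumption so that the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [1, 6, 7, 2, 3, 6],
    "id": ["a", "a", "b", "b", "c", "c"],
})

fig, ax = plt.subplots()
# groupby yields one sub-frame per id; each ax.plot call takes the next
# color from matplotlib's default color cycle.
for key, group in df.groupby("id"):
    ax.plot(group["x"], group["y"], label=key)
ax.legend(title="id")
```

Call plt.show() or fig.savefig(...) afterwards to see the three segments.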
Hi all, I need to rotate a two-dimensional array as shown in the picture below. The rotation applied to one set of values should be applied to all of them. If you can spot the problem, please help me solve it.
input:
output:
Thank you
I tried slicing to rotate the values, but it does not give the correct result:
import pandas as pd
df = pd.read_csv("/content/pipe2.csv")
df1= df.iloc[6:10]+df.iloc[13:20]
df1
You can use numpy.roll and the DataFrame constructor:
import numpy as np
N = -2  # negative values shift to the left
out = pd.DataFrame(np.roll(df, N, axis=1),
                   columns=df.columns, index=df.index)
Example output:
0 1 2 3 4 5 6
0 3 4 5 6 7 1 2
Used input:
0 1 2 3 4 5 6
0 1 2 3 4 5 6 7
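For reference, here is a self-contained version of the numpy.roll approach. It uses the one-row sample above as assumed input, because the real pipe2.csv is not shown in the question. A negative shift moves values to the left along axis=1, and the values that fall off the left edge wrap around to the right.

```python
import numpy as np
import pandas as pd

# Illustrative input only; it stands in for the contents of the CSV file.
df = pd.DataFrame([[1, 2, 3, 4, 5, 6, 7]])

N = -2  # negative: shift left by two columns, wrapping around
out = pd.DataFrame(np.roll(df.to_numpy(), N, axis=1),
                   columns=df.columns, index=df.index)
print(out)
```

Using axis=0 instead would rotate the rows rather than the columns.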
Use this:
import pandas as pd
df = pd.read_csv("/content/pipe2.csv")
df1=pd.DataFrame(data=df)
df1_transposed = df1.transpose()
df1_transposed
This sounds a bit weird, but I think it's exactly what I need right now:
I have several pandas DataFrames that contain columns of float numbers, for example:
a b c
0 0 1 2
1 3 4 5
2 6 7 8
Now I want to add a column that has a value in only the first row, equal to the average of column 'a' (3.0 in this case). The new DataFrame should look like this:
a b c average
0 0 1 2 3.0
1 3 4 5
2 6 7 8
All the rows below the first are empty.
I've tried things like df['average'] = np.mean(df['a']), but that gives me a whole column of 3.0. Any help would be appreciated.
Assign a Series indexed by the first row only; this is cleaner.
df['average'] = pd.Series(df['a'].mean(), index=df.index[[0]])
Or, even better, assign with loc:
df.loc[df.index[0], 'average'] = df['a'].mean().item()
Filling the NaNs is straightforward:
df['average'] = df['average'].fillna('')
df
a b c average
0 0 1 2 3
1 3 4 5
2 6 7 8
You can do something like:
df['average'] = [np.mean(df['a'])]+['']*(len(df)-1)
Here is a full example:
import pandas as pd
import numpy as np
df = pd.DataFrame(
[(0,1,2), (3,4,5), (6,7,8)],
columns=['a', 'b', 'c'])
print(df)
a b c
0 0 1 2
1 3 4 5
2 6 7 8
df['average'] = ''
df.loc[0, 'average'] = df['a'].mean()  # .loc avoids chained assignment
print(df)
a b c average
0 0 1 2 3
1 3 4 5
2 6 7 8
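Since the question mentions several DataFrames, the same idea can be wrapped in a small helper. This is only a sketch: the function name add_first_row_mean and its parameters are hypothetical, and it assumes each frame has at least one row. It fills the remaining rows with NaN rather than '' so that the new column stays numeric.

```python
import numpy as np
import pandas as pd

def add_first_row_mean(df, source_col, target_col="average"):
    """Return a copy of df with the mean of source_col in the first row only."""
    out = df.copy()
    out[target_col] = np.nan  # every row starts out as NaN
    out.loc[out.index[0], target_col] = out[source_col].mean()
    return out

# Two illustrative frames; the second one is made up for the example.
frames = [
    pd.DataFrame({"a": [0, 3, 6], "b": [1, 4, 7], "c": [2, 5, 8]}),
    pd.DataFrame({"a": [1.5, 2.5], "b": [0.0, 1.0], "c": [9.0, 9.5]}),
]
results = [add_first_row_mean(f, "a") for f in frames]
print(results[0])
```

If the blanks matter for display, results[0].fillna('') converts the NaNs to empty strings, at the cost of turning the column into object dtype.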
I am plotting lines using the combined ID1 and ID2 columns. In the .csv file, the ID1 and ID2 numbers could be repeated at some point. A new line should start directly after a row where ID2 = 0. I want the program to recognize the sample data below as 2 separate lines.
ID1 ID2 x y
1 2 1 1
1 2 2 2
1 2 3 3
1 2 4 4
1 0 5 5
...
1 2 1 3
1 2 2 5
1 2 3 7
Right now, my program would plot this data as a continuous line in the same color. I need a new line in a different color, but I can't figure out how to filter the data to start a new line even when the ID1 and ID2 values are duplicates. The program needs to see the '0' in the ID2 column as a signal to start a new line. Any ideas would be very helpful.
One option is to find the indices of the zeros and loop over them to create individual DataFrames to plot.
u = u"""ID1 ID2 x y
1 2 1 1
1 2 2 2
1 2 3 3
1 2 4 4
1 0 5 5
1 2 1 3
1 2 2 5
1 2 3 7
1 0 1 3
1 2 2 4
1 2 3 2
1 2 4 1"""
import io
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv(io.StringIO(u), sep=r"\s+")  # delim_whitespace is deprecated in recent pandas
fig, ax = plt.subplots()
inx = list(np.where(df["ID2"].values==0)[0]+1)
inx = [0] + inx + [len(df)]
for i in range(len(inx)-1):
    dff = df.iloc[inx[i]:inx[i+1],:]
    dff.plot(x="x", y="y", ax=ax, label="Label {}".format(i))
plt.show()
One way you could do it is to use cumsum and seaborn plotting with hue:
temp_df = df.assign(line_no=df.ID2.eq(0).cumsum()).query('ID2 != 0')
import seaborn as sns
_ = sns.pointplot(x='x',y='y', hue='line_no',data=temp_df)
Or with matplotlib:
fig, ax = plt.subplots()
for i in temp_df.line_no.unique():
    # @i refers to the Python variable i inside the query string
    x = temp_df.query('line_no == @i')['x']
    y = temp_df.query('line_no == @i')['y']
    ax.plot(x, y)
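A variation on the cumsum idea, as a sketch: if the row with ID2 == 0 should stay as the last point of its line (an assumption, matching the first answer's slicing), the marker can be shifted down by one row before the cumulative sum. The column name line_no and the segments dictionary are illustrative.

```python
import io
import pandas as pd

u = """ID1 ID2 x y
1 2 1 1
1 2 2 2
1 2 3 3
1 2 4 4
1 0 5 5
1 2 1 3
1 2 2 5
1 2 3 7"""
df = pd.read_csv(io.StringIO(u), sep=r"\s+")

# A row with ID2 == 0 closes the current line, so the NEXT row starts a new one.
# shift() moves the marker down one row before the cumulative sum.
starts_new = df["ID2"].eq(0).shift(fill_value=False)
df["line_no"] = starts_new.cumsum()

# One (x, y) sub-frame per line, keyed by line number.
segments = {no: seg[["x", "y"]] for no, seg in df.groupby("line_no")}
print(df)
```

Each entry of segments can then be passed to ax.plot(seg["x"], seg["y"]) to get one line, and one color, per segment.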
After slicing, I have a multi header Dataframe with two levels, indexed by date, obtained like this:
df = df.iloc[:, df.columns.get_level_values(1).isin({'a','b'})]
     one    two
       a  b   a  b
Date
2      2  3   3  3
3      2  3   3  3
4      2  3   3  3
5      2  3   3  3
6      2  3   3  3
7      2  3   3  3
I would like to draw this DataFrame as a line plot with Date on the x axis, the same color for each level-0 header, and solid/dashed lines to distinguish the level-1 headers.
I have tried unstacking, i.e.
df.unstack(level=0).plot(kind='line')
but with no success. The plot as it is now shows Date on the x axis but treats each combination of level-0 and level-1 headers as a new entry.
Here is a picture of the plot obtained:
What we would like to implement is a two-level legend (color/shape of line).
Code Example:
import numpy as np
import pandas as pd
A = np.random.rand(4,4)
C = pd.DataFrame(A, index=range(4), columns=[np.array(['A','A','B','B']), np.array(['a','b','a','b'])])
C.plot(kind='line')
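One possible way to get there, as a sketch rather than a definitive answer: loop over the column tuples, pick the color from the level-0 label and the line style from the level-1 label, then build two legends from proxy artists. The names colors and styles and the hard-coded lists are assumptions that fit two labels per level. The Agg backend line is only there so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
C = pd.DataFrame(
    rng.random((4, 4)),
    index=range(4),
    columns=pd.MultiIndex.from_product([["A", "B"], ["a", "b"]]),
)

# One color per level-0 label, one line style per level-1 label.
level0 = C.columns.get_level_values(0).unique()
level1 = C.columns.get_level_values(1).unique()
colors = dict(zip(level0, ["tab:blue", "tab:orange"]))
styles = dict(zip(level1, ["-", "--"]))

fig, ax = plt.subplots()
for top, sub in C.columns:
    ax.plot(C.index, C[(top, sub)], color=colors[top], linestyle=styles[sub])

# Two separate legends built from proxy artists: colors first, then styles.
color_handles = [Line2D([], [], color=c, label=k) for k, c in colors.items()]
style_handles = [Line2D([], [], color="black", linestyle=s, label=k)
                 for k, s in styles.items()]
first = ax.legend(handles=color_handles, title="level 0", loc="upper left")
ax.add_artist(first)  # keep the first legend when the second one is created
ax.legend(handles=style_handles, title="level 1", loc="upper right")
```

With more than two labels per level, the color and style lists would need to be extended accordingly.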
I'm trying to pivot data in a way so that the index and columns of the resulting table aren't automatically sorted. An example of the data might be:
X Y Z
1 1 1
3 1 2
2 1 3
4 1 4
1 2 5
3 2 6
2 2 7
4 2 8
The data is interpreted as an X, Y and Z axis. The pivoted result should look like this:
X 1 3 2 4
Y
1 1 2 3 4
2 5 6 7 8
Instead the result looks like this, where the index and columns are sorted, and the data accordingly:
X 1 2 3 4
Y
1 1 3 2 4
2 5 7 6 8
At this point I have lost information about the order in which the measurements were taken. For example say that I would plot the row at Y=1, with X as the X axis and the data value on the Y axis.
This would result in the figures in this picture. On the right is how I would like the data to be plotted. Does anyone have an idea how to prevent pandas from sorting the index and columns when pivoting a table?
I have an alternative that restores the order after pivoting. Since the ordering is given by the X values within each Y, you can restore the order of your X columns with something like this:
import pandas as pd
# using your sample data
df = pd.read_clipboard()
df = df.pivot(index='Y', columns='X', values='Z')  # keyword arguments are required in pandas 2.x
df
X 1 2 3 4
Y
1 1 3 2 4
2 5 7 6 8
# re-order your X columns by the values of first Y, for instance
df = df[df.T[1].values]
df
X 1 3 2 4
Y
1 1 2 3 4
2 5 6 7 8
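It is not the best approach, but it should achieve what you want. Note that df.T[1].values are the Z values of the row Y=1, so this only works because they happen to coincide with the X labels in this sample.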
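A more general sketch that does not depend on the Z values: record the order of first appearance of X and Y before pivoting, then reindex the pivoted table with it. The names x_order, y_order and pivoted are illustrative, and the DataFrame is rebuilt from the sample in the question instead of being read from the clipboard.

```python
import pandas as pd

df = pd.DataFrame({
    "X": [1, 3, 2, 4, 1, 3, 2, 4],
    "Y": [1, 1, 1, 1, 2, 2, 2, 2],
    "Z": [1, 2, 3, 4, 5, 6, 7, 8],
})

# unique() returns values in order of first appearance, not sorted.
x_order = df["X"].unique()
y_order = df["Y"].unique()

pivoted = (
    df.pivot(index="Y", columns="X", values="Z")
      .reindex(index=y_order, columns=x_order)
)
print(pivoted)
```

The pivot itself still sorts; the reindex step afterwards puts rows and columns back in measurement order.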