Plotting a given column name across different data frames in Python

I have several DataFrames like these:
# Plain lists of rows keep attr1/attr2 numeric (np.array would cast every value to str)
df1 = pd.DataFrame([
    ['a', 1, 2],
    ['b', 3, 4],
    ['c', 5, 6]],
    columns=['name', 'attr1', 'attr2'])
df2 = pd.DataFrame([
    ['a', 2, 3],
    ['b', 4, 5],
    ['c', 6, 7]],
    columns=['name', 'attr1', 'attr2'])
df3 = pd.DataFrame([
    ['a', 3, 4],
    ['b', 5, 6],
    ['c', 7, 8]],
    columns=['name', 'attr1', 'attr2'])
Each of these DataFrames is generated at a specific time step, say T = [t1, t2, t3].
I would like to plot attr1 or attr2 from the different DataFrames as a function of time T, for 'a', 'b' and 'c', all on the same graph.
For example: attr1 vs. time for 'a', 'b' and 'c'.

If I understand correctly, first assign a column T to each of your dataframes, then concatenate the three. Then, you can groupby the name column, iterate through each, and plot T against attr1 or attr2:
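import matplotlib.pyplot as plt
import pandas as pd

# Tag each frame with its time step, then stack them into one long frame
dfs = pd.concat([df1.assign(T=1), df2.assign(T=2), df3.assign(T=3)])
# One line per name: T on the x-axis, the chosen attribute on the y-axis
for name, data in dfs.groupby('name'):
    plt.plot(data['T'], data['attr2'], label=name)
plt.xlabel('Time')
plt.ylabel('attr2')
plt.legend()
plt.show()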
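
An alternative to looping over the groups is to pivot the stacked frame into a wide table with one column per name. The following is only a sketch: it assumes the time steps can be labelled 1, 2 and 3 (the question only calls them t1, t2, t3), and the names time_steps, stacked and wide are illustrative.

```python
import pandas as pd

columns = ["name", "attr1", "attr2"]
df1 = pd.DataFrame([["a", 1, 2], ["b", 3, 4], ["c", 5, 6]], columns=columns)
df2 = pd.DataFrame([["a", 2, 3], ["b", 4, 5], ["c", 6, 7]], columns=columns)
df3 = pd.DataFrame([["a", 3, 4], ["b", 5, 6], ["c", 7, 8]], columns=columns)

# Assumed numeric labels for the time steps t1, t2, t3
time_steps = [1, 2, 3]

# Tag each frame with its time step and stack them into one long frame
stacked = pd.concat(
    [df.assign(T=t) for t, df in zip(time_steps, [df1, df2, df3])],
    ignore_index=True,
)

# Rows become time steps, columns become names, cells hold attr1
wide = stacked.pivot(index="T", columns="name", values="attr1")
print(wide)
```

With matplotlib installed, calling wide.plot(marker='o') on this table should draw one line per name against T, with the legend taken from the column labels.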

Related

How to slice a DataFrame in a hybrid style (rows by position, columns by label)

Given an arbitrary DataFrame:
df = pd.DataFrame([[1,2,3,4],[4,5,6,7],[7,8,9,10],[10,11,12,13],[14,15,16,17]], columns=['A', 'B','C','D'])
cols_in = list(df)[0:2]+list(df)[4:]
and this loop:
x = []
for i in range(df.shape[0]):
    x.append(df.iloc[i,cols_in])
The loop raises an error because cols_in holds column labels, while iloc expects integer positions.
How can I apply this kind of mixed slicing (row by position, columns by label) inside the append call?

It seems like you want to exclude one column? There is no column at position 4, so list(df)[4:] is empty. Depending on which columns you need, something like this might be what you are after:
df = pd.DataFrame([[1,2,3,4],[4,5,6,7],[7,8,9,10],[10,11,12,13],[14,15,16,17]], columns=['A', 'B','C','D'])
To get the column positions from the column names, you can do:
import numpy as np
cols = ['A', 'B', 'D']
# Integer positions of the wanted columns, usable with iloc
cols_in = np.nonzero(df.columns.isin(cols))[0]
x = []
for i in range(df.shape[0]):
    x.append(df.iloc[i, cols_in].to_list())
x
Output:
[[1, 2, 4], [4, 5, 7], [7, 8, 10], [10, 11, 13], [14, 15, 17]]
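
The same result can probably be reached without an explicit loop. The sketch below assumes the wanted columns are 'A', 'B' and 'D', as in the answer above; it uses Index.get_indexer to translate labels into positions for iloc, and compares that with plain label-based selection. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4], [4, 5, 6, 7], [7, 8, 9, 10], [10, 11, 12, 13], [14, 15, 16, 17]],
    columns=["A", "B", "C", "D"],
)
cols = ["A", "B", "D"]

# Translate column labels into integer positions, which iloc accepts
positions = df.columns.get_indexer(cols)

# Select all rows and the chosen columns by position, then convert to nested lists
by_position = df.iloc[:, positions].values.tolist()

# Label-based selection gives the same rows without any position lookup
by_label = df[cols].values.tolist()

print(by_position)
```

If only labels are known, the label-based form is the shorter of the two; get_indexer is mainly useful when other code insists on iloc.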

Convert DataFrame into multi-dimensional array with the column names of DataFrame

Below is the DataFrame I want to operate on:
df = pd.DataFrame({'A': [1,1,1],
                   'B': [2,2,3],
                   'C': [4,5,4]})
Each row of df forms a unique key. The objective is to create the following nested list:
parameter = [[['A', 1],['B', 2], ['C', 4]],
             [['A', 1],['B', 2], ['C', 5]],
             [['A', 1],['B', 3], ['C', 4]]]
This is related to this question, where I have to iterate over the parameters; instead of passing them to my function manually, I need to collect all parameters from the rows of df in a list.

You could use the following list comprehension, which zips the column names with the values of each row:
[list(map(list, zip(df.columns, row))) for row in df.values.tolist()]
[[['A', 1], ['B', 2], ['C', 4]],
 [['A', 1], ['B', 2], ['C', 5]],
 [['A', 1], ['B', 3], ['C', 4]]]
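
A minimal alternative sketch, assuming the goal is the ['column', value] ordering shown in the question: DataFrame.to_dict('records') yields one dict per row, and each dict can be unpacked into [key, value] pairs. The name parameter follows the question; records is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1], "B": [2, 2, 3], "C": [4, 5, 4]})

# One dict per row, e.g. {'A': 1, 'B': 2, 'C': 4}
records = df.to_dict("records")

# Turn every dict into a list of [column name, value] pairs
parameter = [[[key, value] for key, value in record.items()] for record in records]

print(parameter)
```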

Selecting different rows from different GroupBy groups

As opposed to GroupBy.nth, which selects the same index for each group, I would like to take specific indices from each group. For example, if my GroupBy object consisted of four groups and I would like the 1st, 5th, 10th, and 15th from each respectively, then I would like to be able to pass x = [0, 4, 9, 14] and get those rows.

This is kind of a strange thing to want; is there a reason?
In any case, to do what you want, try this:
df = pd.DataFrame([['a', 1], ['a', 2],
                   ['b', 3], ['b', 4], ['b', 5],
                   ['c', 6], ['c', 7]],
                  columns=['group', 'value'])
def index_getter(which):
    def get(series):
        return series.iloc[which[series.name]]
    return get
which = {'a': 0, 'b': 2, 'c': 1}
df.groupby('group')['value'].apply(index_getter(which))
Which results in:
group
a    1
b    5
c    7
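
If whole rows are wanted rather than a single column, one possible variation (a sketch, not part of the answer above) is to compare each row's position inside its group, obtained with GroupBy.cumcount, against the wanted position for that group. The names position_in_group, wanted and picked are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [["a", 1], ["a", 2], ["b", 3], ["b", 4], ["b", 5], ["c", 6], ["c", 7]],
    columns=["group", "value"],
)
which = {"a": 0, "b": 2, "c": 1}

# Position of each row inside its own group: 0, 1, 0, 1, 2, 0, 1
position_in_group = df.groupby("group").cumcount()

# Wanted position for each row's group, aligned row by row
wanted = df["group"].map(which)

# Keep only the rows whose in-group position matches the wanted one
picked = df[position_in_group == wanted]
print(picked)
```

A group whose wanted position is out of range simply contributes no row here, whereas the iloc-based version would raise an IndexError.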

Can I do a conditional sort on two different columns, but where the order of two columns is reversed based on the secondary condition?

Edit: Since writing this, I remembered a third necessary condition: if the difference between two rows' values at index 1 (time) is greater than or equal to 2, then the rows should be sorted normally by the index 1 (time) column. So because the time value for B is 6, which is within a difference of 2 of T's time of 5, the address decides and B should come before T. However, for T and K, for example, because the time value of 7 for K is 2 greater than the time value of 5 for T, T should come first.
Let's say I have this array:
input = [['user_id', 'time', 'address'],
         ['F', 5, 5],
         ['T', 5, 8],
         ['B', 6, 6],
         ['K', 7, 7],
         ['J', 7, 9],
         ['M', 9, 10]]
I'd like to sort the rows first in ascending order by index 1 (time). Secondarily, however, if index 2 (address) for a given user_id such as 'B' is less than index 2 (address) for another user such as 'T', I'd like user_id 'B' to come before user_id 'T'.
So the final output would look like this:
output = [['user_id', 'time', 'address'],
          ['F', 5, 5],
          ['B', 6, 6],
          ['T', 5, 8],
          ['K', 7, 7],
          ['J', 7, 9],
          ['M', 9, 10]]
If possible, I'd like to do this without Pandas.

>>> import functools
>>> from pprint import pprint
>>>
>>> def compare(item1, item2):
...     # Times at least 2 apart: order by time; otherwise order by address
...     dt = item1[1] - item2[1]
...     return dt if abs(dt) >= 2 else item1[2] - item2[2]
...
>>> output = [input[0]] + sorted(input[1:], key=functools.cmp_to_key(compare))
>>> pprint(output)
[['user_id', 'time', 'address'],
 ['F', 5, 5],
 ['B', 6, 6],
 ['T', 5, 8],
 ['K', 7, 7],
 ['J', 7, 9],
 ['M', 9, 10]]
>>>

The built-in sorted function accepts a custom key function. Here it is enough for the key to return a tuple of columns 1 and 2: rows are ordered by column 1 first, and rows with the same value in that column are then ordered by column 2.
data = [['user_id', 'time', 'address'],
        ['F', 5, 5],
        ['B', 6, 6],
        ['T', 5, 8],
        ['K', 7, 7],
        ['J', 7, 9],
        ['M', 9, 10]]
data_sorted = [data[0]] + sorted(data[1:], key = lambda row: (row[1], row[2]))
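
To see what the tuple key does on the rows from the question, here is a small sketch; the names rows, header, body and order are illustrative. It shows that a plain (time, address) key is a strict two-level sort, so it does not apply the "within 2 time units" rule from the question's edit.

```python
rows = [
    ["user_id", "time", "address"],
    ["F", 5, 5],
    ["T", 5, 8],
    ["B", 6, 6],
    ["K", 7, 7],
    ["J", 7, 9],
    ["M", 9, 10],
]

header, body = rows[0], rows[1:]

# Sort by time first, then by address among rows with equal time
body_sorted = sorted(body, key=lambda row: (row[1], row[2]))
rows_sorted = [header] + body_sorted

# Order of user ids after sorting
order = [row[0] for row in body_sorted]
print(order)  # ['F', 'T', 'B', 'K', 'J', 'M']
```

Here T (time 5) stays ahead of B (time 6), which differs from the desired output in the question, where B precedes T because the two times are less than 2 apart.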

How to plot different parts of same Pandas Series column with different colors, having a customized index?

This is a follow-up for my previous question here.
Let's say I have a Series like this:
testdf = pd.Series([3, 4, 2, 5, 1, 6, 10], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
When plotting, this is the result:
testdf.plot()
However, I want to plot, say, the first 4 values of the line in blue (the default) and the rest of the line in red. Trying the solution suggested in the post mentioned above, this is the result I get:
fig, ax = plt.subplots(1, 1)
testdf.plot(ax=ax,color='b')
testdf.iloc[3:].plot(ax=ax,color='r')
I only get the expected result if I don't define my Series with a custom index:
testdf = pd.Series([3, 4, 2, 5, 1, 6, 10])
fig, ax = plt.subplots(1, 1)
testdf.plot(ax=ax,color='b')
testdf.iloc[3:].plot(ax=ax,color='r')
How could I achieve the desired result, then?

This was too long for a comment, so I am posting it as an answer.
What you want to achieve works well if you plot bars (which are discrete):
import pandas as pd
import numpy as np
df = pd.Series([3, 4, 2, 5, 1, 6, 10], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
df.plot(kind = 'bar',color=np.where(df.index<'e','b','r'))
But it does not work for lines (which are continuous), as you already noticed.
If you don't need a custom index, you can use:
df = pd.Series([3, 4, 2, 5, 1, 6, 10])
cut = 4
ax = df[:cut].plot(color='b')
df[(cut-1):].plot(ax=ax, color='r')
With a custom index, you should split your Series into two parts that share the boundary point and keep the full index. One way to do this is:
df = pd.Series([3, 4, 2, 5, 1, 6, 10], index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
df1 = pd.Series(np.where(df.index<'e',df.values,np.nan), index=df.index)
df2 = pd.Series(np.where(df.index>='d',df.values,np.nan), index=df.index)
ax = df1.plot(color = 'b')
df2.plot(ax=ax,color='r')
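
The label comparison above relies on the index being sorted strings. A position-based variant may be more general; the sketch below assumes the split point is given as a count of leading values (cut) and uses a hypothetical helper named split_for_two_colors. Both parts keep the full index and overlap at one point so the two line segments join.

```python
import numpy as np
import pandas as pd


def split_for_two_colors(series, cut):
    """Return two NaN-masked copies of series that overlap at position cut - 1."""
    positions = np.arange(len(series))
    # First part: keep the first `cut` values, mask the rest with NaN
    first = series.where(positions < cut)
    # Second part: keep everything from the last point of the first part onward
    second = series.where(positions >= cut - 1)
    return first, second


s = pd.Series([3, 4, 2, 5, 1, 6, 10], index=["a", "b", "c", "d", "e", "f", "g"])
first, second = split_for_two_colors(s, cut=4)
print(first)
print(second)
```

With matplotlib installed, ax = first.plot(color='b') followed by second.plot(ax=ax, color='r') should then draw the blue and red segments on the same axes, as in the answer above.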
