I have a pandas Series instance defined as follows:
import pandas as pd
timestamps = [1,2,3,4,5,6,7,8,9,10]
quantities = [1,9,6,6,6,4,4,4,5,2]
series = pd.Series(quantities, index=timestamps)
Is it possible to supply an array of index values and retrieve the quantities at them? And if it is, what's the fastest way of achieving this, please?
For example, if I supply:
timestamps = [1,1,1,4]
I expect the following back from series:
quantities = [1,1,1,6]
Thanks for any help here.
It is possible:
>>> series[[1,1,1,4]]
1 1
1 1
1 1
4 6
dtype: int64
>>> series[[1,1,1,4]].values
array([1, 1, 1, 6])
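As a hedged aside on the "fastest way" part of the question, the sketch below spells out the same lookup with the explicit .loc indexer and to_numpy(). The name `wanted` is only illustrative, and no timing claim is made here; measure on your own data.

```python
import pandas as pd

timestamps = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
quantities = [1, 9, 6, 6, 6, 4, 4, 4, 5, 2]
series = pd.Series(quantities, index=timestamps)

# 'wanted' is an illustrative name; the labels may repeat
wanted = [1, 1, 1, 4]

# .loc does label-based lookup and returns one entry per requested label
result = series.loc[wanted].to_numpy()
print(result)
```

Note that .loc raises a KeyError if any requested label is missing from the index; series.reindex(wanted) is one alternative that returns NaN for missing labels instead.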
I have a dataframe of numbers and would like to multiply each observation row wise or along axis = 1 and output the answer in another column. As an example:
import pandas as pd
import numpy as np
arr = np.array([2, 3, 4])
df = pd.DataFrame(arr).transpose()
df
What I would like is a column that has value 24 from multiplying column 0 by column 1 by column 2.
I tried the df.mul(axis = 1) but that didn't work.
I'm sure this is easy but all I find is multiplying each column by a constant.
This is what prod does. Pass axis=1 to multiply across each row:
df.prod(axis=1)
Out[69]:
0 24
dtype: int32
Try something like this:
import numpy
def multiplyFunction(row):
    # multiply all values in the row together
    return numpy.prod(row)
df['result'] = df.apply(multiplyFunction, axis=1)
df.head()
Result
0 1 2 result
0 2 3 4 24
Let me know if this helps.
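Putting the two answers together, a minimal sketch that stores the row-wise product in a new column could look like this. The column name 'result' is taken from the second answer; everything else comes from the question.

```python
import numpy as np
import pandas as pd

arr = np.array([2, 3, 4])
df = pd.DataFrame(arr).transpose()

# Compute the product over the original columns first, then assign it,
# so the new column is not itself included in the product.
df['result'] = df.prod(axis=1)
print(df)
```

The built-in prod avoids calling a Python function once per row, which apply with axis=1 does.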
I have a dataframe which is of the following structure:
A B
Location1 1
Location2 2
1 3
2 4
In the above example column A is the index. I am attempting to produce a scatter plot using the index and column B. This data frame is made by resampling and averaging another dataframe like so:
df = df.groupby("A").mean()
Now obviously this sets the index equal to column A, and I can plot it using the following, which is adapted from the question "Use index in pandas to plot data":
df.reset_index().plot(x = "A",y = "B",kind="scatter", figsize=(10,10))
Now when I run this it returns the following:
ValueError: scatter requires x column to be numeric
The index column is intended to hold strings, and I still want a scatter plot against it. How can I go about fixing this?
You may want to select only the integer rows:
import pandas as pd
d = {'A': ["Location1", "Location2", 1, 2], 'B': [1, 2, 3, 4]}
df = pd.DataFrame(data=d)
df_numeric = df[pd.to_numeric(df.A, errors='coerce').notnull()]
print(df_numeric)
A B
2 1 3
3 2 4
Grouped by A:
df_numeric_grouped_by_A = df_numeric.groupby("A").mean()
print(df_numeric_grouped_by_A)
B
A
1 3
2 4
You may have to transpose the DataFrame so that the index (column A) becomes the column names, then calculate the mean of the columns and plot them.
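If the string labels should stay in the plot rather than be filtered out, one possible workaround (a sketch, not what the answers above propose) is to give every label a numeric position and plot against that. The column name "x" and the use of pd.factorize are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": ["Location1", "Location2", 1, 2], "B": [1, 2, 3, 4]})

# Treat every label as a string so the grouping keys share one type
df["A"] = df["A"].astype(str)
means = df.groupby("A")["B"].mean().reset_index()

# Map each label to an integer position; "x" is an illustrative column name
codes, labels = pd.factorize(means["A"])
means["x"] = codes
print(means)

# With matplotlib installed, the numeric column can then be plotted, e.g.:
# ax = means.plot(x="x", y="B", kind="scatter")
# ax.set_xticks(means["x"])
# ax.set_xticklabels(labels)
```

The scatter plot then receives a numeric x column, and the original labels are only used as tick text.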
I have a Series, like this:
series = pd.Series({'a': 1, 'b': 2, 'c': 3})
I want to convert it to a dataframe like this:
a b c
0 1 2 3
pd.Series.to_frame() doesn't work; it gives a result like this:
0
a 1
b 2
c 3
How can I construct a DataFrame from Series, with index of Series as columns?
You can also try this:
df = pd.DataFrame(series).transpose()
Using the transpose() function you can interchange the indices and the columns.
The output looks like this:
a b c
0 1 2 3
You don't need the transposition step, just wrap your Series inside a list and pass it to the DataFrame constructor:
pd.DataFrame([series])
a b c
0 1 2 3
Alternatively, call Series.to_frame, then transpose using the shortcut .T:
series.to_frame().T
a b c
0 1 2 3
You can also try this:
a = pd.Series.to_frame(series)
a['id'] = list(a.index)
Explanation:
The 1st line converts the series into a single-column DataFrame.
The 2nd line adds a column to this DataFrame whose values are the same as the index.
Try reset_index. It will convert your index into a column in your dataframe.
df = series.to_frame().reset_index()
This
pd.DataFrame([series]) #method 1
produces a slightly different result than
series.to_frame().T #method 2
With method 1, the elements in the resulting dataframe retain their types, e.g. an int64 in series will be kept as an int64.
With method 2, the elements in the resulting dataframe become objects IF there is an object type element anywhere in the series, e.g. an int64 in series will become an object type.
This difference may cause different behavior in your subsequent operations, depending on the version of pandas.
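To check this on your own installation, a small sketch like the following prints the column dtypes produced by both methods. The mixed-type Series is made up for illustration, and no expected output is given because the inferred dtypes can depend on the pandas version.

```python
import pandas as pd

# A Series with mixed element types, so its own dtype is object
series = pd.Series({'a': 1, 'b': 'text', 'c': 3.5})

method1 = pd.DataFrame([series])   # method 1: wrap the Series in a list
method2 = series.to_frame().T      # method 2: to_frame, then transpose

print(method1.dtypes)
print(method2.dtypes)
```

If method 2 leaves you with object columns you did not want, calling infer_objects() on the result is one way to ask pandas to re-infer the column types.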
A very simple example just for understanding.
The goal is to calculate the values of a pandas DataFrame column depending on the results of a rolling function from another column.
I have the following DataFrame:
import numpy as np
import pandas as pd
s = pd.Series([1,2,3,2,1,2,3,2,1])
df = pd.DataFrame({'DATA':s, 'POINTS':0})
df
Note: I don't even know how to format the Jupyter Notebook results in the Stackoverflow edit window, so I copy and paste the image, I beg your pardon.
The DATA column shows the observed data; the POINTS column, initialized to 0, is used to collect the output of a "rolling" function applied to DATA column, as explained in the following.
Set a window = 4
nwin = 4
Just for the example, the "rolling" function calculates the max.
Now let me use a drawing to explain what I need.
For every iteration, the rolling function calculates the maximum of the data in the window; then the POINT at the same index as the max DATA is incremented by 1.
The final result is:
Can you help me with the python code?
I really appreciate your help.
Thank you in advance for your time,
Gilberto
P.S. Can you also suggest how to copy and paste Jupyter Notebook formatted cell to Stackoverflow edit window? Thank you.
IIUC the explanation by @IanS (thanks again!), you can do
In [75]: np.array([df.DATA.rolling(4).max().shift(-i) == df.DATA for i in range(4)]).T.sum(axis=1)
Out[75]: array([0, 0, 3, 0, 0, 0, 3, 0, 0])
To update the column:
In [78]: df = pd.DataFrame({'DATA':s, 'POINTS':0})
In [79]: df.POINTS += np.array([df.DATA.rolling(4).max().shift(-i) == df.DATA for i in range(4)]).T.sum(axis=1)
In [80]: df
Out[80]:
DATA POINTS
0 1 0
1 2 0
2 3 3
3 2 0
4 1 0
5 2 0
6 3 3
7 2 0
8 1 0
import pandas as pd
s = pd.Series([1,2,3,2,1,2,3,2,1])
df = pd.DataFrame({'DATA':s, 'POINTS':0})
df.POINTS=df.DATA.rolling(4).max().shift(-1)
df.POINTS=(df.POINTS*(df.POINTS==df.DATA)).fillna(0)
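For comparison, here is a sketch that follows the description in the question literally: slide a window over DATA, find the position of the maximum in each window, and count how often each row is that maximum. It assumes NumPy 1.20 or later for sliding_window_view, and the names `windows` and `max_pos` are illustrative.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

s = pd.Series([1, 2, 3, 2, 1, 2, 3, 2, 1])
df = pd.DataFrame({'DATA': s, 'POINTS': 0})
nwin = 4

# One row per window position: shape (len(df) - nwin + 1, nwin)
windows = sliding_window_view(df['DATA'].to_numpy(), nwin)

# argmax gives the offset of the max inside each window;
# adding the window start turns it into a row position in df
max_pos = windows.argmax(axis=1) + np.arange(len(windows))

# Count how many windows have their maximum at each row
df['POINTS'] = np.bincount(max_pos, minlength=len(df))
print(df)
```

One design difference to be aware of: argmax credits only the first maximum when a window contains ties, whereas the equality-based approach above credits every tied position.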
So I got a pandas DataFrame with a single column and a lot of data.
I need to access each of the element, not to change it (with apply()) but to parse it into another function.
When looping through the DataFrame it always stops after the first one.
If I convert it to a list before, then my numbers are all in braces (eg. [12] instead of 12) thus breaking my code.
Does anyone see what I am doing wrong?
import pandas as pd
def go_trough_list(df):
    for number in df:
        print(number)
df = pd.read_csv("my_ids.csv")
go_trough_list(df)
df looks like:
1
0 2
1 3
2 4
dtype: object
[Finished in 1.1s]
Edit: I found one mistake. My first value is recognized as a header.
So I changed my code to:
df = pd.read_csv("my_ids.csv",header=None)
But with
for ix in df.index:
    print(df.loc[ix])
I get:
0 1
Name: 0, dtype: int64
0 2
Name: 1, dtype: int64
0 3
Name: 2, dtype: int64
0 4
Name: 3, dtype: int64
edit: Here is my Solution thanks to jezrael and Nick!
First I added header=None because my data has no header.
Then I changed my function to:
def go_through_list(df):
    new_list = df[0].apply(my_function, parameter=par1)
    return new_list
And it works perfectly! Thank you again guys, problem solved.
You can use the index as in other answers, and also iterate through the df and access the row like this:
for index, row in df.iterrows():
    print(row['column'])
however, I suggest solving the problem differently if performance is of any concern. Also, if there is only one column, it is more correct to use a Pandas Series.
What do you mean by parse it into another function? Perhaps take the value, and do something to it and create it into another column?
I need to access each of the element, not to change it (with apply()) but to parse it into another function.
Perhaps this example will help:
import pandas as pd
df = pd.DataFrame([20, 21, 12])
def square(x):
    return x**2
df['new_col'] = df[0].apply(square) # can use a lambda here nicely
You can convert column as Series tolist:
for x in df['Colname'].tolist():
    print(x)
Sample:
import pandas as pd
df = pd.DataFrame({'a': pd.Series([1, 2, 3]),
                   'b': pd.Series([4, 5, 6])})
print(df)
a b
0 1 4
1 2 5
2 3 6
for x in df['a'].tolist():
    print(x)
1
2
3
If you have only one column, use iloc for selecting first column:
for x in df.iloc[:,0].tolist():
    print(x)
Sample:
import pandas as pd
df = pd.DataFrame({1: pd.Series( [2, 3, 4])})
print(df)
1
0 2
1 3
2 4
for x in df.iloc[:,0].tolist():
    print(x)
2
3
4
This can work too, but it is not the recommended approach, because the column label 1 can be a number or a string, and using the wrong one raises a KeyError:
for x in df[1].tolist():
    print(x)
2
3
4
Say you have one column named 'myColumn', and you have an index on the dataframe (which is automatically created with read_csv). Try using the .loc indexer:
for ix in df.index:
    print(df.loc[ix, 'myColumn'])
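To tie the thread together: iterating over a DataFrame directly yields its column labels, not its values, which is why the original loop appeared to stop after the first item. The sketch below uses an in-memory string as a stand-in for my_ids.csv (the file contents are assumed) and loops over the single column instead.

```python
import io
import pandas as pd

# Stand-in for my_ids.csv: one value per line, no header row (assumed layout)
csv_text = "1\n2\n3\n4\n"
df = pd.read_csv(io.StringIO(csv_text), header=None)

# Iterating the DataFrame itself only gives the column labels
print(list(df))

# Iterating the column (a Series) gives the values, without list brackets
seen = []
for number in df[0]:
    seen.append(number)
print(seen)
```

With header=None the single column is labelled 0, so df[0] selects it; df.iloc[:, 0] does the same by position if the label is not known in advance.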