Based on a condition, I want to change the value of the first row in a certain column. So far this is what I have:
despesas['recibos'] = ''
for a in recibos['recibos']:
    if len(despesas.loc[(despesas['despesas'] == a) & (despesas['recibos'] == ''), 'recibos']) > 0:
        despesas.loc[(despesas['despesas'] == a) & (despesas['recibos'] == ''), 'recibos'].iloc[0] = a
So I want to change only the first value of the column recibos to the value of a wherever (despesas['despesas'] == a) & (despesas['recibos'] == '').
Edit 1
Example:
despesas['despesas'] = [11.95, 2.5, 1.2 , 0.6 , 2.66, 2.66, 3. , 47.5 , 16.95,17.56]
recibos['recibos'] = [11.95, 1.2 , 1.2 , 0.2 , 2.66, 2.66, 3. , 47.5 , 16.95, 17.56]
And the result should be:
[[11.95, 11.95], [2.5, null], [1.2, 1.2], [0.6, null], [2.66, 2.66], [2.66, 2.66], [3., 3.], [47.5, 47.5], [16.95, 16.95], [17.56, 17.56]]
This could work:
mapper = recibos['recibos'].map(despesas['despesas'].value_counts()).fillna(0)
despesas['recibos'] = recibos['recibos'].where(
    recibos.groupby('recibos').cumcount().lt(mapper), 'null'
)
print(despesas)
despesas recibos
0 11.95 11.95
1 2.50 1.2
2 1.20 null
3 0.60 null
4 2.66 2.66
5 2.66 2.66
6 3.00 3
7 47.50 47.5
8 16.95 16.95
9 17.56 17.56
I found the solution that I was looking for
from itertools import count, filterfalse

despesas['recibos'] = ''
for index, a in despesas.iterrows():
    if len(recibos.loc[recibos['recibos'] == a['despesas']]) > 0:
        despesas.iloc[index, 1] = True
        recibos.drop(recibos.loc[recibos['recibos'] == a['despesas']][:1].index, inplace=True)
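For completeness, here is a minimal, self-contained sketch of that last approach, assuming despesas and recibos are built from the example lists in Edit 1; it stores the matched value rather than True so the output matches the expected pairs:
import pandas as pd

# hypothetical setup built from the example lists in Edit 1
despesas = pd.DataFrame({'despesas': [11.95, 2.5, 1.2, 0.6, 2.66, 2.66, 3.0, 47.5, 16.95, 17.56]})
recibos = pd.DataFrame({'recibos': [11.95, 1.2, 1.2, 0.2, 2.66, 2.66, 3.0, 47.5, 16.95, 17.56]})

despesas['recibos'] = None
for index, row in despesas.iterrows():
    matches = recibos.loc[recibos['recibos'] == row['despesas']]
    if len(matches) > 0:
        # store the matched value (the snippet above stores True instead)
        despesas.iloc[index, 1] = row['despesas']
        # consume the first matching receipt so it cannot be matched twice
        recibos.drop(matches[:1].index, inplace=True)

print(despesas)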
Related
I am looping through a bunch of files and importing their contents as numpy arrays:
# get the dates for our gaps
import os.path
import glob
from pathlib import Path
from numpy import recfromcsv

folder = "daily_bars_filtered/*.csv"
df_gapper_list = []
df_intraday_analysis = []

# loop through the daily gappers
for fname in glob.glob(folder)[0:2]:
    ticker = Path(fname).stem
    daily_bars_arr = recfromcsv(fname, delimiter=',')
    print(ticker)
    print(daily_bars_arr)
Output:
AACG
[(b'2021-07-15', 43796169., 2.98, 3.83, 4.75, 2.9401, 2.98, 59.39597315)
(b'2022-01-04', 14934689., 1.25, 2.55, 2.59, 1.25 , 1.19, 117.64705882)
(b'2022-01-05', 8067429., 1.8 , 2.3 , 2.64, 1.72 , 2.55, 3.52941176)
(b'2022-01-07', 9718034., 1.93, 2.64, 2.94, 1.85 , 1.98, 48.48484848)]
AAL
[(b'2022-03-04', 76218689., 15.27 , 14.59, 15.4799, 14.42 , 15.71, 1.46467218)
(b'2022-03-07', 89360330., 14.32 , 12.84, 14.62 , 12.77 , 14.59, 0.20562029)
(b'2022-03-08', 88067102., 13.035, 13.51, 14.27 , 12.4401, 12.84, 11.13707165)
(b'2022-03-09', 88884229., 14.44 , 14.3 , 14.75 , 14.05 , 13.51, 9.17838638)
(b'2022-03-10', 56463182., 13.82 , 14.2 , 14.44 , 13.46 , 14.3 , 0.97902098)
(b'2022-03-11', 48342029., 14.4 , 14.02, 14.56 , 13.9 , 14.2 , 2.53521127)
(b'2022-03-14', 53284254., 14.04 , 14.25, 14.83 , 13.7 , 14.02, 5.77746077)]
What I then try to do is target the first column where my dates are, by doing:
print(daily_bars_arr[:,[0]])
But then I get the following error:
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
What am I doing wrong?
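A likely explanation (a sketch, not from the original thread): recfromcsv returns a one-dimensional structured array, in which each row is a single record and each column is a named field, so two-dimensional indexing such as [:, [0]] fails. The date column can be pulled out by its field name instead; the names below depend on the CSV header and are assumptions:
# daily_bars_arr is a 1-D structured array: one record per row, named fields per column
print(daily_bars_arr.dtype.names)   # e.g. ('date', 'volume', ...), taken from the CSV header

# access the first column by its field name rather than by 2-D indexing
first_field = daily_bars_arr.dtype.names[0]
dates = daily_bars_arr[first_field]
print(dates)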
I'm trying to create a data visualization that's essentially a time series chart, but I have to use Pandas, Python, and Plotly, and I'm stuck on how to actually label the dates. Right now, the x labels are just integers from 1 to 60, and when you hover over the chart, you get that integer instead of the date.
I'm pulling values from a Google spreadsheet, and for now I'd like to avoid parsing CSV files.
I'd really like some help on how to label x as dates! Here's what I have so far:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import bpr
%matplotlib inline
import chart_studio.plotly as pl
import plotly.express as px
import plotly.graph_objects as go
f = open("../credentials.txt")
u = f.readline()
plotly_user = str(u[:-1])
k = f.readline()
plotly_api_key = str(k)
pl.sign_in(username = plotly_user, api_key = plotly_api_key)
rand_x = np.arange(61)
rand_x = np.flip(rand_x)
rand_y = np.array([0.91 , 1 , 1.24 , 1.25 , 1.4 , 1.36 , 1.72 , 1.3 , 1.29 , 1.17 , 1.57 , 1.95 , 2.2 , 2.07 , 2.03 , 2.14 , 1.96 , 1.87 , 1.25 , 1.34 , 1.13 , 1.31 , 1.35 , 1.54 , 1.38 , 1.53 , 1.5 , 1.32 , 1.26 , 1.4 , 1.89 , 1.55 , 1.98 , 1.75 , 1.14 , 0.57 , 0.51 , 0.41 , 0.24 , 0.16 , 0.08 , -0.1 , -0.24 , -0.05 , -0.15 , 0.34 , 0.23 , 0.15 , 0.12 , -0.09 , 0.13 , 0.24 , 0.22 , 0.34 , 0.01 , -0.08 , -0.27 , -0.6 , -0.17 , 0.28 , 0.38])
test_data = pd.DataFrame(columns=['X', 'Y'])
test_data['X'] = rand_x
test_data['Y'] = rand_y
test_data.head()
def create_line_plot(data, x, y, chart_title="Rate by Date", labels_dict={}, c=["indianred"]):
    fig = px.line(
        data,
        x = x,
        y = y,
        title = chart_title,
        labels = labels_dict,
        color_discrete_sequence = c
    )
    fig.show()
    return fig

fig = create_line_plot(test_data, 'X', 'Y', labels_dict={'X': 'Date', 'Y': 'Rate (%)'})
Right now, the x labels are just integers from 1 to 60, and when you hover over the chart, you get that integer instead of the date.
This happens because you are passing rand_x as the x values, and rand_x is an array of integers. Setting labels_dict={'X': 'Date', 'Y': 'Rate (%)'} only adds the text "Date" in front of the x value in the hover label. What you need to do is pass an array of datetime values as x. For example:
rand_x = np.array(['2020-01-01','2020-01-02','2020-01-03'], dtype='datetime64')
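Wired into the DataFrame from the question, a minimal sketch (the dates here are invented for illustration; in practice they would come from the spreadsheet):
import numpy as np
import pandas as pd
import plotly.express as px

# hypothetical dates standing in for the real ones; rand_y as defined in the question
dates = pd.date_range(start='2020-01-01', periods=61, freq='D')
test_data = pd.DataFrame({'X': dates, 'Y': rand_y})

# px.line recognises datetime values and renders the axis ticks and hover text as dates
fig = px.line(test_data, x='X', y='Y', title='Rate by Date',
              labels={'X': 'Date', 'Y': 'Rate (%)'},
              color_discrete_sequence=['indianred'])
fig.show()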
Let's say I have a DataFrame like this:
import pandas as pd
df = pd.DataFrame(
[
["Norway" , 7.537, 1.5, 3.0],
["Denmark" , 7.522, 1.2, 3.1],
["Switzerland", 7.494, 1.5, 2.8],
["Finland" , 7.469, 1.6, 2.9],
["Netherlands", 7.377, 1.5, 3.0],
],
columns = [
"country",
"variable_1",
"variable_2",
"variable_3",
]
)
How could I neatly update, say, the row for Norway with the values {"variable_2": 1.6, "variable_3": 2.9} while ensuring that the existing variable_1 value doesn't get changed?
I was toying with the following approach:
country_to_update = "Norway"
values_to_update = {"variable_2": 1.6, "variable_3": 2.9}
df.query("country == #country_to_update").iloc[0] = pd.Series(values_to_update)
This results in the following error:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
This is a general solution
df.loc[df.country == country_to_update, values_to_update.keys()] = values_to_update.values()
Out[]:
country variable_1 variable_2 variable_3
0 Norway 7.537 1.6 2.9
1 Denmark 7.522 1.2 3.1
2 Switzerland 7.494 1.5 2.8
3 Finland 7.469 1.6 2.9
4 Netherlands 7.377 1.5 3
You can convert to a series, then extract index and values:
country_to_update = 'Norway'
values_to_update = {'variable_2': 1.6, 'variable_3': 2.9}
s = pd.Series(values_to_update)
df.loc[df['country'] == country_to_update, s.index] = s.values
print(df)
country variable_1 variable_2 variable_3
0 Norway 7.537 1.6 2.9
1 Denmark 7.522 1.2 3.1
2 Switzerland 7.494 1.5 2.8
3 Finland 7.469 1.6 2.9
4 Netherlands 7.377 1.5 3.0
I am implementing the Jacobi iterative method. The problem is that I cannot store the calculated matrix after each iteration: I tried appending it to an empty list, but it keeps overwriting the previous elements in that list, and I end up with a single matrix repeated K times.
I need to subtract and operate on those matrices for the convergence criteria.
# Iterate Jacobi until convergence
import numpy as np

U = np.array([[8.9, 8.9, 8.9, 8.9, 8.9],
              [8.4, 0, 0, 0, 9.2],
              [7.2, 0, 0, 0, 9.4],
              [6.1, 6.8, 7.7, 8.7, 6.1]])
Ny, Nx = U.shape  # 4 rows, 5 columns
UI = U
UF = U
UFK = []
k = 0
while k < 3:
    k = k + 1  # update the iteration counter
    for i in range(1, Nx-1):
        for j in range(1, Ny-1):
            UF[j, i] = (UI[j+1, i] + UI[j, i+1] + UI[j-1, i] + UI[j, i-1]) * 0.25  # the matrix I want to store after each iteration
    UFK.append(UF)
    print(UF)  # when I print UF I get the correct matrix displayed at each iteration
[[ 8.9 8.9 8.9 8.9 8.9 ]
[ 8.4 4.325 3.30625 5.3515625 9.2 ]
[ 7.2 4.58125 3.896875 6.83710938 9.4 ]
[ 6.1 6.8 7.7 8.7 6.1 ]]
[[ 8.9 8.9 8.9 8.9 8.9 ]
[ 8.4 6.296875 6.11132812 7.76210937 9.2 ]
[ 7.2 6.0484375 6.67421875 8.13408203 9.4 ]
[ 6.1 6.8 7.7 8.7 6.1 ]]
[[ 8.9 8.9 8.9 8.9 8.9 ]
[ 8.4 7.36494141 7.67531738 8.47734985 9.2 ]
[ 7.2 7.00979004 7.62979736 8.5517868 9.4 ]
[ 6.1 6.8 7.7 8.7 6.1 ]]
print(UFK) # when I display the appended UFK it just repeats a single matrix 3 times
[array([[ 8.9 , 8.9 , 8.9 , 8.9 , 8.9 ],
[ 8.4 , 7.36494141, 7.67531738, 8.47734985, 9.2 ],
[ 7.2 , 7.00979004, 7.62979736, 8.5517868 , 9.4 ],
[ 6.1 , 6.8 , 7.7 , 8.7 , 6.1 ]]),
array([[ 8.9 , 8.9 , 8.9 , 8.9 , 8.9 ],
[ 8.4 , 7.36494141, 7.67531738, 8.47734985, 9.2 ],
[ 7.2 , 7.00979004, 7.62979736, 8.5517868 , 9.4 ],
[ 6.1 , 6.8 , 7.7 , 8.7 , 6.1 ]]),
array([[ 8.9 , 8.9 , 8.9 , 8.9 , 8.9 ],
[ 8.4 , 7.36494141, 7.67531738, 8.47734985, 9.2 ],
[ 7.2 , 7.00979004, 7.62979736, 8.5517868 , 9.4 ],
[ 6.1 , 6.8 , 7.7 , 8.7 , 6.1 ]])]
UI = U    # why? UI is not a copy of U, it IS U
# UF = U  # another why? Changes of UF will change UI and U as well
UFK = []  # appending to a list is great
k = 0
while k < 3:
    k = k + 1  # update the iteration counter
    UF = np.zeros_like(U)  # a fresh copy for each iteration
    for i in range(1, Nx-1):
        for j in range(1, Ny-1):
            UF[j, i] = (UI[j+1, i] + UI[j, i+1] + UI[j-1, i] + UI[j, i-1]) * 0.25
    UFK.append(UF)
    print(UF)
print(UFK)
UFK should now be a list of the k UF arrays.
Since you are overwriting all elements of UF it doesn't matter how it is initialized, just so long as it does not step on other arrays, including the UF from previous iterations.
But on further thought, maybe changing UI is part of the plan. If so, why obscure the fact with the UF and UI variables? In this case you can collect the intermediate iterations with a U.copy() - that is, save a copy of U to the list, rather than the U itself.
for i ...:
    for j ...:
        U[j, i] = (U[j+1, i] + U[j, i+1] + U[j-1, i] + U[j, i-1]) * 0.25
UFK.append(U.copy())
print(U)
A list contains pointers to objects. If I write
alist = [U, U, U]
U[0,0] = 10000
that 10000 will appear in all 3 elements of the list - because they are the same thing.
In your code you append UF to the list, and then modify it at each iteration. The result is that your list just contains k pointers to the same array.
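A small illustrative snippet (not from the answer above) showing the difference between storing references and storing copies:
import numpy as np

U = np.zeros((2, 2))
alist = [U, U, U]          # three references to the same array
U[0, 0] = 10000
print(alist[2][0, 0])      # 10000.0 -- every list element "changed"

blist = [U.copy() for _ in range(3)]   # three independent copies
U[0, 0] = -1
print(blist[0][0, 0])      # still 10000.0 -- the copies are unaffected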
You have to set the dimension of UFK before you append it or you always replicate the same matrix several times. The following code can generate the output correctly:
UFK = np.array([]).reshape(0, 5)
k = 0
while k < 3:
    k += 1
    for i in range(1, Nx-1):
        for j in range(1, Ny-1):
            UF[j, i] = (UI[j+1, i] + UI[j, i+1] + UI[j-1, i] + UI[j, i-1]) * 0.25
    UFK = np.append(UFK, UF, axis=0)
Another way to append the array is UFK = np.vstack((UFK, UF)) which will give you the same result.
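Note that with this approach UFK ends up as one stacked 2-D array of shape (3*4, 5) rather than a list of matrices; if the individual iterations are needed afterwards, one way (a sketch, assuming the 4x5 grid used above) is to reshape it back:
# split the stacked (12, 5) array back into one 4x5 matrix per iteration
per_iteration = UFK.reshape(-1, 4, 5)
print(per_iteration[0])    # matrix after the first iteration
print(per_iteration[-1])   # matrix after the last iteration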
For example, if I have one list of data whose items should be selected one by one:
a = [0.11, 0.22, 0.13, 6.7, 2.5, 2.8]
and another one for which all items should be selected:
b = [1.2, 1.4, 2.6, 2.3, 5.7, 9.9]
If I select 0.11 from a and do an operation like addition with all the items of b, and then save the result in a new array or list, how is that possible with Python?
I am sorry for the question; I am trying to learn Python on my own, so kindly tell me how this is possible.
Thank you in advance.
You need a nested loop. You can do it in a list comprehension to produce a list of lists:
[[item_a + item_b for item_b in b] for item_a in a]
If you want the end result to be a list of lists it could go like this:
c = [[x + y for x in b] for y in a]
If you want the end result to be a single list with the sublists appended one after another, you could write:
c = []
for y in a:
    c += [y + x for x in b]
Another option is to convert your lists into numpy arrays and then exploit the broadcasting property of numpy arrays:
import numpy as np
npA = np.array(a)
npB = np.array(b)
npA[:, None] + npB
array([[ 1.31, 1.51, 2.71, 2.41, 5.81, 10.01],
[ 1.42, 1.62, 2.82, 2.52, 5.92, 10.12],
[ 1.33, 1.53, 2.73, 2.43, 5.83, 10.03],
[ 7.9 , 8.1 , 9.3 , 9. , 12.4 , 16.6 ],
[ 3.7 , 3.9 , 5.1 , 4.8 , 8.2 , 12.4 ],
[ 4. , 4.2 , 5.4 , 5.1 , 8.5 , 12.7 ]])
You can also do element-wise multiplication simply with:
npA[:, None] * npB
which returns:
array([[ 0.132, 0.154, 0.286, 0.253, 0.627, 1.089],
[ 0.264, 0.308, 0.572, 0.506, 1.254, 2.178],
[ 0.156, 0.182, 0.338, 0.299, 0.741, 1.287],
[ 8.04 , 9.38 , 17.42 , 15.41 , 38.19 , 66.33 ],
[ 3. , 3.5 , 6.5 , 5.75 , 14.25 , 24.75 ],
[ 3.36 , 3.92 , 7.28 , 6.44 , 15.96 , 27.72 ]])