Python Pandas dataset filter - python

I need to compare my_data.iloc[j:i] with my_data.at['sales'], i.e. access the elements [0,1], [1,1], [2,1], [3,1], and use that comparison to filter the dataset.
I have written code as:
x = 1
j = 0
result = []
for i in range(len(my_data)):
    if (my_data.iloc[j:i] == my_data.at['sales']):
        # if equal then append the row
        result.append(my_data.iloc[i, :])
    j = j + 1
I get ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool()... Please help me out, thanks.
new info:
When I printed the data in list orientation, this is the result:
{'lndex' : [1,1,1,1,1],'table' ,' table1' ,'table2','table3','table4'],'Table Value Index' :[0,0,0,0,0], 'Value_Index' : [1,2,3,4,5]} ...
The code still didn't work for me. I will post the code and the error ...
code:
for index, row in my_data.iterrows():
    if (my_data.at[index, 1] == my_data.at['Index']):  # error line
        result.append(my_data.iloc[index, :])
It raises KeyError(key) from err on the error line. I am stepping through the code to understand what is going on. Please help.

I believe this is what you are trying to do:
import numpy as np
import pandas as pd
df = pd.DataFrame.from_dict({'v1': [0,1,2,3,5], 'v2': [1,1,1,1,1]})
x = np.array([[0,1], [1,1], [2,1], [3,1]])
# so basically the first 4 rows of the DF should match x
# np.array_equal compares shape and every element, and returns a single bool
if np.array_equal(df.iloc[0:4].values, x):
    print('Match')
else:
    print('Mismatch')
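The ValueError in the question comes from using an element-wise comparison, which produces many True/False values, where if expects a single one. If the goal is to keep the rows whose value matches, a boolean mask usually replaces the loop entirely. The following is only a sketch: the frame and its column names (item, sales) are made up for illustration and are not taken from the asker's data.

```python
import pandas as pd

# Hypothetical data; the column names are assumptions for illustration only.
my_data = pd.DataFrame({"item": [0, 1, 2, 3, 4], "sales": [1, 1, 1, 1, 5]})

# Comparing a column with a scalar yields a boolean Series, one flag per row.
mask = my_data["sales"] == 1

# Indexing with the mask keeps only the rows where the flag is True.
result = my_data[mask]
print(result)
```

No explicit if is needed here, so the ambiguous truth value never comes up.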

Related

Python - Find first and last index of consecutive NaN groups in series

I am trying to get a list of tuples with the first and last index of grouped NaNs.
An example input could look like
import pandas as pd
import numpy as np
series = pd.Series([1,2,3,np.nan,np.nan,4,5,6,7,np.nan,8,9,np.nan,np.nan,np.nan])
get_nan_inds(series)
and the output should be
[(3, 5), (9, 10), (12, 15)]
The only similar question I could find doesn't solve my problem.
Alternative solution:
import pandas as pd
import numpy as np
series = pd.Series([1,2,3,np.nan,np.nan,4,5,6,7,np.nan,8,9,np.nan,np.nan,np.nan])
def get_nan_inds(series):
    # Need to add False at the end for the case when the last element is null
    is_null_diff = pd.isnull(pd.Series(list(series) + [False])).diff()
    res = [i for i, x in enumerate(list(is_null_diff)) if x is True]
    res = [(a, b) for i, (a, b) in enumerate(zip(res, res[1:])) if i % 2 == 0]
    return res
get_nan_inds(series)
While writing this question I came up with the following function, in case someone else has a similar problem.
def get_nan_inds(series):
    ''' Obtain the first and last index of each consecutive NaN group.
    '''
    series = series.reset_index(drop=True)
    index = series[series.isna()].index.to_numpy()
    if len(index) == 0:
        return []
    indices = np.split(index, np.where(np.diff(index) > 1)[0] + 1)
    return [(ind[0], ind[-1] + 1) for ind in indices]
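As a quick sanity check, the np.split approach can be run on the example series from the question. The snippet repeats the definition so that it runs on its own. The int casts are an addition for plain-Python output and are not part of the original function.

```python
import numpy as np
import pandas as pd

def get_nan_inds(series):
    """Return (first, last + 1) for each run of consecutive NaNs."""
    series = series.reset_index(drop=True)
    index = series[series.isna()].index.to_numpy()
    if len(index) == 0:
        return []
    # Split the NaN positions wherever two neighbours are more than 1 apart.
    groups = np.split(index, np.where(np.diff(index) > 1)[0] + 1)
    # Cast to int (an addition here) so the tuples hold plain Python integers.
    return [(int(g[0]), int(g[-1]) + 1) for g in groups]

series = pd.Series([1, 2, 3, np.nan, np.nan, 4, 5, 6, 7,
                    np.nan, 8, 9, np.nan, np.nan, np.nan])
print(get_nan_inds(series))
```

The end of each tuple is exclusive, which matches the output requested in the question.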

Python (iteration problem) with an exercise

The code :
import pandas as pd
import numpy as np
import csv
data = pd.read_csv("/content/NYC_temperature.csv", header=None, names=['temperatures'])
np.cumsum(data['temperatures'])
printcounter = 0
# first temperature, added by hand because the loop below takes every 30th value but doesn't append the first one :)
list_30 = [15.22]
list_2 = []  # this is for the values of the subtraction (for the second iteration)
for i in data['temperatures']:
    if (printcounter == 30):
        list_30.append(i)
        printcounter = 0
    printcounter += 1
for x in list_30:
    substract = list_30[x] - list_30[x+1]
    list_2.append(substract)
print(max(list_2))
Hey guys! I'm really having trouble with this part:
for x in list_30:
    substract = list_30[x] - list_30[x+1]
I'm trying to iterate over the elements and subtract the next element (x+1) from element x, but the following error pops up: TypeError: 'float' object is not iterable. I have also tried to iterate using x instead of list_30[x], but then when I use next(x) I get another error.
for x in list_30: iterates over list_30 and assigns to x the value of each item in the list, not its index.
In your case you would prefer to loop over the list by index:
index = 0
while index < len(list_30):
    substract = list_30[index] - list_30[index + 1]
    index += 1  # without this the loop never advances
Edit: you will still have a problem when you reach the last element of list_30, as there is no element list_30[last_index + 1],
so you should probably stop before the end with while index < len(list_30) - 1:
In case you want both the index and the value, you can do:
for i, v in enumerate(list_30[:-1]):  # skip the last item, which has no successor
    substract = v - list_30[i + 1]
but the first one looks cleaner in my opinion.
If you're trying to find the difference between two adjacent elements of an array (like differentiating it), you should probably use the zip function:
inp = [1, 2, 3, 4, 5]
delta = []
for x0, x1 in zip(inp, inp[1:]):
    delta.append(x1 - x0)
print(delta)
Note that the list of deltas will be one shorter than the input.
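Since the question already imports NumPy, np.diff is another way to get the adjacent differences without writing a loop. The values below are invented stand-ins for list_30, so treat this as a sketch rather than the asker's actual data.

```python
import numpy as np

# Hypothetical sampled temperatures standing in for list_30.
list_30 = [15.22, 17.5, 14.0, 20.25]

# np.diff returns next - current for every adjacent pair.
deltas = np.diff(list_30)

# The question subtracts the next element from the current one, so negate.
list_2 = -deltas
print(list_2.max())
```

As with the zip version, the result has one element fewer than the input.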

GroupBy + Condition + Mean()

Suppose we have three columns, A, B and C. I need to group by "A", but only over the rows where B is in the range B>0 and B<20, and then calculate the mean of C for that subset.
Can you help me?
Thank you very much!
Try this:
import pandas as pd
data = pd.read_csv('rows.csv')
# label each row with the bin its PPV value falls into
temp = []
for val in data['PPV']:
    if val < 20:
        temp.append(1)
    elif val < 40:  # 20 <= val < 40, so a value of exactly 20 lands here
        temp.append(2)
    else:
        temp.append(3)
data['temp'] = temp
output = data.groupby(['Responsable', 'temp'])['Yield'].mean()
print(output)
You should adapt the column names and bin edges to your data. A more elegant way to build the bins is numpy.digitize.
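For the filter exactly as described in the question (0 < B < 20, group by A, mean of C), a boolean mask applied before the groupby may be all that is needed. The frame below is made-up example data using the question's column names A, B and C.

```python
import pandas as pd

# Hypothetical data using the column names from the question.
df = pd.DataFrame({
    "A": ["x", "x", "y", "y", "y"],
    "B": [5, 25, 10, 15, -1],
    "C": [1.0, 100.0, 2.0, 4.0, 50.0],
})

# Keep only the rows where B lies strictly between 0 and 20.
in_range = (df["B"] > 0) & (df["B"] < 20)

# Group the remaining rows by A and average C within each group.
means = df[in_range].groupby("A")["C"].mean()
print(means)
```

Note the parentheses around each comparison: & binds more tightly than > and <, so they are required.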

How to apply my function to the first row of a dataframe?

def calcScore(p):
    if p[0] > p[1]:
        x = 3
        y = 0
    elif p[0] == p[1]:
        x = 1
        y = 1
    else:
        x = 0
        y = 3
    return x, y
How would I apply this function to the first row of my dataframe?
I know how to apply it to the whole dataframe but can't seem to apply it to the first row only? Below is what I did with the whole dataframe. I am new to python so please forgive silly or stupid mistakes. Thank you. :)
result =(prem[['FTHG','FTAG']].apply(calcScore, axis = 1))
print(result)
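apply is for applying a function to all rows or columns. If you just want one row, select it and call the function directly:
# .iloc[0] takes the first row; tolist() passes plain values so p[0] and p[1] index by position
result = calcScore(prem[['FTHG', 'FTAG']].iloc[0].tolist())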
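To make the idea concrete, here is a small end-to-end sketch. The scores in prem are invented, and calc_score is a hypothetical two-argument variant of the question's function that takes the home and away goals separately, which avoids indexing into the row by position.

```python
import pandas as pd

def calc_score(home, away):
    """Return (home points, away points) for one match."""
    if home > away:
        return 3, 0
    if home == away:
        return 1, 1
    return 0, 3

# Hypothetical results using the column names from the question.
prem = pd.DataFrame({"FTHG": [2, 1, 0], "FTAG": [1, 1, 3]})

# Select the first row by position, then read the two columns by label.
row = prem.iloc[0]
first = calc_score(row["FTHG"], row["FTAG"])
print(first)
```

Reading the values by label keeps the call independent of the column order in the frame.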

Looping over dataset of strings

I'm trying to pick out specific occurrences of a value in a dataset of mine, but keep running into a problem dealing with turning the values into strings and looping over them. My code is below:
import numpy as np
data = np.genfromtxt('DurhamAirMass.txt')
spot = data[:,1]
mass = str(data[:,2])
DP = np.array([])
DT = np.array([])
MP = np.array([])
MT = np.array([])
TR = np.array([])
for i in range(1461):
    if mass[i] == '2':
        DP = np.append(DP, str(spot[i]))
    if mass[i] == '3':
        DT = np.append(DT, str(spot[i]))
    if mass[i] == '5':
        MP = np.append(MP, str(spot[i]))
    if mass[i] == '6' or '66' or '67':
        MT = np.append(MT, str(spot[i]))
    if mass[i] == '7':
        TR = np.append(TR, str(spot[i]))
print(DP)
When I attempt to print out the DP array, I get an error pointing at the first if statement and saying "IndexError: string index out of range". Any ideas?
What is the purpose of converting data[:,2] into a string?
By the way, or does not work the way you think: you have to repeat the comparison, i.e. mass[i] == '6' or mass[i] == '66' or mass[i] == '67'.
Why not:
data = np.genfromtxt('DurhamAirMass.txt')
spot = data[:, 1]
mass = data[:, 2]
# boolean masks select the spot values whose mass code matches
DP = spot[mass == 2]
DT = spot[mass == 3]
MP = spot[mass == 5]
MT = spot[np.isin(mass, [6, 66, 67])]
TR = spot[mass == 7]
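Two things in the question's code are easy to check on a tiny array: str() applied to a whole NumPy array gives one string (its printed form), and a chain such as x == '6' or '66' is always truthy because a non-empty string counts as True. The array below is invented for illustration and stands in for the contents of DurhamAirMass.txt.

```python
import numpy as np

# Hypothetical stand-in for the file contents: columns are id, spot, mass code.
data = np.array([
    [0, 10.0, 2],
    [1, 11.0, 66],
    [2, 12.0, 6],
    [3, 13.0, 7],
    [4, 14.0, 2],
])
spot = data[:, 1]
mass = data[:, 2]

# str() on an array returns its printed form as ONE string, not one per element.
as_text = str(mass)

# "'6' == '2' or '66'" evaluates to '66', a non-empty string, which is truthy.
always_true = bool("6" == "2" or "66")

# Boolean masks do the selection without strings or loops.
DP = spot[mass == 2]
MT = spot[np.isin(mass, [6, 66, 67])]
print(DP, MT)
```

Indexing as_text walks over characters such as brackets and spaces, which is why the string comparison in the loop goes wrong and eventually runs past the end of the string.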
As a general rule, you should never hard-code for-loop iterations unless you want it to be considered an error for the input iterator to ever be greater/smaller than your hard-coded value (and even then, there are better ways to accomplish that).
Your code should probably look like this:
for i in range(len(data)):
    ...
This will ensure you always loop over only the data you actually have.
You are indeed causing an IndexError.
Try checking spot to see how large it is; my guess is that 1461 is larger than its bounds. Perhaps you could try setting up your for loop as:
for i in range(len(spot)):
    ...
instead. This will guarantee that you only access valid indexes for spot. If this still causes a problem, try the same for mass:
for i in range(len(mass)):
    ...
You could also add a check to make sure the data is the length you think it is.
print(len(mass), len(spot), len(spot) == len(mass))
It's always good practice to double check your assumptions in the case of an error. In this case you are clearly being told there is an IndexError so the next step is to find out what index is causing it.
Maybe more information would help you?
try:
    for i in range(len(spot)):
        pass  # code as usual
except IndexError:
    print(i)
    raise  # re-raise the original exception after printing the index
This will tell you what index is causing the error.
I just changed all the strings to ints and that solved things. I didn't think that would work at first. Thanks for all of your answers, everyone!
