Saving just the value from a Pandas DataFrame row - python

I'm looking up a certain row in my Pandas DataFrame by its index - the row is stored in variable p. As you can see, p is a normal Pandas DataFrame. Now I want to save just the integer from in_reply_to_status_id as variable y, but in my code below it gives me an object. Does anyone know if and how it would be possible to store just the integer (1243885949697888263 in this case) as y?

y is a Series; you can try the following to pick out the value (1243885949697888263):
print(y.array[0])

Did you try this?
y = df.at[i, 'in_reply_to_status_id']
That way you don't have to create p.
If you want to do it with p:
y = p.iloc[0].at['in_reply_to_status_id']
Or:
y = p.iat[0,1]
Or:
y = p.at[i, 'in_reply_to_status_id']
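As a minimal sketch of the .at approach (the frame and the index label i here are made up, standing in for the ones in the question):

```python
import pandas as pd

# Hypothetical stand-in for the questioner's dataframe.
df = pd.DataFrame(
    {'in_reply_to_status_id': [1243885949697888263]},
    index=[42],  # assumed index label i
)
i = 42

# .at returns the scalar itself, not a one-row DataFrame or Series.
y = df.at[i, 'in_reply_to_status_id']
print(y)  # 1243885949697888263
```

Note that y comes back as a numpy integer scalar, which behaves like a plain int in arithmetic and comparisons.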

Related

Input values to a List

When I try to do the following, the subsequent error occurs.
ranges = []
a_values = []
b_values = []
for x in params:
    a = min(fifa[params][x])
    a = a - (a * .25)
    b = max(fifa[params][x])
    b = b + (b * .25)
    ranges.append((a, b))

for x in range(len(fifa['short_name'])):
    if fifa['short_name'][x] == 'Nunez':
        a_values = df.iloc[x].values.tolist()
Error Description
What does it mean? How do I solve this?
Thank you in advance
The problem is on this line:
if fifa['short_name'][x]=='Nunez':
fifa['short_name'] is a Series;
fifa['short_name'][x] tries to index that series with x;
your code doesn't show it, but the stack trace suggests x is some scalar type;
pandas tries to look up x in the index of fifa['short_name'], and it's not there, resulting in the error.
Since the Series shares the index of the dataframe fifa, this means that the label x isn't in the dataframe's index. And it probably isn't, because you let x range from 0 up to (but not including) len(fifa).
What is the index of your dataframe? You didn't include the definition of params, nor that of fifa, but your problem is most likely in the latter, or you should loop over the dataframe differently, by looping over its index instead of just integers.
However, there are more efficient ways to do what you're trying to do in pandas generally - you should include a definition of the dataframe so people can show you the correct one.
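As a sketch of that more idiomatic approach (the frame below is made up, with a non-0..n-1 index to mimic the suspected problem): selecting by value with a boolean mask means the dataframe's index labels never matter.

```python
import pandas as pd

# Small stand-in for the fifa dataframe from the question.
fifa = pd.DataFrame(
    {'short_name': ['Messi', 'Nunez', 'Salah'], 'overall': [93, 89, 90]},
    index=[100, 200, 300],  # non-default index, as suspected in the answer
)

# Filter rows by value instead of looping over positional integers.
a_values = fifa.loc[fifa['short_name'] == 'Nunez'].iloc[0].values.tolist()
print(a_values)  # ['Nunez', 89]
```

No integer loop, and no KeyError regardless of what the index looks like.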

Python - Pandas dataframe: Find value in Column X based on nearest value in Column Y

I need code that solves this issue:
Let's assume I have a DataFrame named Data.
Data has three columns, X, Y and Z:
index = [0, 1, 2], X = [1, 3, 5], Y = [1, 4, 8], Z = [3, 4, 7]
I want code that can find the value of X where the value of Y is nearest to 2.
So the answer returned should be X = 1, as the value of Y nearest to 2 is 1.
TRY:
import numpy as np
near_val = 2
result = df.loc[df['Y'] == df.Y.values.flat[np.abs(df.Y.values - near_val).argmin()], 'X'].values
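A shorter equivalent, sketched on the sample data from the question, uses idxmin on the absolute difference instead of dropping down to raw numpy arrays:

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 3, 5], 'Y': [1, 4, 8], 'Z': [3, 4, 7]})

target = 2
# Index label of the row whose Y is closest to the target...
nearest_idx = (df['Y'] - target).abs().idxmin()
# ...then read X from that row.
x_val = df.loc[nearest_idx, 'X']
print(x_val)  # 1
```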

Copy values from column X+2 (two to the right of X) into column X

I have a dataframe in which one in every three columns has a name (the others are unnamed 1, 2, 3...).
I want the values in the named columns to equal the values two columns to their right.
I was using df.columns.get_loc("X"), and with that I can correctly select my desired column using df.iloc[:, X],
but I can't do Y = X + 2 in pandas and then df.iloc[:, X] = df.iloc[:, Y], because X is not just an integer.
Any ideas on how to solve this? It can be a different way to get column X to have the same values as two columns to the right of X.
Thanks!
This would work; change 8 to fit your number of columns, or use len(df.columns) // 3 * 3:
for n in range(0, 8, 3):
    df.iloc[:, n] = df.iloc[:, n + 2]
It doesn't seem we can assign a multi-column selection to another multi-column selection; not sure if that is possible.
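A sketch that avoids hard-coding the 8, on a made-up frame with the every-third-column-named layout the question describes:

```python
import pandas as pd

# Hypothetical layout: a named column, then two unnamed ones, repeated.
df = pd.DataFrame(
    [[0, 1, 2, 0, 4, 5], [0, 7, 8, 0, 10, 11]],
    columns=['A', 'u1', 'u2', 'B', 'u3', 'u4'],
)

# Walk over every third column and copy from two columns to its right.
for n in range(0, df.shape[1] - 2, 3):
    df.iloc[:, n] = df.iloc[:, n + 2]

print(df['A'].tolist())  # [2, 8]
print(df['B'].tolist())  # [5, 11]
```

Using df.shape[1] - 2 as the stop also guards against reading past the last column when the width isn't an exact multiple of three.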

New dataframe within if statement - Python

Here is the the part of the code I am having issues with:
for x in range(len(df['Days'])):
    if df['Days'][x] > 0 and df['Days'][x] <= 30:
        b = df['Days'][x]
b
The output I get is b = 14 which is the last value where the if statement holds in the column of the dataframe. I am trying to get ALL the values of the column in which the if statement holds to be held in "b" rather than just the last value alone.
What you want to do is make a list instead and append b to it.
my_vals = []
for x in range(len(df['Days'])):
    if df['Days'][x] > 0 and df['Days'][x] <= 30:
        b = df['Days'][x]
        my_vals.append(b)
my_vals
In your code, you are overwriting b in every iteration, so it only stores the most recent value. In the future, when you need to store multiple values, use a container data type such as a list.
You can also use the filtering functionality of pandas and use
values = df.loc[(df['Days'] > 0) & (df['Days'] <= 30)]
If you want the values as a Series instead of a DataFrame use
values_series = values['Days']
If you want the values as a list instead of a Series use
values_list = list(values_series)
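Putting the filtering approach together on a tiny made-up frame (the Days values are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({'Days': [-5, 14, 45, 20, 0]})

# Boolean mask replicates the if-condition across the whole column at once.
mask = (df['Days'] > 0) & (df['Days'] <= 30)
values = df.loc[mask]              # filtered DataFrame
values_series = values['Days']     # as a Series
values_list = list(values_series)  # as a plain list

print(values_list)  # [14, 20]
```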

Comparing dates and filling a new column based on this condition

I want to check whether one date lies between two other dates (everything in the same row). If it does, a new column should be filled with the sales value from that row; if not, the row should be dropped.
The code should iterate over the entire dataframe.
This is my code:
for row in final:
    x = 0
    if pd.to_datetime(final['start_date'].iloc[x]) < pd.to_datetime(final['purchase_date'].iloc[x]) < pd.to_datetime(final['end_date'].iloc[x]):
        final['new_col'].iloc[x] = final['sales'].iloc[x]
    else:
        final.drop(final.iloc[x])
    x = x + 1
print(final['new_col'])
Instead of the values of final['sales'], I just get 0 back.
Does anyone know where the mistake is, or a more efficient way to tackle this?
The DataFrame looks like this:
I would do something like this.
First, create the new column:
import numpy as np
final['new_col'] = np.where(
    (pd.to_datetime(final['start_date']) < pd.to_datetime(final['purchase_date']))
    & (pd.to_datetime(final['purchase_date']) < pd.to_datetime(final['end_date'])),
    final['sales'], np.nan)
Then, you just drop the NaNs:
final.dropna(inplace=True)
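On a small made-up frame (column names taken from the question, dates and sales figures invented), the whole thing looks like:

```python
import numpy as np
import pandas as pd

final = pd.DataFrame({
    'start_date': ['2020-01-01', '2020-02-01'],
    'purchase_date': ['2020-01-15', '2020-01-15'],
    'end_date': ['2020-01-31', '2020-02-28'],
    'sales': [100, 200],
})

start = pd.to_datetime(final['start_date'])
purchase = pd.to_datetime(final['purchase_date'])
end = pd.to_datetime(final['end_date'])

# Rows where purchase_date falls strictly between the two dates keep
# their sales value; all other rows get NaN and are then dropped.
final['new_col'] = np.where((start < purchase) & (purchase < end),
                            final['sales'], np.nan)
final.dropna(inplace=True)

print(final['new_col'].tolist())  # [100.0]
```

This vectorized version also avoids mutating the dataframe row by row inside a loop, which is what made the original code silently misbehave.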
