stripping index values in pandas - python

I have a data set with several dozen columns. I am sorting two of those columns by value, storing the result in a variable, and printing it later to a report. How do I return only the two column values so that they appear on the same line as my string "Max"? Below is the method I am using, which also returns the index # in my variable.
#Create DF
prim1 = mru[['Time', 'Motion:MRU']]
# Sort
prim1 = prim1.sort(['Motion:MRU'], ascending=True)
primmin = prim1['Motion:MRU'].min()
print 'Max: ', prim1[:1]

Basically what you see printed is a pandas object in the form:
<index> <value>
If you want just the value, access the underlying NumPy array through the values attribute:
print 'Max: ', prim1[:1].values[0]
This returns a NumPy array with a single element, and subscripting with [0] gives you the scalar value.
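As a side note for newer pandas: DataFrame.sort was deprecated and later removed in favor of sort_values, so on a current install the idea can be sketched like this (the data here is made up, since the original mru frame isn't shown):

```python
import pandas as pd

# Hypothetical stand-in for the question's mru DataFrame
mru = pd.DataFrame({'Time': [1, 2, 3], 'Motion:MRU': [0.5, 2.5, 1.0]})

prim1 = mru[['Time', 'Motion:MRU']]
# DataFrame.sort() was removed; sort_values is the modern equivalent
prim1 = prim1.sort_values('Motion:MRU', ascending=True)

# Pull the scalar from the first row instead of printing the sliced frame
print('Max:', prim1['Motion:MRU'].iloc[0])
```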

Related

how to check value existing in pandas dataframe column value of type list

I have a pandas dataframe that contains values in the format below. How do I filter the dataframe to the rows where 'd6d4e77e-b8ec-467a-ba06-1c6079aa2d82' appears in the list stored in the PathDSC column?
I tried
def get_projects_belongs_to_root_project(project_df, root_project_id):
    filter_project_df = project_df.query("root_project_id in PathDSC")
but it didn't work; I got an empty dataframe.
Assuming the values of the PathDSC column are lists of strings, you can check row-wise whether each list contains the wanted value with Series.apply, then select the matching rows with boolean indexing:
def get_projects_belongs_to_root_project(project_df, root_project_id):
    mask = project_df['PathDSC'].apply(lambda lst: root_project_id in lst)
    filter_project_df = project_df[mask]
    # ...
Alternatively, if the column stores plain strings rather than lists, Series.str.contains does the job:
root_project_id = 'd6d4e77e-b8ec-467a-ba06-1c6079aa2d82'
df = df[df['PathDSC'].str.contains(root_project_id)]
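Put together as a runnable sketch, with a made-up frame to illustrate the shape of the data (the project column is invented for the example):

```python
import pandas as pd

# Hypothetical data: PathDSC holds lists of project ids
project_df = pd.DataFrame({
    'project': ['a', 'b', 'c'],
    'PathDSC': [['x1', 'd6d4e77e-b8ec-467a-ba06-1c6079aa2d82'],
                ['x2'],
                ['d6d4e77e-b8ec-467a-ba06-1c6079aa2d82']],
})

def get_projects_belongs_to_root_project(project_df, root_project_id):
    # Row-wise membership test on each list, then boolean indexing
    mask = project_df['PathDSC'].apply(lambda lst: root_project_id in lst)
    return project_df[mask]

result = get_projects_belongs_to_root_project(
    project_df, 'd6d4e77e-b8ec-467a-ba06-1c6079aa2d82')
```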

How to add a value inside an numpy array to a python dictionary

I'm trying to loop through a pandas dataframe's columns (which consist of 1s and 0s), group by each one and sum another column, then add the column name to an empty dictionary as the key with the summed value as the value. But my current code adds an array as the value instead of the actual number. Here is some sample code below.
import pandas as pd
sample_dict = {'flag1': [0, 1, 1, 1, 1, 0],
               'flag2': [1, 1, 1, 0, 0, 1],
               'flag3': [0, 0, 0, 0, 0, 1],
               'flag4': [1, 1, 1, 1, 0, 0],
               'flag5': [1, 0, 1, 0, 1, 0],
               'dollars': [100, 200, 300, 400, 500, 600]}
sample_df = pd.DataFrame(sample_dict)
ecols = sample_df.columns[:5]
rate = .46
empty_dict = {}
for i in ecols:
    df = sample_df[sample_df[i] == 1]
    yield1 = df.groupby(i)['dollars'].sum().values * rate
    empty_dict[i] = yield1
empty_dict
That code yields the following output:
Out[223]:
{'flag1': array([644.]),
'flag2': array([552.]),
'flag3': array([276.]),
'flag4': array([460.]),
'flag5': array([414.])}
I would just like to have the actual number as the value, not a one-element array.
You consistently get an array with a single element: just take its first element (if it has one):
...
empty_dict[i] = yield1[0] if len(yield1) >= 1 else np.nan
...
(np here is NumPy, imported as import numpy as np.)
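Putting it together, a version of the loop that stores scalars directly might look like this; it sums the filtered 'dollars' rows instead of going through groupby, which produces the same numbers for this data:

```python
import pandas as pd

sample_dict = {'flag1': [0, 1, 1, 1, 1, 0],
               'flag2': [1, 1, 1, 0, 0, 1],
               'flag3': [0, 0, 0, 0, 0, 1],
               'flag4': [1, 1, 1, 1, 0, 0],
               'flag5': [1, 0, 1, 0, 1, 0],
               'dollars': [100, 200, 300, 400, 500, 600]}
sample_df = pd.DataFrame(sample_dict)
rate = .46

result = {}
for col in sample_df.columns[:5]:
    # Sum 'dollars' over the rows where the flag is set; .sum() is already a scalar
    result[col] = sample_df.loc[sample_df[col] == 1, 'dollars'].sum() * rate
```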

dataframe using list vs dictionary

import pandas as pd
pincodes = [800678,800456]
numbers = [2567890, 256757]
labels = ['R','M']
first = pd.DataFrame({'Number': numbers, 'Pincode': pincodes},
                     index=labels)
print(first)
The above code gives me the following (correct) dataframe.
Number Pincode
R 2567890 800678
M 256757 800456
But, when I use this statement,
second = pd.DataFrame([numbers, pincodes],
                      index=labels, columns=['Number', 'Pincode'])
print(second)
then I get the following (incorrect) output.
Number Pincode
R 2567890 256757
M 800678 800456
As you can see, the two Data Frames are different. Why does this happen? What's so different in this dictionary vs list approach?
The constructor of pd.DataFrame() includes this documentation.
Init signature: pd.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
Docstring:
...
Parameters
----------
data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame
Dict can contain Series, arrays, constants, or list-like objects
.. versionchanged :: 0.23.0
If data is a dict, column order follows insertion-order for
Python 3.6 and later.
.. versionchanged :: 0.25.0
If data is a list of dicts, column order follows insertion-order
for Python 3.6 and later.
The key word is column. In the first approach, you correctly tell pandas that numbers is the column labelled 'Number'. But in the second approach, you tell pandas that the columns are 'Number' and 'Pincode' and that the data is the list of lists [numbers, pincodes]. pandas treats each inner list as a row, so numbers fills row 'R' and pincodes fills row 'M', and the column labels are applied positionally across each row.
If you want to enter your data this way (not as a dictionary), you need to transpose the list of lists.
>>> import numpy as np
# old way
>>> pd.DataFrame([numbers, pincodes],
...              index=labels, columns=['Number', 'Pincode'])
Number Pincode
R 2567890 256757
M 800678 800456
# Transpose the data instead so the rows become the columns.
>>> pd.DataFrame(np.transpose([numbers, pincodes]),
...              index=labels, columns=['Number', 'Pincode'])
Number Pincode
R 2567890 800678
M 256757 800456
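Another way to get the intended frame without NumPy is to zip the two lists, so each (number, pincode) pair becomes one row; a small sketch:

```python
import pandas as pd

pincodes = [800678, 800456]
numbers = [2567890, 256757]
labels = ['R', 'M']

# zip pairs the i-th elements, so each tuple is one row of the frame
third = pd.DataFrame(list(zip(numbers, pincodes)),
                     index=labels, columns=['Number', 'Pincode'])
```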

Store Value From df to Variable

I am trying to extract a value out of a dataframe and put it into a variable. Then later I will record that value into an Excel workbook.
First I run a SQL query and store into a df:
df = pd.read_sql(strSQL, conn)
I am looping through another list of items and looking them up in the df. They are connected by MMString in the df and MMConcat from the list of items I'm looping through.
dftemp = df.loc[df['MMString'] == MMConcat]
Category = dftemp['CategoryName'].item()
I get the following error at the last line of code above: ValueError: can only convert an array of size 1 to a Python scalar.
In the debug console, when I run that last line without storing it to a variable, I get what looks like a string value, for example 'Pickup Truck'.
How can I simply store the value that I'm looking up in the df to a variable?
Index by row and column with loc to return a series, then extract the first value via iat:
Category = df.loc[df['MMString'] == MMConcat, 'CategoryName'].iat[0]
Alternatively, get the first value from the NumPy array representation:
Category = df.loc[df['MMString'] == MMConcat, 'CategoryName'].values[0]
The docs aren't helpful, but pd.Series.item just calls np.ndarray.item and only works for a series with one value:
pd.Series([1]).item() # 1
pd.Series([1, 2]).item() # ValueError: can only convert an array of size 1
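A minimal runnable sketch of the iat approach, using made-up data in place of the SQL result:

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by pd.read_sql
df = pd.DataFrame({'MMString': ['a-1', 'b-2'],
                   'CategoryName': ['Pickup Truck', 'Sedan']})
MMConcat = 'a-1'

# Filter to the matching row, select the column, take the first value
Category = df.loc[df['MMString'] == MMConcat, 'CategoryName'].iat[0]
```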

How to check if Pandas value is null or zero using Python

I have a data frame created with Pandas that contains numbers. I need to check if the values that I extract from this data frame are nulls or zeros. So I am trying the following:
a = df.ix[[0], ['Column Title']].values
if a != 0 or not math.isnan(float(a)):
    print "It is neither a zero nor null"
While it does appear to work, sometimes I get the following error:
TypeError: don't know how to convert scalar number to float
What am I doing wrong?
Your code extracts the value as a nested 2-D array with a single element, for example [[1]], because you index with lists. Change
a = df.ix[[0], ['Column Title']].values
to
a = df.ix[0, 'Column Title']
so that a is a scalar, and then
math.isnan(float(a))
will work. (In modern pandas .ix has been removed; df.loc[0, 'Column Title'] or df.at[0, 'Column Title'] is the current equivalent.)
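On current pandas, where .ix is gone, the same check can be sketched with .loc and pd.isna; note that the question's `or` needs to be `and` for the printed message to be accurate (a NaN is not 0, so the `or` branch always fires). The data here is invented:

```python
import pandas as pd

# Hypothetical frame standing in for the question's data
df = pd.DataFrame({'Column Title': [5, 0, None]})

# .loc with scalar row/column labels returns a scalar, not a nested array
a = df.loc[0, 'Column Title']

# pd.isna handles NaN/None without the float() conversion dance
if a != 0 and not pd.isna(a):
    print("It is neither a zero nor null")
```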
