Python pandas if statement based off of boolean qualifier - python

I am try to do an IF statement where it keeps my currency pairs in alphabetic ordering (i.e. USD/EUR would flip to EUR/USD because E alphabetically comes before U, however CHF/JPY would stay the same because C comes alphabetically before J.) Initially I was going to write code specific to that, but realized there were other fields I'd need to flip (mainly changing a sign for positive to negative or vice versa.)
So what I did was write a function to create a new column and make a boolean identifier as to whether or not the field needs action (True) or not (False).
def flipFx(ccypair):
first = ccypair[:3]
last = ccypair[-3:]
if(first > last):
return True
else:
return False
brsPosFwd['Flip?'] = brsPosFwd['Currency Pair'].apply(flipFx)
This works great and does what I want it to.
Then I try and write an IF statement to use that field to create two new columns:
if brsPosFwd['Flip?'] is True:
brsPosFwd['CurrencyFlip'] = brsPosFwd['Sec Desc'].apply(lambda x:
x.str[-3:]+"/"+x.str[:3])
brsPosFwd['NotionalFlip'] = -brsPosFwd['Current Face']
else:
brsPosFwd['CurrencyFlip'] = brsPosFwd['Sec Desc']
brsPosFwd['NotionalFlip'] = brsPosFwd['Current Face']
However, this is not working properly. It's creating the two new fields, CurrencyFlip and NotionalFlip but treating every record like it is False and just pasting what came before it.
Does anyone have any ideas?

Pandas uses vectorised functions. You are performing operations on entire series objects as if they were single elements.
You can use numpy.where to vectorise your calculations:
import numpy as np
brsPosFwd['CurrencyFlip'] = np.where(brsPosFwd['Flip?'],
brsPosFwd['Sec Desc'].str[-3:]+'/'+brsPosFwd['Sec Desc'].str[:3]),
brsPosFwd['Sec Desc'])
brsPosFwd['NotionalFlip'] = np.where(brsPosFwd['Flip?'],
-brsPosFwd['Current Face'],
brsPosFwd['Current Face'])
Note also that pd.Series.apply should be used as a last resort; since it is a thinly veiled inefficient loop. Here you can simply use the .str accessor.

Related

For loops and conditionals in Python

I am new to Python and I was wondering if there was a way I could shorten/optimise the below loops:
for breakdown in data_breakdown:
for data_source in data_source_ids:
for camera in camera_ids:
if (camera.get("id") == data_source.get("parent_id")) and (data_source.get("id") == breakdown.get('parent_id')):
for res in result:
if res.get("camera_id") == camera.get("id"):
res.get('data').update({breakdown.get('name'): breakdown.get('total')})
I tried this oneliner, but it doesn't seem to work:
res.get('data').update({breakdown.get('name'): breakdown.get('total')}) for camera in camera_ids if (camera.get("id") == data_source.get("parent_id")) and (data_source.get("id") == breakdown.get('parent_id'))
You can use itertools.product to handle the nested loops for you, and I think (although I'm not sure because I can't see your data) you can skip all the .get and .update and just use the [] operator:
from itertools import product
for b, d, c in product(data_breakdown, data_source_ids, camera_ids):
if c["id"] != d["parent_id"] or d["id"] != b["parent_id"]:
continue
for res in result:
if res["camera_id"] == c["id"]:
res['data'][b['name']] = b['total']
If anything, to optimize the performance of those loops, you should make them longer and more nested, with the data_source.get("id") == breakdown.get('parent_id') happening outside of the camera loop.
But there is perhaps an alternative, where you could change the structure of your data so that you don't need to loop nearly as much to find matching ID values. Convert each of your current lists (of dicts) into a single dict with its keys equal to the 'id' value you'll be trying to match in that loop, and the value being whole dict.
sources_dict = {source.get("id"): source for source in data_source_ids}
cameras_dict = {camera.get("id"): camera for camera in camera_ids}
results_dict = {res.get("camera_id"): res for res in result}
Now the whole loop only needs one level:
for breakdown in data_breakdown:
source = sources_dict[breakdown["parent_id"]]
camera = cameras_dict[source["parent_id"]]
res = results_dict[camera["id"]]
res.data[breakdown["name"]] = breakdown["total"]
This code assumes that all the lookups with get in your current code were going to succeed in getting a value. You weren't actually checking if any of the values you were getting from a get call was None, so there probably wasn't much benefit to it.
I'd further note that it's not clear if the camera loop in your original code was at all necessary. You might have been able to skip it and just directly compare data_source['parent_id'] against res['camera_id'] without comparing them both to a camera['id'] in between. In my updated version, that would translate to leaving out the creation of the cameras_dict and just directly indexing results_dict with source["parent_id"] rather than indexing to find camera first.

How to Query a String in Pandas

I am currently practicing pandas
I am using some pokemon data as a practice https://gist.github.com/armgilles/194bcff35001e7eb53a2a8b441e8b2c6
i want to make a program that allows the user to input their queries and I will return the result that they need.
since i do not know how many parameters the user will input, i just made some code that will break that up and then put it in the format that pandas can understand, but when i am trying to execute my code, it just returns None.
whats wrong with my code?
thank you
import pandas as pd
df = pd.read_csv(r'PATH HERE')
column_heads = df.columns
print(f'''
This is a basic searcher
Input your search query as follows:
<Head1>:<Value1>, <Head2>:<Value2> etc..
Example:
Type 1:Bug,Type2:Steel,Legendary:False
Heads:
{column_heads}
''')
usr_inp = input('Enter Query: ')
queries = usr_inp.split(',')
parameters = {}
for query in queries:
head, value = query.split(':')
parameters[head] = value
print('Your search parameters:', parameters)
df_query = 'df.loc['
for key,value in parameters.items():
df_query += f'''(df['{key}'] == '{value}')&'''
df_query = df_query[:-1] + ']'
exec('''print(exec(df_query))''')
There's no need to use exec or eval—though, if you must, you should use eval instead of exec, as in print(eval(df_query)); eval will return the value of the expression (i.e. the result of the query), while exec just executes a statement, returning None.
You could do something like
import numpy as np
from functools import reduce
df[reduce(np.logical_and, (df[col] == val for col, val in parameters.items()))]
Step by step:
Collect a list of "conditions" (boolean Series) of the form df[column] == value, given the search query parameters:
conditions = [df[column] == value for column, value in parameters.items()]
Combine all conditions together using the and operator. With pandas Series/numpy arrays, this is done with the bitwise & operator, which is represented by the binary function operator.and_ (operator is a module in the Python standard library). reduce just means applying a binary operator to the first pair of elements, then to the result of that and the third element, and so on, until you only have one element; so, in this particular case: conditions[0] & conditions[1], (conditions[0] & conditions[1]) & conditions[2], etc.
mask = reduce(operator.and_, conditions)
Alternatively, it might be clearer (and less error-prone) to use np.logical_and, which represents the "proper" boolean and operation:
mask = reduce(np.logical_and, conditions)
Index the dataframe with the combined mask:
df[mask]

In Pandas- how to check of dataframe is empty after every manipulation?

In doing a lot of manipulations on dataframe and in one of them it can return empty (which an except able result).
the thing is if turns empty it crashes on the other line, like in the following code:
NumOfActiveDays = local_input_list.groupby(['DeviceSidID'])['timestamp'].nunique().reset_index().rename(columns={'timestamp': 'NumOfDays'})
NumOfActiveDays = NumOfActiveDays[NumOfActiveDays.NumOfDays >= float(extdf.dict[entity]['days_thresh'])]
local_input_list = local_input_list[local_input_list.DeviceSidID.isin(NumOfActiveDays.loc[:, 'DeviceSidID'])].reset_index(drop=True)
if NumOfActiveDays will become empty it will crash the third line...
is there a better why to check if data manipulation ends up empty instead of after every line to do if df.empty()?
Thanks

ArcMap Field Calculator Program to create Unique ID's

I'm using the Field Calculator in ArcMap and
I need to create a unique ID for every storm drain in my county.
An ID Should look something like this: 16-I-003
The first number is the municipal number which is in the column/field titled "Munic"
The letter is using the letter in the column/field titled "Point"
The last number is simply just 1 to however many drains there are in a municipality.
So far I have:
rec=0
def autoIncrement()
pStart=1
pInterval=1
if(rec==0):
rec=pStart
else:
rec=rec+pInterval
return "16-I-" '{0:03}'.format(rec)
So you can see that I have manually been typing in the municipal number, the letter, and the hyphens. But I would like to use the fields: Munic and Point so I don't have to manually type them in each time it changes.
I'm a beginner when it comes to python and ArcMap, so please dumb things down a little.
I'm not familiar with the ArcMap, so can't directly help you, but you might just change your function to a generator as such:
def StormDrainIDGenerator():
rec = 0
while (rec < 99):
rec += 1
yield "16-I-" '{0:03}'.format(rec)
If you are ok with that, then parameterize the generator to accept the Munic and Point values and use them in your formatting string. You probably should also parameterize the ending value as well.
Use of a generator will allow you to drop it into any later expression that accepts an iterable, so you could create a list of such simply by saying list(StormDrainIDGenerator()).
Is your question on how to get Munic and Point values into the string ID? using .format()?
I think you can use following code to do that.
def autoIncrement(a,b):
global rec
pStart=1
pInterval=1
if(rec==0):
rec=pStart
else:
rec=rec+pInterval
r = "{1}-{2}-{0:03}".format(a,b,rec)
return r
and call
autoIncrement( !Munic! , !Point! )
The r = "{1}-{2}-{0:03}".format(a,b,rec) just replaces the {}s with values of variables a,b which are actually the values of Munic and Point passed to the function.

using subroutines in python instead of if statements

I was wondering if i could use a subroutine here instead if so how do i or is there a another way to shorten this piece of code.
if currency1=='GBP':
if currency2=='USD':
number=float(1.64)
elif currency2=='EUR':
number=float(1.20552)
elif currency2=='JPY':
number=float(171.181)
You could certainly make a dictionary:
currencies = {}
currencies['USD'] = 1.64
currencies['EUR'] = 1.20552
currencies['JPY'] = 171.181
currencies['GBP'] = 1.
number = currencies[currency2]
what's great about this is you can also do:
other_number = currencies[currency1]
exchange_rate = number / other_number # exchange rate BETWEEN the two currencies
How about:
Brit_converter: {'USD':1.64, 'EUR':1.20552}
if currency1=='GBP':
multiplier = converter[currency2]
or, assuming this does what I expect:
converted_currency = currency1 * converter[currency2]
Subroutines - in Python the accepted term is function - cannot replace if operator for a very simple reason - they serve different purpose:
Functions are used to break down code into small, manageable units and consolidate functionality required in more than one place
if operator changes the control flow
As it was pointed out above, dictionary is one of excellent solutions to multiple fixed choices.
I would use a dictionary like so*:
if currency1 == 'GBP':
number = {'USD':1.64, 'EUR':1.20552, 'JPY':171.181}.get(currency2, 1)
Also, notice that I used dict.get here. If currency2 is not found in the dictionary, number will be assigned to 1 and a KeyError will not be raised. However, you can choose anything you want for the default value (or omit it entirely and have None be the default).
Finally, you should note that putting float literals inside the float built-in is unnecessary.
*Note: If you plan to use the dictionary in multiple places, don't keep recreating it. Instead, save it in a variable like so:
my_dict = {'USD':1.64, 'EUR':1.20552, 'JPY':171.181}
if currency1 == 'GBP':
number = my_dict.get(currency2, 1)

Categories

Resources