Accessing property of object stored in Dataframe - python

import pandas as pd

class Eq(object):
    price = 2

    # The class "constructor" - It's actually an initializer
    def __init__(self, price):
        self.price = price

    def get_price(self):
        return self.price

d = {'name': ['cac40', 'ftse100'], 'col2': [Eq, Eq]}
df = pd.DataFrame(data=d)
The above builds a DataFrame containing objects in col2.
I would like to access the price property of my objects and put it in a new column of my DataFrame.
I can't seem to access the objects, though.
The closest I got is df['price'] = df['col2'].values[0].price, but of course this only gets the price property of the first row.
How can I get the price for all the rows?
Thanks

Generally, if you're trying to create a new column in a DataFrame and direct attribute access like this isn't working, it's worth looking for a way to use the "apply" method. You should be able to tackle your problem this way:
df['price'] = df['col2'].apply(lambda x: x.price)
Though this gets you what you want, why are you storing your objects directly in the DataFrame? Depending on your reasoning, there might be a better way to get the data from your objects into it.

You can get a list of attribute values from a list of objects like this:
df['price'] = [obj.price for obj in df['col2']]
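Both answers can be tried side by side. The following is a minimal sketch, assuming col2 holds instances of Eq rather than the class itself; the prices 2 and 3 are made up for illustration.

```python
import pandas as pd


class Eq:
    def __init__(self, price):
        self.price = price


# Hypothetical prices, chosen only for illustration.
df = pd.DataFrame({'name': ['cac40', 'ftse100'], 'col2': [Eq(2), Eq(3)]})

# Element-wise attribute access via apply ...
df['price'] = df['col2'].apply(lambda eq: eq.price)

# ... or the equivalent list comprehension.
df['price_again'] = [eq.price for eq in df['col2']]

print(df[['name', 'price', 'price_again']])
```

Both columns should hold the same values; which form to use is largely a matter of taste.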

Related

Assigning unique IDs to strings

I am trying to build an elegant solution to assigning IDs starting from 0 for the following data:
My first attempt, creating IDs for the 'Person' column, looks like this:
df = pd.DataFrame(
    {'Person': ['Tom Jones', 'Bill Smeegle', 'Silvia Geerea'],
     'PersonFriends': [['Bill Smeegle', 'Silvia Geerea'], ['Tom Jones'], ['Han Solo']]})
df['PersonID'] = df['Person'].astype('category').cat.codes
which produces
Now I want to follow the same process but do this for the 'PersonFriends' column to get this result below. How can I apply the same functions to achieve this when I have a list of friends?
I have been able to do this via the hash() function on each name, but the ID generated is long and not very readable. Any help appreciated. Thanks.
Create a dict that maps each name to its ID, then look up every friend's name in it:
id_map = dict(zip(df["Person"], df["PersonID"]))
df["FriendsID"] = df["PersonFriends"].apply(lambda x: [id_map.get(y) for y in x])
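One caveat: with a map built only from the 'Person' column, id_map.get returns None for any friend who never appears as a person ('Han Solo' in the sample data). A possible variation, sketched below, builds the IDs from every name in both columns using pd.factorize. Note that this numbers names in order of first appearance, which differs from the alphabetical order that cat.codes uses.

```python
import pandas as pd

df = pd.DataFrame({
    'Person': ['Tom Jones', 'Bill Smeegle', 'Silvia Geerea'],
    'PersonFriends': [['Bill Smeegle', 'Silvia Geerea'], ['Tom Jones'], ['Han Solo']],
})

# Collect every name from both columns into one Series.
all_names = pd.concat([df['Person'], df['PersonFriends'].explode()], ignore_index=True)

# factorize returns integer codes plus the unique values in order of first appearance.
codes, uniques = pd.factorize(all_names)
id_map = {name: i for i, name in enumerate(uniques)}

df['PersonID'] = df['Person'].map(id_map)
df['FriendsID'] = df['PersonFriends'].apply(lambda friends: [id_map[f] for f in friends])
print(df)
```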

Type changes after processing through inheritance function

A DataFrame that I pass through a masking function I defined comes back as a list.
I am trying to use a class to cut down on repetition, as I have a lot of similar code, but the function returns a list rather than a DataFrame.
I have a lot of similar code like:
df4=df3.drop_duplicates(['TITLE'])
#from df3 find title duplicated items
index2=df3.duplicated(['TITLE'])
#duplicated titles items are dropped into df5
df5=df3[index2].reset_index(drop=True)
#items with same title but different database are dropped
df6=df5.drop_duplicates(['TITLE'])
#from df5 find title duplicated items
index3=df5.duplicated(['TITLE'])
#duplicated titles items from df5 are dropped into df6
df7=df5[index3].reset_index(drop=True)
The inheritance class where the function is defined:
class Mask_TITLE:
    def __init__(self,masked):
        self.masked=masked
    def mask(masked):
        return [masked.drop_duplicates(['TITLE'])]
By doing :
>>df1=Mask_TITLE.mask(df)
df1 is returned as a list.
How can I modify the function so that df1 is still a DataFrame? Or is it simply not possible to use a class and function like this with DataFrames?
You're turning it into a list here:
def mask(masked):
    return [masked.drop_duplicates(['TITLE'])]
The [ and ] around the return value make it a single-element list.
Try this:
def mask(masked):
    return masked.drop_duplicates(['TITLE'])
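For the repeated drop/duplicated pattern in the question, a plain function may be enough and no class is needed. The sketch below is one possible shape; the name split_duplicates and the sample data are hypothetical.

```python
import pandas as pd


def split_duplicates(frame, column='TITLE'):
    """Return (first occurrences, later duplicates) as two DataFrames."""
    is_dup = frame.duplicated([column])
    unique_rows = frame[~is_dup]
    duplicate_rows = frame[is_dup].reset_index(drop=True)
    return unique_rows, duplicate_rows


# Hypothetical sample data for illustration.
df3 = pd.DataFrame({'TITLE': ['a', 'b', 'a', 'a'], 'DB': [1, 1, 2, 3]})

df4, df5 = split_duplicates(df3)  # same roles as df4 and df5 in the question
df6, df7 = split_duplicates(df5)  # same roles as df6 and df7
print(type(df4).__name__, len(df5), len(df7))
```

Because the function returns the results of DataFrame operations directly, with no surrounding brackets, every returned value is still a DataFrame.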

Update cell values in dataframe

I am parsing data row-wise. How can I update a DataFrame cell value in a loop (read a value, parse it, write it to another column)?
I have tried the code below:
data = pd.read_csv("MyNames.csv")
data["title"] = ""
i = 0
for row in data.iterrows():
    name = (HumanName(data.iat[i,1]))
    print(name)
    data.ix['title',i] = name["title"]
    i = i + 1
data.to_csv('out.csv')
I would expect the following
name = "Mr John Smith"
| Title
Mr John Smith | Mr
All help appreciated!
Edit: I realise that I might not need to iterate. If I could call the function for all rows in a column and dump the results into another column that would be easier - like a SQL update statement. Thanks
This assumes that HumanName is a function (or class) that takes a string and returns a dict-like object with the keys you want. I can't test this code from here, but you get the gist:
data['title'] = data['name'].apply(lambda name: HumanName(name)['title'])
EDIT: I originally used row[1] because of your data.iat[i,1]. That index might actually need to be 0 instead of 1; I'm not sure.
You can try .apply:
def name_parsing(name):
    """This function parses the name any way you want"""
    return HumanName(name)['title']
# with .apply, the function is applied to every item in the column
# the result is a Series, which here is assigned to the 'title' column
data['title'] = data['name'].apply(name_parsing)
Another option, as we're discussing below, is to keep an instance of HumanName in the DataFrame, so that if you need other information from it later you don't have to instantiate and parse the name again (string manipulation can be very slow on big DataFrames).
If so, part of the solution would be to create a new column. After that you would get the ['title'] attribute from it:
# this line creates a column of HumanName instances
data['HumanName'] = data['name'].apply(lambda x: HumanName(x))
# this line gets the 'title' from each HumanName object and assigns it to a 'title' column
data['title'] = data['HumanName'].apply(lambda x: x['title'])
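To see the .apply pattern run without the name-parsing library installed, here is a small sketch that swaps in a hypothetical stand-in, parse_title, for HumanName(...)['title']. The stand-in only recognises a few honorifics and is not how HumanName works internally.

```python
import pandas as pd


def parse_title(full_name):
    """Hypothetical stand-in for HumanName(full_name)['title']."""
    known_titles = {'Mr', 'Mrs', 'Ms', 'Dr'}
    words = full_name.split()
    first_word = words[0] if words else ''
    return first_word if first_word in known_titles else ''


# Made-up names for illustration.
data = pd.DataFrame({'name': ['Mr John Smith', 'Jane Doe']})

# One call per row, results collected into a new column; no explicit loop needed.
data['title'] = data['name'].apply(parse_title)
print(data)
```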

Storing a dataframe in dictionary, weird output in dictionary

I have a function that returns a dataframe to my main. I am trying to store these dataframes in a dictionary, in order to retrieve them again later.
When I run this:
sa_wp5 = get_SA_WP5_value('testfile.txt')
template_dict["SAWP5Country Name"] = sa_wp5
my output looks like the following:
{'SAWP5Country Name': 1 2
0 Australia 047}
where I would rather the output just be the variable itself containing the dataframe.
What am I doing wrong here?
Nothing is wrong here. The dictionary does hold the DataFrame itself; what you see is just the default __repr__() output of a DataFrame object embedded in the dict's printout. If the output looks messy, try printing your dict this way:
for key, df in template_dict.items():
    print("%s:" % key)
    print(df.to_string())
    print("-------")
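A quick way to confirm that the dictionary stores the DataFrame object itself, and not some text rendering of it, is to fetch it back and check. The frame below is a hypothetical stand-in for whatever get_SA_WP5_value() returns.

```python
import pandas as pd

# Hypothetical frame standing in for the result of get_SA_WP5_value().
sa_wp5 = pd.DataFrame({1: ['Australia'], 2: ['047']})

template_dict = {}
template_dict["SAWP5Country Name"] = sa_wp5

# Retrieving by key hands back the very same object that was stored.
retrieved = template_dict["SAWP5Country Name"]
print(type(retrieved).__name__)
print(retrieved is sa_wp5)
```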
You can use Bunch to store all sorts of objects for easy retrieval.
from sklearn.utils import Bunch
Then create a variable using the Bunch() constructor:
a = Bunch(df1=df_template.copy(), df2=df_other_df.copy())
Then you can simply call them as such:
a.df1
a.df1['col1']
df = a.df2
etc.
It's really effective for storage of objects.

How to add on to parameter names in functions?

def priceusd(df):
    return df['closeprice'][-1]*btcusdtclose[-1]
This function gives the price of a certain asset in USD by multiplying its price in Bitcoin by Bitcoin's price in USD, taking a DataFrame as its parameter.
What I want to do is pass the name of the asset as the parameter instead of the DataFrame that the price data comes from. All my DataFrames are named <asset>btc, for example ethbtc or neobtc. I want to be able to pass eth into the function and get back ethbtc['closeprice'][-1]*btcusdtclose[-1].
For example,
def priceusd(eth):
    return ethbtc['close'][-1]*btcusdtclose[-1]
I tried this and it didn't work, but you can see what I am trying to do:
def priceusd(assetname): '{}btc'.format(assetname)['close'][-1]*btcusdtclose[-1]
Thank you very much.
It's not necessary to use eval in a situation like this. As @wwii says, store the DataFrames in a dictionary so that you can easily retrieve them by name.
E.g.
coins_to_btc = {
    'eth': ethbtc,
    'neo': neobtc,
}
Then,
def priceusd(name):
    df = coins_to_btc[name]
    return df['close'][-1]*btcusdtclose[-1]
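Here is a self-contained sketch of the dictionary approach with made-up numbers. It assumes btcusdtclose is a pandas Series and uses .iloc[-1] to take the last value by position, since plain [-1] on a Series only works positionally with some index types.

```python
import pandas as pd

# Hypothetical price data for illustration only.
ethbtc = pd.DataFrame({'close': [0.030, 0.031]})
neobtc = pd.DataFrame({'close': [0.0010, 0.0012]})
btcusdtclose = pd.Series([9000.0, 10000.0])

coins_to_btc = {'eth': ethbtc, 'neo': neobtc}


def priceusd(name):
    """Latest asset price in USD: last BTC-denominated close times last BTC/USD close."""
    df = coins_to_btc[name]
    # .iloc[-1] selects the last row by position, whatever the index is.
    return df['close'].iloc[-1] * btcusdtclose.iloc[-1]


print(priceusd('eth'))
print(priceusd('neo'))
```

An unknown asset name raises a KeyError here, which is usually easier to debug than a failed variable lookup.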
You should fetch the DataFrame you want from whatever contains it instead of trying to use a str as the DataFrame. That is, use the str you built to look up the DataFrame where it lives.
For example, assuming you have placed the priceusd function inside the same module that contains all your DataFrames, like:
import sys

abtc = df1()
bbtc = df2()
cbtc = df3()
# and so on...
def priceusd(asset):
    # priceusd.__module__ is only the module's name; sys.modules maps it to the module object
    asset_container = sys.modules[priceusd.__module__]
    asset_name = f'{asset}btc'
    df = getattr(asset_container, asset_name)
    # now do whatever you want with your df (dataframe)
You can replace the code for getting the asset_container if the structure of your code is different from the one I assumed. But you should generally get my point...
