def priceusd(df):
    return df['closeprice'][-1]*btcusdtclose[-1]
This function gives the price of a certain asset in USD by multiplying its price in Bitcoin by Bitcoin's price in USD, using a dataframe as a parameter.
What I want to do is pass the name of the asset as the parameter instead of the dataframe the price data comes from. All my dataframes are named assetbtc, for example ethbtc or neobtc. I want to be able to pass eth into the function and have it return ethbtc['closeprice'][-1]*btcusdtclose[-1].
For example,
def priceusd(eth):
    return ethbtc['close'][-1]*btcusdtclose[-1]
I tried this and it didn't work, but you can see what I am trying to do:
def priceusd(assetname): '{}btc'.format(assetname)['close'][-1]*btcusdtclose[-1].
Thank you very much.
It's not necessary to use eval in a situation like this. As @wwii says, store the DataFrames in a dictionary so that you can easily retrieve them by name.
E.g.
coins_to_btc = {
    'eth': ethbtc,
    'neo': neobtc,
}
Then,
def priceusd(name):
    df = coins_to_btc[name]
    return df['close'][-1]*btcusdtclose[-1]
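To make the dictionary approach concrete, here is a minimal runnable sketch. The numbers and the shape of btcusdtclose (a Series) are assumptions made up for illustration. It also uses .iloc[-1], because plain [-1] on a Series with a default integer index is treated as a label lookup and raises a KeyError.

```python
import pandas as pd

# Hypothetical stand-in data; the real frames come from the asker's own code.
ethbtc = pd.DataFrame({"close": [0.030, 0.031]})
neobtc = pd.DataFrame({"close": [0.0010, 0.0012]})
btcusdtclose = pd.Series([40000.0, 42000.0])

coins_to_btc = {"eth": ethbtc, "neo": neobtc}


def priceusd(name):
    """Latest USD price: last BTC-quoted close times last BTC/USDT close."""
    df = coins_to_btc[name]  # raises KeyError for an unknown asset name
    # .iloc[-1] selects the last row by position, whatever the index looks like
    return df["close"].iloc[-1] * btcusdtclose.iloc[-1]


eth_usd = priceusd("eth")
```

Adding a new asset then only means adding one entry to coins_to_btc.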
You should fetch the dataframe you want from whatever contains it instead of trying to use a str as the dataframe. In other words, use the str you built as a key to look the dataframe up where it lives.
For example assuming you have placed the priceusd function inside the same module that contains all your created data frames like:
import sys

abtc = df1()
bbtc = df2()
cbtc = df3()
# and so on...

def priceusd(asset):
    # __module__ is only the module's name, so look up the module object itself
    asset_container = sys.modules[priceusd.__module__]
    asset_name = f'{asset}btc'
    df = getattr(asset_container, asset_name)
    # now do whatever you want with your df (dataframe)
You can replace the code for getting the asset_container if the structure of your code is different from the one I assumed. But you should generally get my point...
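The same getattr lookup works on any object that holds the frames as attributes. The sketch below is only an illustration: frames, get_frame, and the list values are hypothetical stand-ins (a SimpleNamespace replaces the module and plain lists replace the DataFrames), and it turns a missing name into a clearer error.

```python
from types import SimpleNamespace

# Hypothetical container standing in for the module that holds the frames;
# plain lists are used instead of DataFrames to keep the sketch small.
frames = SimpleNamespace(abtc=[1.0, 2.0], bbtc=[3.0, 4.0])


def get_frame(asset, container=frames):
    """Return the object stored as '<asset>btc' on the container."""
    asset_name = f"{asset}btc"
    try:
        return getattr(container, asset_name)
    except AttributeError:
        # Report the missing name instead of a bare AttributeError
        raise KeyError(f"no data stored under {asset_name!r}") from None


last_a = get_frame("a")[-1]
```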
I am trying to transform my data into a form I can work with. I have a column called "season/ teams" whose values look something like "1989-90 Bos".
I would like to transform each value into a string like "1990" in Python using a pandas dataframe. I read some tutorials about pd.replace() but can't find a way to apply it to my scenario. How can I solve this? Thanks for the help.
FYI, I have 16k lines of data.
A snapshot of the data I am working with:
To change that field from "1989-90 BOS" to "1990" you could do the following:
df['Yr/Team'] = df['Yr/Team'].str[:2] + df['Yr/Team'].str[5:7]
If the structure of your data will always be the same, this is an easy way to do it.
If the data in your Yr/Team column has a standard format you can extract the values you need based on their position.
import pandas as pd
df = pd.DataFrame({'Yr/Team': ['1990-91 team'], 'data': [1]})
df['year'] = df['Yr/Team'].str[0:2] + df['Yr/Team'].str[5:7]
print(df)
        Yr/Team  data  year
0  1990-91 team     1  1991
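One caveat with gluing the two slices together: a season that crosses a century, such as "1999-00", comes out as "1900". A possible alternative, assuming every season ends in the calendar year after it starts, is to take the four-digit start year and add one. The sample rows below are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Yr/Team": ["1989-90 BOS", "1999-00 LAL"]})

# Take the four-digit start year and add one to get the season's end year.
df["year"] = (df["Yr/Team"].str[:4].astype(int) + 1).astype(str)
```

Drop the final .astype(str) if an integer year column is more convenient.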
You can use pd.Series.str.extract to extract a pattern from a column of strings. For example, if you want to extract the first year, second year and team into three different columns, you can use this:
df["Yr/Team"].str.extract(r"(?P<start_year>\d+)-(?P<end_year>\d+) (?P<team>\w+)")
Note the use of named groups, which automatically name the resulting columns.
See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.extract.html
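As a small worked example of the named-group pattern, the sketch below runs it on two made-up rows. The column name Yr/Team is taken from the other answers and is an assumption about the real data.

```python
import pandas as pd

df = pd.DataFrame({"Yr/Team": ["1989-90 BOS", "1990-91 LAL"]})

pattern = r"(?P<start_year>\d+)-(?P<end_year>\d+) (?P<team>\w+)"
parts = df["Yr/Team"].str.extract(pattern)
# parts has one column per named group: start_year, end_year, team
```

The extracted values are strings, and rows that do not match the pattern come back as NaN.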
I have 2 data frames. One is a reference table with columns code and name. The other holds lists of dictionaries. In the second data frame every code is filled in, but some names are empty strings. I am thinking of using 2 for loops to get to each dictionary, but I am new to this and unsure how to get the value from the reference table.
Started with something like this:
for i in sample:
    for j in i:
        if j['name']=='':
            (j['code'])
I am unsure how to proceed with the code. I think there is a very simple way with .map() function. Can someone help?
Reference table: [image not available]
Table that needs editing: [image not available]
It seems to me that in this particular case you're using Pandas only to work with Python data structures. If that's the case, it would make sense to ditch Pandas altogether and just use Python data structures - usually, it results in more idiomatic and readable code that often performs better than Pandas with dtype=object.
In any case, here's the code:
import pandas as pd

sample_name = pd.DataFrame(dict(code=[8, 1, 6],
                                name=['Human development',
                                      'Economic management',
                                      'Social protection and risk management']))
# We just need a Series.
sample_name = sample_name.set_index('code')['name']

sample = pd.Series([[dict(code=8, name='')],
                    [dict(code=1, name='')],
                    [dict(code=6, name='')]])

def fix_dict(d):
    # Fill in the name from the reference Series only when it is empty.
    if not d['name']:
        d['name'] = sample_name.at[d['code']]
    return d

def fix_dicts(dicts):
    return [fix_dict(d) for d in dicts]

result = sample.map(fix_dicts)
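For comparison, here is a rough sketch of the pandas-free version suggested above. The data mirrors the example, except that one name is pre-filled to show it is left alone. names_by_code and fill_names are hypothetical names chosen for this illustration.

```python
# Hypothetical data mirroring the example above, without pandas.
names_by_code = {
    8: "Human development",
    1: "Economic management",
    6: "Social protection and risk management",
}
sample = [
    [{"code": 8, "name": ""}],
    [{"code": 1, "name": "Already set"}],
    [{"code": 6, "name": ""}],
]


def fill_names(rows, lookup):
    """Return new rows where empty names are filled from the lookup."""
    return [
        [{**d, "name": d["name"] or lookup.get(d["code"], "")} for d in dicts]
        for dicts in rows
    ]


result = fill_names(sample, names_by_code)
```

Unlike fix_dict, this version builds new dictionaries and leaves the input untouched.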
import pandas as pd

class Eq(object):
    price = 2

    # The class "constructor" - It's actually an initializer
    def __init__(self, price):
        self.price = price

    def get_price(self):
        return self.price

d = {'name': ['cac40', 'ftse100'], 'col2': [Eq, Eq]}
df = pd.DataFrame(data=d)
The above builds a DataFrame containing objects in col2.
I would like to access the price property of my objects, and put that in a new column of my dataframe.
I can't seem to be able to access the object though.
Closest I got is df['price'] = df['col2'].values[0].price, but of course this only gets the price property of the first row.
How can I get the price for all the rows?
Thanks
Generally, if you're trying to create a new column in a DataFrame and methods like this aren't working, it's not a bad idea to look for a way to use the "apply" function. You should be able to tackle your problem this way:
df['price'] = df['col2'].apply(lambda x: x.price)
Though this gets you what you want, why are you storing your object directly within the DataFrame? There might be a more optimal way to get data from your objects into it depending on your reasoning.
You can get a list of attribute values from a list of objects like this:
df['price'] = [obj.price for obj in df['col2']]
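Here is a self-contained sketch of the list-comprehension approach. It assumes the column holds instances of a simplified Eq class (the question stores the class itself), and the prices are made up.

```python
import pandas as pd


class Eq:
    def __init__(self, price):
        self.price = price


df = pd.DataFrame({"name": ["cac40", "ftse100"],
                   "col2": [Eq(10.5), Eq(20.0)]})

# Pull the attribute out of each object, one row at a time.
df["price"] = [obj.price for obj in df["col2"]]
```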
I have a function that returns a dataframe to my main. I am trying to store these dataframes in a dictionary, in order to retrieve them again later.
When I run this:
sa_wp5 = get_SA_WP5_value('testfile.txt')
template_dict["SAWP5Country Name"] = sa_wp5
my output looks like the following:
{'SAWP5Country Name': 1 2
0 Australia 047}
where I would rather the output just be the variable itself containing the dataframe.
What am I doing wrong here?
Nothing is wrong here. It is just a matter of formatting: printing the dict shows the default __str__() output of each DataFrame object. If the output looks messy, try printing your dict this way:
for key, df in template_dict.items():
    print("%s:" % key)
    print(df.to_string())
    print("-------")
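A quick way to confirm that the dictionary really holds the DataFrame and not its printed text is to fetch it back and use it. In this sketch the frame is a made-up stand-in for whatever get_SA_WP5_value returns.

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by get_SA_WP5_value.
sa_wp5 = pd.DataFrame({1: ["Australia"], 2: ["047"]})

template_dict = {}
template_dict["SAWP5Country Name"] = sa_wp5

retrieved = template_dict["SAWP5Country Name"]
same_object = retrieved is sa_wp5   # the dict stores a reference, not text
country = retrieved.loc[0, 1]       # normal DataFrame access still works
```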
You can use Bunch, a dictionary subclass from scikit-learn whose keys are also accessible as attributes, to store all sorts of objects for easy retrieval.
from sklearn.utils import Bunch
Then create a Bunch, passing each DataFrame as a keyword argument:
a = Bunch(df1 = df_template.copy(), df2 = df_other_df.copy())
Then you can simply call them as such:
a.df1
a.df1['col1']
df = a.df2
etc.
It's really effective for storage of objects.
I have several dataframes on which I am performing the same operations: extracting the mean, geomean, median and so on for a particular column (PurchasePrice), organised by groups within another column (GORegion). At the moment I do this for each dataframe separately, as I cannot work out how to do it in a for loop and save a separate data series for each function performed on each dataframe.
i.e. I perform median like this:
regmedian15 = pd.Series(nw15.groupby(["GORegion"])['PurchasePrice'].median(), name = "regmedian_nw15")
I want to do this for a list of dataframes [nw15, nw16, nw17], extracting the same variable outputs for each of them.
I have tried things like :
listofnwdfs = [nw15, nw16, nw17]
for df in listofnwdfs:
    df+'regmedian' = pd.Series(df.groupby(["GORegion"])['PurchasePrice'].median(), name = df+'regmedian')
but it says "can't assign to operator"
I think the main point is I can't work out how to create separate output variable names using the names of the dataframes I am inputting into the for loop. I just want a for loop function that produces my median output as a series for each dataframe in the list separately, and I can then do this for means and so on.
Many thanks for your help!
First, df+'regmedian' = ... is not valid Python syntax. You are trying to assign a value to an expression of the form A + B, but only targets such as names, attributes, and subscripts (x, obj.attr, d[key]) can appear on the left of =, which is why Python reports "can't assign to operator".
Also, df+'regmedian' itself seems strange. You are trying to add a DataFrame and a string.
One way to keep track of different statistics for different DataFrames is by using dicts. For example, you can replace
listofnwdfs = [nw15, nw16, nw17]
with
dict_of_nwd_frames = {15: nw15, 16: nw16, 17: nw17}
Say you want to store 'regmedian' data for each frame. You can do this with dicts as well.
data = dict()
for key, df in dict_of_nwd_frames.items():
    data[(key, 'regmedian')] = pd.Series(df.groupby(["GORegion"])['PurchasePrice'].median(), name = str(key) + 'regmedian')
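If several statistics are needed per frame, one option is to let groupby compute them together and keep one result table per frame. This is only a sketch: the miniature frames, the frames dict, and the choice of mean and median are assumptions for illustration.

```python
import pandas as pd

# Hypothetical miniature frames standing in for nw15, nw16 and nw17.
nw15 = pd.DataFrame({"GORegion": ["North", "North", "South"],
                     "PurchasePrice": [100, 200, 300]})
nw16 = pd.DataFrame({"GORegion": ["North", "South", "South"],
                     "PurchasePrice": [150, 250, 350]})
frames = {15: nw15, 16: nw16}

# One table per frame: rows are regions, columns are the requested statistics.
stats = {
    key: df.groupby("GORegion")["PurchasePrice"].agg(["mean", "median"])
    for key, df in frames.items()
}

north_median_15 = stats[15].loc["North", "median"]
```

Each entry in stats is a DataFrame, so stats[16]["mean"] gives the per-region means for that frame as a Series.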