Running multiple columns through a function - python

Is there any way I can run a function like this
crypto['Price'] = crypto['Ticker'].transform(lambda item: cg.get_price(ids=item, vs_currencies='usd'))
using the function
cg.get_coin_market_chart_range_by_id(id='bitcoin',vs_currency='usd',from_timestamp='1635505200',to_timestamp='1635548400')
with three columns supplying the values for id, from_timestamp and to_timestamp,
the columns being
crypto['Ticker'], crypto['Dateroundts'], crypto['Dateround+1ts']
I basically want to make a new column by calling the function above with the three columns as arguments, and I don't know how.

You can use apply with axis=1 to apply a function across each row:
crypto['Price'] = crypto.apply(lambda x: cg.get_coin_market_chart_range_by_id(id=x.Ticker,vs_currency='usd',from_timestamp=x.Dateroundts,to_timestamp=x['Dateround+1ts']), axis=1)
You can access a field either with dot notation or by indexing x like a dict (with square brackets). If the column name is not a valid Python identifier (for example 'Dateround+1ts'), only the square-bracket form works.
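A minimal sketch of the row-wise apply, runnable without network access. The function fetch_range is a hypothetical stand-in for cg.get_coin_market_chart_range_by_id, and the sample rows are made up for illustration:

```python
import pandas as pd


def fetch_range(coin_id, vs_currency, from_timestamp, to_timestamp):
    # Hypothetical stand-in for the real API call; it only echoes its arguments.
    return f"{coin_id}:{vs_currency}:{from_timestamp}-{to_timestamp}"


# Made-up sample data using the column names from the question.
crypto = pd.DataFrame({
    "Ticker": ["bitcoin", "ethereum"],
    "Dateroundts": [1635505200, 1635508800],
    "Dateround+1ts": [1635548400, 1635552000],
})

# axis=1 passes each row to the lambda as a Series indexed by column name.
crypto["Price"] = crypto.apply(
    lambda row: fetch_range(
        coin_id=row.Ticker,                   # dot notation: valid identifier
        vs_currency="usd",
        from_timestamp=row["Dateroundts"],
        to_timestamp=row["Dateround+1ts"],    # brackets required because of "+"
    ),
    axis=1,
)
```

With the real client, the lambda body would call the API once per row, so the run time grows with the number of rows.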

Related

How can I change the number of int value to another number in a column?

The dataframe dataset has two columns 'Review' and 'Label' and dtypes of 'Label' is int.
I would like to change the number in the 'Label' column. So I tried to use replace() but it doesn't change well as you can see in the below picture.
A simple and quick solution (besides replace) would be to use the Series.map() method. You could define a dictionary whose keys are the values you want to replace and whose values are the new values you wish to have. Then use an anonymous function (or a normal one) to replace your values:
d={1:0,2:0,4:1,5:1}
dataset['label']=dataset['label'].map(lambda x: d[x])
This will replace 1 and 2 with 0, and 4 and 5 with 1.
I am not sure what your criterion for "well" is, as the replace method will work for you and achieve essentially the same result (and is more optimized than map for replacement purposes).
What might be causing the issue is that replace defaults to inplace=False, so it returns a new Series and leaves the original untouched. Either assign the result back with dataset['label']=dataset['label'].replace([1,2,4,5],[0,0,1,1]) or pass inplace=True: dataset['label'].replace([1,2,4,5],[0,0,1,1],inplace=True)
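To make the difference concrete, here is a small sketch on made-up data (the five label values are an assumption, not the asker's dataset). It shows that an unassigned replace leaves the column alone, and that assigning the result back updates it:

```python
import pandas as pd

# Made-up labels for illustration.
dataset = pd.DataFrame({"label": [1, 2, 4, 5, 1]})

# replace returns a new Series; without an assignment the column is unchanged.
dataset["label"].replace([1, 2, 4, 5], [0, 0, 1, 1])
unchanged = dataset["label"].tolist()

# Assigning the returned Series back is what updates the column.
dataset["label"] = dataset["label"].replace([1, 2, 4, 5], [0, 0, 1, 1])
updated = dataset["label"].tolist()

# Passing the dict directly to map applies the same mapping.
mapped = pd.Series([1, 2, 4, 5, 1]).map({1: 0, 2: 0, 4: 1, 5: 1}).tolist()
```

One difference worth knowing: map with a dict turns any value missing from the dict into NaN, while replace leaves unmatched values as they are.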

Using apply to add multiple columns in pandas

I'm trying to run a function (row_extract) over a column in my dataframe, that returns three values that I then want to add to three new columns.
I've tried running it like this
all_data["substance", "extracted name", "name confidence"] = all_data["name"].apply(row_extract)
but I get one column with all three values. I'm going to iterate over the rows, but that doesn't seem like a very efficient system - any thoughts?
This is my current solution, but it takes an age.
for index, row in all_data.iterrows():
    all_data.at[index, "substance"], all_data.at[index, "extracted name"], all_data.at[index, "name confidence"] = row_extract(row["name"])
Check what the type of your function output is or what the datatypes are. It seems like that's a string.
You can use the "split" method on a string to separate them.
https://docs.python.org/2/library/string.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.split.html
Alternatively, adjust your function to return more than one value.
E.g.
def myfunc():
    ...
    ...
    return x, y, z
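If the function does return a tuple of three values, one way to spread them over three columns is to build a DataFrame from the results. This is a sketch only: the row_extract below is a hypothetical stand-in, since the asker's real function is not shown, and the sample names are made up.

```python
import pandas as pd


def row_extract(name):
    # Hypothetical stand-in: returns (substance, extracted name, confidence).
    parts = name.split()
    return parts[0], name.title(), 1.0


# Made-up sample data.
all_data = pd.DataFrame({"name": ["aspirin tablet", "ibuprofen gel"]})

# apply gives a Series of tuples; tolist() turns it into rows for a new frame.
extracted = all_data["name"].apply(row_extract)
expanded = pd.DataFrame(
    extracted.tolist(),
    index=all_data.index,  # keep the original index so the join lines up
    columns=["substance", "extracted name", "name confidence"],
)
all_data = all_data.join(expanded)
```

This still calls row_extract once per row, so it will not be faster than the function itself allows, but it avoids the per-cell writes of the iterrows loop.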

Iterate over multiple dataframes and perform maths functions save output

I have several dataframes on which I am performing the same functions: extracting the mean, geomean, median, etc. for a particular column (PurchasePrice), organised by groups within another column (GORegion). At the moment I am just performing this for each dataframe separately, as I cannot work out how to do this in a for loop and save a separate data series for each function performed on each dataframe.
i.e. I perform median like this:
regmedian15 = pd.Series(nw15.groupby(["GORegion"])['PurchasePrice'].median(), name = "regmedian_nw15")
I want to do this for a list of dataframes [nw15, nw16, nw17], extracting the same variable outputs for each of them.
I have tried things like :
listofnwdfs = [nw15, nw16, nw17]
for df in listofnwdfs:
    df+'regmedian' = pd.Series(df.groupby(["GORegion"])['PurchasePrice'].median(), name = df+'regmedian')
but it says "can't assign to operator"
I think the main point is I can't work out how to create separate output variable names using the names of the dataframes I am inputting into the for loop. I just want a for loop function that produces my median output as a series for each dataframe in the list separately, and I can then do this for means and so on.
Many thanks for your help!
First, df+'regmedian' = ... is not valid Python syntax. The left-hand side of an assignment must be a name (or another assignable target), not an expression of the form A + B, which is why Python complains that it "can't assign to operator".
Also, df+'regmedian' itself seems strange. You are trying to add a DataFrame and a string.
One way to keep track of different statistics for different dataframes is by using dicts. For example, you can replace
listofnwdfs = [nw15, nw16, nw17]
with
dict_of_nwd_frames = {15: nw15, 16: nw16, 17: nw17}
Say you want to store 'regmedian' data for each frame. You can do this with dicts as well.
data = dict()
for key, df in dict_of_nwd_frames.items():
    data[(key, 'regmedian')] = pd.Series(df.groupby(["GORegion"])['PurchasePrice'].median(), name = str(key) + 'regmedian')
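The same idea extends to several statistics per frame. The sketch below uses two tiny made-up frames in place of nw15 and nw16 (their contents are assumptions) and stores each result under a (year, statistic) key:

```python
import pandas as pd

# Made-up stand-ins for the asker's dataframes.
nw15 = pd.DataFrame({
    "GORegion": ["North", "North", "South"],
    "PurchasePrice": [100.0, 300.0, 200.0],
})
nw16 = pd.DataFrame({
    "GORegion": ["North", "South", "South"],
    "PurchasePrice": [150.0, 250.0, 350.0],
})

frames = {15: nw15, 16: nw16}
stats = {}

for key, df in frames.items():
    grouped = df.groupby("GORegion")["PurchasePrice"]
    # One entry per (frame, statistic); rename sets the Series name.
    stats[(key, "regmedian")] = grouped.median().rename(f"regmedian_nw{key}")
    stats[(key, "regmean")] = grouped.mean().rename(f"regmean_nw{key}")
```

A result is then looked up with, for example, stats[(15, "regmedian")], which replaces the separately named variables the question was trying to create.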

Apply transform of your own function in Pandas dataframe

I have a pandas dataframe on which I need to do some data manipulation. The following code gives me the average of column "Variable" grouped by "key":
df.groupby('key').Variable.transform("mean")
The advantage of using "transform" is that it returns the result with the same index, which is pretty useful.
Now I want to use my own custom function within "transform" instead of "mean". Moreover, my function needs two or more columns, something like:
lambda (Variable, Variable1, Variable2): (Variable + Variable1)/Variable2
(my actual function is more complicated than this example), and each row of my dataframe has Variable, Variable1 and Variable2.
I am wondering if I can define and use such a custom function within "transform" so that the result comes back with the same index?
Thanks,
Amir
Don't call transform against Variable, call it on the grouper and then call your variables against the dataframe the function receives as argument:
df.groupby('key').transform(lambda x: (x.Variable + x.Variable1)/x.Variable2)
Why not simply use
(df.Variable + df.Variable1) / df.Variable2
There is no need for groupby. If, for example, you want to divide by df.groupby('key').Variable2.transform("mean"), you can still do it with transform as follows:
(df.Variable + df.Variable1) / df.groupby('key').Variable2.transform("mean")
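A short sketch of both variants on made-up data (the three rows are assumptions). Note the parentheses around the sum, which are needed to match the (Variable + Variable1)/Variable2 formula from the question:

```python
import pandas as pd

# Made-up sample data with the column names from the question.
df = pd.DataFrame({
    "key": ["a", "a", "b"],
    "Variable": [1.0, 2.0, 3.0],
    "Variable1": [3.0, 4.0, 5.0],
    "Variable2": [2.0, 4.0, 8.0],
})

# Purely row-wise formula: no groupby needed, the index is preserved.
row_wise = (df.Variable + df.Variable1) / df.Variable2

# Group-level quantity broadcast back to the original index via transform.
group_mean = df.groupby("key").Variable2.transform("mean")
scaled = (df.Variable + df.Variable1) / group_mean
```

Both results share the index of df, so they can be assigned straight back as new columns.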

If value contains string, then set another column value

I have a dataframe in Pandas with a column called 'Campaign' that has values like this:
"UK-Sample-Car Rental-Car-Broad-MatchPost"
I need to be able to detect that the string contains the words 'Car Rental' and set another column, Product, to 'CAR'. The hyphen does not always separate out the word Car, so finding the string by splitting on it isn't possible.
How can I achieve this in Pandas/Python?
pandas has some sweet string functions you can use,
for example, like this:
df['vehicle'] = df.Campaign.str.extract('(Car).Rental', expand=False).str.upper()
This sets the column vehicle to whatever is captured inside the parentheses of the regular expression given to the extract function. Passing expand=False makes extract return a Series, so the .str accessor can be chained.
The str.upper call then makes it uppercase.
Extra Bonus:
If you want to assign vehicle a value that is not in the original string, you have to take a few more steps, but we still use the string functions, this time str.contains.
is_motorcycle = df.Campaign.str.contains('Motorcycle')
df['vehicle'] = pd.Series(["MC"] * len(df)) * is_motorcycle
The second line here creates a series of "MC" strings, then masks it on the entries which we found to be motorcycles.
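An alternative sketch of the same masking idea, applied to the original 'Car Rental' question, uses a boolean mask with .loc instead of multiplying strings. The second campaign value is made up for illustration, and the Product column name is taken from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "Campaign": [
        "UK-Sample-Car Rental-Car-Broad-MatchPost",
        "UK-Sample-Hotel-Broad-MatchPost",  # made-up second row
    ]
})

# regex=False treats the pattern as a literal substring.
is_car_rental = df["Campaign"].str.contains("Car Rental", regex=False)

# Start with an empty column, then fill only the matching rows.
df["Product"] = None
df.loc[is_car_rental, "Product"] = "CAR"
```

Rows that do not match keep the missing value, which makes it easy to add further keywords with more .loc assignments.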
If you want to combine multiple cases, I suggest you use the map function:
vehicle_list = df.Campaign.str.extract('.*?(Car).Rental|.*?(Motorcycle)|.*?(Hotel)|(.*)')
vehicle = vehicle_list.apply(lambda x: x[x.last_valid_index()], axis=1)
df['vehicle'] = vehicle.map({'Car':'Car campaign', 'Hotel':'Hotel campaign'})
This first extracts the data into one column per capture group. The cases are separated by |, and each keyword is prefixed with a lazy .*? so that it can match anywhere in the string before the final catch-all (.*) gets a chance; without the prefix, the catch-all would match first at the start of the string. The catch-all guarantees that every row has at least one captured value, which the DataFrame.apply call on the second line relies on.
The Series.map function is pretty straightforward: if the captured data is 'Car', we set 'Car campaign'; if it is 'Hotel', we set 'Hotel campaign'; and so on.
