How to convert pandas object to either currency or int? - python

I am working on an EDA side project with the Unicorn dataset from Maven Analytics. There is a column VALUATION with entries like $1B, $150M, etc., stored with dtype object. I want to convert it to either a currency or an int format.
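One possible approach, as a minimal sketch: split each entry into its numeric part and its magnitude suffix, then multiply. The helper name valuation_to_int and the MULTIPLIERS table are illustrative assumptions, not part of the dataset or of pandas, and the sketch assumes every entry looks like "$1B" or "$150M".

```python
import pandas as pd

# Assumed suffix meanings; extend or adjust to match the actual data.
MULTIPLIERS = {"K": 10**3, "M": 10**6, "B": 10**9, "T": 10**12}


def valuation_to_int(series: pd.Series) -> pd.Series:
    """Convert strings such as '$1B' or '$150M' to nullable integers."""
    parts = (
        series.str.strip()
        .str.upper()
        .str.extract(r"^\$?(?P<number>[\d.,]+)(?P<suffix>[KMBT]?)$")
    )
    # Drop thousands separators, then parse; unparseable entries become NaN.
    number = pd.to_numeric(
        parts["number"].str.replace(",", "", regex=False), errors="coerce"
    )
    # A missing or empty suffix means "no scaling".
    factor = parts["suffix"].map(MULTIPLIERS).fillna(1)
    # Round to absorb floating-point noise before casting to a nullable int.
    return (number * factor).round().astype("Int64")


valuations = pd.Series(["$1B", "$150M", "$1.5B"], name="VALUATION")
as_int = valuation_to_int(valuations)
```

The nullable Int64 dtype is used so that entries that do not match the pattern end up as missing values instead of raising an error.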

Related

Filter Pandas dataframe into excel spreadsheet

I have an Excel spreadsheet with the following columns:
I want to group this data by vendor and show all transaction and amount data for that vendor by Type (i.e. Wireless, Bonus, etc.). For example, it should show all data for vendor 'A' classified by 'Type'. Once done, it should export this to separate Excel files (i.e. for vendor 'A', 3 Excel files are created, one for each revenue type: Wireless, Bonus and Gift). I tried using the pandas groupby function, but it seems to require an aggregation, which doesn't help solve the problem.
Can anyone provide any guidance or input on how to solve this?
I propose the following steps: get the unique combinations of Vendor and Type, loop through them, filter your dataframe for each combination, and export each filtered dataframe to its own Excel file.
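The steps above can be sketched as follows. Iterating over a groupby object yields each (Vendor, Type) subset without any aggregation, which covers both the "unique combinations" and the "filter" step. The column names Vendor, Type and Amount, the helper names, and the output file naming are assumptions for illustration, since the original column list is not shown. Writing .xlsx files also assumes an Excel engine such as openpyxl is installed.

```python
from pathlib import Path

import pandas as pd


def split_by_vendor_and_type(df: pd.DataFrame) -> dict:
    """Return one sub-dataframe per unique (Vendor, Type) combination."""
    return {key: group for key, group in df.groupby(["Vendor", "Type"])}


def export_groups(groups: dict, out_dir: str) -> None:
    """Write each sub-dataframe to its own Excel file (needs e.g. openpyxl)."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    for (vendor, revenue_type), group in groups.items():
        group.to_excel(target / f"{vendor}_{revenue_type}.xlsx", index=False)


# Hypothetical sample data standing in for the spreadsheet.
transactions = pd.DataFrame(
    {
        "Vendor": ["A", "A", "A", "B"],
        "Type": ["Wireless", "Bonus", "Wireless", "Gift"],
        "Amount": [10, 20, 30, 40],
    }
)
groups = split_by_vendor_and_type(transactions)
# export_groups(groups, "vendor_reports")  # uncomment to write the files
```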

From Snowflake query to python dataframe problem in data type mapping

I have a problem casting data types from a Snowflake query into a pandas dataframe. I want to use the smallest data types possible in order to reduce my memory usage. Thus, I would like my integer columns in pandas to be int8 or int16 instead of the int64 that I get at the moment.
My problem is the following:
From the Snowflake documentation, we can read that:
a Snowflake NUMBER can be cast to any integer type in Python
I tried to cast my int columns in the following manner:
select myvariable::smallint
from mytable
and
select myvariable::number(5,0) -- totally arbitrary precision, depends on the column
from mytable
which are supposed to be smaller data types.
However, neither of those solutions works, and I still have int64 types in my pandas dataframe.
I did not see any parameter to add to my Snowflake connector or to the query. I know I can cast the data type directly in Python, but I would like the transformation to be done directly in Snowflake.
If anyone knows a solution for this, I would be very interested.
Snowflake's INTEGER data types are all INT64 actually.
The various names (SMALLINT like in your example) are to simplify porting from other systems and to suggest the expected range of values for a column of the specified type, but they are still INT64.
For more information have a look here.
You will have to cast the data type directly in Python.
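A minimal sketch of casting on the Python side, assuming the query result is already in a dataframe: pd.to_numeric with downcast="integer" picks the smallest integer type that can hold each column's values. The helper name downcast_integers and the sample columns are hypothetical.

```python
import pandas as pd


def downcast_integers(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with integer columns shrunk to the smallest fitting dtype."""
    out = df.copy()
    for col in out.select_dtypes(include="integer").columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")
    return out


# Hypothetical stand-in for a dataframe fetched from Snowflake.
fetched = pd.DataFrame(
    {
        "small": [1, 2, 3],
        "medium": [1000, 2000, 30000],
        "label": ["a", "b", "c"],
    }
)
compact = downcast_integers(fetched)
```

Note that the chosen dtype depends on the values currently in the column, so a later batch with larger values may need a wider type.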

How to convert different currencies to single (dollars) in python, pandas

I have the following task. In an Excel file, one column contains amounts in different currencies. I need to convert all values to dollars (at the exchange rate) and keep only the numbers, without the currency name. Please tell me how to do it.
SGD85,800,000
CA$70,000,000
₹960,000,000
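One way this could be approached, as a sketch only: split each value into a currency prefix and a number, then multiply by a rate looked up from a table. The helper to_usd, the regular expression, and especially the values in RATES_TO_USD are made-up placeholders and not real exchange rates; actual rates would have to come from whatever source the task prescribes.

```python
import re

import pandas as pd

# PLACEHOLDER rates (USD per one unit of currency), NOT real exchange rates.
RATES_TO_USD = {"SGD": 0.5, "CA$": 2.0, "₹": 0.25, "$": 1.0}

# Prefix = everything before the first digit; amount = digits with commas.
PATTERN = re.compile(r"^\s*(?P<prefix>[^\d\s]+)\s*(?P<amount>[\d,]+(?:\.\d+)?)\s*$")


def to_usd(value, rates=RATES_TO_USD) -> float:
    """Convert a string such as 'SGD85,800,000' to a dollar amount."""
    match = PATTERN.match(str(value))
    if match is None or match["prefix"] not in rates:
        return float("nan")  # unknown format or currency
    amount = float(match["amount"].replace(",", ""))
    return amount * rates[match["prefix"]]


amounts = pd.Series(["SGD85,800,000", "CA$70,000,000", "₹960,000,000"])
in_dollars = amounts.map(to_usd)
```

Values with an unrecognised prefix come back as NaN, which makes it easy to spot currencies missing from the rate table.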

In which datatype should i store following string for machine learning models

I have a column in a pandas dataframe with dtype object.
Which data type should I convert the column values to so that my machine learning model can use them?
One of the string values in the column:
'000127127124188187186188184XXX194163164XXX14'
I cannot convert it to int64 because it contains
'XXX'
The string should be converted to a suitable data type.
These IDs are just identifiers (nominal labels), like your employee ID in any organisation.
They are not useful for model prediction.
Example:
Let's say you have employee data that includes an employee ID, and you want to predict salary.
Each employee ID maps to a different salary, so the ID shows no trend with respect to salary and is useless as a feature.
An 'object' data type refers to a string, list, dict, etc., that is, anything that is not a numerical data type such as int or float.
Machine learning models can only work with numerical data (int, float, etc.), not object data types, because they rely on mathematical equations. Object columns therefore have to be 'encoded', or in simple terms converted to a numerical data type, using one of several available approaches such as label encoding or one-hot encoding.
So for your dataset, depending on the column, you have to convert these values to numerical data types using one of the above approaches.
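As a rough illustration of the two encodings mentioned above, here is a sketch using plain pandas. The color column is a hypothetical categorical feature chosen for the example; it is not the ID column from the question, which, per the other answer, is probably better dropped than encoded.

```python
import pandas as pd

# Hypothetical categorical column.
df = pd.DataFrame({"color": ["red", "green", "red", "blue"]})

# Label encoding: each distinct value gets an integer code,
# assigned in order of first appearance.
codes, uniques = pd.factorize(df["color"])
df["color_code"] = codes

# One-hot encoding: one 0/1 indicator column per distinct value.
one_hot = pd.get_dummies(df["color"], prefix="color", dtype=int)
```

Label encoding implies an ordering between the codes, so one-hot encoding is often the safer choice for values that have no natural order.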

Need help Manipulating a dataset for panelOLS regression in Python

My dataset is in an odd format and I have no clue how to fix it (I have tried and read a lot of similar questions but to no avail). Each column is a firm name (e.g. AAPL, AMZN, FB) and the first row is a list of each category of data. Basically each column has a firm name, then the entry below is a code (e.g. trading volume, market value, price), and then the respective data with an index of dates (monthly). How can I appropriately manipulate this so I can filter the data for a panel regression? Example: using each column of trading volume regressed on each column of earnings per share?
It sounds like you may need to learn how to select columns from a pandas MultiIndex, and perhaps how to create a MultiIndex. You may also benefit from learning how to reshape your data into a long format in order to run your panel regression.
If you provide a small sample of your data with the correct format, it will be easier to provide more specifics.
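In the meantime, here is a sketch of the general idea under assumed data: the firm names and the data codes form a two-level column MultiIndex, and stacking the firm level produces a long (firm, date) panel with one column per code. The tickers, the codes volume and eps, and all numbers are invented for illustration. The (entity, time) index order is what panel packages such as linearmodels commonly expect, but that is an assumption to check against the package you use.

```python
import pandas as pd

# Invented sample mimicking the described layout: firm on top, code below.
dates = pd.to_datetime(["2020-01-31", "2020-02-29"])
columns = pd.MultiIndex.from_tuples(
    [("AAPL", "volume"), ("AAPL", "eps"), ("AMZN", "volume"), ("AMZN", "eps")],
    names=["firm", "code"],
)
wide = pd.DataFrame(
    [[100.0, 1.0, 200.0, 2.0], [110.0, 1.1, 210.0, 2.1]],
    index=dates,
    columns=columns,
)
wide.index.name = "date"

# Selecting one code across all firms from the column MultiIndex.
volume_only = wide.xs("volume", axis=1, level="code")

# Move the firm level from the columns into the index, then order the
# index as (firm, date) so each row is one firm-month observation.
panel = wide.stack(level="firm").swaplevel().sort_index()
```

With the data in this shape, regressing trading volume on earnings per share becomes a matter of picking the volume and eps columns of panel.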
