How to convert different currencies to single (dollars) in python, pandas


I have the following task. A column in an Excel file contains amounts in different currencies. I need to convert all values to dollars (at the exchange rate), keeping only the numbers and not the currency name. How can I do this?
SGD85,800,000
CA$70,000,000
₹960,000,000
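
One possible approach in pandas is to split each value into its currency prefix and its number, then multiply by a rate. The sketch below is only an illustration: the rates in RATES_TO_USD are made-up placeholders, not real exchange rates, and the column name "amount" and the helper to_usd are hypothetical names chosen for the example.

```python
import re

import pandas as pd

# Placeholder rates (USD per one unit of currency). These are NOT real
# exchange rates and must be replaced with the rates you actually need.
RATES_TO_USD = {"SGD": 0.74, "CA$": 0.73, "₹": 0.012, "$": 1.0}

# Currency prefix (anything that is not a digit or whitespace), then the number.
PATTERN = re.compile(r"^\s*([^\d\s]+)\s*([\d,]+(?:\.\d+)?)\s*$")


def to_usd(text):
    """Convert a string such as 'CA$70,000,000' to a plain dollar amount."""
    match = PATTERN.match(str(text))
    if match is None:
        raise ValueError(f"cannot parse amount: {text!r}")
    prefix, number = match.groups()
    if prefix not in RATES_TO_USD:
        raise ValueError(f"no rate for currency prefix: {prefix!r}")
    return float(number.replace(",", "")) * RATES_TO_USD[prefix]


# In practice the frame would come from pd.read_excel(...).
df = pd.DataFrame({"amount": ["SGD85,800,000", "CA$70,000,000", "₹960,000,000"]})
df["usd"] = df["amount"].apply(to_usd)
```

Raising on an unknown prefix, rather than silently returning NaN, makes it obvious when a currency is missing from the rate table.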

Related

How to read, groupby and calculate over a large CSV file in Python

I have a large CSV file (300 MB) with data about accidents by pincode/zipcode. The file has a header row followed by comma-separated values. The key fields are Month, Date, Year, Pincode, Count.
Count is the accident count for that pincode, but each pincode can get several entries through the day, say every few hours. So I want to be able to calculate the max accidents per pincode on a given date, i.e. I need to group by Month, Date, Year, Pincode and then sum over Count after grouping?
I have an idea of how to do this if I loaded the large-ish file into a database or a cloud service such as GCP BigQuery, but I want to be able to do this with Python/Pandas dataframes and then store the metrics I am calculating in a table. Is this approach possible with Pandas? If not, then PySpark is possibly my last option, but that involves the overhead of having to set up Hadoop etc.
I am open to any other ideas as I am a PyNovice :)
Thank you

You can sign up for Databricks Community Edition (for free), which gives you a Spark-ready environment, and it is easy enough to upload your CSV file there.
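
A file of this size may also be workable in pandas alone. The following is a minimal sketch, assuming the column names are exactly Month, Date, Year, Pincode and Count as listed in the question; the function name daily_totals and the sample data are hypothetical. It reads the CSV in chunks, sums Count per group within each chunk, and then sums the partial results again so that groups split across chunks are combined.

```python
import io

import pandas as pd

KEYS = ["Month", "Date", "Year", "Pincode"]


def daily_totals(csv_source, chunksize=100_000):
    """Sum Count per (Month, Date, Year, Pincode) without loading the whole file."""
    partials = []
    for chunk in pd.read_csv(csv_source, chunksize=chunksize):
        # Partial sums for the groups seen in this chunk.
        partials.append(chunk.groupby(KEYS)["Count"].sum())
    # A group may appear in several chunks, so sum the partial sums again.
    return pd.concat(partials).groupby(level=KEYS).sum().reset_index()


# Tiny made-up sample standing in for the real file.
SAMPLE = """Month,Date,Year,Pincode,Count
1,15,2020,560001,2
1,15,2020,560001,3
1,15,2020,560002,1
1,16,2020,560001,4
"""

totals = daily_totals(io.StringIO(SAMPLE), chunksize=2)
```

The resulting frame has one row per date and pincode, and could then be written to a table or reduced further (for example with another groupby to find the busiest day per pincode).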

How do I convert the value of a pyspark dataframe column?

I have a column in a pyspark dataframe for the age of an electronic device, and these values are given in milliseconds. Is there an easy way to convert that column's values to years? I am not well versed in Spark.
EDIT: I understand that you can convert milliseconds to years pretty easily with basic math, I'm trying to take a column of a pyspark dataframe and iterate through it and convert all column values to a different value. Is there a specific pyspark function that makes this easier or no? I have a column where all values are very large integers with time in milliseconds, and I am trying to filter out values which are too small or large to make sense based on the lifespan of the device.
table.filter(F.col("age")>0).filter(F.col("age")<yearsToSeconds(20))
where yearsToSeconds is a very basic function converting years to seconds. I'd prefer being able to convert the column values to years, but I haven't worked with spark before and I don't know an optimal way to do that.

Well, one way is to use withColumn.
Here I'm adding a new column called "ageinMin" to the dataframe, calculated from the "age" column by dividing it by 60000 (the number of milliseconds in a minute) to get the equivalent minutes:
from pyspark.sql.functions import col
df.withColumn("ageinMin", col("age") / 60000)
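
The same idea can be extended to years and combined with the filter from the question. This is a rough sketch, assuming the "age" column holds milliseconds as the question states and taking a year as 365.25 days; the names MS_PER_YEAR, ms_to_years, with_plausible_age and age_years are hypothetical, and df is assumed to be a PySpark DataFrame.

```python
# One year taken as 365.25 days; this is an approximation.
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000


def ms_to_years(ms):
    """Plain-Python version of the conversion, handy for checking values."""
    return ms / MS_PER_YEAR


def with_plausible_age(df, ms_col="age", max_years=20):
    """Add an 'age_years' column to a PySpark DataFrame and keep sane rows.

    df is assumed to be a pyspark.sql.DataFrame whose ms_col holds
    milliseconds; arithmetic on df[ms_col] is applied column-wise by Spark.
    """
    with_years = df.withColumn("age_years", df[ms_col] / MS_PER_YEAR)
    return with_years.filter(
        (with_years["age_years"] > 0) & (with_years["age_years"] < max_years)
    )
```

Converting first and filtering on years keeps the threshold readable, and avoids mixing units such as seconds and milliseconds in the comparison.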

Working with Hours & Minutes in Python & openpyxl - Excel Formatting vs Strings & Ints

Hoping someone with some openpyxl or general Excel experience might be able to help.
I'm working on a project to record flying hours, and produce an Excel spreadsheet of flights completed in a month.
So far, I've used PySimpleGUI to create a nice front end, and got it working so it stores each flight's details as a dictionary, where the keys are terms like the names of the crew, the aircraft registration and so on. Each flight is separately stored in a dictionary for the current month.
To make sure the hours flown make sense, I've used number spinners so they can't get nonsense inputs. Each type of flying hour is recorded as 2 keys, one for hours and one for minutes. So the dictionary has a section with parts like:
'-firstPilotHours-': 1,
'-firstPilotMins-': 30,
'-captainHours-': 1,
'-captainMins-': 30,
.. and so on.
I've managed to get these put into Excel by converting them to strings and then concatenating them with a colon in the middle:
ws1.cell(row=sortieIdent, column=9).value = str(currentMonth[sortie]["-captainHours-"]) + ":" + str(currentMonth[sortie]["-captainMins-"])
... so it appears as "1:30" in Excel, which is the way I used to input the data when I ran a manual Excel file for this purpose.
The cell's number format is set as "[h]:mm" to allow me to perform calculations on the values as hours and minutes, so there can be a monthly total shown and so on.
However, this is the point where I'm stuck. I think because I'm converting them to strings, even though they look like "1:30" in Excel, they're being handled in Excel like a string and not an integer. It's not possible to perform any calculations with them. If I overtype them in Excel with "1:30," then they move to the right hand side of the column and start behaving like numbers.
I can't think of any way to get these into Excel in a manner where I can carry out calculations on them. Can anyone help?
I've thought about having separate columns for the hours and minutes, but I can't figure out how to work calculations in that manner either. I also thought about just displaying them as strings as it works now, but doing the calculations in Python; but I can't figure out how to do proper "hours & minutes" calculations within Python.

Hopefully this gives you some insight into Excel's date/time processing.
For a given decimal number that represents date/time...
The left side of the decimal represents the days since 1/0/1900.
The right side of a decimal number represents time.
Practical Example:
The date and time I am writing this post is: 24 May 2020 at 8:02pm
Excel's underlying value is: 43975.8347222222
43975 days since 1/0/1900 = 24 May 2020
.8347222222 = Decimal portion of 24 hours
With that foundation in place (let me know if it isn't clear), we can tackle your example.
To express 1:30 in Excel's terms, we need to turn 1 hour and 30 minutes into a fraction of a day:
1.5 hours / 24 hours = .0625
Your 1:30 = .0625 in Excel
To show .0625, you can Format the cell as "hh:mm" or whatever you want using the custom format.
Custom Format = Display value to the user
.0625 = Underlying value Excel uses to calculate
Hope this nudges you further down the road.
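
Applied to openpyxl, this suggests writing the fraction of a day as a number and letting the number format handle the display. The sketch below is one way this might look; the helper names to_excel_days and write_duration are hypothetical, and ws is assumed to be an openpyxl worksheet.

```python
from datetime import timedelta


def to_excel_days(hours, minutes):
    """Return a duration as a fraction of a day, which is what Excel stores."""
    return timedelta(hours=hours, minutes=minutes) / timedelta(days=1)


def write_duration(ws, row, column, hours, minutes):
    """Write a duration into an openpyxl worksheet as a number, not a string."""
    cell = ws.cell(row=row, column=column, value=to_excel_days(hours, minutes))
    # Display format only; the stored value stays numeric so Excel can sum it.
    cell.number_format = "[h]:mm"
    return cell
```

With the data from the question, this might be called as write_duration(ws1, sortieIdent, 9, currentMonth[sortie]["-captainHours-"], currentMonth[sortie]["-captainMins-"]). Because the cell then holds a number, a monthly total formatted as "[h]:mm" should add up as hours and minutes.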

Keep formatting of Excel file when loading into DataFrame in Python

I'm trying to load an Excel file into Python and keep the formatting of the columns/data. I have numbers which are stored as text, but Python changes the formatting to numbers, which causes an issue: it only shows 15 significant digits of a number and changes the digits after the 15th place to 0 (as Excel would do). I would like to keep the numbers as text so that I have all the digits.
I'm using:
myContractData = pd.read_excel(Path)
Thanks a lot for your help.
Alex

If your case is based on loading data from an Excel file, you first need to make sure the numbers in the Excel file really do have more than 15 digits.
As far as I know, in old Office versions Excel only handles 15 significant digits of precision: https://support.microsoft.com/en-us/kb/78113
If you use Excel to load the number, it is rounded to 15 digits automatically. That's why, for numbers in Excel files, we can only see 15 digits of the value.
If possible, I would suggest connecting to the data source directly in Python, or converting the values to strings in the data source.
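
If the values are genuinely stored as text in the workbook, asking pandas to read those columns as strings may be enough to keep every digit. A minimal sketch, assuming a hypothetical helper read_keeping_text and that you know the names of the affected columns; it relies on the dtype parameter of pd.read_excel. It cannot recover digits that Excel already rounded away when a value was stored as a number.

```python
import pandas as pd


def read_keeping_text(path, text_columns):
    """Load an Excel sheet, reading the named columns as strings."""
    # Map each affected column to str so pandas does not convert it to a number.
    dtype = {name: str for name in text_columns}
    return pd.read_excel(path, dtype=dtype)
```

For the code in the question this might be used as myContractData = read_keeping_text(Path, ["ContractNo"]), where "ContractNo" is a made-up column name standing in for the real one.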

How to read office 2010 excelfile using openpyxl without changing style

I am new to Python and I want to read an Office 2010 Excel file without changing its style. Currently it works fine, but it changes the date format. I want the dates as they are in the Excel file.

"i want it as they are in excel file."
A date is recorded in an Excel file (both 2007+ XLSX files and earlier XLS files) as a floating point number of days (and fraction thereof) since some date in 1899/1900 or 1904. Only the "number format" that is recorded against the cell can be used to distinguish whether a date or a number was intended.
You will need to be able to retrieve the actual float value and the "number format" and apply the format to the float value. If the "number format" being used is one of the standard ones, this should be easy enough to do. Customised number formats are another matter. Locale-dependent formats likewise.
To get detailed help, you will need to give examples of what raw data you have got and what you want to "see" and how it is now being presented ("changing date format").
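
To illustrate the first half of that, turning the stored float back into a date, here is a small sketch using only the standard library. It assumes the workbook uses Excel's default 1900 date system (not the 1904 one), and the names EXCEL_1900_EPOCH, serial_to_datetime and render are hypothetical. Mapping an Excel number format to a strftime pattern is left to the caller, since that depends on the formats in the file.

```python
from datetime import datetime, timedelta

# Epoch for Excel's default 1900 date system. Using 30 Dec 1899 compensates
# for Excel treating 1900 as a leap year, so results are only correct for
# serial numbers from 61 (1 March 1900) onwards.
EXCEL_1900_EPOCH = datetime(1899, 12, 30)


def serial_to_datetime(serial):
    """Convert an Excel serial number (days, with a fraction for time)."""
    return EXCEL_1900_EPOCH + timedelta(days=serial)


def render(serial, strftime_format="%Y-%m-%d %H:%M"):
    """Format a serial number, rounding to the nearest minute first."""
    # Adding 30 seconds before dropping the seconds rounds to the nearest minute.
    moment = serial_to_datetime(serial) + timedelta(seconds=30)
    return moment.strftime(strftime_format)
```

For example, render(43975.8347222222) gives "2020-05-24 20:02", and passing a different strftime_format changes only how the same underlying value is displayed.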
