How to strip a comma in the middle of a large number? - python

I want to convert a number stored as a str into a float or int. However, the conversion raises an error, so I am trying to remove the comma first. The comma is not being removed, so I need some way to target its position in the number, say the fourth character.
power4 = power[power.get('Number of Customers Affected') != 'Unknown']
power5 = power4[pd.notnull(power4['Number of Customers Affected'])]
power6 = power5[power5.get('NERC Region') == 'RFC']
power7 = power6.get('Number of Customers Affected').loc[1]
power8 = power7.strip(",")
power9 = float(power8)
ValueError                                Traceback (most recent call last)
<ipython-input-70-32ca4deb9734> in <module>
      6 power7 = power6.get('Number of Customers Affected').loc[1]
      7 power8 = power7.strip(",")
----> 8 power9 = float(power8)
      9
     10
ValueError: could not convert string to float: '127,000'

Use replace()
float('127,000'.replace(',',''))

Have you tried pandas.to_numeric?
import pandas as pd
a = '1234'
type(a)
a = pd.to_numeric(a)
type(a)

In the line
power8 = power7.strip(",")
use
power8 = power7.replace(',', '')
strip() will not work here because it only removes characters from the beginning and end of a string, never from the middle. What is needed is the string's replace() method. You may also try
''.join(e for e in s if e.isdigit())
Or,
s = ''.join(s.split(','))
A regular expression is another way to solve this, or you can have a look at this answer: https://stackoverflow.com/a/266162/9851541
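To make the difference between strip() and replace() concrete, here is a minimal sketch using the '127,000' value from the traceback. The helper name to_number is hypothetical and only there for illustration.

```python
raw = "127,000"

# strip() only trims matching characters from both ends of the string,
# so a comma in the middle is left untouched.
print(raw.strip(","))        # 127,000

# replace() substitutes every occurrence, wherever it appears.
print(raw.replace(",", ""))  # 127000


def to_number(text):
    """Hypothetical helper: drop thousands separators, then convert."""
    return float(text.replace(",", ""))


print(to_number(raw))        # 127000.0
```

Note that the isdigit() variant above would also discard a decimal point or a minus sign, so replace() is the safer choice when the values may not be plain positive integers.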

Related

How to convert str to a float?

I imported a list of floats stored as strings and tried to convert them to floats, but this error kept popping up:
Traceback (most recent call last):
File "c:\Users\peter\Documents\coding\projects\LineFitting.py", line 12, in <module>
StockPriceFile = float(value.strip(''))
ValueError: could not convert string to float:
This is what I did to try to convert the list:
#1
for value in range(0, len(StockPriceFile)):
    StockPriceFile[value] = float(StockPriceFile[value])
#2
for value in StockPriceFile:
    value = float(value)
#3
StockPriceFile[0] = StockPriceFile[0].strip('[]')
for value in StockPriceFile:
    StockPriceFile = float(value.strip(''))
(Sample Of Data)
['[36800.]', '36816.666666666664', '36816.666666666664', '36833.333333333336', '36866.666666666664']
Where it is being written:
Data_AvgFile.write(str(Average) + ',')
What does this mean, and how can I fix it? It works fine when I convert the values one by one.
(Also tell me if you need more data; I don't know if this is sufficient.)
for value in StockPriceFile:
    stock_price = float(value.strip('[]'))
    print(stock_price)
strip() will remove the [] characters around the value.
As long as the brackets "[ ]" are in your string, you can't convert it to a number, because they make it invalid. The same goes for letters and most symbols; the dot (.) is an exception for floats.
>>> print(float('[36800.]'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: '[36800.]'
>>> print(float('36800.'))
36800.0
l = ['[36800.]', '36816.666666666664', '36816.666666666664', '36833.333333333336', '36866.666666666664']
[float(f.strip('[]')) for f in l]
Output:
[36800.0,
36816.666666666664,
36816.666666666664,
36833.333333333336,
36866.666666666664]
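The empty string in the original error message suggests that one of the list entries was blank. That could happen if the file ends with a trailing comma and is later split on ',', but that is an assumption, since the reading code is not shown. A minimal sketch that strips the brackets and skips blank entries (the variable names are illustrative):

```python
# Sample data from the question, plus an empty trailing entry of the kind
# that splitting "a,b," on ',' would produce (an assumption about the file).
stock_prices = ['[36800.]', '36816.666666666664', '36833.333333333336', '']

parsed = []
for value in stock_prices:
    cleaned = value.strip('[] \n')   # trim brackets and whitespace from the ends
    if cleaned:                      # float('') raises ValueError, so skip blanks
        parsed.append(float(cleaned))

print(parsed)  # [36800.0, 36816.666666666664, 36833.333333333336]
```

Building a new list also avoids the pitfall in attempt #2, where assigning to the loop variable never changes the list itself.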

splitting multiple lines using str.split

I am trying to split a multi-line paragraph using str.split. Splitting a single line works correctly. Is str.split the correct way to split multiple lines, and what am I missing here?
Single-line split, working correctly:
dmap_lines = """Nople Normal Altar1-truck-Altar2,Altar2-train-Cansomme,Cansomme-flight-Karoh,Karoh-truck-Nople"""
destinations = []
remainders1 = []
stages = []
for line in dmap_lines:
    destination, remainder1, remainder = dmap_lines.split(' ')
    destinations.append(destination)
    remainders1.append(remainder1)
    remainder = remainder.split(',')
    stages.append(remainder)
    print(destination)
    print(remainder1)
    print(type(remainder))
    print(remainder)
Expected Output:
Nople
Normal
<class 'list'>
['Altar1-truck-Altar2', 'Altar2-train-Cansomme', 'Cansomme-flight-Karoh', 'Karoh-truck-Nople']
With multi-line input:
dmap_lines = """Nople Normal Altar1-truck-Altar2,Altar2-train-Cansomme,Cansomme-flight-Karoh,Karoh-truck-Nople\nDria Normal Altar1-truck-Altar2,Altar2-train-Mala1,Mala1-truck-Mala2,Mala2-flight-Dria"""
destinations = []
remainders1 = []
stages = []
for line in dmap_lines:
    destination, remainder1, remainder = dmap_lines.split(' ')
    destinations.append(destination)
    remainders1.append(remainder1)
    remainder = remainder.split(',')
    stages.append(remainder)
    print(destination)
    print(remainder1)
    print(type(remainder))
    print(remainder)
Receiving error in output:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-78-9eb9f8fa1c64> in <module>
4 stages = []
5 for line in dmap_lines:
----> 6 destination, remainder1, remainder = dmap_lines.split(' ')
7 destinations.append(destination)
8 remainders1.append(remainder1)
ValueError: too many values to unpack (expected 3)
Expected output:
Nople
Normal
<class 'list'>
['Altar1-truck-Altar2', 'Altar2-train-Cansomme', 'Cansomme-flight-Karoh', 'Karoh-truck-Nople']
Dria
Normal
<class 'list'>
['Altar1-truck-Altar2,Altar2-train-Mala1,Mala1-truck-Mala2,Mala2-flight-Dria']
Why is the for loop not iterating over multiple lines and splitting the string into the sections?
Is there any reason why re.findall would not be a better approach here?
import re
dmap_lines = "Nople Normal Altar1-truck-Altar2,Altar2-train-Cansomme,Cansomme-flight-Karoh,Karoh-truck-Nople"
matches = re.findall(r'\b\w+-\w+-\w+\b', dmap_lines)
print(matches)
This prints:
['Altar1-truck-Altar2', 'Altar2-train-Cansomme', 'Cansomme-flight-Karoh',
'Karoh-truck-Nople']
To get a single CSV string, use join:
csv = ','.join(matches)
print(csv)
This prints:
Altar1-truck-Altar2,Altar2-train-Cansomme,Cansomme-flight-Karoh,Karoh-truck-Nople
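As for why the original loop fails: iterating over a str yields single characters, and dmap_lines.split(' ') splits the whole text rather than the current line, so the two-line input produces five pieces instead of three. A minimal sketch that stays with str.split, assuming each line always has the form "destination kind stages" (the name kinds is illustrative):

```python
dmap_lines = (
    "Nople Normal Altar1-truck-Altar2,Altar2-train-Cansomme,"
    "Cansomme-flight-Karoh,Karoh-truck-Nople\n"
    "Dria Normal Altar1-truck-Altar2,Altar2-train-Mala1,"
    "Mala1-truck-Mala2,Mala2-flight-Dria"
)

destinations = []
kinds = []
stages = []

# Iterating over a str yields single characters, so split into lines first.
for line in dmap_lines.splitlines():
    # Split the current line (not the whole text) into exactly three fields.
    destination, kind, remainder = line.split(' ', 2)
    destinations.append(destination)
    kinds.append(kind)
    stages.append(remainder.split(','))

print(destinations)  # ['Nople', 'Dria']
print(stages[1])
```

Passing 2 as the second argument to split() caps the number of splits, so the unpacking still works if the stage list ever contains a space.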

How to format a Python string with multiple characters as the padding part

This is all good to pad a single character:
>>> '{:{pad}>{num}}'.format('12345',num='10', pad='a')
aaaaa12345
However, how can I print abcab12345, using 'abc' as the padding characters?
This fails:
>>> '{:{pad}>{num}}'.format('12345',num='10', pad='abc')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-38-85d5680ad88a> in <module>()
----> 1 '{:{pad}>{num}}'.format('hello',num='10', pad='abc')
ValueError: Invalid format specifier
I like the mini format language in python3 BTW ;-)
You should chain the format operations (tip: you need to escape your {} by doubling them, {{}}, once for each nesting level of format):
baseFmtStr = "'{{{{:{{pad}}>{num}}}}}'"
resultStr = baseFmtStr.format(num=10).format(pad='-').format(12345)
This gives the result '-----12345'.
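The fill in a format spec can only be a single character, so a multi-character pattern such as 'abc' has to be built by hand. A minimal sketch, where pad_left is a hypothetical helper and not something the format mini-language provides:

```python
def pad_left(text, width, pad):
    """Hypothetical helper: left-pad text to width by cycling through pad."""
    missing = width - len(text)
    if missing <= 0 or not pad:
        return text
    # Repeat the pattern enough times, then cut it to the exact length needed.
    repeats = missing // len(pad) + 1
    return (pad * repeats)[:missing] + text


print(pad_left('12345', 10, 'abc'))  # abcab12345
print(pad_left('12345', 10, 'a'))    # aaaaa12345
```

Slicing the repeated pattern keeps the padding aligned to the start of 'abc', which matches the abcab12345 output asked for in the question.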

Summing a column in csv using Python

I work with large csv files and wanted to test if we can sum a numeric
column using Python. I generated a random data set:
id,first_name,last_name,email,gender,money
1,Clifford,Casterou,ccasterou0#dropbox.com,Male,53
2,Ethyl,Millichap,emillichap1#miitbeian.gov.cn,Female,58
3,Jessy,Stert,jstert2#gnu.org,Female,
4,Doy,Beviss,dbeviss3#dedecms.com,Male,80
5,Josee,Rust,jrust4#epa.gov,Female,13
6,Hedvige,Ahlf,hahlf5#vkontakte.ru,Female,67
On line 3 you will notice that the value is missing (I removed that data on
purpose to test).
I wrote this code:
import csv
with open("mock_7.txt","r+",encoding='utf8') as fin:
    headerline = fin.readline()
    amount = 0
    debit = 0
    value = 0
    for row in csv.reader(fin):
        # var = row.rstrip()
        value =row[5].replace('',0)
        value= float(value)
        debit+=value
    print (debit)
I got the error :
Traceback (most recent call last):
File "sum_csv1_v2.py", line 11, in <module>
value+= float(value)
TypeError: must be str, not float
As I am new to Python, my plan was to replace the empty cells with zero, but I think I am missing something here. Also, my script assumes comma-separated files, and I'm sure it won't work for files with other delimiters. Can you help me improve this code?
The original exception, now lost in the edit history,
TypeError: replace() argument 2 must be str, not int
is the result of str.replace() expecting string arguments, but you're passing an integer zero. Instead of replace you could simply check for empty string before conversion:
value = row[5]
value = float(value) if value else 0.0
Another option is to catch the potential ValueError:
try:
    value = float(row[5])
except ValueError:
    value = 0.0
This might hide the fact that the column contains "invalid" values other than just missing values.
Note that had you passed string arguments the end result would probably not have been what you expected:
In [2]: '123'.replace('', '0')
Out[2]: '0102030'
In [3]: float(_)
Out[3]: 102030.0
As you can see an empty string as the "needle" ends up replacing around each and every character in the string.
The latest exception in the question, after fixing the other errors, is the result of the float(value) conversion working and
value += float(value)
being equal to:
value = value + float(value)
and as the exception states, strings and floats don't mix.
The problem with your code is that you call replace() without first checking whether row[5] is empty.
Fixed code:
import csv
with open("mock_7.txt", "r", encoding='utf8') as fin:
    headerline = fin.readline()  # skip the header row
    debit = 0
    for row in csv.reader(fin):
        # treat an empty cell as zero
        value = float(row[5]) if row[5].strip() else 0.0
        debit += value
    print(debit)
output:
271.0
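On the question of other delimiters, here is a minimal sketch using csv.DictReader with a configurable delimiter. The function name sum_column is hypothetical, and io.StringIO stands in for the real file so the example is self-contained. For the sample rows shown in the question, the money column adds up to 271.0.

```python
import csv
import io

# Sample rows from the question; io.StringIO stands in for the real file.
sample = io.StringIO(
    "id,first_name,last_name,email,gender,money\n"
    "1,Clifford,Casterou,ccasterou0#dropbox.com,Male,53\n"
    "2,Ethyl,Millichap,emillichap1#miitbeian.gov.cn,Female,58\n"
    "3,Jessy,Stert,jstert2#gnu.org,Female,\n"
    "4,Doy,Beviss,dbeviss3#dedecms.com,Male,80\n"
    "5,Josee,Rust,jrust4#epa.gov,Female,13\n"
    "6,Hedvige,Ahlf,hahlf5#vkontakte.ru,Female,67\n"
)


def sum_column(handle, column, delimiter=','):
    """Hypothetical helper: add up one named column, treating blanks as 0."""
    total = 0.0
    for row in csv.DictReader(handle, delimiter=delimiter):
        cell = (row[column] or '').strip()
        if cell:
            total += float(cell)
    return total


print(sum_column(sample, 'money'))  # 271.0
```

For a tab- or semicolon-separated file, pass delimiter='\t' or delimiter=';'. Looking the column up by header name also means the code keeps working if the column order changes.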

Convert re.match/re.search to string

I've been looking through having re.match/re.search find a certain int from my file. The int will differ, which is why I'm using regex in the first place. Here is the file:
Money:
*1,000 coins
*2 dollars
And my code:
import re
amount = 2
price = 500 * amount
with open("money.txt", "r") as money:
    moneyc = money.read()
    moneyc = moneyc.strip("Money:")
    moneyc = re.search("(\*[^0,][0-9]{0,3})?(,[0-9]{3})?(,[0-9]{3})?", moneyc)
    moneyleft = re.sub("(\*[^0,][0-9]{0,3})?(,[0-9]{3})?(,[0-9]{3})? coins", "*"+str(int(moneyc.replace("*", "").replace(",", "")) - price)+" coins")
    money.write("Money\n"+moneyleft)
Returns the error:
Traceback (most recent call last):
File "C:/***/money.py", line 8, in <module>
moneyleft = re.sub("(\*[^0,][0-9]{0,3})?(,[0-9]{3})?(,[0-9]{3})? coins", "*"+str(int(moneyc.replace("*", "").replace(",", "")) - price)+" coins")
AttributeError: '_sre.SRE_Match' object has no attribute 'replace'
I understand this is because a regex match isn't a string, but I need to turn it into a string somehow. How would I go about it?
What I want the file to be afterwards is:
Money:
*0 coins
*2 dollars
This is because the price is 500 * amount, and amount is 2. I keep "coins" in my re.sub because the file also lists dollars.
You have a couple of issues there:
You open the file only for reading with the mode r; you should use r+.
Use the locale.atoi function to validate and convert comma-separated integers.
Take a look at this code:
import locale
import re
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
recoins = re.compile(r'\*(\S+) coins')
amount = 2
price = 500 * amount
with open('money.txt', 'r+') as money:
    text = money.read()
    coins = recoins.search(text).group(1)
    newcoins = locale.atoi(coins) - price
    money.seek(0)
    money.truncate()
    money.write(recoins.sub('*{:,} coins'.format(newcoins), text))
def Money1_to_Money2(i):
    Amount = i * 5
    print(Amount)

Money1_to_Money2(10)
This is a simple currency-to-currency conversion.
Put the amount of money you want converted in the parentheses of the call on the last line, and put the conversion factor where the 5 is. If you want it more organised, use your currency names instead of Money1 and Money2. i is the amount of money passed in the call, which is multiplied by the conversion factor.
The object returned from re.search function is a match object not a string.
That's why you get the error:
AttributeError: '_sre.SRE_Match' object has no attribute 'replace'
To get the matching string after using re.search try:
moneyc = moneyc.group()
Then moneyc.replace will work.
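Putting that together, here is a minimal sketch that works on an in-memory string instead of money.txt. The simplified pattern and the variable names are assumptions for illustration, not the pattern from the question.

```python
import re

text = "Money:\n*1,000 coins\n*2 dollars"
amount = 2
price = 500 * amount

# Simplified pattern (an assumption): '*', digits and commas, then ' coins'.
coins_pattern = re.compile(r'\*([\d,]+) coins')

match = coins_pattern.search(text)   # a match object, or None if nothing matched
if match is not None:
    coins = int(match.group(1).replace(',', ''))  # '1,000' -> 1000
    remaining = coins - price
    # Write the new amount back with thousands separators.
    text = coins_pattern.sub('*{:,} coins'.format(remaining), text, count=1)

print(text)
```

group(1) returns only the captured digits as a str, so the usual string methods such as replace() are available on it. Checking for None first avoids an AttributeError when the file has no coins line.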
