Format French number into English number - Python

I need to convert French-formatted numbers extracted from a .csv file into English-formatted numbers so I can use dataframe functions. The .csv gives:
Beta Alpha
2014-07-31 100 100
2014-08-01 99,55 100,01336806
2014-08-04 99,33 100,05348297
2014-08-05 99,63 100,06685818
2014-08-06 98,91 100,08023518
Values such as "99,55" and "100,01336806" are actually objects (strings) for Python.
I need to turn them into floats with the following format: "99.55" and "100.01336806".
I tried:
df = df.str.replace(to_replace =',', value = '.', case = False)
It doesn't give me any error for that line, but it doesn't switch the ',' into '.' either.
df = pd.to_numeric(df, error = 'coerce')
TypeError: arg must be a list, tuple, 1-d array, or Series
I also tried the regex module without success, and I would rather use a built-in function if possible.
Any help welcome!

What is the type of the source objects "99,55" and "100,01336806", and what type of target objects do you want?
The following was tested with Python 3.8.
Case 1: the source object is numeric and the target is a string. Format specifiers such as .2f only produce the "English" format, not the "French" one, so you have to substitute . and , afterwards.
Eg. (float) 99.55 -> (string) '99,55'
v1 = float(99.55)
f"{v1:.2f}"
'99.55'
f"{v1:.2f}".replace(".", ",")
'99,55'
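If you also want thousands separators (the , format option), a single .replace(".", ",") is not enough: 1234.5 would become '1,234.50' and then '1,234,50'. A minimal sketch that swaps both characters in one pass with str.translate, assuming a plain space as the French thousands separator (a common convention, though some locales use a no-break space); to_french is a hypothetical helper name:

```python
def to_french(value: float, decimals: int = 2) -> str:
    """Format a number the French way, e.g. 1234567.891 -> '1 234 567,89'."""
    # Start from the English format: ',' groups thousands, '.' marks decimals
    english = f"{value:,.{decimals}f}"
    # Swap both separators in a single pass so one replacement can't clobber the other
    return english.translate(str.maketrans({",": " ", ".": ","}))


print(to_french(99.55))        # 99,55
print(to_french(1234567.891))  # 1 234 567,89
```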
Case 2: the source is a string in "French" format and the target is a float. This means the , must be replaced by a . before converting the string to float.
Eg. (string) '99,55' -> (float) 99.55
v2 = "99,55"
float(v2.replace(",","."))
99.55
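Case 2 assumes the string has no thousands separators. A minimal sketch of the same conversion that also drops space-style grouping characters (the plain, no-break and narrow no-break spaces are assumptions about what your data may contain); from_french is a hypothetical name:

```python
def from_french(text: str) -> float:
    """Parse a French-formatted number such as '1 234,56' into a float."""
    # Drop possible thousands separators: space, no-break space, narrow no-break space
    for sep in (" ", "\u00a0", "\u202f"):
        text = text.replace(sep, "")
    # The decimal comma becomes a decimal point, which float() understands
    return float(text.replace(",", "."))


print(from_french("99,55"))         # 99.55
print(from_french("1\u00a0234,56"))  # 1234.56
```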

Try using the str.replace() method:
x = "100,01336806"
y = x.replace(",",".")
print(y)
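For the DataFrame in the question, note that the .str accessor and pd.to_numeric both work on one Series (one column) at a time, which is what the TypeError is complaining about. A sketch of two options, assuming the file uses ; as the field separator and has a first column named date (neither is shown in the question): let read_csv parse the decimal comma directly, or convert each string column with apply:

```python
import io

import pandas as pd

# Hypothetical CSV mirroring the question; ';' and the 'date' header are assumptions
csv_text = """date;Beta;Alpha
2014-07-31;100;100
2014-08-01;99,55;100,01336806
2014-08-04;99,33;100,05348297
"""

# Option 1: read_csv converts the decimal comma while parsing
df = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",", index_col="date")

# Option 2: fix a DataFrame whose columns were already read as strings (object dtype)
raw = pd.read_csv(io.StringIO(csv_text), sep=";", index_col="date", dtype=str)
fixed = raw.apply(lambda col: pd.to_numeric(col.str.replace(",", ".", regex=False)))

print(df.dtypes)
print(fixed.dtypes)
```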

Related

How to replace a substring in one column based on the string from another column?

I'm working with a dataset of Magic: The Gathering cards. If a card references its own name in its rules text, I want the name to be replaced with "This_Card". Here is what I've tried:
card_text['text_unnamed'] = card_text[['name', 'oracle_text']].apply(lambda x: x.oracle_text.replace(x.name, 'This_Card') if x.name in x.oracle_text else x, axis = 1)
This gives me the error "TypeError: 'in <string>' requires string as left operand, not int".
I've tried axis=1, axis=0 and no axis, and I still get errors.
When I edited my code to print x.name, it turned out to be just the int 2. I'm not sure why this is happening, since everything in the name column is a string. What is causing this interaction and how can I prevent it?
Here is a sample of my data.
Series.name is a built-in attribute: when apply passes each row with axis=1, the row's name is its index label (hence the int 2), so x.name doesn't access the column. Instead, use x['name'] to access the name column.
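A minimal sketch of the apply version with the column access fixed, on two made-up rows (the question's data sample isn't shown). Since str.replace leaves the text unchanged when the name doesn't occur, the if/else is not needed; note also that the original else branch returned the whole row x rather than a string:

```python
import pandas as pd

# Hypothetical sample rows standing in for the real card data
card_text = pd.DataFrame({
    "name": ["Llanowar Elves", "Shock"],
    "oracle_text": ["{T}: Add {G}.", "Shock deals 2 damage to any target."],
})

# With axis=1, x is a row Series whose .name is the index label (0, 1, ...),
# so the 'name' column has to be read with x["name"]
card_text["text_unnamed"] = card_text.apply(
    lambda x: x["oracle_text"].replace(x["name"], "This_Card"), axis=1
)
print(card_text["text_unnamed"].tolist())
```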
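A more efficient option is to replace conditionally with a mask rather than with apply. Note that str.contains accepts a single pattern rather than a whole column, and that assigning through a boolean mask needs .loc, so the mask is built row by row here:
m = [name in text for name, text in zip(card_text['name'], card_text['oracle_text'])]
card_text['text_unnamed'] = card_text['oracle_text']
card_text.loc[m, 'text_unnamed'] = [text.replace(name, 'This_Card') for name, text in zip(card_text.loc[m, 'name'], card_text.loc[m, 'oracle_text'])]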
x.name isn't always a string, so you can't perform <int> in <string>.
I can't say for sure without seeing the data,
but I guess adding this line before your code will fix it:
card_text[['name', 'oracle_text']] = card_text[['name', 'oracle_text']].astype(str)
This simply converts all data in both columns to strings.

Extract values in a specific range

I have a dataset with several columns:
data-pioggia-name.....
I would like to get the values of the column pioggia that are between 0 and 400.
I tried with:
start='0'
end='400'
data = (data['pioggia']>start)&(data['pioggia']<=end)
but I have error: ">" not supported between instances of 'str' and 'int'
I tried also:
data = data['pioggia'].between(0,400, inclusive=True)
but I have the same error.
Is there a solution, for example with replace?
Try adding this line:
data['pioggia'] = data['pioggia'].astype(int)
Also, make your start and end variables be ints (e.g. 0) instead of strings (e.g. '0').
Like this:
start = 0 # Notice this and `end` are ints, not strings
end = 400
data['pioggia'] = data['pioggia'].astype(int)
data = data[(data['pioggia'] > start) & (data['pioggia'] <= end)]  # index with the mask to keep only the rows in range
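A sketch that uses pd.to_numeric instead of astype(int), on made-up data. astype(int) raises on values such as "12.5" or "n/a", while errors="coerce" turns them into NaN, which between() then treats as out of range. The comparison only builds a boolean mask; indexing data with it is what keeps the matching rows:

```python
import pandas as pd

# Hypothetical data: 'pioggia' read as strings, as in the question
data = pd.DataFrame({"pioggia": ["0", "12.5", "400", "401", "n/a"], "name": list("abcde")})

# Convert to numbers; unparseable entries become NaN instead of raising
data["pioggia"] = pd.to_numeric(data["pioggia"], errors="coerce")

# between() is inclusive at both ends by default; with pandas >= 1.3,
# inclusive="right" gives the 0 < x <= 400 range used above
in_range = data[data["pioggia"].between(0, 400)]
print(in_range)
```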

How to change html hyperlink based on python output?

Hi, I'm new to HTML and Python, but I need to use them together.
For example,
if python output = 30302,
then I need to put '30302' in the hyperlink.
www.google.com/< output> = www.google.com/30302
html = 'www.google.com/'
python = < output>
how would I combine those two?
The problem
I guess that you want to create a new string from two parts: you have the string "www.google.com/" and a variable output holding the integer 30302, and you want to get "www.google.com/30302". (In the future, always provide a full example of your code.)
So how can you do it?
Convert int to str and concatenate strs
result = "www.google.com/" + str(output)
str(x) will turn x into a string
Formatting
"www.google.com/{}".format(31415) is equivalent to the string "www.google.com/31415",
so result = "www.google.com/{}".format(output) will also work.
In Python 3.6+ we also have f-strings:
f"www.google.com/{31415}" == "www.google.com/31415"
result = f"www.google.com/{output}"
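If the goal is an actual HTML hyperlink, a sketch using only the standard library: urllib.parse.quote percent-encodes the value for the URL path, and html.escape makes the URL safe inside the href attribute. The https:// scheme is an assumption: without a scheme, a browser treats www.google.com/30302 as a relative link.

```python
from html import escape
from urllib.parse import quote

output = 30302  # the value computed in Python

# quote() percent-encodes characters that aren't safe in a URL path segment
url = f"https://www.google.com/{quote(str(output))}"
# escape() protects the attribute value and the link text inside the HTML
link = f'<a href="{escape(url)}">{escape(str(output))}</a>'
print(link)  # <a href="https://www.google.com/30302">30302</a>
```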

How do I present my output as a Pandas dataframe?

Currently, the output I am getting is a single string that contains 3 different tables, and I am not sure how to convert that string to pandas dataframes.
One of the following 2 solutions will work for me:
Convert that string output to 3 different dataframes. OR
Change something in the function so that I get the output as 3 different data frames.
I have tried using RegEx to convert the string output to a dataframe but it won't work in my case since I want my output to be dynamic. It should work if I give another input.
def column_ch(self, sample_count=10):
    report = render("header.txt")
    match_stats = []
    match_sample = []
    any_mismatch = False
    for column in self.column_stats:
        if not column["all_match"]:
            any_mismatch = True
            match_stats.append(
                {
                    "Column": column["column"],
                    "{} dtype".format(self.df1_name): column["dtype1"],
                    "{} dtype".format(self.df2_name): column["dtype2"],
                    "# Unequal": column["unequal_cnt"],
                    "Max Diff": column["max_diff"],
                    "# Null Diff": column["null_diff"],
                }
            )
            if column["unequal_cnt"] > 0:
                match_sample.append(
                    self.sample_mismatch(column["column"], sample_count, for_display=True)
                )
    if any_mismatch:
        for sample in match_sample:
            report += sample.to_string()
            report += "\n\n"
    print("type is", type(report))
    return report
Since you have a string, you can pass your string into a file-like buffer and then read it with pandas read_csv into a dataframe.
Assuming that your string with the dataframe is called dfstring, the code would look like this:
import io
import pandas as pd
bufdf = io.StringIO(dfstring)
df = pd.read_csv(bufdf, sep=???)
If your string contains multiple dataframes, split it with str.split and use a loop:
import io
import pandas as pd
dflist = []
for sdf in dfstring.split('\n\n'):  # this seems to be the separator between two dataframes
    bufdf = io.StringIO(sdf)
    dflist.append(pd.read_csv(bufdf, sep=???))
Be careful to pass an appropriate sep parameter: my ??? means that I can't tell what the proper value would be. Your fields are separated by spaces, so you could use sep=r'\s+', but I see that you also have spaces that are not meant to be separators, so this may cause a parsing error.
sep accepts a regex, so to use two or more consecutive spaces as the separator you could pass sep=r'\s\s+' (this requires the additional parameter engine='python'). Again, make sure there are at least 2 spaces between any two consecutive fields.
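A sketch of the split-and-read loop as a hypothetical helper report_to_frames, run on a report built from two small DataFrames. It assumes the tables were written with to_string(index=False), that blocks are separated by a blank line, and that no cell or column name contains whitespace; the real report also contains header text and indexed sample tables, which would need extra handling:

```python
import io

import pandas as pd


def report_to_frames(report, sep=r"\s+"):
    # Assumption: blocks are separated by a blank line and no cell or column
    # name contains whitespace, otherwise the separator regex splits it in two
    frames = []
    for chunk in report.split("\n\n"):
        if not chunk.strip():
            continue  # skip empty pieces, e.g. from a trailing "\n\n"
        # Strip the padding to_string() adds so lines don't start with a separator
        cleaned = "\n".join(line.strip() for line in chunk.splitlines())
        frames.append(pd.read_csv(io.StringIO(cleaned), sep=sep, engine="python"))
    return frames


df1 = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]})
df2 = pd.DataFrame({"x": ["foo", "bar"]})
report = df1.to_string(index=False) + "\n\n" + df2.to_string(index=False) + "\n\n"

frames = report_to_frames(report)
for frame in frames:
    print(frame)
```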
See the Python documentation of the io module for details on StringIO.
Note that Python 2 code usually used the separate StringIO module instead of io.StringIO, but since recent pandas versions require Python 3, I guess you are using Python 3.
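For the second option in the question, a sketch of a hypothetical standalone variant of column_ch that skips the string building and returns the statistics as one DataFrame plus the list of sample DataFrames. column_stats, df1_name, df2_name and sample_mismatch stand in for the attributes and method of the original class, and the inputs below are made up:

```python
import pandas as pd


def column_ch_frames(column_stats, df1_name, df2_name, sample_mismatch, sample_count=10):
    # Same loop as column_ch, but the results stay as DataFrames instead of text
    match_stats = []
    match_sample = []
    for column in column_stats:
        if not column["all_match"]:
            match_stats.append(
                {
                    "Column": column["column"],
                    "{} dtype".format(df1_name): column["dtype1"],
                    "{} dtype".format(df2_name): column["dtype2"],
                    "# Unequal": column["unequal_cnt"],
                    "Max Diff": column["max_diff"],
                    "# Null Diff": column["null_diff"],
                }
            )
            if column["unequal_cnt"] > 0:
                match_sample.append(sample_mismatch(column["column"], sample_count))
    return pd.DataFrame(match_stats), match_sample


# Made-up inputs standing in for self.column_stats and self.sample_mismatch
stats = [
    {"column": "price", "all_match": False, "dtype1": "float64", "dtype2": "float64",
     "unequal_cnt": 2, "max_diff": 0.5, "null_diff": 0},
    {"column": "id", "all_match": True, "dtype1": "int64", "dtype2": "int64",
     "unequal_cnt": 0, "max_diff": 0.0, "null_diff": 0},
]


def fake_sample(col, n):
    return pd.DataFrame({col + " (df1)": [1.0], col + " (df2)": [1.5]})


stats_df, samples = column_ch_frames(stats, "df1", "df2", fake_sample)
print(stats_df)
for sample in samples:
    print(sample)
```

Inside the class you would keep self.column_stats and self.sample_mismatch(..., for_display=True); only the return value changes.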

Can't convert string to integer - Python

I have a string inside a dictionary like this:
'params': {
    'rtinseconds': '57.132',
    'charge': '3+',
    'pepmass': (822.6547241, None),
    'title': '20130630_006.d, MS/MS of 822.6547241 3+ at 0.9522 mins'
}
I am trying to read the value of charge, '3+', and convert it to the integer 3.
I tried the following code, where I read the first character of the string, store it in a separate variable and then try to convert it to int, but it does not work: the type of z is still str. Does anyone have any suggestions?
temp_z = item['params']['charge']
z = temp_z[0:1]
str(z)
int(z)
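In your code, str(z) and int(z) return new values that are never assigned, so z stays a str; z = int(z) would fix that. In the simple case: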
z = int(params['charge'].replace('+',''))
However, if your item may have a negative charge, you may want:
if '+' in params['charge']:
    z = int(params['charge'].replace('+', ''))
else:
    z = -int(params['charge'].replace('-', ''))
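A small helper (parse_charge is a hypothetical name) that covers both cases, assuming the sign is written after the digits as in '3+' and that the string holds a single charge:

```python
def parse_charge(charge: str) -> int:
    """Convert a charge string such as '3+' or '2-' into a signed int."""
    charge = charge.strip()
    if charge.endswith("+"):
        return int(charge[:-1])
    if charge.endswith("-"):
        return -int(charge[:-1])
    return int(charge)  # no trailing sign, e.g. '3' or '-2'


params = {"charge": "3+"}
z = parse_charge(params["charge"])
print(z, type(z))  # 3 <class 'int'>
```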
