I want to replace the commas in my dataframe, particularly in the Generation column, with nothing. There's no error but it doesn't get rid of the commas. I want to be able to change the dtype of the generation column to numeric without having to do error = 'coerce'.
I'm pretty sure I've used this same code before and it worked; I'm not sure why it isn't working today.
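Since the code itself isn't shown, one common cause is worth checking: Series.str.replace returns a new Series rather than modifying the column in place, so the result has to be assigned back. A minimal sketch with made-up data:

```python
import pandas as pd

# Hypothetical data standing in for the asker's Generation column
df = pd.DataFrame({"Generation": ["1,234", "5,678", "9,012"]})

# str.replace returns a NEW Series; assign it back to the column
df["Generation"] = df["Generation"].str.replace(",", "", regex=False)

# Now the conversion succeeds without errors='coerce'
df["Generation"] = pd.to_numeric(df["Generation"])
```

If the replacement result is not assigned back, the dataframe keeps the original strings and no error is raised, which matches the symptom described.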
I'm starting with a CSV exported from a system with 3 columns, the first column is displaying a number in scientific notation. I need to transform only that column to a number and save to another CSV. Note there are thousands of lines, converting using Excel is not an option.
I have found many articles close to this, using "float", using "round", but I haven't found anything that can handle a large file.
Example, file1.csv:
ID, Phone, Email
1.23E+15, 123-456-7890, johnsmith@test.com
Need the output to file2.csv:
ID, Phone, Email
1234680000000000, 123-456-7890, johnsmith@test.com
I know I'm way off, but this may give you an idea of what I'm trying to accomplish...
import pandas as pd

df = pd.read_csv('file1.csv')
# Convert only the ID column; anything non-numeric becomes NaN
df['ID'] = pd.to_numeric(df['ID'], errors='coerce').round(0)
# float_format controls how floats are rendered; the filename is the first argument
df.to_csv('file2.csv', float_format='%.0f', index=False)
The text in your CSV file, "1.23E+15", means "1.23 times 10 to the 15th power", i.e. 1230000000000000... that's all Python, Pandas, anything (but you) can know about that number.
I say "but you", because you seem to know that before "1.23E+15", there was the value 1234680000000000.
But, then some other program/process chopped off the "46800..." part and all it left was "1.23E+15"—something decreased the precision of the original value.
That's why @TimRoberts asked "How was this generated?" To get back 1234680000000000, you need to go to the program/process that last had that higher-precision value and try to change that program/process to not decrease the precision of the number.
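You can see the loss concretely by parsing the string with plain Python (nothing pandas-specific): only the digits that survived the earlier rounding come back, and the trailing "46800..." part is simply not there.

```python
# Parsing the scientific-notation string yields only the digits it contains;
# the rest of the original number's precision is unrecoverable.
value = float("1.23E+15")
print(int(value))  # 1230000000000000
```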
I am pulling data from an API, converting it into a Python object, and attempting to convert it to a dataframe. However, when I unpack the Python object, I get more rows than are in the object.
You can see in my dataframe how on 2023-02-03, there are multiple rows. One row seems to be giving me the correct data while the other row is giving me random data. I am not sure where the extra row is coming from. I'm wondering if it has something to do with the null values or whether I am not unpacking the Python Object correctly.
I double checked the raw data from the JSON response and don't see the extra values there. On Oura's UI, I checked the raw data and didn't notice anything there either.
My desired output would have just one row per date.
Can anyone identify what I might be doing wrong?
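Without seeing the unpacking code, one guess is that the nested response is being flattened by hand. pd.json_normalize keeps exactly one row per record, which avoids accidental duplicates. A sketch with made-up records shaped roughly like a daily-metrics API response (the field names here are assumptions, not Oura's actual schema):

```python
import pandas as pd

# Hypothetical records standing in for the API's JSON response
records = [
    {"day": "2023-02-03", "score": 85,
     "contributors": {"deep_sleep": 70, "efficiency": 90}},
    {"day": "2023-02-04", "score": 78,
     "contributors": {"deep_sleep": None, "efficiency": 88}},
]

# json_normalize flattens the nested dicts into columns, one row per record
df = pd.json_normalize(records)
print(df.columns.tolist())
```

If the raw JSON has one record per date but the dataframe has more rows, comparing `len(records)` against `len(df)` right after this step should show where the extras appear.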
This is the code I'm using. I also tried converting my columns' dtype from object to float, but I got this error:
import pandas as pd

df = pd.read_csv('DDOSping.csv')
pearsoncorr = df.corr(method='pearson')
ValueError: could not convert string to float:
'172.27.224.251-172.27.224.250-56003-502-6'
Somewhere in your CSV this string value exists '172.27.224.251-172.27.224.250-56003-502-6'. Do you know why it's there? What does it represent? It looks to me like it shouldn't be in the data you include in your correlation matrix calculation.
The df.corr method is trying to convert the string value to a float, but it's obviously not possible to do because it's a big complicated string with various characters, not a regular number.
You should clean your CSV of unnecessary data (or make a copy and clean that so you don't lose anything important). Remove anything, like metadata, that isn't the exact data that df.corr needs, including the string in the error message.
If it's just a few values you need to clean, open the file in Excel or a text editor and do the cleaning there. If it's a lot, and all the irrelevant data sits in specific rows and/or columns, you could instead drop them from your DataFrame before calling df.corr, leaving the file itself untouched.
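As a programmatic alternative, you can keep only the numeric columns before correlating. A sketch, assuming a frame that mixes numeric traffic metrics with a string flow ID (the column names here are made up for illustration):

```python
import pandas as pd

# Hypothetical frame mixing numeric metrics with a string flow identifier
df = pd.DataFrame({
    "flow_id": ["172.27.224.251-172.27.224.250-56003-502-6",
                "10.0.0.1-10.0.0.2-80-443-6"],
    "packets": [120, 340],
    "bytes": [15000, 89000],
})

# select_dtypes drops the string column before the correlation
pearsoncorr = df.select_dtypes(include="number").corr(method="pearson")
```

This way the string values never reach df.corr, so the float-conversion error cannot occur.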
I have a dataset that was revised before its release. However, its accompanying code was not revised, and it now raises this error.
The code expects a data frame describing the features of 255 homes, but the item is just a messy string with no consistent delimiter to parse.
I showed the error, the types of the items in the new dataset and the content of the string in this [picture][1].
I'm sure there's a better way, but I use this trick to get dataframes to work with from poorly formatted SO questions.
Print the string (to let print take care of things like return characters, '\n'), then select-all and copy it. Then use:
import pandas as pd

df = pd.read_clipboard(r"\s\s+")
Sometimes I have to manually adjust the spacing a little bit between a few column names for it to work correctly, but it is unreasonably effective.
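The same separator trick works without the clipboard: paste the printed text into a string and read it with pd.read_csv, treating two-or-more spaces as the delimiter. A small sketch with made-up data:

```python
import io
import pandas as pd

# The printed, space-aligned text, pasted into a string instead of the clipboard
text = """col_a  col_b  col_c
1      2.5    foo
3      4.0    bar"""

# r"\s\s+" matches runs of two or more whitespace characters,
# the same idea as read_clipboard(r"\s\s+")
df = pd.read_csv(io.StringIO(text), sep=r"\s\s+", engine="python")
```

This is handy when the text comes from a file or a script rather than a copy-paste, and it sidesteps clipboard access entirely.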
I would like to apply conditional formatting to my spreadsheet, but instead of iterating over every single cell (of which there are very many) I want to apply the formatting to the entire row.
I have been trying to figure out how to just select a row. I looked on here (scroll a bit less than halfway down to 'Accessing many cells'), and it tells me
row10 = ws[10]
but when I try to do that I get the error message saying
TypeError: argument of type 'int' is not iterable
which I take to mean that it is not a valid range. I messed around with it a bit, but could not make it work.
So at this point I have no idea what to do. Hopefully someone can help.
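For what it's worth, in recent openpyxl versions `ws[10]` does return the tuple of cells in row 10, so the TypeError may point to an older openpyxl or to `ws` not actually being a Worksheet. A minimal sketch of styling a whole row (the fill color is an arbitrary choice for illustration):

```python
from openpyxl import Workbook
from openpyxl.styles import PatternFill

wb = Workbook()
ws = wb.active
# Put some sample values in row 10
for col in range(1, 4):
    ws.cell(row=10, column=col, value=col)

# ws[10] yields the cells of row 10; apply a solid yellow fill to each
fill = PatternFill(start_color="FFFF00", end_color="FFFF00", fill_type="solid")
for cell in ws[10]:
    cell.fill = fill
```

If upgrading openpyxl is not an option, `ws.iter_rows(min_row=10, max_row=10)` gives the same cells on older versions.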