Hey guys, I need some help: my dates are not coming out in the correct format.
I made a function to convert all of the date columns. It works, but it raises a SettingWithCopyWarning.
(https://i.stack.imgur.com/5xlT4.png)
(https://i.stack.imgur.com/hZe9f.png)
(https://i.stack.imgur.com/iglZB.png)
Can you tell me how to solve this? I've tried several approaches.
If your code is working and doing its job, you can always silence the warning by adding this at the top of your script. I would not recommend it in a large-scale project, though (a sketch of the more usual fix follows after the snippet).
import warnings
from pandas.errors import SettingWithCopyWarning  # lived in pandas.core.common in older versions

warnings.simplefilter(action="ignore", category=SettingWithCopyWarning)
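For completeness, the warning usually means you are assigning into a slice/view of another DataFrame. A minimal sketch of the more common fix, assuming a hypothetical conversion helper and date format (neither is from the original post):

import pandas as pd

def convert_date_columns(df, columns):
    # Work on an explicit copy so the assignments do not target a view of
    # another DataFrame, which is what triggers SettingWithCopyWarning
    df = df.copy()
    for col in columns:
        # The date format and errors="coerce" are assumptions for illustration
        df[col] = pd.to_datetime(df[col], format="%d/%m/%Y", errors="coerce")
    return df

# Hypothetical usage: column names are made up
# clean = convert_date_columns(raw, ["start_date", "end_date"])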
I have a pandas dataframe containing columns in timedelta64[ns] format.
For this project I cannot use df.to_excel(); I need to write the dataframe via xlwings so that it lands in an existing Excel workbook and keeps its formatting.
When I try the usual:
workbook_sablona_dochazka.sheets[zamestnanec].range('A1').options(index=False).value = individual_dochazka_zamestnanec
I receive error:
TypeError: must be real number, not Timedelta
Is there a way to format my timedelta64[ns] columns so that xlwings can write the dataframe? I need to preserve the time values so that they show up as 12:30:00 in Excel again after the xlwings export, possibly with some extra formatting inside Excel itself.
I tried:
individual_dochazka_zamestnanec['Příchod do práce'] = individual_dochazka_zamestnanec['Příchod do práce'].values.astype(float)
This got around the error, but the exported columns contained numbers that made no sense.
Any idea how to work around this?
Thank you very much in advance!
If the error says it's a "Timedelta", then you have to ask: delta relative to what? A Timedelta usually represents something like "three hours" or "minus two days". You say you'd like the output to be "12:30:00": is that an actual time of day, or does it mean 12 hours, 30 minutes and no seconds?
You could try making the Timedelta relative to "the beginning of time", so that it becomes a date which can be written with xlwings but formatted like a time, as suggested here.
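For example (a rough sketch with made-up sample values, not tested against the original workbook): adding the timedeltas to a fixed reference timestamp turns them into datetimes, which xlwings can write and Excel can display with a time-only number format such as hh:mm:ss.

import pandas as pd

# Made-up sample data standing in for the real timedelta64[ns] column
td = pd.Series(pd.to_timedelta(["12:30:00", "08:15:00"]))

# Anchor the timedeltas to an arbitrary reference date so they become datetimes
as_datetime = pd.Timestamp("1900-01-01") + td

# Writing as_datetime via xlwings and applying a time-only number format
# in Excel then shows 12:30:00 again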
Just managed to figure it out.
The trick was to first format the timedelta64[ns] columns as strings and then trim them with .map(lambda x: x[7:]) so that I'm left with a nice time-only stamp:
# str(Timedelta) looks like "0 days 12:30:00"; dropping the first 7 characters ("0 days ") leaves "12:30:00"
individual_dochazka_zamestnanec['Počet odpracovaných hodin celkem'] = (
    individual_dochazka_zamestnanec['Počet odpracovaných hodin celkem'].astype(str).map(lambda x: x[7:])
)
To my surprise, Excel accepted this without issue which is exactly what I needed.
Hope this helps someone, sometime.
Cheers!
disbursementData_dropped = disbursementData.drop_duplicates(
[
"form_field_1",
"portfolio_name",
"initial_application_category_name",
"disbursement_amount",
],
keep="last",
ignore_index=True,
)
I'm new to Python and pandas, and I'm trying to run a larger script that has been maintained at my company for several years and generates a CSV of information. However, whenever I try to run it I get the error in the title, and with minimal pandas experience I have no idea how to fix it.
drop_duplicates documentation
If you can provide a slice of your data, or fabricate some data, that would be much more helpful.
In the meantime, I've attached the drop_duplicates link here for your reference, and a small fabricated example is sketched below.
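For instance, with some fabricated rows (only the column names are taken from the snippet above), drop_duplicates keeps just the last row for each combination of the listed columns:

import pandas as pd

# Fabricated data for illustration only
disbursementData = pd.DataFrame({
    "form_field_1": ["A", "A", "B"],
    "portfolio_name": ["P1", "P1", "P2"],
    "initial_application_category_name": ["new", "new", "renewal"],
    "disbursement_amount": [100.0, 100.0, 250.0],
})

disbursementData_dropped = disbursementData.drop_duplicates(
    subset=[
        "form_field_1",
        "portfolio_name",
        "initial_application_category_name",
        "disbursement_amount",
    ],
    keep="last",        # keep the last row of each duplicate group
    ignore_index=True,  # note: this keyword requires pandas >= 1.0
)
print(disbursementData_dropped)  # row 0 is dropped; its later duplicate (row 1) is kept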
With the new update of pandas I can't use this function that I used in a Datacamp learning course (DAYOFWEEK doesn't exist anymore):
days_of_week = pd.get_dummies(dataframe.index.dayofweek,
prefix='weekday',
drop_first=True)
How can I change the syntax of my 'formula' to get the same results?
Sorry about the silly question, but I've spent a lot of time on this and I'm stuck...
Thanks in advance!
I already tried just using the dataframe with its index, but get_dummies doesn't give me the days of the week.
I also used a DatetimeIndex, but I'm messing up the formulation there as well:
`days_of_week = pd.get_dummies(dataframe.index.dayofweek, prefix='weekday', drop_first=True)`
The dataframe is fairly big, and I need the output to give me the weekdays because I'm dealing with stock prices.
Try weekday instead of dayofweek.
So
days_of_week = pd.get_dummies(dataframe.index.weekday,
prefix='weekday',
drop_first=True)
See docs below:
pandas.Series.dt.weekday
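A self-contained sketch (with a made-up price series), assuming the dataframe has a DatetimeIndex; if the index came back from read_csv as strings, convert it first with pd.to_datetime:

import numpy as np
import pandas as pd

# Made-up daily prices on a DatetimeIndex
dataframe = pd.DataFrame(
    {"close": np.random.rand(10)},
    index=pd.date_range("2023-01-02", periods=10, freq="D"),
)

# .weekday returns 0 (Monday) through 6 (Sunday)
days_of_week = pd.get_dummies(dataframe.index.weekday,
                              prefix='weekday',
                              drop_first=True)
print(days_of_week.head())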
I have a csv file containing numerical values such as 1524.449677. There are always exactly 6 decimal places.
When I import the csv file (along with other columns) via pandas read_csv, the column automatically gets the datatype object. My issue is that the values are shown as 2470.6911370000003 when they actually should be 2470.691137. Or the value 2484.30691 is shown as 2484.3069100000002.
This seems to be a datatype issue in some way. I tried to provide the data type explicitly when importing via read_csv by passing the dtype argument as {'columnname': np.float64}. Still the issue did not go away.
How can I get the values imported and shown exactly as they are in the source csv file?
Pandas uses a dedicated decimal-to-binary converter that trades accuracy for speed.
Passing float_precision='round_trip' to read_csv fixes this.
Check out this page for more detail on this.
After processing your data, if you want to save it back to a csv file, you can pass float_format="%.nf" to the corresponding method.
A full example:
import pandas as pd
df_in = pd.read_csv(source_file, float_precision='round_trip')
df_out = ... # some processing of df_in
df_out.to_csv(target_file, float_format="%.3f") # for 3 decimal places
I realise this is an old question, but maybe this will help someone else:
I had a similar problem, but couldn't quite use the same solution. Unfortunately the float_precision option only exists when using the C engine and not with the python engine. So if you have to use the python engine for some other reason (for example because the C engine can't deal with regex delimiters), this little "trick" worked for me:
In the pd.read_csv arguments, set dtype=str and then convert your dataframe to whatever dtype you want, e.g. df = df.astype('float64').
Bit of a hack, but it seems to work. If anyone has any suggestions on how to solve this in a better way, let me know.
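A sketch of that workaround, where the file name, delimiter and column name are all placeholders:

import pandas as pd

df = pd.read_csv(
    "data.txt",          # placeholder file name
    sep=r"\s*;\s*",      # placeholder regex delimiter; this is why engine="python" is needed
    engine="python",     # the python engine does not support float_precision
    dtype=str,           # read everything as strings to avoid the lossy float parse
)

# Convert the numeric columns back to floats afterwards
df["value"] = df["value"].astype("float64")  # "value" is an assumed column name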
Is it clear what I am doing wrong?
I'm experimenting with pandas HDFStore.select start and stop options and it's not making a difference.
The commands I'm using are:
import pandas as pd
hdf = pd.HDFStore(path % 'results')
len(hdf.select('results',start=15,stop=20))
I was hoping to get a length of 5 (or 4, depending on how it's counted), but it gives me back the whole dataframe.
When writing to the h5 file, pass format='table' to pandas.to_hdf(<path>, <key>, format='table'), which enables subsets of the store to be selected. However, the current behaviour with the default fixed format is a bug, as you should get an error rather than the full frame.
According to Jeff (https://stackoverflow.com/users/644898/jeff),
this is a known bug and has a fix here: github.com/pydata/pandas/issues/8287
Pull requests welcome.
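In the meantime, a minimal sketch of the difference (file names and key are placeholders; requires PyTables):

import numpy as np
import pandas as pd

df = pd.DataFrame({"x": np.arange(100)})

# Fixed format (the default) does not support on-disk subsetting; at the time of
# this question, start/stop was silently ignored when selecting from it (see GH 8287)
df.to_hdf("results_fixed.h5", key="results", format="fixed")

# Table format supports querying and start/stop slicing
df.to_hdf("results_table.h5", key="results", format="table")

with pd.HDFStore("results_table.h5") as hdf:
    subset = hdf.select("results", start=15, stop=20)
    print(len(subset))  # 5 rows: positions 15 through 19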