I have a csv file containing personnel clock-in/clock-out records, in this form:
3,23/02/2015,08:27,08:27,12:29,13:52,19:48
3,24/02/2015,08:17,12:36,13:59,19:28
5,23/02/2015,10:53,13:44
5,25/02/2015,09:05,12:34,12:35,13:30,19:08
5,26/02/2015,08:51,12:20,13:46,18:47,18:58
and I want to clean it up like this:
ID, DATE, IN,BREAK_OUT, BREAK_IN, OUT, WORK_TIME
3,Monday 23/02/2015,08:27,12:29,13:52,19:48,08:00hours
3,Tuesday 24/02/2015,08:17,12:36,13:59,19:28,08:00hours
5,Monday 23/02/2015,10:53,NAN,13:44,NAN,2hours
5,Wednesday 25/02/2015,09:05,12:34,13:30,19:08,08hours
Can you help me, please?
Thank you.
I'd suggest you use pandas to import the data from the file
import pandas as pd
pd.read_csv(filepath, sep=',')
should do the trick, assuming filepath leads to your csv. I'd then suggest that you use the datetime functions to convert your strings to dates you can calculate with (I think you could also use NumPy's datetime64 type, I'm just not used to it).
import datetime as dt
day = dt.datetime.strptime('23/02/2015', '%d/%m/%Y')
time_in = dt.datetime.combine(day, dt.datetime.strptime('08:27', '%H:%M').time())
should do the trick (note that in is a reserved keyword in Python, so the variable needs a different name). It is necessary that your time_in is also a datetime object, not only a time object, otherwise you cannot subtract the values (which would be the necessary next step to calculate the work time).
I think this should be enough to get you started. You'll find the pandas documentation here and the datetime documentation here.
If you have further questions, try to make them more specific.
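A minimal sketch tying these pieces together for one complete row (the row layout comes from the question; the break-handling rule is an assumption, and the question's desired totals may follow different rounding rules):

```python
import datetime as dt

# one complete row: ID, date, in, break_out, break_in, out
row = ['3', '24/02/2015', '08:17', '12:36', '13:59', '19:28']

day = dt.datetime.strptime(row[1], '%d/%m/%Y')
stamps = [dt.datetime.combine(day, dt.datetime.strptime(t, '%H:%M').time())
          for t in row[2:]]

# work time = presence (out - in) minus the lunch break
work_time = (stamps[3] - stamps[0]) - (stamps[2] - stamps[1])
print(work_time)  # 9:48:00
```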
This question might help you out: How to split string into column
First, read the whole file and split the columns. Check if there's data or not and write it back into a new file.
If you need additional help, tell us what you tried, what worked for you and what didn't and so on. We won't write a complete program/script for you.
I'm trying to write a daily csv file but would like the title of the csv file to specify today's date.
It seems to be failing every time but I'm sure there's an easy way to do this..?
Hopefully this isn't a duplicate but can't seem to find another question similar.
At the minute I've just tried this:
from datetime import date
morningupdate.to_csv('morningupdate' + '_' + date.today() '.csv')
My brain is completely broken with this, any help much appreciated!
Does this solve your problem?
from datetime import date
morningupdate.to_csv(f'morningupdate_{date.today()}.csv')
You are trying to concatenate a string with a date object.
You can use an f-string to solve the problem:
from datetime import date
path = f'morningupdate_{date.today()}.csv'
morningupdate.to_csv(path)
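To see why the original line fails: the + operator refuses mixed types, while an f-string calls str() on the object for you.

```python
from datetime import date

today = date.today()
# 'morningupdate' + '_' + today + '.csv'  # TypeError: can only concatenate str
path = f'morningupdate_{today}.csv'       # f-string converts today to a string
```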
I have a pandas dataframe containing columns in timedelta64[ns] format.
For this project I cannot use df.to_excel() and I need to write the dataframe via xlwings so that it goes into an existing Excel workbook and keeps its format.
When I try the usual:
workbook_sablona_dochazka.sheets[zamestnanec].range('A1').options(index=False).value = individual_dochazka_zamestnanec
I receive error:
TypeError: must be real number, not Timedelta
Is there a way to format my timedelta64[ns] columns so that xlwings can import the dataframe? I need to preserve the time values so that they show up as 12:30:00 in Excel again after the xlwings import, possibly with some after-formatting inside Excel itself.
I tried:
individual_dochazka_zamestnanec['Příchod do práce'] = individual_dochazka_zamestnanec['Příchod do práce'].values.astype(float)
This worked around the error, but the imported columns contained nonsensical numbers.
Any idea how to work around this?
Thank you very much in advance!
If the error says it's a "TimeDelta", then you have to ask delta relative to what? Usually a TimeDelta indicates something like "three hours" or "minus two days". You say you'd like the output to be "12:30:00" is that an actual time or does it mean 12 hours 30 minutes and no seconds?
You could try making the TimeDelta relative to "the beginning of time" so that it's a date which can be imported into xlwings but is formatted like a time as suggested here.
Just managed to figure it out.
The trick was to first format the timedelta64[ns] columns as strings and then trim them with .map(lambda x: str(x)[7:]) so that I would get a nice time-only stamp.
individual_dochazka_zamestnanec['Počet odpracovaných hodin celkem'] = individual_dochazka_zamestnanec['Počet odpracovaných hodin celkem'].astype(str).map(lambda x: x[7:])
To my surprise, Excel accepted this without issue which is exactly what I needed.
Hope this helps someone, sometime.
Cheers!
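For anyone wanting to reproduce the trick on synthetic data, a minimal sketch (the column values are made up):

```python
import pandas as pd

worked = pd.Series(pd.to_timedelta(['12:30:00', '08:00:00']))

# str(Timedelta) looks like '0 days 12:30:00'; slicing off the first
# seven characters leaves the plain time stamp (only safe while every
# delta is non-negative and under one day)
trimmed = worked.astype(str).map(lambda x: x[7:])
print(trimmed.tolist())  # ['12:30:00', '08:00:00']
```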
Taking data from a CSV file, I am trying to change the dates to the right format (i.e. change 2016-12-25 to 2016-12).
This is the code right now:
import csv
csvfile = open('XML_project.csv')
linesreader = csv.reader(csvfile, delimiter=',')
from re import sub
from decimal import Decimal
import statistics as s
import datetime
date = []
for l in linesreader:
    #date manipulation
    temp = l[4]
    temp_two = datetime.datetime.strptime(temp, "%Y-%m")
    date.append(temp_two)
csvfile.close()
It says the file has unconverted data and I don't know how to fix it
Please edit your post to put triple backticks around your code - it will preserve the formatting and indentation:
```
your code
goes here
```
That said, I think your problem is the call to strptime(). If you also include the full stack trace that came with the error this should be evident (with the strptime() call at the top of the stack). It might also help to see a few lines of example CSV data.
Anyway, you have this:
temp_two = datetime.datetime.strptime(temp, "%Y-%m")
Suppose you have the date 2021-12-25: strptime will only match the 2021-12 part. That is no doubt your aim, but strptime likes to parse the entire string - that way you have more confidence that you have a correct "%Y-%m" format string.
So you want:
temp_two = datetime.datetime.strptime(temp, "%Y-%m-%d")
That should match the whole date field (therefore no error). Then you want to produce a new YYYY-mm style string for your dates, from that datetime object, like this:
yyyy_mm = temp_two.strftime("%Y-%m")
which you can then store:
date.append(yyyy_mm)
Minor other remarks:
it is normal to put all the imports at the top of your file (makes them easy to see)
it is typical to import specific names from the datetime module because of the unfortunately same-named datetime class
Eg:
from datetime import datetime
which lets you use datetime.strptime() instead of the more cumbersome datetime.datetime.strptime().
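Putting the pieces together, the corrected loop might look like this (the two sample rows here are made up; column 5 holds the date, as in the question):

```python
import csv
from datetime import datetime
from io import StringIO

# stand-in for open('XML_project.csv'); rows are invented examples
csvfile = StringIO("a,b,c,d,2016-12-25\ne,f,g,h,2017-01-03\n")

date = []
for l in csv.reader(csvfile, delimiter=','):
    parsed = datetime.strptime(l[4], "%Y-%m-%d")  # parse the full date
    date.append(parsed.strftime("%Y-%m"))         # keep only YYYY-mm
print(date)  # ['2016-12', '2017-01']
```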
I am rather new to Python and I am working with .pos files. They are not that common, but I can explain their structure.
There is a header with general information and then 15 different columns containing data. The first two columns contain the GPS time (the date in the first column and the time in the second, in the standard format YYYY/MM/DD hh:mm:ss.ms), then there are 3 columns containing coordinates or distances in meters, and then other columns with other measurements, always numbers. An example can be found here; mind only that my GPST (GPS time) is as explained above.
As a matter of fact, there are three data types in this file, that are datetime, integer, and floating numbers.
I need to import this file in Python as an array. Apparently, Python can treat a .pos file as a text file, so I tried to use the loadtxt() command, specifying the different data types (datetime64, int, float). However, it gave me an error saying that the date format could not be recognized. Then I tried the command genfromtxt(), both specifying the data types and with dtype=None. In the first case I got empty columns for date and time, and in the latter case I got the date and time as strings.
I would like the date and the time to be recognized as such and not as a string, as I will need it later on for further analyses. Does someone have an idea on how I could import this file correctly?
Please, just try to be clear because I am a neophyte programmer!
Thank you for your help.
I answer my own question, maybe it is useful to someone.
.pos file can be open using the Pandas package as follows:
import pandas as pd
df = pd.read_table(filepath, sep=r'\s+', parse_dates={'Timestamp': [0, 1]})
In my data, the first two columns are date and time, which are combined into a single datetime column by the argument parse_dates={'Timestamp': [0, 1]}.
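A self-contained illustration of that call with two made-up rows (real .pos files also carry a header, which can be skipped with the skiprows argument):

```python
import pandas as pd
from io import StringIO

# two invented data rows: date, time, then three numeric columns
sample = StringIO(
    "2015/02/23 08:27:00.000 1.0 2.0 3.0\n"
    "2015/02/23 08:27:30.000 1.1 2.1 3.1\n"
)

df = pd.read_table(sample, sep=r'\s+', header=None,
                   parse_dates={'Timestamp': [0, 1]})
print(df['Timestamp'].dtype)  # datetime64[ns]
```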
I have a csv file containing numerical values such as 1524.449677. There are always exactly 6 decimal places.
When I import the csv file (and other columns) via pandas read_csv, the column automatically gets the datatype object. My issue is that the values are shown as 2470.6911370000003 which actually should be 2470.691137. Or the value 2484.30691 is shown as 2484.3069100000002.
This seems to be a datatype issue in some way. I tried to explicitly provide the data type when importing via read_csv by giving the dtype argument as {'columnname': np.float64}. Still the issue did not go away.
How can I get the values imported and shown exactly as they are in the source csv file?
Pandas uses a dedicated decimal-to-binary converter that compromises accuracy in favour of speed.
Passing float_precision='round_trip' to read_csv fixes this.
Check out this page for more detail on this.
After processing your data, if you want to save it back to a csv file, you can pass float_format="%.nf" to the corresponding method.
A full example:
import pandas as pd
df_in = pd.read_csv(source_file, float_precision='round_trip')
df_out = ... # some processing of df_in
df_out.to_csv(target_file, float_format="%.3f") # for 3 decimal places
I realise this is an old question, but maybe this will help someone else:
I had a similar problem, but couldn't quite use the same solution. Unfortunately the float_precision option only exists when using the C engine and not with the python engine. So if you have to use the python engine for some other reason (for example because the C engine can't deal with regex literals as delimiters), this little "trick" worked for me:
In the pd.read_csv arguments, set dtype='str' and then convert your dataframe to whatever dtype you want, e.g. df = df.astype('float64').
Bit of a hack, but it seems to work. If anyone has any suggestions on how to solve this in a better way, let me know.
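A minimal sketch of that workaround (the data and column name are made up):

```python
import pandas as pd
from io import StringIO

# invented data standing in for the real file
csv_data = StringIO("value\n2470.691137\n2484.306910\n")

# the python engine has no float_precision option, so read everything
# as strings first and let Python's correctly-rounded parser convert
df = pd.read_csv(csv_data, engine='python', dtype='str')
df = df.astype('float64')
```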