How to extract a month from date in csv file?

I'm trying to get an output of all the employees who worked this month by extracting the month from the date but I get this error:
month = int(row[1].split('-')[1])
IndexError: list index out of range
A row in the attendance log csv looks like this:
"404555403","2020-10-14 23:58:15.668520","Chandler Bing"
I don't understand why it's out of range?
Thanks for any help!
import csv
import datetime
def monthly_attendance_report():
    """
    The function prints the attendance data of all employees from this month.
    """
    this_month = datetime.datetime.now().month
    with open('attendance_log.csv', 'r') as csvfile:
        content = csv.reader(csvfile, delimiter=',')
        for row in content:
            month = int(row[1].split('-')[1])
            if month == this_month:
                return row
monthly_attendance_report()

It works for me with the sample row you posted. The problem is probably in the CSV file itself: CSV files usually start with a header row, and a header cell contains no '-', so split('-') returns a single-element list and [1] is out of range. A csv.reader object cannot be sliced, so skip the header by calling next() on the reader before the loop:
next(content, None)  # skip the header row
for row in content:
Extracting the month by splitting the string is also fragile. Parse the value with the datetime module instead.
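As a rough sketch of the advice above (ignore the header, parse the timestamp with datetime, and collect every matching row instead of returning the first one), something like the following could work. The function name rows_for_month, the timestamp format, and the sample data are assumptions based on the single row shown in the question, not the asker's actual file.

```python
import csv
import datetime
import io

# Assumed timestamp layout, taken from the sample row in the question.
TIMESTAMP_FORMAT = '%Y-%m-%d %H:%M:%S.%f'

def rows_for_month(csvfile, year, month):
    """Return all rows whose second column falls in the given year and month."""
    matches = []
    for row in csv.reader(csvfile):
        if len(row) < 2:
            continue  # blank or malformed line
        try:
            stamp = datetime.datetime.strptime(row[1], TIMESTAMP_FORMAT)
        except ValueError:
            continue  # header row or unparsable date
        if (stamp.year, stamp.month) == (year, month):
            matches.append(row)
    return matches

# Hypothetical sample data; a real script would pass
# open('attendance_log.csv', newline='') instead.
sample = io.StringIO(
    '"id","date","name"\n'
    '"404555403","2020-10-14 23:58:15.668520","Chandler Bing"\n'
    '\n'
    '"404555404","2020-09-30 08:00:00.000000","Monica Geller"\n'
)
october_rows = rows_for_month(sample, 2020, 10)
print(october_rows)
```

Comparing the year as well as the month keeps rows from October of another year out of the report.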

Related

Extract rows from CSV based on column data

I have a report that is generated at the beginning of each month, in .csv format. Currently, the report contains a series of columns with assorted data; one of the columns is an 'add_date' field containing data in "YYYY-mm-dd HH:MM:SS" format.
My end goal is to parse this source CSV so that only rows containing 'add_date' cells with dates from the previous month remain. So for example, if the script were run on February 1st 2021, only the rows containing dates from January 2021 would remain in the output CSV file.
This is an example of the source CSV contents:
Name,Data1,add_date
jasmine,stuff ,2021-01-26 17:29:46
ariel,things,2021-01-26 17:48:04
ursula,foo,2016-11-02 19:32:09
belle,bar,2016-01-21 18:47:33
and this is the python script I have so far:
#!/usr/bin/env python3
import csv
filtered_rows = []
with open('test123.csv', newline='') as csvfile:
    rowreader = csv.reader(csvfile, delimiter=',')
    for row in rowreader:
        if row["2021-01"] in csvfile.add_date:
            filtered_rows.append(row)
print(filtered_rows)
which I call with the following command:
./testscript.py > testfile.csv
Currently, when I run the above command I am greeted with the following error message:
Traceback (most recent call last):
  File "./testscript.py", line 9, in <module>
    if row["2021-01"] in csvfile.add_date:
TypeError: list indices must be integers or slices, not str
My current Python version is Python 3.6.4, running in CentOS Linux release 7.6.1810 (Core).

If I understood correctly, you can do something like this with pandas:
import pandas as pd
df = pd.read_csv('test123.csv', sep=',', header=0)
df['add_date'] = pd.to_datetime(df['add_date'])
# Half-open range, so timestamps during January 31st are included as well.
filtered = df[(df.add_date >= pd.Timestamp('2021-01-01')) & (df.add_date < pd.Timestamp('2021-02-01'))]

To do this properly you need to determine the previous month and year, then compare them to the add_date field of each row. The year is important to handle December → January transitions (as well as data that spans multiple years).
Here's what I mean:
import csv
import datetime
filename = 'test123.csv'
ADD_DATE_COL = 2
# Determine previous month and year.
first = datetime.date.today().replace(day=1)
last = first - datetime.timedelta(days=1)
previous_month, previous_year = last.month, last.year
# Extract rows for previous month.
filtered_rows = []
with open(filename, newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    next(reader)  # Ignore header row.
    for row in reader:
        add_date = datetime.datetime.strptime(row[ADD_DATE_COL], '%Y-%m-%d %H:%M:%S')
        if add_date.month == previous_month and add_date.year == previous_year:
            filtered_rows.append(row)
print(filtered_rows)
I got the basic idea of how to determine the date of the previous month from @bgporter's answer to the question "How to determine date of the previous month?".
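Printing filtered_rows emits a Python list rather than CSV text, so redirecting it to testfile.csv would not give a usable file. A minimal sketch of one way to write real CSV output, header included, is shown below. The helper names previous_month and filter_previous_month are made up for illustration, and today is passed in explicitly only so the example is reproducible.

```python
import csv
import datetime
import io

def previous_month(today):
    """Return (year, month) of the month before the one containing `today`."""
    last_day_of_previous = today.replace(day=1) - datetime.timedelta(days=1)
    return last_day_of_previous.year, last_day_of_previous.month

def filter_previous_month(infile, outfile, today, date_column='add_date'):
    """Copy the header and the rows dated in the previous month to outfile."""
    target = previous_month(today)
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames,
                            lineterminator='\n')
    writer.writeheader()
    for row in reader:
        stamp = datetime.datetime.strptime(row[date_column], '%Y-%m-%d %H:%M:%S')
        if (stamp.year, stamp.month) == target:
            writer.writerow(row)

# Sample rows copied from the question.
source = io.StringIO(
    'Name,Data1,add_date\n'
    'jasmine,stuff ,2021-01-26 17:29:46\n'
    'ariel,things,2021-01-26 17:48:04\n'
    'ursula,foo,2016-11-02 19:32:09\n'
    'belle,bar,2016-01-21 18:47:33\n'
)
result = io.StringIO()
filter_previous_month(source, result, today=datetime.date(2021, 2, 1))
print(result.getvalue(), end='')
```

In the real script, passing sys.stdout as outfile and datetime.date.today() as today should keep the ./testscript.py > testfile.csv invocation working.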

How to arrange data week wise csv file in python

I have generated a CSV file in the format shown in the image below.
The data is arranged week by week, but in some places I couldn't get the weeks separated. In the image you will see a red mark and a blue mark; I want to split the rows at both marks. How can I do it?
Note: If there is a holiday on Friday, the week should run from Monday to Thursday.
Image: Please click here to see image
Current logic:
import csv
blank_fields=[' ']
fields=[' ','Weekly Avg']
# Read csv file
file1 = open('test.csv', 'rb')
reader = csv.reader(file1)
new_rows_list = []
# Read data row by row and store into new list
for row in reader:
    new_rows_list.append(row)
    if 'Friday' in row:
        new_rows_list.append(fields)
file1.close()

Overall you are heading in the right direction, but your condition is error-prone and things can easily go wrong (e.g., when only one day of a week appears in your list). Testing for the weekday string isn't the best choice here.
I would suggest parsing the date in your table and using its weekday to solve this, like this:
from datetime import datetime as dt, timedelta as td
# remember the date of the previous row
last = None
# each item in 'weeks' holds the rows of one week; _cur_week is a temporary list
weeks = []
_cur_week = []
for row in reader:
    # assuming the date is in row[1] as YYYY-MM-DD (split, convert to int, pass as args)
    _cur_date = dt(*map(int, row[1].split("-")))
    # weekday() will be in [0 .. 6]
    # now if _cur_date.weekday() <= last.weekday(), a week is complete
    # (also catching the corner case of 7 or more days between two entries)
    if last and (_cur_date.weekday() <= last.weekday() or (_cur_date - last) >= td(days=7)):
        # append collected rows to 'weeks', empty _cur_week for next rows
        weeks.append(_cur_week)
        _cur_week = []
    # regular case: append row and set 'last' to '_cur_date'
    _cur_week.append(row)
    last = _cur_date
# the rows of the final week are still in _cur_week
if _cur_week:
    weeks.append(_cur_week)
This is pretty verbose, but I hope the pattern comes across:
parse the existing date and use the weekday to distinguish one week from the next (within a week the weekday increases monotonically, so any decrease or repeat tells you the current date belongs to the next week)
store rows in a temporary list during one week
append _cur_week to weeks once the condition for the next week is triggered
empty _cur_week for the rows of the next week
Finally, flatten the weeks back into one list, adding the 'Weekly Avg' row after each week, e.g. like this:
new_rows_list = [row for week in weeks for row in week + [fields]]
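The pattern above can be sketched as a small self-contained function. The name group_rows_by_week, the YYYY-MM-DD date format, the date column index, and the sample rows are assumptions for illustration; the question only shows the data as an image.

```python
import datetime

def group_rows_by_week(rows, date_index=1):
    """Split rows (already sorted by date) into one list per calendar week."""
    weeks = []
    current_week = []
    last_date = None
    for row in rows:
        current_date = datetime.datetime.strptime(row[date_index], '%Y-%m-%d').date()
        # A weekday that does not increase, or a gap of a week or more,
        # means this row belongs to a new week.
        if last_date is not None and (
            current_date.weekday() <= last_date.weekday()
            or current_date - last_date >= datetime.timedelta(days=7)
        ):
            weeks.append(current_week)
            current_week = []
        current_week.append(row)
        last_date = current_date
    if current_week:
        weeks.append(current_week)  # keep the final, still open week
    return weeks

# Hypothetical rows: Friday 2017-03-03 is missing (a holiday), so the first
# week ends on Thursday and the next one starts on Monday.
rows = [
    ['a', '2017-03-01'],
    ['b', '2017-03-02'],
    ['c', '2017-03-06'],
    ['d', '2017-03-07'],
]
weeks = group_rows_by_week(rows)
separator = [' ', 'Weekly Avg']
new_rows_list = [row for week in weeks for row in week + [separator]]
print(new_rows_list)
```

Because the split depends on dates rather than on the string 'Friday', a missing Friday does not merge two weeks.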

I used different logic for the same thing; it worked and is an easy solution.
import csv
import datetime
fields = {'': ' ', 'Date': 'Weekly Avg'}
new_rows_list = []
last_date = None
# Read the CSV file row by row and store the rows in a new list
with open('temp.csv', newline='') as file1:
    reader = csv.DictReader(file1)
    for row in reader:
        cur_date = datetime.datetime.strptime(row['Date'], '%Y-%m-%d').date()
        # A new week starts when the weekday goes down or the row is a Monday
        if last_date and ((last_date.weekday() > cur_date.weekday()) or (cur_date.weekday() == 0)):
            new_rows_list.append(fields)
        last_date = cur_date
        new_rows_list.append(row)

python CSV writer - formatting

Quick question on how to properly write data back into a CSV file using the Python csv module. Currently I'm importing a file, pulling a column of dates, and making a column of days of the week using the datetime module. I then want to write out a new CSV file (or overwrite the original one) containing one original element and the new element.
with open('new_dates.csv') as csvfile2:
    readCSV2 = csv.reader(csvfile2, delimiter=',')
    incoming = []
    for row in readCSV2:
        readin = row[0]
        time = row[1]
        year, month, day = (int(x) for x in readin.split('-'))
        ans = datetime.date(year, month, day)
        wkday = ans.strftime("%A")
        incoming.append(wkday)
        incoming.append(time)
with open('new_dates2.csv', 'w') as out_file:
    out_file.write('\n'.join(incoming))
The input file looks like this:
2017-03-02,09:25
2017-03-01,06:45
2017-02-28,23:49
2017-02-28,19:34
When using this code I end up with an output file that looks like this:
Friday
15:23
Friday
14:41
Friday
13:54
Friday
7:13
What I need is an output file that looks like this:
Friday,15:23
Friday,14:41
Friday,13:54
Friday,7:13
If I change the delimiter in out_file.write to a comma I just get one element of data per column, like this:
Friday 15:23 Friday 14:41 Friday 13:54 ....
Any thoughts would be appreciated. Thanks!

Being somewhat unclear on what format you want, I've assumed you just want a single space between wkday and time. For a quick fix, instead of appending both wkday and time separately, as in your example, append them together:
...
incoming.append('{} {}'.format(wkday,time))
...
OR, build your incoming as a list of lists:
...
incoming.append([wkday,time])
...
and change your write to:
with open('new_dates2.csv', 'w') as out_file:
    out_file.write('\n'.join([' '.join(t) for t in incoming]))

It seems you want Friday in column 0 and the time in column 1, so you need to change your incoming to a list of lists. That means the append statement should look like this:
...
incoming.append([wkday, time])
...
Then it is better to use csv.writer to write back to the file. You can write the whole incoming list in one go without worrying about formatting. Opening the file with newline='' keeps the csv module from producing extra blank lines on Windows.
with open('new_dates2.csv', 'w', newline='') as out_file:
    writer = csv.writer(out_file)
    writer.writerows(incoming)
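Put together, the list-of-lists approach might look like the sketch below. It reads from an in-memory string instead of new_dates.csv so that it runs on its own; the helper name weekday_rows is made up for the example, and the weekday names assume the default English locale.

```python
import csv
import datetime
import io

def weekday_rows(infile):
    """Return [weekday name, time] pairs for rows shaped like 'YYYY-MM-DD,HH:MM'."""
    rows = []
    for date_text, time_text in csv.reader(infile):
        day = datetime.datetime.strptime(date_text, '%Y-%m-%d').date()
        rows.append([day.strftime('%A'), time_text])
    return rows

# Sample input taken from the question.
source = io.StringIO('2017-03-02,09:25\n2017-03-01,06:45\n2017-02-28,23:49\n')
out_file = io.StringIO()
csv.writer(out_file, lineterminator='\n').writerows(weekday_rows(source))
print(out_file.getvalue(), end='')
```

Each inner list becomes one output line, so the weekday and the time end up in separate columns of the same row.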

Your incoming list is flat: weekday, time, weekday, time, and so on. To get one pair per line, walk through it two items at a time, something like the following:
# your incoming list
incoming = ['Friday', '15:23', 'Friday', '14:41', 'Friday', '13:54', 'Friday', '7:13']
# pair each weekday with the time that follows it
with open('new_dates2.csv', 'w') as out_file:
    for i, j in zip(incoming[::2], incoming[1::2]):
        out_file.write(','.join((i, j)))
        out_file.write('\n')

You don't really need the csv module for this. I'm guessing at the input, but from the description it looks like:
2017-03-02,09:25
2017-03-01,06:45
2017-02-28,23:49
2017-02-28,19:34
This will parse it and write it in a new format:
import datetime
with open('new_dates.csv') as f1, open('new_dates2.csv','w') as f2:
    for line in f1:
        dt = datetime.datetime.strptime(line.strip(),'%Y-%m-%d,%H:%M')
        f2.write(dt.strftime('%A,%H:%M\n'))
Output file:
Thursday,09:25
Wednesday,06:45
Tuesday,23:49
Tuesday,19:34

Using python to print strings between csv values

My overarching goal is to write a Python script that transforms each row of a spreadsheet into a standalone markdown file, using each column as a value in the file's YAML header. Right now, the final for loop I've written not only keeps going and going and going… it also doesn't seem to place the values correctly.
import csv
f = open('data.tsv')
csv_f = csv.reader(f, dialect=csv.excel_tab)
date = []
title = []
for column in csv_f:
    date.append(column[0])
    title.append(column[1])
for year in date:
    for citation in title:
        print "---\ndate: %s\ntitle: %s\n---\n\n" % (year, citation)
I'm using tab-separated values because some of the fields in my spreadsheet are chunks of text with commas. So ideally, the script should output something like the following (I figured I'd tackle splitting this output into individual markdown files later. One thing at a time):
---
date: 2015
title: foo
---
---
date: 2016
title: bar
---
But instead I'm getting misplaced values and output that never ends. I'm obviously learning as I go along here, so any advice is appreciated.

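The nested loops print every title once for every date, so n rows produce n × n blocks. The output is not endless, just much longer than expected, and most dates are paired with the wrong title. Loop over the rows once and unpack each one instead:
import csv
with open('data.tsv', newline='') as f:
    csv_f = csv.reader(f, dialect=csv.excel_tab)
    for row in csv_f:
        year, citation = row  # each row is a list; unpack it directly (assumes exactly two columns)
        print("---\ndate: %s\ntitle: %s\n---\n\n" % (year, citation))
This is all I can do without the sample CSV file.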
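For spreadsheets with more than two columns, a slightly more general sketch pairs each value with a key name instead of unpacking a fixed number of fields. The function name front_matter_blocks and the keys tuple are hypothetical, and no YAML quoting is attempted, so values containing characters such as ':' would need extra care.

```python
import csv
import io

def front_matter_blocks(tsvfile, keys=('date', 'title')):
    """Yield one YAML front-matter block per row of a tab-separated file."""
    for row in csv.reader(tsvfile, dialect=csv.excel_tab):
        lines = ['%s: %s' % (key, value) for key, value in zip(keys, row)]
        yield '---\n' + '\n'.join(lines) + '\n---\n'

# Hypothetical two-row sample standing in for data.tsv.
sample = io.StringIO('2015\tfoo\n2016\tbar\n')
blocks = list(front_matter_blocks(sample))
print('\n'.join(blocks), end='')
```

Since each block is a separate string, writing it to its own markdown file later only needs one open(...).write(block) call per item.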

Find datetime instances in multiple datetime groups python

I have two CSV files with timestamp data in str format.
the first CSV_1 has resampled data from a pandas timeseries, into 15 minute blocks and looks like:
time ave_speed
1/13/15 4:30 34.12318398
1/13/15 4:45 0.83396195
1/13/15 5:00 1.466816057
CSV_2 has regular times from gps points e.g.
id time lat lng
513620 1/13/15 4:31 -8.15949 118.26005
513667 1/13/15 4:36 -8.15215 118.25847
513668 1/13/15 5:01 -8.15211 118.25847
I'm trying to iterate through both files to find the rows in CSV_2 whose time falls within a 15-minute time group of CSV_1, and then do something with them: in this case, append ave_speed to every entry for which the condition is true.
Desired result using the above examples:
id time lat lng ave_speed
513620 1/13/15 4:31 -8.15949 118.26005 0.83396195
513667 1/13/15 4:36 -8.15215 118.25847 0.83396195
513668 1/13/15 5:01 -8.15211 118.25847 something else
I tried doing it solely with pandas DataFrames but ran into some trouble, so I thought this might be a workaround to achieve what I'm after.
This is the code I've written so far. I feel like it's close, but I can't seem to nail the logic to get my for loop to return the entries within the 15-minute time group.
with open('path/CSV_2.csv', mode="rU") as infile:
    with open('path/CSV_1.csv', mode="rU") as newinfile:
        reader = csv.reader(infile)
        nreader = csv.reader(newinfile)
        next(nreader, None) # skip the headers
        next(reader, None) # skip the headers
        for row in nreader:
            for dfrow in reader:
                if (datetime.datetime.strptime(dfrow[2],'%Y-%m-%d %H:%M:%S') < datetime.datetime.strptime(row[0],'%Y-%m-%d %H:%M:%S') and
                        datetime.datetime.strptime(dfrow[2],'%Y-%m-%d %H:%M:%S') > datetime.datetime.strptime(row[0],'%Y-%m-%d %H:%M:%S') - datetime.timedelta(minutes=15)):
                    print dfrow[2]
Link to the pandas question I posted about the same problem: Pandas, check if timestamp value exists in resampled 30 min time bin of datetimeindex
EDIT:
If I create two lists of times, i.e. listOne with all the times from CSV_1 and listTwo with all the times from CSV_2, I'm able to find instances in the time groups. So something is weird about using the CSV values directly. Any help would be appreciated.

I feel like this is pretty close to what I want, in case anyone is curious how to do the same thing. It's not efficient: the current script takes roughly a day to run, because the double loop iterates over all the rows many times.
If anyone has any thoughts on how to make this easier or quicker, I'd be very interested.
import csv
# Open the CSV files
with open('/GPS_Timepoints.csv', newline='') as infile:
    with open('/Resampled.csv', newline='') as newinfile:
        reader = csv.reader(infile)
        nreader = csv.reader(newinfile)
        next(nreader, None) # skip the headers
        next(reader, None) # skip the headers
        # Dict comprehension to get only the desired data from the CSV (time -> speed)
        checkDates = {row[0] : row[7] for row in nreader}
        # (time, speed) pairs in file order (dicts keep insertion order in Python 3.7+)
        x = list(checkDates.items())
        # Read the CSV into a list (seemed easier than reading directly from the file; I don't know if it's faster)
        csvDates = []
        for row in reader:
            csvDates.append(row)
# Loop 1 iterates over the full range of dates in the resampled data; the print gives me hope the program is running
for i in range(1, len(x)):
    print('checking', i)
    # Test whether the time is in the time range; if so, insert the desired attribute (here, speed) into the row.
    # The times are compared as strings, which only works if they sort chronologically (e.g. ISO format).
    for row in csvDates:
        if row[2] > x[i-1][0] and row[2] < x[i][0]:
            row.insert(9, x[i][1])
# Write the result to CSV to upload into GIS
with open('/result.csv', mode="w", newline='') as outfile:
    wr = csv.writer(outfile)
    wr.writerow(['id','boat_id','time','state','lat','lng','activity','speed', 'state_reason'])
    for row in csvDates:
        wr.writerow(row)
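On the speed question, one possible alternative to the double loop is a binary search over the sorted bin times with the standard bisect module, which looks up each GPS point in logarithmic rather than linear time. The sketch below is not the poster's method. It assumes the time format of the sample data in the question, and that a bin labelled T covers the 15 minutes up to and including T, which is what the desired output suggests. The names parse and attach_speed are made up. As a side note, a csv.reader can only be iterated once, which is likely why the nested reader loops in the question matched nothing after the first outer row while plain lists worked.

```python
import bisect
import datetime

TIME_FORMAT = '%m/%d/%y %H:%M'  # assumed from the sample data, e.g. '1/13/15 4:45'

def parse(text):
    """Turn a time string from either file into a datetime."""
    return datetime.datetime.strptime(text, TIME_FORMAT)

def attach_speed(bins, points):
    """Append to each point the ave_speed of the 15-minute bin containing it.

    bins   -- list of (bin_time_text, ave_speed) pairs, sorted by time
    points -- list of rows whose second item is the time text
    Assumption: a bin labelled T covers the interval (T - 15 min, T].
    """
    bin_ends = [parse(text) for text, _ in bins]
    width = datetime.timedelta(minutes=15)
    result = []
    for row in points:
        when = parse(row[1])
        i = bisect.bisect_left(bin_ends, when)  # first bin ending at or after `when`
        if i < len(bin_ends) and bin_ends[i] - when < width:
            result.append(row + [bins[i][1]])
        else:
            result.append(row + [None])  # no bin covers this point
    return result

# Sample values taken from the question.
bins = [
    ('1/13/15 4:30', '34.12318398'),
    ('1/13/15 4:45', '0.83396195'),
    ('1/13/15 5:00', '1.466816057'),
]
points = [
    ['513620', '1/13/15 4:31', '-8.15949', '118.26005'],
    ['513667', '1/13/15 4:36', '-8.15215', '118.25847'],
    ['513668', '1/13/15 5:01', '-8.15211', '118.25847'],
]
result = attach_speed(bins, points)
for row in result:
    print(row)
```

Parsing the strings into datetime objects before comparing also avoids the ordering problems that plain string comparison has with formats like 1/13/15 4:30.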
