I am new to Python, and my teacher has asked me to write a program that edits a CSV file field by field, depending on the value in each field.
Here is a nested list representing the CSV file, split into lists by line and then by element:
[["A","B","C","D"],["Yes",1,"05/11/2016","0"],["No","12","05/06/2017","1"],["Yes","6","08/09/2017","2"]]
What I am supposed to do is make a loop that detects the positions of the elements within each inner list and then: change the first element of each list to "No" if it is "Yes"; change the third element to today's date if the stated date is at least six months old; and change the last element to a 1 if it is more than 1. How am I supposed to do this?
Below is my code:
filename="Assignment_Data1.csv"
file=open(filepath+filename,"r")
reader=csv.reader(file,delimiter=",")
from datetime import datetime
six_months = str(datetime.date.today() - datetime.timedelta(6*365/12-1))
fm_six_months=str(datetime.datetime.strptime(six_months, '%Y-%m-%d').strftime('%d/%m/%Y'))
td=datetime.now()
deDate = str(td)[8:10] + "/"+ str(td)[5:7] + "/"+ str(td)[0:4]
import csv
for row in reader:
    for field in row:
        if row[2]<=fm_six_months or row[4]>50 or row[2]<10:
            row[3]=deDate
            row[4]=0
            row[2]=100
Basically, what I am trying to do is replace, through a loop, the fields that meet the conditions stated above. Is that possible?
You're on the right track, but your code has a couple of issues:
1) Import statements.
Your import statements should all be at the top of your program. Currently, you use csv.reader in line 3 but haven't imported csv yet.
The way you're importing the datetime module is inconsistent with the rest of your code. This is somewhat confusing, since the datetime module also contains a datetime class. Given what you want to do, it would be easiest to change the import statement to import datetime and change line 8 to td=datetime.datetime.now() (now is a method of the datetime class).
2) Iterating over both row and field is redundant. The construct you have, for row in reader: for field in row, will run your if statement once per field rather than once per row, which is unnecessary.
3) Python is zero-indexed. This means that the first element in a list is accessed using row[0], not row[1]. In your case, the fourth column of your CSV would be accessed with row[3].
4) You're combining conditions. From the phrasing of the assignment, it sounds like each condition (like "change the first element to a No if it is a Yes") is supposed to be independent of the others. However, if row[2]<=fm_six_months or row[4]>50 or row[2]<10 means that you'll apply all the changes if any one condition is true. It sounds like you need three separate if blocks.
5) Your code has no writer. This is really the big one. Simply saying row[2] = 100 doesn't do anything lasting, as row is just an object in memory; changing row doesn't actually change the CSV file on your computer. To actually modify the CSV, you'll need to write it back out to a file, using a csv.writer.
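Putting those five fixes together, a corrected version might look something like the sketch below. The file name and the ~182-day approximation of six months come from the question itself; the dd/mm/YYYY date format and the assumption that the first row is a header are guesses from the sample data.

```python
import csv
import datetime

def update_csv(path):
    # Approximate "six months ago" roughly the same way the question does.
    cutoff = datetime.date.today() - datetime.timedelta(days=182)
    today = datetime.date.today().strftime("%d/%m/%Y")

    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    header, data = rows[0], rows[1:]

    for row in data:
        if row[0] == "Yes":                          # first field: Yes -> No
            row[0] = "No"
        stated = datetime.datetime.strptime(row[2], "%d/%m/%Y").date()
        if stated <= cutoff:                         # third field: stale date -> today
            row[2] = today
        if int(row[3]) > 1:                          # last field: more than 1 -> 1
            row[3] = "1"

    with open(path, "w", newline="") as f:           # write the changes back out
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(data)
```

Note the separate if blocks (one per rule) and the final write step, which is what actually persists the changes.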
I am trying to check a column of an Excel file for values in a given format and, if there is a match, append it to a list. Here is my code:
from openpyxl import load_workbook
import re
#Open file and read column with PBSID.
PBSID = []
wb = load_workbook(filename="FILE_PATH", data_only=True)
sheet = wb.active
for col in sheet["E"]:
    if re.search("\d{3}[-\.\s]??\d{5}", str(col)):
        PBSID.append(col.value)
print(PBSID)
Column E of the Excel file contains IDs like 431-00456 that I would like to append to the list named PBSID.
Expected result: the PBSID list is populated with IDs matching the regex mask XXX-XXXXX.
Actual result: the output is an empty list ([]).
Am I missing something? (I know there are more elegant ways of doing this, but I am relatively new to Python and very open to criticism.)
Thanks!
Semantically, I think the for loop should be written as:
for row in sheet["E"]:
since I'm guessing that sheet["E"] already refers to column 'E'.
Without seeing the exact data in a cell, I think what's happening is that str(col) gives you the string representation of the cell object itself (something like "<Cell 'Sheet1'.E5>") rather than the cell's contents, so the regular expression is searched against the wrong string and finds no match (hence the empty result). Make sure the pattern is also written as a raw string, so its backslashes are not mangled.
In other words, you're referring to the whole cell object in str(col), and then separately to the cell's value in PBSID.append(col.value). It's best to refer to the same object in both places, which in your case means searching str(col.value).
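Putting the two fixes together (search the cell's value rather than the cell object, and write the pattern as a raw string) might look like this sketch; the helper name is made up, and the input is any iterable of objects with a .value attribute, as openpyxl cells are.

```python
import re

# Raw string so the backslashes in the pattern are not mangled.
ID_PATTERN = re.compile(r"\d{3}[-.\s]?\d{5}")

def collect_ids(cells):
    """Collect cell values whose string form matches the ID pattern."""
    found = []
    for cell in cells:
        # Search the cell's value, not the cell object's repr.
        if cell.value is not None and ID_PATTERN.search(str(cell.value)):
            found.append(cell.value)
    return found
```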
I have a table (tab-delimited .txt file) in the following form:
each row is an entry;
the first row is the header;
the first 5 columns are simple numeric parameters;
all columns after the 7th are supposed to form a list of values.
My problem is: how can I import this and create a DataFrame where the last column contains a list of values?
---- Problem 1 ----
The header (first row) is "shorter", containing only the names of some columns. The values after the 7th column have no header (because they are supposed to form a list). If I import the file as is, this appears to confuse the import functions.
If, for example, I import as follow
df = pd.read_table( path , sep="\t")
the DataFrame created has only as many columns as there are elements in the first row. Moreover, the assigned data values are mismatched.
---- Problem 2 -----
What is really confusing to me is that if I open the .txt in Excel and save it as tab-delimited (without changing anything), I can then import it without problems, headers included: columns with no header are simply given an "Unnamed XYZ" tag.
Why would saving in Excel change anything? Using Notepad++, I can see only one difference: the original .txt uses "Unix (LF)" line endings, while the one saved from Excel uses "Windows (CR LF)". Both are UTF-8, so I do not understand how this would be an issue.
Nevertheless, from here I could manipulate the data and try to gather all columns I wish and make them into a list. However, I hope that there is a more elegant and faster way to do it.
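For what it's worth, one workaround for Problem 1 is to pass read_csv an explicit list of column names at least as wide as the widest data row (skipping the short header), then gather the trailing columns into a single list-valued column. This is only a sketch, with made-up column names and inline data standing in for the file:

```python
import io
import pandas as pd

# Stand-in for the file: a 3-name header, but 5 fields per data row.
raw = "a\tb\tc\n1\t2\t3\t4\t5\n6\t7\t8\t9\t10\n"

# In practice you would first scan the file for the maximum number of fields.
names = ["a", "b", "c", "v0", "v1"]
df = pd.read_csv(io.StringIO(raw), sep="\t", header=None, skiprows=1, names=names)

# Collapse the trailing value columns into one list-valued column.
df["values"] = df[["v0", "v1"]].values.tolist()
df = df.drop(columns=["v0", "v1"])
```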
Here is a screen-shot of the .txt file
Thank you,
I don't know whether this is a very simple question, but I would like to fill a column based on a condition over two other columns.
I have two columns, age and SES, and another, empty column (vitality class) whose value should be based on those two. For example, when a person is 65 years old and their socio-economic status is high, the third column should be given a value of, say, 1. I have an idea of what I want to achieve, but no idea how to implement it in Python itself. I know I should use a for loop and I know how to write conditions, but because the result depends on two columns at once, I have no idea how to write that as a function,
and furthermore how to write the result back into the same CSV (in the respective empty column).
Use the pandas module to import the csv as a DataFrame object. Then you can do logical statements to fill empty columns:
import pandas as pd
df = pd.read_csv('path_to_file.csv')
df.loc[(df['age']==65) & (df['SES']=='high'), 'vitality_class'] = 1
df.to_csv('path_to_new_file.csv', index=False)
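If there are several age/SES combinations, numpy.select keeps the rules readable. A sketch with made-up sample data and class values (the column names are the question's own):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [65, 40, 70], "SES": ["high", "low", "low"]})

# Each condition pairs with the class value at the same position in `choices`.
conditions = [
    (df["age"] >= 65) & (df["SES"] == "high"),
    (df["age"] >= 65) & (df["SES"] == "low"),
]
choices = [1, 2]
df["vitality_class"] = np.select(conditions, choices, default=0)
```

Rows matching no condition get the default value, so the column is filled completely in one pass.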
I get an Excel file from someone and need to read the data every month. The format is not stable each time; by "not stable" I mean:
Where the data starts changes: e.g. Section A may start at row 4, column D this time, but next time it may start at row 2, column E.
Under each section there are tags. The number of tags may change as well, but every time I only need the data in tag_2 and tag_3 (these two will always show up).
The only data that I need is from tag_2 and tag_3, for each month (month1 - month8). I want to find a way, using Python, to first locate the section name, then find tag_2 and tag_3 under that section, then get the data for month1 to month8 (the number of months may change as well).
Please note that I do NOT want to locate the data by hard-coding locations in the Excel file, since the locations change every time. How do I do this?
The end product should be a pandas dataframe that has monthly data for tag_2, tag_3, with a column that says which section the data come from.
Thanks.
I think you can read it directly as a comma-separated text file. Based on what you need, you can look for tag_2 and tag_3 on each line.
with open(filename, "r") as fs:
    for line in fs:
        cell_list = line.split(",")
        # At this point you have all the elements on the line as a list;
        # you can check its size and implement your logic.
Assuming that the (presumably manually pasted) block of information is unlikely to end up in the very bottom-right corner of the excel sheet, you could simply iterate over rows and columns (set maximum values for each to prevent long searching times) until you find a familiar value (such as "Section A") and go from there.
Unless I misunderstood you, the rest of the format should be consistent between months, so you can simply assume that "month_1" is always one cell up and two to the right of that initial spot.
I have not personally worked with excel sheets in python, so I cannot state whether the following is possible in python, but it definitely works in ExcelVBA:
You could just as well use the Range.Find() method to find the value "Section A" and continue with the same process as above, perhaps writing any results to a txt file and calling your Python script from there if necessary.
I hope this helps a little.
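The scan described above is certainly possible in Python. Below is a minimal sketch using a plain list-of-lists as a stand-in for the sheet (with openpyxl you would iterate over ws.iter_rows() the same way); the layout assumed here, with tags in the section's column and month values to the tag's right, is a guess from the question:

```python
def find_cell(grid, target):
    """Return (row, col) of the first cell equal to `target`, or None."""
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value == target:
                return r, c
    return None

def tag_months(grid, section, tag):
    """Locate `section`, then `tag` in the rows below it, and return the
    month values to the tag's right (stopping at the first empty cell)."""
    r, c = find_cell(grid, section)
    for row in grid[r + 1:]:
        if c < len(row) and row[c] == tag:
            months = []
            for value in row[c + 1:]:
                if value in (None, ""):
                    break
                months.append(value)
            return months
    return None
```

Because the search keys on cell contents rather than fixed coordinates, it keeps working when the block moves or the number of months changes.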
G'day,
I posted this question and had some excellent responses from @abarnert. I'm trying to remove particular rows from a CSV file. I've learned that individual rows can't be deleted from a CSV file in place, so I'm going to rewrite the CSV while omitting those rows, then rename the new file as the old one.
As per the above question in the link, I have tools being taken and returned from a toolbox. The CSV file I'm trying to rewrite is an ongoing 'log' of the tools currently checked out from the toolbox. Therefore, when a tool is returned, I need that tool to be removed from the log CSV file.
Here's what I have so far:
absent_past = frozenset(tuple(row) for row in csv.reader(open('Absent_Past.csv', 'r')))
absent_current = frozenset(tuple(row) for row in csv.reader(open('Absent_Current.csv', 'r')))
tools_returned = [",".join(row) for row in absent_past - absent_current]
with open('Log.csv') as f:
    check = csv.reader(f)
    for row in check:
        if row[1] not in tools_returned:
            csv.writer(open('Log_Current.csv', 'a+')).writerow(row)
os.remove('Log.csv')
os.rename('Log_Current.csv', 'Log.csv')
As you can (hopefully) see above, it opens Log.csv and, if a tool has been returned (i.e. the tool is listed in a row of tools_returned), does not rewrite that entry into the new file. When all the non-returned tools have been written to the new file, the old file is deleted and the new file is renamed from Log_Current.csv to Log.csv.
It's worth mentioning that the tools which have been taken are appended to Log_Current.csv before it is renamed. This part of the code works nicely :)
I've been instructed to avoid using CSV for this system, which I agree with. However, I would like to explore CSV handling in Python as much as I can at this point, as I know it will come in handy in the future. I will be looking into the contextlib and shelve modules later on.
Thanks!
EDIT: In the code above, I have if row[1]..., which I'm hoping means that it will only check the value of the first column in the row? Basically, a row will consist of something like Hammer, James, Taken, 09:15:25, but I only want to search Log.csv for Hammer, as tools_returned consists only of tool names, i.e. Hammer, Drill, Saw etc. Is the row[1] approach correct for this?
At the moment, every row of Log.csv is being written to Log_Current.csv regardless of whether the tool has been returned or not. As such, I'm thinking that the if row[1] part of the code isn't working.
I figured I'd answer my own question, as I've now figured it out. The code posted above is correct except for one minor error: when referring to a column number in a row, the first column is column 0, not column 1. As I was searching column '1' for the tool name, it was never going to work, as column '1' is actually the second column, which holds the name of the user.
Changing that line to if row[0] etc rewrites a new file with the current list of tools that are checked out, and omits any tools that have been replaced, as desired!
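With that fix in place, the rewrite step can also be condensed into a single pass that opens the writer once and lets os.replace handle the delete-and-rename in one step. A sketch (the function name is made up; column 0 holds the tool name, as per the answer above):

```python
import csv
import os

def rewrite_log(log_path, tools_returned):
    """Rewrite log_path, dropping any row whose tool name (column 0)
    appears in tools_returned."""
    returned = set(tools_returned)
    tmp_path = log_path + ".tmp"
    with open(log_path, newline="") as src, open(tmp_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            if row and row[0] not in returned:
                writer.writerow(row)
    os.replace(tmp_path, log_path)  # swaps in the new log in one step
```

Opening the writer once (instead of once per row) and using a with block also guarantees the files are closed before the rename.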