csv row and column fetch - python

So I'm working on a program in Python 3.3.2. I'm new to it all, but I've been getting through it. I have an app I made that takes 5 inputs. 3 of those inputs are comboboxes, two are entry widgets. I then created a button event that saves those 5 inputs into a text file and a csv file. Opening each file, everything looks proper. For example, saved info would look like this:
Brad M.,Mike K.,Danny,Iconnoshper,Strong Wolf Lodge
I then followed a csv demo and copied this...
import csv

ifile = open('myTestfile.csv', "r")
reader = csv.reader(ifile)
rownum = 0
for row in reader:
    # Save header row.
    if rownum == 0:
        header = row
    else:
        colnum = 0
        for col in row:
            print('%-15s: %s' % (header[colnum], col))
            colnum += 1
    rownum += 1
ifile.close()
and that ends up printing beautifully as:
rTech: Brad M.
pTech: Mike K.
cTech: Danny
proNam: ohhh
jobNam: Yeah
rTech: Damien
pTech: Aaron
so on and so on. What I'm trying to figure out is if I've named my headers via
if rownum == 0:
    header = row
is there a way to pull a specific row / col combo and print what is held there??
I have figured out that, after the program has run, I could do
print(col)
or
print(col[0:10])
and I am able to print the last col printed, or the letters from the last printed col. But I can't go any farther back than that last printed col.
My ultimate goal is to be able to assign variables so I could in turn have a label in another program get its information from the csv file.
rTech for job is???
look in Jobs csv at row 1, column 1, and return value for rTech
do I need to create a dictionary that is loaded with the information and then call the dictionary?? Thanks for any guidance
Thanks for the direction. So I've been trying a few different things; one that I'm really liking is the following...
import csv

labels = ['rTech', 'pTech', 'cTech', 'productionName', 'jobName']
fn = 'my file.csv'
cameraTech = 'Danny'
f = open(fn, 'r')
reader = csv.DictReader(f, labels)
jobInformation = [(item["productionName"],
                   item["jobName"],
                   item["pTech"],
                   item["rTech"]) for item in reader
                  if item['cTech'] == cameraTech]
f.close()
print("Camera Tech: %s\n" % (cameraTech))
print("\n".join(["Production Name: %s \nJob Name: %s \nPrep Tech: %s \nRental Agent: %s\n" % (item) for item in jobInformation]))
That shows me that I can create a variable such as cameraTech, and as long as it matches a value in the cTech column of the csv file loaded into the reader, the proper information gets filled in. 95% there WOOOOOO..
So now what I'm curious about is calling each item. The plan is in a window I have a listbox that is populated with items from a .txt file with "productionName" and "jobName". When I click on one of those items in the listbox a new window opens up and the matching information from the .csv file is then filled into the appropriate labels.
Thoughts??? Thanks again :)
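One way to wire the listbox selection to the csv data is a small lookup function keyed on the production and job names; a minimal sketch, assuming the labels above (the sample rows are invented, and io.StringIO stands in for the real file):

```python
import csv
import io

# Invented sample rows matching the labels above (io.StringIO stands in
# for the real 'my file.csv', which has no header row).
sample = io.StringIO(
    "Brad M.,Mike K.,Danny,ohhh,Yeah\n"
    "Damien,Aaron,Danny,another,Sequel\n"
)
labels = ['rTech', 'pTech', 'cTech', 'productionName', 'jobName']

def job_lookup(f, production_name, job_name):
    """Return the first csv row matching the listbox selection, or None."""
    for item in csv.DictReader(f, labels):
        if (item['productionName'], item['jobName']) == (production_name, job_name):
            return item
    return None

match = job_lookup(sample, 'another', 'Sequel')
print(match['rTech'], match['pTech'])  # -> Damien Aaron
```

The returned dict can then feed each label in the detail window directly by column name.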

I think that reading the CSV file into a dictionary might be a working solution for your problem.
The Python CSV package has built-in support for reading CSV files into a Python dictionary using DictReader, have a look at the documentation here: http://docs.python.org/2/library/csv.html#csv.DictReader
Here is an (untested) example using DictReader that reads the CSV file into a Python dictionary and prints the contents of the first row:
import csv

csv_data = csv.DictReader(open("myTestfile.csv"))
print(next(csv_data))  # DictReader is an iterator, so next() fetches the first row
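For pulling a specific row/column combination, the rows can also be collected into a list after splitting off the header, so any cell becomes a plain index lookup. A sketch (the sample data is invented to mirror the file described above, with io.StringIO standing in for the open file):

```python
import csv
import io

# Invented sample mirroring the file described above
# (io.StringIO stands in for open("myTestfile.csv")).
sample = io.StringIO(
    "rTech,pTech,cTech,proNam,jobNam\n"
    "Brad M.,Mike K.,Danny,Iconnoshper,Strong Wolf Lodge\n"
    "Damien,Aaron,Danny,ohhh,Yeah\n"
)

reader = csv.reader(sample)
header = next(reader)   # first row becomes the header
rows = list(reader)     # remaining rows, indexable by position

# Any row/column combo is now a plain index lookup:
print(rows[0][0])       # -> Brad M.

# Or build one dict per row so cells can be fetched by header name:
records = [dict(zip(header, row)) for row in rows]
print(records[1]['rTech'])  # -> Damien
```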

Okay so I was able to put this together after seeing the following (https://gist.github.com/zstumgoren/911615)
That showed me how to give each header a variable I could call. From there I could then create a function that would allow for certain variables to be called and compared and if that matched I would be able to see certain data needed. So the example I made to show myself it could be done is as follows:
import csv

source_file = open('jobList.csv', 'r')
for line in csv.DictReader(source_file, delimiter=','):
    pTech = line['pTech']
    cTech = line['cTech']
    rAgent = line['rTech']
    prodName = line['productionName']
    jobName = line['jobName']
    if prodName == 'another':
        print(pTech, cTech, rAgent, jobName)
However, I just noticed something: while my .csv file has one line this works great!!!! But after creating my proper .csv file, I am only able to print information from the last line read. Grrrrr.... Getting closer though.... I'm still searching, but if someone understands my issue, I would love some light.
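Each pass through the loop above overwrites pTech, cTech, and the rest, so once the loop has finished only the last row's values are still around. One possible fix is to append each matching row to a list inside the loop; a sketch with invented sample data (io.StringIO stands in for jobList.csv):

```python
import csv
import io

# Invented sample standing in for jobList.csv (header row included,
# since DictReader here is not given explicit fieldnames).
sample = io.StringIO(
    "rTech,pTech,cTech,productionName,jobName\n"
    "Brad M.,Mike K.,Danny,another,Yeah\n"
    "Damien,Aaron,Sara,another,Sequel\n"
)

matches = []
for line in csv.DictReader(sample, delimiter=','):
    if line['productionName'] == 'another':
        # Append the whole row instead of overwriting loop variables.
        matches.append((line['pTech'], line['cTech'], line['rTech'], line['jobName']))

# Every matching line is still available after the loop:
for pTech, cTech, rAgent, jobName in matches:
    print(pTech, cTech, rAgent, jobName)
```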

Related

Code won't print several lists but is showing no errors

I'm trying to make a program which allows a user to enter a 5-digit product number; the program will search the included csv file for that number until it finds it, at which point it will print the corresponding name and price but not the number. To get to that point, I decided to create a list for each column from the file and then print them for troubleshooting. None of them had issues printing individually, but when I tried to print all 5 at once it printed the first list and then showed 4 empty brackets for the others. The editor is showing no errors at all and I'm not sure how to fix it.
import csv

f = open('products.csv')
csv_f = csv.reader(f)
next(f)
pNumber = []
pName = []
pDescription = []
pCategory = []
pPrice = []
for row in csv_f:
    pNumber.append(row[0])
for row in csv_f:
    pName.append(row[1])
for row in csv_f:
    pDescription.append(row[2])
for row in csv_f:
    pCategory.append(row[3])
for row in csv_f:
    pPrice.append(row[4])
print(pNumber)
print(pName)
print(pDescription)
print(pCategory)
print(pPrice)
The products csv file looks like this
Product #,Name,Description,Category,Price
38500,Backpacking Tent,"2-Person Backpacking Tent - 20D Ripstop Nylon",Outdoor,205.99
27840,Sit-Stand Desk,"Sit-Stand Compact Workstation Desk Converter, 37in",Household,139.99
37992,Mouse,"Dark Matter by Monoprice Rover Optical Gaming Mouse - 6200DPI",Office,19.99
24458,Subwoofer,"15in THX Ultra Certified 1000 Watt Powered Subwoofer",Audio,1280.07
38323,USB Cable,"USB 2.0 Type-C to Type-A Charge & Sync Kevlar-Reinforced Nylon-Braid Cable, 6ft, purple",Office,7.55
Your 2nd-5th lists are empty because the first loop read all of the data in the file; there's nothing left to read. If you want to iterate through the entire file again, you need to reset the cursor position in the file object.
f.seek(0)
is often the simplest way to do it.
Better yet, store all of your data fields within one loop, rather than reading the entire file for one column at a time.
Even better than that, simply read the file straight to a data frame.
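A sketch of the one-loop version (the sample rows here are trimmed, invented stand-ins for products.csv, read through io.StringIO so the snippet is self-contained):

```python
import csv
import io

# Trimmed, invented stand-in for products.csv (io.StringIO replaces the file).
sample = io.StringIO(
    "Product #,Name,Description,Category,Price\n"
    "38500,Backpacking Tent,2-Person Backpacking Tent,Outdoor,205.99\n"
    "27840,Sit-Stand Desk,Desk Converter,Household,139.99\n"
)

csv_f = csv.reader(sample)
next(csv_f)  # skip the header row once

pNumber, pName, pDescription, pCategory, pPrice = [], [], [], [], []
for row in csv_f:
    # Grab every column in the same pass, so the reader is consumed only once.
    pNumber.append(row[0])
    pName.append(row[1])
    pDescription.append(row[2])
    pCategory.append(row[3])
    pPrice.append(row[4])

print(pNumber)  # -> ['38500', '27840']
print(pPrice)   # -> ['205.99', '139.99']
```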

Python getting exact cell from csv file

import csv

filename = str(input("Give the file name: "))
file = open(filename, "r")
with file as f:
    size = sum(1 for _ in f)
print("File", filename, "has been read, and it has", size, "lines.", size - 1, "rows have been analyzed.")
I pretty much type the csv file path to analyze and do different things with it.
First question is: How can I print the exact cell from the CSV file? I have tried different methods, but I can't seem to get it working.
For example I want to print the info of those two cells
The other question is: can I automate it to print the very first cell (1 A) and the first cell of the very last row (1099 A), without me needing to type the cell locations?
Thank you
Small portion of data
Example of the data:
Time        Solar Carport   Solar Fixed   SolarFlatroof   Solar Single
1.1.2016    317             1715          6548            2131
2.1.2016    6443            1223          1213            23121
3.1.2016    0               12213         0               122
You import csv at the very top but then decided not to use it. I wonder why – it seems just what you need here. So after a brief peek at the official documentation, I got this:
import csv

data = []
with open('../Downloads/htviope2016.csv') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=';')
    for row in spamreader:
        data.append(row)
print("File has been read, and it has ", len(data), " lines.")
That is all you need to read in the entire file. You don't need to – for some operations, it is sufficient to process one line at a time – but with the full data loaded and ready in memory, you can play around with it.
print (f'First row length: {len(data[0])}')
The number of cells per row. Note that this first row contains the header, and you probably don't have any use for it. Let's ditch it.
print ('Discarding 1st row NOW. Please wait.')
data.pop(0)
Done. A plain pop() removes the last item but you can also use an index. Alternatively, you could use the more pythonic (because "slicing") data = data[1:] but I assume this could involve copying and moving around large amounts of data.
print ('First 10 rows are ...')
for i in range(10):
    print('\t'.join(data[i]) + '(end)')
Look, there is data in memory! I pasted on the (end) because of the following:
print (f'First row, first cell contains "{data[0][0]}"')
print (f'First row, last cell contains "{data[0][-1]}"')
which shows
First row, first cell contains "2016-01-01 00:00:00"
First row, last cell contains ""
because each line ends with a ;. This empty 'cell' can trivially be removed during reading (ideally), or afterwards (as we still have it in memory):
data = [row[:-1] for row in data]
and then you get
First row, last cell contains "0"
and now you can use data[row][column] to address any cell that you want (in valid ranges only, of course).
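To answer the automation question: with the list in memory, the very first cell and the first cell of the last row are just index lookups, with no cell addresses typed by hand. A sketch with invented sample rows in the same layout (io.StringIO stands in for the file):

```python
import csv
import io

# Invented sample in the same layout as the data above, with a trailing
# ';' per line producing the empty last cell discussed.
sample = io.StringIO(
    "Time;Solar Carport;Solar Fixed;\n"
    "1.1.2016;317;1715;\n"
    "2.1.2016;6443;1223;\n"
    "3.1.2016;0;12213;\n"
)

data = [row[:-1] for row in csv.reader(sample, delimiter=';')]  # drop empty last cell
data.pop(0)  # discard the header row, as above

first_cell = data[0][0]            # cell "1 A" in spreadsheet terms
last_row_first_cell = data[-1][0]  # negative index reaches the last row

print(first_cell)           # -> 1.1.2016
print(last_row_first_cell)  # -> 3.1.2016
```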
Disclaimer: this is my very first look at the csv module. Some operations could possibly be done more efficiently. Practically all examples verbatim from the official documentation, which proves it's always worth taking a look there first.

Parsing a text file with line breaks in python

I have a text file with about 20 entries. They look like this:
~
England
Link: http://imgur.com/foobar.jpg
Capital: London
~
Iceland
Link: http://imgur.com/foobar2.jpg
Capital: Reykjavik
...
etc.
I would like to take these entries and turn them into a CSV.
There is a '~' separating each entry. I'm scratching my head trying to figure out how to go thru line by line and create the CSV values for each country. Can anyone give me a clue on how to go about this?
Use the libraries, Luke :)
I'm assuming your data is well formatted. Most real world data isn't that way. So, here goes a solution.
>>> content.split('~')
['\nEngland\nLink: http://imgur.com/foobar.jpg\nCapital: London\n', '\nIceland\nLink: http://imgur.com/foobar2.jpg\nCapital: Reykjavik\n', '\nEngland\nLink: http://imgur.com/foobar.jpg\nCapital: London\n', '\nIceland\nLink: http://imgur.com/foobar2.jpg\nCapital: Reykjavik\n']
For writing the CSV, Python has standard library functions.
>>> import csv
>>> csvfile = open('foo.csv', 'w', newline='')
>>> fieldnames = ['Country', 'Link', 'Capital']
>>> writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
>>> entries = content.split('~')
>>> for entry in entries:
...     cols = entry.strip().splitlines()
...     writer.writerow({'Country': cols[0], 'Link': cols[1].split(': ')[1], 'Capital': cols[2].split(': ')[1]})
...
If your data is more semi structured or badly formatted, consider using a library like PyParsing.
Edit:
Second column contains URLs, so we need to handle the splits well.
>>> cols[1]
'Link: http://imgur.com/foobar2.jpg'
>>> cols[1].split(':')[1]
' http'
>>> cols[1].split(': ')[1]
'http://imgur.com/foobar2.jpg'
The way that I would do it is to use the open() function with the syntax:
f = open('NameOfFile.extensionType', 'a+')
Where "a+" is append mode: the file will not be overwritten, new data is appended at the end, and the file is created if it does not exist. You could also open with "r" for read-only mode, but then you lose the ability to write. The "+" after a letter opens the file for both reading and writing; note that "r+", unlike "a+", will not create a missing file.
After that I would use a for loop like this:
data = []
tmp = []
for line in f:
    line = line.strip()  # strip() returns a new string, so reassign it to drop the newline
    if line == '~':
        data.append(tmp)
        tmp = []
        continue
    else:
        tmp.append(line)
Now you have all of the data stored in a list, but you could also reformat it as a class object using a slightly different algorithm.
I have never edited CSV files using python, but I believe you can use a loop like this to add the data:
f2 = open('CSVfileName.csv', 'w')  # Can change "w" for other needs i.e. "a+"
for entry in data:
    for subentry in entry:
        f2.write(str(subentry) + '\n')  # Use '\n' to create a new line
From my knowledge of CSV that loop would create a single column of all of the data. At the end remember to close the files in order to save the changes:
f.close()
f2.close()
You could combine the two loops into one in order to save space, but for the sake of explanation I have not.
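For a CSV with one entry per row (rather than the single column the loop above produces), the csv module's writer could be used instead; a minimal sketch, assuming each parsed entry is a list of its three lines (io.StringIO stands in for the output file):

```python
import csv
import io

# Invented parsed entries, in the shape the parsing loop above builds.
data = [
    ['England', 'Link: http://imgur.com/foobar.jpg', 'Capital: London'],
    ['Iceland', 'Link: http://imgur.com/foobar2.jpg', 'Capital: Reykjavik'],
]

out = io.StringIO()  # stands in for open('CSVfileName.csv', 'w', newline='')
writer = csv.writer(out)
for entry in data:
    writer.writerow(entry)  # one row per country, one column per field

print(out.getvalue())
```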

Speeding up Python file handling for a huge dataset

I have a large dataset stored as a 17GB csv file (fileData), which contains a variable number of records (up to approx 30,000) for each customer_id. I am trying to search for specific customers (listed in fileSelection - around 1500 out of a total of 90000) and copy the records for each of these customers into a separate csv file (fileOutput).
I am very new to Python, but am using it because vba and matlab (which I am more familiar with) can't handle the file size. (I am using Aptana Studio to write the code, but running the python directly from the cmd line for speed. Running 64-bit Windows 7.)
The code I have written is extracting some of the customers, but has two problems:
1) It is failing to find most of the customers in the large dataset. (I believe they are all in the dataset, but cannot be completely sure.)
2) It is VERY slow. Any way to speed the code would be appreciated, including code that can better utilise a 16 core PC.
here is the code:
def main():
    # Initialisation :
    # - identify columns in selection file
    fS = open(fileSelection, "r")
    if fS.mode == "r":
        header = fS.readline()
        selheaderlist = header.split(",")
        custkey = selheaderlist.index('CUSTOMER_KEY')
    # Identify columns in dataset file
    fileData = path2 + file_data
    fD = open(fileData, "r")
    if fD.mode == "r":
        header = fD.readline()
        dataheaderlist = header.split(",")
        custID = dataheaderlist.index('CUSTOMER_ID')
    fD.close()
    # For each customer in the selection file
    customercount = 1
    for sr in fS:
        # Find customer key and locate it in customer ID field in dataset
        selrecord = sr.split(",")
        requiredcustomer = selrecord[custkey]
        # Look for required customer in dataset
        found = 0
        fD = open(fileData, "r")
        if fD.mode == "r":
            while found == 0:
                dr = fD.readline()
                if not dr:
                    break
                datrecord = dr.split(",")
                if datrecord[custID] == requiredcustomer:
                    found = 1
                    # Open output file
                    fileOutput = path3 + file_out_root + str(requiredcustomer) + ".csv"
                    fO = open(fileOutput, "w+")
                    fO.write(str(header))
                    # Copy all records for required customer number
                    while datrecord[custID] == requiredcustomer:
                        fO.write(str(dr))
                        dr = fD.readline()
                        datrecord = dr.split(",")
                    # Close output file
                    fO.close()
        if found == 1:
            print("Customer Count " + str(customercount) + " Customer ID " + str(requiredcustomer) + " copied. ")
            customercount = customercount + 1
        else:
            print("Customer ID " + str(requiredcustomer) + " not found in dataset")
            fL.write(str(requiredcustomer) + "," + "NOT FOUND")
        fD.close()
    fS.close()
It has taken a few days to extract a couple of hundred customers, but has failed to find many more.
Sample Output
Thanks @Paul Cornelius. This is much more efficient. I have adopted your approach, also using the csv handling suggested by @Bernardo:
# Import Modules
import csv

def main():
    # Initialisation :
    fileSelection = path1 + file_selection
    fileData = path2 + file_data
    # Step through selection file and create dictionary with required ID's as keys, and empty objects
    with open(fileSelection, 'rb') as csvfile:
        selected_IDs = csv.reader(csvfile)
        ID_dict = {}
        for row in selected_IDs:
            ID_dict.update({row[1]: []})
    # Step through data file: for selected customer ID's, append records to dictionary objects
    with open(fileData, 'rb') as csvfile:
        dataset = csv.reader(csvfile)
        for row in dataset:
            if row[0] in ID_dict:
                ID_dict[row[0]].extend([row[1] + ',' + row[4]])
    # Write all dictionary objects to csv files
    for row in ID_dict.keys():
        fileOutput = path3 + file_out_root + row + '.csv'
        with open(fileOutput, 'wb') as csvfile:
            output = csv.writer(csvfile, delimiter='\n')
            output.writerows([ID_dict[row]])
Use the csv reader instead. Python has a good library to handle CSV files so that it is not necessary for you to do splits.
Check out the documentation: https://docs.python.org/2/library/csv.html
>>> import csv
>>> with open('eggs.csv', 'rb') as csvfile:
...     spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
...     for row in spamreader:
...         print ', '.join(row)
Spam, Spam, Spam, Spam, Spam, Baked Beans
Spam, Lovely Spam, Wonderful Spam
It should perform much better.
The task is way too involved for a simple answer. But your approach is very inefficient because you have too many nested loops. Try making ONE pass through the list of customers, and for each build a "customer" object with any information that you need to use later. You put these in a dictionary; the keys are the different requiredcustomer variables and the values are the customer objects. If I were you, I would get this part to work first, before ever fooling around with the big file.
Now you step ONCE through the massive file of customer data, and each time you encounter a record whose datarecord[custID] field is in the dictionary, you append a line to the output file. You can use the relatively efficient in operator to test for membership in the dictionary.
No nested loops are necessary.
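A minimal sketch of that single-pass approach (the file contents and column positions here are invented; io.StringIO stands in for the two files):

```python
import csv
import io

# Invented stand-ins for the selection file and the 17GB dataset.
selection = io.StringIO("CUSTOMER_KEY\n101\n303\n")
dataset = io.StringIO(
    "CUSTOMER_ID,FIELD\n"
    "101,a\n202,b\n101,c\n303,d\n"
)

# ONE pass through the selection file builds the dictionary of wanted IDs.
sel = csv.reader(selection)
next(sel)  # skip header
wanted = {row[0]: [] for row in sel}

# ONE pass through the big file; the `in` test on a dict is fast.
data = csv.reader(dataset)
next(data)
for row in data:
    if row[0] in wanted:
        wanted[row[0]].append(row)

print(len(wanted['101']))  # -> 2
```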
The code as you present it can't run since you write to some object named fL without ever opening it. Also, as Tim Pietzcker pointed out, you aren't closing your files since you don't actually call the close function.
Try using pandas if your machine can handle the size of the csv in memory.
If you are looking for out of core computation - take a look at dask (they provide similar APIs)
In pandas, you can read only specific columns from a csv file, if you run into memory problems.
Anyways - both pandas and dask use C bindings which are significantly faster than pure python.
In pandas, your code would look something like:
import pandas as pd

input_csv = pd.read_csv('path_to_csv')
records_for_interesting_customers = input_csv[input_csv.CUSTOMER_ID.isin(list_of_ids)]
records_for_interesting_customers.to_csv('output_path')

Edit a spreadsheet with Python(passing by CSV file)

I've a spreadsheet with the follow fields:
id   age   smoker   do sport
1    35    yes      rare
2    40    no       frequently
3    20    no       never
4    ..    ..       ..
I'd like to create a Python script that edits this spreadsheet by way of csv file conversion.
"yes" becomes 1, "no" becomes 0, "rare" becomes 0, "frequently" becomes 1 and "never" becomes 2.
I've saved a spreadsheet as a csv file, using delimiter as ';' and quotechar ' " '.
Now I've write this code:
import csv
filecsv = open("file.csv", "r")
reader = csv.reader(filecsv, delimiter=';', quotechar='"')
out = open("outfile.csv", "w")
output = csv.writer(out, delimiter=';', quotechar='"')
for row in reader:
    for field in row:
        if row[field] == 'yes':
            .
            .
            .
            .
But I don't know how to continue....
Could someone tell me how use python to make these changes?
Is it better using a Python list or dictionary?
Thanks to everybody!
Even though CSV files look like spreadsheets, at their core they are simply text files. This means you don't actually need to use the csv library, but can instead read the file as a simple string.
Once you have the file as a string you can use regular expressions to convert the relevant values. Here's an example:
import re
o = open("output","w")
data = open("file").read()
o.write( re.sub("someword","newword",data) )
o.close()
Remember, you will need one re.sub() call for each value you wish to convert.
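Those repeated re.sub() calls could be driven from a small mapping; a sketch using the replacement pairs from the question (the sample string and the \b word boundaries are my own additions, the boundaries keeping a short word like "no" from matching inside a longer one):

```python
import re

# Replacement pairs from the question; \b word boundaries avoid partial matches.
replacements = {'yes': '1', 'no': '0', 'rare': '0', 'frequently': '1', 'never': '2'}

# Invented sample in the ';'-delimited layout described above.
data = "1;35;yes;rare\n2;40;no;frequently\n3;20;no;never\n"
for word, repl in replacements.items():
    data = re.sub(r'\b%s\b' % word, repl, data)

print(data)  # rows become 1;35;1;0 / 2;40;0;1 / 3;20;0;2
```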
Seeing how you already know about Python's csv library, it should be trivial to, for each row of the input csv, create a new row with the changes you require, and write it out to a new csv file.
Notice how the csv reader treats each row as a list. Next, look csv writer's writerow() method; it takes a python list and writes it as a csv row. All you need to do is read one row at a time, make the changes you want and spit it back out to the writer. Using your code:
for row in reader:          # for each row in the input
    outrow = list(row)      # make a copy of the row. I'm not sure if you NEED to do this, but it doesn't hurt.
    if outrow[2] == "yes":  # if the value in the 3rd column, "smoker", is "yes"
        outrow[2] = 1       # change it to 1
    elif outrow[2] == "no": # if it's "no"
        outrow[2] = 0       # change it to 0.
    # repeat this process for outrow[3] (meaning column #4, "do sport")
    output.writerow(outrow)
You probably noticed that python calls the 3rd column 2 and the 4th column 3. This is because python counts starting at 0 (so the 1st column is column 0). You should be able to follow this example to make all the changes that you need.
Don't forget to close your files when you're finished!
if you will always have that format and you want to replace line by line:
replacements_dict = {
    'yes': 1,
    'no': 0,
    'rare': 0,
    'frequently': 1,
    'never': 2
}
for row_list in reader:
    output.writerow([
        row_list[0],
        row_list[1],
        replacements_dict[row_list[2]],
        replacements_dict[row_list[3]]
    ])
you could also read your csv into memory as a string and just replace the words, as georgesl suggests

Categories

Resources