I have a spreadsheet with the following fields:
id  age  smoker  do sport
1   35   yes     rare
2   40   no      frequently
3   20   no      never
4   ..   ..      ..
I'd like to create a Python script that edits this spreadsheet by way of a CSV file conversion.
"yes" becomes 1, "no" becomes 0, "rare" becomes 0, "frequently" becomes 1 and "never" becomes 2.
I've saved the spreadsheet as a CSV file, using ';' as the delimiter and '"' as the quote character.
Now I've written this code:
import csv
filecsv = open("file.csv", "r")
reader = csv.reader(filecsv, delimiter=';', quotechar='"')
out = open("outfile.csv", "w")
output = csv.writer(out, delimiter=';', quotechar='"')
for row in reader:
    for field in row:
        if row[field] == 'yes':
            .
            .
            .
            .
But I don't know how to continue....
Could someone tell me how to use Python to make these changes?
Is it better to use a Python list or a dictionary?
Thanks to everybody!
Even though CSV files look like spreadsheets, at their core they are simply text files. This means you don't actually need the csv library; you can instead read the file as a plain string.
Once you have the file as a string you can use regular expressions to convert the relevant values. Here's an example:
import re
o = open("output","w")
data = open("file").read()
o.write( re.sub("someword","newword",data) )
o.close()
Remember, you will need one re.sub() call for each value you wish to convert.
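As an alternative to one re.sub() call per word, a single pattern plus a lookup table can do all the replacements in one pass. This is only a minimal sketch: the names REPLACEMENTS, PATTERN and convert_text are made up for illustration, and the \b word boundaries are an addition so that "no" inside a longer word is left alone.

```python
import re

# Mapping taken from the question; the names used here are illustrative only.
REPLACEMENTS = {"yes": "1", "no": "0", "rare": "0", "frequently": "1", "never": "2"}

# One alternation of all keys; \b keeps "no" from matching inside longer words.
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, REPLACEMENTS)) + r")\b")

def convert_text(text):
    """Replace every whole-word key of REPLACEMENTS with its value."""
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(1)], text)

# Sample rows from the question, written with ';' as the delimiter.
sample = "id;age;smoker;do sport\n1;35;yes;rare\n2;40;no;frequently\n3;20;no;never\n"
converted = convert_text(sample)
print(converted)
```

With a real file, the string would come from open("file").read() as in the example above.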
Seeing how you already know about Python's csv library, it should be trivial to create, for each row of the input csv, a new row with the changes you require and write it out to a new csv file.
Notice how the csv reader treats each row as a list. Next, look at the csv writer's writerow() method; it takes a Python list and writes it as a csv row. All you need to do is read one row at a time, make the changes you want and spit it back out to the writer. Using your code:
for row in reader:  # for each row in the input
    outrow = list(row)  # make a copy of the row. I'm not sure if you NEED to do this, but it doesn't hurt.
    if outrow[2] == "yes":  # if the value in the 3rd column, "smoker", is "yes"
        outrow[2] = 1  # change it to 1
    elif outrow[2] == "no":  # if it's "no"
        outrow[2] = 0  # change it to 0.
    # repeat this process for outrow[3] (meaning column #4, "do sport")
    output.writerow(outrow)
You probably noticed that Python calls the 3rd column 2 and the 4th column 3. This is because Python counts starting at 0 (so the 1st column is column 0). You should be able to follow this example to make all the changes that you need.
Don't forget to close your files when you're finished!
If you will always have that format and you want to replace line by line:
replacements_dict = {
    'yes': 1,
    'no': 0,
    'rare': 0,
    'frequently': 1,
    'never': 2
}
for row_list in reader:
    output.writerow([
        row_list[0],
        row_list[1],
        replacements_dict[row_list[2]],
        replacements_dict[row_list[3]]
    ])
You could also read your csv into memory as a string and just replace the words, like georgesl suggests.
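Putting the pieces together, a complete version might look like the sketch below. It assumes the ';' delimiter from the question; convert_rows and REPLACEMENTS are illustrative names, and io.StringIO objects stand in for file.csv and outfile.csv so the example runs without any files on disk. Using dict.get with the cell itself as the default lets ids, ages and the header row pass through unchanged.

```python
import csv
import io

# Replacement table from the question; values are strings because CSV cells are text.
REPLACEMENTS = {"yes": "1", "no": "0", "rare": "0", "frequently": "1", "never": "2"}

def convert_rows(infile, outfile, delimiter=";"):
    """Copy CSV rows from infile to outfile, translating known words."""
    reader = csv.reader(infile, delimiter=delimiter, quotechar='"')
    writer = csv.writer(outfile, delimiter=delimiter, quotechar='"', lineterminator="\n")
    for row in reader:
        # .get(cell, cell) leaves ids, ages and the header row unchanged.
        writer.writerow([REPLACEMENTS.get(cell, cell) for cell in row])

# In-memory stand-ins for file.csv and outfile.csv.
source = io.StringIO("id;age;smoker;do sport\n1;35;yes;rare\n2;40;no;frequently\n3;20;no;never\n")
result = io.StringIO()
convert_rows(source, result)
print(result.getvalue())
```

With real files, opening both inside a with block (and passing newline="" to open()) also takes care of closing them.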
import csv
filename = str(input("Give the file name: "))
file = open(filename, "r")
with file as f:
    size = sum(1 for _ in f)
print("File", filename, "has been read, and it has", size, "lines.", size - 1, "rows has been analyzed.")
I type in the path of the CSV file to analyze and then do different things with it.
First question: how can I print one exact cell from the CSV file? I have tried different methods, but I can't seem to get it working.
For example I want to print the info of those two cells
The other question: can I automate it to print the very first cell (1 A) and the first cell of the very last row (1099 A), without needing to type the cell locations?
Thank you
Small portion of data
Example of the data:
Time      Solar Carport  Solar Fixed  SolarFlatroof  Solar Single
1.1.2016  317            1715         6548           2131
2.1.2016  6443           1223         1213           23121
3.1.2016  0              12213        0              122
You import csv at the very top but then decide not to use it. I wonder why, since it seems to be just what you need here. So after a brief peek at the official documentation, I got this:
import csv
data = []
with open('../Downloads/htviope2016.csv') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=';')
    for row in spamreader:
        data.append(row)
print("File has been read, and it has ", len(data), " lines.")
That is all you need to read in the entire file. You don't need to – for some operations, it is sufficient to process one line at a time – but with the full data loaded and ready in memory, you can play around with it.
print (f'First row length: {len(data[0])}')
The number of cells per row. Note that this first row contains the header, and you probably don't have any use for it. Let's ditch it.
print ('Discarding 1st row NOW. Please wait.')
data.pop(0)
Done. A plain pop() removes the last item but you can also use an index. Alternatively, you could use the more pythonic (because "slicing") data = data[1:] but I assume this could involve copying and moving around large amounts of data.
print('First 10 rows are ...')
for i in range(10):
    print('\t'.join(data[i]) + '(end)')
Look, there is data in memory! I pasted on the (end) because of the following:
print (f'First row, first cell contains "{data[0][0]}"')
print (f'First row, last cell contains "{data[0][-1]}"')
which shows
First row, first cell contains "2016-01-01 00:00:00"
First row, last cell contains ""
because each line ends with a ;. This empty 'cell' can trivially be removed during reading (ideally), or afterwards (as we still have it in memory):
data = [row[:-1] for row in data]
and then you get
First row, last cell contains "0"
and now you can use data[row][column] to address any cell that you want (in valid ranges only, of course).
Disclaimer: this is my very first look at the csv module. Some operations could possibly be done more efficiently. Practically all examples verbatim from the official documentation, which proves it's always worth taking a look there first.
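To answer the second question without typing cell locations: once the rows are in a list, index 0 is the first row and index -1 is the last. The sketch below is an assumption-laden illustration, not the only way to do it: load_rows is a made-up helper, the sample reuses a few values from the question with a trailing ';' on each line, and io.StringIO stands in for the real file.

```python
import csv
import io

def load_rows(csvfile, delimiter=";"):
    """Read every row, skip the header, and drop the empty cell left by a trailing delimiter."""
    reader = csv.reader(csvfile, delimiter=delimiter)
    next(reader, None)  # discard the header row
    return [row[:-1] if row and row[-1] == "" else row for row in reader]

# Small sample in the shape described above (each line ends with ';').
sample = io.StringIO(
    "Time;Solar Carport;Solar Fixed;\n"
    "1.1.2016;317;1715;\n"
    "2.1.2016;6443;1223;\n"
    "3.1.2016;0;12213;\n"
)
data = load_rows(sample)
first_cell = data[0][0]       # first data row, first column
last_row_first = data[-1][0]  # last data row, first column
print(first_cell, last_row_first)
```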
I have a text file with about 20 entries. They look like this:
~
England
Link: http://imgur.com/foobar.jpg
Capital: London
~
Iceland
Link: http://imgur.com/foobar2.jpg
Capital: Reykjavik
...
etc.
I would like to take these entries and turn them into a CSV.
There is a '~' separating each entry. I'm scratching my head trying to figure out how to go through the file line by line and create the CSV values for each country. Can anyone give me a clue on how to go about this?
Use the libraries luke :)
I'm assuming your data is well formatted. Most real-world data isn't that way. So, here goes a solution, with content holding the text of the file:
>>> entries = [e for e in content.split('~') if e.strip()]
>>> entries
['\nEngland\nLink: http://imgur.com/foobar.jpg\nCapital: London\n', '\nIceland\nLink: http://imgur.com/foobar2.jpg\nCapital: Reykjavik\n', '\nEngland\nLink: http://imgur.com/foobar.jpg\nCapital: London\n', '\nIceland\nLink: http://imgur.com/foobar2.jpg\nCapital: Reykjavik\n']
For writing the CSV, Python has standard library functions.
>>> import csv
>>> csvfile = open('foo.csv', 'wb')
>>> fieldnames = ['Country', 'Link', 'Capital']
>>> writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
>>> for entry in entries:
...     cols = entry.strip().splitlines()
...     writer.writerow({'Country': cols[0], 'Link': cols[1].split(': ')[1], 'Capital': cols[2].split(': ')[1]})
...
If your data is more semi structured or badly formatted, consider using a library like PyParsing.
Edit:
Second column contains URLs, so we need to handle the splits well.
>>> cols[1]
'Link: http://imgur.com/foobar2.jpg'
>>> cols[1].split(':')[1]
' http'
>>> cols[1].split(': ')[1]
'http://imgur.com/foobar2.jpg'
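The session above opens the output with 'wb', which is the Python 2 idiom; in Python 3 the csv module expects a text-mode file opened with newline=''. A self-contained Python 3 sketch of the same idea follows. The helper name entries_to_rows is made up, io.StringIO stands in for foo.csv, and the data is assumed to be as well formatted as the sample in the question.

```python
import csv
import io

def entries_to_rows(content):
    """Turn '~'-separated blocks into dicts with Country, Link and Capital keys."""
    rows = []
    for entry in content.split("~"):
        cols = entry.strip().splitlines()
        if len(cols) < 3:
            continue  # skips the empty chunk before the first '~'
        rows.append({
            "Country": cols[0].strip(),
            # maxsplit=1 so the ':' inside 'http://' is left alone
            "Link": cols[1].split(":", 1)[1].strip(),
            "Capital": cols[2].split(":", 1)[1].strip(),
        })
    return rows

content = (
    "~\nEngland\nLink: http://imgur.com/foobar.jpg\nCapital: London\n"
    "~\nIceland\nLink: http://imgur.com/foobar2.jpg\nCapital: Reykjavik\n"
)
rows = entries_to_rows(content)

out = io.StringIO()  # stand-in for open('foo.csv', 'w', newline='')
writer = csv.DictWriter(out, fieldnames=["Country", "Link", "Capital"], lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```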
The way I would do it is to open the file with the open() function:
f = open('NameOfFile.extensionType', 'r')
Here "r" is read mode, which is all that is needed to parse the entries. Append mode ("a" or "a+") is the wrong choice for this step: it creates the file if it does not exist, but it positions the file at the end, so a loop over the lines would read nothing without an f.seek(0) first. The "+" does not control file creation; it adds the missing direction, so "r+" reads and writes an existing file and "a+" appends and reads.
After that I would use a for loop like this:
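data = []
tmp = []
for line in f:
    line = line.strip()  # strip() returns a new string without the trailing newline
    if line == '~':
        if tmp:  # skip the empty entry before the first '~'
            data.append(tmp)
        tmp = []
    else:
        tmp.append(line)
if tmp:  # the last entry has no '~' after it
    data.append(tmp)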
Now you have all of the data stored in a list, but you could also reformat it as a class object using a slightly different algorithm.
I have never edited CSV files using python, but I believe you can use a loop like this to add the data:
f2 = open('CSVfileName.csv', 'w')  # Can change "w" for other needs i.e "a+"
for entry in data:
    for subentry in entry:
        f2.write(str(subentry) + '\n')  # Use '\n' to create a new line
From my knowledge of CSV that loop would create a single column of all of the data. At the end remember to close the files in order to save the changes:
f.close()
f2.close()
You could combine the two loops into one in order to save space, but for the sake of explanation I have not.
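If one row per country is wanted rather than a single column, the csv module can write each inner list as a row. This is only a sketch under the assumption that data has the shape built by the loop above (one list of lines per entry); write_entries is an illustrative name and io.StringIO stands in for the output file.

```python
import csv
import io

def write_entries(data, outfile):
    """Write each entry (a list of lines) as one CSV row instead of one column."""
    writer = csv.writer(outfile, lineterminator="\n")
    writer.writerows(data)

# 'data' in the shape the parsing loop builds: one inner list per country.
data = [
    ["England", "Link: http://imgur.com/foobar.jpg", "Capital: London"],
    ["Iceland", "Link: http://imgur.com/foobar2.jpg", "Capital: Reykjavik"],
]
out = io.StringIO()  # stand-in for open('CSVfileName.csv', 'w', newline='')
write_entries(data, out)
print(out.getvalue())
```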
I am banging my head against a wall trying to figure out something that is simple.
Basically I have a .CSV with names and test scores in it, e.g.
Brad 4, 5, 7, 7
Dan 3, 6, 2, 7
What I want to do is write code that first of all prints out the test scores. This bit works fine.
The aspect that I cannot get to work is the part where the program reads the names in the CSV. If the name is present, it should add the new score to that person's row at the start, i.e. insert the new value at array position 1.
If the name is not present in the CSV, it should add the name and then again insert the value at array position 1.
Here is the code that does not work currently. I don't believe it to be complicated, but I must be thinking about it wrong.
import csv
def names():
global fn
fn = input("please enter first name \n").title()
namecheck = False
while namecheck == False:
nc = input("you have entered " + fn + " are you sure \n 1) Yes \n 2) No")
if nc == "1":
quiz()
namecheck = True
if nc =="2":
names()
def quiz():
option = input("do you want to print or append? \n 1) Print 2) Append")
if option =="1":
f = open('namelist.csv', 'r')
a = f.read()
print(a)
if option =="2":
score = input("please enter score")
score = int(score)
with open('namelist.csv', 'rt') as f:
reader = csv.reader(f, delimiter=',')
for row in reader:
for field in row:
if field == fn:
XXXXXXXX <--- this is were I think I am going wrong.
names()
When you declare a def, you have to indent its entire body.
def quiz():
[-->INDENT HERE!] option = input("do you want to print or append? \n 1) Print 2) Append")
[-->INDENT HERE!] rest of def..
#back to main statements..
(Of course I don't mean for you to type "[-->INDENT HERE!]" literally)
Do you have to literally write to the file every time someone adds a new score or a new name? If writing to the file at the end of the program is an option, I'd suggest collecting all the info ahead of time and then writing the data in bulk toward the end. That way you could keep everything in memory and do the file I/O in one go, sparing you a lot of file operations and giving you the convenience of working with Python data structures such as dicts and sets.
If that's not an option...
Looking up names by reading the file is really inefficient, as you are reading the entire file just to see if the name is present.
I'd recommend using a dict to store the names you've already entered. This way, checking whether a name is present is much more efficient. You could try storing the dict with key=name and value=the line number on which you wrote the name to the csv.
That way, if you find the name in the dict, you can go to that particular line and then append your data like this:
Start reading and writing on specific line on CSV with Python
To insert an element in the first position (I am assuming this is right after the name), you could do something like:
l = line.split(",")
#assumes first element is the name of the person
l.insert(1, new_score) # 1 is the position you want to insert the data
#then insert this data into a_NEW_CSV file, as writing to same CSV at same line is difficult...
See this for more details: Replace data in csv file using python
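The "collect everything in memory, then write once" idea could be sketched as follows. It assumes the file is plain comma-separated text such as "Brad,4,5,7,7"; load_scores, add_score and save_scores are made-up names, and io.StringIO objects stand in for namelist.csv.

```python
import csv
import io

def load_scores(infile):
    """Read rows of 'name, score, score, ...' into a dict of name -> list of scores."""
    scores = {}
    for row in csv.reader(infile, skipinitialspace=True):
        if row:
            scores[row[0]] = row[1:]
    return scores

def add_score(scores, name, new_score):
    """Put new_score right after the name (position 1 of the row), adding the name if needed."""
    scores.setdefault(name, []).insert(0, str(new_score))

def save_scores(scores, outfile):
    """Write every name and its scores back out, one row per name."""
    writer = csv.writer(outfile, lineterminator="\n")
    for name, values in scores.items():
        writer.writerow([name] + values)

source = io.StringIO("Brad,4,5,7,7\nDan,3,6,2,7\n")  # stand-in for namelist.csv
scores = load_scores(source)
add_score(scores, "Brad", 9)  # existing name
add_score(scores, "Eve", 8)   # new name
out = io.StringIO()
save_scores(scores, out)
print(out.getvalue())
```

Checking whether a name is present is then a plain dict lookup, and the file is rewritten once at the end instead of being edited in place.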
I have a very large tsv file (1.2GB, 5 columns, 38m lines). I want to delete a column, add a column of ID's (1 to 38m), and rearrange the column order. How can I do this without using a ridiculous amount of memory?
Language of choice is Python, though open to other solutions.
You can read, manipulate, and write one row at a time. Because the entire file is never loaded into memory, this has a very low memory footprint.
import csv
with open(fileinpath, 'rb') as fin, open(fileoutpath, 'wb') as fout:
    freader = csv.reader(fin, delimiter='\t')
    fwriter = csv.writer(fout, delimiter='\t')
    idx = 1
    for line in freader:
        line[4], line[0] = line[0], line[4]  # switches position between first and last column
        del line[3]  # delete fourth column
        line.insert(0, idx)
        fwriter.writerow(line)
        idx += 1
(This is written in python2.7, and deletes the fourth column for an example)
Regarding rearranging the order - I assume it's the order of columns - this could be done in the manipulation part. There's an example of switching the order of the first and last column.
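For Python 3, the same row-at-a-time approach works with text-mode files instead of 'rb'/'wb'. The sketch below is an adaptation, not the original answer's code: rewrite_tsv is a made-up name, io.StringIO stands in for the real files, and enumerate replaces the manual idx counter.

```python
import csv
import io

def rewrite_tsv(fin, fout):
    """Stream rows: swap first and last columns, drop the fourth, prepend a running ID."""
    reader = csv.reader(fin, delimiter="\t")
    writer = csv.writer(fout, delimiter="\t", lineterminator="\n")
    for idx, row in enumerate(reader, start=1):
        row[0], row[-1] = row[-1], row[0]  # swap first and last column
        del row[3]                         # delete the fourth column
        writer.writerow([idx] + row)       # ID goes in front

# Two five-column sample rows standing in for the 38m-line file.
fin = io.StringIO("a\tb\tc\td\te\nf\tg\th\ti\tj\n")
fout = io.StringIO()
rewrite_tsv(fin, fout)
print(fout.getvalue())
```

With real files, both would be opened with open(path, newline='') (plus 'w' for the output); only one row is held in memory at any time.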
You can use awk to do this; I would not say 1.2GB takes a huge amount of memory.
If you want to delete c3:
awk -F"\t" 'BEGIN{OFS="\t"}{print $1,$2,$4,$5,NR}' input.txt > output.txt
The raw output is
c1 c2 c4 c5 columnId(1 to 38m)
$1 is column 1, $2 is column 2, and so on. NR is the current line number.
If you want to rearrange the columns, just change the order of $1,$2,$4,$5 and NR.
The answer depends enormously on how much context is needed to rewrite the lines and to determine the new ordering.
If it's possible to rewrite the individual lines without regard to context (depends on how the ID number is derived), then you can use the csv module to read the file line-by-line as @Tal Kremerman illustrates, and write it out line-by-line in the same order. If you can determine the correct ordering of the lines at this time, then you can add an extra field indicating the new order they should appear in.
Then you can do a second pass to sort/rearrange the lines into the correct order. There are many recent threads on "how to sort huge files with Python", e.g. How to sort huge files with Python? I think Tal Kremerman is right that the OP only wants to rearrange columns, and not rows.
So I'm working on a program in Python 3.3.2. I'm new to it all, but I've been getting through it. I have an app that takes 5 inputs: 3 of them are comboboxes and two are entry widgets. I have then created a button event that saves those 5 inputs into a text file and a csv file. Opening each file, everything looks proper. For example, saved info would look like this:
Brad M.,Mike K.,Danny,Iconnoshper,Strong Wolf Lodge
I then followed a csv demo and copied this...
import csv
ifile = open('myTestfile.csv', "r")
reader = csv.reader(ifile)
rownum = 0
for row in reader:
    # Save header row.
    if rownum == 0:
        header = row
    else:
        colnum = 0
        for col in row:
            print('%-15s: %s' % (header[colnum], col))
            colnum += 1
    rownum += 1
ifile.close()
and that ends up printing beautifully as:
rTech: Brad M.
pTech: Mike K.
cTech: Danny
proNam: ohhh
jobNam: Yeah
rTech: Damien
pTech: Aaron
so on and so on. What I'm trying to figure out is if I've named my headers via
if rownum == 0:
    header = row
is there a way to pull a specific row / col combo and print what is held there??
I have figured out that after the program has run I could do
print(col)
or
print(col[0:10])
and I am able to print the last col printed, or the letters from the last printed col. But I can't go any farther back than that last printed col.
My ultimate goal is to be able to assign variables so I could in turn have a label in another program get its information from the csv file.
rTech for job is???
look in Jobs csv at row 1, column 1, and return value for rTech
do I need to create a dictionary that is loaded with the information then call the dictionary?? Thanks for any guidance
Thanks for the direction. So I've been trying a few different things, and one that I'm really liking is the following...
import csv
labels = ['rTech', 'pTech', 'cTech', 'productionName', 'jobName']
fn = 'my file.csv'
cameraTech = 'Danny'
f = open(fn, 'r')
reader = csv.DictReader(f, labels)
jobInformation = [(item["productionName"],
                   item["jobName"],
                   item["pTech"],
                   item["rTech"]) for item in reader if
                  item['cTech'] == cameraTech]
f.close()
print("Camera Tech: %s\n" % (cameraTech))
print("\n".join(["Production Name: %s \nJob Name: %s \nPrep Tech: %s \nRental Agent: %s\n" % (item) for item in jobInformation]))
That shows me that I could create a variable through cameraTech and as long as that matched what was loaded into the reader that holds the csv file and that if cTech column had a match for cameraTech then it would fill in the proper information. 95% there WOOOOOO..
So now what I'm curious about is calling each item. The plan is in a window I have a listbox that is populated with items from a .txt file with "productionName" and "jobName". When I click on one of those items in the listbox a new window opens up and the matching information from the .csv file is then filled into the appropriate labels.
Thoughts??? Thanks again :)
I think that reading the CSV file into a dictionary might be a working solution for your problem.
The Python CSV package has built-in support for reading CSV files into a Python dictionary using DictReader, have a look at the documentation here: http://docs.python.org/2/library/csv.html#csv.DictReader
Here is an (untested) example using DictReader that reads the CSV file into a Python dictionary and prints the contents of the first row:
import csv
with open("myTestfile.csv", newline="") as f:
    csv_data = list(csv.DictReader(f))  # DictReader is an iterator; list() makes it indexable
print(csv_data[0])
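Once the rows are held in a list of dictionaries, a specific row / column combination can be pulled out by row index and header name. A minimal sketch: load_jobs and cell are made-up helper names, the header names and first row are taken from the output shown in the question, and io.StringIO stands in for myTestfile.csv.

```python
import csv
import io

def load_jobs(csvfile):
    """Return all rows as a list of dicts keyed by the header row."""
    return list(csv.DictReader(csvfile))

def cell(rows, row_index, column):
    """Look up one value by zero-based row index and column name."""
    return rows[row_index][column]

# Header and first row as printed in the question.
sample = io.StringIO(
    "rTech,pTech,cTech,proNam,jobNam\n"
    "Brad M.,Mike K.,Danny,ohhh,Yeah\n"
)
jobs = load_jobs(sample)
print(cell(jobs, 0, "rTech"))
```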
Okay so I was able to put this together after seeing the following (https://gist.github.com/zstumgoren/911615)
That showed me how to give each header a variable I could call. From there I could then create a function that would allow for certain variables to be called and compared and if that matched I would be able to see certain data needed. So the example I made to show myself it could be done is as follows:
import csv
source_file = open('jobList.csv', 'r')
for line in csv.DictReader(source_file, delimiter=','):
    pTech = line['pTech']
    cTech = line['cTech']
    rAgent = line['rTech']
    prodName = line['productionName']
    jobName = line['jobName']
if prodName == 'another':
    print(pTech, cTech, rAgent, jobName)
However I just noticed something, while my .csv file has one line this works great!!!! But, creating my proper .csv file, I am only able to print information from the last line read. Grrrrr.... Getting closer though.... I'm still searching but if someone understands my issue, would love some light.
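One likely explanation, assuming the if check runs only after the for loop has finished: by then the variables hold just the last row, so only that row can ever match. Moving the comparison inside the loop tests every row. The sketch below illustrates that idea; find_jobs is a made-up name, the sample rows are hypothetical (only the header names come from the post), and io.StringIO stands in for jobList.csv.

```python
import csv
import io

def find_jobs(csvfile, production_name):
    """Collect (pTech, cTech, rTech, jobName) for every row whose productionName matches."""
    matches = []
    for line in csv.DictReader(csvfile):
        # The comparison runs inside the loop, once per row.
        if line["productionName"] == production_name:
            matches.append((line["pTech"], line["cTech"], line["rTech"], line["jobName"]))
    return matches

# Hypothetical sample rows; only the header names come from the post.
sample = io.StringIO(
    "rTech,pTech,cTech,productionName,jobName\n"
    "Brad M.,Mike K.,Danny,another,Job A\n"
    "Damien,Aaron,Sam,different,Job B\n"
)
found = find_jobs(sample, "another")
print(found)
```

Here the matching row is the first one, which the loop-then-check version would have missed.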