How do I access a CSV file on my computer in Jupyter notebook using os module?
I've tried the below code:
import os
file = open(r"C:\Users\...", "r")
text = file.read()
file.close()
You can use it directly without os module:
with open(r"C:\Users\xyz\Desktop\test.csv","r") as f1:
    lines = f1.readlines()
OR
f = open(r"C:\Users\xyz\Desktop\test.csv", "r")
You don't need to close the file manually; the keyword 'with' does that for you:
import csv
file = r'path\to\your.csv'
with open(file, 'r', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print(row)  # do something with each row
For more info please check the official documentation of the CSV module.
Trying to convert multiple (5) CSVs to TSVs using python, but when I run this it only creates 1 TSV. Can anyone help?
import csv
import sys
import os
import pathlib
print ("Exercise1.csv"), sys.argv[0]
dirname = pathlib.Path('/Users/Amber/Documents')
for file in pathlib.Path().rglob('*.csv'):
    with open(file,'r') as csvin, open('Exercise1.tsv', 'w') as tsvout:
        csvin = csv.reader(csvin)
        tsvout = csv.writer(tsvout, delimiter='\t')
        for row in csvin:
            print(row)
            tsvout.writerow(row)
exit ()
Thanks!
Your for loop opens every .csv file it finds, but each iteration writes to the same output file (Exercise1.tsv), so that file is overwritten every time. You need to create a new output file in each iteration of the loop. You could try something like this:
for i, file in enumerate(pathlib.Path().rglob('*.csv')):
    with open(file, 'r') as csvin, open('Exercise_{}.tsv'.format(i), 'w') as tsvout:
        csvin = csv.reader(csvin)
        tsvout = csv.writer(tsvout, delimiter='\t')
        for row in csvin:
            tsvout.writerow(row)
enumerate() adds a counter to the for loop, so the output files are numbered Exercise_0.tsv, Exercise_1.tsv, and so on, one for each CSV file found.
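An alternative to numbering is to name each output after its input, which keeps the pairs easy to match. The sketch below assumes that writing each TSV next to its CSV is acceptable; the function name convert_csvs_to_tsvs is illustrative and not from the original answer.

```python
import csv
import pathlib


def convert_csvs_to_tsvs(root):
    """Write a .tsv next to every .csv found under root (hypothetical helper)."""
    written = []
    for csv_path in sorted(pathlib.Path(root).rglob('*.csv')):
        tsv_path = csv_path.with_suffix('.tsv')  # Exercise1.csv -> Exercise1.tsv
        # newline='' lets the csv module control line endings itself
        with open(csv_path, 'r', newline='') as csvin, \
                open(tsv_path, 'w', newline='') as tsvout:
            writer = csv.writer(tsvout, delimiter='\t')
            for row in csv.reader(csvin):
                writer.writerow(row)
        written.append(tsv_path)
    return written
```

Calling convert_csvs_to_tsvs('/Users/Amber/Documents') would then produce one .tsv per .csv, including those in sub-folders.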
I have developed a script which deletes all whitespace at the end of a file.
import sys
with open("/Users/XXXXX/Desktop/XXXXX.txt") as infile:
    lines = infile.read()
while lines.endswith("\n"):
    lines = lines[:-2]
with open("/Users/XXXXX/Desktop/XXXXX.txt", 'w') as outfile:
    for line in lines:
        outfile.write(line)
The script works fine, but I have two thousand small files in a folder from which I need to delete the trailing whitespace.
Can someone guide me on how to change my script so that it opens each file in a folder and runs the code above?
thanks,
Try the following code:
import os

def removeNewLines(file):
    with open(file, 'r') as infile:
        lines = infile.read()
    # remove every trailing newline (and carriage return), one character at a time
    while lines.endswith(("\n", "\r")):
        lines = lines[:-1]
    with open(file, 'w') as outfile:
        outfile.write(lines)

folder = 'FOLDER PATH'
for name in os.listdir(folder):
    # os.listdir() returns bare file names, so join each one with the folder path
    removeNewLines(os.path.join(folder, name))
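If the goal is to drop all trailing whitespace rather than only newlines, str.rstrip() does it in one call. A minimal sketch, assuming every file of interest sits directly in the folder and is a plain text file; strip_trailing_whitespace and the folder argument are illustrative names.

```python
from pathlib import Path


def strip_trailing_whitespace(folder):
    """Remove whitespace at the end of every regular file in folder."""
    changed = 0
    for path in Path(folder).iterdir():
        if not path.is_file():
            continue  # skip sub-directories
        text = path.read_text()
        stripped = text.rstrip()  # drops trailing spaces, tabs and newlines
        if stripped != text:
            path.write_text(stripped)
            changed += 1
    return changed
```

The return value is the number of files that were actually rewritten, which makes it easy to check a run over a couple of thousand files.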
This problem may be tricky.
I want to create a csv file from a list in Python. This csv file does not exist before. And then export it to some local directory. There is no such file in the local directory either. We just create a new csv file, and export (put) the csv file in some local directory.
I found that StringIO.StringIO can generate the csv content from a list in Python, but what are the next steps?
Thank you.
And I found the following code can do it:
import os
import os.path
import StringIO
import csv
dir = r"C:\Python27"
if not os.path.exists(dir):
    os.mkdir(dir)
my_list=[[1,2,3],[4,5,6]]
with open(os.path.join(dir, "filename"+'.csv'), "w") as f:
    csvfile=StringIO.StringIO()
    csvwriter=csv.writer(csvfile)
    for l in my_list:
        csvwriter.writerow(l)
    for a in csvfile.getvalue():
        f.writelines(a)
Did you read the docs?
https://docs.python.org/2/library/csv.html
Lots of examples on that page of how to read / write CSV files.
One of them:
import csv
with open('some.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(someiterable)
import csv
with open('/path/to/location', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(youriterable)
https://docs.python.org/2/library/csv.html#examples
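The examples above follow the Python 2 documentation, where the file is opened in binary mode ('wb'). On Python 3 the csv module expects a text-mode file opened with newline=''. A minimal sketch of the same idea for Python 3, with no StringIO needed; write_rows, the directory, and the file name are placeholders:

```python
import csv
import os


def write_rows(directory, filename, rows):
    """Create directory if needed and write rows to a new CSV file there."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, filename)
    # 'w' creates the file (or overwrites it); newline='' is what csv expects
    with open(path, 'w', newline='') as f:
        csv.writer(f).writerows(rows)
    return path
```

For the list in the question, this would be called as write_rows(dir, "filename.csv", my_list).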
I modified the code based on the comments from experts in this thread. Now the script reads and writes all the individual files. It iterates over the matches, highlights them, and writes the output. The current issue is that, after highlighting the last instance of a search term, the script drops all the content that follows that last instance from the output of each file.
Here is the modified code:
import os
import sys
import re
source = raw_input("Enter the source files path:")
listfiles = os.listdir(source)
for f in listfiles:
    filepath = source+'\\'+f
    infile = open(filepath, 'r+')
    source_content = infile.read()
    color = ('red')
    regex = re.compile(r"(\b be \b)|(\b by \b)|(\b user \b)|(\bmay\b)|(\bmight\b)|(\bwill\b)|(\b's\b)|(\bdon't\b)|(\bdoesn't\b)|(\bwon't\b)|(\bsupport\b)|(\bcan't\b)|(\bkill\b)|(\betc\b)|(\b NA \b)|(\bfollow\b)|(\bhang\b)|(\bbelow\b)", re.I)
    i = 0; output = ""
    for m in regex.finditer(source_content):
        output += "".join([source_content[i:m.start()],
                           "<strong><span style='color:%s'>" % color[0:],
                           source_content[m.start():m.end()],
                           "</span></strong>"])
        i = m.end()
    outfile = open(filepath, 'w+')
    outfile.seek(0)
    outfile.write(output)
    print "\nProcess Completed!\n"
    infile.close()
    outfile.close()
raw_input()
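The symptom described above (everything after the last match disappears) is what one would expect if the text following the final match is never appended to output. A minimal sketch of the highlighting step, written for Python 3, that adds that tail; highlight is a hypothetical helper, and the pattern passed to it can be the one from the script:

```python
import re


def highlight(text, pattern, color='red'):
    """Wrap every match of pattern in a coloured <strong><span>."""
    regex = re.compile(pattern, re.I)
    pieces = []
    last = 0
    for m in regex.finditer(text):
        pieces.append(text[last:m.start()])  # text before this match
        pieces.append("<strong><span style='color:%s'>%s</span></strong>"
                      % (color, m.group(0)))
        last = m.end()
    pieces.append(text[last:])  # text after the final match
    return "".join(pieces)
```

In the original loop the equivalent change would be to add source_content[i:] to output once the finditer loop has finished.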
The error message tells you what the error is:
No such file or directory: 'sample1.html'
Make sure the file exists, or wrap the call in a try statement to give it a fallback behavior.
You get that error because os.listdir() returns only file names, so the script does not know where the files you want to open are located.
You have to provide the full file path to open(), as I have done below. I concatenated source + '\\' + filename and saved the result in a variable named filepath, which is then passed to open().
import os
import sys
source = raw_input("Enter the source files path:")
listfiles = os.listdir(source)
for f in listfiles:
    filepath = source+'\\'+f # This is the file path
    infile = open(filepath, 'r')
There are also a couple of other problems with your code. If you want to open the file for both reading and writing, you have to use r+ mode. Moreover, on Windows, if you open a file in r+ mode you may have to call file.seek() before file.write() to avoid another issue. You can read the reason for using file.seek() here.
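The same loop can be sketched with os.path.join, which avoids hard-coding the '\\' separator. This assumes Python 3 and that the folder contains readable text files; read_all is an illustrative name.

```python
import os


def read_all(source):
    """Return {filename: content} for every regular file in source."""
    contents = {}
    for name in os.listdir(source):
        filepath = os.path.join(source, name)  # folder + file name
        if not os.path.isfile(filepath):
            continue  # os.listdir() also returns sub-directories
        with open(filepath, 'r') as infile:
            contents[name] = infile.read()
    return contents
```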
I have the following code:
import re
#open the xml file for reading:
file = open('path/test.xml','r+')
#convert to string:
data = file.read()
file.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>",data))
file.close()
where I'd like to replace the old content that's in the file with the new content. However, when I execute my code, the new content is appended to "test.xml", i.e. I have the old content followed by the new "replaced" content. What can I do in order to delete the old stuff and only keep the new?
You need to seek to the beginning of the file before writing, and then call file.truncate(), if you want to replace the content in place:
import re
myfile = "path/test.xml"
with open(myfile, "r+") as f:
    data = f.read()
    f.seek(0)
    f.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>", r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", data))
    f.truncate()
The other way is to read the file then open it again with open(myfile, 'w'):
with open(myfile, "r") as f:
    data = f.read()
with open(myfile, "w") as f:
    f.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>", r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", data))
Neither truncate nor open(..., 'w') will change the inode number of the file (I tested twice, once with Ubuntu 12.04 NFS and once with ext4).
By the way, this is not really related to Python. The interpreter calls the corresponding low level API. The method truncate() works the same in the C programming language: See http://man7.org/linux/man-pages/man2/truncate.2.html
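To see why truncate() matters, consider new content that is shorter than the old: without truncating, the tail of the old content stays in the file. A small sketch; replace_in_place is an illustrative helper, not part of any library:

```python
def replace_in_place(path, new_text, truncate=True):
    """Overwrite the file at path from the start, using 'r+' mode."""
    with open(path, 'r+') as f:
        f.read()          # read the old content first, as in the question
        f.seek(0)         # go back to the beginning before writing
        f.write(new_text)
        if truncate:
            f.truncate()  # cut off whatever remains past the current position
```

With truncate=False, a file holding 0123456789 that is overwritten with abc ends up as abc3456789; with the default, only the new text remains.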
file='path/test.xml'
with open(file, 'w') as filetowrite:
    filetowrite.write('new content')
Opening the file in 'w' mode truncates it, so whatever you write replaces its previous contents.
Using truncate(), the solution could be
import re
#open the xml file for reading:
with open('path/test.xml','r+') as f:
    #convert to string:
    data = f.read()
    f.seek(0)
    f.write(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>",data))
    f.truncate()
import os  # must import this library
if os.path.exists('TwitterDB.csv'):
    os.remove('TwitterDB.csv')  # this deletes the file
else:
    print("The file does not exist")  # add this to prevent errors
I had a similar problem, and instead of overwriting my existing file using the different 'modes', I just deleted the file before using it again, so that it would be as if I was appending to a new file on each run of my code.
The answer from How to Replace String in File works in a simple way, using replace:
fin = open("data.txt", "rt")
fout = open("out.txt", "wt")
for line in fin:
    fout.write(line.replace('pyton', 'python'))
fin.close()
fout.close()
In my case the following code did the trick:
import json
with open("output.json", "w+") as outfile:  # w+ creates the file if it does not exist and overwrites any existing content
    json.dump(result_plot, outfile)
Using the Python 3 pathlib library:
import re
from pathlib import Path
import shutil
shutil.copy2("/tmp/test.xml", "/tmp/test.xml.bak") # create backup
filepath = Path("/tmp/test.xml")
content = filepath.read_text()
filepath.write_text(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", content))
A similar method using a different approach to backups:
import re
from pathlib import Path
filepath = Path("/tmp/test.xml")
backup = filepath.with_suffix('.bak')
filepath.rename(backup)  # different approach to backups: move the original aside
content = backup.read_text()  # read from the backup, since the original path no longer exists
filepath.write_text(re.sub(r"<string>ABC</string>(\s+)<string>(.*)</string>",r"<xyz>ABC</xyz>\1<xyz>\2</xyz>", content))