Reading and writing a CSV file in Python

I just started learning Python, and I am trying to do the following:
- Read a .csv file
- Write the filtered data to a new file, keeping only the rows where column 7 is not blank/empty
When I print my results, the Python shell shows the right output, but the data in the output .csv file is not correct (it differs from what the print function shows).
Any suggestions about my code?
Thank you in advance.
file = open("station.csv", "r")
writeFile = open("stations-filtered.csv", "w")
for line in file:
    line2 = line.split(",")
    if line2[7] != "":
        print(line)
        writeFile.write(line)

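I agree with #user513093 that you can use the csv module. Read the rows with csv.reader so that writerow() receives a list of fields rather than a raw line, and open both files in a with block so they are flushed and closed:
import csv

with open("station.csv", "r", newline="") as infile, \
        open("stations-filtered.csv", "w", newline="") as outfile:
    reader = csv.reader(infile, delimiter=',')
    writer = csv.writer(outfile, delimiter=',')
    for row in reader:
        # keep the row only if column 7 exists and is not blank
        if len(row) > 7 and row[7].strip() != "":
            print(row)
            writer.writerow(row)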
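One pitfall worth knowing with csv.writer: writerow() expects a sequence of fields, so if it is handed a raw line (a string) it treats every character as its own column. The following is only a small sketch of that behaviour using in-memory buffers; the sample line is made up for illustration:

```python
import csv
import io

line = "a,b,c\n"  # made-up sample line

# Passing the raw string: csv.writer iterates over it character by character.
wrong = io.StringIO()
csv.writer(wrong).writerow(line)

# Passing a list of fields: one column per field.
right = io.StringIO()
csv.writer(right).writerow(line.rstrip("\n").split(","))

print(repr(wrong.getvalue()))
print(repr(right.getvalue()))
```

The second buffer holds the row as expected, while the first contains the commas and the newline as separate quoted fields.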
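But still, pandas is good:
import pandas as pd
df = pd.read_csv("station.csv", sep=",", header=None)
# read_csv loads empty fields as NaN, so test with notna() instead of != ""
df = df[df[7].notna()]
# index=False and header=False keep the output in the same shape as the input
df.to_csv("stations-filtered.csv", index=False, header=False)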
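A possible reason the plain line.split(",") test from the question lets unwanted rows through: if column 7 happens to be the last column (an assumption, the question does not show the file), an empty value there still carries the line's trailing newline, so it never compares equal to "". A minimal sketch with a made-up row:

```python
import csv
import io

raw = "id,a,b,c,d,e,f,\n"  # made-up row: 8 columns, the last one (index 7) is empty

fields_split = raw.split(",")                    # keeps the trailing newline
fields_csv = next(csv.reader(io.StringIO(raw)))  # csv.reader strips it

print(repr(fields_split[7]))  # '\n', so != "" is True
print(repr(fields_csv[7]))    # '', so != "" is False
```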

Related

ValueError: I/O operation on closed file after opening file

I have this code:
import os
csv_out = 'femaleconsolidated.csv'
csv_list = [r'C:\Users\PycharmProjects\filemerger\Female\outputA.csv',
            r'C:\Users\PycharmProjects\filemerger\Female\outputB.csv',
            r'C:\Users\PycharmProjects\filemerger\Female\outputC.csv',
            r'C:\Users\PycharmProjects\filemerger\Female\outputD.csv',
            r'C:\Users\PycharmProjects\filemerger\Female\outputE.csv',
            r'C:\Users\PycharmProjects\filemerger\Female\outputother.csv']
print(csv_list)
csv_merge = open(csv_out, 'w')
for file in csv_list:
    csv_in = open(file)
    for line in csv_in:
        csv_merge.write(line)
    csv_in.close()
    csv_merge.close()
print('Verify consolidated CSV file : ' + csv_out)
The code is to merge CSVs.
Surely open(file) should open the file but I get this:
csv_merge.write(line)
ValueError: I/O operation on closed file.
What could be causing this?
csv_merge.close() should sit outside the for loop, because the next iteration still writes to csv_merge:
for file in csv_list:
    csv_in = open(file)
    for line in csv_in:
        csv_merge.write(line)
    csv_in.close()
csv_merge.close()
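The error can be reproduced in isolation. This sketch uses an in-memory io.StringIO in place of the real output file (an assumption made to keep it self-contained) and closes it inside the loop, as the question's code does:

```python
import io

out = io.StringIO()                  # stand-in for the merged output file
chunks = ["first\n", "second\n"]     # stand-in for the input files
error = None

try:
    for chunk in chunks:
        out.write(chunk)
        out.close()  # closed inside the loop, so the second write fails
except ValueError as exc:
    error = str(exc)

print(error)
```

The first iteration succeeds; the second one raises the same ValueError as in the question because the file was already closed.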
Use pandas:
import pandas as pd
import glob
dfs = glob.glob('path/*.csv') # paths of all of your csv files
result = pd.concat([pd.read_csv(df) for df in dfs], ignore_index=True)
# to_csv has no ignore_index parameter; index=False leaves the row index out
result.to_csv('path/femaleconsolidated.csv', index=False)
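If you want to write line by line, you can try the csv package. It does not close the file for you, so open the destination in a with block, and read the input through csv.reader so that writerow() receives a list of fields:
import csv
with open(csv_out, 'w', newline='') as destination:
    csvwriter = csv.writer(destination)
    ...
    # csv_in is an open input file; csv.reader turns each line into a list
    for row in csv.reader(csv_in):
        csvwriter.writerow(row)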
If you just want to merge all the files into a single one, there are more efficient ways to do it than going line by line. You can check this one:
https://www.freecodecamp.org/news/how-to-combine-multiple-csv-files-with-8-lines-of-code-265183e0854/
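As one illustration of merging without a per-line loop, the standard library's shutil.copyfileobj copies each input into the output in chunks. This is only a sketch, not the method from the linked article; merge_files and the throwaway file names are hypothetical, and like the original loop it copies every file as-is (including any header rows) and assumes each file ends with a newline:

```python
import os
import shutil
import tempfile

def merge_files(paths, out_path):
    """Concatenate the files in `paths` into `out_path`, chunk by chunk."""
    with open(out_path, "wb") as merged:
        for path in paths:
            with open(path, "rb") as part:
                shutil.copyfileobj(part, merged)

# Small self-check with throwaway files (hypothetical names and contents).
tmp_dir = tempfile.mkdtemp()
parts = []
for name, text in [("a.csv", "1,2\n"), ("b.csv", "3,4\n")]:
    path = os.path.join(tmp_dir, name)
    with open(path, "w", newline="") as f:
        f.write(text)
    parts.append(path)

merged_path = os.path.join(tmp_dir, "merged.csv")
merge_files(parts, merged_path)
with open(merged_path, newline="") as f:
    merged_text = f.read()
print(repr(merged_text))
```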
Your for statement should be inside the with block:
with open(csv_out, 'w') as csv_merge:
    for file in csv_list:
        with open(file) as csv_in:
            for line in csv_in:
                csv_merge.write(line)
# no explicit close() calls are needed: each with block closes its file
print('Verify consolidated CSV file : ' + csv_out)

Read a range of lines from the Yelp dataset with Python

I want to change this code to read specifically lines 1400001 to 1450000. What modification is needed?
The file is composed of a single object type, one JSON object per line.
I also want to save the output to a .csv file. What should I do?
revu=[]
with open("review.json", 'r',encoding="utf8") as f:
    for line in f:
        revu = json.loads(line[1400001:1450000])
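If it is one JSON object per line:
import json
revu = []
with open("review.json", 'r', encoding="utf8") as f:
    # expensive statement: readlines() loads the whole file, so depending
    # on your file size this might make you run out of memory
    # lines 1400001 to 1450000 (counting from 1) are indices 1400000 to 1449999
    revu = [json.loads(s) for s in f.readlines()[1400000:1450000]]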
If you try it on the /etc/passwd file it is easy to test (it contains no JSON, of course, so json.loads is left out):
revu = []
with open("/etc/passwd", 'r') as f:
    # expensive statement
    revu = [s for s in f.readlines()[5:10]]
print(revu) # gives the lines at indices 5 to 9 (the slice end is exclusive)
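Or you iterate over the lines one at a time, which saves you from memory issues:
import json
revu = []
with open("...", 'r') as f:
    # start=1 makes i the 1-based line number
    for i, line in enumerate(f, start=1):
        if i > 1450000:
            break  # past the range, no need to read further
        if i >= 1400001:
            revu.append(json.loads(line))
# process revu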
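The same range selection can also be sketched with itertools.islice, which stops reading once the end of the range is reached. read_json_lines is a hypothetical helper, the sample data is made up, and the line numbers here are taken to be 1-based and inclusive (an assumption about what the question means):

```python
import io
import json
from itertools import islice

def read_json_lines(f, first, last):
    """Return parsed objects from lines first..last (1-based, inclusive)."""
    # islice counts from 0 and excludes the stop index
    return [json.loads(line) for line in islice(f, first - 1, last)]

# Made-up sample standing in for review.json
sample = io.StringIO('{"n": 1}\n{"n": 2}\n{"n": 3}\n{"n": 4}\n')
picked = read_json_lines(sample, 2, 3)
print(picked)
```

For the file in the question the call would be read_json_lines(f, 1400001, 1450000) on the opened file.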
To write the result to CSV:
import pandas as pd
import json
def mylines(filename, _from, _to):
    with open(filename, encoding="utf8") as f:
        # start=1 makes _from and _to 1-based line numbers
        for i, line in enumerate(f, start=1):
            if i > _to:
                break
            if i >= _from:
                yield json.loads(line)
df = pd.DataFrame([r for r in mylines("review.json", 1400001, 1450000)])
df.to_csv("/tmp/whatever.csv")

CSV file to list of lines?

I have a txt file and I want to save each line as a list in a new file with fname as the new file name. But the output is not being saved. What am I missing?
import csv
with open('file.txt', 'rU') as csvfile:
    reader = csv.reader(csvfile, delimiter='\t')
    i = 1
    for line in reader:
        fname = line[0] + line[1]
        #print fname
        with open(fname, 'w') as out:
            out.write(line)
        i +=1
To do what you want, you need to fix two things. First, open the output files in "append" mode so their previous contents aren't wiped out every time something additional is written to them.
Second, you need some way to get at the raw line from the file for each row that csv.reader returns. This is difficult with a module like csv unless you rely on its internals (which you shouldn't do anyway).
To work around that in this case, you can wrap the file in a small preprocessor and pass the wrapper to csv.reader instead of the file itself. The wrapper remembers the last line it handed over. Here's what I mean:
import csv
def pre_reader(file):
    """Generator that remembers last line read."""
    for line in file:
        pre_reader.lastline = line
        yield line
with open('test_gen.csv', 'rU') as csvfile:
    reader = csv.reader(pre_reader(csvfile), delimiter='\t')
    i = 1
    for line in reader:
        fname = line[0] + line[1]
        #print fname
        with open(fname, 'a') as out:
            out.write(pre_reader.lastline)
        i +=1
Change:
with open(fname, 'w') as out:
    out.write(line)
To:
with open(fname, 'a') as out:
    out.write(line)
- w: Opens a file for writing only. Overwrites the file if the file exists. If the file does not exist, creates a new file for writing.
- a: Opens a file for appending. The file pointer is at the end of the file if the file exists. That is, the file is in the append mode. If the file does not exist, it creates a new file for writing.
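Better way:
import csv
with open('file.txt', 'rU') as csvfile:
    reader = csv.reader(csvfile, delimiter='\t')
    for line in reader:
        fname = line[0] + line[1]
        # fname depends on the row, so the output file can only be opened here
        with open(fname, 'a') as out:
            # write() needs a string, so join the fields back together
            out.write('\t'.join(line) + '\n')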
You cannot write a list to a file, so change the out.write line to out.write(str(line)):
import csv
with open('file.txt', 'rU') as csvfile:
    reader = csv.reader(csvfile, delimiter='\t')
    i = 1
    for line in reader:
        fname = line[0] + line[1]
        #print fname
        with open(fname, 'w') as out:
            out.write(str(line))  # <-- changed line: convert the list to a string
        i +=1
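The answers above can be combined into a Python 3 sketch that avoids the overwrite problem by first grouping the rows by output name and then opening each output file once. split_rows, the out_dir argument, and the sample data are hypothetical; it is assumed that each row should be written back as a tab-separated line:

```python
import csv
import os
import tempfile
from collections import defaultdict

def split_rows(src_path, out_dir):
    """Write each tab-separated row to a file named after its first two fields."""
    groups = defaultdict(list)
    with open(src_path, newline="") as src:
        for row in csv.reader(src, delimiter="\t"):
            if len(row) >= 2:  # skip blank or too-short lines
                groups[row[0] + row[1]].append(row)
    for fname, rows in groups.items():
        with open(os.path.join(out_dir, fname), "w", newline="") as out:
            csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return sorted(groups)

# Self-check with a throwaway input file (made-up contents).
work_dir = tempfile.mkdtemp()
out_dir = os.path.join(work_dir, "out")
os.mkdir(out_dir)
src_path = os.path.join(work_dir, "file.txt")
with open(src_path, "w", newline="") as f:
    f.write("x\t1\tfoo\ny\t2\tbar\nx\t1\tbaz\n")

names = split_rows(src_path, out_dir)
with open(os.path.join(out_dir, "x1"), newline="") as f:
    x1_text = f.read()
print(names)
```

Grouping first means 'w' mode is safe here, since every output file is written exactly once with all of its rows.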

Convert the following JSON to CSV using Python

{"a":"1","b":"1","c":"1"}
{"a":"2","b":"2","c":"2"}
{"a":"3","b":"3","c":"3"}
{"a":"4","b":"4","c":"4"}
I have tried the following code, but it gives an error:
from nltk.twitter import Twitter
from nltk.twitter.util import json2csv
with open('C:/Users/Archit/Desktop/raw_tweets.json', 'r') as infile:
    # Variable for building our JSON block
    json_block = []
    for line in infile:
        # Add the line to our JSON block
        json_block.append(line)
        # Check whether we closed our JSON block
        if line.startswith('{'):
            # Do something with the JSON dictionary
            json2csv(json_block, 'tweets.csv', ['id','text','created_at','in_reply_to_user_id','in_reply_to_screen_name','in_reply_to_status_id','user.id','user.screen_name','user.name','user.location','user.friends_count','user.followers_count','source'])
            # Start a new block
            json_block = []
Error:
File "C:\Python34\lib\json\decoder.py", line 361, in raw_decode
raise ValueError(errmsg("Expecting value", s, err.value)) from None
ValueError: Expecting value: line 1 column 1 (char 0)
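import csv, json
data = []
# raw string, so the backslashes in the Windows path are not read as escapes
with open(r'C:\Users\Shahriar\Desktop\T.txt') as data_file:
    for line in data_file:
        data.append(json.loads(line))
keys = list(data[0].keys())
# on Python 3 the csv module needs a text-mode file opened with newline=''
with open('data.csv', 'w', newline='') as csvF:
    csvWriter = csv.DictWriter(csvF, fieldnames=keys)
    csvWriter.writeheader()
    for d in data:
        csvWriter.writerow(d)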
Output (the column order follows the dict's key order, so it can vary with the Python version):
a,c,b
1,1,1
2,2,2
3,3,3
4,4,4
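The DictWriter approach above takes the field names from the first record only. If later records carry keys the first one lacks, csv.DictWriter raises a ValueError by default. A small sketch of one way around that, collecting the union of keys first; the two records are made up for illustration and an in-memory buffer stands in for the output file:

```python
import csv
import io
import json

# Made-up records whose keys differ from line to line
raw_lines = ['{"a": "1", "b": "1"}', '{"a": "2", "c": "2"}']
records = [json.loads(line) for line in raw_lines]

# Collect every key, keeping first-seen order.
fieldnames = []
for record in records:
    for key in record:
        if key not in fieldnames:
            fieldnames.append(key)

buffer = io.StringIO()
# restval fills in the columns a record does not have
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="", lineterminator="\n")
writer.writeheader()
writer.writerows(records)
print(buffer.getvalue())
```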
This is very late, but I also ran into some errors today. I figured out that you actually have to import json2csv from nltk.twitter.common instead of nltk.twitter.util. I hope this helps others who stumble upon this thread.
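# Read json
filename = 'C:/Users/Archit/Desktop/raw_tweets.json'
with open(filename) as json_file:
    # strip the trailing newline too, otherwise the output gets blank lines
    # note: this only rewrites punctuation, so keys and values share each row
    lines = [line.strip().replace("{", "").replace("}", "").replace(":", ",") for line in json_file]
# Write csv
with open('out.csv', 'w') as csv_file:
    for line in lines:
        csv_file.write("%s\n" % line)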

Reading a text file in Python

datafile = open("temp.txt", "r")
record = datafile.readline()
while record != '':
    d1 = datafile.strip("\n").split(",")
    print d1[0],float (d1[1])
    record = datafile.readline()
datafile.close()
The temp file contains
a,12.7
b,13.7
c,18.12
I can't get output. Please help.
The correct code should be:
with open('temp.txt') as f:
    for line in f:
        after_split = line.strip("\n").split(",")
        print after_split[0], float(after_split[1])
The main reason you're not getting output in your code is that datafile doesn't have a strip() method, and I'm surprised you're not getting exceptions.
I highly suggest you read the Python tutorial. It looks like you're trying to write another language's idioms in Python, and that is not A Good Thing.
You want to call strip and split on the line, not the file.
Replace
d1 = datafile.strip("\n").split(",")
With
d1 = record.strip("\n").split(",")
You are operating on the file handle, but you should work on the line,
like this: d1 = record.strip("\n").split(",")
datafile = open("temp.txt", "r")
record = datafile.readline()
while record != '':
    d1 = record.strip("\n").split(",")
    print d1[0],float (d1[1])
    record = datafile.readline()
datafile.close()
Perhaps the following will work better for you (comments as explanation):
# open file this way so that it automatically closes upon any errors
with open("temp.txt", "r") as f:
    data = f.readlines()
for line in data:
    # only process non-empty lines
    if line.strip():
        d1 = line.strip("\n").split(",")
        print d1[0], float(d1[1])
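The answers above use the Python 2 print statement. For Python 3, where print is a function, the same idea can be sketched with csv.reader doing the splitting. read_pairs is a hypothetical helper, and an in-memory buffer with the question's sample data (plus one blank line) stands in for temp.txt:

```python
import csv
import io

def read_pairs(f):
    """Yield (name, value) tuples from lines such as 'a,12.7'."""
    for row in csv.reader(f):
        if len(row) >= 2:  # skip blank or short lines
            yield row[0], float(row[1])

# Stand-in for temp.txt, with a blank line added to show it is skipped
sample = io.StringIO("a,12.7\nb,13.7\n\nc,18.12\n")
pairs = list(read_pairs(sample))
for name, value in pairs:
    print(name, value)
```

With a real file, pass the object returned by open("temp.txt", newline="") inside a with block instead of the buffer.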
