I need to merge the data from several CSV spreadsheets vertically in Python. Their structure is identical; I just need to stack one table's data on top of the next, because they are the months of an annual survey. I tried several methods I found by googling, but I can't find a way to do something as simple as:
import csv

spreadsheets1 = open('0113_RE_fscom.csv', 'r')
spreadsheets2 = open('0213_RE_fscom.csv', 'r')
spreadsheets = spreadsheets1 + spreadsheets2

with spreadsheets as csvfile:
    sales = csv.reader(csvfile)
    for row in sales:
        print(row)
Looks like you simply forgot to iterate over the files. Try this code:

import csv

spreadsheet_filenames = [
    '0113_RE_fscom.csv',
    '0213_RE_fscom.csv',
]

for filename in spreadsheet_filenames:
    with open(filename, 'r') as csvfile:
        sales = csv.reader(csvfile)
        for row in sales:
            print(row)
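If the goal is a single merged file rather than printed rows, the same loop can feed one writer. A minimal sketch; the output filename is my own choice:

```python
import csv

def merge_csvs(filenames, out_path):
    """Append the rows of each input file, in order, to one output CSV."""
    with open(out_path, 'w', newline='') as outfile:
        writer = csv.writer(outfile)
        for filename in filenames:
            with open(filename, newline='') as csvfile:
                for row in csv.reader(csvfile):
                    writer.writerow(row)
```

Called as merge_csvs(spreadsheet_filenames, 'merged.csv'), this stacks the monthly files vertically in list order.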
How about this:

import csv

with open('0113_RE_fscom.csv', 'r') as f1, open('0213_RE_fscom.csv', 'r') as f2:
    csv1 = csv.reader(f1, delimiter=',')
    csv2 = csv.reader(f2, delimiter=',')
    for line1, line2 in zip(csv1, csv2):
        print(line1 + line2)

Note that this pairs the rows of the two files side by side rather than stacking them vertically.
This is quite simple with pandas.

import pandas as pd

f1 = pd.read_csv('0113_RE_fscom.csv', header=None)
f2 = pd.read_csv('0213_RE_fscom.csv', header=None)
merged = pd.concat([f1, f2])  # concat takes a list of DataFrames
merged.to_csv('merged.csv', index=False, header=False)

Remove header=None from read_csv if your files actually do have a header.
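For a whole year of files, a glob pattern saves listing each month by hand. A sketch assuming the filenames share the *_RE_fscom.csv pattern:

```python
import glob
import pandas as pd

def merge_monthly(pattern, out_path):
    # sorted() keeps the files in filename order; MMYY-style names
    # may need a custom sort key if the year varies
    files = sorted(glob.glob(pattern))
    merged = pd.concat((pd.read_csv(f, header=None) for f in files),
                       ignore_index=True)
    merged.to_csv(out_path, index=False, header=False)
```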
First, I need to import two CSV files.
Then I need to remove the header in both files.
After that, I would like to take one column from both files and concatenate them.
I have tried to open the files, but I'm not sure how to concatenate.
Can anyone give advice on how to proceed?
import csv

x = []
chamber_temperature = []

with open(r"C:\Users\mm02058\Documents\test.txt", 'r') as file:
    reader = csv.reader(file, delimiter='\t')
    with open(r"C:\Users\mm02058\Documents\test.txt", 'r') as file1:
        reader_1 = csv.reader(file1, delimiter='\t')
        for row in reader:
            x.append(row[0])
            chamber_temperature.append(row[1])
for row in reader_1:
    x.append(row[0])
    chamber_temperature.append(row[1])
The immediate bug is that you are trying to read from reader_1 outside the with block, which means Python has already closed the file.
But the nesting of the with calls is just confusing and misleading anyway. Here is a generalization which should let you extend to more new files easily.
import csv

x = []
chamber_temperature = []

for filename in (r"C:\Users\mm02058\Documents\test.txt",
                 r"C:\Users\mm02058\Documents\test.txt"):
    with open(filename, 'r') as file:
        for idx, row in enumerate(csv.reader(file, delimiter='\t')):
            if idx == 0:
                continue  # skip header line
            x.append(row[0])
            chamber_temperature.append(row[1])
Because of how you have structured your code, the context manager for file1 will close the file before it has been used by the for loop.
Use a single context manager to open both files, e.g.

with open('file1', 'r') as file1, open('file2', 'r') as file2:
    # Your code in here
for row in reader_1:
    x.append(row[0])
    chamber_temperature.append(row[1])

You are getting this error because you have placed this code block outside the second with block, so by that point the file has already been closed.
You can either open both files at once with this

with open('file1', 'r') as file1, open('file2', 'r') as file2:
    # Your code in here
or you can use pandas for opening and concatenating csv files
import pandas as pd
data = pd.read_csv(r'file.csv', header=None)
and then refer here Concatenate dataframes
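For the concrete task in the question (skip the headers, take one column from each file, stack them), a pandas sketch; the column is selected by its header name, which I'm assuming both files share:

```python
import pandas as pd

def concat_column(path1, path2, column):
    # read_csv consumes the header row itself, so no manual removal is needed
    s1 = pd.read_csv(path1)[column]
    s2 = pd.read_csv(path2)[column]
    return pd.concat([s1, s2], ignore_index=True)
```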
I have just started learning Python, trying to use it for one of my manual tasks which I currently perform with Excel's filter feature.
Every month I receive a file; I open that CSV in Excel and, by filtering on each value in the carrier field, create a new file to share with the respective carrier.
Here is some sample data from my CSV. I have shown only 2 carriers here, but I have more than 13 values:
carrier,type,count
DTH,a,123
DTH,b,3123
DTH,c,41341
DTH,d,13411
BLUEDART,a,12123
BLUEDART,b,31231
BLUEDART,c,411
BLUEDART,d,11
Expected output
DTH.csv
carrier,type,count
DTH,a,123
DTH,b,3123
DTH,c,41341
DTH,d,13411
BLUEDART.csv
carrier,type,count
BLUEDART,a,12123
BLUEDART,b,31231
BLUEDART,c,411
BLUEDART,d,11
Any help or just guidance is highly appreciated.
Very easy using pandas:

import pandas as pd

carriers_csv_path = r"C:\Users\Bluetab\PycharmProjects\utils\csvGeneratorStack\csvCarriers.csv"
carrier_df = pd.read_csv(carriers_csv_path)

grouped_by_carrier = carrier_df.groupby("carrier")
unique_keys = carrier_df['carrier'].unique()

for unique_key in unique_keys:
    grouped_by_carrier.get_group(unique_key).to_csv("./" + unique_key + ".csv", sep=",", index=False)
Hope it helps.
Tomas
Using the standard library of Python only:

import csv

def write_output(header_row, carrier_name, c_rows):
    print("writing output for " + carrier_name)
    with open("c:\\tmp\\" + carrier_name + ".csv", "w", newline="") as outfile:
        outwriter = csv.writer(outfile, delimiter=",")
        outwriter.writerow(header_row)
        for outrow in c_rows:
            outwriter.writerow(outrow)

with open("c:\\tmp\\carrier.csv", newline="") as csvfile:
    creader = csv.reader(csvfile, delimiter=",")
    first_row = True
    header_row = None
    groups = {}
    for row in creader:
        if first_row:
            header_row = row
            first_row = False
        else:
            if row[0] not in groups:
                groups[row[0]] = [row]
            else:
                groups[row[0]].append(row)

for gr in groups:
    write_output(header_row, gr, groups[gr])
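The first_row flag and the if/else grouping can be tightened with next() and dict.setdefault; a sketch of the same grouping logic:

```python
import csv

def group_by_first_column(path):
    """Return (header, {key: rows}) grouped on the first column."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)  # consume the header line up front
        groups = {}
        for row in reader:
            # setdefault creates the list on first sight of a key
            groups.setdefault(row[0], []).append(row)
    return header, groups
```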
I am trying to combine multiple rows in a CSV file into one. I could easily do it in Excel, but I want to do this for hundreds of files, so I need it as code. I have tried to store the rows in arrays, but it doesn't seem to work. I am using Python.
So let's say I have a csv file:
1,2,3
4,5,6
7,8,9
All I want is a csv file like this:
1,2,3,4,5,6,7,8,9
The code I have tried is this:

fin = open("C:\\1.csv", 'r+')
fout = open("C:\\2.csv", 'w')
for line in fin.xreadlines():
    new = line.replace(',', ' ', 1)
    fout.write(new)
fin.close()
fout.close()
Could you please help?
You should be using the csv module for this, as splitting CSV manually on commas is very error-prone: single columns can contain strings with commas, and you would incorrectly split them into multiple columns. The csv module represents each row as a list of values.
import csv

def return_contents(file_name):
    with open(file_name) as infile:
        reader = csv.reader(infile)
        return list(reader)

data1 = return_contents('csv1.csv')
data2 = return_contents('csv2.csv')

print(data1)
print(data2)

combined = []
for row in data1:
    combined.extend(row)
for row in data2:
    combined.extend(row)

with open('csv_out.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(combined)
That code gives you the basis of the approach, but it would be ugly to extend it to hundreds of files. Instead, you probably want os.listdir to pull all the files in a single directory, one by one, and add them to your output. This is why I packed the reading code into the return_contents function: we can repeat the same process on any number of files with only one set of code doing the actual reading. Something like this:
import csv
import os

def return_contents(file_name):
    with open(file_name) as infile:
        reader = csv.reader(infile)
        return list(reader)

all_files = os.listdir('my_csvs')
combined_output = []

for file in all_files:
    data = return_contents('my_csvs/{}'.format(file))
    for row in data:
        combined_output.extend(row)

with open('csv_out.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(combined_output)
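One caveat: os.listdir returns names in arbitrary order and includes non-CSV files, so filtering and sorting are usually wanted. A sketch that folds both into the flattening step:

```python
import csv
import os

def flatten_directory(directory, out_path):
    """Flatten every .csv in directory into one output row."""
    combined = []
    for name in sorted(os.listdir(directory)):  # listdir order is arbitrary
        if not name.endswith('.csv'):
            continue
        with open(os.path.join(directory, name), newline='') as f:
            for row in csv.reader(f):
                combined.extend(row)
    with open(out_path, 'w', newline='') as outfile:
        csv.writer(outfile).writerow(combined)
```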
If you are specifically dealing with the CSV file format, I recommend the csv package for the file operations. If you also use a with...as statement, you don't need to worry about closing the file. You just need to define PATH, and the program will iterate over all .csv files.
Here is what you can do:
import csv
import os

PATH = "your folder path"

def order_list():
    data_list = []
    for filename in os.listdir(PATH):
        if filename.endswith(".csv"):
            # open the file we are iterating over, not a hard-coded name
            with open(os.path.join(PATH, filename)) as csvfile:
                read_csv = csv.reader(csvfile, delimiter=',', quoting=csv.QUOTE_NONNUMERIC)
                for row in read_csv:
                    data_list.extend(row)
    print(data_list)

if __name__ == '__main__':
    order_list()
Store your data in a pandas DataFrame:

import pandas as pd
df = pd.read_csv('file.csv')

Store the modified dataframe in a new one:

df_2 = df.groupby('Column_Name').agg(lambda x: ' '.join(x)).reset_index()  # write the name of your column

Write the df to a new csv:

df_2.to_csv("file_modified.csv")
You could also do it like this:

fIn = open("test.csv", "r")
fOut = open("output.csv", "w")
fOut.write(",".join([line for line in fIn]).replace("\n", ""))
fIn.close()
fOut.close()

If you now want to run it on multiple files, you can run it as a script with arguments:

import sys

fIn = open(sys.argv[1], "r")
fOut = open(sys.argv[2], "w")
fOut.write(",".join([line for line in fIn]).replace("\n", ""))
fIn.close()
fOut.close()
So, assuming you use some Linux system and the script is called csvOnliner.py, you could call it with:
for i in *.csv; do python csvOnliner.py $i changed_$i; done
On Windows you could do it like this:
FOR %i IN (*.csv) DO csvOnliner.py %i changed_%i
I have two CSV files; one contains X (longitude) values and the other Y (latitude) values (they are of 'float' data type).
I am trying to create a single CSV with all possible combinations (e.g. X1,Y1; X1,Y2; X1,Y3; X2,Y1; X2,Y2; X2,Y3; etc.).
I have written the following, which partly works. However, the CSV file created has blank lines between the values, and I also get the values stored with their list brackets, like ['20.7599'] ['135.9028']. What I need is 20.7599, 135.9028.
import csv

inLatCSV = r"C:\data\Lat.csv"
inLongCSV = r"C:\data\Long.csv"
outCSV = r"C:\data\LatLong.csv"

with open(inLatCSV, 'r') as f:
    reader = csv.reader(f)
    list_Lat = list(reader)

with open(inLongCSV, 'r') as f:
    reader = csv.reader(f)
    list_Long = list(reader)

with open(outCSV, 'w') as myfile:
    wr = csv.writer(myfile)
    for y in list_Lat:
        for x in list_Long:
            combVal = (y, x)
            #print(combVal)
            wr.writerow(combVal)
Adding an argument to the open function got rid of the blank lines, and indexing into each row (csv.reader yields each row as a list, so take its first field) got rid of the brackets:

with open(my_csv, 'w', newline="") as myfile:
    combinations = [[y[0], x[0]] for y in list_Lat for x in list_Long]
    wr = csv.writer(myfile)
    wr.writerows(combinations)
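The nested loops are exactly a cartesian product, so itertools.product can express the same thing; a stdlib sketch, assuming each input file holds one value per line:

```python
import csv
import itertools

def cross_join(lat_path, long_path, out_path):
    """Write every (lat, long) pair to out_path."""
    with open(lat_path, newline='') as f:
        lats = [row[0] for row in csv.reader(f)]
    with open(long_path, newline='') as f:
        longs = [row[0] for row in csv.reader(f)]
    with open(out_path, 'w', newline='') as out:
        # product() yields the pairs in the same nested-loop order
        csv.writer(out).writerows(itertools.product(lats, longs))
```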
Any time you're doing something with CSV files, pandas is a great tool.

import pandas as pd

lats = pd.read_csv(r"C:\data\Lat.csv", header=None)
lons = pd.read_csv(r"C:\data\Long.csv", header=None)

lats['_tmp'] = 1
lons['_tmp'] = 1

df = pd.merge(lats, lons, on='_tmp').drop('_tmp', axis=1)
df.to_csv(r'C:\data\LatLong.csv', header=False, index=False)
We create a dataframe for each file, and merge them on a temporary column, which produces the cartesian product. https://pandas.pydata.org/pandas-docs/version/0.20/merging.html
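If your pandas is 1.2 or newer, merge supports how='cross' directly, which removes the need for the temporary key column; a sketch:

```python
import pandas as pd

def cartesian_csv(lat_path, long_path, out_path):
    lats = pd.read_csv(lat_path, header=None)
    lons = pd.read_csv(long_path, header=None)
    # how='cross' (pandas >= 1.2) builds the cartesian product without a dummy key
    pd.merge(lats, lons, how='cross').to_csv(out_path, header=False, index=False)
```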
I have data in a csv file that is imported like this:
import csv

with open('Half-life.csv', 'r') as f:
    data = list(csv.reader(f))
The data comes out row by row, so that data[0] = ['10', '2', '2'], and so on.
What I want is to retrieve the data as columns instead of rows; in this case, there are 3 columns.
You can create three separate lists, and then append to each using csv.reader.

import csv

c1 = []
c2 = []
c3 = []

with open('Half-life.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        c1.append(row[0])
        c2.append(row[1])
        c3.append(row[2])
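If you'd rather not hard-code three lists, zip(*rows) transposes the rows into columns in one step; a sketch:

```python
import csv

def read_columns(path):
    """Return the file's columns as lists of strings, in left-to-right order."""
    with open(path, newline='') as f:
        rows = list(csv.reader(f))
    # zip(*rows) pairs up the i-th field of every row, i.e. transposes
    return [list(col) for col in zip(*rows)]
```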
A little more automatic and flexible version of Alexander's answer:

import csv
from collections import defaultdict

columns = defaultdict(list)

with open('Half-life.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        for i in range(len(row)):
            columns[i].append(row[i])

# Following line is only necessary if you want a key error for invalid column numbers
columns = dict(columns)
You could also modify this to use column headers instead of column numbers.

import csv
from collections import defaultdict

columns = defaultdict(list)

with open('Half-life.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',')
    headers = next(reader)
    column_nums = range(len(headers))
    for row in reader:
        for i in column_nums:
            columns[headers[i]].append(row[i])

# Following line is only necessary if you want a key error for invalid column names
columns = dict(columns)
Another option, if you have numpy installed, you can use loadtxt to read a csv file into a numpy array. You can then transpose the array if you want more columns than rows (I wasn't quite clear on how you wanted the data to look). For example:
import numpy as np

# Load data
data = np.loadtxt('csv_file.csv', delimiter=',')

# Transpose data if need be
data = np.transpose(data)
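loadtxt can also hand you the columns directly via unpack=True, which returns the transposed array in one call; a sketch:

```python
import numpy as np

def load_columns(path):
    # unpack=True transposes the result: one row per original column
    return np.loadtxt(path, delimiter=',', unpack=True)
```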