I've got this dataset in a .csv file:
https://www.dropbox.com/s/2kzpzkhoiolhnlc/output.csv?dl=0
19,3,12
3
12
16,4
26,15,8,3
2
8
15
20
12,25,20,2,16
12,16
12,25
2,16
1,12
16,4
11,19,25,20
11,20,16,21
25,20,21
.....
for each row, if there are fewer than 51 values, then I need to append ? until that row has 51 values. For example, in the first row I have 19,3,12, so I have to add 48 ? to get a row like this: 19,3,12,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?
In the second row I have just one number, so I have to add 50 ?, and the same for the other rows. Could you help me please?
EDIT: I've tried this, but it didn't work; it just added "" to some rows:
import pandas as pd
df = pd.read_csv('output.csv', sep=';')
df = df.fillna('?')
df.to_csv('sorted2.csv', index=False)
You can do it with just text-file manipulation if you want; no need for pandas or the csv module in this simple case.
with open('source.csv') as f:
    with open('result.csv', 'w') as fw:
        for line in f:
            line = line.strip() + ',?' * (50 - line.count(','))
            fw.write(line + '\n')
Use pandas to read the file in and set the number of columns you want. The following reads a file and assigns n columns; the extra elements will by default have the value np.nan:
df = pd.read_csv('file', names=range(n))
If you want them to have a different value, you can assign it with
df.fillna(value, inplace=True)
Then you can write the DataFrame back to the file and it will have the shape you want:
df.to_csv('file', index=False, header=False)
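Putting those pieces together, a minimal runnable sketch (the sample rows are taken from the question, 51 is the target width, and the filenames are the ones the asker used):

```python
import pandas as pd

# A few of the ragged sample rows from the question, written out for the demo
with open('output.csv', 'w') as f:
    f.write('19,3,12\n3\n12\n16,4\n')

# Force 51 columns: short rows are padded with NaN; dtype=str keeps the
# numbers as text so they round-trip unchanged
df = pd.read_csv('output.csv', header=None, names=range(51), dtype=str)

# Replace the padding NaNs with '?'
df = df.fillna('?')

# Write the result back without index or header
df.to_csv('sorted2.csv', index=False, header=False)
```

Each output row then has exactly 51 comma-separated values.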
I am a beginner and looking for a solution. I am trying to compare columns from two CSV files with no header. The first one has one column and the second one has two.
File_1.csv: #contains 2k rows with random numbers.
1
4
1005
.
.
.
9563
File_2.csv: #Contains 28k rows
0 [81,213,574,697,766,1074,...21622]
1 [0,1,4,10,12,13,1005, ...31042]
2 [35,103,85,1023,...]
3 [4,24,108,76,...]
4 []
.
.
.
28280 [0,1,9,10,32,49,56,...]
First, I want to compare the column of File_1 with the first column of File_2, find the matches, and extract the matching values plus the second column of File_2 into a new CSV file (output.csv), dropping the non-matching values. For example,
output.csv:
1 [0,1,4,10,12,13,1005, ...31042]
4 []
.
.
.
Second, I want to compare the File_1.csv column (iterating over its 2k rows) with the second column (each array) of output.csv, keep the matching values, delete the ones that do not match, and save the result back into output.csv while keeping the first column of that file. For example, 4 is deleted because its second column (array) is empty, so there were no numbers to compare against File_1, but others like 1 do have some matches:
output.csv:
1 [1,4,1005]
.
.
.
I found code that works for the first step, but it does not save the second column. I have been looking at how to compare arrays, but I haven't been able to get it working.
This is what I have so far,
import csv
nodelist = []
node_matches = []
with open('File_1.csv', 'r') as f_rand_node:
    csv_f = csv.reader(f_rand_node)
    for row in csv_f:
        nodelist.append(row[0])
set_node = set(nodelist)
with open('File_2.csv', 'r') as f_tbl:
    with open('output.csv', 'w') as f_out:
        csv_f = csv.reader(f_tbl)
        for row in csv_f:
            set_row = set(' '.join(row).split(' '))
            if set_row.intersection(set_node):
                node_match = list(set_row.intersection(set_node))[0]
                f_out.write(node_match + '\n')
Thank you for the help.
I'd recommend using pandas for this case.
File_1.csv:
1
4
1005
9563
File_2.csv:
0 [81,213,574,697,766,1074]
1 [0,1,4,10,12,13,1005,31042]
2 [35,103,85,1023]
3 [4,24,108,76]
4 []
5 [0,1,9,10,32,49,56]
Code:
import pandas as pd
import csv
file1 = pd.read_csv('File_1.csv', header=None)
file1.columns=['number']
file2 = pd.read_csv('File_2.csv', header=None, delim_whitespace=True, index_col=0)
file2.columns = ['data']
df = file2[file2.index.isin(file1['number'].tolist())] # first step
df = df[df['data'] != '[]'] # second step
df.to_csv('output.csv', header=None, sep='\t', quoting=csv.QUOTE_NONE)
Output.csv:
1 [0,1,4,10,12,13,1005,31042]
The entire thing is a lot easier with pandas DataFrames:
import pandas as pd

# Read the files into two DataFrames
df1 = pd.read_csv("File_1.csv", header=None)
df2 = pd.read_csv("File_2.csv", header=None, sep=r"\s+", index_col=0)
df2.columns = ["data"]

# First step: keep only rows whose index appears in File_1
keep = set(df1[0])
df2 = df2[df2.index.isin(keep)]

# Second step: intersect each bracketed list with the File_1 values
def intersect(cell):
    body = cell.strip("[]")
    return [int(x) for x in body.split(",") if body and int(x) in keep]

df2["data"] = df2["data"].map(intersect)
df2 = df2[df2["data"].astype(bool)]  # drop rows with an empty intersection
df2.to_csv("output.csv", header=False, sep="\t")
This should do the trick, quite simply.
I have multiple txt files, and each of these txt files has 6 columns. What I want to do: add just one column as a last column, so that in the end each txt file has at most 7 columns, and if I run the script again it shouldn't add a new one.
At the beginning each file has six columns:
637.39 718.53 155.23 -0.51369 -0.18539 0.057838 3.209840789730089
636.56 720 155.57 -0.51566 -0.18487 0.056735 3.3520643559939938
635.72 721.52 155.95 -0.51933 -0.18496 0.056504 3.4997850701290125
What I want is to add a new column of zeros, but only if the current number of columns is 6; after that, it should not add another column when I run the script again (7 columns in total, where the last one is zeros):
637.39 718.53 155.23 -0.51369 -0.18539 0.057838 3.209840789730089 0
636.56 720 155.57 -0.51566 -0.18487 0.056735 3.3520643559939938 0
635.72 721.52 155.95 -0.51933 -0.18496 0.056504 3.4997850701290125 0
My code works but adds one additional column each time I run the script; I want it to add the column just once, when the number of columns is 6. Here (a) gives me the number of columns, and if the condition is fulfilled a new column should be added:
from glob import glob
import numpy as np

new_column = [0] * 20

def get_new_line(t):
    l, c = t
    return '{} {}\n'.format(l.rstrip(), c)

def writecolumn(filepath):
    # Load data from file
    with open(filepath) as datafile:
        lines = datafile.readlines()
    a = np.loadtxt(lines, dtype='str').shape[1]
    print(a)
    #if a==6:  (here is the problem)
    n, r = divmod(len(lines), len(new_column))
    column = new_column * n + new_column[:r]
    new_lines = list(map(get_new_line, zip(lines, column)))
    with open(filepath, "w") as f:
        f.writelines(new_lines)

if __name__ == "__main__":
    filepaths = glob("/home/experiment/*.txt")
    for path in filepaths:
        writecolumn(path)
When I check the number of columns with if a==6 and shift the content inside the if statement, I get an error. Without shifting the content inside the if, everything works fine but still adds one column each time I run it.
Any help is appreciated.
To test the code, create one or two txt files with a random number of rows and six columns.
It could be an indentation problem, i.e. the block below the if: writing the new lines should be indented so that it only happens inside the if.
This works:
def writecolumn(filepath):
    # Load data from file
    with open(filepath) as datafile:
        lines = datafile.readlines()
    a = np.loadtxt(lines, dtype='str').shape[1]
    print(a)
    if int(a) == 6:
        n, r = divmod(len(lines), len(new_column))
        column = new_column * n + new_column[:r]
        new_lines = list(map(get_new_line, zip(lines, column)))
        with open(filepath, "w") as f:
            f.writelines(new_lines)
Use pandas to read your text file:
import pandas as pd
df = pd.read_csv("whitespace.csv", header=None, delimiter=" ")
Add a column or more as needed
df['somecolname'] = 0
Save DataFrame with no header.
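The steps above can be sketched end to end like this (the sample rows are shortened from the question, and the file is written back in place so re-running is harmless):

```python
import pandas as pd

# Two shortened sample rows from the question, written out for the demo
with open("whitespace.csv", "w") as f:
    f.write("637.39 718.53 155.23 -0.51369 -0.18539 0.057838\n"
            "636.56 720.00 155.57 -0.51566 -0.18487 0.056735\n")

df = pd.read_csv("whitespace.csv", header=None, delimiter=" ")

# Append a zero column only when there are still exactly 6 columns,
# so running the script a second time changes nothing
if df.shape[1] == 6:
    df[6] = 0

# Save the DataFrame with no header (and no index) to keep the layout
df.to_csv("whitespace.csv", sep=" ", header=False, index=False)
```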
Basically I have data from a mechanical test in the output format .raw and I want to access it in Python.
The file needs to be split using the delimiter ";" so that it contains 13 columns.
The idea is then to index and pull out the desired information, which in my case is the "Extension mm" and "Load N" values starting at row 41, as arrays, in order to create a plot.
I have never worked with .raw files and I don't know what to do.
The file can be downloaded here:
https://drive.google.com/file/d/0B0GJeyFBNd4FNEp0elhIWGpWWWM/view?usp=sharing
Hope somebody can help me out there!
You can convert the raw file into a csv file and then use the csv module. Remember to set delimiter=' ', because by default it takes a comma as the delimiter.
import csv
with open('TST0002.csv', 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter=' ')
    for row in reader:  # this reads the file row by row
        print(row[0])   # row[0] is the first element of that row
Your file looks basically like a .tsv with 40 lines to skip. Could you try this?
#export your file.raw to tsv
with open('TST0002.raw') as infile, open('new.tsv', 'w') as outfile:
    lines = infile.readlines()[40:]
    for line in lines:
        outfile.write(line)
Or if you want to make directly some data analysis on your two columns :
import pandas as pd
df = pd.read_csv("TST0002.raw", sep="\t", skiprows=40, usecols=['Extension mm', 'Load N'])
print(df)
output:
Extension mm Load N
0 -118.284 0.1365034
1 -117.779 -0.08668576
2 -117.274 -0.1142517
3 -116.773 -0.1092401
4 -116.271 -0.1144083
5 -115.770 -0.1314806
6 -115.269 -0.03609632
7 -114.768 -0.06334914
....
I have a CSV file.
There are a fixed number of columns and an unknown number of rows.
The information I need is always in the same 2 columns but not in the same row.
When column 6 has a 17 character value I also need to get the data from column 0.
This is an example row from the CSV file:
E4:DD:EF:1C:00:4F, 2012-10-08 11:29:04, 2012-10-08 11:29:56, -75, 9, 18:35:2C:18:16:ED,
You could open the file and go through it line by line. Split the line, and if the sixth element has 17 characters (after stripping whitespace), append element 0 to your result array.
res = []
with open(file_name, 'r') as f:
    for line in f:
        L = line.split(',')
        # column 6 is index 5; strip the leading space before measuring
        if len(L[5].strip()) == 17:
            res.append(L[0])
Now you have a list with all the column 0 values of your csv where column 6 has a 17-character value.
You can use the csv module to read csv files, and you can provide the delimiter/dialect you need (comma, | or tab, etc.) when reading the file with the csv reader.
The csv reader presents each row/record as a list of column values. If you want to access a csv record/row as a dict, you can use DictReader and its methods.
import csv

res = []
with open('simple.csv', newline='') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        # Indexes start with 0, so the 6th column is index 5
        # strip() trims the spaces around the column value
        # Check that the row has more than 5 columns to handle uneven rows
        if len(row) > 5 and len(row[5].strip()) == 17:
            res.append(row[0])

# Result: column 0 values where column 6 has a 17-char value
print(res)
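The DictReader variant mentioned above could look like this (a sketch; the field names are invented here since the file has no header row):

```python
import csv

# The sample row from the question, written out for the demo
with open('simple.csv', 'w', newline='') as f:
    f.write('E4:DD:EF:1C:00:4F, 2012-10-08 11:29:04, 2012-10-08 11:29:56,'
            ' -75, 9, 18:35:2C:18:16:ED,\n')

res = []
with open('simple.csv', newline='') as f:
    # No header in the file, so supply field names (these are hypothetical)
    fields = ['mac', 'start', 'end', 'rssi', 'count', 'peer', 'extra']
    for row in csv.DictReader(f, fieldnames=fields):
        # Column 6 is 'peer'; strip the leading space before measuring
        if len(row['peer'].strip()) == 17:
            res.append(row['mac'])

print(res)
```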
I have a csv which contains 38 columns of data. All I want to find out is how to divide column 11 by column 38 and append the result to the end of each row, skipping the title row of the csv (row 1).
If I can get a snippet of code that does this, I will be able to adapt the same code to perform lots of similar functions.
My attempt involved editing some code that was designed for something else.
See below:
from collections import defaultdict

class_col = 11
data_col = 38

# Read in the data
with open('test.csv', 'r') as f:
    # if you have a header on the file
    # header = f.readline().strip().split(',')
    data = [line.strip().split(',') for line in f]

# Append the relevant sum to the end of each row
for row in xrange(len(data)):
    data[row].append(int(class_col)/int(data_col))

# Write the results to a new csv file
with open('testMODIFIED2.csv', 'w') as nf:
    nf.write('\n'.join(','.join(row) for row in data))
Any help will be greatly appreciated. Thanks SMNALLY
import csv

with open('test.csv', newline='') as old_csv:
    csv_reader = csv.reader(old_csv)
    with open('testMODIFIED2.csv', 'w', newline='') as new_csv:
        csv_writer = csv.writer(new_csv)
        for i, row in enumerate(csv_reader):
            if i != 0:  # skip the title row's calculation
                row.append(float(row[10]) / float(row[37]))
            csv_writer.writerow(row)
Use pandas:
import pandas
df = pandas.read_csv('test.csv')  # assumes a header row exists
# 'CLASS' and 'DATA' stand for the header names of columns 11 and 38
df['FRACTION'] = 1.0*df['CLASS']/df['DATA']  # new columns are appended at the end
df.to_csv('out.csv', index=False)