My CSV data looks like this:
I use this code to print the correlations:
import pandas as pd
import csv
rs = pd.read_csv(r'D:/Clustering_TOP.csv',encoding='utf-8')
with open('D:/Clustering_TOP.csv','r') as csvfile:
    reader = csv.reader(csvfile)
    rows = [row for row in reader]
csv_title = rows[0]
csv_title = csv_title[1:]
len_csv_title = len(csv_title)
for i in range(len_csv_title):
    for j in range(i,len_csv_title):
        print(str(rs[csv_title[i]].corr(rs[csv_title[j]])), end='\t')
    print()
This is the printed result; the format is not right:
What I actually want is output shaped like a pyramid, for example:
How should I modify my code?
Hello, I am not 100% sure, but I think your second for loop is the problem. Try this:
for i in range(len_csv_title):
    for j in range(i+1):
        print(str(rs[csv_title[i]].corr(rs[csv_title[j]])), end='\t')
    print()
I think the issue lies in your second 'for' loop. The inner loop should run over range(0, i+1) rather than range(i, len_csv_title), so that row i prints i+1 values.
Change the line below:
    for j in range(i,len_csv_title):
to:
    for j in range(0,i+1):
This is sample code that prints stars in the pattern you requested:
def pyramidpattern(n):
    for i in range(0, n):
        # Row i prints i+1 stars
        for j in range(0, i+1):
            print("* ",end="")
        # End the current row
        print()
n=5
pyramidpattern(n)
Output:
*
* *
* * *
* * * *
* * * * *
You need to reverse the bounds of the inner loop so that it runs from 0 up to i. Just change that for loop and it works:
import pandas as pd
import csv
rs = pd.read_csv(r'file',encoding='utf-8')
with open('file','r') as csvfile:
    reader = csv.reader(csvfile)
    rows = [row for row in reader]
csv_title = rows[0]
csv_title = csv_title[1:]
len_csv_title = len(csv_title)
for i in range(len_csv_title):
    for j in range(0,i+1):
        print(str(rs[csv_title[i]].corr(rs[csv_title[j]])), end='\t')
    print()
The only line that changes is the inner for loop:
    for j in range(0,i+1):
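The answers above all boil down to printing only columns 0..i for row i. As a minimal sketch of the same idea, assuming the data is already in a DataFrame, the whole correlation matrix can be computed once with DataFrame.corr() and then sliced row by row. The function name lower_triangle_lines and the sample data are hypothetical stand-ins for Clustering_TOP.csv, not something from the question.

```python
import pandas as pd


def lower_triangle_lines(df):
    """Return the lower triangle of df.corr() as tab-separated text lines."""
    corr = df.corr()
    lines = []
    for i in range(len(corr.columns)):
        # Row i keeps the correlations with columns 0..i (diagonal included).
        values = corr.iloc[i, : i + 1]
        lines.append("\t".join(f"{v:.3f}" for v in values))
    return lines


# Hypothetical sample data standing in for the real CSV file.
sample = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 9], "c": [4, 3, 2, 1]})
for line in lower_triangle_lines(sample):
    print(line)
```

Computing the matrix once avoids calling .corr() for every pair of columns, which may matter if the file has many columns.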
I have files with hundreds of thousands of rows of data, but they have no column headers.
I am trying to go through every file, split it into rows, and store the rows in a list; after that I want to assign the values to columns. I am confused about how to do this, because every row has around 60 values, and there are some extra columns with assigned values that should be added to every row.
Code so far:
import re
import glob
filenames = glob.glob("/home/ashfaque/Desktop/filetocsvsample/inputfiles/*.txt")
columns = []
with open("/home/ashfaque/Downloads/coulmn names.txt",encoding = "ISO-8859-1") as f:
    file_data = f.read()
    lines = file_data.splitlines()
    for l in lines:
        columns.append(l.rstrip())
total = {}
for name in filenames:
    modified_data = []
    with open(name,encoding = "ISO-8859-1") as f:
        file_data = f.read()
        lines = file_data.splitlines()
        for l in lines:
            if len(l) >= 1:
                modified_data.append(re.split(': |,',l))
    rows = []
    i = len(modified_data)
    x = 0
    while i > 60:
        r = lines[x:x+59]
        x = x + 60
        i = i - 60
        rows.append(r)
    z = len(modified_data)
    while z >= 60:
        z = z - 60
    if z > 1:
        last_columns = modified_data[-z:]
        x = []
        for l in last_columns:
            if len(l) > 1:
                del l[0]
                x.append(l)
            elif len(l) == 1:
                x.append(l)
        for row in rows:
            for vl in x:
                row.append(vl)
    for r in rows:
        for i in range(0,len(r)):
            if len(r) >= 60:
                total.setdefault(columns[i],[]).append(r[i])
In another script I have separated the rows of 60 values from the last 5 to 15 columns that should be added to each row, but I am again confused about how to bind all the data together.
The data should look like this after binding:
outputdata.xlsx
Input data file:
inputdata.txt
What am I missing here? Is there a tool for this?
I believe that your issue can be resolved by taking the input file and turning it into a CSV file which you can then import into whatever program you like.
I wrote a small generator that reads a file one line at a time and returns a row after a certain number of lines, in this case 60. In that generator, you can make whatever modifications to the data you need.
Then with each generated row, I write it directly to the csv. This should keep the memory requirements for this process pretty low.
I didn't understand what you were doing with the regex split, but it would be simple enough to add it to the generator.
import csv
OUTPUT_FILE = "/home/ashfaque/Desktop/File handling/outputfile.csv"
INPUT_FILE = "/home/ashfaque/Desktop/File handling/inputfile.txt"
# This is a generator that will pull only num number of items into
# memory at a time, before it yields the row.
def get_rows(path, num):
    row = []
    with open(path, "r", encoding="ISO-8859-1") as f:
        for n, l in enumerate(f):
            # apply whatever transformations that you need to here.
            row.append(l.rstrip())
            if (n + 1) % num == 0:
                # if rows need padding then do it here.
                yield row
                row = []
# newline="" stops the csv module from writing blank lines on Windows.
with open(OUTPUT_FILE, "w", newline="") as output:
    csv_writer = csv.writer(output)
    for r in get_rows(INPUT_FILE, 60):
        csv_writer.writerow(r)
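One thing to be aware of with the generator above: if the number of lines is not an exact multiple of 60, the leftover lines at the end are never yielded. A minimal sketch of a variant that also returns the final short chunk, using itertools.islice, might look like this. The name chunked_rows and the in-memory demo input are assumptions for illustration only.

```python
import io
from itertools import islice


def chunked_rows(lines, size):
    """Yield lists of `size` stripped lines; the last list may be shorter."""
    it = iter(lines)
    while True:
        # islice pulls at most `size` lines from the shared iterator.
        chunk = [line.rstrip("\n") for line in islice(it, size)]
        if not chunk:
            return
        yield chunk


# Hypothetical five-line input, grouped two lines at a time.
demo = io.StringIO("a\nb\nc\nd\ne\n")
print(list(chunked_rows(demo, 2)))  # → [['a', 'b'], ['c', 'd'], ['e']]
```

Whether the short final chunk should be padded, kept, or dropped depends on what the trailing lines in the real input mean.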
I have code that basically does this:
row1 = []
count = 0
writer = csv.writer(myFile)
row = []
for j in range(0, 2):
    for i in range(0, 4):
        row1.append(i+count)
    count = count + 1
    print(row1)
    writer.writerows(row1)
    row1[:] = []
I'm creating some lists and I want to map each value to a column, like this
I get the error "iterable expected". How can I do that?
@roganjosh is right: to write one row at a time you need writerow:
import csv
myFile = open("aaa.csv", "w", newline="")
row1 = []
count = 0
writer = csv.writer(myFile)
row = []
for j in range(0, 2):
    for i in range(0, 4):
        row1.append(i+count)
    count = count + 1
    print(row1)
    writer.writerow(row1)
    row1[:] = []
myFile.close() # Don't forget to close your file
You probably need to call the method .writerow() instead of the plural .writerows(), because you write a single line to the file on each call.
The other method is to write multiple lines at once to the file.
Or you could also restructure your code like this to write all the lines at the end:
import csv
row_list = []
for j in range(2):
    row = [j+i for i in range(4)]
    row_list.append(row)
# row_list = [
#     [j+i for i in range(4)]
#     for j in range(2)]
with open('filename.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(row_list)
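To make the difference between the two methods concrete, here is a small sketch that writes to an in-memory buffer instead of a file (the io.StringIO buffer and the variable names are just for illustration). It also reproduces the situation from the question: passing a flat list to writerows makes the csv module treat every number as a row of its own, which fails because a number is not iterable.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")

# writerow takes ONE row: a flat iterable of values.
writer.writerow([0, 1, 2, 3])
# writerows takes MANY rows: an iterable of iterables.
writer.writerows([[1, 2, 3, 4], [2, 3, 4, 5]])

# A flat list passed to writerows is read as "one int per row" and fails.
try:
    writer.writerows([0, 1, 2, 3])
    flat_list_failed = False
except (csv.Error, TypeError):
    flat_list_failed = True

result = buffer.getvalue()
print(result)
print(flat_list_failed)
```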
It's much simpler and easier to manipulate tabular data in pandas -- is there a reason you don't want to use pandas?
import pandas as pd
df = pd.DataFrame()
for i in range(4):
    df[i] = range(i, i+4)
# Any other data wrangling
df.to_csv("file.csv")
I have a problem I can't seem to get right.
I have 2 numbers, A and B. I need to make a list of A rows with B columns, and have it print out 'R0C0', 'R0C1', etc.
Code:
import sys
A= int(sys.argv[1])
B= int(sys.argv[2])
newlist = []
row = A
col = B
for x in range (0, row):
    newlist.append(['R0C' + str(x)])
    for y in range(0, col):
        newlist[x].append('R1C' + str(y))
print(newlist)
This is not working. The following is the output I get and the expected output:
Program Output
Program Failed for Input: 2 3
Expected Output:
[['R0C0', 'R0C1', 'R0C2'], ['R1C0', 'R1C1', 'R1C2']]
Your Program Output:
[['R0C0', 'R1C0', 'R1C1', 'R1C2'], ['R0C1', 'R1C0', 'R1C1', 'R1C2']]
Your output was incorrect. Try again
You are first adding R0Cx and then R1Cy, with the row number hard-coded. You need to add RxCy instead. So try:
newlist = []
row = A
col = B
for x in range (0, row):
    newlist.append([])
    for y in range(0, col):
        newlist[x].append('R' + str(x) + 'C' + str(y))
print(newlist)
You have to fill columns in a row while still in that row:
rows = []
row = 2
col = 3
for x in range(0, row):
    columns = []
    for y in range(0, col):
        columns.append('R' + str(x) + 'C' + str(y))
    rows.append(columns)
print(rows)
will print:
[['R0C0', 'R0C1', 'R0C2'], ['R1C0', 'R1C1', 'R1C2']]
Try changing your range calls as shown:
for x in range (row):
    for y in range(col):
You also have a second issue: the text is not built properly, because the row number is hard-coded instead of taken from x:
newlist = []
row = A
col = B
for x in range (row):
    sublist = []
    for y in range(col):
        sublist.append('R{}C{}'.format(x, y))
    newlist.append(sublist)
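For what it's worth, the same RxCy grid can also be built with a nested list comprehension. This is only a compact sketch of the approach in the answers above; the function name make_grid is made up for the example.

```python
def make_grid(rows, cols):
    """Build a rows x cols list of 'RxCy' labels, one inner list per row."""
    return [[f"R{r}C{c}" for c in range(cols)] for r in range(rows)]


print(make_grid(2, 3))
# → [['R0C0', 'R0C1', 'R0C2'], ['R1C0', 'R1C1', 'R1C2']]
```

The outer comprehension walks the rows and the inner one fills the columns of that row, which is the same "fill the columns while still in that row" idea as the explicit loops.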
I am quite new to Python, so I hope you can help me out. I have to write a program that opens a CSV file, reads it, and lets you select the columns you want by entering their numbers. Those columns then have to be written to a new file. The problem: after I enter the columns I want and type "X" to start the main part, the program generates exactly what I want, but it does so by printing inside a loop rather than from a variable that holds the result. For the csv writer I need a variable containing it. Any ideas? Here is my code; feel free to ask questions. The CSV file looks like this:
john, smith, 37, blue, michigan
tom, miller, 25, orange, new york
jack, o'neill, 40, green, Colorado Springs
...etc
Code is:
import csv
with open("test.csv","r") as t:
    t_read = csv.reader(t, delimiter=",")
    t_list = []
    max_row = 0
    for row in t_read:
        if len(row) != 0:
            if max_row < len(row):
                max_row = len(row)
            t_list = t_list + [row]
            print([row], sep = "\n")
    twrite = csv.writer(t, delimiter = ",")
    tout = []
    counter = 0
    matrix = []
    for i in range(len(t_list)):
        matrix.append([])
    print(len(t_list), max_row, len(matrix), "Rows / Columns / Matrix Dimension")
    eingabe = input("Enter column number you need or X to start generating output: ")
    nr = int(eingabe)
    while type(nr) == int:
        colNr = nr-1
        if max_row > colNr and colNr >= 0:
            nr = int(nr)
            # print (type(nr))
            for i in range(len(t_list)):
                row_A=t_list[i]
                matrix[i].append(row_A[int(colNr)])
                print(row_A[int(colNr)])
            counter = counter +1
            matrix.append([])
        else:
            print("ERROR")
        nr = input("Enter column number you need or X to start generating output: ")
        if nr == "x":
            print("\n"+"Generating Output... " + "\n")
            for row in matrix:
                # Loop over columns.
                for column in row:
                    print(column + " ", end="")
                print(end="\n")
        else:
            nr = int(nr)
    print("\n")
t.close()
Well, matrix already holds everything you need, apart from one erroneous line that adds an unneeded empty row:
        counter = counter +1
        matrix.append([]) # <= remove this line
    else:
        print("ERROR")
You can then simply do:
if nr == "x":
    print("\n"+"Generating Output... " + "\n")
    # newline="" stops the csv module from writing blank lines on Windows
    with open("testout.csv", "w", newline="") as out:
        wr = csv.writer(out, delimiter=",")
        wr.writerows(matrix)
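If the interactive loop is not essential, the column selection itself can be reduced to one small function. The sketch below is only an illustration under that assumption: select_columns is a hypothetical helper, the column numbers are 1-based as in the question, and in-memory buffers stand in for test.csv and testout.csv.

```python
import csv
import io


def select_columns(rows, column_numbers):
    """Keep only the given 1-based columns from every row, in the order given."""
    return [[row[n - 1] for n in column_numbers] for row in rows]


# In-memory stand-in for test.csv, using two rows from the question.
source = io.StringIO("john,smith,37,blue,michigan\ntom,miller,25,orange,new york\n")
rows = [row for row in csv.reader(source) if row]  # skip empty lines

selected = select_columns(rows, [1, 3])

# In-memory stand-in for the output file.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(selected)
print(out.getvalue())
```

Here selected plays the role of matrix in the question: a list of lists that writerows can write in one call.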
I have calculated the distances between pairs of atoms and saved them in an out.txt file. The generated file looks like this:
N_TYR_A0002 O_CYS_A0037 6.12
O_CYS_A0037 N_TYR_A0002 6.12
N_ALA_A0001 O_TYR_A0002 5.34
O_TYR_A0002 N_ALA_A0001 5.34
My output file has repeats, meaning the same pair of atoms with the same distance appears twice.
How can I remove the redundant lines?
I used this program for the distance calculation (all atoms against all atoms):
from __future__ import division
from string import *
from numpy import *
def eudistance(c1,c2):
    x_dist = (c1[0] - c2[0])**2
    y_dist = (c1[1] - c2[1])**2
    z_dist = (c1[2] - c2[2])**2
    return math.sqrt (x_dist + y_dist + z_dist)
infile = open('file.pdb', 'r')
text = infile.read().split('\n')
infile.close()
text.remove('')
pdbid = []
#define the pdbid
spfcord = []
for g in pdbid:
    ratom = g[0]
    ratm1 = ratom.split('_')
    ratm2 = ratm1[0]
    if ratm2 in allatoms:
        spfcord.append(g)
#print spfcord[:10]
outfile1 = open('pairdistance.txt', 'w')
for m in spfcord:
    name1 = m[0]
    cord1 = m[1]
    for n in spfcord:
        if n != '':
            name2 = n[0]
            cord2 = n[1]
            # call the function under the name it was defined with above
            dist = eudistance(cord1, cord2)
            if 7 > dist > 2:
                #print name1, '\t', name2, '\t', dist
                distances = name1 + '\t ' + name2 + '\t ' + str(dist)
                #print distances
                outfile1.write(distances)
                outfile1.write('\n')
outfile1.close()
If you don't care about order:
def remove_duplicates(input_file):
    with open(input_file) as fr:
        # Sorting the fields makes "A B d" and "B A d" produce the same key.
        unique = {'\t'.join(sorted([a1, a2] + [d]))
                  for a1, a2, d in [line.strip().split() for line in fr]
                  }
    for item in unique:
        yield item
if __name__ == '__main__':
    for line in remove_duplicates('out.txt'):
        print(line)
However, a simple check for name1 < name2 in your script, before computing the distance and writing the data, would probably be better.
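If the original line order and field order should be preserved, one alternative is to remember which unordered pairs have already been seen and keep only the first occurrence. This is a sketch in Python 3, not the method from the answer above; unique_pairs is a hypothetical name, and the sample lines are the ones from the question.

```python
def unique_pairs(lines):
    """Keep the first line for each unordered atom pair and distance."""
    seen = set()
    kept = []
    for line in lines:
        atom1, atom2, dist = line.split()
        # frozenset ignores order, so (A, B) and (B, A) give the same key.
        key = (frozenset((atom1, atom2)), dist)
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


sample_lines = [
    "N_TYR_A0002 O_CYS_A0037 6.12",
    "O_CYS_A0037 N_TYR_A0002 6.12",
    "N_ALA_A0001 O_TYR_A0002 5.34",
    "O_TYR_A0002 N_ALA_A0001 5.34",
]
for line in unique_pairs(sample_lines):
    print(line)
```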
Okay, I have an idea. Not pretending it is the best or cleanest way but it was fun so..
import numpy as np
from StringIO import StringIO
data_in_file = """
N_TYR_A0002, O_CYS_A0037, 6.12
N_ALA_A0001, O_TYR_A0002, 5.34
P_CUC_A0001, N_TYR_A0002, 9.56
O_TYR_A0002, N_ALA_A0001, 5.34
O_CYS_A0037, N_TYR_A0002, 6.12
N_TYR_A0002, P_CUC_A0001, 9.56
"""
# Import data using numpy; any method is okay really, as we don't rely on the data being arrays
data_in_array = np.genfromtxt(StringIO(data_in_file), delimiter=",", autostrip=True,
                              dtype=[('atom_1', 'S12'), ('atom_2', 'S12'), ('distance', '<f8')])
N = len(data_in_array['distance'])
pairs = []
# For each item find the repeated index
for index, a1, a2 in zip(range(N), data_in_array['atom_1'], data_in_array['atom_2']):
    repeat_index = list((data_in_array['atom_2'] == a1) * (data_in_array['atom_1'] == a2)).index(True)
    pairs.append(sorted([index, repeat_index]))
# Each item is repeated, so sort and remove every other one
unique_indexs = [item[0] for item in sorted(pairs)[0:N:2]]
atom_1 = data_in_array['atom_1'][unique_indexs]
atom_2 = data_in_array['atom_2'][unique_indexs]
distance = data_in_array['distance'][unique_indexs]
for i in range(N/2):
    print atom_1[i], atom_2[i], distance[i]
#Prints
N_TYR_A0002 O_CYS_A0037 6.12
N_ALA_A0001 O_TYR_A0002 5.34
P_CUC_A0001 N_TYR_A0002 9.56
I should add that this assumes every pair is repeated exactly once and that no item exists without a pair; otherwise the code will break, although that could be handled by catching the exception.
Note that I also changed your input data, using "," delimiters, and added another pair to be sure the ordering wouldn't break the code.
Let's try to avoid generating the duplicates in the first place. Change this part of the code:
outfile1 = open('pairdistance.txt', 'w')
length = len(spfcord)
for i,m in enumerate(spfcord):
    name1 = m[0]
    cord1 = m[1]
    for n in islice(spfcord,i+1,length):
Add the import:
from itertools import islice
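The enumerate/islice loop above visits each unordered pair exactly once. The same thing can be sketched with itertools.combinations. Note that this example is written for Python 3.8+ (it uses math.dist), whereas the question's code is Python 2; the atom names are taken from the question, but the coordinates and the name pair_distances are made up for illustration.

```python
import math
from itertools import combinations

# Hypothetical (name, (x, y, z)) entries standing in for spfcord.
atoms = [
    ("N_TYR_A0002", (0.0, 0.0, 0.0)),
    ("O_CYS_A0037", (3.0, 4.0, 0.0)),
    ("N_ALA_A0001", (0.0, 0.0, 10.0)),
]


def pair_distances(atoms, low=2.0, high=7.0):
    """Return (name1, name2, distance) once per unordered pair within (low, high)."""
    result = []
    # combinations(atoms, 2) yields every pair once and never pairs an atom with itself.
    for (name1, c1), (name2, c2) in combinations(atoms, 2):
        dist = math.dist(c1, c2)
        if low < dist < high:
            result.append((name1, name2, dist))
    return result


for name1, name2, dist in pair_distances(atoms):
    print(name1, name2, dist, sep="\t")
```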