For index.csv file, its fourth column has ten numbers ranging from 1-5. Each number can be regarded as an index, and each index corresponds with an array of numbers in filename.csv.
The row number of filename.csv represents the index, and each row has three numbers. My question is about using a nesting loop to transfer the numbers in filename.csv to index.csv.
from numpy import genfromtxt
import numpy as np
import csv
data1 = genfromtxt('filename.csv', delimiter=',')
data2 = genfromtxt('index.csv', delimiter=',')
f = open('index.csv','wb')
write = csv.writer(f, delimiter=',',quoting=csv.QUOTE_ALL)
for row in data2:
for ch_row in data1:
if ( data2[row,3] == ch_row ):
write.writerow(data1[data2[row,3],:])
For example, the fourth column of index.csv contains 1,2,5,3,4,1,4,5,2,3 and filename.csv contains:
# filename.csv
20 30 50
70 60 45
35 26 77
93 37 68
13 08 55
What I need is to write the indexed row from filename.csv to index.csv and store these number in 5th, 6th and 7th column:
# index.csv
# 4 5 6 7
... 1 20 30 50
... 2 70 60 45
... 5 13 08 55
... 3 35 26 77
... 4 93 37 68
... 1 20 30 50
... 4 93 37 68
... 5 13 08 55
... 2 70 60 45
... 3 35 26 77
Can anyone help me solve this problem?
You need to indent your last 2 lines. Also, it looks like you are writing to the file from which you are reading.
Related
I've Read the CVS file using pandas and have managed to print the 1st, 2nd, 3rd and 4th row for every 20 rows using .iloc.
Prem_results = pd.read_csv("../data sets analysis/prem/result.csv")
Prem_results.iloc[:320:20,:]
Prem_results.iloc[1:320:20,:]
Prem_results.iloc[2:320:20,:]
Prem_results.iloc[3:320:20,:]
Is there a way using iloc to print the 1st 4 rows of every 20 lines together rather then seperately like I do now? Apologies if this is worded badly fairly new to both python and using pandas.
Using groupby.head:
Prem_results.groupby(np.arange(len(Prem_results)) // 20).head(4)
You can concat slices together like this:
pd.concat([df[i::20] for i in range(4)]).sort_index()
MCVE:
df = pd.DataFrame({'col1':np.arange(1000)})
pd.concat([df[i::20] for i in range(4)]).sort_index().head(20)
Output:
col1
0 0
1 1
2 2
3 3
20 20
21 21
22 22
23 23
40 40
41 41
42 42
43 43
60 60
61 61
62 62
63 63
80 80
81 81
82 82
83 83
Start at 0 get every 20 rows
Start at 1 get every 20 rows
Start at 2 get every 20 rows
And, start at 3 get every 20 rows.
You can also do this while reading the csv itself.
df = pd.DataFrame()
for chunk in pd.read_csv(file_name, chunksize = 20):
df = pd.concat((df, chunk.head(4)))
More resources:
You can read more about the usage of chunksize in Pandas official documentation here.
I also have a post about its usage here.
I am creating dicts where the dict keys are strings and the values are large-ish pandas DataFrames. I would like to write these dicts to an excel file but the issue I'm having is that when python writes the dataframe to a csv it cuts out parts. Code:
import pandas as pd
import numpy as np
def create_random_df():
return(pd.DataFrame(np.random.randint(0,100,size=(70,26)),columns=list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')))
dic={'key1': create_random_df() , 'key2': create_random_df()}
with open('test.csv', 'w') as f:
for key in dic.keys():
f.write("%s,%s\n"%(key,dic[key]))
This sort of outputs the format I'd like except for the following:
All of the dataframe columns are in Cell B1 and they're not complete... it's
A B C D E F G H I ... R S T U V W X Y
Z
and then the indexes and dataframe elements are all in columns A. i.e. Cells A2:A4 is
0 55 96 60 47 11 3 2 69 50 ... 3 23 26 3 15 53
78 95 49
1 72 48 12 25 32 57 11 84 5 ... 11 43 56 0 68 55
95 64 84
2 80 56 78 58 79 72 67 97 58 ... 84 34 18 21 71 20
72 36 37
I'd like the dataframes to be written to the csv in their entirety and obviously the values in discrete cells
You can try:
dic={'key1': create_random_df() , 'key2': create_random_df()}
with open('test.csv', 'w') as f:
for key in dic.keys():
df = dic[key]
df.insert(0,'Key', pd.Series([key]))
df.Key = df.Key.fillna('')
f.write(df.to_csv(index=False))
I am completely new to openpyxl and python and I am having a hard time with this issue and i need your help.
JAN FEB MAR MAR YTD 2019 YTD
25 9 57 23 7
61 41 29 5 57
54 34 58 10 7
13 13 63 26 45
31 71 40 40 40
24 38 63 63 47
31 50 43 2 61
68 33 13 9 63
28 1 30 39 71
I have an excel report with the data above. I'd like to search cells for those that contain a specific string (i.e., YTD) and get the column number for YTD column. I want to use the column number to extract data for that column. I do not want to use row and cell reference as the excel file gets updated regularly, thus d column will always move.
def t_PM(ff_sheet1,start_row):
wb = openpyxl.load_workbook(filename='report') # open report
report_sheet1 = wb.get_sheet_by_name('sheet 1')
col = -1
for j, keyword in enumerate(report_sheet1.values(0)):
if keyword=='YTD':
col = j
break
ff_sheet1.cell(row=insert_col + start_row, column= header['YTD_OT'], value=report_sheet1.cell(row=i + 7, column=col).value)
But then, I get an " 'generator' object is not callable" error. How can i fix this?
Your problem is that report_sheet1.values is a generator so you can't call it with (0). I'm assuming by your code that you don't want to rely that the "YTD" will appear in the first row so you iterate all cells. Do this by:
def find_YTD():
wb = openpyxl.load_workbook(filename='report') # open report
report_sheet1 = wb.get_sheet_by_name('sheet 1')
for col in report_sheet1.iter_cols(values_only=True):
for value in col:
if isinstance(value, str) and 'YTD' in value:
return col
If you are assuming this data will be in the first row, simply do:
for cell in report_sheet1[1]:
if isinstance(value, str) and 'YTD' in cell.value:
return cell.column
openpyxl uses '1-based' line indexing
Read the docs - access many cells
Hi when I try to print a list, it prints out the directory and not the contents of win.txt. I'm trying to enumerate the txt into a list and split then append it to a, then do other things once get a to print. What am I doing wrong?
import os
win_path = os.path.join(home_dir, 'win.txt')
def roundedStr(num):
return str(int(round(num)))
a=[] # i declares outside the loop for recover later
for i,line in enumerate(win_path):
# files are iterable
if i==0:
t=line.split(' ')
else:
t=line.split(' ')
t[1:6]= map(int,t[1:6])
a.append(t) ## a have all the data
a.pop(0)
print a
prints out directory, like example c:\workspace\win.txt
NOT what I want
I want it to print the contents of win.txt
which takes t[1:6] as integers, like
11 21 31 41 59 21
and prints that out like that same way.
win.txt contains this
05/06/2017 11 21 31 41 59 21 3
05/03/2017 17 18 49 59 66 9 2
04/29/2017 22 23 24 45 62 5 2
04/26/2017 01 15 18 26 51 26 4
04/22/2017 21 39 41 48 63 6 3
04/19/2017 01 19 37 40 52 15 3
04/15/2017 05 22 26 45 61 13 3
04/12/2017 08 14 61 63 68 24 2
04/08/2017 23 36 51 53 60 15 2
04/05/2017 08 20 46 53 54 13 2
I just want [1]-[6]
I think what you want is to open the file 'win.txt', and read its content. Using the open function to create a file object, and a with block to scope it. See my example below. This will read the file, and take the first 6 numbers of each line.
import os
win_path = os.path.join(home_dir, 'win.txt')
a=[] # i declares outside the loop for recover later
with open(win_path, 'r') as file:
for i,line in enumerate(file):
line = line.strip()
print(line)
if i==0:
t=line.split(' ')
else:
t=line.split(' ')
t[1:7]= map(int,t[1:7])
t = t[1:7]
a.append(t) ## a have all the data
a.pop(0)
print (a)
I have a file of which the first column has repeated pattern as belows,
1999.2222 50 100
1999.2222 42 15
1999.2222 24 35
1999.2644 10 25
1999.2644 10 26
1999.3564 65 98
1999.3564 45 685
1999.3564 54 78
1999.3564 78 98
and I want this file into three files as
file1:
1999.2222 50 100
1999.2222 42 15
1999.2222 24 35
file2:
1999.2644 10 25
1999.2644 10 26
file3:
1999.3564 65 98
1999.3564 45 685
1999.3564 54 78
1999.3564 78 98
How could I split like this? Thanks:)
itertools.groupby is probably the most suitable choice for what you're after.
import itertools
with open('file.txt', 'r') as fin:
# group each line in input file by first part of split
for i, (k, g) in enumerate(itertools.groupby(fin, lambda l: l.split()[0]), 1):
# create file to write to suffixed with group number - start = 1
with open('file{0}.txt'.format(i), 'w') as fout:
# for each line in group write it to file
for line in g:
fout.write(line.strip() + '\n')