How to write one column from a table to a new file in Python - python

I want to copy one column from a file and write it to a new file.
I have a file:
1 12 13 14
2 22 23 24
3 32 33 34
4 42 43 44
5 52 53 54
6 62 63 64
I need to copy the 4th column to a new file.
In the code below, I open the file and delete the first two lines from it. After that come my attempts to create a file with one column.
f=open("1234.txt").readlines()
for i in [0,0,-1]:
    f.pop(i)
with open("1234.txt",'w') as F:
    F.writelines(f)
ff = open("1234.txt", 'r')
df1 = ff.iloc[:,3:3]
print(df1)
with open('12345.txt', 'w') as F:
    df.writelines('12345.txt')
I'm not sure whether I need to import something for iloc; maybe pandas? Should I close the files in the code, and if so, when?
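One possible way to approach this, sketched in plain Python without pandas. Note that .iloc is an attribute of pandas DataFrames, so it is not available on a file object, and the slice 3:3 would select nothing even on a DataFrame. The helper names extract_column and copy_column are made up for illustration, and the sketch assumes the columns are separated by whitespace. Opening files in a with block closes them automatically when the block ends, so no explicit close() is needed.

```python
def extract_column(lines, col_index, skip_rows=0):
    """Return the col_index-th whitespace-separated field of each line.

    col_index is 0-based, so the 4th column is col_index=3.
    """
    values = []
    for line in lines[skip_rows:]:
        fields = line.split()
        if len(fields) > col_index:  # ignore blank or too-short lines
            values.append(fields[col_index])
    return values


def copy_column(src_path, dst_path, col_index, skip_rows=0):
    """Write one column of src_path to dst_path, one value per line."""
    # Each "with" block closes its file as soon as the block ends.
    with open(src_path) as src:
        lines = src.readlines()
    column = extract_column(lines, col_index, skip_rows)
    with open(dst_path, "w") as dst:
        dst.write("\n".join(column) + "\n")


# Small in-memory check using the first rows from the question.
sample = ["1 12 13 14\n", "2 22 23 24\n", "3 32 33 34\n"]
print(extract_column(sample, 3))  # ['14', '24', '34']
```

With the file names from the question, a call such as copy_column("1234.txt", "12345.txt", 3, skip_rows=2) would skip the first two lines and leave the input file unmodified.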

Related

How to obtain the first 4 rows for every 20 rows from a CSV file

I've read the CSV file using pandas and have managed to print the 1st, 2nd, 3rd and 4th rows of every 20 rows using .iloc.
Prem_results = pd.read_csv("../data sets analysis/prem/result.csv")
Prem_results.iloc[:320:20,:]
Prem_results.iloc[1:320:20,:]
Prem_results.iloc[2:320:20,:]
Prem_results.iloc[3:320:20,:]
Is there a way to use iloc to print the first 4 rows of every 20 rows together, rather than separately as I do now? Apologies if this is worded badly; I'm fairly new to both Python and pandas.
Using groupby.head:
Prem_results.groupby(np.arange(len(Prem_results)) // 20).head(4)
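A small self-contained check of the groupby.head idea, using a made-up 100-row frame in place of Prem_results. Integer division of the row position by 20 gives every row a block number, and head(4) then keeps the first four rows of each block.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Prem_results: 100 rows numbered 0..99.
df = pd.DataFrame({"col1": np.arange(100)})

# Rows 0-19 get block 0, rows 20-39 get block 1, and so on.
block_ids = np.arange(len(df)) // 20

# head(4) keeps the first four rows of each block, in original order.
first_four = df.groupby(block_ids).head(4)
print(first_four.index.tolist())
```

The printed index should contain 0 to 3, 20 to 23, 40 to 43, 60 to 63 and 80 to 83.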
You can concat slices together like this:
pd.concat([df[i::20] for i in range(4)]).sort_index()
MCVE:
df = pd.DataFrame({'col1':np.arange(1000)})
pd.concat([df[i::20] for i in range(4)]).sort_index().head(20)
Output:
col1
0 0
1 1
2 2
3 3
20 20
21 21
22 22
23 23
40 40
41 41
42 42
43 43
60 60
61 61
62 62
63 63
80 80
81 81
82 82
83 83
Start at 0 get every 20 rows
Start at 1 get every 20 rows
Start at 2 get every 20 rows
And, start at 3 get every 20 rows.
You can also do this while reading the csv itself.
df = pd.DataFrame()
for chunk in pd.read_csv(file_name, chunksize = 20):
    df = pd.concat((df, chunk.head(4)))
More resources:
You can read more about the usage of chunksize in the official pandas documentation.
I also have a post about its usage here.
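The chunked-reading approach can be tried without a real file. The sketch below assumes a made-up single-column CSV held in an io.StringIO buffer, and it collects the chunk heads in a list and concatenates once at the end, which is a variation on the loop above rather than the answerer's exact code.

```python
import io

import pandas as pd

# Hypothetical CSV content: a header line followed by the numbers 0..99.
csv_text = "col1\n" + "\n".join(str(i) for i in range(100)) + "\n"

# Read 20 rows at a time and keep only the first 4 rows of each chunk.
heads = [
    chunk.head(4)
    for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=20)
]

# One concat at the end avoids growing a DataFrame inside the loop.
result = pd.concat(heads)
print(result["col1"].tolist())
```

Only one 20-row chunk is held in memory at a time, which may matter for large files.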

Pandas read google sheet data types wrong

I read some Google Sheet data through GSpread and Pandas; however, Pandas gives my dtype as object and I cannot change it.
I'm sure that my Google Sheet values are numbers, apart from the headers, which are strings. Matplotlib will not allow me to plot a graph, as it throws a type error.
The issue is solved if I download the file as CSV but I would like to read the file directly from the google sheet.
Here is the code:
main_worksheet=sh.worksheet('Sheet3')
data = main_worksheet.get_all_values()
headers = data.pop(0)
df = pd.DataFrame(data, columns=headers, dtype='int64')
df['week'].astype(str).astype(int)
print(df['week'])
And the result:
0 28
1 29
2 30
3 31
4 32
5 33
6 34
7 35
8 36
9 37
10 38
11 39
12 40
13 41
14 42
15 43
16 44
17 45
18 46
19 47
20 48
Name: week, dtype: object
Having the same problem.
Apparently, when reading a Google Sheet with pandas, a single column cannot hold different data types.
If pandas finds different data types in the same column, it converts everything into a string/object. Strangely, though, when you check the specific values that were strings, they now default to NaN.
To solve that problem, convert the entire column to the same "Plain text" format in Google Sheets. Then, if you need to restore the original data type, you can do so with the apply method once you have read the dataframe.
[Images not preserved: "Problem" sheet and the result of reading it with pandas; "SOLUTION" sheet and the result of reading it with pandas]
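A possible way to do the conversion after reading, sketched with made-up data in place of the worksheet. It assumes, as the question suggests, that get_all_values() hands back every cell as a string. Note also that in the question the result of df['week'].astype(str).astype(int) is never assigned back, so the column stays as it was. The "sales" column and its "n/a" cell are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for main_worksheet.get_all_values():
# a list of rows in which every cell is a string.
data = [
    ["week", "sales"],
    ["28", "100"],
    ["29", "n/a"],
    ["30", "120"],
]
headers = data[0]
rows = data[1:]
df = pd.DataFrame(rows, columns=headers)

# Assign the converted column back; the conversion is not done in place.
df["week"] = pd.to_numeric(df["week"])

# errors="coerce" turns cells that are not numbers into NaN
# instead of raising an error.
df["sales"] = pd.to_numeric(df["sales"], errors="coerce")

print(df.dtypes)
```

After this, both columns are numeric and can be passed to plotting code; the "n/a" cell becomes a missing value.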

Column index number by cell value in python

I am new to Python and am having a hard time with this issue, so I need your help.
Q1 Q2 Q3 Q4 Q5
25 9 57 23 7
61 41 29 5 57
54 34 58 10 7
13 13 63 26 45
31 71 40 40 40
24 38 63 63 47
31 50 43 2 61
68 33 13 9 63
28 1 30 39 71
I have an Excel report with the data above. I'd like to write code that looks through all columns in the first row and outputs the index number of the column whose header contains a given string (e.g., Q4, which is index 3). I want to use the index number to extract data for that column. I do not want to use row and cell references because the Excel file gets updated regularly, so the column will always move.
def find_idx():
    wb = xlrd.open_workbook(filename='data.xlsx') # open report
    report_sheet1 = wb.sheet_by_name('Sheet 1')
    for j in range(report_sheet1.ncols):
        j=report_sheet1.cell_value(0, j)
        if 'YTD' in j:
            break
    return j.index('Q4')
find_idx()
Then I get a "substring not found" error.
What I want is to return the column index number (i.e., 3), so that I can call it easily in other code. How can I fix this?
Hass!
As far as I understand, you want to get the index of a column of an Excel file whose name contains a given substring such as Y. Is that right?
If so, here's a working snippet that does not require pandas:
import xlrd

def find_idx(excel_filename, sheet_name, col_name_lookup):
    """
    Returns the column index of the first column whose
    name contains the string col_name_lookup. If
    col_name_lookup is not found, it returns -1.
    """
    wb = xlrd.open_workbook(filename=excel_filename)
    report_sheet1 = wb.sheet_by_name(sheet_name)
    for col_ix in range(report_sheet1.ncols):
        col_name = report_sheet1.cell_value(0, col_ix)
        if col_name_lookup in col_name:
            return col_ix
    return -1

if __name__ == "__main__":
    excel_filename = "./data.xlsx"
    sheet_name = "Sheet 1"
    col_name_lookup = "S"
    print(find_idx(excel_filename, sheet_name, col_name_lookup))
I tried to give more meaningful names to your variables: I split your variable j into two variables, col_ix (the actual column index in the loop) and col_name (the column name).
This code assumes that the first row of your Excel file contains the column names. If the substring you are looking for is not found in any of these names, it returns -1.
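The same lookup logic can be tried without an Excel file. The sketch below uses a plain list in place of the sheet's first row; find_col_index is a hypothetical helper name. It also reproduces the error from the question: there the loop variable j was overwritten with each header string, no header contained 'YTD', so the loop ended with j holding the last header, and calling .index('Q4') on that string raised the error.

```python
def find_col_index(headers, lookup):
    """Return the 0-based index of the first header containing lookup, or -1."""
    for col_ix, name in enumerate(headers):
        # Header cells may be numbers or empty, so check the type first.
        if isinstance(name, str) and lookup in name:
            return col_ix
    return -1


headers = ["Q1", "Q2", "Q3", "Q4", "Q5"]
print(find_col_index(headers, "Q4"))   # 3
print(find_col_index(headers, "YTD"))  # -1

# What the original code ended up doing: str.index on the last header.
try:
    "Q5".index("Q4")
except ValueError as exc:
    print(exc)  # substring not found
```

Keeping the index and the header text in separate variables, as enumerate does here, avoids that overwrite.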

how to get column number by cell value in python using openpyxl

I am completely new to openpyxl and Python and am having a hard time with this issue, so I need your help.
JAN FEB MAR MAR YTD 2019 YTD
25 9 57 23 7
61 41 29 5 57
54 34 58 10 7
13 13 63 26 45
31 71 40 40 40
24 38 63 63 47
31 50 43 2 61
68 33 13 9 63
28 1 30 39 71
I have an Excel report with the data above. I'd like to search for cells that contain a specific string (i.e., YTD) and get the column number of the YTD column. I want to use the column number to extract data for that column. I do not want to use row and cell references because the Excel file gets updated regularly, so the column will always move.
def t_PM(ff_sheet1,start_row):
    wb = openpyxl.load_workbook(filename='report') # open report
    report_sheet1 = wb.get_sheet_by_name('sheet 1')
    col = -1
    for j, keyword in enumerate(report_sheet1.values(0)):
        if keyword=='YTD':
            col = j
            break
    ff_sheet1.cell(row=insert_col + start_row, column= header['YTD_OT'], value=report_sheet1.cell(row=i + 7, column=col).value)
But then I get a "'generator' object is not callable" error. How can I fix this?
Your problem is that report_sheet1.values is a generator, so you can't call it with (0). I'm assuming from your code that you don't want to rely on "YTD" appearing in the first row, so you iterate over all cells. Do this with:
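def find_YTD():
    wb = openpyxl.load_workbook(filename='report') # open report
    report_sheet1 = wb['sheet 1']
    # iter_cols yields one tuple of values per column; count from 1
    # because openpyxl column numbers are 1-based
    for col_idx, col in enumerate(report_sheet1.iter_cols(values_only=True), start=1):
        for value in col:
            if isinstance(value, str) and 'YTD' in value:
                return col_idx  # the column number, not the column's values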
If you are assuming this data will be in the first row, simply do:
for cell in report_sheet1[1]:
    if isinstance(cell.value, str) and 'YTD' in cell.value:
        return cell.column
openpyxl uses '1-based' line indexing
Read the docs - access many cells

How to write values to a csv file from another csv file

For index.csv file, its fourth column has ten numbers ranging from 1-5. Each number can be regarded as an index, and each index corresponds with an array of numbers in filename.csv.
The row number of filename.csv represents the index, and each row has three numbers. My question is about using a nesting loop to transfer the numbers in filename.csv to index.csv.
from numpy import genfromtxt
import numpy as np
import csv
data1 = genfromtxt('filename.csv', delimiter=',')
data2 = genfromtxt('index.csv', delimiter=',')
f = open('index.csv','wb')
write = csv.writer(f, delimiter=',',quoting=csv.QUOTE_ALL)
for row in data2:
for ch_row in data1:
if ( data2[row,3] == ch_row ):
write.writerow(data1[data2[row,3],:])
For example, the fourth column of index.csv contains 1,2,5,3,4,1,4,5,2,3 and filename.csv contains:
# filename.csv
20 30 50
70 60 45
35 26 77
93 37 68
13 08 55
What I need is to write the indexed row from filename.csv to index.csv and store these numbers in the 5th, 6th and 7th columns:
# index.csv
# 4 5 6 7
... 1 20 30 50
... 2 70 60 45
... 5 13 08 55
... 3 35 26 77
... 4 93 37 68
... 1 20 30 50
... 4 93 37 68
... 5 13 08 55
... 2 70 60 45
... 3 35 26 77
Can anyone help me solve this problem?
You need to indent your last 2 lines. Also, it looks like you are writing to the file from which you are reading.
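Beyond the indentation, one way to do the lookup itself is sketched below with the standard csv module. It assumes the index in the fourth column is 1-based (index 1 selects the first row of filename.csv, as in the expected output above), uses in-memory text in place of the two files, and invents the first three columns of index.csv since the question elides them. The helper name append_lookup_rows is made up. The result is written to a separate output rather than back to the file being read.

```python
import csv
import io


def append_lookup_rows(index_rows, lookup_rows, key_col=3):
    """Append to each index row the lookup row selected by its 1-based key."""
    merged = []
    for row in index_rows:
        key = int(row[key_col])  # 1-based row number in the lookup table
        merged.append(list(row) + list(lookup_rows[key - 1]))
    return merged


# In-memory stand-ins for filename.csv and index.csv.
# The "a,b,c" leading columns are placeholders, not data from the question.
lookup_csv = "20,30,50\n70,60,45\n35,26,77\n93,37,68\n13,08,55\n"
index_csv = "a,b,c,1\na,b,c,2\na,b,c,5\n"

lookup_rows = list(csv.reader(io.StringIO(lookup_csv)))
index_rows = list(csv.reader(io.StringIO(index_csv)))
merged = append_lookup_rows(index_rows, lookup_rows)

# Stand-in for a new output file, distinct from the input files.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(merged)
print(out.getvalue())
```

With real files in Python 3, each would be opened in text mode with newline='' (for example open('merged.csv', 'w', newline='')) rather than 'wb'; merged.csv is a hypothetical output name.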
