I have three Excel files, Book1, Book2 and Book3. Each consists of 11000 rows and 10000 columns, and each cell contains a numeric value of an observation. I also have a 3-tuple, (100, 150, 150). I want to compare each cell of Book1 with the first element (100), each cell of Book2 with the second (150), and each cell of Book3 with the third (150). Whenever the corresponding cells of all three files match the tuple, I want to print 1, otherwise 0. For example, if cell (10,200) contains 100 in Book1, 150 in Book2 and 150 in Book3, I want to print 1; otherwise 0.
So this is the program I wrote for this:
import xlrd
file_loc1 = "D:\Python\Book1.xlsx"
file_loc2 = "D:\Python\Book2.xlsx"
file_loc3 = "D:\Python\Book3.xlsx"
workbook1 = xlrd.open_workbook(file_loc1)
workbook2 = xlrd.open_workbook(file_loc2)
workbook3 = xlrd.open_workbook(file_loc3)
sheet1 = workbook1.sheet_by_index(0)
sheet2 = workbook2.sheet_by_index(0)
sheet3 = workbook3.sheet_by_index(0)
for i in range(1,11000):
    for j in range(0,10000):
        if sheet1.cell_value(i,j) == 100 and sheet2.cell_value(i,j) == 150 and sheet3.cell_value(i,j) == 150:
            print 1
        else:
            print 0
Firstly, as I am new to Python, I want to make sure this program is correct. Is there any issue with it? The loop ranges are the ones I need.
Secondly, I ran this program on my system and after around 10 hours it is still running. I am using 64-bit Python 2.7.13 on a 64-bit Windows 8.1 system, and I run it from Windows PowerShell. I used the command python script1.py > output1.txt because I also want the output in a text file. A file named output1 was created in my Python directory, but its size has been 0 bytes since the program started, so I am not even sure I am getting a proper file. What should I do here? Is there a more efficient way to get this output? Also, how long am I supposed to wait for this program/loop to finish?
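One way to cut down the work in the loop is to compare whole rows at a time and write one line of 0s and 1s per row, instead of printing once per cell. What follows is only a minimal sketch under assumptions: the function names match_rows and write_matches are made up for illustration, and the rows are assumed to be plain sequences of cell values (with xlrd they could come from sheet.row_values(i)). Note that the output layout differs from the original program, which prints one value per line. Loading three workbooks of this size is likely to take a long time by itself, and output redirected to a file may be buffered, which could explain a file that stays at 0 bytes for a while.

```python
def match_rows(rows1, rows2, rows3, targets=(100, 150, 150)):
    """Yield one string of '1'/'0' characters per row.

    A cell is '1' only when the three corresponding cells equal the
    three target values, in that order.
    """
    t1, t2, t3 = targets
    for r1, r2, r3 in zip(rows1, rows2, rows3):
        yield "".join(
            "1" if (a == t1 and b == t2 and c == t3) else "0"
            for a, b, c in zip(r1, r2, r3)
        )


def write_matches(lines, out_file):
    """Write each row string on its own line of an already open file."""
    for line in lines:
        out_file.write(line + "\n")
```

With xlrd, the three row sources could be generators such as (sheet1.row_values(i) for i in range(1, sheet1.nrows)), passed to match_rows and then to write_matches with a file opened for writing.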
Related
I'm working on a data science project to make some prediction, and I need to calculate a new column based on other column values.
All is working fine, except that my Jupyter-lab is printing me blank lines in my output and I don't know why.
Here's the code :
# Compute A:
pas = 1500
TailleTotal = len(df)
limite = TailleTotal - pas
df['A'] = np.empty(TailleTotal, dtype=float)
index = 0
while index < limite:
    A_temp = 0
    A_temp = np.sqrt((df['X'][index]**2)+(df['Y'][index]**2)+(df['Z'][index]**2))
    df['A'][index] = A_temp
    index = index + 1
When I run it, I get a blank line for every iteration. My file has more than 1M lines, so I have to scroll a long way through my notebook, which is very annoying.
Mostly, though, I really don't understand why it does this: I have no print call or anything else that is supposed to show output. So why does Python show me empty lines? Is it because I have no "return" in my loop?
Edit: It appears to be an "output memory" problem in Jupyter-lab. Right-clicking and choosing "clear all output" resolves my issue.
You have an infinite loop. First change the while condition so that it runs only 1 or 2 iterations and check whether the problem is still there. I'm pretty sure it disappears. The kernel is probably spending its resources on your loop.
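For what it's worth, a loop like the one in the question can usually be replaced by a single column-wise expression, which also avoids a million separate assignments. A minimal sketch, assuming df has numeric X, Y and Z columns; the function name add_magnitude is made up for illustration, and rows past the limit are left as NaN here rather than as uninitialised values:

```python
import numpy as np
import pandas as pd


def add_magnitude(df, pas=1500):
    """Return a copy of df with column A = sqrt(X^2 + Y^2 + Z^2).

    Only the first len(df) - pas rows are filled, as in the original
    loop; the remaining rows are NaN.
    """
    limite = len(df) - pas
    out = df.copy()
    norm = np.sqrt(out["X"] ** 2 + out["Y"] ** 2 + out["Z"] ** 2)
    # Keep the computed value where the row position is below the limit.
    out["A"] = norm.where(np.arange(len(out)) < limite)
    return out
```

Calling add_magnitude(df) returns a new frame, so the original df is left untouched.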
I am trying to create a loop involving Pandas/ Python and an Excel file. The column in question is named "ITERATION" and it has numbers ranging from 1 to 6. I'm trying to query the number of hits in the Excel file in the following iteration ranges:
1 to 2
3
4 to 6
I've already made a preset data frame named "df".
iteration_list = ["1,2", "3", "4,5,6"]
i = 1
for k in iteration_list:
    table = df.query('STATUS == ["Sold", "Refunded"]')
    table["ITERATION"] = table["ITERATION"].apply(str)
    table = table.query('ITERATION == ["%s"]' % k)
    table = pd.pivot_table(table, columns=["Month"], values=["ID"], aggfunc=len)
    table.to_excel(writer, startrow = i)
    i = i + 3
The snippet above works only for the number "3". The other 2 scenarios don't seem to work as it literally searches for the string "1,2". I've tried other ways such as:
iteration_list = [1:2, 3, 4:6]
iteration_list = [{1:2}, 3, {4:6}]
to no avail.
Does anyone have any suggestions?
EDIT
After looking over Stidgeon's answer, I seemed to come up with the following alternatives. Stidgeon's answer DOES provide an output but not the one I'm looking for (it gives 6 outputs - from iteration 1 to 6 in each loop).
Above, my list was the following:
iteration_list = ["1,2", "3", "4,5,6"]
If you play around with the quotation marks, you can input exactly what you want, since each string is inserted literally into this line where %s is:
table = table.query('ITERATION == ["%s"]' % k)
You can adjust the quotation marks in the list to fit your precise needs. Here is a solution that could work:
iteration_list = ['1", "2', 3, '4", "5", "6']
Just focusing on getting the values out of the list of strings, this works for me (though - as always - there may be more Pythonic approaches):
lst = ['1,2','3','4,5,6']
for item in lst:
    items = item.split(',')
    for _ in items:
        print int(_)
Though instead of printing at the end, you can pass the value to your script.
This will work if all your strings are either single numbers or numbers separated by commas. If the data are not consistently formatted like that, you may have to tweak this code.
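Another option that avoids string formatting altogether is to keep each range as a list of numbers and filter with isin. This is only a sketch under assumptions: it assumes ITERATION holds integers and that the STATUS and ITERATION columns exist as in the question; the function name hits_per_group is made up for illustration.

```python
import pandas as pd

# Each inner list is one iteration range: 1 to 2, 3, and 4 to 6.
iteration_groups = [[1, 2], [3], [4, 5, 6]]


def hits_per_group(df, groups):
    """Return one filtered DataFrame per group of ITERATION values."""
    sold = df[df["STATUS"].isin(["Sold", "Refunded"])]
    return [sold[sold["ITERATION"].isin(group)] for group in groups]
```

Each returned frame can then be passed to pd.pivot_table and written with to_excel, as in the loop from the question.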
Solved: A friend of mine helped me add in code that takes the csv files which get outputted and combines them into a single new file. I will add the code in after the weekend in case anyone else with a similar issue wants to see it in the future!
Let me start by sharing my existing, working code. This code takes some raw data from a csv file and generates new csv files from it. The data consists of two columns, one representing voltage and one representing current. If the voltage value is not changing, the current values are sent to a new csv file whose name reflects the constant voltage. Once a new stable voltage is reached, another csv is made for that voltage and so on. Here it is:
import csv

for x in range(1,6):
    input=open('signalnoise(%d).csv' %x,"r") # Opens raw data as readable
    v1 = 0
    first = True
    for row in csv.reader(input,delimiter='\t'):
        v2 = row[0]
        if v1==v2:
            voltage=float(v1)*1000
            if first:
                print("Writing spectra file for " +str(voltage) + "mV")
                first = False
            output=open('noisespectra(%d)' %x +str(voltage)+'mV.csv',"a")
            current = [row[1]]
            writer=csv.writer(output,delimiter='\t')
            writer.writerow(current)
        else:
            v1 = row[0]
            first = True
One note, for some reason the print command doesn't seem to go off until the entire script is done running but it prints the correct thing. This could just be my computer hanging while the script runs.
I would like to change this so that instead of having a bunch of files, I just have one output file with multiple columns. Each column would have its first entry be the voltage value followed by all the currents recorded for that voltage. Here is my idea so far but I'm stuck:
for x in range(1,6):
    input=open('signalnoise(%d).csv' %x,"r") # Opens raw data as readable
    v1 = 0
    first = True
    for row in csv.reader(input,delimiter='\t'):
        v2 = row[0]
        if v1==v2:
            voltage=float(v1)*1000
            if first:
                column = ['voltage']
                print("Writing spectra file for " +str(voltage) + "mV")
                first = False
            column=column+[row[1]] # Adds the current onto the column
            saved = True # Means that a column is being saved
        elif saved: # This is executed if there is a column waiting to be appended and the voltage has changed
            #I get stuck here...
At this point I think I need to somehow use item.append() like the example here but I'm not entirely sure how to implement it. Then I would set saved = False and v1 = row[0] and have the same else statement as the original working code so that on the next iteration things would proceed as desired.
Here is some simple sample data to work with (although mine is actually tab delimited):
.1, 1
.2, 2
.2, 2
.2, 2.1
.2, 2
.3, 3
.4, 4
.5, 5.1
.5, 5.2
.5, 5
.5, 5.1
My working code would take this and give me two files named 'noisespectra(#)200.0mV.csv' and 'noisespectra(#)500.0mV.csv' which are single columns '2,2,2.1,2' and '5.1,5.2,5,5.1' respectively. I would like code which makes a single file named 'noisespectra(#).csv' which is two columns, '200.0mV,2,2,2.1,2' and '500.0mV,5.1,5.2,5,5.1'. In general, a particular voltage will not have the same number of currents and I think this could be a potential problem in using the item.append() technique, particularly if the first voltage has fewer corresponding currents than future voltages.
Feel free to disregard the 'for x in range()'; I am looping through files with similar names but that's not important for my problem.
I greatly appreciate any help anyone can give me! If there are any questions, I will try to address them as quickly as I can.
Keep track of the two sets of values in two lists, then combine them (in Python 2, map with None as the function pads the shorter list with None):
combined = map(None, list_1, list_2)
Then write the combined list out to CSV.
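The same idea extends to any number of columns of unequal length. The sketch below is an assumption-laden illustration, not the asker's final code: it uses itertools.groupby to collect runs of identical voltages and itertools.zip_longest (izip_longest in Python 2) to pad shorter columns with empty cells. The function names are made up, and unlike the original script it keeps every reading in a run, including the first one, which matches the desired output described in the question.

```python
import csv
from itertools import groupby, zip_longest


def columns_by_voltage(rows):
    """Turn (voltage, current) rows into one column per stable voltage.

    A voltage counts as stable when it appears on at least two
    consecutive rows. Each column starts with a header such as
    '500.0mV', followed by the currents recorded at that voltage.
    """
    columns = []
    for voltage, group in groupby(rows, key=lambda r: r[0]):
        currents = [r[1] for r in group]
        if len(currents) > 1:
            columns.append(["%smV" % (float(voltage) * 1000)] + currents)
    return columns


def write_columns(columns, out_file):
    """Write the columns side by side, padding short ones with ''."""
    writer = csv.writer(out_file, delimiter="\t")
    for row in zip_longest(*columns, fillvalue=""):
        writer.writerow(row)
```

The rows can come straight from csv.reader(input, delimiter='\t'), and out_file would be a single file such as 'noisespectra(%d).csv' % x opened for writing.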
I have a xls spreadsheet that looks like below
Number  Code      Unit
1       Widget 1  20.0000
2       Widget 2  4.6000
3       Widget 3  2.6000
4       Widget 4  1.4500
I have created the following code:
import xlrd
xlsname = 'pytest.xls'
book = xlrd.open_workbook(xlsname)
# Map each sheet name to its sheet object
sd={}
for s in book.sheets():
    sd[s.name] = s
sheet=sd["Prod"]
# Read each column as a list (the header cell is the first entry)
Number = sheet.col_values(0)
Code = sheet.col_values(1)
Unit = sheet.col_values(2)
This is where I am getting stuck. I need to ask the user which Number they choose and then print the corresponding Unit. For example, if they choose 3 it should print 2.6000, and if they choose 4 it should print 1.4500. The document is tens of thousands of rows long, so manually entering the data into Python is not viable.
In this case you'd just need to do this:
Unit[Number.index(value)]
This returns the value from the Unit column that corresponds to the value specified for the Number column.
The index() method on a Python sequence returns the index of the first occurrence of the provided value in the sequence. That index is then used to find the corresponding entry in Unit.
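If many lookups are needed, building a dictionary once may be more convenient than calling index() each time. A small sketch, assuming the first entry of each column list is the header and that the typed answer arrives as text; the function names build_lookup and unit_for are made up for illustration:

```python
def build_lookup(numbers, units):
    """Map each value in the Number column to its Unit, skipping the header."""
    return dict(zip(numbers[1:], units[1:]))


def unit_for(lookup, answer):
    """Convert typed text such as '3' to a number and look it up.

    Returns None when the number is not in the sheet.
    """
    return lookup.get(float(answer))
```

With the lists from the question this would be used as lookup = build_lookup(Number, Unit), followed by unit_for(lookup, answer) for each answer the user types.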
I am trying to create multiple tables in a new Microsoft Word document using Python. I can create the first table okay, but I think I have the COM Range object configured wrong: it is not pointing to the end of the document. The first table is put before "Hello, I am a text!", and the second table is put inside the first table's first cell. I thought that worddoc.Range() would return the full range, which I then collapse using the wdCollapseStart enum value, which I think is 1 (I can't find the constants in Python win32com). So adding a table at that Range should add it to the end of the document, but that is not happening.
Any ideas?
Thanks Tim
import win32com.client
wordapp = win32com.client.Dispatch("Word.Application")
wordapp.Visible = 1
worddoc = wordapp.Documents.Add()
worddoc.PageSetup.Orientation = 1
worddoc.PageSetup.BookFoldPrinting = 1
worddoc.Content.Font.Size = 11
worddoc.Content.Paragraphs.TabStops.Add (100)
worddoc.Content.Text = "Hello, I am a text!"
location = worddoc.Range()
location.Collapse(1)
location.Paragraphs.Add()
location.Collapse(1)
table = location.Tables.Add (location, 3, 4)
table.ApplyStyleHeadingRows = 1
table.AutoFormat(16)
table.Cell(1,1).Range.InsertAfter("Teacher")
location1 = worddoc.Range()
location1.Paragraphs.Add()
location1.Collapse(1)
table = location1.Tables.Add (location1, 3, 4)
table.ApplyStyleHeadingRows = 1
table.AutoFormat(16)
table.Cell(1,1).Range.InsertAfter("Teacher1")
worddoc.Content.MoveEnd
worddoc.Close() # Close the Word Document (a save-Dialog pops up)
wordapp.Quit() # Close the Word Application
The problem seems to be in the Range object, which represents a part of the document. In my original code the Range object contains the first cell and starts at the first cell, so that is where the table gets inserted. Instead I want to insert at the end of the range. I got the following replacement code to work: I moved the Collapse call after the Add() call and gave it an argument of 0 (wdCollapseEnd). Now there is only one Collapse call per Range object.
location = worddoc.Range()
location.Paragraphs.Add()
location.Collapse(0)
Now the code works, I can read from a database and populate new tables from each entry.
Tim