Vlookup XLRD Python

I have an xls spreadsheet that looks like this:
Number Code Unit
1 Widget 1 20.0000
2 Widget 2 4.6000
3 Widget 3 2.6000
4 Widget 4 1.4500
I have created the following code:
import xlrd

xlsname = 'pytest.xls'
book = xlrd.open_workbook(xlsname)

# Map each sheet name to its sheet object
sd = {}
for s in book.sheets():
    sd[s.name] = s

sheet = sd["Prod"]
# Read each column as a list (the first entry is the header cell)
Number = sheet.col_values(0)
Code = sheet.col_values(1)
Unit = sheet.col_values(2)
This is where I am stuck. I need to ask the user which Number they want and then print the corresponding Unit. For example, if they choose 3 it should print 2.6000, and if they choose 4 it should print 1.4500. The document is 10k+ rows long, so manually entering the data into Python is not viable.

In this case you'd just need to do this:
Unit[Number.index(value)]
Which will return the value from the Unit column that corresponds to the value specified for the Number column.
The index() method on a Python sequence returns the index of the first occurrence of the provided value in the sequence. That index is then used to find the corresponding entry in Unit.
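As a minimal sketch of that lookup, the snippet below uses stand-in lists in place of the columns read from the sheet (header row left out). The names `lookup_unit` and `unit_by_number` are hypothetical, not part of xlrd. It also shows a dictionary built once, which may be more convenient when many lookups are made against a long sheet.

```python
# Stand-in data shaped like sheet.col_values(...) with the header row removed.
# xlrd reports numeric cells as floats, hence 1.0 rather than 1.
numbers = [1.0, 2.0, 3.0, 4.0]
units = [20.0, 4.6, 2.6, 1.45]


def lookup_unit(value, numbers, units):
    """Return the unit for the first row whose number equals value, else None."""
    try:
        return units[numbers.index(value)]
    except ValueError:
        # index() raises ValueError when the value is not present
        return None


# Alternative: build a dict once so each later lookup avoids a list scan.
# Note: if a number appears twice, the dict keeps the last row, index() the first.
unit_by_number = dict(zip(numbers, units))

print(lookup_unit(3, numbers, units))
print(unit_by_number.get(4))
```

Because `3 == 3.0` in Python, an integer typed by the user still matches the float stored in the column.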

Related

Masking the Zip Codes

I'm taking a course and I need to solve the following assignment:
"In this part, you should write a for loop, updating the df_users dataframe.
Go through each user, and update their zip code, to Safe Harbor specifications:
If the user is from a zip code for the which the “Geographic Subdivision” is less than equal to 20,000, change the zip code in df_users to ‘0’ (as a string)
Otherwise, zip should be only the first 3 numbers of the full zip code
Do all this by directly updating the zip column of the df_users DataFrame
Hints:
This will be several lines of code, looping through the DataFrame, getting each zip code, checking the geographic subdivision with the population in zip_dict, and setting the zip_code accordingly.
Be very aware of your variable types when working with zip codes here."
Here you can find all the data necessary to understand the context:
https://raw.githubusercontent.com/DataScienceInPractice/Data/master/
assignment: 'A4'
data_files: user_dat.csv, zip_pop.csv
After cleaning the data from user_dat.csv leaving only the columns: 'age', 'zip' and 'gender', and creating a dictionary from zip_pop.csv that contains the population of the first 3 digits from all the zipcodes; I wrote this code:
# Loop through the DataFrame to get each zip code
for zipcode in df_users['zip']:
    # Check whether the zip code's first 3 digits correspond to a population of at most 20,000 people
    if zip_dict[zipcode[:len(zipcode) - 2]] <= 20000:
        # If so, change the zip code value to the string '0'.
        df_users.loc[df_users['zip'] == zipcode, 'zip'] = '0'
    else:
        # Otherwise, keep only the first 3 digits of the zip code.
        df_users.loc[df_users['zip'] == zipcode, 'zip'] = zipcode[:len(zipcode) - 2]
This code only half works and I don't understand why.
It changes the zip code to '0' if the population is at most 20,000 people, and it also shortens the first zip codes (up to the ones that start with '078'), but then it raises this error:
KeyError                                  Traceback (most recent call last)
/var/folders/95/4vh4zhc1273fgmfs4wyntxn00000gn/T/ipykernel_44758/1429192050.py in <module>
      1 for zipcode in df_users['zip']:
----> 2     if zip_dict[zipcode[:len(zipcode) - 2]] <= 20000:
      3         df_users.loc[df_users['zip'] == zipcode, 'zip'] = '0'
      4     else:
      5         df_users.loc[df_users['zip'] == zipcode, 'zip'] = str(zipcode[:len(zipcode) - 2])
KeyError: '0'
I gather that the problem is in the last line of code, because I added the lines one at a time and each of them worked until I added that last one. If I just print the zip codes instead of running that last line, it also works!
Can anyone help me understand why my code is wrong?

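You're modifying a collection of values (i.e. df_users['zip']) whilst you're iterating over it. This is a common anti-pattern. Here it most likely means the loop later meets values it has already shortened: a zip code that was cut down to '078' gets sliced again to '0', which is not a key in zip_dict, hence the KeyError: '0'. If a loop is absolutely required, then you could consider iterating over df_users['zip'].unique() instead. That creates a copy of all the unique zip codes, solving your current error, and it means that you aren't redoing work when you encounter a duplicate zip code.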
If a loop is not required, then there are better (more pandas style) ways to go about your problem. I would suggest something like (untested):
zip_start = df_users['zip'].str[:-2]
df_users['zip'] = zip_start.where(zip_start.map(zip_dict) > 20000, other="0")
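To illustrate the "iterate over a snapshot of the unique values" idea without depending on pandas, here is a small sketch on a plain list. The function name `mask_zip_codes`, the sample zip codes and the populations are all made up for illustration. It also assumes 5-digit zip code strings, and it treats a prefix missing from the dictionary as population 0.

```python
def mask_zip_codes(zip_codes, zip_dict, threshold=20000):
    """Return a new list of masked zip codes; the input list is not modified."""
    masked_by_zip = {}
    # set() takes a snapshot of the unique values, so nothing we write
    # later can be fed back into this loop.
    for zipcode in set(zip_codes):
        prefix = zipcode[:3]
        # Assumption: a prefix missing from zip_dict counts as population 0.
        population = zip_dict.get(prefix, 0)
        masked_by_zip[zipcode] = prefix if population > threshold else '0'
    # Apply the finished mapping in a separate pass.
    return [masked_by_zip[z] for z in zip_codes]


# Made-up sample data
zips = ['07801', '07802', '92093', '03601']
populations = {'078': 50000, '920': 300000, '036': 15000}
print(mask_zip_codes(zips, populations))
```

With a DataFrame, a mapping built this way could presumably be applied in one step with `df_users['zip'].map(masked_by_zip)`, since `Series.map` accepts a dictionary.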

Google Kickstart 2014 Round D Sort a scrambled itinerary - Do I need to bring the input in a ready-to-use array format?

Problem:
Once upon a day, Mary bought a one-way ticket from somewhere to somewhere with some flight transfers.
For example: SFO->DFW DFW->JFK JFK->MIA MIA->ORD.
Obviously, transfer flights at a city twice or more doesn't make any sense. So Mary will not do that.
Unfortunately, after she received the tickets, she messed up the tickets and she forgot the order of the ticket.
Help Mary rearrange the tickets to make the tickets in correct order.
Input:
The first line contains the number of test cases T, after which T cases follow.
For each case, it starts with an integer N. There are N flight tickets follow.
Each of the next 2 lines contains the source and destination of a flight ticket.
Output:
For each test case, output one line containing "Case #x: itinerary", where x is the test case number (starting from 1) and the itinerary is a sorted list of flight tickets that represent the actual itinerary.
Each flight segment in the itinerary should be outputted as pair of source-destination airport codes.
Sample Input:
2
1
SFO
DFW
4
MIA
ORD
DFW
JFK
SFO
DFW
JFK
MIA
Sample Output:
Case #1: SFO-DFW
Case #2: SFO-DFW DFW-JFK JFK-MIA MIA-ORD
My question:
I am a beginner in the field of competitive programming. My question is how to interpret the given input in this case. How did Googlers program this input? When I write a function with a Python array as its argument, will this argument be in a ready-to-use array format or will I need to deal with the above mentioned T and N numbers in the input and then arrange airport strings in an array format to make it ready to be passed in the function's argument?
I have looked at the following official Google Kickstart Python solution to this problem and was confused by how it simply passes the ticket_list argument to the function. Doesn't it need to strip the numbers T and N from the input and then arrange the airport strings into an array, as I explained above?
Also, I could not understand how the attributes first and second can simply appear when no class has been defined. But I think this should be another question...
def print_itinerary(ticket_list):
    arrival_map = {}
    destination_map = {}
    for ticket in ticket_list:
        arrival_map[ticket.second] += 1
        destination_map[ticket.first] += 1
    current = FindStart(arrival_map)
    while current in destination_map:
        next = destination_map[current]
        print current + "-" + next
        current = next

You need to read the data from standard input and write the results to standard output yourself.
Sample code for reading from standard input and writing to standard output can be found in the coding section of the FAQ on the KickStart Web site.
If you write the solution to this problem in python, you can get T and N as follows.
T = int(input())
for t in range(1, T + 1):
    N = int(input())
    ...
Then if you want to get the source and destination of the flight ticket as a list, you can use the same input method to get them in the list.
ticket_list = [[input(), input()] for _ in range(N)]
# [['MIA', 'ORD'], ['DFW', 'JFK'], ['SFO', 'DFW'], ['JFK', 'MIA']]
If you want to use first and second, try a namedtuple.
from collections import namedtuple
Pair = namedtuple('Pair', ['first', 'second'])
ticket_list = [Pair(input(), input()) for _ in range(N)]
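Putting the pieces together, the following is one possible end-to-end sketch. It is not the official solution. The helper names `order_tickets` and `solve` are hypothetical, and `solve` takes an iterable of lines so that it can be tried on the sample input without typing into standard input.

```python
def order_tickets(tickets):
    """Chain (source, destination) pairs into travel order."""
    next_stop = dict(tickets)                  # source -> destination
    destinations = set(next_stop.values())
    # The starting airport is the only source that is never a destination.
    current = next(src for src in next_stop if src not in destinations)
    itinerary = []
    while current in next_stop:
        itinerary.append((current, next_stop[current]))
        current = next_stop[current]
    return itinerary


def solve(lines):
    """Parse the whole input (an iterable of lines) and return the output lines."""
    it = iter(lines)
    t = int(next(it))
    out = []
    for case in range(1, t + 1):
        n = int(next(it))
        # Each ticket is two consecutive lines: source, then destination.
        tickets = [(next(it).strip(), next(it).strip()) for _ in range(n)]
        legs = " ".join(f"{s}-{d}" for s, d in order_tickets(tickets))
        out.append(f"Case #{case}: {legs}")
    return out


sample = """2
1
SFO
DFW
4
MIA
ORD
DFW
JFK
SFO
DFW
JFK
MIA""".splitlines()

print("\n".join(solve(sample)))
```

In an actual submission the same function could be fed `sys.stdin` instead of `sample`, since iterating over `sys.stdin` yields the input line by line.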

for loop taking too long to produce output

I have three Excel files: Book1, Book2 and Book3. Each of them consists of 11000 rows and 10000 columns, and each cell contains the numeric value of an observation. I also have a 3-tuple, (100, 150, 150), and I want to compare each cell of Book1 with the 1st element (100), each cell of Book2 with the 2nd element (150), and each cell of Book3 with the 3rd element (150). Whenever the corresponding cells of the three files match the tuple, I want to print 1, otherwise 0. For example, if cell (10,200) contains 100 in Book1, 150 in Book2 and 150 in Book3, I want to print 1, otherwise 0.
So this is the program I wrote for this:
import xlrd

# Raw strings keep the backslashes in the Windows paths literal
file_loc1 = r"D:\Python\Book1.xlsx"
file_loc2 = r"D:\Python\Book2.xlsx"
file_loc3 = r"D:\Python\Book3.xlsx"
workbook1 = xlrd.open_workbook(file_loc1)
workbook2 = xlrd.open_workbook(file_loc2)
workbook3 = xlrd.open_workbook(file_loc3)
sheet1 = workbook1.sheet_by_index(0)
sheet2 = workbook2.sheet_by_index(0)
sheet3 = workbook3.sheet_by_index(0)
for i in range(1,11000):
    for j in range(0,10000):
        if sheet1.cell_value(i,j) == 100 and sheet2.cell_value(i,j) == 150 and sheet3.cell_value(i,j) == 150:
            print 1
        else:
            print 0
Firstly, as I am new to Python, I want to check whether this program is correct or whether there is some issue with it. The loop ranges are the ones I need.
Secondly, I ran this program on my system; it has been around 10 hours and the program is still running. I am using 64-bit Python 2.7.13 on a 64-bit Windows 8.1 system. I run it from Windows PowerShell with the command python script1.py > output1.txt, as I also want the output in a text file. A text file named output1 was created in my Python directory, but its size has been 0 bytes since the program started, so I am not even sure whether I am getting a proper file. What should I do here? Is there a more efficient way to get this output? Also, how long should I expect to wait for this program/loop to finish?
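One possible restructuring, offered only as a sketch, is to compare whole rows at a time and build the output text per row instead of printing one cell at a time. The function name `match_flags` and the tiny sample grids are made up. With xlrd, each row could presumably be fetched with `sheet.row_values(i)`. Whether this is fast enough for files of this size is an assumption that has not been measured.

```python
def match_flags(rows1, rows2, rows3, targets=(100, 150, 150)):
    """Yield one list of 0/1 flags per row, comparing three grids cell by cell."""
    t1, t2, t3 = targets
    for r1, r2, r3 in zip(rows1, rows2, rows3):
        yield [1 if (a == t1 and b == t2 and c == t3) else 0
               for a, b, c in zip(r1, r2, r3)]


# Made-up 2x2 grids standing in for the three sheets
book1 = [[100, 5], [100, 100]]
book2 = [[150, 150], [0, 150]]
book3 = [[150, 150], [150, 150]]

flags = list(match_flags(book1, book2, book3))

# One output line per spreadsheet row, flags separated by spaces
text = "\n".join(" ".join(map(str, row)) for row in flags)
print(text)
```

Note that this writes one line per row rather than one line per cell, which is a change from the original output format.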

pygtk cellrenderertoggle is not checked

I have the following code:
self.db='checks.db'
self.con = lite.connect(self.db)
self.cur = self.con.cursor()
self.q_oblig_initial='SELECT data_plirotees.rowid as rowid,recdate,bank,amount,dueto,gto,plirstatus FROM data_plirotees WHERE plirstatus=0 ORDER BY dueto ASC'
self.store_oblig = gtk.ListStore(int,str,str,str,str,str,bool)
self.cur.execute(q_oblig)
self.data_oblig=self.cur.fetchall()
for value in self.data_oblig:
    if value[6]==0:
        plir=False
    elif value[6]==1:
        plir=True
    self.store_oblig.append([value[0],datetime.datetime.fromtimestamp(int(value[1])).strftime('%d/%m/%Y'),value[2],"%.2f" %(value[3]),datetime.datetime.fromtimestamp(int(value[4])).strftime('%d/%m/%Y'),value[5],plir])
which gets data from a sqlite database and puts it in a liststore and,
rendererToggle.connect("toggled", self.on_cell_toggled)
column_toggle = gtk.TreeViewColumn("Καλύφθηκε", rendererToggle, active=1)
column_toggle.set_fixed_width(10)
treeView_oblig.append_column(column_toggle)
which is supposed to show it in a column where True shows a checked toggle/checkbox and False shows an unchecked one.
Unfortunately this doesn't happen.
The checkbox does not need to be activatable (I don't want the user to be able to toggle it), but clicking on the treeview row opens a new window (where a checkbutton is checked or not accordingly). From that I understand that the true/false value is stored somewhere, but it is not presented visually.
Can someone show me where I'm wrong?
I didn't post the whole program 'cause it would be too big and perhaps misguiding...

self.store_oblig = gtk.ListStore(int,str,str,str,str,str,bool)
This line creates a GtkListStore where each column is of a different type. The columns are numbered from left to right, starting at 0:
self.store_oblig = gtk.ListStore(int, str, str, str, str, str, bool)
column number                    0    1    2    3    4    5    6
You create your GtkTreeViewColumn with this:
column_toggle = gtk.TreeViewColumn("Καλύφθηκε", rendererToggle, active=1)
This says that the column's cell renderer should get the value of its active property from column 1 of the model (in this case, the list store). And the active property expects a bool.
But if you look back above, column 1 isn't a bool, but rather a string! So what you really wanted was active=6, not active=1. (Your code to add to the list store, on the other hand, seems correct.)
This is what the warning "Warning: unable to set property 'active' of type 'gboolean' from value of type 'gchararray'" (reported at gtk.main()) is trying to tell you; gchararray is (one of) GLib's internal name(s) for a string.

Using python win32com can't make two separate tables in MS Word 2007

I am trying to create multiple tables in a new Microsoft Word document using Python. I can create the first table okay, but I think I have the COM Range object configured wrong: it is not pointing to the end of the document. The first table is put before "Hello, I am a text!", and the second table is put inside the first table's first cell. I thought that requesting a Range from the document would return the full range, which I then collapse using the wdCollapseStart enum value, which I think is 1. (I can't find the constants in Python win32com.) I expected that adding a table at the end of the Range would add it to the end of the document, but that is not happening.
Any ideas?
Thanks Tim
import win32com.client
wordapp = win32com.client.Dispatch("Word.Application")
wordapp.Visible = 1
worddoc = wordapp.Documents.Add()
worddoc.PageSetup.Orientation = 1
worddoc.PageSetup.BookFoldPrinting = 1
worddoc.Content.Font.Size = 11
worddoc.Content.Paragraphs.TabStops.Add (100)
worddoc.Content.Text = "Hello, I am a text!"
location = worddoc.Range()
location.Collapse(1)
location.Paragraphs.Add()
location.Collapse(1)
table = location.Tables.Add (location, 3, 4)
table.ApplyStyleHeadingRows = 1
table.AutoFormat(16)
table.Cell(1,1).Range.InsertAfter("Teacher")
location1 = worddoc.Range()
location1.Paragraphs.Add()
location1.Collapse(1)
table = location1.Tables.Add (location1, 3, 4)
table.ApplyStyleHeadingRows = 1
table.AutoFormat(16)
table.Cell(1,1).Range.InsertAfter("Teacher1")
worddoc.Content.MoveEnd
worddoc.Close() # Close the Word Document (a save-Dialog pops up)
wordapp.Quit() # Close the Word Application

The problem seems to be in the Range object that represents a part of the document. In my original code the Range object contains the first cell and starts at the first cell, which is where it inserts. Instead I want to insert at the end of the range. So I got the following replacement code to work. I moved the Collapse after the Add() call and gave it an argument of 0 (wdCollapseEnd). Now there is only one Collapse call per Range object.
location = worddoc.Range()
location.Paragraphs.Add()
location.Collapse(0)
Now the code works, I can read from a database and populate new tables from each entry.
Tim
