Randomize search in excel - python

I started learning Python and am working on a small project for myself: open an Excel spreadsheet, look in one column, and randomly choose one of the cells to print. I did some research and found multiple ways to do it, but I liked this one because it is short and sweet. The problem is that when it prints, I want it to randomize the selection within one column. Is there a way to do that? Thanks, all help will be appreciated!
import xlrd

wb = xlrd.open_workbook("quotes.xlsx")
sh1 = wb.sheet_by_index(0)
print(sh1.cell(0, 0).value)

Use the following:
from random import choice
import xlrd

wb = xlrd.open_workbook("quotes.xlsx")
sh1 = wb.sheet_by_index(0)
column = 2  # or whatever column you want to select from
print(choice(sh1.col(column)).value)
The Sheet.col() method returns a list, and random.choice returns a random element from a list.
If you want to restrict the rows from which you randomly select an element you can generate a random row number and use that to index the column instead. You can do that like this:
import random

startRow = 3
endRow = 29
row = random.randint(startRow, endRow)
print(sh1.cell(row, column).value)
See also: How to randomly select an item from a list?

Related

Is there a way to copy merged cells using Python(openpyxl)?

I'm trying to write Python code that repeatedly copies a certain range of an Excel file.
As in the image below, the same content is copied onto the same sheet, offset by a few columns from the existing content.
Although my ability is only at an introductory level, at first I believed it would be really easy to implement. And I have implemented most of my plan with the code below, except for merged cells.
import openpyxl
from openpyxl import Workbook

wb = openpyxl.load_workbook('DSD.xlsx')
ws = wb.worksheets[3]

from openpyxl.utils import range_boundaries
min_col, min_row, max_col, max_row = range_boundaries('I1')

for row, row_cells in enumerate(wb['2'], min_row):
    for column, cell in enumerate(row_cells, min_col):
        # Copy Value from Copy.Cell to given Worksheet.Cell
        ws.cell(row=row, column=column).value = cell.value
        ws.cell(row=row, column=column)._style = cell._style

wb.save('DSD.xlsx')
However, I've been searching and thinking about it for a few days, and I still don't know how to merge the merged cells. Beyond that, I don't even know if this is technically possible.
The idea I had today was to pull out the list of merged cells in the sheet, offset the coordinates of each merged range by the same number of columns, and then issue merge commands for the shifted ranges.
However, even after pulling out the list of merged cells as shown below, I can't work out what to do next.
Is there a way?
import openpyxl
from openpyxl import Workbook
wb = openpyxl.load_workbook('merged.xlsx')
ws = wb.active
Merged_Cells = ws.merged_cell_ranges
a = Merged_Cells[0]
print(ws.merged_cell_ranges)
print(a)
[<MergedCellRange B2:C3>, <MergedCellRange B9:C9>, <MergedCellRange B13:B14>]
B2:C3
The value B2:C3, assigned to a, appears to be a special object: a.replace(~~) does not work even though the value looks like the string "B2:C3".
Is there really no way?
You'll need to look at how CellRanges work because you will need to calculate the offset for each range and then create a new MergedCellRange for the new cell using the dimensions from the original.
The following should give you an idea of how to do this.
from openpyxl.worksheet.cell_range import CellRange

area = CellRange("A1:F13")  # area being copied
for mcr in ws.merged_cells:
    if mcr.coord not in area:
        continue
    cr = CellRange(mcr.coord)
    cr.shift(col_shift=10)
    ws.merge_cells(cr.coord)
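For context, here is a rough end-to-end sketch that combines the value copy from the question with the merged-range shift above. The file name, the 10-column shift, and the copied area are assumptions taken from the snippets in this thread, and it assumes openpyxl 3.x, where cell.column is an integer.

from openpyxl import load_workbook
from openpyxl.worksheet.cell_range import CellRange

wb = load_workbook('DSD.xlsx')   # file name taken from the question
ws = wb.active

area = CellRange("A1:F13")       # area being copied (assumed)
col_shift = 10                   # how far right the copy should land (assumed)

# Copy values and styles into the shifted position
for row in ws[area.coord]:
    for cell in row:
        target = ws.cell(row=cell.row, column=cell.column + col_shift)
        target.value = cell.value
        target._style = cell._style

# Re-create the merged ranges at the shifted position
for mcr in list(ws.merged_cells.ranges):
    if mcr.coord not in area:
        continue
    cr = CellRange(mcr.coord)
    cr.shift(col_shift=col_shift)
    ws.merge_cells(cr.coord)

wb.save('DSD.xlsx')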

Iterating through a list to then return a value if meeting conditions

I just started working with Pandas for a personal side project of mine. I've imported data from a CSV, cleaned it up, and now want to use that data in the rest of my code. I want to ask a user for input, and if the user input matches an entry inside the list (which is a column inside the data), I want to get data for that instance (from the same row but other columns in the df).
I can get it to compare using the "in" statement but don't think that will work when I expand the functionality.
How would I go about looping through the list from the df to return a value from that list if it exists and then be able to return other values in the same row of the df?
import pandas as pd
import re
import math

housing_Data = pd.read_csv("/Users/saads/Downloads/DP_LIVE_26072020053911478.csv")  # File to get data
housing_Data = housing_Data.drop(['INDICATOR', 'MEASURE', 'FREQUENCY', 'Flag Codes'], axis=1)  # Removing unwanted columns
print(housing_Data.columns)  # Just to test if working

user_Country = str(input("What Country are you in"))  # User input

def getCountry():  # Function to compare user input to elements inside the list
    if user_Country in housing_Data.LOCATION.unique():
        print(user_Country)
    else:
        print("Sorry we don't have information about that country")

getCountry()
Maybe what you want is all rows matching a location?

matching_rows = housing_Data[housing_Data.LOCATION == user_Country]
print(matching_rows)

Maybe? I'm not entirely sure.
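If you then need other values from the same row(s), here is a minimal sketch; housing_Data comes from the question, and 'Value' is a hypothetical column name you would swap for whatever your cleaned CSV actually contains:

# housing_Data is the dataframe built in the question; 'Value' is a hypothetical column name
user_Country = input("What Country are you in ")
matching_rows = housing_Data[housing_Data.LOCATION == user_Country]

if matching_rows.empty:
    print("Sorry we don't have information about that country")
else:
    for _, row in matching_rows.iterrows():
        print(row['LOCATION'], row['Value'])  # any other columns from the same row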
If you are in a hurry, you could also write the list from the source out to a new text file (newfile.txt) and iterate over that more comfortably. Hopefully it helps.

How to get first 300 rows of Google sheet via gspread

Set-up
I create a Pandas dataframe from all records in a google sheet like this,
df = pd.DataFrame(wsheet.get_all_records())
as explained in the Gspread docs.
Issue
Since today, Python seems to hang indefinitely when I execute this command. I don't get any error; I interrupt Python with KeyboardInterrupt after a while.
I suspect there are too many records for Google: roughly 3,500 rows with 18 columns.
Question
Now, I actually don't really need the entire sheet. The first 300 rows would do just fine.
The docs show values_list = worksheet.row_values(1), which would return the first row values in a list.
I guess I could create a loop, but I was wondering if there's a built-in / better solution?
I believe your goal is as follows.
You want to retrieve the values from the 1st row to the 300th row of a sheet in a Google Spreadsheet.
From your remark that there are roughly 3,500 rows with 18 columns, you want to retrieve the values from columns "A" to "R"?
You want to convert the retrieved values to a DataFrame.
You want to achieve this using gspread.
In order to achieve this, I would like to propose the following sample script.
In this answer, I used the method of values_get.
Sample script:
import gspread
import pandas as pd

spreadsheetId = "###"  # Please set the Spreadsheet ID.
rangeA1notation = "Sheet1!A1:R300"  # Please set the range using A1 notation.

client = gspread.authorize(credentials)  # credentials: your authorized OAuth2 credentials
spreadsheet = client.open_by_key(spreadsheetId)
values = spreadsheet.values_get(rangeA1notation)
v = values['values']
df = pd.DataFrame(v)
print(df)
Note:
Please set the range in A1 notation. If "A1:R300" is used instead of "Sheet1!A1:R300", the values are retrieved from the 1st tab of the Spreadsheet.
When "A1:300" is used, the values are retrieved from column "A" to the last column of the sheet.
When the 1st row is a header row and the data starts from the 2nd row, please modify as follows.
From
df = pd.DataFrame(v)
To
df = pd.DataFrame(v[1:], columns=v[0])
Reference:
values_get
I used the openpyxl package.

import openpyxl as xl

wb = xl.load_workbook('your_file_name')
sheet = wb['name_of_your_sheet']

Specify your range and loop over it:

for row in range(1, 300):
    # this will point at row 1, column 3 in the first iteration
    cell = sheet.cell(row, 3)
    # if you want to change the cell value:
    cell.value = 'something'

The docs cover pretty much all of it: https://openpyxl.readthedocs.io/en/stable/

I/O efficiency in Python

I am writing a program that:
Reads the content of an Excel sheet row by row (90,000 rows in total)
Compares each row with every row of another Excel sheet (600,000 rows in total)
If a match occurs, writes the matching entry into a new Excel sheet
I have written the script and everything works fine; however, the computational time is huge. In an hour it has processed just 200 rows from the first sheet, writing 200 different files.
I was wondering if there is a way to save the matches differently, since I am going to use them later on. Is there any way to save them in a matrix or something similar?
import xlrd
import xlsxwriter
import os, itertools
from datetime import datetime

# choose the incident excel sheet
book_1 = xlrd.open_workbook('D:/Users/d774911/Desktop/Telstra Internship/Working files/Incidents.xlsx')
# choose the trap excel sheet
book_2 = xlrd.open_workbook("D:/Users/d774911/Desktop/Telstra Internship/Working files/Traps.xlsx")
# choose the features sheet
book_3 = xlrd.open_workbook("D:/Users/d774911/Desktop/Telstra Internship/Working files/Features.xlsx")

# select the working sheets, either by name or by index
Traps = book_2.sheet_by_name('Sheet1')
Incidents = book_1.sheet_by_name('Sheet1')
Features_Numbers = book_3.sheet_by_name('Sheet1')

# total number of rows in the traps sheet
Total_Number_of_Rows_Traps = Traps.nrows
# total number of rows in the incident sheet
Total_Number_of_Rows_Incidents = Incidents.nrows

print(Total_Number_of_Rows_Traps, Total_Number_of_Rows_Incidents)

# open a file to write down the non-matching incident numbers
write_no_matching = open('C:/Users/d774911/PycharmProjects/GlobalData/No_Matching.txt', 'w')

# iterate over all the rows of the incident sheet
for Rows_Incidents in range(Total_Number_of_Rows_Incidents):
    # store the content of the comparable cells of the incident sheet
    Incidents_Content_Affected_resources = Incidents.cell_value(Rows_Incidents, 47)
    Incidents_Content_Product_Type = Incidents.cell_value(Rows_Incidents, 29)
    # convert the Excel date into a Python type
    Incidents_Content_Date = xlrd.xldate_as_tuple(Incidents.cell_value(Rows_Incidents, 2), book_1.datemode)
    # extract the year, month and day
    Incidents_Content_Date = str(Incidents_Content_Date[0]) + ' ' + str(Incidents_Content_Date[1]) + ' ' + str(Incidents_Content_Date[2])
    Incidents_Content_Date = datetime.strptime(Incidents_Content_Date, '%Y %m %d')
    # extract the incident number
    Incident_Name = Incidents.cell_value(Rows_Incidents, 0)
    # create a workbook for the selected incident
    Incident_Name_Book = xlsxwriter.Workbook(os.path.join('C:/Users/d774911/PycharmProjects/GlobalData/Test/', Incident_Name + '.xlsx'))
    # create a sheet in the new workbook
    Incident_Name_Sheet = Incident_Name_Book.add_worksheet('Sheet1')
    # insert the first row that contains the features
    Incident_Name_Sheet.write_row(0, 0, Features_Numbers.row_values(0))
    Insert_Row_to_Incident_Sheet = 0
    # iterate over all the rows of the traps sheet
    for Rows_Traps in range(Total_Number_of_Rows_Traps):
        # store the content of the comparable cells of the traps sheet
        Traps_Content_Node_Name = Traps.cell_value(Rows_Traps, 3)
        Traps_Content_Event_Type = Traps.cell_value(Rows_Traps, 6)
        # extract the date string temporarily
        Traps_Content_Date_temp = Traps.cell_value(Rows_Traps, 10)
        Traps_Content_Date = datetime.strptime(Traps_Content_Date_temp[0:10], '%Y-%m-%d')
        # if the content matches partially or fully
        if len(str(Traps_Content_Node_Name)) * len(str(Incidents_Content_Affected_resources)) != 0 and \
                str(Incidents_Content_Affected_resources).lower().find(str(Traps_Content_Node_Name).lower()) != -1 and \
                len(str(Traps_Content_Event_Type)) * len(str(Incidents_Content_Product_Type)) != 0 and \
                str(Incidents_Content_Product_Type).lower().find(str(Traps_Content_Event_Type).lower()) != -1 and \
                len(str(Traps_Content_Date)) * len(str(Incidents_Content_Date)) != 0 and \
                Traps_Content_Date <= Incidents_Content_Date:
            # counter for writing inside the new incident sheet
            Insert_Row_to_Incident_Sheet = Insert_Row_to_Incident_Sheet + 1
            # write the incident information
            Incident_Name_Sheet.write_row(Insert_Row_to_Incident_Sheet, 0, Incidents.row_values(Rows_Incidents))
            # write the traps information
            Incident_Name_Sheet.write_row(Insert_Row_to_Incident_Sheet, 107, Traps.row_values(Rows_Traps))
    Incident_Name_Book.close()
Thanks
What you're doing is seeking/reading a little bit of data for each cell. This is very inefficient.
Try reading all the information in one go into a Python data structure that is as basic as sensible (lists, dicts, etc.), make your comparisons/operations on this data set in memory, and write all results in one go. If not all the data fits into memory, partition it into sub-tasks.
Having to read the data set 10 times to extract a tenth of the data each time will likely still be hugely faster than reading each cell independently.
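As a rough illustration of that idea (the file path and sheet name are taken from the question; the column index is just an example):

import xlrd

book = xlrd.open_workbook('D:/Users/d774911/Desktop/Telstra Internship/Working files/Traps.xlsx')
sheet = book.sheet_by_name('Sheet1')

# read the whole sheet into a plain list of lists once, then work purely in memory
trap_rows = [sheet.row_values(r) for r in range(sheet.nrows)]

# example: index the rows by the node name in column 3 for fast lookups
traps_by_node = {}
for row in trap_rows:
    traps_by_node.setdefault(str(row[3]).lower(), []).append(row)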
I don't see how your code can work; the second loop works on variables which change for every row in the first loop but the second loop isn't inside of the first one.
That said, comparing files in this way has a complexity of O(N*M) which means that the runtime explodes quickly. In your case you try to execute 54'000'000'000 (54 billion) loops.
If you run into these kind of problems, the solution is always a three step process:
Transform the data to make it easier to process
Put the data into efficient structures (sorted lists, dict)
Search the data with the efficient structures
You have to find a way to get rid of the find(). Try to get rid of all the junk in the cells that you want to compare so that you can use ==. When you have this, you can put rows into a dict to find matches. Or you could load the data into a SQL database and use SQL queries (don't forget to add indexes!).
One last trick is to use sorted lists. If you can sort the data in the same way, then you can use two lists:
Sort the data from the two sheets into two lists
Use two row counters (one per list)
If the current item from the first list is less than the current one from the second list, then there is no match and you have to advance the first row counter.
If the current item from the first list is bigger than the current one from the second list, then there is no match and you have to advance the second row counter.
If the items are the same, you have a match. Process the match and advance both counters.
This allows you to process all the items in one go.
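A minimal sketch of that two-counter merge, assuming the rows have already been read into lists and can be reduced to a comparable key; make_key is a hypothetical cleanup function and exact-equality matching is assumed:

def make_key(row):
    # hypothetical: normalise whichever column you compare on
    return str(row[0]).strip().lower()

incidents = sorted(incident_rows, key=make_key)  # incident_rows read in beforehand
traps = sorted(trap_rows, key=make_key)          # trap_rows read in beforehand

matches = []
i = j = 0
while i < len(incidents) and j < len(traps):
    ki, kj = make_key(incidents[i]), make_key(traps[j])
    if ki < kj:
        i += 1          # no trap matches this incident row
    elif ki > kj:
        j += 1          # no incident matches this trap row
    else:
        matches.append((incidents[i], traps[j]))
        i += 1
        j += 1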
I would suggest that you use pandas. This module provides a huge number of functions to compare datasets. It also has very fast import/export routines for Excel files.
IMHO you should use the merge function with the arguments how='inner' and on=[list of your columns to compare]. That will create a new dataset containing only the rows that occur in both tables (having the same values in the defined columns). You can then export this new dataset to your Excel file.
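A minimal sketch of that merge, assuming both sheets can be loaded with read_excel and share join columns; the file names and column names here are hypothetical, so rename them to your real headers:

import pandas as pd

incidents = pd.read_excel('Incidents.xlsx', sheet_name='Sheet1')
traps = pd.read_excel('Traps.xlsx', sheet_name='Sheet1')

# keep only rows that have the same values in the join columns (names are hypothetical)
merged = incidents.merge(traps, how='inner', on=['NodeName', 'EventType'])

merged.to_excel('Matches.xlsx', index=False)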

Going down columns using xlrd

Let's say I have a cell (9,3). I want to get the values from (9,3) to (9,99). How do I go down the columns to get those values? I am trying to write the values into another Excel file, starting at (13,3) and ending at (13,99). How do I write a loop for that in xlrd?
def write_into_cols_rows(r, c):
    for num in range(0, 96):
        c += 1
    return (r, c)
worksheet.row(int) will return the row, and to get the value of a certain column you access row[int].value.
For more information, you can read this pdf file (Page 9 Introspecting a sheet).
import xlrd

workbook = xlrd.open_workbook(filename)
# This will get you the very first sheet in the workbook.
worksheet = workbook.sheet_by_name(workbook.sheet_names()[0])

for index in range(worksheet.nrows):
    try:
        row = worksheet.row(index)
        row_value = [col.value for col in row]
        # now row_value is a list containing all the column values
        print(row_value[3:99])
    except:
        pass
To write data to Excel file, you might want to check out xlwt package.
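A minimal sketch with xlwt, assuming you want to write the values collected above along row 13, starting at column 3 (as mentioned in the question); the output file name is just an example, and note that xlwt writes .xls, not .xlsx:

import xlwt

out_book = xlwt.Workbook()
out_sheet = out_book.add_sheet('Sheet1')

# write the slice collected above along row 13, starting at column 3
for offset, value in enumerate(row_value[3:99]):
    out_sheet.write(13, 3 + offset, value)

out_book.save('output.xls')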
BTW, it seems like you are doing something along the lines of: read from Excel... do some work... write to Excel...
I would also recommend you take a look at numpy, scipy, or R. When I do data munging I usually use R, and it saves me a lot of time.
