Search and find through a list in Python

I have a main text file that looks like this:
STATUS| CRN| SUBJECT| SECT| COURSE| CREDIT| INSTR.| BLDG/RM| DAY/TIME| FROM / TO|
OPEN| 43565| ACA6202| 10| Acting II| 3.00| Logan, G| SEE DEPT| | 01/12/15 - 04/27/15|
OPEN| 43566| ACA6206| 10| Topics:Classical Drama/Cult II| 2.00| Jacobson, L| SEE DEPT| | 01/12/15 - 04/27/15|
OPEN| 43567| ACA6210| 10| Text II| 2.00| Logan, G| SEE DEPT| | 01/12/15 - 04/27/15|
OPEN| 43568| ACA6212| 10| Voice and Speech II| 3.00| Logan, G| SEE DEPT| | 01/12/15 - 04/27/15|
OPEN| 43569| ACA6216| 10| Movement II| 2.00| Logan, G| SEE DEPT| | 01/12/15 - 04/27/15|
OPEN| 43570| ACA6220| 10| Alexander Technique II| 2.00| Logan, G| SEE DEPT| | 01/12/15 - 04/27/15|
OPEN| 43571| ACA6224| 10| Stage Combat II| 2.00| Logan, G| SEE DEPT| | 01/12/15 - 04/27/15|
OPEN| 43572| ACA6228| 10| Practicum IV| 3.00| Logan, G| SEE DEPT| | 01/12/15 - 04/27/15|
OPEN| 44500| ACA6595| 10| Selected Topics| 1.00| Logan, G| SEE DEPT| | 01/12/15 - 04/27/15|
My code below gathers only the "SUBJECT" column and strips the numbers from the string. So for example, the output from the top of the file would print several "ACA"s.
with open ("/Users/it/Desktop/Classbook/classAbrevs.txt", "r") as myfile:
    subsAndAbrevsMap = tuple(open("/Users/it/Desktop/Classbook/classAbrevs.txt", 'r'))
with open ("/Users/it/Desktop/Classbook/masterClassList.txt", "r") as myfile:
    masterSchedule = tuple(open("/Users/it/Desktop/Classbook/masterClassList.txt", 'r'))
for masterline in masterSchedule:
    masterline.strip()
    masterSplitLine = masterline.split("|")
    if masterSplitLine[0] != "STATUS":
        subjectAbrev = ''.join([i for i in masterSplitLine[2] if not i.isdigit()])
I have another .txt file that looks like this:
Academy for Classical Acting,ACA
Accountancy,ACCY
Africana Studies,AFST
American Studies,AMST
Anatomy & Regenerative Biology,ANAT
Anthropology,ANTH
Applied Science,APSC
Arabic,ARAB
Art/Art History,AH
Art/Fine Arts,FA
Astronomy,ASTR
Biochemistry,BIOC
Biological Sciences,BISC
In my code below, I check whether the abbreviations (column 2) in my second .txt file equal the abbreviations generated from my first .txt file. If there is a match, I would like to append the full class name:
#open 2nd .txt, strip and split
for subsline in subsAndAbrevsMap:
    subsline.strip()
    subLineSplit = subsline.split(",")
    print "subLineSplit is: " + subsline[0]
    if subLineSplit[1] == subjectAbrev:
        realSubjectName = subLineSplit[0]
        print "The subject name for abrev " + subjectAbrev + " is " + realSubjectName
I want the output to print:
"The subject name for abrev ACA is Academy for Classical Acting"
What am I doing wrong?

First of all, these are delimited text files, so use the csv module!
# path to first file is ~/classes.csv
# path to second file is ~/abbr.csv
import csv
import os
# open() does not expand "~", so expand it explicitly
with open(os.path.expanduser("~/classes.csv"), 'r') as classes_csv,\
        open(os.path.expanduser("~/abbr.csv"), 'r') as abbr_csv:
    classes = csv.reader(classes_csv, delimiter='|')
    abbr = csv.reader(abbr_csv, delimiter=',')
    header = next(classes)
    # create a lookup dictionary for your tags -> names
    abbr_dict = {line[1].strip():line[0].strip() for line in abbr}
    # create a genexp for all the extant tags in ~/classes.csv
    class_tags = (line[2].strip("0123456789 ") for line in classes)
    result = {tag:abbr_dict[tag] for tag in class_tags if tag in abbr_dict}
Then it should be easy to format your result.
for abbr,cls in result.items():
    print("The subject name for abrev {} is {}".format(abbr,cls))
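As a hedged, self-contained sketch of the same dictionary-lookup idea: the loop in the question never matches because str.strip() returns a new string instead of changing the line in place, so the abbreviation keeps its trailing newline and the subject keeps its leading space. The example below uses two short inline strings as stand-ins for the two files; the names CLASSES_TXT, ABBR_TXT, build_lookup and subject_abbrevs are made up for illustration and are not part of the original code.

```python
import csv
import io

# Inline stand-ins for the two files in the question (sample rows only).
CLASSES_TXT = """STATUS| CRN| SUBJECT| SECT| COURSE| CREDIT| INSTR.| BLDG/RM| DAY/TIME| FROM / TO|
OPEN| 43565| ACA6202| 10| Acting II| 3.00| Logan, G| SEE DEPT| | 01/12/15 - 04/27/15|
OPEN| 43566| ACA6206| 10| Topics:Classical Drama/Cult II| 2.00| Jacobson, L| SEE DEPT| | 01/12/15 - 04/27/15|
"""
ABBR_TXT = """Academy for Classical Acting,ACA
Accountancy,ACCY
"""


def build_lookup(abbr_file):
    """Map abbreviation -> full subject name, ignoring malformed rows."""
    return {row[1].strip(): row[0].strip()
            for row in csv.reader(abbr_file) if len(row) >= 2}


def subject_abbrevs(classes_file):
    """Yield the SUBJECT column with digits and surrounding spaces removed."""
    reader = csv.reader(classes_file, delimiter="|")
    next(reader)  # skip the header row
    for row in reader:
        if len(row) > 2:
            yield row[2].strip("0123456789 ")


lookup = build_lookup(io.StringIO(ABBR_TXT))

messages = []
seen = set()
for abbrev in subject_abbrevs(io.StringIO(CLASSES_TXT)):
    # Report each abbreviation once, and only if it has a known full name.
    if abbrev in lookup and abbrev not in seen:
        seen.add(abbrev)
        messages.append(
            "The subject name for abrev {} is {}".format(abbrev, lookup[abbrev]))

for message in messages:
    print(message)
```

With real files, the two io.StringIO objects would be replaced by open file handles.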

Related

pywinauto - TimeComX Basic print_control_identifiers() doesn't show all the options

I want to automate this program: TimeComX Basic.
The script I wrote:
from pywinauto.application import Application as PyWinAutoApplication
from pywinauto.timings import wait_until
from pywinauto.keyboard import send_keys
import pywinauto
import os
import sys
from pywinauto import mouse
import traceback
# Hibernate PC
app2 = PyWinAutoApplication(backend="uia").connect(found_index=0,title="TimeComX Basic")
handle = pywinauto.findwindows.find_windows(title="TimeComX Basic")[0]
window = app2.window(handle=handle)
window.maximize()
window.set_focus()
app2.TimeComxBasic.print_control_identifiers()
#mouse.click(button='left', coords=(150, 960))
Note that to run this script you have to manually install and open TimeComX Basic.
The output:
Control Identifiers:
Dialog - 'TimeComX Basic' (L-11, T-11, R1931, B1019)
['TimeComX BasicDialog', 'Dialog', 'TimeComX Basic']
child_window(title="TimeComX Basic", control_type="Window")
|
| TitleBar - '' (L24, T-8, R1920, B34)
| ['TitleBar']
| |
| | Menu - 'System' (L0, T0, R22, B22)
| | ['Menu', 'System', 'SystemMenu', 'System0', 'System1']
| | child_window(title="System", auto_id="MenuBar", control_type="MenuBar")
| | |
| | | MenuItem - 'System' (L0, T0, R22, B22)
| | | ['MenuItem', 'System2', 'SystemMenuItem']
| | | child_window(title="System", control_type="MenuItem")
| |
| | Button - 'Minimize' (L1707, T0, R1778, B33)
| | ['MinimizeButton', 'Button', 'Minimize', 'Button0', 'Button1']
| | child_window(title="Minimize", control_type="Button")
| |
| | Button - 'Restore' (L1778, T0, R1848, B33)
| | ['Restore', 'Button2', 'RestoreButton']
| | child_window(title="Restore", control_type="Button")
| |
| | Button - 'Close' (L1848, T0, R1920, B33)
| | ['Close', 'Button3', 'CloseButton']
| | child_window(title="Close", control_type="Button")
As you can see, it only lists the title bar controls: the system menu and the Minimize, Restore and Close buttons. There is no entry for the "Start" button, for example.
What can I do in this situation?

Pyspark Java.lang.OutOfMemoryError: Java heap space

I am solving a problem using spark running in my local machine.
I am reading a parquet file from the local disk and storing it to the dataframe.
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
spark = SparkSession.builder\
    .config("spark.driver.memory","4g")\
    .config("spark.executor.memory","4g")\
    .config("spark.driver.maxResultSize","2g")\
    .getOrCreate()
content = spark.read.parquet('./files/file')
The content DataFrame contains around 500k rows, for example:
+-----------+----------+
|EMPLOYEE_ID|MANAGER_ID|
+-----------+----------+
|        100|         0|
|        101|       100|
|        102|       100|
|        103|       100|
|        104|       100|
|        105|       100|
|        106|       101|
|        101|       101|
|        101|       101|
|        101|       101|
|        101|       102|
|        101|       102|
      .          .
      .          .
      .          .
I wrote this code to assign each EMPLOYEE_ID an EMPLOYEE_LEVEL according to its position in the hierarchy.
# Assign EMPLOYEE_LEVEL 1 WHEN MANAGER_ID is 0 ELSE NULL
content_df = content.withColumn("EMPLOYEE_LEVEL", when(col("MANAGER_ID") == 0, 1).otherwise(lit('')))
level_df = content_df.select("*").filter("EMPLOYEE_LEVEL = 1")
level = 1
while True:
    ldf = level_df
    temp_df = content_df.join(
        ldf,
        ((ldf["EMPLOYEE_LEVEL"] == level) &
         (ldf["EMPLOYEE_ID"] == content_df["MANAGER_ID"])),
        "left") \
        .withColumn("EMPLOYEE_LEVEL",ldf["EMPLOYEE_LEVEL"]+1)\
        .select("EMPLOYEE_ID","MANAGER_ID","EMPLOYEE_LEVEL")\
        .filter("EMPLOYEE_LEVEL IS NOT NULL")\
        .distinct()
    if temp_df.count() == 0:
        break
    level_df = level_df.union(temp_df)
    level += 1
It runs, but execution is very slow, and after some time it fails with this error:
Py4JJavaError: An error occurred while calling o383.count.
: java.lang.OutOfMemoryError: Java heap space
at scala.collection.immutable.List.$colon$colon(List.scala:117)
at scala.collection.immutable.List.$plus$colon(List.scala:220)
at org.apache.spark.sql.catalyst.expressions.String2TrimExpression.children(stringExpressions.scala:816)
at org.apache.spark.sql.catalyst.expressions.String2TrimExpression.children$(stringExpressions.scala:816)
at org.apache.spark.sql.catalyst.expressions.StringTrim.children(stringExpressions.scala:948)
at org.apache.spark.sql.catalyst.trees.TreeNode.withNewChildren(TreeNode.scala:351)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:595)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:486)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:486)
at org.apache.spark.sql.catalyst.trees.TreeNode$$Lambda$1822/0x0000000100d21040.apply(Unknown Source)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
at scala.collection.TraversableLike$$Lambda$61/0x00000001001d2040.apply(Unknown Source)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at scala.collection.TraversableLike.map(TraversableLike.scala:286)
at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
at scala.collection.AbstractTraversable.map(Traversable.scala:108)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:595)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:486)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:486)
at org.apache.spark.sql.catalyst.trees.TreeNode$$Lambda$1822/0x0000000100d21040.apply(Unknown Source)
at org.apache.spark.sql.catalyst.trees.BinaryLike.mapChildren(TreeNode.scala:1148)
at org.apache.spark.sql.catalyst.trees.BinaryLike.mapChildren$(TreeNode.scala:1147)
at org.apache.spark.sql.catalyst.expressions.BinaryExpression.mapChildren(Expression.scala:555)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:486)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:486)
at org.apache.spark.sql.catalyst.trees.TreeNode$$Lambda$1822/0x0000000100d21040.apply(Unknown Source)
at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1122)
at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1121)
at org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:467)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:486)
I tried many solutions, including increasing driver and executor memory and using cache() and persist() on the DataFrame, but none of them worked for me.
I am using Spark 3.2.1
Any help will be appreciated.
Thank you.
I figured out the problem. This error is related to how Spark's DAG works: Spark uses the DAG lineage to track a series of transformations, and when an algorithm iterates, the lineage can grow quickly and hit the memory limit. So breaking the lineage is necessary when implementing iterative algorithms.
There are mainly two ways: 1. add a checkpoint; 2. recreate the DataFrame.
I modified my code as below, adding a checkpoint to break the lineage, and it works for me.
from pyspark.sql import functions as F
from pyspark.sql.window import Window
# checkpoint() needs a checkpoint directory, set once via spark.sparkContext.setCheckpointDir(...)
# singer_pairs_undirected, old_song_group_kernel and previous_kernel_cnt are initialized before the loop (not shown)
epoch_cnt = 0
while True:
    print('hahaha1')
    print('cached df', len(spark.sparkContext._jsc.getPersistentRDDs().items()))
    singer_pairs_undirected_ungrouped = singer_pairs_undirected.join(old_song_group_kernel,
        on=singer_pairs_undirected['src'] == old_song_group_kernel['id'],
        how='left').filter(F.col('id').isNull()) \
        .select('src', 'dst')
    windowSpec = Window.partitionBy("src").orderBy(F.col("song_group_id_cnt").desc())
    singer_pairs_vote = singer_pairs_undirected_ungrouped.join(old_song_group_kernel,
        on=singer_pairs_undirected_ungrouped['dst'] ==
           old_song_group_kernel['id'], how='inner') \
        .groupBy('src', 'song_group_id') \
        .agg(F.count('song_group_id').alias('song_group_id_cnt')) \
        .withColumn('song_group_id_cnt_rnk', F.row_number().over(windowSpec)) \
        .filter(F.col('song_group_id_cnt_rnk') == 1)
    singer_pairs_vote_output = singer_pairs_vote.select('src', 'song_group_id') \
        .withColumnRenamed('src', 'id')
    print('hahaha5')
    # persist, then checkpoint to cut the lineage accumulated in this iteration
    new_song_group_kernel = old_song_group_kernel.union(singer_pairs_vote_output) \
        .select('id', 'song_group_id').dropDuplicates().persist().checkpoint()
    print('hahaha9')
    current_kernel_cnt = new_song_group_kernel.count()
    print('hahaha2')
    old_song_group_kernel.unpersist()
    print('hahaha3')
    old_song_group_kernel = new_song_group_kernel
    epoch_cnt += 1
    print('epoch rounds: ', epoch_cnt)
    print('previous kernel count: ', previous_kernel_cnt)
    print('current kernel count: ', current_kernel_cnt)
    if current_kernel_cnt <= previous_kernel_cnt:
        print('Iteration done !')
        break
    print('hahaha4')
    previous_kernel_cnt = current_kernel_cnt
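For comparison, here is a sketch of a different technique from the checkpoint approach above: with roughly 500k (EMPLOYEE_ID, MANAGER_ID) pairs, as in the question, the level assignment might also be small enough to do in plain Python on the driver, which avoids the growing lineage altogether. This is not Spark code. The function name assign_levels and the sample list are made up for illustration, and it assumes that an employee reachable through several managers should get the shallowest level and that self-references such as (101, 101) can be ignored.

```python
from collections import defaultdict, deque


def assign_levels(pairs, root_manager_id=0):
    """Return {employee_id: level} via breadth-first search.

    Employees whose manager is `root_manager_id` get level 1, their direct
    reports get level 2, and so on. An employee reachable by several paths
    keeps the first (shallowest) level found.
    """
    reports = defaultdict(set)
    for employee_id, manager_id in pairs:
        if employee_id != manager_id:  # skip self-references
            reports[manager_id].add(employee_id)

    levels = {}
    queue = deque((emp, 1) for emp in sorted(reports[root_manager_id]))
    while queue:
        emp, level = queue.popleft()
        if emp in levels:
            continue  # already assigned a shallower (or equal) level
        levels[emp] = level
        for child in sorted(reports.get(emp, ())):
            queue.append((child, level + 1))
    return levels


# Sample rows taken from the table in the question (duplicates removed).
sample = [(100, 0), (101, 100), (102, 100), (103, 100), (104, 100),
          (105, 100), (106, 101), (101, 101), (101, 102)]
levels = assign_levels(sample)
print(levels)

# In the question's setting the pairs could come from something like:
#   pairs = [(r.EMPLOYEE_ID, r.MANAGER_ID) for r in content.collect()]
```

Each employee is visited once, so the work is linear in the number of rows; whether collecting the data to the driver is acceptable depends on the real data size.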

SQLAlchemy aliased column "type object 'MiscUnit' has no attribute 'codeLabel'"

I am trying to get a row from my database, which contains multiple columns which are each paired with a unit id column, like so:
id|run_id|diesel_engine_installed_power|diesel_engine_installed_power_unit_id|pv_installed_power|pv_installed_power_unit_id|battery_capacity|battery_capacity_unit_id|
--|------|-----------------------------|-------------------------------------|------------------|--------------------------|----------------|------------------------|
1| | 300| 1| 200| 1| 1000| 4|
2| 484| 300| 1| 200| 1| 1000| 4|
To do so, I am trying to alias the various unit columns while querying them in SQLAlchemy:
diesel_engine_installed_power_MiscUnit = aliased(MiscUnit)
pv_installed_power_MiscUnit = aliased(MiscUnit)
battery_capacity_MiscUnit = aliased(MiscUnit)
mg_res = session.query(ProcRun, ProcMemoGridInput, diesel_engine_installed_power_MiscUnit, pv_installed_power_MiscUnit, battery_capacity_MiscUnit). \
    with_entities(
        ProcRun,
        ProcMemoGridInput,
        diesel_engine_installed_power_MiscUnit.codeLabel.label("diesel_engine_installed_power_MiscUnit"),
        pv_installed_power_MiscUnit.codeLabel.label("pv_installed_power_MiscUnit"),
        battery_capacity_MiscUnit.codeLabel.label("battery_capacity_MiscUnit")
    ). \
    filter(ProcRun.id == ProcMemoGridInput.run_id). \
    filter(ProcRun.id == 484). \
    filter(ProcMemoGridInput.diesel_engine_installed_power_unit_id == diesel_engine_installed_power_MiscUnit.id). \
    filter(ProcMemoGridInput.pv_installed_power_unit_id == pv_installed_power_MiscUnit.id). \
    filter(ProcMemoGridInput.battery_capacity_unit_id == battery_capacity_MiscUnit.id). \
    one()
It is based on this solution:
Usage of "aliased" in SQLAlchemy ORM
But it fails with AttributeError: type object 'MiscUnit' has no attribute 'codeLabel'. I don't really understand what the difference is; as far as I can tell, this is the same process for aliasing the MiscUnit ORM object.

How to click using pywinauto

I would like to use pywinauto to control an image processing software.
First, I need to click a specific area (which is used for image dragging) to pop up a window for path input. See the first figure.
Then, I need to input a path and click the button "Select Folder". See the second figure.
I tried:
from pywinauto import Desktop, Application, mouse, findwindows
from pywinauto.keyboard import SendKeys
app = Application(backend='uia').start(r"C:\Program Files\Duplicate Photo Cleaner\DuplicatePhotoCleaner.exe")
app.connect(path="DuplicatePhotoCleaner.exe")
app.DuplicatePhotoCleaner.print_control_identifiers()
Control Identifiers:
Dialog - 'Duplicate Photo Cleaner' (L440, T126, R1480, B915)
['Duplicate Photo Cleaner', 'Duplicate Photo CleanerDialog', 'Dialog']
child_window(title="Duplicate Photo Cleaner", control_type="Window")
|
| TitleBar - '' (L464, T129, R1472, B157)
| ['', 'TitleBar']
| |
| | Menu - 'System' (L448, T134, R470, B156)
| | ['System', 'Menu', 'SystemMenu', 'System0', 'System1']
| | child_window(title="System", auto_id="MenuBar", control_type="MenuBar")
| | |
| | | MenuItem - 'System' (L448, T134, R470, B156)
| | | ['System2', 'SystemMenuItem', 'MenuItem']
| | | child_window(title="System", control_type="MenuItem")
| |
| | Button - 'Minimize' (L1333, T127, R1380, B157)
| | ['Minimize', 'Button', 'MinimizeButton', 'Button0', 'Button1']
| | child_window(title="Minimize", control_type="Button")
| |
| | Button - 'Maximize' (L1380, T127, R1426, B157)
| | ['Button2', 'Maximize', 'MaximizeButton']
| | child_window(title="Maximize", control_type="Button")
| |
| | Button - 'Close' (L1426, T127, R1473, B157)
| | ['CloseButton', 'Button3', 'Close']
| | child_window(title="Close", control_type="Button")
Can anyone help?
Thank you very much.
It looks like the + button you need to click to open the window (shown in the second figure) is owner-drawn.
So there is only one way to bring up the "Add folder to search" window: use the click_input method and pass coordinates.
Once the window comes up, you can use the code below to set the value:
app.DuplicatePhotoCleaner.child_window(title="Folder:", auto_id="1152", control_type="Edit").set_text('Hello world') #or
app.DuplicatePhotoCleaner['Folder:Edit'].set_text('Hello world')
Application().connect(title='Add folder to search')...
Please go through the pywinauto docs for further info.
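A minimal sketch of the coordinate-based click mentioned above, under stated assumptions: click_at_fraction is a hypothetical helper (not part of pywinauto), and the fractions in the usage comment are placeholders that would have to be measured on the real window. It relies only on the wrapper's rectangle() and click_input(coords=...) methods, where coords are relative to the control by default.

```python
def click_at_fraction(window, fx, fy):
    """Click inside `window` at a point given as fractions of its size.

    `window` is expected to behave like a pywinauto window wrapper:
    rectangle() returns an object with width() and height(), and
    click_input(coords=(x, y)) clicks at coordinates relative to the window.
    Returns the (x, y) offset that was clicked.
    """
    rect = window.rectangle()
    x = int(rect.width() * fx)
    y = int(rect.height() * fy)
    window.click_input(coords=(x, y))
    return x, y


# Hypothetical usage on Windows with the application from the question;
# the fractions 0.5 and 0.4 are placeholders, not measured values:
#
#   from pywinauto import Application
#   app = Application(backend='uia').connect(path="DuplicatePhotoCleaner.exe")
#   click_at_fraction(app.DuplicatePhotoCleaner, 0.5, 0.4)
```

Using fractions of the window size rather than fixed pixel offsets is a design choice that may keep the click on target if the window is resized, though an owner-drawn layout is not guaranteed to scale proportionally.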

PyGTK Spacing in an HBox

I'm new to GTK and I'm trying to figure out how to accomplish something like this:
+---+------+---+
|   |      |   |
|   |      |   |
|   |      |   |
|   |      |   |
|   |      |   |
|   |      |   |
+---+------+---+
I want this done in an HBox. How would I accomplish this? Thanks.
It is done with "packing".
I always keep the class reference under my pillow : http://www.pygtk.org/docs/pygtk/gtk-class-reference.html
Samples in the good tutorial found here :
http://www.pygtk.org/pygtk2tutorial/sec-DetailsOfBoxes.html
And finally, this shows up something like your drawing :
import gtk as g
win = g.Window ()
win.set_default_size(600, 400)
win.set_position(g.WIN_POS_CENTER)
win.connect ('delete_event', g.main_quit)
hBox = g.HBox()
win.add (hBox)
f1 = g.Frame()
f2 = g.Frame()
f3 = g.Frame()
hBox.pack_start(f1)
hBox.pack_start(f2)
hBox.pack_start(f3)
win.show_all ()
g.main ()
Have fun ! (and I hope my answer is helpful)
The answer is pack_start() and pack_end().
These functions take a few parameters that give you the desired effect.
If you use Louis' example:
hBox.pack_start(f1, expand=False, fill=False)
hBox.pack_start(f2, expand=True, fill=True, padding=50)
hBox.pack_end(f3, expand=False, fill=False)
Hope that helps!
