Python parse an xml file pass result to array - python

I am parsing an XML file, which works, and I want to pass the results into an array to use later. The XML is opened, read and parsed, and I pick out three items from each entry (channel, start and title). As the code below shows, start holds a date and time, which I split and store in date. As the code loops through each XML entry, I would like to store the channel, start and title in a multidimensional array. I have done this in BrightScript but can't understand the array or list structure of Python. Once all entries are in the array or list, I will need to go through it and pull out all titles and dates that share the same date. Can somebody guide me through this?
from datetime import datetime
from xml.dom import minidom

xmldoc = minidom.parse(xmldoc)
programmes = xmldoc.getElementsByTagName("programme")

def getNodeText(node):
    nodelist = node.childNodes
    result = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            result.append(node.data)
    return ''.join(result)

title = xmldoc.getElementsByTagName("title")[0]
#print("Node Name : %s" % title.nodeName)
#print("Node Value : %s \n" % getNodeText(title))
programmes = xmldoc.getElementsByTagName("programme")
for programme in programmes:
    cid = programme.getAttribute("channel")
    starts = programme.getAttribute("start")
    # The slices assume an XMLTV-style start value: YYYYMMDDhhmmss, then a UTC offset
    cutdate = starts[0:14]
    year = int(cutdate[0:4])
    month = int(cutdate[4:6])
    day = int(cutdate[6:8])
    hour = int(cutdate[8:10])
    minute = int(cutdate[10:12])
    sec = int(cutdate[12:14])
    date = datetime(year, month, day, hour, minute, sec)
    title = programme.getElementsByTagName("title")[0]
    print("id:%s, title:%s, starts:%s" %
          (cid, getNodeText(title), starts))
    print(date)

Python normally refers to arrays as lists and it looks like what you want is a list of lists (there's an array module and the whole numpy extension with its own arrays, but it doesn't look like you want that:-).
So start the desired list as empty:
results = []
and where you now just print things, append them to the list:
results.append([cid, getNodeText(title), date])
(or whatever -- your indentation is so rambling it would cause tons of syntax errors in Python and confuses me about what exactly you want:-).
Now for the part "I will need to parse that array pulling out all titles and dates with the same date", just sort the results by date:
import operator
results.sort(key=operator.itemgetter(2))
then group by that:
import itertools
for date, items in itertools.groupby(results, operator.itemgetter(2)):
    print(date, [it[1] for it in items])
or whatever else you want to do with this grouping.
You could improve this style in many ways but this does appear to give you the key functionality you're asking for.
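One caveat with the snippets above: item 2 of each row is a full datetime, so groupby only merges rows whose start times match to the second. If "same date" means the same calendar day, the key can be reduced to the date part. A minimal sketch, assuming results holds [channel, title, datetime] rows as built above; the sample rows and the helper name day_of are made up for illustration:

```python
import itertools
from datetime import datetime

# Hypothetical sample rows in the [channel, title, start] layout used above.
results = [
    ["ch1", "News", datetime(2015, 3, 2, 18, 0, 0)],
    ["ch2", "Film", datetime(2015, 3, 1, 20, 30, 0)],
    ["ch1", "Weather", datetime(2015, 3, 2, 18, 30, 0)],
]

def day_of(row):
    """Return only the calendar day of a row's start time."""
    return row[2].date()

# groupby only merges adjacent rows, so sort by the same key first.
results.sort(key=day_of)

# Build {calendar day: [titles shown that day]}.
titles_by_day = {
    day: [row[1] for row in rows]
    for day, rows in itertools.groupby(results, key=day_of)
}

for day, titles in titles_by_day.items():
    print(day, titles)
```

Sorting and grouping with the same key function matters here: if the list were sorted by full datetime but grouped by day it would still work, but grouping an unsorted list would split one day into several groups.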

Related

Editing String Objects in a List in Python

I have read in data from a basic txt file. The data is time and date in the form "DD/HHMM" (meteorological date and time data). I have read this data into a list: time[]. It prints out as you would imagine, like so: ['15/1056', '15/0956', '15/0856', .........]. Is there a way to alter the list so that it ends up holding just the time, basically removing the date and the forward slash, like so: ['1056', '0956', '0856',.........]? I have already tried list.split, but I don't think that's how it works. Thanks.
I'm still learning myself and I haven't touched Python in some time, but here is my solution if you really need one:
myList = ['15/1056', '15/0956', '15/0856']
newList = []
for x in myList:
    # x.split("/") splits at '/' and returns e.g. ["15", "1056"];
    # append whatever is at index 1
    newList.append(x.split("/")[1])
print(newList)  # for verification
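The same loop can also be written as a list comprehension. A small sketch; the names times and times_only are made up here, and str.partition is swapped in for split so that an entry without a slash yields an empty string instead of raising an IndexError:

```python
times = ['15/1056', '15/0956', '15/0856']

# partition("/") returns (before, "/", after); index 2 is the HHMM part.
times_only = [entry.partition("/")[2] for entry in times]

print(times_only)
```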

Parsing and arranging text in python

I'm having some trouble figuring out the best implementation.
I have data in a file in this format:
|serial #|machine_owner|machine_name|
If a machine_owner has multiple machines, I'd like the machines displayed in a comma-separated list in the field. For example, this:
|1234|Fred Flinstone|mach1|
|5678|Barney Rubble|mach2|
|1313|Barney Rubble|mach3|
|3838|Barney Rubble|mach4|
|1212|Betty Rubble|mach5|
Looks like this:
|Fred Flinstone|mach1|
|Barney Rubble|mach2,mach3,mach4|
|Betty Rubble|mach5|
Any hints on how to approach this would be appreciated.
You can use a dict as a temporary container to group by name and then print it in the desired format:
import re
s = """|1234|Fred Flinstone|mach1|
|5678|Barney Rubble|mach2|
|1313|Barney Rubble||mach3|
|3838|Barney Rubble||mach4|
|1212|Betty Rubble|mach5|"""
results = {}
for line in s.splitlines():
    _, name, mach = re.split(r"\|+", line.strip("|"))
    if name in results:
        results[name].append(mach)
    else:
        results[name] = [mach]
for name, mach in results.items():
    print(f"|{name}|{','.join(mach)}|")
Another approach: store the owner names (the second field) in a list. Each time you are about to append a name, check that it is not already in the list, so that each owner appears only once.
Keep these in a list called data, one entry per name. Then iterate over the entries and use this call:
data[i].append([])
to add an empty list after the name stored at index i.
Once you're done, iterate over the names, find them in the file, and append each matching machine (the third field) to that owner's list.
All of this can be done in 2 steps.
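One possible reading of those two steps, as a rough sketch rather than the answerer's actual code. The function name group_machines and the [owner, [machines]] layout of each entry are assumptions made for illustration:

```python
rows = [
    "|1234|Fred Flinstone|mach1|",
    "|5678|Barney Rubble|mach2|",
    "|1313|Barney Rubble|mach3|",
    "|3838|Barney Rubble|mach4|",
    "|1212|Betty Rubble|mach5|",
]

def group_machines(lines):
    """Return a list of [owner, [machines]] entries in first-seen order."""
    data = []

    # Step 1: add each owner once, followed by an empty list for machines.
    for line in lines:
        _, owner, _ = line.strip("|").split("|")
        if all(entry[0] != owner for entry in data):
            data.append([owner])
            data[-1].append([])

    # Step 2: walk the lines again and attach each machine to its owner.
    for line in lines:
        _, owner, machine = line.strip("|").split("|")
        for entry in data:
            if entry[0] == owner:
                entry[1].append(machine)
    return data

grouped = group_machines(rows)
for owner, machines in grouped:
    print(f"|{owner}|{','.join(machines)}|")
```

The nested loops make this slower than the dict version on large files, since every line is compared against every known owner.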

Python - Getting Items From List Using Day Name

I am trying to make a program that gets the current day and works out which lessons I have.
import datetime
def getdate():
    now = datetime.datetime.now()
    print(now.strftime("%A"))
day=getdate()
##LESSON LIST###
################
Lessons = [
    Monday=['English','Geography','German','P.E.','Science-C']
    Tuesday=['Art','Science-B','Maths','ICT','French']
    Wednesday=['History','English','Drama','Science-B','Maths']
    Thursday=['P.E.','D&T','HTT','Geography','R.E.']
    Friday=['German','D&T','Maths','English','Music']
]
################
#END LESSON LIST
Today = Lessons[day]
print("1) Book Check")
print("2) Timetable List")
x = input()
if x = 1:
    #List lessons for this day one by one with a book input eg.
    print("book for lesson1")
    l1 = bool(input("True/False"))
    print("book for lesson2")
    l2 = bool(input("True/False"))
    #but it should say the lesson name, and save the state of book boolean
elif x = 2:
    #list lessons for this day
    print(Today) # just an example.
Currently I get a syntax error that I cannot fix, and I can't find where I have gone wrong. I would like to use a dictionary to complete my code, but I am unsure how to.
First you have a major error in how you are trying to form a dict. It should be this:
Lessons = {
    'Monday':['English','Geography','German','P.E.','Science-C'],
    'Tuesday':['Art','Science-B','Maths','ICT','French'],
    'Wednesday':['History','English','Drama','Science-B','Maths'],
    'Thursday':['P.E.','D&T','HTT','Geography','R.E.'],
    'Friday':['German','D&T','Maths','English','Music']
}
This is a dictionary called Lessons that has strings as keys (the days of the week) and lists as values (the lists of lessons). To access a list of lessons, index the dictionary by day:
Lessons['Monday']
Note that this returns a list. If you want it formatted differently, you can do something like this:
", ".join(Lessons['Monday'])
This will give you a comma-separated list of lessons.
I am unsure what exactly you are trying to do with books, but if you want to be more specific I will update my answer. However, I can say that if you will be running this program each day, then you will need to store information about the books in a file to maintain their state, otherwise it will be lost when the program ends.
Also, by convention variable names are lowercase (lessons instead of Lessons), but I kept your naming to be consistent.
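Two other things in the question's code would also stop it from working. getdate() prints the day name but returns nothing, so day ends up as None; and `if x = 1` uses assignment where a comparison (==) is needed. input() also returns a string, so the menu choice has to be compared with "1", not 1. A minimal sketch of how the pieces could fit together, assuming a trimmed two-day timetable and hypothetical helper names (get_day_name, lessons_for, handle_choice):

```python
import datetime

# Trimmed timetable for illustration; the full dict works the same way.
lessons = {
    'Monday': ['English', 'Geography', 'German', 'P.E.', 'Science-C'],
    'Tuesday': ['Art', 'Science-B', 'Maths', 'ICT', 'French'],
}

def get_day_name(when=None):
    """Return the weekday name (e.g. 'Monday') instead of printing it."""
    when = when or datetime.datetime.now()
    return when.strftime("%A")

def lessons_for(day_name):
    """Look up the day's lessons; days not in the dict give an empty list."""
    return lessons.get(day_name, [])

def handle_choice(choice, day_name):
    """Return the lines to print for a menu choice given as a string."""
    today = lessons_for(day_name)
    if choice == "1":
        # One prompt per lesson, using the lesson's name.
        return ["book for " + lesson for lesson in today]
    elif choice == "2":
        return [", ".join(today)]
    return ["Unknown option"]

if __name__ == "__main__":
    for line in handle_choice("2", get_day_name()):
        print(line)
```

Keeping input() and print() out of the helper functions makes them easy to check without typing at a prompt. Note too that bool(input(...)) is True for any non-empty text, including "False", so a book check would need an explicit comparison such as answer == "True".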

Web2py comparing part of a request.vars element

I have a form with a table with rows containing SELECTs with _names with IDs attached, like this:
TD_list.append(TD(SELECT(lesson_reg_list, _name='lesson_reg_' + str(student[4]))))
When the form is submitted I want to extract both the student[4] value and the value held by request.vars.lesson_reg_student[4].
I've tried something like:
for item in request.vars:
    if item[0:9] == "lesson_reg":
        enrolment_id = int(item[10:])
        code = request.vars.item
I also tried treating request.vars like a dictionary by using:
for key, value in request.vars:
    if key[0:9] == "lesson_reg":
        enrolment_id = int(key[10:])
        code = value
but then I got 'too many values to unpack'. How do I retrieve the value of a request.vars item when the last part of its name could be any number, plus a substring of the item name itself?
Thanks in advance for helping me.
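In Python, when slicing, the ending index is excluded, so your first slice should be 0:10. The ID starts at index 11 (after the underscore), so the second slice should be 11:. To simplify, you can also use .startswith for the first one. Also note that request.vars.item looks up a variable literally named "item"; to use the loop variable as the key, index with request.vars[item]:
for item in request.vars:
    if item.startswith('lesson_reg'):
        enrolment_id = int(item[11:])
        code = request.vars[item]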
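The "too many values to unpack" error in the question comes from iterating over the mapping directly, which yields only the keys; key/value pairs come from .items(). A minimal sketch using a plain dict as a stand-in for request.vars (to my understanding web2py's Storage is a dict subclass, so the same calls should apply there, but treat that as an assumption). The sample values and the name extract_enrolments are made up:

```python
PREFIX = "lesson_reg_"

def extract_enrolments(form_vars):
    """Map each enrolment ID to the value submitted for its SELECT."""
    enrolments = {}
    for key, value in form_vars.items():
        if key.startswith(PREFIX):
            # Everything after the prefix is the numeric ID.
            enrolment_id = int(key[len(PREFIX):])
            enrolments[enrolment_id] = value
    return enrolments

# Plain dict standing in for request.vars; the values are hypothetical.
sample_vars = {"lesson_reg_12": "A", "lesson_reg_7": "B", "comment": "hello"}
print(extract_enrolments(sample_vars))
```

Slicing with len(PREFIX) avoids hard-coding 11, so the code keeps working if the prefix is renamed.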

Python extract substring with location of field and symbols

I have been trying to clean a field in a csv file. The field is populated with numbers and characters, which I read into a pandas DataFrame and convert to a string.
The goal is to extract the following variables from the long string: StopId, StopCode (there may be several for each record), rte and routeId. Here is what I have attempted so far.
After extracting the variables listed above, I need to merge the variable/codes with another file with location data per each stop/route/rte.
Sample records for the FIELD:
'Web Log: Page generated Query [cid=SM&rte=50183&dir=S&day=5761&dayid=5761&fst=0%2c&tst=0%2c]'
'Web Log: Page generated Query: [_=1407744540393&agencyId=SM&stopCode=361096&rte=7878%7eBus%7e251&dir=W]'
Web Log: Page generated Query: [_=1407744956001&agencyId=AC&stopCode=55451&stopCode=55452stopCode=55489&&rte=43783%7eBus%7e88&dir=S]
The solutions I tried are below, but I am stuck! Advice and recommendations are appreciated.
# Idea 1: Splits field above in a loop by '&' into a list. This is useful but I'll
# have to write additional code to pull out relevant variables
i = 0
for t in data['EVENT_DESCRIPTION']:
    s = list(t.split('&'))
    data['STOPS'][i] = [ x for x in s if "Web Log" not in x ]
    i+=1
# Idea 1 next step help - how to pull out necessary variables from the list in data['STOPS']
# Idea 2: Loop through field with string to find the start and end of variable names. The output for stopcode_pl (and the other variables) is a tuple or list of tuples (if there is more than one in the string)
for i in data['EVENT_DESCRIPTION']:
    stopcode_pl = [(a.start(), a.end() ) for a in list(re.finditer('stopCode=', i))]
    stopid_pl = [(a.start(), a.end() ) for a in list(re.finditer('stopId=', i))]
    rte_pl = [(a.start(), a.end() ) for a in list(re.finditer('rte=', i))]
    routeid_pl = [(a.start(), a.end() ) for a in list(re.finditer('routeId=', i))]
# Idea 2 next step help - how to use the string location for variable names to pull the number of the relevant variable. Is there a trick to grab the characters in between the variable name last place (i.e. after the '=' of the variable name) and the next '&'?
This function
def qdata(rec):
    return [tuple(item.split('=')) for item in rec[rec.find('[')+1:rec.find(']')].split('&')]
yields, for instance, on the first record:
[('cid', 'SM'), ('rte', '50183'), ('dir', 'S'), ('day', '5761'), ('dayid', '5761'), ('fst', '0%2c'), ('tst', '0%2c')]
You can then step across that list searching for your specific items.
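Since the bracketed part is a URL query string, the standard library's urllib.parse.parse_qs is another option: it decodes percent-escapes such as %7e and collects repeated keys (for example several stopCode values) into a list. A sketch, assuming every record contains one [...] section; the helper name query_params is made up:

```python
from urllib.parse import parse_qs

def query_params(rec):
    """Return the bracketed query string of a log record as {name: [values]}."""
    query = rec[rec.find('[') + 1:rec.find(']')]
    return parse_qs(query)

# Second sample record from the question.
rec = ('Web Log: Page generated Query: '
       '[_=1407744540393&agencyId=SM&stopCode=361096&rte=7878%7eBus%7e251&dir=W]')
params = query_params(rec)

print(params['stopCode'])          # every stopCode value, as a list
print(params['rte'])               # percent-escapes are decoded
print(params.get('routeId', []))   # absent keys: fall back to an empty list
```

With pandas, something like data['EVENT_DESCRIPTION'].apply(query_params) should give one such dict per row, from which the wanted keys can be pulled into their own columns before merging. Malformed records, such as one with a missing '&' between two stopCode entries, would still need cleaning first.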
