XPath expression, format method, using variables outside for loop, Python

I am using XPath and lxml to parse an XML file. I use format to increment the value in the item element, item[{}], so that I can get data from the first element, the 2nd, and so on. I loop based on the number of times the //items/item element appears in the XML file, which works. When I get to the last item in the tuple, I increase z by 1.
QUESTION:
I can't figure out how to assign the tuple outside this for loop and still use format to increase the value of z. If I define the tuple outside the for loop, z always equals 1. It works as shown, but I don't like it.
x = 0
y = 1
z = 1
for field in xrange(1, 3):
    dataFields = \
        (("itemType" , "//items/item[{}]//@itemType".format(z)),
         ("itemClass" , "//items/item[{}]//@itemClass".format(z)),)
    currentData = dataFields[x][y]
    try:
        r = tree.xpath(currentData)[0]
    except:
        r = ""
    f = open(dataOutputFile, 'a')
    f.write(str(r) + "\t")
    f.close()
    x += 1
    if "itemClass" in currentData:
        z += 1
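One way to move the tuple out of the loop, sketched under the assumption that the attribute selectors are `@itemType` and `@itemClass`: store the XPath strings as templates with an unfilled `{}` and call `.format(z)` only where each query is actually used. The names `FIELD_TEMPLATES` and `build_queries` are hypothetical, and lxml is only mentioned in comments because this sketch does not parse a real file.

```python
# Templates are defined once, outside any loop; "{}" is filled in later.
FIELD_TEMPLATES = (
    ("itemType", "//items/item[{}]//@itemType"),
    ("itemClass", "//items/item[{}]//@itemClass"),
)


def build_queries(item_count):
    """Return (field name, XPath) pairs for items 1..item_count (XPath positions are 1-based)."""
    queries = []
    for z in range(1, item_count + 1):
        for name, template in FIELD_TEMPLATES:
            # format() runs here, so each query picks up the current value of z
            queries.append((name, template.format(z)))
    return queries


for name, query in build_queries(2):
    print(name, query)
```

With lxml, each query would then be run as `tree.xpath(query)`, taking the first match if there is one; `int(tree.xpath("count(//items/item)"))` is one way to obtain `item_count`.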

Related

From a tuple list, Insert tuples Index[0] and Index[1] into a function

Good morning to all,
The objective is to create a series of new columns by inserting x and y into the df[f'sma_{x}_Vs_sma_{y}'] expression.
The problem I'm having is that only the last tuple's values make it into the function, and therefore into the data frame, as you can see in the last image.
The second part of the code shows 3 examples of how the tuple values must be plugged into the function. In the examples I use the first 2 tuples, (10,11) and (10,12), and the last tuple, (13,14).
Code:
a = list(combinations(range(10, 15),2))
print(a)
for index, tuple in enumerate(a):
    x = tuple[0]
    y = tuple[1]
    print(x, y)
df[f'sma_{x}_Vs_sma_{y}'] = np.where(ta.sma(df['close'], lenght = x) > ta.sma(df['close'], lenght = y),1,-1)
Code Examples:
Tuple (10,11)
df[f'sma_{10}_Vs_sma_{11}'] = np.where(ta.sma(df['close'], lenght = 10) > ta.sma(df['close'], lenght = 11),1,-1)
Tuple (10,12)
df[f'sma_{10}_Vs_sma_{12}'] = np.where(ta.sma(df['close'], lenght = 10) > ta.sma(df['close'], lenght = 12),1,-1)
Tuple (13,14)
df[f'sma_{13}_Vs_sma_{14}'] = np.where(ta.sma(df['close'], lenght = 13) > ta.sma(df['close'], lenght = 14),1,-1)
The next lines show the code that solved the issue. Although it seems very easy in hindsight, it took me some time to get to the answer.
Thanks to the people who commented on the issue.
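from itertools import combinations
import numpy as np
import pandas_ta as ta
a = list(combinations(range(5, 51), 2))
print(a)
for x, y in a:
    # pandas_ta's keyword is 'length' (the misspelled 'lenght' would not be recognized)
    df[f'hma_{x}_Vs_hma_{y}'] = np.where(ta.hma(df['close'], length=x) > ta.hma(df['close'], length=y), 1, -1)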
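To show why the assignment has to sit inside the loop body, here is a minimal, dependency-free sketch of the same pattern. It rests on stated assumptions: the hypothetical `sma` helper stands in for `ta.sma`, and plain lists plus a dict stand in for the DataFrame, so the column-building logic can run on its own.

```python
from itertools import combinations


def sma(values, length):
    """Simple moving average; None until `length` values exist.

    Hypothetical stand-in for ta.sma so this sketch runs without pandas_ta.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < length:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - length:i + 1]) / length)
    return out


def crossover_columns(close, start, stop):
    """Build one column per (x, y) pair, like the df[f'sma_{x}_Vs_sma_{y}'] loop."""
    columns = {}
    for x, y in combinations(range(start, stop), 2):  # unpack each tuple directly
        fast, slow = sma(close, x), sma(close, y)
        # 1 where the shorter SMA is above the longer one, otherwise -1
        # (np.where(a > b, 1, -1) also gives -1 while either side is still NaN)
        columns[f"sma_{x}_Vs_sma_{y}"] = [
            1 if f is not None and s is not None and f > s else -1
            for f, s in zip(fast, slow)
        ]
    return columns


print(crossover_columns([1, 2, 3, 4, 5, 6], 2, 5))
```

Because the assignment happens once per `(x, y)` pair, every combination gets its own column; moving it below the loop would keep only the last pair.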

Find max and extract data from a list

I have a text file with car prices and their serial numbers; there are 50 lines in this file. I would like to find the maximum car price and its serial number for every 10 lines.
priceandserial.txt
102030 4000.30
102040 5000.40
102080 5500.40
102130 4000.30
102140 5000.50
102180 6000.50
102230 2000.60
102240 4000.30
102280 6000.30
102330 9000.70
102340 1000.30
102380 3000.30
102430 4000.80
102440 5000.30
102480 7000.30
When I tried NumPy's max function (np.max), I got 102480 as the max value, because np.max looks at every value in the array, serial numbers included.
x = np.loadtxt('carserial.txt', unpack=True)
print('Max:', np.max(x))
Desired result:
102330 9000.70
102480 7000.30
There are 50 lines in the file, so I should get a 5-line result with the serial number and maximum price for each block of 10 lines.
Respectfully, I think the first solution is over-engineered. You don't need numpy or math for this task, just a dictionary. As you loop through, you update the dictionary if the latest value is greater than the current value, and do nothing if it isn't. Every 10th item, you append the values from the dictionary to an output list and reset the buffer.
with open('filename.txt', 'r') as opened_file:
    data = opened_file.read()
# splitlines() plus the blank-line filter avoids a crash on a trailing newline
rowsplitdata = [row for row in data.splitlines() if row.strip()]
colsplitdata = [u.split() for u in rowsplitdata]
x = [[int(j[0]), float(j[1])] for j in colsplitdata]
output = []
buffer = {"max": 0, "index": 0}
count = 0
# this assumes x is a list of lists, not a numpy array
for u in x:
    count += 1
    if u[1] > buffer["max"]:
        buffer["max"] = u[1]
        buffer["index"] = u[0]
    if count == 10:
        output.append([buffer["index"], buffer["max"]])
        buffer = {"max": 0, "index": 0}
        count = 0
# append the remainder of the buffer only if the final pass didn't reach ten
if count:
    output.append([buffer["index"], buffer["max"]])
print(output)
[[102330, 9000.7], [102480, 7000.3]]
You should iterate over it and for each 10 lines extract the maximum:
import math
import numpy as np

x = np.loadtxt('carserial.txt')  # one row per line: [serial, price]
# New empty list for collecting the results
max_list = []
# iterate through x in blocks of 10 rows
for i in range(math.ceil(len(x) / 10)):
    block = x[i * 10:(i + 1) * 10]  # the last block may hold fewer than 10 rows
    # keep the whole row (serial and price) with the highest price
    max_list.append(block[np.argmax(block[:, 1])])
This should do your job.
number_list = [[], []]
with open('filename.txt', 'r') as opened_file:
    for line in opened_file:
        if len(line.split()) == 0:
            continue  # skip blank lines
        a, b = line.split()
        # convert so that max() compares numbers rather than strings
        number_list[0].append(int(a))
        number_list[1].append(float(b))
col1_max, col2_max = max(number_list[0]), max(number_list[1])
print(col1_max, col2_max)
Just change the filename. col1_max, col2_max have the respective column's max value. You can edit the code to accommodate more columns.
You can transpose your input first, then use np.split and for each submatrix you calculate its max.
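import numpy as np

x = np.genfromtxt('carserial.txt', unpack=True).T
print(x)
# split before rows 10, 20, ...; the last submatrix keeps any remainder
for submatrix in np.split(x, list(range(10, len(x), 10))):
    print(max(submatrix, key=lambda l: l[1]))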
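For comparison, a standard-library-only sketch that groups the parsed rows into blocks of 10 and lets `max` pick the row with the highest price via its `key=` argument. The function name `max_per_block` is hypothetical, and it assumes each non-blank line holds a serial number and a price separated by whitespace.

```python
def max_per_block(lines, block_size=10):
    """Return the (serial, price) row with the highest price in each block of lines."""
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        rows.append((int(parts[0]), float(parts[1])))
    result = []
    # step through the rows block_size at a time; the last block may be shorter
    for start in range(0, len(rows), block_size):
        block = rows[start:start + block_size]
        result.append(max(block, key=lambda row: row[1]))
    return result
```

Usage with the file from the question: `with open('priceandserial.txt') as f: print(max_per_block(f))`. Passing the file object works because iterating a file yields its lines.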

How to separate different input formats from the same text file with Python

I'm new to programming and Python, and I'm looking for a way to distinguish between two input formats in the same text file. For example, let's say I have an input file like this, where values are comma-separated:
5
Washington,A,10
New York,B,20
Seattle,C,30
Boston,B,20
Atlanta,D,50
2
New York,5
Boston,10
The format is N followed by N lines of Data1, then M followed by M lines of Data2. I tried opening the file, reading it line by line and storing it in a single list, but I'm not sure how to go about producing 2 lists for Data1 and Data2, such that I would get:
Data1 = ["Washington,A,10", "New York,B,20", "Seattle,C,30", "Boston,B,20", "Atlanta,D,50"]
Data2 = ["New York,5", "Boston,10"]
My initial idea was to iterate through the list until I found an integer i, remove the integer from the list and continue for the next i iterations all while storing the subsequent values in a separate list, until I found the next integer and then repeat. However, this would destroy my initial list. Is there a better way to separate the two data formats in different lists?
You could use itertools.islice and a list comprehension:
from itertools import islice
string = """
5
Washington,A,10
New York,B,20
Seattle,C,30
Boston,B,20
Atlanta,D,50
2
New York,5
Boston,10
"""
result = [[x for x in islice(parts, idx + 1, idx + 1 + int(line))]
          for parts in [string.split("\n")]
          for idx, line in enumerate(parts)
          if line.isdigit()]
print(result)
This yields
[['Washington,A,10', 'New York,B,20', 'Seattle,C,30', 'Boston,B,20', 'Atlanta,D,50'], ['New York,5', 'Boston,10']]
For a file, you need to change it to:
with open("testfile.txt", "r") as f:
    result = [[x for x in islice(parts, idx + 1, idx + 1 + int(line))]
              for parts in [f.read().split("\n")]
              for idx, line in enumerate(parts)
              if line.isdigit()]
print(result)
You're definitely on the right track.
If you want to preserve the original list here, you don't actually have to remove integer i; you can just go on to the next item.
Code:
originalData = []
formattedData = []
with open("data.txt", "r") as f:
    f = list(f)
originalData = f
i = 0
while i < len(f):  # Iterate through every line
    try:
        n = int(f[i])  # See if line can be cast to an integer
        originalData[i] = n  # Change string to int in original
        formattedData.append([])
        for j in range(n):
            i += 1
            item = f[i].replace('\n', '')
            originalData[i] = item  # Remove newline char in original
            formattedData[-1].append(item)
    except ValueError:
        print("File has incorrect format")
    i += 1
print(originalData)
print(formattedData)
The following code will produce a list, results, which is equal to [Data1, Data2].
The code assumes that the number of entries specified is exactly the number of lines that follow. That means it will not work for a file like this:
2
New York,5
Boston,10
Seattle,30
The code:
# get the data from the text file
with open('filename.txt', 'r') as file:
    lines = file.read().splitlines()
results = []
index = 0
while index < len(lines):
    # Find the start and end values.
    start = index + 1
    end = start + int(lines[index])
    # Everything from the start up to and excluding the end index gets added
    results.append(lines[start:end])
    # Update the index
    index = end
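Another option, sketched here under stated assumptions rather than as a drop-in answer, is to walk a single iterator over the lines: the `for` loop reads each count line, and `itertools.islice` consumes exactly that many following lines from the same iterator. `split_sections` is a hypothetical name; unlike the loop above, this version skips blank lines between blocks and raises `ValueError` when the counts don't match the file.

```python
from itertools import islice


def split_sections(lines):
    """Split 'N, then N lines' blocks into a list of lists of lines."""
    it = (line.strip() for line in lines)
    sections = []
    for header in it:
        if not header:
            continue  # skip blank lines around blocks
        count = int(header)  # ValueError if a data line appears where a count should be
        section = list(islice(it, count))  # take exactly `count` lines from the same iterator
        if len(section) != count:
            raise ValueError(f"expected {count} lines after {header!r}, got {len(section)}")
        sections.append(section)
    return sections
```

Usage: `with open('filename.txt') as f: Data1, Data2 = split_sections(f)`. For the malformed example above, the extra `Seattle,30` line is read as a count, so `int()` raises `ValueError` instead of silently producing a wrong result.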

Python for loop using range results in the index being printed rather than the value

I'm looping over the output below, from a difflib compare of two configuration files:
[server]
+ # web-host-name = www.myhost.com
+ https-port = 1080
+ network-interface = 0.0.0.0
[process-root-filter]
[validate-headers]
[interfaces]
[header-names]
[oauth]
[tfim-cluster:oauth-cluster]
[session]
+ preserve-inactivity-timeout = 330
[session-http-headers]
What I'm trying to achieve is to parse the diff and only print out headers (items in []) for which the next item in the list starts with +.
I have the following code that is running without errors.
for x in range(0, (len(diff) - 1)):
    print str(x) #prints index number instead of the content of the line
    z = str(diff)[x+1]
    if str(x).startswith('+'):
        print(x) # prints nothing x contains the index of the line instead of the text
    elif str(x).startswith(' [') and z.startswith('+'):
        print(x)
The problem is that the index number of the line is being returned in the loop rather than the text in the line.
e.g. the print output is:
[0]
[1]
[2]
[3]
[4]
I know I must be missing something basic here, but I can't seem to find the answer.
for x in range(0, (len(diff) - 1)):
    print str(x) #prints index number instead of the content of the line
The reason the index is being printed here is that x is not iterating over the contents of diff; it's iterating over a range of integers whose length matches diff. You already have the pattern for getting the content on the very next line, where you index into diff: z = str(diff)[x+1]. (Note that str(diff)[x+1] indexes into the string form of the whole list and returns a single character; diff[x+1] is what you want there.)
Calling diff[x] refers to the line of diff at index x, so if you want to print diff's content or refer to it in later lines you need to do the same thing: print str(diff[x])
What you are doing is looping x over range(0, len(diff) - 1).
This gives you every integer from the start value up to, but not including, the end value, e.g.
for x in range(0, 3):
    print(str(x) + ' ')
will give you back:
0
1
2
So if diff is a list of strings, one per line, you can get each line directly:
# iterate through list diff
for x in diff:
    print(x)
That prints all of your lines. If you also want to check whether a line is a header before you print it:
# iterate through list diff
for x in diff:
    # test if x is a header (strip() ignores leading whitespace such as difflib's prefix)
    if x.strip().startswith('[') and x.strip().endswith(']'):
        print(x)
Please note that none of this code has been tested.
Hope this helps
EDIT:
if diff is not a list of lines but rather one single string, you can use
line_list = diff.split('\n')
to get a list of lines.
EDIT 2:
If you now want to also check the next line within the first iteration, we have to use indexes instead:
# for every index in list diff
for i in range(0, len(diff) - 1):
    if diff[i].startswith('[') and diff[i + 1].startswith('+'):
        # do something if it's a header with following content
        print(diff[i])
    elif diff[i].startswith('+'):
        # do something if the line is data
        print(diff[i])
Thanks to both @pheonix and @rdowell for the comments above. I've updated the code to the following, which now works:
for i in range(0, len(diff) - 1):
    if diff[i].startswith(' [') and diff[i + 1].startswith('+'):
        # do something if its a header with following content
        print str(diff[i])
    elif diff[i].startswith('+'):
        # do something if the line is data
        print str(diff[i])
This post will be something I'll refer back to, as I'm just getting to know what you can and can't do with the different object types in Python.
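A sketch of the same check without index arithmetic, assuming `diff` is a list of lines in the style shown in the question (unchanged lines indented, added lines starting with `+`, as `difflib.ndiff` produces). `zip(diff_lines, diff_lines[1:])` pairs each line with the one after it; `headers_with_additions` is a hypothetical name.

```python
def headers_with_additions(diff_lines):
    """Return the [section] headers whose next line is an added ('+') line."""
    found = []
    # zip pairs each line with the following one and stops at the shorter sequence,
    # so the last line is never paired with a nonexistent neighbour
    for current, following in zip(diff_lines, diff_lines[1:]):
        header = current.strip()
        if header.startswith("[") and header.endswith("]") and following.startswith("+"):
            found.append(header)
    return found
```

If `diff` comes straight from `difflib.ndiff`, which returns a generator, wrap it in `list()` first so it can be sliced.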

Python: Change range location with loop

I would like my program to print every other letter in the string "welcome".
like:
e
c
m
Here is the code I have so far:
stringVar = "welcome"
countInt = 7
count = 0
oneVar = 1
twoVar = 2
showVar = stringVar[oneVar:twoVar]
for count in range(countInt):
    count = count + 1
    oneVar = oneVar + count
    twoVar = twoVar + count
    print(showVar)
However, it only shows the 2nd letter, "e".
How can I get the variables oneVar and twoVar to update so that the range changes for the duration of the loop?
There is a built in notation for this, called "slicing":
>>> stringVar = "welcome"
>>> print(stringVar[::2])
wloe
>>> print(stringVar[1::2])
ecm
Strings are sequences like lists, so they support the same slice notation, [start : end : step]. Leaving any part blank uses its default (for a positive step): 0 for start, len(stringVar) for end, and 1 for step. For more detail, read the linked post.
A more complex way of doing the same thing would be:
string_var = "welcome"
# 'enumerate' provides an index for each character and 'start' sets the starting index
for index, character in enumerate(string_var, start=1):
    if index % 2 == 0:
        print(character)
Why it's not working in your snippet:
Even though you increase oneVar and twoVar inside the loop, showVar never changes: it is computed only once, before the loop, as stringVar[1:2], which is "e" (the character at index 1 of "welcome"). Changing oneVar and twoVar afterwards does not re-run that slice; you would have to recompute showVar inside the loop.
To fix your snippet, you can try this:
stringVar = "welcome"
countInt = 7
for count in range(1, countInt, 2):
    print(stringVar[count])
Output:
e
c
m
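If you want to keep the original two-index idea, the key change is to rebuild the slice inside the loop on every pass. A minimal sketch; the function name `every_other_letter` and its `start` parameter are hypothetical.

```python
def every_other_letter(text, start=1):
    """Collect text[start], text[start + 2], ... using a moving one-character slice."""
    letters = []
    one_var, two_var = start, start + 1
    while one_var < len(text):
        # the slice is rebuilt here, so it follows the updated indices
        letters.append(text[one_var:two_var])
        one_var += 2
        two_var += 2
    return letters


for letter in every_other_letter("welcome"):
    print(letter)
```

This prints e, c and m on separate lines; `stringVar[1::2]` does the same job in one expression.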
