Read lines from GitHub user content file - python

I'm trying to read a text file (a proxy list) from GitHub user content. The code should return a random line, but it doesn't work as expected.
My code:
res = reqs.get('https://raw.githubusercontent.com/clarketm/proxy-list/master/proxy-list-raw.txt', headers={'User-Agent': 'Mozilla/5.0'})
proxies = []
for lines in res.text:
    proxies = ''.join(lines)
    print proxies
return proxies
Here is what I get:
.
2
1
:
8
0
8
0
1
9
2
.
1
6
2
.
6
2
.
1
9
7
:
5
9
2
4
6
Here is what is expected:
178.217.106.245:8080
186.192.98.250:8080
If a random line could be returned, that would be even better.

res.text is a string, and iterating over a string iterates over characters, not lines.
You'll have to split the string by newlines and iterate over that:
for line in res.text.split('\n'):
    ...
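Putting it together with the random-line request from the question, here is a minimal sketch. It assumes `reqs` in the original code was `import requests as reqs`; the parsing is separated into its own function so it can be tested without a network call.

```python
import random

def pick_random_proxy(text):
    """Return one random non-empty line from the raw proxy-list text."""
    # splitlines() handles both \n and \r\n line endings
    proxies = [line for line in text.splitlines() if line.strip()]
    return random.choice(proxies)

# Usage (assumes `import requests as reqs`):
# res = reqs.get('https://raw.githubusercontent.com/clarketm/proxy-list/master/proxy-list-raw.txt',
#                headers={'User-Agent': 'Mozilla/5.0'})
# print(pick_random_proxy(res.text))
```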

Related

How can I loop through a textfile but do something different on the first line, Python

I have a text file that looks something like this :
original:
0 1 2 3 4 5
1 3
3 1
5 4
4 5
expected output:
SET : {0,1,2,3,4,5}
RELATION : {(1,3),(3,1),(5,4),(4,5)}
REFLEXIVE : NO
SYMMETRIC : YES
Part of the task is to print the first line inside curly braces as a set, and the rest inside one big set of curly braces with each binary pair in parentheses. I am still a beginner, but I wanted to know: is there some way in Python to make one loop that treats the first line differently from the rest?
Try this, where filename.txt is your file:
with open("filename.txt", "r") as file:
    set_firstline = []
    first_string = file.readline()
    list_of_first_string = list(first_string)
    for i in range(len(list_of_first_string)):
        if str(i) in first_string:
            set_firstline.append(i)
    print(set_firstline)
OUTPUT: [0, 1, 2, 3, 4, 5]
I'm new as well, so I hope this helps.
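Another way to treat the first line differently in a single loop is enumerate. A minimal sketch (the file name relations.txt is assumed, and the REFLEXIVE/SYMMETRIC checks are left out for brevity):

```python
def parse_relations(path):
    """Print the first line as a set and the rest as relation pairs."""
    pairs = []
    with open(path) as f:
        for idx, line in enumerate(f):
            nums = line.split()
            if idx == 0:
                # first line: the set of elements
                print("SET : {%s}" % ",".join(nums))
            else:
                # remaining lines: one pair per line
                pairs.append(tuple(int(n) for n in nums))
    print("RELATION : {%s}" % ",".join("(%d,%d)" % p for p in pairs))
    return pairs
```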

In Python, how can I read a text document line-by-line and print the number of same characters in a row at the end of each line?

I have a program which converts a simple image (black lines on white background) into 2 character ASCII art ("x" is black and "-" is white).
I want to read each line and print the number of identical characters in a row at the end of each line. Do you know how I can do this?
for example:
---x--- 3 1 3
--xxx-- 2 3 2
-xxxxx- 1 5 1
in the top row there are 3 dashes 1 'x' and 3 dashes, and so on.
I would like these numbers to be saved to the ASCII text document.
Thank you!
You can use itertools.groupby:
from itertools import groupby

with open("art.txt", 'r') as f:
    for line in map(lambda l: l.strip(), f):
        runs = [sum(1 for _ in g) for _, g in groupby(line)]
        print(f"{line} {' '.join(map(str, runs))}")
# ---x--- 3 1 3
# --xxx-- 2 3 2
# -xxxxx- 1 5 1
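Since the question also asks for the counts to be saved back to the text document, here is a sketch that rewrites the file in place with the run lengths appended to each line (the file name art.txt is assumed from the example above):

```python
from itertools import groupby

def annotate_runs(path):
    """Append run-length counts to each line of the file and rewrite it."""
    with open(path) as f:
        lines = [line.rstrip("\n") for line in f]
    annotated = []
    for line in lines:
        # one count per run of identical consecutive characters
        runs = [sum(1 for _ in g) for _, g in groupby(line)]
        annotated.append("%s %s" % (line, " ".join(map(str, runs))))
    with open(path, "w") as f:
        f.write("\n".join(annotated) + "\n")
    return annotated
```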

How to split array elements with Python 3

I am writing a script to gather results from an output file of a programme. The file contains headers, captions, and data in scientific format. I only want the data and I need a script that can do this repeatedly for different output files with the same results format.
This is the data:
GROUP 1 2 3 4 5 6 7 8
VELOCITY (m/s) 59.4604E+06 55.5297E+06 52.4463E+06 49.3329E+06 45.4639E+06 41.6928E+06 37.7252E+06 34.9447E+06
GROUP 9 10 11 12 13 14 15 16
VELOCITY (m/s) 33.2405E+06 30.8868E+06 27.9475E+06 25.2880E+06 22.8815E+06 21.1951E+06 20.1614E+06 18.7338E+06
GROUP 17 18 19 20 21 22 23 24
VELOCITY (m/s) 16.9510E+06 15.7017E+06 14.9359E+06 14.2075E+06 13.5146E+06 12.8555E+06 11.6805E+06 10.5252E+06
This is my code at the moment. I want it to open the file, search for the keyword 'INPUT:BETA' which indicates the start of the results I want to extract. It then takes the information between this input keyword and the end identifier that signals the end of the data I want. I don't think this section needs changing but I have included it just in case.
I have then tried to use regex to specify the lines that start with VELOCITY (m/s) as these contain the data I need. This works and extracts each line, whitespace and all, into an array. However, I want each numerical value to be a single element, so the next line is supposed to strip the whitespace out and split the lines into individual array elements.
with open(file_name) as f:
    t = f.read()
t = t[t.find('INPUT:BETA'):]
t = t[t.find(start_identifier):t.find(end_identifier)]

regex = r"VELOCITY \(m\/s\)\s(.*)"
res = re.findall(regex, t)
res = [s.split() for s in res]
print(res)
print(len(res))
This isn't working, here is the output:
[['33.2405E+06', '30.8868E+06', '27.9475E+06', '25.2880E+06', '22.8815E+06', '21.1951E+06', '20.1614E+06', '18.7338E+06'], ['16.9510E+06', '15.7017E+06', '14.9359E+06', '14.2075E+06', '13.5146E+06', '12.8555E+06', '11.6805E+06', '10.5252E+06']]
2
It's taking out the whitespace, but the values end up in nested lists rather than one flat list of separate elements, which I need for the next stage of the processing.
My question is therefore:
How can I extract each value into a separate array element, leaving the rest of the data behind, in a way that will work with different output files with different data?
Here is how you can flatten your list, which is your point 1.
import re
text = """
GROUP 1 2 3 4 5 6 7 8
VELOCITY (m/s) 59.4604E+06 55.5297E+06 52.4463E+06 49.3329E+06 45.4639E+06 41.6928E+06 37.7252E+06 34.9447E+06
GROUP 9 10 11 12 13 14 15 16
VELOCITY (m/s) 33.2405E+06 30.8868E+06 27.9475E+06 25.2880E+06 22.8815E+06 21.1951E+06 20.1614E+06 18.7338E+06
GROUP 17 18 19 20 21 22 23 24
VELOCITY (m/s) 16.9510E+06 15.7017E+06 14.9359E+06 14.2075E+06 13.5146E+06 12.8555E+06 11.6805E+06 10.5252E+06
"""
regex = r"VELOCITY \(m\/s\)\s(.*)"
res = re.findall(regex, text)
res = [s.split() for s in res]
res = [value for lst in res for value in lst]
print(res)
print(len(res))
Your regex isn't what's skipping your first VELOCITY line, though; there must be an error in the rest of your code (most likely the slicing with start_identifier and end_identifier).
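A compact variant that combines the regex extraction, the flattening, and the float conversion that later numeric processing will presumably need (the conversion step is my assumption, not part of the original question):

```python
import re

def extract_velocities(text):
    """Return all values on 'VELOCITY (m/s)' lines as a flat list of floats."""
    values = []
    for line in re.findall(r"VELOCITY \(m/s\)\s+(.*)", text):
        # each captured line holds whitespace-separated scientific-notation numbers
        values.extend(float(v) for v in line.split())
    return values
```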

Reading specific column from file when last few rows are not equivalent in python

I have a problem during the reading of a text file in python. Basically what I need is to get the 4th column in a list.
With this small function I achieve it without any great issues:
def load_file(filename):
    f = open(filename, 'r')
    # skip the first useless row
    line = list(f.readlines()[1:])
    total_sp = []
    for i in line:
        t = i.strip().split()
        total_sp.append(int(t[4]))
    return total_sp
but now I have to handle files whose last row(s) contain stray numbers that don't follow the text format. An example of a problematic text file is:
#generated file
well10_1 3 18 6 1 2 -0.01158 0.01842 142
well5_1 1 14 6 1 2 0.009474 0.01842 141
well4_1 1 13 4 1 2 -0.01842 -0.03737 125
well7_1 3 10 1 1 2 -0.002632 0.009005 101
well3_1 1 10 9 1 2 -0.03579 -0.06368 157
well8_1 3 10 10 1 2 -0.06895 -0.1021 158
well9_1 3 10 18 1 2 0.03053 0.02158 176
well2_1 1 4 4 1 2 -0.03737 -0.03737 128
well6_1 3 4 5 1 2 -0.07053 -0.1421 127
well1_1 -2 3 1 1 2 0.006663 -0.02415 128
1 0.9259
2 0.07407
where 1 0.9259 and 2 0.07407 have to be dumped.
In fact, using the function above with this text file, I get the following error because of the two additional last rows:
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/tmp/tmpqi8Ktw.py", line 21, in load_obh
total_sp.append(int(t[4]))
IndexError: list index out of range
How can I get rid of the last lines in the line variable?
Thanks to all
There are many ways to handle this; one is to catch the IndexError by wrapping the offending code in try/except, something like this:
try:
    total_sp.append(int(t[4]))
except IndexError:
    pass
This appends to total_sp only when the index exists. It also covers any other row that happens to be missing data at that particular index.
Alternatively, if you are interested in removing just the last two rows (elements), you can use the slice operator such as by replacing line = list(f.readlines()[1:]) with line = f.readlines()[1:-2].
f.readlines already returns a list. Just as you provide a start index to slice from, you can specify "2 before the end" using negative indexing as below:
line = f.readlines()[1:-2]
Should do the trick.
EDIT: To handle an arbitrary number of lines at the end:
def load_file(filename):
    f = open(filename, 'r')
    # skip the first useless row
    line = f.readlines()[1:]
    total_sp = []
    for i in line:
        t = i.strip().split()
        # check if enough columns were found
        if len(t) >= 5:
            total_sp.append(int(t[4]))
    return total_sp
There is also a solution specific to your case (the stray trailing rows start with whitespace):
for i in line:
    if not i.startswith(' '):
        t = i.strip().split()
        total_sp.append(int(t[4]))

Alternative to bash (awk command) with python

Context : I run calculations on a program that gives me result files.
On these result files (extension .h5), I can apply a python code (I cannot change this python code) such that it gives me a square matrix :
oneptdm.py resultfile.h5
gives me for example :
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
points groups
1
2
3
...
in a file called oneptdm.dat
I want to grep the diagonal of this matrix. Usually I use simply bash:
awk '{ for (i=0; i<=NF; i++) if (NR >= 1 && NR == i) print i,$(i) }' oneptdm.dat > diagonal.dat
But for reasons outside my control, I now have to do it in Python (version 2.6). How can I do that?
I could of course use subprocess to call awk again, but I would like to know if there is a pure-Python alternative.
The result should be :
(line) (diagonal element)
1 1
2 6
3 11
4 16
You can try something like this (stopping once a line is too short to hold a diagonal element, so the trailing "points groups" block doesn't raise an IndexError):
with open('oneptdm.dat') as f:
    for i, l in enumerate(f):
        row = l.split()
        if i >= len(row):
            # past the square matrix (or a trailing non-matrix line)
            break
        print '%d\t%s' % (i + 1, row[i])
This should do the trick. It does assume that the file begins with a square matrix, and that assumption is used to limit the number of lines read from the file.
with open('oneptdm.dat') as f:
    line = next(f).split()
    for i in range(len(line)):
        print('{0}\t{1}'.format(i + 1, line[i]))
        try:
            line = next(f).split()
        except StopIteration:
            break
Output for your sample file:
1 1
2 6
3 11
4 16
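The same idea can be packaged as a function: read only the leading n×n matrix (n taken from the width of the first row), then return (line number, diagonal element) pairs. The file name oneptdm.dat is taken from the question.

```python
def read_diagonal(path):
    """Return (line number, diagonal element) pairs of the leading square matrix."""
    with open(path) as f:
        first = next(f).split()
        n = len(first)  # the matrix is assumed to be n x n
        # read exactly n rows, ignoring whatever follows the matrix
        rows = [first] + [next(f).split() for _ in range(n - 1)]
    return [(i + 1, rows[i][i]) for i in range(n)]
```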