I have a list of URLs, for example:
urls_list = [
    "http://yandex.ru",
    "http://google.ru",
    "http://rambler.ru",
    "http://google.ru",
    "http://gmail.ru",
    "http://mail.ru"
]
I need to open the CSV file and, for each value in the list, check whether it is already in the file: if it is, skip to the next value; if it is not, append it to the file.
Result: on the first run all lines are added (the file is empty); on the second run nothing happens, because all the elements are already in the file.
I wrote some code, but it works completely incorrectly:
import csv

urls_list = [
    "http://yandex.ru",
    "http://google.ru",
    "http://rambler.ru",
    "http://google.ru",
    "http://gmail.ru",
    "http://mail.ru"
]

with open('urls_list.csv', 'r') as fp:
    for row in fp:
        for url in urls_list:
            if url in row:
                print "YEY!"

with open('urls_list.csv', 'a+') as fp:
    wr = csv.writer(fp, dialect='excel')
    wr.writerow([url])
Read the file into a variable:
with open('urls_list.csv', 'r') as fp:
    s = fp.read()
Check whether each list item is in the file; if not, save it:
missing = []
for url in urls_list:
    if url not in s:
        missing.append(url + '\n')
Write the missing URLs to the file:
if missing:
    with open('urls_list.csv', 'a+') as fp:
        fp.writelines(missing)
Considering your file has only one column, the csv module might be overkill.
Here's a version that first reads all the lines from the file, then reopens it in append mode to write the URLs that are not already there:
lines = open('urls_list.csv', 'r').read()
with open('urls_list.csv', 'a+') as fp:
    for url in urls_list:
        if url in lines:
            print "YEY!"
        else:
            fp.write(url + '\n')
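Note that both answers above match by substring (url in s / url in lines), so a longer stored URL that merely contains one of the list entries would count as a match. A minimal sketch that compares whole lines instead, assuming one URL per line:
# Exact line matching with a set, assuming one URL per line.
with open('urls_list.csv', 'a+') as fp:
    fp.seek(0)  # 'a+' starts positioned at the end; rewind to read existing lines
    existing = set(line.strip() for line in fp)
    for url in urls_list:
        if url not in existing:
            fp.write(url + '\n')  # append mode writes at end of file regardless of read position
            existing.add(url)  # also skips duplicates within urls_list itself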
I'm trying to download files from a site and due to search result limitations (max 300), I need to search each item individually. I have a csv file that has a complete list which I've written some basic code to return the ID# column.
With some help, I've got another script that iterates through each search result and downloads a file. What I need to do now is to combine the two so that it will search each individual ID# and download the file.
I know my loop is messed up here; I just can't figure out where, or whether I'm even looping in the right order.
import requests, json, csv

facilityList = []
with open('Facility List.csv', 'r') as f:
    csv_reader = csv.reader(f, delimiter=',')
    for searchterm in csv_reader:
        facilityList.append(searchterm[0])

url = "https://siera.oshpd.ca.gov/FindFacility.aspx"
r = requests.get(url+"?term="+str(searchterm))
searchresults = json.loads(r.content.decode('utf-8'))
for report in searchresults:
    rpt_id = report['RPT_ID']
    reporturl = f"https://siera.oshpd.ca.gov/DownloadPublicFile.aspx?archrptsegid={rpt_id}&reporttype=58&exportformatid=8&versionid=1&pageid=1"
    r = requests.get(reporturl)
    a = r.headers['Content-Disposition']
    filename = a[a.find("filename=")+9:len(a)]
    file = open(filename, "wb")
    file.write(r.content)
    r.close()
The original code I have is here:
import requests, json

searchterm = "ALAMEDA (COUNTY)"
url = "https://siera.oshpd.ca.gov/FindFacility.aspx"
r = requests.get(url+"?term="+searchterm)
searchresults = json.loads(r.content.decode('utf-8'))
for report in searchresults:
    rpt_id = report['RPT_ID']
    reporturl = f"https://siera.oshpd.ca.gov/DownloadPublicFile.aspx?archrptsegid={rpt_id}&reporttype=58&exportformatid=8&versionid=1&pageid=1"
    r = requests.get(reporturl)
    a = r.headers['Content-Disposition']
    filename = a[a.find("filename=")+9:len(a)]
    file = open(filename, "wb")
    file.write(r.content)
    r.close()
The search term "ALAMEDA (COUNTY)" returns more than 300 results, so I'm trying to replace it with a list that runs through each name (ID# in this case), so that each search returns just one result, and then runs again for the next item on the list.
CSV - just 1 line
Tested with a CSV file with just 1 line:
406014324,"HOLISTIC PALLIATIVE CARE, INC.",550004188,Parent Facility,5707 REDWOOD RD,OAKLAND,94619,1,ALAMEDA,Not Applicable,,Open,1/1/2018,Home Health Agency/Hospice,Hospice,37.79996,-122.17075
Python code
This script reads the IDs from the CSV file. Then it fetches the results from the URL and finally writes the desired contents to disk.
import requests, json, csv

# read IDs from CSV
facilityIds = []
with open('Facility List.csv', 'r') as f:
    csv_reader = csv.reader(f, delimiter=',')
    for searchterm in csv_reader:
        facilityIds.append(searchterm[0])

# fetch and write file contents
url = "https://siera.oshpd.ca.gov/FindFacility.aspx"
for facilityId in facilityIds:
    r = requests.get(url+"?term="+str(facilityId))
    reports = json.loads(r.content.decode('utf-8'))
    # print(f"reports = {reports}")
    for report in reports:
        rpt_id = report['RPT_ID']
        reporturl = f"https://siera.oshpd.ca.gov/DownloadPublicFile.aspx?archrptsegid={rpt_id}&reporttype=58&exportformatid=8&versionid=1&pageid=1"
        r = requests.get(reporturl)
        a = r.headers['Content-Disposition']
        filename = a[a.find("filename=")+9:len(a)]
        # print(f"filename = {filename}")
        with open(filename, "wb") as o:
            o.write(r.content)
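As a side note, slicing the header with a.find("filename=")+9 keeps any quotes around the name. A slightly more defensive sketch (the regex and the fallback name are assumptions, not something the site guarantees):
import re

def filename_from_disposition(header, fallback="download.bin"):
    # Handle both filename="name.ext" and filename=name.ext header forms.
    match = re.search(r'filename="?([^";]+)"?', header)
    return match.group(1) if match else fallback

# usage: filename = filename_from_disposition(r.headers['Content-Disposition'])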
I am unable to write the result of the following code to a file
import boto3

ACCESS_KEY = "XXX"
SECRET_KEY = "XXX"

regions = ['us-east-1','us-west-1','us-west-2','eu-west-1','sa-east-1','ap-southeast-1','ap-southeast-2','ap-northeast-1']

for region in regions:
    client = boto3.client('ec2', aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY, region_name=region)
    addresses_dict = client.describe_addresses()
    #f = open('/root/temps','w')
    for eip_dict in addresses_dict['Addresses']:
        with open('/root/temps', 'w') as f:
            if 'PrivateIpAddress' in eip_dict:
                print eip_dict['PublicIp']
                f.write(eip_dict['PublicIp'])
This results in printing the IPs, but nothing gets written to the file. The output of print is:
22.1.14.1
22.1.15.1
112.121.41.41
....
I just need to write the content to the file in that same format. The relevant part is:
for eip_dict in addresses_dict['Addresses']:
    with open('/root/temps', 'w') as f:
        if 'PrivateIpAddress' in eip_dict:
            print eip_dict['PublicIp']
            f.write(eip_dict['PublicIp'])
You are re-opening the file for writing at each iteration of the loop. Perhaps the last iteration has no members with 'PrivateIpAddress' in its dict, so the file gets opened, truncated, and left empty. Write it this way instead:
with open('/root/temps', 'w') as f:
    for eip_dict in addresses_dict['Addresses']:
        if 'PrivateIpAddress' in eip_dict:
            print eip_dict['PublicIp']
            f.write(eip_dict['PublicIp'] + '\n')  # newline keeps one IP per line
Alternatively, declare the file outside the loop, or open the file in append mode:
with open('/root/temps', 'a') as f:
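A sketch of the append-mode variant (assuming addresses_dict from the code above; note that 'a' also keeps output from previous runs of the script):
# Append mode: each write adds to the file instead of truncating it.
for eip_dict in addresses_dict['Addresses']:
    if 'PrivateIpAddress' in eip_dict:
        with open('/root/temps', 'a') as f:
            f.write(eip_dict['PublicIp'] + '\n')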
I've seen really complex answers on this website on how to edit a specific line in a file, but I was wondering whether there is a simpler way to do it.
I want to search for a name in a file, and on the line that I find that name on, I want to add an integer to the end of the line (as it is a score for a quiz). Or could you tell me how I can replace the entirety of the line with new data?
I have tried a lot of coding but either no change is made, or all of the data in the file gets deleted.
I tried this....
with open('File.py', 'r') as class_file:
    for number, line in enumerate(class_file):
        if name in line:
            s = open('File.py', 'r').readlines()
            s[number] = str(data)
            class_file = open('File.py', 'w')
            class_file.writelines(new_score)
            class_file.close()
As well as this function....
def replace(file, line_number, add_score):
    s = open(file, 'w')
    new_data = line[line_number].replace(line, add_score)
    s.write(str(new_data))
    s.close()
As well as this...
def replace_score(file_name, line_num, text):
    new = open(file_name, 'r').readlines()
    new[line_num] = text
    adding_score = open(file_name, 'w')
    adding_score.writelines(new)
    adding_score.close()
But I still can't get it to work.
The last code works if I'm trying to replace the first line, but not the others.
You need to read the content of the file, close the file, modify the content, and then rewrite the file with the modified content. Try the following:
def replace_score(file_name, line_num, text):
    f = open(file_name, 'r')
    contents = f.readlines()
    f.close()
    contents[line_num] = text + "\n"
    f = open(file_name, "w")
    contents = "".join(contents)
    f.write(contents)
    f.close()

replace_score("file_path", 10, "replacing_text")
Referring to Tim Osadchiy's code above:
That code does work, but just remember that line_num is a zero-based index, so it is always one less than the actual line number: if you want line 9, enter 8, not 9. Also, don't forget to put .txt at the end of the file path. (I would have commented, but I don't have a high enough reputation.)
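If the goal is the original one, finding the line that contains a name and appending a score to the end of it, here is a minimal sketch along the same read-modify-rewrite lines (the file name, name, and score below are illustrative):
def append_score(file_name, name, score):
    with open(file_name, 'r') as f:
        lines = f.readlines()
    for i, line in enumerate(lines):
        if name in line:  # first line containing the name
            lines[i] = line.rstrip('\n') + ' ' + str(score) + '\n'
            break
    with open(file_name, 'w') as f:
        f.writelines(lines)

append_score('scores.txt', 'Alice', 10)  # hypothetical file and values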
I'm new to Python and programming, and I need some help with a Python script. There are two files, each containing email addresses (more than 5,000 lines). The input file contains email addresses that I want to search for in the data file (which also contains email addresses). Then I want to print the output to a file or display it on the console. I searched for scripts and was able to modify them, but I'm not getting the desired results. Can you please help me?
dfile1 (50K lines)
yyy#aaa.com
xxx#aaa.com
zzz#aaa.com
ifile1 (10K lines)
ccc#aaa.com
vvv#aaa.com
xxx#aaa.com
zzz#aaa.com
Output file
xxx#aaa.com
zzz#aaa.com
datafile = 'C:\\Python27\\scripts\\dfile1.txt'
inputfile = 'C:\\Python27\\scripts\\ifile1.txt'

with open(inputfile, 'r') as f:
    names = f.readlines()

outputlist = []
with open(datafile, 'r') as fd:
    for line in fd:
        name = fd.readline()
        if name[1:-1] in names:
            outputlist.append(line)
        else:
            print "Nothing found"

print outputlist
New Code
with open(inputfile, 'r') as f:
    names = f.readlines()

outputlist = []
with open(datafile, 'r') as f:
    for line in f:
        name = f.readlines()
        if name in names:
            outputlist.append(line)
        else:
            print "Nothing found"

print outputlist
Maybe I'm missing something, but why not use a pair of sets?
#!/usr/local/cpython-3.3/bin/python

data_filename = 'dfile1.txt'
input_filename = 'ifile1.txt'

with open(input_filename, 'r') as input_file:
    input_addresses = set(email_address.rstrip() for email_address in input_file.readlines())

with open(data_filename, 'r') as data_file:
    data_addresses = set(email_address.rstrip() for email_address in data_file.readlines())

print(input_addresses.intersection(data_addresses))
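If the matched addresses should go to a file rather than the console, a small extension of the same idea (ofile1.txt is an assumed output name):
matches = input_addresses.intersection(data_addresses)
with open('ofile1.txt', 'w') as output_file:
    for address in sorted(matches):  # sorted for stable, readable output
        output_file.write(address + '\n')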
mitan8's answer explains the problem you have, but this is what I would do instead:
with open(inputfile, "r") as f:
    names = set(i.strip() for i in f)

with open(datafile, "r") as f:
    for name in f:
        if name.strip() in names:
            print name
This avoids reading the larger datafile into memory.
If you want to write to an output file, you could do this for the second with statement:
with open(datafile, "r") as i, open(outputfile, "w") as o:
    for name in i:
        if name.strip() in names:
            o.write(name)
Here's what I would do:
names = []
with open(inputfile) as f:
    for line in f:
        names.append(line.rstrip("\n"))  # a closing parenthesis was missing here
myEmails = set(names)

with open(datafile) as fd, open("emails.txt", "w") as output:
    for line in fd:
        c = line.rstrip("\n")
        if c in myEmails:
            print c  # for console
            output.write(c + "\n")  # for writing to file
I think your issue stems from the following:
name = fd.readline()
if name[1:-1] in names:
name[1:-1] slices each email address so that you skip the first and last characters. While it might seem like a good idea in general to drop the last character (a newline, '\n'), remember that when you load the name list from the input file
with open(inputfile, 'r') as f:
    names = f.readlines()
you are including the newlines there too. So don't slice the lines from the data file at all, i.e.
if name in names:
Also, I think you can remove name = fd.readline(), since you already get each line from the for loop; the extra readline() consumes an additional line on every iteration, so you skip every other line. And name[1:-1] should be just name, since you don't want to strip the first and last characters when searching. Finally, with automatically closes the files it opens.
PS: How I'd do it:
with open("dfile1") as dfile, open("ifile") as ifile:
lines = "\n".join(set(dfile.read().splitlines()) & set(ifile.read().splitlines())
print(lines)
with open("ofile", "w") as ofile:
ofile.write(lines)
In the above solution I'm taking the intersection (the elements present in both sets) of the lines of the two files to find the common lines.
Here is my code for reading individual cells of one CSV file, but I want to read multiple CSV files one by one, with the CSV file paths taken from a .txt file:
import csv

ifile = open("C:\Users\BKA4ABT\Desktop\Test_Specification\RDBI.csv", "rb")
data = list(csv.reader(ifile, delimiter=';'))

REQ = []
RES = []
n = len(data)
for i in range(n):
    x = data[i][1]
    y = data[i][2]
    REQ.append(x)
    RES.append(y)

for j in range(2, n):
    try:
        if REQ[j] != '' and RES[j] != '':  # ignore blank cells
            print REQ[j], ' ', RES[j]
    except:
        pass
And the CSV file paths are stored in a .txt file like this:
C:\Desktop\Test_Specification\RDBI.csv
C:\Desktop\Test_Specification\ECUreset.csv
C:\Desktop\Test_Specification\RDTC.csv
and so on..
You can read stuff stored in files into variables. And you can use variables with strings in them anywhere you can use a literal string. So...
with open('mytxtfile.txt', 'r') as txt_file:
    for line in txt_file:
        file_name = line.strip()  # strip() removes surrounding whitespace, including the newline
        ifile = open(file_name, 'rb')
        # ... the rest of your code goes here
Maybe we can fix this up a little...
import csv

with open('mytxtfile.txt', 'r') as txt_file:
    for line in txt_file:
        file_name = line.strip()
        # delimiter is an argument of csv.reader, not of open()
        csv_file = list(csv.reader(open(file_name, 'rb'), delimiter=';'))
        for record in csv_file[1:]:  # skip header row
            req = record[1]
            res = record[2]
            if len(req + res):
                print req, ' ', res
You just need to wrap your first open statement in a loop that reads the file containing your list of file paths, for example:
from __future__ import with_statement

with open("myfile_which_contains_file_path.txt") as f:
    for line in f:
        ifile = open(line.strip(), 'rb')  # strip the trailing newline from the path
        # here the rest of your code
You need to use a raw string, since your path contains backslashes:
import csv

file_list = r"C:\Users\BKA4ABT\Desktop\Test_Specification\RDBI.csv"
with open(file_list) as f:
    for line in f:
        with open(line.strip(), 'rb') as the_file:
            reader = csv.reader(the_file, delimiter=';')
            for row in reader:
                req, res = row[1:3]
                if req and res:
                    print('{0} {1}'.format(req, res))
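For reference, a Python 3 sketch of the same loop would open the CSV files in text mode with newline='' instead of 'rb' (the file_list path below is an assumption, pointing at the .txt file of paths):
import csv

file_list = r"C:\Users\BKA4ABT\Desktop\Test_Specification\file_list.txt"  # assumed path-list file
with open(file_list) as f:
    for line in f:
        with open(line.strip(), newline='') as the_file:
            reader = csv.reader(the_file, delimiter=';')
            for row in reader:
                req, res = row[1:3]
                if req and res:
                    print('{0} {1}'.format(req, res))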