I am a newbie at Python, and I have a web scraper program that retrieves links and puts them into a .csv file. I need to add a new line after each web link in the output, but I do not know how to use \n properly. Here is my code:
file = open('C:\Python34\census_links.csv', 'a')
file.write(str(census_links))
file.write('\n')
It is hard to answer your question without knowing the format of the variable census_links.
But presuming it is a list of link strings, you would loop over the links in the list, append a newline character to the end of each link, and then write that link + newline to the output file:
file = open('C:/Python34/census_links.csv', 'a')
# Simulating a list of links:
census_links = ['example.com', 'sample.org', 'xmpl.net']
for link in census_links:
    file.write(link + '\n')  # append a newline to each link
                             # as you process the links
file.close()  # you will need to close the file to be able to
              # ensure all the data is written.
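As a variation on the loop above, here is a minimal sketch that uses a with block, which closes the file automatically even if a write fails. The function name write_links and its parameters are placeholders for illustration, not part of the original code.

```python
def write_links(links, path):
    """Append each link in `links` to the file at `path`, one per line."""
    # 'a' opens the file for appending; the with block closes it
    # (and flushes the data) as soon as the block ends
    with open(path, 'a') as out_file:
        for link in links:
            out_file.write(link + '\n')
```

Calling write_links(census_links, 'C:/Python34/census_links.csv') would then append the whole list in one go.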
E. Ducateme has already answered the question, but you could also use the csv module (most of the code is from here):
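import csv
# This assumes that "census_links" is a list
census_links = ["Example.com", "StackOverflow.com", "Google.com"]
# Raw string so the backslashes are kept literally; newline='' is what the
# csv module expects in Python 3
with open(r'C:\Python34\census_links.csv', 'a', newline='') as file:
    writer = csv.writer(file)
    for link in census_links:
        writer.writerow([link])  # one link per row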
Thank you for taking the time to read this post. This is literally my first time trying to use Python, so bear with me.
My Target/Goal: Edit the original text file (Original .txt file) so that for every domain listed an "OR" is added in between them (below target formatting image). Any help is greatly appreciated.
I have been able to google the information to open and read the txt file, however, I am not sure how to do the formatting part.
[Images: Script; Original .txt file; Target formatting]
You can achieve this in a couple lines as:
with open(my_file) as fd:
    result = fd.read().replace("\n", " OR ")
You could then write this to another file with:
with open(formatted_file, "w") as fd:
    fd.write(result)
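One thing to watch with a plain replace: if the file ends with a newline, the result ends with a trailing " OR ". A minimal sketch that avoids this by joining the non-empty lines instead is shown here; the names join_with_or and format_file are made up for illustration.

```python
def join_with_or(text):
    """Join the non-empty lines of `text` with ' OR '."""
    # splitlines() drops the line endings; strip() removes stray spaces
    domains = [line.strip() for line in text.splitlines() if line.strip()]
    return " OR ".join(domains)


def format_file(source_path, target_path):
    """Read domains from source_path and write the joined line to target_path."""
    with open(source_path) as src:
        result = join_with_or(src.read())
    with open(target_path, "w") as dst:
        dst.write(result)
```

Because join only puts the separator between items, blank lines and a final newline in the source file do not produce a dangling " OR ".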
Something you could do is the following:
import re
# This opens the file in read mode
with open('Original.txt', 'r') as file:
    # Read the contents of the file
    contents = file.read()
# It seems that your original file has a line break after each domain, so
# you could replace those with the word "OR" using a regular expression
contents = re.sub(r'\n+', ' OR ', contents)
# Then you should open the file in write mode
with open('Original.txt', 'w') as file:
    # and finally write the modified contents to the file
    file.write(contents)
A suggestion: you may want to write to a different file first to check that you are happy with the results (or make a copy of Original.txt just in case):
with open('AnotherOriginal.txt', 'w') as file:
    file.write(contents)
I'm trying to create a web scraping script in Python where I follow a bunch of links and insert them into a .txt file. However, I want to do this only if the website doesn't already exist in the file.
I have written this code to insert the given website link into the file, so far (not working):
def writeSite(site):
    file = open("websites.txt", 'a+')
    # print(site)
    if site in file.read():
        return
    file.write(site + "\n")
    file.close()
Thanks in advance.
You were pretty close, but because you open the file to append to it, it starts with the file pointer at the end. You need to seek to the start to read its contents again:
def writeSite(site):
    file = open("websites.txt", 'a+')
    file.seek(0)
    # print(site)
    if site in file.read():
        return
    file.write(site + "\n")
    file.close()
However, keep in mind that site in file.read() is very crude.
For example, imagine you already have 'http://somesite.com/page/' in the file but now you want to add 'http://somesite.com/' - the URL as a whole is not in the file, but your test will find it.
If you want to check whole lines (and be sure you deal with the file nicely), this would be better:
def writeSite(site):
    site += '\n'
    with open("websites.txt", 'a+') as f:
        f.seek(0)
        if site in f.readlines():
            return
        f.write(site)
It adds a newline to the name of the site to separate the URLs in the file and uses readlines to make use of that fact to check for the whole URL. Using with ensures the file always gets closed.
And since you want to read before writing anyway, you could use 'r+' as a mode, and skip the seek - but only if you can be sure the file already exists. I assume you chose 'a+' because that isn't the case.
(In case you worry that this changes the value of site: that is only true for the parameter inside the function. Whatever value you passed in from outside the function will remain unaffected.)
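The whole-line check can also be sketched with a set of the lines already in the file, which leaves the parameter untouched and reports whether anything was written. This is only an illustration under the same 'a+' assumption; the name write_site, the path parameter, and the True/False return value are additions that are not in the code above.

```python
def write_site(site, path="websites.txt"):
    """Append `site` to the file unless that exact line is already there.

    Returns True if the site was written, False if it was a duplicate.
    """
    with open(path, "a+") as f:
        f.seek(0)  # 'a+' starts at the end, so rewind before reading
        known = {line.rstrip("\n") for line in f}
        if site in known:
            return False
        f.write(site + "\n")  # in append mode, writes go to the end of the file
        return True
```

With this version, 'http://somesite.com/' is still added when only 'http://somesite.com/page/' is present, because whole lines are compared rather than substrings.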
I have been searching online, but have not found any solution.
Here is my text file:
I want x cookies
He wants y cookies
I want the python script to export the value in x and y from the user input.
Here is the script:
xcookies = input("How much cookies do you want?")
ycookies = input("How much cookies does he want?")
I found some scripts online but I can never keep the text from the original text file and export variables in this text file.
Could anyone please help me with that?
This will append your variables to the end of MyFile.txt, a txt file in the same directory as the python script.
# Open the text file
txt_file = open("MyFile.txt","a")
# Append new line
txt_file.write('\n')
# Append your variables
txt_file.write(xcookies + '\n')
txt_file.write(ycookies + '\n')
# Close file
txt_file.close()
It is unclear what you want to achieve with your script.
Please make your question more concrete.
Anyway, you can use this as a guideline:
In order to read a text file and write to a text file, you can do the following:
with open("path/to/file.txt", "r") as f:
    data = f.read()
Using that, you can read the content of the file and then parse it
(using data.split() or whatever you need).
After getting the desired input from the user,
you are able to write the content back to the file by doing the following:
with open("path/to/file.txt", "w") as f:
    f.write(content)  # content is the updated text you want to save
Please refer to those tutorials:
The Python Tutorial - Input and Output
Python for Beginners - Reading and Writing Files in Python
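Putting the read and write steps together for the cookie example, one possible sketch is shown below. It assumes that x and y appear in the text file as standalone words, as in the sample shown in the question; the function names fill_counts and update_file are hypothetical.

```python
import re


def fill_counts(text, x_value, y_value):
    """Replace the standalone words 'x' and 'y' in text with the given values."""
    values = {"x": str(x_value), "y": str(y_value)}
    # \b marks a word boundary, so letters inside other words are left alone
    return re.sub(r"\b[xy]\b", lambda match: values[match.group(0)], text)


def update_file(path, x_value, y_value):
    """Read the file at path, fill in the placeholders, and write it back."""
    with open(path, "r") as f:
        data = f.read()
    with open(path, "w") as f:
        f.write(fill_counts(data, x_value, y_value))
```

After the two input() calls, update_file("MyFile.txt", xcookies, ycookies) would keep the original sentences and only swap in the numbers.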
I have a csv file of a couple dozen web pages that I am trying to loop over.
The goal is to get the text from the web page, take out the html markup (using html2text), and then save the clean text as a .txt file. My idea was to save the clean text of each webpage as an item in the list, then export each item in the list to a txt file.
I can get the program to loop over the urls and take out the html, but saving to individual txt files keeps throwing an error. Can anyone give me some ideas on how to do this?
Code:
from stripogram import html2text
import urllib
import csv
text_list = []
urls = csv.reader(open('web_links2.csv'))
for url in urls:
    response = urllib.urlopen(url[0])
    html = response.read()
    text = html2text(html)
    text_list.append(text)
print text_list
for item in text_list:
    f = open('c:\users\jacob\documents\txt_files\%s.txt'%(item,), 'w')
    f.write(item)
    f.close
It looks like you are using the same value (item) for both the names of the files and their contents, so unless these files are single words, you are likely generating illegal file names.
Plus, in order to call close, you need to supply the parentheses.
Your main problem is that you are not escaping the backslash before the t. Use a raw string (the r prefix):
open(r'c:\users\jacob\documents\txt_files\%s.txt'%(item,), 'w')
\t is a tab, so use a raw string as in the example, doubled backslashes (\\), or forward slashes (/) in your file path.
In [11]: s = "\txt_files"
In [12]: print(s)
xt_files
In [13]: s = r"\txt_files"
In [14]: print(s)
\txt_files
f.close <- missing parens to call the method
Use with to open your file, and things like forgetting to call close will not be an issue:
with open(r'c:\users\jacob\documents\txt_files\%s.txt'%(item,), 'w') as f:  # closes your files automatically
    f.write(item)
I think you might not want to add the full item to the filename since the item is all the html of a webpage. In your case I'd either add some logic to give it a neat website name or just use an index so you can iterate over this.
Also, the file path should be written differently: try to use double quotes and \\ instead of \.
You might want to do something like this:
i = 0
for item in text_list:
    i += 1
    # also use format instead of the %s
    f = open("c:\\users\\jacob\\documents\\txt_files\\{0}.txt".format(i), 'w')
    f.write(item)
    f.close()
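The same index-based naming can be sketched for Python 3 with enumerate and pathlib, which sidesteps the backslash escaping problem because the path is built from parts. This is only an illustration: the function name save_texts and the UTF-8 encoding are assumptions, and the question's own code targets Python 2.

```python
from pathlib import Path


def save_texts(texts, out_dir):
    """Write each string in `texts` to out_dir/1.txt, out_dir/2.txt, ..."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    written = []
    for index, text in enumerate(texts, start=1):
        target = out_dir / f"{index}.txt"
        # write_text opens, writes, and closes the file in one call
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written
```

Something like save_texts(text_list, r"c:\users\jacob\documents\txt_files") would then produce one numbered file per page.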
How can I tell Python to open a CSV file, and merge all columns per line, into new lines in a new TXT file?
To explain:
I'm trying to download a bunch of member profiles from a website, for a research project. To do this, I want to write a list of all the URLs in a TXT file.
The URLs are akin to this: website.com-name-country-title-id.html
I have written a script that takes all these bits of information for each member and saves them in columns (name/country/title/id), in a CSV file, like this:
mark japan rookie married
john sweden expert single
suzy germany rookie married
etc...
Now I want to open this CSV and write a TXT file with lines like these:
www.website.com/mark-japan-rookie-married.html
www.website.com/john-sweden-expert-single.html
www.website.com/suzy-germany-rookie-married.html
etc...
Here's the code I have so far. As you can probably tell I barely know what I'm doing so help will be greatly appreciated!!!
import csv
x = "http://website.com/"
y = ".html"
csvFile=csv.DictReader(open("NameCountryTitleId.csv")) #This file is stored on my computer
file = open("urls.txt", "wb")
for row in csvFile:
    strArgument=str(row['name'])+"-"+str(row['country'])+"-"+str(row['title'])+"-"+str(row['id'])
    try:
        file.write(x + strArgument + y)
    except:
        print(strArgument)
file.close()
I don't get any error messages after running this, but the TXT file is completely empty.
Rather than using a DictReader, use a regular reader to make it easier to join the row:
import csv
url_format = "http://website.com/{}.html"
csv_file = 'NameCountryTitleId.csv'
urls_file = 'urls.txt'
with open(csv_file, 'rb') as infh, open(urls_file, 'w') as outfh:
    reader = csv.reader(infh)
    for row in reader:
        url = url_format.format('-'.join(row))
        outfh.write(url + '\n')
The with statement ensures the files are closed properly again when the code completes.
Further changes I made:
In Python 2, open CSV files in binary mode; the csv module handles line endings itself, because correctly quoted column data can have embedded newlines.
Regular text files should be opened in text mode still though.
When writing lines to a file, do remember to add a newline character to delineate lines.
Using a string format (str.format()) is far more flexible than using string concatenations.
str.join() lets you join a sequence of strings together with a separator.
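For Python 3, the same idea can be sketched as follows. The difference is that the CSV file is opened in text mode with newline='' rather than in binary mode. The function name rows_to_urls is made up for illustration, and the sketch assumes the CSV has no header row; a header line would be turned into a URL like any other row.

```python
import csv

URL_FORMAT = "http://website.com/{}.html"


def rows_to_urls(csv_path, urls_path):
    """Write one URL per CSV row to urls_path and return how many were written."""
    count = 0
    # newline='' lets the csv module handle line endings itself (Python 3)
    with open(csv_path, newline="") as infh, open(urls_path, "w") as outfh:
        for row in csv.reader(infh):
            if not row:  # skip completely empty lines
                continue
            outfh.write(URL_FORMAT.format("-".join(row)) + "\n")
            count += 1
    return count
```

Each row such as mark, japan, rookie, married becomes http://website.com/mark-japan-rookie-married.html on its own line.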
It's actually quite simple: you are working with strings, yet the file you are writing to is opened in bytes mode, so every single write fails and the except branch prints to the screen instead. Try changing this line:
file = open("urls.txt", "wb")
to this:
file = open("urls.txt", "w")
EDIT:
I stand corrected. However, I would like to point out that without newlines or some other form of separator, how do you intend to use the URLs later on? If you put a newline after each URL, they would be easy to recover.