Python: Write output to a CSV file

I have the following text (as string, \t = Tab):
Article_1 \t Title of Article \t author of article \n
Article_2 \t Title of Art 2 \t author of article 2 \n
I'd like to save this in a CSV file so that I can open it in Excel. I can already open the file I get in Excel, but everything ends up in the first column. I'd like to have "art_1, art_2, ..." in the first column, the titles in the second, and the authors in the third. How can I do this?
Thanks for any help! :)

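If you have a single tab-separated line in a string, text, one easy way is just:
with open("file.csv", "w") as f:
    # Split on tabs (not on all whitespace) so multi-word titles stay in one field
    f.write(','.join(field.strip() for field in text.split('\t')))
If you have multiple strings, and they are stored in a list, str_list, you could do this:
with open("file.csv", "w") as f:
    for line in str_list:
        f.write(','.join(field.strip() for field in line.split('\t')))
        f.write('\n')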
If the question is how to split one monolithic string into manageable sub-strings, then that's a different question. In that case you'd want to split() on the \t and then go through the list 3 at a time.
There's also the csv module in Python's standard library, which provides a clean way of creating CSV files from Python data structures.
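As a rough sketch of the "three at a time" idea, assuming the whole input is one string and every record has exactly three non-empty fields. The helper names split_into_rows and rows_to_csv_text and the width parameter are made up for illustration; the sample text mirrors the question.

```python
import csv
import io


def split_into_rows(text, width=3):
    """Split one tab/newline-separated string into rows of `width` fields.

    Hypothetical helper: assumes every record has exactly `width`
    non-empty fields, otherwise the rows will be misaligned.
    """
    # Treat newlines like tabs, then strip the padding around each field.
    fields = [field.strip() for field in text.replace("\n", "\t").split("\t")]
    # Drop empty leftovers such as the one after the trailing newline.
    fields = [field for field in fields if field]
    return [fields[i:i + width] for i in range(0, len(fields), width)]


def rows_to_csv_text(rows):
    """Render rows as CSV text using the standard csv module."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


text = ("Article_1 \t Title of Article \t author of article \n"
        "Article_2 \t Title of Art 2 \t author of article 2 \n")
rows = split_into_rows(text)
print(rows_to_csv_text(rows))
```

To write a file instead of a string, the same rows could be passed to csv.writer(open("file.csv", "w", newline="")) in Python 3. Letting the csv module do the writing also means a title that contains a comma gets quoted rather than split.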

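In case you want to use the csv module:
import csv
# Python 3: open in text mode with newline=""; on Python 2 use open("csv_file.csv", "wb") instead
with open("csv_file.csv", "w", newline="") as csv_file:
    csv_writer = csv.writer(csv_file, delimiter=",")
    for article in list_of_articles:
        csv_writer.writerow(article.split("\t"))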

Related

How to replace characters in a csv file

I'm doing some measurements in the lab and want to transform them into some nice Python plots. The problem is the way the software exports CSV files, as I can't find a way to properly read the numbers. It looks like this:
-10;-0,0000026
-8;-0,00000139
-6;-0,000000546
-4;-0,000000112
-2;-5,11E-09
0,0000048;6,21E-09
2;0,000000318
4;0,00000304
6;0,0000129
8;0,0000724
10;0,000268
Separation by ; is fine, but I need every , to be a . instead.
Ideally I would like Python to be able to read numbers such as 6.21E-09 as well, but I should be able to fix that in Excel...
My main issue: Change every , to . so Python can read them as a float.
The simplest way would be to treat the values as strings and use the .replace() method. For example:
txt = "0,0000048;6,21E-09"
txt = txt.replace(',', '.')
You could also read the file with a CSV library (I don't know how you are reading the file) and, depending on the library, set the 'delimiter' (to ; in this case). CSV stands for comma-separated values and, as the name implies, columns are separated by ',' by default.
You can do whatever you want in Python, for example:
import csv
with open('path_to_csv_file', 'r', newline='') as csv_file:
    data = list(csv.reader(csv_file, delimiter=';'))
# Both columns can hold decimal commas (e.g. "0,0000048"), so parse both as float
data = [tuple(float(value.replace(',', '.')) for value in row) for row in data]
with open('path_to_csv_file', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file, delimiter=';')
    writer.writerows(data)
You could also consider a regex that matches every ',' in the text and replaces each match with '.'.
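A minimal sketch of the regex idea, assuming the lines look like the sample in the question. The names DECIMAL_COMMA and parse_line are hypothetical. The pattern only touches a comma that sits between two digits, and float() already understands scientific notation such as 6.21E-09, so no detour through Excel should be needed.

```python
import re

# Match a comma only when it sits between two digits.
DECIMAL_COMMA = re.compile(r"(?<=\d),(?=\d)")


def parse_line(line, delimiter=";"):
    """Turn a line such as '-2;-5,11E-09' into a tuple of floats."""
    fixed = DECIMAL_COMMA.sub(".", line.strip())
    return tuple(float(value) for value in fixed.split(delimiter))


print(parse_line("0,0000048;6,21E-09"))
print(parse_line("-10;-0,0000026"))
```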

Writing strings to csv with library in Python

I tried to write strings to csv.
import csv
f = open('ttt.csv', 'w', encoding='utf-8', newline='')
wr = csv.writer(f)
for t in [['Love it', 'doenst matter']] :
    lin = ''.join(t)
    print(type(lin))
    wr.writerow([lin])
f.close()
I expected it to be written as:
"Love itdoenst matter"
so that it would be stored as a single cell:
Love itdoenst matter |
But it is actually written to the CSV file without quotes:
Love itdoenst matter
So the string is not treated as one element, and Love itdoenst matter ends up spread over different columns, like:
Love | itdoenst | matter
I don't know why this happens.
Your issue is that you do not tell the csv module that you want to delimit your file on spaces; the default delimiter is a comma, as the name suggests. You can specify the delimiter as follows:
wr = csv.writer(f, delimiter=" ")
With the default quoting (csv.QUOTE_MINIMAL), the writer will then place quote marks around any element containing the delimiter character.
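To see the quoting behaviour without opening a spreadsheet, here is a small sketch that writes the same row into an in-memory buffer with different writer options. The helper write_row is hypothetical; csv.QUOTE_ALL is shown as one more option if quotes are wanted regardless of the delimiter.

```python
import csv
import io


def write_row(row, **writer_options):
    """Return the text that csv.writer would emit for a single row."""
    buffer = io.StringIO()
    csv.writer(buffer, **writer_options).writerow(row)
    return buffer.getvalue()


row = ["Love itdoenst matter"]
print(repr(write_row(row)))                         # default comma delimiter
print(repr(write_row(row, delimiter=" ")))          # space delimiter
print(repr(write_row(row, quoting=csv.QUOTE_ALL)))  # quote every field
```

With the default comma delimiter the field contains nothing that needs quoting, so it is written bare. Whichever program opens the file then has to be told that the delimiter is a comma and not a space.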

Why is my csv file separated by " \t " instead of commas (" , ")?

I downloaded data from internet and saved as a csv (comma delimited) file. The image shows what the file looks like in excel.
Using csv.reader in python, I printed each row. I have shown my code below along with the output in Spyder.
import csv
with open('p_dat.csv', 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)
I am very confused as to why my values are not comma separated. Any help will be greatly appreciated.
As pointed out in the comments, technically this is a TSV (tab-separated values) file, which is actually perfectly valid.
In practice, of course, not all libraries will make a "hard" distinction between a TSV and CSV file. The way you parse a TSV file is basically the same as the way you parse a CSV file, except that the delimiter is different.
There are actually multiple valid delimiters for this kind of file, such as tabs, commas, and semicolons. Which one you choose is honestly a matter of preference, not a "hard" technical limit.
See the specification for CSV files. There are many options for the delimiter in the file. In this case you have a tab, \t.
The choice matters: if your data itself contains commas, then , is a poor choice of delimiter.
Even though they're named comma-separated values, they're sometimes separated by different symbols (like the tab character that you have currently).
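If you want to use Python to view this as a comma-separated file, you can try something like:
import csv
...
with open('p_dat.csv', 'r', newline='') as file:
    # Tell the reader the file is tab-delimited so each row is split into fields
    reader = csv.reader(file, delimiter='\t')
    for row in reader:
        # row is a list of strings, so join it rather than calling .replace() on it
        commarow = ','.join(row)
        print(commarow)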
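The remark above about commas inside the data can be illustrated with a short sketch that converts tab-separated text to comma-separated text through the csv module, so that a field containing a comma gets quoted instead of being split. The function name tsv_to_csv and the sample data are made up for illustration.

```python
import csv
import io


def tsv_to_csv(tsv_text):
    """Re-emit tab-separated text as comma-separated text."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    csv.writer(out).writerows(reader)
    return out.getvalue()


sample = "name\tcity\nSmith, John\tOslo\n"  # made-up data with a comma inside a field
print(tsv_to_csv(sample))
```

A plain ','.join(row) would turn the "Smith, John" row into three columns, which is why the writer is used here.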

How to write lines in a txt file, with data from a csv file

How can I tell Python to open a CSV file, and merge all columns per line, into new lines in a new TXT file?
To explain:
I'm trying to download a bunch of member profiles from a website, for a research project. To do this, I want to write a list of all the URLs in a TXT file.
The URLs are akin to this: website.com-name-country-title-id.html
I have written a script that takes all these bits of information for each member and saves them in columns (name/country/title/id), in a CSV file, like this:
mark japan rookie married
john sweden expert single
suzy germany rookie married
etc...
Now I want to open this CSV and write a TXT file with lines like these:
www.website.com/mark-japan-rookie-married.html
www.website.com/john-sweden-expert-single.html
www.website.com/suzy-germany-rookie-married.html
etc...
Here's the code I have so far. As you can probably tell I barely know what I'm doing so help will be greatly appreciated!!!
import csv
x = "http://website.com/"
y = ".html"
csvFile=csv.DictReader(open("NameCountryTitleId.csv")) #This file is stored on my computer
file = open("urls.txt", "wb")
for row in csvFile:
    strArgument=str(row['name'])+"-"+str(row['country'])+"-"+str(row['title'])+"-"+str(row['id'])
    try:
        file.write(x + strArgument + y)
    except:
        print(strArgument)
file.close()
I don't get any error messages after running this, but the TXT file is completely empty.
Rather than using a DictReader, use a regular reader to make it easier to join the row:
import csv
url_format = "http://website.com/{}.html"
csv_file = 'NameCountryTitleId.csv'
urls_file = 'urls.txt'
with open(csv_file, 'rb') as infh, open(urls_file, 'w') as outfh:
    reader = csv.reader(infh)
    for row in reader:
        url = url_format.format('-'.join(row))
        outfh.write(url + '\n')
The with statement ensures the files are closed properly again when the code completes.
Further changes I made:
In Python 2, open CSV files in binary mode; the csv module handles line endings itself, because correctly quoted column data can contain embedded newlines. (In Python 3, open the file in text mode with newline='' instead.)
Regular text files should still be opened in text mode, though.
When writing lines to a file, do remember to add a newline character to delineate lines.
Using a string format (str.format()) is far more flexible than using string concatenation.
str.join() lets you join a sequence of strings together with a separator.
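The answer above targets Python 2. A Python 3 flavoured sketch of the same approach follows, working on in-memory text so it can be tried without any files. It assumes the CSV is comma-delimited and starts with a header row (the DictReader in the question suggests one), which a plain reader would otherwise turn into a URL of its own. The function name build_urls and the skip_header flag are hypothetical.

```python
import csv
import io

URL_FORMAT = "http://website.com/{}.html"


def build_urls(csv_text, skip_header=True):
    """Return one URL per data row, joining the columns with '-'."""
    reader = csv.reader(io.StringIO(csv_text))
    if skip_header:
        next(reader, None)  # assumption: the first row holds column names
    return [URL_FORMAT.format("-".join(row)) for row in reader if row]


sample = "name,country,title,id\nmark,japan,rookie,married\njohn,sweden,expert,single\n"
urls = build_urls(sample)
print("\n".join(urls))
```

For real files in Python 3, the input would be opened with open(csv_file, newline='') and the output with open(urls_file, 'w'), writing each URL followed by '\n'.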
It's actually quite simple: you are working with strings, yet the file you are writing to is opened in bytes mode, so every single write fails and the code prints to the screen instead. Try changing this line:
file = open("urls.txt", "wb")
to this:
file = open("urls.txt", "w")
EDIT:
I stand corrected. However, I would like to point out that there are no newlines or other separators between the URLs: how do you intend to use them later on? If you put a newline after each URL, they would be easy to recover.

Remove repeating lines of characters when reading text files in python?

I am reading a text file which was copied from a CSV file. When I read the file in Python, I get a ton of unnecessary repeating lines, as seen below. How can I strip away those three unwanted lines, including \cf0 and \cell\row at the beginning and end of each text?
Or should I read the text directly from the CSV file itself? The text is in just one of the columns of the CSV file.
\itap1\trowd \taflags1 \trgaph108\trleft-108 \trbrdrl\brdrnil \trbrdrr\brdrnil
\clvertalc \clshdrawnil \clbrdrt\brdrs\brdrw20\brdrcf2 \clbrdrl\brdrs\brdrw20\brdrcf2 \clbrdrb\brdrs\brdrw20\brdrcf2 \clbrdrr\brdrs\brdrw20\brdrcf2 \clpadl100 \clpadr100 \gaph\cellx8640
\pard\intbl\itap1\pardeftab720
\cf0 i have been using your product and it has been helping me a lot to solve business problem,\cell \row
\itap1\trowd \taflags1 \trgaph108\trleft-108 \trbrdrl\brdrnil \trbrdrr\brdrnil
\clvertalc \clshdrawnil \clbrdrt\brdrs\brdrw20\brdrcf2 \clbrdrl\brdrs\brdrw20\brdrcf2 \clbrdrb\brdrs\brdrw20\brdrcf2 \clbrdrr\brdrs\brdrw20\brdrcf2 \clpadl100 \clpadr100 \gaph\cellx8640
\pard\intbl\itap1\pardeftab720
\cf0 I am very happy with your products. Very easy to use.\cell \row
\itap1\trowd \taflags1 \trgaph108\trleft-108 \trbrdrl\brdrnil \trbrdrr\brdrnil
\clvertalc \clshdrawnil \clbrdrt\brdrs\brdrw20\brdrcf2 \clbrdrl\brdrs\brdrw20\brdrcf2 \clbrdrb\brdrs\brdrw20\brdrcf2 \clbrdrr\brdrs\brdrw20\brdrcf2 \clpadl100 \clpadr100 \gaph\cellx8640
\pard\intbl\itap1\pardeftab720
\cf0 Many improvements with income tracker, and other time saving elements. Newer look, easier navigation. I believe there definitely is a time savings from past versions.\cell \row
Here is a snippet of the csv file:
page_url Review_title Product_id Rating Publish_date Review_Description
www.blabla.com Great! 777777 5 01/01/14 Excellent upgrade! Was not disappointed!
I only copied text from the Review_Description column and pasted them all in a text file.
Here is my python code to just read the file:
text_file=open("my_text.txt", "r")
lines=text_file.readlines()
print lines
Your real problem here appears to be that you pasted the CSV into an RTF file, not a text file. Pasting into Wordpad on Windows or TextEdit on Mac (especially if you copied from, say, Excel or Numbers) and saving it without explicitly telling it to "save as plain text" or "convert to plain text" will generally "help" you this way automatically.
While you could try to parse the RTF to recover the original text, you're much better off just using the original text if possible. Parsing CSV files in Python—either with Pandas, or with the stdlib's csv module—is very easy.
For example, your file appears to use tabs as delimiters, and no other non-default features. If so:
import csv
with open('my_csv.csv', 'rb') as f:
    reader = csv.DictReader(f, delimiter='\t')
    reviews = [row['Review_Description'] for row in reader]
Now you have a list of all the reviews, and can do anything you want with them. If you just want to print them out, it's even simpler:
import csv
with open('my_csv.csv', 'rb') as f:
    reader = csv.DictReader(f, delimiter='\t')
    for row in reader:
        print row['Review_Description']
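The snippets above use Python 2 syntax (binary mode and the print statement). A Python 3 sketch of the same idea is below, reading from in-memory text so it runs on its own. It assumes the file really is tab-delimited with the header shown in the question; the function name column_values is hypothetical.

```python
import csv
import io


def column_values(tsv_text, column):
    """Collect one named column from tab-delimited text with a header row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row[column] for row in reader]


sample = (
    "page_url\tReview_title\tProduct_id\tRating\tPublish_date\tReview_Description\n"
    "www.blabla.com\tGreat!\t777777\t5\t01/01/14\tExcellent upgrade! Was not disappointed!\n"
)
reviews = column_values(sample, "Review_Description")
for review in reviews:
    print(review)
```

With a real file in Python 3, the reader would be built from open('my_csv.csv', newline='') instead of a StringIO.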
