Is there a way to convert the values for each key to int while reading them into this dictionary? They start out as strings, but I would prefer them to be ints. This is what I tried, but I am getting errors. Each entry looks like {'USA': ('123,123', '312,321,321')}, but I want those numbers to be ints.
def _demo_fileopenbox():
    msg = "Pick A File!"
    msg2 = "Select a country to learn more about!"
    title = "Open files"
    default="*.py"
    f = fileopenbox(msg,title,default=default)
    writeln("You chose to open file: %s" % f)
    countries = {}
    with open(f,'r') as handle:
        reader = csv.reader(handle, delimiter = '\t')
        for row in reader:
            countries[row[0]] = ((int(row[1])),(int(row[2])))
    while 1:
        reply = choicebox(msg=msg2, choices= list(countries.keys()) )
        writeln(reply + ";\tArea: " + (countries[reply])[0] + "\tPopulation: " + (countries[reply])[1] )
thanks!
Try removing the commas from the strings before converting them to ints:
countries[row[0]] = (int(row[1].replace(',', '')), int(row[2].replace(',', '')))
Your problem is that your numbers contain commas. Change the code to this:
for row in reader:
    countries[row[0]] = tuple(int(a.replace(",","")) for a in row[1:])
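To see both answers in action, here is a minimal, self-contained sketch. The helper name `to_int` and the sample data are made up for illustration; the real input would come from the file chosen in the dialog. Note that once the values are ints, they have to be converted back with `str()` before being concatenated into the output line.

```python
import csv
import io

def to_int(text):
    """Convert a string such as '312,321,321' to an int by dropping the commas."""
    return int(text.replace(",", ""))

# Hypothetical tab-separated input standing in for the chosen file.
sample = "USA\t123,123\t312,321,321\nFoo\t1,000\t2,500\n"

countries = {}
for row in csv.reader(io.StringIO(sample), delimiter="\t"):
    countries[row[0]] = tuple(to_int(a) for a in row[1:])

# Ints cannot be joined to strings with +, so convert them back for display.
area, population = countries["USA"]
summary = "USA;\tArea: " + str(area) + "\tPopulation: " + str(population)
print(summary)
```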
You may see this as yet another duplicate question, but I went through all the similar questions without any luck. In my specific use case, I can't use pandas or any similar library for this operation.
This is what my input looks like
AttributeName,Value
Name,John
Gender,M
PlaceofBirth,Texas
Name,Alexa
Gender,F
SurName,Garden
This is my expected output
Name,Gender,Surname,PlaceofBirth
John,M,,Texas
Alexa,F,Garden,
So far, I have tried storing my input in a dictionary and then writing it out as a CSV string. It fails because I am not sure how to handle missing column values. Here is my code so far:
reader = csv.reader(csvstring.split('\n'), delimiter=',')
csvdata = {}
csvfile = ''
for row in reader:
    if row[0] != '' and row[0] in csvdata and row[1] != '':
        csvdata[row[0]].append(row[1])
    elif row[0] != '' and row[0] in csvdata and row[1] == '':
        csvdata[row[0]].append(' ')
    elif row[0] != '' and row[1] != '':
        csvdata[row[0]] = [row[1]]
    elif row[0] != '' and row[1] == '':
        csvdata[row[0]] = [' ']
for key, value in csvdata.items():
    if value == ' ':
        csvdata[key] = []
csvfile += ','.join(csvdata.keys()) + '\n'
for row in zip(*csvdata.values()):
    csvfile += ','.join(row) + '\n'
For the above code as well, I took some help here. Thanks in advance for any suggestions/advice.
Edit #1: Updated the code to show that I am processing a CSV string rather than a CSV file.
What you need is something like this:
import csv
with open("in.csv") as infile:
    buffer = []
    item = {}
    lines = csv.reader(infile)
    for line in lines:
        if line[0] == 'Name':
            # A new 'Name' row starts a new record, so store the previous one.
            buffer.append(item.copy())
            item = {'Name':line[1]}
        else:
            item[line[0]] = line[1]
    buffer.append(item.copy())
# The first entry only holds the header row, so skip it.
for item in buffer[1:]:
    print(item)
If none of the attributes is mandatory, I think @framontb's solution needs to be rearranged so that it also works when the Name field is not given.
This is an import-free solution, and it's not super elegant.
I assume you already have the lines in this form, with these columns:
lines = [
    "Name,John",
    "Gender,M",
    "PlaceofBirth,Texas",
    "Gender,F",
    "Name,Alexa",
    "Surname,Garden" # modified typo here: SurName -> Surname
]
cols = ["Name", "Gender", "Surname", "PlaceofBirth"]
We need to distinguish one record from another, and without mandatory fields the best I can do is start considering a new record when an attribute has already been seen.
To do this, I use a temporary list of attributes tempcols from which I remove elements until an error is raised, i.e. new record.
Code:
csvdata = {k:[] for k in cols}
tempcols = list(cols)
for line in lines:
    attr, value = line.split(",")
    try:
        csvdata[attr].append(value)
        tempcols.remove(attr)
    except ValueError:
        for c in tempcols: # now tempcols has only "missing" attributes
            csvdata[c].append("")
        tempcols = [c for c in cols if c != attr]
for c in tempcols:
    csvdata[c].append("")
# write csv string with the code you provided
csvfile = ""
csvfile += ",".join(csvdata.keys()) + "\n"
for row in zip(*csvdata.values()):
    csvfile += ",".join(row) + "\n"
>>> print(csvfile)
Name,PlaceofBirth,Surname,Gender
John,Texas,,M
Alexa,,Garden,F
If you want the columns ordered as in your desired output instead:
csvfile = ""
csvfile += ",".join(cols) + "\n"
for row in zip(*[csvdata[k] for k in cols]):
    csvfile += ",".join(row) + "\n"
>>> print(csvfile)
Name,Gender,Surname,PlaceofBirth
John,M,,Texas
Alexa,F,Garden,
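As an alternative to the list bookkeeping above, here is a rough sketch that collects each record as a dict and lets `csv.DictWriter` fill the gaps through its `restval` argument. It reuses the same sample `lines` and `cols`, and it makes the same assumption: a new record starts whenever an attribute repeats. The `records` name is only illustrative.

```python
import csv
import io

# Same sample rows as above; record boundaries are inferred, not given.
lines = [
    "Name,John",
    "Gender,M",
    "PlaceofBirth,Texas",
    "Gender,F",
    "Name,Alexa",
    "Surname,Garden",
]
cols = ["Name", "Gender", "Surname", "PlaceofBirth"]

records = [{}]
for line in lines:
    attr, value = line.split(",", 1)
    if attr in records[-1]:
        # Attribute already seen: assume a new record starts here.
        records.append({})
    records[-1][attr] = value

out = io.StringIO()
# restval is written for every column a record does not have.
writer = csv.DictWriter(out, fieldnames=cols, restval="", lineterminator="\n")
writer.writeheader()
writer.writerows(records)
csvfile = out.getvalue()
print(csvfile)
```

Because the writer is given `cols` as its field names, the column order follows that list rather than the dictionary order.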
This works for me:
import csv
with open("in.csv") as infile, open("out.csv", "w") as outfile:
    incsv, outcsv = csv.reader(infile), csv.writer(outfile)
    next(incsv) # Skip 1st row
    outcsv.writerows(zip(*incsv))
Update: For input and output as strings:
import csv, io
with io.StringIO(indata) as infile, io.StringIO() as outfile:
    incsv, outcsv = csv.reader(infile), csv.writer(outfile)
    next(incsv) # Skip 1st row
    outcsv.writerows(zip(*incsv))
    print(outfile.getvalue())
I'm looping over a csv of links, visiting those links, and then trying to write information from those links to a new file:
with open("hrefs.csv", "rb") as f:
    reader = csv.reader(f)
    for row in reader:
        newUrl = row[0]
        response = requests.get(newUrl)
        newData = response.text
        newSoup = BeautifulSoup(newData, 'lxml')
        newstring = ''
        titles = newSoup.findAll('span', {'id': 'titletextonly'})
        prices = newSoup.findAll('span', {'class': 'price'})
        newstring += titles[0].text + ',' + prices[0].text + ','
        for ana in newSoup.findAll('p',{'class':'attrgroup'}):
            for myb in ana.findAll('b'):
                newstring += myb.text + ','
        print newstring
        listFile = open("output.csv", 'wb')
        writer = csv.writer(listFile)
        writer.writerow(newstring.encode('ascii', 'ignore').decode('ascii'))
I'm running into a couple of problems. First, I thought csv would recognize the comma-separated values and put each attribute in its own column. Second, it seems that a single letter is being put in each column. When I simply print each newstring, it gives me a coherent string.
You need to give writer.writerow a sequence of strings:
writer.writerow(newstring.split(","))
would be the easiest change from what you currently have.
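A small sketch of why this happens: `writerow` iterates over whatever it is given, and iterating over a string yields single characters. The value of `newstring` below is made up, and `io.StringIO` stands in for the output file so the result can be inspected directly.

```python
import csv
import io

newstring = "Bike,$100,red,"  # hypothetical scraped values, built like in the question

# Passing the string itself: every character becomes its own column.
buf = io.StringIO()
csv.writer(buf).writerow(newstring)
per_char = buf.getvalue()

# Passing a list of strings: one column per item.
# rstrip(",") drops the trailing comma so no empty last column is written.
buf = io.StringIO()
fields = newstring.rstrip(",").split(",")
csv.writer(buf).writerow(fields)
per_field = buf.getvalue()

print(repr(per_char))
print(repr(per_field))
```

Building a list of fields from the start and never joining them with commas would avoid the split entirely, and it would also stay correct if a scraped value contained a comma of its own.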
I have a text file consisting of 100 records like
fname,lname,subj1,marks1,subj2,marks2,subj3,marks3.
I need to extract and print lname and marks1+marks2+marks3 in Python. How do I do that?
I am a beginner in Python.
Please help.
When I used split, I got an error saying
TypeError: Can't convert 'type' object to str implicitly.
The code was
import sys
file_name = sys.argv[1]
file = open(file_name, 'r')
for line in file:
    fname = str.split(str=",", num=line.count(str))
    print fname
If you want to do it that way, you were close. Is this what you were trying?
file = open(file_name, 'r')
for line in file.readlines():
    fname = line.rstrip().split(',') #using rstrip to remove the \n
    print(fname)
Note: this code is untested, but it should solve your problem. Please give it a try.
import csv
with open(file_name, newline='') as csvfile:
    marksReader = csv.reader(csvfile)
    for row in marksReader:
        if len(row) < 8: # 8 is the number of columns in your file.
            # row has some missing columns or is empty
            continue
        # Unpack columns of row; you can also do like fname = row[0] and lname = row[1] and so on ...
        fname, lname, subj1, marks1, subj2, marks2, subj3, marks3 = row[:8]
        # you can use float in place of int if marks contain decimals
        totalMarks = int(marks1) + int(marks2) + int(marks3)
        print('%s %s scored: %s' % (fname, lname, totalMarks))
print('End.')
"""
sample file content
poohpool#signet.com; meixin_kok#hotmail.com; ngai_nicole#hotmail.com; isabelle_gal#hotmail.com; michelle-878#hotmail.com;
valerietan98#gmail.com; remuskan#hotmail.com; genevieve.goh#hotmail.com; poonzheng5798#yahoo.com; burgergirl96#hotmail.com;
insyirah_powergals#hotmail.com; little_princess-angel#hotmail.com; ifah_duff#hotmail.com; tweety_butt#hotmail.com;
choco_ela#hotmail.com; princessdyanah#hotmail.com;
"""
import pandas as pd
file = open('emaildump.txt', 'r')
for line in file.readlines():
    fname = line.split(';') #using split to form a list
    #print(fname)
    df1 = pd.DataFrame(fname,columns=['Email'])
    print(df1)
I have read other answers to similar questions on google and on this site, and none of them work in my script.
I need to sort information in a csv_file by the third column from a py script. And then, with the sorted info, find duplicates and remove them but add a count to the csv_file.
for ip in open("lists.txt"):
    with open("csv_file.csv", "a") as csv_file:
        csv_file.write("\r IP:" + ip.strip() + ", Count, A, B, C \r")
        for line in open("data.txt"):
            new_line = line.split()
            if "word" in new_line:
                if "word"+ip.strip() in new_line:
                    csv_file.write(ip.strip() + ", " + new_line[10].replace("word=", ", ") + new_line[12].replace("word=", ", "))
                    try:
                        csv_file.write(new_line[14].replace("word=", ", "))
                    except IndexError:
                        pass
                    csv_file.write("\r")
with open("csv_file.csv", "r") as inputfile:
    reader = csv.reader(inputfile)
    headers = next(reader)
    for row in reader:
        key = (row[0], row[1:])
        if key not in rows:
            rows[key] = row + [0,]
        rows[key][-1] += 1
I have no idea why this isn't working; it returns errors like:
TypeError: unhashable type: 'list'
Question: How do I sort by the 3rd column, remove duplicates and add a duplicate count to my csv_file through a py script?
If I'm not mistaken, the "a" mode opens the file for appending in this line:
with open("csv_file.csv", "a") as inputfile:
This means you're opening the file for writing, not for reading. You should use either "r" or "r+".
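Separately, the TypeError quoted in the question comes from the key itself: `row[1:]` is a list, and a tuple that contains a list cannot be hashed. Here is a minimal sketch of one way to count duplicates and sort by the third column, using `collections.Counter` with whole rows converted to tuples. The sample data is invented and stands in for csv_file.csv, and the sort compares the third column as text.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for the contents of csv_file.csv.
sample = "IP,A,B\n10.0.0.2,x,9\n10.0.0.1,y,3\n10.0.0.2,x,9\n"

reader = csv.reader(io.StringIO(sample))
headers = next(reader)

# Tuples are hashable, so a whole row can serve as a dictionary key.
counts = Counter(tuple(row) for row in reader)

# Sort the unique rows by the third column (index 2), then append the count.
result = [
    list(row) + [n]
    for row, n in sorted(counts.items(), key=lambda kv: kv[0][2])
]

for row in result:
    print(row)
```

If the third column holds numbers, converting it inside the sort key (for example with `int`) would give a numeric order instead of a textual one.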
I'm having a little trouble converting the two elements of a tuple inside a dictionary into int values. The keys of the dictionary are country names, and each tuple holds (area, population). This is what I have so far:
def _demo_fileopenbox():
    msg = "Pick A File!"
    msg2 = "Select a country to learn more about!"
    title = "Open files"
    default="*.py"
    f = fileopenbox(msg,title,default=default)
    writeln("You chose to open file: %s" % f)
    countries = {}
    with open(f,'r') as handle:
        reader = csv.reader(handle, delimiter = '\t')
        for row in reader:
            countries[row[0]] = (row[1].replace(',', ''), row[2].replace(',', ''))
    for i in countries:
        int((countries[i])[0])
        int((countries[i])[1])
    #while 1:
    #    reply = choicebox(msg=msg2, choices= list(countries.keys()) )
    #    writeln(reply + "-\tArea: " + (countries[reply])[0] + "\tPopulation: " + (countries[reply])[1] )
but I keep getting this error:
int((countries[i])[0])
ValueError: invalid literal for int() with base 10: ''
any ideas how to fix this or a better way to do this: