Error while working on .csv with load_csv - python

I am trying to work on the below code:
ds = load_csv('C:\\User.csv')
f = open(ds,'r')
lines = f.readlines()[1:]
print(lines)
f.close()
The first line of the dataset is a string. I am getting the error below:
TypeError: expected str, bytes or os.PathLike object, not list
However, when I open the file with the code below, it works:
filename='C:\\User.csv'
f = open(filename,'r')
lines = f.readlines()[1:]
print(lines)
f.close()
I am ignoring the first line because it is a string, while the rest of the dataset is float.
Update:
load_csv
def load_csv(ds):
    dataset = list()
    with open(ds, 'r') as file:
        csv_reader = reader(file)
        for row in csv_reader:
            if not row:
                continue
            dataset.append(row)
    return dataset
Even if I do it this way, I still get an error:
ds = load_csv('C:\\Users.csv')
minmax = dataset_minmax(ds)
normalize_dataset(ds, minmax)
def dataset_minmax(dataset):
    minmax = list()
    for i in range(len(dataset[0])):
        col_values = [row[i] for row in dataset]
        value_min = min(col_values)
        value_max = max(col_values)
        minmax.append([value_min, value_max])
    return minmax
def normalize_dataset(dataset, minmax):
    for row in dataset:
        for i in range(len(row)):
            row[i] = (row[i] - minmax[i][0]) / (minmax[i][1] - minmax[i][0])
It gives error on:
row[i] = (row[i] - minmax[i][0]) / (minmax[i][1] - minmax[i][0])
Error:
TypeError: unsupported operand type(s) for -: 'str' and 'str'

Since you're now getting a different error, I'll give a second answer.
This error means that the two variables in your subtraction are strings, not numbers.
In [1]: 5 - 3
Out[1]: 2
In [2]: '5' - '3'
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-4ef7506473f1> in <module>
----> 1 '5' - '3'
TypeError: unsupported operand type(s) for -: 'str' and 'str'
This is because the CSV reader assumes everything is a string. You need to convert it to floats, e.g., by changing load_csv to do something like dataset.append(list(map(float, row))) instead of your existing append statement.
The min-max stuff doesn't fail, because Python's min and max work on strings, too:
In [3]: min('f', 'o', 'o', 'b', 'a', 'r')
Out[3]: 'a'
However, it might be giving you incorrect answers:
In [4]: min('2.0', '10.0')
Out[4]: '10.0'
By the way, if you're doing much along these lines, you'd probably benefit from using the Pandas package instead of rolling your own.
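The conversion described above can be sketched as follows. This is a minimal sketch, assuming the first row is a text header and every remaining cell parses as a float. The `skip_header` parameter, the choice to pass an open file object instead of a filename, and the in-memory sample data are illustrative assumptions, not part of the original code.

```python
import csv
import io


def load_csv(file_obj, skip_header=True):
    """Read CSV rows from an open file object and convert each cell to float."""
    rows = csv.reader(file_obj)
    if skip_header:
        next(rows, None)  # discard the text header row, if there is one
    dataset = []
    for row in rows:
        if not row:
            continue  # skip blank lines
        dataset.append([float(cell) for cell in row])
    return dataset


def dataset_minmax(dataset):
    """Return [min, max] for every column."""
    return [[min(col), max(col)] for col in zip(*dataset)]


def normalize_dataset(dataset, minmax):
    """Rescale every value to the range 0..1, in place."""
    for row in dataset:
        for i, value in enumerate(row):
            low, high = minmax[i]
            row[i] = (value - low) / (high - low)


# Hypothetical sample standing in for the real file.
sample = io.StringIO("a,b\n2.0,10.0\n10.0,30.0\n6.0,20.0\n")
ds = load_csv(sample)
minmax = dataset_minmax(ds)
normalize_dataset(ds, minmax)
print(minmax)
print(ds)
```

With a real file you would call it as `with open(path, newline='') as f: ds = load_csv(f)`. Note that a column whose values are all equal would cause a division by zero in `normalize_dataset`; the sketch does not guard against that.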

I am guessing the error is in the open call in your code. It fails because open expects a string, bytes, or os.PathLike object naming the file to open (as the error message says). The function load_csv returns a list, which open cannot accept.

Look at your first two lines where it doesn't work:
ds = load_csv('C:\\User.csv')
f = open(ds,'r')
ds is an object returned (from TensorFlow, I assume?) which contains the data. Then you open it as if it were a filename. This is why the interpreter complains. ds is the dataset, not the string representing the file.
It works in the other example, because you use a filename.

Related

Python: How to remove $ character from list after CSV import

I am attempting to import a CSV file into Python. After importing the CSV, I want to take an average of every ['Spent Past 6 Months'] value; however, the "$" symbol that the CSV includes in front of that value is causing me problems. I've tried a number of things to get rid of that symbol and I'm honestly lost at this point!
I'm really new to Python, so I apologize if there is something very simple here that I am missing.
What I have coded is listed below. My output is listed first:
File "customer_regex2.py", line 24, in <module>
top20Cust = top20P(data)
File "customer_regex2.py", line 15, in top20P
data1 += data1 + int(a[i]['Spent Past 6 Months'])
ValueError: invalid literal for int() with base 10: '$2099.83'
import csv
import re
data = []
with open('customerData.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        data.append(row)
def top20P(a):
    outputList=[]
    data1=0
    for i in range(0,len(a)):
        data1 += data1 + int(a[i]['Spent Past 6 Months'])
    top20val= int(data1*0.8)
    for j in range(0,len(a)):
        if data[j]['Spent Past 6 Months'] >= top20val:
            outputList.append('a[j]')
    return outputList
top20Cust = top20P(data)
print(outputList)
It looks like a datatype issue.
You could strip the $ characters like so:
someString = '$2099.83'
someString = someString.strip('$')
print(someString)
2099.83
Now the last step is to wrap in float() since you have decimal values.
print(type(someString))
<class 'str'>
someFloat = float(someString)
print(type(someFloat))
<class 'float'>
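Applied to the CSV from the question, the same idea might look like the sketch below. It assumes the column is named 'Spent Past 6 Months' as in the question; the `parse_money` helper, the 'Name' column, the sample rows, and the removal of thousands separators are hypothetical additions for illustration.

```python
import csv
import io


def parse_money(text):
    """Turn a string such as '$2,099.83' into a float."""
    return float(text.strip().lstrip('$').replace(',', ''))


# Hypothetical rows standing in for customerData.csv.
sample = io.StringIO(
    "Name,Spent Past 6 Months\n"
    "Ann,$2099.83\n"
    "Bob,$150.00\n"
)
rows = list(csv.DictReader(sample))
total = sum(parse_money(row['Spent Past 6 Months']) for row in rows)
average = total / len(rows)
print(round(total, 2), round(average, 2))
```

Converting once, right after reading, keeps the later comparisons numeric instead of comparing strings against numbers.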
Hope that helps.

AttributeError: 'list' object has no attribute 'SeqRecord' - while trying to slice multiple sequences with Biopython>SeqIO from fasta file

I am trying to generate varying length N and C termini Slices (1,2,3,4,5,6,7). But before I get there I am having problems just reading in my fasta files. I was following the 'Random subsequences' head tutorial from:https://biopython.org/wiki/SeqIO . But in this case there is only one sequence so maybe that is where I went wrong. The code with example sequences and my errors. Any help would be much appreciated. I am clearly out of my depth. It looks like there are a lot of similar problems others have come across so I imagine it is something stupid that I am doing because I do not fully understand the SeqRecord structures. Thanks!
Two example sequences in my file domains.fasta:
>GA98
TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTLKDEIKTFTVTE
>GB98
TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTYKDEIKTFTVTE
my code that is not working:
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
# Load data:
domains = list(SeqIO.parse("domains.fa",'fasta'))
#set up receiving arrays
home=[]
num=1
#slice data
for i in range(0, 6):
    num = num+1
    domain = domains
    seq_n = domains.seq[0:num]
    seq_c = domains.seq[len(domain)-num:len(domain)]
    name = domains.id
    record_d = SeqRecord(domain,'%s' % (name), '', '')
    home.append(record_d)
    record_n = SeqRecord(seq_n,'%s_n_%i' % (name,num), '', '')
    home.append(record_n)
    record_c = SeqRecord(seq_c,'%s_c_%i' % (name,num), '', '')
    home.append(record_c)
SeqIO.write(home, "domains_variants.fasta", "fasta")
error I get is:
Traceback (most recent call last):
File "~/fasta_nc_sequences.py", line 20, in <module>
seq_n = domains.seq[0:num]
AttributeError: 'list' object has no attribute 'SeqRecord'
When I print out 'domains = list(SeqIO.parse("domains.fa",'fasta'))' I get this:
[SeqRecord(seq=Seq('TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTLKDEIKTFTVTE', SingleLetterAlphabet()), id='GA98', name='GA98', description='GA98', dbxrefs=[]), SeqRecord(seq=Seq('TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTYKDEIKTFTVTE', SingleLetterAlphabet()), id='GB98', name='GB98', description='GB98', dbxrefs=[])]
I am not sure why I cannot access what is within the SeqRecord. Maybe it is because I wrapped the SeqIO.parse in a list because before I was being thrown a different error:
AttributeError: 'generator' object has no attribute 'seq'
I was working one level too low in my for loop so I was not iterating through the sequences. There were also problems accessing the C terminus sequence. Now the code works.
#Load data:
domains = list(SeqIO.parse("examples/data/domains.fa",'fasta'))
#set up receiving arrays
home=[]
#num=1
#subset data
for record in (domains):
    num = 0
    domain = record.seq
    name = record.id
    record_d = SeqRecord(domain,'%s' % (name), '', '')
    home.append(record_d)
    for i in range(0, 6):
        num= num+1
        seq_n = record.seq[0:num]
        seq_c = record.seq[len(record.seq)-num:len(record.seq)]
        record_n = SeqRecord(seq_n,'%s_n_%i' % (name,num), '', '')
        home.append(record_n)
        record_c = SeqRecord(seq_c,'%s_c_%i' % (name,num), '', '')
        home.append(record_c)
SeqIO.write(home, "domains_variants.fasta", "fasta")
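The slicing itself can be illustrated without Biopython. The sketch below uses a plain string in place of a Seq object (Seq supports the same slice syntax); the `terminal_slices` helper and its `max_len` parameter are hypothetical names introduced only for this example.

```python
def terminal_slices(sequence, max_len=6):
    """Yield (label, fragment) pairs for N- and C-terminal slices of length 1..max_len."""
    for n in range(1, max_len + 1):
        yield ('n_%i' % n, sequence[:n])    # first n residues
        yield ('c_%i' % n, sequence[-n:])   # last n residues


ga98 = 'TTYKLILNLKQAKEEAIKELVDAGTAEKYFKLIANAKTVEGVWTLKDEIKTFTVTE'
fragments = dict(terminal_slices(ga98))
print(fragments['n_3'], fragments['c_3'])
```

A negative slice such as `sequence[-n:]` avoids computing `len(sequence) - n` by hand. It only works for n of at least 1, because `sequence[-0:]` returns the whole sequence, which is why the loop starts at 1.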

Hangman code having trouble accessing other file

So I made a program for hangman that accesses an input file but it is having trouble accessing it once I type it in.
This is the function that is calling the file
def getWord(filename):
    print("Loading from file...")
    inputFile = open(filename, 'r')
    wordlist = inputFile.read().splitlines()
    print(len(wordlist) + " lines loaded.")
    return wordlist
filename = input("What file will the words come from? ")
wordlist = getWord(filename)
theWordLine = random.choice(wordlist)
game(theWordLine)
And this is the file itself
person,Roger
place,Home
phrase,A Piece Of Cake
The error it is giving me is this
File "hangman.py", line 77, in <module>
wordlist = getWord(filename)
File "hangman.py", line 10, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'
Can anyone help me out?
The error states: TypeError: unsupported operand type(s) for +: 'int' and 'str'. That means you cannot use + with an int on one side and a str on the other. The len function returns an int, so you need to convert it to a str before you can concatenate it with another str.
It should be print(str(len(wordlist)) + " lines loaded.") instead of print(len(wordlist) + " lines loaded.")
You may also want to use string formatting, as a comment mentions. If you are using Python 3.6 or higher, you can try f-strings: f'{len(wordlist)} lines loaded.'
It has nothing to do with reading the file. Read the error: an integer and string cannot be added.
Why are you getting this error? Because len() returns an integer, not a string. You can convert len()'s return value to a string, or you can just use an f-string:
f'{len(wordlist)} lines loaded.'
Use print(len(wordlist), "lines loaded.") instead of print(len(wordlist) + " lines loaded."); print inserts a space between its arguments.
print(len(wordlist) + " lines loaded.") is causing your issue because it tries to apply the + operator to values of different types.
You could use print("{} lines loaded".format(len(wordlist))) to avoid this.
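The alternatives suggested in these answers can be compared side by side. This is a small sketch; the `wordlist` contents are copied from the question's sample file, and the `message_*` variable names are only for illustration.

```python
wordlist = ["person,Roger", "place,Home", "phrase,A Piece Of Cake"]

# Explicit conversion before concatenating.
message_concat = str(len(wordlist)) + " lines loaded."

# f-string (Python 3.6+).
message_fstring = f"{len(wordlist)} lines loaded."

# str.format works on older Python 3 versions too.
message_format = "{} lines loaded.".format(len(wordlist))

print(message_concat)
```

All three build the same string; which one to use is mostly a matter of style and the Python version you need to support.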

Python data error: ValueError: invalid literal for int() with base 10: '42152129.0'

I am working on a simple data science project with Python. However, I am getting an error which is the following:
ValueError: could not convert string to float:
Here is what my code looks like:
import matplotlib.pyplot as plt
import csv
from datetime import datetime
filename = 'USAID.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    monies = []
    for row in reader:
        money = int(row[1])
        monies.append(money)
    print(monies)
if I change the line:
money = int(row[1]) to money = float(row[1])
I get this error: ValueError: could not convert string to float:
Here are my tracebacks: first error:
Traceback (most recent call last):
File "funding.py", line 60, in <module>
money = int(row[1])
ValueError: invalid literal for int() with base 10: '42152129.0'
Second Error:
Traceback (most recent call last):
File "funding.py", line 60, in <module>
money = float(row[1])
ValueError: could not convert string to float:
Any help would be great! Thank you!
The first failure is because you passed a string with . in it to int(); you can't convert that to an integer because there is a decimal portion.
The second failure is due to a different row[1] string value; one that is empty.
You could test for that:
if row[1]:
    money = float(row[1])
Since you are working on a data science project, you may want to consider using the pandas project to load your CSV instead, with pandas.read_csv().
Some of the entries in row[1] are empty so you probably want to check for those before trying to cast. Pass a default value of, say 0, if the entry is blank.
Then you should consider using decimal for computations that relate to money.
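The two suggestions above (a default for blank entries, and decimal for money) can be combined. The following is a minimal sketch under stated assumptions: the `to_decimal` helper, the column names, and the in-memory sample rows are hypothetical, and a default of 0 may or may not be appropriate for your data.

```python
import csv
import io
from decimal import Decimal, InvalidOperation


def to_decimal(text, default=Decimal("0")):
    """Parse a money field, falling back to `default` when it is blank or malformed."""
    text = text.strip()
    if not text:
        return default
    try:
        return Decimal(text)
    except InvalidOperation:
        return default


# Hypothetical sample standing in for USAID.csv; the second data row has an empty amount.
sample = io.StringIO("country,amount\nA,42152129.0\nB,\nC,100.5\n")
reader = csv.reader(sample)
header_row = next(reader)
monies = [to_decimal(row[1]) for row in reader]
print(monies)
```

If a blank entry should be excluded from later statistics instead of counted as zero, passing `default=None` and filtering the `None` values afterwards may be the better choice.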
I had the same issue while I was learning data visualization using Seaborn. Thanks to EdChum's help, I was able to solve the issue with his approach:
df['col'] = pd.to_numeric(df['col'], errors='coerce')

How to sum values for the same key [duplicate]

I have a file
gu|8
gt|5
gr|5
gp|1
uk|2
gr|20
gp|98
uk|1
me|2
support|6
And I want to have one number per TLD like:
gr|25
gp|99
uk|3
me|2
support|6
gu|8
gt|5
and here is my code:
f = open(file,'r')
d={}
for line in f:
    line = line.strip('\n')
    TLD,count = line.split('|')
    d[TLD] = d.get(TLD)+count
print d
But I get this error:
d[TLD] = d.get(TLD)+count
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
Can anybody help?
Taking a look at the full traceback:
Traceback (most recent call last):
File "mee.py", line 6, in <module>
d[TLD] = d.get(TLD) + count
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
The error is telling us that we tried to add something of type NoneType to something of type str, which isn't allowed in Python.
There's only one object of type NoneType, which, unsurprisingly, is None – so we know that we tried to add a string to None.
The two things we tried to add together in that line were d.get(TLD) and count, and looking at the documentation for dict.get(), we see that what it does is
Return the value for key if key is in the dictionary, else default. If default is not given, it defaults to None, so that this method never raises a KeyError.
Since we didn't supply a default, d.get(TLD) returned None when it didn't find TLD in the dictionary, and we got the error attempting to add count to it. So, let's supply a default of 0 and see what happens:
f = open('data','r')
d={}
for line in f:
    line = line.strip('\n')
    TLD, count = line.split('|')
    d[TLD] = d.get(TLD, 0) + count
print d
$ python mee.py
Traceback (most recent call last):
File "mee.py", line 6, in <module>
d[TLD] = d.get(TLD, 0) + count
TypeError: unsupported operand type(s) for +: 'int' and 'str'
Well, we've still got an error, but now the problem is that we're trying to add a string to an integer, which is also not allowed, because it would be ambiguous.
That's happening because line.split('|') returns a list of strings – so we need to explicitly convert count to an integer:
f = open('data','r')
d={}
for line in f:
    line = line.strip('\n')
    TLD, count = line.split('|')
    d[TLD] = d.get(TLD, 0) + int(count)
print d
... and now it works:
$ python mee.py
{'me': 2, 'gu': 8, 'gt': 5, 'gr': 25, 'gp': 99, 'support': 6, 'uk': 3}
Turning that dictionary back into the file output you want is a separate issue (and not attempted by your code), so I'll leave you to work on that.
To answer the title of your question, "how to sum values for the same key": the standard library class collections.Counter is a perfect match for you:
import collections
d = collections.Counter()
with open(file) as f:
    for line in f:
        tld, cnt = line.strip().split('|')
        d[tld] += int(cnt)
then to write back:
with open(file, 'w') as f:
    for tld, cnt in sorted(d.items()):
        print >> f, "%s|%d" % (tld, cnt)
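The answers above are written for Python 2 (`print d`, `print >> f`). A rough Python 3 equivalent of the Counter approach is sketched below; it uses an in-memory io.StringIO copy of the question's data in place of real files, which is an assumption made only so the example is self-contained.

```python
import io
from collections import Counter

# Hypothetical in-memory copy of the input file from the question.
source = io.StringIO("gu|8\ngt|5\ngr|5\ngp|1\nuk|2\ngr|20\ngp|98\nuk|1\nme|2\nsupport|6\n")

totals = Counter()
for line in source:
    line = line.strip()
    if not line:
        continue  # ignore blank lines
    tld, count = line.split('|')
    totals[tld] += int(count)

# Write the totals back out in "tld|count" form, sorted by key.
output = io.StringIO()
for tld, count in sorted(totals.items()):
    print(f"{tld}|{count}", file=output)

print(output.getvalue(), end="")
```

A Counter returns 0 for missing keys, so `totals[tld] += int(count)` needs no explicit default, unlike the plain dict with `d.get(TLD, 0)`.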
