I want to check whether a specific word (user-defined via input) occurs in a CSV file. I've come up with code that does that, but since I'm a beginner and don't want to pick up any "bad habits", I'm wondering if it is the fastest, easiest and shortest possibility. Any improvements are appreciated.
This works (mostly, see below), but the whole thing with the "yes" variable makes me think that there has to be a better way to solve this.
def add(self, name):
    with open(filepath, "r+") as file:
        csvreader = csv.reader(file, delimiter=",", quotechar='"')
        csvwriter = csv.writer(file, delimiter=",", quotechar='"')
        yes = False
        for line in csvreader:
            if name in line[0]:
                yes = True
        if yes:
            print("This ingredient has already been added")
        else:
            csvwriter.writerow([name])
It sometimes throws an "IndexError: list index out of range". I don't have any idea as to why, because it only does that sometimes. Other times it works fine...
There are 2 improvements you could make:
After the value is found and you set the found flag to True, add a break; there's no point continuing to scan the file.
Your index error likely comes from a blank line, which the csv reader returns as an empty list. An empty list is falsey, so we can check for it before trying to access by index: if line and name in line[0]:. The index is never attempted when the first condition is falsey.
In terms of being falsey, this refers to objects that will be considered False without actually being a Boolean. This includes None and empty sequences such as an empty string (''), an empty list ([]) etc. Empty sequences don't support indexing, even for the zeroth index, so that's why you get an error on a blank line.
With falsey items, we don't need a direct comparison (==) to True or False; indeed, an empty list compares unequal to both. Instead, you can do boolean-style checks on them, e.g. if some_sequence: or if not some_sequence:. Also, and evaluates its conditions left-to-right and stops as soon as it finds a falsey one. In the case of if line and ..., execution never reaches the indexing of line because it already knows the list is empty. Hence you never try to index into an empty list.
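Putting both fixes together, a sketch of the search loop might look like this (the function name and the read-only open are my own choices, not from the original code):

```python
import csv

def already_added(filepath, name):
    """Return True if `name` appears in the first column of the CSV."""
    with open(filepath, "r") as file:
        csvreader = csv.reader(file, delimiter=",", quotechar='"')
        for line in csvreader:
            # Skip blank lines (empty lists are falsey) before indexing,
            # and stop scanning as soon as a match is found.
            if line and name in line[0]:
                return True
    return False
```

Returning early from the function replaces both the yes flag and the break.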
There is no reason to use csv at all to find a word in a file:
def word_in_file(filename, name):
    with open(filename, 'r') as f:
        for line in f:
            if name in line:
                return True
    return False
If I have a list of strings and want to eliminate the leading and trailing whitespace from each one, how can I use .strip() effectively to accomplish this?
Here is my code (python 2.7):
for item in myList:
    item = item.strip()
    print item
for item in myList:
    print item
The changes don't persist from one loop to the next. I tried using map as suggested here (https://stackoverflow.com/a/7984192) but it did not work for me. Please help.
Note, this question is useful:
An answer does not exist already
It covers a mistake someone new to programming / python might make
Its title covers search cases both general (how to update values in a list) and specific (how to do this with .strip()).
It addresses previous work, in particular the map solution, which would not work for me.
I'm guessing you tried:
map(str.strip, myList)
That creates a new list and returns it, leaving the original list unchanged. If you want to interact with the new list, you need to assign it to something. You could overwrite the old value if you want.
myList = map(str.strip, myList)
You could also use a list comprehension:
myList = [item.strip() for item in myList]
Which many consider a more "pythonic" style, compared to map.
I'm answering my own question here in the hopes that it saves someone from the couple hours of searching and experimentation it took me.
As it turns out the solution is fairly simple:
index = 0
for item in myList:
    myList[index] = item.strip()
    index += 1
for item in myList:
    print "'" + item + "'"
Single quotes are concatenated at the beginning/end of each list item to aid detection of trailing/leading whitespace in the terminal. As you can see, the strings will now be properly stripped.
To update the values in the list we need to access each element via its index and assign the new value back into the list. The reason is that the loop variable item is just a name bound to each element in turn; rebinding it with item = item.strip() points that name at a new string but leaves the list itself untouched. (It isn't really pass-by-value versus pass-by-reference: Python assignment rebinds names, it never copies values.)
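The same in-place update can be written a bit more tidily with enumerate, which yields (index, item) pairs and replaces the manual counter (the sample list contents here are made up):

```python
myList = ["  spam ", "eggs  ", "  ham"]

# enumerate gives us the index alongside each item,
# so we can assign the stripped string back into the list.
for index, item in enumerate(myList):
    myList[index] = item.strip()

print(myList)  # ['spam', 'eggs', 'ham']
```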
I am reading from a file x that contains individual data records, separated from each other by blank lines. I want to calculate tf_idf_vectorizer() for each record, so I need to remove all members of tweets whenever the code finds a new line (\n). I get an error on the bold line in my code.
def load_text():
    file = open('x.txt', 'r')
    tweets = []
    all_matrix = []
    for line in file:
        if line in ['\n', '\r\n']:
            all_matrix.append(tf_idf_vectorizer(tweets))
            **for i in tweets: tweets.remove(i)**
        else:
            tweets.append(line)
    file.close()
    return all_matrix
You can make tweets an empty list again with a simple assignment.
tweets = []
If you actually need to empty out the list in-place, the way you do it is either:
del tweets[:]
… or …
tweets[:] = []
In general, you can delete or replace any subslice of a list in this way; [:] is just the subslice that means "the whole list".
However, since nobody else has a reference to tweets, there's really no reason to empty out the list; just create a new empty list, and bind tweets to that, and let the old list become garbage to be cleaned up:
tweets = []
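The difference only matters when something else holds a reference to the same list; a quick sketch (the names here are made up):

```python
tweets = ["a", "b"]
alias = tweets      # a second reference to the same list object

tweets = []         # rebinds the name; the old list survives via alias
print(alias)        # ['a', 'b']

tweets = alias
del tweets[:]       # empties the list object in place
print(alias)        # []
```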
Anyway, there are two big problems with this:
for i in tweets: tweets.remove(i)
First, when you want to remove a specific element, you should never use remove. That has to search the list to find a matching element—which is wasteful (since you already know which one you wanted), and also incorrect if you have any duplicates (there could be multiple matches for the same element). Instead, use the index. For example, del tweets[index]. You can use the enumerate function to get the indices. The same thing is true for lots of other list, string, etc. functions—don't use index, find, etc. with a value when you could get the index directly.
Second, if you remove the first element, everything else shifts up by one. So, first you remove element #0. Then, when you remove element #1, that's not the original element #1, but the original #2, which has shifted up one space. And besides skipping every other element, once you're half-way through, you're trying to remove elements past the (new) end of the list. In general, avoid mutating a list while iterating over it; if you must mutate it, it's only safe to do so from the right, not the left (and it's still tricky to get right).
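A quick demonstration of that skipping behavior (the sample list is made up):

```python
tweets = ["a", "b", "c", "d"]
for i in tweets:
    tweets.remove(i)   # mutating while iterating: elements shift left

print(tweets)  # ['b', 'd'] -- every other element survives
```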
The right way to remove elements one by one from the left is:
while tweets:
    del tweets[0]
However, this will be pretty slow, because you keep having to re-adjust the list after each removal. So it's still better to go from the right:
while tweets:
    del tweets[-1]
But again, there's no need to go one by one when you can just do the whole thing at once, or not even do it, as explained above.
You should never try to remove items from a list while iterating over that list. If you want a fresh, empty list, just create one.
tweets = []
Otherwise you may not actually remove all the elements of the list, as I suspect you noticed.
You could also re-work the code to be:
from itertools import groupby
def load_tweet(filename):
    with open(filename) as fin:
        tweet_blocks = (g for k, g in groupby(fin, lambda line: bool(line.strip())) if k)
        return [tf_idf_vectorizer(list(tweets)) for tweets in tweet_blocks]
This groups the file into runs of non-blank lines and blank lines. Where the lines aren't blank, we build a list from them to pass to the vectorizer inside a list comprehension. This way we don't keep references to lists hanging around, and we never append to a list one item at a time.
I'm trying to strip subdomains off of a large list of domains in a text file. The script works, but only for the last domain in the list. I know the problem is in the loop but can't pinpoint the exact issue. Thanks for any assistance :)
with open("domainlist.txt", "r") as datafile:
    s = datafile.read()

for x in s:
    t = '.'.join(s.split('.')[-2:])
    print t
This will take "example.test.com" and return "test.com". The only problem is it won't do this for every domain in the list - only the last one.
What you want is to build up a new list by modifying the elements of an old one. Fortunately, Python has the list comprehension - perfect for this job.
with open("domainlist.txt", "r") as datafile:
    modified = ['.'.join(x.split('.')[-2:]) for x in datafile]
This behaves exactly like creating a list and adding items to it in a for loop, except faster and nicer to read.
Note that file.read() reads the entire thing in as one big string, what you wanted was probably to loop over the lines of the file, which is done just by looping over the file itself. Your current loop loops of the individual characters of the file, rather than lines.
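To see the difference, here is a small sketch using io.StringIO to stand in for the open file (the sample domains are made up):

```python
import io

# Iterating over a string yields individual characters:
print(list("abc"))  # ['a', 'b', 'c']

# Iterating over a file object yields whole lines:
fake_file = io.StringIO("example.test.com\nfoo.example.org\n")
for line in fake_file:
    print('.'.join(line.strip().split('.')[-2:]))
# test.com
# example.org
```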
You are overwriting t in each loop iteration, so naturally only the value from the last iteration stays in t. Instead, put each string into a list with list.append.
Try this out. Better readability.
with open("domainlist.txt", "r") as datafile:
    s = datafile.readlines()

t = []
for x in s:
    t.append('.'.join(x.split('.')[-2:]))

print t
I'm having a problem with my Python assignment.
I'm new to Python, so I'm a complete beginner.
Question: How can I merge two files below?
s555555,7
s333333,10
s666666,9
s111111,10
s999999,9
and
s111111,,,,,
s222222,,,,,
s333333,,,,,
s444444,,,,,
s555555,,,,,
s666666,,,,,
s777777,,,,,
After merging, it should look something like:
s111111,10,,,,
s222222,,,,,
s333333,10,,,,
s444444,,,,,
s555555,7,,,,
s666666,9,,,,
s777777,,,,,
s999999,9,,,,
Thanks for reading, and any help would be appreciated!!!
Here are the steps you can follow for one approach to the problem. In this I'll be using FileA, FileB and Result as the various filenames.
One way to approach the problem is to give each position in the file (each ,-separated field) a number to reference it by. Then, as you read the lines from FileA, you know that after the first , you need to put the corresponding data from FileB to build the result line that you will write out to Result.
Open FileA. Ideally you should use the with statement because it will automatically close the file when it's done. Or you can use the normal open() call, but make sure you close the file after you are done.
Loop through each line of FileA and add it to a list. (Hint: you should use split()). Why a list? It makes it easier to refer to items by index as that's our plan.
Repeat steps 1 and 2 for FileB, but store it in a different list variable.
Now the next part is to loop through the list of lines from FileA, match them with the list from FileB, to create a new line that you will write to the Result file. You can do this many ways, but a simple way is:
First create an empty list that will store your results (final_lines = [])
Loop through the list that has the lines for FileA in a for loop.
You should also keep in mind that not every line from FileA will have a corresponding line in FileB. For every first "bit" in FileA's list, find the corresponding line in FileB's list, and then get the next item by using index(). If you are keen, you will have realized that the first item is always 0 and the next one is always 1, so why not simply hard-code the values? Because, if you look at the assignment, there are multiple ,s; at some point you may have a fourth or fifth "column" that needs to be added. Teachers love to check for this sort of thing.
Use append() to add the items in the right order to final_lines.
Now that you have the list of lines ready, the last part is simple:
Open a new file (use with or open)
Loop through final_lines
Write each line out to the file (make sure you don't forget the end of line character).
Close the file.
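The steps above can be sketched as follows. Note this is only one possible shape: it uses a dict for the lookup instead of list.index(), and the filenames, the function name, and the fixed field width of 6 are assumptions based on the sample data:

```python
def merge_files(file_a, file_b, result_path, width=6):
    # Steps 1-2: read FileA into a dict keyed on the first field.
    scores = {}
    with open(file_a) as fa:
        for line in fa:
            if line.strip():
                key, value = line.strip().split(',')[:2]
                scores[key] = value

    # Step 3: collect the keys present in FileB.
    with open(file_b) as fb:
        keys = {line.strip().split(',')[0] for line in fb if line.strip()}

    # Merge: every key from either file, padded to a fixed number of fields.
    final_lines = []
    for key in sorted(keys | set(scores)):
        fields = [key, scores.get(key, '')]
        fields += [''] * (width - len(fields))
        final_lines.append(','.join(fields))

    # Last steps: write each merged line out with its end-of-line character.
    with open(result_path, 'w') as out:
        for line in final_lines:
            out.write(line + '\n')
```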
If you have any specific questions - please ask.
Not Python, but on Linux:
sort -k1 c1.csv > sorted1
sort -k1 c2.csv > sorted2
join -t , -11 -21 -a 1 -a 2 sorted1 sorted2
Result:
s111111,10,,,,,
s222222,,,,,
s333333,10,,,,,
s444444,,,,,
s555555,7,,,,,
s666666,9,,,,,
s777777,,,,,
s999999,9
Make a dict using the first element as a primary key, and then merge the rows?
Something like this:
import csv

f1 = csv.reader(open('file1.csv', 'rb'))
f2 = csv.reader(open('file2.csv', 'rb'))

mydict = {}

for row in f1:
    mydict[row[0]] = row[1:]

for row in f2:
    # extend() mutates the list in place and returns None, so don't
    # assign its result back; setdefault() handles keys that only
    # appear in the second file.
    mydict.setdefault(row[0], []).extend(row[1:])

fout = csv.writer(open('out.txt', 'w'))
for k, v in mydict.items():
    fout.writerow([k] + v)