How to capture output of function in for-loop - python

I have a for-loop that calls a function. The function iterates over elements in a list and constructs vectors from them. I am able to print out each of these vectors, but since I need to perform operations on them, I need the loop to actually return each of them.
for text in corpus:
    text_vector = get_vector(lexicon, text)
    print(text_vector)

You can store them in a list.
You can create a list by doing:
output = []
Then in the loop you can add the values like this:
for text in corpus:
    text_vector = get_vector(lexicon, text)
    print(text_vector)
    output.append(text_vector)
Now you have saved each item from the for loop in the list.

I think all you need is a list in the global scope that you append to.
vector_texts = []
for text in corpus:
    text_vector = get_vector(lexicon, text)
    vector_texts.append(text_vector)
    print(text_vector)
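The same collection step can also be written as a list comprehension. This is a minimal sketch: the get_vector below is a hypothetical stand-in (the real function is not shown in the question), and lexicon and corpus are made-up sample data.

```python
def get_vector(lexicon, text):
    # Hypothetical stand-in: count how often each lexicon word occurs in the text.
    words = text.lower().split()
    return [words.count(term) for term in lexicon]

# Made-up sample data.
lexicon = ["cat", "dog"]
corpus = ["the cat sat", "dog chases dog", "no pets here"]

# Build the list of vectors in one expression instead of append() in a loop.
vectors = [get_vector(lexicon, text) for text in corpus]
print(vectors)  # [[1, 0], [0, 2], [0, 0]]
```

After this, vectors[0], vectors[1], and so on can be used in later operations.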

Related

Python: Glob.Glob wildcard matching, loop through list for wildcard

I am trying to use a glob.glob wildcard pattern to pick up files with particular names.
listing = ['DBMP','CIFP']
for x in range(len(listing)):
    print(listing[x])
    pklFilenamesList = glob.glob(os.path.join(input_location, '*{x}_.pkl'))
How can I loop through the strings in listing and combine each one with the _.pkl suffix to build the file names for my pklFilenamesList variable?
I want to create a list like this:
*DBMP_.pkl
*CIFP_.pkl
Update: I am trying to loop through the list like this:
pklFilenamesList = glob.glob(os.path.join(input_location, '*{x}_.pkl'))
for filecounter, filename in enumerate(pklFilenamesList):
For the list part, you can get the result with a list comprehension and an f-string:
newlist = [f'*{x}_.pkl' for x in listing]
['*DBMP_.pkl', '*CIFP_.pkl']
Update
After getting newlist, you can loop over it and append each glob result to a list (this gives a list of lists, one per pattern).
pklFilenamesList = []
for i in newlist:
    pklFilenamesList.append(glob.glob(os.path.join(input_location, i)))
for j in pklFilenamesList:
    for filecounter, filename in enumerate(j):
        ...
Or you can do what you need inside the first loop, without appending the results.
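If a single flat list of paths is more convenient than a list of lists, extend can be used instead of append. This is a sketch only: find_pickles is a hypothetical helper name, and the demo creates made-up empty files in a temporary directory so that it runs on its own.

```python
import glob
import os
import tempfile

def find_pickles(input_location, listing):
    """Return every path in input_location matching '*<name>_.pkl' for each name."""
    matches = []
    for name in listing:
        pattern = os.path.join(input_location, f"*{name}_.pkl")
        # extend (not append) keeps the result a flat list of paths.
        matches.extend(sorted(glob.glob(pattern)))
    return matches

# Demo in a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as demo_dir:
    for fname in ("a_DBMP_.pkl", "b_CIFP_.pkl", "c_OTHER_.pkl"):
        open(os.path.join(demo_dir, fname), "w").close()
    found = [os.path.basename(p) for p in find_pickles(demo_dir, ["DBMP", "CIFP"])]
    print(found)  # ['a_DBMP_.pkl', 'b_CIFP_.pkl']
```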

Apply function to all inputs of a list dictionary

I am trying to run a function pre_process on a list input, k1_tweets_filtered['text'].
However, the function only seems to work on one input at a time, i.e. k1_tweets_filtered[1]['text'].
I want the function to run on all inputs of k1_tweets_filtered['text'].
I have tried to use a loop, but it only outputs the words of the first input.
I am wondering whether this is the right approach and how I can apply it to the rest of the inputs.
This is the question I am trying to solve and what I have coded so far.
Write your code to pre-process and clean up all tweets
stored in the variable k1_tweets_filtered, k2_tweets_filtered and k3_tweets_filtered using the
function pre_process() to result in new variables k1_tweets_processed, k2_tweets_processed
and k3_tweets_processed.
for x in range(len(k1_tweets_filtered)):
    tweet_k1 = k1_tweets_filtered[x]['text']
    x+=1
    k1_tweets_processed = pre_process(tweet_k1)
The function pre_process is below. I know that it is correct, as it was given to me.
import re
def remove_non_ascii(s):
    return "".join(i for i in s if ord(i) < 128)
def pre_process(doc):
    """
    pre-processes a doc
    * Converts the tweet into lower case,
    * removes the URLs,
    * removes the punctuations
    * tokenizes the tweet
    * removes words less than 3 characters
    """
    doc = doc.lower()
    # getting rid of non ascii codes
    doc = remove_non_ascii(doc)
    # replacing URLs (raw strings keep the regex escapes intact)
    url_pattern = r"http://[^\s]+|https://[^\s]+|www.[^\s]+|[^\s]+\.com|bit.ly/[^\s]+"
    doc = re.sub(url_pattern, 'url', doc)
    # removing dollars and usernames and other unnecessary stuff
    userdoll_pattern = r"\$[^\s]+|\#[^\s]+|\&[^\s]+|\*[^\s]+|[0-9][^\s]+|\~[^\s]+"
    doc = re.sub(userdoll_pattern, '', doc)
    # removing punctuation
    punctuation = r"\(|\)|#|\'|\"|-|:|\\|\/|!|\?|_|,|=|;|>|<|\.|\#"
    doc = re.sub(punctuation, ' ', doc)
    return [w for w in doc.split() if len(w) > 2]
k1_tweets_processed = []
for i in range(len(k1_tweets_filtered)):
    tweet_k1 = k1_tweets_filtered[i]['text']
    k1_tweets_processed.append(pre_process(tweet_k1))
When you iterate over indices, it is conventional to use i or j as the variable name, and with "for i in range(10)" you should not increment the variable inside your loop. Also, your original code set k1_tweets_processed to a single pre-processed text on each pass instead of creating a list and appending each new result to it.
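The index can also be dropped entirely by iterating over the items themselves. The sketch below assumes k1_tweets_filtered is a list of dicts with a 'text' key, as the indexing in the question suggests; the pre_process here is a simplified hypothetical stand-in and the tweets are made-up sample data.

```python
def pre_process(doc):
    # Hypothetical stand-in for the pre_process function given in the question:
    # lower-case the text and keep only words longer than two characters.
    return [w for w in doc.lower().split() if len(w) > 2]

# Made-up sample data in the same shape as k1_tweets_filtered.
k1_tweets_filtered = [
    {"text": "Python is fun"},
    {"text": "I like to loop over lists"},
]

# One processed word list per tweet, in the original order.
k1_tweets_processed = [pre_process(tweet["text"]) for tweet in k1_tweets_filtered]
print(k1_tweets_processed)
# [['python', 'fun'], ['like', 'loop', 'over', 'lists']]
```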

How to modify each element in a list without creating a new list

I want to modify all elements in a list by deleting every character from a specific character onward.
The list is ['JACK\NAME1','TOM\NAME2'] and I want to turn it into ['JACK', 'TOM'].
Right now I am using a for loop with split:
text = ['JACK\\NAME1','TOM\\NAME2']
text_use = []
for item in text:
    item = item.split('\\',1)[0]
    text_use.append(item)
text_use
But I also have to create a new empty list (text_use) and append items to it.
Is there a better way to do this, where I don't have to use a for loop?
Or where I don't have to create an empty list and then append items to it?
Thank you
R
In my opinion, it's more idiomatic (pythonic) to use enumerate:
for i, item in enumerate(text):
    text[i] = item.split('\\',1)[0]
You can use a list comprehension, like this:
text = ['JACK\\NAME1','TOM\\NAME2']
text = [x.split('\\',1)[0] for x in text]
Iterate over the list positions instead of the list values, and then access the items by their position in the list:
for pos in range(len(text)):
    text[pos] = text[pos].split('\\',1)[0]
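If the goal is to keep the same list object (for example because other variables refer to it) while still using a comprehension, slice assignment is one option. Note that the comprehension still builds a temporary list before its contents are copied in. The alias variable below exists only to demonstrate that the original object is reused.

```python
text = ['JACK\\NAME1', 'TOM\\NAME2']
alias = text  # a second reference to the same list object

# Slice assignment replaces the contents of the existing list in place,
# so every reference to it sees the change.
text[:] = [item.split('\\', 1)[0] for item in text]

print(text)           # ['JACK', 'TOM']
print(alias is text)  # True
```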

Removing duplicate results from the loop output in Python

My loop generates a series of strings, which are sentences retrieved from a database. The data structure in the database needs to keep its duplicates, but I want to omit the duplicates in the output. Assume my loop and its results are as follows:
for text in document:
    print(text)
Output:
He goes to school.
He works here.
we are friends.
He goes to school.
they are leaving us alone.
..........
How can I set up a condition so that the program reads all the generated output and, if it finds duplicate results (e.g. He goes to school.), shows me only one record instead of multiple identical records?
already_printed = set()
for text in document:
    if text not in already_printed:
        print(text)
        already_printed.add(text)
You can use a set (note that a set does not preserve the original order). Like:
values = set(document)
for text in values:
    print(text)
Or you can use a list:
temp_list = []
for text in document:
    if text not in temp_list:
        temp_list.append(text)
        print(text)
Or you can use a dict:
temp_dict = {}
for text in document:
    if text not in temp_dict:
        temp_dict[text] = 1
        print(text)
Split the document by '\n', or read it row by row into arr = []. That is, in the for loop, store each row with arr.append(row.lower()).
arr = list(set(arr)) will then remove the duplicates.
If the case does not matter, you can take the set of the lower-cased items.
for text in set(i.lower() for i in document):
    print(text)
Use Python's built-in set to remove duplicates:
documents = ["He goes to school", "He works here. we are friends", "He goes to school", "they are leaving us alone"]
list(set(documents))
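The set-based answers drop duplicates but do not keep the original order. When order matters, one alternative is dict.fromkeys, since dicts preserve insertion order in Python 3.7 and later. The sample sentences below are taken from the output shown in the question.

```python
document = [
    "He goes to school.",
    "He works here.",
    "we are friends.",
    "He goes to school.",
    "they are leaving us alone.",
]

# dict keys are unique and keep first-seen order, so this removes
# later duplicates without reordering anything.
unique = list(dict.fromkeys(document))

for text in unique:
    print(text)
```

The original document list is left unchanged, so the duplicates remain available where they are needed.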

Python: Splitting a string in a list

I am having trouble splitting on '&' in a list of URLs. I know it is because I cannot split a list directly, but I cannot figure out how to get around this error. I am open to any suggestions.
def nestForLoop():
    lines = open("URL_leftof_qm.txt", 'r').readlines()
    for l in lines:
        toke1 = l.split("?")
        toke2 = toke1.split("&")
        for t in toke2:
            with open("ampersand_right_split.txt".format(), 'a') as f:
                f.write
    lines.close()
nestForLoop()
NO. STOP.
qs = urlparse.urlparse(url).query
qsl = urlparse.parse_qsl(qs)
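The two lines above use the Python 2 urlparse module. In Python 3 the same functions live in urllib.parse. A minimal sketch, with a made-up URL:

```python
from urllib.parse import urlparse, parse_qsl

url = "http://example.com/search?q=python&page=2"  # made-up example URL

qs = urlparse(url).query   # everything after the '?'
qsl = parse_qsl(qs)        # list of (name, value) pairs

print(qs)   # q=python&page=2
print(qsl)  # [('q', 'python'), ('page', '2')]
```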
As Ignacio points out, you should not be doing this in the first place. But I'll explain where you're going wrong, and how to fix it:
toke1 is a list of two strings: the main URL before the ?, and the query string after the ?. You don't want to split that list, or everything in that list; you just want to split the query string. So:
mainurl, query = l.split("?")
queryvars = query.split("&")
What if you did want to split everything in the first list? There are two different things that could mean, which are of course done differently. But both require a loop (explicit, or inside a list comprehension) over the first list. Either this:
tokens = [toke2.split("&") for toke2 in l.split("?")]
or
tokens = [token for toke2 in l.split("?")
          for token in toke2.split("&")]
Try them both out to see the different outputs, and hopefully you'll understand what they're doing.
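For reference, here is what the two comprehensions produce on a made-up line. The names nested and flat are used only for this illustration.

```python
l = "http://example.com/page?a=1&b=2"  # made-up sample line

# One sub-list per '?'-separated part.
nested = [toke2.split("&") for toke2 in l.split("?")]

# The same pieces flattened into a single list.
flat = [token for toke2 in l.split("?")
        for token in toke2.split("&")]

print(nested)  # [['http://example.com/page'], ['a=1', 'b=2']]
print(flat)    # ['http://example.com/page', 'a=1', 'b=2']
```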
