How can I join different segments of a list? - python

I'm having trouble in a school project because I don't know how to join elements of a list in segments. Here's an example: Let's say I have the following list:
list = ["T","h","i","s","I","s","A","L","i","s","t",]
How could I join this list so that the program outputs the following?:
Output: ["This","Is","A","List"]

Assuming list is your input, and without giving you the answer outright since it's a school project you should do yourself, here are some hints.
You'll want to check if a character is uppercase to know when the start of a word is. With python, you can use isupper() (ex: 'C'.isupper() would return True).
Python strings are iterable.
You can add a character to the end of a string using += (ex: myWord += 'a')
You can add a string to a list using append (ex: myList.append(myWord))
Remember this is a learning experience and there's no real value to being given the answer outright, if that's what you were hoping for. Best of luck and welcome to StackOverflow.

You can use a regex for this:
import re
list = ["T","h","i","s","I","s","A","L","i","s","t",]
sep=[s for s in re.split("([A-Z][^A-Z]*)", ''.join(list)) if s]
print(sep)
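A slightly more direct variant, offered only as a sketch on the same input: re.findall returns just the matches, so there are no empty strings to filter out. The name letters is an illustrative replacement for list, which shadows the built-in.

```python
import re

letters = ["T", "h", "i", "s", "I", "s", "A", "L", "i", "s", "t"]
# Each match is one uppercase letter followed by any run of non-uppercase characters
words = re.findall(r"[A-Z][^A-Z]*", "".join(letters))
print(words)
```

Like the re.split version, this assumes every word starts with an ASCII uppercase letter.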

Related

How can I sort strings in a list based on common characters using Python?

I want to compare a list of strings and if a certain sequence of characters match, I want to put those matching strings into a new_list, like so:
string_list1 = ['CE.1.FXZ', 'CE.1.FXX', 'CE.1.FXY', 'CE.4.FXZ', 'CE.4.FXX', 'CE.4.FXY']
new_list = ['CE.1.FXZ', 'CE.1.FXX', 'CE.1.FXY']
As you can see, the common character in each is either 1 or 4.
My question is how can I separate strings based on a common character, if I do not know the common character beforehand? For example, I would like to parse the string_list1 into a function and have the function automatically identify the common characters and then separate based on that. Any help would be great! Thanks.
You can isolate the "common character" in your example with Python's built-in str.split() method (more info at https://docs.python.org/fr/2.7/library/stdtypes.html#str.split) like so:
for i in string_list1:
    common_character = i.split(".")[1]
The next step would be to create a list each time you see a new "common_character", or to add your element to an existing list using the list.append() method (one by one).
Best of luck!
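The step described above could be sketched with a plain dict, assuming the common character is always the second dot-separated token. The name groups is hypothetical and only used for illustration.

```python
string_list1 = ['CE.1.FXZ', 'CE.1.FXX', 'CE.1.FXY', 'CE.4.FXZ', 'CE.4.FXX', 'CE.4.FXY']

groups = {}  # hypothetical name: maps each "common character" to its list of strings
for i in string_list1:
    common_character = i.split(".")[1]
    # Create a new list the first time this character is seen
    if common_character not in groups:
        groups[common_character] = []
    groups[common_character].append(i)

print(groups["1"])
```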
If the common char is always the second token (when split on the .), you can use a defaultdict where each key is the common char and each value is the list of strings that share it.
from collections import defaultdict
string_list1 = ['CE.1.FXZ', 'CE.1.FXX', 'CE.1.FXY', 'CE.4.FXZ', 'CE.4.FXX', 'CE.4.FXY']
common_chars = defaultdict(list)
# "item" rather than "str", so the built-in str is not shadowed
for item in string_list1:
    common_chars[item.split('.')[1]].append(item)
for common_group in common_chars.values():
    print(common_group)
Outputs:
['CE.1.FXZ', 'CE.1.FXX', 'CE.1.FXY']
['CE.4.FXZ', 'CE.4.FXX', 'CE.4.FXY']

Change two characters into one symbol (Python)

I'm currently working on a file compression task for school, and I find myself unable to understand what's happening in this code (more specifically what ISN'T happening and why it is not happening).
So in this section of the code what I'm aiming to do is, in non-coding terms, change two adjacent letters which are the same into one symbol, therefore taking up less memory:
for i, word in enumerate(file_contents):
    #file_contents = LIST of words in any given text file
    word_contents = (file_contents[i]).split()
    for ind,letter in enumerate(word_contents[:-1]):
        if word_contents[ind] == word_contents[ind+1]:
            word_contents[ind] = ''
            word_contents[ind+1] = '★'
However, when I run the full code with a sample text file, it seemingly doesn't do what I told it to do. For instance, the word 'Sally' should be 'Sa★y' but instead stays the same.
Could anyone help me get on the right track?
EDIT: I missed out a pretty key detail. I want the compressed string to somehow appear back in the original file_contents list where there are double letters, as the purpose of the full compression algorithm is to return a compressed version of the text in an inputted file.
I would suggest using a regex that matches two identical adjacent characters.
Example:
import re
txt = 'sally and bobby'
print(re.sub(r"(.)\1", '*', txt))
# sa*y and bo*y
The inner loop and condition checking in your code are not required. Apply the substitution to each word (a string) and write the result back into the list instead:
file_contents[i] = re.sub(r"(.)\1", '*', word)
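Applied to the whole list, and assuming file_contents is a list of words as in the question, the substitution might look like the sketch below. The sample data is made up for illustration, and the star symbol from the question is used as the replacement.

```python
import re

file_contents = ["Sally", "and", "Bobby"]  # sample data, not from the original file
for i, word in enumerate(file_contents):
    # (.)\1 matches any character immediately followed by the same character
    file_contents[i] = re.sub(r"(.)\1", "★", word)

print(file_contents)
```

Writing back through the index i is what makes the compressed words appear in the original list.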
There are a few things wrong with your code (I think).
1) split produces a list, not a str, so enumerate(word_contents[:-1]) iterates over a list of words rather than over the characters of a word. It looks like you were assuming it gives you a string. Since each item of file_contents is a single word, split() returns a one-element list, the slice [:-1] is empty, and the loop body never runs.
2) With these lines:
if word_contents[ind] == word_contents[ind+1]:
    word_contents[ind] = ''
    word_contents[ind+1] = '★'
you are operating on that list again, when it looks pretty clear that you want to operate on the string, or on a list of the characters in the word you're processing. At best this code does nothing, and at worst it corrupts the word_contents list.
So when you perform your modifications, you are modifying the word_contents list and not the slice [:-1] you are actually looping over. There are more issues, but I hope that answers your question.
If you really want to understand what is going wrong, I recommend adding print statements along the way. If you're looking for someone to do your homework for you, another answer has already given you a solution.
Here is an example of how you could add logging to the function:
for i, word in enumerate(file_contents):
    #file_contents = LIST of words in any given text file
    word_contents = (file_contents[i]).split()
    # See what the word content list actually is
    print(word_contents)
    # See what your slice is actually returning
    print(word_contents[:-1])
    # Unless you have something modifying your list elsewhere you probably want to iterate over the words list generally and not just the slice of it as well.
    for ind,letter in enumerate(word_contents[:-1]):
        # See what your other test is testing
        print(word_contents[ind], word_contents[ind+1])
        # Here you probably actually want
        # word_contents[:-1][ind]
        # which is the list item you iterate over and then the actual string I suspect you get back
        if word_contents[ind] == word_contents[ind+1]:
            word_contents[ind] = ''
            word_contents[ind+1] = '★'
UPDATE: based on the follow up questions from the OP I've made a sample program annotated with descriptions. Note this isn't an optimal solution, but mainly an exercise in teaching flow control and using basic structures.
# define the initial data...
file = "sally was a quick brown fox and jumped over the lazy dog which we'll call billy"
file_contents = file.split()
# enumerate isn't needed in your example unless you intend to use the index later (example below)
for list_index, word in enumerate(file_contents):
    # Changing something you iterate over is dangerous and sometimes confusing; in your case you
    # iterated over word_contents and then modified it. If you collapse two characters into one,
    # you change the indexes and size of the structure, making later changes potentially invalid.
    # So we'll create a new data structure to dump the results in.
    compressed_word = []
    # since we have a list of strings we'll just iterate over each string (or word) individually
    for character in word:
        # check whether there is any data in the intermediate structure yet; if not, there can be no duplicate chars yet
        if compressed_word:
            # if there are chars in the new structure, test whether we hit the same character twice in a row
            if character == compressed_word[-1]:
                # looks like we did, replace it with your star
                compressed_word[-1] = "*"
                # continue skips the rest of this iteration of the loop
                continue
        # if the character differs from the previous one, or it is the first character, just add it to the list
        compressed_word.append(character)
    # I guess this is one reason why you may want enumerate, to update the list with the new item?
    # join() is just converting the list back to a string
    file_contents[list_index] = "".join(compressed_word)
# prints the new version of the original "file" string
print(" ".join(file_contents))
outputs: "sa*y was a quick brown fox and jumped over the lazy dog which we'* ca* bi*y"

python string replacement, all possible combinations #2

I have sentences like the following:
((wouldyou)) give me something ((please))
and a bunch of keywords, stored in arrays / lists:
keywords["wouldyou"] = ["can you", "would you", "please"]
keywords["please"] = ["please", "ASAP"]
I want to replace every occurrence of variables in parentheses with a suitable set of strings stored in an array and get every possible combination back. The amount of variables and keywords is undefined.
James helped me with the following code:
from itertools import product
def filler(word, from_char, to_char):
    options = [(c,) if c != from_char else (from_char, to_char) for c in word.split(" ")]
    return (' '.join(o) for o in product(*options))
list(filler('((?please)) tell me something ((?please))', '((?please))', ''))
It works great but only replaces one specific variable with empty strings. Now I want to go through various variables with different set of keywords. The desired result should look something like this:
can you give me something please
would you give me something please
please give me something please
can you give me something ASAP
would you give me something ASAP
please give me something ASAP
I guess it has something to do with to_char, but I have no idea how to loop through list items at this point.
The following would work. It uses itertools.product to construct all of the possible pairings (or more) of your keywords.
import re, itertools
text = "((wouldyou)) give me something ((please))"
keywords = {}
keywords["wouldyou"] = ["can you", "would you", "please"]
keywords["please"] = ["please", "ASAP"]
# Get a list of bracketed terms
lsources = re.findall(r"\(\((.*?)\)\)", text)
# Build a list of the possible substitutions
ldests = []
for source in lsources:
    ldests.append(keywords[source])
# Generate the various pairings
for lproduct in itertools.product(*ldests):
    output = text
    for src, dest in zip(lsources, lproduct):
        # Replace each term (you could optimise this using a single re.sub)
        output = output.replace("((%s))" % src, dest)
    print(output)
You could further improve it by avoiding the need to do multiple replace() and assignment calls with one re.sub() call.
This script gives the following output:
can you give me something please
can you give me something ASAP
would you give me something please
would you give me something ASAP
please give me something please
please give me something ASAP
It was tested using Python 2.7. You will need to think about how to handle the case where the same keyword appears more than once in the sentence. Hopefully you find this useful.
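One possible way to do the replacement with a single re.sub() call per combination is sketched below. The helper name expand and the constant PLACEHOLDER are hypothetical, and the sketch assumes every bracketed name has an entry in keywords. Because the choices are built per occurrence, a keyword that appears twice gets an independent choice at each position, which may or may not be what you want.

```python
import itertools
import re

PLACEHOLDER = re.compile(r"\(\((.*?)\)\)")  # matches ((name)) and captures name

def expand(text, keywords):
    """Yield every sentence obtained by filling each ((name)) placeholder."""
    names = PLACEHOLDER.findall(text)  # one entry per occurrence, in order
    choices = [keywords[name] for name in names]
    for combo in itertools.product(*choices):
        replacements = iter(combo)
        # The callback consumes one replacement per placeholder, left to right
        yield PLACEHOLDER.sub(lambda match: next(replacements), text)

keywords = {
    "wouldyou": ["can you", "would you", "please"],
    "please": ["please", "ASAP"],
}
results = list(expand("((wouldyou)) give me something ((please))", keywords))
for sentence in results:
    print(sentence)
```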
This is a job for Captain Regex!
Partial, pseudo-codey, solution...
One direct, albeit inefficient (like O(n*m) where n is number of words to replace and m is average number of replacements per word), way to do this would be to use the regex functionality in the re module to match the words, then use the re.sub() method to swap them out. Then you could just embed that in nested loops. So (assuming you get your replacements into a dict or something first), it would look something like this:
for key in repldict:
    # construct a pattern on the fly for key, i.e. one that matches ((key))
    regexpattern = r"\(\(" + re.escape(key) + r"\)\)"
    for item in repldict[key]:
        # sentence is the input string containing the ((key)) placeholders
        newstring = re.sub(regexpattern, item, sentence)
And so forth. Then just append newstring to a list, or print it, or whatever.
For creating the regex patterns on the fly, string concatenation should do it: a regex to match the left parens, plus the string to match, plus a regex to match the right parens.
If you do it that way, then you can handle the optional features just by looping over a second version of the regex pattern which appends a question mark to the end of the left parens, then does whatever you want to do with that.
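As a rough sketch of the pattern construction described above, a hypothetical helper placeholder_pattern could concatenate the pieces, with an optional flag for the ((?key)) form. The helper name and its optional parameter are assumptions made for illustration.

```python
import re

def placeholder_pattern(key, optional=False):
    """Build a compiled regex matching ((key)), or ((?key)) when optional is True."""
    left = r"\(\(" + (r"\?" if optional else "")
    # re.escape guards against keys that contain regex metacharacters
    return re.compile(left + re.escape(key) + r"\)\)")

required = placeholder_pattern("please").sub("ASAP", "give me something ((please))")
dropped = placeholder_pattern("please", optional=True).sub("", "((?please)) tell me")
print(required)
print(repr(dropped))
```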

in python find index in list if combination of strings exist

I'm writing my first script and trying to learn python.
But I'm stuck and can't get out of this one.
I'm writing a script to change file names.
Lets say I have a string = "this.is.tEst3.E00.erfeh.ervwer.vwtrt.rvwrv"
I want the result to be string = "This Is Test3 E00"
this is what I have so far:
l = list(string)
# Transform the string into a list
for i in l:
    if "E" in l:
        p = l.index("E")
        if isinstance((p+1), int) is True:
            if isinstance((p+2), int) is True:
                delp = p+3
                a = p-3
                del l[delp:]
new = "".join(l)
new = new.replace("."," ")
print (new)
The idea is to get the index of "E" and check whether the two characters after "E" are integers.
Then delete everything after the second integer.
However, this will not work if there is an "E" anywhere else.
At the moment the result I get is:
this is tEst
because it finds the index of the first "E" in the list and deletes everything after index+3.
I guess my question is: how do I get the index in the list where a combination of strings exists?
I can't seem to find out how.
Thanks for everyone's answers.
I was going in another direction, but it is also not working.
If someone could see why, it would be awesome. It is much better to learn by doing than by just copying what others write :)
This is what I came up with:
for i in l:
    if i=="E" and isinstance((i+1), int ) is True:
        p = l.index(i)
        print (p)
Can anyone tell me why this isn't working? I get an error.
Thank you so much
Have you ever heard of a Regular Expression?
Check out python's re module. Link to the Docs.
Basically, you can define a "regex" that would match "E and then two integers" and give you the index of it.
After that, I'd just use python's "Slice Notation" to choose the piece of the string that you want to keep.
Then, check out the string methods for str.replace to swap the periods for spaces, and str.title to put them in Title Case
An easy way is to use a regex that matches everything up to and including the first E followed by 2 digits, with s as your string:
import re
up_until = re.match(r'(.*?E\d{2})', s).group(1)
# this.is.tEst3.E00
Then, we replace the . with a space and then title case it:
output = up_until.replace('.', ' ').title()
# This Is Test3 E00
The technique to consider using is Regular Expressions. They allow you to search for a pattern of text in a string, rather than a specific character or substring. Regular Expressions have a bit of a tough learning curve, but are invaluable to learn and you can use them in many languages, not just in Python. Here is the Python resource for how Regular Expressions are implemented:
http://docs.python.org/2/library/re.html
The pattern you are looking to match in your case is an "E" followed by two digits. In Regular Expressions (usually shortened to "regex" or "regexp"), that pattern looks like this:
E\d\d # ('\d' is the specifier for any digit 0-9)
In Python, you create a string of the regex pattern you want to match, and pass that and your file name string into the search() function of the re module. Regex patterns tend to use a lot of special characters, so it's common in Python to prefix the regex pattern string with 'r', which tells the Python interpreter not to interpret backslashes as escape characters. All of this together looks like this:
import re
filename = 'this.is.tEst3.E00.erfeh.ervwer.vwtrt.rvwrv'
match_object = re.search(r'E\d\d', filename)
if match_object:
    # The '0' means group 0, i.e. the entire match; end(0) is the index just past it
    index_of_Exx = match_object.end(0)
    truncated_filename = filename[:index_of_Exx]
    # Now take care of any more processing
Regular expressions can get very detailed (and complex). In fact, you can probably accomplish your entire task of fully changing the file name using a single regex that's correctly put together. But since I don't know the full details about what sorts of weird file names might come into your program, I can't go any further than this. I will add one more piece of information: if the 'E' could possibly be lower-case, then you want to add a flag as a third argument to your pattern search which indicates case-insensitive matching. That flag is 're.I' and your search() method would look like this:
match_object = re.search(r'E\d\d', filename, re.I)
Read the documentation on Python's 're' module for more information, and you can find many great tutorials online, such as this one:
http://www.zytrax.com/tech/web/regex.htm
And before you know it you'll be a superhero. :-)
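Putting the pieces of this answer together, the remaining processing might look like the sketch below. The function name clean_filename is hypothetical, and returning the name unchanged when no "Exx" marker is found is an assumption, not something the question specifies.

```python
import re

def clean_filename(filename):
    """Truncate after the first 'E' + two digits, then swap dots for spaces and title-case."""
    match_object = re.search(r'E\d\d', filename)
    if match_object is None:
        # Assumption: leave names without an Exx marker untouched
        return filename
    truncated_filename = filename[:match_object.end()]
    return truncated_filename.replace('.', ' ').title()

result = clean_filename('this.is.tEst3.E00.erfeh.ervwer.vwtrt.rvwrv')
print(result)
```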
The reason why this isn't working:
for i in l:
    if i=="E" and isinstance((i+1), int ) is True:
        p = l.index(i)
        print (p)
...is that 'i' contains a character from the list 'l', not an integer. You compare it with 'E' (which works), but then try to add 1 to it, which raises a TypeError.
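If you want to stay with the list-based approach, one way to sketch it is to loop over indexes with enumerate and test the neighbouring characters with str.isdigit(). This is only an illustration of the idea, using the variable names from the question.

```python
string = "this.is.tEst3.E00.erfeh.ervwer.vwtrt.rvwrv"
l = list(string)

p = None  # index of the "E" that is followed by two digits, if any
# Stop two items early so that index + 2 is always a valid position
for index, char in enumerate(l[:-2]):
    if char == "E" and l[index + 1].isdigit() and l[index + 2].isdigit():
        p = index
        break

if p is not None:
    new = "".join(l[:p + 3]).replace(".", " ")
    print(new)
```

Here index is an integer position and char is the character at that position, so index + 1 is valid arithmetic.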

Python homework - Comparing Lists [duplicate]

This is actually homework for a mark.
The user of the program must type in a sentence. Then the program checks the words and prints the wrong ones (if a wrong word appears more than once, the program must print it only once). Wrong words must be printed in the order they appear in the sentence.
Here is how I did it. But there is one problem. The wrong words do not appear in the same order they appear in the sentence because of the built-in function sorted. Is there any other method to delete duplicates in a list?
And the dictionary is imported from dictionary.txt!!
sentence=input("Sentence:")
dictionary=open("dictionary.txt", encoding="latin2").read().lower().split()
import re
words=re.findall(r"\w+",sentence.lower())
words=sorted(set(words))
sez=[]
for i in words:
    if i not in dictionary:
        sez.append(i)
print(sez)
words = [item for index, item in enumerate(words) if words.index(item) == index]
It'll filter out every duplicate and will maintain the order.
As Thomas pointed out, this is a rather heavy approach. If you need to process a larger number of words, you could use this for loop:
dups = set()
filtered_list = []
for word in words:
    if word not in dups:
        filtered_list.append(word)
        dups.add(word)
To delete duplicates in a list, add the items to a dictionary as keys. A dictionary holds only one KEY:VALUE pair per key, so repeated items collapse into a single entry.
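A minimal sketch of that idea, assuming Python 3.7 or later, where a dict keeps its keys in insertion order. The sample words are made up for illustration.

```python
words = ["the", "cat", "the", "dog", "cat"]  # sample data
# dict.fromkeys keeps the first occurrence of each key, in order of appearance
unique_words = list(dict.fromkeys(words))
print(unique_words)
```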
You can use the OrderedSet recipe.
#edit: BTW if the dictionary is big then it's better to convert dictionary list into a set -- checking existence of an element in a set takes constant time instead of O(n) in the case of list.
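The set suggestion could be combined with order-preserving duplicate removal roughly as follows. The function name misspelled and the inline word list are hypothetical; in the question the dictionary words would come from dictionary.txt.

```python
import re

def misspelled(sentence, dictionary_words):
    """Return unknown words in order of first appearance, each only once."""
    known = set(dictionary_words)  # set lookups take constant time on average
    seen = set()
    wrong = []
    for word in re.findall(r"\w+", sentence.lower()):
        if word not in known and word not in seen:
            wrong.append(word)
            seen.add(word)
    return wrong

wrong_words = misspelled("The catt saw the dogg and the catt", ["the", "saw", "and"])
print(wrong_words)
```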
You should check this answer:
https://stackoverflow.com/a/7961425/1225541
If you use his method and stop sorting the words array (remove the words=sorted(set(words)) line) it should do what you expect.
