splitting list, extracting an element and adding it in python - python

I am new to Python.
I have a list whose items use "::" as a separator, and it looks like this:
1::Erin Burkovich (2000)::Drama
2::Assassins (1995)::Thriller
I want to split each item on "::", extract the year from the title, and add it to the end of the line. Each movie has its own index.
The desired list looks like this:
1::Erin Burkovich:Drama::2000
2::Assassins:Thriller:1995
I have below code:
for i in movies:
    movie_id,movie_title,movie_genre=i.split("::")
    movie_year=((movie_title.split(" "))[-1]).replace("(","").replace(")","")
    movies.insert(-1, movie_year)
but it doesn't work at all.
Any help ?
Thanks in advance.

You are inserting into movies while you iterate over it, so the loop also visits the items you just added; sooner or later it reaches an inserted year such as "2000", which cannot be unpacked into three values.
You should create a new list with the result.
Also, you can extract the year in a much easier way:
import re
movie_year = re.findall(r'\d+', '(2000)')  # ['2000'], a list with one string
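A minimal sketch of the "create a new list" advice, assuming every title ends with a four-digit year in parentheses. The helper name add_year is hypothetical, and joining all four fields with "::" is an assumption about the desired format, since the question's example mixes ":" and "::".

```python
import re

movies = ["1::Erin Burkovich (2000)::Drama", "2::Assassins (1995)::Thriller"]

# Matches a trailing "(YYYY)" and captures the title before it and the year.
TITLE_YEAR = re.compile(r"^(.*?)\s*\((\d{4})\)\s*$")

def add_year(line):
    """Move the year from the title to the end of the line (hypothetical helper)."""
    movie_id, title, genre = line.split("::")
    match = TITLE_YEAR.match(title)
    if match is None:
        # No year found: return the line unchanged rather than guess.
        return line
    name, year = match.groups()
    return "::".join([movie_id, name, genre, year])

# Build a new list instead of inserting into the list being iterated.
result = [add_year(line) for line in movies]
print("\n".join(result))
```

Because the original list is never modified inside the loop, each item is visited exactly once.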

Instead of splitting, you can use re.findall to grab all alphanumeric characters, including whitespace, and then regroup:
import re
s = ['1::Erin Burkovich (2000)::Drama', '2::Assassins (1995)::Thriller']
new_data = [re.sub(r'\s(?=:)', '', "{}::{}:{}:{}".format(movie_id, name, genre, year)) for movie_id, name, year, genre in [re.findall(r'[a-zA-Z0-9\s]+', i) for i in s]]
Output:
['1::Erin Burkovich:Drama:2000', '2::Assassins:Thriller:1995']

Another (probably less elegant) way:
for i in movies:
    split_list = i.split("::")
    movie_id = split_list[0]
    # split the title at "(" so that the year ends up in the second part
    movie_title = split_list[1].split('(')
    movie_genre = split_list[2]
    print(movie_id + '::' + movie_title[0].strip() + "::" + movie_genre + "::" + movie_title[1].strip(')'))

For Python 3.6, try this:
a="""1::Erin Burkovich (2000)::Drama
2::Assassins (1995)::Thriller"""
a=a.split("\n")
c=[]
for b in range(len(a)):
    g=[]
    d=a[b].split("::")
    e=d[1].split(" (")[1].split(")")[0]
    f=d[1].split(" (")[0]
    g.append(d[0])
    g.append(f)
    g.append(d[2])
    g.append(e)
    h="::".join(g)
    c.append(h)
print("\n".join(c))
Output:
1::Erin Burkovich::Drama::2000
2::Assassins::Thriller::1995

There are several issues:
split returns a list, not a tuple; unpacking it into three names only works when the item has exactly three fields
the movie year split is fine, but you aren't removing the year from the original title
inserting into the movies list is not a good idea; you need to replace the list element instead
I've rewritten the code based on what you wanted; I hope it helps:
movies=["1::Erin Burkovich (2000)::Drama", "2::Assassins (1995)::Thriller"]
for i in range(len(movies)):
    movie_details=movies[i].split("::")
    print(movie_details)
    movie_id=movie_details[0]
    movie_title=movie_details[1]
    movie_genre=movie_details[2]
    movie_title_parts=movie_title.split(" ")
    movie_year=((movie_title_parts[-1]).replace("(","").replace(")",""))
    del movie_title_parts[-1]
    movie_title=" ".join(movie_title_parts)
    print(movie_title+", "+movie_year)
    movies[i]=movie_id+"::"+movie_title+"::"+movie_genre+"::"+movie_year

Related

Excel cell into list in Python

So I have an Excel column which contains Python lists.
The problem is that when I loop through it in Python, it reads the cells as str. When I try to split a cell, the items of the resulting list come out like this:
list = ["['Gdynia',", "'(2262011)']"]
list[0] = "['Gdynia',"
list[1] = "'(2262011)']"
I only want the city name, e.g. 'Gdynia' or 'Tczew'. Any idea how I can do that?
You can split the string at a suitable character; ' would work well for your example.
Then you get a list of strings and can choose the part you need.
s = "['Gdynia', '(2262011)']"
s_parts = s.split("'")  # ['[', 'Gdynia', ', ', '(2262011)', ']']
city = s_parts[1]  # 'Gdynia'
Solution with re:
import re
data = ["['Gdynia', '(2262011)']",
        "['Tczew', '(2214011)']",
        "['Zory', '(2479011)']"]
r = re.compile("'(.*?)'")
print(*[r.search(s).group(1) for s in data], sep='\n')
Output
Gdynia
Tczew
Zory
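Another option, sketched here under the assumption that every cell holds a valid Python list literal: the standard library's ast.literal_eval turns such a string back into a real list, so the city is simply its first element. The cells list below is made-up sample data modelled on the question.

```python
import ast

# Sample cell values as they would arrive from the spreadsheet (assumed format).
cells = ["['Gdynia', '(2262011)']", "['Tczew', '(2214011)']"]

# literal_eval safely evaluates a string that contains a Python literal.
cities = [ast.literal_eval(cell)[0] for cell in cells]
print(cities)
```

Unlike eval, literal_eval only accepts literals, so it raises an error on a malformed cell instead of executing it.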

How can I Split a list in python to get a new list with elements to the left of the delimiter instead of the right

I want to split this python list (originalList):
['"car_type":"STANDARD","price":725842',
'"car_type":"LUXURY","price":565853',
'"car_type":"PEOPLE_CARRIER","price":239081',
'"car_type":"LUXURY_PEOPLE_CARRIER","price":661624',
'"car_type":"MINIBUS","price":654172']
to give me this list (pricesList):
[725842, 565853, 239081, 661624, 654172]
I tried this line of code below to split the list named originalList:
pricesList = [i.split("price:")[0] for i in originalList]
The outcome is a list with the same number of elements, but each element contains the car_type only; in short, the split has removed everything to the left of the delimiter. How can I change or replace my code so that each element of the new list is just the price, as in pricesList above?
You forgot the double quotes " that are part of your delimiter, then picked the wrong index (0), which is the part before the split, and finally you did not cast to int. You can do the following to get the desired output:
>>> [int(i.split('"price":')[-1]) for i in originalList]
[725842, 565853, 239081, 661624, 654172]
schwobaseggl's answer is good; here is a possible alternative using the json library (I guess the original list comes from JSON processing):
import json
list(map(lambda x:json.loads('{'+x+'}')['price'],originalList))
You can try:
import json
n = ['"car_type":"STANDARD","price":725842',
'"car_type":"LUXURY","price":565853',
'"car_type":"PEOPLE_CARRIER","price":239081',
'"car_type":"LUXURY_PEOPLE_CARRIER","price":661624',
'"car_type":"MINIBUS","price":654172']
print([json.loads("{"+str(i)+"}")["price"] for i in n])
Another way of doing it:
pricesList = [int(originalList[i].split(",")[1].split(":")[1]) for i in range(0,len(originalList))]
Solution
If you change to .split(':') you can just take the [-1] item, that will represent the numbers at the end
lista = [
    '"car_type":"STANDARD","price":725842',
    '"car_type":"LUXURY","price":565853',
    '"car_type":"PEOPLE_CARRIER","price":239081',
    '"car_type":"LUXURY_PEOPLE_CARRIER","price":661624',
    '"car_type":"MINIBUS","price":654172'
]
new_lista = []
for i in range(len(lista)):
    lista[i] = lista[i].split(':')
    new_lista.append(lista[i][-1])
print(new_lista)
Output
(xenial)vash@localhost:~/python$ python3.7 split.py
['725842', '565853', '239081', '661624', '654172']
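The output above contains strings, and the loop overwrites lista as it goes. A small variation, assuming the price is always the text after the last colon, leaves the input untouched and converts to int:

```python
originalList = [
    '"car_type":"STANDARD","price":725842',
    '"car_type":"LUXURY","price":565853',
]

# rsplit with maxsplit=1 splits only at the last colon in each item.
pricesList = [int(item.rsplit(":", 1)[1]) for item in originalList]
print(pricesList)
```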

Python: Split between two characters

Let's say I have a ton of HTML with no newlines. I want to get each element into a list.
input = "<head><title>Example Title</title></head>"
a_list = ["<head>", "<title>Example Title</title>", "</head>"]
Something like such. Splitting between each ><.
But in Python, I don't know of a way to do that. I can only split at that string, which removes it from the output. I want to keep it and split between the two angle brackets.
How can this be done?
Edit: Preferably, this would be done without adding the characters back in to the ends of each list item.
# initial input
a = "<head><title>Example Title</title></head>"
# split list
b = a.split('><')
# remove extra character from first and last elements
# because the split only removes >< pairs.
b[0] = b[0][1:]
b[-1] = b[-1][:-1]
# initialize new list
a_list = []
# fill new list with formatted elements
for i in range(len(b)):
    a_list.append('<{}>'.format(b[i]))
This will output the given list in python 2.7.2, but it should work in python 3 as well.
You can try this:
import re
a = "<head><title>Example Title</title></head>"
data = re.split("><", a)
new_data = [data[0]+">"]+["<" + i+">" for i in data[1:-1]] + ["<"+data[-1]]
Output:
['<head>', '<title>Example Title</title>', '</head>']
The shortest approach, using re.findall() on an extended example:
import re
# extended html string
s = "<head><title>Example Title</title></head><body>hello, <b>Python</b></body>"
# note: bare text that is not wrapped in its own tag pair ("hello, ") is skipped
result = re.findall(r'(<[^>]+>[^<>]+</[^>]+>|<[^>]+>)', s)
print(result)
The output:
['<head>', '<title>Example Title</title>', '</head>', '<body>', '<b>Python</b>', '</body>']
Based on the answers by other people, I made this.
It isn't as clean as I had wanted, but it seems to work. I had originally wanted to not re-add the characters after split.
Here, I got rid of one extra argument by combining the two characters into a string. Anyways,
def split_between(string, chars):
    # compare with !=, since "is not" tests identity rather than equality
    if len(chars) != 2:
        raise IndexError("Argument chars must contain two characters.")
    result_list = [chars[1] + line + chars[0] for line in string.split(chars)]
    result_list[0] = result_list[0][1:]
    result_list[-1] = result_list[-1][:-1]
    return result_list
Credit goes to @cforeman and @Ajax1234.
Or even simpler, this:
input = "<head><title>Example Title</title></head>"
print(['<'+elem if elem[0]!='<' else elem for elem in [elem+'>' if elem[-1]!='>' else elem for elem in input.split('><') ]])
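For the wish in the question to avoid adding the characters back, one possible sketch is to split on a zero-width position instead of on the "><" string. This assumes Python 3.7 or later, where re.split accepts patterns that match an empty string; the variable names are illustrative.

```python
import re

html = "<head><title>Example Title</title></head>"

# Zero-width split point: preceded by ">" and followed by "<".
# Nothing is consumed, so no characters have to be re-added afterwards.
parts = re.split(r"(?<=>)(?=<)", html)
print(parts)
```

The lookbehind and lookahead only test the neighbouring characters, which is why both brackets stay in the output.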

preprocessing (rstrip and regular expression and simpler code)

I'm trying to read 200 txt files and do some preprocessing.
1) How can I write simpler code instead of repeating the same code for each txt file?
2) Can I combine a regular expression with rstrip?
-> Mainly, I want to get rid of "\n", but sometimes it is stuck to other characters. So what I want is to remove every \n as well as the words that are combined with \n (e.g. "\n?", "!\n", and so on).
3) In the last line, is there a simpler way to combine all the lists into one list?
data = open("job (0).txt", 'r').read()
rows0 = data.split(" ")
rows0 = [item.rstrip('\n?, \n') for item in rows0]
data = open("job (1).txt", 'r').read()
rows1 = data.split(" ")
rows1 = [item.rstrip('\n?, \n') for item in rows1]
.....(up to 200th file)
data = open("job (199).txt", 'r').read()
rows199 = data.split(" ")
rows199 = [item.rstrip('\n?, \n') for item in rows199]
ds_l = rows0 + rows1 + ... rows199
First of all, I'm not a Python expert. But since the question has been around for a while already... (At least I'm safe from downvotes if no one looks at this ^^)
1) Use loops, and read a programming tutorial.
See for example this post How do I read a file line-by-line into a list? on how to get a list of all rows. Then you can loop over the list.
2) No idea whether it's possible to use regexes with strip, this brought me here, so tell me if you find out.
It's not clear what exactly you are asking for: do you want to get rid of all (space-separated) words that contain any "\n", or just cut the "\n", "\n?", ... parts out of the words?
In the first case, a simple, inelegant solution would be to loop over the rows and over all words in each row, and do something like
# loop over rows with i as index
for i in range(len(rows)):
    row = rows[i].split(" ")
    # keep only the words that do not contain a newline
    row = [word for word in row if "\n" not in word]
    rows[i] = " ".join(row)
In the latter case, if there's not so many expressions you want to remove, you can probably use re.sub() somehow. Google helps ;)
3) If you have the rows as a list "rows" of strings, you can use join:
ds_1 = "".join(rows)
(For join: Python join: why is it string.join(list) instead of list.join(string)?)
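The three points above can be combined into one rough sketch. It assumes the second reading of the question (cut out each newline together with punctuation touching it) and that a space should take its place; clean_words, load_all and the punctuation set are hypothetical choices, not something the question specifies.

```python
import re

# Assumption: a newline plus any of these punctuation marks touching it is dropped.
NEWLINE_JUNK = re.compile(r"[?!,.]*\n[?!,.]*")

def clean_words(text):
    """Replace each newline (and adjacent punctuation) with a space, then split."""
    return NEWLINE_JUNK.sub(" ", text).split()

def load_all(paths):
    """Read every file and collect all cleaned words in one flat list."""
    words = []
    for path in paths:
        with open(path, "r") as handle:
            words.extend(clean_words(handle.read()))
    return words

# File names follow the question's pattern "job (0).txt" ... "job (199).txt".
paths = ["job ({}).txt".format(n) for n in range(200)]
# ds_l = load_all(paths)  # uncomment when the files exist

print(clean_words("good job!\nreally?\nyes"))
```

Using list.extend inside the loop builds the single combined list directly, so no rows0 + rows1 + ... expression is needed.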

python - matching string and replacing

I have a file in which I am trying to replace parts of a line with another word.
A line looks like bobkeiser:bob123#bobscarshop.com:0.0.0.0.0:23rh32o3hro2rh2:234212
I need to delete everything but bob123#bobscarshop.com, but I also need to match 23rh32o3hro2rh2 against 23rh32o3hro2rh2:poniacvibe from a different text file and place poniacvibe in front of bob123#bobscarshop.com
so it would look like this: bob123#bobscarshop.com:poniacvibe
I've had a hard time working out how to do this. I think I would have to split bobkeiser:bob123#bobscarshop.com:0.0.0.0.0:23rh32o3hro2rh2:234212 with data.split(":"), but some of the lines have a colon (:) in a spot where I don't want the line to be split, if that makes any sense...
If anyone could help, I would really appreciate it.
ok, it looks to me like you are using a colon : to separate your strings.
in this case you can use .split(":") to break your strings into their component substrings
eg:
firststring = "bobkeiser:bob123#bobscarshop.com:0.0.0.0.0:23rh32o3hro2rh2:234212"
print(firststring.split(":"))
would give:
['bobkeiser', 'bob123#bobscarshop.com', '0.0.0.0.0', '23rh32o3hro2rh2', '234212']
and assuming your substrings will always be in the same order, and the same number of substrings in the main string you could then do:
firststring = "bobkeiser:bob123#bobscarshop.com:0.0.0.0.0:23rh32o3hro2rh2:234212"
firstdata = firststring.split(":")
secondstring = "23rh32o3hro2rh2:poniacvibe"
seconddata = secondstring.split(":")
if firstdata[3] == seconddata[0]:
    outputdata = firstdata
    outputdata.insert(1,seconddata[1])
    outputstring = ""
    for item in outputdata:
        if outputstring == "":
            outputstring = item
        else:
            outputstring = outputstring + ":" + item
what this does is:
extract the bits of the strings into lists
see if the "23rh32o3hro2rh2" string can be found in the second list
find the corresponding part of the second list
create a list to contain the output data and put the first list into it
insert the "poniacvibe" string before "bob123#bobscarshop.com"
stitch the outputdata list back into a string using the colon as the separator
the reason your strings need to be the same length is because the index is being used to find the relevant strings rather than trying to use some form of string type matching (which gets much more complex)
if you can keep your data in this form it gets much simpler.
to protect against malformed data (lists that are too short) you can test for them explicitly before you start, using len(list) to see how many elements each one has.
or you could let it run and catch the exception, however in this case you could end up with unintended results, as it may try to match the wrong elements from the list.
hope this helps
James
EDIT:
ok so if you are trying to match up a long list of strings from files you would probably want something along the lines of:
firstfile = open("firstfile.txt", mode = "r")
secondfile= open("secondfile.txt",mode = "r")
first_raw_data = firstfile.readlines()
firstfile.close()
second_raw_data = secondfile.readlines()
secondfile.close()
first_data = []
for item in first_raw_data:
    first_data.append(item.replace("\n","").split(":"))
second_data = []
for item in second_raw_data:
    second_data.append(item.replace("\n","").split(":"))
output_strings = []
for item in first_data:
    searchstring = item[3]
    for entry in second_data:
        if searchstring == entry[0]:
            output_data = item
            output_string = ""
            output_data.insert(1,entry[1])
            for data in output_data:
                if output_string == "":
                    output_string = data
                else:
                    output_string = output_string + ":" + data
            output_strings.append(output_string)
            break
for entry in output_strings:
    print(entry)
this should achieve what you're after and, as a proof of concept, will print the resulting list of strings for you.
if you have any questions feel free to ask.
James
Second edit:
to make this output the results into a file change the last two lines to:
outputfile = open("outputfile.txt", mode = "w")
for entry in output_strings:
    outputfile.write(entry+"\n")
outputfile.close()
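A possible variation on the same idea, sketched under two assumptions: every well-formed line in the first file has exactly five colon-separated fields, and the wanted output is the short form from the question (second field, then the matched word). The function name match_lines is hypothetical. A dictionary replaces the nested loop, and the length check implements the "test for malformed data" suggestion.

```python
def match_lines(first_lines, second_lines):
    """Pair each record with the word whose key matches its fourth field."""
    # Map key -> word; partition splits at the first colon only.
    lookup = {}
    for line in second_lines:
        key, sep, word = line.strip().partition(":")
        if sep:
            lookup[key] = word

    results = []
    for line in first_lines:
        fields = line.strip().split(":")
        # Skip malformed lines instead of matching the wrong element.
        if len(fields) != 5:
            continue
        address, key = fields[1], fields[3]
        if key in lookup:
            results.append(address + ":" + lookup[key])
    return results

first = ["bobkeiser:bob123#bobscarshop.com:0.0.0.0.0:23rh32o3hro2rh2:234212\n"]
second = ["23rh32o3hro2rh2:poniacvibe\n"]
print(match_lines(first, second))
```

The function takes any iterable of lines, so open file objects can be passed in directly in place of the sample lists.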
