Excel cell into list in Python - python

So I have an Excel column which contains Python lists.
The problem is that when I loop through it in Python, each cell is read as a str. Splitting the string produces items like these:
list = ["['Gdynia',", "'(2262011)']"]
list[0] = "['Gdynia',"
list[1] = "'(2262011)']"
I only want the city name, e.g. 'Gdynia' or 'Tczew'. Any idea how I can do that?

You can split the string at a chosen character; ' works well for your example.
Then you get a list of strings and can choose the part you need.
s = "['Gdynia', '(2262011)']"
s_parts = s.split("'")  # ['[', 'Gdynia', ', ', '(2262011)', ']']
city = s_parts[1]  # 'Gdynia'

Solution with re:
import re
data = ["['Gdynia', '(2262011)']",
"['Tczew', '(2214011)']",
"['Zory', '(2479011)']"]
r = re.compile("'(.*?)'")
print(*[r.search(s).group(1) for s in data], sep='\n')
Output
Gdynia
Tczew
Zory
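Since each cell holds the text of a Python list literal, another option is to parse it back into a real list instead of splitting the string. Here is a minimal sketch using the standard library's ast.literal_eval, assuming every cell is a well-formed list literal like the ones in the question (the names cells, parsed and cities are made up for illustration):

```python
import ast

# Hypothetical cell values, as they would be read from the Excel column (as str).
cells = ["['Gdynia', '(2262011)']",
         "['Tczew', '(2214011)']"]

# literal_eval safely evaluates a string containing a Python literal,
# so each cell becomes a real list again.
parsed = [ast.literal_eval(cell) for cell in cells]

# The city name is the first element of each parsed list.
cities = [row[0] for row in parsed]
print(cities)
```

literal_eval raises an exception (for example ValueError or SyntaxError) on malformed input, so cells that are not valid literals would need separate handling.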

Related

How can I Split a list in python to get a new list with elements to the left of the delimiter instead of the right

I want to split this python list (originalList):
['"car_type":"STANDARD","price":725842',
'"car_type":"LUXURY","price":565853',
'"car_type":"PEOPLE_CARRIER","price":239081',
'"car_type":"LUXURY_PEOPLE_CARRIER","price":661624',
'"car_type":"MINIBUS","price":654172']
to give me this list (pricesList):
[725842, 565853, 239081, 661624, 654172]
I tried this line of code below to split the list named originalList:
pricesList = [i.split("price:")[0] for i in originalList]
The outcome is a list with the same number of elements, but each element contains the car_type only. In short, the splitting has removed everything to the left of the delimiter. How can I change or replace my code above so that the new list contains the values to the left of the delimiter, with everything to the right removed?
You forgot the double quotes " that are part of your delimiter, then picked the wrong index (0), which is the part before the split, and finally you did not cast to int. You can do the following to get the desired output:
>>> [int(i.split('"price":')[-1]) for i in originalList]
[725842, 565853, 239081, 661624, 654172]
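To see why the delimiter and the index both matter, it can help to look at what split returns before picking an element. Here is a small sketch using one item from the question (the variable names are only for illustration):

```python
item = '"car_type":"STANDARD","price":725842'

# Without the double quotes the delimiter never occurs in the string,
# so split returns the whole string as a single element.
no_match = item.split("price:")
print(no_match)

# With the full delimiter the string is cut in two:
# index 0 is the text before the delimiter, index -1 the text after it.
parts = item.split('"price":')
print(parts)

# The text after the delimiter is the number, still as a str, so cast it.
price = int(parts[-1])
print(price)
```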
schwobaseggl's answer is good. Here is a possible alternative using the json library (I guess the original list comes from JSON processing):
import json
list(map(lambda x:json.loads('{'+x+'}')['price'],originalList))
You can try:
import json
n = ['"car_type":"STANDARD","price":725842',
'"car_type":"LUXURY","price":565853',
'"car_type":"PEOPLE_CARRIER","price":239081',
'"car_type":"LUXURY_PEOPLE_CARRIER","price":661624',
'"car_type":"MINIBUS","price":654172']
print([json.loads("{" + i + "}")["price"] for i in n])
Another way of doing it:
pricesList = [int(item.split(",")[1].split(":")[1]) for item in originalList]
Solution
If you change to .split(':'), you can just take the [-1] item, which holds the number at the end of each string.
lista = [
'"car_type":"STANDARD","price":725842',
'"car_type":"LUXURY","price":565853',
'"car_type":"PEOPLE_CARRIER","price":239081',
'"car_type":"LUXURY_PEOPLE_CARRIER","price":661624',
'"car_type":"MINIBUS","price":654172'
]
new_lista = []
for i in range(len(lista)):
    lista[i] = lista[i].split(':')
    new_lista.append(lista[i][-1])
print(new_lista)
Output
(xenial)vash@localhost:~/python$ python3.7 split.py
['725842', '565853', '239081', '661624', '654172']

Strip method at Python doesn't clear all

I have a problem stripping '[' from my string (read from a file).
code
data = open(Koorpath1,'r')
for x in data:
    print(x)
    print(x.strip('['))
result
[["0.9986130595207214","26.41608428955078"],["39.44521713256836","250.2412109375"],["112.84327697753906","120.34269714355469"],["260.63800048828125","15.424667358398438"],["273.6199645996094","249.74160766601562"]]
"0.9986130595207214","26.41608428955078"],["39.44521713256836","250.2412109375"],["112.84327697753906","120.34269714355469"],["260.63800048828125","15.424667358398438"],["273.6199645996094","249.74160766601562"]]
Desired output :
"0.9986130595207214","26.41608428955078","39.44521713256836","250.2412109375","112.84327697753906","120.34269714355469","260.63800048828125","15.424667358398438","273.6199645996094","249.74160766601562"
Thanks
It strips the first two '[' characters. It seems you have one long string, so you have to split it first.
datalist = data.split(',')
for x in datalist:
    pass  # code here
If you don't want to split it and want to keep it all in one string, you need replace, not strip (strip only works at the beginning and the end of the string).
data = data.replace('[','')
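The difference between the two methods can be seen on a shortened version of the data. This is only an illustrative sketch: the sample string is abbreviated from the question, and chaining a second replace for ']' is an addition not in the answer above, included to reach the desired output.

```python
line = '[["0.99","26.41"],["39.44","250.24"]]'

# strip only removes the given characters from both ends of the string.
stripped = line.strip('[')
print(stripped)

# replace removes every occurrence, wherever it appears.
no_open = line.replace('[', '')
print(no_open)

# Removing both kinds of bracket leaves just the quoted numbers.
flat = line.replace('[', '').replace(']', '')
print(flat)
```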
If the data is JSON, then parse it into a Python list and treat it from there:
from itertools import chain
import json
nums = json.loads(x)
print(','.join('"%s"' % num for num in chain.from_iterable(nums)))
chain.from_iterable "flattens" the list of lists, and join concatenates everything into one long output string.

How do you pick out certain values from a list based on their string values in Python?

I have a list of hyperlinks of three types: htm, csv and pdf. I would like to pick out only those that are csv.
The list contains strings of the form: csv/damlbmp/20160701damlbmp_zone_csv.zip
I was thinking of running a for loop across the list and returning only the values whose first 3 characters are equal to csv, but I am not really sure how to do this.
I would use link.endswith('csv') (or link.endswith('csv.zip')), where link is a string containing the link.
For example:
lst = ['csv/damlbmp/20160701damlbmp_zone_csv.zip',
'pdf/damlbmp/20160701damlbmp_zone_pdf.zip',
'html/damlbmp/20160701damlbmp_zone_html.zip',
'csv/damlbmp/20160801damlbmp_zone_csv.zip']
csv_files = [link for link in lst if link.endswith('csv.zip')]
If your list is called links:
[x for x in links if 'csv/' in x]
You can try this
import re
l = ["www.h.com", "abc.csv", "test.pdf", "another.csv"]  # list of links
def MatchCSV(links):
    matches = []
    for string in links:
        # collect every substring of the form "<name>.csv"
        m = re.findall(r'[^\.]*\.csv', string)
        if len(m) > 0:
            matches.append(m)
    return matches
print(MatchCSV(l))
# [['abc.csv'], ['another.csv']]
(endswith is a good option too)
This is one way:
lst = ['csv/damlbmp/20160701damlbmp_zone_csv.zip',
'pdf/damlbmp/20160701damlbmp_zone_pdf.zip',
'html/damlbmp/20160701damlbmp_zone_html.zip',
'csv/damlbmp/20160801damlbmp_zone_csv.zip']
[i for i in lst if i[:3]=='csv']
# ['csv/damlbmp/20160701damlbmp_zone_csv.zip',
# 'csv/damlbmp/20160801damlbmp_zone_csv.zip']
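The idea from the question (checking the first three characters) can also be written with str.startswith, which avoids hard-coding the slice length. Here is a minimal sketch, assuming the links look like the ones above; the names links and csv_links are only illustrative:

```python
links = ['csv/damlbmp/20160701damlbmp_zone_csv.zip',
         'pdf/damlbmp/20160701damlbmp_zone_pdf.zip',
         'html/damlbmp/20160701damlbmp_zone_html.zip',
         'csv/damlbmp/20160801damlbmp_zone_csv.zip']

# Keep only the links whose path begins with the "csv/" directory.
csv_links = [link for link in links if link.startswith('csv/')]
print(csv_links)
```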

Python: Split between two characters

Let's say I have a ton of HTML with no newlines. I want to get each element into a list.
input = "<head><title>Example Title</title></head>"
a_list = ["<head>", "<title>Example Title</title>", "</head>"]
Something like such. Splitting between each ><.
But in Python, I don't know of a way to do that. I can only split at that string, which removes it from the output. I want to keep it, and split between the two angle brackets.
How can this be done?
Edit: Preferably, this would be done without adding the characters back in to the ends of each list item.
# initial input
a = "<head><title>Example Title</title></head>"
# split list
b = a.split('><')
# remove extra character from first and last elements
# because the split only removes >< pairs.
b[0] = b[0][1:]
b[-1] = b[-1][:-1]
# initialize new list
a_list = []
# fill new list with formatted elements
for i in range(len(b)):
    a_list.append('<{}>'.format(b[i]))
This outputs the given list in Python 2.7.2, and it should work in Python 3 as well.
You can try this:
import re
a = "<head><title>Example Title</title></head>"
data = re.split("><", a)
new_data = [data[0]+">"]+["<" + i+">" for i in data[1:-1]] + ["<"+data[-1]]
Output:
['<head>', '<title>Example Title</title>', '</head>']
The shortest approach uses the re.findall() function, shown here on an extended example:
import re
# extended html string
s = "<head><title>Example Title</title></head><body>hello, <b>Python</b></body>"
result = re.findall(r'(<[^>]+>[^<>]+</[^>]+>|<[^>]+>)', s)
print(result)
The output:
['<head>', '<title>Example Title</title>', '</head>', '<body>', '<b>Python</b>', '</body>']
Based on the answers by other people, I made this.
It isn't as clean as I had wanted, but it seems to work. I had originally wanted to not re-add the characters after split.
Here, I got rid of one extra argument by combining the two characters into a string. Anyways,
def split_between(string, chars):
    if len(chars) != 2:
        raise IndexError("Argument chars must contain two characters.")
    result_list = [chars[1] + line + chars[0] for line in string.split(chars)]
    result_list[0] = result_list[0][1:]
    result_list[-1] = result_list[-1][:-1]
    return result_list
Credit goes to @cforeman and @Ajax1234.
Or even simpler, this:
input = "<head><title>Example Title</title></head>"
print(['<'+elem if elem[0]!='<' else elem for elem in [elem+'>' if elem[-1]!='>' else elem for elem in input.split('><') ]])
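If re-adding the characters is what you want to avoid, a zero-width regular expression is one more option. This is a sketch, not taken from the answers above: the pattern uses a lookbehind and a lookahead so the split point sits between > and < without consuming either character. It assumes Python 3.7 or later, where re.split accepts patterns that can match an empty string.

```python
import re

html = "<head><title>Example Title</title></head>"

# (?<=>) requires a '>' just before the split point,
# (?=<)  requires a '<' just after it; neither is consumed.
parts = re.split(r'(?<=>)(?=<)', html)
print(parts)
```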

Slicing a string into a list based on reoccuring patterns

I have a long string variable full of hex values:
hexValues = 'AA08E3020202AA08E302AA1AA08E3020101' etc..
The first 2 bytes (AA08) are a signature for the start of a frame, and the rest of the data up to the next AA08 is the content of the frame.
I want to slice the string into a list based on the recurring start-of-frame signature, e.g.:
list = [AA08, E3020202, AA08, F25S1212, AA08, 42ABC82] etc...
I'm not sure how I can split the string up like this. Some of the frames are also corrupted, where the start of the frame won't have AA08, but maybe AA01, so I'd need some kind of regex to spot these.
If I do list = hexValues.split('AA08'), the list just loses all the starts of the frames...
So I'm a bit stuck.
Newbie to python.
Thanks
For the case when you don't have "corrupted" data the following should do:
hex_values = 'AA08E3020202AA08E302AA1AA08E3020101'
delimiter = hex_values[:4]
hex_values = hex_values.replace(delimiter, ',' + delimiter + ',')
hex_list = hex_values.split(',')[1:]
print(hex_list)
['AA08', 'E3020202', 'AA08', 'E302AA1', 'AA08', 'E3020101']
Ignoring corrupted frames, you could try this:
l = []
for s in hexValues.split('AA08'):
    if s:
        l += ['AA08', s]
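Another way to keep the signature in the result is re.split with a capturing group, since captured delimiters are included in the returned list. The sketch below rests on assumptions: split_frames and its signature_pattern parameter are made-up names, and the pattern AA0[0-9A-F] for corrupted signatures is only a guess at what a damaged header might look like. A pattern can also match inside frame data by accident, so treat this as a starting point.

```python
import re

def split_frames(hex_string, signature_pattern="AA08"):
    """Split hex_string into signatures and frame contents.

    signature_pattern is a regular expression; wrapping it in a
    capturing group makes re.split keep the matched signatures.
    """
    parts = re.split("(" + signature_pattern + ")", hex_string)
    # re.split yields an empty string before a leading signature; drop it.
    return [part for part in parts if part]

hex_values = 'AA08E3020202AA08E302AA1AA08E3020101'
print(split_frames(hex_values))

# Hypothetical corrupted input: the second frame starts with AA01.
print(split_frames('AA08E302AA01FFFF', signature_pattern="AA0[0-9A-F]"))
```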
