I have the following list of strings:
data = ['1 General Electric (GE) 24581660 $18.19 0.04 0.22 ',
'2 Qudian ADR (QD) 24227349 12.22 -3.93 -24.33 ',
'3 Square Cl A (SQ) 16233308 48.86 0.05 0.10 ',
'4 Teva Pharmaceutical Industries ADR (TEVA) 15830425 13.70 0.22 1.63 ',
'5 Vale ADR (VALE) 14768221 10.98 0.21 1.95 ',
'6 Bank of America (BAC) 13938799 26.59 -0.07 -0.26 ',
'7 Entercom Communications Cl A (ETM) 13087209 12.00 0.10 0.84 ',
'8 Chesapeake Energy (CHK) 12948648 3.92 -0.05 -1.26 ',
"9 Macy's (M) 12684478 21.07 0.44 2.13 "]
The format of every string is: count, stock name, ticker (in parentheses), volume, and then a few more numeric values.
I need to split each string into a list whose elements are the individual fields above. This is how I attempted it:
for i in range(1, len(data)-1):
    split = data[i].split()
    temp = "{} {} {}".format(split[1], split[2], split[3])
    del split[2 : 4]
    split[1] = temp
    print(split)
However, I believe this is inefficient, and it doesn't work when the name is longer or shorter than two words. How should I handle this? Would I have to change how I generate the list of strings (data) in the first place?
EDIT:
final_data = [
    re.split(r'(?<=\))\s+|(?<=[\d\$-])\s(?=[\d\$-])|(?<=\d)\s(?=[a-zA-Z])', i)
    for i in data[1]]
final_data = [i[:-1]+[i[-1][:-1]] for i in final_data]
print(final_data)
Output:
~/workspace $ python extract.py 2017-11-27-04-26-51-ss.xhtml
[[''],
[''],
[''],
...,
[''],
[''],
['']]
You can use re.split:
import re
data = ['1 General Electric (GE) 24581660 $18.19 0.04 0.22 ', '2 Qudian ADR (QD) 24227349 12.22 -3.93 -24.33 ', '3 Square Cl A (SQ) 16233308 48.86 0.05 0.10 ', '4 Teva Pharmaceutical Industries ADR (TEVA) 15830425 13.70 0.22 1.63 ', '5 Vale ADR (VALE) 14768221 10.98 0.21 1.95 ', '6 Bank of America (BAC) 13938799 26.59 -0.07 -0.26 ', '7 Entercom Communications Cl A (ETM) 13087209 12.00 0.10 0.84 ', '8 Chesapeake Energy (CHK) 12948648 3.92 -0.05 -1.26 ', "9 Macy's (M) 12684478 21.07 0.44 2.13 "]
final_data = [re.split(r'(?<=[a-zA-Z])\s+(?=\()|(?<=\))\s+|(?<=[\d\$-])\s+(?=[\d\$-])|(?<=\d)\s+(?=[a-zA-Z])', i) for i in data]
Output:
[['1', 'General Electric', '(GE)', '24581660', '$18.19', '0.04', '0.22 '], ['2', 'Qudian ADR', '(QD)', '24227349', '12.22', '-3.93', '-24.33 '], ['3', 'Square Cl A', '(SQ)', '16233308', '48.86', '0.05', '0.10 '], ['4', 'Teva Pharmaceutical Industries ADR', '(TEVA)', '15830425', '13.70', '0.22', '1.63 '], ['5', 'Vale ADR', '(VALE)', '14768221', '10.98', '0.21', '1.95 '], ['6', 'Bank of America', '(BAC)', '13938799', '26.59', '-0.07', '-0.26 '], ['7', 'Entercom Communications Cl A', '(ETM)', '13087209', '12.00', '0.10', '0.84 '], ['8', 'Chesapeake Energy', '(CHK)', '12948648', '3.92', '-0.05', '-1.26 '], ['9', "Macy's", '(M)', '12684478', '21.07', '0.44', '2.13 ']]
With the parentheses removed:
final_data = [[b[1:-1] if b.startswith('(') and b.endswith(')') else b for b in i] for i in final_data]
Output:
[['1', 'General Electric', 'GE', '24581660', '$18.19', '0.04', '0.22 '], ['2', 'Qudian ADR', 'QD', '24227349', '12.22', '-3.93', '-24.33 '], ['3', 'Square Cl A', 'SQ', '16233308', '48.86', '0.05', '0.10 '], ['4', 'Teva Pharmaceutical Industries ADR', 'TEVA', '15830425', '13.70', '0.22', '1.63 '], ['5', 'Vale ADR', 'VALE', '14768221', '10.98', '0.21', '1.95 '], ['6', 'Bank of America', 'BAC', '13938799', '26.59', '-0.07', '-0.26 '], ['7', 'Entercom Communications Cl A', 'ETM', '13087209', '12.00', '0.10', '0.84 '], ['8', 'Chesapeake Energy', 'CHK', '12948648', '3.92', '-0.05', '-1.26 '], ['9', "Macy's", 'M', '12684478', '21.07', '0.44', '2.13 ']]
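The same pattern is easier to maintain when each alternative sits on its own line under re.VERBOSE. The sketch below (the names SPLIT_RE and parse are illustrative) also strips each line before splitting, so the last field loses its trailing space, and removes the parentheses around the ticker with str.strip('()'). Like the pattern above, it assumes the ticker is always the third field.

```python
import re

# The same four alternatives as the one-line pattern, one per line.
# In VERBOSE mode unescaped whitespace in the pattern is ignored, so \s
# matches the spaces we actually want to split on.
SPLIT_RE = re.compile(r"""
      (?<=[a-zA-Z])\s+(?=\()      # between the name and (TICKER)
    | (?<=\))\s+                  # after the closing parenthesis
    | (?<=[\d$-])\s+(?=[\d$-])    # between two numeric fields
    | (?<=\d)\s+(?=[a-zA-Z])      # between the count and the name
""", re.VERBOSE)


def parse(line):
    # strip() first so the last field has no trailing space.
    fields = SPLIT_RE.split(line.strip())
    # Drop the parentheses around the ticker (third field).
    fields[2] = fields[2].strip('()')
    return fields


data = ['1 General Electric (GE) 24581660 $18.19 0.04 0.22 ',
        "9 Macy's (M) 12684478 21.07 0.44 2.13 "]
for row in map(parse, data):
    print(row)
```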
You can split strings on characters
All of the strings in your original data list have two sections: the stock name (with its count and ticker), and then the numeric values. If you split on the closing parenthesis, each string breaks into one string holding the stock name and another holding the numbers. The numbers are separated by single spaces, so you can then split that second string on the space character.
https://docs.python.org/3/library/stdtypes.html#str.split
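A minimal sketch of that approach, assuming each line contains exactly one closing parenthesis (the one after the ticker). The helper name parse_line is hypothetical, and the extra rsplit(' (', 1) that separates the name from the ticker goes one step beyond the split described above.

```python
def parse_line(line):
    # Split once on the closing parenthesis: the left part holds
    # "count name (TICKER", the right part holds the numeric fields.
    left, _, right = line.partition(')')
    # Peel off the leading count, then separate the name from the ticker.
    count, rest = left.split(' ', 1)
    name, ticker = rest.rsplit(' (', 1)
    # split() with no argument splits on runs of whitespace and ignores
    # the trailing space at the end of each line.
    return [count, name, ticker] + right.split()


data = ['1 General Electric (GE) 24581660 $18.19 0.04 0.22 ',
        '4 Teva Pharmaceutical Industries ADR (TEVA) 15830425 13.70 0.22 1.63 ',
        "9 Macy's (M) 12684478 21.07 0.44 2.13 "]
for line in data:
    print(parse_line(line))
```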
I have a file called customer.CR that contains multiple rows of pipe-delimited values. It looks like this:
1361886|5303477|CR|WY|WY & NW RAILROAD CO|UNKNOWN||UNKNOWN|WY|00000|C|100.000000000|29|HOLDER|
1280535|5394419|CR|WY|CHAMBERS JERRY|7800 E UNION AVE # 1100||DENVER|CO|802372715|P|100.000000000|15|LESSEE|
1324915|5312567|CR|WY|EXXONMOBIL OIL CORP|PO BOX 650232||DALLAS|TX|752650232|C|100.000000000|15|LESSEE|
...
I want to convert this file into a pandas DataFrame that looks like the one below, but with many rows and 14 columns (only 2 columns are shown here, but I need all 14):
Column A    Column B
1361886     5303477
1280535     5394419
I ran the following code:
import csv

myfile = dbdir + '/Data/customer.CR.load'
with open(myfile, newline='') as csvfile:
    csvreader = csv.reader(csvfile, delimiter='|')
    for row in csvreader:
        print(row)
I got the following result.
['1361886', '5303477', 'CR', 'WY', 'WY & NW RAILROAD CO', 'UNKNOWN', '', 'UNKNOWN', 'WY', '00000', 'C', '100.000000000', '29', 'HOLDER', '']
['1280535', '5394419', 'CR', 'WY', 'CHAMBERS JERRY', '7800 E UNION AVE # 1100', '', 'DENVER', 'CO', '802372715', 'P', '100.000000000', '15', 'LESSEE', '']
...
...
...
['1324915', '5312567', 'CR', 'WY', 'EXXONMOBIL OIL CORP', 'PO BOX 650232', '', 'DALLAS', 'TX', '752650232', 'C', '100.000000000', '15', 'LESSEE', '']
['1353999', '5325242', 'CR', 'WY', 'ULTRA RESOURCES INC', '1550 WYNKOOP ST STE 300', '', 'DENVER', 'CO', '802021648', 'C', '20.000000000', '15', 'LESSEE', '']
Now I want to convert row to a pandas DataFrame. I wrote the following code:
df = pd.DataFrame([row], index=None)
df
But in the output I get just one row. How should I change my code so that I get a DataFrame with all the rows and columns? (Columns are separated by a vertical pipe '|' in the file.)
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
0 1306746 5466867 CR WY HUBER EMERICK M 311 S CONWELL ST CASPER WY 826012938 P 7.598157000 15 LESSEE
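The DataFrame has only one row because, once the loop finishes, row holds just the last line that was read. A minimal sketch of one fix, assuming you want all 14 fields kept as strings: append every row to a list inside the loop, then build the DataFrame once. io.StringIO stands in for the real file so the example runs on its own; the two sample lines are copied from the question.

```python
import csv
import io

import pandas as pd

# Two sample lines from the question; with the real file, pass the open
# file object to csv.reader instead of this StringIO.
sample = (
    "1361886|5303477|CR|WY|WY & NW RAILROAD CO|UNKNOWN||UNKNOWN|WY|00000|C|100.000000000|29|HOLDER|\n"
    "1280535|5394419|CR|WY|CHAMBERS JERRY|7800 E UNION AVE # 1100||DENVER|CO|802372715|P|100.000000000|15|LESSEE|\n"
)

rows = []
for row in csv.reader(io.StringIO(sample), delimiter='|'):
    # Each line ends with '|', which yields an empty 15th field; drop it.
    rows.append(row[:-1])

# Build the DataFrame once from all collected rows, not from the last row.
df = pd.DataFrame(rows)
print(df.shape)
```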
In the code below, I get data from a website using requests and save each listing in a dictionary called table. But when I iterate through the listings and append the dictionaries to a list, I run into the problem shown below. Any help is appreciated.
import requests
from bs4 import BeautifulSoup
list1 = []
table = {}
r = requests.get("https://www.century21.com/real-estate/rock-springs-wy/LCWYROCKSPRINGS/?k=1")
content = r.content
soup = BeautifulSoup(content,'html.parser')
all = soup.find_all('div',{"class":"property-card-primary-info"})
for item in all:
    print(item.find('a',{"class":"listing-price"}).text.replace('\n','').replace(' ',''))
    table['address'] = item.find('div',{"class":"property-address"}).text.replace('\n','').replace(' ','')
    table['city'] = item.find('div',{"class":"property-city"}).text.replace('\n','').replace(' ','')
    table['beds'] = item.find('div',{"class":"property-beds"}).text.replace('\n','').replace(' ','')
    table['baths'] = item.find('div',{"class":"property-baths"}).text.replace('\n','').replace(' ','')
    try:
        table['half-baths'] = item.find("div",{"class":"property-half-baths"}).text.replace('\n','').replace(' ','')
    except:
        table['half-baths'] = None
    try:
        table['property sq.ft.'] = item.find("div",{"class":"property-sqft"}).text.replace(' ','').replace("\n",'')
    except:
        table['property sq.ft.'] = None
    list1.append(table)
list1
OUTPUT
$325,000
$249,000
$390,000
$274,900
$208,000
$169,000
$127,500
$990,999
I get distinct values when I print the prices, but when I append to the list, all the entries are identical. Any help would mean a lot.
Question: how do I get rid of this duplication and get the corresponding values for each listing?
[{'address': ' 1129 Hilltop Drive',
'city': 'Rock Springs WY 82901 ',
'beds': '4 beds ',
'baths': '5 baths ',
'half-baths': '2 half baths ',
'property sq.ft.': '10,300 sq. ft '},
{'address': ' 1129 Hilltop Drive',
'city': 'Rock Springs WY 82901 ',
'beds': '4 beds ',
'baths': '5 baths ',
'half-baths': '2 half baths ',
'property sq.ft.': '10,300 sq. ft '},
{'address': ' 1129 Hilltop Drive',
'city': 'Rock Springs WY 82901 ',
'beds': '4 beds ',
'baths': '5 baths ',
'half-baths': '2 half baths ',
'property sq.ft.': '10,300 sq. ft '},
{'address': ' 1129 Hilltop Drive',
'city': 'Rock Springs WY 82901 ',
'beds': '4 beds ',
'baths': '5 baths ',
'half-baths': '2 half baths ',
'property sq.ft.': '10,300 sq. ft '},
{'address': ' 1129 Hilltop Drive',
'city': 'Rock Springs WY 82901 ',
'beds': '4 beds ',
'baths': '5 baths ',
'half-baths': '2 half baths ',
'property sq.ft.': '10,300 sq. ft '},
{'address': ' 1129 Hilltop Drive',
'city': 'Rock Springs WY 82901 ',
'beds': '4 beds ',
'baths': '5 baths ',
'half-baths': '2 half baths ',
'property sq.ft.': '10,300 sq. ft '},
{'address': ' 1129 Hilltop Drive',
'city': 'Rock Springs WY 82901 ',
'beds': '4 beds ',
'baths': '5 baths ',
'half-baths': '2 half baths ',
'property sq.ft.': '10,300 sq. ft '},
{'address': ' 1129 Hilltop Drive',
'city': 'Rock Springs WY 82901 ',
'beds': '4 beds ',
'baths': '5 baths ',
'half-baths': '2 half baths ',
'property sq.ft.': '10,300 sq. ft '}]
for item in all:
    table = {}  # important: create a new dict for every listing
    print(item.find('a',{"class":"listing-price"}).text.replace('\n','').replace(' ',''))
    table['address'] = item.find('div',{"class":"property-address"}).text.replace('\n','').replace(' ','')
    table['city'] = item.find('div',{"class":"property-city"}).text.replace('\n','').replace(' ','')
    table['beds'] = item.find('div',{"class":"property-beds"}).text.replace('\n','').replace(' ','')
    table['baths'] = item.find('div',{"class":"property-baths"}).text.replace('\n','').replace(' ','')
    try:
        table['half-baths'] = item.find("div",{"class":"property-half-baths"}).text.replace('\n','').replace(' ','')
    except:
        table['half-baths'] = None
    try:
        table['property sq.ft.'] = item.find("div",{"class":"property-sqft"}).text.replace(' ','').replace("\n",'')
    except:
        table['property sq.ft.'] = None
    list1.append(table)
# Print the list once, outside the loop. set(list1) would raise TypeError,
# because dicts are unhashable.
print(list1)
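Why the fix works: in the question, table is created once before the loop, so list1.append(table) appends the same dict object every time, and each iteration overwrites its values. Creating a new dict inside the loop gives every listing its own object. A small standalone illustration, with prices copied from the output above and hypothetical variable names:

```python
# One dict created before the loop: every entry in the list is that same object.
shared = {}
reused = []
for price in ['$325,000', '$249,000']:
    shared['price'] = price
    reused.append(shared)
print(reused)  # [{'price': '$249,000'}, {'price': '$249,000'}]

# A new dict created on each iteration: every entry is a separate object.
fresh = []
for price in ['$325,000', '$249,000']:
    table = {}
    table['price'] = price
    fresh.append(table)
print(fresh)  # [{'price': '$325,000'}, {'price': '$249,000'}]
```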
So I have this list:
[' Wins 376 #2,258,590 Win % 52.7% #2,043,164 Kills 2,396 #2,202,841 KD 1.01 #2,327,891 ', ' Wins 147 Win % 53.3% Kills 918 K/D 1.05 ', ' Headshot % 43.07% #1,022,726 KD 1.01 #2,327,891 Deaths 2,361 #2,333,651 Headshots 1,032 #1,959,486 Wins 376 #2,258,590 Losses 337 #2,422,718 Win % 52.7% #2,043,164 Time Played 164h #2,450,537 Matches Played 713 #2,341,103 Total XP 6,138,745 #2,351,090 Melee Kills 110 #880,508 Blind Kills 3 #1,820,018 ', ' Time Played 121h 1m 21s #2,077,653 Wins 294 #1,967,656 Losses 264 #2,055,170 Matches 558 #2,014,995 Deaths 1,751 #1,941,493 Kills 1,775 #1,870,100 Win % 52.7% #2,370,676 KD 1.01 #2,469,095 Kills/match 3.18 #1,906,054 Kills/min 0.24 #1,553,218 ', ' Time Played 11h 39m 27s #2,992,790 Wins 18 #2,713,886 Losses 14 #3,233,844 Matches 32 #2,996,579 Deaths 140 #3,027,150 Kills 128 #2,862,140 Win % 56.2% #225,565 KD 0.91 #1,872,850 Kills/match 4.00 #1,851,780 Kills/min 0.20 #1,949,224 ', ' K/D 1.00 Kills/Match 4.40 Kills 66 Deaths 66 Win % 53.33 Wins 8 Losses 7 Abandons 0 Rank GOLD III Max Rank GOLD III MMR 2,679 Max MMR 2,776 ', ' K/D 0.75 Kills/Match 3.00 Kills 3 Deaths 4 Win % 0.00 Wins 0 Losses 1 Abandons 0 Rank - Max Rank - MMR 2,377 Max MMR 0 ']
and I want every item in each string above, e.g. Wins and 376, to be its own element in a separate list.
How can I achieve this?
s = [' Wins 376 #2,258,590 Win % 52.7% #2,043,164 Kills 2,396 #2,202,841 KD 1.01 #2,327,891 ', ' Wins 147 Win % 53.3% Kills 918 K/D 1.05 ', ' Headshot % 43.07% #1,022,726 KD 1.01 #2,327,891 Deaths 2,361 #2,333,651 Headshots 1,032 #1,959,486 Wins 376 #2,258,590 Losses 337 #2,422,718 Win % 52.7% #2,043,164 Time Played 164h #2,450,537 Matches Played 713 #2,341,103 Total XP 6,138,745 #2,351,090 Melee Kills 110 #880,508 Blind Kills 3 #1,820,018 ', ' Time Played 121h 1m 21s #2,077,653 Wins 294 #1,967,656 Losses 264 #2,055,170 Matches 558 #2,014,995 Deaths 1,751 #1,941,493 Kills 1,775 #1,870,100 Win % 52.7% #2,370,676 KD 1.01 #2,469,095 Kills/match 3.18 #1,906,054 Kills/min 0.24 #1,553,218 ', ' Time Played 11h 39m 27s #2,992,790 Wins 18 #2,713,886 Losses 14 #3,233,844 Matches 32 #2,996,579 Deaths 140 #3,027,150 Kills 128 #2,862,140 Win % 56.2% #225,565 KD 0.91 #1,872,850 Kills/match 4.00 #1,851,780 Kills/min 0.20 #1,949,224 ', ' K/D 1.00 Kills/Match 4.40 Kills 66 Deaths 66 Win % 53.33 Wins 8 Losses 7 Abandons 0 Rank GOLD III Max Rank GOLD III MMR 2,679 Max MMR 2,776 ', ' K/D 0.75 Kills/Match 3.00 Kills 3 Deaths 4 Win % 0.00 Wins 0 Losses 1 Abandons 0 Rank - Max Rank - MMR 2,377 Max MMR 0 ']
# ss = [[j for j in i.split(" ") if j != ""] for i in s ]
ss = [i.split() for i in s] # more efficient
for i in ss:
    print(i)
['Wins', '376', '#2,258,590', 'Win', '%', '52.7%', '#2,043,164', 'Kills', '2,396', '#2,202,841', 'KD', '1.01', '#2,327,891']
['Wins', '147', 'Win', '%', '53.3%', 'Kills', '918', 'K/D', '1.05']
['Headshot', '%', '43.07%', '#1,022,726', 'KD', '1.01', '#2,327,891', 'Deaths', '2,361', '#2,333,651', 'Headshots', '1,032', '#1,959,486', 'Wins', '376', '#2,258,590', 'Losses', '337', '#2,422,718', 'Win', '%', '52.7%', '#2,043,164', 'Time', 'Played', '164h', '#2,450,537', 'Matches', 'Played', '713', '#2,341,103', 'Total', 'XP', '6,138,745', '#2,351,090', 'Melee', 'Kills', '110', '#880,508', 'Blind', 'Kills', '3', '#1,820,018']
['Time', 'Played', '121h', '1m', '21s', '#2,077,653', 'Wins', '294', '#1,967,656', 'Losses', '264', '#2,055,170', 'Matches', '558', '#2,014,995', 'Deaths', '1,751', '#1,941,493', 'Kills', '1,775', '#1,870,100', 'Win', '%', '52.7%', '#2,370,676', 'KD', '1.01', '#2,469,095', 'Kills/match', '3.18', '#1,906,054', 'Kills/min', '0.24', '#1,553,218']
['Time', 'Played', '11h', '39m', '27s', '#2,992,790', 'Wins', '18', '#2,713,886', 'Losses', '14', '#3,233,844', 'Matches', '32', '#2,996,579', 'Deaths', '140', '#3,027,150', 'Kills', '128', '#2,862,140', 'Win', '%', '56.2%', '#225,565', 'KD', '0.91', '#1,872,850', 'Kills/match', '4.00', '#1,851,780', 'Kills/min', '0.20', '#1,949,224']
['K/D', '1.00', 'Kills/Match', '4.40', 'Kills', '66', 'Deaths', '66', 'Win', '%', '53.33', 'Wins', '8', 'Losses', '7', 'Abandons', '0', 'Rank', 'GOLD', 'III', 'Max', 'Rank', 'GOLD', 'III', 'MMR', '2,679', 'Max', 'MMR', '2,776']
['K/D', '0.75', 'Kills/Match', '3.00', 'Kills', '3', 'Deaths', '4', 'Win', '%', '0.00', 'Wins', '0', 'Losses', '1', 'Abandons', '0', 'Rank', '-', 'Max', 'Rank', '-', 'MMR', '2,377', 'Max', 'MMR', '0']
I am trying to open this CSV file and parse the data into columns. The problem is the way the data comes in: when I run my Python script, each whole line comes back as a single string enclosed in [' DATA HERE ']. I want to parse the data into columns like 'Account #', 'Service Address', 'City', etc., matching the column names already in the file below. As I said, the structure is unusual because column headers are stacked above and below each other. For example, the column header 'Account #' has a second column header, 'Rate Code', below it. I'm not sure of the best way to go about this and would like some input from the experts.
Python Script
import csv
with open('C:/Users/DEMO/Documents/statement-9-28-18.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)
    for line in csv_reader:
        print(line)
Result
[' XYZ COMPANY DATE : 09/28/18 ']
[' PAGE : 1 ']
[' ELECTRIC BILL STATEMENT ']
[' ']
[' CUSTOMER NAME: XYZ CUSTOMER SUMMARY BILL NUMBER: 12345-67890 IF YOU HAVE ANY QUESTIONS, ']
[' CUSTOMER NUMBER: 1111111 PLEASE CONTACT: ']
[' MAILING ADDRESS: 4122 RICHARDSON ST ']
[' BILLING DATE: 09/28/18 SUMB#XYZ.COM45 ']
[' SANFORD FL 32771 PAST DUE DATE: 10/09/18 (305)333-3333 ']
[' ']
[' ']
[' READ SVC B MAXIMUM TOTAL DUE METER NO REMARKS ']
[' ACCOUNT # SERVICE ADDRESS CITY DATE DAY C KWH KWD AMOUNT ']
[' RATE CODE CY CUSTOMER NAME MAILING ADDRESS ']
[' ---------------------------------------------------------------------------------------------------------------------------------- ']
[' 11111-22222 485 JOHNSON AVE APT 1405 MIAMI 09/26/18 28 C 140 29.11 BAT0123 ']
[' RS-1 XYZ COMPANY 485 JOHNSON AVE ']
[' ']
[' 22222-33333 485 JOHNSON AVE APT 3541 MIAMI 09/26/18 28 C 130 28.08 BAT0123 ']
[' RS-1 XYZ COMPANY 485 JOHNSON AVE ']
[' ']
[' 33333-44444 485 JOHNSON AVE APT 4544 MIAMI 09/26/18 28 C 172 32.42 BAT0123 ']
[' RS-1 XYZ COMPANY 485 JOHNSON AVE ']
[' ']
[' 55555-66666 485 JOHNSON ST AVE APT 1111 MIAMI 09/26/18 28 C 243 39.81 BAT0123 ']
[' RS-1 XYZ COMPANY 485 JOHNSON AVE ']
Question: I want to parse the data into columns
Note: This simple regex also splits on -, / and ., so account numbers, dates and amounts are broken into pieces. If you expand the regex to your needs, this can be avoided.
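import re
rc = re.compile(r'(\w+)')
with open('C:/Users/DEMO/Documents/statement-9-28-18.csv', 'r') as itxt:
    # Iterate over the file object itself; itxt.readline() would return only
    # the first line, and enumerate() would then walk over its characters.
    for n, line in enumerate(itxt, 1):
        # Rows 13 and 14 hold the header
        if n in [13, 14]:
            findall = rc.findall(line)
            print("{}".format(findall))
        # Data starts at row 16; every third row (18, 21, ...) is blank
        if n >= 16 and n % 3 > 0:
            findall = rc.findall(line)
            print("{}".format(findall))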
Output:
['ACCOUNT', 'SERVICE', 'ADDRESS', 'CITY', 'DATE', 'DAY', 'C', 'KWH', 'KWD', 'AMOUNT']
['RATE', 'CODE', 'CY', 'CUSTOMER', 'NAME', 'MAILING', 'ADDRESS']
['11111', '22222', '485', 'JOHNSON', 'AVE', 'APT', '1405', 'MIAMI', '09', '26', '18', '28', 'C', '140', '29', '11', 'BAT0123']
['RS', '1', 'XYZ', 'COMPANY', '485', 'JOHNSON', 'AVE']
['22222', '33333', '485', 'JOHNSON', 'AVE', 'APT', '3541', 'MIAMI', '09', '26', '18', '28', 'C', '130', '28', '08', 'BAT0123']
['RS', '1', 'XYZ', 'COMPANY', '485', 'JOHNSON', 'AVE']
['33333', '44444', '485', 'JOHNSON', 'AVE', 'APT', '4544', 'MIAMI', '09', '26', '18', '28', 'C', '172', '32', '42', 'BAT0123']
['RS', '1', 'XYZ', 'COMPANY', '485', 'JOHNSON', 'AVE']
['55555', '66666', '485', 'JOHNSON', 'ST', 'AVE', 'APT', '1111', 'MIAMI', '09', '26', '18', '28', 'C', '243', '39', '81', 'BAT0123']
['RS', '1', 'XYZ', 'COMPANY', '485', 'JOHNSON', 'AVE']
Tested with Python: 3.4.2
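Expanding the regex as the note suggests: a sketch that matches a whole detail line with named groups, so the account number, date and amount stay intact. The group names and the parse_record helper are hypothetical, chosen from the two header rows. The pattern assumes the city is a single word and the MAXIMUM KWD column is empty, as in every sample row shown, and that each detail line is followed by its RATE CODE line.

```python
import re

# Hypothetical group names based on the header rows of the statement.
DETAIL_RE = re.compile(
    r'(?P<account>\d{5}-\d{5})\s+'
    r'(?P<service_address>.+?)\s+'   # lazy: grows until the rest matches
    r'(?P<city>\S+)\s+'              # assumes a one-word city
    r'(?P<read_date>\d\d/\d\d/\d\d)\s+'
    r'(?P<svc_days>\d+)\s+'
    r'(?P<bc>\S)\s+'
    r'(?P<kwh>\d+)\s+'
    r'(?P<total_due>\d+\.\d\d)\s+'
    r'(?P<meter_no>\S+)'
)


def parse_record(detail_line, rate_line):
    """Combine a detail line and the rate-code line below it into one dict."""
    m = DETAIL_RE.search(detail_line)
    if m is None:
        return None
    record = m.groupdict()
    # The second line starts with the rate code. The customer name and
    # mailing address that follow can't be reliably told apart in the text
    # as shown, so they are kept together here.
    rate_code, _, rest = rate_line.strip().partition(' ')
    record['rate_code'] = rate_code
    record['name_and_mailing_address'] = rest
    return record


rec = parse_record(
    ' 11111-22222 485 JOHNSON AVE APT 1405 MIAMI 09/26/18 28 C 140 29.11 BAT0123 ',
    ' RS-1 XYZ COMPANY 485 JOHNSON AVE ',
)
print(rec)
```

To process the whole file, you could read the lines into a list and call parse_record on each line that DETAIL_RE matches, passing the line after it as rate_line.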
A=[['1'], ['2'], ['3'], ['4'], ['5'], ['6'], ['7'], ['8'], ['9'], ['10'], ['11'], ['12'], ['13'], ['14'], ['15'], ['16'], ['17'], ['18'], ['19'], ['20'], ['21'], ['22'], ['23'], ['24'], ['25'], ['26'], ['27'], ['29'], ['30'], ['31'], ['32'], ['33'], ['34'], ['35'], ['36']]
B=[['Andaman and Nicobar Islands', ' '], ['Andhra Pradesh'], ['Arunachal Pradesh'], ['Assam'], ['Bihar'], ['Chandigarh', ' '], ['Chhattisgarh'], ['Dadra and Nagar Haveli', ' '], ['Daman and Diu', ' '], ['National Capital Territory of Delhi', ' '], ['Goa'], ['Gujarat'], ['Haryana'], ['Himachal Pradesh'], ['Jammu and Kashmir'], ['Jharkhand'], ['Karnataka'], ['Kerala'], ['Lakshadweep', ' '], ['Madhya Pradesh'], ['Maharashtra'], ['Manipur'], ['Meghalaya'], ['Mizoram'], ['Nagaland'], ['Odisha'], ['Puducherry', ' '], ['Rajasthan'], ['Sikkim'], ['Tamil Nadu'], ['Telangana'], ['Tripura'], ['Uttar Pradesh'], ['Uttarakhand'], ['West Bengal']]
C=[['Port Blair'], ['Hyderabad', ' ', '(', 'de jure', ' to 2024)', '\n', 'Amaravati', ' ', '(', 'de facto', ' from 2017)', '[3]', ' ', '[4]', ' ', '[a]'], ['Itanagar'], ['Dispur'], ['Patna'], ['Chandigarh', '[c]'], ['Naya Raipur', '[d]'], ['Silvassa'], ['Daman'], ['New Delhi'], ['Panaji', '[e]'], ['Gandhinagar'], ['Chandigarh'], ['Shimla', '\n', 'Dharamshala', ' (W/2nd)', '[8]', '\n'], ['Srinagar', '\xa0(Summer)', '\n', 'Jammu', '\xa0(Winter)'], ['Ranchi'], ['Bengaluru'], ['Thiruvananthapuram'], ['Kavaratti'], ['Bhopal'], ['Mumbai', '[g]', '\n', 'Nagpur', '\xa0(W/2nd)', '[h]'], ['Imphal'], ['Shillong'], ['Aizawl'], ['Kohima'], ['Bhubaneswar'], ['Puducherry'], ['Jaipur'], ['Gangtok', '[j]'], ['Chennai', '[k]'], ['Hyderabad', '[l]'], ['Agartala'], ['Lucknow'], ['Dehradun', '[m]'], ['Kolkata']]
I have the three lists above and I want to convert them to a pandas DataFrame in the following format:
Numbers State/UT Capital
1 Andaman and Nicobar Islands Port Blair
2 Andhra Pradesh Hyderabad
You can use itertools and zip to help with this:
from itertools import chain
import pandas as pd
df = pd.DataFrame({'Numbers': list(chain.from_iterable(A)),
                   'State/UT Capital': [' '.join([i[0], j[0]]) for i, j in zip(B, C)]})
Result:
Numbers State/UT Capital
0 1 Andaman and Nicobar Islands Port Blair
1 2 Andhra Pradesh Hyderabad
2 3 Arunachal Pradesh Itanagar
3 4 Assam Dispur
.........
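If State/UT and Capital should be two separate columns, a variation on the same idea keeps the first element of each inner list in its own column. Shortened copies of A, B and C are used here so the sketch runs on its own; with the full lists the code is the same.

```python
from itertools import chain

import pandas as pd

# Shortened copies of A, B and C from the question.
A = [['1'], ['2'], ['3']]
B = [['Andaman and Nicobar Islands', ' '], ['Andhra Pradesh'], ['Arunachal Pradesh']]
C = [['Port Blair'], ['Hyderabad', ' ', '(', 'de jure', ' to 2024)'], ['Itanagar']]

df = pd.DataFrame({
    'Numbers': list(chain.from_iterable(A)),
    # Keep only the first element of each inner list; the rest is
    # whitespace, footnote markers or extra notes.
    'State/UT': [b[0] for b in B],
    'Capital': [c[0] for c in C],
})
print(df)
```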