Pandas DataFrame - Python

I want to represent data using a pandas DataFrame, with a column named Product Title populated with each title t.
For example:
Product Title
Marvel : Movies Collection
Marvel
Disney Movie, and so on...
import requests
from bs4 import BeautifulSoup
import csv
import pandas as pd

r = requests.get("http://www.walmart.com/search/?query=marvel&cat_id=4096_530598")
soup = BeautifulSoup(r.content, "html.parser")
g_data = soup.find_all("div", {"class": "tile-conent"})
g_price = soup.find_all("div", {"class": "item-price-container"})
g_star = soup.find_all("div", {"class": "stars stars-small tile-row"})
for product_title in g_data:
    a_product_title = product_title.find_all("a", "js-product-title")
    for text_product_title in a_product_title:
        t = text_product_title.text
        print(t)
Desired output:
Product Title:
Marvel Heroes: Collection
Marvel: Guardians Of The Galaxy (Widescreen)
Marvel Complete Giftset (Widescreen)
Marvel's The Avengers (Widescreen)
Marvel Knights: Wolverine Versus Sabretooth - Reborn (Widescreen)
Superheroes Collection: The Incredible Hulk Returns / The Trial Of The Incredible Hulk / How To Draw Comics The Marvel Way (Widescreen)
Marvel: Iron Man & Hulk - Heroes United (Widescreen)
Marvel's The Avengers (DVD + Blu-ray) (Widescreen)
Captain America: The Winter Soldier (Widescreen)
Iron Man 3 (DVD + Digital Copy) (Widescreen)
Thor: The Dark World (Widescreen)
Spider-Man (2-Disc) (Special Edition) (Widescreen)
Elektra / Fantastic Four / Daredevil (Director's Cut) / Fantastic Four 2: Rise Of The Silver Surfer
Spider-Man / Spider-Man 2 / Spider-Man 3 (Widescreen)
Spider-Man 2 (Widescreen)
The Punisher (Extended Cut) (Widescreen)
DC Showcase: Superman / Shazam!: The Return Of The Black Adam
Ultimate Avengers: The Movie (Widescreen)
The Next Avengers: Heroes Of Tomorrow (Widescreen)
Ultimate Avengers 1 & 2 (Blu-ray) (Widescreen)
I tried the append function and join, but it didn't work. Is there a specific function for this in a pandas DataFrame?
The desired output should be produced using a pandas DataFrame.

Well, this will get you started; it extracts all the titles into a dict (I use a defaultdict for convenience):
In [163]:
from collections import defaultdict

data = defaultdict(list)
for product_title in g_data:
    a_product_title = product_title.find_all("a", "js-product-title")
    for text_title in a_product_title:
        data['Product title'].append(text_title.text)
df = pd.DataFrame(data)
df
Out[163]:
Product title
0 Marvel Heroes: Collection
1 Marvel: Guardians Of The Galaxy (Widescreen)
2 Marvel Complete Giftset (Widescreen)
3 Marvel's The Avengers (Widescreen)
4 Marvel Knights: Wolverine Versus Sabretooth - ...
5 Superheroes Collection: The Incredible Hulk Re...
6 Marvel: Iron Man & Hulk - Heroes United (Wides...
7 Marvel's The Avengers (DVD + Blu-ray) (Widescr...
8 Captain America: The Winter Soldier (Widescreen)
9 Iron Man 3 (DVD + Digital Copy) (Widescreen)
10 Thor: The Dark World (Widescreen)
11 Spider-Man (2-Disc) (Special Edition) (Widescr...
12 Elektra / Fantastic Four / Daredevil (Director...
13 Spider-Man / Spider-Man 2 / Spider-Man 3 (Wide...
14 Spider-Man 2 (Widescreen)
15 The Punisher (Extended Cut) (Widescreen)
16 DC Showcase: Superman / Shazam!: The Return Of...
17 Ultimate Avengers: The Movie (Widescreen)
18 The Next Avengers: Heroes Of Tomorrow (Widescr...
19 Ultimate Avengers 1 & 2 (Blu-ray) (Widescreen)
So you can modify this script to add the price and actors as keys to the data dict and then construct the df from the resultant dict; this will be better than appending a row at a time.
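To illustrate the point above, here is a minimal, self-contained sketch of the build-a-dict-then-construct pattern; the title and price values are invented stand-ins for the parsed tags:

```python
from collections import defaultdict

import pandas as pd

# Hypothetical scraped values standing in for the parsed tags
titles = ["Marvel Heroes: Collection", "Spider-Man 2 (Widescreen)"]
prices = ["$9.96", "$5.00"]

data = defaultdict(list)
for title, price in zip(titles, prices):
    data["Product title"].append(title)
    data["Price"].append(price)

# Build the DataFrame once from the completed dict, instead of
# appending one row at a time (which copies the frame each time)
df = pd.DataFrame(data)
print(df)
```

Each key of the dict becomes a column, so adding price and actors is just a matter of appending to more lists inside the same loop.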

Related

How can I sort data by bins using groupby in pandas?

What I want is the following:
release_year listed_in
1920 Documentaries
1930 TV Shows
1940 TV Shows
1950 Classic Movies, Documentaries
1960 Documentaries
1970 Classic Movies, Documentaries
1980 Classic Movies, Documentaries
1990 Classic Movies, Documentaries
2000 Classic Movies, Documentaries
2010 Children & Family Movies, Classic Movies, Comedies
2020 Classic Movies, Dramas
To achieve this result I tried the following code:
bins = [1925,1950,1960,1970,1990,2000,2010,2020]
groups = df.groupby(['listed_in', pd.cut(df.release_year, bins)])
groups.size().unstack()
It shows the following result:
release_year (1925,1950] (1950,1960] (1960,1970] (1970,1990] (1990,2000] (2000,2010] (2010, 2020]
listed_in
Action & Adventure 0 0 0 0 9 16 43
Action & Adventure, Anime Features, Children & Family Movies 0 0 0 0 0 0 1
Action & Adventure, Anime Features, Classic Movies 0 0 0 1 0 0 0
...
461 rows x 7 columns
I also tried the following code:
df['release_year'] = df['release_year'].astype(str).str[0:2] + '0'
df.groupby('release_year')['listed_in'].apply(lambda x: x.mode().iloc[0])
The result was the following:
release_year
190 Dramas
200 Documentaries
Name: listed_in, dtype: object
Here is a sample of the dataset:
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'show_id': ['81145628', '80117401', '70234439'],
    'type': ['Movie', 'Movie', 'TV Show'],
    'title': ['Norm of the North: King Sized Adventure',
              'Jandino: Whatever it Takes',
              'Transformers Prime'],
    'director': ['Richard Finn, Tim Maltby', np.nan, np.nan],
    'cast': ['Alan Marriott, Andrew Toth, Brian Dobson',
             'Jandino Asporaat', 'Peter Cullen, Sumalee Montano, Frank Welker'],
    'country': ['United States, India, South Korea, China',
                'United Kingdom', 'United States'],
    'date_added': ['September 9, 2019',
                   'September 9, 2016',
                   'September 8, 2018'],
    'release_year': ['2019', '2016', '2013'],
    'rating': ['TV-PG', 'TV-MA', 'TV-Y7-FV'],
    'duration': ['90 min', '94 min', '1 Season'],
    'listed_in': ['Children & Family Movies, Comedies',
                  'Stand-Up Comedy', 'Kids TV'],
    'description': ['Before planning an awesome wedding for his',
                    'Jandino Asporaat riffs on the challenges of ra',
                    'With the help of three human allies, the Autob']})
The simplest way to do this is to use the first part of your code and simply replace the last digit of release_year with a 0. Then you can .groupby decades and get the most popular genre for each decade, i.e. the mode:
input:
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'show_id': ['81145628', '80117401', '70234439'],
    'type': ['Movie', 'Movie', 'TV Show'],
    'title': ['Norm of the North: King Sized Adventure',
              'Jandino: Whatever it Takes',
              'Transformers Prime'],
    'director': ['Richard Finn, Tim Maltby', np.nan, np.nan],
    'cast': ['Alan Marriott, Andrew Toth, Brian Dobson',
             'Jandino Asporaat', 'Peter Cullen, Sumalee Montano, Frank Welker'],
    'country': ['United States, India, South Korea, China',
                'United Kingdom', 'United States'],
    'date_added': ['September 9, 2019',
                   'September 9, 2016',
                   'September 8, 2018'],
    'release_year': ['2019', '2016', '2013'],
    'rating': ['TV-PG', 'TV-MA', 'TV-Y7-FV'],
    'duration': ['90 min', '94 min', '1 Season'],
    'listed_in': ['Children & Family Movies, Comedies',
                  'Stand-Up Comedy', 'Kids TV'],
    'description': ['Before planning an awesome wedding for his',
                    'Jandino Asporaat riffs on the challenges of ra',
                    'With the help of three human allies, the Autob']})
code:
df['release_year'] = df['release_year'].astype(str).str[0:3] + '0'
df = df.groupby('release_year', as_index=False)['listed_in'].apply(lambda x: x.mode().iloc[0])
df
output:
release_year listed_in
0 2010 Children & Family Movies, Comedies

Python 3 Beautifulsoup to scrape county names from a gov.uk webpage

I would be grateful for any help!
I'm trying to scrape the county names on this webpage (https://www.gov.uk/guidance/full-list-of-local-restriction-tiers-by-area) into four corresponding lists: Tier1, Tier2, Tier3, Tier4.
The issue is how I'm navigating the page...
This is how I'm setting my soup.
import requests
from bs4 import BeautifulSoup

url = "https://www.gov.uk/guidance/full-list-of-local-restriction-tiers-by-area"
headers = {...}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, "lxml")
I've tried finding the h2s and then looping through the siblings, find_all_next, etc. but I haven't had any luck.
Endstate
I'm trying to put each of the counties into a CSV that looks like this:
(colour is mapped as follows Tier 1:green, Tier 2:yellow, Tier 3:amber, Tier 4:red)
County
Country
Tier
colour
Isles of Scilly
England
1
Green
Rutland
England
3
Amber
etc.
Update: As a bare minimum example of the data to be extracted:
from bs4 import BeautifulSoup
html = '''<div class="govspeak">
<ul>
<li>case detection rates in all age groups</li>
</ul>
<h2 id="tier-1-medium-alert">Tier 1: Medium alert</h2>
<h3 id="south-west">South West</h3>
<ul>
<li>Isles of Scilly</li>
</ul>
<h2 id="tier-2-high-alert">Tier 2: High alert</h2>
<p>No areas are currently in Tier 2.</p>
<h2 id="tier-3-very-high-alert">Tier 3: Very High alert</h2>
<h3 id="east-midlands">East Midlands</h3>
<ul>
<li>Rutland</li>
</ul>
<h3 id="north-west">North West</h3>
<ul>
<li>Liverpool City Region</li>
</ul>
</div>'''
soup = BeautifulSoup(html, "lxml")
h2 = soup.find_all('h2')
# Whats the best way to find related li tags?
The issue is that the HTML's h2 and ul elements are in a flat structure. There are many ways to extract the data, for example performing a for loop on every element:
soup.find('div', {"class": "govspeak"}) - Find the parent div (that contains the h2 & li elements).
container.find_all('li') - Find all the li elements.
x.fetchPrevious('h2')[0].text.strip() - Find the first [0] previous h2 (and strip any whitespace).
if x.fetchPrevious('h2')[0].findParent('div', {"class": "govspeak"}) - Filter out any h2 that doesn't appear inside the parent div (as fetchPrevious will literally find the previous element, even outside it).
A namedtuple (which I've called CountyTierModel) to store each scraped record.
re.search(r"(?<=Tier )\d(?=:)", x.tier) - A regex to fetch the tier number from the h2 title.
Example for scraping data:
from collections import namedtuple
import re
import requests
from bs4 import BeautifulSoup
CountyTierModel = namedtuple('CountyTiers', ['tier', 'county'])
url = "https://www.gov.uk/guidance/full-list-of-local-restriction-tiers-by-area"
response = requests.get(url)
soup = BeautifulSoup(response.text, "lxml")
container = soup.find('div', {"class": "govspeak"})
results = [CountyTierModel(x.fetchPrevious('h2')[0].text.strip(), x.text.strip())
           for x in container.find_all('li')
           if x.fetchPrevious('h2') and x.fetchPrevious('h2')[0].findParent('div', {"class": "govspeak"})]

# Here you can write your code to convert to CSV & provide mapping for Country & Color.
for x in results:
    # Regex to extract the number from the h2 title, based on 'Tier ' + number + ':'
    m = re.search(r"(?<=Tier )\d(?=:)", x.tier)
    print(f"{m.group(0)} - {x.county}")
Output:
1 - Isles of Scilly
3 - Rutland
3 - Liverpool City Region
3 - Bath and North East Somerset
3 - Bristol
3 - Cornwall
3 - Devon, Plymouth and Torbay
3 - Dorset
3 - North Somerset
3 - South Gloucestershire
3 - Wiltshire
3 - Herefordshire
3 - Shropshire, and Telford and Wrekin
3 - Worcestershire
3 - City of York and North Yorkshire
3 - The Humber: East Riding of Yorkshire, Kingston upon Hull/Hull, North East Lincolnshire and North Lincolnshire
3 - South Yorkshire (Barnsley, Doncaster, Rotheram, Sheffield)
3 - West Yorkshire (Bradford, Calderdale, Kirklees, Leeds, Wakefield)
4 - Derby and Derbyshire
4 - Leicester City and Leicestershire
4 - Lincolnshire
4 - Northamptonshire
4 - Nottingham and Nottinghamshire
4 - Bedford, Central Bedfordshire, Luton and Milton Keynes
4 - Cambridgeshire
4 - Essex, Southend-on-Sea and Thurrock
4 - Hertfordshire
4 - Norfolk
4 - Peterborough
4 - Suffolk
4 - All 32 London boroughs plus City of London
4 - North East Combined Authority (this area includes the local authorities of County Durham, Gateshead, South Tyneside and Sunderland)
4 - North of Tyne Combined Authority (this area includes the local authorities of Newcastle-upon-Tyne, North Tyneside and Northumberland)
4 - Tees Valley Combined Authority (this area includes the local authorities of Darlington, Hartlepool, Middlesbrough, Redcar and Cleveland, and Stockton-on-Tees)
4 - Cumbria
4 - Greater Manchester
4 - Lancashire, Blackburn with Darwen, and Blackpool
4 - Warrington and Cheshire Region
4 - Berkshire
4 - Brighton and Hove, East Sussex and West Sussex
4 - Buckinghamshire
4 - Hampshire, Southampton and Portsmouth
4 - Isle of Wight
4 - Kent and Medway
4 - Oxfordshire
4 - Surrey
4 - Bournemouth, Christchurch and Poole
4 - Gloucestershire (Cheltenham, Cotswold, Forest of Dean, Gloucester City, Stroud and Tewkesbury)
4 - Somerset (Mendip, Sedgemoor, Somerset West and Taunton, and South Somerset)
4 - Swindon
4 - Birmingham, Dudley, Sandwell, Walsall and Wolverhampton
4 - Coventry
4 - Solihull
4 - Staffordshire and Stoke-on-Trent
4 - Warwickshire
Note: To keep the question focused; I've only added code for scraping. Extracting to CSV should be a separate question.
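As a side note, fetchPrevious is an older BeautifulSoup name; the same backwards traversal can be sketched with the current find_previous method, shown here on a trimmed version of the minimal HTML from the question:

```python
from bs4 import BeautifulSoup

html = """<div class="govspeak">
<h2 id="tier-1-medium-alert">Tier 1: Medium alert</h2>
<ul><li>Isles of Scilly</li></ul>
<h2 id="tier-3-very-high-alert">Tier 3: Very High alert</h2>
<ul><li>Rutland</li><li>Liverpool City Region</li></ul>
</div>"""

soup = BeautifulSoup(html, "html.parser")
container = soup.find("div", {"class": "govspeak"})

# For each li, walk backwards to the nearest preceding h2,
# which names the tier that li belongs to
pairs = [(li.find_previous("h2").text.strip(), li.text.strip())
         for li in container.find_all("li")]
print(pairs)
```

The result pairs each county with its tier heading, which can then be fed into the regex step above.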

Searching a table row for text from a list and extracting it into a new column if found

I have a list of titles containing esports team names. I need to extract the names into a separate column. It would be easy if the titles all had the same 'mask', but they are different.
I am wondering if there is a way to extract a team name from a title given a list of all teams, i.e. code that loops through each row and copies the team name into a new column if it finds one.
Team List:
teams = ['IG.V', 'Matador', 'Galaxy Racer', 'MG.Trust', 'Five brothers', 'Team Rocket', 'Cobra Gaming', 'Revenge Gaming', 'Secret', 'Virtus.pro']
Expected result
N
Title
team1
team2
1
[RU] IG.V 0:1 Matador (BO2) MS Mid-Autumn #Skor #fourApril
Ig.V
Matador
2
[RU] Galaxy Racer 0:0 MG.Trust (BO3) Moon Studio Carnival Cup #Mantis
Galaxy Racer
MG.Trust
3
[RU/EN] Five brothers - Team Rockets Asian Gold Occupation S19
Five Brothers
Team Rocket
4
[RU/EN] Cobra Gaming - Revenge Gaming Masters Tournament S13
Cobra Gaming
Revenge gaming
5
LF พากย์ Secret⚔️Virtus.pro (Bo3)🏆EPIC League: Division 1 - รอบแบ่งกลุ่ม
Secret
Virtus.pro
Use Series.str.extractall with the values of the list escaped by re.escape and word boundaries \b added, and finally add the result to the original with Series.unstack and DataFrame.join:
import re

# change Team Rocket to Team Rockets for match
teams = ['IG.V', 'Matador', 'Galaxy Racer', 'MG.Trust',
         'Five brothers', 'Team Rockets', 'Cobra Gaming',
         'Revenge Gaming', 'Secret', 'Virtus.pro']

pat = "(" + '|'.join(r"\b{}\b".format(re.escape(x)) for x in teams) + ")"
df = df.join(df['Title'].str.extractall(pat)[0].unstack().add_prefix('team'))
print(df)
N Title team0 \
0 1 [RU] IG.V 0:1 Matador (BO2) MS Mid-Autumn #Sko... IG.V
1 2 [RU] Galaxy Racer 0:0 MG.Trust (BO3) Moon Stud... Galaxy Racer
2 3 [RU/EN] Five brothers - Team Rockets Asian Gol... Five brothers
3 4 [RU/EN] Cobra Gaming - Revenge Gaming Masters ... Cobra Gaming
4 5 LF พากย์ Secret⚔️Virtus.pro (Bo3)🏆EPIC League:... Secret
team1
0 Matador
1 MG.Trust
2 Team Rockets
3 Revenge Gaming
4 Virtus.pro
EDIT: After some testing, the solution is to convert the values to strings with .astype(str):
df1 = pd.read_excel('vcxvcvx.xlsx')
# print (df1)
df2 = pd.read_csv('testcase.csv', index_col=[0])
# print (df2)
pat = "(" + '|'.join(r"\b{}\b".format(re.escape(x)) for x in df1['Teams'].astype(str)) + ")"
df = df2.join(df2['title'].str.extractall(pat)[0].unstack().add_prefix('team'))
# print (df)
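Since the snippets above reference local files and a df that isn't shown, here is a self-contained sketch of the same extractall/unstack pattern on a small invented sample:

```python
import re

import pandas as pd

teams = ["IG.V", "Matador", "Team Rockets"]
df = pd.DataFrame({"Title": [
    "[RU] IG.V 0:1 Matador (BO2)",
    "[RU/EN] Five brothers - Team Rockets",
]})

# Escape each name (IG.V contains a regex metacharacter) and wrap in \b
pat = "(" + "|".join(r"\b{}\b".format(re.escape(t)) for t in teams) + ")"

# extractall returns one row per match; unstack spreads the matches
# across columns, which add_prefix renames to team0, team1, ...
out = df.join(df["Title"].str.extractall(pat)[0].unstack().add_prefix("team"))
print(out)
```

Rows with fewer matches than the maximum simply get NaN in the trailing team columns.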

Extracting year from a column of string movie names

I have the following data, having two columns, "name" and "gross" in table called train_df:
gross name
760507625.0 Avatar (2009)
658672302.0 Titanic (1997)
652270625.0 Jurassic World (2015)
623357910.0 The Avengers (2012)
534858444.0 The Dark Knight (2008)
532177324.0 Rogue One (2016)
474544677.0 Star Wars: Episode I - The Phantom Menace (1999)
459005868.0 Avengers: Age of Ultron (2015)
448139099.0 The Dark Knight Rises (2012)
436471036.0 Shrek 2 (2004)
424668047.0 The Hunger Games: Catching Fire (2013)
423315812.0 Pirates of the Caribbean: Dead Man's Chest (2006)
415004880.0 Toy Story 3 (2010)
409013994.0 Iron Man 3 (2013)
408084349.0 Captain America: Civil War (2016)
408010692.0 The Hunger Games (2012)
403706375.0 Spider-Man (2002)
402453882.0 Jurassic Park (1993)
402111870.0 Transformers: Revenge of the Fallen (2009)
400738009.0 Frozen (2013)
381011219.0 Harry Potter and the Deathly Hallows: Part 2 (2011)
380843261.0 Finding Nemo (2003)
380262555.0 Star Wars: Episode III - Revenge of the Sith (2005)
373585825.0 Spider-Man 2 (2004)
370782930.0 The Passion of the Christ (2004)
I would like to extract the year from "name" to create a new column called "year", which I will then use to filter the data set by a specific year.
The new table will look like the following:
year gross name
2009 760507625.0 Avatar (2009)
1997 658672302.0 Titanic (1997)
2015 652270625.0 Jurassic World (2015)
2012 623357910.0 The Avengers (2012)
2008 534858444.0 The Dark Knight (2008)
I tried the apply and lambda approach, but got no results:
train_df[train_df.apply(lambda row: row['name'].startswith('2014'),axis=1)]
Is there a way to use contains (as in C#) or isin to filter the strings in Python?
If you know for sure that your years are going to be at the end of the string, you can do
df['year'] = df['name'].str[-5:-1].astype(int)
This takes the column name, uses the str accessor to access the value of each row as a string, and takes the -5:-1 slice from it. Then, it converts the result to int, and sets it as the year column. This approach will be much faster than iterating over the rows if you have lots of data.
Alternatively, you could use regex for more flexibility using the .extract() method of the str accessor.
df['year'] = df['name'].str.extract(r'\((\d{4})\)').astype(int)
This extracts groups matching the expression \((\d{4})\) (Try it here), which means capture the numbers inside a pair of parentheses containing exactly four digits, and will work anywhere in the string. To anchor it to the end of your string, use a $ at the end of your regex like so: \((\d{4})\)$. The result is the same using regex and using string slicing.
Now we have our new dataframe:
gross name year
0 760507625.0 Avatar (2009) 2009
1 658672302.0 Titanic (1997) 1997
2 652270625.0 Jurassic World (2015) 2015
3 623357910.0 The Avengers (2012) 2012
4 534858444.0 The Dark Knight (2008) 2008
5 532177324.0 Rogue One (2016) 2016
6 474544677.0 Star Wars: Episode I - The Phantom Menace (1999) 1999
7 459005868.0 Avengers: Age of Ultron (2015) 2015
8 448139099.0 The Dark Knight Rises (2012) 2012
9 436471036.0 Shrek 2 (2004) 2004
10 424668047.0 The Hunger Games: Catching Fire (2013) 2013
11 423315812.0 Pirates of the Caribbean: Dead Man's Chest (2006) 2006
12 415004880.0 Toy Story 3 (2010) 2010
13 409013994.0 Iron Man 3 (2013) 2013
14 408084349.0 Captain America: Civil War (2016) 2016
15 408010692.0 The Hunger Games (2012) 2012
16 403706375.0 Spider-Man (2002) 2002
17 402453882.0 Jurassic Park (1993) 1993
18 402111870.0 Transformers: Revenge of the Fallen (2009) 2009
19 400738009.0 Frozen (2013) 2013
20 381011219.0 Harry Potter and the Deathly Hallows: Part 2 (... 2011
21 380843261.0 Finding Nemo (2003) 2003
22 380262555.0 Star Wars: Episode III - Revenge of the Sith (... 2005
23 373585825.0 Spider-Man 2 (2004) 2004
24 370782930.0 The Passion of the Christ (2004) 2004
You can use a regular expression with pandas.Series.str.extract for this:
df["year"] = df["name"].str.extract(r"\((\d{4})\)$", expand=False)
df["year"] = pd.to_numeric(df["year"])
print(df.head())
gross name year
0 760507625.0 Avatar (2009) 2009
1 658672302.0 Titanic (1997) 1997
2 652270625.0 Jurassic World (2015) 2015
3 623357910.0 The Avengers (2012) 2012
4 534858444.0 The Dark Knight (2008) 2008
The regular expression:
\(: find where there is a literal opening parentheses
(\d{4}) Then, find 4 digits appearing next to each other
The parentheses here mean that we're storing our 4 digits as a capture group (in this case it's the group of digits we want to extract from the larger string)
\): Then, find a closing parentheses
$: All of the above MUST occur at the end of the string
When all of the above criterion are met, get those 4 digits- or if no match is acquired, return NaN for that row.
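Because unmatched rows come back as NaN, casting the result straight to int would fail; a minimal sketch of keeping an integer dtype anyway, using pandas' nullable Int64 (sample data invented):

```python
import pandas as pd

df = pd.DataFrame({"name": ["Avatar (2009)", "Unknown Title"]})

# Rows with no trailing (YYYY) produce NaN; to_numeric gives floats,
# and the nullable Int64 dtype keeps matched years as integers
# alongside the missing value
df["year"] = pd.to_numeric(
    df["name"].str.extract(r"\((\d{4})\)$", expand=False)
).astype("Int64")
print(df)
```

This way the year column stays filterable with integer comparisons even when some titles lack a year.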
Try this.
df = ['Avatar (2009)', 'Titanic (1997)', 'Jurassic World (2015)', 'The Avengers (2012)', 'The Dark Knight (2008)', 'Rogue One (2016)', 'Star Wars: Episode I - The Phantom Menace (1999)', 'Avengers: Age of Ultron (2015)', 'The Dark Knight Rises (2012)', 'Shrek 2 (2004)', 'Boiling Point (1990)', 'Terror Firmer (1999)', "Adam's Apples (2005)", 'I Want You (1998)', 'Chalet Girl (2011)', 'Love, Honor and Obey (2000)', "Perrier's Bounty (2009)", 'Into the White (2012)', 'The Decoy Bride (2011)', 'I Spit on Your Grave 2 (2013)']
for i in df:
    mov_title = i[:-7]
    year = i[-5:-1]
    print(mov_title)  # do your actual extraction
    print(year)  # do your actual extraction
def getYear(val):
    startIndex = val.find('(')
    endIndex = val.find(')')
    return val[startIndex + 1:endIndex]
I'm not much of a Python dev, but I believe this will do. You will just need to loop through, passing each value to the above function. On each call you will get the year extracted for you.
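The loop described above can be written with Series.apply; a minimal sketch, assuming the name column from the question's data:

```python
import pandas as pd

def getYear(val):
    # Take the text between the first '(' and the first ')'
    start = val.find('(')
    end = val.find(')')
    return val[start + 1:end]

df = pd.DataFrame({"name": ["Avatar (2009)", "Titanic (1997)"]})

# apply calls the function once per value in the column
df["year"] = df["name"].apply(getYear)
print(df)
```

Note this grabs the first parenthesised text in the title, so the regex-based answers above are more robust for titles that contain other parentheses.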

Inserting a header row for pandas dataframe

I have just started python and am trying to rewrite one of my perl scripts in python. Essentially, I had a long script to convert a csv to json.
I've tried to import my csv into a pandas dataframe, and wanted to insert a header row at the top, since my csv lacks that.
Code:
import pandas
db=pandas.read_csv("netmedsdb.csv",header=None)
db
Output:
0 1 2 3
0 3M CAVILON NO STING BARRIER FILM SPRAY 28ML OTC 0 Rs.880.00 3M INDIA LTD
1 BACTI BAR SOAP 75GM OTC Rs.98.00 6TH SKIN PHARMACEUTICALS PVT LTD
2 KWIKNIC MINT FLAVOUR 4MG CHEW GUM TABLET 30'S NICOTINE Rs.180.00 A S V LABORATORIES INDIA PVT LTD
3 RIFAGO 550MG TABLET 10'S RIFAXIMIN 550MG Rs.298.00 AAREEN HEALTHCARE
4 999 OIL 60ML AYURVEDIC MEDICINE Rs.120.00 AAKASH PHARMACEUTICALS
5 AKASH SOAP 75GM AYURVEDIC PRODUCT Rs.80.00 AAKASH PHARMACEUTICALS
6 GROW CARE OIL 100ML AYURVEDIC MEDICINE Rs.190.00 AAKASH PHARMACEUTICALS
7 GROW CARE OIL 100ML AYURVEDIC MEDICINE Rs.190.00 AAKASH PHARMACEUTICALS
8 RHUNS OIL 30ML AYURVEDIC Rs.50.00 AAKASH PHARMACEUTICALS
9 VILLO CAPSULE 10'S AYURVEDIC MEDICINE Rs.70.00 AAKASH PHARMACEUTICALS
10 VITAWIN FORTE CAPSULE 10'S AYURVEDIC MEDICINE Rs.150.00 AAKASH PHARMACEUTICALS
I wrote the following code to insert the first element at row 0, column 0:
db.insert(loc=0,column='0',value='Brand')
db
Output:
0 0 1 2 3
0 Brand 3M CAVILON NO STING BARRIER FILM SPRAY 28ML OTC 0 Rs.880.00 3M INDIA LTD
1 Brand BACTI BAR SOAP 75GM OTC Rs.98.00 6TH SKIN PHARMACEUTICALS PVT LTD
2 Brand KWIKNIC MINT FLAVOUR 4MG CHEW GUM TABLET 30'S NICOTINE Rs.180.00 A S V LABORATORIES INDIA PVT LTD
3 Brand RIFAGO 550MG TABLET 10'S RIFAXIMIN 550MG Rs.298.00 AAREEN HEALTHCARE
4 Brand 999 OIL 60ML AYURVEDIC MEDICINE Rs.120.00 AAKASH PHARMACEUTICALS
5 Brand AKASH SOAP 75GM AYURVEDIC PRODUCT Rs.80.00 AAKASH PHARMACEUTICALS
6 Brand GROW CARE OIL 100ML AYURVEDIC MEDICINE Rs.190.00 AAKASH PHARMACEUTICALS
7 Brand GROW CARE OIL 100ML AYURVEDIC MEDICINE Rs.190.00 AAKASH PHARMACEUTICALS
8 Brand RHUNS OIL 30ML AYURVEDIC Rs.50.00 AAKASH PHARMACEUTICALS
9 Brand VILLO CAPSULE 10'S AYURVEDIC MEDICINE Rs.70.00 AAKASH PHARMACEUTICALS
10 Brand VITAWIN FORTE CAPSULE 10'S AYURVEDIC MEDICINE Rs.150.00 AAKASH PHARMACEUTICALS
But unfortunately I got the word "Brand" inserted in column 0 of every row.
I'm trying to add the header columns "Brand", "Generic", "Price", "Company".
You only need the names parameter of read_csv:
import pandas as pd
from io import StringIO

temp = u"""a,b,10,d
e,f,45,r
"""
# after testing, replace 'StringIO(temp)' with 'netmedsdb.csv'
df = pd.read_csv(StringIO(temp), names=["Brand", "Generic", "Price", "Company"])
print(df)
Brand Generic Price Company
0 a b 10 d
1 e f 45 r
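If the CSV has already been read with the default integer headers, the column names can also be assigned after the fact; a minimal sketch with one invented row standing in for the file:

```python
import pandas as pd

# A frame read with header=None gets integer column labels 0..3
db = pd.DataFrame([
    ["BACTI BAR SOAP 75GM", "OTC", "Rs.98.00", "6TH SKIN PHARMACEUTICALS PVT LTD"],
])

# Assign the header row in place; the list length must match
# the number of columns
db.columns = ["Brand", "Generic", "Price", "Company"]
print(db.columns.tolist())
```

This avoids re-reading the file when the frame already exists in memory.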
