Add a DataFrame column when the string matches a dict key - python

I am trying to add a column to a pandas.DataFrame when the string in the DataFrame has one or more words that are a key in a dict. But my code gives me an error, and I don't know what went wrong. Could anyone help?
data_frame:
tw_test.head()
tweet
0 living the dream. #cameraman #camera #camerac...
1 justin #trudeau's reasons for thanksgiving. to...
2 #themadape butt…..butt…..we’re allergic to l...
3 2 massive explosions at peace march in #turkey...
4 #mulcair suggests there’s bad blood between hi...
dict:
party = {'#mulcair': 'NDP', '#cdnleft': 'liberal', '#LiberalExpress': 'liberal', '#ThankYouStephenHarper': 'Conservative ', '#pmjt': 'liberal', ...}
My code:
tw_test["party"]=tw_test["tweet"].apply(lambda x: party[x.split(' ')[1].startswith("#")[0]])

I believe your trouble was due to trying to cram too much into a lambda. A function to do the lookup is pretty straightforward:
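The immediate error in the lambda is that str.startswith returns a bool, which cannot be indexed with [0]; a minimal reproduction:

```python
word = "#mulcair"

result = word.startswith("#")
print(type(result))  # <class 'bool'>

try:
    result[0]  # indexing a bool raises TypeError
except TypeError as exc:
    print("TypeError:", exc)
```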
Code:
party_tags = {
    '#mulcair': 'NDP',
    '#cdnleft': 'liberal',
    '#LiberalExpress': 'liberal',
    '#ThankYouStephenHarper': 'Conservative ',
    '#pmjt': 'liberal'
}

def party(tweet):
    # check each hashtag in the tweet against the lookup table
    for tag in [t for t in tweet.split() if t.startswith('#')]:
        if tag in party_tags:
            return party_tags[tag]
Test Code:
import pandas as pd
tw_test = pd.DataFrame([x.strip() for x in u"""
living the dream. #cameraman #camera #camerac
justin #trudeau's reasons for thanksgiving. to
#themadape butt…..butt…..we’re allergic to
2 massive explosions at peace march in #turkey
#mulcair suggests there’s bad blood between
""".split('\n')[1:-1]], columns=['tweet'])
tw_test["party"] = tw_test["tweet"].apply(party)
print(tw_test)
Results:
tweet party
0 living the dream. #cameraman #camera #camerac None
1 justin #trudeau's reasons for thanksgiving. to None
2 #themadape butt…..butt…..we’re allergic to None
3 2 massive explosions at peace march in #turkey None
4 #mulcair suggests there’s bad blood between NDP
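As an aside, a vectorized alternative (a sketch, assuming the small party_tags subset below) builds one regex out of the known tags and maps the first match to its party:

```python
import re
import pandas as pd

party_tags = {'#mulcair': 'NDP', '#pmjt': 'liberal'}
tweets = pd.Series([
    "living the dream. #cameraman #camera",
    "#mulcair suggests there's bad blood",
])

# one alternation of all known tags; the first match (if any) is captured
pattern = '(' + '|'.join(re.escape(t) for t in party_tags) + ')'
parties = tweets.str.extract(pattern, expand=False).map(party_tags)
print(parties[1])  # NDP
```

Rows with no matching tag come back as NaN, mirroring the None values above.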


How to search specific string in another col

What I want to do is: I have several columns, and I want to search for each column's string inside another column. Here is the example.
Here is my df.head()
Check index reviews summary output1 output2 ... output35 output36 output37 output38 output39 output40
0 True 1 After realizing my old mascara had a petroleum... Output: Quality: love the wand (positive); Len... Quality: love the wand (positive) Lengthening: lengthens my lashes well (positive) ... None None None None None None
1 True 2 Best mascara I’ve ever used. Makes my non-exis... Output: Makes lashes visible (positive); Non-e... Makes lashes visible (positive) Non-existent Asian lashes (negative) ... None None None None None None
2 True 3 I've never had a mascara that made my lashes l... Output: Look: long lashes (positive) Look: long lashes (positive) None ... None None None None None None
3 True 4 It is clump and smudge-free with an awesome la... Output: Clump-free (positive); Smudge-proof (p... Clump-free (positive) Smudge-proof (positive) ... None None None None None None
4 True 5 And I’m going to buy it again, and again, and ... Output: Quality: impressed by a mascara before... Quality: impressed by a mascara before (positive) Extends Lashes: huge impact (positive) ... None None None None None None
What I want to do is to check whether the values of all the output columns (except the ones that are N/A) appear inside the string in the reviews column.
Here is an example with the reviews column and the output1 column value.
Reviews = "After realizing my old mascara had a petroleum by-product in it, I needed a new one I felt good about putting on my beautiful lashes. I needed one that had a clean formula, produced by a company making more environmentally sustainable efforts. I was so excited to receive this mascara after ordering it, and it did not disappoint. I love natural, clean beauty products. I love the wand on this mascara, and it lengthens my lashes very well. I like using mascara because I love my lashes, and I don't want to use falsies or extensions. This mascara takes me from girlboss to goddess in less than 2 minutes. It's so easy to use, non-messy, and dries fast. Even if you do or don't feel like doing a full-face of makeup, this mascara will upgrade your look. It is a must-have and worth every cent."
Output1 = "Quality: love the wand (positive)"
I want to check whether the value "love the wand" is in the reviews column.
import pandas as pd

# Read your DataFrame
df = pd.read_excel("ilia.xlsx")
df.insert(0, "Check", "True")

# Split the values in the 'summary' column and create new columns
split_values = df['summary'].str.split(";", expand=True)
for i in range(split_values.shape[1]):
    df[f'output{i+1}'] = split_values[i]

df = df.apply(lambda x: x.str.lstrip() if x.dtype == "object" else x)
df["output1"] = df["output1"].str.replace("Output: ", "")
df["Check"] = df.apply(lambda x: any(i in x['reviews'] for i in x['summary'].split(";")), axis=1)
print(df.head())
df.to_excel("foo.xlsx", index=False)
Seems like this is what you are looking for:
df['reviews'].str.contains('love the wand').any()
From Statology
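For a row-by-row check that every non-empty output phrase occurs in that row's review, a self-contained sketch (the toy frame and column names here are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "reviews": ["I love the wand and it lengthens my lashes well",
                "Best mascara ever"],
    "output1": ["love the wand", "non-existent lashes"],
    "output2": ["lengthens my lashes well", None],
})

out_cols = ["output1", "output2"]
# a phrase only counts if it is not NaN/None
df["Check"] = df.apply(
    lambda row: all(phrase in row["reviews"]
                    for phrase in row[out_cols] if pd.notna(phrase)),
    axis=1,
)
print(df["Check"].tolist())  # [True, False]
```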

An algorithms question that I have solved, but my solution fails on some test cases

I am preparing for a technical round and while preparing I encountered this problem through leetcode's interview questions section.
My solution works when its input dict has 3 items; with anything less than that it throws an error.
I would also like to know how you would rank this question in terms of LC easy, medium, and hard if it were actually in the problems section of LC.
PROBLEM:
Juan Hernandez is a Shopify merchant that owns a Pepper sauce shop
with five locations: Toronto, Vancouver, Montreal, Calgary and Halifax.
He also sells online and ships his sauces across the country from one
of his brick-and-mortar locations.
The pepper sauces he sells are:
Jalapeño (J)
Habanero (H)
Serrano (S)
The inventory count for each location looks like this:
City J H S
Toronto 5 0 0
Vancouver 10 2 6
Montreal 3 5 5
Calgary 1 18 2
Halifax 28 2 12
Every time he gets an online order, he needs to figure out
which locations can fulfill that order. Write a function that
takes an order as input and outputs a list of locations which
have all the items in stock.
Example
Input: J:3 H:2 S:4
Output: Van, Mon, Hali
Input: H:7 S:1
Output: Cal
My Solution:
inven = {
    'tor': {'j': 5, 'h': 0, 's': 0},
    'van': {'j': 10, 'h': 2, 's': 6},
    'mon': {'j': 3, 'h': 5, 's': 5},
    'cal': {'j': 1, 'h': 18, 's': 2},
    'hal': {'j': 28, 'h': 2, 's': 12},
}

order = {
    'j': 3,
    'h': 2,
    's': 4
}

def find_order(order):
    output = []
    for city in inven:
        if order['j'] <= inven[city]['j'] and order['h'] <= inven[city]['h'] and order['s'] <= inven[city]['s']:
            output.append(city)
    return output

print(find_order(order))
Sorry, if the answer is something super easy. I am still kinda new to coding and its my first technical round.
I only know python as of now. If its not your language, a hint toward the right direction will be very helpful.
Here's a way to do it:
inven = {
    'tor': {'j': 5, 'h': 0, 's': 0},
    'van': {'j': 10, 'h': 2, 's': 6},
    'mon': {'j': 3, 'h': 5, 's': 5},
    'cal': {'j': 1, 'h': 18, 's': 2},
    'hal': {'j': 28, 'h': 2, 's': 12},
}

order = {
    'j': 3,
    'h': 2,
    's': 4
}

order2 = {
    'h': 7,
    's': 1
}

def find_order(order):
    return [city for city, amts in inven.items()
            if all(amt >= order[sauce] for sauce, amt in amts.items() if sauce in order)]

print(find_order(order))
print(find_order(order2))
Output:
['van', 'mon', 'hal']
['cal']
Explanation:
in the list comprehension, we build a list containing each city that satisfies a condition
the condition is that all sauces found in the order are available in a given city in sufficient quantity to fill the order.
Some help from the docs:
all()
list comprehensions
dict.items()
Your solution looks very close to OK. I'm guessing that by "less than 3 items" you mean that not all types of sauces are present in the order. To fix the error you get in that case, you can check whether the dict contains all the expected keys ('j', 'h' and 's'), and insert any missing ones with a value of 0.
def find_order(order):
    if 'j' not in order:
        order['j'] = 0
    if 'h' not in order:
        order['h'] = 0
    if 's' not in order:
        order['s'] = 0
    output = []
    for city in inven:
        if order['j'] <= inven[city]['j'] and order['h'] <= inven[city]['h'] and order['s'] <= inven[city]['s']:
            output.append(city)
    return output
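A variant of the same idea, using dict.get with a default of 0 instead of inserting the missing keys (a sketch, with the same inventory as above):

```python
inven = {
    'tor': {'j': 5, 'h': 0, 's': 0},
    'van': {'j': 10, 'h': 2, 's': 6},
    'mon': {'j': 3, 'h': 5, 's': 5},
    'cal': {'j': 1, 'h': 18, 's': 2},
    'hal': {'j': 28, 'h': 2, 's': 12},
}

def find_order(order):
    # order.get(s, 0) treats any sauce missing from the order as "0 requested",
    # so the order dict never needs to be patched with zero entries
    return [city for city, stock in inven.items()
            if all(stock[s] >= order.get(s, 0) for s in stock)]

print(find_order({'h': 7, 's': 1}))          # ['cal']
print(find_order({'j': 3, 'h': 2, 's': 4}))  # ['van', 'mon', 'hal']
```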

Is it necessary to re-train BERT models, specifically a RoBERTa model?

I am looking for sentiment analysis code with at least 80% accuracy. I tried VADER and found it easy and usable; however, it was only giving an accuracy of 64%.
Now I am looking at some BERT models, and I noticed they need to be re-trained? Is that correct? Aren't they pre-trained? Is re-training necessary?
You can use pre-trained models from HuggingFace. There are plenty to choose from; search for emotion or sentiment models.
Here is an example of a model with 26 emotions. The current implementation works but is very slow for large datasets.
import pandas as pd
from transformers import RobertaTokenizerFast, TFRobertaForSequenceClassification, pipeline

tokenizer = RobertaTokenizerFast.from_pretrained("arpanghoshal/EmoRoBERTa")
model = TFRobertaForSequenceClassification.from_pretrained("arpanghoshal/EmoRoBERTa")
emotion = pipeline('sentiment-analysis', model='arpanghoshal/EmoRoBERTa')

# example data
DATA_URI = "https://github.com/AFAgarap/ecommerce-reviews-analysis/raw/master/Womens%20Clothing%20E-Commerce%20Reviews.csv"
dataf = pd.read_csv(DATA_URI, usecols=["Review Text"])

# This is super slow, I will find a better optimization ASAP
dataf = (dataf
         .head(50)  # comment this out for the whole dataset
         .assign(Emotion=lambda d: (d["Review Text"]
                                    .fillna("")
                                    .map(lambda x: emotion(x)[0].get("label", None))))
         )
We could also refactor it a bit
...

# a bit faster than the previous but still slow
def emotion_func(text: str) -> str:
    if not text:
        return None
    return emotion(text)[0].get("label", None)

dataf = (dataf
         .head(50)  # comment this out for the whole dataset
         .assign(Emotion=lambda d: d["Review Text"].map(emotion_func))
         )
Results:
Review Text Emotion
0 Absolutely wonderful - silky and sexy and comf... admiration
1 Love this dress! it's sooo pretty. i happene... love
2 I had such high hopes for this dress and reall... fear
3 I love, love, love this jumpsuit. it's fun, fl... love
...
6 I aded this in my basket at hte last mintue to... admiration
7 I ordered this in carbon for store pick up, an... neutral
8 I love this dress. i usually get an xs but it ... love
9 I'm 5"5' and 125 lbs. i ordered the s petite t... love
...
16 Material and color is nice. the leg opening i... neutral
17 Took a chance on this blouse and so glad i did... admiration
...
26 I have been waiting for this sweater coat to s... excitement
27 The colors weren't what i expected either. the... disapproval
...
31 I never would have given these pants a second ... love
32 These pants are even better in person. the onl... disapproval
33 I ordered this 3 months ago, and it finally ca... disappointment
34 This is such a neat dress. the color is great ... admiration
35 Wouldn't have given them a second look but tri... love
36 This is a comfortable skirt that can span seas... approval
...
40 Pretty and unique. great with jeans or i have ... admiration
41 This is a beautiful top. it's unique and not s... admiration
42 This poncho is so cute i love the plaid check ... love
43 First, this is thermal ,so naturally i didn't ... love
You can use pickle.
Pickle lets you... well, pickle your model for later use. In fact, you can use a loop to keep training the model until it reaches a certain accuracy, then exit the loop and pickle the model for later use.
You can find many tutorials on YouTube on how to pickle a model.
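A minimal sketch of the pickle round-trip, with a plain dict standing in for a trained model object (for HuggingFace models specifically, save_pretrained/from_pretrained is the more usual route):

```python
import pickle

# a plain dict stands in for the trained model object
model = {"weights": [0.1, 0.2, 0.3], "accuracy": 0.81}

# save the "model" for later use
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# ...later, load it back without retraining
with open("model.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored == model)  # True
```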

ERROR while loading & reading custom 20newsgroups corpus with NLTK

I am trying to load the 20newsgroups corpus with the NLTK corpus reader, and then extract words from all documents and tag them. But I get an error when I try to build the list of extracted and tagged words.
Here is the CODE:
import nltk
import random
from nltk.tokenize import word_tokenize

newsgroups = nltk.corpus.reader.CategorizedPlaintextCorpusReader(
    r"C:\nltk_data\corpora\20newsgroups",
    r'(?!\.).*\.txt',
    cat_pattern=r'(not_sports|sports)/.*',
    encoding="utf8")

documents = [(list(newsgroups.words(fileid)), category)
             for category in newsgroups.categories()
             for fileid in newsgroups.fileids(category)]

random.shuffle(documents)
And the corresponding ERROR is:
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-10-de2a1a6859ea> in <module>()
1 documents = [(list(newsgroups.words(fileid)), category)
----> 2 for category in newsgroups.categories()
3 for fileid in newsgroups.fileids(category)]
4
5 random.shuffle(documents)
<ipython-input-10-de2a1a6859ea> in <listcomp>(.0)
1 documents = [(list(newsgroups.words(fileid)), category)
2 for category in newsgroups.categories()
----> 3 for fileid in newsgroups.fileids(category)]
4
5 random.shuffle(documents)
C:\ProgramData\Anaconda3\lib\site-packages\nltk\corpus\reader\util.py in __len__(self)
231 # iterate_from() sets self._len when it reaches the end
232 # of the file:
--> 233 for tok in self.iterate_from(self._toknum[-1]): pass
234 return self._len
235
C:\ProgramData\Anaconda3\lib\site-packages\nltk\corpus\reader\util.py in iterate_from(self, start_tok)
294 self._current_toknum = toknum
295 self._current_blocknum = block_index
--> 296 tokens = self.read_block(self._stream)
297 assert isinstance(tokens, (tuple, list, AbstractLazySequence)), (
298 'block reader %s() should return list or tuple.' %
C:\ProgramData\Anaconda3\lib\site-packages\nltk\corpus\reader\plaintext.py in _read_word_block(self, stream)
120 words = []
121 for i in range(20): # Read 20 lines at a time.
--> 122 words.extend(self._word_tokenizer.tokenize(stream.readline()))
123 return words
124
C:\ProgramData\Anaconda3\lib\site-packages\nltk\data.py in readline(self, size)
1166 while True:
1167 startpos = self.stream.tell() - len(self.bytebuffer)
-> 1168 new_chars = self._read(readsize)
1169
1170 # If we're at a '\r', then read one extra character, since
C:\ProgramData\Anaconda3\lib\site-packages\nltk\data.py in _read(self, size)
1398
1399 # Decode the bytes into unicode characters
-> 1400 chars, bytes_decoded = self._incr_decode(bytes)
1401
1402 # If we got bytes but couldn't decode any, then read further.
C:\ProgramData\Anaconda3\lib\site-packages\nltk\data.py in _incr_decode(self, bytes)
1429 while True:
1430 try:
-> 1431 return self.decode(bytes, 'strict')
1432 except UnicodeDecodeError as exc:
1433 # If the exception occurs at the end of the string,
C:\ProgramData\Anaconda3\lib\encodings\utf_8.py in decode(input, errors)
14
15 def decode(input, errors='strict'):
---> 16 return codecs.utf_8_decode(input, errors, True)
17
18 class IncrementalEncoder(codecs.IncrementalEncoder):
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 6: invalid start byte
I have tried changing the encoding in the corpus reader to ascii and utf16 as well; neither works. I am also not sure whether the regex I have provided is the right one. The filenames in the 20newsgroups corpus consist of two numbers separated by a hyphen (-), such as:
5-53286
102-53553
8642-104983
The second thing that I am worried about is whether the error is being generated from the document contents when they are being read for feature extraction.
Here is what documents in the 20newsgroups corpus look like:
From: bil#okcforum.osrhe.edu (Bill Conner) Subject: Re: free moral
agency
dean.kaflowitz (decay#cbnewsj.cb.att.com) wrote: : > : > I think
you're letting atheist mythology
: Great start. I realize immediately that you are not interested : in
discussion and are going to thump your babble at me. I would : much
prefer an answer from Ms Healy, who seems to have a : reasonable and
reasoned approach to things. Say, aren't you the : creationist guy
who made a lot of silly statements about : evolution some time ago?
: Duh, gee, then we must be talking Christian mythology now. I : was
hoping to discuss something with a reasonable, logical : person, but
all you seem to have for your side is a repetition : of the same
boring mythology I've seen a thousand times before. : I am deleting
the rest of your remarks, unless I spot something : that approaches an
answer, because they are merely a repetition : of some uninteresting
doctrine or other and contain no thought : at all.
: I have to congratulate you, though, Bill. You wouldn't : know a
logical argument if it bit you on the balls. Such : a persistent lack
of function in the face of repeated : attempts to assist you in
learning (which I have seen : in this forum and others in the past)
speaks of a talent : that goes well beyond my own, meager abilities.
I just don't : seem to have that capacity for ignoring outside
influences.
: Dean Kaflowitz
Dean,
Re-read your comments, do you think that merely characterizing an
argument is the same as refuting it? Do you think that ad hominum
attacks are sufficient to make any point other than you disapproval of
me? Do you have any contribution to make at all?
Bill
From: cmk#athena.mit.edu (Charles M Kozierok) Subject: Re: Jack Morris
In article <1993Apr19.024222.11181#newshub.ariel.yorku.ca> cs902043#ariel.yorku.ca (SHAWN LUDDINGTON) writes: } In article <1993Apr18.032345.5178#cs.cornell.edu> tedward#cs.cornell.edu (Edward [Ted] Fischer) writes: } >In article <1993Apr18.030412.1210#mnemosyne.cs.du.edu> gspira#nyx.cs.du.edu (Greg Spira) writes: } >>Howard_Wong#mindlink.bc.ca (Howard Wong) writes: }
>> } >>>Has Jack lost a bit of his edge? What is the worst start Jack Morris has had? } >> } >>Uh, Jack lost his edge about 5 years ago, and has had only one above } >>average year in the last 5. } > } >Again goes to prove that it is better to be good than lucky. You can }
>count on good tomorrow. Lucky seems to be prone to bad starts (and a } >bad finish last year :-). } > } >(Yes, I am enjoying every last run he gives up. Who was it who said } >Morris was a better signing than Viola?) } } Hey Valentine, I don't see Boston with any world series rings on their } fingers.
oooooo. cheap shot. :^)
} Damn, Morris now has three and probably the Hall of Fame in his } future.
who cares? he had two of them before he came to Toronto; and if the Jays had signed Viola instead of Morris, it would have been Frank who won 20 and got the ring. and he would be on his way to 20 this year, too.
} Therefore, I would have to say Toronto easily made the best } signing.
your logic is curious, and spurious.
there is no reason to believe that Viola wouldn't have won as many games had *he* signed with Toronto. when you compare their stupid W-L records, be sure to compare their team's offensive averages too.
now, looking at anything like the Morris-Viola sweepstakes a year later is basically hindsight. but there were plenty of reasons why it should have been apparent that Viola was the better pitcher, based on previous recent years and also based on age (Frank is almost 5 years younger! how many knew that?). people got caught up in the '91 World Series, and then on Morris' 21 wins last year. wins are the stupidest, most misleading statistic in baseball, far worse than RBI or R. that he won 21 just means that the Jays got him a lot of runs.
the only really valid retort to Valentine is: weren't the Red Sox trying to get Morris too? oh, sure, they *said* Viola was their first choice afterwards, but what should we have expected they would say?
} And don't tell me Boston will win this year. They won't } even be in the top 4 in the division, more like 6th.
if this is true, it won't be for lack of contribution by Viola, so who cares?
-*- charles
Please suggest whether the error arises while loading the documents or while reading the files and extracting words. What do I need to do to load the corpus correctly?
NLTK has known corpus-loading issues with this dataset.
You can load the useful category data with scikit-learn instead:
from sklearn.datasets import fetch_20newsgroups
cats = ['alt.atheism', 'sci.space']
newsgroups_train = fetch_20newsgroups(subset='train', categories=cats)
where newsgroups_train.target_names gives you the categories.
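Separately, the traceback itself offers a hint: byte 0xa0 is an invalid start byte in UTF-8 but a plain non-breaking space in Latin-1, so passing encoding='latin1' to the corpus reader may also get past the error. A minimal reproduction of the decode difference (the byte string is a made-up stand-in for the corpus data):

```python
raw = b"free moral\xa0agency"  # 0xa0: invalid as a UTF-8 start byte

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 failed:", exc.reason)  # invalid start byte

print(raw.decode("latin-1"))  # decodes; \xa0 becomes a non-breaking space
```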

How to convert a list of mixed data type into a dataframe in Python

I have a list of mixed data type looking like this:
list = [['3D prototypes',
'Can print large objects',
'Autodesk Maya/Mudbox',
'3D Studio'],
['We can produce ultra high resolution 3D prints in multiple materials.',
'The quality of our prints beats MakerBot, Form 1, or any other either
powder based or printers using PLA, ABS, Wax or Resin. This printer has
the highest resolution and a very large build size. It prints fully
functional moving parts like a chain or an engine right out of the
printer.',
'The printer is loaded with DurusWhite.',
'Inquire to change the material. There is a $30 surcharge for material
switch.',
"Also please mention your creation's dimensions in mm and if you need
expedite delivery.",
"Printer's Net build size:",
'294 x 192 x 148.6 mm (11.57 x 7.55 x 5.85 in.)',
'The Objet30 features four Rigid Opaque materials and one material that
mimics polypropylene. The Vero family of materials all feature dimensional
stability and high-detail visualization, and are designed to simulate
plastics that closely resemble the end product.',
'PolyJet based printers have a different way of working. These
technologies deliver the highest quality and precision unmatched by the
competition. These type of printers are ideal for professionals, for uses
ranging from casting jewelry to device prototyping.',
'Rigid opaque white (VeroWhitePlus)',
'Rigid opaque black (VeroBlackPlus )',
'Rigid opaque blue (VeroBlue)',
'Rigid opaque gray (VeroGray)',
'Polypropylene-like material (DurusWhite) for snap fit applications'],
'Hub can print invoices',
'postal service',
'Mar 2015',
'Within the hour i',
[u'40.7134', u'-74.0069'],
'4',
['Customer JAMES reviewed Sun, 2015-04-19 05:17: Awesome print!
Good quality, relatively fast shipping, and very responsive to my
questions; would certainly recommend this hub. ',
'Hub XSENIO replied 2 days 16 hours ago: Thanks James! ',
'Customer Sara reviewed Sun, 2015-04-19 00:10: Thank you for going
out of your way to get this to us in time for our shoot. ',
'Hub XSENIO replied 2 days 16 hours ago: Thanks ! ',
'Customer Aaron reviewed Sat, 2015-04-18 02:36: Great service ',
'Hub XSENIO replied 2 days 16 hours ago: Thanks! ',
"Customer Arnoldas reviewed Mon, 2015-03-23 19:47: Xsenio's Hub was
able to produce an excellent quality print , was quick and reliable.
Awesome printing experience! "]]
Its items have mixed data types, looking like this:
<type 'list'>
<type 'list'>
<type 'str'>
<type 'str'>
<type 'str'>
<type 'str'>
<type 'list'>
<type 'str'>
<type 'list'>
But when I use
pd.DataFrame(list)
It shows:
TypeError: Expected list, got str
Can anyone tell me what's wrong with that? Do I have to convert all the items in the list from string to list?
Thanks
It seems you should convert your list into a numpy array or a dict:
from pandas import DataFrame
import numpy
lst = numpy.array([['3D prototypes',
'Can print large objects',
'Autodesk Maya/Mudbox',
'3D Studio'],
['We can produce ultra high resolution 3D prints in multiple materials.',
'''The quality of our prints beats MakerBot, Form 1, or any other either
powder based or printers using PLA, ABS, Wax or Resin. This printer has
the highest resolution and a very large build size. It prints fully
functional moving parts like a chain or an engine right out of the
printer.''',
'The printer is loaded with DurusWhite.',
'''Inquire to change the material. There is a $30 surcharge for material
switch.''',
'''Also please mention your creation's dimensions in mm and if you need
expedite delivery.''',
"Printer's Net build size:",
'294 x 192 x 148.6 mm (11.57 x 7.55 x 5.85 in.)',
'''The Objet30 features four Rigid Opaque materials and one material that
mimics polypropylene. The Vero family of materials all feature dimensional
stability and high-detail visualization, and are designed to simulate
plastics that closely resemble the end product.''',
'''PolyJet based printers have a different way of working. These
technologies deliver the highest quality and precision unmatched by the
competition. These type of printers are ideal for professionals, for uses
ranging from casting jewelry to device prototyping.''',
'Rigid opaque white (VeroWhitePlus)',
'Rigid opaque black (VeroBlackPlus )',
'Rigid opaque blue (VeroBlue)',
'Rigid opaque gray (VeroGray)',
'Polypropylene-like material (DurusWhite) for snap fit applications'],
'Hub can print invoices',
'postal service',
'Mar 2015',
'Within the hour i',
[u'40.7134', u'-74.0069'],
'4',
['''Customer JAMES reviewed Sun, 2015-04-19 05:17: Awesome print!
Good quality, relatively fast shipping, and very responsive to my
questions; would certainly recommend this hub. ''',
'Hub XSENIO replied 2 days 16 hours ago: Thanks James! ',
'''Customer Sara reviewed Sun, 2015-04-19 00:10: Thank you for going
out of your way to get this to us in time for our shoot. ''',
'Hub XSENIO replied 2 days 16 hours ago: Thanks ! ',
'Customer Aaron reviewed Sat, 2015-04-18 02:36: Great service ',
'Hub XSENIO replied 2 days 16 hours ago: Thanks! ',
'''Customer Arnoldas reviewed Mon, 2015-03-23 19:47: Xsenio's Hub was
able to produce an excellent quality print , was quick and reliable.
Awesome printing experience! ''']])
df = DataFrame(lst)
print df
The above prints
0
0 [3D prototypes, Can print large objects, Autod...
1 [We can produce ultra high resolution 3D print...
2 Hub can print invoices
3 postal service
4 Mar 2015
5 Within the hour i
6 [40.7134, -74.0069]
7 4
8 [Customer JAMES reviewed Sun, 2015-04-19 05:17...
[9 rows x 1 columns]
The doc does state the data parameter should be a numpy array or dict: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.html
PS: I also took the liberty of enclosing the multiline strings in triple quotes
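As a quick illustration of the dict route with a shortened version of the data ('info' is just an illustrative column name):

```python
import pandas as pd

data = [['3D prototypes', 'Can print large objects'],
        'Hub can print invoices',
        'postal service',
        '4']

# wrapping the list in a dict yields one named column, one row per item,
# regardless of whether the item is a str or a nested list
df = pd.DataFrame({'info': data})
print(df.shape)  # (4, 1)
```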
