List items to pandas columns


I have a list with 4 URLs:
['https://cache.wihaben.at/mmo/6/297/469/806_-1094197631.jpg', 'https://cache.wihaben.at/mmo/6/297/469/806_-455156804.jpg', 'https://cache.wihaben.at/mmo/6/297/469/806_466214286.jpg', 'https://cache.wihaben.at/mmo/6/297/469/806_1475201828.jpg']
and I want to build a pandas DataFrame that has Image_1, Image_2, Image_3 and Image_4 as column names and the URLs as row values.
My code:
advert_images = {('Image_1', eval(advert_image_list[0])),
('Image_2', eval(advert_image_list[1])),
('Image_3', eval(advert_image_list[2])),
('Image_4', eval(advert_image_list[3])),
}
adIm_DF = pd.DataFrame(advert_images)
is returning error:
File "", line 1
https://cache.wihaben.at/mmo/6/297/469/806_-1094197631.jpg
^ SyntaxError: invalid syntax
Evaluation gets stuck on the ":" in the URL, probably because it is being parsed as a dict.
I also need a way to iterate over any number of URLs in the list and build the corresponding columns with values.
The columns would be Image_(iterator_value), and the row would hold the URL values.

If the URLs are stored as strings (as @Tox pointed out), this code works without problems:
url_list = ['https://cache.wihaben.at/mmo/6/297/469/806_-1094197631.jpg', 'https://cache.wihaben.at/mmo/6/297/469/806_-455156804.jpg', 'https://cache.wihaben.at/mmo/6/297/469/806_466214286.jpg', 'https://cache.wihaben.at/mmo/6/297/469/806_1475201828.jpg']
im_labels = ['Image_{}'.format(x) for x in range(1, len(url_list) + 1)]  # one label per URL: Image_1 ... Image_n
im_df = pd.DataFrame([url_list], columns=im_labels)

You should make a string of the URL:
str(advert_image_list[0])

I think you are confusing the use of eval. It is used to run code that is stored in a string. In your example, Python tries to run the URL as code, which will obviously not work. You do not need eval.
Try this:
advert_image_list = ['https://cache.willhaben.at/mmo/6/297/469/806_-1094197631.jpg', 'https://cache.willhaben.at/mmo/6/297/469/806_-455156804.jpg', 'https://cache.willhaben.at/mmo/6/297/469/806_466214286.jpg', 'https://cache.willhaben.at/mmo/6/297/469/806_1475201828.jpg']
advert_images = [('Image_1', advert_image_list[0]),
('Image_2', advert_image_list[1]),
('Image_3', advert_image_list[2]),
('Image_4', advert_image_list[3])]
adIm_DF = pd.DataFrame(advert_images).set_index(0).T
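To illustrate the point about eval, here is a minimal sketch of what it does with a string that contains Python code versus a string that contains a URL. The helper name try_eval is hypothetical and exists only for this demonstration.

```python
def try_eval(text):
    """Evaluate text as a Python expression; report a syntax error instead of raising."""
    try:
        return eval(text)
    except SyntaxError as exc:
        return "SyntaxError: " + str(exc.msg)


# A string holding a valid Python expression is executed and its value returned.
arithmetic = try_eval("1 + 2")

# A URL is not valid Python, so eval fails while parsing it.
url_result = try_eval("https://cache.willhaben.at/mmo/6/297/469/806_-1094197631.jpg")

print(arithmetic)
print(url_result)
```

The URL is already a string in the list, so it can go into the DataFrame as it is.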

This works for me:
df = pd.DataFrame(columns=['Image1','Image2','Image3','Image4'])
df.loc[0] = ['https://cache.wihaben.at/mmo/6/297/469/806_-1094197631.jpg', 'https://cache.wihaben.at/mmo/6/297/469/806_-455156804.jpg', 'https://cache.wihaben.at/mmo/6/297/469/806_466214286.jpg', 'https://cache.wihaben.at/mmo/6/297/469/806_1475201828.jpg']
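The question also asks for a version that copes with any number of URLs. A minimal sketch of that, assuming the list holds plain strings; the helper name urls_to_row and its prefix parameter are made up for illustration:

```python
import pandas as pd


def urls_to_row(urls, prefix="Image_"):
    """Build a one-row DataFrame with one column per URL (hypothetical helper)."""
    # Labels start at 1, so n URLs give Image_1 ... Image_n.
    columns = [f"{prefix}{i}" for i in range(1, len(urls) + 1)]
    # Wrapping the list in another list makes it a single row.
    return pd.DataFrame([urls], columns=columns)


advert_image_list = [
    "https://cache.wihaben.at/mmo/6/297/469/806_-1094197631.jpg",
    "https://cache.wihaben.at/mmo/6/297/469/806_-455156804.jpg",
    "https://cache.wihaben.at/mmo/6/297/469/806_466214286.jpg",
    "https://cache.wihaben.at/mmo/6/297/469/806_1475201828.jpg",
]
adIm_DF = urls_to_row(advert_image_list)
print(adIm_DF.columns.tolist())
```

Because the column labels are derived from len(urls), the same call works unchanged for a list of any length.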

Related

How can I take the string of names and preferences & add them to the dictionary with names as the key?

My code as it is right now looks like this:
def read_in_movie_preference():
    """Read the move data, and return a
    preference dictionary."""
    preference = {}
    movies = []
    # write code here:
    file_location="./data/"
    f = open(file_location+"preference.csv","r")
    df = f.readlines()
    #names as keys and prefrences
    for line in df:
        name = line[1].strip("\n").split(",")
        prefs = line[2:].strip("\n").split(",")
        preference[line[1]] = line[2:]
    #print(test)
    #movie names`
    movietitles = df[0].strip("\n").split(",")
    for movie in movietitles:
        movie=movie.rstrip()
    #can't seem to get rid of the spaces at the end
    movies+=movietitles[2:]
    print(movies)
    return [movies, preference]
I can't seem to get the movie titles into the list without spaces at the end of some of them, and I also can't add the names and preferences to the dictionary. I am supposed to do this task with basic Python and no pandas. I'm very stuck and would appreciate any help!
The dictionary would have names as keys and the preferences as numbers instead of strings, so it would theoretically look like this:
key: pref:
dennis, 0 1 0 1 0 etc.
[Image: what the data set looks like]
here is the data pasted:

The issue here is that you call rstrip on a copy of each title but never store the result back in the original list.
The issue:
for movie in movietitles:
    movie = movie.rstrip()  # Rebinds the loop variable only; movietitles is unchanged
    # We still need to write the result back into movietitles
There are a couple of ways to fix this!
# Using indexing
for i in range(len(movietitles)):
    movietitles[i] = movietitles[i].rstrip()
Or we can do this inline with a list comprehension:
# Using list comprehension
movietitles = [movie.rstrip() for movie in movietitles]
As stated in the other answer, when working with CSV data it's recommended to use a CSV parser, but that is completely unnecessary at this scale! Hope this helps.
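The dictionary part of the question can be handled in the same spirit. The sketch below uses only the standard library and rests on assumptions, because the data itself is not shown: column 0 is taken to be an unused field such as a timestamp, column 1 the name, and the remaining columns one numeric preference per movie. The sample text and the function name read_preferences are invented for illustration.

```python
import csv
import io

# Hypothetical stand-in for preference.csv; the real column layout is an assumption.
SAMPLE = (
    "Timestamp,Name,The Shawshank Redemption ,Pulp Fiction\n"
    "2020-01-01,dennis,0,1\n"
    "2020-01-02,anna,1,1\n"
)


def read_preferences(handle):
    """Return (movies, preference) from an open CSV file-like object."""
    rows = list(csv.reader(handle))
    header, data = rows[0], rows[1:]
    # Strip stray whitespace from every title, skipping the two leading columns.
    movies = [title.strip() for title in header[2:]]
    # Map each name to its preferences converted from strings to integers.
    preference = {
        row[1].strip(): [int(value) for value in row[2:]]
        for row in data
        if row  # skip blank lines
    }
    return movies, preference


movies, preference = read_preferences(io.StringIO(SAMPLE))
print(movies)
print(preference)
```

With a real file, open("./data/preference.csv", newline="") could be passed in place of the StringIO object.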

Pandas - Can't change datatype of dataframe columns

Downloading some data from here:
http://insideairbnb.com/get-the-data.html
Then
listings = pd.read_csv('listings.csv')
Trying to change types
listings.bathrooms = listings.bathrooms.astype('int64',errors='ignore')
listings.bedrooms = listings.bedrooms.astype('int64',errors='ignore')
listings.beds = listings.beds.astype('int64',errors='ignore')
listings.price = listings.price.replace('[\$,]','',regex=True).astype('float')
listings.price = listings.price.astype('int64',errors='ignore')
I tried some other combinations, but in the end I either get an error or the datatype simply doesn't change.
EDIT: corrected some typos

The apostrophes in the last line are not in the correct place, and the last one is not the correct type: you need ' instead of ` (maybe it was accidentally added because of the code block).
So for me it works like this:
listings.price.astype('int64', errors='ignore')
But if you would like to reassign it to the original variable then you need the same structure as you used in the previous lines:
listings.price = listings.price.astype('int64', errors='ignore')
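One possible reason the dtype appears not to change: with errors='ignore', astype returns the original column untouched when the conversion fails, and a plain int64 column cannot hold missing values. The sketch below shows one way around that with pd.to_numeric and the nullable Int64 dtype. The small DataFrame is a made-up stand-in for listings.csv, and whether the real file contains missing values is an assumption.

```python
import pandas as pd

# Hypothetical stand-in for listings.csv, with one missing bedrooms value.
listings = pd.DataFrame(
    {
        "bedrooms": [1.0, None, 3.0],
        "price": ["$1,200.00", "$85.00", "$40.00"],
    }
)

# The nullable "Int64" dtype (capital I) stores the missing value as <NA>.
listings["bedrooms"] = pd.to_numeric(listings["bedrooms"], errors="coerce").astype("Int64")

# Remove "$" and "," first, then convert via float to a plain integer.
listings["price"] = (
    listings["price"]
    .str.replace(r"[\$,]", "", regex=True)
    .astype(float)
    .astype("int64")
)

print(listings.dtypes)
```

Columns with fractional values, such as 1.5 bathrooms, would need rounding before an integer conversion.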

Update values in new column

I want to run a package (RAKE) to extract keyphrases from comments (df['CUSTOMER_RECOMMENDATIONS_TRANS']) and create a new column (df['keyphrase_RAKE']) to store them alongside each comment. I'm getting an error saying "ValueError: Length of values does not match the length of index".
I know the reason behind the error but don't know how to fix it. What can be done?
rake_object.run returns a list of keyphrases, which is stored in keywords.
This is the code:
import RAKE
import operator
# Reka setup with stopword directory
stop_dir = "SmartStoplist.txt"
rake_object = RAKE.Rake(stop_dir)
# Sample text to test RAKE
df = pd.read_excel('my.xlsx')
for i in df['CUSTOMER_RECOMMENDATIONS_TRANS']:
keywords = rake_object.run(i)
df['keyphrase_RAKE'] = keywords

You can use pandas.Series.apply on the column and avoid the for loop:
df['keyphrase_RAKE'] = df['CUSTOMER_RECOMMENDATIONS_TRANS'].apply(rake_object.run)
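Here is a small self-contained sketch of the apply approach. Since RAKE and the stoplist file are not available here, extract_keyphrases is a hypothetical stand-in for rake_object.run; the only property that matters is that it returns one list per comment.

```python
import pandas as pd


def extract_keyphrases(text):
    """Hypothetical stand-in for rake_object.run: returns a list for one comment."""
    return [word for word in text.lower().split() if len(word) > 4]


df = pd.DataFrame(
    {"CUSTOMER_RECOMMENDATIONS_TRANS": ["Faster delivery please", "Great service"]}
)

# apply calls the function once per row, so the result has exactly one entry
# (a list of keyphrases) per comment and its length matches the index.
df["keyphrase_RAKE"] = df["CUSTOMER_RECOMMENDATIONS_TRANS"].apply(extract_keyphrases)

print(df)
```

The original loop assigned the keyphrase list of a single comment to the whole column, which is why its length did not match the index.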

How do I present my output as a Pandas dataframe?

CHECK_OUTPUT_HERE
Currently, the output I am getting is in the string format. I am not sure how to convert that string to a pandas dataframe.
I am getting 3 different tables in my output. It is in a string format.
One of the following 2 solutions will work for me:
Convert that string output to 3 different dataframes. OR
Change something in the function so that I get the output as 3 different data frames.
I have tried using RegEx to convert the string output to a dataframe but it won't work in my case since I want my output to be dynamic. It should work if I give another input.
def column_ch(self, sample_count=10):
    report = render("header.txt")
    match_stats = []
    match_sample = []
    any_mismatch = False
    for column in self.column_stats:
        if not column["all_match"]:
            any_mismatch = True
            match_stats.append(
                {
                    "Column": column["column"],
                    "{} dtype".format(self.df1_name): column["dtype1"],
                    "{} dtype".format(self.df2_name): column["dtype2"],
                    "# Unequal": column["unequal_cnt"],
                    "Max Diff": column["max_diff"],
                    "# Null Diff": column["null_diff"],
                }
            )
            if column["unequal_cnt"] > 0:
                match_sample.append(
                    self.sample_mismatch(column["column"], sample_count, for_display=True)
                )
    if any_mismatch:
        for sample in match_sample:
            report += sample.to_string()
            report += "\n\n"
    print("type is", type(report))
    return report

Since you have a string, you can pass it into a file-like buffer and then read it into a dataframe with pandas read_csv.
Assuming that the string holding the dataframe is called dfstring, the code would look like this:
import io
bufdf = io.StringIO(dfstring)
df = pd.read_csv(bufdf, sep=???)
If your string contains multiple dataframes, split it with split and use a loop:
import io
dflist = []
for sdf in dfstring.split('\n\n'):  # this seems to be the separator between two dataframes
    bufdf = io.StringIO(sdf)
    dflist.append(pd.read_csv(bufdf, sep=???))
Be careful to pass an appropriate sep parameter; the ??? is a placeholder because I cannot tell from your output what the right value is. Your fields are separated by spaces, so you could use sep='\s+', but I see that you also have spaces that are not meant to be separators, so this may cause a parsing error.
sep accepts a regex, so to use 2 or more consecutive spaces as the separator you could pass sep='\s\s+' (this requires the additional parameter engine='python'). But again, make sure that there are at least 2 spaces between two consecutive fields.
See the documentation of the io module and StringIO for reference.
Note that in Python 2 the usual choice was the separate StringIO module rather than io.StringIO, but since the latest pandas versions require Python 3, I guess you are using Python 3.
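A runnable sketch of the split-and-parse idea follows. The report text is invented for illustration: it assumes every table is followed by a blank line, that fields are separated by at least two spaces, and that the header text from the question has already been removed. The real output may differ.

```python
import io

import pandas as pd

# Hypothetical tables; note the single spaces inside "New York" and "Los Angeles".
tables = [
    "id  amount_df1  amount_df2\n"
    "1   10.5        11.0\n"
    "2   20.0        22.5",
    "id  city_df1     city_df2\n"
    "1   New York     Newark\n"
    "2   Los Angeles  Boston",
]
# Mimic the question's loop, which appends "\n\n" after every table.
report = "".join(table + "\n\n" for table in tables)

frames = [
    # Two or more whitespace characters act as the separator; a regex separator
    # needs the python engine.
    pd.read_csv(io.StringIO(chunk.strip()), sep=r"\s\s+", engine="python")
    for chunk in report.split("\n\n")
    if chunk.strip()  # the trailing "\n\n" leaves an empty chunk, which is skipped
]

for frame in frames:
    print(frame)
```

If the function itself can be changed, returning the match_sample list instead of the concatenated string would avoid the parsing step altogether.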

Splitting json data in python

I'm trying to manipulate a list of items in Python, but I'm getting the error "AttributeError: 'list' object has no attribute 'split'".
I understand that a list has no .split method, but I don't know what else to do. Below is a copy-paste of the relevant part of my code.
tourl = 'http://data.bitcoinity.org/chart_data'
tovalues = {'timespan':'24h','resolution':'hour','currency':'USD','exchange':'all','mining_pool':'all','compare':'no','data_type':'price_volume','chart_type':'line_bar','smoothing':'linear','chart_types':'ccacdfcdaa'}
todata = urllib.urlencode(tovalues)
toreq = urllib2.Request(tourl, todata)
tores = urllib2.urlopen(toreq)
tores2 = tores.read()
tos = json.loads(tores2)
tola = tos["data"]
for item in tola:
    ting = item.get("values")
    ting.split(',')[2]  # <----- ERROR
    print(ting)
To understand what I'm trying to do, you will also need to see the JSON data. ting outputs this:
[
[1379955600000L, 123.107310846774], [1379959200000L, 124.092526428571],
[1379962800000L, 125.539504822835], [1379966400000L, 126.27024617931],
[1379970000000L, 126.723474983766], [1379973600000L, 126.242406356837],
[1379977200000L, 124.788410570987], [1379980800000L, 126.810084904632],
[1379984400000L, 128.270580796748], [1379988000000L, 127.892411269036],
[1379991600000L, 126.140579640523], [1379995200000L, 126.513705084746],
[1379998800000L, 128.695124951923], [1380002400000L, 128.709738051044],
[1380006000000L, 125.987767097378], [1380009600000L, 124.323433535528],
[1380013200000L, 123.359378559603], [1380016800000L, 125.963250678733],
[1380020400000L, 125.074618194444], [1380024000000L, 124.656345088853],
[1380027600000L, 122.411303435449], [1380031200000L, 124.145747100372],
[1380034800000L, 124.359452274881], [1380038400000L, 122.815357211394],
[1380042000000L, 123.057706915888]
]
[
[1379955600000L, 536.4739135], [1379959200000L, 1235.42506637],
[1379962800000L, 763.16329656], [1379966400000L, 804.04579319],
[1379970000000L, 634.84689741], [1379973600000L, 753.52716718],
[1379977200000L, 506.90632968], [1379980800000L, 494.473732950001],
[1379984400000L, 437.02095093], [1379988000000L, 176.25405034],
[1379991600000L, 319.80432715], [1379995200000L, 206.87212398],
[1379998800000L, 638.47226435], [1380002400000L, 438.18036666],
[1380006000000L, 512.68490443], [1380009600000L, 904.603705539997],
[1380013200000L, 491.408088450001], [1380016800000L, 670.275397960001],
[1380020400000L, 767.166941339999], [1380024000000L, 899.976089609997],
[1380027600000L, 1243.64963909], [1380031200000L, 1508.82429811],
[1380034800000L, 1190.18854705], [1380038400000L, 546.504592349999],
[1380042000000L, 206.84883264]
]
And ting[0] outputs this:
[1379955600000L, 123.187067936508]
[1379955600000L, 536.794013499999]
What I'm really trying to do is add up the values from ting[0-24] that come AFTER the second comma. This made me try a split, but that does not work.

You already have a list; the commas are put there by Python to delimit the values only when printing the list.
Just access element 2 directly:
print ting[2]
This prints:
[1379962800000, 125.539504822835]
Each of the entries in item['values'] (so ting) is a list of two numbers, a timestamp and a value, so you can address each of those with index 0 and 1:
>>> print ting[2][0]
1379962800000
>>> print ting[2][1]
125.539504822835
To get a list of all the second values, you could use a list comprehension:
second_vals = [t[1] for t in ting]

When you load the data with json.loads, it is already parsed into a real list that you can slice and index as normal. If you want the data starting with the third element, just use ting[2:]. (If you just want the third element by itself, just use ting[2].)
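The question's actual goal, adding up the second value of every pair, then needs no string handling at all. A minimal Python 3 sketch, using a shortened made-up payload in the same shape as the data shown in the question:

```python
import json

# Hypothetical, shortened payload with the same structure as the real response.
raw = """
{"data": [
    {"values": [[1379955600000, 123.5], [1379959200000, 124.25]]},
    {"values": [[1379955600000, 536.5], [1379959200000, 1235.25]]}
]}
"""

payload = json.loads(raw)

# For each series, sum the second element of every [timestamp, value] pair.
totals = [sum(pair[1] for pair in item["values"]) for item in payload["data"]]

print(totals)
```

Each entry in totals corresponds to one of the two lists that ting prints in the question.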
