string indices must be integers pandas dataframe - python

I am pretty new to data science. I am trying to work with DataFrame data stored inside a list. I have read almost every post about "string indices must be integers", but none of them helped.
My DataFrame looks like this:
And my list looks like this:
myList -> [0098b710-3259-4794-9075-3c83fc1ba058 1.561642e+09 32.775882 39.897459],
[0098b710-3259-4794-9075-3c83fc1ba057 1.561642e+09 32.775882 39.897459],
and goes on...
This is the Data in case you need to reproduce something guys.
I need to access the list items (DataFrames) one by one, then split each DataFrame wherever the difference between two consecutive timestamps is greater than 60000.
I wrote the code below, but it raises an error whenever I try to access the timestamp. Can you help me with the problem?
mycode:
a = []
for i in range(0, len(data_one_user)):
    x = data_one_user[i]
    x['label'] = (x['timestamp'] - x['timestamp'].shift(1))
    x['trip'] = np.where(x['label'] > 60000, True, False)
    x = x.drop('label', axis=1)
    x['trip'] = np.where(x['trip'] == True, a.append(x), a.extend(x))
    #a = a.drop('trip', axis=1)
x = a
Edit: In case you wonder about the object types:
data_one_user -> list
data_one_user[0] = x -> pandas.core.frame.DataFrame
data_one_user[0]['timestamp'] = x['timestamp'] -> pandas.core.series.Series
Edit 2: I added the error printout.
Edit 3: Output of x

I found the problem that causes the error: at the end of the list, the labels are repeated.
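For the splitting step itself, here is a minimal sketch (assuming each element of data_one_user is a DataFrame with a numeric timestamp column, as described above) that labels contiguous blocks with a cumulative sum of the gap flags and then splits on that label:

```python
import pandas as pd

# Hypothetical single-user DataFrame with a numeric `timestamp` column
df = pd.DataFrame({"timestamp": [0, 1000, 2000, 70000, 71000, 200000]})

# Flag rows where the gap to the previous row exceeds 60000, then
# cumulative-sum the flags so each contiguous block gets its own id
trip_id = (df["timestamp"].diff() > 60000).cumsum()

# Split the DataFrame into one piece per trip
trips = [g for _, g in df.groupby(trip_id)]
print(len(trips))  # 3
```

The same pattern can be applied inside the loop over data_one_user, collecting the pieces into one list.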


Print Pandas without dtype

I've read a few other posts about this but the other solutions haven't worked for me. I'm trying to look at 2 different CSV files and compare data from 1 column from each file. Here's what I have so far:
import pandas as pd
import numpy as np
dataBI = pd.read_csv("U:/eu_inventory/EO BI Orders.csv")
dataOrderTrimmed = dataBI.iloc[:,1:2].values
dataVA05 = pd.read_csv("U:\eu_inventory\VA05_Export.csv")
dataVAOrder = dataVA05.iloc[:,1:2].values
dataVAList = []
ordersInBoth = []
ordersInBI = []
ordersInVA = []
for order in np.nditer(dataOrderTrimmed):
    if order in dataVAOrder:
        ordersInBoth.append(order)
    else:
        ordersInBI.append(order)
So if the order number from dataOrderTrimmed is also in dataVAOrder, I want to add it to ordersInBoth; otherwise I want to add it to ordersInBI. I think it splits the information correctly, but if I try to print ordersInBoth, each item prints as array(5555555, dtype=int64). I want a list of the order numbers, not arrays, and without the dtype information. Let me know if you need more information or if the way I've typed it out is confusing. Thanks!
The way you're using .iloc gives you a DataFrame, which becomes a 2D array when you access values. If you just want the values in the column at index 1, you should instead write:
dataOrderTrimmed = dataBI.iloc[:, 1].values
Then you can iterate over dataOrderTrimmed directly (i.e. you don't need nditer), and you will get regular scalar values.
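As a side note, the whole membership loop can be replaced with a vectorized test via Series.isin. A sketch with made-up stand-ins for the two CSV columns (the column name "order" is an assumption):

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV columns being compared
dataBI = pd.DataFrame({"order": [1, 2, 3, 4]})
dataVA05 = pd.DataFrame({"order": [2, 4, 5]})

# Boolean mask: True where the order also appears in the other file
in_both_mask = dataBI["order"].isin(dataVA05["order"])
ordersInBoth = dataBI.loc[in_both_mask, "order"].tolist()
ordersInBI = dataBI.loc[~in_both_mask, "order"].tolist()
print(ordersInBoth)  # [2, 4]
print(ordersInBI)    # [1, 3]
```

This also sidesteps the dtype issue, since .tolist() yields plain Python scalars.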

Input values to a List

When I try to do the following, this error occurs:
ranges = []
a_values = []
b_values = []
for x in params:
    a = min(fifa[params][x])
    a = a - (a*.25)
    b = max(fifa[params][x])
    b = b + (b*.25)
    ranges.append((a,b))
for x in range(len(fifa['short_name'])):
    if fifa['short_name'][x]=='Nunez':
        a_values = df.iloc[x].values.tolist()
Error Description
What does it mean? How do I solve this?
Thank you in advance
The problem is on this line:
if fifa['short_name'][x]=='Nunez':
fifa['short_name'] is a Series;
fifa['short_name'][x] tries to index that series with x;
your code doesn't show it, but the stack trace suggests x is some scalar type;
pandas tries to look up x in the index of fifa['short_name'], and it's not there, resulting in the error.
Since the Series shares the index of the dataframe fifa, this means that the index x isn't in the dataframe. And it probably isn't, because you let x range from 0 up to (but not including) len(fifa).
What is the index of your dataframe? You didn't include the definition of params, nor that of fifa, but your problem is most likely in the latter; either that, or you should loop over the dataframe differently, by iterating over its index instead of over integers.
However, there are generally more efficient ways in pandas to do what you're trying to do - you should include a definition of the dataframe so people can show you the correct one.
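For instance, a sketch with a made-up fifa frame that has a non-default index (the names and index values here are assumptions), showing both the index-based loop and the loop-free boolean mask:

```python
import pandas as pd

# Hypothetical frame with a non-default index, as suspected above
fifa = pd.DataFrame({"short_name": ["Messi", "Nunez", "Salah"]},
                    index=[10, 20, 30])

# Safe loop: iterate over the actual index labels, not range(len(...))
matches = [idx for idx in fifa.index if fifa["short_name"][idx] == "Nunez"]

# Or avoid the loop entirely with a boolean mask
a_values = fifa.loc[fifa["short_name"] == "Nunez"].iloc[0].values.tolist()
print(matches)   # [20]
print(a_values)  # ['Nunez']
```

With range(len(fifa)), the lookup fifa["short_name"][0] would raise a KeyError here, since 0 is not in the index.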

how to create multiple variables with similar name in for loop?

I had a problem with for loops earlier, and it was solved thanks to @mak4515; however, there is something else I want to accomplish.
# Use pandas to read in csv file
data_df_0 = pd.read_csv('puget_sound_ctd.csv')
#create data subsets based on specific buoy coordinates
data_df_1 = pd.read_csv('puget_sound_ctd.csv', skiprows=range(9,114))
data_df_2 = pd.read_csv('puget_sound_ctd.csv', skiprows=([i for i in range(1, 8)] + [j for j in range(21, 114)]))
for x in range(0,2):
    for df in [data_df_0, data_df_2]:
        lon_(x) = df['longitude']
        lat_(x) = df['latitude']
This is my current code. I want it to read the different data sets and create differently named variables based on the data set it is reading. However, when I run the code this way I get this error:
File "<ipython-input-66-446aebc48604>", line 21
lon_(x) = df['longitude']
^
SyntaxError: can't assign to function call
What does "can't assign to function call" mean, and how do I fix this?
I think the comment by @Chris is probably a good way to go. I wanted to point out that since you're already using pandas dataframes, an easier way might be to add a column identifying the original dataframe and then concatenate them.
import pandas as pd
data_df_0 = pd.DataFrame({'longitude':range(-125,-120,1),'latitude':range(45,50,1)})
data_df_0['dfi'] = 0
data_df_2 = pd.DataFrame({'longitude':range(-120,-125,-1),'latitude':range(50,45,-1)})
data_df_2['dfi'] = 2
df = pd.concat([data_df_0,data_df_2])
Then you can access data from the original frames like this:
df.loc[2]
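If separate per-dataset values are still wanted, a dictionary keyed by dataset number avoids the dynamic-variable-name problem entirely. A sketch with made-up frames (the coordinate values are placeholders):

```python
import pandas as pd

# Hypothetical stand-ins for data_df_0 and data_df_2
frames = {
    0: pd.DataFrame({"longitude": [-125, -124], "latitude": [45, 46]}),
    2: pd.DataFrame({"longitude": [-120, -121], "latitude": [50, 49]}),
}

coords = {}
for i, frame in frames.items():
    # coords["lon_0"], coords["lat_2"], ... instead of lon_0 = ... variables
    coords[f"lon_{i}"] = frame["longitude"]
    coords[f"lat_{i}"] = frame["latitude"]

print(coords["lon_0"].tolist())  # [-125, -124]
```

This gives the same "lon_0 / lat_2" naming the question asks for, but as dictionary keys, which Python can create at runtime.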

Remove specific elements from a numpy array

I have an np.array I would like to remove specific elements from, based on the element "name" rather than the index. Is this sort of thing possible with np.delete()?
Namely my original ndarray is
textcodes= data['CODES'].unique()
which captures unique text codes given the quarter.
Specifically, I want to remove certain codes that I need to run through a separate process, and put them into a separate ndarray:
sep_list = np.array(['SPCFC_CODE_1','SPCFC_CODE_2','SPCFC_CODE_3','SPCFC_CODE_4'])
I have trouble removing the codes in sep_list from textcodes because I don't know where those codes would be indexed - it differs each quarter - and I would like to automate the removal based on the names instead, because those will always be the same.
Any help is greatly appreciated. Thank you.
You should be able to do something like:
import numpy as np
data = [3,2,1,0,10,5]
bad_list = [1, 2]
data = np.asarray(data)
new_list = np.asarray([x for x in data if x not in bad_list])
print("BAD")
print(data)
print("GOOD")
print(new_list)
Yields:
BAD
[ 3 2 1 0 10 5]
GOOD
[ 3 0 10 5]
It is impossible to tell for sure since you did not provide sample data, but the following implementation using your variables should work:
import numpy as np
textcodes= data['CODES'].unique()
sep_list = np.array(['SPCFC_CODE_1','SPCFC_CODE_2','SPCFC_CODE_3','SPCFC_CODE_4'])
final_list = [x for x in textcodes if x not in sep_list]
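NumPy also offers a vectorized version of this membership test, np.isin, which keeps both results as ndarrays. A sketch using the question's variable names with made-up codes:

```python
import numpy as np

# Hypothetical unique codes for one quarter
textcodes = np.array(["ABC", "SPCFC_CODE_1", "XYZ", "SPCFC_CODE_3"])
sep_list = np.array(["SPCFC_CODE_1", "SPCFC_CODE_2",
                     "SPCFC_CODE_3", "SPCFC_CODE_4"])

# Boolean mask: True where a code belongs to sep_list
mask = np.isin(textcodes, sep_list)
separated = textcodes[mask]    # codes routed to the separate process
remaining = textcodes[~mask]   # everything else
print(remaining)  # ['ABC' 'XYZ']
```

This covers both halves of the task at once: the codes to pull out and the codes to keep.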

Python, manipulating dataframes

Department = input("what dept")
editfile = pd.read_csv('52.csv', encoding='Latin-1')
editfilevalues= editfile.loc[editfile['Customer'].str.contains(Department, na=False), 'May-18\nQty']
editfilevalues = editfilevalues.fillna(int(0))
print(int(editfilevalues) *1.3)
I have looked through Stack Overflow and no answer seems to help with this problem. I simply want to be able to manipulate data in a Series like this, but I get different errors; with the current code I receive:
"{0}".format(str(converter))) TypeError: cannot convert the series to <class 'int'>
My main issue is converting a Series to an int type; I have tried several different ways to do this and none give me the result I need.
A pandas Series is a bit like a list, but with different functions and properties. You can't convert a Series with int() because that function wasn't designed to work on list-like objects.
If you need to convert the Series to all integers, this method will work.
int_series = your_series.astype(int)
This converts the entire Series to integers (the exact dtype, e.g. 'int64', depends on your platform). Below is a bonus if you want it in a numpy array.
int_array = your_series.values.astype(int)
From here you have a few options to do your calculation.
# where x is a value in your series and lambda is a nameless function
calculated_series = int_series.apply(lambda x: some_number*x)
The output will be another Series object with your rows calculated. Bonus using numpy array below.
calculated_array = int_array * some_number
Edit to show everything at once.
# for series
int_series = your_series.astype(int)
calculated_series = int_series.apply(lambda x: x * some_number)
# for np.array
int_array = your_series.values.astype(int)
calculated_array = int_array * some_number
Either will work, and it is ultimately up to what kind of data structure you want at the end of it all.
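Applied to the question's code, the fix might look like the sketch below. The values here are made up, and whether a per-row result or a total is wanted is an assumption, since the intent of int(editfilevalues) is ambiguous:

```python
import pandas as pd

# Hypothetical stand-in for the filtered 'May-18\nQty' column
editfilevalues = pd.Series([3.0, None, 7.0])

# fillna then convert the whole Series, rather than calling int() on it
qty = editfilevalues.fillna(0).astype(int)

# Vectorized arithmetic works on the whole Series at once
print((qty * 1.3).round(1).tolist())  # [3.9, 0.0, 9.1]
```

If a single total were wanted instead, qty.sum() * 1.3 would give one number.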
