how to get the ids of the row after using to_frame() - python

After using to_frame(), how do I get the id of a row?
For example:
After to_frame():

   column_name
1           35
2           34
If this is the table, is there a way to get the ids 1 or 2?
Edit:
What I am trying to do: I got all the unique products/items in the data and stored them in another table named stock.
I then checked how many of each product were sold, so I used value_counts().to_frame() to store the value counts as their own column. However, when I run this code, the stock name becomes the id, which is not what I wanted. I wanted the stock name and the value counts as separate columns in the same table. Is there a way to do this?
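One common way to get this, as a minimal sketch (the column name product here is hypothetical): reset_index() moves the value_counts index (the stock names) back into an ordinary column, so name and count sit side by side.

import pandas as pd

# hypothetical sales data; 'product' stands in for the real column name
df = pd.DataFrame({'product': ['chair', 'table', 'table', 'chair', 'chair']})

# value_counts() puts the product names in the index;
# reset_index() turns them back into a regular column
stock = df['product'].value_counts().reset_index()
stock.columns = ['product', 'count']  # names differ across pandas versions, so set them explicitly
print(stock)
#   product  count
# 0   chair      3
# 1   table      2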

Related

Pandas dataframe- How to count the number of distinct rows for a given ID

I have this dataframe and I want to add a column to it with the total number of distinct SalesOrderIDs for a given CustomerID.
So, with what I am trying to do, there would be a new column with the value 3 for all these rows.
How can I do it?
I am trying it this way but I get an error:
data['TotalOrders'] = data.groupby([['CustomerID','SalesOrderID']]).size().reset_index(name='count')
Try using transform:
data['TotalOrders'] = data.groupby('CustomerID')['SalesOrderID'].transform('nunique')
This will give you one value for each row in the group. (Thanks @Rodalm.)
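A quick self-contained demonstration with made-up data (the values are hypothetical; the column names follow the question):

import pandas as pd

# hypothetical orders: one CustomerID with three distinct SalesOrderIDs
data = pd.DataFrame({
    'CustomerID':   [11091, 11091, 11091, 11091],
    'SalesOrderID': [43793, 51522, 51522, 57418],
})

data['TotalOrders'] = data.groupby('CustomerID')['SalesOrderID'].transform('nunique')
print(data)
#    CustomerID  SalesOrderID  TotalOrders
# 0       11091         43793            3
# 1       11091         51522            3
# 2       11091         51522            3
# 3       11091         57418            3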

adding values from two different rows into one using pyspark

I have two rows with the same data except for a couple of columns that differ between them:
id  product  class   cost
1   table    large   5.12
1   table    medium  2.20
So I'm trying to get the following:
id  product  class          cost
1   table    large, medium  7.32
I'm currently using the following code to get this:
df.groupBy("id", "product").agg(collect_list("class"),
(
F.sum("cost")
).alias("Sum")
The issue with this snippet is that the grouping keeps only the first value it finds in class, and the sum doesn't seem to be correct (I'm not sure if it is taking the first value and adding it once for every row with that same id), so I'm getting something like this:
id  product  class         cost
1   table    large, large  10.24
This is another snippet I used, so I could keep all my other fields while performing the addition across those rows:
df.withColumn("total", F.sum("cost").over(Window.partitionBy("id")))
Would it be the same to apply the F.array_join() function?
You need to use the array_join function to join the results of collect_list with commas (,).
df = df.groupBy('id', 'product').agg(
    F.array_join(F.collect_list('class'), ',').alias('class'),
    F.sum('cost').alias('cost')
)
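For reference, here is a self-contained sketch of the whole flow, assuming a local SparkSession and the two sample rows from the question; ', ' (comma plus space) is used as the separator to match the desired output:

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# the two sample rows from the question
df = spark.createDataFrame(
    [(1, "table", "large", 5.12), (1, "table", "medium", 2.20)],
    ["id", "product", "class", "cost"],
)

result = df.groupBy("id", "product").agg(
    F.array_join(F.collect_list("class"), ", ").alias("class"),
    F.sum("cost").alias("cost"),
)
result.show(truncate=False)
# expected: one row with id=1, product=table,
# class='large, medium', cost ≈ 7.32 (up to floating-point rounding)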

How do I check how many rows exists in one column of the same value pandas?

I have a pandas dataset of about 200k articles containing a column called Category. How do I display all the different categories, and also count the number of rows in which a certain category, for example "Entertainment", appears in the Category column?
To get the distinct Category values:
df['Category'].unique()
And the following counts the number of rows using contains for the category Entertainment:
len(df[df['Category'].str.contains('Entertainment')])
Note that contains matches substrings, so this would also count rows like "Entertainment News"; use df['Category'].eq('Entertainment') for exact matches.
Use Series.value_counts: the unique Category values are then in the index, and you can select the count for a value with Series.loc:
s = df['Category'].value_counts()
print(s.index.tolist())
print(s.loc['Entertainment'])
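A small self-contained demonstration with made-up categories:

import pandas as pd

# hypothetical articles; only the Category column matters here
df = pd.DataFrame({'Category': ['Entertainment', 'Sports', 'Entertainment',
                                'Politics', 'Entertainment']})

s = df['Category'].value_counts()
print(s.index.tolist())        # ['Entertainment', 'Sports', 'Politics']
print(s.loc['Entertainment'])  # 3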

Pandas convert data from two tables into third table. Cross Referencing and converting unique rows to columns

I have the following tables:
Table A
listData = {'id':[1,2,3],'date':['06-05-2021','07-05-2021','17-05-2021']}
tableA = pd.DataFrame(listData,columns=['id','date'])
Table B
detailData = {'code':['D123','F268','A291','D123','F268','A291'],'id':['1','1','1','2','2','2'],'stock':[5,5,2,10,11,8]}
tableB = pd.DataFrame(detailData,columns=['code','id','stock'])
OUTPUT TABLE
output = {'code':['D123','F268','A291'],'06-05-2021':[5,5,2],'07-05-2021':[10,11,8]}
pd.DataFrame(output,columns=['code','06-05-2021','07-05-2021'])
Note: the code above is hard-coded for the output; I need to generate the output table from Table A and Table B.
Here is a brief explanation of how the output table is generated, in case it is not self-explanatory.
The id column needs to be cross-referenced from Table A to Table B, so that each id in Table B is replaced by its date.
Then all the unique dates in Table B should be made into columns, and the corresponding stock values moved to the newly created date columns.
I am not sure where to start. I am new to pandas and have only ever used it for simple data manipulation. If anyone can suggest where to get started, it would be a great help.
Try:
tableA['id'] = tableA['id'].astype(str)
tableB.merge(tableA, on='id').pivot(index='code', columns='date', values='stock')
Output:
date  06-05-2021  07-05-2021
code
A291           2           8
D123           5          10
F268           5          11
Details:
First, merge on id; this is like doing a SQL join. The dtypes must match, hence the astype to str.
Next, reshape the dataframe using pivot to get code by date.
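Putting it together as a runnable sketch (the names tableA and tableB follow the question; note that id 3 / 17-05-2021 has no rows in Table B, so no column is created for it):

import pandas as pd

listData = {'id': [1, 2, 3], 'date': ['06-05-2021', '07-05-2021', '17-05-2021']}
tableA = pd.DataFrame(listData)

detailData = {'code': ['D123', 'F268', 'A291', 'D123', 'F268', 'A291'],
              'id': ['1', '1', '1', '2', '2', '2'],
              'stock': [5, 5, 2, 10, 11, 8]}
tableB = pd.DataFrame(detailData)

# dtypes must match for the merge, so cast id to str
tableA['id'] = tableA['id'].astype(str)

# merge (like a SQL join), then pivot code by date
output = (tableB.merge(tableA, on='id')
                .pivot(index='code', columns='date', values='stock'))
print(output)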

How to compare 2 values in 1 table corresponding to date and create another column in DynamoDB

Suppose we have 2 rows with dates, and I have to compare the Amount with the previous date's Amount and put that value in another row in DynamoDB. How can I do this?
TimePeriod  LinkedAccount  Amount        Estimated  Unit
2018-07-04  711035872***   0.7715992257  True       USD
2018-07-05  7110358*****   0.7715549731  True       USD
DynamoDB is a NoSQL database. There is no ability to write queries that compare one row with another row.
Instead, your application should retrieve all relevant rows and then make the comparison.
Or, retrieve one row, figure out the desired values, then call DynamoDB again with parameters to retrieve any matching row (e.g. for the previous date).
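A minimal boto3 sketch of the first approach, assuming a hypothetical table named costs with LinkedAccount as the partition key and TimePeriod as the sort key:

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('costs')  # hypothetical table name

# fetch all rows for one account; results come back sorted by the
# TimePeriod sort key, ascending by default
resp = table.query(
    KeyConditionExpression=Key('LinkedAccount').eq('711035872***')  # masked id from the question
)
items = resp['Items']

# compare each row's Amount with the previous date's in the application
for prev, curr in zip(items, items[1:]):
    diff = float(curr['Amount']) - float(prev['Amount'])
    print(curr['TimePeriod'], diff)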
