How to restructure this dataframe using Python? - python

[What I am starting out with / What I want to end up with](https://i.stack.imgur.com/xW8Zf.jpg)
I am having trouble writing the code to transform this dataset into what you see below. I am a beginner and am just practicing using Python.
So far, I have tried str.split, but it didn't produce the results I was hoping for.

Related

How would I visually graph 2 string type data using python?

I am rather new to coding, and tutorial hell has started to take its toll. I need help graphing data where both columns are strings. I have tried transforming the data with matplotlib and pandas, but I have not been able to graph it, because the plotting functions I have used require int type data.
I have managed to group the data using df.groupby(['type', 'url']).sum()
My current goal is to get the sum (how many are in each type) of each group and graph them.
Dataset link below
Kaggle - Malicious Links
Edit: Had an Image here. Made it into a code block instead:
df = pd.read_csv('/content/malicious_phish.csv')
df
<output: csv contents>
df.shape
<output: 651191, 2>
df.groupby(['type', 'url']).sum()
<output: corrupted text in a table>
Not sure if this is any better
I have tried using len(), .sum() and .count(). I have also started reading the matplotlib and pandas documentation to find functions and tools that could help me resolve this problem.
from collections import Counter
Counter(df['type'])
To plot the dict result, the following link is helpful: https://stackoverflow.com/a/52572237/16353662.
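As a rough sketch of the counting approach above: the goal is the number of rows per type, so the string column only needs to be counted, not summed. The list `sample_types` and its label values are hypothetical stand-ins for `df['type']`, and `plot_counts` is an illustrative helper, not part of any library.

```python
from collections import Counter

# Hypothetical stand-in for df['type']; the real column comes from the CSV.
sample_types = ["benign", "phishing", "benign", "defacement", "benign", "phishing"]

# Counter maps each distinct label to the number of times it occurs.
type_counts = Counter(sample_types)


def plot_counts(counts):
    """Draw a bar chart of label -> count (requires matplotlib)."""
    # Imported here so the counting step works even without matplotlib.
    import matplotlib.pyplot as plt

    labels = list(counts.keys())
    values = list(counts.values())
    plt.bar(labels, values)  # string labels on the x axis are fine
    plt.xlabel("type")
    plt.ylabel("number of URLs")
    plt.show()
```

With the real data this would be called as `plot_counts(Counter(df['type']))`. In pandas, `df['type'].value_counts()` should give the same counts as a Series.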

String Comparison in Python for harmonization

I am coming from the SQL world, and we are using pandas for ETL this time. In SQL we use DIFFERENCE and SOUNDEX for string comparison, but lately they have not been giving the expected results. Is there any way to achieve this in Python?
Currently we are using code like below which will return a score for the match.
SELECT difference(soundex('string'),soundex(Col)) from table
Looking for a similar solution here. Thanks in advance
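One possible starting point, as a sketch only: the Python standard library has no SOUNDEX or DIFFERENCE, but `difflib.SequenceMatcher` returns a similarity ratio between 0 and 1. It compares matching character runs rather than pronunciation, so it is a different technique from SOUNDEX, not a drop-in equivalent. The function names `similarity` and `best_match` and the 0.8 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0.0-1.0 score, ignoring case and surrounding whitespace."""
    a_norm = a.strip().lower()
    b_norm = b.strip().lower()
    return SequenceMatcher(None, a_norm, b_norm).ratio()


def best_match(value, candidates, threshold=0.8):
    """Return the candidate most similar to value, or None if none is close enough."""
    if not candidates:
        return None
    # Score every candidate and keep the highest-scoring one.
    score, candidate = max((similarity(value, c), c) for c in candidates)
    return candidate if score >= threshold else None
```

On a DataFrame this could be applied per row, for example `df['Col'].map(lambda v: similarity('string', v))`. If phonetic matching is required, third-party packages such as jellyfish provide a soundex function.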

Assistance with Keras for a noise detection script

I'm currently trying to learn more about deep learning/CNNs/Keras through what I thought would be a quite simple project: training a CNN to detect a single specific sound. It's been a lot more of a headache than I expected.
I'm currently reading through this, ignoring the second section about GPU usage; the first part seems like exactly what I need. But when I run the script (which is pretty much lifted from the section in the link above that says "Putting the pieces together, you may end up with something like this:"), it gives me this error:
AttributeError: 'DataFrame' object has no attribute 'file_path'
I can't find anything in the pandas documentation about a DataFrame.file_path function. So I'm confused as to what that part of the code is attempting to do.
My CSV file contains two columns, one with the paths and then a second column denoting the file paths as either positive or negative.
Sidenote: I'm also aware that this entire guide just may not be the thing I'm looking for. I'm having a very hard time finding any material that is useful for the specific project I'm trying to do and if anyone has any links that would be better I'd be very appreciative.
The statement df.file_path means that you want to access the file_path column of your dataframe. It seems that your dataframe object does not contain this column. With df.head() you can check whether your dataframe object contains the needed fields.
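A minimal sketch of that check, under one assumption about the cause: the CSV has no header row, so pandas uses the first data row as column names and no column called file_path exists. The sample paths and the column name `label` are hypothetical; only `file_path` comes from the error message.

```python
import io

import pandas as pd

# Hypothetical two-column CSV without a header row.
csv_text = "sounds/a.wav,positive\nsounds/b.wav,negative\n"

# Default read: the first data row becomes the header, so file_path is missing.
df_bad = pd.read_csv(io.StringIO(csv_text))
has_column_before = "file_path" in df_bad.columns

# Naming the columns explicitly makes df.file_path valid.
df = pd.read_csv(io.StringIO(csv_text), header=None, names=["file_path", "label"])
has_column_after = "file_path" in df.columns
first_path = df.file_path.iloc[0]
```

If the file does have a header with different names, `df.rename(columns={...})` or changing the attribute in the script to match the actual column name would be the alternative.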

Can I translate/duplicate an existing graph in Excel into Python?

I have many graphs in Excel that I would like to convert to Python but am struggling with how to do so using Matplotlib. Is there a package or method that would essentially convert/translate all the formatting and data series selection into python?
Once I have seen a few examples of the correct code, I think I could start doing this directly in Python, but I do not have much experience writing graph code by hand (I mostly use Excel's Insert Chart), so I am looking for a bridge.
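As an example of the kind of code involved, here is a small sketch of an Excel-style line chart with two series rebuilt by hand in Matplotlib. It is not an automatic conversion; the data values, series names, and labels are all hypothetical placeholders for whatever the Excel chart contains.

```python
import matplotlib.pyplot as plt

# Hypothetical data standing in for two series selected in an Excel chart.
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 150, 160]
costs = [80, 90, 95, 110]

fig, ax = plt.subplots(figsize=(6, 4))

# One ax.plot call per Excel data series; label feeds the legend.
ax.plot(months, sales, marker="o", label="Sales")
ax.plot(months, costs, marker="s", label="Costs")

# Chart title, axis titles, legend and gridlines, as set in Excel's chart options.
ax.set_title("Sales and costs by month")
ax.set_xlabel("Month")
ax.set_ylabel("Amount")
ax.legend()
ax.grid(True, axis="y")
fig.tight_layout()
```

Calling `plt.show()` displays the figure and `fig.savefig("chart.png")` writes it to a file. The series themselves can be loaded from the workbook with `pandas.read_excel` instead of being typed in.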

pandas.dataframe.set_index(column1) in python to MATLAB

I want to index a table in MATLAB on a particular column. In Python we can use set_index(column_name) from the pandas library. I want equivalent code that does the same in MATLAB. To be more precise, I want to look at the internal code of set_index() in Python. Can someone help me?
Code in MATLAB:
T = readtable('filename.csv');
I want to set an index on T.column_name here.
