I am trying to create a bipartite graph from an Excel file that looks similar to this:
xyz pqr tsu
abc -1 1 -2
def -2 -1 2
ghj 2 -1 1
To begin with, I tried the following:
import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd
import xlrd
import numpy as np
from numpy import genfromtxt
df = pd.read_csv (r'C:\Users\Dragos\Desktop\networkx project\proiect.csv')
G=nx.read_edgelist('proiect.csv', create_using=nx.Graph(), nodetype=str)
nx.draw(G)
plt.show()
But I keep getting this error: Failed to convert edge data (['wage,carbon', 'tax,imigration,healthcare,voting,drugs,dc', 'statehood,abortion,UBI,wealthtax']) to dictionary.
Right now I'm at a loss and not sure how to proceed.
First, load the file into a pandas DataFrame:
df = pd.read_csv(path, sep=',')
See the pandas read_csv documentation for more options.
Then build a new DataFrame with one row per edge, in the following format:
>>> df
weight cost 0 b
0 4 7 A D
1 7 1 B A
2 10 9 C E
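# Older NetworkX releases called this function from_pandas_dataframe; current releases use from_pandas_edgelist.
G = nx.from_pandas_edgelist(df, 0, 'b', ['weight', 'cost'])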
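As a rough sketch of the reshaping step, assuming the spreadsheet is a matrix whose row labels and column labels are the two node sets and whose cells are edge weights: the sample values are copied from the question, while the column names source, target and weight are made up for illustration.

```python
import networkx as nx
import pandas as pd

# Stand-in for the spreadsheet in the question: rows and columns are the two node sets.
matrix = pd.DataFrame(
    [[-1, 1, -2], [-2, -1, 2], [2, -1, 1]],
    index=["abc", "def", "ghj"],
    columns=["xyz", "pqr", "tsu"],
)

# Reshape to an edge list: one row per (row label, column label, cell value).
# The names "source", "target" and "weight" are illustrative choices.
edges = (
    matrix.rename_axis("source")
    .reset_index()
    .melt(id_vars="source", var_name="target", value_name="weight")
)

# Build the graph and tag each node with the side it belongs to.
G = nx.from_pandas_edgelist(edges, source="source", target="target", edge_attr="weight")
nx.set_node_attributes(G, {name: 0 for name in matrix.index}, "bipartite")
nx.set_node_attributes(G, {name: 1 for name in matrix.columns}, "bipartite")
```

The graph can then be passed to nx.draw(G) as in the question.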
I have a DataFrame df with many columns. The column df["house_electricity"] contains the values 1, 0, or blank/NA. I want to plot this column as a pie chart that shows the percentages of only the 1 and 0 values. I also want a second pie chart that shows the percentages of 1, 0, and blank/NA together.
customer_id  house_electricity  house_refrigerator
cid01        0                  0
cid02        1                  na
cid03                           1
cid04        1
cid05        na                 0
# I wrote the following, but it didn't give me the expected result
import pandas as pd
import matplotlib.pyplot as plt
df=pd.read_csv("my_file.csv")
df_col=df.columns
df["house_electricity"].plot(kind="pie")
For a DataFrame such as:
df = pd.DataFrame({'a':[1,0,np.nan,1,1,1,'',0,0,np.nan]})
df
a
0 1
1 0
2 NaN
3 1
4 1
5 1
6
7 0
8 0
9 NaN
The code below plots every distinct value, including NaN and the empty string, as its own slice:
df["a"].value_counts(dropna=False).plot(kind="pie")
If you want to combine NaN and empty values into one slice, replace the empty strings with np.nan before plotting:
df["a"].replace("", np.nan).value_counts(dropna=False).plot(kind="pie")
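The two charts asked for in the question differ only in whether missing values are counted. A minimal sketch of computing both sets of percentages follows; the sample column is hypothetical, and treating both "" and "na" as missing is an assumption about the file.

```python
import pandas as pd

# Hypothetical column mimicking the question: 1, 0, the string "na", and blanks.
s = pd.Series([0, 1, "", 1, "na"], name="house_electricity")

# Coerce anything that is not a number ("" and "na" here) to NaN.
cleaned = pd.to_numeric(s, errors="coerce")

# Percentages over the 0/1 values only (NaN is dropped by default).
only_01 = cleaned.value_counts(normalize=True) * 100

# Percentages over 0, 1 and missing values together.
with_missing = cleaned.value_counts(dropna=False, normalize=True) * 100
```

Either Series can then be drawn with .plot.pie(autopct='%1.1f%%').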
To get three slices (0, 1 and NaN), try the following code:
import pandas as pd
import matplotlib.pyplot as plt
data = {'customer_id': ['cid01', 'cid02', 'cid03', 'cid04', 'cid05'],
'house_electricity': [0, 1, None, 1, None],
'house_refrigerator': [0, None, 1, None, 0]}
df = pd.DataFrame(data)
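counts = df['house_electricity'].value_counts(dropna=False)
# value_counts orders by frequency, so take the labels from the index instead of hard-coding them
counts.plot.pie(autopct='%1.1f%%', labels=[str(v) for v in counts.index], shadow=True)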
plt.title('Percentage distribution of house_electricity column')
plt.axis('equal')
plt.show()
I would like to plot a heatmap from a CSV file that contains pixel positions. The file looks like this:
0 0 8.400000e+01
1 0 8.500000e+01
2 0 8.700000e+01
3 0 8.500000e+01
4 0 9.400000e+01
5 0 7.700000e+01
6 0 8.000000e+01
7 0 8.300000e+01
8 0 8.900000e+01
9 0 8.500000e+01
10 0 8.300000e+01
I tried writing a few lines of Python, but they raise an error. I suspect the cause is the format of the third column, which is read as strings. Is there any way to plot this kind of file?
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
path_to_csv= "/run/media/test.txt"
df= pd.read_csv(path_to_csv ,sep='\t')
plt.imshow(df,cmap='hot',interpolation='nearest')
plt.show(df)
I also tried seaborn, but with no success.
Here is the error returned:
TypeError: Image data of dtype object cannot be converted to float
You can set dtype=float as a keyword argument of pandas.read_csv:
df = pd.read_csv(path_to_csv, sep='\t', dtype=float)
Or use pandas.DataFrame.astype:
plt.imshow(df.astype(float), cmap='hot', interpolation='nearest', aspect='auto')
plt.show()
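If the three columns are x position, y position and intensity (an assumption about the file; the column names below are made up), the long table can also be pivoted into a 2D grid so that each cell of the image corresponds to one pixel. A minimal sketch with a small hypothetical sample:

```python
import io

import pandas as pd

# Hypothetical tab-separated sample in the same shape as the question's file.
raw = "0\t0\t8.4e+01\n1\t0\t8.5e+01\n0\t1\t8.7e+01\n1\t1\t8.5e+01\n"

# The file has no header row, so name the columns explicitly (names are illustrative).
points = pd.read_csv(
    io.StringIO(raw), sep="\t", header=None, names=["x", "y", "value"], dtype=float
)

# One row per y, one column per x, cell = intensity at that pixel.
grid = points.pivot(index="y", columns="x", values="value")
image = grid.to_numpy()
```

The resulting array can be passed to plt.imshow(image, cmap='hot', interpolation='nearest').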
The objective is to update the rows of df by combining values in each row with reference values from an external NumPy array.
Currently, I use a for loop to update each row, as shown below.
However, I wonder whether this can be tackled with a built-in pandas feature.
import pandas as pd
import numpy as np
arr=np.array([1,2,5,100,3,6,8,3,99,12,5,6,8,11,14,11,100,1,3])
arr=arr.reshape((1,-1))
df=pd.DataFrame(zip([1,7,13],[4,11,17],['a','g','t']),columns=['start','end','o'])
for n in range(len(df)):
    a=df.loc[n]
    drange=list(range(a['start'],a['end']+1))
    darr=arr[0,drange]
    r=np.where(darr==np.amax(darr))[0].item()
    df.loc[n,'pos_peak']=drange[r]
Expected output
start end o pos_peak
0 1 4 a 3.0
1 7 11 g 8.0
2 13 17 t 16.0
My approach would be to use the pandas apply() function, which applies a function to each row of your DataFrame. To find the index of the maximum element, I applied the NumPy function argmax() to the relevant slice of arr and added the start offset. Here is the code:
import pandas as pd
import numpy as np
arr=np.array([1,2,5,100,3,6,8,3,99,12,5,6,8,11,14,11,100,1,3])
arr=arr.reshape((1,-1))
df=pd.DataFrame(zip([1,7,13],[4,11,17],['a','g','t']),columns=['start','end','o'])
df['pos_peak'] = df.apply(lambda x: x['start'] + np.argmax(arr[0][x['start']:x['end']+1]), axis=1)
df
Output:
start end o pos_peak
0 1 4 a 3
1 7 11 g 8
2 13 17 t 16
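As an alternative sketch, the same computation can be written as a list comprehension over the start and end columns, which avoids building a row object for every call. This is only a variation on the idea above, using a flat 1-D arr instead of the reshaped one.

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 5, 100, 3, 6, 8, 3, 99, 12, 5, 6, 8, 11, 14, 11, 100, 1, 3])
df = pd.DataFrame({"start": [1, 7, 13], "end": [4, 11, 17], "o": ["a", "g", "t"]})

# For each row, find the offset of the maximum inside arr[start:end + 1]
# and shift it back by start to get a position in the full array.
df["pos_peak"] = [
    start + int(np.argmax(arr[start:end + 1]))
    for start, end in zip(df["start"], df["end"])
]
```

Note that argmax returns the first maximum if the slice contains ties.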
I am reading a CSV file into a variable called 'data' in a Jupyter Notebook using pandas, as follows:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
data = pd.read_csv("C:/Users/hp/Desktop/dv project/googleplaystorecleaned.csv")
I tried to modify the 'Size' column of the data set to remove the characters 'M' and 'k' using the following code:
for i in range(len(data['Size'])):
    data['Size'][i]=str(data['Size'][i])
    data['Size'][i]=data['Size'][i].replace('M','')
    data['Size'][i]=data['Size'][i].replace('k','')
    data['Size'][i]=data['Size'][i].replace('Varies with device','')
    data['Size'][i]=float(data['Size'][i])
print(data['Size'])
The code seems to work only partially on the data set, as I am getting the following output:
0 19
1 14
2 8.7
3 25
4 2.8
...
10836 53M
10837 3.6M
10838 9.5M
10839 Varies with device
10840 19M
Name: Size, Length: 10829, dtype: object
What is the proper way to do this?
I created an example DataFrame to show the result:
df = pd.DataFrame({'A': [1,2,1], 'B': [3,4,3], 'Size': ['Mfoobar1','kfoobar2','Varies with devicefoobar3']})
for i, v in enumerate(df['Size'].values):
    v = v.replace('M', '')
    v = v.replace('k', '')
    v = v.replace('Varies with device', '')
    # Write back through .loc; the default index makes the position i a valid label
    df.loc[i, 'Size'] = v
print(df)
Before :
A B Size
0 1 3 Mfoobar1
1 2 4 kfoobar2
2 1 3 Varies with devicefoobar3
After :
A B Size
0 1 3 foobar1
1 2 4 foobar2
2 1 3 foobar3
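The loop can also be replaced by pandas string methods that act on the whole column at once. A minimal sketch with a hypothetical sample column; unlike the loop above, it strips only a trailing 'M' or 'k' and turns anything non-numeric, such as 'Varies with device', into NaN.

```python
import pandas as pd

# Hypothetical sample of the 'Size' column.
sizes = pd.Series(["19M", "8.7M", "201k", "Varies with device"], name="Size")

# Remove a trailing unit letter, then convert; unparseable text becomes NaN.
cleaned = sizes.astype(str).str.replace(r"[Mk]$", "", regex=True)
numeric = pd.to_numeric(cleaned, errors="coerce")
```

Dropping the unit means that values originally in 'k' and in 'M' are no longer on the same scale, so a rescaling step may be needed before comparing them.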
You can also try this:
import pandas as pd
list1 = ['20M', '9M', '10K', '10']
dataframe1 = pd.DataFrame(data=list1, columns=['Size'])
# Create the target column first; values without a suffix are kept unchanged
dataframe1['Col1'] = dataframe1['Size']
for i, s in enumerate(dataframe1['Size']):
    if s.endswith('M'):
        dataframe1.loc[i, 'Col1'] = s.replace('M', '')
    if s.endswith('K'):
        dataframe1.loc[i, 'Col1'] = s.replace('K', '')
dataframe1
This gives the expected output.
Note: you can add further if conditions to suit your requirements.
I want to shuffle a pandas DataFrame 'n' times, save each shuffled DataFrame under a new name, and then export it to a CSV file. What I mean is:
import pandas as pd
import sklearn
import numpy as np
from sklearn.utils import shuffle
df = pd.read_csv('example.csv')
Then something like this:
for i in np.arange(n):
    df_%i = shuffle(df)
    df_%i.to_csv('example.csv')
I appreciate any help. Thanks!
You can use:
for i in range(n):
    df.sample(frac=1).to_csv(f"example_{i}.csv")
If you need to create an arbitrary number of variables, you should store them in a dictionary and you can reference them later by their keys; in this case the integer you loop over.
d = {}
for i in range(n):
    d[i] = df.sample(frac=1)  # d[i] = shuffle(df) in your case
    d[i].to_csv(f'example_{i}.csv')
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(1, 10, (3, 3)))
d = {}
for i in range(5):
    d[i] = df.sample(frac=1)
d[1]
# 0 1 2
#0 6 3 2
#1 7 6 4
#2 2 6 9
d[2]
# 0 1 2
#2 2 6 9
#1 7 6 4
#0 6 3 2
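If the shuffles need to be reproducible, one option is to pass a seed to sample. A minimal sketch, assuming a small hypothetical DataFrame, n = 3, and a temporary output directory; none of these come from the question.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical data and number of shuffles.
df = pd.DataFrame({"a": range(5), "b": list("vwxyz")})
n = 3

# Write into a temporary directory so the sketch does not touch real files.
out_dir = Path(tempfile.mkdtemp())

shuffled = {}
for i in range(n):
    # random_state=i makes shuffle i repeatable across runs;
    # reset_index(drop=True) discards the original row order from the index.
    shuffled[i] = df.sample(frac=1, random_state=i).reset_index(drop=True)
    shuffled[i].to_csv(out_dir / f"example_{i}.csv", index=False)
```

Passing index=False keeps the old row numbers out of the CSV; leave it out if the original positions are worth recording.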