Deep Dive into Network Analysis and Social Relationships
This case study centers on the study of complex systems found in both scientific and societal domains. These systems consist of many interacting components and can be represented as networks, in which nodes stand for individual components and edges for the interactions connecting them.
The utility of network analysis extends to a diverse range of applications, including the examination of pathogen diffusion, behavioral dynamics, and information dissemination within social networks. Moreover, the scope of network analysis expands to encompass biological scenarios, particularly at the molecular level. This entails the exploration of gene regulation networks, signal transduction networks, protein interaction networks, and other connected systems.
Homophily represents a characteristic of networks in which nodes that are adjacent in a network tend to share a certain attribute more frequently than nodes that are not neighbors. This case study focuses on examining the presence of homophily in various attributes among individuals connected within social networks located in rural India.
Task Description#
Analyze homophily within network structures of rural villages.
Investigate the impact of various characteristics (such as religion, sex, caste) on social interactions.
Utilize data analysis and network analysis techniques to quantify and compare observed homophily with chance homophily.
Explore the implications of homophily in the context of village communities.
Task 1#
The goal is to calculate the chance homophily for a specific characteristic within the provided sample data.
Creating a function that takes in a dictionary and calculates the marginal probability of each characteristic, which is its frequency of occurrence divided by the sum of the frequencies of all characteristics.
Creating a function that calculates the chance homophily by summing the squares of the marginal probabilities.
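Written out as a formula, the two steps above look as follows. The symbols are introduced here for illustration only: n_c is the number of individuals who have characteristic c, and N is the total number of individuals.

```latex
p_c = \frac{n_c}{N}, \qquad
H_{\text{chance}} = \sum_{c} p_c^{2}
```

Read this way, the chance homophily is the probability that two individuals drawn independently at random share the same characteristic.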
from collections import Counter
import numpy as np
Calculating Marginal Probability#
This function takes a dictionary named chars as input, where keys are personal IDs and values are characteristics. It calculates and returns a dictionary with characteristics as keys and their marginal probabilities as the corresponding values.
def marginal_prob(chars):
    """
    Calculate the marginal probability of characteristics in a network.

    Inputs:
        chars (dict): A dictionary consisting of IDs as keys and
            characteristics as values
    Returns:
        out_dict (dict): A dictionary where characteristics are keys
            and their marginal probabilities are values
    """
    # Count how often each characteristic occurs (one pass over the values)
    counts = Counter(chars.values())
    total = sum(counts.values())
    # Divide each count by the total number of individuals
    out_dict = {a_char: count / total for a_char, count in counts.items()}
    return out_dict
Calculating Chance Homophily#
This function takes a dictionary named chars as input and uses the marginal_prob function to calculate the marginal probabilities of the characteristics. It then sums the squares of these probabilities to obtain the chance homophily.
def chance_homophily(chars):
    """
    Calculate the chance homophily of a characteristic in a network.

    Inputs:
        chars (dict): A dictionary consisting of IDs as keys and
            characteristics as values
    Returns:
        float: Calculated chance homophily for the given characteristics
    """
    marginal_prob_dict = marginal_prob(chars)
    output_val = 0
    # Sum the squared marginal probability of every characteristic
    for char in marginal_prob_dict:
        output_val += (marginal_prob_dict[char] ** 2)
    return output_val
favorite_food = {
    "Person A": "burger",
    "Person B": "chicken wings",
    "Person C": "milkshake",
    "Person D": "burger",
    "Person E": "milkshake"
}
print(marginal_prob(favorite_food))
food_homophily = chance_homophily(favorite_food)
print(food_homophily)
{'burger': 0.4, 'chicken wings': 0.2, 'milkshake': 0.4}
0.3600000000000001
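The value 0.36 can be checked by simulation. The sketch below is not part of the original analysis: it repeatedly draws two people independently (with replacement) from the sample and records how often their favorite foods match. The helper names, the number of draws, and the seed are arbitrary choices made for illustration.

```python
import random
from collections import Counter


def chance_homophily_from_counts(chars):
    """Sum of squared marginal probabilities (same quantity as chance_homophily)."""
    counts = Counter(chars.values())
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())


def simulated_match_rate(chars, n_pairs=20000, seed=0):
    """Estimate how often two independently drawn individuals share a value."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    values = list(chars.values())
    matches = 0
    for _ in range(n_pairs):
        # Draw both individuals with replacement from the same sample
        if rng.choice(values) == rng.choice(values):
            matches += 1
    return matches / n_pairs


favorite_food = {
    "Person A": "burger",
    "Person B": "chicken wings",
    "Person C": "milkshake",
    "Person D": "burger",
    "Person E": "milkshake",
}

exact = chance_homophily_from_counts(favorite_food)
estimate = simulated_match_rate(favorite_food)
print("exact:", round(exact, 3), "simulated:", round(estimate, 3))
```

With enough draws the simulated match rate should settle close to the exact sum of squares, which is what "chance" means in the comparisons that follow.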
import os
import pandas as pd

relative_path = "./"
data_file = os.path.join(relative_path, "individual_characteristics.csv")
df = pd.read_csv(data_file, low_memory=False, index_col=0)
df.head()
| | Unnamed: 1 | village | adjmatrix_key | pid | hhid | resp_id | resp_gend | resp_status | age | religion | ... | privategovt | work_outside | work_outside_freq | shgparticipate | shg_no | savings | savings_no | electioncard | rationcard | rationcard_colour |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NaN | 0 | 1 | 5 | 100201 | 1002 | 1 | 1 | Head of Household | 38 | HINDUISM | ... | PRIVATE BUSINESS | Yes | 0.0 | No | NaN | No | NaN | Yes | Yes | GREEN |
NaN | 1 | 1 | 6 | 100202 | 1002 | 2 | 2 | Spouse of Head of Household | 27 | HINDUISM | ... | NaN | NaN | NaN | No | NaN | No | NaN | Yes | Yes | GREEN |
NaN | 2 | 1 | 23 | 100601 | 1006 | 1 | 1 | Head of Household | 29 | HINDUISM | ... | OTHER LAND | No | NaN | No | NaN | No | NaN | Yes | Yes | GREEN |
NaN | 3 | 1 | 24 | 100602 | 1006 | 2 | 2 | Spouse of Head of Household | 24 | HINDUISM | ... | PRIVATE BUSINESS | No | NaN | Yes | 1.0 | Yes | 1.0 | Yes | No | NaN |
NaN | 4 | 1 | 27 | 100701 | 1007 | 1 | 1 | Head of Household | 58 | HINDUISM | ... | OTHER LAND | No | NaN | No | NaN | No | NaN | Yes | Yes | GREEN |
5 rows × 49 columns
Task 2#
The individual_characteristics.csv file contains several characteristics for each individual in the dataset, such as age, caste and religion.
Storing separate datasets for individuals belonging to Village 1 and Village 2.
# creating the dataset for village 1
df1 = df[df['village'] == 1]
# creating the dataset for village 2
df2 = df[df['village'] == 2]
Task 3#
To help in data retrieval and analysis, dictionaries are created that map personal IDs to specific covariates for members of Village 1 and 2.
Defining dictionaries where personal IDs are the keys and the corresponding covariates (sex, caste, religion) are the values.
For Village 1, these dictionaries are stored using the variable names sex1, caste1 and religion1. For Village 2, these dictionaries are stored using the variable names sex2, caste2 and religion2.
sex1 = {}
caste1 = {}
religion1 = {}
# Map each personal ID in Village 1 to the person's sex, caste and religion
for df1_ind in range(len(df1)):
    particular_person = df1.iloc[df1_ind, :]
    sex1[particular_person['pid']] = particular_person['resp_gend']
    caste1[particular_person['pid']] = particular_person['caste']
    religion1[particular_person['pid']] = particular_person['religion']
set(caste1.values())
{'GENERAL', 'OBC', 'SCHEDULED CASTE', 'SCHEDULED TRIBE'}
sex2 = {}
caste2 = {}
religion2 = {}
# Map each personal ID in Village 2 to the person's sex, caste and religion
for df2_ind in range(len(df2)):
    particular_person = df2.iloc[df2_ind, :]
    sex2[particular_person['pid']] = particular_person['resp_gend']
    caste2[particular_person['pid']] = particular_person['caste']
    religion2[particular_person['pid']] = particular_person['religion']
Task 4#
Calculating the chance homophily for different characteristics within Villages 1 and 2.
Utilizing the chance_homophily function to calculate the chance homophily for the attributes of sex, caste and religion in both Village 1 and Village 2.
dict1 = {'Sex': sex1, 'Caste': caste1, 'Religion': religion1}
for covariate in dict1:
    chance = chance_homophily(dict1[covariate])
    print("Chance for {} in Village 1: {}".format(covariate, round(chance, 3)))
Chance for Sex in Village 1: 0.503
Chance for Caste in Village 1: 0.674
Chance for Religion in Village 1: 0.98
dict2 = {'Sex': sex2, 'Caste': caste2, 'Religion': religion2}
for covariate in dict2:
    chance = chance_homophily(dict2[covariate])
    print("Chance for {} in Village 2: {}".format(covariate, round(chance, 3)))
Chance for Sex in Village 2: 0.501
Chance for Caste in Village 2: 0.425
Chance for Religion in Village 2: 1.0
Task 5#
Measuring the homophily of characteristics within a given network.
Creating a function named homophily() that takes three inputs: a network G, a dictionary of node characteristics chars, and a dictionary of node IDs named IDs.
The function keeps track of two counters: num_ties to count the total number of ties between node pairs and num_same_ties to count the number of ties where both nodes share the same characteristic.
Calculating the ratio of num_same_ties to num_ties to determine the homophily of characteristics in network G.
def homophily(G, chars, IDs):
    """
    Calculate the homophily of characteristics in a network.

    Inputs:
        G (networkx.Graph): The network graph
        chars (dict): A dictionary of node characteristics keyed by personal ID
        IDs (dict): A dictionary mapping each node in the network to its personal ID
    Returns:
        float: The homophily of the network
    """
    # Initializing counters for same and total ties
    num_same_ties = 0
    num_ties = 0
    # Iterating through all edges in the network; every pair yielded by
    # G.edges() is already a tie, so no separate has_edge check is needed
    for node1, node2 in G.edges():
        # Check if both nodes have corresponding characteristics
        if IDs[node1] in chars and IDs[node2] in chars:
            # Increment total ties count
            num_ties += 1
            # Check if nodes share the same characteristic
            if chars[IDs[node1]] == chars[IDs[node2]]:
                # Increment same ties count
                num_same_ties += 1
    # Calculating and returning homophily as the ratio of same ties to total ties
    if num_ties == 0:
        return 0.0
    return num_same_ties / num_ties
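To see the counting logic on something small enough to verify by hand, here is a minimal sketch that applies the same rule to a plain list of edges instead of a networkx graph. The function name edge_list_homophily and all toy values are made up for illustration and do not come from the village data.

```python
def edge_list_homophily(edges, chars, ids):
    """Fraction of ties whose two endpoints share a characteristic.

    edges: iterable of (node, node) pairs, standing in for G.edges()
    chars: personal ID -> characteristic
    ids:   node index -> personal ID
    """
    num_ties = 0
    num_same_ties = 0
    for node1, node2 in edges:
        # Skip ties where either endpoint has no recorded characteristic
        if ids[node1] in chars and ids[node2] in chars:
            num_ties += 1
            if chars[ids[node1]] == chars[ids[node2]]:
                num_same_ties += 1
    return num_same_ties / num_ties if num_ties else 0.0


# Hypothetical toy network: four nodes, one of them without a characteristic
toy_ids = {0: 101, 1: 102, 2: 103, 3: 104}
toy_chars = {101: "A", 102: "A", 103: "B"}  # personal ID 104 is missing
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]

toy_result = edge_list_homophily(toy_edges, toy_chars, toy_ids)
print(toy_result)
```

Three of the four ties have characteristics on both ends, and only the tie between nodes 0 and 1 joins two nodes with the same value, so the result is one third. The tie to node 3 is ignored rather than counted as a mismatch.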
Task 6#
Retrieving the personal IDs for villages 1 and 2.
Using the pd.read_csv function to read the contents of village1_id.csv and village2_id.csv and store them as pid1 and pid2. These variables hold the personal IDs for Villages 1 and 2.
# loading ids for village 1
village1_file = os.path.join(relative_path, 'village1_id.csv')
pid1 = pd.read_csv(village1_file, dtype=int)['0'].to_dict()
# loading ids for village 2
village2_file = os.path.join(relative_path, 'village2_id.csv')
pid2 = pd.read_csv(village2_file, dtype=int)['0'].to_dict()
Task 7#
Calculating the homophily of various network characteristics for Villages 1 and 2. The graph objects G1 and G2 represent the networks of these villages.
Utilizing the homophily() function to calculate the observed homophily for the characteristics of sex, caste and religion in Villages 1 and 2, printing all six resulting values.
Employing the chance_homophily() function to calculate the chance homophily for the same characteristics.
import networkx as nx
Reading adjacency matrices: The code below first reads the adjacency matrix of relationships for each village. Each matrix records which pairs of villagers are connected.
Converting to numpy arrays: Converting the CSV data into numpy arrays.
Creating networkx graphs: The numpy arrays obtained from the adjacency matrices are then converted into networkx graph objects using the nx.to_networkx_graph() function. The function takes the adjacency matrix as input and constructs a graph object where nodes represent villagers and edges represent the connections recorded in the adjacency matrix.
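The relationship between an adjacency matrix and a set of edges can be sketched in plain Python. This is only an illustration of the idea for a small symmetric 0/1 matrix, not a description of how networkx performs the conversion internally; the helper edges_from_adjacency and the toy matrix are hypothetical.

```python
def edges_from_adjacency(matrix):
    """List each undirected edge (i, j) with i <= j for nonzero entries."""
    n = len(matrix)
    return [
        (i, j)
        for i in range(n)
        for j in range(i, n)  # upper triangle only, so each tie appears once
        if matrix[i][j] != 0
    ]


# Hypothetical 3-node network: node 1 is connected to nodes 0 and 2
toy_adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

toy_edge_list = edges_from_adjacency(toy_adjacency)
print(toy_edge_list)  # [(0, 1), (1, 2)]
```

Row and column positions act as node labels, which is why a separate mapping from node index to personal ID is needed when looking up each villager's characteristics.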
# Graph of village 1
village1_relations = os.path.join(relative_path, 'village1_relations.csv')
A1 = np.array(pd.read_csv(village1_relations, index_col=0))
G1 = nx.to_networkx_graph(A1)
# Graph of village 2
village2_relations = os.path.join(relative_path, 'village2_relations.csv')
A2 = np.array(pd.read_csv(village2_relations, index_col=0))
G2 = nx.to_networkx_graph(A2)
dict1 = {'Sex': sex1, 'Caste': caste1, 'Religion': religion1}
for covariate_vil1 in dict1:
    chance = chance_homophily(dict1[covariate_vil1])
    actual = homophily(G1, dict1[covariate_vil1], pid1)
    print("Chance for {} in Village 1: {}".format(covariate_vil1, chance))
    print("Observed for {} in Village 1: {}".format(covariate_vil1, actual), end="\n\n")
Chance for Sex in Village 1: 0.5027299861680701
Observed for Sex in Village 1: 0.6524390243902439
Chance for Caste in Village 1: 0.6741488509791551
Observed for Caste in Village 1: 0.7865853658536586
Chance for Religion in Village 1: 0.9804896988521925
Observed for Religion in Village 1: 0.991869918699187
G2.remove_node(877)
dict2 = {'Sex': sex2, 'Caste': caste2, 'Religion': religion2}
for covariate_vil2 in dict2:
    chance = chance_homophily(dict2[covariate_vil2])
    actual = homophily(G2, dict2[covariate_vil2], pid2)
    print("Chance for {} in Village 2: {}".format(covariate_vil2, chance))
    print("Observed for {} in Village 2: {}".format(covariate_vil2, actual), end="\n\n")
Chance for Sex in Village 2: 0.5005945303210464
Observed for Sex in Village 2: 0.5879556259904913
Chance for Caste in Village 2: 0.425368244800893
Observed for Caste in Village 2: 0.716323296354992
Chance for Religion in Village 2: 1.0
Observed for Religion in Village 2: 1.0
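One simple way to compare the two quantities is to subtract chance from observed homophily for each covariate. The sketch below does this with the values printed above, rounded to four decimals and typed in by hand; the difference is only a descriptive summary chosen here for illustration, not a statistical test.

```python
# (chance, observed) pairs copied from the printed output, rounded to 4 decimals
results = {
    ("Village 1", "Sex"): (0.5027, 0.6524),
    ("Village 1", "Caste"): (0.6741, 0.7866),
    ("Village 1", "Religion"): (0.9805, 0.9919),
    ("Village 2", "Sex"): (0.5006, 0.5880),
    ("Village 2", "Caste"): (0.4254, 0.7163),
    ("Village 2", "Religion"): (1.0, 1.0),
}

# Excess homophily: how far the observed value sits above the chance value
excess = {key: observed - chance for key, (chance, observed) in results.items()}

for (village, covariate), gap in excess.items():
    chance, observed = results[(village, covariate)]
    print(f"{village} {covariate:<8} chance={chance:.3f} "
          f"observed={observed:.3f} excess={gap:+.3f}")

largest = max(excess, key=excess.get)
print("Largest gap:", largest)
```

Laid out this way, caste in Village 2 shows the largest gap, while religion leaves almost no room for a gap because its chance homophily is already close to 1.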
A higher observed homophily value compared to the chance homophily value indicates that nodes with the same characteristic are connected more often than would be expected by random chance. In other words, connected nodes resemble each other in the studied characteristic more than the overall distribution of that characteristic alone would predict.
In the village setting, a higher observed homophily value implies that individuals within the village who share the same characteristics (such as religion or sex) are more likely to interact or be connected to each other.
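Religion: Observed homophily being higher suggests that individuals who share the same religious belief are more likely to have social interactions or connections within the village. This could reflect a strong sense of community or social bonds among people of the same religious group. The margin here is small, however: chance homophily is already about 0.98 in Village 1 (observed about 0.99), and in Village 2 both values are 1.0, so religion offers little room to detect a preference.
Sex: Higher observed homophily could indicate that individuals of the same gender tend to interact more frequently or form connections within the village. This could be due to shared activities, interests, or social norms that contribute to gender-based interactions.
Caste: Observed homophily exceeds chance in both villages, with the largest gap in Village 2 (about 0.72 observed versus about 0.43 by chance), indicating that ties form within caste groups more often than chance would predict.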
These results suggest that social interactions within the village are not purely random but are influenced by common characteristics. This information provides insights into social dynamics, community structures and potential factors shaping relationships within the village.