Deep Dive into Network Analysis and Social Relationships


This case study examines complex systems found in both scientific and societal domains. Such systems consist of numerous interacting components and can be represented as networks, in which nodes stand for individual components and edges for the interactions between them.

The utility of network analysis extends to a diverse range of applications, including the examination of pathogen diffusion, behavioral dynamics, and information dissemination within social networks. Moreover, the scope of network analysis expands to encompass biological scenarios, particularly at the molecular level. This entails the exploration of gene regulation networks, signal transduction networks, protein interaction networks, and other connected systems.

Homophily is a network property in which adjacent nodes share a given attribute more often than non-adjacent nodes do. This case study examines homophily in several attributes among individuals connected in social networks in rural India.

Task Description


  • Analyze homophily within network structures of rural villages.

  • Investigate the impact of various characteristics (such as religion, sex, caste) on social interactions.

  • Utilize data analysis and network analysis techniques to quantify and compare observed homophily with chance homophily.

  • Explore the implications of homophily in the context of village communities.

Task 1


The goal is to calculate the chance homophily for a specific characteristic within the provided sample data.

  • Creating a function that takes in a dictionary and calculates the marginal probability of each characteristic, that is, its frequency of occurrence divided by the total frequency of all characteristics.

  • Creating a function that calculates the chance homophily by summing the squares of the marginal probabilities.
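Written as formulas, the two steps above amount to the following. The symbols n_c (the number of individuals with characteristic c), p_c and H_chance are notation introduced here for illustration; they do not appear in the original code.

```latex
p_c = \frac{n_c}{\sum_{c'} n_{c'}}, \qquad
H_{\text{chance}} = \sum_{c} p_c^{2}
```

For instance, three characteristics with marginal probabilities 0.4, 0.2 and 0.4 give 0.4^2 + 0.2^2 + 0.4^2 = 0.16 + 0.04 + 0.16 = 0.36.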

from collections import Counter
import numpy as np

Calculating Marginal Probability


This function takes a dictionary named chars as input, where keys are personal IDs and values are characteristics. It calculates and returns a dictionary where characteristics are keys and their marginal probability as corresponding values.

def marginal_prob(chars):
    """
    Calculate the marginal probability of characteristics in a network.

    Inputs:
        chars (dict): A dictionary consisting of IDs as keys and
        characteristics as values

    Returns:
        out_dict (dict): A dictionary where characteristics are keys
        and their marginal probabilities are values
    """

    # Count how many individuals carry each characteristic
    counts = Counter(chars.values())
    total = sum(counts.values())

    # Divide each count by the total number of individuals
    out_dict = {char: count / total for char, count in counts.items()}

    return out_dict

Calculating Chance Homophily


This function takes a dictionary named chars as input and uses the marginal_prob function to obtain the marginal probability of each characteristic. It returns the chance homophily, which is the sum of the squared marginal probabilities.

def chance_homophily(chars):
    """
    Calculate the chance homophily of a characteristic in a network.

    The chance homophily is the sum of the squared marginal
    probabilities of the characteristic's values.

    Inputs:
        chars (dict): A dictionary consisting of IDs as keys and
        characteristics as values

    Returns:
        float: Calculated chance homophily for the given characteristics
    """

    marginal_prob_dict = marginal_prob(chars)

    # Sum the squared marginal probabilities over all characteristics
    output_val = sum(prob ** 2 for prob in marginal_prob_dict.values())

    return output_val

favorite_food = {
    "Person A":  "burger",
    "Person B": "chicken wings",
    "Person C":   "milkshake",
    "Person D": "burger",
    "Person E": "milkshake"
}


print(marginal_prob(favorite_food))
food_homophily = chance_homophily(favorite_food)
print(food_homophily)

{'burger': 0.4, 'chicken wings': 0.2, 'milkshake': 0.4}
0.3600000000000001

import os
import pandas as pd

# Load the individual-level data from the working directory
relative_path = "./"
data_file = os.path.join(relative_path, "individual_characteristics.csv")
df = pd.read_csv(data_file, low_memory=False, index_col=0)
df.head()

     Unnamed: 1  village  adjmatrix_key  pid     hhid  resp_id  resp_gend  resp_status                  age  religion  ...  privategovt       work_outside  work_outside_freq  shgparticipate  shg_no  savings  savings_no  electioncard  rationcard  rationcard_colour
NaN  0           1        5              100201  1002  1        1          Head of Household            38   HINDUISM  ...  PRIVATE BUSINESS  Yes           0.0                No              NaN     No       NaN         Yes           Yes         GREEN
NaN  1           1        6              100202  1002  2        2          Spouse of Head of Household  27   HINDUISM  ...  NaN               NaN           NaN                No              NaN     No       NaN         Yes           Yes         GREEN
NaN  2           1        23             100601  1006  1        1          Head of Household            29   HINDUISM  ...  OTHER LAND        No            NaN                No              NaN     No       NaN         Yes           Yes         GREEN
NaN  3           1        24             100602  1006  2        2          Spouse of Head of Household  24   HINDUISM  ...  PRIVATE BUSINESS  No            NaN                Yes             1.0     Yes      1.0         Yes           No          NaN
NaN  4           1        27             100701  1007  1        1          Head of Household            58   HINDUISM  ...  OTHER LAND        No            NaN                No              NaN     No       NaN         Yes           Yes         GREEN

5 rows × 49 columns

Task 2


The individual_characteristics.csv file contains several characteristics for each individual in the dataset such as age, caste and religion.

  • Storing separate datasets for individuals belonging to Village 1 and Village 2.

# creating the dataset for village 1
df1 = df[df['village'] == 1]
# creating the dataset for village 2
df2 = df[df['village'] == 2]

Task 3


To help in data retrieval and analysis, dictionaries are created that map personal IDs to specific covariates for members of Village 1 and 2.

  • Defining dictionaries where personal IDs are the keys and the corresponding covariates (sex, caste, religion) are the values.

For village 1, these dictionaries are stored using the variable names sex1, caste1 and religion1. For village 2, these dictionaries are stored using the variable names sex2, caste2 and religion2.

sex1      = {}
caste1    = {}
religion1 = {}
for df1_ind in range(len(df1)):

    particular_person = df1.iloc[df1_ind,:]

    sex1[particular_person['pid']] = particular_person['resp_gend']
    caste1[particular_person['pid']] = particular_person['caste']
    religion1[particular_person['pid']] = particular_person['religion']

set(caste1.values())

{'GENERAL', 'OBC', 'SCHEDULED CASTE', 'SCHEDULED TRIBE'}

sex2 = {}
caste2 = {}
religion2 = {}
for df2_ind in range(len(df2)):

    particular_person = df2.iloc[df2_ind,:]

    sex2[particular_person['pid']] = particular_person['resp_gend']
    caste2[particular_person['pid']] = particular_person['caste']
    religion2[particular_person['pid']] = particular_person['religion']
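The row-by-row loops above can also be written as a single expression per dictionary. The sketch below uses a small made-up DataFrame (its IDs and values are illustrative, not taken from the dataset) and a hypothetical helper named column_to_dict. It assumes pid values are unique, as the loops above implicitly do.

```python
import pandas as pd

# Toy stand-in for df1; the values are illustrative only
toy = pd.DataFrame({
    "pid": [100201, 100202, 100601],
    "resp_gend": [1, 2, 1],
    "caste": ["OBC", "OBC", "GENERAL"],
})


def column_to_dict(frame, column, key="pid"):
    """Map each value in the `key` column to the matching value of `column`."""
    return dict(zip(frame[key], frame[column]))


toy_sex = column_to_dict(toy, "resp_gend")
toy_caste = column_to_dict(toy, "caste")
print(toy_sex)
print(toy_caste)
```

Pairing the two columns with zip avoids building a full row object for every individual, which tends to matter once the table grows.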

Task 4


Calculating the chance homophily for different characteristics within Villages 1 and 2.

  • Utilizing the chance_homophily function to calculate the chance homophily for the attributes of sex, caste and religion in both Village 1 and Village 2.

dict1 = {'Sex': sex1, 'Caste': caste1, 'Religion': religion1}

for covariate in dict1:

    chance = chance_homophily(dict1[covariate])

    print("Chance for {} in Village 1: {}".format(covariate,round(chance,3)))

Chance for Sex in Village 1: 0.503
Chance for Caste in Village 1: 0.674
Chance for Religion in Village 1: 0.98

dict2 = {'Sex': sex2, 'Caste': caste2, 'Religion': religion2}

for covariate in dict2:

    chance = chance_homophily(dict2[covariate])

    print("Chance for {} in Village 2: {}".format(covariate,round(chance,3)))

Chance for Sex in Village 2: 0.501
Chance for Caste in Village 2: 0.425
Chance for Religion in Village 2: 1.0
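A chance homophily of 1.0, as printed for religion in Village 2, is what the formula gives when every individual falls into a single category, while k equally common categories give the lower bound 1/k. The sketch below checks both extremes on made-up labels. chance_from_labels is a hypothetical stand-alone helper that mirrors chance_homophily but takes a plain list.

```python
from collections import Counter


def chance_from_labels(labels):
    """Sum of squared category shares for a list of labels."""
    counts = Counter(labels)
    total = len(labels)
    return sum((n / total) ** 2 for n in counts.values())


# Everyone in one category versus three equally common categories
single = chance_from_labels(["A"] * 6)
uniform = chance_from_labels(["A", "B", "C"] * 2)
print(single, uniform)
```

The closer an attribute is to a single dominant category, the higher its chance baseline and the less room observed homophily has to exceed it.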

Task 5


Measuring the homophily of characteristics within a given network.

  • Creating a function named homophily() that takes three inputs: a network G, a dictionary of node characteristics chars, and a dictionary IDs that maps each node to its personal ID

  • The function keeps track of two counters: num_ties to count the total number of ties between node pairs and num_same_ties to count the number of ties where nodes share the same characteristics

  • Calculating the ratio of num_same_ties to num_ties to determine the homophily of characteristics in network G

def homophily(G, chars, IDs):
    """
    Calculate the homophily of characteristics in a network.

    Inputs:
        G (networkx.Graph): The network graph
        chars (dict): A dictionary mapping personal IDs to characteristics
        IDs (dict): A dictionary mapping each node in the network to its
        personal ID

    Returns:
        float: The homophily of the network (0.0 if no ties can be evaluated)
    """

    # Initializing counters for same and total ties
    num_same_ties = 0
    num_ties = 0

    # Iterating through all edges in the network; every pair returned by
    # G.edges() is already connected, so no separate has_edge check is needed
    for node1, node2 in G.edges():

        # Check if both nodes have corresponding characteristics
        if IDs[node1] in chars and IDs[node2] in chars:

            # Increment total ties count
            num_ties += 1

            # Check if nodes share the same characteristic
            if chars[IDs[node1]] == chars[IDs[node2]]:

                # Increment same ties count
                num_same_ties += 1

    # Calculating and returning homophily as the ratio of same ties to total ties
    if num_ties == 0:
        return 0.0
    return num_same_ties / num_ties
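A toy example makes the counting concrete. The graph, IDs and characteristics below are invented for illustration, and tie_homophily is a compact stand-in for the function above so that the snippet runs on its own.

```python
import networkx as nx


def tie_homophily(G, chars, IDs):
    """Fraction of edges whose endpoints share a characteristic."""
    same = total = 0
    for u, v in G.edges():
        # Skip ties where either endpoint has no recorded characteristic
        if IDs[u] in chars and IDs[v] in chars:
            total += 1
            same += chars[IDs[u]] == chars[IDs[v]]
    return same / total if total else 0.0


# Four nodes in a path: 0 - 1 - 2 - 3
G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (2, 3)])

# Node label -> personal ID, and personal ID -> characteristic
IDs = {0: 101, 1: 102, 2: 103, 3: 104}
chars = {101: "A", 102: "A", 103: "B"}  # ID 104 has no recorded value

result = tie_homophily(G, chars, IDs)
print(result)
```

Of the three edges, one is skipped because ID 104 has no characteristic, and one of the remaining two joins matching nodes, so the ratio is one half.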

Task 6


Retrieving the personal IDs for villages 1 and 2.

  • Using the pd.read_csv function to read village1_id.csv and village2_id.csv and store their contents as pid1 and pid2. These dictionaries hold the personal IDs for Villages 1 and 2.

# loading ids for village 1
village1_file = os.path.join(relative_path, 'village1_id.csv')
pid1 = pd.read_csv(village1_file, dtype=int)['0'].to_dict()

# loading ids for village 2
village2_file = os.path.join(relative_path, 'village2_id.csv')
pid2 = pd.read_csv(village2_file, dtype=int)['0'].to_dict()
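Calling .to_dict() on the column yields a mapping from row position to personal ID, which is the form the IDs argument of homophily() expects. A minimal sketch with an in-memory CSV follows. The three IDs are placeholders, and the single column header "0" is assumed from the code above.

```python
import io

import pandas as pd

# Stand-in for village1_id.csv: one column whose header is the string "0"
csv_text = "0\n100201\n100202\n100601\n"

toy_pid = pd.read_csv(io.StringIO(csv_text), dtype=int)["0"].to_dict()
print(toy_pid)
```

The keys are the default integer row positions, so the lookup IDs[node] only works if graph nodes are labeled with the same positions.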

Task 7


Calculating the homophily of various network characteristics for Villages 1 and 2. The graph objects G1 and G2 represent the networks of these villages.

  • Utilizing the homophily() function to calculate the observed homophily for the characteristics of sex, caste and religion in Villages 1 and 2, printing all six resulting values

  • Employing the chance_homophily() function to calculate the chance homophily for the same characteristics

import networkx as nx

  • Reading adjacency matrices: The code below first reads the adjacency matrix of relationships for each village. These matrices record which pairs of villagers are connected

  • Converting to numpy arrays: The CSV data is converted into numpy arrays

  • Creating networkx graphs: The numpy arrays are then converted into networkx graph objects using the nx.to_networkx_graph() function. The function takes the adjacency matrix as input and constructs a graph in which nodes represent villagers and edges represent the connections recorded in the matrix

# Graph of village 1
village1_relations = os.path.join(relative_path, 'village1_relations.csv')
A1 = np.array(pd.read_csv(village1_relations, index_col=0))
G1 = nx.to_networkx_graph(A1)

# Graph of village 2
village2_relations = os.path.join(relative_path, 'village2_relations.csv')
A2 = np.array(pd.read_csv(village2_relations, index_col=0))
G2 = nx.to_networkx_graph(A2)
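To see what the conversion produces, the sketch below builds a graph from a small made-up adjacency matrix and lists its nodes and edges. It assumes a symmetric 0/1 matrix, a common way to store an undirected network; the real village matrices may differ.

```python
import networkx as nx
import numpy as np

# Made-up symmetric adjacency matrix for three villagers
A = np.array([
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
])

G = nx.to_networkx_graph(A)

# Nodes are labeled by row/column position
nodes = sorted(G.nodes())
edges = sorted(G.edges())
print(nodes, edges)
```

Because nodes are labeled by matrix position, a separate lookup such as pid1 is needed to translate them into personal IDs.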
dict1 = {'Sex': sex1, 'Caste': caste1, 'Religion': religion1}

for covariate_vil1 in dict1:

    chance = chance_homophily(dict1[covariate_vil1])
    actual = homophily(G1, dict1[covariate_vil1], pid1)

    print("Chance for {} in Village 1: {}".format(covariate_vil1,chance))
    print("Observed for {} in Village 1: {}".format(covariate_vil1,actual),end="\n\n")

Chance for Sex in Village 1: 0.5027299861680701
Observed for Sex in Village 1: 0.6524390243902439

Chance for Caste in Village 1: 0.6741488509791551
Observed for Caste in Village 1: 0.7865853658536586

Chance for Religion in Village 1: 0.9804896988521925
Observed for Religion in Village 1: 0.991869918699187

G2.remove_node(877)
dict2 = {'Sex': sex2, 'Caste': caste2, 'Religion': religion2}

for covariate_vil2 in dict2:

    chance = chance_homophily(dict2[covariate_vil2])
    actual = homophily(G2, dict2[covariate_vil2], pid2)

    print("Chance for {} in Village 2: {}".format(covariate_vil2,chance))
    print("Observed for {} in Village 2: {}".format(covariate_vil2,actual),end="\n\n")

Chance for Sex in Village 2: 0.5005945303210464
Observed for Sex in Village 2: 0.5879556259904913

Chance for Caste in Village 2: 0.425368244800893
Observed for Caste in Village 2: 0.716323296354992

Chance for Religion in Village 2: 1.0
Observed for Religion in Village 2: 1.0

A higher observed homophily value compared to the chance homophily value indicates that nodes with the same characteristic are connected more often than would be expected by random chance. In other words, connected nodes are more alike in the studied characteristic than randomly paired nodes would be.
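One simple way to make this comparison concrete is to subtract chance homophily from observed homophily for each attribute. The snippet below does so with the values printed above, rounded to four decimals. The variable names are illustrative, and the difference is a descriptive gap, not a significance test.

```python
# (chance, observed) pairs copied from the printed results, rounded
results = {
    ("Village 1", "Sex"): (0.5027, 0.6524),
    ("Village 1", "Caste"): (0.6741, 0.7866),
    ("Village 1", "Religion"): (0.9805, 0.9919),
    ("Village 2", "Sex"): (0.5006, 0.5880),
    ("Village 2", "Caste"): (0.4254, 0.7163),
    ("Village 2", "Religion"): (1.0, 1.0),
}

# Gap between observed homophily and the chance baseline
excess = {key: round(obs - chance, 4) for key, (chance, obs) in results.items()}

for (village, attribute), gap in excess.items():
    print(f"{village} {attribute}: {gap:+.4f}")
```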

In the village setting, a higher observed homophily value implies that individuals within the village who share the same characteristics (such as religion or sex) are more likely to interact or be connected to each other.

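  • Religion: In Village 1, observed homophily (0.992) is only slightly above chance (0.980), and in Village 2 both equal 1.0, so these data leave little room to detect a religious effect. Where observed homophily does exceed chance, it suggests that individuals who share the same religious belief are more likely to be connected, which could reflect social bonds within a religious group.

  • Sex: Observed homophily exceeds chance in both villages (0.652 versus 0.503 in Village 1, 0.588 versus 0.501 in Village 2), indicating that individuals of the same sex tend to be connected more often. This could be due to shared activities, interests, or social norms that contribute to gender-based interactions.

  • Caste: Observed homophily also exceeds chance in both villages (0.787 versus 0.674 in Village 1, 0.716 versus 0.425 in Village 2), with the largest gap of the three attributes appearing in Village 2.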

These results suggest that social interactions within the village are not purely random but are influenced by common characteristics. This information provides insight into social dynamics, community structures and potential factors shaping relationships within the village.