Deep Dive into Network Analysis and Social Relationships

This case study centers on complex systems that arise in both scientific and societal domains. These systems consist of many interacting components and can be represented as networks, in which nodes stand for the individual components and edges stand for the interactions between them.

Network analysis has a diverse range of applications, including the study of pathogen spread, behavioral dynamics, and information dissemination in social networks. It also extends to biological systems, particularly at the molecular level, such as gene regulation networks, signal transduction networks, protein interaction networks, and other molecular networks.

Homophily is a property of networks in which adjacent nodes share a given attribute more often than nodes that are not neighbors. This case study examines whether individuals connected in social networks in rural India exhibit homophily with respect to several attributes.

Task Description

Analyze homophily within network structures of rural villages.
Investigate the impact of various characteristics (such as religion, sex, caste) on social interactions.
Utilize data analysis and network analysis techniques to quantify and compare observed homophily with chance homophily.
Explore the implications of homophily in the context of village communities.

Task 1

The goal is to calculate the chance homophily for a specific characteristic within the provided sample data.

Creating a function that takes in a dictionary and calculates the marginal probability of each characteristic, which is the frequency of occurrence of that characteristic divided by the sum of the frequencies of all characteristics.
Creating a function that calculates the chance homophily by summing the squares of the marginal probabilities.
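In symbols (the names $n_k$, $N$, $p_k$ and $H_{\text{chance}}$ are introduced here for illustration): if $n_k$ of the $N$ individuals have characteristic $k$, the two functions compute the quantities below. The sum of squares is the "chance" value because, if two individuals are picked independently at random, both have characteristic $k$ with probability $p_k \cdot p_k$, and adding these over all $k$ gives the probability that the two match.

```latex
p_k = \frac{n_k}{N}, \qquad H_{\text{chance}} = \sum_k p_k^2

% Example: 5 people whose favorite foods are burger (2), chicken wings (1), milkshake (2)
H_{\text{chance}} = \left(\tfrac{2}{5}\right)^2 + \left(\tfrac{1}{5}\right)^2 + \left(\tfrac{2}{5}\right)^2
                  = 0.16 + 0.04 + 0.16 = 0.36
```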

from collections import Counter
import numpy as np

Calculating Marginal Probability

This function takes a dictionary named chars as input, where keys are personal IDs and values are characteristics. It returns a dictionary with characteristics as keys and their marginal probabilities as values.

def marginal_prob(chars):
    """
    Calculate the marginal probability of characteristics in a network.

    Inputs:
        chars (dict): A dictionary consisting of IDs as keys and
        characteristics as values

    Returns:
        out_dict (dict): A dictionary where characteristics are keys
        and their marginal probabilities are values
    """

    # Count how many individuals have each characteristic
    counts = Counter(chars.values())
    total = len(chars)

    # Divide each count by the number of individuals
    out_dict = {char: count / total for char, count in counts.items()}

    return out_dict

Calculating Chance Homophily

This function takes a dictionary named chars as input and uses the marginal_prob function to calculate the marginal probability of each characteristic. It returns the chance homophily: the sum of the squared marginal probabilities, a single value for the whole dictionary.

def chance_homophily(chars):
    """
    Calculate the chance homophily of a characteristic in a network.

    Chance homophily is the sum of the squared marginal probabilities
    of the characteristic's values.

    Inputs:
        chars (dict): A dictionary consisting of IDs as keys and
        characteristics as values

    Returns:
        float: The chance homophily for the given characteristic
    """

    marginal_prob_dict = marginal_prob(chars)

    # Sum the squared marginal probabilities
    output_val = 0

    for char in marginal_prob_dict:

        output_val += marginal_prob_dict[char] ** 2

    return output_val

favorite_food = {
    "Person A": "burger",
    "Person B": "chicken wings",
    "Person C": "milkshake",
    "Person D": "burger",
    "Person E": "milkshake",
}


print(marginal_prob(favorite_food))
food_homophily = chance_homophily(favorite_food)
print(food_homophily)

{'burger': 0.4, 'chicken wings': 0.2, 'milkshake': 0.4}
0.3600000000000001
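To see why the sum of squares is the "chance" value, the sketch below counts, with exact fractions, how many ordered pairs of people in favorite_food share a food. The helper names are hypothetical. Counting all ordered pairs, including a person paired with themselves (sampling with replacement), reproduces the sum of squared marginal probabilities exactly, 9/25 = 0.36; the trailing digits in the printed 0.3600000000000001 are ordinary floating-point rounding. Excluding self-pairs gives a different value, so the formula corresponds to drawing with replacement.

```python
from fractions import Fraction
from itertools import product

favorite_food = {
    "Person A": "burger",
    "Person B": "chicken wings",
    "Person C": "milkshake",
    "Person D": "burger",
    "Person E": "milkshake",
}


def match_rate_with_replacement(chars):
    """Fraction of ordered pairs (i, j), including i == j, that share a characteristic."""
    values = list(chars.values())
    pairs = list(product(values, repeat=2))
    same = sum(1 for a, b in pairs if a == b)
    return Fraction(same, len(pairs))


def match_rate_without_replacement(chars):
    """Fraction of ordered pairs (i, j) with i != j that share a characteristic."""
    values = list(chars.values())
    pairs = [(a, b) for i, a in enumerate(values)
             for j, b in enumerate(values) if i != j]
    same = sum(1 for a, b in pairs if a == b)
    return Fraction(same, len(pairs))


print(match_rate_with_replacement(favorite_food))     # 9/25, i.e. 0.36
print(match_rate_without_replacement(favorite_food))  # 1/5
```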

import os

relative_path = "./"

data_file = os.path.join(relative_path, "individual_characteristics.csv")

import pandas as pd

df = pd.read_csv(data_file, low_memory=False, index_col=0)

df.head()

	Unnamed: 1	village	adjmatrix_key	pid	hhid	resp_id	resp_gend	resp_status	age	religion	...	privategovt	work_outside	work_outside_freq	shgparticipate	shg_no	savings	savings_no	electioncard	rationcard	rationcard_colour
NaN	0	1	5	100201	1002	1	1	Head of Household	38	HINDUISM	...	PRIVATE BUSINESS	Yes	0.0	No	NaN	No	NaN	Yes	Yes	GREEN
NaN	1	1	6	100202	1002	2	2	Spouse of Head of Household	27	HINDUISM	...	NaN	NaN	NaN	No	NaN	No	NaN	Yes	Yes	GREEN
NaN	2	1	23	100601	1006	1	1	Head of Household	29	HINDUISM	...	OTHER LAND	No	NaN	No	NaN	No	NaN	Yes	Yes	GREEN
NaN	3	1	24	100602	1006	2	2	Spouse of Head of Household	24	HINDUISM	...	PRIVATE BUSINESS	No	NaN	Yes	1.0	Yes	1.0	Yes	No	NaN
NaN	4	1	27	100701	1007	1	1	Head of Household	58	HINDUISM	...	OTHER LAND	No	NaN	No	NaN	No	NaN	Yes	Yes	GREEN

5 rows × 49 columns

Task 2

The individual_characteristics.csv file contains several characteristics for each individual in the dataset, such as age, caste and religion.

Storing separate datasets for individuals belonging to Village 1 and Village 2.

# creating the dataset for village 1
df1 = df[df['village'] == 1]

# creating the dataset for village 2
df2 = df[df['village'] == 2]
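With more than two villages, filtering one village at a time gets repetitive. A sketch using DataFrame.groupby builds every per-village DataFrame in one pass; the toy DataFrame stands in for df, mimics only its village and pid columns, and its values are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows; only the 'village' and 'pid' columns of df are mimicked
toy_df = pd.DataFrame({
    "village": [1, 1, 2, 2, 3],
    "pid": [100201, 100202, 200101, 200102, 300101],
})

# One DataFrame per village, keyed by village number
villages = {village: group for village, group in toy_df.groupby("village")}

print(len(villages))                # 3
print(villages[2]["pid"].tolist())  # [200101, 200102]
```

Each group keeps the original row index and order, so on the real data villages[1] and villages[2] should match df1 and df2.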

Task 3

To help in data retrieval and analysis, dictionaries are created that map personal IDs to specific covariates for members of Village 1 and 2.

Defining dictionaries where personal IDs are the keys and the corresponding covariates (sex, caste, religion) are the values.

For village 1, these dictionaries are stored using the variable names sex1, caste1 and religion1. For village 2, these dictionaries are stored using the variable names sex2, caste2 and religion2.

sex1      = {}
caste1    = {}
religion1 = {}

for df1_ind in range(len(df1)):

    particular_person = df1.iloc[df1_ind,:]

    sex1[particular_person['pid']] = particular_person['resp_gend']
    caste1[particular_person['pid']] = particular_person['caste']
    religion1[particular_person['pid']] = particular_person['religion']

set(caste1.values())

{'GENERAL', 'OBC', 'SCHEDULED CASTE', 'SCHEDULED TRIBE'}

sex2 = {}
caste2 = {}
religion2 = {}

for df2_ind in range(len(df2)):

    particular_person = df2.iloc[df2_ind,:]

    sex2[particular_person['pid']] = particular_person['resp_gend']
    caste2[particular_person['pid']] = particular_person['caste']
    religion2[particular_person['pid']] = particular_person['religion']
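The two loops above build a row Series with iloc for every individual. A column-based sketch with zip produces the same kind of pid-to-value dictionaries and works for any village; the helper covariate_dicts and the toy rows are hypothetical, reusing the column names of individual_characteristics.csv.

```python
import pandas as pd


def covariate_dicts(village_df, columns=("resp_gend", "caste", "religion")):
    """Map personal ID (pid) -> value for each requested column of one village."""
    pids = village_df["pid"].tolist()
    return {column: dict(zip(pids, village_df[column].tolist())) for column in columns}


# Hypothetical toy rows using the column names of individual_characteristics.csv
toy_village = pd.DataFrame({
    "pid": [100201, 100202, 100601],
    "resp_gend": [1, 2, 1],
    "caste": ["OBC", "OBC", "GENERAL"],
    "religion": ["HINDUISM", "HINDUISM", "HINDUISM"],
})

dicts = covariate_dicts(toy_village)
print(dicts["caste"])  # {100201: 'OBC', 100202: 'OBC', 100601: 'GENERAL'}
```

Applied to the real data, covariate_dicts(df1)["resp_gend"] should give the same mapping as sex1, and likewise for caste and religion.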

Task 4

Calculating the chance homophily for different characteristics within Villages 1 and 2.

Utilizing the chance_homophily function to calculate the chance homophily for the attributes of sex, caste and religion in both Village 1 and Village 2.

dict1 = {'Sex': sex1, 'Caste': caste1, 'Religion': religion1}

for covariate in dict1:

    chance = chance_homophily(dict1[covariate])

    print("Chance for {} in Village 1: {}".format(covariate,round(chance,3)))

Chance for Sex in Village 1: 0.503
Chance for Caste in Village 1: 0.674
Chance for Religion in Village 1: 0.98

dict2 = {'Sex': sex2, 'Caste': caste2, 'Religion': religion2}

for covariate in dict2:

    chance = chance_homophily(dict2[covariate])

    print("Chance for {} in Village 2: {}".format(covariate,round(chance,3)))

Chance for Sex in Village 2: 0.501
Chance for Caste in Village 2: 0.425
Chance for Religion in Village 2: 1.0

Task 5

Measuring the homophily of characteristics within a given network.

Creating a function named homophily() that takes three inputs: a network G, a dictionary chars mapping personal IDs to characteristics, and a dictionary IDs mapping each node in G to a personal ID
The function keeps track of two counters: num_ties counts the ties whose two nodes both have a known characteristic, and num_same_ties counts the ties whose two nodes share the same characteristic
Calculating the ratio of num_same_ties to num_ties to determine the observed homophily of the characteristic in network G

def homophily(G, chars, IDs):
    """
    Calculate the homophily of characteristics in a network.

    Inputs:
        G (networkx.Graph): The network graph
        chars (dict): A dictionary mapping personal IDs to characteristics
        IDs (dict): A dictionary mapping each node in G to a personal ID

    Returns:
        float: The homophily of the network (0.0 if no tie qualifies)
    """

    # Initializing counters for same and total ties
    num_same_ties = 0
    num_ties = 0

    # Iterating through all edges in the network; each undirected
    # edge is visited once, so no separate has_edge() check is needed
    for node1, node2 in G.edges():

        # Only count ties where both nodes have a known characteristic
        if IDs[node1] in chars and IDs[node2] in chars:

            # Increment total ties count
            num_ties += 1

            # Check if the nodes share the same characteristic
            if chars[IDs[node1]] == chars[IDs[node2]]:

                # Increment same ties count
                num_same_ties += 1

    # Calculating and returning homophily as the ratio of same ties to total ties
    if num_ties == 0:
        return 0.0
    return num_same_ties / num_ties
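The same ratio can be computed without networkx, from a plain list of edges. The sketch below is a hypothetical variant of homophily(), not the original function: it uses dict.get so that a tie involving a node without a personal ID is skipped rather than raising a KeyError. The toy network is invented for illustration.

```python
def edge_list_homophily(edges, chars, ids):
    """Share of ties whose two endpoints have the same characteristic.

    edges: iterable of (node1, node2) pairs, each undirected tie listed once
    chars: dict mapping personal ID -> characteristic
    ids:   dict mapping node label -> personal ID (assumes no ID is None)
    """
    num_ties = 0
    num_same_ties = 0
    for node1, node2 in edges:
        # dict.get returns None for a node without an ID, so such ties are skipped
        pid_a, pid_b = ids.get(node1), ids.get(node2)
        if pid_a in chars and pid_b in chars:
            num_ties += 1
            if chars[pid_a] == chars[pid_b]:
                num_same_ties += 1
    return num_same_ties / num_ties if num_ties else 0.0


# Hypothetical toy network: nodes 0-4, where node 4 has no personal ID
ids = {0: "p0", 1: "p1", 2: "p2", 3: "p3"}
sex = {"p0": 1, "p1": 1, "p2": 2, "p3": 2}
edges = [(0, 1), (2, 3), (1, 2), (1, 4)]

print(round(edge_list_homophily(edges, sex, ids), 3))  # 0.667
```

Two of the three countable ties join people of the same sex, so the observed homophily is 2/3, above the chance homophily of 0.5 for a 50/50 split.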

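Task 6

Retrieving the personal IDs for villages 1 and 2.

Using the pd.read_csv function to read the contents of village1_id.csv and village2_id.csv and store them as pid1 and pid2. Calling .to_dict() on the ID column turns it into a dictionary that maps each row position (0, 1, 2, ...) to a personal ID. These positions match the node labels of the village graphs built in Task 7, which networkx numbers by row of the adjacency matrix, provided the ID files list villagers in the same order as the adjacency matrices.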

# loading ids for village 1
village1_file = os.path.join(relative_path, 'village1_id.csv')
pid1 = pd.read_csv(village1_file, dtype=int)['0'].to_dict()

# loading ids for village 2
village2_file = os.path.join(relative_path, 'village2_id.csv')
pid2 = pd.read_csv(village2_file, dtype=int)['0'].to_dict()

Task 7

Calculating the homophily of various network characteristics for Villages 1 and 2. The graph objects G1 and G2 represent the networks of these villages.

Utilizing the homophily() function to calculate the observed homophily for the characteristics of sex, caste and religion in Villages 1 and 2, printing all six resulting values
Employing the chance_homophily() function to calculate the chance homophily for the same characteristics

import networkx as nx 

Reading adjacency matrices: The code below first reads the adjacency matrix of social relationships for each village from village1_relations.csv and village2_relations.csv. A nonzero entry in row i, column j records a tie between villagers i and j
Converting to NumPy arrays: Converting the CSV data into NumPy arrays
Creating networkx graphs: The arrays are converted into networkx graph objects with nx.to_networkx_graph(). Nodes are labelled 0, 1, ..., n-1 by row position and represent villagers; edges represent the social ties recorded in the adjacency matrix. The characteristics (sex, caste, religion) are not encoded in the graph; homophily() compares them across each edge
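Turning an adjacency matrix into a graph amounts to reading off its nonzero entries. The pure-Python sketch below (a hypothetical helper, not part of networkx) lists the undirected edges of a small symmetric 0/1 matrix, using row positions as node labels just as nx.to_networkx_graph() does for a NumPy array, and then translates them to personal IDs with a position-to-ID dictionary shaped like pid1. Unlike networkx, it reads only the upper triangle and ignores the diagonal, so it assumes a symmetric matrix and drops self-loops.

```python
def adjacency_to_edges(matrix):
    """Return the undirected edges (i, j), i < j, of a symmetric 0/1 adjacency matrix.

    Node labels are the row/column positions 0, 1, ..., n-1.
    """
    n = len(matrix)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if matrix[i][j]]


A = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]
edges = adjacency_to_edges(A)
print(edges)  # [(0, 1), (0, 2), (2, 3)]

# Hypothetical row order of personal IDs, shaped like pid1: row position -> pid
ids = dict(enumerate([100201, 100202, 100601, 100602]))
pid_edges = [(ids[i], ids[j]) for i, j in edges]
print(pid_edges)
```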

# Graph of village 1
village1_relations = os.path.join(relative_path, 'village1_relations.csv')
A1 = np.array(pd.read_csv(village1_relations, index_col=0))
G1 = nx.to_networkx_graph(A1)

# Graph of village 2
village2_relations = os.path.join(relative_path, 'village2_relations.csv')
A2 = np.array(pd.read_csv(village2_relations, index_col=0))
G2 = nx.to_networkx_graph(A2)

dict1 = {'Sex': sex1, 'Caste': caste1, 'Religion': religion1}

for covariate_vil1 in dict1:

    chance = chance_homophily(dict1[covariate_vil1])
    actual = homophily(G1, dict1[covariate_vil1], pid1)

    print("Chance for {} in Village 1: {}".format(covariate_vil1,chance))
    print("Observed for {} in Village 1: {}".format(covariate_vil1,actual),end="\n\n")

Chance for Sex in Village 1: 0.5027299861680701
Observed for Sex in Village 1: 0.6524390243902439

Chance for Caste in Village 1: 0.6741488509791551
Observed for Caste in Village 1: 0.7865853658536586

Chance for Religion in Village 1: 0.9804896988521925
Observed for Religion in Village 1: 0.991869918699187

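Node 877 is removed from G2 before the Village 2 analysis. The original does not say why; a plausible reason is that this node has no entry in pid2, in which case homophily() would raise a KeyError when it evaluates IDs[877].

G2.remove_node(877)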
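A quick way to find nodes that would make homophily() fail is to list the graph nodes that have no personal ID. The helper below is hypothetical and accepts any iterable of node labels, so with networkx it could be called as nodes_without_id(G2.nodes, pid2); running it before node 877 is removed would show whether that node lacks an ID.

```python
def nodes_without_id(nodes, ids):
    """Return, sorted, the node labels that have no entry in the node -> personal-ID dict."""
    return sorted(node for node in nodes if node not in ids)


# Hypothetical example: five nodes, but IDs are only known for the first four
example_ids = {0: 100201, 1: 100202, 2: 100601, 3: 100602}
print(nodes_without_id(range(5), example_ids))  # [4]
```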

dict2 = {'Sex': sex2, 'Caste': caste2, 'Religion': religion2}

for covariate_vil2 in dict2:

    chance = chance_homophily(dict2[covariate_vil2])
    actual = homophily(G2, dict2[covariate_vil2], pid2)

    print("Chance for {} in Village 2: {}".format(covariate_vil2,chance))
    print("Observed for {} in Village 2: {}".format(covariate_vil2,actual),end="\n\n")

Chance for Sex in Village 2: 0.5005945303210464
Observed for Sex in Village 2: 0.5879556259904913

Chance for Caste in Village 2: 0.425368244800893
Observed for Caste in Village 2: 0.716323296354992

Chance for Religion in Village 2: 1.0
Observed for Religion in Village 2: 1.0

An observed homophily value higher than the chance homophily value indicates that nodes with the same characteristic are connected more often than would be expected if ties formed at random with respect to that characteristic. No statistical test is performed here, so the comparison shows the size of the difference, not whether it is statistically significant.
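One way to compare the six pairs of values at a glance is to tabulate the gap between observed and chance homophily. The sketch below copies the values printed above and sorts them by gap; the helper homophily_gaps is hypothetical, and the gap is a simple difference, not a statistical test. A gap of zero can also mean there is no variation to measure, as with religion in Village 2, where both values are 1.0.

```python
# Chance and observed homophily copied from the printed results above
RESULTS = {
    ("Village 1", "Sex"): (0.5027299861680701, 0.6524390243902439),
    ("Village 1", "Caste"): (0.6741488509791551, 0.7865853658536586),
    ("Village 1", "Religion"): (0.9804896988521925, 0.991869918699187),
    ("Village 2", "Sex"): (0.5005945303210464, 0.5879556259904913),
    ("Village 2", "Caste"): (0.425368244800893, 0.716323296354992),
    ("Village 2", "Religion"): (1.0, 1.0),
}


def homophily_gaps(results):
    """Return (village, covariate, observed - chance) tuples, largest gap first."""
    gaps = [(village, covariate, observed - chance)
            for (village, covariate), (chance, observed) in results.items()]
    return sorted(gaps, key=lambda row: row[2], reverse=True)


for village, covariate, gap in homophily_gaps(RESULTS):
    print(f"{village:<10} {covariate:<9} gap = {gap:+.3f}")
```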

In the village setting, a higher observed homophily means that villagers who share a characteristic are more likely to be connected to one another than a random pairing would predict.

Religion: In Village 1 the observed value is only slightly above chance (0.992 vs. 0.980), because almost everyone shares the same religion and chance homophily is already close to 1. In Village 2 both values are 1.0: every individual in the data has the same recorded religion, so the data cannot reveal religious homophily there.
Sex: Observed homophily is higher than chance in both villages (0.652 vs. 0.503 in Village 1, 0.588 vs. 0.501 in Village 2). Individuals of the same sex tend to be connected more often, which could reflect shared activities, interests, or social norms.
Caste: Observed homophily exceeds chance in both villages (0.787 vs. 0.674 in Village 1, 0.716 vs. 0.425 in Village 2), and the Village 2 gap is the largest of all six comparisons. Social ties appear to form preferentially within caste groups.

These results suggest that social interactions within the villages are not purely random but are shaped by shared characteristics, most clearly caste and sex. This offers insight into social dynamics, community structures and potential factors shaping relationships within the villages.
