apparent.py¶

Query and interact with our US Physician Referral Network Datasette.

class apparent.apparent.Apparent(base_url=None)¶

Query and interact with our US Physician Referral Network Datasette.

The Apparent class provides a comprehensive interface for analyzing physician referral networks from the US healthcare system. It integrates network building, feature computation, comparison, clustering, and visualization capabilities.

Parameters:: base_url (str, optional) – The base URL for the Datasette instance. If not provided, will attempt to load from the APPARENT_URL environment variable.
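The fallback described for base_url can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: resolve_base_url is a hypothetical helper, the URLs are placeholders, and raising ValueError when neither source is set is an assumption.

```python
import os


def resolve_base_url(base_url=None, env_var="APPARENT_URL"):
    """Return the explicit base_url, else the value of the environment variable.

    Hypothetical helper for illustration. Raising ValueError when neither
    source is set is an assumption, not documented behavior.
    """
    if base_url is not None:
        return base_url
    url = os.environ.get(env_var)
    if not url:
        raise ValueError(f"No base URL given and {env_var} is not set")
    return url


os.environ["APPARENT_URL"] = "https://example.org/datasette"  # placeholder URL
print(resolve_base_url())                         # falls back to the environment
print(resolve_base_url("https://other.example"))  # explicit argument wins
```

Checking the explicit argument first keeps the environment variable as a default rather than an override, which matches the order given in the parameter description.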

base_url¶

The base URL for the Datasette instance.

Type:: str

data¶

The fetched data from the Datasette.

Type:: pd.DataFrame or None

kilt¶

KILT object for handling pairwise distances (future use).

Type:: object or None

networks¶

Dictionary storing network graphs with (hsa, year) as keys.

Type:: dict

distances¶

Dictionary storing pairwise distance matrices for different measures.

Type:: dict

network_ids¶

List of (hsa, year) tuples identifying unique networks.

Type:: list

physician_interactions¶

DataFrame containing physician interaction data.

Type:: pd.DataFrame

builder¶

Instance for building network graphs.

Type:: NetworkBuilder

comparator¶

Instance for comparing networks.

Type:: NetworkComparator

embedder¶

Instance for embedding networks.

Type:: NetworkEmbedder

clusterer¶

Instance for clustering networks.

Type:: NetworkClusterer

embedding¶

Low-dimensional embedding of networks.

Type:: np.ndarray

Examples

>>> import apparent
>>> app = apparent.Apparent()
>>>
>>> # Fetch data from a SQL query
>>> query = "SELECT * FROM physician_data WHERE year >= 2020"
>>> app.pull(query)
>>>
>>> # Build networks for each HSA and year
>>> app.build_networks()
>>>
>>> # Add network features
>>> app.add_features(node_features=["degree", "clustering"],
...                  edge_features=["forman_curvature"])
>>>
>>> # Compare networks and create embedding
>>> app.compare(measure="forman_curvature")
>>> app.embed()
>>>
>>> # Cluster networks
>>> app.cluster_networks()
>>>
>>> # Visualize the embedding
>>> app.plot_embedding()

__init__(base_url=None)¶

pull(sql_query)¶

Execute a SQL query against the Datasette and store the results.

Parameters:: sql_query (str) – SQL query string to execute, or path to a file containing the query. The query must include ‘hsa’ and ‘year’ columns for network identification.
Returns:: Results are stored in the self.data and self.network_ids attributes.
Return type:: None

Notes

The query must return data with ‘hsa’ and ‘year’ columns, which are used to identify unique networks. The data is sorted by ‘hsa’ and ‘year’ after fetching.
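The column requirement and sort order above can be illustrated without a Datasette connection. This is a minimal sketch, assuming rows arrive as dictionaries (the library stores a pd.DataFrame instead). The name derive_network_ids, the 'npi' key, and the ValueError for missing columns are all assumptions made for the example.

```python
def derive_network_ids(rows):
    """Sort rows by (hsa, year) and return them with the unique (hsa, year) pairs.

    Hypothetical helper: plain dicts stand in for DataFrame rows.
    """
    required = {"hsa", "year"}
    for row in rows:
        missing = required - row.keys()
        if missing:
            # Assumed behavior; the documentation only states the requirement.
            raise ValueError(f"Query result lacks columns: {sorted(missing)}")
    ordered = sorted(rows, key=lambda r: (r["hsa"], r["year"]))
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    network_ids = list(dict.fromkeys((r["hsa"], r["year"]) for r in ordered))
    return ordered, network_ids


rows = [
    {"hsa": 12, "year": 2021, "npi": "B"},  # 'npi' is a made-up extra column
    {"hsa": 5, "year": 2020, "npi": "A"},
    {"hsa": 12, "year": 2021, "npi": "C"},
]
ordered, network_ids = derive_network_ids(rows)
print(network_ids)  # [(5, 2020), (12, 2021)]
```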

Examples

>>> from apparent.apparent import Apparent
>>> app = Apparent()
>>> app.pull("SELECT * FROM physician_data WHERE year >= 2020")
>>> print(app.data.head())

download_interactions()¶

Download physician interaction data for all network identifiers.

This method fetches detailed interaction data for each unique (hsa, year) combination identified in the pulled data. The interactions are downloaded in batches with a progress bar.

Returns:: Results are stored in the self.physician_interactions attribute.
Return type:: None
Raises:: ValueError – If no data is available (must call pull() first).
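The batching described above might look roughly like this. It is a sketch under stated assumptions: fetch_batch is a hypothetical callable standing in for the actual Datasette request, the batch size of 50 is arbitrary, and a printed counter stands in for the progress bar.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def download_in_batches(network_ids, fetch_batch, batch_size=50):
    """Collect interaction rows for every (hsa, year) pair, one batch at a time.

    fetch_batch is a hypothetical callable that takes a list of (hsa, year)
    tuples and returns a list of rows; batch_size=50 is an arbitrary choice.
    """
    if not network_ids:
        # Mirrors the documented ValueError when pull() has not been called.
        raise ValueError("No data available; call pull() first")
    total = -(-len(network_ids) // batch_size)  # ceiling division
    interactions = []
    for i, batch in enumerate(batched(network_ids, batch_size), start=1):
        interactions.extend(fetch_batch(batch))
        print(f"batch {i}/{total} done")  # stand-in for a progress bar
    return interactions


def fake_fetch(batch):
    """Pretend download: one row per requested network."""
    return [{"hsa": hsa, "year": year} for hsa, year in batch]


rows = download_in_batches([(1, 2020), (1, 2021), (2, 2020)], fake_fetch, batch_size=2)
print(len(rows))  # 3
```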

Examples

>>> from apparent.apparent import Apparent
>>> app = Apparent()
>>> app.pull("SELECT * FROM physician_data WHERE year >= 2020")
>>> app.download_interactions()
>>> print(f"Downloaded {len(app.physician_interactions)} interactions")

build_networks(build_method=None)¶

Build networks for each HSA (Health Service Area) in the fetched data.

For each unique combination of hsa and year, this method:

- Filters the physician interaction data
- Uses the selected build_method to create a network graph
- Stores the resulting graph in a dictionary keyed by the (hsa, year) tuple

Parameters:: build_method (callable, optional) – A custom function that accepts the dataframe and additional keyword arguments and returns a NetworkX graph. If not provided, the default standard_build method is used.
Returns:: Results are stored in the self.networks attribute, and self.data is updated with a ‘Networks’ column.
Return type:: None

Notes

If physician interactions haven’t been downloaded yet, this method will automatically call download_interactions() first.
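The group-then-build flow, including a pluggable build method that receives extra keyword arguments, can be sketched without any dependencies. Everything here is illustrative: rows are plain dicts rather than a DataFrame, an adjacency dict stands in for a NetworkX graph, and the 'source' and 'target' keys, the directed option, and the function names are hypothetical.

```python
from collections import defaultdict


def adjacency_build(rows, directed=False):
    """Toy build method: return {node: set(neighbours)} from interaction rows.

    Stand-in for a function returning a NetworkX graph; 'source' and
    'target' are hypothetical column names.
    """
    graph = defaultdict(set)
    for row in rows:
        src, dst = row["source"], row["target"]
        graph[src].add(dst)
        if directed:
            graph.setdefault(dst, set())  # keep the target as a node
        else:
            graph[dst].add(src)
    return dict(graph)


def build_all(interactions, build_method=adjacency_build, **kwargs):
    """Group rows by (hsa, year) and apply build_method to each group."""
    groups = defaultdict(list)
    for row in interactions:
        groups[(row["hsa"], row["year"])].append(row)
    return {key: build_method(rows, **kwargs) for key, rows in sorted(groups.items())}


interactions = [
    {"hsa": 1, "year": 2020, "source": "a", "target": "b"},
    {"hsa": 1, "year": 2020, "source": "b", "target": "c"},
    {"hsa": 2, "year": 2020, "source": "x", "target": "y"},
]
networks = build_all(interactions)
print(sorted(networks))  # [(1, 2020), (2, 2020)]
```

Passing the keyword arguments straight through to the build method is what lets a custom builder expose its own options without changes to the grouping code.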

Examples

>>> from apparent.apparent import Apparent
>>> app = Apparent()
>>> app.pull("SELECT * FROM physician_data WHERE year >= 2020")
>>> app.build_networks()
>>> print(f"Built {len(app.networks)} networks")

add_features(node_features=['degree_centrality'], edge_features=['forman_curvature'])¶

Add specified network features to the networks and update the data.

This method computes node and edge features for each network using the NetworkDescriber class and updates the network graphs with these features as attributes.

Parameters:

node_features (list, optional) – List of node features to compute. Available options include: ‘degree’, ‘clustering’, ‘betweenness’, ‘closeness’, ‘pagerank’. Default is [‘degree_centrality’].
edge_features (list, optional) – List of edge features to compute. Available options include: ‘edge_betweenness’ and any curvature measure from scott.kilt. Default is [‘forman_curvature’].

Returns:

Networks are updated in place and self.data is updated with a ‘Networks’ column containing the enhanced graphs.

Return type:

None

Raises:

ValueError – If no networks are available (must call build_networks() first).
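To make the two default feature names concrete, here is a dependency-free sketch of one node feature and one edge feature. The definitions used by NetworkDescriber and scott.kilt are not given in this reference, so both functions are assumptions: degree centrality is taken as degree divided by (n - 1), and the curvature is the simplest combinatorial Forman form, 4 - deg(u) - deg(v), which ignores triangles and weights. Graphs are undirected adjacency dicts with sortable node labels.

```python
def degree_centrality(graph):
    """Degree of each node divided by (n - 1), the largest possible degree."""
    n = len(graph)
    if n <= 1:
        # Degenerate case; returning 0.0 is a choice made for this sketch.
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def forman_curvature(graph):
    """Combinatorial Forman curvature 4 - deg(u) - deg(v) for each edge.

    Simplest unweighted form; an assumption about what the measure computes.
    """
    curvature = {}
    for u, nbrs in graph.items():
        for v in nbrs:
            edge = tuple(sorted((u, v)))  # one entry per undirected edge
            curvature[edge] = 4 - len(graph[u]) - len(graph[v])
    return curvature


# Path graph a - b - c
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(degree_centrality(path))  # {'a': 0.5, 'b': 1.0, 'c': 0.5}
print(forman_curvature(path))   # {('a', 'b'): 1, ('b', 'c'): 1}
```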

Examples

>>> from apparent.apparent import Apparent
>>> app = Apparent()
>>> app.pull("SELECT * FROM physician_data WHERE year >= 2020")
>>> app.build_networks()
>>> app.add_features(node_features=["degree", "clustering"],
...                  edge_features=["forman_curvature", "edge_betweenness"])

compare(measure='forman_curvature', **kwargs) → float | array¶

Compare networks using curvature-based metrics.

This method computes pairwise distances between all networks using the specified measure and stores the resulting distance matrix.

Parameters:

measure (str, optional) – The curvature measure to use for comparison. Default is ‘forman_curvature’. Must be a valid curvature measure from scott.kilt.
**kwargs (dict) – Additional keyword arguments passed to the NetworkComparator.

Returns:

Symmetric pairwise distance matrix of shape (n_networks, n_networks). Also stored in self.distances[measure].

Return type:

np.ndarray
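The shape and symmetry of the result can be illustrated with a small stand-in. The distance NetworkComparator actually computes is not specified here, so mean_gap below is a deliberately simple placeholder (absolute difference of mean edge curvature), and nested lists stand in for the np.ndarray. Both function names are hypothetical.

```python
from statistics import mean


def mean_gap(values_a, values_b):
    """Placeholder distance: absolute difference of the two means."""
    return float(abs(mean(values_a) - mean(values_b)))


def pairwise_distances(curvatures, distance=mean_gap):
    """Return the sorted keys and a symmetric n x n matrix of distances.

    curvatures maps each (hsa, year) key to a list of edge curvature values.
    """
    keys = sorted(curvatures)
    n = len(keys)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = distance(curvatures[keys[i]], curvatures[keys[j]])
            matrix[i][j] = matrix[j][i] = d  # fill both triangles at once
    return keys, matrix


curvatures = {
    (1, 2020): [1, 1],
    (1, 2021): [0, 0, 0],
    (2, 2020): [-2, 0],
}
keys, matrix = pairwise_distances(curvatures)
print(keys)    # [(1, 2020), (1, 2021), (2, 2020)]
print(matrix)  # [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
```

Only the upper triangle is computed, so the number of distance evaluations is n(n - 1)/2 rather than n squared.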

Examples

>>> from apparent.apparent import Apparent
>>> app = Apparent()
>>> app.pull("SELECT * FROM physician_data WHERE year >= 2020")
>>> app.build_networks()
>>> app.add_features(edge_features=["forman_curvature"])
>>> distances = app.compare(measure="forman_curvature")
>>> print(f"Distance matrix shape: {distances.shape}")

embed(measure='forman_curvature')¶

Embed networks into a lower-dimensional space using t-SNE.

This method creates a 2D embedding of the networks based on their pairwise distances using the specified measure.

Parameters:: measure (str, optional) – The curvature measure to use for embedding. Default is ‘forman_curvature’. Must match a measure used in compare().
Returns:: Results are stored in the self.embedding attribute.
Return type:: None

Notes

If pairwise distances for the specified measure haven’t been computed yet, this method will automatically call compare() first.

Examples

>>> from apparent.apparent import Apparent
>>> app = Apparent()
>>> app.pull("SELECT * FROM physician_data WHERE year >= 2020")
>>> app.build_networks()
>>> app.add_features(edge_features=["forman_curvature"])
>>> app.embed(measure="forman_curvature")
>>> print(f"Embedding shape: {app.embedding.shape}")

cluster_networks(measure='forman_curvature', clusterer=None)¶

Cluster networks based on their pairwise distances.

This method applies clustering to the networks using their pairwise distance matrix. If an embedding hasn’t been computed yet, it will be created first.

Parameters:

measure (str, optional) – The curvature measure to use for clustering. Default is ‘forman_curvature’. Must match a measure used in compare().
clusterer (object, optional) – A clustering algorithm instance. If None, uses AgglomerativeClustering with 2 clusters.

Returns:

Results are stored in the self.clusterer attribute, with cluster labels available at self.clusterer.labels_.

Return type:

None

Notes

If pairwise distances or embedding for the specified measure haven’t been computed yet, this method will automatically call compare() and embed() first.

Examples

>>> from apparent.apparent import Apparent
>>> app = Apparent()
>>> app.pull("SELECT * FROM physician_data WHERE year >= 2020")
>>> app.build_networks()
>>> app.add_features(edge_features=["forman_curvature"])
>>> app.cluster_networks(measure="forman_curvature")
>>> print(f"Cluster labels: {app.clusterer.labels_}")

plot_embedding()¶

Plot the 2D embedding of networks with cluster colors.

This method creates a scatter plot of the network embedding, with points colored by cluster membership if clustering has been performed.

Returns:: Displays the plot using matplotlib.
Return type:: None

Notes

If embedding or clustering haven’t been computed yet, this method will automatically call embed() and cluster_networks() first.
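Several methods on this page compute missing prerequisites on demand. The toy class below sketches that pattern only; it is not the Apparent implementation. LazyPipeline, its placeholder values, and the calls list are all hypothetical, and the real methods do actual work where this one stores dummies.

```python
class LazyPipeline:
    """Toy class showing the 'compute prerequisites on demand' pattern."""

    def __init__(self):
        self.distances = {}
        self.embedding = None
        self.labels = None
        self.calls = []  # records the order in which steps actually ran

    def compare(self, measure="forman_curvature"):
        self.calls.append("compare")
        self.distances[measure] = [[0.0]]  # placeholder matrix
        return self.distances[measure]

    def embed(self, measure="forman_curvature"):
        if measure not in self.distances:
            self.compare(measure)
        self.calls.append("embed")
        self.embedding = [(0.0, 0.0)]  # placeholder coordinates

    def cluster_networks(self, measure="forman_curvature"):
        if self.embedding is None:
            self.embed(measure)
        self.calls.append("cluster_networks")
        self.labels = [0]  # placeholder labels

    def plot_embedding(self):
        if self.embedding is None:
            self.embed()
        if self.labels is None:
            self.cluster_networks()
        self.calls.append("plot_embedding")


pipeline = LazyPipeline()
pipeline.plot_embedding()
print(pipeline.calls)
# ['compare', 'embed', 'cluster_networks', 'plot_embedding']
```

Because each step checks stored state before recomputing, a second call to plot_embedding() in this sketch runs only the plotting step.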

Examples

>>> from apparent.apparent import Apparent
>>> app = Apparent()
>>> app.pull("SELECT * FROM physician_data WHERE year >= 2020")
>>> app.build_networks()
>>> app.add_features(edge_features=["forman_curvature"])
>>> app.plot_embedding()