apparent.py
Query and interact with our US Physician Referral Network Datasette.
- class apparent.apparent.Apparent(base_url=None)
The Apparent class provides a comprehensive interface for analyzing physician referral networks from the US healthcare system. It integrates network building, feature computation, comparison, clustering, and visualization capabilities.
- Parameters:
base_url (str, optional) – The base URL for the Datasette instance. If not provided, will attempt to load from the APPARENT_URL environment variable.
- base_url
The base URL for the Datasette instance.
- Type:
str
- data
The fetched data from the Datasette.
- Type:
pd.DataFrame or None
- kilt
KILT object for handling pairwise distances (future use).
- Type:
object or None
- networks
Dictionary storing network graphs with (hsa, year) as keys.
- Type:
dict
- distances
Dictionary storing pairwise distance matrices for different measures.
- Type:
dict
- network_ids
List of (hsa, year) tuples identifying unique networks.
- Type:
list
- physician_interactions
DataFrame containing physician interaction data.
- Type:
pd.DataFrame
- builder
Instance for building network graphs.
- Type:
- comparator
Instance for comparing networks.
- Type:
- embedder
Instance for embedding networks.
- Type:
- clusterer
Instance for clustering networks.
- Type:
- embedding
Low-dimensional embedding of networks.
- Type:
np.ndarray
Examples
>>> import apparent
>>> app = apparent.Apparent()
>>>
>>> # Fetch data from a SQL query
>>> query = "SELECT * FROM physician_data WHERE year >= 2020"
>>> app.pull(query)
>>>
>>> # Build networks for each HSA and year
>>> app.build_networks()
>>>
>>> # Add network features
>>> app.add_features(node_features=["degree", "clustering"],
...                  edge_features=["forman_curvature"])
>>>
>>> # Compare networks and create embedding
>>> app.compare(measure="forman_curvature")
>>> app.embed()
>>>
>>> # Cluster networks
>>> app.cluster_networks()
>>>
>>> # Visualize the embedding
>>> app.plot_embedding()
- __init__(base_url=None)
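The base_url fallback described in the class parameters can be sketched as below. This is a minimal illustration, not the library's constructor: resolve_base_url is a hypothetical helper, and raising ValueError when neither the argument nor APPARENT_URL is set is an assumption.

```python
import os


def resolve_base_url(base_url=None, env=None):
    """Return the explicit base_url, else the APPARENT_URL environment variable.

    Hypothetical helper: the real constructor may behave differently when
    neither value is available (raising here is an assumption).
    """
    env = os.environ if env is None else env
    url = base_url or env.get("APPARENT_URL")
    if not url:
        raise ValueError("Pass base_url or set the APPARENT_URL environment variable")
    return url
```

An explicit argument takes precedence, so a script can point at a test instance without touching the environment.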
- pull(sql_query)
Execute a SQL query against the Datasette and store the results.
- Parameters:
sql_query (str) – SQL query string to execute, or path to a file containing the query. The query must include ‘hsa’ and ‘year’ columns for network identification.
- Returns:
Results are stored in self.data and self.network_ids attributes.
- Return type:
None
Notes
The query must return data with ‘hsa’ and ‘year’ columns, which are used to identify unique networks. The data is sorted by ‘hsa’ and ‘year’ after fetching.
Examples
>>> from apparent import Apparent
>>> app = Apparent()
>>> app.pull("SELECT * FROM physician_data WHERE year >= 2020")
>>> print(app.data.head())
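The two behaviours described for pull(), accepting either a query string or a file path and deriving sorted (hsa, year) identifiers, might look roughly like this. The helpers load_query and network_ids_from_rows are hypothetical, and rows are plain dictionaries standing in for the DataFrame the library actually stores.

```python
import os


def load_query(sql_query):
    """Return the SQL text, reading it from disk if sql_query is an existing file path."""
    if os.path.isfile(sql_query):
        with open(sql_query, encoding="utf-8") as handle:
            return handle.read()
    return sql_query


def network_ids_from_rows(rows):
    """Validate the required columns and return sorted, unique (hsa, year) tuples."""
    for row in rows:
        missing = {"hsa", "year"} - set(row)
        if missing:
            raise ValueError(f"Query result is missing columns: {sorted(missing)}")
    return sorted({(row["hsa"], row["year"]) for row in rows})
```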
- download_interactions()
Download physician interaction data for all network identifiers.
This method fetches detailed interaction data for each unique (hsa, year) combination identified in the pulled data. The interactions are downloaded in batches with a progress bar.
- Returns:
Results are stored in self.physician_interactions attribute.
- Return type:
None
- Raises:
ValueError – If no data is available (must call pull() first).
Examples
>>> from apparent import Apparent
>>> app = Apparent()
>>> app.pull("SELECT * FROM physician_data WHERE year >= 2020")
>>> app.download_interactions()
>>> print(f"Downloaded {len(app.physician_interactions)} interactions")
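Batched downloading as described above can be sketched in plain Python. The batch size, the fetch_batch callable, and the printed progress line are all assumptions made for illustration; the library's actual request logic and progress bar are not shown in this reference.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements each."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def download_all(network_ids, fetch_batch, batch_size=50):
    """Fetch interactions batch by batch and concatenate the results.

    fetch_batch is a hypothetical callable that takes a list of (hsa, year)
    tuples and returns a list of interaction records.
    """
    if not network_ids:
        raise ValueError("No data available; call pull() first")
    interactions = []
    for done, batch in enumerate(batched(network_ids, batch_size), start=1):
        interactions.extend(fetch_batch(batch))
        # A progress bar would normally report this; print keeps the sketch stdlib-only.
        print(f"batch {done}: {len(interactions)} interactions so far")
    return interactions
```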
- build_networks(build_method=None)
Build networks for each HSA (Health Service Area) in the fetched data.
For each unique combination of hsa and year, this method:
- Filters the physician interaction data
- Uses the selected build_method to create a network graph
- Stores the resulting graphs in a dictionary with keys as (hsa, year) tuples
- Parameters:
build_method (callable, optional) – A custom function that accepts the dataframe and additional keyword arguments and returns a NetworkX graph. If not provided, the default standard_build method is used.
- Returns:
Results are stored in self.networks attribute and self.data is updated with a ‘Networks’ column.
- Return type:
None
Notes
If physician interactions haven’t been downloaded yet, this method will automatically call download_interactions() first.
Examples
>>> from apparent import Apparent
>>> app = Apparent()
>>> app.pull("SELECT * FROM physician_data WHERE year >= 2020")
>>> app.build_networks()
>>> print(f"Built {len(app.networks)} networks")
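The group-then-build loop with a pluggable build_method can be illustrated without any dependencies. In this sketch the default builder returns a set of undirected edges as a stand-in for a NetworkX graph, and the column names source and target are assumptions; neither reflects the library's standard_build.

```python
from collections import defaultdict


def edge_list_build(rows, source="source", target="target"):
    """Stand-in build_method: return the set of undirected edges in rows.

    A real build_method would return a NetworkX graph; the column names
    'source' and 'target' are assumptions for illustration.
    """
    return {frozenset((row[source], row[target])) for row in rows}


def build_all(interactions, build_method=None, **kwargs):
    """Group interaction rows by (hsa, year) and build one network per group."""
    build = build_method or edge_list_build
    grouped = defaultdict(list)
    for row in interactions:
        grouped[(row["hsa"], row["year"])].append(row)
    return {key: build(grouped[key], **kwargs) for key in sorted(grouped)}
```

Passing a different callable as build_method swaps the construction step while the grouping and the (hsa, year) keys stay the same.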
- add_features(node_features=['degree_centrality'], edge_features=['forman_curvature'])
Add specified network features to the networks and update the data.
This method computes node and edge features for each network using the NetworkDescriber class and updates the network graphs with these features as attributes.
- Parameters:
node_features (list, optional) – List of node features to compute. Available options include: ‘degree’, ‘clustering’, ‘betweenness’, ‘closeness’, ‘pagerank’. Default is [‘degree_centrality’].
edge_features (list, optional) – List of edge features to compute. Available options include: ‘edge_betweenness’ and any curvature measure from scott.kilt. Default is [‘forman_curvature’].
- Returns:
Networks are updated in place and self.data is updated with a ‘Networks’ column containing the enhanced graphs.
- Return type:
None
- Raises:
ValueError – If no networks are available (must call build_networks() first).
Examples
>>> from apparent import Apparent
>>> app = Apparent()
>>> app.pull("SELECT * FROM physician_data WHERE year >= 2020")
>>> app.build_networks()
>>> app.add_features(node_features=["degree", "clustering"],
...                  edge_features=["forman_curvature", "edge_betweenness"])
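To make the default feature names concrete, here is a dependency-free sketch of degree centrality and of the simplest combinatorial Forman curvature for an unweighted graph, 4 - deg(u) - deg(v) per edge. Which Forman variant scott.kilt computes is not stated in this reference, so treat the formula as an assumption; the adjacency-dict representation is also just for illustration.

```python
def degree_centrality(adjacency):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adjacency)
    if n <= 1:
        # Convention chosen for this sketch only.
        return {node: 0.0 for node in adjacency}
    return {node: len(neighbors) / (n - 1) for node, neighbors in adjacency.items()}


def forman_curvature(adjacency):
    """Combinatorial Forman curvature 4 - deg(u) - deg(v) for every edge."""
    curvature = {}
    for u, neighbors in adjacency.items():
        for v in neighbors:
            edge = tuple(sorted((u, v)))  # store each undirected edge once
            curvature[edge] = 4 - len(adjacency[u]) - len(adjacency[v])
    return curvature


# Path graph a - b - c as an adjacency dict.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
```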
- compare(measure='forman_curvature', **kwargs) → float | array
Compare networks using curvature-based metrics.
This method computes pairwise distances between all networks using the specified measure and stores the resulting distance matrix.
- Parameters:
measure (str, optional) – The curvature measure to use for comparison. Default is ‘forman_curvature’. Must be a valid curvature measure from scott.kilt.
**kwargs (dict) – Additional keyword arguments passed to the NetworkComparator.
- Returns:
Symmetric pairwise distance matrix of shape (n_networks, n_networks). Also stored in self.distances[measure].
- Return type:
np.ndarray
Examples
>>> from apparent import Apparent
>>> app = Apparent()
>>> app.pull("SELECT * FROM physician_data WHERE year >= 2020")
>>> app.build_networks()
>>> app.add_features(edge_features=["forman_curvature"])
>>> distances = app.compare(measure="forman_curvature")
>>> print(f"Distance matrix shape: {distances.shape}")
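The shape of the result, a symmetric matrix with a zero diagonal, can be seen in a small sketch. The distance used here (absolute difference of mean edge curvature) is a deliberately simple stand-in chosen for illustration; it is not the measure NetworkComparator applies, which this reference does not describe.

```python
from statistics import mean


def curvature_distance(curv_a, curv_b):
    """Stand-in distance: absolute difference between mean edge curvatures."""
    return abs(mean(curv_a) - mean(curv_b))


def pairwise_distances(curvatures):
    """Return a symmetric n x n matrix (list of lists) with a zero diagonal."""
    n = len(curvatures)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = curvature_distance(curvatures[i], curvatures[j])
            matrix[i][j] = matrix[j][i] = d  # fill both triangles at once
    return matrix
```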
- embed(measure='forman_curvature')
Embed networks into a lower-dimensional space using t-SNE.
This method creates a 2D embedding of the networks based on their pairwise distances using the specified measure.
- Parameters:
measure (str, optional) – The curvature measure to use for embedding. Default is ‘forman_curvature’. Must match a measure used in compare().
- Returns:
Results are stored in self.embedding attribute.
- Return type:
None
Notes
If pairwise distances for the specified measure haven’t been computed yet, this method will automatically call compare() first.
Examples
>>> from apparent import Apparent
>>> app = Apparent()
>>> app.pull("SELECT * FROM physician_data WHERE year >= 2020")
>>> app.build_networks()
>>> app.add_features(edge_features=["forman_curvature"])
>>> app.embed(measure="forman_curvature")
>>> print(f"Embedding shape: {app.embedding.shape}")
- cluster_networks(measure='forman_curvature', clusterer=None)
Cluster networks based on their pairwise distances.
This method applies clustering to the networks using their pairwise distance matrix. If an embedding hasn’t been computed yet, it will be created first.
- Parameters:
measure (str, optional) – The curvature measure to use for clustering. Default is ‘forman_curvature’. Must match a measure used in compare().
clusterer (object, optional) – A clustering algorithm instance. If None, uses AgglomerativeClustering with 2 clusters.
- Returns:
Results are stored in self.clusterer attribute, with cluster labels available at self.clusterer.labels_.
- Return type:
None
Notes
If pairwise distances or embedding for the specified measure haven’t been computed yet, this method will automatically call compare() and embed() first.
Examples
>>> from apparent import Apparent
>>> app = Apparent()
>>> app.pull("SELECT * FROM physician_data WHERE year >= 2020")
>>> app.build_networks()
>>> app.add_features(edge_features=["forman_curvature"])
>>> app.cluster_networks(measure="forman_curvature")
>>> print(f"Cluster labels: {app.clusterer.labels_}")
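For intuition about the default two-cluster agglomerative step, the following pure-Python sketch merges the closest pair of clusters until the requested number remains. Average linkage and clustering directly on a precomputed distance matrix are assumptions made here; the reference does not say which linkage or input the default AgglomerativeClustering instance uses.

```python
def agglomerative_labels(dist, n_clusters=2):
    """Merge the two closest clusters (average linkage) until n_clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                avg = sum(dist[i][j] for i, j in pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    # Convert the list of member lists into one label per item.
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

The naive pair search is quadratic per merge, which is fine for a handful of networks but not a substitute for an optimized implementation.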
- plot_embedding()
Plot the 2D embedding of networks with cluster colors.
This method creates a scatter plot of the network embedding, with points colored by cluster membership if clustering has been performed.
- Returns:
Displays the plot using matplotlib.
- Return type:
None
Notes
If the embedding or the clustering hasn’t been computed yet, this method will automatically call embed() and cluster_networks() first.
Examples
>>> from apparent import Apparent
>>> app = Apparent()
>>> app.pull("SELECT * FROM physician_data WHERE year >= 2020")
>>> app.build_networks()
>>> app.add_features(edge_features=["forman_curvature"])
>>> app.plot_embedding()
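The Notes for embed(), cluster_networks(), and plot_embedding() describe a chain in which each step computes its missing prerequisites. A toy version of that pattern is sketched here; LazyPipeline and its placeholder values are hypothetical and only mirror the method names from this reference.

```python
class LazyPipeline:
    """Toy pipeline where each step computes its missing prerequisites first."""

    def __init__(self):
        self.distances = {}
        self.embedding = None
        self.labels = None
        self.calls = []  # records the order in which steps actually ran

    def compare(self, measure="forman_curvature"):
        self.calls.append("compare")
        self.distances[measure] = [[0.0]]  # placeholder distance matrix
        return self.distances[measure]

    def embed(self, measure="forman_curvature"):
        if measure not in self.distances:
            self.compare(measure)
        self.calls.append("embed")
        self.embedding = [(0.0, 0.0)]  # placeholder 2D coordinates

    def cluster_networks(self, measure="forman_curvature"):
        if self.embedding is None:
            self.embed(measure)
        self.calls.append("cluster_networks")
        self.labels = [0]  # placeholder cluster labels

    def plot_embedding(self):
        if self.labels is None:
            self.cluster_networks()
        self.calls.append("plot_embedding")
```

Calling plot_embedding() on a fresh instance runs compare, embed, and cluster_networks in that order, while a second call reuses the cached results.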