This project is a Proof of Concept (PoC) using Google Cloud's Healthcare Natural Language API. The goal is to test the API's ability to extract medical entities and relationships from unstructured clinical notes, leveraging pre-trained natural language models.
The Healthcare Natural Language API is part of the Google Cloud Healthcare API. It uses natural language processing (NLP) models to extract healthcare-related information from medical text.
The API can identify and extract:
- 🏥 Medical concepts such as medications, procedures, and health conditions.
- 📅 Functional attributes like temporal relationships, subjects, and certainty assessments.
- 🔗 Relationships between entities, such as side effects or drug dosages.
In this tutorial, we will focus on the following function:
- Entity Analysis: The analyzeEntities method inspects medical text to detect and return medical concepts and their relationships.
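As a rough illustration of what a call to analyzeEntities looks like, the sketch below builds the REST request using only the Python standard library. The function names `build_analyze_request` and `analyze_entities` are hypothetical and are not taken from this repository's google_api.py; the endpoint path and the `documentContent` field follow the public REST interface as understood at the time of writing, so check them against the current API reference before relying on them.

```python
import json
import urllib.request


def build_analyze_request(project_id: str, location: str, token: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the analyzeEntities method."""
    url = (
        "https://healthcare.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/services/nlp:analyzeEntities"
    )
    # The request body carries the raw clinical note as a single string.
    body = json.dumps({"documentContent": text}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json; charset=utf-8",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def analyze_entities(project_id: str, location: str, token: str, text: str) -> dict:
    """Send the request and return the parsed JSON response (performs a network call)."""
    request = build_analyze_request(project_id, location, token, text)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from the network call makes it possible to inspect or unit-test the URL and payload without valid credentials.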
Before running this PoC, ensure you have completed the following steps:
- ✅ Google Cloud account: You must have a Google Cloud account set up.
- 🌐 Enable APIs: Ensure the Cloud Healthcare API and Healthcare Natural Language API are enabled.
- 🛠️ Install Google Cloud CLI (gcloud): Download and install the Google Cloud CLI.
- 📄 Create a .env file: This file will store the necessary environment variables.
To configure the environment variables, create a .env file in your project directory and add the following content:
PROJECT_ID = "project_name"
LOCATION = "location_name"
TOKEN = "token_value"
Follow these steps to authenticate and get your access token:
- Authenticate with Google Cloud by running the following command in your terminal:
gcloud auth login
- Get the access token by running:
gcloud auth print-access-token
Copy the token value and paste it into the .env file under the TOKEN variable.
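For reference, the .env format shown above can be read with a few lines of standard-library Python. This is a minimal sketch: `parse_env` and `load_env_file` are hypothetical helpers, and the project itself may instead rely on a library such as python-dotenv.

```python
def parse_env(text: str) -> dict:
    """Parse KEY = "value" lines into a dict, ignoring blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional single or double quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: str = ".env") -> dict:
    """Read a .env file from disk and return its variables."""
    with open(path, encoding="utf-8") as handle:
        return parse_env(handle.read())
```

Tokens printed by gcloud auth print-access-token are short-lived, so the TOKEN value needs to be refreshed when requests start failing with an authentication error.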
This project includes an interactive Streamlit dashboard for visualizing and analyzing the Healthcare API results.
The dashboard provides three main sections:
- Compact metrics cards showing file info, total entities, unique types, and texts
- Interactive confidence filtering to filter entities by subject confidence
- Visual charts for entity type distribution, subject analysis, and temporal assessment
- Filterable data table with all entity details
- Export functionality to download results as CSV
- Advanced filtering by type, subject, and confidence levels
- Original medical note with color-coded entity highlighting
- Dynamic filtering to show/hide specific entity types
- Interactive legend showing colors for each entity type
- Real-time highlighting based on selected filters
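The color-coded highlighting described above can be sketched as a function that wraps each entity mention in a colored HTML tag. This is not the code from app.py: `highlight_entities` is a hypothetical name, the color map is illustrative, and the sketch assumes each mention carries `type` and `text` (with `content` and `beginOffset`) fields and that `beginOffset` counts characters.

```python
import html


def highlight_entities(text, mentions, colors, selected_types=None):
    """Return HTML where each selected entity mention is wrapped in a colored <mark>."""
    spans = []
    for mention in mentions:
        entity_type = mention.get("type", "")
        if selected_types is not None and entity_type not in selected_types:
            continue
        span = mention.get("text", {})
        # A missing beginOffset is treated as 0 (start of the note).
        start = span.get("beginOffset", 0)
        end = start + len(span.get("content", ""))
        spans.append((start, end, entity_type))
    spans.sort()

    parts = []
    cursor = 0
    for start, end, entity_type in spans:
        if start < cursor:
            continue  # Skip mentions that overlap an already highlighted span.
        parts.append(html.escape(text[cursor:start]))
        color = colors.get(entity_type, "#dddddd")
        parts.append(
            f'<mark style="background-color:{color}" title="{html.escape(entity_type)}">'
            f"{html.escape(text[start:end])}</mark>"
        )
        cursor = end
    parts.append(html.escape(text[cursor:]))
    return "".join(parts)
```

In a Streamlit app, the returned string could be rendered with st.markdown(..., unsafe_allow_html=True); passing the currently selected entity types as `selected_types` gives the show/hide behavior.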
The data/ folder includes:
- note_es.txt: Sample Spanish medical note (fictitious)
- entities_*.json: Example API response with extracted entities
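To give a sense of how an example response could feed the data table, confidence filter, and CSV export, here is a minimal sketch using only the standard library. The helper names are hypothetical and do not come from data_etl.py, and the field names (`entityMentions`, `mentionId`, `subject`, `temporalAssessment`, `certaintyAssessment`) are assumed from the API's response format, so verify them against the sample JSON.

```python
import csv
import io


def flatten_mentions(response: dict) -> list:
    """Turn the entityMentions list of an API response into flat table rows."""
    rows = []
    for mention in response.get("entityMentions", []):
        subject = mention.get("subject", {})
        rows.append({
            "mention_id": mention.get("mentionId", ""),
            "type": mention.get("type", ""),
            "text": mention.get("text", {}).get("content", ""),
            "subject": subject.get("value", ""),
            "subject_confidence": subject.get("confidence", 0.0),
            "temporal": mention.get("temporalAssessment", {}).get("value", ""),
            "certainty": mention.get("certaintyAssessment", {}).get("value", ""),
        })
    return rows


def filter_by_subject_confidence(rows: list, minimum: float) -> list:
    """Keep only rows whose subject confidence is at least `minimum`."""
    return [row for row in rows if row["subject_confidence"] >= minimum]


def rows_to_csv(rows: list) -> str:
    """Serialize rows to CSV text; returns an empty string when there are no rows."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

A response saved in the data/ folder can be loaded with json.load and passed to `flatten_mentions`; the resulting rows map directly onto a table widget or a CSV download.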
- Install dependencies:
cd streamlit
pip install -r requirements.txt
- Run the application:
streamlit run app.py
- Access the dashboard:
Open your browser and navigate to http://localhost:8501
streamlit/
├── app.py # Main Streamlit application
├── google_api.py # Google Healthcare API service
├── data_etl.py # Data transformation functions
├── requirements.txt # Python dependencies
└── README.md # Setup instructions
- Test entity extraction from clinical notes.
- Evaluate how well the API identifies medical concepts and maps relationships.
- Understand the output format and how to integrate the extracted data into a larger cloud architecture.
- Visualize and analyze results through an interactive dashboard.
- Set up Google Cloud Healthcare API.
- Enable the Healthcare Natural Language API.
- Implement entity analysis using the analyzeEntities method.
- Analyze the results and review entity mapping.
- Use the Streamlit dashboard to explore and visualize the extracted entities.



