The Spezi Data Pipeline offers a comprehensive suite of tools designed to facilitate the management, analysis, and visualization of healthcare data from Firebase Firestore. By adhering to the Fast Healthcare Interoperability Resources (FHIR) standards, this platform ensures that data handling remains robust, standardized, and interoperable across different systems and software.
The Spezi Data Pipeline is engineered to improve workflows associated with data accessibility and analysis in healthcare environments. It supports 82 HKQuantityTypes and ECG data recordings and is capable of performing functions such as selection, storage, downloading, basic filtering, statistical analysis, and graphical representation of data. By facilitating the structured export of data from Firebase and incorporating FHIR standards, the pipeline enhances interoperability and streamlines data operations.
The Spezi Data Pipeline is organized into several directories, each serving a specific function as part of the overall application. This guide will walk you through the package structure, highlighting the key components and their usage based on your needs and challenges.
data_access/
FirebaseFHIRAccess
ResourceCreator
data_flattening/
ResourceFlattener
data_processing/
FHIRDataProcessor
CodeProcessor
data_exploration/
DataExplorer
ECGExplorer
data_export/
DataExporter
FirebaseFHIRAccess
to connect and fetch data.ResourceCreator
and its subclasses to convert Firestore documents to FHIR resources.ResourceFlattener
and its specific implementations to transform data into flat DataFrames
.DataExplorer
and ECGExplorer
, and QuestionnaireResponseExplorer
to create visualizations and explore your data.DataExporter
to save processed data and plots.Required Python packages are included in the requirements.txt file and are outlined in the list below:
You can install all required external packages using pip by running the following command in your terminal:
pip install -r requirements.txt
To interact with Firebase services like Firestore or the Realtime Database, ensure your Firebase project is configured correctly and possesses the necessary credentials file (usually a .JSON file).
Visit the “Project settings” in your Firebase project, navigate to the “Service accounts” tab, and generate a new private key by clicking on “Generate new private key.” Upon confirmation, the key will be downloaded to your system.
This .JSON file contains your service account credentials and is used to authenticate your application with Firebase.
# Path to the Firebase service account key file
serviceAccountKey_file = "path/to/your/serviceAccountKey.json"
# Firebase project ID
project_id = "projectId"
# Collection details within Firebase Firestore. Replace with the collection names in your project.
collection_name = "users"
subcollection_name = "HealthKit"
[!NOTE]
- Replace “path/to/your/serviceAccountKey.json” with the actual path to the .JSON file you downloaded earlier.
- The “projectId” is your Firebase project ID, which you can find in your Firebase project settings.
# Initialize and connect to Firebase using FHIR standards
firebase_access = FirebaseFHIRAccess(project_id, service_account_key_file)
firebase_access.connect()
In this example, we will demonstrate how we can perform Firestore query to download step counts (LOINC code: 55423-8) and heart rate (LOINC code: 8867-4) data, and, subsequently, to flatten them in a more readable and convenient tabular format.
# Select the LOINC codes for the HealthKit quantities to perform a Firebase query
loinc_codes = ["55423-8", "8867-4"]
# Fetch and flatten FHIR data
fhir_observations = firebase_access.fetch_data(collection_name, subcollection_name, loinc_codes)
flattened_fhir_dataframe = flatten_fhir_resources(fhir_observations)
[!NOTE]
- If loinc_codes are omitted from the input arguments, FHIR resources for all stored LOINC codes are downloaded.
You can alternatively specify the full path to a collection in Firestore to fetch data. This ensures that you can use Spezi Data Pipeline flatteners on any location within your database.
# Fetch FHIR data from full path
collection_path = "collection/subcollection/......"
fhir_observations = firebase_access.fetch_data_path(collection_path, loinc_codes)
If you want to fetch data from a particular date range, you can setup a Firebase index within Firestore. You can then specify a start_date
and end_date
to filter your data before it is streamed. This can help prevent fetching unnecessary data.
# Sort FHIR data by date and fetch from full path
collection_path = "collection/subcollection/......"
start_date = "2024-04-22"
end_date = "2024-04-24"
fhir_observations = firebase_access.fetch_data_path(collection_path, loinc_codes, start_date, end_date)
flattened_fhir_dataframe = flatten_fhir_resources(fhir_observations)
Spezi Data Pipeline offers basic functions for improved data organization and readability. For example, individual step count data instances can be grouped by date using the process_fhir_data() function. If no intuitive function needs to be performed, the data remain unchanged.
processed_fhir_dataframe = FHIRDataProcessor().process_fhir_data(flattened_fhir_dataframe)
The dowloaded data can be then plotted using the following commands:
# Create a visualizer instance
visualizer = DataVisualizer()
# Set plotting configuration
selected_users = ["User1","User2", "User3"]
selected_start_date = "2022-03-01"
selected_end_date = "2024-03-13"
# Select users and dates to plot
visualizer.set_user_ids(selected_users)
visualizer.set_date_range(selected_start_date, selected_end_date)
# Generate the plot
figs = visualizer.create_static_plot(processed_fhir_dataframe)
In a similar way, we can download and flatten ECG recordings (LOINC code: 131329) that are stored in Firestore.
# Create a visualizer instance
visualizer = ECGVisualizer()
# Set plotting configuration
selected_users = ["User1"]
selected_start_date = "2023-03-13"
selected_end_date = "2023-03-13"
# Select users and dates to plot
visualizer.set_user_ids(selected_users)
visualizer.set_date_range(selected_start_date, selected_end_date)
# Generate the plot
figs = visualizer.plot_ecg_subplots(processed_fhir_dataframe)
The Spezi Data Pipeline also handles questionnaire responses stored as FHIR resources, facilitating the collection and analysis of questionnaire data in a standardized format. In addition, it includes calculation formulas for risk scores for certain questionnaire types based on the provided questionnaire responses.
[!NOTE]
In FHIR standards, the
Questionnaire
resource represents the definition of a questionnaire, including questions and possible answers, while theQuestionnaireResponse
resource captures the responses to a completed questionnaire, containing the answers provided by a user or patient.
Contributions to this project are welcome. Please make sure to read the contribution guidelines and the contributor covenant code of conduct first.
This project is licensed under the MIT License. See Licenses for more information.