From figures to findings - statistics in practice
At MedxTeam, the focus is on clinical data. As a CRO, we not only conduct clinical investigations (studies) of medical devices in accordance with the MDR and ISO 14155, but also offer support in the statistical planning and evaluation of study data. This article gives an overview of the most important statistical concepts in clinical studies, ranging from basic explanations with practical examples to deep dives for advanced readers.
Abbreviations
GCP Good Clinical Practice
MDR Medical Device Regulation; EU Regulation 2017/745
Underlying regulations
EU Regulation 2017/745 (MDR)
General Data Protection Regulation (GDPR)
German Medical Devices Act (MPDG, Medizinprodukterecht-Durchführungsgesetz)
ISO 14155
1. Introduction
Statistical methods play a central role in the clinical investigation of medical devices. They are key to analyzing data, interpreting results and fulfilling regulatory requirements. This article deals with the following topics:
- Confidence intervals
- Type I and type II errors
- Acceptance criteria
- Box plot
- Forest plot
- Paired data
- Sensitivity / specificity
2. What was that again about the confidence interval?
The confidence interval indicates the range in which an estimated parameter, such as a mean or an effect size, lies with a defined probability. It quantifies the uncertainty of an estimate and is therefore an indispensable statistical tool.
The confidence interval provides information about how precise an estimate is. The narrower the interval, the more certain we can be that the true value is close to the estimated value. Conversely, a wide interval indicates greater uncertainty. A confidence interval is often given at a confidence level of 95%. This means that if the study were repeated under identical conditions, the true value would lie within the stated range in 95 out of 100 cases.
Example:
Suppose a study reports a mean wound healing time of 10 days with a confidence interval of [8, 12] days at the 95% confidence level. Loosely speaking, this means that the true mean lies within this range with 95% confidence (see the deep dive below for the precise interpretation).
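Such an interval can be computed directly from the raw data. The following is a minimal sketch using SciPy and the t-distribution; the healing times are hypothetical illustrative values, not data from an actual study:

```python
import numpy as np
from scipy import stats

# Hypothetical healing times in days for ten patients
healing_times = np.array([9, 11, 8, 12, 10, 9, 11, 10, 12, 8])

mean = healing_times.mean()
sem = stats.sem(healing_times)  # standard error of the mean
# Two-sided 95% confidence interval based on the t-distribution
ci_low, ci_high = stats.t.interval(0.95, df=len(healing_times) - 1,
                                   loc=mean, scale=sem)
print(f"Mean: {mean:.1f} days, 95% CI: [{ci_low:.1f}, {ci_high:.1f}]")
```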
2.1 Deep dive
How do confidence intervals work and why are they decisive?
- The basic idea: The confidence interval is based on the uncertainty inherent in every sample. It quantifies this uncertainty by specifying a range in which the true value of the parameter lies with high plausibility. The more data we collect and the lower the spread of the data, the more precise (i.e. narrower) the interval becomes.
- Interpretation of a 95% confidence interval: It does not mean that the true value lies in this particular interval "with 95% probability". Instead, the statement refers to the procedure: if we repeated the data collection many times, the resulting intervals would contain the true value in 95% of cases.
- Which factors influence the width of a confidence interval? The width of the interval depends on three main factors:
- Sample size: Larger samples provide more precise estimates because the influence of random fluctuations is reduced. This leads to narrower confidence intervals. With small samples, the intervals are wider because the uncertainty is greater.
- Variability of the data: With a greater spread of the data (i.e. if the values scatter widely around the mean), the intervals are wider because the uncertainty about the true value increases.
- The chosen confidence level: Higher confidence levels (e.g. 99% instead of 95%) lead to wider intervals because more uncertainty is taken into account. Conversely, a lower confidence level (e.g. 90%) results in narrower intervals.
Practical consequence: A particularly wide interval indicates that additional data are needed to pin down the true value more precisely.
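For the mean of normally distributed data, all three factors appear directly in the standard formula for the two-sided confidence interval:

$$\bar{x} \pm z_{1-\alpha/2} \cdot \frac{s}{\sqrt{n}}$$

Here $s$ is the sample standard deviation, $n$ the sample size and $z_{1-\alpha/2}$ the standard normal quantile (1.645 for 90%, 1.96 for 95%, 2.576 for 99%): the interval narrows with the square root of the sample size and widens with the variability and with the confidence level.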
2.2 Alternative methods for estimating confidence intervals
The classic method requires that the underlying data be approximately normally distributed and that the sample be sufficiently large. If these assumptions are violated, or for small samples, alternative approaches can be used:
- Bootstrapping: This method is ideal if the normality assumption is violated or if the sample is small. A large number of resamples are repeatedly drawn, with replacement, from the existing data. For each of these resamples, the parameter of interest (e.g. the mean) is calculated. The distribution of these estimates then serves as the basis for deriving the confidence interval (a minimal sketch follows this list).
- Advantages: robust to violations of distributional assumptions; flexibly applicable.
- Application example: For non-normally distributed data, such as strongly asymmetric blood pressure values, bootstrapping provides reliable interval estimates.
- Bayesian confidence intervals (credible intervals): In contrast to classical statistics, the Bayesian approach works with probabilities. Prior knowledge about the parameter is brought in via a so-called prior distribution. This is combined with the observed data (the likelihood) to calculate the posterior distribution. The credible interval then indicates the range in which the true value lies with a certain probability.
- Advantages: integration of prior knowledge; better interpretability with small samples.
- Application example: If earlier studies show that a medical device typically causes a wound healing period of about 10 days, this information can be included in the analysis in order to reduce uncertainty.
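The following is a minimal sketch of the percentile bootstrap described above, using NumPy on hypothetical, skewed healing times:

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical, right-skewed healing times in days
data = np.array([7, 8, 8, 9, 10, 10, 11, 14, 18, 25])

n_boot = 10_000
boot_means = np.empty(n_boot)
for i in range(n_boot):
    # Resample with replacement and record the statistic of interest
    resample = rng.choice(data, size=len(data), replace=True)
    boot_means[i] = resample.mean()

# Percentile method: the 2.5% and 97.5% quantiles of the bootstrap distribution
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(f"Bootstrap 95% CI for the mean: [{ci_low:.1f}, {ci_high:.1f}]")
```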
2.3 Practical relevance of confidence intervals in clinical research
- Clinical relevance versus statistical significance: Confidence intervals provide more information than a p-value. While a p-value only indicates whether an effect is statistically significant, the confidence interval also shows whether the effect is clinically meaningful. Example: a medical device could produce a statistically significant reduction in wound healing time, yet the reduction may be so small that it is clinically irrelevant.
- Evaluation of uncertainty: In regulatory decisions, it is often checked whether the lower limit of the confidence interval lies above a threshold value that is considered clinically relevant.
3. Type I and type II errors
Statistical tests always carry a risk of error, since decisions are made on the basis of sample data that can only partially reflect reality. Type I and type II errors are therefore important concepts in statistics and particularly relevant in clinical research, where wrong decisions can have considerable consequences.
Two types of error can occur in statistical tests:
- Type I error (alpha error): This occurs when the null hypothesis is rejected even though it is true. This error is also referred to as a "false alarm". Example: an ineffective medical device is classified as effective.
- Type II error (beta error): This occurs when the null hypothesis is retained although the alternative hypothesis is true. This is often described as "overlooking an effect". Example: an effective medical device is not recognized as such.
The balance between these two error types is a core task in planning clinical studies. The significance level and the statistical power play a central role here.
Example:
A new medical device is tested. An alpha error would lead to approval of an ineffective product; a beta error could cause an effective product to be classified as ineffective.
Practical importance: While an alpha error is problematic from a regulatory and economic perspective, a beta error can hinder medical innovation.
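How significance level, effect size and sample size interact can be explored numerically. The following is a minimal sketch using the power module of statsmodels; the assumed standardized effect size of 0.5 is purely illustrative:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Sample size per group for a two-sided two-sample t-test,
# assuming effect size d = 0.5, alpha = 0.05 and power = 0.8
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8,
                                   alternative='two-sided')
print(f"Required sample size per group: {n_per_group:.0f}")

# Conversely: the power actually achieved with only 30 patients per group
power = analysis.solve_power(effect_size=0.5, alpha=0.05, nobs1=30)
print(f"Power with n = 30 per group: {power:.2f}")
```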
3.1 Deep dive
- Relationship between alpha and beta errors: There is a direct trade-off between these two error types. If the significance level (alpha) is chosen more strictly to reduce the probability of an alpha error (e.g. 0.01 instead of 0.05), the risk of a beta error often increases. Conversely, relaxing the alpha level reduces the beta error but increases the risk of declaring spurious effects significant.
- Statistical power: The power is a measure of how well a statistical test can detect an actual effect. A power of 80% means that a real effect is overlooked in 20% of cases (beta error).
- Factors influencing power: sample size, effect size and the chosen significance level. A larger sample increases the probability of detecting small effects and reduces the beta error.
- Adjustment for multiple analyses:
- Interim analyses: In studies with repeated data analyses, the probability of an alpha error increases, since every analysis offers another chance to detect a purely random effect. Group-sequential methods such as the O'Brien-Fleming approach use stricter significance boundaries at early interim analyses to control the overall error rate.
- Bonferroni correction: This method divides the significance level by the number of comparisons to keep the overall error rate low. However, it is conservative and can reduce power when many tests are performed (a sketch follows at the end of this section).
- Bayesian perspective:
Instead of using rigid significance thresholds, Bayesian statistics evaluates probabilities directly. For example: how likely is it that the effect exceeds a clinically relevant threshold? This can lead to more flexible and more interpretable results, especially for small samples.
- ROC curves:
The Receiver Operating Characteristic (ROC) curve shows the trade-off between sensitivity (true positive rate) and 1 − specificity (false positive rate). It helps to identify threshold values that keep both alpha and beta errors low.
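The Bonferroni correction mentioned above is straightforward to apply. The following is a minimal sketch using statsmodels on hypothetical p-values from five endpoint comparisons:

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values from five endpoint comparisons
p_values = [0.01, 0.04, 0.03, 0.20, 0.002]

# Bonferroni: effectively compares each p-value against alpha / number of tests
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05,
                                         method='bonferroni')
for p, p_adj, r in zip(p_values, p_adjusted, reject):
    print(f"p = {p:.3f} -> adjusted p = {p_adj:.3f}, significant: {r}")
```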
4. Acceptance criteria
Acceptance criteria determine the conditions under which a clinical result is considered successful. They are crucial for the interpretation of study results and the decision as to whether a medical device is effective or safe.
Acceptance criteria define which results are required to achieve a specific goal. They influence study planning, hypothesis formulation and, ultimately, the approval decision for a product.
Example:
A medical device is developed to shorten the healing time after an operation. As an acceptance criterion, it is specified that the average healing time must be reduced by at least 20% compared to standard treatment. The study then checks whether the confidence interval of the observed reduction lies entirely above this limit.
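In code, such a check is a simple comparison of the lower confidence bound against the threshold. The following is a minimal sketch with purely hypothetical numbers:

```python
# Hypothetical result: observed 27% reduction in mean healing time,
# with a 95% confidence interval of [22%, 32%] for the reduction
reduction = 0.27
ci_low, ci_high = 0.22, 0.32

threshold = 0.20  # acceptance criterion: at least 20% reduction

# The criterion is met only if the entire interval clears the threshold,
# i.e. even the lower confidence bound exceeds 20%
criterion_met = ci_low > threshold
print(f"Observed reduction: {reduction:.0%}, "
      f"CI: [{ci_low:.0%}, {ci_high:.0%}], criterion met: {criterion_met}")
```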
4.1 Deep dive
- Non-inferiority, superiority and equivalence tests:
- Non-inferiority test: shows that the new product is no worse than the existing treatment within an acceptable tolerance margin.
- Superiority test: demonstrates that the product is significantly better.
- Equivalence test: checks whether the product performs similarly within a specified margin (e.g. ±10%).
- Bayesian approaches:
- Instead of establishing a fixed acceptance threshold, Bayesian methods calculate the probability that the true effect is greater than a pre-defined threshold. This allows a dynamic, probability-based assessment.
- Clinical relevance:
- A statistically significant effect does not automatically meet an acceptance criterion, since the clinical relevance must also be assessed. Example: a pain reduction of 1% could be statistically significant but not clinically meaningful.
- Cost-benefit assessment:
- Strict acceptance criteria can increase the quality of the assessment, but often require larger samples, which increases the cost and duration of the study.
5. How do I read a box plot?
A box plot, also called a box-and-whisker diagram, is a versatile statistical tool that visualizes the distribution of data in a simple way. It helps to recognize central tendencies, variability and potential outliers at a glance and is particularly useful when comparing groups.
A box plot summarizes the distribution of a dataset compactly. The most important components are:
- Median: The line in the middle of the box represents the central value of the data.
- Quartiles: The lower edge of the box is the 1st quartile (Q1), the upper edge the 3rd quartile (Q3).
- Interquartile range (IQR): The range between Q1 and Q3 contains the middle 50% of the data.
- Whiskers: The lines above and below the box indicate the data values outside the box, up to a defined limit (often 1.5 times the IQR).
- Outliers: Data points that lie outside this limit are shown separately as individual points.
Example
Let us imagine that we have data on the healing time of two patient groups (Group A and Group B):
- Group A has shorter healing times with low variability, which results in a compact box with short whiskers.
- Group B shows greater differences between patients, which leads to a wider box and longer whiskers.
A direct comparison of the two box plots can quickly show which group is more homogeneous and whether there are extreme outliers.
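The comparison described above can be drawn in a few lines. The following is a minimal sketch using Matplotlib with simulated healing times (all values are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Simulated healing times: Group A homogeneous, Group B more variable
group_a = rng.normal(loc=9, scale=1.0, size=40)
group_b = rng.normal(loc=11, scale=3.0, size=40)

fig, ax = plt.subplots()
ax.boxplot([group_a, group_b])  # box = IQR, line = median, whiskers up to 1.5 * IQR
ax.set_xticklabels(["Group A", "Group B"])
ax.set_ylabel("Healing time (days)")
ax.set_title("Healing time by treatment group")
plt.show()
```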
5.1 Deep dive
- Detailed interpretation:
- Median: indicates the central tendency of the data and is robust against outliers.
- IQR: shows the spread of the middle 50% of the data and gives an impression of the variability.
- Whiskers and outliers: help to identify extreme values that could potentially distort the analysis.
- Comparison of groups: Box plots are ideal for presenting differences between groups, e.g. to compare the effect of a medical device across different age groups. Differences in the height of the box or the length of the whiskers can indicate variability or systematic effects.
- Extended visualizations:
- Violin plots: A combination of box plot and density plot that shows the entire distribution of the data. Particularly useful for multimodal distributions (e.g. two peaks in the data).
- Parallel box plots: Several box plots side by side make it easier to compare groups.
- Application in clinical studies:
- Outlier analysis: In a clinical study, outliers could indicate patients who react exceptionally well or poorly to treatment. Such insights can provide information about individual differences that are important for further research.
- Stratification: Box plots can be used to stratify data by subgroups (e.g. age groups, gender) and present them visually.
- Robustness: Since the median and the quartiles are insensitive to outliers, the box plot is particularly robust. Nevertheless, strongly asymmetric distributions (e.g. a long "tail" on one side) can be misleading. In such cases, alternative representations such as the violin plot can be helpful.
6. How do I read a forest plot?
A forest plot is an indispensable tool in meta-analysis and enables the presentation and interpretation of the results of several studies or subgroups. It shows effect estimates and their confidence intervals in a single diagram.
A forest plot consists of:
- Point estimates: These points or squares represent the effect (e.g. mean difference, odds ratio) of each study or subgroup.
- Confidence intervals: The horizontal lines indicate the uncertainty of the estimate.
- Vertical line: This represents the "no effect" point, e.g. an odds ratio of 1 or an effect size of 0.
- Overall effect: A diamond at the lower end shows the weighted average of all studies, with the width of the diamond representing the confidence interval.
Example
A meta-analysis examines the effectiveness of a patch on the wound healing time in various studies.
- Study A shows a significant reduction in the wound healing period, with a confidence interval that is completely below the "no effect" line.
- Study B has a broad confidence interval that includes both positive and negative effects, which indicates uncertainty in the results.
- The overall effect (diamond) is also below the line, which indicates a significant effectiveness of the patch.
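A basic forest plot like the one described can be sketched with Matplotlib. The study names, effects and confidence limits below are hypothetical mean differences in healing time (negative values favor the patch):

```python
import matplotlib.pyplot as plt

# Hypothetical study results: mean difference in healing time (days)
studies = ["Study A", "Study B", "Study C", "Overall"]
effects = [-2.0, -0.5, -1.5, -1.4]
ci_low  = [-3.0, -2.5, -2.8, -2.1]
ci_high = [-1.0,  1.5, -0.2, -0.7]

fig, ax = plt.subplots()
for i, (eff, lo, hi, name) in enumerate(zip(effects, ci_low, ci_high, studies)):
    marker = "D" if name == "Overall" else "s"   # diamond for the pooled effect
    ax.plot([lo, hi], [i, i], color="black")     # confidence interval
    ax.plot(eff, i, marker, color="black")       # point estimate
ax.axvline(0, linestyle="--", color="grey")      # "no effect" line
ax.set_yticks(range(len(studies)))
ax.set_yticklabels(studies)
ax.invert_yaxis()                                # first study at the top
ax.set_xlabel("Mean difference in healing time (days)")
plt.show()
```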
6.1 Deep dive
- Analysis of heterogeneity:
- Cochran's Q test: checks whether the variation between the studies is larger than expected by chance.
- I² statistic: indicates the percentage of variability that is explained by heterogeneity rather than chance. A high value (e.g. above 50%) suggests that a random-effects model is more appropriate (a sketch follows at the end of this section).
- Fixed-effects vs. random-effects models:
- Fixed-effects model: assumes that all studies measure the same true effect and that differences arise only by chance.
- Random-effects model: takes into account that studies may involve different populations and conditions, and allows greater variability between the studies.
- Bayesian forest plots:
- Bayesian approaches use prior knowledge to model uncertainty more completely. Here the forest plot could visualize posterior distributions and credible intervals, which enables a deeper interpretation.
- Interpretation in practice:
- A forest plot can be used to evaluate the consistency of results. Studies whose confidence intervals do not cross the "no effect" line provide strong evidence. Divergent results of individual studies can be signs of methodological differences or specific population effects.
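Cochran's Q and the I² statistic can be computed directly from the study effects and their standard errors. The following is a minimal sketch using inverse-variance weighting; all input values are hypothetical:

```python
import numpy as np

# Hypothetical study effects (mean differences) and their standard errors
effects = np.array([-2.0, -0.5, -1.5])
se = np.array([0.51, 1.02, 0.66])

weights = 1 / se**2                                   # inverse-variance weights
pooled = np.sum(weights * effects) / np.sum(weights)  # fixed-effect pooled estimate

q = np.sum(weights * (effects - pooled) ** 2)  # Cochran's Q
df = len(effects) - 1
i2 = max(0.0, (q - df) / q) * 100              # I² as a percentage

print(f"Pooled effect: {pooled:.2f}, Q = {q:.2f}, I2 = {i2:.0f}%")
```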
7. What are paired data?
Paired (dependent) data are measurements that are not independent of one another. This often occurs in clinical studies when, for example, the same patient is measured several times (e.g. before and after treatment) or when observations occur within pairs or groups (e.g. twins, or devices that are tested on the same patient).
With paired data, the measurements are directly related to one another. The best-known examples are before-and-after measurements and paired samples. In such cases, it is important to use statistical procedures that take this dependency into account; otherwise incorrect conclusions can be drawn.
- Typical scenario: measurements taken before and after a patient's treatment. Since both measurements come from the same patient, they are not independent.
Example
A study examines the effectiveness of a new wound dressing in accelerating healing after operations. The healing time is measured in the same patients before and after application of the dressing. Since both measurements come from the same patient, they are paired. A simple comparison of the means that ignores the pairing would lead to biased results. Instead, a paired t-test should be used to analyze the differences in healing times correctly (see the sketch below).
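The following is a minimal sketch of this analysis with SciPy, using hypothetical before/after values for eight patients; the Wilcoxon test from the deep dive below is included as the non-parametric alternative:

```python
import numpy as np
from scipy import stats

# Hypothetical healing times (days) for the same eight patients,
# before and after introduction of the new dressing
before = np.array([12, 14, 11, 15, 13, 12, 16, 14])
after  = np.array([10, 12, 10, 13, 12, 11, 13, 12])

# Paired t-test: analyzes the within-patient differences
t_stat, p_value = stats.ttest_rel(before, after)
print(f"Paired t-test: t = {t_stat:.2f}, p = {p_value:.4f}")

# Non-parametric alternative if the differences are not normally distributed
w_stat, p_wilcoxon = stats.wilcoxon(before, after)
print(f"Wilcoxon signed-rank test: W = {w_stat:.1f}, p = {p_wilcoxon:.4f}")
```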
7.1 Deep dive
- Why is the dependency important? Many statistical tests rest on the basic assumption of independent data. Paired data violate this assumption, so a dedicated analysis is required to avoid biased results.
- Suitable statistical procedures:
- Paired t-test: This test compares the mean values of two paired groups by analyzing the differences between the pairs.
- Wilcoxon signed-rank test: This is the non-parametric alternative when the data are not normally distributed.
- Linear mixed models (LMM): These models are particularly useful for complex study designs with several time points or groups. They can analyze random effects (e.g. individual differences) and fixed effects (e.g. treatment) at the same time.
- Variance-covariance structure: In advanced models such as repeated-measures ANOVA, the dependency between the measurements must be modeled correctly. Different assumptions about the structure (e.g. compound symmetry or autoregressive structures) influence the results.
- Practical challenges:
- Missing values: Paired data are particularly susceptible to bias when measurements are missing. Methods such as multiple imputation or maximum likelihood estimation can help minimize distortion.
- Complexity: The analysis of paired data often requires specialized software and expertise in advanced statistical methods.
8. The difference between sensitivity and specificity
Sensitivity and specificity are fundamental measures for evaluating the quality of a diagnostic test. They describe how well a test detects diseased individuals and correctly rules out healthy ones.
Sensitivity: the proportion of actually diseased individuals who are correctly identified by the test (true positives). It measures the test's ability not to miss diseased individuals.
Specificity: the proportion of actually healthy individuals who are correctly identified as healthy (true negatives). It describes how well the test avoids false alarms.
Why is this important? A perfect test would have a sensitivity and specificity of 100%. In practice, however, compromises often have to be made, e.g. in mass screening, where a highly sensitive test is preferred so that no diseased patient is overlooked.
Example
Suppose a test for diagnosing a rare disease has:
- 90% sensitivity: Of 100 actually diseased patients, the test correctly identifies 90; 10 are incorrectly classified as healthy.
- 80% specificity: Of 100 healthy individuals, 80 are correctly identified as healthy; 20 are incorrectly classified as diseased.
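How useful such a test is in practice depends strongly on the prevalence of the disease (see the deep dive below). The following is a minimal sketch that combines the example values with Bayes' theorem to compute the positive predictive value at different prevalences:

```python
sensitivity = 0.90
specificity = 0.80

def ppv(prevalence: float) -> float:
    """Positive predictive value via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# The same test performs very differently depending on prevalence
for prev in (0.01, 0.10, 0.50):
    print(f"Prevalence {prev:4.0%}: PPV = {ppv(prev):.0%}")
```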
8.1 Deep dive
- Connection with prevalence:
- The positive and negative predictive values (PPV and NPV) depend directly on the prevalence of the disease. At low prevalence, even a test with high sensitivity can produce many false-positive results.
- ROC curves and threshold values:
- A Receiver Operating Characteristic (ROC) curve shows how sensitivity and specificity change across different threshold values of a test. The ideal threshold balances sensitivity and specificity so that false-positive and false-negative results are kept low. The area under the ROC curve (AUC) is a measure of the overall performance of the test (see the sketch at the end of this section).
- Trade-offs between sensitivity and specificity:
- Tests with high sensitivity (e.g. screening tests) often have a lower specificity and produce more false-positive results. Combined test strategies (e.g. a sensitive screening test followed by a specific confirmatory test) can improve diagnostic accuracy.
- Bayesian perspective:
- Bayesian analysis makes it possible to calculate the probability that a patient is actually diseased, given a positive test result and the known prevalence. This helps to better inform diagnostic decisions.
- Practical applications:
- Diagnostic tests such as COVID-19 antigen tests or mammography screenings.
- Evaluation of new diagnostic devices or methods in clinical studies.
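The ROC analysis mentioned above can be sketched with scikit-learn. The test scores below are simulated under the assumption that diseased patients tend to score higher; the Youden index is one common way to pick a threshold:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
# Simulated test scores: diseased patients tend to score higher
y_true = np.concatenate([np.ones(100), np.zeros(100)])
scores = np.concatenate([rng.normal(2.0, 1.0, 100),   # diseased
                         rng.normal(0.0, 1.0, 100)])  # healthy

fpr, tpr, thresholds = roc_curve(y_true, scores)
auc = roc_auc_score(y_true, scores)
print(f"AUC = {auc:.2f}")

# Youden index: threshold that maximizes sensitivity + specificity - 1
youden = tpr - fpr
best_threshold = thresholds[np.argmax(youden)]
print(f"Optimal threshold (Youden): {best_threshold:.2f}")
```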
9. Conclusion
Statistical methods are indispensable tools in clinical research and the development of medical devices. They make it possible to analyze data precisely, quantify uncertainties and make well-founded decisions. From the calculation of confidence intervals to the avoidance of type I and type II errors to the interpretation of box plots and forest plots, statistics offers a variety of techniques to improve the quality and meaningfulness of clinical studies. Through the targeted use of these methods, we can not only demonstrate the effectiveness and safety of medical devices, but also meet regulatory requirements and ultimately optimize patient care. In a world in which data play an increasingly important role, statistics remains an indispensable part of evidence-based medicine.
10. How we can help you
We are happy to support you in setting up, implementing and using a database-based system. As a CRO, we also take over complete data management via the EDC system as well as monitoring.
We support you throughout your entire project with your medical device, starting with a free initial consultation, help with the introduction of a QM system, study planning and implementation through to technical documentation - always with primary reference to the clinical data on the product: from beginning to end.
Do you already have some initial questions?
You can find our free initial consultation here: Free initial consultation