Title: | Mixtures of Proportional Hazard Models |
---|---|
Description: | Fits multivariate mixtures of various parametric proportional hazard models using the EM algorithm. Proportionality restrictions can be imposed on the latent groups and/or on the variables. Several survival distributions can be specified. Missing values and censored values are allowed. Independence is assumed across the variables. |
Authors: | Patrick Mair [cre, aut], Marcus Hudec [aut] |
Maintainer: | Patrick Mair <[email protected]> |
License: | GPL-2 |
Version: | 0.7-2 |
Built: | 2024-10-29 03:00:07 UTC |
Source: | https://github.com/cran/mixPHM |
This package fits multivariate mixtures of various parametric proportional hazard models using the EM algorithm. Proportionality restrictions can be imposed on the latent groups and/or on the variables. Several survival distributions can be specified. Missing and censored values are allowed. Independence is assumed across the variables.
Package: | mixPHM |
Type: | Package |
Version: | 0.7-2 |
Date: | 2015-07-23 |
License: | GPL-2 |
Patrick Mair, Marcus Hudec
Maintainer: Patrick Mair <[email protected]>
Mair, P., and Hudec, M. (2009). Multivariate Weibull mixtures with proportional hazard restrictions for dwell time based session clustering with incomplete data. Journal of the Royal Statistical Society, Series C (Applied Statistics), 58(5), 619-639.
Kalbfleisch, J.D., and Prentice, R.L. (1980). The statistical analysis of failure time data. New York: Wiley.
Celeux, G., and Govaert, G. (1992). A classification EM algorithm for clustering and two stochastic versions. Computational Statistics and Data Analysis, 14, 315-332.
This function fits mixture models for different numbers of components and different proportionality restrictions and returns their BIC values.
msBIC(x, K, method = "all", Sdist = "weibull", cutpoint = NULL, EMoption = "classification", EMstop = 0.01, maxiter = 100)
x |
Data frame or matrix of dimension n*p with survival times (NA allowed). |
K |
A vector with number of mixture components. |
method |
A vector with the methods (proportionality restrictions) provided in phmclust; "all" fits all of them. |
Sdist |
Survival distribution, such as "weibull", "exponential", or "rayleigh". |
cutpoint |
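Integer value with upper bound for observed dwell times. Values above this cutpoint are regarded as censored. If NULL, no censoring is performed. |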
EMoption |
|
EMstop |
Stopping criterion for EM-iteration. |
maxiter |
Maximum number of iterations. |
Based on the output BIC matrix, model selection can be performed in terms of the number of mixture components and imposed proportionality restrictions.
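As a minimal base-R sketch of this selection step, the code below assumes the common convention BIC = -2 log L + npar log(n), under which smaller values are preferred, and assumes a BIC matrix with one row per number of components and one column per method. Both the layout of msBIC's BIC matrix and the helper names bic_value and select_min_bic are assumptions for illustration, and the matrix values are made up.

```r
# Hypothetical helpers (not part of mixPHM).
# BIC under the usual convention: smaller values indicate a better trade-off
# between fit (log-likelihood) and complexity (number of parameters).
bic_value <- function(loglik, npar, n) {
  -2 * loglik + npar * log(n)
}

# Locate the smallest entry of a BIC matrix whose rows are numbers of
# components K and whose columns are proportionality restrictions (assumed layout).
select_min_bic <- function(bicmat) {
  idx <- which(bicmat == min(bicmat, na.rm = TRUE), arr.ind = TRUE)[1, ]
  list(K = rownames(bicmat)[idx[1]],
       method = colnames(bicmat)[idx[2]],
       BIC = bicmat[idx[1], idx[2]])
}

# Toy matrix with made-up values, only to show the mechanics.
toy <- matrix(c(5210, 5185, 5230, 5190), nrow = 2,
              dimnames = list(c("2", "3"), c("main.p", "main.g")))
select_min_bic(toy)
```

Here the smallest value sits in row "3", column "main.p", so that combination would be selected under the smaller-is-better convention.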
Returns an object of class BICmat with the following values:
BICmat |
Matrix with BIC values |
K |
Vector with the numbers of mixture components |
method |
Vector with proportional hazard methods |
Sdist |
Survival distribution |
library(mixPHM)
## Fitting 2 Weibull proportional hazard models (proportional over groups, over pages) for K = 2, 3 components
data(webshop)
res <- msBIC(webshop, K = c(2,3), method = c("main.p","main.g"), maxiter = 10)
res
This function fits mixtures of proportional hazards models with different distributional assumptions on the underlying baseline hazard. Several options for imposing proportionality restrictions on the hazards are provided. Several variants of the EM algorithm are available, differing in how the posterior probabilities are used in the M-step.
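To make the posterior computation concrete, here is a minimal base-R sketch of the E-step followed by a hard (classification) assignment for a K-component Weibull mixture. Following the package description, it assumes independence across variables, so each session's component log-likelihood is a sum over its non-missing variables, and censored times contribute the log survival function. The helper names, the status coding (1 = observed, 0 = censored), and R's dweibull/pweibull shape/scale parameterization are assumptions for illustration and need not match the internals of phmclust.

```r
# Log-likelihood of each row of x under one Weibull component, assuming
# independence across columns. NA entries (page not visited) contribute 0.
# status: same shape as x, 1 = observed time, 0 = right-censored time.
component_loglik <- function(x, status, shape, scale) {
  ll <- matrix(0, nrow(x), ncol(x))
  for (j in seq_len(ncol(x))) {
    obs <- !is.na(x[, j])
    event <- obs & status[, j] == 1
    cens <- obs & status[, j] == 0
    ll[event, j] <- dweibull(x[event, j], shape[j], scale[j], log = TRUE)
    ll[cens, j] <- pweibull(x[cens, j], shape[j], scale[j],
                            lower.tail = FALSE, log.p = TRUE)
  }
  rowSums(ll)
}

# E-step: posterior membership probabilities, normalized via log-sum-exp.
# prior: length-K vector; shape, scale: K x p matrices (one row per component).
estep_posteriors <- function(x, status, prior, shape, scale) {
  K <- length(prior)
  logp <- matrix(NA_real_, nrow(x), K)
  for (k in seq_len(K)) {
    logp[, k] <- log(prior[k]) +
      component_loglik(x, status, shape[k, ], scale[k, ])
  }
  m <- apply(logp, 1, max)
  p <- exp(logp - m)
  p / rowSums(p)
}

# Three sessions on two pages; NA means the page was not visited.
x <- matrix(c(1, 50, NA,
              2, NA, 70), ncol = 2)
status <- ifelse(is.na(x), NA, 1)            # no censoring in this toy example
shape <- matrix(1.5, nrow = 2, ncol = 2)
scale <- matrix(c(2, 60, 2, 60), nrow = 2)   # component 1: short, component 2: long
post <- estep_posteriors(x, status, prior = c(0.5, 0.5), shape, scale)
max.col(post, ties.method = "first")        # hard (classification) assignment
```

A classification EM would then re-estimate shape, scale, and priors from these hard assignments; that M-step is omitted here.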
phmclust(x, K, method = "separate", Sdist = "weibull", cutpoint = NULL, EMstart = NA, EMoption = "classification", EMstop = 0.01, maxiter = 100)
x |
Data frame or matrix of dimension n*p with survival times (NA allowed). |
K |
Number of mixture components. |
method |
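Proportionality restrictions imposed on the hazards, e.g., "separate" (no restrictions), "main.g" (proportional over variables), or "main.p" (proportional over groups); see Details. |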
Sdist |
Survival distribution, such as "weibull", "exponential", or "rayleigh". |
cutpoint |
Integer value with upper bound for observed dwell times. Values above this cutpoint are regarded as censored. If NULL, no censoring is performed. |
EMstart |
Vector of length n with starting values for group membership,
|
EMoption |
|
EMstop |
Stopping criterion for EM-iteration. |
maxiter |
Maximum number of iterations. |
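The cutpoint rule can be illustrated with a small base-R sketch that turns a matrix of dwell times into (time, status) pairs: times above the cutpoint are flagged as right-censored, and NA entries stay missing. The helper apply_cutpoint and its status coding (1 = observed, 0 = censored) are hypothetical, and recording censored times at the cutpoint is an assumption; phmclust handles censoring internally and may store it differently.

```r
# Hypothetical helper: mark dwell times above `cutpoint` as right-censored.
# Returns the (possibly truncated) times and a status matrix with
# 1 = observed, 0 = censored, NA = variable not visited.
apply_cutpoint <- function(x, cutpoint = NULL) {
  x <- as.matrix(x)
  status <- ifelse(is.na(x), NA, 1)
  if (!is.null(cutpoint)) {
    over <- !is.na(x) & x > cutpoint
    status[over] <- 0
    x[over] <- cutpoint  # assumption: censored times are recorded at the cutpoint
  }
  list(time = x, status = status)
}

dwell <- matrix(c(30, 700, NA, 120, 610, 45), ncol = 2)
cens <- apply_cutpoint(dwell, cutpoint = 600)
cens$status
```

With cutpoint = 600, the values 700 and 610 become censored, while the NA entry remains missing rather than censored.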
The method "separate" corresponds to an ordinary mixture model. "main.g" imposes proportionality restrictions over variables (i.e., the group main effect allows for free-varying variable hazards). "main.p" imposes proportionality restrictions over groups (i.e., the variable main effect allows for free-varying group hazards).
If clusters with only one observation are generated, the algorithm stops.
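The proportionality idea behind "main.g" and "main.p" can be checked numerically. With R's Weibull shape a and scale b (assumed here; phmclust may parameterize the hazard differently), the hazard is h(t) = (a/b)(t/b)^(a-1), so two hazards sharing the shape a differ only by the constant factor (b2/b1)^a. The sketch below computes hazards as f(t)/S(t) with base R and shows that the ratio is constant in t for a common shape but not for different shapes. The helper weibull_hazard is hypothetical.

```r
# Weibull hazard h(t) = f(t) / S(t), using R's shape/scale parameterization.
weibull_hazard <- function(t, shape, scale) {
  dweibull(t, shape, scale) / pweibull(t, shape, scale, lower.tail = FALSE)
}

times <- c(10, 60, 300)
# Common shape, different scales: hazards are proportional.
h1 <- weibull_hazard(times, shape = 1.3, scale = 40)
h2 <- weibull_hazard(times, shape = 1.3, scale = 120)
h1 / h2   # constant in t, equal to (120 / 40)^1.3
# Different shapes: the ratio changes with t, so proportionality fails.
h3 <- weibull_hazard(times, shape = 0.8, scale = 120)
h1 / h3
```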
Returns an object of class mws with the following values:
K |
Number of components |
iter |
Number of EM iterations |
method |
Proportionality restrictions used for estimation |
Sdist |
Assumed survival distribution |
likelihood |
Log-likelihood value for each iteration |
pvisit |
Matrix of prior probabilities due to |
se.pvisit |
Standard errors for priors |
shape |
Matrix with shape parameters |
scale |
Matrix with scale parameters |
group |
Final deterministic cluster assignment |
posteriors |
Final probabilistic cluster assignment |
npar |
Number of estimated parameters |
aic |
Akaike information criterion |
bic |
Bayesian information criterion |
clmean |
Matrix with cluster means |
se.clmean |
Standard errors for cluster means |
clmed |
Matrix with cluster medians |
Mair, P., and Hudec, M. (2009). Multivariate Weibull mixtures with proportional hazard restrictions for dwell time based session clustering with incomplete data. Journal of the Royal Statistical Society, Series C (Applied Statistics), 58(5), 619-639.
Celeux, G., and Govaert, G. (1992). A classification EM algorithm for clustering and two stochastic versions. Computational Statistics and Data Analysis, 14, 315-332.
library(mixPHM)
data(webshop)
## Fitting a Weibull mixture model (3 components) with classification EM
## Observations above 600 sec are regarded as censored
res1 <- phmclust(webshop, K = 3, cutpoint = 600)
res1
summary(res1)
## Fitting a Rayleigh proportional hazard model (2 components, proportional over groups)
res2 <- phmclust(webshop, K = 2, method = "main.p", Sdist = "rayleigh")
res2
summary(res2)
Plotting functions for hazard functions, survival functions, and cluster profiles.
plot_hazard(x, gr.subset, var.subset, group = TRUE, xlim = NA, ylim = NA, xlab = "Survival Time", ylab = "Hazard Function", main = "Hazard Functions", type = "l", lty = 1, lwd = 1, col = NA, legpos = "right", ...)
plot_survival(x, gr.subset, var.subset, group = TRUE, xlim = NA, ylim = NA, xlab = "Survival Time", ylab = "Survival Function", main = "Survival Functions", type = "l", lty = 1, lwd = 1, col = NA, legpos = "right", ...)
plot_profile(x, method = "mean", type = "b", pch = 19, lty = 1, lwd = 1, col = NA, xlab = "Variables", leglab = NA, ylab = NA, main = NA, legpos = "topright", ...)
x |
object of class mws (as returned by phmclust) |
gr.subset |
Optional vector for plotting subset of clusters |
var.subset |
Optional vector for plotting subset of variables |
group |
if |
method |
|
xlim |
limits for x-axis |
ylim |
limits for y-axis |
xlab |
label for x-axis |
ylab |
label for y-axis |
main |
title of the plot |
leglab |
label for the legend |
type |
type of plot |
lty |
line type |
lwd |
line width |
pch |
type of plotting points |
col |
colors; if |
legpos |
position of the legend; |
... |
Additional plot options |
library(mixPHM)
## Plots for a Weibull mixture model with 3 components
data(webshop)
res <- phmclust(webshop, 3)
## Hazard plot for first and third group, all pages
plot_hazard(res, gr.subset = c(1,3), group = TRUE, xlab = "Dwell Time")
## Survival plot for each group, first 6 pages
plot_survival(res, var.subset = 1:6, group = FALSE, xlab = "Dwell Time")
## Cluster profile plot
plot_profile(res, xlab = "Pages", ylab = "Mean Dwell Time", main = "Cluster Profile")
This function produces a scree plot of the BIC values computed by msBIC.
screeBIC(x, lty = 1, col = NA, pch = 19, type = "b", main = "BIC Screeplot", xlab = "Number of Components", ylab = "BIC", legpos = "topright", ...)
x |
Object of class BICmat (as returned by msBIC) |
lty |
Line type |
col |
Line colors; if |
pch |
Value for plotting points |
type |
Type of plot |
main |
Plot title |
xlab |
Label for x-axis |
ylab |
Label for y-axis |
legpos |
position of the legend |
... |
Additional plot parameters |
library(mixPHM)
## Fitting all Weibull proportional hazard models for K = 2, 3, 4 components
data(webshop)
res <- msBIC(webshop, K = c(2,3,4), method = "all", maxiter = 5)
screeBIC(res)
This function performs the clustering for different EM starting values in order to find a stable solution.
stableEM(x, K, numEMstart = 5, method = "separate", Sdist = "weibull", cutpoint = NULL, EMoption = "classification", EMstop = 0.0001, maxiter = 1000, print.likvec = TRUE)
x |
Data frame or matrix of dimension n*p with survival times (NA allowed). |
K |
Number of mixture components. |
numEMstart |
Number of different starting solutions |
method |
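Proportionality restrictions imposed on the hazards, e.g., "separate" (no restrictions), "main.g" (proportional over variables), or "main.p" (proportional over groups). |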
Sdist |
Survival distribution, such as "weibull", "exponential", or "rayleigh". |
cutpoint |
Integer value with upper bound for observed dwell times. Values above this cutpoint are regarded as censored. If NULL, no censoring is performed. |
EMoption |
|
EMstop |
Stopping criterion for EM-iteration. |
maxiter |
Maximum number of iterations. |
print.likvec |
If |
After fitting phmclust for each of the numEMstart starting solutions, the best model is chosen, i.e., the model with the largest log-likelihood value. The output values refer to this final model.
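The selection rule can be written as a small generic helper: fit once per start, read off each fit's final log-likelihood (the last element of the likelihood component, which stores one value per iteration), and keep the best. The helper best_of_starts is hypothetical, not a mixPHM function; the commented usage assumes mixPHM is installed and passes random initial memberships through the EMstart argument.

```r
# Hypothetical helper mirroring the rule described above: run `fit_fun` once
# per start and keep the fit whose final log-likelihood is largest.
best_of_starts <- function(fit_fun, num_starts) {
  fits <- lapply(seq_len(num_starts), fit_fun)
  # `likelihood` holds one log-likelihood per EM iteration; use the last one
  likvec <- vapply(fits, function(f) f$likelihood[length(f$likelihood)],
                   numeric(1))
  list(best = fits[[which.max(likvec)]], likvec = likvec)
}

# Possible usage with mixPHM (not run here): random initial memberships
# passed via EMstart, a vector of length n with group labels 1..K.
# library(mixPHM); data(webshop)
# out <- best_of_starts(function(i) {
#   phmclust(webshop, K = 2, Sdist = "exponential",
#            EMstart = sample(1:2, nrow(webshop), replace = TRUE))
# }, num_starts = 4)
# out$likvec
```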
Returns an object of class mws with the following values:
K |
Number of components |
iter |
Number of EM iterations |
method |
Method with proportionality restrictions used for estimation |
Sdist |
Assumed survival distribution |
likelihood |
Log-likelihood value for each iteration |
pvisit |
Matrix of prior probabilities due to |
se.pvisit |
Standard errors for priors |
shape |
Matrix with shape parameters |
scale |
Matrix with scale parameters |
group |
Final deterministic cluster assignment |
posteriors |
Final probabilistic cluster assignment |
npar |
Number of estimated parameters |
aic |
Akaike information criterion |
bic |
Bayes information criterion |
clmean |
Matrix with cluster means |
se.clmean |
Standard errors for cluster means |
clmed |
Matrix with cluster medians |
library(mixPHM)
## Exponential mixture model with 2 components for 4 different starting solutions
data(webshop)
res <- stableEM(webshop, K = 2, numEMstart = 4, Sdist = "exponential")
res
summary(res)
This artificial data set represents dwell times in seconds of 333 sessions on 7 webpage categories of a webshop. Missing values indicate that the corresponding session did not visit a particular page.
data(webshop)
Numeric matrix or data frame with subjects (sessions) as rows and variables (pages) as columns. Missing values are coded as NA (which corresponds to 0 survival time, i.e., the page was not visited).
library(mixPHM)
data(webshop)
str(webshop)
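This function computes Wilcox's H-test and the Steiger-Hakstian test of H0: R = I, i.e., that the correlation matrix R of the variables is the identity (all pairwise correlations are zero).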
WilcoxH(x, use = "pairwise.complete.obs")
x |
Data frame or matrix of dimension n*p with survival times (NA allowed). |
use |
Treatment of NA values |
This test is robust against violations of normality. Since phmclust() assumes independence across pages, this test can be used to explore whether that assumption is plausible for the data.
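For a quick informal look before running WilcoxH, one can test each pair of variables on its pairwise-complete observations and adjust the p-values for multiplicity. The sketch below swaps in a different technique, Spearman rank correlation tests via base R's cor.test with Holm adjustment via p.adjust; it is neither Wilcox's H-test nor the Steiger-Hakstian test, and the helper name pairwise_cor_tests is hypothetical.

```r
# Pairwise rank-correlation tests as a rough check of H0: R = I.
# Not Wilcox's H-test; a simpler multiple-testing alternative.
pairwise_cor_tests <- function(x, method = "spearman") {
  x <- as.matrix(x)
  p <- ncol(x)
  if (is.null(colnames(x))) colnames(x) <- paste0("V", seq_len(p))
  pairs <- t(combn(p, 2))          # one row per pair of columns
  pvals <- apply(pairs, 1, function(ij) {
    ok <- complete.cases(x[, ij[1]], x[, ij[2]])   # pairwise-complete rows
    cor.test(x[ok, ij[1]], x[ok, ij[2]], method = method, exact = FALSE)$p.value
  })
  data.frame(var1 = colnames(x)[pairs[, 1]],
             var2 = colnames(x)[pairs[, 2]],
             p.holm = p.adjust(pvals, method = "holm"))
}

# Simulated example: columns a and b are strongly related, c and d are noise.
set.seed(1)
z <- matrix(rnorm(200), ncol = 4, dimnames = list(NULL, c("a", "b", "c", "d")))
z[, "b"] <- z[, "a"] + rnorm(50, sd = 0.1)
z[sample(length(z), 10)] <- NA
pairwise_cor_tests(z)
```

Rank correlations are used here because, like the H-test, they do not rely on normality; a small adjusted p-value only flags a correlated pair and does not by itself establish dependence structure.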
Returns an object of class "wilcoxh" with the following values:
Rmat |
Correlation matrix |
SH.res |
Results for Steiger-Hakstian-Test |
WH.res |
Results for Wilcox H-test |
Wilcox, R. (1997). Tests of independence and zero correlations among P variables. Biometrical Journal, 2, 183-193.
library(mixPHM)
data(webshop)
res <- WilcoxH(webshop)
res