--- title: "Mendelian Laws of Inheritance" author: "Alexandra Sarafoglou" date: "9/16/2020" output: html_document vignette: > %\VignetteIndexEntry{Mendelian Laws of Inheritance} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} bibliography: ../inst/REFERENCES.bib --- In this vignette, we will explain how to compute a Bayes factor for mixtures of equality and inequality-constrained hypotheses for multinomial models. ## Model and Data As example for a mixture of equality and inequality-constrained hypotheses in multinomial models, we will use the dataset, `peas`, which is included in the package `multibridge`. The dataset provides the categorization of crossbreeds between a plant variety that produced round yellow peas with a plant variety that produced wrinkled green peas. This dataset contains the phenotypes of peas from 556 plants that were categorized either as (1) round and yellow, (2) wrinkled and yellow, (3) round and green, or (4) wrinkled and green. Furthermore, in the context of the evaluation of mixture of equality and inequality-constrained hypotheses the dataset was discussed in @sarafoglou2020evaluatingPreprint. ```{r load_data} library(multibridge) data(peas) peas ``` The model that we will use assumes that the vector of observations $x_1, \cdots, x_K$ in the $K$ categories follow a multinomial distribution. The parameter vector of the multinomial model, $\theta_1, \cdots, \theta_K$, contains the probabilities of observing a value in a particular category; here, it reflects the probabilities that the peas show one of the four phenotypes. The parameter vector $\theta_1, \cdots, \theta_K$ is drawn from a Dirichlet distribution with concentration parameters $\alpha_1, \cdots, \alpha_K$. The model can be described as follows: \begin{align} x_1, \cdots, x_K &\sim \text{Multinomial}(\sum_{k = 1}^K x_k, \theta_1, \cdots, \theta_K) \\ \theta_1, \cdots, \theta_K &\sim \text{Dirichlet}(\alpha_1, \cdots, \alpha_K) \\ \end{align} Based on the Mendelian laws of inheritance we test the informed hypothesis $\mathcal{H}_r$ that the number of peas that will be categorized as "round and yellow" will be highest, since both traits are dominant in the parent plants and should thus appear in the offspring. Furthermore, the Mendelian laws of inheritance predict that the phenotypes "wrinkled and yellow" and "round and green" occur second most often and the probability to fall into one of the two categories is equal, due to the fact that in each case one of the traits is dominant. Consequently, "wrinkled and green" peas should appear least often. This informed hypothesis will be tested against the encompassing hypothesis $\mathcal{H}_e$ without constraints: \begin{align*} \mathcal{H}_m &: \theta_{1} > \theta_{2} = \theta_{3} > \theta_{4} \\ \mathcal{H}_e &: \theta_1, \theta_2, \theta_{3}, \theta_{4}. \end{align*} To compute the Bayes factor in favor of the restricted hypothesis, $\text{BF}_{re}$, we need to specify (1) a vector containing the number of observations, (2) the restricted hypothesis, (3) a vector with concentration parameters, (4) the labels of the categories of interest (i.e., the manifestation of the peas). ```{r specify_hr} x <- peas$counts # Test the following restricted Hypothesis: # Hr: roundYellow > wrinkledYellow = roundGreen > wrinkledGreen Hr <- c('roundYellow > wrinkledYellow = roundGreen > wrinkledGreen') # Prior specification # We assign a uniform Dirichlet distribution, that is, we set all concentration parameters to 1 a <- c(1, 1, 1, 1) categories <- peas$peas ``` With this information, we can now conduct the analysis with the function `mult_bf_informed()`. Since we are interested in quantifying evidence in favor of the informed hypothesis, we set the Bayes factor type to `BFre`. For reproducibility, we are also setting a seed with the argument `seed`: ```{r compute_results} results <- multibridge::mult_bf_informed(x=x,Hr=Hr, a=a, factor_levels=categories, bf_type = 'BFre', seed = 2020) ``` ## Summarize the Results We can get a quick overview of the results by using the implemented `summary()` method: ```{r show_summary} m1 <- summary(results) m1 ``` The summary of the results shows the Bayes factor estimate, the evaluated informed hypothesis and the posterior parameter estimates of the marginal beta distributions (based on the encompassing model). The data show evidence in factor of our informed hypothesis: The data is `r signif(m1$bf, 3)` more likely to have occurred under the informed hypothesis than under the encompassing hypothesis. We can also further decompose the Bayes factor into an equality constrained Bayes factor (i.e., the Bayes factor that evaluates the equality constraints against the encompassing hypothesis) and an inequality constrained Bayes factor (i.e., the Bayes factor that evaluates the inequality constraints against the encompassing hypothesis given that the equality constraints hold). We can access this information with the `S3` method `bayes_factor` ```{r show_bf} bayes_list <- bayes_factor(results) bayes_list$bf_table # Bayes factors in favor for informed hypothesis bfre <- bayes_list$bf_table[bayes_list$bf_table$bf_type=='BFre', ] ``` Based on this summary table of the Bayes factors, we can infer the following: - In total, the data are `r signif(bfre$bf_total, 3)` more likely under the informed hypothesis than under the encompassing hypothesis - The data is `r signif(bfre$bf_equalities, 3)` more likely under the equality constrained hypothesis $\theta_{2} = \theta_{3}$ than under the encompassing hypothesis - Given that the equality constrained hypothesis holds, the data is `r signif(bfre$bf_inequalities, 3)` more likely under the inequality constrained hypothesis $\theta_{1} > \theta_{2,3} > \theta_{4} | \theta_{2} = \theta_{3}$ than under the encompassing hypothesis. - The relative mean-squared error for the Bayes factor is $`r signif(bayes_list$error_measures$re2)`$ Details on the decomposition of the Bayes factor can be found in @sarafoglou2020evaluatingPreprint. # References