Multiple Mini-Interviews (MMIs) are increasingly used around the world as tools to assess non-academic attributes for selection in the health professions. They were first implemented at McMaster for medical school selection in 2002 and were designed to reduce the context specificity seen with traditional interviews. They usually consist of a series of short structured or semi-structured interviews or role plays with actors and, depending on their implementation parameters, may show conceptual overlap with Assessment Centers (ACs), which also have several components aimed at assessing specific behaviors. Since MMIs are generally a very high-stakes assessment tool, evidence of their validity is of the utmost importance. According to Kane, validity must be conceived as a process of validation, rather than as a concept to be broken down into several forms (e.g., face validity, construct validity, predictive validity). The aim is to provide evidence related to 1) how the instrument was developed (content and scoring), 2) the accuracy or stability of the scores obtained (reliability and generalizability), 3) the constructs that are assessed and possible sources of undesirable variance (extrapolation) and 4) the credibility and implications of decisions that arise from the process [6, 7]. A recent review suggested that more data are needed on the construct validity of the MMI, which corresponds mainly to extrapolation evidence in Kane’s framework.
What exactly is evaluated by MMIs remains elusive and likely varies depending on implementation parameters and actual content. In some cases, authors have suggested that this could be “problem assessment” or, more recently, “adaptability” or “ability to identify criteria”. A fairly consistent finding is that MMI scores are uncorrelated or inversely correlated with GPA or other measures of past academic performance [11,12,13]. Positive associations have been found between MMI scores and OSCE scores in medical school [12, 14,15,16], internship rotation evaluations [16,17,18] and, in some contexts, exam results. Two multicenter studies found correlations between MMIs that were developed and implemented independently by different institutions [10, 19], suggesting some overlap between the constructs assessed in various contexts. Moreover, a recent systematic review of the personal domains assessed in MMIs showed that a few personal characteristics, such as communication skills and collaboration, were included in the design of most MMIs described in the literature.
In various contexts, authors have attempted to study the dimensionality of MMIs, that is, the number of latent variables or constructs that are measured, with mixed results. For example, the exploratory factor analysis (EFA) studies by Lemay et al. and by Cox et al. found that each of their MMI stations constituted its own factor and was therefore likely to assess a different dimension. An EFA study in veterinary medicine of a 5-station MMI (semi-directive interviews with behavioral questions) resulted in a solution with 3 factors (i.e., three dimensions) labeled “moral and ethical values”, “relational ability” and “school ability”, the last of which also combined the applicant’s age and GPA. More recently, an Australian study suggested that the MMI in different Australian institutions was one-dimensional.
MMIs for medical selection use a wide range of station formats and, no doubt, candidates need to rely on a different set of skills to perform at these different types of stations. In the AC literature, even when components are designed to be distinct, most of the observed variance in performance is attributable to the simulation exercise rather than to any pre-specified underlying construct. From a theoretical perspective, station formats can be viewed as one of the “building blocks” of a modular MMI design process, each likely providing a different level of contextualization and of consistency in stimulus presentation. For example, scripted role-plays typically provide very high and detailed contextualization that could reflect “real-life” social interaction, much like simulated patients, while discussion stations, which are often less contextualized and more “open”, are likely to require more reasoning and argumentation skills. Therefore, exploring how different station formats (e.g., discussion, role-play) contribute to scoring is highly relevant, since this is a design choice over which admissions committees have total control. Indeed, if all MMI stations appear to assess the same dimension, then the stations within a given MMI are most likely interchangeable and could be chosen based on other factors such as ease of implementation or cost. For example, in our experience, role-plays are generally more complex and time-consuming to plan and may add some inconsistencies related to the actor’s performance. On the other hand, if station formats assess different dimensions, then it becomes important to assess whether they all bring relevant information to the process and to explore the use of subscores to inform admissions decisions. Moreover, reliability problems may appear, since certain dimensions will be evaluated by fewer items.
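The reliability concern raised above can be made concrete with the Spearman-Brown prophecy formula, which predicts how reliability changes when the number of comparable items (here, stations) changes. The sketch below is illustrative only: the overall reliability of 0.80 and the counts of 9 total stations and 3 stations per format are hypothetical values, not IFMMI results, and the formula assumes that stations are roughly parallel measures of the same dimension.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when test length is multiplied by length_factor.

    Standard Spearman-Brown prophecy formula:
        r_new = k * r / (1 + (k - 1) * r)
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical values, chosen only for illustration (not taken from the study).
full_reliability = 0.80   # assumed reliability of the total score
total_stations = 9        # assumed number of stations in the full MMI
subscore_stations = 3     # assumed number of stations for one format

length_factor = subscore_stations / total_stations
subscore_reliability = spearman_brown(full_reliability, length_factor)

print(f"Predicted subscore reliability: {subscore_reliability:.2f}")
```

Under these assumptions, a subscore based on one third of the stations would have a predicted reliability of about 0.57, which shows why format-specific subscores may be too unstable for high-stakes decisions unless each format is represented by enough stations.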
In a recent retrospective analysis of the psychometric properties of role-play and interview stations in the Integrated French Multiple Mini-Interviews (IFMMI), Renaud et al. showed that factorial models treating these two station formats as two dimensions better explained the structure of the test. However, this analysis did not include more recent iterations of the IFMMI, in which a third type of station (collaboration) was added.
Therefore, the aim of this study was to determine whether, in our context, stations with three different formats assess different underlying dimensions. The IFMMI is a collaborative effort between the three French-speaking faculties of medicine in Quebec (Canada). Each year, approximately 1,600 candidates are assessed over a weekend at four interview centers located in Montreal, Quebec City, Sherbrooke and Moncton. The interview score is then shared among the three medical schools, so applicants applying to more than one institution only need to complete the interviews once. Each institution then uses the overall interview score according to its own selection criteria. Overall, in 2018 and 2019, the weight given to the IFMMI was around 50% of the final pre-ranking score for offers of admission, with the remaining 50% given to the R score (academic performance score). In recent years, the IFMMI has relied on a mixture of discussion stations, role-plays and collaborative stations. It has already been found to yield reliable scores [17, 27] and to show some predictive validity for performance in internship rotations [17, 18]. Thus, the present study is part of a validation process that aims to identify the dimensions assessed by the IFMMI on the basis of station format. Based on recent work carried out on two types of stations, we postulated that each station format would assess a different dimension and therefore that a three-dimensional structure would provide a better fit for our MMI results than a one-dimensional structure.