The present study used a cross-sectional correlational design. The questionnaire was designed from a psychological perspective on engagement, and the student engagement construct was treated as multidimensional. In this study, student engagement is conceptualized as the student's investment of time and energy in PBL tutorial experiences across the cognitive, affective, and behavioral dimensions. An initial questionnaire was designed to operationalize the three dimensions of the student engagement construct based on our previously published conceptual framework of engagement [9]. A focus group discussion was then conducted with medical education experts (n = 12), who examined the degree of concordance between each questionnaire item and the intended construct, as well as the clarity of the items. The experts agreed to include all 15 items with slight modifications; agreement on individual items ranged from 60 to 100%. The questionnaire was then pilot tested with a small group of year 2 medical students (n = 10) for suitability of the items, and no further modifications were made.

Setting and participants

The target population in this study was medical students in Phase II of the medical program at a college of medicine in the Gulf region. The MBBS program at this college lasts five years. Year 1 (Phase I) is a foundation year with emphasis on basic medical sciences and general education courses. Years 2 and 3 (Phase II) consist of integrated medical sciences courses arranged by body system. Problem-based learning (PBL) is the main learning strategy in Phase II of the program, and PBL tutorials are the backbone activity. Years 4 and 5 (Phase III) consist of hospital-based rotations in different core clinical specialties.

The context of the study was the PBL small-group tutorials conducted during an integrated system-based course. Each PBL tutorial group consists of 8 to 10 students who meet twice a week for two hours per session. The tutorials are led by a PBL tutor who functions mainly as a facilitator of learning rather than a provider of information. In the first session, students discuss a clinical case designed to stimulate rich discussion in the group, and they generate a list of learning needs by the end of the session. Between the first and second sessions, students engage in self-study scaffolded by structured college teaching activities. They then meet again to present what they have learned during the week and to integrate the information related to the case. Each PBL tutor remains assigned to the same group throughout the whole semester.

Instruments and sampling

The final form of the study questionnaire consists of 15 items representing emotional engagement (4 items), cognitive engagement (6 items), and behavioral engagement (5 items). The multiple-choice achievement test consisted of 100 A-type (single best response) items and covered all contents of the course. Most of the questions are context-rich scenarios that test the application of knowledge rather than simple recall. We used convenience sampling with a target population of 204 year 2 and year 3 medical students. The paper-based questionnaire was completed by 176 students (response rate = 86%) at the end of an organ-system course. Students were instructed to rate their overall engagement in PBL tutorials during the course, which lasted 6 to 7 weeks.

Statistical analysis

The purpose of the study was to collect different lines of evidence supporting the validity of the questionnaire. The data were entered and analyzed using the Statistical Package for Social Sciences (SPSS) version 25.0 and Analysis of Moment Structures (Amos) version 25.0 (Chicago, IBM SPSS). A P-value < 0.05 was considered statistically significant.

Confirmatory factor analysis

Confirmatory factor analysis using maximum likelihood estimation was applied to examine the degree of fit between the measurement model (the observed indicators) and the underlying structural model (the latent factors). Different indices were used to assess the goodness of fit of the model. The Comparative Fit Index (CFI) assesses the overall performance of the model studied relative to a baseline (independence) model. Conventionally, CFI should be equal to or greater than 0.90 to accept the model. This denotes that 90% of the covariation in the data can be reproduced by the given model. The chi-square (χ2) test indicates the degree of fit between the implied and observed covariance matrices. A non-significant χ2 or a χ2/df < 2 indicates good fit for the model. The Root Mean Square Error of Approximation (RMSEA) indicates the mean difference between observed and predicted covariance, and a value of 0.08 or less indicates an acceptable model fit. The Standardized Root Mean Square Residual (SRMR) is defined as the mean standardized difference between the observed correlation matrix and the model-implied correlation matrix. A value less than 0.08 is considered a good fit. This measure tends to be smaller as sample size increases and as the number of parameters in the model increases [22]. Finally, the Akaike Information Criterion (AIC) is often computed. The AIC is used to compare competing models fitted to the same data, and lower AIC values indicate a better fit. The decision on which model fits best therefore takes all of these indicators into account.

We first tested the full 15-item questionnaire data, as envisioned by the focus groups of students and experts, against the three-factor model. The results were: χ2 = 321.35, df = 90, χ2/df = 3.57, CFI = 0.85, RMSEA = 0.12, SRMR = 0.26, and AIC = 381.35. This model clearly did not fit the data. A possible reason was that four items had small loadings or loaded on more than one factor. The item “I feel the time passes quickly during the PBL tutorial” cross-loaded with high regression weights on both cognitive and emotional engagement. The items “I challenge myself in understanding the topics related to PBL case” and “I pay full attention during the PBL tutorial” loaded on both cognitive and behavioral engagement. In addition, the item “I feel bored in PBL tutorials” cross-loaded on all three engagement dimensions. We therefore removed these four items and continued the analysis with the remaining 11 items.
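As a rough arithmetic check on the fit statistics reported above for the 15-item model, the following Python sketch recomputes χ2/df, RMSEA, and AIC from the reported χ2 and df. It rests on several assumptions that are not stated in the text: that all 176 respondents entered the analysis, that only covariances (not means) were modeled, that RMSEA uses N − 1 in the denominator, and that AIC is computed as χ2 + 2q, where q is the number of free parameters. The function names are illustrative. The CFI function is included for completeness, but the baseline-model χ2 is not reported here, so it is not applied to the study data.

```python
import math


def chi2_df_ratio(chi2, df):
    """Normed chi-square: model chi-square divided by its degrees of freedom."""
    return chi2 / df


def rmsea(chi2, df, n):
    """RMSEA point estimate, assuming a single group and an N - 1 denominator."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def aic_amos(chi2, n_free_params):
    """AIC in the chi-square + 2q form (assumed convention, q = free parameters)."""
    return chi2 + 2 * n_free_params


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative Fit Index relative to a baseline (independence) model."""
    num = max(chi2_model - df_model, 0.0)
    den = max(chi2_model - df_model, chi2_base - df_base, 0.0)
    return 1.0 if den == 0 else 1.0 - num / den


# Values reported in the text for the 15-item three-factor model.
chi2, df = 321.35, 90
n = 176        # assumption: every respondent was included in the CFA
n_items = 15

# Assumption: covariance structure only, so p(p + 1)/2 sample moments.
n_moments = n_items * (n_items + 1) // 2
n_free = n_moments - df  # free parameters implied by the reported df

print(round(chi2_df_ratio(chi2, df), 2))  # 3.57
print(round(rmsea(chi2, df, n), 2))       # 0.12
print(round(aic_amos(chi2, n_free), 2))   # 381.35
```

Under these assumptions the recomputed values agree with the reported χ2/df, RMSEA, and AIC, which suggests the reported statistics are internally consistent.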

Four competing models were then tested on these 11 items. The first model assumed that all items loaded on a single engagement factor, suggesting that one latent factor was sufficient to explain the data. This is the simplest possible model and therefore, in theory, the most parsimonious. It does not, however, fit with the original theoretical analyses. The second model hypothesized three independent latent factors. The assumption here is that three latent factors (emotional, cognitive, and behavioral) would explain the data, but that these factors were uncorrelated. The third model also assumed that the three factors were uncorrelated. However, since the data were acquired using the same method, it was hypothesized that the data shared common-method variance in the form of a fourth latent factor related to all items. This approach assumes that participants have a biased tendency to respond to all items in a somewhat similar way. Finally, the fourth model allowed the three latent factors to be correlated, assuming that emotion, cognition, and behavior are at least to some extent in harmony.

Construct reliability (CR)

Composite (or construct) reliability is a measure of internal consistency in the observed indicators that load on a latent variable (construct). In structural equation modeling, the formula for calculating construct reliability is:

$$CR=\frac{{\left(\sum {\lambda }_{i}\right)}^{2}}{{\left(\sum {\lambda }_{i}\right)}^{2}+\left(\sum {\epsilon }_{i}\right)}$$

where λ (lambda) is the standardized factor loading for item i and ε (epsilon) is the corresponding error variance for item i. The error variance (ε) is estimated from the value of the standardized loading (λ) and appears in the Amos output [23].
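The formula above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in the study: the function name and the example loadings are hypothetical, and when no error variances are supplied the sketch assumes the common convention for a standardized solution, ε = 1 − λ², instead of reading the values from the Amos output.

```python
def composite_reliability(loadings, error_variances=None):
    """Composite reliability: (sum of loadings)^2 / ((sum of loadings)^2 + sum of errors).

    loadings: standardized factor loadings of the items on one construct.
    error_variances: optional error variances; if omitted, each is taken as
    1 - loading**2 (assumption for a standardized solution).
    """
    if error_variances is None:
        error_variances = [1.0 - lam ** 2 for lam in loadings]
    if len(error_variances) != len(loadings):
        raise ValueError("loadings and error_variances must have the same length")
    squared_sum = sum(loadings) ** 2
    return squared_sum / (squared_sum + sum(error_variances))


# Hypothetical loadings for a four-item construct (not taken from the study).
example_loadings = [0.70, 0.70, 0.70, 0.70]
print(round(composite_reliability(example_loadings), 3))  # 0.794
```

Passing the error variances explicitly, for example as read from software output, bypasses the ε = 1 − λ² assumption.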

Correlations with academic achievement

Correlations were computed between students' scores on the three engagement factors and their examination scores.
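The text does not state which correlation coefficient was used. As a minimal sketch, assuming Pearson's product-moment correlation, the computation for one engagement factor could look as follows. The function name and the data values are purely illustrative and are not study data.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Illustrative (made-up) values: one engagement factor score and the
# examination score (out of 100) for five hypothetical students.
cognitive_engagement = [3.2, 4.1, 3.8, 2.9, 4.5]
exam_scores = [68, 81, 74, 70, 85]
print(round(pearson_r(cognitive_engagement, exam_scores), 2))
```

The same function would be applied once per engagement factor, giving three correlations with the examination scores.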