20 How Mature Teaching and Learning Centers Evaluate their Services
This study investigated faculty development program evaluation practices at thirty-three established, centralized, university-funded teaching and learning centers (TLCs). My prior statewide study (Hines, 2009) revealed that limitations of time, resources, and assessment knowledge resulted in superficial evaluation practices. Since the majority of respondents in the previous study were part-time faculty developers with limited funding and staff, I assumed that established, centralized TLCs would have the knowledge and resources to conduct more rigorous evaluation. This study reveals that established, centralized TLCs have significantly stronger practices for evaluating their services.
The field of faculty development emerged from a wave of academic accountability (Centra, 1976), and yet for years, minimal attention was given to program evaluation. According to early studies by Gaff (1975) and Centra (1976), faculty development program evaluation ranged from nonexistent to the occasional use of satisfaction surveys. Chism and Szabo’s (1997) nationwide faculty development study noted a significant increase in the quantity of program evaluation but superficial quality as evidenced by widespread use of satisfaction surveys and routine gathering of self-reported changes in teaching.
In 2007, I conducted a statewide study of the program evaluation practices of twenty faculty developers at public and private institutions (Hines, 2009). Paralleling Chism and Szabo’s (1997) findings, results from this study indicated strong interest and limited rigor. Deficiencies in evaluation were most commonly attributed to a lack of time, resources, knowledge, and good evaluation models. Organizational factors may have contributed since the majority of the universities in the study took Minter’s (2009) "point B" approach to faculty development.
Minter (2009) devised a continuum of faculty development from point A to point D. The point A approach (the "organized-centralized model" [p. 66]) is characterized by a centralized, well-organized, university-funded unit led by a full-time director and staff responsible for developing and implementing faculty development activities for the university and its faculty and for evaluating the program outcomes. A point B program is led part time by a faculty member on release time and provides a variety of "semi-planned and ad hoc" (p. 66) events and activities with limited evaluation. Point C is typified by a "totally or quasi-decentralized" (p. 66) approach in which deans or department heads plan events around their individual unit needs and budgets. Point D, the bottom of the continuum, is characterized by the absence of organized faculty development, leaving the faculty to self-direct their professional growth.
Of the twenty faculty development programs involved in my 2007 statewide study, seventeen were in the point B category and only three could be categorized as point A. Based on Minter’s (2009) continuum, it was not surprising to find a preponderance of low-level evaluation practices. Therefore, the next logical step was to investigate the evaluation practices at point A teaching and learning centers (TLCs). TLCs were selected for this 2010 interview study using seven criteria: (1) a director (75 percent to full time) and staff dedicated exclusively to faculty development, (2) university funded, (3) separate and centralized location, (4) in existence for at least five years, (5) an articulated mission for the TLC, (6) a POD member, and (7) a U.S. university. These TLCs are referred to in this chapter as mature.
Study Design
Qualifying TLCs were identified by cross-referencing the list of more than nine hundred members of the Professional and Organizational Development Network in Higher Education (POD) against member universities' websites. The website search and review resulted in fifty-six qualifying centers. The director of each TLC received an e-mail invitation to participate in the study, along with a request to confirm that the TLC met the seven criteria.
Thirty-three directors from qualifying TLCs agreed to telephone interviews. These interviews were chosen to allow in-depth inquiry, open-ended responses, and clarification of questions and terminology. The interviews were structured using a semiclosed, fixed-response, and open-ended questionnaire similar to those used in prior studies (Chism & Szabo, 1997; Hines, 2009). Questions were designed to identify services offered; prevalence, type, and quality of evaluation practices; and reasons for gaps and limitations in their evaluation work. Participants received the questionnaire in advance of the interview and were asked to confirm or correct a postinterview transcript of their responses. All transcripts were then coded by the researcher and an outside coder, and interrater reliability was established through a series of independent coding and comparison sessions.
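The chapter does not report which agreement statistic was used to establish interrater reliability between the researcher and the outside coder. As a purely illustrative sketch, percent agreement and Cohen's kappa over two coders' categorical codes could be computed as follows; the category labels and code assignments below are hypothetical, not data from the study.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' categorical codes (illustrative only)."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # Observed proportion of agreement
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected chance agreement from each coder's marginal label frequencies
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical codes assigned to ten transcript excerpts by the two coders
researcher = ["time", "resources", "knowledge", "time", "models",
              "time", "resources", "time", "knowledge", "models"]
outside    = ["time", "resources", "knowledge", "time", "models",
              "resources", "resources", "time", "knowledge", "time"]

agreement = sum(a == b for a, b in zip(researcher, outside)) / len(researcher)
print(f"percent agreement = {agreement:.0%}, kappa = {cohens_kappa(researcher, outside):.2f}")
```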
Findings
All participating TLCs could be characterized as point A on Minter’s continuum. The participants were twenty-seven public and six private universities. Five had been in existence for five to nine years, seven for ten to fifteen years, thirteen for sixteen to twenty-five years, and eight for twenty-six or more years. All thirty-three were open year round. Four served fewer than one thousand faculty, thirteen served one thousand to two thousand, eleven served two thousand to four thousand, and five served more than four thousand faculty.
Types of Services
Services offered by the thirty-three TLCs were similar to those reported in Chism and Szabo’s (1997) and my (Hines, 2009) studies. The majority provided seminars, workshops, brown bag sessions, conferences, and orientations, in addition to a variety of consultation services, online resources, and grant programs. Unlike the centers in the previous studies, over half of the TLCs surveyed sponsored faculty learning communities (FLCs). Approximately one-third designed customized programs such as faculty inquiry groups, academic fellowship programs, faculty writing programs, course revision programs, early and midcareer teaching programs, teaching enrichment series, interactive theater programs, and department-specific support programs.
Staff Conducting Program Evaluation
Most TLCs dispersed program evaluation duties among all staff members. The three TLCs with full-time staff assigned exclusively to evaluation indicated that these positions were essential and fairly recent additions to their programs. One director supported the recent appointment by saying, "If you're trying to figure out what's working, what's not working, and where to invest time and money, you need one." Another indicated that the need for evaluation staff was related to projects funded by government grants. All three reported significantly higher levels of program evaluation activity than the other TLCs in the study.
Individuals outside the TLC were also involved in program evaluation. Approximately 20 percent of TLCs hired outside consultants to perform periodic program reviews. Several tapped staff from their university’s office of assessment. A small number recruited their advisory committees to review their physical and online teaching and learning resources.
Prevalence of Evaluation
All TLCs engaged in some routine evaluation, although disparities appeared in the types of services being evaluated (Table 20.1). A high percentage of TLCs evaluated, at least occasionally, user satisfaction and impact on teaching resulting from their events and activities, consultation services, and mentoring programs. Almost half of the TLCs offering grant programs, consultation services, or large resource events made some effort to measure the impact these services had on student learning. There was little interest in gathering satisfaction or impact data relating to publications and resources.
Table 20.1. Percentage Evaluating Each Program Outcome

| Type of Service | Number of TLCs Offering the Service | Satisfaction | Impact on Teaching | Impact on Learning |
|---|---|---|---|---|
| Events and activities | 33 | 100% | 94% | 45% |
| Consultation services | 26 | 81% | 81% | 47% |
| Publications and resources | 33 | 52% | 21% | 0% |
| Grant programs | 24 | 50% | 83% | 50% |
| Mentoring programs | 13 | 77% | 77% | 2% |
Evaluation Methods
The evaluation methods used most frequently were, in descending order of frequency, satisfaction surveys, participation data, formal self-reported changes in teaching, grant reports, and formal teacher-reported changes in student learning (Figure 20.1). A variety of other methods were also used, although far less frequently.
USAGE
Participation and usage were commonly tracked. Attendance data included the department or school in which the participant taught. Usage data for online resources were frequently tracked using Google Analytics.
SATISFACTION
Satisfaction was measured through the routine use of paper or electronic surveys administered after an event or service. Almost all participating TLCs reported the use of postevent meetings to debrief on satisfaction data and plan program adjustments. Three TLCs administered an annual survey, one administered a survey every four years, and two did an end-of-term survey for consultations only. Anecdotal satisfaction data were rarely used. All respondents indicated a moderate to high level of satisfaction with their services. One TLC director was able to use an activity report listing annual services and events offered, combined with satisfaction, participation, and usage data, to persuade state legislators to avoid funding cuts that would have had a negative impact on the center.
IMPACT OF SERVICES ON TEACHING
Self-reports of changes in teaching were commonly gathered by embedding specific questions in satisfaction surveys. Typical questions were, "What will you do differently as a result of this program?" and, "What did you learn? Did you apply it?" Half of the twenty-four TLCs offering grant programs required recipients to report pedagogical changes that occurred as a result of the funded project. Many asked grant recipients to share their new instructional insights in a seminar, workshop, or poster session. Experimental data demonstrating instructional gains were collected if the design of the funded project produced such data. Evidence of pedagogical changes resulting from high-impact programs was gathered through focus groups, one-on-one interviews, and reviews of instructor-created products resulting from program participation.
Besides self-reports from follow-up surveys, teaching impacts from consultation services were often evaluated through follow-up classroom observations, if appropriate and permitted by the faculty member. Student evaluations were used with similar conditions. If a faculty member requested a small group instructional diagnosis, a follow-up was sometimes performed to gather student reports of changes in teaching.
The evaluation of mentoring programs relied heavily on self-reported changes in teaching solicited through e-mail inquiries and follow-up surveys. TLC directors also reported gathering feedback from mentors and mentees through focus groups, one-on-one interviews, pre-post examinations of syllabi, and pre-post reviews of student evaluations and in-class feedback. One TLC asked mentees to write "critical account" analyses. Two designed formal experimental studies comparing gains between a control group (faculty not in the program) and an experimental group (program participants), with one using tenure ratings as a measure.
The impact of FLCs on teaching was most commonly measured through self-reported changes solicited through e-mail inquiries, focus groups, project reports, and presentations. One director conducted a retroactive faculty survey inquiring into the impact of FLCs on teaching careers over the past twenty years. A few used evidence-based measures, including teaching portfolios, classroom videos, and experimental studies, to measure gains in teaching.
IMPACT OF SERVICES ON LEARNING
Eleven of the fifteen respondents who measured impacts of events on learning solicited teacher-reported changes in student learning through surveys and interviews. FLCs were often evaluated in this manner as well. One respondent administered a five-year follow-up survey to 650 FLC participants inquiring into the perceived impact of FLC participation on student learning.
More robust evaluation efforts, reported by four respondents, targeted high-impact events where evidence demonstrating the return on the investment was required. Methods used to measure changes in learning varied by program. Programs designed to improve specific student skills (writing, for example) typically measured qualitative changes in products of student learning such as e-portfolios, writing samples, and capstone projects. Programs focused on changing instructional methods, such as large course redesigns, active learning initiatives, and cluster teaching projects, used pre-post quantitative measures of student course performance (for example, test scores, homework scores, drop-withdraw-fail rates) or overall academic success (for example, retention, persistence, grade point average). Instructional technology programs gathered reports of changes in learning through case studies, student self-reports, and student surveys. Combinations of these methods were included in reports from faculty who received instructional grants.
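The chapter does not specify the statistical procedures behind these pre-post comparisons. As a minimal, hypothetical sketch, a paired comparison of average exam scores for the same course sections before and after a redesign might look like the following; the scores and the choice of a paired t statistic are assumptions made purely for illustration.

```python
import statistics

def paired_t(pre, post):
    """Mean gain and paired t statistic for matched pre/post measures (illustrative only)."""
    diffs = [after - before for before, after in zip(pre, post)]
    mean_gain = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)                 # sample SD of the paired differences
    t_stat = mean_gain / (sd_diff / len(diffs) ** 0.5)
    return mean_gain, t_stat

# Hypothetical mean exam scores for eight course sections, before and after a redesign
pre_scores  = [68, 72, 75, 70, 66, 74, 71, 69]
post_scores = [74, 78, 76, 75, 73, 79, 74, 72]

gain, t = paired_t(pre_scores, post_scores)
print(f"mean gain = {gain:.1f} points, paired t = {t:.2f} (df = {len(pre_scores) - 1})")
```

The same structure would apply to other pre-post indicators mentioned above, such as drop-withdraw-fail rates or retention figures, with the appropriate outcome substituted for exam scores.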
Program Evaluation Purpose and Strategy
Reasons for evaluation varied in frequency and type. All thirty-three TLCs studied evaluated program services for purposes of improvement; in addition, twenty-seven did so to document success, twenty-six to see whether their goals were met, fifteen because their administration required program evaluation, thirteen simply because they wanted to, and one to model evidence-based practices.
The production of an annual report summarizing program activity (participation, usage, and satisfaction) and its linkage to program goals was the most common summative program evaluation practice, reported by twenty-seven of the thirty-three participants. Seven of the thirty-three commissioned a periodic program review conducted by individuals outside their center. Four improved efficiency and focus by staggering evaluation across individual program offerings from year to year.
The most systematic practices were reported by a director who developed a staggered and staged approach to program evaluation. A three-year evaluation plan, staggered program by program, was staged to measure three outcome levels: participation, implementation, and impact. The director first tracked participation data, noting, "It's not possible and there's no point to measure impact on student learning and teaching if participation is not present." After adequate participation became evident, evidence of implementation was gathered. Once the data indicated implementation, impact on learning was measured. Implementation data were gathered through the diligent creation of "a one page case study (like a health record) with pre- and post-assessment data to look for improvement and holes in the process." Three weeks in July were set aside to analyze the data and write the annual report, during which time all program activities and most services ceased. The director readily admits this approach "is very hard work and very time-consuming" and emphasizes the crucial need for automation and customized databases to make it work, especially with limited staff. This unique approach captured valid evidence of significant impact and yearly progress, which was published in the annual report.
Reported Reasons for Gaps in Program Evaluation
Evaluation of events and activities was performed to varying extents at all responding TLCs. However, events seen as informal, infrequent, irregular, or lightly attended reportedly did not justify evaluation. Consultation services were not routinely evaluated because of the desire to maintain confidentiality and the perceived lack of time and resources. A small number believed consultation evaluation to be too difficult or unnecessary, or the services too irregular to justify assessing. Participants reporting a lack of evaluation for their occasional services, such as online and physical resources, grant programs, and mentoring programs, most commonly cited a lack of time and resources as the cause. Other respondents indicated that evaluation was too difficult to do, that it was low on the priority list, that there was no good process, or that the informality of the services did not justify evaluation.
The most common reason reported for gaps in evaluation regarding the impact of services on teaching was a lack of time and resources. Accompanying comments such as, "We want to, though, so we can show our dean for funding," and, "The scientist in me says it’s a good approach, but between being a scientist or helpful, it’s better to be helpful," highlight the conflict between desire and resource constraints. Others attributed the evaluation gap to the inherent difficulty and fear of causing survey fatigue.
The most common reason for not evaluating impacts on learning was a lack of time and resources. Several indicated that the presence of multiple confounding variables made the evaluation of impact on student learning very difficult. One respondent summed it up this way: "Being a psychologist with training in assessment, I know the level of effort it takes to do this well; anything less is a crapshoot or just a political tool."
At the conclusion of the interviews, many respondents remarked that there was no reason to institute more rigorous program evaluation practices since their administration already supported their work, suggesting that time, resources, and energy should go toward providing, rather than justifying, their services. An equal number of respondents reiterated the need for more staff and funding in order to develop more rigorous evaluation practices. For some, the absence of an institutional culture of assessment or inconsistencies in leadership reduced their desire to improve program evaluation efforts. Others indicated that a lack of knowledge and the absence of models for developing quality program evaluation plans greatly hindered their evaluation work. Disciplinary knowledge played a role and was reflected in comments such as, "We lack the knowledge in the staff. The director has a Ph.D. in social science, so there's a high standard for quality assessment with rigorous methods which we're [the rest of us are] unable to do" and "I have a Ph.D. in English, not stats. I would like to know how to do it."
Several other parting comments suggested strong interest and support for continued work in program evaluation:
"I’m interested in assessing for viability and sustainability but just don’t know how."
"This conversation helped. We would do more evaluation if we had a better model on how to do it."
"We are well funded but would need a FIPSE [Fund for the Improvement of Postsecondary Education] grant to make a
research report. We do need to move from self-reported changes to seeing it."
"Assessment is the future and accountability is critical, especially for federal and regional accreditation."
Discussion
Similar to the findings of Chism and Szabo's (1997) study and my own (Hines, 2009), routine evaluation of services is prevalent among the mature TLCs studied. Unlike the previous studies, however, the mature TLCs in this study group showed a stronger interest in extending measures beyond satisfaction and participation data to evaluation of program impact. The percentage of respondents making efforts to measure the impact of services on teaching was 20 percent in Chism and Szabo's (1997) study, 40 percent in Hines's (2009) study, and 97 percent in this study. The percentage of respondents attempting to measure the impact of particular services on learning was insignificant in Chism and Szabo's (1997) study, 20 percent in Hines's (2009) study, and 45 percent in this study. This increase in impact measures of teaching and learning is encouraging, but it is tempered by the majority's reliance on superficial, self-reported measures of change. Recognition is due, however, to the increased prevalence of grant programs requiring evidence of changes in teaching and learning and to the implementation of periodic program reviews, both of which were rare to nonexistent in the previous two studies. In addition, this study revealed noteworthy efforts by a select few who devoted extensive effort to gathering causal evidence of the effects of high-impact events on student learning through pre-post measures and experimental studies using multiple measures. Finally, reports of anecdotal data use were offset by substantial reports of systematic, formal evaluation methods with a heavy reliance on technology in the form of online surveys, Google Analytics, and databases.
As has been reported for decades in the literature, many of the respondents reported gaps and superficiality in evaluation practices, with most blaming the lack of time and resources. Some individuals formally trained in assessment could not justify the time and effort to demonstrate their worth when administration already believed in them. Others felt they lacked useful models or staff with assessment knowledge.
Unlike respondents in previous studies, many TLCs reported efforts to overcome the obstacles of time and resources by making changes to their evaluation practices and staffing. The TLCs most active in evaluation have systemized their evaluation processes in these ways:
- Automating attendance and online surveys
- Evaluating their programs on a staggered annual basis
- Evaluating outcomes in a staged manner
- Using random sample data collection methods
- Reserving rigorous evaluation for high-impact programs
- Leveraging support from deans or department chairs
- Creating a strong culture of assessment within the center
- Hiring full-time evaluation staff trained in program evaluation
Recommendations
Considering the findings from this and earlier studies, it appears that the most feasible and useful evaluation practices should be designed within a culture of assessment. This work can be summarized in what could be termed the four S’s of program evaluation: staffing, systemization, staggered evaluation of programs, and staged outcome evaluation.
Staffing
Build staffing and institutional collaboration to support program evaluation efforts. Distribute data gathering among staff or, ideally, assign it to a full-time staff specialist hired specifically for program evaluation. In addition, collaborate with the university's office of assessment to design evaluation plans, provide readily available institutional data, and combine survey efforts.
Systemization
Create a comprehensive plan to systematically gather data for evaluating the program. Determine the goal of the program, the outcomes to be measured, the methodology and timing for data collection, and the schedule for analyzing, reviewing, and implementing the findings. Customize the plan to fit the resource limitations. Where possible, use technology such as online survey software, content management servers (such as SharePoint), database software, and student response systems (clickers) to automate the collection and analysis of data. Simplify survey distribution by standardizing surveys, using preexisting institutional data, and combining survey efforts with other institutional assessment efforts. Embed evaluation in program planning as part of standard practice along with annual reports or fact sheets to track and report program trends and success.
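As a hedged illustration of the kind of automation this recommendation points to (the file layout and column names are assumptions, not details from the study), a short script could roll event sign-in and survey exports into the participation counts and satisfaction averages an annual report or fact sheet needs:

```python
import csv
from collections import defaultdict

def summarize_events(attendance_csv, survey_csv):
    """Aggregate participation and mean satisfaction per event (illustrative sketch)."""
    participation = defaultdict(int)
    ratings = defaultdict(list)

    # Assumed columns: event, participant, department
    with open(attendance_csv, newline="") as f:
        for row in csv.DictReader(f):
            participation[row["event"]] += 1

    # Assumed columns: event, satisfaction (1-5 rating)
    with open(survey_csv, newline="") as f:
        for row in csv.DictReader(f):
            ratings[row["event"]].append(int(row["satisfaction"]))

    for event in sorted(participation):
        scores = ratings.get(event, [])
        mean = sum(scores) / len(scores) if scores else float("nan")
        print(f"{event}: {participation[event]} participants, mean satisfaction {mean:.1f}")

# Hypothetical usage with exported files from a survey tool and sign-in system
# summarize_events("attendance_2024.csv", "satisfaction_2024.csv")
```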
Staggered Evaluation of Programs
Evaluation of the entire TLC does not need to occur at one time. Stagger the evaluation of individualized programs or services on an annual basis. Create a three- to five-year plan outlining the staggered evaluation of each component of the TLC. For example, in year 1, evaluate consultation services; in year 2, evaluate the teaching certificate program; and in year 3, evaluate the mentoring program.
Staged Outcome Evaluation
Take a staged approach to the evaluation of outcomes of various programs. For example, for any given program, track participation only until a significant number is achieved. Then gather data to determine whether participants are implementing the new skills. Finally, measure the impact on student learning once significant implementation is seen. Another approach is to tailor evaluation to the outcome measures most appropriate to the intended impact of the individual programs or services. For example, satisfaction data may suffice for ad hoc workshops, data about impact on teaching may be needed for mentoring programs, and data concerning impact on student learning may be important and feasible for grant-funded teaching projects. In other words, collect data that will add value to the center's work.
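The staged gating logic described above can be captured in a few lines. In the minimal sketch below, the thresholds and names are invented purely for illustration and are not drawn from the study.

```python
def next_evaluation_stage(participants, implementation_rate,
                          min_participants=25, min_implementation=0.5):
    """Return the next outcome to measure under a staged approach (illustrative only).

    participants: number of faculty who took part in the program so far
    implementation_rate: share of participants showing evidence of using the new skills
    """
    if participants < min_participants:
        return "keep tracking participation"
    if implementation_rate < min_implementation:
        return "gather implementation evidence (observations, case records, follow-up surveys)"
    return "measure impact on student learning"

# Hypothetical example: a program with healthy participation and implementation
print(next_evaluation_stage(participants=40, implementation_rate=0.6))
```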
Conclusion
Directors of mature TLCs are interested in program evaluation and need feasible and useful evaluation models. The findings of this nationwide study suggest that staffing, systemization, staggered evaluation, and staged outcome measures are a useful framework for the design of evaluation methods to demonstrate the worth and inform the continuous improvement of faculty development services. Continued efforts must be put forth to share best practices in program evaluation through scholarly research, consortiums, publications, conferences, and presentations. Continued research is needed to find ways to measure the impact of faculty development on teaching and learning. Perhaps this director’s closing remark best captures the need for continued research: "We’re still asking, ’Does faculty development make a difference?’ I don’t think anyone has a good answer to that yet."
References
- Centra, J. A. (1976). Faculty development practices in U.S. colleges and universities. Princeton, NJ: Educational Testing Service.
- Chism, N. V. N., & Szabo, B. L. (1997). How faculty development programs evaluate their services. Journal of Staff, Program, and Organizational Development, 15(2), 55-62.
- Gaff, J. G. (1975). Toward faculty renewal. San Francisco, CA: Jossey-Bass.
- Hines, S. R. (2009). Investigating faculty development program assessment practices: What’s being done and how can it be improved? Journal of Faculty Development, 23(3), 5-19.
- Minter, R. L. (2009). The paradox of faculty development. Contemporary Issues in Education Research, 2(4), 65-70.