Experimenting for excellence: Randomised trials in human resources
I acknowledge the Gadigal people of the Eora nation, and all First Nations people present today. Thank you to the organisers for the chance to address you on a topic that is a passion of mine – using better evidence to create a fairer society and a stronger economy.
As we navigate an era marked by rapid technological advancements and shifting workforce dynamics, the role of human resources has never been more critical. HR is the backbone of organisations, not just ensuring compliance and sound management, but also fostering a culture of growth, inclusion and innovation. At its best, HR helps unlock workers’ full potential, aligns individual aspirations with organisational goals, and builds resilient structures that thrive in the face of future challenges. HR is not just a support function, but a driver of organisational success.
Yet for HR to succeed, you need more than gut instinct. As the cliché goes, ‘In God we trust; all others must bring data’. If your company faces no competitors for your products and employees, you might be able to get away with formulating HR policies based on feelpinions. For mere mortals, evidence matters.
In this short address, I want to run you through a few of my favourite examples of evidence-based policymaking in human resources, presenting some surprising findings from a succession of randomised trials. I will then turn to why randomised trials should typically be given more weight than other forms of evidence, and how we are seeking to use them in government.
So let’s take a look at what randomised trials in human resources can teach us.
Mental illness not only hurts the sufferer, but can also be costly to employers. Those who are struggling mentally cannot perform at their best in the workplace, and mental ill health is one of the main causes of long-term work absences.
Might mental health training for managers help? To test this, Fire and Rescue NSW conducted a randomised trial (Milligan-Saville et al 2017). The service recruited 128 duty commanders for the experiment. Half of the managers were randomly assigned to a four-hour face-to-face RESPECT mental health training program. The other half were placed on a waitlist to receive the training.
The researchers then looked at whether sick leave differed between firefighters supervised by trained commanders and those whose commanders had not yet been trained. Over the next six months, they found a significant drop in sick leave in the treatment group: firefighters whose managers had received the training were one-third less likely to take sick leave. After six months, the control group received the training too.
At the Tomago Aluminium Smelter, north of Newcastle, human resource managers were concerned about the physical health of employees (Morgan et al 2011). Shift work is correlated with obesity, and the smelter’s managers were keen to assist employees who wanted to lose weight. So researchers recruited 110 overweight or obese men who worked in the smelter and wanted to lose weight. Half were randomly assigned to a weight loss program, and the other half were randomly assigned to a waitlist control group.
The chosen weight loss program was known as POWER – a backronym standing for ‘Preventing Obesity Without Eating Like a Rabbit’. Men in the program attended a 75-minute information session, and were provided with a weight loss handbook, a pedometer, and online resources that encouraged them to reduce energy intake and increase energy expenditure. There was also a competitive element to POWER: the crew with the greatest weight loss would each receive a $50 voucher to be spent at a local sporting goods store.
Over the first 14 weeks, men in the POWER program lost 4.3 kilograms more than those in the control group – a significant drop in a group whose average starting weight was 95 kilograms. Because of the trial’s design, it was not possible to assess the long-term impacts: after three months, the control group became eligible to participate in the program. But given that many weight loss programs produce zero short-term effects, this one is promising. It would also be terrific to have further randomised trials that could help unpack whether the program worked because of the information component or the competitive aspect.
Outside Australia, human resources experiments have looked at the impact of status and incentives on employees. In one fascinating experiment, a large insurance company was refurbishing the offices of 198 staff in the underwriting department (Greenberg 1988). While the work was being done, employees were temporarily assigned to different offices. Some got an office that was higher-status than their co-workers, others were assigned to lower-status offices, and the rest were put in offices that were of equal status to their co-workers.
Relative to workers assigned to equal-status offices, those put in higher-status offices lifted their performance, while those in lower-status offices let theirs slip. While this fascinating finding tells us a lot about how much status matters, it’s easy to imagine that it might not generalise to every setting. Could a view of Sydney Harbour be so distracting that it lowers productivity? Only a well-designed experiment can tell us the answer.
One of the key issues in human resource management is how people are paid. In Britain, an experiment with fruit pickers analysed how they performed under two different pay structures – piece rates or relative incentives (Bandiera, Barankay and Rasul 2005).
Under the piece rate system, workers were simply paid a flat rate for each kilogram of fruit. Under the system of relative incentives, the day’s pay rate depended on how much the average worker picked: the more the crew picked on average, the lower the per-kilogram rate.
To test which system was better, researchers conducted a randomised trial involving 142 workers, who were switched between the two pay systems.
Before the experiment began, managers liked the relative incentive system. But as the workers quickly figured out, relative incentives come with a cost. Working harder pushes up the average output, which lowers everyone else’s pay. In economic jargon, hard workers impose a negative externality on their co-workers. The experiment found that workers were 50 percent less productive under the system of relative incentives, especially when working alongside their friends.
Let me emphasise that point. Under the pay system that managers liked, workers were 50 percent less productive than under the system that performed best in a rigorous experiment.
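For the technically minded, here is a stylised sketch of the two pay schemes in Python. The numbers and the exact formula are invented for illustration – they are not the actual rates from the study – but the sketch captures the externality: under relative incentives, one picker’s extra effort cuts a co-worker’s pay.

```python
# A stylised comparison of piece rates and relative incentives.
# All numbers, and the exact pay formula, are invented for illustration;
# they are not the actual rates from Bandiera, Barankay and Rasul (2005).

TARGET_DAILY_PAY = 120.0   # what the farm wants an average picker to earn
PIECE_RATE = 1.50          # flat pay per kilogram under piece rates

def piece_rate_pay(own_kg: float, crew_kg: list[float]) -> float:
    """Flat rate: pay depends only on your own output."""
    return PIECE_RATE * own_kg

def relative_pay(own_kg: float, crew_kg: list[float]) -> float:
    """The day's per-kg rate is set so the average picker earns the target,
    so picking more raises the average and lowers the rate for everyone."""
    average = sum(crew_kg) / len(crew_kg)
    return (TARGET_DAILY_PAY / average) * own_kg

crew = [70.0, 80.0, 90.0]      # kilograms picked by a three-person crew
harder = [70.0, 80.0, 120.0]   # the same crew after picker 3 works harder

for pay in (piece_rate_pay, relative_pay):
    print(f"{pay.__name__}: picker 1 earns "
          f"{pay(70.0, crew):.2f} before, {pay(70.0, harder):.2f} after")

# piece_rate_pay: picker 1 earns 105.00 before, 105.00 after
# relative_pay:   picker 1 earns 105.00 before, 93.33 after
# Picker 3's extra effort costs picker 1 money: the negative externality.
```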
In another experiment at the same farm, the researchers tested the effect of incentive pay for managers (Bandiera, Barankay and Rasul 2007). They found that when managers were paid according to the productivity of their teams, average output per worker hour rose. But they also found that the variation in performance grew. Faced with a financial incentive to boost output, managers targeted their effort towards more able workers, and ignored those who were struggling.
What about the effect of working from home? In 2010, CTrip, a large Chinese travel agency in Shanghai, was curious to see the impact of working from home for its call centre employees (Bloom et al 2015). So they asked for volunteers to participate in a randomised experiment of working from home. In total, 249 put their hands up. Among this group, CTrip allowed employees with even-numbered birth dates to work from home for nine months, while those with odd-numbered birth dates continued to work from the office as usual.
When the nine-month period was up, CTrip found that those who had worked from home were 13 percent more productive. They made more calls per minute – which the researchers attributed to a quieter working environment – and took fewer sick days. Home workers reported being more satisfied with their work, and were less likely to quit. However, conditional on their performance, they were also less likely to be promoted – perhaps because they were ‘out of sight, out of mind’ (though the study also discusses other possible reasons). While the post-COVID debate over working from home rages, the CTrip study remains one of the most convincing pieces of evidence as to its impacts.
One final human resources experiment. Last year, 758 management consultants from BCG, representing 7 percent of the firm’s global staff, participated in a controlled study to evaluate the effect of artificial intelligence on productivity (Dell’Acqua et al 2023).
These consultants were tasked with various assignments for a hypothetical shoe company, encompassing creative challenges (such as generating at least 10 new shoe ideas for an overlooked market or sport), analytical challenges (such as categorising the footwear market by consumer segments), and tasks focused on writing and marketing (such as composing a press release for a new product). Additionally, they tackled persuasive challenges (such as writing a motivational memo to employees that highlights the superiority of their product over competitors).
Half of the consultants performed these tasks using traditional methods, while the other half incorporated ChatGPT into their workflow. The group using artificial intelligence significantly outperformed the other, completing tasks 25 percent faster and producing work judged 40 percent higher in quality. That gap is comparable to the difference one might expect between a novice and a seasoned employee.
The study also found that AI acted as a skills equaliser: those who scored lowest in an initial skills assessment improved the most when using AI, although high performers benefited too, to a lesser extent. The researchers also deliberately assigned some tasks that stretched beyond the capabilities of AI; on these, consultants who relied on AI performed worse – a failure the researchers dubbed ‘falling asleep at the wheel’.
My purpose in describing these studies is not to persuade you that workplace mental wellbeing and weight loss programs always work, that piece rates are invariably effective, that working from home always works, or that every task can be done by artificial intelligence. Instead, it is to encourage a mindset of humble experimentation over brash overconfidence. In human resources, as in politics, you will come across plenty of leaders who are certain that their theories are right. The question that you should always ask is: how good is your evidence, and how can we make it better?
Stronger evaluation is a breakthrough that can help policymakers solve challenges that have confronted successive governments: why one-third of our students continue to struggle with reading; why half of working-age people with disability are outside the workforce; why two out of five released prisoners are back behind bars within two years. The Labor government is committed to meeting these challenges, and strengthened evaluation will be one of the tools we use.
To build the evidence base, the Australian Government announced last year the creation of the Australian Centre for Evaluation (Leigh 2023). Housed in the Commonwealth Treasury, the Australian Centre for Evaluation is tasked with working across government to encourage more high-quality evaluations, particularly randomised trials. Too often, past governments have run ad hoc pilot programs. The trouble with many pilot programs is that it’s hard to know what to compare the outcomes to. As evaluators put it, with a pilot program, there isn’t a clear counterfactual.
That’s why we’re looking to carry out more randomised trials across government, providing insights on what works in social policy programs, education programs and employment programs. Just as marketing companies use A/B testing to increase their impact, we’re looking at opportunities to A/B test the many letters, emails and text messages that the government sends to Australians. We’re not just trusting our intuition – we’re using science to better serve the public.
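To make that concrete, here is a minimal sketch of how such an A/B test might be analysed. The letter versions and response counts are hypothetical, and the analysis is a standard two-proportion z-test using nothing beyond Python’s standard library:

```python
from math import sqrt, erf

# Hypothetical A/B test: two versions of a reminder letter, each sent to a
# random half of recipients, judged on how many people respond.
# All counts below are invented for illustration.
sent_a, responded_a = 5000, 610   # version A: the current letter
sent_b, responded_b = 5000, 680   # version B: a plain-language rewrite

p_a, p_b = responded_a / sent_a, responded_b / sent_b
pooled = (responded_a + responded_b) / (sent_a + sent_b)
se = sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
z = (p_b - p_a) / se

# Two-sided p-value from the normal approximation.
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

print(f"Response rates: A {p_a:.1%}, B {p_b:.1%}")
print(f"z = {z:.2f}, p = {p_value:.3f}")
# With these invented numbers: z = 2.09, p = 0.037 - evidence that the
# rewrite genuinely lifts the response rate, not just sampling noise.
```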
One of the myths about randomised trials is that they take decades and cost millions. Yet plenty of organisations have built randomised trials into all parts of their business (Leigh 2018). Every time you use Google, you’re participating in multiple randomised trials – a demonstration that big data is no substitute for experimentation. One insider quipped that every pixel on the Amazon home screen has had to justify itself through randomised trials. And one in 100 Coles Flybuys cardholders is randomly assigned to a control group that receives no advertising materials – allowing managers to test the impact of their marketing campaigns.
In the quest to improve government, funders recognise that insights from randomised trials can have an outsized impact on policy. In the US, Arnold Ventures funds a wide range of randomised trials in social policy, many of which are conducted with a small budget. In the UK, the Education Endowment Foundation, the most successful of the UK What Works Centres, has conducted over a hundred randomised trials to see what works in school education, and what does not.
As the Australian Centre for Evaluation recognises, ethics must be at the heart of evaluation. The same holds in business. If you’re surrounded by people who are sceptical of randomised trials, it can be helpful to use a waitlist design, as some of the experiments I’ve described have done. In the end, everyone gets the program – the only difference is that those in the treatment group get it a bit earlier than those in the control group.
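Setting up a waitlist design can be as simple as a reproducible random shuffle. Here is a minimal sketch in Python – the staff list is hypothetical, and in practice the names would come from your HR system:

```python
import random

# Hypothetical staff list; in practice this would come from your HR system.
staff = [f"employee_{i:03d}" for i in range(1, 201)]

rng = random.Random(42)  # fixed seed so the assignment is reproducible
shuffled = staff.copy()
rng.shuffle(shuffled)

half = len(shuffled) // 2
early_group = sorted(shuffled[:half])     # receive the program now
waitlist_group = sorted(shuffled[half:])  # receive it six months later

# Everyone eventually gets the program; comparing outcomes for the two
# groups over the first six months estimates its effect.
print(f"{len(early_group)} employees start now, "
      f"{len(waitlist_group)} join in six months.")
```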
Fundamentally, my advocacy for randomised trials embodies a more modest approach. Unlike what the braggarts and blowhards would have you believe, there’s a lot that we don’t know about the world. Whether it’s raising employee satisfaction in your company or reducing inequality in our society, achieving significant goals presents substantial challenges.
So rather than going off hunch and supposition, why not try experimenting more often?
If you’re sending out an all-staff message, why not test two versions, and see which is more likely to be read?
If you’re running a health program for the first time, why not use a waitlist design to run an experiment on its effectiveness? Provide the program to one randomly selected group of employees for the first six months, and then to everyone else.
If you’re providing new AI tools, why not roll them out sequentially, and test their impact on productivity?
If you’re considering a different interviewing strategy, wouldn’t it be better to run it across part of your company first, to see whether it’s more or less effective?
We all make mistakes. The question is: are you learning from yours, or repeating them?
For more productive businesses, and better government, try a randomised trial. The results might just surprise you.
References
Bandiera, O., Barankay, I. and Rasul, I., 2005. Social preferences and the response to incentives: Evidence from personnel data. Quarterly Journal of Economics, 120(3), pp.917-962.
Bandiera, O., Barankay, I. and Rasul, I., 2007. Incentives for managers and inequality among workers: Evidence from a firm-level experiment. Quarterly Journal of Economics, 122(2), pp.729-773.
Bloom, N., Liang, J., Roberts, J. and Ying, Z.J., 2015. Does working from home work? Evidence from a Chinese experiment. Quarterly Journal of Economics, 130(1), pp.165-218.
Dell’Acqua, F., McFowland, E., Mollick, E.R., Lifshitz-Assaf, H., Kellogg, K., Rajendran, S., Krayer, L., Candelon, F. and Lakhani, K.R., 2023. Navigating the jagged technological frontier: Field experimental evidence of the effects of AI on knowledge worker productivity and quality. Harvard Business School Technology and Operations Management Unit Working Paper No. 24-013.
Greenberg, J., 1988. Equity and workplace status: A field experiment. Journal of Applied Psychology, 73(4), pp.606-613.
Leigh, A., 2018. Randomistas: How Radical Researchers Changed Our World. Black Inc, Melbourne.
Leigh, A., 2023. Evaluating policy impact: working out what works. Australian Economic Review, 56(4), pp.431-441.
Milligan-Saville, J.S., Tan, L., Gayed, A., Barnes, C., Madan, I., Dobson, M., Bryant, R.A., Christensen, H., Mykletun, A. and Harvey, S.B., 2017. Workplace mental health training for managers and its effect on sick leave in employees: a cluster randomised controlled trial. Lancet Psychiatry, 4(11), pp.850-858.
Morgan, P.J., Collins, C.E., Plotnikoff, R.C., Cook, A.T., Berthon, B., Mitchell, S. and Callister, R., 2011. Efficacy of a workplace-based weight loss program for overweight male shift workers: the Workplace POWER (Preventing Obesity Without Eating like a Rabbit) randomized controlled trial. Preventive Medicine, 52(5), pp.317-325.