Solving the speed vs. quality experimentation dilemma and growing the New York Times - an interview with Shane Murray

What were your initial career interests? I see you studied mathematics and finance.  

For me there was a mix of passion and pragmatism. When I left high school, I was not sure what I wanted to do. Mathematics was an interest area for me, and finance was a pragmatic way to use my skills. The double degree in mathematics and finance at the University of New South Wales in Australia, where I grew up, was useful to learn how to apply mathematics in the real world. There are elements of the degree that I use today. Applied probability theory and decision making under uncertainty, for example, are subjects that I’ve leaned on throughout my career in analytics.

Why did you decide to go into analytics?

I went through a handful of interview rounds with hedge funds, which were competitive. Finally, I landed a job with a research firm in Sydney, Australia. At this firm, I applied experiment design and discrete choice modeling frameworks, using survey-based data to understand the attributes that go into a decision and then simulate how populations make decisions. Then I landed a job at Memetrics, a software startup, where I ran large-scale digital experiments for customers across the globe and contributed new algorithms to automatically analyze these experiments in our software.

You write, among other subjects, about experimentation. What interests you about experimentation?

I’ve always had an interest in behavioral science and seeking to understand the rational and irrational drivers behind how humans make decisions. That is essentially the job of analysts and data scientists in most companies today.

Experiments allow you to simplify the complexity of how humans make decisions into probabilistic models, then make an informed and occasionally transformative business decision.

What is experimentation for you?

The range of techniques I have used starts with discrete experiments, extends into adaptive techniques like multi-armed bandits, and includes natural or quasi-experiments where you lack the ability to randomly assign treatments. But discrete A/B and multivariate experiments tend to be the bread and butter of most analytics teams.
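
To make the contrast concrete, here is a minimal sketch of the adaptive idea in Python, with made-up variant names and tallies: an epsilon-greedy bandit keeps exploring occasionally but otherwise routes traffic to whichever variant currently looks best, rather than holding a fixed split for the whole test.

```python
import random

# A minimal epsilon-greedy bandit: unlike a fixed 50/50 A/B split, traffic
# gradually shifts toward the variant with the higher observed conversion rate.
def epsilon_greedy(variant_stats, epsilon=0.1):
    """Explore at random with probability epsilon; otherwise exploit the best variant."""
    if random.random() < epsilon:
        return random.choice(list(variant_stats))
    return max(
        variant_stats,
        key=lambda v: variant_stats[v]["conversions"] / max(variant_stats[v]["trials"], 1),
    )

# Illustrative running tallies for two hypothetical variants.
stats = {
    "control": {"trials": 1000, "conversions": 50},
    "treatment": {"trials": 1000, "conversions": 65},
}

chosen = epsilon_greedy(stats)
stats[chosen]["trials"] += 1  # record the allocation; add to "conversions" when the outcome arrives
print(chosen)
```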

Why is it so challenging for so many organizations to implement experiments?

In my experience, the best experimentation is multidisciplinary.

An experiment starts with a hypothesis, which leans on the expertise of different disciplines: editors, product designers, engineers, marketers.

The work led by analytics teams, collaborating across disciplines, includes: the hypothesis definition, the translation of that hypothesis into an experiment design, the definition of outcomes to measure, segments to explore, and the required sample size and duration.
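
As a small illustration of the sample size step, a rough two-proportion power calculation can be sketched as follows; the baseline rate, lift, and thresholds are illustrative assumptions, not figures from The Times.

```python
from statistics import NormalDist

def sample_size_per_arm(p_control, p_treatment, alpha=0.05, power=0.8):
    """Approximate users needed per arm to detect a lift between two conversion rates."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = z.inv_cdf(power)          # desired statistical power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = abs(p_treatment - p_control)
    return int(((z_alpha + z_power) ** 2 * variance) / effect ** 2) + 1

# Illustrative example: detecting a lift from a 4.0% to a 4.5% conversion rate
# requires roughly 25,000-26,000 users in each arm at alpha=0.05 and 80% power.
print(sample_size_per_arm(0.04, 0.045))
```

The duration then falls out of dividing that required sample by the expected daily traffic to the experience.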

Building and implementing experiments usually falls on design and engineering teams, before an analyst or data scientist delivers statistical models or tests, and interprets the results to inform a decision.

Most of the experiments I have run have been based on digital experiences. At The New York Times, this could include the paywall, redesigning the homepage, content recommendations, product marketing and pricing. Prior to The Times, I consulted with organizations like American Express to run similar experiments on marketing landing pages, emails, paid search, social, and even direct mail inserts.

Speed and quality seem to be the eternal friction point. Marketing and product often want the former, while data teams struggle to argue for more thoroughness and time in the name of quality. Why is that?

Without the right foundation in place, you often observe an inverse relationship between speed and quality. Speed is imperative in digital product development for a number of reasons, from building momentum in a team and finding wins that show you’re on the right path, to beating your competition.

But in experimentation, speed can often amount to shortcuts, such as not defining hypotheses or not collecting a sufficient sample. These shortcuts can ultimately be detrimental to the quality of the decision.

So, invariably, you find this dynamic where product or marketing stakeholders push for speed and analysts are the bastions of quality.

What can be done to solve this conflict of speed vs. quality?

The onus is on data teams to build the capabilities and processes to scale experiments with quality built in. For a start, education about methodology and statistics is important. Rolling out lightweight processes for experiment design review, so you can critique and improve designs before launch, also helps.

Often, in the name of speed, experiments are set up and deployed without a clear understanding of what success looks like and how a “winner” will be decided. Analysts then wind up providing analysis of multiple outcomes and scenarios. This slows down the process of experimentation and puts the onus on the analyst, after the experiment, to come up with compelling insights that lead to a decision.

If, instead, you can align on clear success criteria in the planning process, that helps. For more complex decisions, you should also have alignment on the trade-off between different outcomes. At The New York Times, that could be a trade-off between subscriptions and reader engagement. Aligning on such trade-offs will make the decision process smoother.
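
One lightweight way to encode that alignment is to agree on the decision rule itself before launch. The sketch below is purely illustrative; the metric names and thresholds are assumptions, not The Times’ actual criteria.

```python
# A pre-registered decision rule agreed in planning, so the post-experiment
# readout maps directly to a recommendation. Thresholds here are hypothetical.
def launch_decision(subscription_lift_pct, engagement_delta_pct,
                    min_lift=2.0, max_engagement_drop=-1.0):
    if subscription_lift_pct < min_lift:
        return "do not launch"          # primary success criterion not met
    if engagement_delta_pct < max_engagement_drop:
        return "review trade-off"       # win on subscriptions, but engagement cost exceeds the agreed tolerance
    return "launch"

print(launch_decision(subscription_lift_pct=2.4, engagement_delta_pct=-0.5))  # -> "launch"
```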

How could that look organizationally?

It usually makes sense to start with a central, cross-functional team that owns the process and technology. Then, when you decentralize in order to scale experimentation across an organization, these are often the people who can advocate for best practices in their respective functions.

Are there a few methods, mindsets or habits that could create a more positive culture of collaboration?

Mindset: Respecting expertise and different perspectives, across product, editorial, marketing, design, engineering and data. 

Habits: Focus on hypotheses over “ideas”. It should be more about proving well-defined hypotheses than testing your own ideas.

Experimentation projects often stall or take significantly longer than expected. Why do people often get caught off-guard by the complexity?

Technical implementation.

Implementation is often more challenging than people realize, because when deploying an experiment across different systems, there can be a bunch of unknowns about how those systems will work together, especially given you are introducing new code into them.

Additionally, there is often a disconnect between analytics teams and technical counterparts about what you need for a successful experiment, such as the specifics of random allocation. 

The choice to build in-house vs. buy/subscribe with a vendor is often a point of contention in analytics teams. What principles do you suggest?

In my opinion, the bar for deciding to build should be fairly high. If you are running simple experiments on marketing landing pages, then there are many solutions to fill that need. However, if you are running more complex product experiments that must integrate with various backend and frontend systems, or need to measure a wide variety of user behaviors and outcomes, then the decision is less clear.

For example, at The New York Times, we elected to build a solution in 2014. We needed an allocation mechanism that could be customized and sit within our tech stack to reduce page-load latency, and we often ran experiments where we wanted to understand the impact across a wide array of reading habits, subscription outcomes and advertising revenue, all of which were available in our data warehouse.

So for me, there are two main sources of complexity that you should consider: 

Measurement: How complex are the user behaviors and outcomes that you need to track across experiments? The more complex, the more it makes sense for the solution to be natively integrated with your data warehouse. 

Allocation: How much will you need to tailor the way users are allocated to experiments, both in terms of how a user is identified, and how integrated these frameworks are with your frontend and backend systems?  
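
For a sense of what a customized allocation mechanism involves, here is a generic sketch of one common approach (not necessarily what The Times built): hash a stable user identifier together with the experiment name, so assignment is deterministic, sticky across sessions, and independent across experiments. The identifiers are hypothetical.

```python
import hashlib

def assign_variant(user_id, experiment_name, variants=("control", "treatment")):
    """Deterministically bucket a user into a variant for a given experiment."""
    digest = hashlib.sha256(f"{experiment_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100              # map the hash onto 100 buckets
    split = 100 // len(variants)                # even split across variants
    return variants[min(bucket // split, len(variants) - 1)]

# The same user always lands in the same variant for the same experiment.
print(assign_variant("reader-123", "homepage-redesign"))
```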

These days, I’m seeing more vendor tools emerge that help data teams to scale product experimentation with warehouse-native solutions, so that you avoid creating data silos.

Are compromises necessary? If yes, what are a couple of compromises that could be acceptable?

In general, data teams should look for ways to systematically increase experimentation speed and autonomy. 

Automated reporting that allows for decentralized autonomy:

Data teams can automate analysis and reporting that covers the most common metrics, such that teams have immediate access and can make faster decisions. Provided these decentralized teams are not seeing a negative impact on a critical counter-metric, they can make decisions that maximize their goal.  
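
A sketch of what such an automated readout might compute, using pandas and made-up column names: per-variant results for the most common metrics, plus a flag when a critical counter-metric degrades beyond an agreed tolerance.

```python
import pandas as pd

# Illustrative experiment events; in practice these would come from the warehouse.
events = pd.DataFrame({
    "variant": ["control", "control", "treatment", "treatment"],
    "subscribed": [0, 1, 1, 1],
    "engagement_minutes": [12.0, 30.0, 9.0, 25.0],
})

report = events.groupby("variant").agg(
    users=("subscribed", "size"),
    subscription_rate=("subscribed", "mean"),
    avg_engagement=("engagement_minutes", "mean"),
)

# Guardrail: flag the treatment if engagement drops more than 5% versus control.
drop = 1 - report.loc["treatment", "avg_engagement"] / report.loc["control", "avg_engagement"]
report["guardrail_breached"] = [False, drop > 0.05]
print(report)
```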

Forecasting future value based on experiments:

Often, data teams are attempting to optimize lifetime value, but must either run experiments for many months or use near-term indicators of lifetime value.

The Dropbox data team builds user-level expected revenue metrics that take weeks rather than months to estimate. They use the behavior they observe during and immediately after the experiment to project the expected revenue of a user over the next few years, allowing them to make a recommendation on the experiment that approximates the long-term impact on the business.
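
This is not Dropbox’s actual model, but the general idea can be sketched with a simple fit on historical users: map an early behavioral signal to the revenue those users eventually generated, then apply that mapping to the experiment arms instead of waiting for realized lifetime value. All numbers below are made up.

```python
import numpy as np

# Historical cohort: early signal (active days in the first two weeks) versus
# the revenue eventually realized over a multi-year horizon. Numbers are made up.
hist_active_days = np.array([1, 3, 5, 8, 10, 14])
hist_realized_revenue = np.array([0, 10, 25, 60, 80, 120])

slope, intercept = np.polyfit(hist_active_days, hist_realized_revenue, deg=1)

def expected_revenue(early_active_days):
    """Project long-horizon revenue from the early behavior observed during the experiment."""
    return slope * early_active_days + intercept

# Compare arms on projected revenue weeks after launch, rather than waiting months.
control = np.array([2, 4, 4, 6])
treatment = np.array([3, 5, 6, 7])
print(expected_revenue(control).mean(), expected_revenue(treatment).mean())
```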

Evaluating return on experimentation and course-correcting where speed is not having the desired impact: you can do a meta-analysis of your experiments to understand where you are getting a return and where your practices might need to be improved.

What surprised you about being a data leader at the New York Times? 

Despite being a 170-year-old company steeped in tradition, The New York Times is undergoing continual transformation. I arrived in 2013 and spent nearly a decade there, during which we shifted from print to digital, desktop to mobile, and advertising to subscription-first. It was a pivotal time to be building out a data team that played a central role in this transformation.

How was it being a data leader at the New York Times while Donald Trump was President of the USA?

While we were measuring the impact of each news cycle, including those related to the Trump administration, on the subscription business, it was critical as a data team to understand how we could drive sustainable subscription growth beyond any news cycle.  

The role of the data team, in building the business from less than a million to more than 8 million digital subscribers, included modeling the reader behaviors that lead to subscription, experimenting with different ways to balance the free and paid experiences, and informing decisions on where and how to participate in news experiences on large platforms owned by Google, Facebook, Apple and others.  

How was it being an immigrant working at such a New York institution during such a critical moment in American history?

It was thrilling. I always had an interest in journalism and have a brother who was a radio journalist in Australia. There is simply no better place than The Times to observe and be part of journalism in America. 

The mission of The Times, to seek truth and help people understand the world, is important and meaningful to me. And a culture that rewards integrity and curiosity was incredibly well aligned with how I wanted to run a data team. 

What would surprise other people inside and outside the USA?

The New York Times has become a tech company, albeit one with a journalistic core, which makes it unique and impressive. When I left, we had built up a data team with more than 150 people, including a data platform team, analysts providing insights and running experiments across the business and newsroom, and applied machine learning teams that were having a direct and significant impact on the business. 

Does it still have a print version?

Yes, and I still get the print edition delivered. The printing plant in College Point, Queens, is a great place to see print robotics on display.

You are now an executive at Monte Carlo. Why are you so passionate about data reliability? 

Throughout my career in data, I felt like we had one hand tied behind our back because of data quality and reliability issues.

You are providing data to the business, in the form of dashboards, insights and machine learning, that is increasingly relied upon to make decisions and drive operations. Ultimately, data teams lose credibility if decisions are being made on unreliable data, and more often today this is tied to a direct loss of revenue. My team was writing manual tests to check the quality of the data, but those methods are tricky to scale.
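
For context, those manual tests tend to look something like the sketch below, with hypothetical table and column names; every table and every rule needs its own version of this, which is why the approach is tricky to scale.

```python
import pandas as pd

# A stand-in for a warehouse table; in practice this would be queried directly.
subscriptions = pd.DataFrame({
    "user_id": [1, 2, 3, None],
    "updated_at": pd.to_datetime(["2022-06-01", "2022-06-01", "2022-05-30", "2022-06-01"]),
})

def check_subscriptions(df, today=pd.Timestamp("2022-06-02")):
    """Hand-written checks: nulls, freshness, and volume for one table."""
    issues = []
    if df["user_id"].isna().any():
        issues.append("null user_id values")
    if (today - df["updated_at"].max()).days > 1:
        issues.append("table looks stale")
    if len(df) == 0:
        issues.append("table is empty")
    return issues

print(check_subscriptions(subscriptions))  # -> ['null user_id values']
```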

Once I was a customer of Monte Carlo, I saw the shift from manual to automated monitoring, and was able to see the upstream issues and downstream impact that allow you to effectively troubleshoot data incidents. Ultimately, this leads to a real impact on reliability and trust in data.

My role at Monte Carlo gives me the opportunity to partner with our customers on the operational changes and transformation that come with Monte Carlo, to build trust in their data and drive success in their initiatives.

What have been a few secrets to your success?

Three things have helped me:

  1. Building expertise around a specific problem space, which for me was experimentation: going deep on methodology and research, and being an active participant in communities driving how experimentation is done.
  2. Curiosity about different disciplines as I have grown into more senior roles. I relied as much on partners in design and tech as I did on peers in data.
  3. Having managed large organizations, I learned that management is frequently about caring about people’s careers, problems and frustrations. Occasionally you might impart some wisdom, but mostly you are creating the conditions for people to succeed.

Who is Shane Murray?

Shane is the field chief technology officer at Monte Carlo, partnering with data leaders on their data strategy and operations to realize the maximum value from their data observability and data quality initiatives. Prior to Monte Carlo, Shane was the senior vice president of data & insights at The New York Times, leading 150+ employees across data science, analytics, governance and data platforms.

Other articles from the Contextualizing the World with Data series:

Part 1, Advertising: Why creatives in advertising should embrace data science and data mining – an interview with Les Guessing.

Part 2, Customer Relationship Management: Nurturing the customer relationship with data – an interview with Sarah Carr.

Part 3, Public Relations: On the communication front with the Ukrainian PR army – an interview with Liuka Lobarieva. 

