Proteomics at the speed of Drug Discovery: Accelerating Programs to Clinic

Jun 27, 2025 9:29:43 AM

Harris Bell-Temin at Bruker eXceed meeting during ASMS Conference 2025

Transcript

Lightly cleaned from the original auto-generated transcript. The talk above is the source of record.

Host. It is my absolute pleasure today to welcome on stage Doctor Harris Bell-Temin, Director of Proteomics, Johnson & Johnson, Boston, Massachusetts. His talk is titled Proteomics at the Speed of Drug Discovery - Accelerating Programs to the Clinic. Ladies and gentlemen, Doctor Harris Bell-Temin.

Harris Bell-Temin. Hey, thanks everyone. I'm going to start with the most important slide - acknowledgements need to come first, they need not go last. This is my team within Johnson & Johnson Discovery where we're using proteomics to enable programs and drive insights across the portfolio. You can see that we work across multiple areas - targeted proteomics, functional proteomics, chemo-proteomics - and I'll just stop there.

Shameless plug: we have open right now a postdoctoral scholar position for an exceptional synthetic chemist interested in tools for chemical proteomics. This is a fifty-fifty postdoc between my lab and the lab of Dave MacMillan at Princeton. If you are interested, or you know someone who might be, please forward on that opportunity.

But I know what you came here to talk about today, and that is obviously middle-distance running. We're going to talk about the four-minute mile.

The four-minute mile was considered for the longest time to be a physiological impossibility - until Roger Bannister broke it in May 1954. The idea that the four-minute mile was an impossibility was shattered. And what happened? It was broken again, just a few weeks later, by John Landy, an Aussie. So you think, OK, two of them could be outliers. By the end of the Sydney Olympics, ten people had broken the four-minute mile.

What I posit to you is that there was no grand technological change that had occurred. What changed was the notion of what was possible. The four-minute mile was this idea that was hanging in everyone's heads as impossible. And then it was broken once, and the floodgates opened.

So what I'm here to do today is talk about how we use proteomics at Johnson & Johnson, but also to challenge you - to challenge how you think about applying, using, and owning proteomics.

What I'm going to talk about today focuses on proteomics from an end-to-end viewpoint. Which means not looking at proteomics only as the mass spec. The mass specs are extraordinary. The things you can do with the timsTOF are amazing. I never would have dreamed of being able to run ninety-six samples a day and seeing nine, ten, eleven thousand proteins in experiments. But now it's easy, and it is accessible to many, many different people.

And so - it is also not sufficient. Having a fast, insightful mass spec is not enough to drive impact in pharma. You have to meet that, otherwise you're just going to trade bottleneck for bottleneck.

What we do at Johnson & Johnson and the Discovery Proteomics organization is marry that exceptional technology in order to drive insight. What type of insights do we want and what are we looking for?

The first one is: it needs to be comprehensive. This is our first guiding principle. This used to be really challenging and take a lot of time, and now it just doesn't. Eight to ten thousand proteins quantified in single cell lines - that is standard now. Everyone in this room has that capability. Ten thousand to fourteen thousand proteins quantified in tissue. Two thousand to four thousand proteins quantified in biofluids. All readily accessible. There's really no secret sauce there.

The second part I think is actually more important: it needs to be rapid. And this is not speed for speed's sake. We constantly talk about, well, we can push this many samples, we see this number of proteins - well, does that matter? What stage is that program? What governance are they going to? What is the sticky wicket that they are not able to knock down? What answer do they need? We need to be able to return data in time that is useful. And I can tell you that in pharma it is an incredibly rapid, iterative process of improving ADME properties, of selectivity, of target engagement. We need to get answers fast enough to make a difference for programs.

The data should be available. Meaning: if the only people who can access this data are us - if we hide it behind, say, mystique - your data won't be able to have nearly as much impact. So big data should be available. And it should also not be seen just by looking under the lamppost. Meaning: there is extraordinary value in every piece of data you have ever collected, to answer questions you have not thought of yet. The only way to handle that is to make sure that you have systems in place that are able to extract that meaning and extract that data.

And then finally this needs to be translatable. We live in the discovery portion, not the development portion, of Johnson & Johnson. However, we need our data - we need those immortalized cell lines to be as translatable as possible to either model organism, etc. We also need to do the work in the model organisms. We need to do the work in the patient samples to ensure that we are maintaining that really critical translatability.

What does it look like when it's all put together? What tools are we bringing to bear?

We have automated sample processing that allows us to really own our data. We're kind of a unique group for proteomics because we generate almost all of our samples - about ninety to ninety-five percent of what we run, of the immortalized lines, of the organoids, we're building them. That allows us to work in a very end-to-end fashion, because we're never really waiting on other schedules. We can have really bespoke handling of samples.

All of this feeds into ultrafast mass spectrometry. We have a fleet of PASEF-based timsTOFs - Pro 2s, HTs, and an Ultra 2. It makes a lot of data. Our team this year is going to make somewhere between three and four petabytes of raw data.

So we lean very, very heavily on cloud compute, cloud storage, as well as - and I think this is one of the more critical parts that gets ignored - automated movement. Moving sample information, sequence information to mass specs; moving data off of mass specs; moving into visualization, without having to wait for a scientist to push a button. And then finally the dynamic interactive visualization suites on the back end to make these massive datasets digestible and to really own the impact.

I'll go through each of these areas one by one. The first one, automated sample processing: it needs to be parallelized, it needs to be scalable, and most critically, it needs to be reproducible across everyone's hands.

We have what we call automation to suit. We are not trying to force everything we do into a full-deck solution so we can say we have a full-deck solution, where you put cells in the front end and you get meaning on the back end. This isn't like Shel Silverstein's homework machine. What we want is to have the right automation to answer the question and to enable our scientists to work faster.

So we have full-deck automation where it's feasible, such as an AssayMAP. We use workstations for dispensing and plate-stamping to allow scientists to handle sixteen plates simultaneously with a fair bit of ease. And we also use volumetric pipetting, which is really helpful. We have a Lynx. What makes the Lynx unique is that it can deliver ninety-six discrete volumes simultaneously. So think about normalizing across lots and lots of Evosep tips, and being able to do eight plates in about fifteen minutes. These are the types of steps that can really, really help.
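The normalization step mentioned above can be illustrated with a minimal sketch. It assumes each well's protein concentration is already known from a protein assay and that the goal is to load the same protein mass from every well. The function name, the target mass, and the pipettable volume limits are hypothetical and are not taken from the talk.

```python
def normalization_volumes(conc_ug_per_ul, target_ug, min_ul=0.5, max_ul=50.0):
    """Return the per-well volume (in uL) that delivers target_ug of protein.

    Wells whose required volume falls outside the pipettable range (or whose
    concentration is zero or negative) are returned as None so they can be
    flagged rather than silently mis-loaded.
    """
    volumes = []
    for conc in conc_ug_per_ul:
        if conc <= 0:
            volumes.append(None)
            continue
        vol = target_ug / conc
        volumes.append(round(vol, 2) if min_ul <= vol <= max_ul else None)
    return volumes


# Hypothetical concentrations for four wells of a plate (ug/uL).
plate = [2.0, 1.0, 0.5, 0.0]
print(normalization_volumes(plate, target_ug=10.0))  # [5.0, 10.0, 20.0, None]
```

A list of ninety-six such volumes is the kind of input a pipettor that dispenses a different volume per channel could consume in a single pass.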

We also use convergent consumables. That means all of our digests eventually flow - whether they started as just a differential expression analysis, chemical proteomics, etc. - into PreOmics kits. So we have a unified path to the mass spec. Everything flows into PreOmics, because we have found their consumables to be robust, to be reproducible, and that has allowed us to really drive a lot of impact.

And then finally we have SOPs. People bristle at SOPs, where they think, oh, I have SOPs because I have protocols - and an SOP and a protocol are very, very different things if you've worked in GxP. We are not a GxP lab, but we've taken this from the GxP model in order to ensure that we can get the same data out of multiple sets of hands. We have now seventeen, soon to be eighteen members on the team, so we have to make sure that things are reproducible across all of those folks.

And then we have the high-throughput mass spec. As I said, we use Pro 2s, we use HTs, etc. I hear there's fun things coming in columns from Bruker this year, but as of right now we are using IonOpticks Auroras, 8 by 75.

We run very tight tolerance QCs and we run across multiple instruments at the exact same time. So we will put the exact same program, exact same dataset across four timsTOF HTs. That way we get data back much, much faster. We're also just generally running faster - when we stepped up to ninety-six, what we found was that we could increase our speed by 2.2 times and lose two percent of our proteins. I'll take that trade every day and twice on Sunday, because it allows us to impact so much more.

You can see that we're dropping from 9,300 proteins down to 9,100. The number of peptides we're seeing goes from 173,000 to 151,000 in HEK 293s. We still have substantial coverage protein to protein. The quantification is still beautiful. It allows an instrument with our average uptime of 85% to push 30,000 samples a year. That's really the engine for this.
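The figures quoted above can be checked with a few lines of arithmetic. This is a sketch using only the numbers from the talk, plus one assumption: that the instrument is scheduled every day of the year.

```python
SAMPLES_PER_DAY = 96   # method throughput quoted in the talk
UPTIME = 0.85          # average instrument uptime quoted in the talk
DAYS_PER_YEAR = 365    # assumption: the instrument is scheduled every day

samples_per_year = SAMPLES_PER_DAY * DAYS_PER_YEAR * UPTIME

proteins_before, proteins_after = 9_300, 9_100
peptides_before, peptides_after = 173_000, 151_000

# Fractional loss in identifications when moving to the faster method.
protein_loss = 1 - proteins_after / proteins_before
peptide_loss = 1 - peptides_after / peptides_before

print(f"samples per instrument-year: {samples_per_year:,.0f}")
print(f"protein loss: {protein_loss:.1%}")
print(f"peptide loss: {peptide_loss:.1%}")
```

Under these assumptions the result is just under 30,000 samples per instrument-year, with a protein loss of roughly two percent and a peptide loss of roughly thirteen percent.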

However, it's a lot of data. It's an enormous amount of data - so how do we handle it? All of our data goes to AWS's Simple Storage Service, S3. It really seamlessly stores petabytes of data. It allows for instantaneous storage over to Iceberg, which allows us to control costs.

And then I'll skip the middle one for a second and talk about our cloud compute. We use a scalable Kubernetes cluster that assigns logically based off of how large the experiment is. Our rule set is that if we have, say, one thousand samples, we want the raw data matrix to be able to be generated in about twelve hours. So we deploy the right amount of cores in order to do that. We have a large enough cloud that we can drive many simultaneous programs such as this.
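The sizing rule described above (a raw data matrix for about one thousand samples in about twelve hours) can be sketched as a simple calculation. The per-sample compute cost below is a hypothetical placeholder, not a figure from the talk, and the sketch assumes the search scales perfectly across cores, which real workloads rarely do.

```python
import math


def cores_needed(n_samples, core_hours_per_sample, deadline_hours=12.0):
    """Estimate the cores required to finish all searches within the deadline.

    Assumes perfect parallel scaling: total work divided evenly across cores.
    """
    total_core_hours = n_samples * core_hours_per_sample
    return math.ceil(total_core_hours / deadline_hours)


# Hypothetical cost of 6 core-hours per sample for a 1,000-sample experiment.
print(cores_needed(1000, core_hours_per_sample=6.0))  # 500
```

A cluster scheduler could use a number like this as the resource request for the experiment, scaling it up or down with the sample count.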

The key to all of this, though, is the automated data movement. How many people here, by a show of hands, have had an experiment finish on a Friday night and it's a long weekend and they didn't hit that button to start a search until Tuesday? That's happened to all of us, right? Everybody at some stage of their career - that happened. Wouldn't it be great if the search knew to start itself?

It's not hard to program. You can absolutely make that happen. So we have automated sequence generation, meaning we read out from our dosing, we read out from our PHERAstar - that's our protein assay - and so our sequences are automatically generated. Our runs occur, and as each sample is acquired, it is uploaded to cloud individually, searched individually, and then the wrap-up experiment search occurs automatically at the back end. It decreased human intervention into this process by 90% and saved us 30% overall time in turn. That's not just in data turn. Automated data movement is critical.
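One way to picture a search that "knows to start itself" is an event-driven tracker: each finished run triggers its own search, and the experiment-level search fires once every expected sample has been handled. This is a minimal sketch, not the speaker's implementation. The class name and both search hooks are hypothetical stand-ins for whatever upload and search services are actually in use.

```python
class ExperimentTracker:
    """Trigger a per-sample search on arrival and one wrap-up search at the end."""

    def __init__(self, expected_samples, search_sample, search_experiment):
        self.expected = set(expected_samples)
        self.done = set()
        self.search_sample = search_sample          # hypothetical per-sample hook
        self.search_experiment = search_experiment  # hypothetical wrap-up hook
        self.wrapped_up = False

    def on_run_finished(self, sample_id):
        """Call when an acquisition finishes, for example from a file watcher."""
        if sample_id not in self.expected or sample_id in self.done:
            return  # ignore unknown or duplicate notifications
        self.search_sample(sample_id)
        self.done.add(sample_id)
        if self.done == self.expected and not self.wrapped_up:
            self.wrapped_up = True
            self.search_experiment(sorted(self.done))


log = []
tracker = ExperimentTracker(
    expected_samples=["A1", "A2", "A3"],
    search_sample=lambda s: log.append(f"search {s}"),
    search_experiment=lambda ids: log.append(f"wrap-up {','.join(ids)}"),
)
for sample in ["A1", "A2", "A2", "A3"]:  # the duplicate A2 is ignored
    tracker.on_run_finished(sample)
print(log)  # ['search A1', 'search A2', 'search A3', 'wrap-up A1,A2,A3']
```

Because nothing waits on a person, a run that finishes on a Friday night is searched on Friday night.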

And finally, the data has to be digestible when it's this big. So we lean on Mass Dynamics - who is here outside today - as an online, very accessible platform for ourselves and our partners to access our data.

Raise your hand if you use PowerPoints. Come on, raise your hands. Awesome. We don't anymore. So we do not deliver data in PowerPoints anymore. All data that we deliver is in an online portal for that program team so they can dynamically interact with the data. It allows for experimental discussions to occur so much more seamlessly.

I'll give an example just from last week, where someone was looking at some things that were occurring in a volcano plot and they said, well, isn't that just - couldn't those all be just from protein family X? And we were able to just really, really quickly pull up a mask and show all the members of protein family X to show that they were evenly distributed, and that no, it was not being driven by that. These are the types of things that you can do when your data isn't static.
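The kind of check described above, asking whether one protein family is driving the hits in a volcano plot, can be sketched by comparing the family's share of all proteins with its share of the significant ones. The data, cutoffs, and names below are toy values for illustration only. This is a descriptive comparison, not a statistical test.

```python
def family_share(proteins, family, lfc_cutoff=1.0, p_cutoff=0.05):
    """Compare a family's share of significant hits with its share overall.

    proteins: list of (name, log2_fold_change, p_value) tuples.
    family:   set of protein names in the family of interest.
    Returns (share_of_all_proteins, share_of_significant_hits).
    """
    hits = [n for n, lfc, p in proteins if abs(lfc) >= lfc_cutoff and p <= p_cutoff]
    share_all = sum(n in family for n, _, _ in proteins) / len(proteins)
    share_hits = sum(n in family for n in hits) / len(hits) if hits else 0.0
    return share_all, share_hits


# Toy volcano-plot data: (protein, log2 fold change, p-value).
data = [
    ("P1", 2.1, 0.001), ("P2", -1.5, 0.010), ("P3", 0.2, 0.600),
    ("P4", 0.1, 0.900), ("P5", 1.8, 0.020), ("P6", -0.3, 0.400),
    ("P7", 1.2, 0.030), ("P8", 0.0, 0.950),
]
family_x = {"P1", "P3", "P4", "P8"}
print(family_share(data, family_x))  # (0.5, 0.25)
```

In this toy case the family makes up half of all proteins but only a quarter of the hits, so it is not what is driving the significant region.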

So let's talk about an experiment. Day one and two, sample treatment. Usually we'll do about - actually this is outdated - we're now doing ninety-six twelve-point dose curves at a given time. So we can vary the time course, we can vary the cell background. That's just our basis. We do everything in dose curve, because ultimately at the end of the day we're pharmacologists and all the other pharmacological readouts are also all in dose curve. There's just a far greater information content in the dose curve than there is in just high-low.

We use the parallel processing, etc. The mass spec will take somewhere between two and three days for the ninety-six dose curves, because we're running across four timsTOF HTs. And we have the confidence that we're fully QC'd - we're running a really tight QC: not only QC injections, but we are QC'ing every single sample. Well, actually I don't do it - our AI agent does it. Our AI agent is telling us whether or not something is going wrong with that sample.
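The acquisition time quoted above follows from the numbers given in the talk. This sketch counts only the dose-curve injections and leaves out QC injections and re-injections, whose numbers are not stated.

```python
CURVES = 96                  # dose curves per experiment
POINTS_PER_CURVE = 12        # twelve-point dose curves
INSTRUMENTS = 4              # timsTOF HTs running in parallel
SAMPLES_PER_DAY_EACH = 96    # method throughput per instrument

injections = CURVES * POINTS_PER_CURVE
days = injections / (INSTRUMENTS * SAMPLES_PER_DAY_EACH)

print(f"{injections} injections, {days:.1f} days of acquisition")
# 1152 injections, 3.0 days of acquisition
```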

Finally, we've got the cloud-based data searching, days three to five. Single samples searched separately, AWS Kubernetes with three thousand assignable cores. And this is also including the time for when we need to drive automated re-injections for things that did not pass QC. So it has to pass this step, and then it runs back into the mass spec, re-injects where it needs to, and then puts together the data.
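The QC gate with automated re-injection can be sketched as a small retry loop. This is an illustration under stated assumptions, not the speaker's system. The `inject` and `passes_qc` callables are hypothetical, and the QC rule used in the demo (a minimum protein count) is a simple stand-in for the AI-based check mentioned in the talk.

```python
def acquire_with_reinjection(sample_ids, inject, passes_qc, max_attempts=2):
    """Inject each sample, re-injecting any that fail QC up to max_attempts.

    Returns (accepted, failed): accepted maps sample id to its passing result,
    failed lists the samples that never passed.
    """
    accepted, failed = {}, []
    for sid in sample_ids:
        for attempt in range(1, max_attempts + 1):
            result = inject(sid, attempt)
            if passes_qc(result):
                accepted[sid] = result
                break
        else:
            failed.append(sid)  # no attempt passed QC
    return accepted, failed


# Toy protein counts per (sample, attempt); values are illustrative only.
counts = {
    ("S1", 1): 9000,
    ("S2", 1): 4000, ("S2", 2): 8900,
    ("S3", 1): 3000, ("S3", 2): 3500,
}
accepted, failed = acquire_with_reinjection(
    ["S1", "S2", "S3"],
    inject=lambda sid, attempt: counts[(sid, attempt)],
    passes_qc=lambda n_proteins: n_proteins >= 8000,
)
print(accepted, failed)  # {'S1': 9000, 'S2': 8900} ['S3']
```

Only the accepted results would then flow into the experiment-level search, while persistent failures are surfaced for a person to look at.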

So finally by the end of the week we're sharing out data. What I'd like to suggest to everyone here who has the ability to do so is that where proteomics has trouble is when we have to hand off data. If you're handing off a massive matrix to someone else, or if someone else owns your analysis - Mass Dynamics and the tools that we have allow us to own our own analysis. So we are delivering the insight, we're delivering the impact, as opposed to handing it off to a data sciences team who then owns the impact, and we kind of hope our names are still at the bottom of the slide deck by the time it's all said and done.

Let's talk about a use case really quickly in my last few minutes: targeted protein degradation - one of the most fit-for-purpose uses of proteomics to help drive meaning and impact in drug discovery.

We've built a proteome atlas of three hundred plus very deeply quantified libraries of proteins across tissues, immortalized and primary cell lines. The majority of this was done in-house. Some of it was done by our friends at Biognosys. It allows on-target / off-target tissue exploration. We've got a total depth in our atlases now approaching nineteen thousand proteins, so they're very comprehensive.

For TPD this allows us to look at, say, E3 ligase avoidance or E3 ligase specificity. If you're worrying about bleeding risk, for example, then you could look to E3 ligases that are not present in platelets, and build programs around that.
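The E3 ligase avoidance query described above amounts to a filter over the atlas: keep ligases detected in the tissue of interest and not detected in the tissue to avoid. The sketch below uses a toy dictionary with placeholder ligase names and made-up abundance values. The atlas layout and function name are assumptions for illustration.

```python
def ligases_avoiding(atlas, target_tissue, avoid_tissue, min_abundance=0.0):
    """Return ligases detected in target_tissue but not in avoid_tissue.

    atlas maps ligase name -> {tissue: abundance}. A tissue missing from a
    ligase's entry is treated as "not detected".
    """
    keep = []
    for ligase, by_tissue in atlas.items():
        in_target = by_tissue.get(target_tissue, 0.0) > min_abundance
        in_avoid = by_tissue.get(avoid_tissue, 0.0) > min_abundance
        if in_target and not in_avoid:
            keep.append(ligase)
    return sorted(keep)


# Placeholder names and values, for illustration only.
atlas = {
    "LIGASE_A": {"tumor": 5.2, "platelet": 0.0},
    "LIGASE_B": {"tumor": 3.1, "platelet": 2.4},
    "LIGASE_C": {"tumor": 0.0, "platelet": 0.0},
    "LIGASE_D": {"tumor": 1.7},
}
print(ligases_avoiding(atlas, "tumor", "platelet"))  # ['LIGASE_A', 'LIGASE_D']
```

Note that "not detected" is not the same as "not present", so the depth of the atlas matters for how much weight such a filter can carry.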

We have protein half-life information on sixty different biome matrices, all developed in-house. This allows us to understand what our half-lives are - both on-target and off-target - for how we drive impact, etc.

And then finally for degradation, dose curves are very fit for purpose, and I had mentioned those earlier in the talk as well. If anyone here does TPD, I'm sure you know the most common QC is to use MZ1 or dBET1 - that's a VHL handle or a cereblon handle attached to JQ1 to target the BRD family. Here you can see just a really simple four-time-point kinetics study that we use as our pilot for new members of the team, just to get their feet wet in how we do proteomics. This was done by someone who had never done proteomics before, who joined our team. You can see that there is enough quantitative information and precision in those dose curves - despite the fact that they are three proteins out of, I think for this dataset it was in Jurkat, we're like 8,800 proteins deep - you can very clearly see that we're really leveling out in the kinetics. At four and eight hours, only one to two is shifting.

I hope that this has given you an insight into how fast mass specs are the beginning, and not the end, of the journey. If you look at mass-spec-based proteomics instead as an end-to-end platform you can run faster, you can run more, and you can deliver insights across a portfolio - and from pre-portfolio space all the way into clinic entry. I ran a little bit over so maybe I have time for one question, but thanks so very much for your time.

Q&A

Host. Thank you, Harris. We have some time for a few questions. So obviously you've scaled up your workflow from one system to multiple-system data handling. Where do you see this going in the next four to five years in accelerating it even further? What can - from a user's perspective - be the future?

Harris. Oh, it's a great question. So are you saying specifically for scaling to multiple instruments, or are you saying in terms of data handling as well as maybe shortening gradients or anything like that to make it of an even higher speed?

I think the mass spec is great because it just calls balls and strikes. It'll read what you feed it. So being able to send greater numbers of samples to the mass spec in a given time could help drive a lot of improvement there. We'll start developing other bottlenecks. Right now we are at really a natural limit of how many analyses we can drive at the back end. So it really depends on what you want to do. If you are just performing work and then other teams are handling analysis, then the sky's the limit if you're a core. But as a non-core, right now we're kind of at our natural limit, so we like where we are at ninety-six samples a day. We like the depth - the trade-offs start to become a little bit more profound at faster rates.

Audience member, Zoetis. This is amazing. I work at Zoetis - we used to be Pfizer Animal Health. A couple of years ago I came into the organization and I said, I really want to do Discovery Proteomics for animal health. And they kind of said, well, when has proteomics ever led to a decision? But go ahead, figure it out. So what would be your best piece of advice for a young proteomics team? I have a team of three. We have one timsTOF HT. We have a couple of Thermos. How do you build that?

Harris. OK, so I'll start by saying that when I was hired by Johnson & Johnson in 2021, I too had a team of three. And that's the size of the team today. We were able to do that by answering that question - by delivering data that could be decisional. We are on critical path for over a third of the portfolio at Johnson & Johnson now, and that took three years to build.

So what you have to do is not think "I need to build data" - you need to think "I need to build trust." That's really where it comes in: finding the use case where there is a seamless decision that can be made being driven by proteomics. That can be in a lot of different spaces - chemoproteomics, etc. Find the first use cases as rapid-turnaround things, things that aren't going to be highly, highly iterative and exploratory, but where there's a specific question that other technologies have not answered. A good portion of our work is in the safety space - working in translational models to our safety and de-risking - and there is a lot of meat there.

Host. Thank you, Harris. Thank our speaker one more time.
