Data leaders start their transformation
with Databricks
When people think of data-driven organizations, you hear the same names again and again: Google, Netflix, Facebook and Uber. Often overlooked are large data-driven organizations like Johnson & Johnson that have thrived for decades by using data to make business decisions. Join us to hear Pallaw Sharma from J&J talk about the importance of having a well-curated data layer for all data types — structured, semi-structured and unstructured — so you can tackle all data and AI use cases.
Speaker 1:
Welcome to Champions of Data and AI, brought to you by Databricks. In each episode, we salute champions of data and AI, the change agents who are shaking up the status quo. These mavericks are rethinking how data and AI can enhance the human experience. We'll dive into their challenges and celebrate their successes, all while getting to know these leaders a little more personally.
Chris D’Agostino:
Welcome to the Champions of Data and AI. I’m your host, Chris D’Agostino. When people think of data-driven organizations, they normally are inclined to think about companies such as Google, Netflix, and Uber. However, there are very large data-driven organizations that have thrived for several generations because of how they are using data to make informed business decisions. Today I’m joined by Pallaw Sharma, chief data science officer at Johnson and Johnson to discuss what it means to be data-driven at an iconic brand that’s been around for over 130 years. This includes the importance of having a well-curated data layer for all of their data types, structured, semi-structured and unstructured data in order to serve all the data and AI use cases throughout the company. Yeah. So let’s get started. So you’ve had an interesting career, you’ve been a marketer, you’ve done data analytics from a customer service perspective. You’ve done B2B and B2C environments, what has been the consistent experience throughout your career that has gotten you to where you are today?
Pallaw Sharma:
Yeah. That's a great question. If I reflect on my journey, one thing that has remained true is the power of data and the power of insight. What has really happened over the years is we are generating more data and more information. We are harvesting more. The tools have become significantly better, and that has driven a tremendous productivity increase and the disruption of many industries and companies, and data has become pretty much central to any process of any business. So it's exciting to see, over my last 20 years on this journey, that data is continuously becoming more critical to the business. And every business, every industry is trying to leverage the data more. Right? So that has been a continuous truth all along. And I don't see that trend weakening, ever; it is becoming stronger and stronger. We're generating more data. We are mining more. The tools are getting better. The platforms are getting better.
Chris D’Agostino:
Yeah. That's great. And then, give the audience sort of an example of what a typical day looks like for you… Obviously, we're in this remote work scenario, but under normal circumstances, what is life like for you when you go to the office?
Pallaw Sharma:
Yeah. So I lead the whole data and digital part of our supply chain at Johnson and Johnson. And it's a very large and complex supply chain. We are a very global organization. We have hundreds of sites: manufacturing, warehousing, transportation and quality processes all over the world. We have hundreds of thousands of SKUs. My typical day-to-day life starts with reviewing a lot of important programs where data sits at the core of it, right? So, for example, are we producing the right amount of the product in the right places? What is our inventory situation? How are our forecasts performing? How are our data science and AI algorithms performing? What is the business value? It all starts from there.
Pallaw Sharma:
It's all about, how do we serve our patients better? It starts from there. And then, how do we make sure that our employees and our businesses are doing well? So across dozens or hundreds of programs where data illuminates our supply chain and drives better decision making, that becomes my whole job, day in, day out: making sure that these programs are running well, that we are laying down the foundation of a data-centric culture, that we are actually lifting the capabilities across the organization, and that we are really focused on top-tier talent, right? To make sure that our businesses and our products keep on getting better by the day.
Chris D’Agostino:
Yeah. Being in the Bay Area and on the West Coast, I would say it's one of the least favorable time zones to be in when you've got global operations. I've worked with customers globally, and you're immediately eight hours behind the UK on a given day, and it just gets worse and worse as you go around the globe. How do you handle that transition? Do you find yourself in meetings at 2:00 in the morning?
Pallaw Sharma:
Not 2:00, but definitely, the day starts early most of the time. Right? I mean, we do have colleagues all over the world. We have a strong West Coast operation. We have offices in Seattle, in the Bay Area, in Southern California, but then we have a big presence on the East Coast, in Europe, in the UK. And I also have teams in Singapore and India. So pretty much all over the world, right? The day starts early, and then it goes till around, I would say, 6:00 PM Eastern time. The West Coast starts after that, I get a little break, and then our Asia Pacific colleagues start. Right? But it's fun. It's exciting to talk about the latest and greatest work that people are doing. I'm always super impressed by the ingenuity of people. These are long days, but we try to take breaks in between to make sure that it's not becoming a continuous series of many, many meetings. But yes, West Coast time is not really very conducive to global ops.
Chris D’Agostino:
So you have this global presence, and these regionally based datasets, I would imagine, right? Based on the supply chain and manufacturing that's happening in country, the data architecture and data analytics is a really important piece, but oftentimes organizations are getting started with some new data insights and they may not have a fully baked data architecture. They're doing some proofs of concept. Give us your point of view on that balance between moving quickly, maybe with less rigor, versus having the architecture completely defined and buttoned up. When does that eventually catch up to you and start causing problems?
Pallaw Sharma:
That's a great question. Right? We see our role, I mean, my role is at an enterprise level, supporting all businesses, all supply chains all over the world, and quality and sustainability and procurement. So we see our role as more of a catalyst and enabler, making sure that we empower people to do their best work wherever they are, to make their best decisions wherever they are. So I don't particularly see a conflict between moving fast and doing that in an architecturally robust manner. In fact, if you do that in an architecturally robust manner, you will move fast. Right? So what we try to do is think of it like a large operating system, where our team provides the common data layer, making sure that all the data is pulled into one place and ready to be used, making sure that all the data curation and cleansing has been taken care of, so that the data scientists and the visualization experts do not have to keep going back to clean the data or extract the data and so on, right?
Pallaw Sharma:
So we provide that platform with a full stack. And then we provide some AI products on top of it, like forecasting or natural language processing or image analytics, right? So that actually liberates people closer to the business to start using this rapidly. Right? So we do encourage people to do a lot of test and learn, a lot of PoCs, but we also give them this platform on which they can do this faster experimentation, rather than thinking about infrastructure, thinking about the data platform, thinking about all the services on a cloud platform.
Pallaw Sharma:
We take care of that, and we do make sure that it's an open platform for everyone, so they can actually move much faster while not worrying about the foundational layers. So that is the model, and that is actually driving a lot of rapid innovation across the company, because that's what one needs to do. How do you democratize, how do you lift the whole thing up? Not by locking it down into a closed ecosystem, and also not by giving no guidance and having almost chaos, which is hundreds of conflicting projects. So the truth is somewhere in between; the sweet spot is somewhere in between.
Chris D’Agostino:
So, Pallaw, I know Johnson and Johnson has been around for over 130 years, and many of our viewers may not realize that, but I'm sure you've had all manner of data ecosystems and platform architectures and things like that. When we last spoke, you talked about how Hadoop was really important in the journey, and I'd like to talk a little bit more about how you're supplementing some of the workloads that are done there with some of the cloud-based initiatives at J and J.
Pallaw Sharma:
Absolutely. And while we are a 130-year-old company, we actually want to think of ourselves as a 130-year-old startup. So innovation is in our DNA. We are a science- and engineering-driven company. Our products are really, really complex products. They are life-saving products, and they matter a lot to our customers and patients and so on. I think our journey has been continuously evolving, right? We have multi-generational technologies in our stack, and then we also acquire and divest companies and bring many more technologies from all over the place. So in that situation, as I came on board a few years ago, one thing that was clear, again, going back to the center, is business value. We centered on, how would we create more business value, more value for our patients, our doctors, customers, mothers and fathers? Better, faster, and making sure that our whole ecosystem, internal and external, is moving faster.
Pallaw Sharma:
From that perspective, there were many use cases. For example, how do we monitor, in real time, hundreds of thousands of lines all over the world and make sure that our yield is optimized? How do we predict inventory? How do we do demand sensing? How do we actually collaborate with our external partners, rapidly and close to real time, both on the customer side as well as on the supplier side? These use cases made us think about leveraging modern architecture: cloud-based architecture, machine learning and AI, API-first architecture. And we have been on that journey, because that has supplemented and really complemented our analytics stack. So we have a bouquet of technologies, but now we are really leveraging these modern platforms and they are showing great results.
Chris D’Agostino:
That's great. Can you talk a little bit, since you mentioned the different use cases: in the globally distributed organization that you run, how do you look at project investments, and how do you work to ensure that you're not unnecessarily duplicating effort, right? You might want to do a trade-off and horse-race two concepts and two implementations, and that's all great. But if you're really looking to be very efficient and you've got confidence in the teams, you want to make sure that the work's not being duplicated. How do you manage that?
Pallaw Sharma:
That's a great question. I think one has to be very cautious about it. And again, the balance is very important, because on the one hand we want to definitely encourage innovation at all levels and experimentation at all levels, and we do encourage people and try to make the decision to [inaudible 00:12:47]. And on the other hand, we have to make sure that we are all doing things in a cohesive manner, building on top of each other, so that we are not duplicating things. The way we go about these things, in some cases, is a platform model. For example, we can give very good guidance to the organization that all the data should come to the common data layer, right? And everybody should try to connect their analytics workload or their visualization workload or their BI workload to the same common data layer.
Pallaw Sharma:
People do see value in that, right? Because why would somebody want to separately extract and clean the data, which is already provided for them? Right? So once they see value, it becomes a common thing, right? So guiding the large organization with the right platform and the right enablers, and evangelizing it, goes a long way. And then on the other hand, we also have a very disciplined process of managing innovation. We try to do things initially in a small manner; instead of calling them proofs of concept, we call them test and learn, because everything is a learning.
Pallaw Sharma:
So we do small test and learns. And then, rapidly, from there we scale, right? Once we understand which technology or which platforms work better, we scale, right? So for both the points: one is not just giving people a top-down message about what can and what shouldn't be done, but actually enabling them, empowering them, and making them see the benefit of the modern technology, so that they will align better. And two, having a disciplined and streamlined innovation process. That has helped us. I mean, it's a journey, and we keep on optimizing as we go along.
Chris D’Agostino:
Yeah. That's great, Pallaw. One of the things that we're doing at Databricks is talking to our customers and prospective customers about that data layer and the need to have it as clean as possible, and the benefits of being really consistent with how you curate the data, cleanse the data and ensure its security, so that it can be done once, you're operating in a more efficient way, and you're enabling these downstream use cases. And, as you said, just being able to empower the teams to do more with the data that you already have, and that you've worked with and curated once.
Pallaw Sharma:
Exactly. And I think that can be one of the most important initiatives for any large organization to undertake. And that actually unlocks a tremendous amount of potential across the whole organization. Data scientists and data analysts and visualization experts, I mean, they like to do their work on top of clean and curated data, because that's where they can practice their craft. That's where they are the most productive. You will typically hear a lot of feedback about how much time it takes for data scientists just to get the data, just to clean the data, and not really focus on their core skill, which is data science. If we can just reduce that friction in the system, we can empower them, enable them.
Pallaw Sharma:
Making sure that they are provided the right information at the right time, with the right latency and the right granularity, goes a long way in unleashing innovation across the whole organization, and that drives a tremendous amount of business value, right? So I completely agree; that has been one of our biggest focus areas, in addition to, of course, making sure that we have the absolute latest and greatest and most sophisticated machine learning algorithms working. But all of them are dependent on this foundational layer, right? So this is a huge unlock mechanism, and we are committed to it.
Chris D’Agostino:
Yeah. This is what we're calling, at Databricks, the Lakehouse architecture: this idea of combining the benefits of an enterprise data warehouse, and all the data governance that goes along with it, with the flexibility of a data lake and the ability to have different workloads for different data types, combining these two engines, if you will, using low-cost cloud-based storage. So I'm excited to see you aligned with that. So, Pallaw, with Johnson and Johnson, I know that you all are users of Databricks, of course. You're using the Delta Lake engine for having that common data layer and the data hygiene associated with it. Can you tell us a little bit about the use of it, and why you feel it's an important part of your architecture?
Pallaw Sharma:
Sure. Yeah. I think we've seen a lot of promise with Databricks, particularly in underpinning our common data layer. If you look at our main business use cases, we have many, many use cases which require different types of data and different latencies. So, for example, our planning system might need data at a certain latency, but manufacturing might need real-time data, right? Sourcing might need documents, right? Delivery and logistics might need a real-time feed on temperature and location coming from all over the world. Right? And in our underlying system landscape, we have hundreds of systems which are actually generating this type of data. The Databricks Delta Lake architecture allows us to actually bring all of these together with the right latency, right? And at the right cost, so that we can provide a clean data stream to our analytics and visualization layer. Right?
Pallaw Sharma:
So we work on a three-tier architecture, where we have the fundamental IT platforms, then we have a common data layer, and then we have analytics and visualization on top, right? So to provide and empower analytics and visualization, the Databricks Delta Lake architecture is actually a really good step forward for us, because we are able to get seamless integration across hundreds of systems, different levels of latency, and really clean data coming out of it. That is empowering end-to-end visibility and machine learning workloads like natural language processing, image analytics, forecasting, optimization, and so on. Right? So it's an important investment for us, and we're looking forward to generating even more value out of it.
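To make the common data layer idea concrete, here is a minimal sketch of curating raw feeds into a shared Delta Lake table with PySpark. It assumes a Spark environment with Delta Lake available; the paths, table names and columns are illustrative only, not J&J's actual implementation.

```python
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder.appName("common-data-layer")
    # Delta Lake configuration; on Databricks clusters this is preconfigured.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Land raw sensor readings from one of many source systems as-is ("bronze").
raw = spark.read.json("/landing/plant_sensors/2021-06-01/")
raw.write.format("delta").mode("append").save("/data/bronze/plant_sensors")

# Curate once -- de-duplicate, fix types, standardize units -- into a "silver"
# table that every data science, BI and visualization workload reads from.
curated = (
    spark.read.format("delta").load("/data/bronze/plant_sensors")
    .dropDuplicates(["site_id", "line_id", "reading_ts"])
    .withColumn("reading_ts", F.to_timestamp("reading_ts"))
    .withColumn("temperature_c", (F.col("temperature_f") - 32) * 5.0 / 9.0)
)
curated.write.format("delta").mode("overwrite").save("/data/silver/plant_sensors")
```

Because the extraction and cleansing happen once in the shared layer, downstream forecasting, NLP or image workloads can read the same curated tables instead of re-cleaning source data themselves.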
Chris D’Agostino:
So why don't we shift into learning a little bit about advice you would give other leaders who are aspiring to be chief data science officers or chief data analytics officers: people who are looking to you, within your own organization and external to it. What types of things do you feel really defined and enabled your career to get to where you are today?
Pallaw Sharma:
Sure. Right. I can think about a few things, some of these principles, and we keep on learning as we go ahead. One of the most important things, and I've mentioned this a couple of times, is the focus on creating business value: being the real and right partner of the business and functional leaders who are actually responsible for day-to-day business performance. Enabling them, empowering them, and always thinking from the perspective of whether this project or this initiative that I'm championing, and am so passionate about, really matters to our customers. Does it really solve a key business problem? A lot of times there are things which are very sophisticated and very interesting, and I get really excited about them, but sometimes they're too far out, they're too early to go to market, or we are not going to see the right impact.
Pallaw Sharma:
So that's point number one, which is focus on business value. Point number two is think about scale, right? We can get busy working on one or two initiatives, but our job as chief data science officer or chief data analytics officer is actually to empower and enable the whole organization, to lift the whole organization. So it's not only about the work that we are directly involved in; it is more about how we empower others so that we can unleash and unlock innovation across the organization. At the end of the day, a large and complex organization will require a tremendous amount of experimentation at all levels in the organization, so scale is important.
Pallaw Sharma:
And with scaling, one has to think about the right amount of architecture, right? What's the right architecture that will get you to scale? It's a good balance between fast experimentation and rapid decision making locally, while adhering to some common standards and some common architecture, right? And then leveraging the modern technology, which is cloud, API-based architecture and machine learning and AI, making sure that everything gets connected and so forth. So it's about striking the right balance, but going for scale, not being focused on just one or two or three things, right?
Chris D’Agostino:
Yeah. I was just going to say, it's one of the things that we actually have been focused on, which is we want to enable single-node data science to keep the costs down and really democratize access to create those citizen data scientists. But what we found with a lot of organizations is they're doing it with, say, a laptop and a subset of the data, because that's really convenient for the data scientist to work with. But then when they try to move it into the production environment, the libraries are different versions, and the datasets might look a little bit different than what they had downloaded or were working with. So our goal is the ability to do single-node data science on our platform, where you've got the same underlying environment, the same libraries and the same access to the data, but you're keeping your compute costs way, way down. So I think it's spot on what you're describing about being innovative locally, but then making sure you can scale it up.
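As a hedged illustration of the point above, the sketch below reads the same curated Delta table from a single-node, local Spark session, so exploration on a laptop uses the same format, libraries and data access as the production cluster. The package version and table path are assumptions for the example, not a specific Databricks configuration.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.master("local[*]")   # single machine, all cores
    .appName("single-node-exploration")
    # Pull in Delta Lake locally; on a Databricks cluster it is already provided.
    .config("spark.jars.packages", "io.delta:delta-core_2.12:1.0.0")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Same table and format as production; only the compute footprint differs.
df = spark.read.format("delta").load("/data/silver/plant_sensors")
sample = df.sample(fraction=0.01, seed=42).toPandas()  # small sample for local iteration
print(sample.describe())
```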
Pallaw Sharma:
Yeah. Absolutely. And that is something that comes across as a conflict, but it is not, right? I mean, as you rightly said, we need to make sure that we build stuff for scale and we build stuff that runs in production, right? Because otherwise there's no business value, connecting it back to point number one. I mean, we can do a lot of interesting work, but unless it hits production, unless it lands in a large, scalable manner and continuously updates itself, there's not a lot of business value. And finally, I would say talent, which could actually be the number one point. This whole area is a fast-growing, rapidly improving space. Talent is very, very critical. How do we attract, retain and motivate the right talent, and reskill and upskill existing talent? I mean, that becomes a game changer, right?
Pallaw Sharma:
I mean, it's not only about the right technology and the right architecture, not only about the right business value and the focus on the right problems, but it's also about the right talent and making sure that they are doing the right work in a streamlined manner. So folks who are more engineering-minded builders are focused on building the platform, and folks who are more business process and business knowledge centric are focused on the requirements and deployment of these things. Having the right operating model with the right talent is also very crucial. So again: focus on business value, go for scale, and go for the right talent. I think those are the things which are very important.
Chris D’Agostino:
So that talent point is great. And it leads me into, I think, the next area, which is developing that culture around data science, right? Recruiting, training and retaining this top talent is very expensive, and the longer somebody's tenure, the more institutional knowledge they have that you don't want to lose. So can you tell me a bit about your principles around developing a thriving data culture, and what role architecture, in terms of enabling people to be productive and do their best work, plays in your mind for that culture?
Pallaw Sharma:
Yeah, sure. And I think this is, again, a very, very important topic, particularly for large organizations, which require a lot of domain knowledge and knowledge of how to navigate across the organization. Right? So we start by saying that it is going to be a two-in-a-box model, right? Because as much as one needs the modern technology skills and the data science and machine learning skills, one also needs, equally important and in some cases more important, the business knowledge: the processes, how things work, the relationships, the knowledge of all of those things. So what we try to do is make sure that there is osmosis happening between these two skill sets, I mean, the folks who understand data and AI and machine learning and full-stack, and the folks who understand the business, right?
Pallaw Sharma:
We put them together as a pair, and then we provide them with the right architecture and the right platform so that they can experiment fast. Right? I mean, there's not a lot of lag between the idea, the requirement, the build and the deployment. So how do we run that loop fast? That gets enabled by a common data layer, a common set of tools, a cloud-based platform, low-cost compute and so forth. So those platforms and the modern technology actually help drive the two-in-a-box model. And then we actually make both these people, both skill sets, the one which is more on the data and technical side and the one which is on business and domain knowledge, responsible for the business outcome, right? So we don't want to have this wall in between, where somebody is just doing the requirements and somebody is doing the build; both of them are jointly responsible for generating business value. Right?
Pallaw Sharma:
And the more we showcase these types of collaboration, the better it is. Typically, it is very, very hard to find, it's a unicorn kind of situation, somebody who has a PhD in computer science or machine learning and also knows a tremendous amount of chemistry or biology and so forth. So we have to bring these teams together, and we need to make sure that both of them appreciate the complexity and the importance of each other's expertise, right? So our data scientists become more educated on our manufacturing processes, our logistics, transportation and quality processes, and vice versa, our quality and manufacturing people get more educated on our data science and our platforms, right? So that's how we are going about it.
Speaker 1:
Thank you for joining this episode of Champions of Data and AI, brought to you by Databricks. Thousands of data leaders rely on Databricks to simplify data and AI so data teams can innovate faster and solve the world's toughest problems. Visit databricks.com to learn how data leaders are unlocking the true potential of all their data.