This session is part of the “Managing the Machine Learning (ML) Lifecycle Using MLflow” series with Jules Damji.
This workshop covers how to use the MLflow Projects packaging format to reproduce runs, and how to use the MLflow Models general-purpose format to send models to diverse deployment tools.
Suggested Prep Work:
To get the most out of this series, please review and complete the prep work here: https://github.com/dmatrix/mlflow-workshop-part-1#prerequisites
What I'm gonna do today, for those who did not attend, is a quick review and recap of what we covered in the first part, which was MLflow Tracking.
And then there's a YouTube video; you can see the link over there. I also posted the link in the chat, so you should be able to get to it. And I'm gonna talk about the two other components which comprise the platform, called MLflow Projects and MLflow Models, and discuss the motivation behind them: what is the reason we actually need Projects and Models, and how do they fit into the grand scheme of things. And then we'll see how you can do model execution on Databricks Community Edition; hopefully most of you will have signed in. And then we'll explore how you can use the MLflow UI built into the Community Edition, so you can compare and contrast model executions, including projects that people have shared with you, across different runs.

The two URLs that you see are the GitHub repositories you will need. So go to that particular URL, either clone it or keep it open, and use a browser to get to the material that you need. There's also a directory under there called Slides, and Slides has all the PDFs you need if you wanna follow along. I believe somebody in Q&A asked whether we have slides for this; the slides are in that GitHub URL. So the first one is the GitHub URL, and the second one is the project that we can execute. So keep those in mind, okay.

Now, just a quick review of the last workshop, part one. We started with an assertion: I made a claim that machine learning is complex. And the complexity is not so much from the fact that the algorithms are difficult, or the math or the theory behind them is difficult, or the languages and the frameworks are difficult. The problems stem from two different sources. One is the methodology, and the second is how the model evolves over time: different stages of model development demand different requirements. And because of those two, the whole process actually becomes quite difficult. So let me just recap very quickly what those are. The first one is the methodology.
What I mean by methodology is: how do you go about it? How does the modus operandi you use for traditional software development differ from machine learning? And I think those are quite different because the goals are very different, right? In traditional software development, you have a specification that you go through, and you write to that particular specification, whereas in machine learning, you have to optimize a metric. You're more concerned about how you can make the metrics work, what you can use to move the needle, so you can deploy this model with confidence that you have reasonable accuracy.

The second is that quality in traditional software development depends largely on code, whereas quality in machine learning depends on your input data and your tuning parameters. This is not to say that quality doesn't depend on the machine learning code you write, but much of it will be determined by your input: how large is your input, what are the tuning parameters that allow you to do regularization, what are the tuning parameters you use to make sure your model generalizes and, as a result, gives you a good metric.

And the final thing is that in the software development edit-compile cycle, there is a limited stack: the number of libraries, tools, and utilities you're gonna use is pretty much limited to what your company's infrastructure supports. Whereas machine learning is slightly different, because you wanna compare and combine many libraries to create the model that gives you the best result. And the ones you wanna use are the most recent ones, the ones with a large community around them, the good, well-supported open source. So I think that's problem number one. The second one is that if you look at the quintessential cycle of how a machine learning model evolves, you've got all these different stages, and each and every stage will have a myriad set of tools that you're gonna use, right?
So for data preparation, you might be using a combination of tools to do your ETL or your feature engineering: you might use scikit-learn, you might use pandas, or you might use Spark along with SQL, or you might use Python or Scala as your programming language, whatever. And then for training, as I said, the machine learning cycle demands that you not use only one library; you have the option, the ability, to use other frameworks as well, so you get the best of both worlds. And then the third is that you might be deploying to different form factors. You're not limited to only one; you'll have several deployment targets you wanna hit. And then finally, your data ingestion is complicated as well. So that's one thing: you have different sets of tools, and they have different requirements.

The second part is the tuning. Each of these stages will have specific tools and configurations you wanna maintain. And more important is the tuning part when training the model. You wanna make sure you use the best hyperparameters you can find, scoped over the entire search space. That really moves the needle, because that's what you wanna do, optimize the metric, and one of the ways you do that is by constantly experimenting with new parameters, new tuning, so that you get the best model.

The third thing is that you have to do all of this at scale. Each of these stages requires that you do it at scale, and scale comes into prominence because we are dealing with large magnitudes of data in today's era of big data in ML. The fourth is model exchange. How do you ensure that the model you tuned, the model you trained, the model you were satisfied with, that has good accuracy, is easily transferable and behaves the same way it did when you trained it? So model exchange is important. And then last but not least, we are living in a world of data privacy, and we have to worry about governance, we have to worry about provenance. How do you control these? And as a result of these two distinct sources, machine learning in general becomes a bit difficult. And what I propose, what we propose, is: can we actually do this in a very open manner, so you're not restricted to using only proprietary software, and you're not restricted to using only a limited set of tools?
And there are companies who have actually been very successful in standardizing this internally. But what we wanted to do was create an open source platform so we can do that in an open manner: other tools can be used, other libraries can be used, and we can deploy things in different environments. And the result was a set of distinct and separate components called MLflow Tracking, MLflow Projects, and MLflow Models. And the last one, which was recently released, is MLflow Registry. We talked about Tracking last time, and today we're gonna talk about Projects and Models. And just to recap, the whole idea behind Tracking was the ability for us to provide a very simple, Pythonic, fluent API that you just infuse into your machine learning code, a very simple way to track parameters.
You can track parameters, in the machine learning parlance; you can track metrics that you create, depending on what machine learning library or framework you use. And then you can save a particular model, which we call logging a model. And as a result, when you do all this, what you get later on is the ability to use the MLflow UI to peruse, investigate, and compare all the different metrics you created, the models you logged, and the artifacts, such as plots, that you saved. It's very easy for you not only to track those but also, in a very methodical way, to compare and contrast how the needle moved from one parameter to the other, or how different models compare to each other. And that was the whole idea behind Tracking. And think of Tracking as a central server, where the things on the left-hand side are the producers of all the data, the metadata, the runs of experiments, which use the several APIs we provide as part of Tracking to log everything on the tracking server.
And the things on the right-hand side, think of them as consumers that use the same APIs to consume the metadata, the experiments, the parameters and metrics that you logged, and present them in a digestible, exploratory way, where you can consume them and make sense out of them, and you can convey that as a story of how the model evolved and what happened to it. And you can query all these metrics not only as a data source but also through the API or on the console. So that's the essence of Tracking. And one good thing about Tracking is that you can use it locally, or you can point it at a remote server programmatically or by setting an environment variable. So that, in a nutshell, in a quick five minutes, was what we talked about in part one. And then we did a couple of tutorials in Databricks notebooks to show how you can use those.
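As a quick refresher, here is a minimal sketch of what that fluent tracking API looks like; the synthetic dataset, parameter values, and metric below are placeholders, not the exact example from part one:

```python
import numpy as np
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error

# Hypothetical synthetic data standing in for a real training set.
X, y = make_regression(n_samples=200, n_features=5, noise=0.3, random_state=42)

# Optional: point the fluent API at a tracking server; if unset, runs go to ./mlruns.
# mlflow.set_tracking_uri("http://localhost:5000")

with mlflow.start_run():
    alpha, l1_ratio = 0.5, 0.5
    model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio).fit(X, y)

    mlflow.log_param("alpha", alpha)                  # track parameters
    mlflow.log_param("l1_ratio", l1_ratio)
    rmse = float(np.sqrt(mean_squared_error(y, model.predict(X))))
    mlflow.log_metric("rmse", rmse)                   # track metrics
    mlflow.sklearn.log_model(model, "model")          # log the model as a run artifact
```

Everything logged this way shows up in the MLflow UI under the run, which is what makes the comparing and contrasting described above possible.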
So today, what I wanna talk about is MLflow Projects and MLflow Models, and how they relate to Tracking as well. And by the end of the three-part series, you'll have an idea of how each of these work together. You don't have to use everything; you can use just Tracking for everything, and then you don't have to worry about using Models and Projects. But these are added benefits that allow you to use these components in a distinct and selective way. So, MLflow Projects: what are MLflow Projects? They are a convention that we have introduced, and that hopefully becomes a standard, to package your machine learning code in a format that enables you to reproduce those runs anywhere. Now, one of the things that I stated earlier, and I did that in part one, is that MLflow was built on the proposition and the principle of openness.
In other words, we wanted to support all the common libraries; you're not stuck with only one or two, but a set of common libraries that most data scientists and machine learning developers actually use. And we also wanted people to be able to use different environments in which they can develop and deploy a particular project. And that requirement itself poses three problems, right? The first is that each of these tools will have its own way to encapsulate data and its own way to deploy things. The second is that people run these experiments in different environments: some run locally on their own machine, some run in the cloud, and some run in notebooks, as we will see. And the third is that even if you capture the exact data and the parameters that you need, you might not be able to recreate the same experiment you did when you were developing your model. In other words, the target environment might have a different set of libraries, or a different version of Python, or of whatever language you used. And as a result, this poses an interesting challenge: how do you reproduce a particular set of experiments that you did on your local machine or in your dev environment? You wanna make sure the same environment exists in your production environment, or when you wanna share your particular project with someone. So if I create a particular project and say, "Hey, Chris and Mary, here are the two projects that I created. Can you guys run them and see what you think?", one way to do that is to use this particular project as a way to share with my colleagues, so they can run the experiments in the same environment. In other words, recreate the same environment, reincarnate the same project in the same environment, so that they get the same results. Well, so how do we do that?
Well, the solution was to think of an MLflow Project as a self-contained unit of execution that bundles all the machine learning code you need in order to train the model, all the dependencies you need to run it (the libraries, the languages, the different configurations), all the data you need, and the config specification. That way, you can run it locally or run it remotely. And because MLflow can install and execute all these particular dependencies in the target environment, you essentially recreate that particular project in the target environment. So if I hand my project off to Mary, and she goes and executes this particular project, she gets the same results as I did on my local machine. So what's the secret behind that, right? How does it manifest itself? How is an MLflow Project specified so that we can do this recreation, this reproducibility? Well, there are two files involved, right? As you can see, if you come from the Unix environment, which most of you probably do.
This is just a simple directory structure that we provide on the GitHub, where under my project directory you will have two files: one is called MLproject, and the other one is called conda.yaml. The MLproject file is nothing but a specification and configuration instructing MLflow: okay, here's how you recreate my particular project in the target environment or my local environment. It says: I have a conda file that tells you what my dependencies are, and these are the entry points to use in order to recreate and run this experiment. An entry point will have a command and some parameters, for example training_data, which gives you the path to where the data is located, and lambda, which is one of the arguments I provide; I could have several arguments, each with a default and a type. And then there is the command that will be used to train this particular model. So when I hand this over to Chris or Mary, they can just run this particular thing and it will recreate that.

And the way they run it is very simple: we provide the MLflow CLI, so you can take this particular Git repository that I've given you and run it as mlflow run with the Git URL, and then provide the arguments you need. Now, if I don't provide the arguments, it's gonna use the defaults. But suppose Mary wanted to use her own set of lambda arguments, her own learning rates, to recreate this particular model and see how it does, because she wants to experiment with it; she could provide those arguments as well. Or you can use it programmatically: there's an API, mlflow run, that takes the GitHub repository, and the parameters are nothing but a dictionary with the set of arguments you want, or the defaults. Or alternatively, when you do a Git clone of that particular project, you can cd into that directory and just run it in place, whereby you say mlflow run on the current directory, with main as my entry point, and here are my arguments. So that's the first file: MLproject is a way to express my project configuration and instruct MLflow to create, to reincarnate, this particular project in my target environment.
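To make that concrete, here is a minimal sketch of what such an MLproject file could look like for the wine example; the project name, parameter names, and defaults are illustrative rather than copied from the workshop repository:

```yaml
# MLproject (illustrative sketch)
name: wine-elasticnet

conda_env: conda.yaml          # dependencies live in the sibling conda.yaml file

entry_points:
  main:
    parameters:
      training_data: {type: path, default: "wine-quality.csv"}
      alpha: {type: float, default: 0.5}
      l1_ratio: {type: float, default: 0.1}
    command: "python train.py {training_data} {alpha} {l1_ratio}"
```

With a file like that in the repo, something along the lines of mlflow run <git-URL> -P alpha=0.4 -P l1_ratio=0.2 reproduces the run, and mlflow run . -e main does the same from a local clone.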
But just as important is the second file, which is the conda.yaml. And the conda file, which could also be a Docker file by the way, tells MLflow to go ahead and create a conda environment. Within the conda environment, here are my channels to use; defaults is the way for conda to say, here is my default source for all the libraries and packages that I need. And here are my dependencies. I depend on Python 3.7.3, so even though the local machine might have a different version, I don't wanna run with that one, because I created this project using 3.7.3; when Mary runs it, it's gonna run with Python 3.7.3, so at least I know that's how I built the model. And I'm gonna use pip to install MLflow, and I need scikit-learn because I used a pickled scikit-learn model to serialize it, and I'm gonna call my environment mlflow-env.
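A minimal conda.yaml along the lines described here might look like this; the exact package list in the workshop repo may differ:

```yaml
# conda.yaml (illustrative sketch)
name: mlflow-env

channels:
  - defaults

dependencies:
  - python=3.7.3        # pin the interpreter the project was built with
  - scikit-learn
  - pip
  - pip:
      - mlflow
```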
So that, in essence, is how MLflow encapsulates, or formats, a particular project. So in summary, what is a Project?
As I said, a Project is nothing but a convention, a way for me to reproduce ML runs. And the way we do that is you can provide a GitHub repository in which you plop an MLproject file that expresses your project configuration. It also defines a set of dependencies that allows people to reproduce the environment in which it was initially trained. And then finally, we provide a set of execution APIs and arguments to run the particular project. This could be a CLI, and the APIs are available in Python, R, and Java. And you can run projects locally, or execute them remotely on a Databricks cluster by specifying the back-end, or run them on Kubernetes, which is experimental right now. So that, in essence, is what a Project is: nothing but a conventional way to express and convey to MLflow, here's my project, go ahead and execute it and recreate this environment by installing all the dependencies it needs. So what is the anatomy of an MLflow Project run? What happens when I do an mlflow run with this particular GitHub URL?
A number of things happen. The first thing it does is a Git checkout of that particular project into a temporary location on your machine or on the target machine. It creates a conda environment and activates it; the name of the conda environment normally includes the MLflow run ID. And once it's activated, it reads the conda.yaml and starts installing all the packages you need to recreate the environment in which the project was initially run, experimented with, and sealed. And then finally, it's gonna execute your particular program with the entry point you provided, which in this case was python train.py with the arguments, and it's gonna train the model right there. So you're essentially recreating the model with the same configuration we had when, say, Chris created and shared that project with me.
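For reference, the programmatic equivalent of that CLI run might look roughly like this; the repository URI and parameter values are illustrative, not the workshop's exact ones:

```python
import mlflow

# Hypothetical project URI; the workshop uses its own GitHub repository.
project_uri = "https://github.com/mlflow/mlflow-example"

# MLflow checks out the repo, builds the conda environment,
# and then executes the "main" entry point with these parameters.
submitted = mlflow.projects.run(
    uri=project_uri,
    entry_point="main",
    parameters={"alpha": 0.4, "l1_ratio": 0.2},
    synchronous=True,   # block until the run finishes
)
print("Run finished:", submitted.run_id)
```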
So you're probably asking, well, this is all good and fun, but how do you actually build a particular project? Which is quite simple, really. If you think about it, it's a few steps, just like how we execute it. The first thing you do is create an MLproject file, right?
And you populate that with your entry points, and by now you probably know what those entry points are, and you provide the default parameters and so on and so forth. The second thing you do is create a conda file. Well, how do you actually create a conda.yaml? You populate that with dependencies, but how do you know what your dependencies are? Now, if you have already created this particular model using the MLflow tracking API, then one of the things we covered in part one is that if you look at the artifacts in the UI, there'll be an artifact called the model directory, and in the model directory you will have a file called conda.yaml. And that conda.yaml is the one you actually want, unless you remember all of your dependencies yourself; it's the one that captured them. So what you wanna do is just take that, copy and paste it, and put it in your GitHub repo as the second file.
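If you'd rather fetch that file programmatically than copy it out of the UI, a sketch like this works; the run ID is hypothetical, and it assumes the model was logged under the "model" artifact path:

```python
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Hypothetical run ID of the run where the model was logged.
run_id = "0123456789abcdef"

# Download the conda.yaml that tracking captured when the model was logged,
# so it can be copied into the project's GitHub repository.
local_path = client.download_artifacts(run_id, "model/conda.yaml")
print("conda.yaml downloaded to:", local_path)
```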
The third thing you wanna do, obviously, is create a GitHub repository and upload those particular files, and any other code and data sources you want, that capture what the project is. And then finally, test it by running it locally, either programmatically or from the command line, and then share it on GitHub with anybody who wants it. This is a good way to publicly share the projects that you've created over a period of time; other people can just do a Git clone, so it's a good way to share the resources. And one of the good things about Projects is that, because of the way you can specify how the project should be executed, you can create what we call a pretty complicated multi-step workflow in your project specification. I can say I want to do three sets of things before I train my Keras model: I can load a particular file, let's say a CSV file, from some location, and I can use Spark to do the ETL.
Once I'm done with the ETL and the feature engineering, I'm gonna save all my clean data as a Parquet file, and then I'm gonna use Spark MLlib to load that file and create a particular model from it. Once I've done that, the output of my MLlib step could be an input to my model training. So then the next step would be: go ahead and start training the Keras model. This gives you an elaborate way to string together the several workflow steps you want, and the driver can just look at each and every step and ask: is it finished? If it's finished, go to the next one, and so on and so forth. So you can create a fairly easy multi-step workflow. What would the MLproject file look like? Actually quite simple. Well, it's a bit involved, but you can see all the different entry points: this is my conda environment, these are my different entry points. I'm gonna go ahead and do the ETL; after I've done the ETL, I create my ALS model, then I do the Keras training. And then the final thing I'm gonna run is called main, and main is the one that invokes all these different entry points; you can take the results of each one and feed them to the next. You can actually create a fairly good workflow.
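As a rough sketch of such a driver, here is what the chaining could look like in Python; the entry point names and parameters are hypothetical, and the real multi-step example differs in its details:

```python
import mlflow
from mlflow.tracking import MlflowClient

def run_step(entry_point, parameters):
    # Each step is itself an entry point defined in the same MLproject file.
    submitted = mlflow.run(".", entry_point=entry_point, parameters=parameters)
    return MlflowClient().get_run(submitted.run_id)

with mlflow.start_run():
    # Step 1: ETL the raw CSV into Parquet (hypothetical entry point and parameter names).
    etl_run = run_step("etl_data", {"ratings_csv": "ratings.csv"})
    ratings_parquet = etl_run.info.artifact_uri + "/ratings-parquet"

    # Step 2: train an ALS model on the cleaned data.
    als_run = run_step("train_als", {"ratings_data": ratings_parquet, "max_iter": 10})

    # Step 3: feed both previous outputs into the Keras training step.
    run_step("train_keras", {
        "ratings_data": ratings_parquet,
        "als_model_uri": als_run.info.artifact_uri + "/als-model",
    })
```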
So that, in essence, is what MLflow Projects are.
Are there any questions in the Q&A that you want me to answer, or are you guys taking care of them?
– [Chris] I think we're in pretty good shape so far. Thank you. – Thanks a lot, Chris, I appreciate that. Okay, so let's move on. The next thing I wanna talk about is Models, right? And Models, just like Projects, are very similar, and the motivation came from the same place: we want to be able to deploy these machine learning models into diverse environments. Remember, I talked about the ability for a model to be executed in different environments. So this is one of the reasons we wanted to have that; the motivation behind Models is no different from what it was for Projects.
We wanted to support all these ML frameworks, because our whole proposition was that we want people to be able to use and combine the best of the tools available to them, and we want to be able to serve models in different environments. And this in itself poses a number of problems. First, models can be written with a wide variety of tools, and as a result you have to worry about how they're gonna be used and executed in different environments. The second problem is that these models may be written in different frameworks. I might write a TensorFlow model that I want to run on Kubernetes. Or I might write a scikit-learn model that I want to use for batch scoring on Spark. Or I might write a Spark MLlib model and want the ability not only to run it on the Spark infrastructure, a Spark cluster, but also to run it on, say, SageMaker. And how do I do that? Well, if I wanted to support all that, I'd end up having all these different N-by-M combinations to support my models on different serving platforms. And it becomes the same problem that you had with Projects. So the motivation is very similar: I don't wanna have to support N-by-M different ways, to say for TensorFlow use this kind of format to deploy on Kubernetes, or for Spark use three different ways to deploy on SageMaker, on Spark, and in a Docker environment. So what was the solution? The solution, I think, was to say: can we actually have a conventional, unified model abstraction, right?
Think of it as an intermediate format that captures the flavor of the model that you want to deploy in a variety of environments, right? Think of a Model as a conventional format for packaging machine learning models so they can be used in a variety of different environments. Think of Models, more importantly, as Dockerfiles for models: if you have a Dockerfile, it can execute the application anywhere. But more importantly, the abstraction is: think of a model as a lambda function. If you come from a Python environment, if you come from a functional programming environment, think of it as a lambda function for a particular flavor of a particular model, which we can deploy in any desirable Python environment and just invoke its scoring function, called predict. So this is where I can take, say, my Spark model as a pyfunc and deploy it anywhere Python is actually supported.
So this intermediate layer allows us to have one format that all these other different formats can map to, and then we can have a target environment where we can serve it. And I think a good analogy that I would use for this, for those of you familiar with, say, Apache Arrow: Arrow is a way to standardize how data is formatted or arranged in memory across all these different tools. Before Apache Arrow, all we had were different ways of formatting data so tools could share it. But Apache Arrow gave a standard way to say, okay, here's how you're gonna represent your data internally, so any of these tools can share it: they know it's the Apache Arrow format, and they serialize and deserialize it in one particular way. So the analogy is very similar, in that you have an intermediate layer that all these multiple tools can write to in that particular format, so the consumers can know what flavor it is and use that particular flavor.
Well, how do you do that? (clears throat) So what happens is that when you use the MLflow tracking API, and when you use, say, mlflow.tensorflow to log a particular model, what happens at this point is that we create this directory for you.
Just like the directory you created for the project, we create a directory called model, or whatever name you give the model. Underneath, you will have everything you need to capture it. So you'll have the MLmodel file, and the MLmodel file will have two flavors in it. The flavors normally include the flavor you used to log the model; in this case it's gonna be the TensorFlow flavor, and underneath you will have the directory that has the saved TF graph, the variables and the weights that were used to create this particular model, and the information it needs to load the TensorFlow model. So this is usable by tools that understand the TensorFlow model format: if you deploy this somewhere on SageMaker or Kubernetes, it understands how to load the TensorFlow model and then use the predict function on it.
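As a rough, hypothetical illustration (the exact field names vary by MLflow version and flavor), such an MLmodel file looks something like this, with one native flavor and one generic python_function flavor:

```yaml
# MLmodel (illustrative sketch; run ID and paths are hypothetical)
artifact_path: model
run_id: 0123456789abcdef
flavors:
  tensorflow:                      # usable by tools that understand TensorFlow models
    saved_model_dir: tfmodel
    meta_graph_tags: [serve]
    signature_def_key: predict
  python_function:                 # generic flavor: load anywhere Python runs, call predict()
    loader_module: mlflow.tensorflow
    env: conda.yaml
    python_version: 3.7.3
```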
The other flavor, which is actually quite powerful, and which I mentioned earlier when I said think of it as a lambda function, is the ability to use what we call the Python function, or pyfunc, flavor. You can load the model anywhere in a Python environment, and by providing a pandas DataFrame, you can call the predict function on it. So what's the result of this?
Think of it this way: if I'm using a Keras model and I log that particular model using the tracking API (mlflow.keras, log this particular model), what happens is that I create this particular model format, and it has two flavors in it. One is the python_function flavor, and the other one is the Keras flavor. And when I have these two flavors, I can deploy the model in multiple environments. So in an environment where Python is understood, I can just load that particular Keras model as a Python function and then use a pandas input DataFrame to score the model. Or alternatively, if I just wanted to deploy it natively in the Keras environment, then I use the Keras load-model API, which loads the model file as it's actually stored, and then I supply the Keras input, whatever tensor it actually requires, as part of the predict function, and it does the scoring. And this important way of representing these different flavors gives all the different models you create, PyTorch, scikit-learn, Spark, TensorFlow or Keras, the ability, through the Python function flavor, to be deployed in diverse environments where Python is understood. This is actually quite powerful. And we'll look at some of that in the lab, hopefully we'll get time for that, because we've got about another 20 minutes or so.
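A minimal sketch of that logging step, using a hypothetical stand-in for the Fahrenheit-to-Celsius model we'll see in the lab, might look like this:

```python
import numpy as np
import tensorflow as tf
import mlflow
import mlflow.keras

# Hypothetical training data: Fahrenheit in, Celsius out.
fahrenheit = np.array([[32.0], [212.0], [98.6], [0.0]])
celsius = (fahrenheit - 32.0) * 5.0 / 9.0

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(1,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(fahrenheit, celsius, epochs=5, verbose=0)

with mlflow.start_run() as run:
    # Logging with the Keras flavor also writes a python_function flavor into the MLmodel file,
    # so the same artifact can later be loaded natively or as a generic pyfunc.
    mlflow.keras.log_model(model, "model")
    print("model URI:", f"runs:/{run.info.run_id}/model")
```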
So to summarize, what are MLflow Models? Well, they're nothing but a packaging format for models. Just like Projects, a model is nothing but a directory in which you have an MLmodel file. And they allow you to define dependencies for reproducibility: if I'm running this in a Docker environment, I have a Docker file that captures the dependencies, and over here the dependencies are whatever I need to run this particular model.
We provide model creation utilities, so I can use an MLflow model flavor to save or log the particular model, and I can use the flavor's load-model API, or the generic pyfunc load, to load it in a target environment. And we provide all these different deployment APIs for doing that. So that, in essence, in a nutshell, is MLflow Projects and MLflow Models. And MLflow Models and Tracking are very, very related, because the tracking API gives you the utilities to create and save these particular models. So I think this pretty much ends our discussion, and what we're gonna do now is go straight to the tutorials.
(clears throat) So these are the projects we're gonna work on, the tutorials we're gonna work on, hopefully. By now, hopefully, you have Git cloned that particular URL, and you have installed, or at least signed up for, Community Edition, and we'll go straight into it and see how we fare, okay.
Can you guys see my screen?
– Yes sir. – Yep. – All right. Is the font good enough, can you actually see it? – [Mary] Actually, I think we just see your desktop right now, Jules. – Okay.
Let me just reshare it again. I think I’m running into…
Let me see if I can do a new share and we'll try this.
How about now? – [Mary] I think so. Yeah, so we can see a few windows, the one to run the project example. – Okay, so you can see the notebook down here, can you? – [Mary] Yes, yep.
– Okay, let me just do this here.
Is the font good enough? – Maybe one– – Okay, let me know… How about now?
– [Mary] Yeah, I think that looks great. – Okay, brilliant. So I just wanna…
How many of you actually have already signed up for Community Edition? Had a chance to log in and go into the Community Edition.
Any quick count? If not, I'll go straight into it. I'm assuming that people have created a Community Edition account and have logged in. And I'm gonna just very briefly, really quickly, give you a summary of what Community Edition is and how you can use it. – [Mary] Yeah, I think a few people responded, yes, they've– – Okay, alright. So I'll just go ahead and go straight into it. Right, assuming that you have loaded the DBC files.
If you don't know how to load this file, you can go to your workspace, and what you do is an import. And in this particular dialog, you can either type in the entire DBC URL that you have in the GitHub, or you can just do a browse and then go to your GitHub directory.
And click on this, done.
And then you import. When you import, it's gonna go ahead and import those particular files. There are a couple of ways to do that. So I'm assuming you have done that and you have a quick way to navigate it. The next thing we wanna do is go ahead and create a particular cluster. I've already created the cluster, but the way to create a cluster is to go to these tabs on the left-hand side, or icons on the left-hand side, click to create a cluster, type in your cluster name, and then choose the 6.5 machine learning runtime.
So that installs MLflow and TensorFlow and Keras and all the libraries that you need, so you don't have to worry about installing those. Now, the reason my button is grayed out is because I already created that cluster. So when you create the cluster, it will start spinning up, and when it's ready, what you'll have is a solid green dot, and you can attach to it. So my cluster has got this particular configuration. And so now let's do it.
Right, the way you navigate around a cell is that you click on a particular cell and you can either hit Control-Return, which will execute that cell and remain on the same cell, or you can hit Shift-Return, which will execute that cell and take you to the next one. The other way to do it is to use this particular drop-down arrow to run the cell, or run everything above it, right? So we're gonna run this. And the first thing we're gonna do is just show you what this particular MLflow example looks like, the one I wanna execute in this particular notebook. So I can click on this, and let me open a new tab.
And if I look at this particular example: this is an example that was created a long time ago with Matei, and it was there to illustrate this. If you look at the MLproject file I talked about, again, it's a project specification that tells me what my entry points are, here is my conda environment, execute this particular command. I can look at the conda file, and these are the dependencies that I need to recreate this. I can look at train.py, and that's my Python program that trains a linear model to predict the acidity of a particular wine. So it's a very simple thing, but the idea is I wanted to illustrate how you can actually do that. So let's go back over here and execute this.
Right, so I'm using MLflow 1.7, and I'm gonna have to run this because on Databricks I have to create this particular profile that allows me to use a token for my remote workspace, so that I can execute that. It's something you need on Databricks Community Edition; you really don't need it on a local machine.
And here the parameters are actually specified. So for this particular run, what I want is my alpha, which is part of the learning, and these are the arguments I wanna use. Here's my Git URL that I wanna use.
And then I'm gonna use mlflow run to submit my run; this is the way I'm gonna go and execute this particular run. How are we doing on time? I've got 15 minutes. Alright, so here, this is just a very simple loop, and I'm gonna go ahead and run it. At this point, what this is gonna do, as I said earlier, is go to the GitHub repo and download this particular directory. It will start executing and installing the packages it needs in a particular temporary location, it will activate a conda environment, it will go and execute this particular run, and then give me the results, all right? Now this might take a few minutes for some of you guys.
And the good thing about it is that if I don't specify any of these, it's just gonna use the default parameters. So if you wanted to experiment a little more with this particular project that somebody shared with you on GitHub and you downloaded, you can specify the arguments you need, and it's gonna start running this thing, right? And the reason it's taking a while is because it's the first time: it's gonna download everything and install all the packages, all the dependencies it needs, in order for you to do that. On the local OS it's a lot faster, but that's what it is.
Are there any questions on the chat?
– [Chris] No, I think we’re looking pretty good. So let’s keep checking. – Lovely.
And once this has actually run, what you will get is that it will create all these different runs and experiments, and then we can look at the UI to see what the metrics were, what the effect of changing the parameters was, and look at what artifacts were created. It's a simple example, but it illustrates the point that when you create this particular project, you can share it with other people; you can share it publicly.
An interesting question to pose is: do you actually even need Projects if you have the MLflow Model Registry? And the answer is (clears throat), I mean, yes and no. Model registries are somewhere where you take a particular model through different stages, a central place where you keep track of these particular models. The idea is the same in that you can share models and discover models, but the Model Registry is really a more advanced feature. Projects are a good way to reproduce: the whole idea behind Projects is that you can reproduce and recreate those environments, and that way you can share your work using the best public source available, which is GitHub.
So here are all my run parameters, and I've got my run ID. And now I'm gonna go ahead and click on this UI that gives me a high-level view of what my runs are. So when I click on this, this is actually the result of all the experiments that ran with these particular parameters. And I can see, at a high level, here are my alpha and l1 ratio, this is the alpha I used, and these are the metrics I created for all three runs. I can go ahead and look at each and every run over here by clicking on the arrow. And now I'm in the MLflow UI. (clears throat) Excuse me. As you can see, I actually have three runs with all the parameters that were logged. My metrics are…
It actually gives you a nice tabular way to see this.
So these are very common regression metrics that I use, and I can compare and contrast these things, all right? I can actually do a compare to see what my models look like, and this gives you a very nice tabular view: these are the run IDs that ran, and these are all the metrics that I got. This gives you a nice little scatterplot; I can use RMSE to see if there's a correlation between the two, and I can change this parameter to see what my scatterplot looks like with the l1 ratio. It should be pretty much the same, because I used the same ratio everywhere. But you can see that I can use all these different ways to visualize this: I can look at my contour plot, I can look at my parallel coordinates to see how each of these fared. So the UI gives you a good visual representation of that. So let's go back to my run experiment that we just ran. Another way to run it would be to use projects.run, which is just another way to execute the same thing, so I'm not gonna do that. But here, we looked at some of the metrics, and you can create the fairly complex multi-step workflow we looked at earlier. Now let's look at the second one. The second one I'm gonna do is a Keras example.
And over here, I'm going to demonstrate how you can use this particular pyfunc load function. So here, I'm just gonna import all the stuff that I need. Now in the second notebook, I just go ahead and suppress the warnings. And I'm running TensorFlow 1.15, and you will see in my third example how I ignore TensorFlow 1.15 and use TensorFlow 2.0, because that's the version I created the model with, and that's the whole way to prove that you can actually reproduce and recreate environments using MLflow Projects, right? So this is my TensorFlow running natively. I'm just gonna create a very simple model over here: all it does is take a Fahrenheit temperature and predict the Celsius.
I'm gonna create a very simple two-layer Dense model; it's basically a linear regression. And one of the great things about MLflow tracking is that if you have hundreds of parameters and half a dozen metrics, it becomes a bit cumbersome to use the MLflow API to go and log each thing: log parameter A, log parameter B, log parameter C, log metric A, B, and C. What we have done, to make things easier for developers, is that for each framework we support, we have this notion of autologging. You just make one call, mlflow keras autolog or spark autolog, and it takes care of all the logging for you. So your code becomes very simple, very succinct; you can pretty much use the same machine learning code that you use everywhere, and the only thing you need is to start the MLflow run to instantiate a session with the tracking server and then use the autolog call. And there are some pull requests going on, so we should have more of these in the next couple of releases. And this is all my code: I'm doing mlflow.start_run, creating a session with the tracking server; I get my experiment ID; I create my baseline model; I'm gonna use mlflow.keras.autolog, and remember, I'm not logging any of the parameters, everything will be logged for me; I do the fit of the model, and then I do the prediction after I get the model. So let's run this. So this is gonna train. And this is my driver: I'm gonna use a batch size of 10 and a thousand epochs, and then run it with MLflow. So let's look at what this looks like.
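Here is a rough sketch of what that autologging pattern looks like; the layer sizes, batch size, and epoch count below are placeholders rather than the notebook's exact values:

```python
import numpy as np
import tensorflow as tf
import mlflow
import mlflow.keras

# Hypothetical training data mirroring the workshop example: Fahrenheit in, Celsius out.
f = np.arange(-40.0, 212.0, 1.0).reshape(-1, 1)
c = (f - 32.0) * 5.0 / 9.0

with mlflow.start_run():
    # One call replaces the individual log_param / log_metric calls:
    # parameters, per-epoch metrics, the model summary, and the model itself are logged automatically.
    mlflow.keras.autolog()

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, input_shape=(1,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mse"])
    model.fit(f, c, batch_size=10, epochs=100, verbose=0)

    # 32 F and 212 F should predict close to 0 C and 100 C if the model trained well.
    print(model.predict(np.array([[32.0], [212.0]])))
```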
So what I'll do, in the interest of time, is go to my pre-run version, because it's gonna take a little while to run. Just to demonstrate: when I run that, this is gonna train (clears throat), and I can run this to train this particular model. It goes through all the epochs and then finally gives you the metrics. Then I can go to my runs and I should see everything, right?
All these things, I didn't have to log individually; Keras autolog took care of all that. If I had to do it myself, I would have to log each one; I'd have several lines of code, log this, log that, and so on and so forth, and it becomes a bit cumbersome. This allows you to just go ahead and do an autolog and then not worry about it; it'll keep track of everything. Another good thing it does is that it also prints out the summaries and gives you a particular tag, and if you go to this particular model summary, it gives you the entire summary of what your layers look like, how many trainable parameters there are, and so on and so forth. You can look at the particular model, and this is actually the conda file, which has TensorFlow 1.15 in it, all right? And then I can look at my mean squared error to see what it looks like. I can step through it; a nice little smooth curve. I can compare it and add another one here, let's say loss. And it gives me the ability to see how my mean squared error progresses, how with each and every epoch it's approaching zero. So that essentially gives you the ability to run this Keras model natively.
Now, I'm gonna do something else over here, just to show you: here's my prediction for the temperatures, and you can see the model is not very good, right? This should be close to zero and this should be close to 100. So what is the issue? Is it because of TensorFlow 1.15? Is it because I don't have enough data? So let's go and look at the third example, where I run this particular project that I created and shared with someone, and I'm gonna download that. The only change I've made over here is one thing: I'm running TensorFlow 2.0 rather than TensorFlow 1.15. So when I run this particular model, what does the project look like? Let's open a new tab.
Very simple file: I have a conda environment, these are my parameters to train with, and by default I'm using a batch size of 10 and epochs of 100. I can look at my conda file, and if you look at the conda file, I have all these dependencies; now I'm using TensorFlow 2.0, right? So what will happen over here is that when I run this particular project, it's not gonna use the TensorFlow 1.15 that's running natively; it will recreate my environment with TensorFlow 2.0 and all those new dependencies, and it will train this particular model and do the prediction in this particular project environment. So let's go back over here and run this, right? As before, I'm running this particular GitHub repo, the same thing I did before, and I'm running it with these particular parameters. And when I ran this, I got this particular UI, I got my run ID and my metrics. And the next thing I'm gonna do is prove to you that it actually ran with TensorFlow 2.0.
Here's my 2.x flavor; I'm gonna load this particular model. Remember we talked about it: instead of the Keras load-model, I'm gonna load this back as a Python function model. And this is how you load the model: you provide the model URI, you load it as a pyfunc, and then I do the prediction. And when I do the prediction with this new model that ran on 2.0, I get a fairly good prediction, right? It was much better; the only thing I changed was the TensorFlow version. And I can make this model far more accurate by adding more layers, increasing the data, or tuning other parameters, and running this experiment over and over again to get a better metric. And this sort of proves the reproducibility of MLflow Projects: even though I'm running TensorFlow 1.15 with Keras in my second example, the third example uses TensorFlow 2.0, and it downloaded TensorFlow 2.0 and recreated this particular environment to do that.
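A minimal sketch of that pyfunc loading step; the run ID and the input column name here are hypothetical:

```python
import pandas as pd
import mlflow.pyfunc

# Hypothetical model URI; in the notebook this comes from the run that the project execution produced.
model_uri = "runs:/0123456789abcdef/model"

# Load the logged Keras model through the generic python_function flavor
# and score it with a pandas DataFrame; no Keras-specific code is needed.
pyfunc_model = mlflow.pyfunc.load_model(model_uri)
predictions = pyfunc_model.predict(pd.DataFrame({"fahrenheit": [32.0, 212.0]}))
print(predictions)
```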
And if I go back to the runs, go to this particular directory for my run, and look at my model's conda.yaml: it's TensorFlow 2.0. So it essentially downloaded TensorFlow 2.0 to recreate this particular model. So I think we're coming close to the hour, and I've taken up almost all the time. Like I said, it's always difficult to go through all these particular examples, but the examples are there for you to run tonight or tomorrow. If you have any questions, you can always send me an email or DM me.