{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Using the project's MLflow tracking server\n",
    "\n",
    "The project's MLflow tracking server is available at: https://champi.heuzef.com\n",
    "\n",
    "This notebook explains how to use it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Experiment: artifact_location='mlflow-artifacts:/103379370584144202', creation_time=1721579566179, experiment_id='103379370584144202', last_update_time=1721579566179, lifecycle_stage='active', name='champi', tags={}>"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Tracking server URL\n",
    "mlflow_server_uri = \"https://champi.heuzef.com\"\n",
    "\n",
    "# MLflow imports and configuration\n",
    "from mlflow import MlflowClient\n",
    "import mlflow\n",
    "import setuptools\n",
    "\n",
    "mlflow.set_tracking_uri(mlflow_server_uri)\n",
    "mlflow.set_experiment(\"champi\") # The project name, used as the experiment name"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Check availability\n",
    "\n",
    "First, make sure the server is reachable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The MLflow tracking server is available: https://champi.heuzef.com\n"
     ]
    }
   ],
   "source": [
    "import requests\n",
    "\n",
    "def is_mlflow_tracking_server_available(mlflow_server_uri, timeout=10):\n",
    "  \"\"\"Return True if the server answers the request with HTTP 200.\"\"\"\n",
    "  try:\n",
    "    response = requests.get(mlflow_server_uri, timeout=timeout)  # timeout in seconds\n",
    "    return response.status_code == 200\n",
    "  except requests.exceptions.RequestException:\n",
    "    # Connection error, DNS failure, timeout, ...\n",
    "    return False\n",
    "\n",
    "if is_mlflow_tracking_server_available(mlflow_server_uri):\n",
    "  print(\"The MLflow tracking server is available:\", mlflow_server_uri)\n",
    "else:\n",
    "  print(\"The MLflow tracking server is not available.\")"
   ]
  },
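  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A `200` response on the root URL only shows that the web interface answers. As a complementary check (a sketch, not part of the original procedure), the tracking REST API can be queried through the MLflow client itself, which also confirms that experiments are visible. The MLflow client may retry failed requests, so this call can take a while to fail when the server is down."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import mlflow\n",
    "\n",
    "mlflow.set_tracking_uri(\"https://champi.heuzef.com\")\n",
    "\n",
    "try:\n",
    "    # Calls the tracking REST API (not just the web page) and lists a few experiments\n",
    "    experiments = mlflow.search_experiments(max_results=5)\n",
    "    print(\"Tracking API reachable, experiments:\", [exp.name for exp in experiments])\n",
    "except Exception as exc:  # MlflowException, connection errors, ...\n",
    "    print(\"Tracking API not reachable:\", exc)"
   ]
  },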
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Training an example model\n",
    "\n",
    "We train a small, basic Scikit-learn model to obtain a few metrics, which are stored in variables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "metrics :\n",
      "{'mae': 130.27056867217001, 'mse': 27735.25123415424, 'rmse': 166.53903816869555, 'r2': 0.14952412556941785}\n",
      "\n",
      "params :\n",
      "{'n_estimators': 10, 'max_depth': 10, 'random_state': 42}\n"
     ]
    }
   ],
   "source": [
    "# Library imports\n",
    "from sklearn.ensemble import RandomForestRegressor\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "# Load an arbitrary dataset (picked at random, just for this test)\n",
    "data = pd.read_csv(\"https://github.com/DataScientest-Studio/MLflow/raw/main/fake_data.csv\")\n",
    "X = data.drop(columns=[\"date\", \"demand\", \"weekend\", \"holiday\", \"promo\"])\n",
    "y = data[\"demand\"]\n",
    "X_train, X_val, y_train, y_val = train_test_split(\n",
    "    X, y, test_size=0.2, random_state=42\n",
    ")\n",
    "\n",
    "# Train model\n",
    "params = {\n",
    "    \"n_estimators\": 10,\n",
    "    \"max_depth\": 10,\n",
    "    \"random_state\": 42,\n",
    "}\n",
    "rf = RandomForestRegressor(**params)\n",
    "rf.fit(X_train, y_train)\n",
    "\n",
    "# Evaluate model\n",
    "y_pred = rf.predict(X_val)\n",
    "mae = mean_absolute_error(y_val, y_pred)\n",
    "mse = mean_squared_error(y_val, y_pred)\n",
    "rmse = np.sqrt(mse)\n",
    "r2 = r2_score(y_val, y_pred)\n",
    "metrics = {\"mae\": mae, \"mse\": mse, \"rmse\": rmse, \"r2\": r2}\n",
    "\n",
    "print(\"\\nmetrics :\")\n",
    "print(metrics)\n",
    "\n",
    "print(\"\\nparams :\")\n",
    "print(params)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sending the results to the MLflow server\n",
    "\n",
    "Now that we have our results, we create a \"run\" and send it to the server.\n",
    "\n",
    "This example uses the `mlflow.sklearn` module. Use the module that matches your own framework (for example `mlflow.keras` for a Keras model): https://mlflow.org/docs/latest/python_api/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "run_name = \"run_test_001\" # Run name; the project uses its own naming convention\n",
    "\n",
    "with mlflow.start_run(run_name=run_name) as run:\n",
    "    mlflow.log_params(params)\n",
    "    mlflow.log_metrics(metrics)\n",
    "    # A few rows are enough for the input example stored with the model\n",
    "    mlflow.sklearn.log_model(sk_model=rf, input_example=X_val.head(5), artifact_path=run_name+\"_artifacts\")"
   ]
  },
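  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To check that the run actually reached the server, you can read it back through the API. The sketch below assumes the previous cell has just been executed, so `run` and `run_name` are defined. The model is reloaded from the `runs:/<run_id>/<artifact_path>` URI under which it was logged."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import mlflow\n",
    "import mlflow.sklearn\n",
    "\n",
    "# Read the logged params and metrics back from the tracking server\n",
    "# (param values come back as strings)\n",
    "logged_run = mlflow.get_run(run.info.run_id)\n",
    "print(\"params :\", logged_run.data.params)\n",
    "print(\"metrics:\", logged_run.data.metrics)\n",
    "\n",
    "# Reload the model logged under artifact_path=run_name + \"_artifacts\"\n",
    "model_uri = f\"runs:/{run.info.run_id}/{run_name}_artifacts\"\n",
    "reloaded_rf = mlflow.sklearn.load_model(model_uri)\n",
    "print(reloaded_rf)"
   ]
  },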
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When run at the end of a script or notebook, this code is sufficient, and flexible enough, to send the information we want. You can make it even simpler by letting MLflow handle logging on its own with `mlflow.autolog()`, which must be called before training starts.\n",
    "\n",
    "> https://mlflow.org/docs/latest/tracking/autolog.html\n",
    "\n",
    "Feel free to try different hyperparameters and send a few metrics to compare in the web interface."
   ]
  },
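  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is a minimal sketch of the `mlflow.autolog()` approach. It assumes the tracking configuration cell and the training cell above have been run, so `params`, `X_train` and `y_train` are defined. Autologging is enabled *before* `fit()` so MLflow can capture the parameters, the training metrics and the model. The run name `run_test_autolog` is only an example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import mlflow\n",
    "from sklearn.ensemble import RandomForestRegressor\n",
    "\n",
    "# Enable autologging for all supported libraries (scikit-learn here)\n",
    "mlflow.autolog()\n",
    "\n",
    "with mlflow.start_run(run_name=\"run_test_autolog\"):  # hypothetical run name\n",
    "    rf_auto = RandomForestRegressor(**params)\n",
    "    rf_auto.fit(X_train, y_train)  # parameters, training metrics and model are logged here\n",
    "\n",
    "# Turn autologging off again so later cells are not affected\n",
    "mlflow.autolog(disable=True)"
   ]
  },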
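  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Load the best-performing model\n",
    "\n",
    "The URI `models:/champi_cnn@champion` refers to the version of the registered model `champi_cnn` that currently carries the `champion` alias in the MLflow Model Registry. Because the model is loaded by alias, the code does not need to change when a better version is promoted to `champion`."
   ]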
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/heuzef/GIT/jan24_cds_mushrooms/.venv/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
      "  from .autonotebook import tqdm as notebook_tqdm\n",
      "Downloading artifacts: 100%|██████████| 6/6 [00:01<00:00,  4.64it/s]  \n",
      "2024-10-01 15:18:25.979867: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.\n",
      "2024-10-01 15:18:26.089889: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.\n",
      "2024-10-01 15:18:26.138677: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
      "2024-10-01 15:18:26.233926: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
      "2024-10-01 15:18:26.255095: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
      "2024-10-01 15:18:26.371668: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n",
      "To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
      "2024-10-01 15:18:27.797753: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "mlflow.pyfunc.loaded_model:\n",
       "  artifact_path: heuzef_efficientnetb1_010_artifacts\n",
       "  flavor: mlflow.keras\n",
       "  run_id: 93ce2df782da48108f127f3e6c4adb8b"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import mlflow.pyfunc\n",
    "\n",
    "# Load the version of the registered model \"champi_cnn\" that carries the \"champion\" alias\n",
    "champi_cnn = mlflow.pyfunc.load_model(\"models:/champi_cnn@champion\")\n",
    "\n",
    "champi_cnn"
   ]
  }
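  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can use `MlflowClient` to see which registered version the `champion` alias currently points to, and what input the model expects. This sketch assumes MLflow 2.3 or later (model version aliases) and that the previous cell has been run. The input schema is `None` if no signature was logged with the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from mlflow import MlflowClient\n",
    "\n",
    "client = MlflowClient()  # uses the tracking URI set at the top of the notebook\n",
    "\n",
    "# Which version of \"champi_cnn\" does the \"champion\" alias point to?\n",
    "champion = client.get_model_version_by_alias(\"champi_cnn\", \"champion\")\n",
    "print(\"version:\", champion.version, \"| run_id:\", champion.run_id)\n",
    "\n",
    "# Input schema recorded with the model (None if no signature was logged)\n",
    "print(champi_cnn.metadata.get_input_schema())"
   ]
  }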
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}