Motivation
It often happens that you want to use your own fine-tuned model, or a popular model such as Meta's LLaMA or Stable Diffusion, to perform inference as an offline job. In Microsoft Azure, these are commonly referred to as batch-endpoint jobs. In this blog post, we will cover how to use Stable Diffusion XL (SDXL) to generate images entirely within Azure as a batch job, including the customizations you may need, such as extra dependencies like the diffusers library or a custom infer.py script. The image above was generated using the output of this blog post.
Requirements
To follow along, you need an Azure account with a subscription that allows you to use Azure Machine Learning. After logging into Azure, create a workspace. For the purposes of this blog, we will use workspace_name="Generate-Image"
as the name for the workspace. Follow the steps here to create a workspace.
Since we plan to use a GPU, you may need to request an increase in your GPU quota if this is your first time. The required compute SKU is either Standard_NC6s_v3 or Standard_NC4as_T4_v3. You can request the increase through the Help + support panel in your Azure portal.
For this blog, ensure you have the following prepared:
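A minimal sketch of those values, with placeholder names you should replace with your own (subscription_id, resource_group, location, workspace_name, and registry_name are all referenced later in the script):

# Placeholder values -- replace them with your own subscription and resource names
subscription_id="<your-subscription-id>"
resource_group="<your-resource-group>"
location="<your-region>"        # e.g. eastus
workspace_name="Generate-Image"
registry_name="azureml"         # built-in registry that hosts SDXL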
Additionally, we need to use the Azure CLI, so you should install it by following the instructions here. You can verify that it is working correctly by running:
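The exact check is up to you; for example, the following commands confirm the CLI runs and that the ml extension (required for the az ml commands used throughout this post) is installed:

az version                  # confirm the CLI itself works
az extension add --name ml  # add the Azure ML CLI v2 extension if it is missing
az ml -h                    # confirm the ml subcommands are available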
Setup Prerequisites
Besides the variables mentioned above, we have other prerequisites that will be introduced later. But why use bash? Think of it as the control room where you manage resources, configure settings, and glue everything together to leverage Azure Machine Learning. There are many .sh files in the official Azure examples that you can draw inspiration from. Here are the variables we need for our job:
# Model from system registry that needs to be deployed
model_name="stabilityai-stable-diffusion-xl-base-1-0"
model_label="latest"
endpoint_name="text-to-image"
deployment_name="generate-image-batch-sdxl"
deployment_compute="gpu-cluster"
compute_sku="Standard_NC6s_v3"
base_endpoint_path="endpoint"
Now we configure our default settings, using az cli for the first time:
# Configure defaults
az configure --defaults group=$resource_group workspace=$workspace_name location=$location subscription=$subscription_id
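Many of the official example scripts also collect the workspace flags into a single variable; the $workspace_info used with the compute commands below can be defined the same way (an assumed definition based on those examples):

# Workspace context passed to az ml commands further down (assumed definition)
workspace_info="--resource-group $resource_group --workspace-name $workspace_name"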
Well-known models such as SDXL are already hosted in Azure under the registry_name="azureml" registry. We can access them directly; to use our own model instead, we would need to upload it into our own registry to make it accessible in Azure. To check the model information, use the command below:
# Check if the model exists in the registry
if ! az ml model show --name $model_name --label $model_label --registry-name $registry_name
then
  echo "Model $model_name:$model_label does not exist in registry $registry_name"
  exit 1
fi
Later, we need the model version, which we can get from the model information above:
model_version=$(az ml model show --name $model_name --label $model_label --registry-name $registry_name --query version --output tsv)
Great! Now we need to set up compute. We can create a cluster with the instance type Standard_NC6s_v3 manually through the Azure portal, Visual Studio Code, or az cli. If it does not already exist, use the command below (since we may run the script multiple times during development, it is good practice to guard each resource creation with an "if not exists" check to avoid errors):
# Check if compute $deployment_compute exists, else create it
if az ml compute show --name $deployment_compute $workspace_info
then
  echo "Compute cluster $deployment_compute already exists"
else
  echo "Creating compute cluster $deployment_compute"
  az ml compute create --name $deployment_compute --type amlcompute --min-instances 0 --max-instances 2 --size $compute_sku $workspace_info || {
    echo "Failed to create compute cluster $deployment_compute"
    exit 1
  }
fi
To run the job and generate output, we need some input data: in this case, a prompt from which SDXL generates an image. The input can be provided as a CSV file; for example, create one with the following content:
,prompt,negative_prompt,width,height
0,"photo of a rhino dressed in a suit and tie as a reporter telling news on TV, award-winning photography, photorealistic","ugly, deformed, cartoon, illustration, animation, face, male, female",720,720
Let's place the CSV file in a folder and keep that folder's path in the bash script under the variable processed_chunk_path, as shown below:
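A minimal sketch, assuming we save the CSV above as input/prompts.csv (both the folder and file names are arbitrary choices):

# Folder holding the input CSV file(s) for the batch job (example name)
processed_chunk_path="input"
mkdir -p $processed_chunk_path
cat > $processed_chunk_path/prompts.csv << 'EOF'
,prompt,negative_prompt,width,height
0,"photo of a rhino dressed in a suit and tie as a reporter telling news on TV, award-winning photography, photorealistic","ugly, deformed, cartoon, illustration, animation, face, male, female",720,720
EOF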
Deploy Model
What does "Model" refer to here? This is an important question to understand how cloud operations work. Essentially, "Model" encompasses everything from dependencies
, model artifacts (torch files)
, infer.py code
, and the docker image
. If any of these components change, the model needs to be redeployed. We can deploy the model to an endpoint and then invoke the endpoint to see how our deployment works. Often, resources are created using yml
files, but to keep things dynamic and be able to change variables within the yml file, here is a bash function that allows us to modify names inside the yml file:
# Helper function to change parameters in yaml files
change_vars() {
  for FILE in "$@"; do
    TMP="${FILE}_"
    cp $FILE $TMP
    readarray -t VARS < <(cat $TMP | grep -oP '{{.*?}}' | sed -e 's/[}{]//g');
    for VAR in "${VARS[@]}"; do
      sed -i "s/{{${VAR}}}/${!VAR}/g" $TMP
    done
  done
}
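For example, with endpoint_name="text-to-image" already set, running the helper against the endpoint template produces a substituted copy with a trailing underscore (a quick sketch using the paths from this post):

change_vars $base_endpoint_path/generate-endpoint.yml
cat $base_endpoint_path/generate-endpoint.yml_   # name: "text-to-image" after substitution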
Now, to create the endpoint, we can use:
if [ $(az ml batch-endpoint list --query "[?name=='$endpoint_name'].name" -o tsv) ]; then
  echo "Endpoint $endpoint_name already exists."
else
  echo "Endpoint $endpoint_name does not exist. Creating it..."
  change_vars $base_endpoint_path/generate-endpoint.yml
  az ml batch-endpoint create -f $base_endpoint_path/generate-endpoint.yml_
  # Remove the temporary substituted copy once the endpoint has been created
  rm $base_endpoint_path/generate-endpoint.yml_
fi
And here is the corresponding template, generate-endpoint.yml (the trailing-underscore copy is the temporary file produced by change_vars):
$schema: https://azuremlschemas.azureedge.net/latest/batchEndpoint.schema.json
name: "{{endpoint_name}}"
The change_vars() function replaces {{endpoint_name}} with the value set in the bash script earlier. After creation, we can get the endpoint's scoring URL with:
# Get scoring url
echo "Getting scoring URL..."
scoring_url=$(az ml batch-endpoint show -n $endpoint_name --query scoring_uri -o tsv)
echo "Scoring url is $scoring_url"
Now, it's time to create the deployment. Here is the yml file:
$schema: https://azuremlschemas.azureedge.net/latest/batchDeployment.schema.json
name: generate-image
type: model
endpoint_name: "{{endpoint_name}}"
model: azureml://registries/azureml/models/{{model_name}}/versions/{{model_version}}
code_configuration:
  code: code
  scoring_script: infer.py
environment:
  image: mcr.microsoft.com/azureml/minimal-ubuntu22.04-py39-cuda11.8-gpu-inference:latest
  conda_file: environment/conda.yaml
resources:
  instance_count: 1
settings:
  mini_batch_size: 1
  retry_settings:
    max_retries: 2
    timeout: 9999
Another way to set variables in the yml file is through the az command's --set option:
az ml batch-deployment create -f $base_endpoint_path/generate-deployment.yml --set \
  endpoint_name=$endpoint_name \
  name=$deployment_name \
  compute=$deployment_compute \
  model=azureml://registries/$registry_name/models/$model_name/versions/$model_version || {
    echo "deployment create failed"; exit 1;
}
One of the variables set is model=azureml://registries/$registry_name/models/$model_name/versions/$model_version, which refers to SDXL. The other variables are the names and the compute cluster defined earlier. The base image is image: mcr.microsoft.com/azureml/minimal-ubuntu22.04-py39-cuda11.8-gpu-inference:latest, and Azure will install conda_file: environment/conda.yaml on top of this image, so we can handle our dependencies there. Here is the conda file we need for our purpose:
name: model-env
channels:
  - anaconda
  - pytorch
  - conda-forge
dependencies:
  - python=3.8.16
  - pip<=23.0.1
  - pip:
    - mlflow==2.3.2
    - torch==2.0
    - transformers==4.29.1
    - diffusers==0.23.0
    - accelerate==0.22.0
    - azureml-core==1.52.0
    - azureml-mlflow==1.52.0
    - azure-ai-contentsafety==1.0.0b1
    - aiolimiter==1.1.0
    - azure-ai-mlmonitoring==0.1.0a3
    - azure-mgmt-cognitiveservices==13.4.0
    - azure-identity==1.13.0
If you ever switch to a fully custom Docker image instead of relying on conda_file: environment/conda.yaml, you would need to incorporate all of these conda dependencies into that Dockerfile yourself.
In the code configuration section, we define how the inference code should work. Here is our infer.py
script:
import base64
import logging
import os
import pandas as pd
import torch
from diffusers import DiffusionPipeline, EulerAncestralDiscreteScheduler
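# Note: vision_utils is assumed to be a small local helper module placed next to
# infer.py in the deployment's code folder, providing image_to_base64() and save_image().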
from vision_utils import image_to_base64, save_image
def init():
    global base
    global refiner
    global g_logger
    global output_path

    g_logger = logging.getLogger("azureml")
    g_logger.setLevel(logging.INFO)

    # Log whether CUDA is available
    g_logger.info("CUDA available: " + str(torch.cuda.is_available()))

    # Output directory and path
    output_path = os.environ["AZUREML_BI_OUTPUT_PATH"]
    g_logger.info("Output path: " + output_path)

    # Input directory and path
    model_dir = os.environ["AZUREML_MODEL_DIR"]
    mlflow_folder = os.listdir(model_dir)[0]
    model_path = os.path.join(model_dir, mlflow_folder, "artifacts", "INPUT_model_path")
    g_logger.info("Model path: " + model_path)

    # List all files in the model path
    files = os.listdir(model_path)
    g_logger.info("Model files: " + str(files))

    # Load base and refiner of SDXL
    base = DiffusionPipeline.from_pretrained(
        model_path,
        safety_checker=None,
        torch_dtype=torch.float16,
        variant="fp16",
        use_safetensors=True,
    ).to("cuda")
    base.scheduler = EulerAncestralDiscreteScheduler.from_config(base.scheduler.config)
    base.enable_model_cpu_offload()

    refiner = DiffusionPipeline.from_pretrained(
        model_path,
        safety_checker=None,
        text_encoder_2=base.text_encoder_2,
        vae=base.vae,
        torch_dtype=torch.float16,
        use_safetensors=True,
        variant="fp16",
    ).to("cuda")
    refiner.scheduler = EulerAncestralDiscreteScheduler.from_config(refiner.scheduler.config)
    refiner.enable_model_cpu_offload()

    g_logger.info("Init complete")
def run(files):
    g_logger.info("Input text files: " + str(files))
    output_list = []
    for data_file in files:
        input_pd = pd.read_csv(data_file, header=0)
        g_logger.info("Input data file: " + data_file.split("/")[-1])
        g_logger.info("Input shape: " + str(input_pd.shape))

        # Input columns: prompt, negative_prompt, width, height
        assert len(set(input_pd["width"])) == 1, "All width values should be the same"
        assert len(set(input_pd["height"])) == 1, "All height values should be the same"

        # Get width and height from the first row
        row = input_pd.iloc[0]
        width = row["width"]
        height = row["height"]

        # List of prompts and negative prompts
        prompts = list(input_pd["prompt"])
        negative_prompts = list(input_pd["negative_prompt"])
        g_logger.info(f"Prompts size: {len(prompts)}, Negative prompts size: {len(negative_prompts)}")

        image = base(
            prompt=prompts,
            width=width,
            height=height,
            negative_prompt=negative_prompts,
            num_inference_steps=40,
            denoising_end=0.8,
            output_type="latent",
        ).images
        generated_images = refiner(
            prompt=prompts,
            width=width,
            height=height,
            negative_prompt=negative_prompts,
            num_inference_steps=40,
            denoising_start=0.8,
            aesthetic_score=10,
            negative_aesthetic_score=2.4,
            image=image,
        ).images
        g_logger.info("Generated images size: " + str(len(generated_images)))

        for image in generated_images:
            base64_img = image_to_base64(image, "PNG")
            saved_path = save_image(output_path, image, "PNG")
            g_logger.info("Saved image path: " + saved_path)
            output_list.append((base64_img, saved_path))

    return pd.DataFrame(output_list, columns=["base64", "path"])
Based on the base image and model that we're using, there are a couple of important points to consider:
- The input path for accessing the SDXL model is os.environ["AZUREML_MODEL_DIR"].
- The output path for saving the generated images is output_path = os.environ["AZUREML_BI_OUTPUT_PATH"], which allows us to download them from the job's output in Azure.
- We pass width and height parameters when generating an image, primarily to illustrate how parameter passing can be implemented.
- When dealing with large models that barely fit into memory, it's advisable to consider optimization techniques such as quantization. Given that our compute instance has only 16 GB of GPU memory, we use enable_model_cpu_offload from the diffusers library for efficient processing.
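Putting the pieces together, the relative paths in the deployment yml (code: code, conda_file: environment/conda.yaml) imply a folder layout under base_endpoint_path roughly like the following; the vision_utils.py helper imported by infer.py is assumed to sit next to it:

endpoint/
├── generate-endpoint.yml
├── generate-deployment.yml
├── code/
│   ├── infer.py
│   └── vision_utils.py
└── environment/
    └── conda.yaml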
Invoke Endpoint
Now that the deployment is complete, we can invoke the endpoint to execute our batch job on the input CSV file we saved earlier.
job_name=$(az ml batch-endpoint invoke --name $endpoint_name \
  --deployment-name $deployment_name \
  --input $processed_chunk_path \
  --input-type uri_folder --query name --output tsv) || {
    echo "Endpoint invocation failed"; exit 1;
}
This command initiates the job. To monitor its progress and determine when it will finish, we can stream the job logs using:
az ml job stream --name $job_name || {
echo "Job streaming failed. If the job fails with an Assertion Error stating that the actual size of the CSV exceeds 100 MB, consider splitting the input CSV file into multiple files."; exit 1;
}
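If you do hit that 100 MB limit, one simple workaround (a sketch assuming a single large prompts.csv with a header row, and GNU split so that -C keeps lines intact) is to split the file and re-attach the header to every chunk:

# Hypothetical example: split prompts.csv into ~90 MB chunks, repeating the header in each
header=$(head -n 1 prompts.csv)
tail -n +2 prompts.csv | split -C 90m - chunk_
for f in chunk_*; do
  { echo "$header"; cat "$f"; } > "${f}.csv" && rm "$f"
done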
Streaming may take up to 20 minutes. Once streaming is complete, we can check the status of our job:
status=$(az ml job show -n $job_name --query status -o tsv)
echo $status
if [[ $status == "Completed" ]]
then
  echo "Job completed"
elif [[ $status == "Failed" ]]
then
  echo "Job failed"
  exit 1
else
  echo "Job status is neither failed nor completed"
  exit 2
fi
Finally, we can easily download the output by job name, since we saved the images under output_path = os.environ["AZUREML_BI_OUTPUT_PATH"] in the infer.py code:
az ml job download --name $job_name --download-path "$base_endpoint_path/generated_images" || {
echo "Job output download failed"; exit 1;
}
It's worth noting that jobs sometimes encounter errors for certain files in the input folder while the overall status still appears as Completed; Azure handles these per-file exceptions, so it's important to be aware of this. Make use of logging, as demonstrated in infer.py. These logs are stored under the path logs/user/stdout/<node_id>/processNNN.stdout.txt in the logging section of the job.
Clean Up
Azure provides us with commands for deleting the endpoint and compute as shown below:
az ml batch-endpoint delete --name $endpoint_name --yes || {
echo "Endpoint deletion failed"; exit 1;
}
az ml compute delete --name $deployment_compute --yes || {
echo "Compute deletion failed"; exit 1;
}
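Optionally, if you created a dedicated resource group just for this walkthrough, you can remove everything in it at once (use with care, since this deletes every resource in the group):

az group delete --name $resource_group --yes --no-wait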
This process may take a few minutes, but overall, you'll only be charged for the time you use, and after finishing and cleaning up, there will be no further charges.