Exploring the Potential of Core ML Integration in ComfyUI


Introduction

I recently acquired a MacBook Pro with an M2 Pro chip and was amazed by its AI capabilities. Initially, I experimented with running Large Language Models (LLMs) locally and was impressed with the inference speeds. My curiosity then led me to Stable Diffusion models, which offer a broad range of applications in the field of Machine Learning.

There are plenty of options for running Stable Diffusion on a Mac, including both paid solutions and free apps like Draw Things and Mochi Diffusion. These apps are straightforward and beginner-friendly, allowing you to generate images swiftly. My experience with Draw Things introduced me to Core ML, a Machine Learning framework that significantly boosts model performance by utilizing the Apple Neural Engine (ANE).

While Draw Things is fantastic for generating images using diverse models from platforms like Hugging Face or Civitai, it lacks flexibility for more advanced workflows, like animation generation. For my research and experiments, I needed a more flexible tool.

In the realm of Stable Diffusion, two web UIs stand out for their flexibility: stable-diffusion-webui by AUTOMATIC1111 and ComfyUI. Both platforms offer high levels of customization, but the node-based workflows of ComfyUI particularly appealed to me. This approach lets different components be combined and interact freely, offering the adaptability I was seeking.

However, I noticed a shortcoming: ComfyUI does not support Core ML. Even though it uses Metal Performance Shaders (MPS, Apple's rough equivalent of CUDA) for fast local inference with newer versions of PyTorch (>=2.0), the ANE remains idle while GPU usage shoots up to 100%.
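To make the distinction concrete: PyTorch can target the GPU through the MPS backend, but it has no device type for the ANE at all. A minimal sketch of the backend selection ComfyUI effectively performs (assuming PyTorch >= 2.0 is installed):

```python
import torch

# MPS (Metal Performance Shaders) is PyTorch's GPU backend on Apple Silicon.
# Note there is no "ane" device: the Neural Engine can only be reached
# through Core ML, not through PyTorch directly.
if torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

# Allocate a tensor shaped like a 512x512 SD latent on the chosen device.
x = torch.randn(1, 4, 64, 64, device=device)
print(device)
```

On a Mac without MPS support (or on any other platform) this falls back to the CPU, which mirrors ComfyUI's own behavior.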

Motivated to overcome this, I aimed to integrate the flexibility of ComfyUI with the efficiency of Core ML models running on the ANE. This blog is about that journey.

Understanding ComfyUI

ComfyUI is a powerful and versatile Stable Diffusion GUI and backend. Its graph-, node-, and flowchart-based interface makes it an essential tool for designing and executing advanced Stable Diffusion pipelines.

The beauty of ComfyUI lies in its ability to create complex Stable Diffusion workflows without requiring any coding from the user. It's designed to be inclusive, fully supporting SD1.x, SD2.x, and SDXL, and for systems with less powerful GPUs it offers a low-VRAM mode and a CPU mode.

ComfyUI also packs in numerous features, from loading various model types (ckpt, safetensors, diffusers, standalone VAEs, and CLIP models) to an asynchronous queue system that optimizes execution by re-running only the parts of a workflow that change between runs. On top of that come advanced features such as LoRAs, Hypernetworks, Inpainting with both regular and inpainting models, Upscale Models, unCLIP Models, GLIGEN, and Model Merging, to name a few.

Furthermore, ComfyUI allows saving and loading workflows as JSON files, and even loading full workflows from generated PNG files. Its node interface can be used to build complex workflows and provides options for area composition, latent previews with TAESD, and much more.

Fast startup times and complete offline functionality make ComfyUI a reliable tool for creating and experimenting with stable diffusion workflows. For a better understanding of ComfyUI's capabilities, numerous workflow examples are available on the Examples page.

Exploring the Capabilities of ANE

The Apple Neural Engine (ANE) is a Neural Processing Unit (NPU) built specifically to run Machine Learning models. Embedded in all Apple Silicon devices, it makes it possible to execute even complex Stable Diffusion models on them. For a more comprehensive understanding of the ANE, further information can be found here and here.

The Need for Core ML Integration

Although ComfyUI already runs very fast thanks to GPU acceleration via MPS, I was simply curious whether using the ANE could speed things up even more. If so, it would potentially mean super fast image generation combined with all the benefits of ComfyUI's graph-based workflows. And since ComfyUI allows you to write custom nodes, I was eager to try adapting it (at least to some degree) to work with Core ML models.

The Potential Challenges

This project comes with several potential limitations and challenges, on both the Core ML and the ComfyUI side, that could impact the final outcome.

One critical constraint of Core ML models is that input dimensions must be defined during the conversion process. There are methods to create flexible inputs, but they come with their own issues. For instance, defining ranges of inputs is one solution, but models converted this way are incompatible with the ANE. Another approach is using enumerated inputs, which should work with the ANE and is a method I'm keen to explore. However, it restricts the user to a predetermined set of image dimensions, a trade-off that could be acceptable if it brings a speed improvement.
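To illustrate what enumerated inputs imply for a workflow: a loader or sampler node would have to snap any requested resolution to the fixed set of shapes baked in at conversion time. A minimal sketch of that snapping logic (the supported sizes below are hypothetical examples, not values any particular converted model is guaranteed to have):

```python
# Hypothetical set of resolutions enumerated at Core ML conversion time.
SUPPORTED_SIZES = [(512, 512), (512, 768), (768, 512), (768, 768)]

def nearest_supported(width, height, sizes=SUPPORTED_SIZES):
    """Return the enumerated (width, height) closest to the request."""
    return min(sizes, key=lambda s: abs(s[0] - width) + abs(s[1] - height))

print(nearest_supported(600, 800))  # -> (512, 768)
```

A user asking for 600x800 would silently get a 512x768 generation, which is exactly the kind of restriction that needs to be surfaced clearly in the UI.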

ComfyUI presents another set of challenges. It uses its own internal data model to handle operations; for example, when loading a checkpoint, the model is wrapped in a custom class to ensure compatibility with other nodes. Making a Core ML model behave identically to PyTorch models within this environment could prove challenging.
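For context, a ComfyUI custom node is just a Python class that follows a naming convention (INPUT_TYPES, RETURN_TYPES, FUNCTION, CATEGORY). A stub for a hypothetical Core ML sampler node might look like this; the class name, category, and the pass-through body are my own placeholders, not actual ComfyUI or Core ML code:

```python
class CoreMLSampler:
    """Hypothetical node that would run a Core ML UNet on the ANE."""

    @classmethod
    def INPUT_TYPES(cls):
        # ComfyUI reads this to know which sockets to draw on the node.
        return {"required": {"positive": ("CONDITIONING",),
                             "negative": ("CONDITIONING",),
                             "latent": ("LATENT",)}}

    RETURN_TYPES = ("LATENT",)   # one output socket of type LATENT
    FUNCTION = "sample"          # method ComfyUI calls on execution
    CATEGORY = "sampling"

    def sample(self, positive, negative, latent):
        # A real implementation would invoke the Core ML model here;
        # this stub just passes the latent through unchanged.
        return (latent,)
```

The hard part is not the class skeleton but what goes inside sample: the Core ML model must accept and produce data in the same shapes and wrappers that ComfyUI's PyTorch-based nodes expect.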

The Expected Impact

I'm aware that integrating this project with other ComfyUI components may not be a smooth process. But even if all I manage to accomplish is a standalone, robust workflow for Core ML models within ComfyUI, I'd consider it a significant achievement. My goal is to harness the power of ANE within the ComfyUI ecosystem, hopefully encouraging more Mac users to run Stable Diffusion locally. This endeavor might even inspire other developers to further optimize the process and strive for superior results.

Conclusion

As I embark on this journey of integrating the power of ANE and the flexibility of ComfyUI, I am aware of the challenges that lie ahead. This project isn't just about enhancing the user experience for Mac users wanting to run Stable Diffusion locally. It's also about setting a precedent and potentially sparking further optimization from other developers in the field.

I invite you to join me on this exciting venture. Follow this blog for updates on my progress and insights into this project. As the code repository becomes available, I encourage you to check it out. Together, we can explore the possibilities of marrying the speed of ANE with the versatility of ComfyUI.