I tried the official code from Stability without many modifications, and also tried to reduce the VRAM consumption. It works as intended, with the correct CLIP modules wired to different prompt boxes. AdamW and AdamW8bit are the most commonly used optimizers for LoRA training; AdamW8bit uses less VRAM and is fairly accurate. Inside /training/projectname, create three folders. DeepSpeed is a deep learning framework for optimizing extremely large (up to 1T-parameter) networks that can offload some variables from GPU VRAM to CPU RAM. Training runs at around 7 seconds per iteration. Since the original Stable Diffusion was available to train on Colab, I'm curious whether anyone has been able to create a Colab notebook for training a full SDXL LoRA model. Stable Diffusion's code and model weights have been open sourced, [8] and it can run on most consumer hardware equipped with a modest GPU with at least 4 GB of VRAM. I checked out the last April 25th green-bar commit. My workflow uses the SDXL 1.0 base and refiner models, plus two others to upscale to 2048px. After training for the specified number of epochs, a LoRA file will be created and saved to the specified location. DreamBooth fits in 11 GB of VRAM. Tick the box for FULL BF16 training if you are using Linux or managed to get BitsAndBytes 0.36+ working on your system. (UPDATED) Please note that if you are using the Rapid machine on ThinkDiffusion, the training batch size should be set to 1, as it has less VRAM. This guide also covers installing the Automatic1111 Web UI, using LoRAs, and training models with minimal VRAM.
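A small sketch of creating that project layout. The subfolder names (img, log, model) follow the common Kohya_ss convention and are an assumption here; adjust them to whatever your trainer expects:

```python
from pathlib import Path

# Create the three project subfolders. The names (img, log, model) are the
# usual Kohya_ss convention and are assumed here, not taken from this guide.
def make_project_dirs(root: str) -> list:
    dirs = [Path(root) / name for name in ("img", "log", "model")]
    for d in dirs:
        d.mkdir(parents=True, exist_ok=True)
    return dirs

created = make_project_dirs("training/projectname")
print([d.name for d in created])  # → ['img', 'log', 'model']
```

Point your trainer's image, logging, and output paths at these folders afterwards.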
SDXL 1.0, UI: Vladmandic/SDNext. I only trained for 1,600 steps instead of 30,000. Even after spending an entire day trying to get SDXL 0.9 to work, all I got was some very noisy generations in ComfyUI (I tried different .safetensors versions; it just won't work now). You may use Google Colab; also try closing all other programs, including Chrome. Guide for DreamBooth with 8 GB VRAM under Windows: in Kohya_SS, set training precision to BF16 and select "full BF16 training". I don't have a 12 GB card here to test it on, but using the Adafactor optimizer and a batch size of 1, it uses only around 11 GB. If you wish to perform just the textual inversion, you can set lora_lr to 0. sdxl_train.py is a script for SDXL fine-tuning. Use TAESD, a VAE that uses drastically less VRAM at the cost of some quality. Switch to the 'Dreambooth TI' tab. Optionally, edit the environment.yaml file to rename the env name if you have other local SD installs already using the 'ldm' env name. Dunno if home LoRAs ever got solved, but I noticed my computer crashing on the updated version and nothing past 512 working. This LoRA is quite flexible, but that should be mostly thanks to SDXL, not really my specific training. If you exceed your card's memory you'll see: OutOfMemoryError: CUDA out of memory. The chart above evaluates user preference for SDXL (with and without refinement) over SDXL 0.9 and Stable Diffusion 1.5. Fine-tune using DreamBooth + LoRA with a faces dataset. SDXL training is much better for LoRAs, not so much for full models (not that it's bad; LoRAs are usually enough), but full training is out of the scope of anyone without 24 GB of VRAM unless using extreme parameters. I also tried with --xformers --opt-sdp-no-mem-attention, but from these numbers I'm guessing that the minimum VRAM required for SDXL will still end up being substantial. I just went back to the automatic1111 history; that will be MUCH better due to the VRAM.
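A minimal sketch of why setting lora_lr to 0 gives textual inversion only. This is plain SGD with per-group learning rates, not the trainer's actual optimizer, but the zero-rate effect is the same: weights in a group with learning rate 0 never move, so only the embedding is trained.

```python
# With a per-group learning rate of zero, those weights stay frozen, so only
# the textual-inversion embedding is actually updated.
def sgd_step(params, grads, lrs):
    return [p - lr * g for p, g, lr in zip(params, grads, lrs)]

lora_w, embed_w = 0.5, 0.5
new_lora, new_embed = sgd_step([lora_w, embed_w], [1.0, 1.0], [0.0, 0.01])
print(new_lora, new_embed)  # → 0.5 0.49
```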
46:31 How much VRAM SDXL LoRA training uses with Network Rank (Dimension) 32
47:15 SDXL LoRA training speed of an RTX 3060
47:25 How to fix the "image file is truncated" error

[Tutorial] How To Use Stable Diffusion SDXL Locally And Also In Google Colab. So if you have 14 training images and the default training repeat is 1, then the total number of regularization images = 14. Consumed 4/4 GB of graphics RAM. I wrote the guide before LoRA was a thing, but I've brought it up to date. This version could work much faster with --xformers --medvram. Video summary: in this video, we'll dive into the world of Automatic1111 and the official SDXL support. VRAM consumption reached 77 GB. Training is launched with: accelerate launch --num_cpu_threads_per_process=2 "./sdxl_train_network.py" plus your training arguments. 512 is a fine default. It takes a lot of VRAM. A typical negative prompt: worst quality, low quality, bad quality, lowres, blurry, out of focus, deformed, ugly, poorly drawn face, poorly drawn eyes, poorly drawn eyelashes. DreamBooth training example for Stable Diffusion XL (SDXL). The base SDXL model will stop at around 80% of completion. Here is the wiki for using SDXL in SDNext. Funny, I've been running 892x1156 native renders in A1111 with SDXL for the last few days. It fills most of the 24 GB of VRAM but doesn't dip into "shared GPU memory usage" (regular RAM). Below the image, click on "Send to img2img". I haven't tested enough yet to see what rank is necessary, but SDXL LoRAs at rank 16 come out the size of 1.5 LoRAs at rank 128. Works on 16 GB RAM + 12 GB VRAM and can render 1920x1920. SDXL has 12 transformer blocks compared to just 4 in SD 1 and 2. How to do checkpoint comparisons with SDXL LoRAs, and more. It's important that you don't exceed your VRAM, otherwise it will use system RAM and get extremely slow.
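That regularization rule of thumb in code form:

```python
# Regularization image count = training images × repeats (14 × 1 → 14).
def reg_image_count(train_images: int, repeats: int = 1) -> int:
    return train_images * repeats

print(reg_image_count(14, 1))  # → 14
```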
Don't forget your FULL MODELS on SDXL are 6+ GB each. Since SDXL 1.0 came out, I've been messing with various settings in kohya_ss to train LoRAs, as well as creating my own fine-tuned checkpoints. I've got a GTX 1060. Then this is the tutorial you were looking for. 10 is the number of times each image will be trained per epoch. My hardware is an Asus ROG Zephyrus G15 GA503RM with 40 GB of DDR5-4800 RAM, two M.2 drives (1 TB + 2 TB), an NVIDIA RTX 3060 with only 6 GB of VRAM, and a Ryzen 7 6800HS CPU. Also, for training a LoRA for the SDXL model, I think 16 GB might be tight; 24 GB would be preferable. Put the refiner in the same folder as the base model, although with the refiner I can't go higher than 1024x1024 in img2img. Training cost money before, and for SDXL it costs even more; SDXL 1.0 almost makes it worth it. The training image is read into VRAM, "compressed" into a latent representation before entering the U-Net, and is trained in VRAM in this state. That is demanding even for SD 1.5 (especially for fine-tuning with DreamBooth and LoRA), and SDXL probably won't even run on much consumer hardware. System requirements: the largest consumer GPU has 24 GB of VRAM. Here I attempted 1000 steps with a cosine 5e-5 learning rate and 12 pics. I'm not an expert, but since it's 1024x1024, I doubt it will work on a 4 GB VRAM card. SDXL Model checkbox: check the SDXL Model checkbox if you're using SDXL v1.0. Stable Diffusion XL (SDXL) was proposed in SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis by Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. It could be training models quickly, but instead it can only train on one card; seems backwards.
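Since each image is seen "repeats" times per epoch, steps per epoch follow directly (the 50-image, 10-repeat example matches the 50x10 = 500 figure used elsewhere in this guide):

```python
# Steps per epoch = images × repeats ÷ batch size (integer division).
def steps_per_epoch(num_images: int, repeats: int, batch_size: int = 1) -> int:
    return num_images * repeats // batch_size

print(steps_per_epoch(50, 10))     # → 500
print(steps_per_epoch(50, 10, 2))  # → 250
```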
Head over to the following GitHub repository and download the train_dreambooth.py script. Launch a new Anaconda/Miniconda terminal window. Note: train_text_to_image_sdxl.py is a script for SDXL fine-tuning. Training Stable Diffusion 1.5 Models > Generate Studio Quality Realistic Photos By Kohya LoRA Stable Diffusion Training - Full Tutorial. Ultimate guide to LoRA training. Use SwinIR to upscale 1024x1024 outputs 4-8 times. The Automatic1111 launcher and command-line arguments list are shown in the video. SDXL is VRAM hungry; it's going to require a lot more horsepower for the community to train models. When can we expect multi-GPU training options? I have a quad 3090 setup which isn't being used to its full potential. I just tried to train an SDXL model today using your extension, 4090 here. Same GPU here. I don't have anything else running that would be making meaningful use of my GPU. I would like a replica of the Stable Diffusion 1.5 checkpoint. Your image will open in the img2img tab, which you will automatically navigate to. What you need: ComfyUI. Researchers discovered that Stable Diffusion v1 uses internal representations of 3D geometry when generating an image. For training, we use PyTorch Lightning, but it should be easy to use other training wrappers around the base modules. Suggested upper and lower learning-rate bounds: 5e-7 (lower) and 5e-5 (upper); the schedule can be constant or cosine. The 3060 is insane for its class; it has so much VRAM in comparison to the 3070 and 3080. Tried that now, definitely faster. Suggested resources before training: the ControlNet SDXL development discussion thread (Mikubill/sd-webui-controlnet#2039), and I suggest you watch the two tutorials below before using the Kaggle-based Automatic1111 SD Web UI (Free Kaggle-Based SDXL LoRA Training). The new NVIDIA driver makes offloading to RAM optional. However, results quickly improve, and they are usually very satisfactory in just 4 to 6 steps. Got down to 4 s/it.
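A sketch of a cosine schedule between those bounds. This is an assumed textbook form; real trainers differ in warmup and restart details:

```python
import math

# Cosine decay from lr_max (5e-5) down to lr_min (5e-7) over training.
def cosine_lr(step, total_steps, lr_max=5e-5, lr_min=5e-7):
    progress = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

# Starts at the upper bound, decays monotonically to the lower bound.
print(cosine_lr(0, 1000) >= cosine_lr(500, 1000) >= cosine_lr(1000, 1000))  # → True
```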
As expected, using just 1 step produces an approximate shape without discernible features and lacking texture. I run SDXL 1.0 with the lowvram flag, but my images come out deep-fried; I searched for possible solutions, but the upshot is that 8 GB of VRAM simply isn't enough for SDXL 1.0. How To Do SDXL LoRA Training On RunPod With Kohya SS GUI Trainer & Use LoRAs With Automatic1111 UI. Try float16 on your end to see if it helps. Supporting both txt2img and img2img, the outputs aren't always perfect, but they can be quite eye-catching. Just an FYI. So, this is great. I'm using LoCon with dim 16 and conv dim 8 at a 768 image size. Head over to the official repository and download the train_dreambooth_lora_sdxl.py script. Training an SDXL LoRA can easily be done on 24 GB; taking things further means paying for cloud compute on top of the hardware you already paid for. 7:06 What the repeating parameter of Kohya training is. This option significantly reduces VRAM requirements at the expense of inference speed. You can train SD 1.5-based custom models or do Stable Diffusion XL (SDXL) LoRA training. The default is 50, but I have found that most images seem to stabilize around 30. With 24 GB of VRAM: a batch size of 2 if you enable full training using bf16 (experimental). I am using an RTX 3060, which has 12 GB of VRAM. No need for batching; gradient accumulation and batch size were set to 1. Big Comparison of LoRA Training Settings, 8GB VRAM, Kohya-ss. FP16 has 5 bits for the exponent, meaning it can encode numbers between roughly -65K and +65K. Well dang, I guess. Kohya's documentation gives example optimizer settings for Adafactor with a fixed learning rate.
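As a hedged sketch of such fixed-learning-rate Adafactor settings, expressed as a Python dict: the option names follow kohya sd-scripts conventions and the values are illustrative assumptions, so verify them against your trainer's documentation before using them.

```python
# Hypothetical kohya-style settings for Adafactor with a fixed learning rate.
# With relative_step=False, Adafactor stops computing its own step size and
# uses the explicit learning_rate instead. All values here are illustrative.
adafactor_fixed_lr = {
    "optimizer_type": "Adafactor",
    "optimizer_args": [
        "scale_parameter=False",
        "relative_step=False",
        "warmup_init=False",
    ],
    "lr_scheduler": "constant_with_warmup",
    "learning_rate": 4e-7,  # assumed value, commonly cited for SDXL fine-tuning
}

print(adafactor_fixed_lr["optimizer_type"])  # → Adafactor
```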
Based on a local experiment with a GeForce RTX 4090 GPU (24 GB), the VRAM consumption is as follows: at 512 resolution, 11 GB for training and 19 GB when saving a checkpoint; at 1024 resolution, 17 GB for training and 19 GB when saving a checkpoint. Let's proceed to the next section for the installation process. If training required 25 GB of VRAM, nobody would be able to fine-tune it without spending extra money. Generated images will be saved in the "outputs" folder inside your cloned folder. SDXL 0.9 is working right now (experimental); currently it is WORKING in SD.Next. Now it runs fine on my NVIDIA 3060 12GB with memory to spare. How to install the Kohya SS GUI trainer and do LoRA training with Stable Diffusion XL (SDXL): this is the video you are looking for. I have a 6 GB NVIDIA GPU and I can generate SDXL images up to 1536x1536 within ComfyUI with that. At the moment I'm experimenting with LoRA training on a 3070. You buy 100 compute units for $9.99. Let's say you want to do DreamBooth training of Stable Diffusion 1.5. Navigate to the directory with the webui. SDXL in 6 GB of VRAM, optimization? I am using a 3060 laptop with 16 GB of RAM and a 6 GB video card. I have the same GPU, 32 GB RAM and an i9-9900K, but it takes about 2 minutes per image on SDXL with A1111. What is Stable Diffusion XL (SDXL)? You can specify the dimension of the conditioning image embedding with --cond_emb_dim. SDXL = whatever new update Bethesda puts out for Skyrim: it has incredibly minor upgrades that most people can't justify losing their entire mod list for.
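Those reported figures as a lookup table:

```python
# VRAM use reported above on a 24 GB RTX 4090, keyed by training resolution.
vram_gb = {
    512:  {"training": 11, "saving_checkpoint": 19},
    1024: {"training": 17, "saving_checkpoint": 19},
}

print(vram_gb[1024]["training"])  # → 17
```

Note that saving a checkpoint is the peak in both cases, so budget for that, not just the steady-state training number.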
SDXL 1.0 works effectively on consumer-grade GPUs with 8 GB of VRAM and readily available cloud instances. It works by associating a special word in the prompt with the example images. Step 1: Install ComfyUI. Since this tutorial is about training an SDXL-based model, you should make sure your training images are at least 1024x1024 in resolution (or an equivalent aspect ratio), as that is the resolution that SDXL was trained at (in different aspect ratios). In this tutorial, we will use the cheap cloud GPU provider RunPod to run both the Stable Diffusion Web UI Automatic1111 and the Stable Diffusion trainer Kohya SS GUI to train SDXL LoRAs. --medvram and --lowvram don't make any difference. num_train_epochs: each epoch corresponds to how many times the images in the training set will be "seen" by the model. However, with an SDXL checkpoint, the training time is estimated at 142 hours (approximately 150 s/iteration). A simple guide to running Stable Diffusion on 4 GB and 6 GB VRAM GPUs. See how to create stylized images while retaining photorealistic quality. Based on our findings, here are some of the best-value GPUs for getting started with deep learning and AI: the NVIDIA RTX 3060 boasts 12 GB of GDDR6 memory and 3,584 CUDA cores. In this tutorial, we will discuss how to run Stable Diffusion XL on low-VRAM GPUs (less than 8 GB of VRAM). Get solutions to train SDXL even with limited VRAM: use gradient checkpointing, or offload training to Google Colab or RunPod.
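The arithmetic behind that estimate: at roughly 150 seconds per iteration, the quoted 142 hours corresponds to about 3,400 iterations.

```python
# Wall-clock training time from iteration count and seconds per iteration.
def training_hours(iterations: int, seconds_per_iter: float) -> float:
    return iterations * seconds_per_iter / 3600

print(training_hours(3408, 150))  # → 142.0
```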
Higher rank will use more VRAM and slow things down a bit (or a lot if you're close to the VRAM limit and there's lots of swapping to regular RAM), so maybe try training ranks in the 16-64 range. It'll process a primary subject and leave the rest alone. Full tutorial for the Python and Git setup. In this video, I'll show you how to train amazing DreamBooth models with the newly released SDXL 1.0. I tried recreating my regular DreamBooth-style training method, using 12 training images with very varied content but similar aesthetics. The scripts are adapted from the 2.1 text-to-image scripts, in the style of SDXL's requirements. Close ALL apps you can, even background ones. If you use newer drivers, you can get past this point, as the VRAM is released and it only uses 7 GB of RAM. Fine-tuning Stable Diffusion XL with DreamBooth and LoRA on a free-tier Colab Notebook 🧨. The interface uses a set of default settings that are optimized to give the best results when using SDXL models. For SD 1.5-based checkpoints, see here. Switch to the advanced sub-tab. Currently on epoch 25 and slowly improving on my 7000 images. Even less VRAM usage: less than 2 GB for 512x512 images on the 'low' VRAM usage setting (SD 1.5). ADetailer is on with the "photo of ohwx man" prompt. SDXL 0.9 delivers ultra-photorealistic imagery, surpassing previous iterations in terms of sophistication and visual quality. Thanks @JeLuf. It may save some MB of VRAM. It still would have fit in your 6 GB card; it was around 5 GB. We used torch.compile to optimize the model for an A100 GPU. Max resolution: 1024,1024 (or use 768,768 to save on VRAM, but it will produce lower-quality images). However, a promising solution has emerged that allows users to run SDXL on 6 GB VRAM systems through ComfyUI, an interface that streamlines the process and optimizes memory. The default is 1. A minimal low-VRAM webui-user.bat looks like:

@echo off
set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=--medvram-sdxl --xformers
call webui.bat
You don't have to generate only at 1024, though. Step 3: the ComfyUI workflow. Imo I probably could have raised the learning rate a bit, but I was a bit conservative. How to do SDXL Kohya LoRA training with GPUs having 12 GB of VRAM. SDXL 0.9 doesn't seem to work with less than 1024×1024, so it uses around 8-10 GB of VRAM even at the bare minimum for a 1-image batch, since the model itself has to be loaded as well. The max I can do on 24 GB of VRAM is a 6-image batch of 1024×1024. VRAM use during training held steady, with occasional spikes to a maximum of 14-16 GB. Create perfect 100 MB SDXL models for all concepts using 48 GB VRAM, with Vast.ai. So I had to run my desktop environment (Linux Mint) on the iGPU (custom xorg.conf). Applying ControlNet for SDXL on Auto1111 would definitely speed up some of my workflows. Which makes it usable on some very low-end GPUs, but at the expense of higher RAM requirements. There's also Adafactor, which adjusts the learning rate appropriately according to the progress of learning while adopting the Adam method (the learning-rate setting is ignored when using Adafactor). Going back to the start of the model's public release, 8 GB of VRAM was always enough for the image generation part. But here are some of the settings I use for fine-tuning SDXL on 16 GB VRAM: in this comment thread it was said that the kohya GUI recommends 12 GB, but some of the Stability staff were training 0.9. It is a much larger model compared to its predecessors. At the moment, SDXL generates images at 1024x1024; if, in the future, there are models that can create larger images, 12 GB might be short. I've found ComfyUI is way more memory efficient than Automatic1111 (and 3-5x faster). Over the past few weeks, the Diffusers team and the T2I-Adapter authors have been collaborating to bring support for T2I-Adapters for Stable Diffusion XL (SDXL) to diffusers. Training a LoRA for SDXL 1.0.
Stable Diffusion is a popular text-to-image AI model that has gained a lot of traction in recent years. Fast ~18-step, 2-second images, with full workflow included! No ControlNet, no inpainting, no LoRAs, no editing, no eye or face restoring, not even hires fix: raw output, pure and simple txt2img. 🧨 Diffusers introduction, prerequisites, Vast.ai. On the 1.0-RC it's taking only 7.5 GB of VRAM and swapping the refiner too; use --medvram. In the past I was training SD 1.5. From the testing above, it's easy to see that the RTX 4060 Ti 16GB is the best-value graphics card for AI image generation you can buy right now. Maybe even lower your desktop resolution and switch off the 2nd monitor if you have one. I don't believe there is any way to process Stable Diffusion images with just the system RAM installed in your PC. Resizing to 1500x1500+ sized images. Despite its powerful output and advanced architecture, SDXL 0.9 can run on a modern consumer GPU: all you need is a Windows 10 or 11 or Linux operating system with 16 GB of RAM and an NVIDIA GeForce RTX 20-series graphics card (or equivalent with a higher standard) equipped with a minimum of 8 GB of VRAM. By using DeepSpeed it's possible to offload some tensors from VRAM to either CPU or NVMe, allowing training with less VRAM. It took ~45 min and a bit more than 16 GB of VRAM on a 3090 (less VRAM might be possible with a batch size of 1 and gradient_accumulation_steps=2). Option 2: MEDVRAM. Do you mean training a DreamBooth checkpoint or a LoRA? There aren't very good hyper-realistic checkpoints for SDXL yet, like Epic Realism or Photogasm. The 4070 is worth it solely for the Ada architecture. Cosine: starts off fast and slows down as it gets closer to finishing. Click to open the Colab link.
Learn to install the Automatic1111 Web UI, use LoRAs, and train models with minimal VRAM. The new version generates high-resolution graphics while using less processing power and requiring fewer text inputs. SDXL includes a refiner model specialized in denoising low-noise-stage images to generate higher-quality images from the base model. I'm using a 2070 Super with 8 GB of VRAM. We were testing rank size against VRAM consumption at various batch sizes. I can train a LoRA model on the b32abdd version using an RTX 3050 4 GB laptop with --xformers --shuffle_caption --use_8bit_adam --network_train_unet_only --mixed_precision="fp16", but when I update to the 82713e9 version (which is the latest) I just run out of memory, whereas on the older version it works extremely well. I know almost all tricks related to VRAM, including but not limited to keeping a single module block in the GPU. But I'm sure the community will get some great stuff. I went back to 1.5 models and remembered they, too, were more flexible than mere LoRAs. Finally had some breakthroughs in SDXL training. Roop, the base for the faceswap extension, was discontinued on 20.08. Stable Diffusion XL is drawing a great deal of attention in the image-generation AI community, and it can already be used with AUTOMATIC1111. Deciding which version of Stable Diffusion to run is a factor in testing. Which suggests 3+ hours per epoch for the training I'm trying to do. SDXL LoRA Training Tutorial: start training your LoRAs with the Kohya GUI version with the best known settings. First Ever SDXL Training With Kohya LoRA - Stable Diffusion XL Training Will Replace Older Models. ComfyUI Tutorial and Other SDXL Tutorials: if you are interested in using ComfyUI, check out the tutorial below. When it comes to AI models like Stable Diffusion XL, having more than enough VRAM is important.
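Rank's effect on adapter size is linear, which is why rank sweeps map so directly onto VRAM and file size. A back-of-envelope sketch (the layer dimensions are illustrative, not SDXL's actual shapes):

```python
# A LoRA adapter adds two low-rank matrices per adapted layer, (rank × d_in)
# and (d_out × rank), so its parameter count grows linearly with rank.
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    return rank * (d_in + d_out)

# Doubling the rank doubles the adapter's parameters (and roughly its size).
print(lora_params(768, 768, 32) // lora_params(768, 768, 16))  # → 2
```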
We successfully trained a model that can follow real face poses; however, it learned to make uncanny 3D faces instead of real 3D faces, because that was the dataset it was trained on, which has its own charm and flair. I'm sharing a few I made along the way, together with some detailed information on how I run things, I hope. A very similar process can be applied to Google Colab (you must manually upload the SDXL model to Google Drive). DreamBooth is a training technique that updates the entire diffusion model by training on just a few images of a subject or style. With a 3090 and 1500 steps, with my settings, it takes 2-3 hours. Anyway, a single A6000 will also be faster than the RTX 3090/4090, since it can do higher batch sizes. Moreover, I will investigate and make a workflow about celebrity-name-based training. To start running SDXL on a 6 GB VRAM system using ComfyUI, follow these steps: how to install and use ComfyUI for Stable Diffusion. Fooocus is a Stable Diffusion interface that is designed to reduce the complexity of other SD interfaces like ComfyUI by making the image generation process require only a single prompt. I was playing around with training LoRAs using kohya-ss. Training on an 8 GB GPU. BF16 has 8 bits in the exponent, like FP32, meaning it can approximately encode numbers as big as FP32 can. Or things like video might be best with more frames at once. 4260 MB average and 4965 MB peak VRAM usage. There are two ways to use the refiner: use the base and refiner models together to produce a refined image, or use the base model to produce an image and subsequently use the refiner model to add more detail.
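The exponent-width difference is easy to demonstrate with NumPy's float16: FP16's largest finite value is 65504, and anything bigger overflows to infinity, while BF16/FP32 would hold it comfortably.

```python
import numpy as np

# FP16 (5 exponent bits) tops out at ±65504; larger magnitudes become inf.
big = np.float16(65504)      # largest finite FP16 value
too_big = np.float16(70000)  # overflows FP16's range

print(np.isfinite(big), np.isinf(too_big))  # → True True
```

This is why loss scaling (or switching to BF16) matters when training in FP16.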
Other reports claimed the ability to generate at least native 1024x1024 with just 4 GB of VRAM. It has enough VRAM to use ALL features of Stable Diffusion. Since those require more VRAM than I have locally, I need to use some cloud service. I found that it is easier to train on SDXL, probably because the base is way better than 1.5, though 1.5 models are also much faster to iterate on and test at the moment. Epochs: 4. When you use this setting, your model/Stable Diffusion checkpoints disappear from the list, because it seems it's properly using diffusers then. 7 GB VRAM usage. Edit: tried the same settings for a normal LoRA. Hey all, I'm looking to train Stability AI's new SDXL LoRA model using Google Colab. 21:47 How to save the state of training and continue later. In this case, 1 epoch is 50x10 = 500 training steps. Dim 128. This will save you 2-4 GB of VRAM; check this post for a tutorial. SDXL support for inpainting and outpainting on the Unified Canvas. And make sure to checkmark "SDXL Model" if you are training the SDXL model. A reason to go even higher on VRAM: you can produce higher-resolution/upscaled outputs. Yikes! Consumed 29/32 GB of RAM. That is why SDXL is trained to be native at 1024x1024. The augmentations are basically simple image effects applied during training. I tried different .json workflows and got a bunch of "CUDA out of memory" errors on Vlad (even with the lowvram option). I've gotten decent images from SDXL in 12-15 steps. How To Use Stable Diffusion XL (SDXL 0.9). 12 GB VRAM: this is the recommended VRAM for working with SDXL. The 4070 uses less power, performance is similar, and it has 12 GB of VRAM.
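The VRAM guidance scattered through this guide, collected into one helper. The thresholds are this guide's rules of thumb, not hard limits:

```python
# Rough SDXL VRAM tiers as discussed above; adjust to your own workflow.
def sdxl_vram_tier(vram_gb: float) -> str:
    if vram_gb >= 24:
        return "comfortable for LoRA training and high-res work"
    if vram_gb >= 12:
        return "recommended for working with SDXL"
    if vram_gb >= 8:
        return "workable for inference with --medvram/--lowvram"
    return "below the practical minimum for SDXL"

print(sdxl_vram_tier(12))  # → recommended for working with SDXL
```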