
Performance Benchmark: Windows vs. Linux for Stable Diffusion WebUI

Overview

According to the wiki, Stable Diffusion WebUI seems to run faster on Linux than on Windows or WSL2.

So I tested whether a speed difference really exists even though the programs themselves are exactly the same.

Test environment

  • CPU: Intel Core i7-13700K
  • CPU cooler: Scythe Big Shuriken 3 RGB
  • Motherboard: ASRock Z690M-ITX/ax
  • SSD (M2_2, chipset side): ARKINE NVMe Gen3 SSD 256GB
  • SSD (M2_1, CPU side): none
  • USB enclosure: aluminum NVMe-USB3.2 Gen2x2 case BLM20C with SUNEAST SE900NVG3-256G (not used this time)
  • Power supply: Corsair SFX 750W SF750
  • Memory: G.Skill Trident Z DDR4-3600 (OC) 16GB × 2 = 32GB
  • Case: QDIY 0040-*PCJMK6-ITX (testbed)
  • GPUs: ZOTAC GeForce RTX 3060 Twin Edge and ZOTAC GeForce RTX 4070 Ti 12GB
  • Operating systems: Windows 11 (fully updated) and Lubuntu 22.04 LTS "Jammy Jellyfish" (fully updated)
  • Drivers: official NVIDIA Game Ready driver on Windows 11, official NVIDIA 545 driver on Linux

This time we did not use a USB SSD; we tested with the 256GB NVMe Gen3 SSD connected as the main drive.

Now let’s look at the results.

Stable Diffusion WebUI generation benchmark

For Windows we used the portable version distributed on this site, and for Linux we cloned the repository from GitHub.

Python is 3.10.6 on Windows and 3.10.12 on Linux.

Torch is 2.0.1+cu118.

The launch options used were --autolaunch and --opt-sdp-attention.
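
For reference, a quick sanity check like the following (a minimal sketch; run it inside each WebUI virtual environment) can confirm that both operating systems really are on the same stack:

    import sys

    import torch

    # Confirm that both machines run the same interpreter and PyTorch build.
    print("Python:", sys.version.split()[0])    # e.g. 3.10.6
    print("torch:", torch.__version__)          # e.g. 2.0.1+cu118
    print("CUDA runtime:", torch.version.cuda)  # e.g. 11.8

    # Confirm the GPU is visible before running any benchmark.
    if torch.cuda.is_available():
        print("GPU:", torch.cuda.get_device_name(0))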

Verification method

  • Hello Asuka benchmark: batch size 1, batch count 10
  • Hello Asuka benchmark 768: the same settings at 768×768

We measured each of these three times with cuDNN at its default version (the one built into Torch) and again with cuDNN replaced by the latest version, and calculated the averages.
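
For reference, here is roughly how you can confirm which cuDNN build PyTorch actually loaded before and after the swap; this assumes a standard pip install of torch, which bundles its cuDNN binaries in torch's own lib directory:

    import os

    import torch

    # Version of the cuDNN library PyTorch actually loaded, e.g. 8700 for 8.7.0.
    print("cuDNN:", torch.backends.cudnn.version())

    # The pip wheel bundles its cuDNN binaries next to torch itself; this is
    # the directory whose libraries get replaced when swapping in a newer cuDNN.
    lib_dir = os.path.join(os.path.dirname(torch.__file__), "lib")
    for name in sorted(os.listdir(lib_dir)):
        if "cudnn" in name.lower():
            print(name)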

Fewer seconds means better performance, and more processing steps per second means better performance.

Hello Asuka benchmark: 512×512 / 28 steps / 10 images

Hello Asuka benchmark 768: 768×768 / 28 steps / 10 images
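
These runs can be scripted against a WebUI instance started with the extra --api flag; the sketch below is a rough harness along those lines (the prompt and seed are placeholders, not the exact Hello Asuka settings) that times the 10-image run three times and reports the average and iterations per second:

    import time

    import requests

    URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"  # default local WebUI address

    payload = {
        "prompt": "masterpiece, best quality, asuka",  # placeholder prompt
        "seed": 1,                                     # placeholder seed
        "steps": 28,
        "width": 512,   # 768 for the 768 variant
        "height": 512,  # 768 for the 768 variant
        "batch_size": 1,
        "n_iter": 10,   # batch count 10
    }

    times = []
    for _ in range(3):  # three runs, averaged as in the article
        start = time.perf_counter()
        requests.post(URL, json=payload, timeout=600).raise_for_status()
        times.append(time.perf_counter() - start)

    avg = sum(times) / len(times)
    print(f"avg: {avg:.1f} s, {28 * 10 / avg:.2f} it/s")  # 28 steps x 10 images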

In terms of performance, Windows basically has nothing over Linux, and the difference is especially noticeable with the newer GPU, the RTX 4070 Ti.

The RTX 4070 Ti on Linux is stable and fast regardless of the cuDNN version.

On the other hand, the RTX 4070 Ti on Windows is clearly slower than on Linux.

The performance difference seems small at 512×512, but the 768×768 results show that the gap widens as the workload grows.

Meanwhile, with the previous-generation RTX 3060, we confirmed that raising the cuDNN version narrowed the gap with Linux.

From this it appears that Windows drivers and libraries lag behind Linux in optimization.

Even so, Windows does not outperform Linux in any area, so there must be a clear bottleneck somewhere.

kohya_ss GUI benchmark (training)

We generated a LoRA using the frog images distributed on this site for operation and performance checks.

We compared the number of processing steps per second with the AdamW8bit, Lion8bit, AdamW, and Lion optimizers.

Since the number of steps processed is huge, the numbers barely change between runs, so this was measured only once.

The higher the number, the better the performance.
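
As a rough illustration of what is being compared (not the actual kohya_ss pipeline), a toy micro-benchmark like this times bare optimizer steps on a dummy layer; it assumes a CUDA GPU and a recent bitsandbytes build that includes the Lion variants:

    import time

    import bitsandbytes as bnb
    import torch

    def steps_per_second(opt_cls, n=200):
        # A dummy trainable layer stands in for the actual LoRA weights.
        layer = torch.nn.Linear(1024, 1024).cuda()
        opt = opt_cls(layer.parameters(), lr=1e-4)
        x = torch.randn(64, 1024, device="cuda")
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(n):
            opt.zero_grad()
            layer(x).sum().backward()
            opt.step()
        torch.cuda.synchronize()  # wait for queued GPU work before timing
        return n / (time.perf_counter() - start)

    for name, cls in [("AdamW", torch.optim.AdamW),
                      ("AdamW8bit", bnb.optim.AdamW8bit),
                      ("Lion", bnb.optim.Lion),
                      ("Lion8bit", bnb.optim.Lion8bit)]:
        print(f"{name}: {steps_per_second(cls):.1f} steps/s")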

Now let’s look at the training results.

For inference the difference wasn’t that big (although it was quite noticeable on Ada Lovelace), but for training there was a gap that couldn’t be closed no matter how hard I tried.

For the RTX 4070 Ti the difference is almost double, and even for the previous-generation RTX 3060, which seems better optimized, there is a difference of about 30%.

Python, PyTorch, and the like were originally developed on Linux and ported to Windows, so taken together with the inference results, there seems to be some kind of bottleneck on Windows.

In short

What follows is a harsh conclusion for Windows users.

Windows has no advantage as an operating system for AI image generation.

Especially for those using the latest-generation GPUs, free Linux delivers performance a full tier above Windows.

Furthermore, for training, I think it is safe to say the performance is two or three tiers higher.

Although not related to Stable Diffusion WebUI, TensorFlow (which uses CUDA for its GPU computation) has already stopped shipping GPU-enabled Windows binaries, and given such trends it is becoming difficult to do AI generation on Windows at all.

On top of that, Linux supports ROCm, so Radeon cards are an option as well.

This widens your options and lets you put together an environment at a relatively low cost.
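
For what it’s worth, the same PyTorch code runs on both backends: a ROCm build reports itself through torch.version.hip while still exposing the Radeon card under the familiar torch.cuda namespace, roughly like this:

    import torch

    # On an NVIDIA (CUDA) build, torch.version.cuda is set and torch.version.hip
    # is None; on a Radeon (ROCm) build it is the other way around.
    if torch.version.hip is not None:
        print("ROCm build:", torch.version.hip)
    elif torch.version.cuda is not None:
        print("CUDA build:", torch.version.cuda)

    # Either way, the device is addressed through the torch.cuda API.
    if torch.cuda.is_available():
        print("device:", torch.cuda.get_device_name(0))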

By the way, the two points above are the real reason I put this article together.

GPUs used for this verification

ZOTAC GeForce RTX 3060 Twin Edge: ¥43,979 (Amazon, as of 2024-02-10 19:04)

ZOTAC GeForce RTX 4070 Ti 12GB: ¥136,498 (Amazon, as of 2024-02-10 19:04)

Currently, the RTX 4070 Super is also a good deal:

Kuroutoshikou (玄人志向) GeForce RTX 4070 Super: ¥101,714 (Amazon, as of 2024-02-10 19:05)
