Our Discord server ⤵️
https://bit.ly/SECoursesDiscord
If I have been of assistance to you and you would like to show your support for my work, please consider becoming a patron on Patreon 🥰 ⤵️
https://www.patreon.com/SECourses
Technology & Science: News, Tips, Tutorials, Tricks, Best Applications, Guides, Reviews ⤵️
• Technology & Scie...
Playlist of #StableDiffusion Tutorials, Automatic1111 and Google Colab Guides, DreamBooth, Textual Inversion / Embedding, LoRA, AI Upscaling, Pix2Pix, Img2Img ⤵️
• Stable Diffusion ...
DeepFloyd IF GitHub repo ⤵️
https://github.com/deep-floyd/IF
DeepFloyd IF Official Website ⤵️
https://deepfloyd.ai/
DeepFloyd IF Kaggle NoteBook ⤵️
https://www.kaggle.com/furkangozukara...
Generate your Hugging Face token ⤵️
https://huggingface.co/settings/tokens
DeepFloyd IF License Agreement To Accept ⤵️
https://huggingface.co/DeepFloyd/IF-I...
Improved Kaggle Notebook file ⤵️
https://www.patreon.com/posts/enhance...
Kandinsky 2.1 Tutorial ⤵️
• Midjourney Level ...
0:00 Introduction to Stability AI DeepFloyd IF
0:29 How DeepFloyd IF is built and how it works
0:51 Architecture of the DeepFloyd IF model
1:10 What makes the DeepFloyd IF model better
1:55 Strongest part of DeepFloyd IF
2:17 Comparison between DeepFloyd IF and other models
3:16 More detailed architecture of DeepFloyd IF
3:39 Minimum requirements to use DeepFloyd IF
4:18 How to register a free Kaggle account
4:35 How to use DeepFloyd IF on a free Kaggle notebook step by step
5:23 How to contact Kaggle support to activate your Kaggle account for GPU usage
5:40 Other Kaggle notebook settings
5:50 Start Kaggle session and installation
7:50 How to get your Hugging Face token
9:07 How to accept DeepFloyd IF license agreement
9:41 Continuing the installation of the DeepFloyd IF libraries on Kaggle
11:09 Starting image generation with DeepFloyd IF
12:55 Seeing the first images we generated ourselves with DeepFloyd IF
14:45 Where the generated images are saved
15:15 DeepFloyd IF vs SD 1.5 Custom Model Rev Animated comparison
16:05 DeepFloyd IF vs Kandinsky 2.1 comparison
16:18 DeepFloyd IF vs Stable Diffusion 1.5 base model comparison
16:39 DeepFloyd IF vs Stable Diffusion 2.1 768px base model comparison
16:46 Text generation performance comparison of DeepFloyd IF with other models
17:16 How to disable IF watermark from generated images
17:43 Results of generating images that contain written text
18:35 DeepFloyd IF vs other models text generation comparison
19:19 Experiments of 4 different prompts
20:45 How to download all of the images as a zip file, using ChatGPT to write the code
22:00 Testing the examples provided on the DeepFloyd AI website
22:16 How to generate multiple different images with the same prompt by using random seeds
24:07 How to delete all generated images in the runtime folder of Kaggle
25:37 How to use the downloaded enhanced Kaggle notebook
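For the zip download step (20:45), the following is a minimal sketch of the kind of code involved, using only Python's standard `zipfile` and `pathlib` modules. It is not the exact code from the video: the source folder and the zip file name are hypothetical and must be adjusted to wherever your notebook actually saves its images (on Kaggle, `/kaggle/working` is the writable output directory).

```python
import zipfile
from pathlib import Path

# Assumption: the generated images are regular image files in one folder.
# The folder and zip names used below are hypothetical; adapt them to your notebook.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}

def zip_images(src_dir, zip_path):
    """Zip every image file directly inside src_dir (non-recursive); return how many were added."""
    src = Path(src_dir)
    count = 0
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.iterdir()):
            # Skip sub-folders and non-image files (including the zip itself).
            if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
                zf.write(path, arcname=path.name)  # store without the folder prefix
                count += 1
    return count
```

In a Kaggle cell you would call something like `zip_images("/kaggle/working", "/kaggle/working/generated_images.zip")` and then download the resulting file from the notebook's output panel.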
IF-I-XL-v1.0
DeepFloyd-IF is a pixel-based, triple-cascaded text-to-image diffusion model that sets a new state of the art in #photorealism and language understanding. The result is a highly efficient model that outperforms current state-of-the-art models, achieving a zero-shot FID-30K score of 6.66 on the COCO dataset.
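For reference, FID (Fréchet Inception Distance) compares the statistics of Inception-v3 features extracted from real and generated images, and lower is better. In the standard definition below, μ_r, Σ_r and μ_g, Σ_g are the mean and covariance of the features of the real and generated images. "FID-30K" typically means the score is computed over 30,000 generated samples (for COCO, from validation-set captions), and "zero-shot" means the model was not trained on COCO.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```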
Developed by: DeepFloyd, StabilityAI
Model type: pixel-based text-to-image cascaded diffusion model
Cascade Stage: I
Num Parameters: 4.3B
Language(s): primarily English and, to a lesser extent, Romance languages
License: DeepFloyd IF License Agreement
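As a rough back-of-the-envelope check on hardware needs, 4.3B parameters at 2 bytes each (fp16) come to about 8 GiB for the stage I weights alone. The sketch below only does that arithmetic; it deliberately ignores the T5 text encoder, stages II/III, activations and framework overhead, so real VRAM use will be higher.

```python
# Rough weight-memory estimate for IF-I-XL-v1.0 (4.3B parameters, per the model card).
# Assumption: weights only. The frozen T5 text encoder, stages II/III,
# activations and CUDA/framework overhead are NOT included.

NUM_PARAMS = 4.3e9  # parameter count from the model card

BYTES_PER_PARAM = {
    "fp32": 4,
    "fp16": 2,
}

def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the memory needed just to hold the weights, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(NUM_PARAMS, dtype):.1f} GiB")
# fp32: 16.0 GiB
# fp16: 8.0 GiB
```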
Model Description: DeepFloyd-IF has a modular design consisting of a frozen text model and three cascaded pixel diffusion modules, each generating images at a higher resolution than the last: 64x64, 256x256, and 1024x1024. All stages of the model use a frozen text encoder based on the T5 transformer to extract text embeddings, which are then fed into a UNet architecture enhanced with cross-attention and attention pooling.
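To make the resolution cascade concrete, here is a small, purely illustrative sketch that records the three output resolutions from the model description and derives the upscaling step between stages. The `Stage` class and the stage labels are hypothetical and do not correspond to any DeepFloyd or diffusers API; the point is that each super-resolution stage upscales its input by 4x per side (16x more pixels), so the final 1024x1024 image is 16x larger per side than the 64x64 base image.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stage:
    name: str      # descriptive label only (hypothetical, not an official identifier)
    out_res: int   # side length of the square output image, in pixels

# Output resolutions are taken from the model description above.
CASCADE = [
    Stage("stage I (base, text-to-image)", 64),
    Stage("stage II (super-resolution)", 256),
    Stage("stage III (super-resolution)", 1024),
]

def upscale_factors(stages):
    """Per-side upscaling factor between each pair of consecutive stages."""
    return [b.out_res // a.out_res for a, b in zip(stages, stages[1:])]

def pixel_counts(stages):
    """Number of pixels produced by each stage (square images)."""
    return [s.out_res ** 2 for s in stages]

print(upscale_factors(CASCADE))  # → [4, 4]
print(pixel_counts(CASCADE))     # → [4096, 65536, 1048576]
```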
Training Data:
1.2B text-image pairs (based on LAION-A and a few additional internal datasets)
The test/validation splits of the datasets are not used at any cascade or stage of training. The COCO validation split is used only to monitor "online" loss behaviour during training (to catch incidents and other problems); it is never used for training.
Training Procedure: IF-I-XL-v1.0 is a pixel-based diffusion cascade stage that uses T5-Encoder embeddings (hidden states) to generate a 64px image. During training, images are cropped to a square via shifted-center-crop augmentation (the crop is randomly shifted from the center by up to 0.1 of the size) and resized to 64px using Pillow==9.2.0 BICUBIC resampling.
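The shifted-center-crop augmentation can be sketched as follows. This is an interpretation, not DeepFloyd's training code: it assumes the square side is `min(width, height)` and that "0.1 of size" means the crop center moves by a uniform random offset of up to 0.1 of that side on each axis, clamped so the box stays inside the image. The function name and parameters are hypothetical.

```python
import random

def shifted_center_crop_box(width, height, max_shift_frac=0.1, rng=random):
    """Return a square crop box (left, upper, right, lower) near the image center.

    Assumption: side = min(width, height); the crop center is shifted from the
    image center by a uniform offset in [-max_shift_frac * side, +max_shift_frac * side]
    on each axis, then clamped so the whole square lies inside the image.
    """
    side = min(width, height)
    max_shift = max_shift_frac * side
    cx = width / 2 + rng.uniform(-max_shift, max_shift)
    cy = height / 2 + rng.uniform(-max_shift, max_shift)
    # Clamp the top-left corner to [0, dimension - side] so the square fits.
    left = int(round(min(max(cx - side / 2, 0), width - side)))
    upper = int(round(min(max(cy - side / 2, 0), height - side)))
    return (left, upper, left + side, upper + side)
```

With Pillow, the returned box can be passed straight to `Image.crop(box)`, followed by `.resize((64, 64), Image.Resampling.BICUBIC)` to match the 64px BICUBIC resizing described above. Note that on the short axis the square already spans the full dimension, so in this interpretation the random shift only has an effect along the long axis.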