SDXL 1.0 AI generated images

AI is a concept known for years now, but the last period a huge emerge of new technology and software are reshaping our approaches and making it available to almost anyone to at least test and play with some of the ideas. The most prevalent and rightfully so is the ChatGPT model which proves to be a solid backbone for building AI apps that was thought to be impossible before. The key factor here is the ease of setting up these apps and more importantly the quality of the outcome. However, at the same time there are more to AI generated material than chatting bots and has to do with image, video and sound generation and probably more that I do not know about.

All the information below is about the new at the time of writing Open Source model SDXL 1.0 which produces amazing results comparable to Midjourney which is considered by many the top commercial implementation.

Image generation

AI Image, sound and generation is already producing amazing results and is evolving very fast, since the AI hype and expectations are real and more and more researchers and companies invest a lot of talent, effort and money to them. At the same time, there are exceptional open source software out there to put in motion implementations with little to no cost, in order for users to experiment, build apps and create content.

Training

The AI models have already been trained to produce modern celebrities and famous people from all over the world. Even though some of them may look more or less like the celebrity, some are really close to the point where you cannot distinguish between a real and a produced photo. For example Al Pacino, even though he is pretty famous, SDXL 1.0 can produce photos that kind of looks as him, but it is far from good. Especially in the cases of persons that are not so well known, the chances are that when prompted the models will not have enough data to produce an image that resemble the person, or they may even produce something that is not human at all.

In order to be able to produce such persons, or improve the images for the likes of Al Pacino, the models can be trained from existing photos and in a variety of ways. The photos can be either real, or fictional (either produced by ai, or various forms of artwork), but the quality needs to be high in order to produce the better results. This does not mean huge resolution images, but clear representation of the subject / person in the image that we want the model to train and enough variety so that we can have flexibility and avoid overtraining. The balance of flexibility and training is important in order to have success with the image generation in various poses and conditions and of course the quality of the results.

Below there are several examples of images of real person that the model produced after training from existing images. Most of the cases the images was around 10, which are definitely in the lower end of the number we have to use and the resolution was adequate and in some cases bad. The most telling example of bad resolution and bad quality images is the one of an old actor Thanassis Vengos, where the images are low resolution and black and white. The first image is a real photo and the two images below that AI gererated from SDXL.

The generation of the images are not always of good quality and in this case where the model was trained poorly the generation was many times not very successful. The main problem with the generation is that the person in the images does not look like Vengos, sometimes the person have hair and a moustache and / or his face is similar, but not similar enough with the face of the photos that used to train the model. However, using repaint to regenerate part of many of these images, produced better results most of the times. It is possible with these generated images and with the best of the real ones to retrain the model for more consistent and higher quality images.

Better results yielded the training of a candidate for Heraklion mayor, that I tested with better photos than the previous case. The photos were better but again not ideal and not enough. However, the results are better and more consistent and the model seems to be flexible enough for simple regeneration of various poses and outfits.

Again, below, the first is a real photo and below that the AI generated images after the training.