Here you will find a list of materials published in English in June. All of them are written without undue academicism, contain code examples, and link to non-empty repositories. Most of the technologies mentioned are publicly available and do not require heavy-duty hardware to try out.

Image GPT

OpenAI reasoned that since a transformer model trained on text can generate coherent, complete sentences, a model trained on sequences of pixels should be able to generate coherent image completions and samples. OpenAI shows that the resulting model both produces high-quality samples and classifies images accurately, and that it can compete with the best convolutional models in the unsupervised learning setting.
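
To make "sequences of pixels" concrete, here is a minimal sketch of how an image can be flattened into a token sequence in raster order and turned into next-token prediction pairs. The uniform 3-bits-per-channel color quantization and all function names are assumptions made for illustration; OpenAI's model uses its own reduced color palette and a GPT-2-style transformer, neither of which is reproduced here.

```python
def quantize_pixel(r, g, b, levels=8):
    """Map an 8-bit RGB pixel to one of levels**3 palette indices (uniform bins)."""
    step = 256 // levels
    return (r // step) * levels * levels + (g // step) * levels + (b // step)


def image_to_sequence(image, levels=8):
    """Flatten an image (list of rows of RGB tuples) in raster order: row by row, left to right."""
    return [quantize_pixel(*px, levels=levels) for row in image for px in row]


def next_token_pairs(sequence):
    """(context, target) pairs: predict each token from all tokens before it."""
    return [(sequence[:i], sequence[i]) for i in range(1, len(sequence))]


# Usage: a 2x2 image with black, white, red and blue pixels.
image = [[(0, 0, 0), (255, 255, 255)],
         [(255, 0, 0), (0, 0, 255)]]
tokens = image_to_sequence(image)
pairs = next_token_pairs(tokens)
```

With 8 levels per channel the vocabulary has 512 tokens, and a model trained on such pairs learns to predict each pixel from all pixels above it and to its left.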

Face depixelizer

A month ago, we got to play with a tool that uses a machine learning model to turn portraits into attractive pixel art. It is fun, but it is hard to imagine broad practical uses for it. A tool that does the opposite, however, immediately caught the public's attention. In theory, a face depixelizer could help identify people in footage from outdoor surveillance cameras.
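
Pixelation itself is usually just block averaging, and a short sketch shows why reversing it is hard: very different high-resolution tiles collapse to the same pixel value, so any depixelizer has to pick one plausible face among many that match the low-resolution input. The grayscale list-of-lists image format and the `pixelate` helper are assumptions for illustration, not the tool's code.

```python
def pixelate(image, block):
    """Replace each block x block tile of a grayscale image with its mean value."""
    height, width = len(image), len(image[0])
    result = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            # Collect every pixel in the tile (edge tiles may be smaller).
            tile = [image[y][x]
                    for y in range(top, min(top + block, height))
                    for x in range(left, min(left + block, width))]
            row.append(sum(tile) / len(tile))
        result.append(row)
    return result


# Usage: two very different 2x2 tiles.
checker = pixelate([[0, 100], [100, 0]], 2)
flat = pixelate([[50, 50], [50, 50]], 2)
```

Both the checkerboard tile and the flat tile pixelate to `[[50.0]]`, so the low-resolution input alone cannot tell them apart.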

DeepFaceDrawing

If working with pixelated images is not enough and you need to produce a portrait photo from a rough outline, a DNN-based tool for that has already appeared. According to its creators, rough strokes are enough and professional sketches are not required: the model itself reconstructs a face that matches the outline. The system was built with the Jittor framework, and the creators promise to add PyTorch source code to the project repository soon.

PIFuHD

With faces covered, what about the rest of the body? Advances in deep neural networks have made it possible to build a 3D model of a human figure from a single 2D photo. The main limitation was that accurate predictions require both a wide context and high-resolution input. A multi-level architecture that can be trained end to end addresses this. At the first (coarse) level, the whole image is analyzed at low resolution to save resources and capture the overall context. Then, at the more detailed level, the model uses that context to estimate fine geometry from the high-resolution image.
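
The coarse-to-fine idea can be illustrated with a toy, pixel-aligned lookup: the whole image is first reduced to a low-resolution grid that stands in for global context, and each query location then reads both that coarse value and the full-resolution value at the same place. Everything here is an assumption for illustration: raw intensities replace learned features, nearest-neighbour lookup replaces interpolation, and no networks or 3D geometry are involved.

```python
def block_average(image, factor):
    """Downsample a grayscale image by averaging factor x factor tiles
    (assumes both dimensions are divisible by factor)."""
    height, width = len(image), len(image[0])
    return [[sum(image[y][x]
                 for y in range(top, top + factor)
                 for x in range(left, left + factor)) / factor ** 2
             for left in range(0, width, factor)]
            for top in range(0, height, factor)]


def sample(grid, u, v):
    """Nearest-neighbour lookup at normalized coordinates u (column), v (row) in [0, 1)."""
    height, width = len(grid), len(grid[0])
    return grid[min(int(v * height), height - 1)][min(int(u * width), width - 1)]


def multi_level_feature(image, u, v, factor=4):
    """Return a (coarse, fine) pair aligned to the same image location."""
    coarse = block_average(image, factor)  # cheap low-res pass over the whole image
    return sample(coarse, u, v), sample(image, u, v)  # global context + local detail


# Usage: a 4x4 gradient image, queried near the top-right corner.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
feature = multi_level_feature(image, 0.9, 0.1, factor=2)
```

In a real system the coarse grid would be computed once per image rather than per query; it is recomputed here only to keep the example short.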

RepNet

Many things around us consist of cycles of varying periodicity, and understanding a phenomenon often requires analyzing its recurring manifestations. With today's video recording capabilities, capturing repetitions is no longer difficult; the problem was counting them. Comparing pixel intensities frame by frame often failed because of camera shake, occlusion by other objects, and sharp changes in scale and shape as the subject moved toward or away from the camera. Google has now developed a model that solves this problem. It identifies repeating actions in video, including kinds of actions that were not used in training, and returns the frequency of the repetitions it recognizes. A Colab notebook is already available.
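
One way to see how repetition counting can work: compare every frame's embedding with every other frame's, then find the time lag at which frames look most alike. Google's model builds a temporal self-similarity matrix of this kind, but the per-frame embeddings, the lag-averaging period estimate, and all names below are assumptions for illustration rather than the model's actual implementation.

```python
def self_similarity(embeddings):
    """Similarity of every frame pair: negative squared Euclidean distance."""
    return [[-sum((a - b) ** 2 for a, b in zip(ei, ej)) for ej in embeddings]
            for ei in embeddings]


def estimate_period(embeddings, max_lag=None):
    """Return the smallest lag (in frames) with the highest mean similarity."""
    n = len(embeddings)
    if n < 2:
        raise ValueError("need at least two frames")
    sim = self_similarity(embeddings)
    max_lag = min(max_lag or n // 2, n - 1)
    best_lag, best_score = 1, float("-inf")
    for lag in range(1, max_lag + 1):
        # Mean similarity along the diagonal offset by `lag` frames.
        diag = [sim[i][i + lag] for i in range(n - lag)]
        score = sum(diag) / len(diag)
        if score > best_score:  # strict '>' keeps the smallest lag on ties
            best_lag, best_score = lag, score
    return best_lag


def count_repetitions(embeddings, fps):
    """Return (number of repetitions, repetitions per second)."""
    period = estimate_period(embeddings)
    return len(embeddings) / period, fps / period


# Usage: 12 frames of a motion that repeats every 4 frames, filmed at 30 fps.
frames = [[0.0], [1.0], [2.0], [1.0]] * 3
result = count_repetitions(frames, fps=30)
```

Here `result` is `(3.0, 7.5)`: three repetitions at 7.5 per second. Preferring the smallest best lag avoids reporting a multiple of the true period.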


SPICE model

Previously, estimating pitch meant relying on complex, hand-crafted signal processing algorithms. The biggest difficulty was separating the sound of interest from background noise or accompanying instruments. Now a pre-trained model is available for this task that estimates pitch, that is, how high or low a sound is. The model can be used on the web and on mobile devices.
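
For contrast with the learned approach, here is a minimal sketch of one classic hand-crafted method: picking the autocorrelation lag at which the signal best matches a delayed copy of itself. This is not SPICE; the function name, the 50 to 1000 Hz search range and the test tone are assumptions for illustration, and a plain estimator like this is exactly the kind that struggles with noise and accompaniment.

```python
import math


def estimate_pitch(samples, sample_rate, fmin=50.0, fmax=1000.0):
    """Estimate the fundamental frequency (Hz) via time-domain autocorrelation."""
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_corr = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # How well the signal matches a copy of itself delayed by `lag` samples.
        corr = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag


# Usage: a 200 Hz sine sampled at 8 kHz for 0.25 s.
rate = 8000
tone = [math.sin(2 * math.pi * 200 * i / rate) for i in range(2000)]
pitch = estimate_pitch(tone, rate)
```

Leaving the correlation unnormalized slightly favors shorter lags, which helps avoid reporting a multiple of the true period for a clean tone.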

Social distance detector

A case study on building a program that monitors whether people keep social distance. The author explains in detail how he chose a pre-trained model, how he handled detecting people, and how he used OpenCV to transform the image into a top-down (orthographic) projection in order to calculate the distances between people. The project's source code is also available.
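
The core geometry can be sketched without a detector: map each person's ground contact point through a perspective (homography) transform onto a top-down plane, then flag pairs closer than a threshold. The sketch solves for the 3x3 matrix from four point correspondences, the same quantity OpenCV's `cv2.getPerspectiveTransform` returns, in plain Python. Using the bottom-center of each bounding box as the ground point, and the `min_distance` threshold in ground-plane units, are assumptions, not necessarily what the author did.

```python
import math
from itertools import combinations


def solve_linear(a, b):
    """Solve a square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bv] for row, bv in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def perspective_matrix(src, dst):
    """3x3 homography mapping 4 image points to 4 ground-plane points (h33 = 1)."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def warp_point(m, x, y):
    """Apply the homography to one point (with the perspective divide)."""
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return ((m[0][0] * x + m[0][1] * y + m[0][2]) / w,
            (m[1][0] * x + m[1][1] * y + m[1][2]) / w)


def close_pairs(feet, m, min_distance):
    """Indices of people whose ground-plane positions are closer than min_distance."""
    ground = [warp_point(m, x, y) for x, y in feet]
    return [(i, j) for (i, p), (j, q) in combinations(enumerate(ground), 2)
            if math.dist(p, q) < min_distance]


# Usage: a toy calibration where the ground plane is the image scaled by 2.
M = perspective_matrix([(0, 0), (10, 0), (10, 10), (0, 10)],
                       [(0, 0), (20, 0), (20, 20), (0, 20)])
violations = close_pairs([(0, 0), (1, 0), (10, 10)], M, 3.0)
```

In practice the four source points would be the corners of a known rectangle on the floor, picked once in the camera image.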


Recognition of template documents

Today, there are thousands of variations of the most common template documents, such as receipts, bills and checks. Existing automated systems are designed to handle only a narrow set of templates. Google proposes using machine learning instead. The article describes the model's architecture and its evaluation results. The tool will soon become part of the Document AI service.

How to build a scalable pipeline for developing and deploying machine learning algorithms for contactless retail

The Israeli startup Trigo shares its experience of using machine learning and computer vision for take-and-go retail. The company supplies a system that lets stores operate without cash registers. The authors describe the challenges they faced, explain why they chose PyTorch as their machine learning framework and Allegro AI Trains for infrastructure, and show how they set up their development process.

That's all. Thank you for your attention!