Now, in the eyes of the general public, machine learning is strongly associated with various ways of training neural networks. First these were fully connected networks, then they were replaced by convolutional and recurrent ones, and now completely exotic variants such as GAN and LSTM networks are in fashion. Besides the ever-growing sample sizes required for their training, they still suffer from the inability to explain why a particular decision was made. But there are also structural approaches to machine learning, and the software implementation of one of them is described in this article.


This is a home-grown Russian approach to machine learning, called the VKF method of machine learning based on the theory of lattices. The origin and choice of the name are explained at the very end of this article.

1. Method Description


Initially, the entire system was created by the author in C++ as a console application, then it was connected to a database under the MariaDB DBMS (using the mariadb++ library), and finally it was turned into a Python library (using the pybind11 package).
Several datasets from the machine learning repository of the University of California, Irvine were selected as test data.

On the Mushrooms dataset, containing descriptions of 8,124 North American mushrooms, the system showed a 100% result. More precisely, the source data were split by a random number generator into a training sample (2,088 edible and 1,944 poisonous mushrooms) and a test sample (2,120 edible and 1,972 poisonous). After computing about 100 hypotheses about the causes of edibility, all test cases were predicted correctly. Since the algorithm uses a coupled Markov chain, the sufficient number of hypotheses may vary. Quite often it turned out to be enough to generate 50 random hypotheses. I note that when generating the causes of toxicity, the number of required hypotheses clusters around 120; however, all test cases are again predicted correctly.

Kaggle.com hosts a Mushroom Classification competition, where quite a few authors have achieved 100% accuracy. But most of the solutions are neural networks. Our approach allows the mushroom picker to learn only about 50 rules. Since most of the features are irrelevant, each hypothesis is a conjunction of a small number of values of essential features, which makes them easy to remember. After that, the mushroom picker can go mushroom hunting without fear of taking a toadstool or skipping an edible mushroom.

Here is an example of one of the hypotheses, on the basis of which we can assume that the mushroom is edible:
[('gill_attachment', 'free'), ('gill_spacing', 'close'), ('gill_size', 'broad'), ('stalk_shape', 'enlarging'), ('stalk_surface_below_ring', 'scaly'), ('veil_type', 'partial'), ('veil_color', 'white'), ('ring_number', 'one'), ('ring_type', 'pendant')]

I draw your attention to the fact that only 9 of the 22 features are listed, since no similarity on the remaining 13 features is observed among the edible mushrooms that gave rise to this cause.
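To make the use of such a hypothesis concrete, here is a minimal sketch of prediction by analogy in plain Python. This is an illustration only, not the author's library: the hypothesis_applies helper and the test mushroom are made up for the example. A hypothesis, being a conjunction of attribute values, votes for edibility exactly when all of its pairs occur in the description of the test case.

# A minimal sketch of prediction by analogy over (attribute, value) pairs.
# The hypothesis below is the one quoted above; the test mushroom is hypothetical.
def hypothesis_applies(hypothesis, description):
    """A hypothesis (conjunction of attribute values) applies to a test case
    if every pair of the hypothesis occurs in the case description."""
    return set(hypothesis).issubset(set(description))

hypothesis = [('gill_attachment', 'free'), ('gill_spacing', 'close'),
              ('gill_size', 'broad'), ('stalk_shape', 'enlarging'),
              ('stalk_surface_below_ring', 'scaly'), ('veil_type', 'partial'),
              ('veil_color', 'white'), ('ring_number', 'one'),
              ('ring_type', 'pendant')]

# hypothetical test case: only a few extra attributes are shown
mushroom = dict(hypothesis + [('cap_shape', 'convex'), ('odor', 'none')])

print(hypothesis_applies(hypothesis, mushroom.items()))  # True -> votes "edible"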

The second dataset was SPECT Hearts. There, the prediction accuracy on test cases reached 86.1%, which turned out to be slightly higher than the result (84%) of the CLIP3 machine learning system (based on covering training examples with the help of integer programming) used by the authors of the dataset. I believe that, because of the structure of the heart tomogram descriptions, which are already pre-encoded with binary features, it is not possible to improve the prediction quality significantly.

The author recently devised (and implemented in software) an extension of his approach to data described by continuous (numerical) attributes. In some respects this approach is similar to the C4.5 system for learning decision trees. This variant was tested on the Wine Quality dataset, which describes the quality of Portuguese wines. The results are encouraging: if one takes high-quality red wines, the hypotheses fully explain their high scores.

2. Platform Choice


Currently, students of the Department of Intelligent Systems of the RSUH are creating a series of web servers for various types of tasks (using the Nginx + Gunicorn + Django stack).

However, I decided to describe my personal version here (using the combination of aiohttp, aiojobs and aiomysql). The aiomcache module is not used due to known security issues.

There are several advantages to the proposed option:

  1. it is asynchronous due to the use of aiohttp;
  2. it allows the processing of Jinja2 templates;
  3. it works with the database connection pool through aiomysql;
  4. it provides the launch of independent computing processes through aiojobs.aiohttp.spawn.

We’ll point out the obvious disadvantages (compared to Django):

  1. no Object Relational Mapping (ORM);
  2. it is more difficult to organize the use of the Nginx proxy server;
  3. no Django Template Language (DTL).

Each of the two options targets a different strategy of working with the web server. The synchronous strategy (on Django) is aimed at a single-user mode, in which an expert works with a single database at any given time. Although the probabilistic procedures of the VKF method parallelize remarkably well, there is nevertheless a theoretical possibility that the machine learning procedures will take considerable time. Therefore, the option discussed in this note is aimed at several experts, each of whom can simultaneously work (in different browser tabs) with different databases that differ not only in the data but also in the way the data is represented (different lattices on the values of discrete attributes, different significant regressions and thresholds for continuous ones). Then, having started a VKF experiment in one tab, the expert can switch to another, where he prepares or analyzes an experiment with other data and/or parameters.

To keep track of multiple users, their experiments and the stages these experiments are at, there is a service database (vkf) with two tables (users, experiments). The users table stores the login and password of every registered user, while the experiments table, in addition to the names of the auxiliary and main tables of each experiment, keeps the status of filling these tables. We abandoned aiohttp_session, because the Nginx proxy server still has to be used anyway to protect critical data.

Here is the structure of the experiments table (a hedged sketch of the corresponding CREATE TABLE is given after the list):

  • id int(11) NOT NULL PRIMARY KEY
  • expName varchar(255) NOT NULL
  • encoder varchar(255)
  • goodEncoder tinyint(1)
  • lattices varchar(255)
  • goodLattices tinyint(1)
  • complex varchar(255)
  • goodComplex tinyint(1)
  • verges varchar(255)
  • goodVerges tinyint(1)
  • vergesTotal int(11)
  • trains varchar(255) NOT NULL
  • goodTrains tinyint(1)
  • tests varchar(255)
  • goodTests tinyint(1)
  • hypotheses varchar(255) NOT NULL
  • goodHypotheses tinyint(1)
  • type varchar(255) NOT NULL
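For reference, here is a hedged sketch of the DDL implied by this list, wrapped in aiomysql in the style of the rest of the article. The column names and types are taken from the list above; the AUTO_INCREMENT on id is an assumption (the register_experiment method shown later inserts NULL into it), and the actual statement used by the author's code may differ.

# A sketch (not from the author's code) of creating the service tables with aiomysql.
import asyncio
import aiomysql

CREATE_EXPERIMENTS = """
CREATE TABLE IF NOT EXISTS vkf.experiments (
    id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
    expName varchar(255) NOT NULL,
    encoder varchar(255), goodEncoder tinyint(1),
    lattices varchar(255), goodLattices tinyint(1),
    complex varchar(255), goodComplex tinyint(1),
    verges varchar(255), goodVerges tinyint(1), vergesTotal int(11),
    trains varchar(255) NOT NULL, goodTrains tinyint(1),
    tests varchar(255), goodTests tinyint(1),
    hypotheses varchar(255) NOT NULL, goodHypotheses tinyint(1),
    type varchar(255) NOT NULL
)
"""

async def create_service_tables(host, user, password):
    # create the service database and the experiments table if they do not exist yet
    conn = await aiomysql.connect(host=host, user=user, password=password)
    async with conn.cursor() as cur:
        await cur.execute("CREATE DATABASE IF NOT EXISTS vkf")
        await cur.execute(CREATE_EXPERIMENTS)
    await conn.commit()
    conn.close()

# asyncio.get_event_loop().run_until_complete(create_service_tables('127.0.0.1', 'root', 'toor'))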

It should be noted that there is a certain sequence of data preparation for VKF experiments which, unfortunately, is radically different for the discrete and continuous cases. The mixed case combines the requirements of both types.

discrete: => goodLattices (semi-automatic)
discrete: goodLattices => goodEncoder (automatic)
discrete: goodEncoder => goodTrains (semi-automatic)
discrete: goodEncoder, goodTrains => goodHypotheses (automatic)
discrete: goodEncoder => goodTests (semi-automatic)
discrete: goodTests, goodEncoder, goodHypotheses => (automatic)
continuous: => goodVerges (manual)
continuous: goodVerges => goodTrains (manual)
continuous: goodTrains => goodComplex (automatic)
continuous: goodComplex, goodTrains => goodHypotheses (automatic)
continuous: goodVerges => goodTests (manual)
continuous: goodTests, goodComplex, goodHypotheses => (automatic)
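The same sequences can be written down as a small dependency map, which may be easier to scan (a hedged illustration only, not part of the server code; the 'prediction' key is just a label for the final, unnamed step):

# Illustration of the preparation order above: keys are status flags of the
# experiments table, values are the flags that must already be set.
PREREQUISITES = {
    'discrete': {
        'goodLattices': [],                                       # semi-automatic
        'goodEncoder': ['goodLattices'],                          # automatic
        'goodTrains': ['goodEncoder'],                            # semi-automatic
        'goodHypotheses': ['goodEncoder', 'goodTrains'],          # automatic
        'goodTests': ['goodEncoder'],                             # semi-automatic
        'prediction': ['goodTests', 'goodEncoder', 'goodHypotheses'],  # automatic
    },
    'continuous': {
        'goodVerges': [],                                         # manual
        'goodTrains': ['goodVerges'],                             # manual
        'goodComplex': ['goodTrains'],                            # automatic
        'goodHypotheses': ['goodComplex', 'goodTrains'],          # automatic
        'goodTests': ['goodVerges'],                              # manual
        'prediction': ['goodTests', 'goodComplex', 'goodHypotheses'],  # automatic
    },
}

def ready(case, step, status):
    """Check whether all prerequisites of a step are satisfied.
    `status` is a dict of already-completed flags, e.g. {'goodLattices': True}."""
    return all(status.get(flag) for flag in PREREQUISITES[case][step])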

The machine learning library itself is named vkf.cpython-36m-x86_64-linux-gnu.so under Linux or vkf.cp36-win32.pyd under Windows (3.6 is the Python version for which the library was built).

The term "automatic" refers to the work of this library, "semi-automatic" to the work of the auxiliary library vkfencoder.cpython-36m-x86_64-linux-gnu.so. Finally, the "manual" mode means calling programs that specially process the data of a particular experiment; they are now being moved into the vkfencoder library.

3. Implementation Details


When creating the web server, we use the "Model/View/Controller" approach.

Python code is located in 5 files:

  1. app.py - application launch file
  2. control.py - file with procedures for working with the VKF solver
  3. models.py - file with classes for processing data and working with the database
  4. settings.py - application settings file
  5. views.py - file with view classes and route handling.

The app.py file has the standard form:

#!/usr/bin/env python
import asyncio
import jinja2
import aiohttp_jinja2

from settings import SITE_HOST as siteHost
from settings import SITE_PORT as sitePort

from aiohttp import web
from aiojobs.aiohttp import setup
from views import routes


async def init(loop):
    app = web.Application(loop=loop)
    # install aiojobs.aiohttp
    setup(app)
    # install jinja2 templates
    aiohttp_jinja2.setup(
        app, loader=jinja2.FileSystemLoader('./template'))
    # add routes from api/views.py
    app.router.add_routes(routes)
    return app


loop = asyncio.get_event_loop()
try:
    app = loop.run_until_complete(init(loop))
    web.run_app(app, host=siteHost, port=sitePort)
except:
    loop.stop()

I do not think anything here needs explanation. The next file in the project is views.py:

import aiohttp_jinja2
from aiohttp import web #, WSMsgType
from aiojobs.aiohttp import spawn #, get_scheduler

from models import User
from models import Expert
from models import Experiment
from models import Solver
from models import Predictor

routes = web.RouteTableDef()


@routes.view(r'/tests/{name}', name='test-name')
class Predict(web.View):
    @aiohttp_jinja2.template('tests.html')
    async def get(self):
        return {'explanation': 'Please, confirm prediction!'}

    async def post(self):
        data = await self.request.post()
        db_name = self.request.match_info['name']
        analogy = Predictor(db_name, data)
        await analogy.load_data()
        job = await spawn(self.request, analogy.make_prediction())
        return await job.wait()


@routes.view(r'/vkf/{name}', name='vkf-name')
class Generate(web.View):
    #@aiohttp_jinja2.template('vkf.html')
    async def get(self):
        db_name = self.request.match_info['name']
        solver = Solver(db_name)
        await solver.load_data()
        context = {
            'dbname': str(solver.dbname),
            'encoder': str(solver.encoder),
            'lattices': str(solver.lattices),
            'good_lattices': bool(solver.good_lattices),
            'verges': str(solver.verges),
            'good_verges': bool(solver.good_verges),
            'complex': str(solver.complex),
            'good_complex': bool(solver.good_complex),
            'trains': str(solver.trains),
            'good_trains': bool(solver.good_trains),
            'hypotheses': str(solver.hypotheses),
            'type': str(solver.type)
        }
        response = aiohttp_jinja2.render_template('vkf.html', self.request, context)
        return response

    async def post(self):
        data = await self.request.post()
        step = data.get('value')
        db_name = self.request.match_info['name']
        if step == 'init':
            location = self.request.app.router['experiment-name'].url_for(
                name=db_name)
            raise web.HTTPFound(location=location)
        solver = Solver(db_name)
        await solver.load_data()
        if step == 'populate':
            job = await spawn(self.request, solver.create_tables())
            return await job.wait()
        if step == 'compute':
            job = await spawn(self.request, solver.compute_tables())
            return await job.wait()
        if step == 'generate':
            hypotheses_total = data.get('hypotheses_total')
            threads_total = data.get('threads_total')
            job = await spawn(self.request, solver.make_induction(
                hypotheses_total, threads_total))
            return await job.wait()


@routes.view(r'/experiment/{name}', name='experiment-name')
class Prepare(web.View):
    @aiohttp_jinja2.template('expert.html')
    async def get(self):
        return {'explanation': 'Please, enter your data'}

    async def post(self):
        data = await self.request.post()
        db_name = self.request.match_info['name']
        experiment = Experiment(db_name, data)
        job = await spawn(self.request, experiment.create_experiment())
        return await job.wait()

For this article I shortened this file by leaving out the classes that serve utility routes:

  1. the Auth class is bound to the root route '/' and displays a user authentication form (a hedged sketch of this class is given right after this list). If the user is not yet registered, there is a SignIn button that redirects to the '/signin' route. If a user with the entered login and password is found, he is redirected to the route '/user/{name}'.
  2. the SignIn class handles the '/signin' route and, after successful registration, returns the user to the root route.
  3. the Select class handles the '/user/{name}' routes and asks which experiment, and at what stage, the user wants to run. After checking that such an experiment database exists, the user is redirected either to the route '/vkf/{name}' (if the experiment has already been registered) or to '/experiment/{name}'.
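For completeness, here is a minimal hedged sketch of what the Auth view might look like, in the same style as views.py above. The template name 'auth.html', the route name 'select-name' for '/user/{name}', and the User.check() coroutine are assumptions made for the illustration, not the author's actual code:

# A hedged sketch of the omitted Auth view (assumed names: 'auth.html', 'select-name', User.check()).
@routes.view('/', name='auth')
class Auth(web.View):
    @aiohttp_jinja2.template('auth.html')   # assumed template name
    async def get(self):
        # display the authentication form with a SignIn button for new users
        return {'explanation': 'Please, sign in or register'}

    async def post(self):
        data = await self.request.post()
        user = User(data)                    # User comes from models.py
        if await user.check():               # hypothetical credential check
            location = self.request.app.router['select-name'].url_for(
                name=data.get('login'))
        else:
            location = '/signin'
        raise web.HTTPFound(location=location)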

The remaining classes process the routes that are responsible for the stages of machine learning:

  1. the Prepare class handles the '/experiment/{name}' routes and collects the names of the service tables and the numerical parameters needed to start the VKF method procedures. After saving this information in the database, the user is redirected to the route '/vkf/{name}'.
  2. the Generate class processes the routes '/vkf/{name}' and starts the various stages of the procedure for inducing the VKF method, depending on the preparedness of the data by the expert.
  3. the Predict class processes the routes '/tests/{name}' and starts the VKF prediction method by analogy.

To pass a large number of parameters to the vkf.html template, the render_template construct from aiohttp_jinja2 is used:

response = aiohttp_jinja2.render_template('vkf.html', self.request, context)
return response


Also note the use of the spawn call from the aiojobs.aiohttp package:

job = await spawn(self.request, solver.make_induction(hypotheses_total, threads_total))
return await job.wait()

This is necessary to safely call coroutines of the classes defined in the models.py file, which process user data and experiments stored in a database under the MariaDB DBMS:

import asyncio
import aiomysql
from aiohttp import web

from settings import AUX_NAME as auxName
from settings import AUTH_TABLE as authTable
from settings import AUX_TABLE as auxTable
from settings import SECRET_KEY as secretKey
from settings import DB_HOST as dbHost

from control import createAuxTables
from control import createMainTables
from control import computeAuxTables
from control import induction
from control import prediction


class Experiment():
    def __init__(self, dbName, data, **kw):
        self.encoder = data.get('encoder_table')
        self.lattices = data.get('lattices_table')
        self.complex = data.get('complex_table')
        self.verges = data.get('verges_table')
        self.verges_total = data.get('verges_total')
        self.trains = data.get('training_table')
        self.tests = data.get('tests_table')
        self.hypotheses = data.get('hypotheses_table')
        self.type = data.get('type')
        self.auxname = auxName
        self.auxtable = auxTable
        self.dbhost = dbHost
        self.secret = secretKey
        self.dbname = dbName

    async def create_db(self, pool):
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                await cur.execute("CREATE DATABASE IF NOT EXISTS " + str(self.dbname))
                await conn.commit()
        await createAuxTables(self)

    async def register_experiment(self, pool):
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                sql = "INSERT INTO " + str(self.auxname) + "." + str(self.auxtable)
                sql += " VALUES(NULL, '"
                sql += str(self.dbname)
                sql += "', '"
                sql += str(self.encoder)
                sql += "', 0, '"  # goodEncoder
                sql += str(self.lattices)
                sql += "', 0, '"  # goodLattices
                sql += str(self.complex)
                sql += "', 0, '"  # goodComplex
                sql += str(self.verges)
                sql += "', 0, "  # goodVerges
                sql += str(self.verges_total)
                sql += ", '"
                sql += str(self.trains)
                sql += "', 0, '"  # goodTrains
                sql += str(self.tests)
                sql += "', 0, '"  # goodTests
                sql += str(self.hypotheses)
                sql += "', 0, '"  # goodHypotheses
                sql += str(self.type)
                sql += "')"
                await cur.execute(sql)
                await conn.commit()

    async def create_experiment(self, **kw):
        pool = await aiomysql.create_pool(host=self.dbhost, user='root', password=self.secret)
        task1 = self.create_db(pool=pool)
        task2 = self.register_experiment(pool=pool)
        tasks = [asyncio.ensure_future(task1), asyncio.ensure_future(task2)]
        await asyncio.gather(*tasks)
        pool.close()
        await pool.wait_closed()
        raise web.HTTPFound(location='/vkf/' + self.dbname)


class Solver():
    def __init__(self, dbName, **kw):
        self.auxname = auxName
        self.auxtable = auxTable
        self.dbhost = dbHost
        self.dbname = dbName
        self.secret = secretKey

    async def load_data(self, **kw):
        pool = await aiomysql.create_pool(host=dbHost, user='root', password=secretKey, db=auxName)
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                sql = "SELECT * FROM "
                sql += str(auxTable)
                sql += " WHERE expName='"
                sql += str(self.dbname)
                sql += "'"
                await cur.execute(sql)
                row = cur.fetchone()
                await cur.close()
        pool.close()
        await pool.wait_closed()
        self.encoder = str(row.result()[2])
        self.good_encoder = bool(row.result()[3])
        self.lattices = str(row.result()[4])
        self.good_lattices = bool(row.result()[5])
        self.complex = str(row.result()[6])
        self.good_complex = bool(row.result()[7])
        self.verges = str(row.result()[8])
        self.good_verges = bool(row.result()[9])
        self.verges_total = int(row.result()[10])
        self.trains = str(row.result()[11])
        self.good_trains = bool(row.result()[12])
        self.hypotheses = str(row.result()[15])
        self.good_hypotheses = bool(row.result()[16])
        self.type = str(row.result()[17])

    async def create_tables(self, **kw):
        await createMainTables(self)
        pool = await aiomysql.create_pool(host=self.dbhost, user='root', password=self.secret, db=self.auxname)
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                sql = "UPDATE "
                sql += str(self.auxtable)
                sql += " SET encoderStatus=1 WHERE dbname='"
                sql += str(self.dbname)
                sql += "'"
                await cur.execute(sql)
                await conn.commit()
                await cur.close()
        pool.close()
        await pool.wait_closed()
        raise web.HTTPFound(location='/vkf/' + self.dbname)

    async def compute_tables(self, **kw):
        await computeAuxTables(self)
        pool = await aiomysql.create_pool(host=self.dbhost, user='root', password=self.secret, db=self.auxname)
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                sql = "UPDATE "
                sql += str(self.auxtable)
                sql += " SET complexStatus=1 WHERE dbname='"
                sql += str(self.dbname)
                sql += "'"
                await cur.execute(sql)
                await conn.commit()
                await cur.close()
        pool.close()
        await pool.wait_closed()
        raise web.HTTPFound(location='/vkf/' + self.dbname)

    async def make_induction(self, hypotheses_total, threads_total, **kw):
        await induction(self, hypotheses_total, threads_total)
        pool = await aiomysql.create_pool(host=self.dbhost, user='root', password=self.secret, db=self.auxname)
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                sql = "UPDATE "
                sql += str(self.auxtable)
                sql += " SET hypothesesStatus=1 WHERE dbname='"
                sql += str(self.dbname)
                sql += "'"
                await cur.execute(sql)
                await conn.commit()
                await cur.close()
        pool.close()
        await pool.wait_closed()
        raise web.HTTPFound(location='/tests/' + self.dbname)


class Predictor():
    def __init__(self, dbName, data, **kw):
        self.auxname = auxName
        self.auxtable = auxTable
        self.dbhost = dbHost
        self.dbname = dbName
        self.secret = secretKey
        self.plus = 0
        self.minus = 0

    async def load_data(self, **kw):
        pool = await aiomysql.create_pool(host=dbHost, user='root', password=secretKey, db=auxName)
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                sql = "SELECT * FROM "
                sql += str(auxTable)
                sql += " WHERE dbname='"
                sql += str(self.dbname)
                sql += "'"
                await cur.execute(sql)
                row = cur.fetchone()
                await cur.close()
        pool.close()
        await pool.wait_closed()
        self.encoder = str(row.result()[2])
        self.good_encoder = bool(row.result()[3])
        self.complex = str(row.result()[6])
        self.good_complex = bool(row.result()[7])
        self.verges = str(row.result()[8])
        self.trains = str(row.result()[11])
        self.tests = str(row.result()[13])
        self.good_tests = bool(row.result()[14])
        self.hypotheses = str(row.result()[15])
        self.good_hypotheses = bool(row.result()[16])
        self.type = str(row.result()[17])

    async def make_prediction(self, **kw):
        if self.good_tests and self.good_hypotheses:
            await induction(self, 0, 1)
            await prediction(self)
            message_body = str(self.plus)
            message_body += " correct positive cases. "
            message_body += str(self.minus)
            message_body += " correct negative cases."
            raise web.HTTPException(body=message_body)
        else:
            raise web.HTTPFound(location='/vkf/' + self.dbname)


Again, some helper classes are hidden:

  1. The User class corresponds to the site visitor. It allows you to register and log in as an expert.
  2. The Expert class allows you to choose one of the experiments.

The remaining classes correspond to the main procedures:

  1. The Experiment class allows you to specify the names of key and auxiliary tables and the parameters necessary for conducting VKF experiments.
  2. The Solver class is responsible for the inductive generalization in the VKF method.
  3. The Predictor class is responsible for predictions by analogy in the VKF method.

It is important to note the use of the create_pool() construct of the aiomysql package. It allows working with the database through several connections. To wait for the execution to complete, the ensure_future() and gather() procedures from the asyncio module are also needed.

pool = await aiomysql.create_pool(host=self.dbhost, user='root', password=self.secret)
task1 = self.create_db(pool=pool)
task2 = self.register_experiment(pool=pool)
tasks = [asyncio.ensure_future(task1), asyncio.ensure_future(task2)]
await asyncio.gather(*tasks)
pool.close()
await pool.wait_closed()

When reading from a table, the row = cur.fetchone() construct returns a future, so row.result() returns a database record from which field values can be extracted (for example, str(row.result()[2]) retrieves the name of the table with the encoding of the discrete feature values).

pool = await aiomysql.create_pool(host=dbHost, user='root', password=secretKey, db=auxName)
async with pool.acquire() as conn:
    async with conn.cursor() as cur:
        await cur.execute(sql)
        row = cur.fetchone()
        await cur.close()
pool.close()
await pool.wait_closed()
self.encoder = str(row.result()[2])

Key system parameters are imported from the .env file or (if it is absent) from the settings.py file.

from os.path import isfile
from envparse import env

if isfile('.env'):
    env.read_envfile('.env')

AUX_NAME = env.str('AUX_NAME', default='vkf')
AUTH_TABLE = env.str('AUTH_TABLE', default='users')
AUX_TABLE = env.str('AUX_TABLE', default='experiments')
DB_HOST = env.str('DB_HOST', default='127.0.0.1')
DB_PORT = env.int('DB_PORT', default=3306)
DEBUG = env.bool('DEBUG', default=False)
SECRET_KEY = env.str('SECRET_KEY', default='toor')
SITE_HOST = env.str('HOST', default='127.0.0.1')
SITE_PORT = env.int('PORT', default=8080)

It is important to note that localhost must be specified as an IP address; otherwise aiomysql will try to connect to the database via a Unix socket, which may not work under Windows. Finally, let us look at the last file (control.py):

import os
import asyncio
import vkf


async def createAuxTables(db_data):
    if db_data.type != "discrete":
        await vkf.CAttributes(db_data.verges, db_data.dbname, '127.0.0.1', 'root', db_data.secret)
    if db_data.type != "continuous":
        await vkf.DAttributes(db_data.encoder, db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        await vkf.Lattices(db_data.lattices, db_data.dbname, '127.0.0.1', 'root', db_data.secret)


async def createMainTables(db_data):
    if db_data.type == "continuous":
        await vkf.CData(db_data.trains, db_data.verges, db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        await vkf.CData(db_data.tests, db_data.verges, db_data.dbname, '127.0.0.1', 'root', db_data.secret)
    if db_data.type == "discrete":
        await vkf.FCA(db_data.lattices, db_data.encoder, db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        await vkf.DData(db_data.trains, db_data.encoder, db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        await vkf.DData(db_data.tests, db_data.encoder, db_data.dbname, '127.0.0.1', 'root', db_data.secret)
    if db_data.type == "full":
        await vkf.FCA(db_data.lattices, db_data.encoder, db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        await vkf.FData(db_data.trains, db_data.encoder, db_data.verges, db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        await vkf.FData(db_data.tests, db_data.encoder, db_data.verges, db_data.dbname, '127.0.0.1', 'root', db_data.secret)


async def computeAuxTables(db_data):
    if db_data.type != "discrete":
        async with vkf.Join(db_data.trains, db_data.dbname, '127.0.0.1', 'root', db_data.secret) as join:
            await join.compute_save(db_data.complex, db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        await vkf.Generator(db_data.complex, db_data.trains, db_data.verges, db_data.dbname, db_data.dbname, db_data.verges_total, 1, '127.0.0.1', 'root', db_data.secret)


async def induction(db_data, hypothesesNumber, threadsNumber):
    if db_data.type != "discrete":
        qualifier = await vkf.Qualifier(db_data.verges, db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        beget = await vkf.Beget(db_data.complex, db_data.dbname, '127.0.0.1', 'root', db_data.secret)
    if db_data.type != "continuous":
        encoder = await vkf.Encoder(db_data.encoder, db_data.dbname, '127.0.0.1', 'root', db_data.secret)
    async with vkf.Induction() as induction:
        if db_data.type == "continuous":
            await induction.load_continuous_hypotheses(qualifier, beget, db_data.trains, db_data.hypotheses, db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        if db_data.type == "discrete":
            await induction.load_discrete_hypotheses(encoder, db_data.trains, db_data.hypotheses, db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        if db_data.type == "full":
            await induction.load_full_hypotheses(encoder, qualifier, beget, db_data.trains, db_data.hypotheses, db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        if hypothesesNumber > 0:
            await induction.add_hypotheses(hypothesesNumber, threadsNumber)
        if db_data.type == "continuous":
            await induction.save_continuous_hypotheses(qualifier, db_data.hypotheses, db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        if db_data.type == "discrete":
            await induction.save_discrete_hypotheses(encoder, db_data.hypotheses, db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        if db_data.type == "full":
            await induction.save_full_hypotheses(encoder, qualifier, db_data.hypotheses, db_data.dbname, '127.0.0.1', 'root', db_data.secret)


async def prediction(db_data):
    if db_data.type != "discrete":
        qualifier = await vkf.Qualifier(db_data.verges, db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        beget = await vkf.Beget(db_data.complex, db_data.dbname, '127.0.0.1', 'root', db_data.secret)
    if db_data.type != "continuous":
        encoder = await vkf.Encoder(db_data.encoder, db_data.dbname, '127.0.0.1', 'root', db_data.secret)
    async with vkf.Induction() as induction:
        if db_data.type == "continuous":
            await induction.load_continuous_hypotheses(qualifier, beget, db_data.trains, db_data.hypotheses, db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        if db_data.type == "discrete":
            await induction.load_discrete_hypotheses(encoder, db_data.trains, db_data.hypotheses, db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        if db_data.type == "full":
            await induction.load_full_hypotheses(encoder, qualifier, beget, db_data.trains, db_data.hypotheses, db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        if db_data.type == "continuous":
            async with vkf.TestSample(qualifier, induction, beget, db_data.tests, db_data.dbname, '127.0.0.1', 'root', db_data.secret) as tests:
                #plus=await tests.correct_positive_cases()
                db_data.plus = await tests.correct_positive_cases()
                #minus=await tests.correct_negative_cases()
                db_data.minus = await tests.correct_negative_cases()
        if db_data.type == "discrete":
            async with vkf.TestSample(encoder, induction, db_data.tests, db_data.dbname, '127.0.0.1', 'root', db_data.secret) as tests:
                #plus=await tests.correct_positive_cases()
                db_data.plus = await tests.correct_positive_cases()
                #minus=await tests.correct_negative_cases()
                db_data.minus = await tests.correct_negative_cases()
        if db_data.type == "full":
            async with vkf.TestSample(encoder, qualifier, induction, beget, db_data.tests, db_data.dbname, '127.0.0.1', 'root', db_data.secret) as tests:
                #plus=await tests.correct_positive_cases()
                db_data.plus = await tests.correct_positive_cases()
                #minus=await tests.correct_negative_cases()
                db_data.minus = await tests.correct_negative_cases()

I have kept this file in full, since it shows the names, calling order and arguments of the VKF method procedures from the vkf.cpython-36m-x86_64-linux-gnu.so library. All arguments after dbname can be omitted, since defaults for them are set in the CPython library.

4. Comments


Anticipating the question from professional programmers about why the logic of controlling the VKF experiment is exposed through numerous if statements rather than hidden behind polymorphism, the answer is this: unfortunately, the dynamic typing of Python does not allow shifting the decision about the type of the object used onto the system, so in any case this sequence of nested ifs would occur somewhere. Therefore, the author chose an explicit (C-like) syntax to make the logic as transparent (and efficient) as possible.

Let me comment on the missing components:

  1. Loading data into the database for discrete features is currently done using the auxiliary library vkfencoder.cpython-36m-x86_64-linux-gnu.so (students are building a web interface for it, while the author calls the corresponding methods directly, since he works on the local host). For continuous features, work is underway to add the corresponding methods to vkfencoder.cpython-36m-x86_64-linux-gnu.so.
  2. Viewing the hypotheses is currently done with third-party MariaDB clients (the author uses DBeaver 7.1.1 Community, but there are many analogues); a small aiomysql sketch for the same purpose is given after this list. Students are developing a prototype system on the Django framework, where ORM technology will allow hypotheses to be presented in a form convenient for experts.
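As a stopgap, the hypotheses of a finished experiment can also be dumped with a few lines of aiomysql, in the style of models.py. The sketch below is only an illustration: the database and table names in the usage comment are hypothetical, and since the row structure of the hypotheses table is not described in this article, the query simply prints whole rows.

# A hedged sketch: dump the contents of the hypotheses table of one experiment.
import asyncio
import aiomysql

async def dump_hypotheses(db_name, table_name, host='127.0.0.1', user='root', password='toor'):
    # connect to the experiment database and print every hypothesis row as-is
    conn = await aiomysql.connect(host=host, user=user, password=password, db=db_name)
    async with conn.cursor() as cur:
        await cur.execute("SELECT * FROM " + table_name)
        for row in await cur.fetchall():
            print(row)
    conn.close()

# asyncio.get_event_loop().run_until_complete(dump_hypotheses('mushrooms', 'hypotheses'))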

5. About the author and the history of the method


The author has been engaged in data mining for more than 30 years. After graduating from the Faculty of Mechanics and Mathematics of Lomonosov Moscow State University, he was invited into a group of researchers led by prof. V.K. Finn, Doctor of Technical Sciences (VINITI, USSR Academy of Sciences). Since the early 1980s, Viktor Konstantinovich has been studying plausible reasoning and its formalization by means of multi-valued logics.

The following can be considered the key ideas proposed by V.K. Finn:

  1. using the binary similarity operation (initially, the intersection operation in Boolean algebra);
  2. the idea of ​​discarding the generated similarity of a group of training examples if it is embedded in the description of an example of the opposite sign (counter-example);
  3. the idea of ​​predicting the investigated (target) property of new examples by taking into account the pros and cons;
  4. the idea of checking the completeness of the set of hypotheses by finding causes (among the generated similarities) for the presence/absence of the target property in the training examples.

It should be noted that V.K. Finn attributes some of his ideas to foreign authors. Perhaps only the logic of argumentation can rightfully be considered his own independent invention. The idea of taking counterexamples into account V.K. Finn borrowed, by his own account, from K.R. Popper. The origins of checking the completeness of inductive generalization he traces (rather vaguely, in my opinion) to the works of the American mathematician and logician Charles S. Peirce. The generation of hypotheses about causes by means of the similarity operation he considers borrowed from the ideas of the British economist, philosopher and logician John Stuart Mill. That is why he named the set of ideas he created the "DSM method", in honor of J.S. Mill.

Strangely, Formal Concept Analysis (FCA), a much more useful branch of the algebraic theory of lattices that arose in the late 1970s in the works of prof. Rudolf Wille (Germany), is not held in high regard by V.K. Finn. In my opinion, the reason for this is the unfortunate name, which, for a person who first graduated from the Faculty of Philosophy and then from the engineering stream of the Faculty of Mechanics and Mathematics of Moscow State University, causes rejection.

As a follower of his teacher's work, the author named his approach the "VKF method" in his honor. However, there is another reading of the acronym: the probabilistic-combinatorial formal ("VKF" in Russian) method of machine learning based on the theory of lattices.

Today V.K. Finn's group works at the Dorodnicyn Computing Centre of the Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences and at the Department of Intelligent Systems of the Russian State University for the Humanities (RSUH).

More information on the mathematics of the VKF solver can be found in the dissertation of the author or his video lectures at Ulyanovsk State University (for the organization of lectures and the processing of their notes, the author is grateful to A. B. Verevkin and N.G. Baranets).

The complete package of source files is stored on Bitbucket.

The source files (in C++) of the vkf library are in the process of being approved for placement on savannah.nongnu.org. If approved, a download link will be added here.

Finally, one final note: the author began learning Python on April 6, 2020. Before that, the only language he had programmed in was C++. But this circumstance does not absolve him of possible inaccuracies in the code.

The author is sincerely grateful to Tatyana A. Volkova robofreak for her support, constructive suggestions and critical comments, which made it possible to substantially improve the presentation (and even significantly improve the code). However, responsibility for the remaining errors and for decisions made (even against her advice) rests solely with the author.
