Machine Learning and Data Analysis

Vadim V. Strijov
Doctor of physico-mathematical sciences at the FRCCSC of the Russian Academy of Sciences,
professor at the Moscow Institute of Physics and Technology,
editor in chief of the Journal of Machine Learning and Data Analysis.

Web: strijov.com · ORCID · Math-Net · Google Scholar
Phone: +7 (499) 135-4163

List of publications

Vadim Strijov

2019

Anikeev D.A., Penkin G.O., Strijov V.V. Local approximation models for human physical activity classification // Informatics and Applications, 2019. Article Rus
Abstract: The problem of classification of time series of an accelerometer of
a mobile phone is investigated. The physical activity class corresponds
to a time series segment. Segment is associated with its feature
description. It is generated by an approximating spline. The elements
of the feature vector are the coefficients of the basic spline functions.
The computational experiment finds the optimal approximation parameters
and parameters of the classification model according to the maximum
likelihood of the logistic classification model.
BibTeX:
 
@article{AnikeyevPenkin2017Splines, 
  author = {Anikeev, D. A. and Penkin, G. O. and Strijov, V. V.},
  title = {Local approximation models for human physical activity classification},
  journal = {Informatics and Applications},
  year = {2019},
  url = {http://strijov.com/papers/AnikeyevPenkin2017Splines.pdf}
}
Bakhteev O.Y., Strijov V.V. Comprehensive analysis of gradient-based hyperparameter optimization algorithms // New Generation Computing, 2019. Article Submitted
Abstract: The paper investigates the hyperparameter optimization problem. Hyperparameters are the parameters of the model parameter distribution. An adequate choice of hyperparameter values prevents model overfitting and yields higher predictive performance. Neural network models with a large number of hyperparameters are analyzed. Hyperparameter optimization for such models is computationally expensive. The paper proposes modifications of various gradient-based methods to optimize many hyperparameters simultaneously and compares the experimental results with random search. The main contribution of the paper is an analysis of hyperparameter optimization algorithms for models with a large number of parameters. To select precise and stable models, the authors suggest using two model selection criteria: cross-validation and the evidence lower bound. The algorithms are evaluated on regression and classification datasets.
BibTeX:
 
@article{Bakhteev2018, 
  author = {Bakhteev, O. Y. and Strijov, V. V.},
  title = {Comprehensive analysis of gradient-based hyperparameter optimization algorithms},
  journal = {New Generation Computing},
  year = {2019}
}
Goncharov A.V., Strijov V.V. Alignment of ordered set cartesian product // Informatics and Applications, 2019, 13(1). Article Rus
Abstract: The work is devoted to the study of metric methods for analyzing objects with complex structure. It proposes to generalize the dynamic time warping (DTW) method for two time series to the case of objects defined on two or more time axes. In the discrete representation, such objects are matrices whose rows and columns correspond to the time axes. The DTW method for time series is generalized as a method of dynamic alignment of matrices. The paper proposes a distance function that is resistant to monotonic nonlinear deformations of the Cartesian product of two time scales. The alignment path between objects is defined. The properties of the proposed distance function are investigated. To illustrate the method, metric classification problems are solved on model data and on data from the MNIST dataset.
BibTeX:
 
@article{Goncharov2019mDTW, 
  author = {Goncharov, A. V. and Strijov, V. V.},
  title = {Alignment of ordered set cartesian product},
  journal = {Informatics and Applications},
  year = {2019},
  volume = {13(1)},
  url = {http://strijov.com/papers/Goncharov2019mDTW.pdf}
}
Grabovoy A.V., Bakhteev O.Y., Strijov V.V. Estimation of the relevance of the neural network parameters // Informatics and Applications, 2019. Article Rus
Abstract: This paper investigates a method for optimizing the structure of a neural network. It assumes that the number of neural network parameters can be reduced without significant loss of quality and without significant increase in the variance of the loss function. The paper proposes a method for automatic estimation of the relevance of parameters to prune a neural network. This method analyzes the covariance matrix of the posterior distribution of the model parameters and removes the least relevant and multicorrelated parameters. It uses the Belsley method to search for multicorrelation in the neural network. The proposed method was tested on the Boston Housing data set, the Wine data set, and synthetic data.
BibTeX:
 
@article{Grabovoy2018OptimalBrainDamage, 
  author = {Grabovoy, A. V. and Bakhteev, O. Yu. and Strijov, V. V.},
  title = {Estimation of the relevance of the neural network parameters},
  journal = {Informatics and Applications},
  year = {2019},
  url = {http://strijov.com/papers/Grabovoy2018OptimalBrainDamage.pdf}
}
Kuzmin A.A., Aduenko A.A., Strijov V.V. Hierarchical thematic classification of major conference proceedings // CICLing, 2019. Article Submitted
Abstract: In this paper we develop a decision support system for hierarchical text classification. We consider text collections with a fixed hierarchical structure of topics given by experts in the form of a tree. The system sorts the topics by relevance to a given document. The experts choose one of the most relevant topics to finish the classification. We propose a weighted hierarchical similarity function to calculate topic relevance. The function calculates the similarity of a document and a tree branch. The weights in this function determine word importance. We use the entropy of words to estimate the weights. The proposed hierarchical similarity function formulates a joint hierarchical thematic classification probability model of the document topics, parameters, and hyperparameters. Variational Bayesian inference gives a closed-form EM algorithm. The EM algorithm estimates the parameters and calculates the probability of a topic for a given document. Compared to hierarchical multiclass SVM, hierarchical PLSA with adaptive regularization, and hierarchical naive Bayes, the weighted hierarchical similarity function achieves higher ranking accuracy on a collection of abstracts of the major conference EURO and on a collection of web sites of industrial companies.
BibTeX:
 
@article{Kuzmin2018Similarity, 
  author = {Kuzmin, A. A. and Aduenko, A. A. and Strijov, V. V.},
  title = {Hierarchical thematic classification of major conference proceedings},
  journal = {CICLing},
  year = {2019}
}
Usmanova K.R., Strijov V.V. Time series dependencies detection to construct forecasting models // Systems and Means of Informatics, 2019, 29(2). Article Rus
Abstract: The forecasting problem requires detecting relationships between multiple time series. Engaging related time series in a forecast model improves the forecast quality. This paper introduces the convergent cross mapping (CCM) method to establish a relationship between time series. This method estimates the accuracy of reconstructing one time series from the other. CCM detects relationships between series not only in full trajectory spaces but also in trajectory subspaces. The computational experiment is carried out on two sets of time series: electricity consumption and air temperature, and oil transportation volume and oil production volume.
BibTeX:
 
@article{Usmanova2018CCM, 
  author = {Usmanova, K. R. and Strijov, V. V.},
  title = {Time series dependencies detection to construct forecasting models},
  journal = {Systems and Means of Informatics},
  year = {2019},
  volume = {29(2)},
  url = {http://strijov.com/papers/Usmanova2018CCM.pdf}
}

2018

Aduenko A.A., Motrenko A.P., Strijov V.V. Object selection in credit scoring using covariance matrix of parameters estimations // Annals of Operations Research, 2018, 260(1-2) : 3-21. Article
Abstract: We address the problem of outlier detection for more reliable credit
scoring. Scoring models are used to estimate the probability of loan
default based on the customer's application. To get an unbiased estimation
of the model parameters one must select a set of informative objects
(customers). We propose an object selection algorithm based on analysis
of the covariance matrix for the estimated parameters of the model.
To detect outliers we introduce a new quality function called specificity
measure. For the common practical case of an ill-conditioned covariance
matrix we suggest an empirical approximation of specificity. We illustrate
the algorithm with eight benchmark datasets from the UCI machine
learning repository and several artificial datasets. Computational
experiments show statistical significance of the classification quality
improvement for all considered datasets. The method is compared with
four other widely used methods of outlier detection: deviance, Pearson
and Bayesian residuals and gamma plots. The suggested method performs
generally better for both clustered and non-clustered outliers. The
method shows acceptable outlier discrimination for datasets that
contain up to 30-40% of outliers.
BibTeX:
 
@article{Aduenko-Strijov2014ObjectSelection, 
  author = {Aduenko, A. A. and Motrenko, A. P. and Strijov, V. V.},
  title = {Object selection in credit scoring using covariance matrix of parameters estimations},
  journal = {Annals of Operations Research},
  year = {2018},
  volume = {260(1-2)},
  pages = {3-21},
  url = {http://strijov.com/papers/AduenkoObjectSelection_RV.pdf},
  doi = {10.1007/s10479-017-2417-3}
}
Aduenko A.A., Vasileisky A.S., Karelov A.I., Reyer I.A., Rudakov K.V., Strijov V.V. Detection of persistent scatterer pairs on satellite radar images with use of surface relief data // Journal of Information Technologies and Computing Systems, 2018, 68(2) : 29-43. Article Rus
Abstract: An effective control of geodynamic processes using multiple radar
satellite survey and differential interferometric processing of received
data requires the identification of terrain areas that preserve an
acceptable level of coherence on radar images over a long period.
Analysis of the phase component of the images for such areas, called
persistent scatterers, makes it possible to estimate the values of
small displacements of the observed surface with velocities less
than several centimeters per year. In this paper, two radar differential
interferometry methods based on the identification of persistent
scatterers are considered: the standard method of persistent scatterers
and the proposed modification of the method based on the use of persistent
scatterer pairs. For both methods it is suggested not to perform
a direct phase unwrapping, which is most difficult when most known
methods are used. For the method of persistent scatterer pairs it
is suggested to apply the quadratic penalty not for the phase unwrapping,
but at the final processing stage to recover the absolute values
of displacements and corrections of an a priori elevation model from
the obtained relative values. The application of the algorithms considered
is illustrated by the processing of an interferometric series of
35 radar images obtained by the COSMO-SkyMed system.
BibTeX:
 
@article{Aduedko2018PSP, 
  author = {Aduenko, A. A. and Vasileisky, A. S. and Karelov, A. I. and Reyer, I. A. and Rudakov, K. V. and Strijov, V. V.},
  title = {Detection of persistent scatterer pairs on satellite radar images with use of surface relief data},
  journal = {Journal of Information Technologies and Computing Systems},
  year = {2018},
  volume = {68(2)},
  pages = {29-43},
  url = {http://strijov.com/papers/Aduenko2017SAR.pdf},
  doi = {10.14357/20718632180203}
}
Bakhteev O.Y., Strijov V.V. Deep learning model selection of suboptimal complexity // Automation and Remote Control, 2018, 79(8) : 1474-1488. Article
Abstract: We consider the problem of model selection for deep learning models of suboptimal complexity. The complexity of a model is understood as the minimum description length of the combination of the sample and the classification or regression model. Suboptimal complexity is understood as an approximate estimate of the minimum description length, obtained with Bayesian inference and variational methods. We introduce probabilistic assumptions about the distribution of parameters. Based on Bayesian inference, we propose the likelihood function of the model. To obtain an estimate for the likelihood, we apply variational methods with gradient optimization algorithms. We perform a computational experiment on several samples.
BibTeX:
 
@article{Bakhteev2017Evidence, 
  author = {Bakhteev, O. Y. and Strijov, V. V.},
  title = {Deep learning model selection of suboptimal complexity},
  journal = {Automation and Remote Control},
  year = {2018},
  volume = {79(8)},
  pages = {1474-1488},
  url = {https://link.springer.com/content/pdf/10.1134%2FS000511791808009X.pdf},
  doi = {10.1134/S000511791808009X}
}
Goncharov A.V., Strijov V.V. Analysis of dissimilarity set between time series // Computational Mathematics and Modeling, 2018, 29(3) : 359-366. Article
Abstract: This paper investigates the metric time series classification problem.
Distance functions between time series are constructed using the
dynamic time warping method. This method aligns two time series and
builds a dissimilarity set. The vector-function of distance between
the time series is a set of statistics. It describes the distribution
of the dissimilarity set. The object feature description in the classification
problem is a set of selected statistics values of the dissimilarity
set. It is built between the object and all the reference objects.
The additional information about the dissimilarity distribution improves
the classification quality. We propose a classification method and
demonstrate its result on the classification problem of the human
physical activity time series from the mobile phone accelerometer.
BibTeX:
 
@article{Goncharov2017Analysis, 
  author = {A. V. Goncharov and V. V. Strijov},
  title = {Analysis of dissimilarity set between time series},
  journal = {Computational Mathematics and Modeling},
  year = {2018},
  volume = {29(3)},
  pages = {359-366},
  url = {http://strijov.com/papers/Goncharov2017Analysis.pdf},
  doi = {10.1007/s10598-018-9415-4}
}
Isachenko R.V., Bochkarev .., Zharikov I.N., Strijov V.V. Feature Generation for Physical Activity Classification // Artificial Intelligence and Decision Making, 2018, 3 : 20-27. Article
Abstract: The paper investigates the human physical activity classification
problem. Time series from accelerometer of a wearable device produce
a dataset. Due to high dimension of the object description and low
computational resources one has to state a feature generation problem.
The authors propose to use parameters of the local approximation
models as informative features. The experiment is conducted on two
datasets for human activity recognition using accelerometer: WISDM
and USC-HAD. It compares several superpositions of various generation
and classification models.
BibTeX:
 
@article{Isachenko2018Activity, 
  author = {Isachenko, R. V. and Bochkarev, . . and Zharikov, I. N. and Strijov, V. V.},
  title = {Feature Generation for Physical Activity Classification},
  journal = {Artificial Intelligence and Decision Making},
  year = {2018},
  volume = {3},
  pages = {20-27},
  url = {http://strijov.com/papers/Isachenko2018AccelerometerAIDM.pdf}
}
Isachenko R.V., Strijov V.V. Quadratic Programming Optimization with Feature Selection for Non-linear Models // Lobachevskii Journal of Mathematics, 2018, 39(9) : 1179-1187. Article
Abstract: The Newton method is widely used to optimize model parameters. It is a second-order optimization procedure that is unstable in real applications. In this paper we propose a procedure that makes the optimization process robust. The idea is to select the set of model parameters to be optimized at the current step of the optimization procedure. We show that for nonlinear regression and logistic regression models the parameter selection can be performed by the Quadratic Programming Feature Selection algorithm. It finds a set of independent parameters that are responsible for the residuals. We carry out an experiment to show how the proposed method works and compare it with other methods. The paper proposes a robust second-order optimization algorithm. The algorithm is based on the iterative Newton method, which is an unstable procedure. The authors suggest selecting a set of active parameters at each optimization step. The algorithm updates only the parameters from this active set. Quadratic programming feature selection is used to find the active set. It maximizes the relevance of model parameters to the residuals and minimizes the redundancy. Nonlinear regression and logistic regression models are investigated. The proposed algorithm achieves a lower error than the other methods.
BibTeX:
 
@article{Isachenko2018QPFSNonlin, 
  author = {Isachenko, R. V. and Strijov, V. V.},
  title = {Quadratic Programming Optimization with Feature Selection for Non-linear Models},
  journal = {Lobachevskii Journal of Mathematics},
  year = {2018},
  volume = {39(9)},
  pages = {1179-1187},
  url = {http://strijov.com/papers/Isachenko2018QPLJM.pdf},
  doi = {10.1134/S199508021809010X}
}
Isachenko R.V., Vladimirova M.R., Strijov V.V. Dimensionality reduction for time series decoding and forecasting problems // DEStech Transactions on Computer Science and Engineering, 2018, 27349 : 286-296. Article
Abstract: The paper is devoted to the problem of decoding multiscale time series and forecasting. The goal is to recover the dependence between the input signal and the target response. The proposed method yields predicted values not just for the next timestamp but for the whole range of values in the forecast horizon. The prediction is a multidimensional target vector instead of a single timestamp point. We consider the linear partial least squares (PLS) model. The method finds the matrix of a joint description for the design matrix and the outcome matrix. The obtained latent space of the joint descriptions is low-dimensional. This leads to a simple, stable predictive model. We conducted computational experiments on real data of energy consumption and electrocorticogram (ECoG) signals. The experiments show a significant reduction of the dimensionality of the original spaces, and the models achieve good prediction quality.
BibTeX:
 
@article{Isachenko2018PLS, 
  author = {Isachenko, R. V. and Vladimirova, M. R. and Strijov, V. V.},
  title = {Dimensionality reduction for time series decoding and forecasting problems},
  journal = {DEStech Transactions on Computer Science and Engineering},
  year = {2018},
  volume = {27349},
  pages = {286-296},
  url = {http://strijov.com/papers/IsachenkoVladimirova2018PLS.pdf},
  doi = {10.12783/dtcse/optim2018/27940}
}
Motrenko A.P., Strijov V.V. Multi-way feature selection for ECoG-based brain-computer interface // Expert Systems with Applications, 2018, 114(30) : 402-413. Article
Abstract: The paper addresses the problem of designing Brain-Computer Interfaces.
We solve the problem of feature selection in regression models in
application to ECoG-based motion decoding. The task is to predict
hand trajectories from the voltage time series of cortical activity.
The feature description of each point resides in the spatial-temporal-frequency
domain and includes the voltage time series themselves and their spectral
characteristics. Feature selection is crucial for adequate solution
of this regression problem, since electrocorticographic data is highly
dimensional and the measurements are correlated both in time and
space domains. We propose a multi-way formulation of quadratic programming
feature selection (QPFS), a recent approach to filtering-based feature
selection proposed by Katrutsa and Strijov, Comprehensive study
of feature selection methods to solve multicollinearity problem according
to evaluation criteria. QPFS incorporates both estimates of similarity
between features, and their relevance to the regression problem,
and allows an effective way to leverage them by solving a quadratic
program. Our modification allows this approach to be applied to multi-way
data. We show that this modification improves the prediction quality
of resultant models.
BibTeX:
 
@article{Motrenko2018ECoG, 
  author = {Motrenko, A. P. and Strijov, V. V.},
  title = {Multi-way feature selection for ECoG-based brain-computer interface},
  journal = {Expert Systems with Applications},
  year = {2018},
  volume = {114(30)},
  pages = {402-413},
  url = {http://strijov.com/papers/MotrenkoStrijov2017ECoG_HL_2.pdf},
  doi = {10.1016/j.eswa.2018.06.054}
}
Smerdov A.N., Bakhteev O.Y., Strijov V.V. Optimal recurrent neural network selection for paraphrase detection // Informatics and Applications, 2018, 12(4) : 63-69. Article Rus
Abstract: The paper investigates the problem of optimal recurrent neural network
selection. The lower bound of the model evidence is the selection
criterion. The study is concentrated on variational approach to approximate
the posterior distribution of the model parameters. The normal distribution
of parameters is approximated with various types of the covariance
matrix. To boost the model evidence, the authors propose a method
for removing parameters with the highest probability density at zero.
As an illustrative example, the problem of multi-class classification
on a sample of pairs of similar and dissimilar SemEval 2015 offers
is considered.
BibTeX:
 
@article{Smerdov2017Paraphrase, 
  author = {Smerdov, A. N. and Bakhteev, O. Y. and Strijov, V. V.},
  title = {Optimal recurrent neural network selection for paraphrase detection},
  journal = {Informatics and Applications},
  year = {2018},
  volume = {12},
  number = {4},
  pages = {63-69},
  url = {http://strijov.com/papers/SmerdovBakhteev2017Paraphrase.pdf},
  doi = {10.14357/19922264180409}
}
Usmanova K.R., Kudiyarov S.P., Martyshkin R.V., Zamkovoy A.A., Strijov V.V. Analysis of relationships between indicators in forecasting cargo transportation // Systems and Means of Informatics, 2018, 28(3) : 6-103. Article Rus
Abstract: In this paper, we analyze the relationship and conformity between indicators in the system of control, state monitoring, and accounting of railway cargo transportation. Macroeconomic time series that contain control actions, system state, and target criteria are considered. We suppose that control actions, state, and goal-setting are statistically related. The Granger causality test is used to establish a relationship between time series. A pair of time series is assumed to be related if using the history of one series improves the quality of the forecast of the other. The main goal of this analysis is to improve the quality of the cargo transportation forecast. The computational experiment is carried out on data about cargo transportation, control actions, and set target criteria.
BibTeX:
 
@article{Usmanova2018TimeSeriesCorrelation, 
  author = {K. R. Usmanova and S. P. Kudiyarov and R. V. Martyshkin and A. A. Zamkovoy and V. V. Strijov},
  title = {Analysis of relationships between indicators in forecasting cargo transportation},
  journal = {Systems and Means of Informatics},
  year = {2018},
  volume = {28},
  number = {3},
  pages = {6-103},
  url = {http://strijov.com/papers/Usmanova2018TimeSeriesCorrelation.pdf},
  doi = {10.14357/08696527180307}
}
Uvarov N.D., Malkova A.S., Kuznetsov M.P., Rudakov K.V., Strijov V.V. Selection of superposition of models for railway freight forecasting // Moscow University Computational Mathematics and Cybernetics, 2018, 42(4) : 186-193. Article
Abstract: Our aim is to construct an optimal superposition of models for the short-term railway traffic forecasting. The historical data constitutes daily railway traffic volume between pairs of stations for different cargo types. The given time series are highly volatile, noisy, and non-stationary. We propose a system that finds an optimal superposition of forecasting models with respect to historical data features. Among the candidate models the system considers: moving average model, exponential and kernel smoothing models, ARIMA model, Croston's method and LSTM neural networks.
BibTeX:
 
@article{Uvarov2018Superpositions, 
  author = {N. D. Uvarov and A. S. Malkova and M. P. Kuznetsov and K. V. Rudakov and V. V. Strijov},
  title = {Selection of superposition of models for railway freight forecasting},
  journal = {Moscow University Computational Mathematics and Cybernetics},
  year = {2018},
  volume = {42},
  number = {4},
  pages = {186-193},
  url = {http://strijov.com/papers/Uvarov2018SuperpositionForecasting_eng.pdf},
  doi = {10.3103/S027864191804009X}
}
Zamkovoy .., Kudiyarov S.P., Martyshkin R.V., Strijov V.V. Harmonization of historical data and expert models for forecasting demand for rail transportation // Vestnik Universiteta SUM, 2018, 4 : 51-60. Article Rus
Abstract: The article attempts to solve the problem of forecasting rail freight traffic volume using retrospective data, analysis of the impact of external factors on the cargo base, and the distribution of goods shipments by transport mode. To improve the forecast fidelity, the article proposes a model integrating historical data of rail freight traffic volume and expert assessments of external factors affecting the work of rail transport. The article describes the structure of historical data, time series of freight traffic volumes, as well as their relationship with expert models.
BibTeX:
 
@article{strijov2018RZD, 
  author = {. . Zamkovoy and S. P. Kudiyarov and R. V. Martyshkin and V. V. Strijov},
  title = {Harmonization of historical data and expert models for forecasting demand for rail transportation},
  journal = {Vestnik Universiteta SUM},
  year = {2018},
  volume = {4},
  pages = {51-60},
  url = {https://vestnik.guu.ru/jour/article/view/996?locale=ru_RU},
  doi = {10.26425/1816-4277-2018-4-51-60}
}

2017

Anikeev D., Penkin G., Strijov V.V. Local approximation models for human physical activity classification // Informatics and Applications, 2017, 18(1) : 144-145. Article Rus
Abstract: The problem of classification of time series of an accelerometer of
a mobile phone is investigated. The physical activity class corresponds
to a time series segment. Segment is associated with its feature
description. It is generated by an approximating spline. The elements
of the feature vector are the coefficients of the basic spline functions.
The computational experiment finds the optimal approximation parameters
and parameters of the classification model according to the maximum
likelihood of the logistic classification model.
BibTeX:
 
@article{AnikeyevPenkin2017Splines, 
  author = {Anikeev, D.A. and Penkin, G.O. and Strijov, V. V.},
  title = {Local approximation models for human physical activity classification},
  journal = {Informatics and Applications},
  year = {2017},
  volume = {18(1)},
  pages = {144-145},
  url = {http://strijov.com/papers/AnikeyevPenkin2017Splines.pdf},
  doi = {https://elibrary.ru/item.asp?id=32284821}
}
Bochkarev �.�., Sofronov I.L., Strijov V.V. Generation of expertly-interpreted models for prediction of core permeability // Systems and Means of Informatics, 2017, 27(3) : 74-87. Article Rus
Abstract: This article is devoted to the prediction of core permeability. Permeability is one of the main properties for estimating the filtration of gas and liquid in a core. To build a permeability model, porosity, density, depth of measurement, and other physical properties of the core are used. An algorithm for choosing the optimal prediction model is proposed. A model of superpositions of expertly-defined functions is suggested. The proposed method is a superposition of previously obtained optimal expertly-defined functions and a two-layer neural network. The experiment was conducted on core analysis, aerodynamics, and hydrodynamics datasets. During the experiment, the optimal expertly-interpreted models for all datasets were derived. The suggested approach is compared to other methods for choosing models, such as Lasso regression, support vector regression (SVR), gradient boosting, and a neural network. The error and the optimal parameters were estimated using cross-validation. The experiment showed that the proposed approach is competitive with other state-of-the-art methods. Moreover, the number of neurons is significantly reduced with the use of superpositions of expertly-defined functions.
BibTeX:
 
@article{Bochkarev2017PermeabilityEstimation, 
  author = {Bochkarev, �. �. and Sofronov, I. L. and Strijov, V. V.},
  title = {Generation of expertly-interpreted models for prediction of core permeability},
  journal = {Systems and Means of Informatics},
  year = {2017},
  volume = {27(3)},
  pages = {74-87},
  url = {http://strijov.com/papers/Bochkarev2017PermeabilityEstimation.pdf},
  doi = {http://www.ipiran.ru/journal_system/article/08696527170307.html}
}
Cinar Y.G., Mirisaee H., Goswami P., Gaussier E., Ait-Bachir A., Strijov V.V. Time series forecasting using RNNs: an extended attention mechanism to model periods and handle missing values // Neural Information Processing, 2017 : 533-544. Article
Abstract: In this paper, we study the use of recurrent neural networks (RNNs)
for modeling and forecasting time series. We first illustrate the
fact that standard sequence-to-sequence RNNs neither capture well
periods in time series nor handle well missing values, even though
many real-life time series are periodic and contain missing values.
We then propose an extended attention mechanism that can be deployed
on top of any RNN and that is designed to capture periods and make
the RNN more robust to missing values. We show the effectiveness
of this novel model through extensive experiments with multiple univariate
and multivariate datasets.
BibTeX:
 
@article{Cinar2017TimeSeries, 
  author = {Yagmur G. Cinar and Hamid Mirisaee and Parantapa Goswami and Eric Gaussier and Ali Ait-Bachir and Vadim V. Strijov},
  title = {Time series forecasting using RNNs: an extended attention mechanism to model periods and handle missing values},
  journal = {Neural Information Processing},
  year = {2017},
  pages = {533-544},
  url = {https://arxiv.org/pdf/1703.10089.pdf},
  doi = {10.1007/978-3-319-70139-4_54}
}
Katrutsa A.M., Strijov V.V. Comprehensive study of feature selection methods to solve multicollinearity problem according to evaluation criteria // Expert Systems with Applications, 2017, 76 : 1-11. Article
Abstract: This paper provides a new approach to feature selection based on the
concept of feature filters, so that feature selection is independent
of the prediction model. Data fitting is stated as a single-objective
optimization problem, where the objective function indicates the
error of approximating the target vector as some function of given
features. Linear dependence between features induces the multicollinearity
problem and leads to instability of the model and redundancy of the
feature set. This paper introduces a feature selection method based
on quadratic programming. This approach takes into account the mutual
dependence of the features and the target vector, and selects features
according to relevance and similarity measures defined according
to the specific problem. The main idea is to minimize mutual dependence
and maximize approximation quality by varying a binary vector that
indicates the presence of features. The selected model is less redundant
and more stable. To evaluate the quality of the proposed feature
selection method and compare it with others, we use several criteria
to measure instability and redundancy. In our experiments, we compare
the proposed approach with several other feature selection methods,
and show that the quadratic programming approach gives superior results
according to the criteria considered for the test and real data sets.
BibTeX:
 
@article{Katrutsa2016QPFeatureSelection, 
  author = {Katrutsa, A. M. and Strijov, V. V.},
  title = {Comprehensive study of feature selection methods to solve multicollinearity problem according to evaluation criteria},
  journal = {Expert Systems with Applications},
  year = {2017},
  volume = {76},
  pages = {1-11},
  url = {http://strijov.com/papers/Katrutsa2016QPFeatureSelection.pdf},
  doi = {10.1016/j.eswa.2017.01.048}
}
Kulunchakov A.S., Strijov V.V. Generation of simple structured Information Retrieval functions by genetic algorithm without stagnation // Expert Systems with Applications, 2017, 85 : 221-230. Article
Abstract: This paper investigates an approach to construct new ranking models for Information Retrieval. The IR ranking model depends on the document description. It includes the term frequency and document frequency. The model ranks documents upon a user request. The quality of the model is defined by the difference between the documents that experts assess as relevant to the request and the ranked ones. To boost the model quality, a modified genetic algorithm was developed. It generates models as superpositions of primitive functions and selects the best according to the quality criterion. The main impact of the research is the new technique to avoid stagnation and to control the structural complexity of the consecutively generated models. To solve the problems of stagnation and complexity, a new criterion of model selection was introduced. It uses a structural metric and penalty functions, which are defined in the space of generated superpositions. To show that the newly discovered models outperform the other state-of-the-art IR scoring models, the authors perform a computational experiment on TREC datasets. It shows that the resulting algorithm is significantly faster than the exhaustive one. It constructs better ranking models according to the MAP criterion. The obtained models are much simpler than the models constructed with alternative approaches. The proposed technique is significant for developing information retrieval systems based on expert assessments of query-document relevance.
BibTeX:
 
@article{Kulunchakov2016IRfunc, 
  author = {Kulunchakov, A. S. and Strijov, V. V.},
  title = {Generation of simple structured Information Retrieval functions by genetic algorithm without stagnation},
  journal = {Expert Systems with Applications},
  year = {2017},
  volume = {85},
  pages = {221-230},
  url = {http://strijov.com/papers/Kulunchakov2014RankingBySimpleFun.pdf},
  doi = {10.1016/j.eswa.2017.05.019}
}
Molybog I.O., Motrenko A.P., Strijov V.V. Improving classification quality for intrinsic plagiarism problem // Informatics and Applications, 2017, 11(3) : 60-72. Article Rus
Abstract: The paper addresses the classification problem in multidimensional
spaces. The authors propose a supervised modification of t-distributed
Stochastic Neighbor Embedding algorithm. Additional features of the
proposed modification are that, unlike the original algorithm, it
does not require retraining if new data is added to the training
set and can be easily parallelized. The novel method was applied
to detect intrinsic plagiarism in a collection of documents. The
authors also test the performance of their algorithm using synthetic
data and show that the quality of classification is higher with the algorithm
than without or with other algorithms for dimension reduction.
BibTeX:
 
@article{MolybogMotrenko2017DimRed, 
  author = {Molybog, I. O. and Motrenko, A. P. and Strijov, V. V.},
  title = {Improving classification quality for intrinsic plagiarism problem},
  journal = {Informatics and Applications},
  year = {2017},
  volume = {11(3)},
  pages = {60-72},
  url = {http://strijov.com/papers/MolybogMotrenko2017DimRed.pdf},
  doi = {http://www.ipiran.ru/journal/issues/article/19922264170307.html}
}
Rudakov K.V., Kuznetsov M.P., Motrenko A.P., Stenina M.M., Kashirin D.O., Strijov V.V. Optimal model selection for rail freight forecasting // Automation and Remote Control, 2017, 78(1) : 75-87. Article
Abstract: Consideration was given to selection of an optimal model of short-term forecasting of the volumes of railway transport from the historical and exogenous time series. The historical data carry information about the transportation volumes of various goods between pairs of stations. It was assumed that the result of selecting an optimal model depends on the level of aggregation in the types of goods, departure and destination points, and time. Considered were the models of vector autoregression, integrated model of the autoregressive moving average, and a nonparametric model of histogram forecasting. Criteria for comparison of the forecasts on the basis of distances between the errors of model forecasts were proposed. They are used to analyze the models with the aim of determining the admissible requests for forecast, the actual forecast depth included.
BibTeX:
 
@article{Rudakov2015RZD, 
  author = {Rudakov, K. V. and Kuznetsov, M. P. and Motrenko, A. P. and Stenina, M. M. and Kashirin, D. O. and Strijov, V. V.},
  title = {Optimal model selection for rail freight forecasting},
  journal = {Automation and Remote Control},
  year = {2017},
  volume = {78(1)},
  pages = {75-87},
  url = {http://strijov.com/papers/Rudakov2015RZD.pdf},
  doi = {10.1134/S0005117917010064}
}
Aduenko A.A. Model selection in classification problems (PhD thesis supervised by V.V. Strijov). Moscow Institute of Physics and Technology, 2017. PhdThesis Rus
Abstract: The problem of constructing multimodels in classification
is investigated. Classification is a basic task in machine learning,
and multiclass classification problems can be effectively
reduced to one or more two-class classification problems.
Examples of two-class classification include determining
whether a patient has a disease from a set of test results,
analyzing texts to determine the sentiment of
messages, and credit scoring. These tasks are relevant
because of the spread of remote diagnostics and automatic decision-making
systems.


Logistic regression, which is the standard in credit scoring, and
other generalized linear models cannot take into account
heterogeneity in the data, in particular the dependence of
feature importance on the object, and are therefore not optimal
in its presence. To take heterogeneity in the data into account,
classifier compositions are used. Methods for constructing a
model composition account for heterogeneity
in the data by building a multimodel that contains several single
models. Models in the multimodel can be close or coincident, which
leads to uninterpretability and a decrease in the quality of the
forecast. Existing works offer heuristics for thinning the ensemble
of models in bagging. Genetic algorithms are used to select a
subset of models in keying. Other works cluster the models and choose
a single representative for each cluster, or offer
a greedy strategy of gradually increasing the number of classifiers
in bagging. To control the number of models, a sparse prior
distribution on the weights of the models in the mixture is used. The structure
of the mixture is sought by maximizing the evidence. However, these
methods of thinning mixtures do not take into account the proximity
between models, and therefore the multimodel can still contain close
models. To obtain statistically distinguishable models in multimodels,
an external thinning procedure is used, based on a statistical comparison
of models by calculating distances between posterior parameter
distributions of different models, for example, using Bregman divergences
or f-divergences. In this work it is shown that the existing similarity
measures treat a noninformative model and an informative model that
coincides with it as distinct, and therefore do not allow building an adequate
multimodel. To solve this problem, a similarity function is proposed
that solves the problem of statistically distinguishing
models. The proposed approach takes into account heterogeneity
in the data and yields an adequate multimodel that contains fewer models
and has better classification quality.


The presence of redundant or multicollinear features affects not
only the classification quality of the constructed model,
but also its stability. To solve the feature selection problem,
the Bayesian approach uses the principle of maximum
evidence to determine the structure of models. To solve the problem
of multicollinearity of features, a set of non-multicollinear features
is constructed by optimizing a quality criterion. In this work
it is shown that the approach based on feature selection
is not optimal. It is proved that the maximum evidence method
cannot take into account the dependencies between
features, since the maximum evidence estimate of the covariance
matrix of feature weights is asymptotically degenerate. To make
optimal use of the information from multicollinear features, it
is suggested that they be combined.
BibTeX:
 
@phdthesis{Aduenko2017ModelSelection, 
  author = {Aduenko, A. A.},
  title = {Model selection in classification problems (PhD thesis supervised by V.V. Strijov)},
  school = {Moscow Institute of Physics and Technology},
  year = {2017},
  url = {http://www.frccsc.ru/sites/default/files/docs/ds/002-073-05/diss/11-aduenko/11-Aduenko_main.pdf?626}
}
Kuzmin A.A. Hierarchical classification of document collection (PhD thesis supervised by V.V. Strijov). Moscow Institute of Physics and Technology, 2017. PhdThesis Rus
Abstract: This work investigates methods of text document categorization and
classification. These methods automatically structure documents as
hierarchical themes. They also optimize existing themes and reveal
thematic inconsistencies.
BibTeX:
 
@phdthesis{Kuzmin2017HierarchicalClustering, 
  author = {Kuzmin, A. A.},
  title = {Hierarchical classification of document collection (PhD thesis supervised by V.V. Strijov)},
  school = {Moscow Institute of Physics and Technology},
  year = {2017},
  url = {http://www.frccsc.ru/sites/default/files/docs/ds/002-073-05/diss/08-kuzmin/008-kuzmin_main-txt.pdf?809}
}

2016

Bakhteev O.Y., Popova M.S., Strijov V.V. Systems and means of deep learning for classification problems // Systems and Means of Informatics, 2016, 26(2) : 4-22. Article Rus
Abstract: The paper provides guidance on deep learning network construction and
optimization using GPUs. The paper proposes to use GPU instances on
the cloud platform Amazon Web Services. The problem of time series
classification is considered. The paper proposes to use a deep learning
network, i.e., a multilevel superposition of models belonging to the
following classes: restricted Boltzmann machines, autoencoders, and
neural networks with a softmax function at the output. The proposed method
was tested on a dataset containing time segments from a mobile phone
accelerometer. The relation between the classification error, the
dataset size, and the number of superposition parameters is analyzed.
BibTeX:
 
@article{Bakhteev2016AWS, 
  author = {Bakhteev, O. Yu. and Popova, M. S. and Strijov, V. V.},
  title = {Systems and means of deep learning for classification problems},
  journal = {Systems and Means of Informatics},
  year = {2016},
  volume = {26(2)},
  pages = {4-22},
  url = {http://strijov.com/papers/Bakhteev2016AWS.pdf},
  doi = {10.14357/08696527160201}
}
Goncharov A.V., Strijov V.V. Metric time series classification using weighted dynamic warping relative to centroids of classes // Informatics and Applications, 2016, 10(2) : 36-47. Article Rus
Abstract: This paper discusses the problem of metric time series analysis and
classification. The proposed classification model uses the matrix
of distances between time series, which is built with a fixed distance
function. The dimension of this distance matrix is very high, and
all related calculations are time-consuming. The problem of reducing
the computational complexity is solved by selecting reference objects
and using them to describe classes. A model that uses dynamic time
warping to build reference objects, or centroids, is chosen as
the baseline model. This paper introduces a weight function for each
centroid that influences the calculation of the distance measure. Time
series of different analytic functions and time series of human activity
from a mobile phone accelerometer are used as the objects for classification.
The properties and classification results of this model are investigated
and compared with those of the baseline model.
BibTeX:
 
@article{Goncharov2015autumn, 
  author = {Goncharov, A. V. and Strijov, V. V.},
  title = {Metric time series classification using weighted dynamic warping relative to centroids of classes},
  journal = {Informatics and Applications},
  year = {2016},
  volume = {10(2)},
  pages = {36-47},
  url = {http://strijov.com/papers/Goncharov2015authumn.pdf},
  doi = {10.14357/19922264160204}
}
Isachenko R.V., Strijov V.V. Metric learning in multiclass time series classification problem // Informatics and Applications, 2016, 10(2) : 48-57. Article Rus
Abstract: This paper is devoted to the problem of multiclass time series classification.
It is proposed to align time series in relation to class centroids.
The construction of centroids and the alignment of time series are carried out
by the dynamic time warping algorithm. The accuracy of classification
depends significantly on the metric used to compute distances between
time series. The distance metric learning approach is used to improve
classification accuracy. The metric learning procedure modifies distances
between objects to make objects from the same cluster closer and objects from
different clusters more distant. The distance between time series
is measured by the Mahalanobis metric. The distance metric learning
procedure finds the optimal transformation matrix for the Mahalanobis
metric. To evaluate the quality of classification, a computational experiment
on synthetic data and real human activity recognition data was
carried out.
BibTeX:
 
@article{Isachenko2016MetricsLearning, 
  author = {Isachenko, R. V. and Strijov, V. V.},
  title = {Metric learning in multiclass time series classification problem},
  journal = {Informatics and Applications},
  year = {2016},
  volume = {10(2)},
  pages = {48-57},
  url = {http://strijov.com/papers/Isachenko2016MetricsLearning.pdf},
  doi = {10.14357/19922264160205}
}
Karasikov M.E., Strijov V.V. Feature-based time-series classification // Informatics and Applications, 2016, 10(4) : 121-131. Article Rus
Abstract: The paper is devoted to the multi-class time-series classification problem.
A feature-based approach that uses meaningful and concise representations
for feature space construction is applied. A time series is considered
as a sequence of segments approximated by parametric models, and
their parameters are used as time-series features. This feature
construction method inherits from the approximation model such unique
properties as shift invariance. We propose an approach to solve the time-series
classification problem using distributions of parameters of the approximation
model. The proposed approach is applied to the human activity classification
problem. The computational experiments on real data demonstrate the superiority
of the proposed algorithm over baseline solutions.
BibTeX:
 
@article{Karasikov2016TSC, 
  author = {Karasikov, M. E. and Strijov, V. V.},
  title = {Feature-based time-series classification},
  journal = {Informatics and Applications},
  year = {2016},
  volume = {10(4)},
  pages = {121-131},
  url = {http://strijov.com/papers/Karasikov2016TSC.pdf},
  doi = {10.14357/19922264160413}
}
Kuznetsov M.P., Motrenko A.P., Kuznetsova M.V., Strijov V.V. Methods for intrinsic plagiarism detection and author diarization // Working Notes of CLEF, 2016, 1609 : 912-919. Article
Abstract: The paper investigates methods for intrinsic plagiarism detection
and author diarization. We developed a plagiarism detection method
based on constructing an author style function from features of text
sentences and detecting outliers. We adapted the method for the diarization
problem by segmenting author style statistics on text parts, which
correspond to different authors. Both methods were tested on the
PAN-2011 collection for the intrinsic plagiarism detection and implemented
for the PAN-2016 competition on author diarization.
BibTeX:
 
@article{Kuznetsov2016CLEF, 
  author = {Kuznetsov, M. P. and Motrenko, A. P. and Kuznetsova, M. V. and Strijov, V. V.},
  title = {Methods for intrinsic plagiarism detection and author diarization},
  journal = {Working Notes of CLEF},
  year = {2016},
  volume = {1609},
  pages = {912-919},
  url = {http://ceur-ws.org/Vol-1609/16090912.pdf},
  doi = {http://ceur-ws.org/Vol-1609/}
}
Kuznetsov M.P., Tokmakova A.A., Strijov V.V. Analytic and stochastic methods of structure parameter estimation // Informatica, 2016, 27(3) : 607-624. Article
Abstract: The paper presents analytic and stochastic methods of structure parameters
estimation for model selection. Structure parameters are covariance
matrices of parameters of linear and non-linear regression models.
To optimize the model parameters and the structure parameters we
maximize the model evidence including the data likelihood and the
prior parameter distribution. The analytic methods are based on the
approximated model evidence derivatives computation. The stochastic
methods are based on the model parameters sampling and data cross-validation.
The proposed methods are tested and compared on synthetic and real
data.
BibTeX:
 
@article{Kuznetsov2013Structure, 
  author = {Kuznetsov, M. P. and Tokmakova, A. A. and Strijov, V. V.},
  title = {Analytic and stochastic methods of structure parameter estimation},
  journal = {Informatica},
  year = {2016},
  volume = {27(3)},
  pages = {607-624},
  url = {http://strijov.com/papers/HyperOptimizationEng.pdf},
  doi = {http://www.mii.lt/informatica/pdf/INFO1109.pdf}
}
Kuznetsova M.V., Strijov V.V. Local forecasting of time series with invariant transformations // Information Technologies, 2016, 22(6) : 457-462. Article Rus
Abstract: The paper describes a univariate time series forecasting model. It
proposes to find segments of local history that are similar to
the forecasted segment. A distance function is used to cluster segments.
The forecast is the average of the time series values from this
cluster. To improve the quality of the forecast, the paper proposes an
invariant transformation of segments. This transformation preserves the
equivalence of time series with respect to clusters. The transformation
is a function constructed by the dynamic time warping procedure.
The retrospective forecasting procedure calculates the accuracy of
the forecasting model. Accelerometer time series of a person's motion
are used in the computational experiment. It compares two
forecasting models. The first one clusters segments; the second one
uses the k-nearest neighbors algorithm to select similar segments.
BibTeX:
 
@article{Kuznetsova2015TimeSeries, 
  author = {Kuznetsova, M. V. and Strijov, V. V.},
  title = {Local forecasting of time series with invariant transformations},
  journal = {Information Technologies},
  year = {2016},
  volume = {22(6)},
  pages = {457-462},
  url = {http://strijov.com/papers/Kuznetsova2015TimeSeries.pdf}
}
Motrenko A.P., Rudakov K.V., Strijov V.V. Combining endogenous and exogenous variables in a special case of non-parametric time series forecasting model // Moscow University Computational Mathematics and Cybernetics, 2016, 40(2) : 71-78. Article
Abstract: We address the problem of increasing the quality of time series
forecasting by taking into account information about exogenous factors. Our
aim is to improve a special case of a non-parametric forecasting algorithm,
namely the hist algorithm, derived from quantile regression.
The hist algorithm minimizes the convolution of a histogram of the time series
with the loss function. To include exogenous factors in this model,
we suggest correcting the histogram of the endogenous time series using
exogenous time series. We propose to adjust the histogram using
mixtures of conditional histograms as a less sparse alternative to a
multidimensional histogram, and in some cases demonstrate a decrease
in loss compared to the basic forecasting algorithm. To the best
of our knowledge, this approach to combining endogenous and exogenous
time series is original and has not been proposed before. The suggested
method is illustrated with data from the Russian Railways.
BibTeX:
 
@article{Motrenko2015ExogenousFactors, 
  author = {Motrenko, A. P. and Rudakov, K. V. and Strijov, V. V.},
  title = {Combining endogenous and exogenous variables in a special case of non-parametric time series forecasting model},
  journal = {Moscow University Computational Mathematics and Cybernetics},
  year = {2016},
  volume = {40(2)},
  pages = {71-78},
  url = {http://strijov.com/papers/Motrenko2015ExogenousFactors.pdf},
  doi = {10.3103/S0278641916020072}
}
Motrenko A.P., Strijov V.V. Extracting fundamental periods to segment human motion time series // Journal of Biomedical and Health Informatics, 2016, 20(6) : 1466-1476. Article
Abstract: The paper addresses a problem of sensor-based time series segmentation
as a part of human activity recognition problem. We assume that each
studied time series contains a fundamental periodic, which can be seen
as an ultimate entity (cycle) of motion. Due to the nature of the
data and the urge to obtain interpretable results of segmentation,
we define the segmentation as a partition of the time series into
the periods of this fundamental periodic. To split the time series
into periods we select a pair of principal components of the Hankel
matrix. We then cut the trajectory of the selected principal components
by its symmetry axis, thus obtaining half-periods that are merged
into segments. A method of selecting a pair of components, corresponding
to the fundamental periodic is proposed.
BibTeX:
 
@article{Motrenko2015Fundamental, 
  author = {Motrenko, A. P. and Strijov, V. V.},
  title = {Extracting fundamental periods to segment human motion time series},
  journal = {Journal of Biomedical and Health Informatics},
  year = {2016},
  volume = {20(6)},
  pages = {1466-1476},
  url = {http://strijov.com/papers/MotrenkoStrijov2014RV2.pdf},
  doi = {10.1109/JBHI.2015.2466440}
}
Neychev R.G., Katrutsa A.M., Strijov V.V. Robust selection of multicollinear features in forecasting // Factory Laboratory, 2016, 82(3) : 68-74. Article Rus
Abstract: This paper considers the problem of constructing a stable forecasting
model using feature selection methods. It proposes a multicollinearity
detection criterion, which is necessary in the case of an excessive
number of features. To investigate the properties of this criterion,
a theorem is stated. It develops the Belsley method. The proposed
criterion drives an algorithm that excludes correlated features, reduces
the dimensionality of the feature space, and obtains robust estimates
of the model parameters. The algorithm adds and removes features
sequentially according to this criterion. The LAD-Lasso algorithm
was chosen as the baseline for comparison. The computational experiment
investigates an hourly price curve forecasting problem with the proposed
and the baseline algorithms. The experiment was carried out using time series
of German electricity prices.
BibTeX:
 
@article{Neychev2015FeatureSelection, 
  author = {Neychev, R. G. and Katrutsa, A. M. and Strijov, V. V.},
  title = {Robust selection of multicollinear features in forecasting},
  journal = {Factory Laboratory},
  year = {2016},
  volume = {82(3)},
  pages = {68-74},
  url = {http://strijov.com/papers/Neychev2015FeatureSelection.pdf}
}
Zadayanchuk A.I., Popova M.C., Strijov V.V. Selection of optimal physical activity classification model using measurements of accelerometer // Information Technologies, 2016, 22(4) : 313-318. Article Rus
Abstract: This paper solves the problem of selecting optimal stable models for
classification of physical activity. We select optimal models from
the class of two-layer artificial neural networks. There are three
different ways to change the network structure: network pruning, network
growing, and their combination. We construct models by removing
neurons. Neural networks with an insufficient or excessive number of neurons
have insufficient generalization ability and can make unstable predictions.
The proposed genetic algorithm optimizes the neural network structure.
The novelty of the work lies in the fact that the probability of
removing neurons is determined by the variance of parameters. In
the computational experiment, models are generated by optimizing two
quality criteria: accuracy and stability.
BibTeX:
 
@article{Zadayanchuk2015OptimalNN4, 
  author = {Zadayanchuk, A. I. and Popova, M. C. and Strijov, V. V.},
  title = {Selection of optimal physical activity classification model using measurements of accelerometer},
  journal = {Information Technologies},
  year = {2016},
  volume = {22(4)},
  pages = {313-318},
  url = {http://strijov.com/papers/Zadayanchuk2015OptimalNN4.pdf}
}
Zhuravlev Y.I., Rudakov K.V., Korchagin A.D., Kuznetsov M.P., Motrenko A.P., Stenina M.M., Strijov V.V. Methods for hierarchical time series forecasting // Notices of the Russian Academy of Sciences, 2016, 86(2) : 138. Article Rus
Abstract: The paper investigates problems of planning in railway freight transportation
under conditions of non-stationary, non-uniform, and noisy data. To
boost the quality of planning, it proposes to create an intelligent system
based on mathematical models, historical data, and expert
estimations. The paper describes a project on a forecasting system
for planning railway freight transportation, based on an analysis of
the dependence of freight transportation demand on exogenous factors.
BibTeX:
 
@article{Zhur2016TimeSeries, 
  author = {Zhuravlev, Yu. I. and Rudakov, K. V. and Korchagin, A. D. and Kuznetsov, M. P. and Motrenko, A. P. and Stenina, M. M. and Strijov, V. V.},
  title = {Methods for hierarchical time series forecasting},
  journal = {Notices of the Russian Academy of Sciences},
  year = {2016},
  volume = {86(2)},
  pages = {138},
  url = {http://strijov.com/papers/Zhuravlev2015RZD.pdf},
  doi = {10.7868/S0869587316020213}
}
Goncharov A.V., Strijov V.V. Continuous time series alignment in human actions recognition // Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference proceedings // AINL FRUCT: Artificial Intelligence and Natural Language Conference, 2016 : 83-86. InProceedings
Abstract: Human physical activity monitoring with wearable devices imposes significant
restrictions on the processing power and the amount of memory available
to the algorithm. To satisfy these constraints, it is proposed to move
from a discrete time series representation to an analytical description
and to analyze it using mathematical models. The work deals with physical
activity classification. It uses a metric classification algorithm,
where the object's class is determined by the distance from this object
to the nearest centroid. The paper proposes to approximate all time series
with splines and find the distance to the nearest centroid using
a continuous alignment path. The distance is calculated
using analytical transformations.
BibTeX:
 
@inproceedings{Gonchariv2016Fruct, 
  author = {Goncharov, A. V. and Strijov, V. V.},
  title = {Continuous time series alignment in human actions recognition},
  booktitle = {AINL FRUCT: Artificial Intelligence and Natural Language Conference},
  journal = {Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference proceedings},
  year = {2016},
  pages = {83-86},
  url = {http://strijov.com/papers/Goncharov_Fruct_2016.pdf}
}
Kuzmin A.A., Aduenko A.A., Strijov V.V. Hierarchical thematic modeling of short text collection // Intelligent Data Processing, Conference Proceedings, 2016 : 174-175. InProceedings
Abstract: The aim of this study is to construct and verify a hierarchical thematic
model of a short text collection. The authors consider
approaches to metric learning and feature selection. Agglomerative and
divisive methods to construct a hierarchical model are compared.
A hierarchical weighted similarity function is suggested for unlabeled
data classification. Weights in this function are the importance
values of the terms from the collection dictionary. An entropy-based
approach is used to estimate these weights according to the expert
model. The proposed similarity function is represented as a four-level
neural network to take into account the vector representations of words given
by a trained language model. The proposed methods are used to construct
an expert system that helps experts to classify unlabeled abstracts
of the major conference EURO. The parameters of this model are estimated
using expert models of the EURO conference from 2006 to 2016. The results
are compared with hierarchical multiclass SVM, the probabilistic thematic
model SuhiPLSA, and the hierarchical naive Bayes approach.
BibTeX:
 
@inproceedings{Kuzmin2016IDP, 
  author = {Kuzmin, A. A. and Aduenko, A. A. and Strijov, V. V.},
  title = {Hierarchical thematic modeling of short text collection},
  booktitle = {Intelligent Data Processing, Conference Proceedings},
  year = {2016},
  pages = {174-175},
  url = {http://strijov.com/papers/Kuzmin_Modeling_ShortText2016.pdf}
}
Kuzmin A.A., Aduenko A.A., Strijov V.V. Thematic Classification for EURO/IFORS Conference Using Expert Model // 28th European Conference on Operational Research, 2016. InProceedings
Abstract: Every year the program committee of a major conference constructs
its scientific program. Some participants take part in invited sessions,
but for the majority of participants the PC along with experts have
to choose sessions according to their contributed abstracts. To fit
an abstract into the current conference programme one has to construct
an expert system. It should respect the structure of previous conferences
and use thematic modeling techniques. The conference structure is
a tree. It has abstracts as leaves and areas, streams, and sessions as
nodes. Abstracts from the previous conferences already have their
positions in this structure. To classify a new abstract one can use
divisive hierarchical classification methods based on SVM, NB, or
kNN. However, these methods are greedy. An insufficient number of abstracts
in each lowest-level cluster makes classification unstable. In addition,
expert and algorithmic classifications differ. So a group of the
most relevant clusters is preferable to the single best one to meet expert
needs. We propose a relevance operator that returns all clusters
sorted by their relevance. We consider three ways of constructing
such an operator using hierarchical multiclass SVM, PLSA with Adaptive
Regularization, and the proposed weighted hierarchical similarity function.
We construct a model of EURO 2010 using expert models of EURO 2012
and 2013 to demonstrate the performance of the proposed methods.
BibTeX:
 
@inproceedings{KuzminEURO2016, 
  author = {Kuzmin, A. A. and Aduenko, A. A. and Strijov, V. V.},
  title = {Thematic Classification for EURO/IFORS Conference Using Expert Model},
  booktitle = {28th European Conference on Operational Research},
  year = {2016},
  url = {http://strijov.com/papers/KuzminEURO2016.pdf}
}
Motrenko A.P., Neychev R.G., Isachenko R.V., Popova M.S., Gromov A.N., Strijov V.V. Feature generation for multiscale time series forecasting // Intelligent Data Processing, Conference Proceedings, 2016 : 129-130. InProceedings
Abstract: The paper presents a framework for the massive multiscale time series
forecast. The focus is on the problem of forecasting behavior of
a device within the concept of Internet of things. The device is
monitored by a set of sensors, which produce a large amount of multiscale
time series during its lifespan. These time series have various time
scales since distinct sensors produce observations with various frequencies
from milliseconds to weeks. The main goal is to predict the observations
of a device in a given time range. The authors propose a method of
constructing an efficient feature description for the corresponding
regression problem. The method involves feature generation and dimensionality
reduction procedures. Generated features include historical information
about the target time series as well as other available time series,
local transformations, and multiscale features. Several forecasting
algorithms have been applied to the resulting regression problem
and the quality of the forecasts has been investigated for various
horizon values.
BibTeX:
 
@inproceedings{MotrenkoMiltiscale2016IDP, 
  author = {Motrenko, A. P. and Neychev, R. G. and Isachenko, R. V. and Popova, M. S. and Gromov, A. N. and Strijov, V. V.},
  title = {Feature generation for multiscale time series forecasting},
  booktitle = {Intelligent Data Processing, Conference Proceedings},
  year = {2016},
  pages = {129-130},
  url = {https://sourceforge.net/p/mvr/code/HEAD/tree/lectures/DataFest/Strijov2016FeatureGeneration.pdf?format=raw}
}
Neychev R.G., Motrenko A.P., Isachenko R.V., Inyakin A.S., Strijov V.V. Multimodel forecasting multiscale time series in Internet of things // Intelligent Data Processing, 2016 : 130-131. InProceedings
Abstract: The paper presents an approach to forecasting multiple intercorrelated
time series that can be generated by different sensors of devices
within the concept of the Internet of things. In this case, the generated data
are not independent and identically distributed, and their feature
space has a complex structure. The forecast construction is considered
as a regression problem. To solve it, the authors propose a mixture of
experts approach where several forecasting models are used. Neural
networks are chosen as the forecasting models. The optimal structure
of neural networks, their parameters, and quantity of experts are
analyzed. The proposed method has been tested in a computational
experiment where it was compared to gradient boosting and decision
tree methods. The experiment was conducted on real data containing
information about electricity consumption and weather conditions
in Poland.
BibTeX:
 
@inproceedings{Neychev2016IDP, 
  author = {Neychev, R. G. and Motrenko, A. P. and Isachenko, R. V. and Inyakin, A. S. and Strijov, V. V.},
  title = {Multimodel forecasting multiscale time series in Internet of things},
  booktitle = {Intelligent Data Processing},
  year = {2016},
  pages = {130-131},
  url = {http://www.machinelearning.ru/wiki/images/9/94/NeychevIDP11.pdf}
}
Strijov V.V., Motrenko A.P. Large-scale time series forecasting // 28th European Conference on Operational Research, 2016. InProceedings
Abstract: The talk is devoted to investigating the behavior of a device, a member
of the internet of things. A device is monitored by a set of sensors,
which produce a large amount of multiscale time series during its
lifespan. These time series have various time scales, since measurements
may be taken every millisecond, day, week, etc. The main goal
is to forecast the next state of a device. The investigation assumes
the following conditions for the time series of a single device unit: there
is a large set of multiscale time series; the sampling rate of a time
series is fixed; each time series has its own forecast horizon. To
build an adequate forecasting model, the following hypotheses are assumed to hold:
the time history is sufficiently long; the time series have auto- and
cross-correlation dependencies. The model is static, so there exists
a history of optimal size. Each time series can be interpolated
by some local model, so that there exists a local approximation model,
which can be applied where data are locally missing. The problem statement
follows the vector-autoregression approach. To find a model of optimal complexity,
a sequential model generation-selection procedure was constructed.
The test-bench compares random forest, boosting and mixture of experts.
BibTeX:
 
@inproceedings{Strijov2016MultiscaleForecasting, 
  author = {Strijov, V. V. and Motrenko, A. P.},
  title = {Large-scale time series forecasting},
  booktitle = {28th European Conference on Operational Research},
  year = {2016},
  url = {http://strijov.com/papers/Strijov2016MultiscaleForecasting.pdf}
}
Vladimirova M.R., Strijov V.V. Bagging of neural networks in multitask classification of biological activity for nuclear receptors // Intelligent Data Processing, Conference Proceedings, 2016 : 18-19. InProceedings
Abstract: The paper is devoted to the multitask classification problem. The
main purpose is building an adequate model to predict whether the
object belongs to a particular class, precisely, whether the ligand
binds to a specific nuclear receptor. Nuclear receptors are a class
of proteins found within cells. These receptors work with other proteins
to regulate the expression of specific genes, thereby controlling
the development, homeostasis, and metabolism of the organism. The
regulation of gene expression generally only happens when a ligand,
a molecule that affects the receptor's behavior, binds to a nuclear
receptor. A two-layer neural network is used as a classification model.
The paper considers the problems of linear and logistic regressions
with squared and cross-entropy loss functions. To analyze the classification
result, the authors propose to decompose the error into bias and
variance terms. To improve the quality of classification by reducing
the error variance, the authors suggest a composition of neural
networks: bagging. Bagging generates a set of subsamples from the
training sample using the bootstrap procedure. All subsamples have
the same size as the initial sample. Classifiers are trained on each
subsample separately. Then their individual predictions are aggregated
by voting. The proposed method improves the quality of investigated
sample classification.
BibTeX:
 
@inproceedings{Vladimorove2016IDP, 
  author = {Vladimirova, M. R. and Strijov, V. V.},
  title = {Bagging of neural networks in multitask classification of biological activity for nuclear receptors},
  booktitle = {Intelligent Data Processing, Conference Proceedings},
  year = {2016},
  pages = {18-19},
  url = {http://www.machinelearning.ru/wiki/images/5/5f/VladimirovaIOI2016_eng.pdf}
}
Kuznetsov M.P. Construction preference learning models using ordinal-scaled expert estimations (PhD thesis supervised by V.V. Strijov). Moscow Institute of Physics and Technology, 2016. PhdThesis Rus
Abstract: The thesis is devoted to preference learning models. The proposed
methods involve rank-scaled expert estimations as object features.
BibTeX:
 
@phdthesis{Kuznetsov2016PhDThesis, 
  author = {Kuznetsov, M. P.},
  title = {Construction preference learning models using ordinal-scaled expert estimations (PhD thesis supervised by V.V. Strijov)},
  school = {Moscow Institute of Physics and Technology},
  year = {2016},
  url = {https://mipt.ru/upload/iblock/782/kuznetsov_dissertatsiya.pdf},
  doi = {https://mipt.ru/upload/iblock/3cb/kuznetsov_avtoreferat.pdf}
}

2015

Aduenko A.A., Rudakov K.V., Reyer I.A., Vasileysky A.S., Karelov A., Strijov V.V. Algorithm of detection and registration of persistent scatters on satellite radar images // Computer optics, 2015, 39(4) : 622-630. Article Rus
Abstract: To detect small movements of the Earth's surface (with a velocity less than
several centimeters per year) using SAR interferometry methods,
it is necessary to find a number of surface areas remaining coherent
on radar images over a long period. These areas and corresponding
image points are called persistent scatterers. Two methods of persistent
scatterer detection are considered in the paper. The methods are
compared by the number of detected points and their average time
coherence. The algorithms considered are illustrated with an example
of processing of a set containing 35 radar images.
BibTeX:
 
@article{Aduenko2015SAR_ComOptics.pdf, 
  author = {Aduenko, A. A. and Rudakov, K. V. and Reyer, I. A. and Vasileysky, A. S. and Karelov, A. I. and Strijov, V. V.},
  title = {Algorithm of detection and registration of persistent scatters on satellite radar images},
  journal = {Computer optics},
  year = {2015},
  volume = {39(4)},
  pages = {622-630},
  url = {http://strijov.com/papers/Aduenko2015PSdetection.pdf},
  doi = {10.18287/0134-2452-2015-39-4-622-630}
}
Gazizullina R.K., Medvednikova M.M., Strijov V.V. Capacity of railway cargo transportation forecasting // Systems and Means of Informatics, 2015, 25(1) : 144-157. Article Rus
Abstract: The article is devoted to the study of an algorithm for nonparametric
forecasting of railway cargo transportation capacity. The problem
considered is forecasting the number of wagons with various goods,
following various routes. The topology of the railway network is given:
for all possible pairs of railway lines, information is provided about all blocks
of wagons that have moved from one line to another, including the
number of wagons in a block, the type of cargo and the date of a route.
The algorithm, based on convolution of the empirical density
distribution of time series values with a loss function, is used
for prediction. Previously, forecasting was carried out for each railway
junction separately. It is proposed to improve the forecast quality
by predicting for pairs of lines instead of predicting the departure
of all wagons from a given junction. The algorithm is illustrated
by daily data on transportation of 38 types of cargo collected over
a year and a half.
BibTeX:
 
@article{Gazizullina2014RailwayForecasting, 
  author = {Gazizullina, R. K. and Medvednikova, M. M. and Strijov, V. V.},
  title = {Capacity of railway cargo transportation forecasting},
  journal = {Systems and Means of Informatics},
  year = {2015},
  volume = {25(1)},
  pages = {144-157},
  url = {http://strijov.com/papers/Gazizullina2014RailwayForecasting.pdf},
  doi = {10.14357/08696527150109}
}
Goncharov A.V., Popova M.S., Strijov V.V. Metric time series classification using dynamic warping relative to centroids of classes // Systems and Means of Informatics, 2015, 25(4) : 52-64. Article Rus
Abstract: This paper discusses a problem of time series classification in case
of several classes. The proposed classification model uses the matrix
of distances between time series. This distance measure is defined
by the dynamic time warping method. The dimension of the distance matrix
is very high. This paper introduces the centroids of each class as
reference objects to decrease this dimension. The distance matrix
with lower dimension describes the distance between all objects and
reference objects. We use this method for human activity recognition
and investigate the quality of classification on data from a mobile
accelerometer. This metric classification algorithm is compared
with a separating classification algorithm.
BibTeX:
 
@article{Goncharov2015MetricClassification, 
  author = {Goncharov, A. V. and Popova, M. S. and Strijov, V. V.},
  title = {Metric time series classification using dynamic warping relative to centroids of classes},
  journal = {Systems and Means of Informatics},
  year = {2015},
  volume = {25(4)},
  pages = {52-64},
  url = {http://strijov.com/papers/Goncharov2015MetricClassification.pdf},
  doi = {10.14357/08696527150404}
}
Ignatov A.D., Strijov V.V. Human activity recognition using quasiperiodic time series collected from a single triaxial accelerometer // Multimedia Tools and Applications, 2015, 17.05.2015 : 1-14. Article
Abstract: The current generation of portable mobile devices incorporates various
types of sensors that open up new areas for the analysis of human
behavior. In this paper, we propose a method for human physical activity
recognition using time series, collected from a single tri-axial
accelerometer of a smartphone. Primarily, the method solves a problem
of time series segmentation, assuming that each meaningful segment
corresponds to one fundamental period of motion. To extract the fundamental
period we construct the phase trajectory matrix, applying the technique
of principal component analysis. The obtained segments refer to various
types of human physical activity. To recognize these activities we
use the k-nearest neighbor algorithm and neural network as an alternative.
We verify the accuracy of the proposed algorithms by testing them
on the WISDM dataset of labeled accelerometer time series from thirteen
users. The results show that our method achieves high precision,
ensuring nearly 96% recognition accuracy when using the bunch of
segmentation and k-nearest neighbor algorithms.
BibTeX:
 
@article{Ignatov2015HumanActivity, 
  author = {Ignatov, Andrey D. and Strijov, Vadim V.},
  title = {Human activity recognition using quasiperiodic time series collected from a single triaxial accelerometer},
  journal = {Multimedia Tools and Applications},
  year = {2015},
  volume = {17.05.2015},
  pages = {1-14},
  url = {http://strijov.com/papers/Ignatov2015HumanActivity.pdf},
  doi = {10.1007/s11042-015-2643-0}
}
Katrutsa A.M., Kuznetsov M.P., Rudakov K.V., Strijov V.V. Metric concentration search procedure using reduced matrix of pairwise distances // Intelligent Data Analysis, 2015, 19(5) : 1091-1108. Article Eng http://content.iospress.com/articles/intelligent-data-analysis/ida760
Abstract: This paper presents a new fast clustering algorithm RhoNet, based
on the metric concentration location procedure. To locate the metric
concentration, the algorithm uses a reduced matrix of pairwise rank
distances. The key feature of the proposed algorithm is that it doesn't
need the exhaustive matrix of pairwise distances. This feature reduces
computational complexity. It is designed to solve the protein secondary
structure recognition problem. The computational experiment collects
tests to carry out a performance analysis and to analyze how the
algorithm quality depends on the structure parameters. The algorithm
is compared with k-modes and tested on different metrics and data
sets.
BibTeX:
 
@article{Katrutsa2014RhoNet, 
  author = {Katrutsa, A. M. and Kuznetsov, M. P. and Rudakov, K. V. and Strijov, V. V.},
  title = {Metric concentration search procedure using reduced matrix of pairwise distances},
  journal = {Intelligent Data Analysis},
  year = {2015},
  volume = {19(5)},
  pages = {1091-1108},
  url = {http://strijov.com/papers/Katrutsa2014RhoNetClustering.pdf},
  doi = {10.3233/IDA-150760}
}
Katrutsa A.M., Strijov V.V. The multicollinearity problem for feature selection methods in regression // Informational Technologies, 2015, 1 : 8-18. Article Rus
Abstract: The paper investigates the multicollinearity problem in regression
analysis and its influence on the performance of feature selection
methods. The authors propose a procedure to test feature selection
methods. A criterion is proposed to compare the feature selection
methods according to their performance when multicollinearity
is present. The feature selection methods are also compared according
to other well-known evaluation measures. Methods to generate
data sets of different multicollinearity types are proposed. The
authors investigate the performance of feature selection methods. The
feature selection methods are tested on data sets of different
multicollinearity types.
BibTeX:
 
@article{Katrutsa2014TestGeneration, 
  author = {A. M. Katrutsa and V. V. Strijov},
  title = {The multicollinearity problem for feature selection methods in regression},
  journal = {Informational Technologies},
  year = {2015},
  volume = {1},
  pages = {8-18},
  url = {http://strijov.com/papers/Katrutsa2014TestGeneration.pdf}
}
Katrutsa A.M., Strijov V.V. Stresstest procedure for feature selection algorithms // Chemometrics and Intelligent Laboratory Systems, 2015, 142 : 172-183. Article
Abstract: This study investigates the multicollinearity problem and the performance
of feature selection methods when datasets have multicollinear
features. We propose a stresstest procedure for a set of feature
selection methods. This procedure generates test data sets with various
configurations of the target vector and features. A number of
multicollinear features are inserted in every configuration. A feature
selection method returns a set of selected features for a given test
data set. To compare given feature selection methods the procedure
uses several quality measures. A criterion of the selected features
redundancy is proposed. This criterion estimates the number of multicollinear
features among the selected ones. To detect multicollinearity it
uses the eigensystem of the parameter covariance matrix. In computational
experiments we consider the following illustrative methods: Lasso,
ElasticNet, LARS, Ridge and Stepwise and determine the best one,
which solves the multicollinearity problem for every considered configuration
of dataset.
BibTeX:
 
@article{Katrutsa2015Stresstest, 
  author = {Katrutsa, A. M. and Strijov, V. V.},
  title = {Stresstest procedure for feature selection algorithms},
  journal = {Chemometrics and Intelligent Laboratory Systems},
  year = {2015},
  volume = {142},
  pages = {172-183},
  url = {http://strijov.com/papers/Katrutsa2014TestGenerationEn.pdf},
  doi = {10.1016/j.chemolab.2015.01.018}
}
Kuznetsov M.P., Clausel M., Amini M.-R., Gaussier E., Strijov V.V. Supervised topic classification for modeling a hierarchical conference structure // in S. Arik et al. (Eds.): International conference on neural information processing, Part 1, LNCS, 2015, 9489 : 90-97. Article
Abstract: In this paper we investigate the problem of supervised latent modelling
for extracting topic hierarchies from data. The supervised part is
given in the form of expert information over document-topic correspondence.
To exploit the expert information we use a regularization term that
penalizes the difference between a predicted and an expert-given model.
We hence add the regularization term to the log-likelihood function
and use a stochastic EM based algorithm for parameter estimation.
The proposed method is used to construct a topic hierarchy over the
proceedings of the European Conference on Operational Research and
helps to automatize the abstract submission system.
BibTeX:
 
@article{TopicModelsICONIP2015, 
  author = {Kuznetsov, M. P. and Clausel, M. and Amini, M.-R. and Gaussier, E. and Strijov, V. V.},
  title = {Supervised topic classification for modeling a hierarchical conference structure},
  journal = {in S. Arik et al. (Eds.): International conference on neural information processing, Part 1, LNCS},
  year = {2015},
  volume = {9489},
  pages = {90-97},
  url = {http://strijov.com/papers/TopicModelsICONIP2015.pdf},
  doi = {10.1007/978-3-319-26532-2_11}
}
Popova M.S., Strijov V.V. Selection of optimal physical activity classification model using measurements of accelerometer // Informatics and applications, 2015, 9(1) : 76-86. Article Rus
Abstract: In this paper we solve the problem of selecting optimal stable models
for classification of physical activity. Each type of physical activity
of a particular person is described by a set of features generated
from the accelerometer time series. Under feature multicollinearity,
the selection of stable models is hampered by the need to evaluate a
large number of parameters of these models. Evaluation of optimal
parameter values is also difficult due to the fact that the error
function has a large number of local minima in the parameter space.
In the paper we choose the optimal models from the class of two-layer
artificial neural networks. We solve the problem of finding the Pareto
optimal front of the set of models. The paper presents a stepwise
strategy of building optimal stable models. The strategy includes
steps of deleting and adding parameters, criteria for pruning and
growing the model and criteria for stopping the building process.
The computational experiment compares models generated by the proposed
strategy on three quality criteria: complexity, accuracy and stability.
BibTeX:
 
@article{Popova2014OptimalModelSelection, 
  author = {Maria S. Popova and Vadim V. Strijov},
  title = {Selection of optimal physical activity classification model using measurements of accelerometer},
  journal = {Informatics and applications},
  year = {2015},
  volume = {9(1)},
  pages = {76-86},
  url = {http://strijov.com/papers/Popova2014OptimalModelSelection.pdf},
  doi = {10.14357/19922264150107}
}
Popova M.S., Strijov V.V. Building superposition of deep learning neural networks for solving the problem of time series classification // Systems and Means of Informatics, 2015, 25(3) : 60-77. Article Rus
Abstract: This paper solves the problem of time series classification using deep
learning neural networks. The paper proposes to use a multilevel
superposition of models belonging to the following classes of neural
networks: two-layer neural networks, Boltzmann machines and autoencoders.
Lower levels of the superposition extract informative features from noisy
data of high dimensionality, while the upper level of the superposition
solves the problem of classification based on these extracted features.
The proposed model has been tested on two samples of physical activity
time series. The classification results obtained by the proposed model
in the computational experiment were compared with the results
obtained on the same datasets by foreign authors. The study
showed the possibility of using deep learning neural networks for
solving problems of physical activity time series classification.
BibTeX:
 
@article{PopovaStrijov2015DeepLearning, 
  author = {Popova, M. S. and Strijov, V. V.},
  title = {Building superposition of deep learning neural networks for solving the problem of time series classification},
  journal = {Systems and Means of Informatics},
  year = {2015},
  volume = {25(3)},
  pages = {60-77},
  url = {http://strijov.com/papers/PopovaStrijov2015DeepLearning.pdf},
  doi = {10.14357/08696527150304}
}
Rudakov K.V., Sanduleanu L.N., Tokmakova A.A., Yamschikov I.S., Reyer I.A., Strijov V.V. Terrain objects movement detection using SAR interferometry // Computer Research and Modeling, 2015, 7(5) : 1047-1060. Article
Abstract: To determine movements of infrastructure objects on the Earth's surface,
SAR interferometry is used. The method is based on obtaining a series
of detailed satellite images of the same Earth surface area at different
times. Each image consists of the amplitude and phase components.
To determine terrain movements the change of the phase component
is used. A method of persistent scatterers detection and estimation
of relative shift of objects corresponding to persistent scatterers
is suggested.
BibTeX:
 
@article{Sanduleanu2016SAR, 
  author = {Rudakov, K. V. and Sanduleanu, L. N. and Tokmakova, A. A. and Yamschikov, I. S. and Reyer, I. A. and Strijov, V. V.},
  title = {Terrain objects movement detection using SAR interferometry},
  journal = {Computer Research and Modeling},
  year = {2015},
  volume = {7(5)},
  pages = {1047-1060},
  url = {http://strijov.com/papers/Rudakov_crm_2015.pdf},
  doi = {http://crm.ics.org.ru/journal/article/2370/}
}
Stenina M.M., Kuznetsov M.P., Strijov V.V. Ordinal classification using Pareto fronts // Expert Systems with Applications, 2015, 42(14) : 5947-5953. Article
Abstract: We solve an instance ranking problem using ordinal scaled expert estimations.
The experts define a preference binary relation on the set of features.
The instance ranking problem is considered as the monotone multiclass
classification problem. To solve the problem we use a set of Pareto
optimal fronts. The proposed method is illustrated with the problem
of categorization of the IUCN Red List threatened species.
BibTeX:
 
@article{Medvednikova2014POF, 
  author = {Stenina, M. M. and Kuznetsov, M. P. and Strijov, V. V.},
  title = {Ordinal classification using Pareto fronts},
  journal = {Expert Systems with Applications},
  year = {2015},
  volume = {42(14)},
  pages = {5947-5953},
  url = {http://strijov.com/papers/Medvednikova2014POF.pdf},
  doi = {10.1016/j.eswa.2015.03.021}
}
Stenina M.M., Strijov V.V. Forecasts reconciliation for hierarchical time series forecasting problem // Informatics and applications, 2015, 9(2) : 77-89. Article Rus
Abstract: The hierarchical time series forecasting problem is researched. Time
series forecasts must satisfy the physical constraints and the hierarchical
structure. In this paper a new algorithm for hierarchical time series
forecasts reconciliation is proposed. The algorithm is called GTOp
(Game-theoretically Optimal reconciliation). It guarantees that the quality
of the reconciled forecasts is no worse than that of the independent forecasts.
This approach is based on a Nash equilibrium search for the antagonistic
game and turns the forecast reconciliation problem into an optimization
problem with equality and inequality constraints. It is proved that
the Nash equilibrium in pure strategies exists in the game if some
assumptions about the hierarchical structure, the physical constraints
and the loss function are satisfied. The algorithm performance is
demonstrated for different types of hierarchical structures of time
series.
BibTeX:
 
@article{Stenina2014Reconciliation.pdf, 
  author = {Stenina, M. M. and Strijov, V. V.},
  title = {Forecasts reconciliation for hierarchical time series forecasting problem},
  journal = {Informatics and applications},
  year = {2015},
  volume = {9(2)},
  pages = {77-89},
  url = {http://strijov.com/papers/Stenina2014Reconciliation.pdf},
  doi = {10.14357/19922264150209}
}
Strijov V.V., Weber G.W., Weber R., Sureyya O.A. Editorial of the special issue data analysis and intelligent optimization with applications // Machine Learning, 2015, 101(1-3) : 1-4. Article Eng http://link.springer.com/article/10.1007/s10994-015-5523-y
Abstract: This special issue on "Data Analysis and Intelligent Optimization
with Applications" follows a previous special issue of this journal
on the interplay of Machine Learning and Optimization, "Model Selection
and Optimization in ML" (Machine Learning 85:1-2, October 2011).
This time we shift our focus to applications of data analysis and
optimization techniques. Optimization problems underlie most machine
learning approaches. Due to the emergence of new practical applications,
new problems and challenges for traditional approaches arise. Emergent
applications generate new data analysis problems, which, in turn,
boost new research in optimization. The contribution of machine learning
researchers into the field of optimization is of considerable significance
and should not be overlooked. This special issue collected solutions,
adapted for real world problems, leading to massive and large-scale
data sets, online data and imbalanced data. We encouraged submission
of papers, devoted to combining machine learning and data analysis
techniques with advances in optimization to produce methods of Intelligent
Optimization, both theoretical and practical. Our goal for this special
issue was to bring together researchers working in different areas,
related to analytics and optimization.
BibTeX:
 
@article{Strijov2015Editorial, 
  author = {Strijov, V. V. and Weber, G. W. and Weber, R. and Sureyya, O. A.},
  title = {Editorial of the special issue data analysis and intelligent optimization with applications},
  journal = {Machine Learning},
  year = {2015},
  volume = {101(1-3)},
  pages = {1-4},
  url = {http://link.springer.com/content/pdf/10.1007%2Fs10994-015-5523-y.pdf},
  doi = {10.1007/s10994-015-5523-y}
}

2014

Aduenko A.A., Strijov V.V. Joint feature and object selection in multiclass classification of documents // Infocommunication Technologies, 2014, 1 : 47-54. Article Rus
Abstract: The article is dedicated to the problem of search engine results ranking.
The algorithm of multiclass classification with joint selection
of features and objects is proposed. It is modified for interclass
relevance comparison. Features and objects selection is performed
with stepwise regression and with a genetic algorithm. Results obtained
using both algorithms are compared. The proposed multiclass classification
algorithm is tested on synthetic data and on data of Yandex
search engine results.
BibTeX:
 
@article{Aduenko2013Multiclass, 
  author = {A. A. Aduenko and V. V. Strijov},
  title = {Joint feature and object selection in multiclass classification of documents},
  journal = {Infocommunication Technologies},
  year = {2014},
  volume = {1},
  pages = {47-54},
  url = {http://strijov.com/papers/Aduenko2013Multiclass.pdf}
}
Kuzmin A.A., Aduenko A.A., Strijov V.V. Thematic classification using expert model for major conference abstracts // Information Technologies, 2014, 6 : 22-26. Article Rus
Abstract: The aim of this paper is to verify a thematic structure of the conference
abstracts collection. The conference consists of main Areas; each
main Area consists of Streams; each Stream contains Sessions; Session
consists of several talks. This conference structure determines a
thematic model of the conference. Thousands of scientists submit
their abstracts and participate in a major conference, and the
thematic model of such a conference has a multilevel structure.
The program committee constructs an expert thematic model of the
conference every year. Due to the huge number of experts in the program
committee, the problem of thematic integrity verification
occurs. The aim of this paper is to find inconsistencies in the expert
thematic model using a text clustering approach. We consider
an abstracts collection with a given expert model. The base assumption
is that the terms of the abstract determine the theme of this abstract
and its position in the thematic model. We propose a
similarity function of two abstracts and introduce a quality
function, which determines the quality of the thematic model. It
involves the intracluster and intercluster similarities.
The proposed fast non-metric clustering algorithm maximizes this
quality function. To make the constructed model similar to
the given expert model, the algorithm doesn't change the
constructed model if the increase of the quality function
is less than a fixed value of the threshold parameter.
This threshold impacts the number of revealed inconsistencies in
the expert model. The proposed method constructs a thematic model
for the abstracts of EURO 2013.
BibTeX:
 
@article{Kuzmin2014Thematic, 
  author = {A. A. Kuzmin and A. A. Aduenko and V. V. Strijov},
  title = {Thematic classification using expert model for major conference abstracts},
  journal = {Information Technologies},
  year = {2014},
  volume = {6},
  pages = {22-26},
  url = {http://strijov.com/papers/Kuzmin2014Thematic.pdf}
}
Kuznetsov M.P., Strijov V.V. Methods of expert estimations concordance for integral quality estimation // Expert Systems with Applications, 2014, 41(4-2) : 1988-1996. Article
Abstract: The paper presents new methods of alternatives ranking using expert
estimations and measured data. The methods use expert estimations
of object quality and criteria weights. These expert estimations
are changed during the computation. The expert estimations are supposed
to be measured in linear and ordinal scales. Each object is described
by a set of linear, ordinal or nominal criteria. The constructed
object estimations must contradict neither the measured criteria
nor the expert estimations. The paper presents methods of expert
estimations concordance. The expert can correct the result of this concordance.
BibTeX:
 
@article{KuznetsovStrijov2014MethodsExpert, 
  author = {M. P. Kuznetsov and V. V. Strijov},
  title = {Methods of expert estimations concordance for integral quality estimation},
  journal = {Expert Systems with Applications},
  year = {2014},
  volume = {41(4-2)},
  pages = {1988-1996},
  url = {http://strijov.com/papers/Kuznetsov-Strijov2013Concordance.pdf},
  doi = {10.1016/j.eswa.2013.08.095}
}
Motrenko A., Strijov V.V. Obtaining an aggregated forecast of railway freight transportation using Kullback-Leibler distance // Informatics and applications, 2014, 8(2) : 86-97. Article Rus
Abstract: This study addresses the problem of obtaining an aggregated forecast
of railway freight transportation. To improve the quality of the aggregated
forecast, we solve a time series clustering problem, such that
the time series in each cluster belong to the same distribution.
To solve the clustering problem, we need to estimate the distance
between empirical distributions of the time series. We introduce
a two-sample test based on the Kullback-Leibler distance between
histograms of the time series. We provide theoretical and experimental
research of the suggested test. Also, as a demonstration, a clustering
of a set of railway time series based on the Kullback-Leibler distance
between time series is obtained.
BibTeX:
 
@article{Motrenko2014KullbackLeibler, 
  author = {A.P. Motrenko and V. V. Strijov},
  title = {Obtaining an aggregated forecast of railway freight transportation using Kullback-Leibler distance},
  journal = {Informatics and applications},
  year = {2014},
  volume = {8(2)},
  pages = {86-97},
  url = {http://strijov.com/papers/MotrenkoStrijov2014KL.pdf}
}
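The distance that the abstract above relies on can be sketched as follows. This is only an illustration of a Kullback-Leibler distance between histograms of two series, not the two-sample test from the paper; the function names, the number of bins and the smoothing constant `eps` are assumptions introduced here.

```python
import math


def histogram(values, lo, hi, bins):
    """Count values into `bins` equal-width bins on [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins or 1.0  # guard against a zero-width range
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


def kl_between_histograms(x, y, bins=10, eps=1e-9):
    """KL divergence D(P || Q) between histograms of x and y on shared bins.

    The additive smoothing `eps` is an assumption of this sketch; it keeps
    the logarithm finite when a bin is empty in one of the histograms.
    """
    lo = min(min(x), min(y))
    hi = max(max(x), max(y))
    p = histogram(x, lo, hi, bins)
    q = histogram(y, lo, hi, bins)
    p_total = sum(p) + eps * bins
    q_total = sum(q) + eps * bins
    kl = 0.0
    for pc, qc in zip(p, q):
        pi = (pc + eps) / p_total
        qi = (qc + eps) / q_total
        kl += pi * math.log(pi / qi)
    return kl
```

The divergence is zero for identical histograms and grows as the empirical distributions move apart. It is not symmetric, so a clustering procedure would typically symmetrize it or fix the argument order.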
Motrenko A.P., Strijov V.V., Weber G.-W. Bayesian sample size estimation for logistic regression // Journal of Computational and Applied Mathematics, 2014, 255 : 743-752. Article
Abstract: The problem of sample size estimation is important in medical
applications, especially in the cases of expensive measurements of
immune biomarkers. The paper describes the problem of logistic regression
analysis including model feature selection and includes the sample
size determination algorithms, namely methods of univariate statistics,
logistic regression, cross-validation and Bayesian inference. The
authors, treating the regression model parameters as a multivariate
variable, propose to estimate sample size using the distance between
parameter distribution functions on cross-validated data sets.
BibTeX:
 
@article{Motrenko2013Bayesian, 
  author = {Anastasiya P. Motrenko and Vadim V. Strijov and Gerhard-Wilhelm Weber},
  title = {Bayesian sample size estimation for logistic regression},
  journal = {Journal of Computational and Applied Mathematics},
  year = {2014},
  volume = {255},
  pages = {743-752},
  url = {http://strijov.com/papers/MotrenkoStrijovWeber2012SampleSize.pdf},
  doi = {10.1016/j.cam.2013.06.031}
}
Stenina M.M., Strijov V.V. Reconciliation of aggregated and disaggregated time series forecasts in nonparametric forecasting problems // Systems and Means of Informatics, 2014, 24(2) : 21-34. Article Rus
Abstract: Many applications require forecasting a large number of time
series with a hierarchical structure. The forecasts must be reconciled
across the hierarchy. In this paper a new algorithm for reconciling
hierarchical time series forecasts is proposed. This algorithm is
based on solving a constrained optimization problem. The proposed
algorithm makes it possible to reconcile forecasts with a nonplanar
hierarchical structure and to take into account physical constraints
on the forecasted values such as non-negativity or a maximal value.
The algorithm performance is illustrated by railroad station occupancy
data in the Omsk region. The forecast quality is compared with that of
the optimal reconciliation algorithm. The algorithm performance is also
demonstrated for a nonplanar hierarchical structure of time series.
BibTeX:
 
@article{Stenina2014RailRoadsMatching, 
  author = {Stenina, M. M. and Strijov, V. V.},
  title = {Reconciliation of aggregated and disaggregated time series forecasts in nonparametric forecasting problems},
  journal = {Systems and Means of Informatics},
  year = {2014},
  volume = {24(2)},
  pages = {21-34},
  url = {http://strijov.com/papers/Stenina2014RailRoadsMatching.pdf},
  doi = {10.14357/08696527140202}
}
Varfolomeeva A.A., Strijov V.V. An algorithm for bibliographic records parsing using structure learning methods // Information Technologies, 2014, 7 : 11-15. Article Rus
Abstract: The paper solves the applied problem of structured text segmentation:
each segment of a bibliographic record must correspond to
its field type of the BibTeX format, and each record must correspond
to its bibliographic type. This problem arises due to the existence
of different standards for bibliographic records, so an algorithm for
determining the types of fields of bibliographic records, independent
of the specific standards of their composition, is required.
To determine the field type, a method of constructing
the "objects" matrix and the "answers" matrices is
proposed. The authors offer an algorithm for parsing bibliography
lists using the structure regression method; the optimization
problem for the regression model's parameters is also solved. According
to the results of field segmentation, the bibliographic types of the
records are clustered. The quality of the constructed model is investigated
using a collection of non-parsed bibliography lists. The paper
shows that the proposed algorithm achieves good segmentation
and clustering quality given a sufficient training sample.
BibTeX:
 
@article{VarfolomeevaStrijov2013FeatureSelection, 
  author = {Varfolomeeva, A. A. and Strijov, V. V.},
  title = {An algorithm for bibliographic records parsing using structure learning methods},
  journal = {Information Technologies},
  year = {2014},
  volume = {7},
  pages = {11-15},
  url = {http://strijov.com/papers/Varfolomeeva2013StrcLearning.pdf}
}
Aduenko A.A., Strijov V.V. Multimodelling and Object Selection for Banking Credit Scoring // Conference of the International Federation of Operational Research Societies, 2014 : 138. InProceedings
Abstract: To construct a bank credit scoring model one must select a set of
informative objects (client records) to get an unbiased estimate
of the model parameters. This set must have no outliers. The authors
propose an object selection algorithm for a mixture of regression models.
It is based on analysis of the covariance matrix of the parameter
estimates. The computational experiment shows the statistical significance
of the classification quality improvement. The algorithm is illustrated
with the cash loans and heart disease data sets.
BibTeX:
 
@inproceedings{Aduenko2014MultomodelingMulticollinear_IFORS, 
  author = {Alexander A. Aduenko and Vadim V. Strijov},
  title = {Multimodelling and Object Selection for Banking Credit Scoring},
  booktitle = {Conference of the International Federation of Operational Research Societies},
  year = {2014},
  pages = {138},
  url = {http://strijov.com/papers/Aduenko2014MultiModel_IFORS.pdf}
}
Katrutsa A.M., Strijov V.V. Multicollinearity: Performance Analysis of Feature Selection Algorithms // Conference of the International Federation of Operational Research Societies, 2014 : 138. InProceedings
Abstract: We investigate the multicollinearity problem and its influence on
the performance of feature selection methods. The paper proposes
a testing procedure for feature selection methods. We discuss the
criteria for comparing feature selection methods according to their
performance when multicollinearity is present. Feature selection
methods are also compared according to other evaluation measures.
We propose a method of generating test data sets with different
kinds of multicollinearity. The authors draw conclusions about the
performance of feature selection methods when multicollinearity is present.
BibTeX:
 
@inproceedings{Katrutsa2014MultomodelingMulticollinear_IFORS, 
  author = {Alexandr M. Katrutsa and Vadim V. Strijov},
  title = {Multicollinearity: Performance Analysis of Feature Selection Algorithms},
  booktitle = {Conference of the International Federation of Operational Research Societies},
  year = {2014},
  pages = {138},
  url = {http://strijov.com/papers/Katrutsa2014MultiCollinear_IFORS.pdf}
}
Kuzmin A.A., Aduenko A.A., Strijov V.V. Thematic Classification for EURO/IFORS Conference Using Expert Model // Conference of the International Federation of Operational Research Societies, 2014 : 175. InProceedings
Abstract: The decision support system predicts the areas, streams and sessions
for the abstracts of a major conference. Abstract collections from
the previous EURO/IFORS (2010, 2012, 2013) conferences and their
expert thematic models are considered. The terminological dictionary
of the conference and the global thematic model of these collections
are constructed. A similarity function between two abstracts is proposed.
The non-metric hierarchical clustering algorithm which considers
a constructed global thematic model is used to construct the thematic
model of a new conference without an expert model.
BibTeX:
 
@inproceedings{Kuzmin2014Thematic_INFORS, 
  author = {Arsentii A. Kuzmin and Alexander A. Aduenko and Vadim V. Strijov},
  title = {Thematic Classification for EURO/IFORS Conference Using Expert Model},
  booktitle = {Conference of the International Federation of Operational Research Societies},
  year = {2014},
  pages = {175},
  url = {http://strijov.com/papers/Kuzmin2014Thematic_INFORS.pdf}
}
Kuznetsov M.P., Strijov V.V. Partial Orders Combining for the Object Ranking Problem // Conference of the International Federation of Operational Research Societies, 2014 : 157. InProceedings
Abstract: We propose a new method for the ordinal-scaled object ranking problem.
The method is based on combining the partial orders corresponding
to the ordinal features. Every partial order is described with a
positive cone in the object space. We construct the solution of the
object ranking problem as the projection onto a superposition of the
cones. To restrict model complexity and prevent overfitting we reduce
the dimension of the superposition and select the most informative features.
The proposed method is illustrated with the problem of the IUCN Red
List monotonic categorization.
BibTeX:
 
@inproceedings{Kuznetsov2014PartialOrders_IFORS, 
  author = {Mikhail P. Kuznetsov and Vadim V. Strijov},
  title = {Partial Orders Combining for the Object Ranking Problem},
  booktitle = {Conference of the International Federation of Operational Research Societies},
  year = {2014},
  pages = {157},
  url = {http://strijov.com/papers/Kuznetsov2014PartialOrder_IFORS.pdf}
}
Matrosov M., Strijov V.V. Short-Term Forecasting of Musical Compositions Using Chord Sequences // Conference of the International Federation of Operational Research Societies, 2014 : 229. InProceedings
Abstract: The objective is to predict a sequence of chords. It is treated as
a multivariate time series of discrete values. A chord is represented
as an array of half-tone sounds within one octave. We utilize a classifier
based on probability distributions over chord sequences that are
estimated both on a big training set and on some revealed part of the
forecasted melody. It shows robust forecasting on a set of 50,000
MIDI files. The novelty lies in the model selection algorithm and the
invariant representation of chords. The same technique can be used to predict
or synthesize various types of discrete time series.
BibTeX:
 
@inproceedings{Matrosov2014Musical_IFORS, 
  author = {Mikhail Matrosov and Vadim V. Strijov},
  title = {Short-Term Forecasting of Musical Compositions Using Chord Sequences},
  booktitle = {Conference of the International Federation of Operational Research Societies},
  year = {2014},
  pages = {229},
  url = {http://strijov.com/papers/Matrosov2014Musical_IFORS.pdf}
}
Strijov V.V., Kuznetsov M.P., Motrenko A.P. Structure learning and forecasting model generation // Conference of the International Federation of Operational Research Societies, 2014 : 101. InProceedings
Abstract: The aim of the study is to suggest a method to forecast a structure
of a regression model superposition, which approximates a data set
in terms of some quality function. The problem: algorithms of model
selection are computationally complex due to the large number of
models. The solution: we developed a model structure forecasting
algorithm based on previously selected models.
BibTeX:
 
@inproceedings{Strijov2014Structure_IFORS, 
  author = {V. V. Strijov and M. P. Kuznetsov and A. P. Motrenko},
  title = {Structure learning and forecasting model generation},
  booktitle = {Conference of the International Federation of Operational Research Societies},
  year = {2014},
  pages = {101},
  url = {http://strijov.com/papers/Strijov2014StructLearning_IFORS.pdf}
}
Sologub R.A. Algorithms of inductive model generation and transformation for non-linear regression problems (PhD thesis supervised by V.V. Strijov). Russian Academy of Sciences, Computing Center, 2014. PhdThesis Rus
Abstract: The thesis provides a solution to the problem of automatic generation
and validation of quantitative mathematical models. The considered
models are used for describing the results of measurements and experiments.
In the thesis we investigate a fundamental problem of automatic model
generation in the data analysis field. The generated models are
used for approximation, analysis and forecasting of the results of experiments.
To generate a model we consider the expert-given requirements on
the model structure. This consideration allows us to construct
interpretable models that adequately describe the results of measurements.
To construct an adequate model we use expert-given basic functions
and a set of generation rules. The model is represented as a superposition
of the basic functions. The generation rules define the admissibility
of superpositions and exclude the generation of isomorphic models.
We propose to develop the existing methods of automatic model generation.
In particular, we propose to consider expert requirements on the
model structure and to rank the models according to the expert preferences.
The proposed methods of isomorphic superposition search are
based on isomorphic subgraph search and on the substitution
of graphs. We investigate the methods and algorithms of model generation,
their properties, complexity and stability. When solving an applied
problem of mathematical modeling, the existing knowledge and expert
information about the model structure are often insufficient to construct
an efficient model. The lack of independent variables makes the
methods of model and feature generation very promising. The idea
of feature generation is based on the generation of new independent
variables, which are images of the original variables under a set of
successive mappings. These mappings are called the basic functions.
Previously, applied problems were considered in terms of the present approach.
The basic function construction and feature generation approaches
were used for economic and industrial problems. While solving
these problems, the researchers didn't investigate the existence,
completeness and correctness of the proposed algorithm. In the thesis
we develop the theoretical validation of the correctness and admissibility
of the superposition generation methods and of their convergence.
We propose methods for optimizing the model structure. The group
method of data handling, an example of a model generation method,
was considered by A.G. Ivakhnenko. In the case of a linear model the
method generates new features using the multiplication operation.
Using the Kolmogorov-Gabor polynomials, the algorithm generates
models of different complexity according to a set of criteria. As a result,
the method finds the model of optimal complexity described by an
equation or a system of equations. An important stage in the development
of regression models was the consideration of non-linear models. This
approach is widely described by G. Seber: he considered construction
and parameter estimation for non-linear models. To estimate the
parameters, the Levenberg-Marquardt method was proposed. J. Koza
and N. Zelinka proposed a symbolic regression technique for inductive
model generation. The method found an optimal model from the set
of superpositions by genetic programming. Inductive model
generation was used to solve an applied problem of optimal antenna
shape determination. V.V. Strijov developed the ideas of inductive
model generation by using the coherent Bayesian inference for
parameter estimation. When analysing the model structure, the most
convenient representation of a superposition is a tree graph.
Thereby the methods of graph transformation are applied to the superpositions.
These methods allow us to formally describe the structure optimization
procedures. We consider the categorial representation of graph transformations
and the conditions of rule applicability. For tree transformation
we use elementary graph patterns and construct isomorphic
graphs of more complex structure.
BibTeX:
 
@phdthesis{Sologub2014PhDThesis, 
  author = {Sologub, R. A.},
  title = {Algorithms of inductive model generation and transformation for non-linear regression problems (PhD thesis supervised by V.V. Strijov)},
  school = {Russian Academy of Sciences, Computing Center},
  year = {2014},
  url = {http://strijov.com/papers/Sologub2014Disser-0018d.pdf}
}
Strijov V.V. Model generation and selection for regression and classification problems (DSc Thesis). Russian Academy of Sciences, Computing Center, 2014. PhdThesis Rus
Abstract: The thesis is devoted to the problem of model selection for regression
and classification. According to the proposed approach, the models
are selected from an inductively generated set. We propose to analyse
the distribution of model parameters to choose the model of optimal
complexity. There are two ways to construct models describing
observed data: mathematical modelling and data analysis. Models
of the first type can be interpreted by experts in the field
of study [Krasnoshchyokov: 2000]. Models of the second type perform
more efficiently, but don't always have a clear interpretation [Bishop:
2006]. A topical problem of theoretical computer science is to combine
the advantages of the two approaches to obtain efficient interpretable
models. The key issue is to construct adequate regression and
classification models for forecasting problems. The problem is
to find the models of optimal complexity describing the data with
given accuracy. An additional restriction is the interpretability
of the models for an expert in the field of study. The goal of the research
is to propose and investigate methods of model selection from an
inductively generated set. The problem of model selection from a
countable successively generated set is novel. To formulate the problem
setting we used the broad material in the fields of model and feature
selection, which is one of the key problems in the machine learning
and data analysis area. The basic problem of the study is to develop
methods for successive model generation and for parameter
distribution estimation. The estimates of parameter covariance
matrices are used to simplify the model selection procedure.
The key challenge of this problem is parameter estimation for
a large number of structurally complex regression models. The relation
between model generation and selection problems was investigated
by A.G. Ivakhnenko in the early 1980s. According to the proposed
group method of data handling (GMDH) [Ivakhnenko: 1981, Madala: 1994], the
model of optimal structure can be found by the successive generation
of linear models using the Kolmogorov-Gabor polynomial of the independent
variables. The criterion of optimal model structure is given by the
cross-validation procedure. Unlike the GMDH, the symbolic regression
method [Koza: 2005, Zelinka: 2008] generates arbitrary non-linear
superpositions of basic functions. In recent years the problem
of model complexity analysis for symbolic regression has become a significant
field of study [Hazan: 2006, Vladislavleva: 2009]. Initially the
methods of inductive model generation were proposed in terms of the
group method of data handling. The structure of the superposition was
defined by external quality criteria. Afterwards these criteria
were explained in terms of the data generation hypothesis and Bayesian
inference. Solving the problem of successive model generation raises
the problem of estimating the informativity of the superposition elements.
In terms of Bayesian regression [Bishop: 2000], the probability density
of model parameters is used to estimate informativity. The probability
density is a parametric function; its parameters are referred to as hyperparameters
[Bishop: 2006]. Hyperparameter analysis can be regarded as
one of the model selection methods. For the modification of non-linear
model superpositions, the optimal brain damage method was proposed
[LeCun: 1990]. According to this method, an element of the superposition
is regarded as non-informative if its saliency with respect to the error
function doesn't exceed the given threshold. The model selection
problem is one of the key problems of the regression analysis field.
One of the present model selection methods is the minimum description
length (MDL) principle. The MDL principle chooses the efficient model
with the shortest description [Grunwald: 2005]. The problem of model
comparison is investigated in detail by [MacKay: 1994-2003]. As an
alternative to the information criteria [Burnham: 2002, Lehman: 2005],
a coherent Bayesian inference was proposed. Its first level estimates
the model parameters. Its second level adjusts the hyperparameters. According to
this method, a more complex model is less likely to be selected when
the values of the error function are comparable. The principles of the Bayesian
approach in the linear model case were proposed by the authors [Celeux:
2006, Massart: 2008, Fleury: 2006]. At the same time, the mentioned
principles and approaches leave open the questions investigated
in the present thesis. For this reason we propose to create and develop
the theory of regression model generation and selection. The problem
is as follows. The set of models of the given class is inductively
generated from the set of parametric basic functions given by the experts.
Each model is an admissible superposition of the basic functions.
Model interpretability is guaranteed by the expert-given basic
functions, which are the basic elements of the model superposition.
Each class of models is defined by the rules of superposition generation.
The required model accuracy is achieved through the breadth of the
class of basic models. The optimality criterion includes
the concepts of model complexity and accuracy, as well as the data
generation hypothesis. Along with the parameter estimates, the
proposed method estimates the model hyperparameters. Using information
about the hyperparameters, the method estimates the informativity of
the superposition elements and optimizes the superposition structure.
The optimality criterion, given by the data generation hypothesis, allows
one to choose the optimal models. Thus, we propose a new approach to
the formulated problem. The set of models is generated inductively
from the set of basic functions given by the experts. Each model
is considered as an admissible superposition of the basic functions.
Together with parameter estimation we propose to estimate the
hyperparameters of the parameter distribution. Using the parameter
estimates we measure the informativity of the superposition elements
and optimize the model structure. We choose the optimal model according
to the quality criterion given by the data generation hypothesis.
Construction of new methods of model selection for classification
and regression is a major and topical problem of recognition theory.
BibTeX:
 
@phdthesis{Strijov2014DScThesis, 
  author = {Strijov, V. V.},
  title = {Model generation and selection for regression and classification problems (DSc Thesis)},
  school = {Russian Academy of Sciences, Computing Center},
  year = {2014},
  url = {http://strijov.com/papers/Strijov2015ModelSelectionRu.pdf}
}

2013

Aduenko A.A., Strijov V.V. Optimal text placement for titles of documents in collection // Software Engineering, 2013, 3 : 21-25. Article Rus
Abstract: We consider a method for visualizing the results of thematic clustering
of a document collection. The pairwise-distance matrix is projected onto
a plane using PCA. It is required to place the titles of the documents
on the plane. A loss function that allows reaching a minimal overlap
is suggested. The BFGS algorithm is used for its optimisation. The method
suggested in the article is illustrated by visualizing conference
abstracts.
BibTeX:
 
@article{Aduenko2013TextVisualizing, 
  author = {A. A. Aduenko and V. V. Strijov},
  title = {Optimal text placement for titles of documents in collection},
  journal = {Software Engineering},
  year = {2013},
  volume = {3},
  pages = {21-25},
  url = {http://strijov.com/papers/AduenkoStrijov2013TextVisualizing.pdf}
}
Budnikov E.A., Strijov V.V. Estimating probabilities of text strings in document collections // Information Technologies, 2013, 4 : 40-45. Article Rus
Abstract: We consider the problem of estimating the probabilities of strings in
a document. To solve the problem, the n-gram model is used.
N-gram classes are proposed to cope with estimating the large
number of model parameters. Three discount models (Good-Turing, Katz
and absolute discounting) are used to solve the problem of zero probability
of strings. The proposed model is illustrated by computational experiments
on real data.
BibTeX:
 
@article{BudnikovStrijov2013Estimation, 
  author = {E. A. Budnikov and V. V. Strijov},
  title = {Estimating probabilities of text strings in document collections},
  journal = {Information Technologies},
  year = {2013},
  volume = {4},
  pages = {40-45},
  url = {http://strijov.com/papers/BudnikovStrijov2013Estimation.pdf}
}
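One of the discount models named in the abstract above, absolute discounting, can be sketched for bigrams as follows. This is a minimal interpolated variant written for illustration; the function name, the discount value 0.75 and the unigram fallback are assumptions of this sketch, and the formulation used in the paper may differ.

```python
from collections import Counter, defaultdict


def train_bigram_absolute_discounting(tokens, discount=0.75):
    """Return prob(word, prev): a bigram model with absolute discounting.

    A fixed `discount` is subtracted from every seen bigram count, and the
    freed probability mass is redistributed according to unigram frequencies.
    """
    unigrams = Counter(tokens)
    total = sum(unigrams.values())
    bigrams = defaultdict(Counter)
    for prev, word in zip(tokens, tokens[1:]):
        bigrams[prev][word] += 1

    def prob(word, prev):
        p_uni = unigrams[word] / total
        followers = bigrams.get(prev)
        if not followers:
            # Unseen context: fall back to the unigram estimate.
            return p_uni
        context_count = sum(followers.values())
        discounted = max(followers[word] - discount, 0.0) / context_count
        # Mass removed from the seen bigrams of this context.
        backoff_mass = discount * len(followers) / context_count
        return discounted + backoff_mass * p_uni

    return prob
```

With this construction an unseen bigram gets a non-zero probability, while the probabilities of all vocabulary words after a given context still sum to one.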
Ivanova A.V., Aduenko A.A., Strijov V.V. Algorithm of construction logical rules for text segmentation // Software Engineering, 2013, 6 : 41-48. Article Rus
Abstract: We consider a method for recovering the BibTeX structure of bibliographic
records from their text representation. The structure is recovered using
logical rules defined on an expert-given set of regular expressions.
An algorithm based on stub covers is proposed for constructing the logical
rules. The algorithm is illustrated with the problem of searching for
the structure in bibliographic records represented by text strings.
BibTeX:
 
@article{IvanovaAduenkoStrijov2013TextMarkUp, 
  author = {A. V. Ivanova and A. A. Aduenko and V. V. Strijov},
  title = {Algorithm of construction logical rules for text segmentation},
  journal = {Software Engineering},
  year = {2013},
  volume = {6},
  pages = {41-48},
  url = {http://strijov.com/papers/Ivanova2012LogicStructureCor.pdf}
}
Kuzmin A.A., Strijov V.V. Validation of thematic models for document collections // Software Engineering, 2013, 4 : 16-20. Article Rus
Abstract: We consider a collection of documents with an expert thematic model. To
verify the adequacy of the expert model, we build an algorithmic model
by hierarchical clustering of the text collection. The agglomerative and
divisive clustering methods are investigated. The error of the algorithmic
model in comparison to the expert model is estimated. The differences
between the expert model and the algorithmic model are visualized.
BibTeX:
 
@article{Kuzmin2013ThematicClustering, 
  author = {A. A. Kuzmin and V. V. Strijov},
  title = {Validation of thematic models for document collections},
  journal = {Software Engineering},
  year = {2013},
  volume = {4},
  pages = {16-20},
  url = {http://strijov.com/papers/Kuzmin2013ThematicClustering.pdf}
}
Medvednikova M.M., Strijov V.V. Construction of rank-scaled quality integral indicator for scientific publications in using co-clustering // Notices of Tula State University, 2013, 1 : 154-165. Article Rus
Abstract: A method for measuring the quality of scientific publications is proposed.
It connects the quality of a researcher's publication and the quality
of the journal in which the researcher publishes the article. The joint
integral indicator is computed for the list of previous years' publications
using the collaborative filtering algorithm. A proximity function
of the authors' and journals' integral indicators is proposed as the quality
functional. The degree of the researchers' and publishers' integration
into international science is estimated.
BibTeX:
 
@article{Medvednikova2013CoIndicator, 
  author = {M. M. Medvednikova and V. V. Strijov},
  title = {Construction of rank-scaled quality integral indicator for scientific publications in using co-clustering},
  journal = {Notices of Tula State University},
  year = {2013},
  volume = {1},
  pages = {154-165},
  url = {http://strijov.com/papers/Medvednikova2012CoIndicator.pdf}
}
Rudoy G.I., Strijov V.V. Algorithms for inductive generation of superpositions for approximation of experimental data // Informatics and applications, 2013, 7(1) : 17-26. Article Rus
Abstract: The paper presents an algorithm which inductively generates admissible
non-linear models. An algorithm to generate all admissible superpositions
of a given complexity in a finite number of iterations is proposed. The
proof of its correctness is stated. The proposed approach is illustrated
by a computational experiment on synthetic data.
BibTeX:
 
@article{Rudoy2013Generation, 
  author = {Rudoy, Georgiy I. and Strijov, Vadim V.},
  title = {Algorithms for inductive generation of superpositions for approximation of experimental data},
  journal = {Informatics and applications},
  year = {2013},
  volume = {7(1)},
  pages = {17-26},
  url = {http://strijov.com/papers/Rudoy2012Generation_Preprint.pdf}
}
Strijov V.V. Error function in regression analysis // Factory Laboratory, 2013, 79(5) : 65-73. Article Rus
BibTeX:
 
@article{Strijov2013ErrorFunction, 
  author = {Strijov, V. V.},
  title = {Error function in regression analysis},
  journal = {Factory Laboratory},
  year = {2013},
  volume = {79(5)},
  pages = {65-73},
  url = {http://strijov.com/papers/Strijov2012ErrorFn.pdf}
}
Strijov V.V., Krymova E.A., Weber G.W. Evidence optimization for consequently generated models // Mathematical and Computer Modelling, 2013, 57(1-2) : 50-56. Article Rus
Abstract: To construct an adequate regression model one has to fulfill the set
of measured features with their generated derivatives. Often the
number of these features exceeds the number of the samples in the
data set. After a feature generation process the problem of feature
selection from a set of highly correlated features arises. The proposed
algorithm uses an evidence maximization procedure to select a model
as a subset of generated features. During the selection process it
rejects multicollinear features. A problem of European option volatility
modeling illustrates the algorithm. Its performance is compared with
the performances of similar well-known algorithms.
BibTeX:
 
@article{Strijov11Evidence, 
  author = {Strijov, V. V. and Krymova, E. A. and Weber, G. W.},
  title = {Evidence optimization for consequently generated models},
  journal = {Mathematical and Computer Modelling},
  year = {2013},
  volume = {57},
  number = {1-2},
  pages = {50-56},
  url = {http://www.sciencedirect.com/science/article/pii/S0895717711001075},
  doi = {10.1016/j.mcm.2011.02.017}
}
Tsyganova S.V., Strijov V.V. The construction of hierarchical thematic models for document collection // Applied Informatics, 2013, 1 : 109-115. Article
Abstract: This work is devoted to detecting the themes of a document collection and
their hierarchical structure. The main task is to construct a hierarchical
thematic model for a document collection. To solve this task,
probabilistic topic models are used. The main attention is
paid to hierarchical thematic models and, in particular, to
the properties of the PLSA and LDA algorithms. The peculiarity of constructing
a hierarchical model is the transition from the "bag
of words" concept to the "bag of themes" concept. The work is illustrated
on abstracts of the EURO-2012 conference and on synthetic data.
BibTeX:
 
@article{TsyganovaStrijov2013Hierarchical, 
  author = {Tsyganova, S. V. and Strijov, V. V.},
  title = {The construction of hierarchical thematic models for document collection},
  journal = {Applied Informatics},
  year = {2013},
  volume = {1},
  pages = {109-115},
  url = {http://strijov.com/papers/Tsyganova2013TopicHierarchy.pdf}
}
Zaytsev A.A., Strijov V.V., Tokmakova A.A. Estimation regression model hyperparameters using maximum likelihood // Informational Technologies, 2013, 2 : 11-15. Article Rus
Abstract: The paper considers the regression model selection problem. The model
parameters are supposed to be a multivariate random variable with
independently distributed components. A method for hyperparameter
optimization is proposed. A direct way to obtain the hyperparameter
estimates is shown. The paper illustrates the use of the hyperparameters
in the feature selection problem. The suggested method is compared
with the Laplace approximation method.
BibTeX:
 
@article{Zaitsev2012Estimation, 
  author = {A. A. Zaytsev and V. V. Strijov and A. A. Tokmakova},
  title = {Estimation regression model hyperparameters using maximum likelihood},
  journal = {Informational Technologies},
  year = {2013},
  volume = {2},
  pages = {11-15},
  url = {http://strijov.com/papers/ZaytsevStrijovTokmakova2012Likelihood_Preprint.pdf}
}
Aduenko A.A., Kuzmin A.A., Strijov V.V. Hierarchical thematic model visualizing algorithm // 26th European Conference on Operational Research, 2013 : 155. InProceedings
Abstract: The talk is devoted to the problem of thematic hierarchical model
construction. One must construct a hierarchical model of scientific
conference abstracts, check the adequacy of the expert models,
and visualize hierarchical differences between the algorithmic
and expert models. An algorithm for constructing a hierarchical thematic model
is developed. It uses the notion of terminology similarity to construct
the model. The obtained model is visualized as a plane graph.
BibTeX:
 
@inproceedings{KuzminStrijov2013VisualizingEURO, 
  author = {Aduenko, A. A. and Kuzmin, A. A. and Strijov, V. V.},
  title = {Hierarchical thematic model visualizing algorithm},
  booktitle = {26th European Conference on Operational Research},
  year = {2013},
  pages = {155}
}
Kuznetsov M.P., Strijov V.V. The IUCN Red List threatened species categorization algorithm // 26th European Conference on Operational Research, 2013 : 352. InProceedings
Abstract: The main purpose of the IUCN Red List is to categorize those plants
and animals that are facing a high risk of extinction. Species are
classified by the IUCN Red List into nine groups ordered by the
relative risk of extinction in the wild. Each species is described
with the rank-scaled features given by the experts. The problem is
to associate each species with one of the groups according to the
data given by the experts. We consider the rank-scaled features as
the cones in the space of objects and construct the solution as the
nearest point to the superposition of these cones.
BibTeX:
 
@inproceedings{KuznetsovStrijov2013RedListEURO, 
  author = {Kuznetsov, M. P. and Strijov, V. V.},
  title = {The IUCN Red List threatened species categorization algorithm},
  booktitle = {26th European Conference on Operational Research},
  year = {2013},
  pages = {352}
}
Strijov V.V. Credit Scorecard Development: Model Generation and Multimodel Selection // 26th European Conference on Operational Research, 2013 : 220. InProceedings
Abstract: The talk is devoted to the automatic model generation for application
scoring. According to the bank requirements a scorecard consists
of a combination of logistic regression models. We will discuss
the following problems. First, how many models must we generate?
Second, which model from the generated model set should be used to
compute the probability of default for a new client? Third,
what features must be selected for the models? These problems must
be resolved to develop a precise, stable and simple scorecard.
BibTeX:
 
@inproceedings{Strijov2013ScorecardEURO, 
  author = {Strijov, V. V.},
  title = {Credit Scorecard Development: Model Generation and Multimodel Selection},
  booktitle = {26th European Conference on Operational Research},
  year = {2013},
  pages = {220},
  url = {http://strijov.com/papers/Strijov2013EUROscoring.pdf}
}

2012

Aduenko A.A., Kuzmin A.A., Strijov V.V. Feature selection and metrics optimisation for document collection clustering // Notices of Tula State University, 2012, 3 : 119-131. Article Rus
Abstract: This paper deals with the problem of verifying the correctness of
a thematic clustering of texts with the help of metric algorithms.
An algorithm for selecting the optimal distance function for texts
is proposed. The correspondence between the obtained clustering of texts and
their expert classification is studied. The clustering results
and their correspondence to the expert thematic classification are illustrated
in a computational experiment on a real text collection.
BibTeX:
 
@article{AduenkoKuzminStrijov2013Selection, 
  author = {A. A. Aduenko and A. A. Kuzmin and V. V. Strijov},
  title = {Feature selection and metrics optimisation for document collection clustering},
  journal = {Notices of Tula State University},
  year = {2012},
  volume = {3},
  pages = {119-131},
  url = {http://strijov.com/papers/Kuzmin2013ThematicClustering.pdf}
}
Kuznetsov M.P., Strijov V.V., Medvednikova M.M. Multiclass classification of objects with the rank-scale description // Notices on Science and Technology of SPb. PSU, 2012, 5 : 92-95. Article Rus
Abstract: The authors propose a method of integral indicator construction
based on the rank-scaled description matrix given by an expert. The
authors propose a three-step iterative algorithm to estimate correction
parameters and feature weights. The feature selection problem is
investigated. The method is illustrated with the problem of classifying
the statuses of rare species in the Red Book of the Russian Federation.
BibTeX:
 
@article{Kuznetsov2012RankScales, 
  author = {Kuznetsov, M. P. and Strijov, V. V. and Medvednikova, M. M.},
  title = {Multiclass classification of objects with the rank-scale description},
  journal = {Notices on Science and Technology of SPb. PSU},
  year = {2012},
  volume = {5},
  pages = {92-95},
  url = {http://strijov.com/papers/Kuznetsov2012Curvilinear.pdf}
}
Medvednikova M.M., Strijov V.V., Kuznetsov M.P. Algorithm of multiclass monotonous Pareto-classification // Notices of Tula State University, 2012, 3 : 132-141. Article Rus
Abstract: The authors propose a method to search for a monotonous function
defined on the Cartesian product of linearly ordered sets.
The method is based on the procedures of monotonization of a discrete-argument
function and Pareto-optimal front slicing. The feature selection
problem is investigated. The method is illustrated with the problem of
forecasting the statuses of rare species in the Red Book of the Russian Federation.
BibTeX:
 
@article{Medvednikova2012RankScales, 
  author = {Medvednikova, Mariya M. and Strijov, Vadim V. and Kuznetsov, Mikhail P.},
  title = {Algorithm of multiclass monotonous Pareto-classification},
  journal = {Notices of Tula State University},
  year = {2012},
  volume = {3},
  pages = {132-141},
  url = {http://strijov.com/papers/Medvednikova2012RankScales.pdf}
}
Motrenko A.P., Strijov V.V. Multiclass logistic regression for cardio-vascular disease forecasting // Notices of Tula State University, 2012, 1 : 153-162. Article Rus
Abstract: The paper describes an algorithm to classify four groups of patients:
a cardio-vascular disease group, a cardio-risk group and two types
of healthy groups. The blood-cells protein measurements are the description
features for an investigated patient. The paper develops an algorithm
to forecast a patient's cardio-vascular disease case as one of four
unordered classes. The problem is to estimate the regression parameters
and select the most informative features for multi-class classification.
During the forecasting all pairs of the classes are considered.
BibTeX:
 
@article{Motrenko2012CVD, 
  author = {A. P. Motrenko and V. V. Strijov},
  title = {Multiclass logistic regression for cardio-vascular disease forecasting},
  journal = {Notices of Tula State University},
  year = {2012},
  volume = {1},
  pages = {153-162},
  url = {http://strijov.com/papers/MotrenkoStrijov2012HAPrediction.pdf}
}
Sanduleanu L.N., Strijov V.V. Feature selection for autoregressive forecasting // Informational Technologies, 2012, 6 : 11-15. Article Rus
Abstract: The authors investigate the optimal model selection problem with application
to the auto-regression forecasting. To solve the problem one has
to select a maximum well-defined feature subset, subject to some
given value of the error function. To select the feature set the
modified add-del feature selection algorithm is used. This paper
suggests a method of time series forecasting model selection. The
computational experiment compares forecasts of hourly electricity prices.
BibTeX:
 
@article{Sanduleanu2012FeatureSelection_IT, 
  author = {L. N. Sanduleanu and V. V. Strijov},
  title = {Feature selection for autoregressive forecasting},
  journal = {Informational Technologies},
  year = {2012},
  volume = {6},
  pages = {11-15},
  url = {http://strijov.com/papers/SanduleanuStrijov2011FeatureSelection_Preprint.pdf}
}
Strijov V.V., Kuznetsov M.P., Rudakov K.V. Rank-scaled metric clustering of amino-acid sequences // Mathematical Biology and Bioinformatics, 2012, 7(1) : 345-359. Article Rus
Abstract: To solve the problem of the secondary protein structure recognition,
an algorithm for amino-acid subsequences clustering is developed.
To reveal clusters it uses the pairwise distances between the subsequences.
The algorithm does not require the complete pairwise distance matrix, which
is its main distinction and reduces the computational
complexity. To run the clustering, it needs no more than the ranks
of the distances between subsequences. The algorithm is illustrated
using synthetic data along with the amino-acid sequences from the
UniProt database.
BibTeX:
 
@article{Strijov2012Clustering, 
  author = {Strijov, V. V. and Kuznetsov, M. P. and Rudakov, K. V.},
  title = {Rank-scaled metric clustering of amino-acid sequences},
  journal = {Mathematical Biology and Bioinformatics},
  year = {2012},
  volume = {7},
  number = {1},
  pages = {345-359},
  url = {http://strijov.com/papers/Strijov2012(7_345).pdf}
}
Tokmakova A.A., Strijov V.V. Estimation of linear model hyperparameters for noisy or correlated feature selection problem // Informatics and applications, 2012, 6(4) : 66-75. Article Rus
Abstract: This paper deals with the problem of feature selection in linear regression
models. To select features, the authors estimate the covariance matrix
of the model parameters. Dependent variable and model parameters
are assumed to be normally distributed vectors. Laplace approximation
is used for estimation of the covariance matrix: logarithm of the
error function is approximated by the normal distribution function.
The problem of noisy or correlated features is also examined, since
in this case the model parameters covariance matrix becomes singular.
An algorithm for feature selection is proposed. The results of the
study for a time series are given in the computational experiment.
BibTeX:
 
@article{Tokmakova2012Hyper, 
  author = {A. A. Tokmakova and V. V. Strijov},
  title = {Estimation of linear model hyperparameters for noisy or correlated feature selection problem},
  journal = {Informatics and applications},
  year = {2012},
  volume = {6},
  number = {4},
  pages = {66-75},
  url = {http://strijov.com/papers/Tokmakova2011HyperParJournal_Preprint.pdf}
}
Kuznetsov M.P., Strijov V.V. Integral indicator construction using rank-scaled design matrix // Intellectual Information Processing. Conference proceedings, 2012 : 130-132. InProceedings Rus
BibTeX:
 
@inproceedings{Kuznetsov2012IOI, 
  author = {Kuznetsov, M. P. and Strijov, V. V.},
  title = {Integral indicator construction using rank-scaled design matrix},
  booktitle = {Intellectual Information Processing. Conference proceedings},
  year = {2012},
  pages = {130--132},
  url = {http://strijov.com/papers/Kuznetsov2012IOI.pdf}
}
Motrenko A.P., Strijov V.V., Weber G.-W. Bayesian sample size estimation for logistic regression // International Conference on Applied and Computational Mathematics, 2012 : 1-5. InProceedings
Abstract: The paper is devoted to the logistic regression analysis, applied
to classification problems in biomedicine. A group of patients is
investigated as a sample set; each patient is described with a set
of features, named as biomarkers and is classified into two classes.
Since patient measurements are expensive, the problem is to reduce
the number of measured features in order to increase the sample size. The
response variable is assumed to follow a Bernoulli distribution.
Also, parameters of the regression function are evaluated. With the given
set of features, the model is excessively complex. The problem is
to select a set of features of smaller size, that will classify patients
effectively. In logistic regression features are usually selected
by stepwise regression. In the computational experiment, exhaustive
search is implemented. This makes the experts sure that all possible
combinations of the features were considered. The authors use the
area under ROC curve as the optimum criterion in the feature selection
procedure.
BibTeX:
 
@inproceedings{Motrenko2012Bayesian, 
  author = {Anastasiya P. Motrenko and Vadim V. Strijov and Gerhard-Wilhelm Weber},
  title = {Bayesian sample size estimation for logistic regression},
  booktitle = {International Conference on Applied and Computational Mathematics},
  year = {2012},
  pages = {1-5},
  url = {http://strijov.com/papers/MotrenkoStrijovWeber2012SampleSize_ICACM.pdf}
}
Rudoy G.I., Strijov V.V. Simplification of superpositions of primitive functions with graph rule-rewriting // Intellectual Information Processing. Conference proceedings, 2012 : 140-143. InProceedings Rus
Abstract: The paper develops a superposition simplification algorithm for nonlinear
regression. A superposition is represented as a directed acyclic graph.
To simplify it, a subtree of the graph is replaced with an isomorphic one.
BibTeX:
 
@inproceedings{Rudoy2012IOI, 
  author = {Rudoy, G. I. and Strijov, V. V.},
  title = {Simplification of superpositions of primitive functions with graph rule-rewriting},
  booktitle = {Intellectual Information Processing. Conference proceedings},
  year = {2012},
  pages = {140--143},
  url = {http://strijov.com/papers/Rudoy2012IOI.pdf}
}
Strijov V.V. Sequential model selection in forecasting // 25th European Conference on Operational Research, 2012 : 176. InProceedings
Abstract: To forecast financial time series one needs a set of models of optimal
structure and complexity. The mixture model selection procedures
are based on the coherent Bayesian inference. To estimate the model
parameters and covariance matrix, Laplace approximation methods
are introduced. Using the covariance matrix one could split up the
data set to form a mixture of models and select a model with minimum
description length.
BibTeX:
 
@inproceedings{Strijov2012EURO, 
  author = {Vadim V. Strijov},
  title = {Sequential model selection in forecasting},
  booktitle = {25th European Conference on Operational Research},
  year = {2012},
  pages = {176},
  url = {http://strijov.com/papers/Strijov2012EURO.pdf}
}
Tokmakova A.A., Strijov V.V. Estimation of linear model hyperparameters for noise or correlated feature selection problem // Intellectual Information Processing. Conference proceedings, 2012 : 156-159. InProceedings Rus
Abstract: This paper deals with the problem of feature selection in linear
regression models. To select features, the authors estimate the covariance
matrix of the model parameters. The dependent variable and model parameters
are assumed to be normally distributed. The Laplace approximation
is used to estimate the covariance matrix: the logarithm of the error
function is approximated by the normal distribution function. In
the case of noisy and correlated features the covariance matrix becomes
singular. An algorithm for feature selection is proposed.
BibTeX:
 
@inproceedings{Tokmakova2012IOI, 
  author = {Tokmakova, A. A. and Strijov, V. V.},
  title = {Estimation of linear model hyperparameters for noise or correlated feature selection problem},
  booktitle = {Intellectual Information Processing. Conference proceedings},
  year = {2012},
  pages = {156-159},
  url = {http://strijov.com/papers/Tokmakova2012IOI.pdf}
}

2011

Krymova E.A., Strijov V.V. Feature selection algorithms for linear regression models from finite and countable sets // Factory laboratory, 2011, 77(5) : 63-68. Article Rus
BibTeX:
 
@article{krymova11algorithmy_zldm, 
  author = {E. A. Krymova and V. V. Strijov},
  title = {Feature selection algorithms for linear regression models from finite and countable sets},
  journal = {Factory laboratory},
  year = {2011},
  volume = {77},
  number = {5},
  pages = {63-68},
  url = {http://zldm.ru/content/article.php?ID=1155}
}
Strijov V.V. Specification of rank-scaled expert estimation using measured data // Factory laboratory, 2011, 77(7) : 72-78. Article Rus
BibTeX:
 
@article{strijov11utochnenie_zldm, 
  author = {Strijov, V. V.},
  title = {Specification of rank-scaled expert estimation using measured data},
  journal = {Factory laboratory},
  year = {2011},
  volume = {77},
  number = {7},
  pages = {72-78},
  url = {http://zldm.ru/content/article.php?ID=1186}
}
Strijov V.V., Granic G., Juric J., Jelavic B., Maricic S.A. Integral indicator of ecological impact of the Croatian thermal power plants // Energy, 2011, 36(7) : 4144-4149. Article
Abstract: The main goal of this paper is to present the methodology of construction
of the Integral Indicator for the Croatian Thermal Power Plants and
the Combined Heat and Power Plants. The Integral Indicator is intended
to compare the Power Plants according to a certain criterion. The
criterion of the ecological impact is chosen. The following features
of the power plants are used: generated electricity and heat; consumed
coal and liquid fuel; sulphur content in fuel; emitted CO2, SO2,
NOx, and particles. The linear model is used to construct the Integral
Indicator. The model parameters are defined by the Principal Component
Analysis. The constructed Integral Indicator is compared with several
others, such as Pareto-optimal slicing indicator and Metric indicator.
The Integral Indicator keeps as much information about the waste
measures of the power plants as possible; it is simple and robust.
BibTeX:
 
@article{strijov10integral_energy, 
  author = {Vadim V. Strijov and Goran Granic and Jeljko Juric and Branka Jelavic and Sandra Antecevic Maricic},
  title = {Integral indicator of ecological impact of the Croatian thermal power plants},
  journal = {Energy},
  year = {2011},
  volume = {36},
  number = {7},
  pages = {4144-4149},
  url = {http://www.sciencedirect.com/science/article/pii/S0360544211002799},
  doi = {10.1016/j.energy.2011.04.030}
}
Strijov V.V., Krymova E.A. Model selection in linear regression analysis // Informational Technologies, 2011, 10 : 21-26. Article
Abstract: To obtain an adequate regression model one often has to enlarge the
feature set by generating derivative features. So the regression
problem must be reformulated as the problem of feature selection.
Here we assume that the number of features is almost equal to or exceeds
the number of samples in the data set and present a comparative study
of classical and new feature selection algorithms. The study is illustrated
by the problem of European option volatility modelling.
BibTeX:
 
@article{krymova11vybor_it, 
  author = {V. V. Strijov and E. A. Krymova},
  title = {Model selection in linear regression analysis},
  journal = {Informational Technologies},
  year = {2011},
  volume = {10},
  pages = {21-26},
  url = {http://novtex.ru/IT/it2011/number_10_annot.html#5}
}
Kuznetsov M.P., Strijov V.V. Integral Indicators and Expert estimations of Ecological Impact // International Conference on Operations Research, 2011 : 32. InProceedings
Abstract: To compare objects or alternative decisions one must evaluate a quality
of each object. A real-valued scalar, which corresponds to the
object, is called an integral indicator. The integral indicator of
the object is a convolution of the object features. Expert estimations
of one expert or an expert group could be indicators, too. We consider
a problem of indicator construction as follows. There is a set
of objects, which should be compared according to a certain quality
criterion. A set of features describes each object. These two sets
are given together with an "object/feature" matrix of measured data.
We select the linear model of the convolution: the integral indicator
is the linear combination of features and their weights. So, to construct
the integral indicator we must find the weights of the given features.
To do that we use the expert estimates of both indicators and weights
in rank scales. To compute indicators, according to the linear model,
one can use the expert set of weights. In the general case the computed
indicators do not match the expert estimations of indicators. Our
goal is to match the estimated and the computed integral indicators
by maximizing a rank correlation between them. We consider the set
of the estimated indicators and the set of the estimated weights
as two cones in spaces of indicators and weights, respectively. Our
goal is to find the set of weights such that the distance between
this set and the cone of the expert-given weights must be minimum.
Using the found weights we compute the set of integral indicators
such that the distance between this computed set and the cone of
the expert-given integral indicators must be minimum, as well. This
methodology is used for the Clean Development Mechanism project evaluation.
The project partners have to prove that their project can yield emission
reductions in developing countries, which could not be achieved in
the project's absence. The proposed integral indicators are intended
to evaluate the environmental impact of these projects.
BibTeX:
 
@inproceedings{Kuznetsov2011Integral, 
  author = {Michail P. Kuznetsov and Vadim V. Strijov},
  title = {Integral Indicators and Expert estimations of Ecological Impact},
  booktitle = {International Conference on Operations Research},
  year = {2011},
  pages = {32},
  url = {http://strijov.com/papers/Kuznetsov2011OR.pdf}
}
Kuznetsov M.P., Strijov V.V. Monotonic interpolation for the rank-scaled expert estimations specification // Proceedings of Mathematical Methods of Pattern Recognition. MAKS Press, 2011 : 162-165. InProceedings Rus
BibTeX:
 
@inproceedings{Kuznetsov-Strijov2011Oblique_mmro, 
  author = {M. P. Kuznetsov and V. V. Strijov},
  title = {Monotonic interpolation for the rank-scaled expert estimations specification},
  booktitle = {Proceedings of Mathematical Methods of Pattern Recognition},
  publisher = {MAKS Press},
  year = {2011},
  pages = {162-165},
  url = {http://strijov.com/papers/Kuznetsov2011mmro15.pdf}
}
Pavlov K.V., Strijov V.V. Multilevel model selection in the bank credit scoring applications // Proceedings of Mathematical Methods of Pattern Recognition. MAKS Press, 2011 : 158-161. InProceedings Rus
BibTeX:
 
@inproceedings{Pavlov2011Selection, 
  author = {Pavlov, K. V. and Strijov, V. V.},
  title = {Multilevel model selection in the bank credit scoring applications},
  booktitle = {Proceedings of Mathematical Methods of Pattern Recognition},
  publisher = {MAKS Press},
  year = {2011},
  pages = {158-161},
  url = {http://strijov.com/papers/Pavlov2011mmro15.pdf}
}
Strijov V.V. Multilevel model selection using parameters covariance matrix analysis // Proceedings of Mathematical Methods of Pattern Recognition. MAKS Press, 2011 : 154-157. InProceedings Rus
BibTeX:
 
@inproceedings{Strijov11Multimodel_mmro, 
  author = {Strijov, V. V.},
  title = {Multilevel model selection using parameters covariance matrix analysis},
  booktitle = {Proceedings of Mathematical Methods of Pattern Recognition},
  publisher = {MAKS Press},
  year = {2011},
  pages = {154-157},
  url = {http://strijov.com/papers/Strijov2011mmro15.pdf}
}
Strijov V.V. Invariants and model selection in forecasting // International Conference on Operations Research, 2011 : 133. InProceedings
Abstract: Time series in the financial sector may include annual, weekly and
daily periodicities as well as non-periodic events. Examples include the energy price
and consumption volume time series and the time series of consumer sales
volume. The generalized linear autoregressive
models are used to forecast these time series. The samples of the
main time-period of the time series correspond to the features of
the forecasting models. To boost the quality of the forecast, two
problems must be solved. First, we must select a set of features,
which forms the model of optimal quality. Second, we must split the
time series into periodic and event-driven segments and assign a
model of optimal quality to each type of segment. To solve these
problems, we estimate the distribution of the model parameters using
coherent Bayesian inference. The optimal model for a given time-segment
has the most probable value of maximum evidence, which is estimated
under conditions of the stepwise regression: the features are added to
and deleted from the active feature set so as to maximize the evidence.
The splitting procedure includes analysis of the model parameters
distributions. Consider two forecasting models that are defined on
their non-intersecting consecutive time-segments. These models are
different if the Kullback-Leibler distance between the distributions
of their parameters is statistically significant. In this case the
time-segment split is fixed; otherwise we consider the models equal
and join the time-segments. The proposed approach yields more
precise time-segment splitting than the dynamic time warping procedure
and increases the forecasting quality. As an illustration
we discuss the automatic detection of seasonal sales and promotions
of consumer goods.
BibTeX:
 
@inproceedings{Strijov2011Invariants_OR, 
  author = {Vadim V. Strijov},
  title = {Invariants and model selection in forecasting},
  booktitle = {International Conference on Operations Research},
  year = {2011},
  pages = {133},
  url = {http://strijov.com/papers/Strijov2011OR.pdf}
}

2010

Strijov V.V., Weber G.W. Nonlinear regression model generation using hyperparameter optimization // Computers and Mathematics with Applications, 2010, 60(4) : 981-988. Article
Abstract: An algorithm of the inductive model generation and model selection
is proposed to solve the problem of automatic construction of regression
models. A regression model is an admissible superposition of smooth
functions given by experts. Coherent Bayesian inference is used to
estimate model parameters. It introduces hyperparameters which describe
the distribution function of the model parameters. The hyperparameters
control the model generation process.
BibTeX:
 
@article{Strijov2010981, 
  author = {Strijov, V. V. and Weber, G. W.},
  title = {Nonlinear regression model generation using hyperparameter optimization},
  journal = {Computers and Mathematics with Applications},
  year = {2010},
  volume = {60},
  number = {4},
  pages = {981-988},
  note = {PCO' 2010 - Gold Coast, Australia 2-4th December 2010, 3rd Global Conference on Power Control Optimization},
  url = {http://www.sciencedirect.com/science/article/B6TYJ-4YX65PS-1/2/471789368d98fd837f293565dbfc0bbb},
  doi = {10.1016/j.camwa.2010.03.021}
}
Strijov V.V. Methods of regression model selection. Moscow, Computing Center RAS, 2010 : 60. Book Rus
Abstract: Problems of regression analysis can be posed as follows. First,
a regression model and a data generation hypothesis are given. The
data generation hypothesis is the distribution function of the random
variable as well as assumptions about properties of the random variable.
This problem is the optimization problem of the model parameters.
Second, a class of regression models (linear models, radial basis
functions, etc.) is given together with a data generation hypothesis.
This problem is the problem of model selection. Third, a class of
models and a class of data generation hypotheses are given (for example
the exponential family of distributions). To solve this problem one
must use residual analysis.
BibTeX:
 
@book{strijov2010methody_ccas, 
  author = {Vadim V. Strijov},
  title = {Methods of regression model selection},
  publisher = {Moscow, Computing Center RAS},
  year = {2010},
  pages = {60},
  url = {http://www.machinelearning.ru/wiki/images/5/52/Strijov-Krymova10Model-Selection.pdf}
}
Krymova E.A., Strijov V.V. Model selection and multicollinearity analysis // Proceedings of conference on Intelligent data processing, 2010 : 153-156. InProceedings Rus
BibTeX:
 
@inproceedings{krymova10vybor_ioi, 
  author = {Krymova, E. A. and Strijov, V. V.},
  title = {Model selection and multicollinearity analysis},
  booktitle = {Proceedings of conference on Intelligent data processing},
  year = {2010},
  pages = {153-156},
  url = {http://strijov.com/papers/Krymova2010Select_IOI.pdf}
}
Skipor K.S., Strijov V.V. Least angle logistic regression // Proceedings of conference on Intelligent data processing, 2010 : 180-183. InProceedings Rus
BibTeX:
 
@inproceedings{skipor10method_ioi, 
  author = {Skipor, K. S. and Strijov, V. V.},
  title = {Least angle logistic regression},
  booktitle = {Proceedings of conference on Intelligent data processing},
  year = {2010},
  pages = {180-183},
  url = {http://strijov.com/papers/Skipor2010-iip-8.pdf}
}
Strijov V.V. Evidence of successively generated models // International Conference on Operations Research "Mastering Complexity", 2010 : 223. InProceedings
Abstract: Let us investigate an algorithm of regression model construction.
The constructed model will be used to solve problems of the Financial
Sector: it might be a scoring model, an energy consumption forecast
model or European option volatility smile model. We suppose that
given historical data are not sufficient to discover hidden dependencies
in an investigated problem. So we propose the following approach
to the model construction. Together with historical data we use an expert-given
set of primitive functions. It is recommended to collect functions
which are already widely used to model the investigated problem. Then
we assign a generating function, which will be used to generate the
set of the competitive models. We estimate evidence of the models
using coherent Bayesian inference and select a model of the best
structure. Since generating functions make a countable set of models,
we organize an iterative generation-selection procedure. Each cycle
of the procedure includes the following steps. First, we modify competitive
models so that the structural distance between an original and a
derivative model will be as small as possible. Second, we estimate
parameters and hyperparameters of the derivative model to cut off
some model modifications at the following steps and reduce the algorithm
complexity. Third, we analyze the evidence of the derivative model
to find the probability that it becomes a model of the optimal structure.
Also, we analyze some restrictions applied to the model structure
and robustness of the model. As the result we obtain a model that is interpretable
from the expert's point of view; it fits historical data well and
is robust. Some additional tests are applied to verify the resulting model:
cross-validation and retrospective forecasting to ensure quality
of the further use.
BibTeX:
 
@inproceedings{strijov10evidence_or, 
  author = {Vadim V. Strijov},
  title = {Evidence of successively generated models},
  booktitle = {International Conference on Operations Research "Mastering Complexity"},
  year = {2010},
  pages = {223},
  url = {http://strijov.com/papers/strijov2010OR.pdf}
}
Strijov V.V. Model generation and model selection in credit scoring // 24th European Conference on Operations Research, 2010 : 220. InProceedings
Abstract: The credit scorecard is the logistic regression model; it maps the
feature space to the probability of default of a banking client.
A classical scorecard is constructed by an analyst, who manually
selects informative features and creates combinations of them. We
propose a new technique for the automatic scorecard construction.
To develop a scorecard, one must assign a set of primitive functions
and model generation rules. The result model is an admissible superposition
of the primitive functions and features. The coherent Bayesian inference
is used to select features and their superpositions.
BibTeX:
 
@inproceedings{strijov10model_euro, 
  author = {Vadim V. Strijov},
  title = {Model generation and model selection in credit scoring},
  booktitle = {24th European Conference on Operations Research},
  year = {2010},
  pages = {220},
  url = {http://strijov.com/papers/strijov10ModelGen_EURO.pdf}
}
Strijov V.V., Krymova E.A., Gerhard W.W. Evidence Optimization for Consequently Generated Models // Proceedings of the fourth global conference on power control and optimization, 2010, 1337 : 204-208. InProceedings
Abstract: We address the problem of segmenting nearly periodic time series into
period-like segments. We introduce a definition of nearly periodic
time series via triplets (basic shape, shape transformation, time
scaling) that covers a wide range of time series. To split the time
series into periods we select a pair of principal components of the
Hankel matrix. We then cut the trajectory of the selected principal
components by its symmetry axis, thus obtaining half-periods that
are merged into segments. We describe a method of automatic selection
of periodic pairs of principal components, corresponding to the fundamental
periodicity. We demonstrate the application of the proposed method
to the problem of period extraction for accelerometric time series
of human gait. We see the automatic segmentation into periods as
a problem of major importance for human activity recognition problem,
since it allows to obtain interpretable segments: each extracted
period can be seen as an ultimate entity of gait. The method we propose
is more general compared to the application specific methods and
can be used for any nearly periodical time series. We compare its
performance to classical mathematical methods of period extraction
and find that it is not only comparable to the alternatives, but
in some cases performs better. Index Terms: sensor signal processing,
nearly periodic time series, time series segmentation, period extraction,
principal components analysis.
BibTeX:
 
@inproceedings{Strijov2011Evidence_AIP, 
  author = {Strijov, V. V. and Krymova, E. A. and Gerhard, W. W.},
  editor = {Nader Barsoum and Jeffrey Frank Webb and Pandian Vasant},
  title = {Evidence Optimization for Consequently Generated Models},
  booktitle = {Proceedings of the fourth global conference on power control and optimization},
  year = {2010},
  volume = {1337},
  pages = {204-208},
  url = {http://strijov.com/papers/strijov-weber2010PCO-3.pdf},
  doi = {10.1063/1.3592467}
}

2009

Strijov V.V., Sologub R.A. The inductive generation of the volatility smile models // Journal of Computational Technologies, 2009, 14(5) : 102-113. Article
Abstract: Volatility of the European-type options depends on their strike and
maturity. The authors suppose that the volatility smile models are based not
only on expert knowledge, but also on data. The model generation algorithm
was proposed. It generates volatility models of the optimal structure
inductively using implied volatility data and expert considerations.
The models satisfy expert assessments. The Brent Crude Oil option
was considered as an example.
BibTeX:
 
@article{strijov09jct, 
  author = {Strijov, V. V. and Sologub, R. A.},
  title = {The inductive generation of the volatility smile models},
  journal = {Journal of Computational Technologies},
  year = {2009},
  volume = {14},
  number = {5},
  pages = {102-113},
  url = {http://strijov.com/papers/Strijov09JCT5.pdf}
}
Krymova E.A., Strijov V.V. Comparison of the heuristic algorithms for linear regression model selection // Mathematical methods for pattern recognition. Conference proceedings. MAKS Press, 2009 : 145-148. InProceedings
BibTeX:
 
@inproceedings{krymova09mmro, 
  author = {Krymova, E. A. and Strijov, V. V.},
  title = {Comparison of the heuristic algorithms for linear regression model selection},
  booktitle = {Mathematical methods for pattern recognition. Conference proceedings},
  publisher = {MAKS Press},
  year = {2009},
  pages = {145-148},
  url = {http://strijov.com/papers/strijov09MM1_MMRO-14.pdf}
}
Melnikov D.I., Strijov V.V., Anderrva E.Y., Edenharter G. Selection of support object set for robust integral indicator construction // Mathematical methods for pattern recognition. Conference proceedings. MAKS Press, 2009 : 159-162. InProceedings
BibTeX:
 
@inproceedings{melnikov09mmro, 
  author = {Melnikov, D. I. and Strijov, V. V. and Anderrva, E. Yu. and Edenharter, G.},
  title = {Selection of support object set for robust integral indicator construction},
  booktitle = {Mathematical methods for pattern recognition. Conference proceedings},
  publisher = {MAKS Press},
  year = {2009},
  pages = {159-162},
  url = {http://strijov.com/papers/strijov09MM2_MMRO-14.pdf}
}
Strijov A.V., Strijov V.V. Specification of the rank-scaled expert estimations // Mathematics. Computer. Education. Conference Proceedings, 2009 : 41. InProceedings
Abstract: The algorithm of the integral indicators construction is described.
It uses rank-scaled expert estimations and an object-feature data
matrix. The expert estimations are specified according to the data
and additional expert preferences. To construct integral indicators,
linear regression methods are involved. The suggested algorithm is
compared with the algorithm of linear-scaled expert estimations concordance.
BibTeX:
 
@inproceedings{strizhov09mce, 
  author = {Strijov, A. V. and Strijov, V. V.},
  title = {Specification of the rank-scaled expert estimations},
  booktitle = {Mathematics. Computer. Education. Conference Proceedings},
  year = {2009},
  pages = {41},
  url = {http://strijov.com/papers/strizhov09mce.pdf}
}
Strijov V.V. Model selection using inductively generated set // European Conference on Operational Research EURO-23, 2009 : 114. InProceedings
Abstract: Model selection is one of the most important subjects of Machine learning.
An algorithm of model selection depends on the class of models and
on the investigated problems. In the lecture the problems of regression
analysis will be observed. Linear as well as nonlinear regression
models will be considered. The models are supposed to be inductively
generated during the selection process. Properties of Lars, Optimal
brain surgery and Bayesian coherent inference algorithms will be
analyzed in the light of model selection.
BibTeX:
 
@inproceedings{strijov09EURO, 
  author = {Strijov, V. V.},
  title = {Model selection using inductively generated set},
  booktitle = {European Conference on Operational Research EURO-23},
  year = {2009},
  pages = {114},
  url = {http://strijov.com/papers/strijov2009EURO23.pdf}
}
Strijov V.V. Model generation and model selection // Mathematics. Computer. Education. Conference Proceedings, 2009. InProceedings
BibTeX:
 
@inproceedings{strijov09mce, 
  author = {Strijov, V. V.},
  title = {Model generation and model selection},
  booktitle = {Mathematics. Computer. Education. Conference Proceedings},
  year = {2009},
  url = {http://strijov.com/papers/strijov09mce.pdf}
}
Strijov V.V. The Inductive Algorithms of Model Generation // SIAM Conference on Computational Science and Engineering, 2009. InProceedings
Abstract: One of the important problems in scientific data mining is the problem
of regression modeling. To make a regression model using measured
data a researcher examines a set of competitive models and chooses
a model of the best quality. Due to the nature of the experiments
non-linear models are common in biological simulations. Symbolic
regression allows dealing with large sets of non-linear models. In
the lecture inductive algorithms for model creation and selection
will be discussed.
BibTeX:
 
@inproceedings{strijov09SIAMcse09, 
  author = {Strijov, V. V.},
  title = {The Inductive Algorithms of Model Generation},
  booktitle = {SIAM Conference on Computational Science and Engineering},
  year = {2009},
  url = {http://strijov.com/papers/strijov09_SIAM_cse09.pdf}
}
Strijov V.V., Granic G., Juric Z., Jelavic B., Maricic S. Integral Indicator of Ecological Footprint for Croatian Power Plants // HED Energy Forum "Quo Vadis Energija in Times of Climate Change", 2009 : 46. InProceedings
Abstract: The main goal of this paper is to present the methodology of construction
of the Integral Indicator for Croatian Power Plants. The Integral
Indicator is necessary to compare Power Plants selected according
to a certain criterion. Herewith the criterion of the Ecological
Footprint was chosen. TPP and CHP Power Plants were selected. The
following features were used: generated electricity and heat; consumed
coal and liquid fuel; sulphur content in fuel; emitted CO2, SO2,
NOx and particles. To construct the Integral Indicator a linear
model was used. The model was tuned by Principal Component Analysis
algorithm. The constructed Integral Indicator was compared with several
others, such as Pareto-Optimal Slicing Indicator and Metric Indicator.
The Integral Indicator keeps as much information about features of
the Power Plants as possible; it is simple and robust.
BibTeX:
 
@inproceedings{strijov09HED, 
  author = {Strijov, V. V. and Granic, G. and Juric, Z. and Jelavic, B. and Maricic, S.A.},
  title = {Integral Indicator of Ecological Footprint for Croatian Power Plants},
  booktitle = {HED Energy Forum "Quo Vadis Energija in Times of Climate Change"},
  year = {2009},
  pages = {46},
  url = {http://strijov.com/papers/IndicatorOfEcoFootprintForCroatianPPs09HED_EIHP.pdf}
}
Strijov V.V., Krymova E.A. Algorithms of linear model generation // Mathematics. Computer. Education. Conference Proceedings, 2009. InProceedings
BibTeX:
 
@inproceedings{krymova09mce, 
  author = {Strijov, V. V. and Krymova, E. A.},
  title = {Algorithms of linear model generation},
  booktitle = {Mathematics. Computer. Education. Conference Proceedings},
  year = {2009},
  url = {http://strijov.com/papers/krymova09mce.pdf}
}
Strijov V.V., Sologub R.A. Generation of the implied volatility models // Mathematics. Computer. Education. Conference Proceedings, 2009. InProceedings
BibTeX:
 
@inproceedings{sologub09mce, 
  author = {Strijov, V. V. and Sologub, R. A.},
  title = {Generation of the implied volatility models},
  booktitle = {Mathematics. Computer. Education. Conference Proceedings},
  year = {2009},
  url = {http://strijov.com/papers/sologub09mce.pdf}
}
Strijov V.V., Sologub R.A. Algorithm of nonlinear regression model selection by analysis of hyperparameters // Mathematical methods for pattern recognition. Conference proceedings. MAKS Press, 2009 : 184-187. InProceedings
BibTeX:
 
@inproceedings{strijov09mmro, 
  author = {Strijov, V. V. and Sologub, R. A.},
  title = {Algorithm of nonlinear regression model selection by analysis of hyperparameters},
  booktitle = {Mathematical methods for pattern recognition. Conference proceedings},
  publisher = {MAKS Press},
  year = {2009},
  pages = {184-187},
  url = {http://strijov.com/papers/strijov09MM3_MMRO-14.pdf}
}

2008

Strijov V.V. The methods for the inductive generation of regression models. Moscow, Computing Center RAS, 2008. Book Rus
BibTeX:
 
@book{strijov08ln, 
  author = {Strijov, V. V.},
  title = {The methods for the inductive generation of regression models},
  publisher = {Moscow, Computing Center RAS},
  year = {2008},
  url = {http://strijov.com/papers/strijov08ln.pdf}
}
Bray D., Strijov V.V. Using immune markers for classification of the CVD patients // Intellectual Data Analysis: Abstracts of the International Scientific Conference, 2008 : 49-50. InProceedings
Abstract: The goal of the investigation is to find an algorithm that successfully
separates different groups of patients with Cardio-Vascular Disease.
The algorithm must select the most informative features: the markers,
which bring the minimal number of the misclassified patients. Four
groups of the CVD-patients are considered: A1 (surgery performed),
A3 (risk group) and B1, B2 (healthy groups). Each group contained
up to 15 patients. Each patient is described with 20 immune markers.
Since the number of the patients in the sample is relatively small,
the number of the informative markers must not exceed a few to avoid
overtraining. The algorithm must process pairs of the classes.
BibTeX:
 
@inproceedings{bray08ioi, 
  author = {Bray, D. and Strijov, V. V.},
  title = {Using immune markers for classification of the CVD patients},
  booktitle = {Intellectual Data Analysis: Abstracts of the International Scientific Conference},
  year = {2008},
  pages = {49-50},
  url = {http://strijov.com/papers/bray08ioi.pdf}
}
Gushchin A.V., Strijov V.V. An algorithm on the expert estimations objectification with measured data // Intellectual Data Analysis: the International Scientific Conference, 2008 : 78-79. InProceedings Rus
BibTeX:
 
@inproceedings{gushchin08ioi, 
  author = {Gushchin, A. V. and Strijov, V. V.},
  title = {An algorithm on the expert estimations objectification with measured data},
  booktitle = {Intellectual Data Analysis: the International Scientific Conference},
  year = {2008},
  pages = {78-79},
  url = {http://strijov.com/papers/gushchin08ioi.pdf}
}
Sologub R.A., Strijov V.V. The inductive construction of the volatility regression models // Intellectual Data Analysis: the International Scientific Conference Proceedings, 2008 : 215-216. InProceedings Rus
BibTeX:
 
@inproceedings{sologub08ioi, 
  author = {Sologub, R. A. and Strijov, V. V.},
  title = {The inductive construction of the volatility regression models},
  booktitle = {Intellectual Data Analysis: the International Scientific Conference Proceedings},
  year = {2008},
  pages = {215-216},
  url = {http://strijov.com/papers/sologub08ioi.pdf}
}
Strijov V.V. On the inductive model generation // Intellectual Data Analysis: Abstracts of the International Scientific Conference, 2008 : 220. InProceedings
Abstract: This talk is devoted to the problem of the automatic model creation
in regression analysis. The models are intended for dynamic systems
behavior analysis. The theory and the practice of the inductively-generated
models will be examined.
BibTeX:
 
@inproceedings{strijov08ioi, 
  author = {Strijov, V. V.},
  title = {On the inductive model generation},
  booktitle = {Intellectual Data Analysis: Abstracts of the International Scientific Conference},
  year = {2008},
  pages = {220},
  url = {http://strijov.com/papers/strijov08ioi.pdf}
}
Strijov V.V. Clusterization of multidimensional time-series using dynamic time warping // Mathematics. Computer. Education. Conference Proceedings, 2008 : 28. InProceedings
BibTeX:
 
@inproceedings{strijov08macoed, 
  author = {Strijov, V. V.},
  title = {Clusterization of multidimensional time-series using dynamic time warping},
  booktitle = {Mathematics. Computer. Education. Conference Proceedings},
  year = {2008},
  pages = {28}
}
Strijov V.V. Estimation of hyperparameters on parametric regression model generation // 9th International Conference on Pattern Recognition and Image Analysis: New Information Technologies, 2008, 2 : 178-181. InProceedings
Abstract: The problem of the non-linear regression analysis is considered. The
algorithm of the inductive model generation is described. The regression
model is a superposition of given smooth functions. To estimate
the model parameters two-level Bayesian Inference technique was used.
It introduces hyperparameters, which describe the distribution
function of the model parameters.
BibTeX:
 
@inproceedings{strijov08roai, 
  author = {Strijov, V. V.},
  title = {Estimation of hyperparameters on parametric regression model generation},
  booktitle = {9th International Conference on Pattern Recognition and Image Analysis: New Information Technologies},
  year = {2008},
  volume = {2},
  pages = {178-181},
  url = {http://strijov.com/papers/strijov08roai_source.pdf}
}
Strijov V.V., Sologub R.A. The inductive generation of the volatility smile models // SIAM Conference on Financial Mathematics and Engineering 2008, 2008 : 21. InProceedings
Abstract: Volatility of the European-type options depends on their strike and
maturity. The authors suppose that the volatility smile models are based not
only on the expert knowledge, but also on the measured data. The model
generation algorithm was proposed. It generates volatility models
of the optimal structure inductively using implied volatility data
and expert considerations. The models satisfy expert assessments.
The Brent Crude Oil option was considered as an example.
BibTeX:
 
@inproceedings{sologub08finance, 
  author = {Strijov, V. V. and Sologub, R. A.},
  title = {The inductive generation of the volatility smile models},
  booktitle = {SIAM Conference on Financial Mathematics and Engineering 2008},
  year = {2008},
  pages = {21},
  url = {http://strijov.com/papers/sologub08finance_eng.pdf}
}
Vorontsov K.V., Inyakin A.S., Lisitsa A., Strijov V.V., Khachay M.Y., Chekhovich Y.V. Proof-ground for classification algorithms: the distributed computing system // Intellectual Data Analysis: the International Scientific Conference, 2008 : 54-56. InProceedings
BibTeX:
 
@inproceedings{vorontsov08polygon, 
  author = {Vorontsov, K. V. and Inyakin, A. S. and Lisitsa, A. and Strijov, V. V. and Khachay, M. Yu. and Chekhovich, Yu. V.},
  title = {Proof-ground for classification algorithms: the distributed computing system},
  booktitle = {Intellectual Data Analysis: the International Scientific Conference},
  year = {2008},
  pages = {54-56}
}
Vorontsov K.V., Inyakin A.S., Strijov V.V., Chekhovich Y.V. MachineLearning.ru: a site, devoted to problems of pattern recognition, forecasting and classification // Intellectual Data Analysis: the International Scientific Conference, 2008 : 56-58. InProceedings
BibTeX:
 
@inproceedings{vorontsov08ml, 
  author = {Vorontsov, K. V. and Inyakin, A. S. and Strijov, V. V. and Chekhovich, Yu. V.},
  title = {MachineLearning.ru: a site, devoted to problems of pattern recognition, forecasting and classification},
  booktitle = {Intellectual Data Analysis: the International Scientific Conference},
  year = {2008},
  pages = {56-58}
}

2007

Strijov V.V. The search for a parametric regression model in an inductive-generated set // Journal of Computational Technologies, 2007, 1 : 93-102. Article
Abstract: The procedure of the search for a regression model is described. The
model set is a set of superpositions of smooth functions. The model
parameters estimations are used in the search. A model of pressure
in a spray chamber of a combustion engine illustrates the approach.
In this paper one of the important parts of the proposed project
is described.
BibTeX:
 
@article{strijov07jct, 
  author = {Strijov, V. V.},
  title = {The search for a parametric regression model in an inductive-generated set},
  journal = {Journal of Computational Technologies},
  year = {2007},
  volume = {1},
  pages = {93-102},
  url = {http://strijov.com/papers/strijov06poisk_jct_en.pdf}
}
Strijov V.V., Kazakova T.V. Stable indices and the choice of a support description set // Zavodskaya Laboratoriya, 2007, 7 : 72-76. Article Rus
Abstract: This paper describes an integral indicator construction algorithm.
The integral indicator is a linear combination of object features.
The features are linear-scaled. Outliers among the objects are supposed.
The problem of the stable integral indicators construction is posed
and solved. To construct the stable integral indicator, a specially defined
subset of objects is selected. An unsupervised algorithm is used
to make the integral indicator. The proposed algorithm is used to construct
an integral indicator of the foodstuff pollution level in Russian
regions.
BibTeX:
 
@article{strijov07stable, 
  author = {Strijov, V. V. and Kazakova, T. V.},
  title = {Stable indices and the choice of a support description set},
  journal = {Zavodskaya Laboratoriya},
  year = {2007},
  volume = {7},
  pages = {72-76},
  url = {http://strijov.com/papers/stable_idx4zavlab_after_recenz.pdf}
}
Strijov V.V., Ptashko G.O. Algorithms of the optimal regression model selection. Computing Center of the Russian Academy of Sciences, 2007 : 56. Book Rus
Abstract: A model is defined by a superposition of the smooth functions. The
probability density functions of the model parameters are used. The
parameters are estimated with non-linear optimization methods. A
problem of the diesel engine pressure modelling presents an application
of the method. The parametric and non-parametric approaches to model
generation are examined. The prototype of the proposed software is
described.
BibTeX:
 
@book{strijov06occam, 
  author = {Strijov, V. V. and Ptashko, G. O.},
  title = {Algorithms of the optimal regression model selection},
  publisher = {Computing Center of the Russian Academy of Sciences},
  year = {2007},
  pages = {56},
  url = {http://strijov.com/papers/occam.pdf}
}
Ivakhnenko A.A., Kanevskiy D.Y., Rudeva A.V., Strijov V.V. How to compare marked time-series // Proc. Mathematical Methods of Pattern Recognition, 2007 : 134-137. InProceedings Rus
Abstract: The multi-model regression markup method was described. The markups
were used for classification of financial time series.
BibTeX:
 
@inproceedings{strijov07timeseries, 
  author = {Ivakhnenko, A. A. and Kanevskiy, D. Yu. and Rudeva, A. V. and Strijov, V. V.},
  title = {How to compare marked time-series},
  booktitle = {Proc. Mathematical Methods of Pattern Recognition},
  year = {2007},
  pages = {134-137},
  url = {http://strijov.com/papers/strijov_MM_AS_4.pdf}
}
Strijov V.V., Kazakova T.V. The rank-scaled expert estimations concordance // Proc. Mathematical Methods of Pattern Recognition, 2007 : 209-211. InProceedings Rus
Abstract: Regression models with restrictions defined by experts were described.
The new method of multivariate regression modelling was proposed.
BibTeX:
 
@inproceedings{strijov07object, 
  author = {Strijov, V. V. and Kazakova, T. V.},
  title = {The rank-scaled expert estimations concordance},
  booktitle = {Proc. Mathematical Methods of Pattern Recognition},
  year = {2007},
  pages = {209-211},
  url = {http://strijov.com/papers/strijov_MM_2.pdf}
}
Strijov V.V., Ptashko G.O. The invariants of time series and dynamic time warping // Proc. Mathematical Methods of Pattern Recognition, 2007 : 212-214. InProceedings Rus
Abstract: Two methods of the regression models usage were compared: the direct
regression model and the approximation of the Minimum Cost Path in
the Dynamic Time Warping.
BibTeX:
 
@inproceedings{strijov07invariants, 
  author = {Strijov, V. V. and Ptashko, G. O.},
  title = {The invariants of time series and dynamic time warping},
  booktitle = {Proc. Mathematical Methods of Pattern Recognition},
  year = {2007},
  pages = {212-214},
  url = {http://strijov.com/papers/strijov_MM_1.pdf}
}

2006

Kazakova T.V., Strijov V.V. The robust indicators with normalising functions selection // Artificial intelligence, 2006, 1 : 160-163. Article Rus
Abstract: The problem of the stable integral indicators is considered. The objects
are linear-scaled. To construct a stable integral indicator one has
to choose a subset such that the objects in the set bring the maximal
value to the criterion of stability. A method of the feature selection
according to the regression model robustness was introduced.
BibTeX:
 
@article{strijov06AIidx, 
  author = {Kazakova, T. V. and Strijov, V. V.},
  title = {The robust indicators with normalising functions selection},
  journal = {Artificial intelligence},
  year = {2006},
  volume = {1},
  pages = {160-163},
  url = {http://strijov.com/papers/strijov06AIidx.pdf}
}
Strijov V.V. The search for regression models in an inductive-generated set // Artificial intelligence, 2006, 2 : 234-237. Article Rus
Abstract: The usage of Bayesian inference for the inductive-generated models
was described. The algorithm of the arbitrary superpositions of the
regression models was introduced. The algorithm uses hyperparameters
to estimate the importance of model elements.
BibTeX:
 
@article{strijov06AI, 
  author = {Strijov, V. V.},
  title = {The search for regression models in an inductive-generated set},
  journal = {Artificial intelligence},
  year = {2006},
  volume = {2},
  pages = {234-237},
  url = {http://strijov.com/papers/strijov06AI.pdf}
}
Strijov V.V. Specification of expert estimations using measured data // Factory Laboratory, 2006, 72(7) : 59-64. Article Rus
Abstract: To construct stable integral indicators we will use expert estimations
of object features. The indicators are linear combinations of the
features. Their values are corrected with the expert estimations.
A new method of multivariate regression is described. The model parameters
are specified by expert estimations.
BibTeX:
 
@article{strijov06utochnenie_zldm, 
  author = {Strijov, V. V.},
  title = {Specification of expert estimations using measured data},
  journal = {Factory Laboratory},
  year = {2006},
  volume = {72},
  number = {7},
  pages = {59-64},
  url = {http://strijov.com/papers/strijov06precise.pdf}
}
Strijov V.V. Vsevolod Vladimirovich Shakin // Mathematics. Computer. Education. Conference Proceedings. Regular and chaotic dynamics, 2006, 1 : 5-16. InCollection Rus
BibTeX:
 
@incollection{strijov06shakin, 
  author = {Strijov, V. V.},
  editor = {Riznichenko, G. Yu.},
  title = {Vsevolod Vladimirovich Shakin},
  booktitle = {Mathematics. Computer. Education. Conference Proceedings},
  publisher = {Regular and chaotic dynamics},
  year = {2006},
  volume = {1},
  pages = {5-16},
  url = {http://strijov.com/papers/VsevolodShakin06paper.pdf}
}
Kazakova T.V., Strijov V.V. The robust indicators with normalising functions selection // International Scientific Conference on Artificial Intelligence, 2006 : 199. InProceedings Rus
BibTeX:
 
@inproceedings{kazakova06ioi, 
  author = {Kazakova, T. V. and Strijov, V. V.},
  title = {The robust indicators with normalising functions selection},
  booktitle = {International Scientific Conference on Artificial Intelligence},
  year = {2006},
  pages = {199},
  url = {http://strijov.com/papers/strijov_kazakova2006ioi.pdf}
}
Strijov V.V. Indices construction using linear and ordinal expert estimations // Citizens and Governance for Sustainable Development, 2006 : 49. InProceedings
Abstract: Indices are necessary to compare objects united in a set according
to a certain criterion. For example, the objects are national protected
areas or power plants. An index is a number that corresponds
to an object. In this research an algorithm for construction of quality
indices using expert estimations is developed. Consider an indices
construction problem. A set of comparable objects and a set of features
are given together with an "object-feature" matrix of measured data.
Expert estimations of indices and estimations of feature importance
are given. A model of indices computation is chosen. In the general
case the computed indices don't coincide with the expert estimates
of the indices. The computed importance weights don't coincide with
the expert estimations of importance weights, either. One has to compute
indices, which are based on measured data with the condition: the
indices must not contradict given expert estimations. Two approaches
to the problem were suggested. The first one is the unsupervised
indices construction. It finds the model parameters that provide
the maximal value of a self-descriptiveness criterion. The second
approach is the supervised indices construction. The model parameters
are set so that they provide the minimal value of the distance between
the computed indices and their expert estimations. Now the third
approach is proposed. According to this approach the experts can
resolve the contradiction between expert estimations of indices,
importance weights and measured data. To this end, a hyperparameter
is embedded in the model. Its value corresponds to the importance of either
the indices or the feature weights.
BibTeX:
 
@inproceedings{strijo06sigsud, 
  author = {Strijov, V. V.},
  title = {Indices construction using linear and ordinal expert estimations},
  booktitle = {Citizens and Governance for Sustainable Development},
  year = {2006},
  pages = {49},
  url = {http://strijov.com/papers/strijo06Abstract_SIGSUD_RuEng.pdf}
}
Strijov V.V. The search for regression models in a set of smooth functions // Mathematics. Computer. Education. Conference Proceedings, 2006. InProceedings Rus
BibTeX:
 
@inproceedings{strijov06mce, 
  author = {Strijov, V. V.},
  title = {The search for regression models in a set of smooth functions},
  booktitle = {Mathematics. Computer. Education. Conference Proceedings},
  year = {2006},
  url = {http://strijov.com/papers/strijov06mce.pdf}
}
Strijov V.V. The search for regression models in an inductive-generated set // International Scientific Conference on Artificial Intelligence, 2006 : 198. InProceedings Rus
BibTeX:
 
@inproceedings{strijov2006ioi, 
  author = {Strijov, V. V.},
  title = {The search for regression models in an inductive-generated set},
  booktitle = {International Scientific Conference on Artificial Intelligence},
  year = {2006},
  pages = {198},
  url = {http://strijov.com/papers/strijov2006ioi.pdf}
}
Strijov V.V., Kazakova T.V. Robust indicators and selection of support objects // Multivariate statistical analysis applications in economics and quality assessment. VIII-th International Conference, 2006. InProceedings Rus
BibTeX:
 
@inproceedings{strijovkazakova06CEMI, 
  author = {Strijov, V. V. and Kazakova, T. V.},
  title = {Robust indicators and selection of support objects},
  booktitle = {Multivariate statistical analysis applications in economics and quality assessment. VIII-th International Conference},
  year = {2006},
  url = {http://strijov.com/papers/strijovkazakova06CEMI.pdf}
}

2005

Kazakova T.V., Strijov V.V. Stable integral indices // Proc. Mathematical Methods of Pattern Recognition, 2005 : 206. InProceedings Rus
BibTeX:
 
@inproceedings{kazakova05mmro, 
  author = {Kazakova, T. V. and Strijov, V. V.},
  title = {Stable integral indices},
  booktitle = {Proc. Mathematical Methods of Pattern Recognition},
  year = {2005},
  pages = {206},
  url = {http://strijov.com/papers/kazakova05mmro.pdf}
}
Ptashko G.O., Strijov V.V. The distance function choice for the phase trajectories comparison // Proc. Mathematical Methods of Pattern Recognition, 2005 : 116-119. InProceedings Rus
Abstract: A method for comparing regression models is examined.
BibTeX:
 
@inproceedings{ptashko05mmro, 
  author = {Ptashko, G. O. and Strijov, V. V.},
  title = {The distance function choice for the phase trajectories comparison},
  booktitle = {Proc. Mathematical Methods of Pattern Recognition},
  year = {2005},
  pages = {116-119},
  url = {http://strijov.com/papers/ptashko05mmro.pdf}
}
Ptashko G.O., Strijov V.V., Shakin V.V. Specification of ordinal expert estimations // Mathematics. Computer. Education. Conference Proceedings, 2005. InProceedings Rus
BibTeX:
 
@inproceedings{ptashko05macoed, 
  author = {Ptashko, G. O. and Strijov, V. V. and Shakin, V. V.},
  title = {Specification of ordinal expert estimations},
  booktitle = {Mathematics. Computer. Education. Conference Proceedings},
  year = {2005},
  url = {http://strijov.com/papers/macoed05_2.pdf}
}
Strijov V.V. How to select a nonlinear regression model of optimal complexity? // Proc. Mathematical Methods of Pattern Recognition, 2005 : 190-191. InProceedings Rus
Abstract: A model of optimal complexity was chosen from a set of several thousand
inductively generated models. Bayesian inference was used.
BibTeX:
 
@inproceedings{strijov05mmro, 
  author = {Strijov, V. V.},
  title = {How to select a nonlinear regression model of optimal complexity?},
  booktitle = {Proc. Mathematical Methods of Pattern Recognition},
  year = {2005},
  pages = {190-191},
  url = {http://strijov.com/papers/strijov05mmro.pdf}
}
Strijov V.V., Shakin V.V. Selection of optimal regression model // Mathematics. Computer. Education. Conference Proceedings, 2005. InProceedings Rus
BibTeX:
 
@inproceedings{strijov05macoed, 
  author = {Strijov, V. V. and Shakin, V. V.},
  title = {Selection of optimal regression model},
  booktitle = {Mathematics. Computer. Education. Conference Proceedings},
  year = {2005},
  url = {http://strijov.com/papers/macoed05_2.pdf}
}

2003

Strijov V.V., Shakin V.V. Index construction: the expert-statistical method // Environmental research, engineering and management, 2003, 26(4) : 51-55. Article
Abstract: This paper deals with index construction and presents a new technique
that involves expert estimations of object indices as well as feature
significance weights. An index is calculated as a linear combination
of the object's features. Unsupervised methods of index construction
are reviewed and compared with the new method. Experts can estimate
the index and verify the results. The results are precise, valid indices
and well-reasoned expert estimations. This technique was used in various
economic, sociological, and ecological applications. This paper
introduces a method of multivariate regression model construction.
Here an integral indicator is a regression model with applied restrictions.
BibTeX:
 
@article{strijov03index, 
  author = {Strijov, V. V. and Shakin, V. V.},
  title = {Index construction: the expert-statistical method},
  journal = {Environmental research, engineering and management},
  year = {2003},
  volume = {26},
  number = {4},
  pages = {51-55},
  note = {ISSN 1392-1649},
  url = {http://strijov.com/papers/10-v_strijov.pdf}
}
Strijov V.V., Shakin V.V. Forecast and control with autoregressive models // Proc. Mathematical Methods of Pattern Recognition conference, 2003 : 178-181. InProceedings Rus
Abstract: An autoregressive model is treated as a model of the behavior of
a dynamic system. The system state can be controlled using the inverse
regression model. The authors use time series to verify the models.
BibTeX:
 
@inproceedings{strijov03prognoz, 
  author = {Strijov, V. V. and Shakin, V. V.},
  title = {Forecast and control with autoregressive models},
  booktitle = {Proc. Mathematical Methods of Pattern Recognition conference},
  year = {2003},
  pages = {178-181},
  url = {http://strijov.com/papers/mmro11.pdf}
}
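The idea of forecasting with an autoregressive model and controlling the state through its inverse can be sketched as follows. This is a minimal illustration, not the authors' method. It assumes a first-order model with one control input, x[t+1] = a*x[t] + b*u[t], fitted by least squares. The function names, coefficients and series are hypothetical.

```python
# Illustrative sketch only: the first-order model with a control input
# and all numbers below are assumptions, not taken from the paper.

def fit_arx(x, u):
    """Least-squares fit of x[t+1] = a*x[t] + b*u[t]; returns (a, b)."""
    past, future = x[:-1], x[1:]
    s11 = sum(xt * xt for xt in past)
    s12 = sum(xt * ut for xt, ut in zip(past, u))
    s22 = sum(ut * ut for ut in u)
    b1 = sum(xt * xn for xt, xn in zip(past, future))
    b2 = sum(ut * xn for ut, xn in zip(u, future))
    det = s11 * s22 - s12 * s12
    # Cramer's rule for the 2x2 normal equations.
    return (b1 * s22 - b2 * s12) / det, (s11 * b2 - s12 * b1) / det

def forecast(a, b, x_now, u_now):
    """One-step forecast of the state."""
    return a * x_now + b * u_now

def control_input(a, b, x_now, x_target):
    """Invert the fitted model: the input that moves x_now to x_target in one step."""
    return (x_target - a * x_now) / b

# Simulate a noise-free series with hypothetical true coefficients.
true_a, true_b = 0.8, 0.5
u = [1.0, 0.0, -1.0, 2.0, 0.5]
x = [1.0]
for ut in u:
    x.append(true_a * x[-1] + true_b * ut)

a, b = fit_arx(x, u)
u_next = control_input(a, b, x[-1], x_target=2.0)
print("fitted a, b:", a, b)
print("input needed to reach 2.0:", u_next)
print("forecast under that input:", forecast(a, b, x[-1], u_next))
```

The inverse step only works when the fitted input coefficient b is non-zero, that is, when the input actually influences the state.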
Strijov V.V., Shakin V.V. Index construction: the expert-statistical method // Proc. Conference on Sustainability Indicators and Intelligent Decisions, 2003 : 56-57. InProceedings
Abstract: There are many ways to construct indices. However, once algorithms
are chosen and some results are obtained, the following question arises:
how can the adequacy of the calculated indices be shown? To answer this
question, analysts invite experts. The experts express their opinions, and
then a second question arises: how can the expert estimations be shown to
be valid?
BibTeX:
 
@inproceedings{strijov03siid, 
  author = {Strijov, V. V. and Shakin, V. V.},
  title = {Index construction: the expert-statistical method},
  booktitle = {Proc. Conference on Sustainability Indicators and Intelligent Decisions},
  year = {2003},
  pages = {56-57},
  url = {http://strijov.com/papers/siid03.pdf}
}
Aivazian S.A., Strijov V.V., Shakin V.V. On a problem of macroeconomics management. Computing Center RAS. Computing Center of the Russian Academy of Sciences, 2003. TechReport Rus
Abstract: This paper considers an application of autoregressive models.
The models are used to control a macroeconomic system so that the
system reaches a given state. The quality of the control is defined
as an integral indicator.
BibTeX:
 
@techreport{aivazian03macro, 
  author = {Aivazian, S. A. and Strijov, V. V. and Shakin, V. V.},
  title = {On a problem of macroeconomics management},
  institution = {Computing Center of the Russian Academy of Sciences},
  year = {2003},
  url = {http://strijov.com/papers/macro1.pdf}
}