Project management of NTIS P1 Cybernetic Systems and Department of Cybernetics | WiKKY

Task #3935

closed

Task #3680: RA4a - Automatic error prediction

Task #3698: Experiment with one-class classification for join cost enhancements

Add classifier scripts to SVN

Added by Tihelka Dan almost 8 years ago. Updated almost 8 years ago.

Status:
Closed
Priority:
Normal
Start date:
31.05.2016
Due date:
05.06.2016
% Done:

0%

Estimated time:

Description

Please add the OCC classifier scripts (anomaly_train.py, anomaly_eval.py, and other related stuff) to SVN, so that the following modifications can start:
  • output data in a structured format (JSON)
  • avoid using pickled data as input (pickles are hard to tune across different SciKit versions)
Actions #1

Updated by Tihelka Dan almost 8 years ago

  • Status changed from New to Resolved

Several modifications of the scripts were carried out. They are described here, together with a few warnings:

JSON output format
All data (except the logs) are now stored in JSON files instead of text files (for the -r/--cv-report output) or a pickled classifier pipe (for the odet argument). Nothing needs special attention, except that old trained classifiers can no longer be read by the new sources.
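
To illustrate why JSON is easier to carry across library versions than a pickle, here is a minimal sketch that stores plain parameters rather than a serialized object. The dictionary layout and the helper names save_model/load_model are assumptions made for illustration only; the actual schema written by anomaly_train.py is not shown in this ticket.

```python
import json

# Hypothetical description of a trained model. The keys and values are
# illustrative only and do not reflect the real output of anomaly_train.py.
EXAMPLE_MODEL = {
    "classifier": {"type": "OneClassSVM", "params": {"nu": 0.05, "gamma": 0.1}},
    "scaler": {"type": "StandardScaler", "mean": [0.0, 1.5], "scale": [1.0, 2.0]},
}


def save_model(path, model):
    """Write the model description as human-readable JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(model, handle, indent=2, sort_keys=True)


def load_model(path):
    """Read the model description back as plain dicts, lists and numbers."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Because only plain numbers and strings are stored, such a file can be inspected in a text editor and read without importing the library that produced it.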

No pickled OCC objects
One of the most significant changes is that the (initialized) classifier, scaler and grid-search parameters are no longer read from a pickled object. Instead, the Python scripts that create these objects are passed as the 2nd and 3rd command-line arguments (the 1st is the data provider):

./anomaly_train.py data_getter.py one_class_svm.py std_scaler.py one_class_svm.trained.json ...
./anomaly_eval.py  data_getter.py one_class_svm.trained.json ...
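
How a script passed on the command line can be turned into an object is sketched below, using only the standard importlib machinery. This is a sketch under assumptions: the helper load_module, the factory name create and the returned dictionary are hypothetical, and the real scripts may use a different mechanism and different names.

```python
import importlib.util
import os
import tempfile


def load_module(path):
    """Import a Python source file given by its path and return the module."""
    name = os.path.splitext(os.path.basename(path))[0]
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Stand-in for a classifier script such as one_class_svm.py. The factory name
# `create` and the dict it returns are hypothetical.
EXAMPLE_SOURCE = '''
def create():
    return {"kind": "one-class-svm", "params": {"nu": 0.05}}
'''


def demo():
    """Write the stand-in script to a temporary file, load it, call its factory."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "one_class_svm.py")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(EXAMPLE_SOURCE)
        return load_module(path).create()
```

With this approach the object is always built by the currently installed library version, which avoids the version mismatch that pickled objects suffer from.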

Backward compatibility is preserved, though: the scripts occ_pickled.py and scaler_pickled.py allow the old pickled data to be read. To use them, call the training as:

./anomaly_train.py data_getter.py scaler_pickled.py occ_pickled.py one_class_svm.trained.json -S std-scaler.p -I one-class-svm_init.p -g one-class-svm_grid.p ...

No pickled data objects
The second significant change is that the data are also read through a Python module instead of from pickled numpy arrays. The module is given as the 1st command-line parameter to both the ./anomaly_train.py and ./anomaly_eval.py scripts, and it must provide an object implementing the interface defined in the data_source.py module.
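
The idea of a data provider implementing a common interface could look roughly like the sketch below. Everything here is an assumption: the class names DataSource and InMemorySource and the methods normal and anomalous are invented for illustration, since the real interface in data_source.py is not reproduced in this ticket.

```python
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Hypothetical stand-in for the interface defined in data_source.py."""

    @abstractmethod
    def normal(self):
        """Return the feature vectors of the normal (non-anomalous) examples."""

    @abstractmethod
    def anomalous(self):
        """Return the feature vectors of the anomalous examples."""


class InMemorySource(DataSource):
    """Trivial provider that serves data already held in memory."""

    def __init__(self, normal, anomalous):
        self._normal = list(normal)
        self._anomalous = list(anomalous)

    def normal(self):
        return self._normal

    def anomalous(self):
        return self._anomalous
```

A provider that forgets one of the required methods cannot be instantiated, so a mismatch with the interface shows up as soon as the object is created rather than in the middle of training.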

There is a backward compatibility script, data_pickled.py, which allows the old pickled data to be read. To use it, call the training and evaluation as:

./anomaly_train.py data_pickled.py ... -X0 train.p -X1 train.anomaly.p -v train.cvsplit.p
./anomaly_eval.py  data_pickled.py ... -x0 eval.p -x1 eval.anomaly.p

Note that the evaluation data use the -x0/-x1 command-line switches instead of the (capital) -X0/-X1 switches that were originally used by ./anomaly_eval.py. This is the important thing to keep in mind; otherwise the evaluation will use the wrong data (those used for training)!
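
The case-sensitive switches can be pictured with a small argparse sketch. The switch names come from the commands above, but the parser itself and the destination names (train_normal, eval_normal, and so on) are assumptions; how data_pickled.py really declares its options is not shown here.

```python
import argparse


def build_parser():
    """Build a parser where upper- and lower-case switches are distinct options."""
    parser = argparse.ArgumentParser(description="sketch of the data switches")
    # Training data (capital X); destination names are hypothetical.
    parser.add_argument("-X0", dest="train_normal")
    parser.add_argument("-X1", dest="train_anomaly")
    # Evaluation data (lower-case x); destination names are hypothetical.
    parser.add_argument("-x0", dest="eval_normal")
    parser.add_argument("-x1", dest="eval_anomaly")
    return parser


# Evaluation call: only the lower-case switches are given.
args = build_parser().parse_args(["-x0", "eval.p", "-x1", "eval.anomaly.p"])
```

In this sketch the training destinations stay unset when only -x0/-x1 are passed, which is the kind of separation that prevents evaluation from silently reusing training data.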

Each script (whether it provides the data, the scaler or the classifier) can define its own options to configure the particular module.

Actions #2

Updated by Matoušek Jindřich almost 8 years ago

  • Parent task changed from #3811 to #3698
Actions #3

Updated by Tihelka Dan almost 8 years ago

  • Status changed from Resolved to Closed