In a nutshell
This project addresses scalability issues in the training phase of machine learning algorithms; so far, the method has been tested on SVM classifiers. Although this aim is well researched, the methods in this work draw on a mix of mathematical fields, including nearest-neighbor methods, discrete optimization, and numerical methods. Two methods are proposed. The first, called the graph shedding (GS) heuristic, essentially preserves the serial correctness of the original training. The second, called the graph clubbing (GC) heuristic, introduces a distributed, approximate learning scheme. Both methods are compared against the state-of-the-art shrinking heuristic of LIBSVM. The figure shows the four pre-processing steps common to both methods.
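To give a flavor of the nearest-neighbor pre-processing idea, the sketch below reduces a training set to points near the class boundary before SVM training: a point is kept only if its k nearest neighbors contain an opposite-class point. This is an illustrative toy, not the paper's GS heuristic; the function name and the k-NN keep rule are our own assumptions.

```python
import numpy as np

def shed_interior_points(X, y, k=5):
    """Keep only points whose k nearest neighbours include at least one
    opposite-class point (i.e. points near the class boundary).
    Interior points, which rarely become support vectors, are shed.
    NOTE: illustrative sketch only -- not the paper's GS heuristic."""
    n = len(X)
    # pairwise squared Euclidean distances; a point is not its own neighbour
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    keep = []
    for i in range(n):
        nbrs = np.argpartition(d2[i], k)[:k]   # indices of k nearest points
        if np.any(y[nbrs] != y[i]):            # boundary-adjacent point
            keep.append(i)
    return X[keep], y[keep]

# two overlapping Gaussian blobs as a toy binary training set
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
Xr, yr = shed_interior_points(X, y, k=5)
print(len(Xr), "of", len(X), "points kept")
```

The reduced set (Xr, yr) could then be handed to any SVM trainer; the win comes from the quadratic-or-worse training cost now applying to far fewer points.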
Resources related to the article (under submission to the Journal of Big Data) are shared next. However, a few synthetic datasets used in the study are only available from us via email request.
Findings
Promising results are obtained for both methods, as well as for the network design that accompanied the distributed implementation of the GC heuristic.
Sub-plot a shows the scaling benefit, in number of training points, of the graph shedding heuristic ('GS heuristic' plot) over LIBSVM ('LIBSVM shrinking heuristic' plot). The closeness of the prediction accuracies (sub-plot b) supports the 'serial correctness' claim of the GS heuristic.
This figure follows the semantics of the first figure, but for the GC heuristic. A clear scaling hierarchy of GC > GS > LIBSVM heuristics can be observed in sub-plot a. The approximate scheme also shows close accuracies for training sets above roughly 8k points.
A communication-free (coarse) distribution scheme is demonstrated. Exploiting worker threads/processes yields additional performance benefits (sub-plot a). The coarse distribution of the training set is shown in sub-plot b.
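The coarse scheme can be sketched as follows: split the training set into disjoint chunks, let each worker train an independent model on its chunk with no messages exchanged during training, and combine predictions by majority vote. The per-chunk "model" here is a toy nearest-centroid classifier standing in for the actual GC learner; all names are our own.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def train_chunk(chunk):
    """Train an independent model on one worker's chunk only
    (communication-free). Toy model: one centroid per class."""
    X, y = chunk
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(models, x):
    """Combine the independently trained models by majority vote."""
    votes = [min(m, key=lambda c: np.linalg.norm(x - m[c])) for m in models]
    return max(set(votes), key=votes.count)

# toy binary training set: two Gaussian blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (120, 2)), rng.normal(2, 1, (120, 2))])
y = np.array([0] * 120 + [1] * 120)

# coarse, disjoint partition of the training set into 4 chunks
idx = rng.permutation(len(X))
chunks = [(X[i], y[i]) for i in np.array_split(idx, 4)]

# each worker trains on its chunk alone -- no inter-worker messages
with ThreadPoolExecutor(max_workers=4) as pool:
    models = list(pool.map(train_chunk, chunks))

print(predict(models, np.array([-2.0, -2.0])))  # falls in the class-0 blob
```

Because workers never communicate during training, the scheme scales trivially with the number of workers; the price is that each model sees only a fraction of the data.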
Since the message types were known in advance for the distributed implementation of the GC heuristic, we designed the network optimized for that need. One such aspect of the network was the message protocols (right sub-figure). Protocol 1, which involved message marshalling, was superior (left sub-figure). This optimization is analogous to dynamic programming.
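When message types are fixed and known up front, marshalling lets many small messages travel as one buffer: a batch is packed into a single byte string (one network send) and unpacked on the receiving side. The record layout and field names below are hypothetical, not the paper's actual wire protocol.

```python
import struct

# Hypothetical fixed wire format: a count header followed by
# fixed-size records of (msg_type, node_id, weight).
HEADER = struct.Struct("!I")      # number of records in the batch
RECORD = struct.Struct("!BIf")    # msg_type: u8, node_id: u32, weight: f32

def marshal(messages):
    """Pack a batch of known-format messages into ONE buffer,
    so the sender makes a single send() call instead of one per message."""
    buf = bytearray(HEADER.pack(len(messages)))
    for msg_type, node_id, weight in messages:
        buf += RECORD.pack(msg_type, node_id, weight)
    return bytes(buf)

def unmarshal(buf):
    """Recover the batch of messages from a marshalled buffer."""
    (count,) = HEADER.unpack_from(buf, 0)
    off = HEADER.size
    out = []
    for _ in range(count):
        out.append(RECORD.unpack_from(buf, off))
        off += RECORD.size
    return out

msgs = [(1, 42, 0.5), (2, 7, 1.25)]
wire = marshal(msgs)
print(len(wire), "bytes on the wire for", len(msgs), "messages")
```

One buffer amortizes the fixed per-message network overhead across the whole batch, which is where the measured advantage of a marshalling protocol would come from.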