learningOrchestra is designed for data scientists from both engineering and academia backgrounds.
## Quick-start
learningOrchestra provides two options to access its features: a REST API and a Python package.
REST API: We recommend using a GUI REST API caller like [Postman](https://www.postman.com/product/api-client/) or [Insomnia](https://insomnia.rest/).
Python package:
- Install the Python 3 package with `pip install learning-orchestra-client`
- Start your scripts by importing the package and providing the IP address of one of the instances of your cluster:

```
from learning_orchestra_client import *

cluster_ip = "xx.xx.xxx.xxx"
Context(cluster_ip)
```
- Check the [package documentation](https://github.com/learningOrchestra/pythonClient) for more details.
## How do I install learningOrchestra?
:bell: This documentation assumes that the users are familiar with a number of advanced computer science concepts. We have tried to link to learning resources to support beginners, as well as introduce some of the concepts in the [FAQ](#frequently-asked-questions). But if something is still not clear, don't hesitate to [ask for help](#on-using-learningOrchestra).
We provide documentation explaining how to deploy this software; you can read more in the [installation docs](https://learningorchestra.github.io/docs/installation/).
##### Interrupt learningOrchestra

Run `docker stack rm microservice`.

learningOrchestra is organised into interoperable [microservices](#what-are-microservices). They offer access to third-party libraries, frameworks and software to **gather data**, **clean data**, **train machine learning models**, **tune machine learning models**, **evaluate machine learning models** and **visualize data and results**.
The current version of learningOrchestra offers 11 services:

- **Dataset** - Responsible for obtaining a dataset. External datasets are stored on MongoDB or on volumes using a Uniform Resource Locator (URL). There is also an alternative to load existing TensorFlow datasets.
- **Model** - Responsible for loading supervised or unsupervised models from existing repositories. It is useful to configure a TensorFlow or Scikit-learn object with a tuned and pre-trained neural network built on Google or Facebook best practices and large instances, for example. On the other hand, it is also useful to load a customized/optimized neural network developed from scratch by a team of data scientists.
- **Transform** - Responsible for a catalog of tasks, including embedding, normalization, text enrichment, bucketization, data projection and so forth. learningOrchestra has its own implementations for some services and wraps other transform services from TensorFlow and Scikit-learn.
- **Explore** - The data scientist must see the results of each step of an analytical pipeline, so learningOrchestra supports data exploration using the catalog of explore capabilities of the TensorFlow and Scikit-learn tools, including histogram, clustering, t-SNE, PCA and others. All outputs of this step are plottable.
- **Tune** - Performs the search for an optimal set of parameters for a given model. The search can be made through strategies like grid search, random search or Bayesian optimization.
- **Train** - Probably the most computationally expensive service of an ML pipeline, since models are trained to best learn the underlying patterns in the data. A diversity of algorithms can be executed, like Support Vector Machine (SVM), Random Forest, Bayesian inference, K-Nearest Neighbors (KNN), Deep Neural Networks (DNN) and many others.
- **Evaluate** - After training a model, it is necessary to evaluate its power to generalize to new, unseen data. For that, the model needs to perform inferences or classifications on a test dataset to obtain metrics that more accurately describe its capabilities. Some common metrics are precision, recall, F1 score, accuracy, mean squared error (MSE) and cross-entropy. This service is useful to describe the generalization power and to detect the need for model calibrations.
- **Predict** - The model can run indefinitely. Sometimes feedback is mandatory to reinforce the training step, so the Evaluate services are called multiple times. This is the main reason for a production pipe and, consequently, a service of this type.
- **Builder** - Responsible for executing entire Spark-ML or TensorFlow pipelines written in Python, offering an alternative way to use the learningOrchestra system just as a deployment alternative and not an environment for building ML workflows composed of pipelines.
- **Observe** - Represents a catalog of learningOrchestra collections and a publish/subscribe mechanism. Applications can subscribe to these collections to receive notifications via observers.
- **Function** - Responsible for wrapping a Python function, representing a wildcard for the data scientist when there is no learningOrchestra support for a specific ML service. It differs from the Builder service, since it does not run an entire pipeline; instead, it runs just a Python function of Scikit-learn or TensorFlow models on a cluster container. Support for functions written in the R language is part of future plans.
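To make the catalog above concrete, the sketch below strings four of these services into a hypothetical pipeline. Every payload shape, field name and value here is an illustrative assumption rather than the documented API; consult the OpenAPI documentation for the real request contracts.

```python
# Illustrative sketch only: these payload shapes are assumptions for the sake
# of example, not the documented learningOrchestra API.
dataset_step = {
    "name": "titanic",                         # collection name to store the data under
    "url": "https://example.com/titanic.csv",  # Dataset: download from a URL
}
transform_step = {
    "dataset": "titanic",
    "operation": "projection",                 # Transform: projection, normalization, ...
    "fields": ["age", "fare", "survived"],
}
train_step = {
    "dataset": "titanic",
    "algorithm": "svm",                        # Train: SVM, Random Forest, KNN, DNN, ...
}
evaluate_step = {
    "model": "svm",
    "metrics": ["precision", "recall", "f1"],  # Evaluate: generalization metrics
}

# A pipeline is simply the ordered sequence of service requests.
pipeline = [dataset_step, transform_step, train_step, evaluate_step]
```

Each step consumes the output of the previous one by referring to the stored collection name, which is how the microservices stay decoupled.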
As noted above, learningOrchestra provides two options to access its features: a REST API and a Python package. The REST API can be called from any computer, including one that is not part of the cluster learningOrchestra is deployed on.
### Using the REST API
We recommend using a **GUI REST API** caller like [Postman](https://www.postman.com/product/api-client/) or [Insomnia](https://insomnia.rest/). Of course, regular `curl` commands from the terminal remain a possibility.
The details of the REST API are available in the [OpenAPI documentation](https://app.swaggerhub.com/apis-docs/learningOrchestra/learningOrchestra/v1.0).
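For example, a dataset-download request might be assembled as follows. The route and field names here are assumptions for illustration only; the actual contract is defined in the OpenAPI documentation.

```python
import json

cluster_ip = "xx.xx.xxx.xxx"  # replace with the external IP of a cluster instance

# Hypothetical route and payload: illustrative only, not the documented API.
url = f"http://{cluster_ip}/api/learningOrchestra/v1.0/dataset"
payload = {"datasetName": "iris", "datasetURI": "https://example.com/iris.csv"}

# With a deployed cluster, the request could be sent with e.g. the requests
# library: requests.post(url, json=payload)
body = json.dumps(payload)  # the JSON document a GUI caller would submit
```

The same JSON body can be pasted directly into Postman or Insomnia when exploring the API interactively.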
### Using the Python package

Check the [package documentation](https://github.com/learningOrchestra/pythonClient) for a list of available features.

### Check cluster status
To check the deployed microservices and machines of your cluster, open `CLUSTER_IP:9000` in a browser, where *CLUSTER_IP* is replaced by the external IP of a machine in your cluster.
The same can be done to check Spark cluster state with `CLUSTER_IP:8080`.
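As a small sketch, both monitoring URLs from this section can be derived from the cluster IP; the port numbers come from the text above, while the helper function itself is our own illustration:

```python
def monitoring_urls(cluster_ip: str) -> dict:
    """Build the status-page URLs described above (ports 9000 and 8080)."""
    return {
        "microservices": f"http://{cluster_ip}:9000",  # deployed microservices and machines
        "spark": f"http://{cluster_ip}:8080",          # Spark cluster state
    }

urls = monitoring_urls("10.0.0.1")  # hypothetical cluster IP
```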
## About learningOrchestra
### Research background
The [first monograph](https://drive.google.com/file/d/1ZDrTR58pBuobpgwB_AOOFTlfmZEY6uQS/view)
The [second monograph](https://www.overleaf.com/read/spqznyvtyjsy) (under construction)