
Commit 2ea32c4

Update README.md
1 parent a4a3be6 commit 2ea32c4

1 file changed: README.md

Lines changed: 21 additions & 62 deletions
```diff
@@ -61,51 +61,19 @@ learningOrchestra is designed for data scientists from both engineering and academia
 
 ## Quick-start
 
-Installation instructions:
-1. learningOrchestra runs on Linux hosts. Install [Docker Engine](https://docs.docker.com/engine/install/) on all instances of your cluster. Configure your cluster in [swarm mode](https://docs.docker.com/engine/swarm/swarm-tutorial/create-swarm/). Install [Docker Compose](https://docs.docker.com/compose/install/) on your manager instance.
-2. Clone repo on your manager instance. `https://github.com/learningOrchestra/learningOrchestra.git`
-3. `cd learningOrchestra`
-4. Deploy with `sudo ./run.sh`
-
 learningOrchestra provides two options to access its features: a REST API and a Python package.
 
-REST API: We recommand using a GUI REST API caller like [Postman](https://www.postman.com/product/api-client/) or [Insomnia](https://insomnia.rest/). Check the [list of available features](https://learningorchestra.github.io/docs/usage/#rest-api-features) for requests details.
+REST API: We recommend using a GUI REST API caller like [Postman](https://www.postman.com/product/api-client/) or [Insomnia](https://insomnia.rest/).
 
 Python package:
-- Python 3 package
-- Install with `pip install learning-orchestra-client`
-- Start your scripts by import the package and providing the IP address of one of the instances of your cluster:
-```
-from learning_orchestra_client import *
-cluster_ip = "xx.xx.xxx.xxx"
-Context(cluster_ip)
-```
-- Check the [package documentation](https://github.com/learningOrchestra/pythonClient) for a list of available features.
+- Check the [package documentation](https://github.com/learningOrchestra/pythonClient) for more details.
 
 ## How do I install learningOrchestra?
 
 :bell: This documentation assumes that the users are familiar with a number of advanced computer science concepts. We have tried to link to learning resources to support beginners, as well as introduce some of the concepts in the [FAQ](#frequently-asked-questions). But if something is still not clear, don't hesitate to [ask for help](#on-using-learningOrchestra).
 
-### Setting up your cluster
-
-learningOrchestra operates from a [cluster](#what-is-a-cluster) of Docker [containers](#what-is-a-container).
+We provide documentation explaining how to deploy this software; you can read more in the [installation docs](https://learningorchestra.github.io/docs/installation/).
 
-All your hosts must operate under Debian Linux OS and have [Docker Engine](https://docs.docker.com/engine/install/) installed.
-
-Configure your cluster in [swarm mode](https://docs.docker.com/engine/swarm/swarm-tutorial/create-swarm/). Install [Docker Compose](https://docs.docker.com/compose/install/) on your manager instance.
-
-You are ready to deploy! :tada:
-
-### Deploy learningOrchestra
-
-Clone this repository on your manager instance.
-- Using HTTP protocol, `git clone https://github.com/learningOrchestra/learningOrchestra.git`
-- Using SSH protocol, `git clone git@github.com:learningOrchestra/learningOrchestra.git`
-- Using GitHub CLI, `gh repo clone learningOrchestra/learningOrchestra`
-
-Move to the root of the directory, `cd learningOrchestra`.
-
-Deploy with `sudo ./run.sh`. The deploy process should take a dozen minutes.
 
 ##### Interrupt learningOrchestra
 
```
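The quick-start above points users at GUI REST clients such as Postman or Insomnia. A plain request can also be built programmatically; the following is a minimal sketch using only the Python standard library, where the endpoint path and payload fields are hypothetical placeholders, not the real learningOrchestra API (the actual routes are in the linked REST API documentation):

```python
import json
import urllib.request

cluster_ip = "10.0.0.1"  # placeholder: external IP of any instance in your cluster

# Hypothetical endpoint and payload, for illustration only.
payload = {"datasetName": "demo", "url": "https://example.com/data.csv"}

request = urllib.request.Request(
    url=f"http://{cluster_ip}/api/learningOrchestra/v1/dataset",  # assumed path
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# urllib.request.urlopen(request) would submit it against a live cluster;
# here we only construct the request object.
print(request.get_method(), request.full_url)
```

The same request could of course be issued with `curl` or any GUI client; only the JSON body and the route differ per feature.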
```diff
@@ -115,20 +83,27 @@ Run `docker stack rm microservice`.
 
 learningOrchestra is organised into interoperable [microservices](#what-are-microservices). They offer access to third-party libraries, frameworks and software to **gather data**, **clean data**, **train machine learning models**, **tune machine learning models**, **evaluate machine learning models** and **visualize data and results**.
 
-The current version of learningOrchestra offers 7 features:
-- The **Dataset download datasets from an URL**. It holds and manage the downloaded data.
-- The **Data type is a transform feature** dedicated to changing the type of data fields.
-- The **Projection is a transform feature** dedicated to make projections from datasets.
-- The **Histogram, t-SNE and PCA APIs are exploration features**. They transform the map the data into new representation spaces so it can be visualized. They can be used on the raw data as well as on the intermediate and final results of the analysis pipeline.
-- The **Builder is the high couple feature**. It includes some preprocessing features and machine learning features to train models, evaluate models and predict information using trained models.
+The current version of learningOrchestra offers 11 services:
+- **Dataset** - Responsible for obtaining a dataset. External datasets are stored on MongoDB or on volumes using a Uniform Resource Locator (URL). There is also an option to load existing TensorFlow datasets.
+- **Model** - Responsible for loading supervised or unsupervised models from existing repositories. It is useful for configuring a TensorFlow or Scikit-learn object with a tuned and pre-trained neural network built with Google or Facebook best practices and large instances, for example. It is also useful for loading a customized/optimized neural network developed from scratch by a team of data scientists.
+- **Transform** - Responsible for a catalog of tasks, including embedding, normalization, text enrichment, bucketization, data projection and so forth. Learning Orchestra has its own implementations for some services and wraps other transform services from TensorFlow and Scikit-learn.
+- **Explore** - The data scientist must see the results of each step of an analytical pipeline, so Learning Orchestra supports data exploration using the explore capabilities of TensorFlow and Scikit-learn, including histogram, clustering, t-SNE, PCA and others. All outputs of this step are plottable.
+- **Tune** - Performs the search for an optimal set of parameters for a given model. The search can use strategies like grid search, random search, or Bayesian optimization.
+- **Train** - Probably the most computationally expensive service of an ML pipeline, because the models are trained to best learn the underlying patterns in the data. A diversity of algorithms can be executed, like Support Vector Machine (SVM), Random Forest, Bayesian inference, K-Nearest Neighbors (KNN), Deep Neural Networks (DNN), and many others.
+- **Evaluate** - After training a model, it is necessary to evaluate its power to generalize to new unseen data. For that, the model needs to perform inferences or classification on a test dataset to obtain metrics that more accurately describe its capabilities. Some common metrics are precision, recall, F1 score, accuracy, mean squared error (MSE), and cross-entropy. This service is useful to describe the generalization power and to detect the need for model calibration.
+- **Predict** - The model can run indefinitely. Sometimes feedback is needed to reinforce the train step, so the Evaluate services are called multiple times. This is the main reason for a production pipe and, consequently, for a service of this type.
+- **Builder** - Responsible for executing entire Spark-ML or TensorFlow pipelines in Python, offering a way to use the Learning Orchestra system just as a deployment alternative and not as an environment for building ML workflows composed of pipelines.
+- **Observe** - Represents a catalog of Learning Orchestra collections and a publish/subscribe mechanism. Applications can subscribe to these collections to receive notifications via observers.
+- **Function** - Responsible for wrapping a Python function, representing a wildcard for the data scientist when there is no Learning Orchestra support for a specific ML service. It differs from the Builder service in that it does not run an entire pipeline; instead, it runs just a Python function of Scikit-learn or TensorFlow models on a cluster container. Support for functions written in the R language is part of future plans.
 
 The REST API can be called from any computer, including one that is not part of the cluster learningOrchestra is deployed on. learningOrchestra provides two options to access its features: a REST API and a Python package.
 
 ### Using the REST API
 
 We recommend using a **GUI REST API** caller like [Postman](https://www.postman.com/product/api-client/) or [Insomnia](https://insomnia.rest/). Of course, regular `curl` commands from the terminal remain a possibility.
 
-The details for each feature are available in the [documentation](https://learningorchestra.github.io/docs/usage/#rest-api).
+The details of the REST API are available in the [OpenAPI documentation](https://app.swaggerhub.com/apis-docs/learningOrchestra/learningOrchestra/v1.0).
 
 ### Using the Python package
 
```
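The "Using the Python package" section now defers to the package documentation, but the removed quick-start gave a minimal setup snippet (`pip install learning-orchestra-client`, then create a `Context` bound to a cluster IP). A guarded sketch of that same pattern, assuming the `Context` name from the removed snippet is still importable (treat it as an assumption if the package API has changed):

```python
# Sketch of learning-orchestra-client setup, mirroring the snippet the old
# README contained. Install with: pip install learning-orchestra-client
try:
    from learning_orchestra_client import Context
except ImportError:
    Context = None  # package not installed; the lines below are illustrative

cluster_ip = "xx.xx.xxx.xxx"  # external IP of one instance of your cluster

if Context is not None:
    # Binds subsequent client calls to your cluster.
    Context(cluster_ip)
```

The original snippet used `from learning_orchestra_client import *`; importing `Context` by name is the tidier equivalent, since that is the only name the snippet used.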
```diff
@@ -145,33 +120,17 @@ Check the [package documentation](https://github.com/learningOrchestra/pythonCli
 
 ### Check cluster status
 
-To check the deployed microservices and machines of your cluster, run `CLUSTER_IP:8000` where *CLUSTER_IP* is replaced by the external IP of a machine in your cluster.
+To check the deployed microservices and machines of your cluster, open `CLUSTER_IP:9000` where *CLUSTER_IP* is replaced by the external IP of a machine in your cluster.
 
 The same can be done to check Spark cluster state with `CLUSTER_IP:8080`.
 
 ## About learningOrchestra
 
 ### Research background
 
-The [first monograph](https://drive.google.com/file/d/1ZDrTR58pBuobpgwB_AOOFTlfmZEY6uQS/view) (under construction)
-
-### Future steps
-
-* Increase the catalog of analysis microservice options (clustering, sampling, hierarchy and so forth) and the number of existing ML players (Tensorflow, WEKA and others).
-* Implement the Load Model step.
-* Decouple the training and the validation steps to enable different pipe compositions. Include deep learning option in these steps.
-* Implement the tuning step.
-* Implement the production step for learning alternatives with feedbacks.
-* Conclude the Observer step using Kafka solution.
-* Refactor the REST API to insert or remove tags for a better semantic representation.
-* Write other ML pipelines and workflows using larger datasets and from different knowledge domains.
-* Build a new set of experiments.
-* Write a final version of the monograph.
+The [first monograph](https://drive.google.com/file/d/1ZDrTR58pBuobpgwB_AOOFTlfmZEY6uQS/view)
+
+The [second monograph](https://www.overleaf.com/read/spqznyvtyjsy) (under construction)
 
 ### Contributors :sparkles:
 
```
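The status pages mentioned in the diff above are plain HTTP endpoints, so they can also be probed from a terminal. A sketch assuming the ports the README states (9000 for the cluster status page, 8080 for the Spark UI) and requiring a live cluster:

```shell
# Replace CLUSTER_IP with the external IP of any machine in your cluster.
curl -s http://CLUSTER_IP:9000   # deployed microservices / machines status
curl -s http://CLUSTER_IP:8080   # Spark master UI (returns HTML)
```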