|
1 | | -# M.A.D.H.A.V.A |
| 1 | +# M.A.D.H.A.V.A. |
2 | 2 |
|
3 | | -**Multi-domain Analytical Data Harvesting & Automated Verification Assistant** |
| 3 | +<div align="center"> |
| 4 | + <img src="client/src/logo.png" alt="M.A.D.H.A.V.A. Logo" width="300" /> |
| 5 | + <h1>Multi-domain Analytical Data Harvesting & Automated Verification Assistant</h1> |
| 6 | +</div> |
4 | 7 |
|
5 | | - |
| 8 | +## Overview |
6 | 9 |
|
7 | | -M.A.D.H.A.V.A is a powerful real-time RAG (Retrieval Augmented Generation) system that processes streaming data across multiple domains to provide context-enhanced responses. It combines the strengths of vector databases, large language models, and domain-specific APIs to deliver accurate and up-to-date information. |
| 10 | +M.A.D.H.A.V.A. is an advanced AI-powered assistant that provides intelligent analysis and insights across multiple domains: |
8 | 11 |
|
9 | | -## Features |
10 | | - |
11 | | -- **Real-time Data Processing**: Continuously ingest and process streaming data |
12 | | -- **Multi-domain Support**: Specialized processing for finance, legal, healthcare, code, and more |
13 | | -- **Vector Search with Pinecone**: High-performance similarity search using Pinecone vector database |
14 | | -- **Event-driven Architecture**: Process events asynchronously for improved performance |
15 | | -- **Domain-specific APIs**: Integrate with specialized APIs for enhanced responses |
16 | | -- **LangChain Integration**: Leverage LangChain components for advanced RAG pipelines |
17 | | -- **Modern UI**: Clean, responsive interface for easy interaction |
| 12 | +- 💰 **Finance**: Market analysis and investment insights |
| 13 | +- 🏥 **Healthcare**: Medical research and clinical analysis |
| 14 | +- ⚖️ **Legal**: Case analysis and compliance |
| 15 | +- 💻 **Code Assistant**: AI debugging and code review |
| 16 | +- 📰 **News**: Real-time analysis and trend detection |
| 17 | +- 🛍️ **E-commerce**: Market trends and consumer behavior |
18 | 18 |
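The per-domain list in the new Overview implies some form of routing from a domain name to domain-specific logic. The sketch below shows one way that could look, assuming a simple handler registry. `register`, `route_query`, and the handler functions are hypothetical names for illustration and are not taken from the project's code.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]

# Hypothetical registry mapping a domain name to its handler.
_HANDLERS: Dict[str, Handler] = {}


def register(domain: str) -> Callable[[Handler], Handler]:
    """Decorator that stores a handler under a lowercase domain name."""
    def decorator(fn: Handler) -> Handler:
        _HANDLERS[domain.lower()] = fn
        return fn
    return decorator


@register("finance")
def handle_finance(query: str) -> str:
    # Placeholder logic; a real handler would call domain-specific services.
    return f"[finance] {query}"


@register("legal")
def handle_legal(query: str) -> str:
    return f"[legal] {query}"


def route_query(domain: str, query: str) -> str:
    """Dispatch a query to the handler registered for the given domain."""
    handler = _HANDLERS.get(domain.strip().lower())
    if handler is None:
        raise ValueError(f"Unsupported domain: {domain!r}. Known: {sorted(_HANDLERS)}")
    return handler(query)
```

With a registry like this, adding a domain only requires defining one more decorated function; the dispatch code stays unchanged.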
|
19 | | -## Architecture |
20 | | - |
21 | | -M.A.D.H.A.V.A follows a modular architecture with the following components: |
| 19 | +## Features |
22 | 20 |
|
23 | | -1. **Vector Store**: Uses Pinecone for efficient vector similarity search |
24 | | -2. **Embedding Models**: Sentence transformers for high-quality document embeddings |
25 | | -3. **LLM Integration**: Connects to various LLMs through a unified interface |
26 | | -4. **Domain Processors**: Specialized processors for different knowledge domains |
27 | | -5. **Event Queue**: Asynchronous event processing for real-time updates |
28 | | -6. **API Handlers**: Integration with external domain-specific APIs |
29 | | -7. **Web Interface**: Modern, responsive UI for user interaction |
| 21 | +- **Domain-Specific Processing**: Tailored analysis for each domain |
| 22 | +- **RAG Implementation**: Advanced retrieval-augmented generation |
| 23 | +- **Real-time Insights**: Instant analysis and recommendations |
| 24 | +- **Interactive Interface**: User-friendly query system |
| 25 | +- **Scalable Architecture**: Built for performance and reliability |
30 | 26 |
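To make the "RAG Implementation" bullet concrete, here is a minimal sketch of the retrieval step in retrieval-augmented generation: rank stored documents by similarity to the query, then place the best matches in the prompt. It assumes a toy bag-of-words embedding in place of a real embedding model and vector store. `embed`, `retrieve`, and `build_prompt` are hypothetical helpers, not the project's API.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: List[str]) -> str:
    """Assemble the text that would be sent to the language model."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {query}"
```

A production pipeline would swap `embed` for a learned embedding model and `retrieve` for a vector database query; the ranking-then-prompting structure stays the same.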
|
31 | 27 | ## Getting Started |
32 | 28 |
|
33 | | -### Prerequisites |
34 | | - |
35 | | -- Python 3.9+ |
36 | | -- Node.js 14+ (for frontend) |
37 | | -- Docker (optional, for containerized deployment) |
38 | | -- Pinecone API key (for vector database) |
39 | | - |
40 | | -### Installation |
41 | | - |
42 | | -1. Clone the repository: |
43 | | - ```bash |
44 | | - git clone https://github.com/yourusername/madhava.git |
45 | | - cd madhava |
46 | | - ``` |
47 | | - |
48 | | -2. Set up the backend: |
49 | | - ```bash |
50 | | - cd backend |
51 | | - python -m venv venv |
52 | | - source venv/bin/activate # On Windows: venv\Scripts\activate |
53 | | - pip install -r requirements.txt |
54 | | - ``` |
55 | | - |
56 | | -3. Configure environment variables: |
57 | | - ```bash |
58 | | - cp .env.example .env |
59 | | - # Edit .env with your API keys and configuration |
60 | | - ``` |
61 | | - |
62 | | -4. Run the backend server: |
63 | | - ```bash |
64 | | - python server.py |
65 | | - ``` |
66 | | - |
67 | | -5. Access the application: |
68 | | - Open your browser and navigate to `http://localhost:8000` |
69 | | - |
70 | | -### Docker Deployment |
71 | | - |
72 | | -1. Build the Docker image: |
73 | | - ```bash |
74 | | - docker build -t madhava . |
75 | | - ``` |
76 | | - |
77 | | -2. Run the container: |
78 | | - ```bash |
79 | | - docker run -p 8000:8000 -e PINECONE_API_KEY=your_api_key madhava |
80 | | - ``` |
81 | | - |
82 | | -## Pinecone Integration |
83 | | - |
84 | | -M.A.D.H.A.V.A uses Pinecone as its vector database for efficient similarity search. To set up Pinecone: |
85 | | - |
86 | | -1. Create a Pinecone account at [pinecone.io](https://www.pinecone.io/) |
87 | | -2. Create a new index with dimension 384 (for the default embedding model) |
88 | | -3. Add your Pinecone API key to the `.env` file: |
89 | | - ``` |
90 | | - PINECONE_API_KEY=your_pinecone_api_key |
91 | | - PINECONE_ENVIRONMENT=your_pinecone_environment |
92 | | - PINECONE_INDEX_NAME=madhava-index |
93 | | - ``` |
94 | | - |
95 | | -## Domain-specific Processing |
96 | | - |
97 | | -M.A.D.H.A.V.A supports multiple domains with specialized processing: |
98 | | - |
99 | | -- **Finance**: Integration with financial data APIs and market analysis |
100 | | -- **Legal**: Legal document analysis and regulatory information |
101 | | -- **Healthcare**: Medical information and healthcare data processing |
102 | | -- **Code**: Programming assistance and code analysis |
103 | | -- **Education**: Educational content and learning resources |
104 | | -- **Travel**: Travel planning and recommendations |
105 | | -- **Real Estate**: Property analysis and market insights |
106 | | - |
107 | | -To add a new domain, extend the `DomainProcessor` class with a new domain-specific method. |
108 | | - |
109 | | -## API Reference |
110 | | - |
111 | | -### Authentication |
112 | | - |
113 | | -- `POST /login`: Authenticate with email and Pinecone API key |
114 | | -- `GET /logout`: Log out the current user |
115 | | -- `POST /token`: OAuth2 compatible token endpoint |
116 | | - |
117 | | -### Query Endpoints |
118 | | - |
119 | | -- `POST /api/query`: Process a query with domain-specific context |
120 | | -- `GET /api/user/{user_id}/history`: Get query history for a user |
121 | | -- `GET /alerts`: Get system alerts |
122 | | - |
123 | | -### Vector Store Operations |
| 29 | +1. Clone the repository |
| 30 | +```bash |
| 31 | +git clone https://github.com/yourusername/M.A.D.H.A.V.A..git |
| 32 | +cd M.A.D.H.A.V.A. |
| 33 | +``` |
124 | 34 |
|
125 | | -- `POST /api/documents`: Add documents to the vector store |
126 | | -- `GET /api/documents`: List documents in the vector store |
127 | | -- `DELETE /api/documents/{doc_id}`: Delete a document from the vector store |
128 | | -- `GET /api/stats`: Get vector store statistics |
| 35 | +2. Install dependencies |
| 36 | +```bash |
| 37 | +# Backend |
| 38 | +python -m venv venv |
| 39 | +source venv/bin/activate |
| 40 | +pip install -r requirements.txt |
129 | 41 |
|
130 | | -## Development |
| 42 | +# Frontend |
| 43 | +cd client |
| 44 | +npm install |
| 45 | +``` |
131 | 46 |
|
132 | | -### Project Structure |
| 47 | +3. Start the application |
| 48 | +```bash |
| 49 | +# Backend |
| 50 | +python main.py |
133 | 51 |
|
134 | | -``` |
135 | | -madhava/ |
136 | | -├── backend/ |
137 | | -│ ├── static/ # Static assets |
138 | | -│ ├── templates/ # HTML templates |
139 | | -│ ├── routes/ # API route handlers |
140 | | -│ ├── models/ # Data models |
141 | | -│ ├── pinecone_store.py # Pinecone integration |
142 | | -│ ├── realtime_rag.py # RAG implementation |
143 | | -│ ├── domain_processors.py # Domain-specific processing |
144 | | -│ ├── api_handlers.py # External API integration |
145 | | -│ ├── server.py # Main server file |
146 | | -│ └── requirements.txt # Python dependencies |
147 | | -├── docs/ # Documentation |
148 | | -├── tests/ # Test suite |
149 | | -└── README.md # This file |
| 52 | +# Frontend |
| 53 | +cd client |
| 54 | +npm start |
150 | 55 | ``` |
151 | 56 |
|
152 | | -### Adding a New Domain |
153 | | - |
154 | | -1. Add a new domain processor method in `domain_processors.py`: |
155 | | - ```python |
156 | | - async def process_new_domain(self, query: str) -> Dict[str, Any]: |
157 | | - # Domain-specific processing |
158 | | - domain_data = await self.api_handler.query_domain_api(query) |
159 | | - |
160 | | - # Use RAG to enhance the response |
161 | | - rag_result = await self.rag_orchestrator.process_query(query, domain="new_domain") |
162 | | - |
163 | | - # Combine results |
164 | | - result = { |
165 | | - "answer": rag_result["answer"], |
166 | | - "context": rag_result["context"], |
167 | | - "sources": rag_result["sources"] + ["Domain API"], |
168 | | - "metrics": rag_result["metrics"] |
169 | | - } |
170 | | - |
171 | | - return result |
172 | | - ``` |
173 | | - |
174 | | -2. Add a formatting method: |
175 | | - ```python |
176 | | - def _format_new_domain_response(self, data: Dict[str, Any]) -> str: |
177 | | - return f"New Domain Analysis: {data.get('summary', 'No data available')}" |
178 | | - ``` |
179 | | - |
180 | | -3. Update the domain list in the query schema. |
| 57 | +## Architecture |
181 | 58 |
|
182 | | -## Contributing |
| 59 | +The application follows a modern microservices architecture: |
| 60 | +- Frontend: React.js with modern UI/UX |
| 61 | +- Backend: FastAPI with Python |
| 62 | +- Database: MongoDB, Redis, Vector Store |
| 63 | +- AI: Gemini API integration |
183 | 64 |
|
184 | | -Contributions are welcome! Please feel free to submit a Pull Request. |
| 65 | +## Contributing |
185 | 66 |
|
186 | | -1. Fork the repository |
187 | | -2. Create your feature branch (`git checkout -b feature/amazing-feature`) |
188 | | -3. Commit your changes (`git commit -m 'Add some amazing feature'`) |
189 | | -4. Push to the branch (`git push origin feature/amazing-feature`) |
190 | | -5. Open a Pull Request |
| 67 | +We welcome contributions! Please read our contributing guidelines before submitting pull requests. |
191 | 68 |
|
192 | 69 | ## License |
193 | 70 |
|
194 | 71 | This project is licensed under the MIT License - see the LICENSE file for details. |
195 | | - |
196 | | -## Acknowledgments |
197 | | - |
198 | | -- [Pathway](https://github.com/pathwaycom/pathway) for real-time data processing |
199 | | -- [LangChain](https://github.com/langchain-ai/langchain) for LLM application framework |
200 | | -- [Pinecone](https://www.pinecone.io/) for vector database |
201 | | -- [FastAPI](https://fastapi.tiangolo.com/) for the web framework |
202 | | -- [Sentence Transformers](https://www.sbert.net/) for embeddings |