
Commit 7b2107c

adding dataplex obejct creation and change in readme
1 parent df3a34e commit 7b2107c

2 files changed

Lines changed: 201 additions & 10 deletions

File tree

managed-connectivity/community-contributed-connectors/aws-glue-connector/README.md

Lines changed: 87 additions & 10 deletions
@@ -92,6 +92,8 @@ Once the metadata file has been generated, you can import it into Dataplex using
  "https://dataplex.googleapis.com/v1/projects/{project-id}/locations/{location}/metadataJobs?metadataJobId={job-id}"
```
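For reference, the body posted to that endpoint is a metadata job definition. A minimal sketch, assuming a full-sync import scoped to this connector's entry group (the bucket URI and resource names are illustrative placeholders; field names follow the Dataplex `metadataJobs` REST API and should be verified against the current documentation):

```json
{
  "type": "IMPORT",
  "importSpec": {
    "sourceStorageUri": "gs://your-bucket/aws-glue-import/",
    "entrySyncMode": "FULL",
    "aspectSyncMode": "INCREMENTAL",
    "scope": {
      "entryGroups": ["projects/{project-id}/locations/{location}/entryGroups/aws-glue-entries"],
      "entryTypes": ["projects/{project-id}/locations/{location}/entryTypes/aws-glue-table"],
      "aspectTypes": ["projects/{project-id}/locations/{location}/aspectTypes/aws-lineage-aspect"]
    }
  }
}
```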

## Setup Resources

### Required Catalog Objects

Note: before importing metadata, the Entry Group and all Entry Types and Aspect Types found in the metadata import file must exist in the target project and location. This connector requires the following Entry Group, Entry Types, and Aspect Types:
@@ -104,6 +106,91 @@ Note before importing metadata, the Entry Group and all Entry Types and Aspect T

See [manage entries and create custom sources](https://cloud.google.com/dataplex/docs/ingest-custom-sources) for instructions on creating Entry Groups, Entry Types, and Aspect Types.
### Automated Setup

To run this connector, you must first create the required Dataplex resources. Run the provided script to create all resources automatically:

```bash
# Set your project and location
export PROJECT_ID=your-project-id
export LOCATION=us-central1
export ENTRY_GROUP_ID=aws-glue-entries

# Run the setup script
chmod +x scripts/setup_dataplex_resources.sh
./scripts/setup_dataplex_resources.sh
```
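Since the script falls back to placeholder defaults when variables are unset, it can help to fail fast before invoking it. A small optional preflight sketch (the `preflight` helper is a hypothetical addition, not part of the connector; the variable names match the exports above; requires bash for indirect expansion):

```shell
#!/usr/bin/env bash
# Optional preflight: verify required variables before running the setup script.
preflight() {
  local var value
  for var in PROJECT_ID LOCATION ENTRY_GROUP_ID; do
    value="${!var}"   # indirect expansion: read the variable named by $var
    if [ -z "$value" ] || [ "$value" = "YOUR_PROJECT_ID" ]; then
      echo "ERROR: $var is not set" >&2
      return 1
    fi
  done
  echo "Preflight OK: $PROJECT_ID / $LOCATION / $ENTRY_GROUP_ID"
}

# Example usage: preflight && ./scripts/setup_dataplex_resources.sh
```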
### Manual Setup & Schema Definitions

If you prefer to create them manually, ensure you define the following:

#### Entry Types
* `aws-glue-database`
* `aws-glue-table`
* `aws-glue-view`

#### Aspect Types

**1. `aws-lineage-aspect`**
Used to store lineage relationships.

* **JSON Schema**:
```json
{
  "type": "record",
  "recordFields": [
    {
      "name": "links",
      "type": "array",
      "index": 1,
      "arrayItems": {
        "type": "record",
        "recordFields": [
          {
            "name": "source",
            "type": "record",
            "index": 1,
            "recordFields": [
              { "name": "fully_qualified_name", "type": "string", "index": 1 }
            ]
          },
          {
            "name": "target",
            "type": "record",
            "index": 2,
            "recordFields": [
              { "name": "fully_qualified_name", "type": "string", "index": 1 }
            ]
          }
        ]
      }
    }
  ]
}
```
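Filled in, an instance of this aspect attached to an entry would carry data shaped like the following (the FQN values are illustrative only; the actual format depends on how the connector constructs fully qualified names):

```json
{
  "links": [
    {
      "source": { "fully_qualified_name": "glue:sales_db.raw_orders" },
      "target": { "fully_qualified_name": "glue:sales_db.orders_summary" }
    }
  ]
}
```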
**2. Marker Aspects**
* `aws-glue-database`
* `aws-glue-table`
* `aws-glue-view`

These aspects are used primarily for tagging. You can use a minimal schema:
```json
{
  "type": "record",
  "recordFields": [
    {
      "name": "description",
      "type": "string",
      "index": 1,
      "constraints": { "required": false }
    }
  ]
}
```

See [manage entries and create custom sources](https://cloud.google.com/dataplex/docs/ingest-custom-sources) for more details.
## Metadata Extracted

The connector maps AWS Glue objects to Dataplex entries as follows:
@@ -123,17 +210,7 @@ The connector also parses AWS Glue Job scripts (Python/Scala) to extract lineage

***

-## Resources Required
-
-To run this connector and import metadata, you need the following resources:

-1. **GCP Project**: To host the execution and Dataplex Metastore.
-2. **Secret Manager Secret**: To store AWS Credentials securely.
-3. **GCS Bucket**: To store the intermediate JSONL output file.
-4. **Dataplex Entry Group**: The destination for the imported metadata.
-5. **Dataplex Aspect Types & Entry Types**: (Optional) Custom types if you want rich UI rendering, though standard types are used for schema.
-
-***

## AWS Credentials

Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
#!/bin/bash
set -e

# Configuration
# Replace these values with your actual project and location
PROJECT_ID="${PROJECT_ID:-YOUR_PROJECT_ID}"
LOCATION="${LOCATION:-us-central1}"
ENTRY_GROUP_ID="${ENTRY_GROUP_ID:-aws-glue-entries}"

echo "Using Project: $PROJECT_ID"
echo "Using Location: $LOCATION"
echo "Target Entry Group: $ENTRY_GROUP_ID"

# 1. Create Entry Group
echo "----------------------------------------------------------------"
echo "Creating Entry Group: $ENTRY_GROUP_ID..."
gcloud dataplex entry-groups create "$ENTRY_GROUP_ID" \
  --project="$PROJECT_ID" \
  --location="$LOCATION" \
  --description="Entry group for AWS Glue metadata" || echo "Entry Group might already exist."

# 2. Create Entry Types
ENTRY_TYPES=("aws-glue-database" "aws-glue-table" "aws-glue-view")

for TYPE in "${ENTRY_TYPES[@]}"; do
  echo "----------------------------------------------------------------"
  echo "Creating Entry Type: $TYPE..."
  gcloud dataplex entry-types create "$TYPE" \
    --project="$PROJECT_ID" \
    --location="$LOCATION" \
    --description="Entry type for $TYPE" || echo "Entry Type $TYPE might already exist."
done

# 3. Create Aspect Types
echo "----------------------------------------------------------------"
echo "Creating Aspect Types..."

# 3a. Marker Aspect Types (Database, Table, View)
# We define a minimal schema for these markers since they are primarily used for identification.
MARKER_ASPECTS=("aws-glue-database" "aws-glue-table" "aws-glue-view")

for ASPECT in "${MARKER_ASPECTS[@]}"; do
  echo "Creating Aspect Type: $ASPECT..."
  cat > "${ASPECT}.yaml" <<EOF
name: "$ASPECT"
type: record
recordFields:
- name: description
  type: string
  index: 1
  annotations:
    description: "Optional description for this marker."
  constraints:
    required: false
EOF
  gcloud dataplex aspect-types create "$ASPECT" \
    --project="$PROJECT_ID" \
    --location="$LOCATION" \
    --metadata-template-file-name="${ASPECT}.yaml" || echo "Aspect Type $ASPECT might already exist."
  rm "${ASPECT}.yaml"
done

# 3b. Lineage Aspect Type
# Defines the schema for lineage links (source -> target)
cat > lineage_aspect.yaml <<EOF
name: "aws-lineage-aspect"
type: record
recordFields:
- name: links
  type: array
  index: 1
  annotations:
    description: "List of lineage links."
  arrayItems:
    name: "link"
    type: record
    recordFields:
    - name: source
      type: record
      index: 1
      annotations:
        description: "Source entity in the lineage relationship."
      recordFields:
      - name: fully_qualified_name
        type: string
        index: 1
        annotations:
          description: "FQN of the source entity."
    - name: target
      type: record
      index: 2
      annotations:
        description: "Target entity in the lineage relationship."
      recordFields:
      - name: fully_qualified_name
        type: string
        index: 1
        annotations:
          description: "FQN of the target entity."
EOF

echo "Creating Aspect Type: aws-lineage-aspect..."
gcloud dataplex aspect-types create "aws-lineage-aspect" \
  --project="$PROJECT_ID" \
  --location="$LOCATION" \
  --metadata-template-file-name=lineage_aspect.yaml || echo "Aspect Type aws-lineage-aspect might already exist."

# Clean up temporary files
rm lineage_aspect.yaml

echo "----------------------------------------------------------------"
echo "Setup complete. Please verify resources in the Google Cloud Console."
