[FLINK-39422] [REST][docs] Fix OpenAPI spec for SlotSharingGroupId, MetricCollectionResponseBody, and SerializedThrowable custom serializers#27903
Conversation
a89e2bd to
6b7ff1e
Compare
SlotSharingGroupId, MetricCollectionResponseBody, and SerializedThrowable custom serializers
SlotSharingGroupId, MetricCollectionResponseBody, and SerializedThrowable custom serializersSlotSharingGroupId, MetricCollectionResponseBody, and SerializedThrowable custom serializers
|
@RocMarshal do you know who can review this PR? |
RocMarshal
left a comment
There was a problem hiding this comment.
Hi, @nikhilsu Thanks a lot for the contribution.
Left a few of comments.
Pls take a look~
| * serializer that writes the metrics collection as a raw JSON array, not as an object with a | ||
| * "metrics" field. | ||
| */ | ||
| private static void overrideMetricCollectionSchema(final OpenAPI openApi) { |
There was a problem hiding this comment.
Hey @RocMarshal thanks for the review!
The reason you need this is because MetricCollectionResponseBody uses a custom Serializer which omits the metric key and serializes a response object as a JSON array.
Here's how a sample response looks like if you call the GET /v1/jobs/${jobid}/metrics endpoint. Notice, that there is not metric key here, MetricCollectionResponseBody is serialized to a JSON array:
$ curl 'http://localhost:8081/v1/jobs/bc6df938e80006c05639a139e6cf716f/metrics?get=numRecordsInPerSecond,numRecordsOutPerSecond,numRestarts' \
-H 'Accept: application/json, text/plain, */*' \
-H 'Accept-Language: en-US,en;q=0.9' \
--insecure | jq .
[
{
"id": "numRestarts",
"value": "0"
}
]However, the problem is that the OpenAPI Spec Generator uses the class schema (through reflections) and generates API specs where the response includes the metric key: https://github.com/apache/flink/blob/release-1.20/docs/static/generated/rest_v1_dispatcher.yml#L2721-L2727
This causes a mismatch in OpenAPI spec generated Flink clients. The client expects a metric key in the response but it actually does not exist.
There was a problem hiding this comment.
Sorry for ignoring the details and Thank you @nikhilsu for your clarification.
Got it~
| * serializer that writes the metrics collection as a raw JSON array, not as an object with a | ||
| * "metrics" field. | ||
| */ | ||
| private static void overrideMetricCollectionSchema(final OpenAPI openApi) { |
There was a problem hiding this comment.
Hi, @nikhilsu
In my limited reading,Perhaps implementing the MetricCollectionResponseBody class in the style of
org.apache.flink.runtime.rest.messages.ConfigurationInfo would be a good choice.
Main considerations are as follows:
- We wouldn’t need to modify the
OpenApiSpecGeneratorclass when introducing similar new classes in the future - There is already a precedent with the implementation of
ConfigurationInfo, and we wouldn’t need to explicitly introduce serialization and deserialization processors - In the REST API’s HTML and YML documentation, the definition of the return type would be relatively more detailed. For example, an
objecttype would be refined to something likeArray<Metric> - this should not break the existing JSON field structure.
There was a problem hiding this comment.
Hey @RocMarshal, this is a good suggestion. It would be much cleaner to extend MetricCollectionResponseBody from an ArrayList<Metric>.
However, after this change, endpoints no longer reference MetricCollectionResponseBody - they inline the array. Generated clients would need regeneration. The wire format is the same (JSON array), but the type in generated code changes from MetricCollectionResponseBody to List[Metric].
Here is the diff of the spec.
There was a problem hiding this comment.
This would break downstream client code. But it might be ok as this endpoint would have never worked in the first place.
There was a problem hiding this comment.
I've pushed the change. Let me know what you think.
There was a problem hiding this comment.
Hi, @nikhilsu In my limited reading,Perhaps implementing the
MetricCollectionResponseBodyclass in the style oforg.apache.flink.runtime.rest.messages.ConfigurationInfowould be a good choice. Main considerations are as follows:
- We wouldn’t need to modify the
OpenApiSpecGeneratorclass when introducing similar new classes in the future- There is already a precedent with the implementation of
ConfigurationInfo, and we wouldn’t need to explicitly introduce serialization and deserialization processors- In the REST API’s HTML and YML documentation, the definition of the return type would be relatively more detailed. For example, an
objecttype would be refined to something likeArray<Metric>
Hi, @1996fanrui Could you help take a look for this strategy ?
I guess the generation of the REST API documentation might be related to the release quality of 2.3, although the impact is probably relatively small.
Thank you very much.
There was a problem hiding this comment.
Btw, @RocMarshal now that we've updated MetricCollectionResponseBody to mirror the structure of ConfigurationInfo, for the sake of consistency we should also do the same in case of:
AggregatedMetricsResponseBody.java
There was a problem hiding this comment.
Hi, @nikhilsu
If the current implementation in this AggregatedMetricsResponseBody.java file does not cause the previous issues when generating the corresponding documentation, then I would prefer to keep it as is.
WDYTA?
There was a problem hiding this comment.
I'm pretty sure it does as it uses a custom serializer and deserializer.
There was a problem hiding this comment.
Yup, you can see it's being generated as an object when response will be an array:
https://github.com/apache/flink/blob/release-2.2/docs/static/generated/rest_v1_dispatcher.yml#L1620-L1621
e684820 to
5beedd0
Compare
There was a problem hiding this comment.
FYI: changes to *.html and *.yml were made as part of running ./mvnw package -Dgenerate-rest-docs -pl flink-docs -am -nsu -DskipTests 2>&1
RocMarshal
left a comment
There was a problem hiding this comment.
Thanks @nikhilsu .
LGTM on the whole, just left a few of comments.
BTW, IIUC,
RuntimeRestAPIStabilityTest will be failure, maybe you could have a try on running it.
then do some adjustments based on the hint output of the unit-test result.
like running RuntimeRestAPIStabilityTest with -Dgenerate-rest-snapshot
|
|
||
| jsonGenerator.writeObject(metricCollectionResponseBody.getMetrics()); | ||
| } | ||
| return this; |
There was a problem hiding this comment.
| return this; | |
| return List.copyOf(this); |
There was a problem hiding this comment.
Actually let me make it simpler and remove this method altogether. It's mostly used only in tests.
There was a problem hiding this comment.
@nikhilsu One of the reasons for keeping this method is to maintain compatibility with previous behavior as much as possible.
There was a problem hiding this comment.
Ok sounds good.
@RocMarshal IMHO we should return this. Returning a copy would be a behavior change - the old implementation returned this.metrics which was the same collection reference passed to the constructor, not a defensive copy. Callers that mutated the result would have mutated the internal state before too.
More importantly, the class itself is an ArrayList now. Returning a copy from getMetrics() while the object itself is mutable via add()/remove() would be inconsistent - you'd have two ways to access the same data with different mutability guarantees.
e8fc938 to
27a3ea4
Compare
Ran The test pass because the REST snapshots records |


What is the purpose of the change
This PR fixes three Flink request/response DTOs that use custom Jackson serializers, while their OpenAPI specs are generated from the DTO schema. This leads to a mismatch between the actual wire format and what the generated spec describes, breaking downstream clients:
SlotSharingGroupId: Serialized as a hex string bySlotSharingGroupIDSerializer(added in FLINK-20090), but the spec incorrectly defines it as an object withbytes,lowerPart,upperPartfields becauseoverrideIdSchemas()does not includeSlotSharingGroupId.MetricCollectionResponseBody: Serialized as a raw JSON array[{"id": "metricName", "value": "1"}]by a customSerializer(annotated with@JsonSerialize), but the spec incorrectly defines it as an object with ametricsproperty.SerializedThrowable: Serialized with three fields (class,stack-trace,serialized-throwable) bySerializedThrowableSerializer, but the spec override only includedserialized-throwable, omitting theclassandstack-tracestring fields.None of these changes break backward compatibility - the previous schemas never matched the actual wire format, so any client generated from the spec would have already failed to decode these fields correctly.
Affected endpoints:
GET /jobs/:jobid(SlotSharingGroupIdinJobDetailsInfo.JobVertexDetailsInfo)GET /jobmanager/metricsGET /jobs/:jobid/metricsGET /jobs/:jobid/vertices/:vertexid/metricsGET /jobs/:jobid/vertices/:vertexid/subtasks/metricsGET /jobs/:jobid/vertices/:vertexid/watermarksGET /taskmanagers/:taskmanagerid/metricsGET /taskmanagers/metricsSerializedThrowable(savepoint/checkpoint failure responses)Brief change log
SlotSharingGroupIdtoOpenApiSpecGenerator.overrideIdSchemas()astype: string, pattern: [0-9a-f]{32}MetricCollectionResponseBodyto extendArrayList<Metric>, removing the custom serializer/deserializer. The wire format is identical (JSON array), but the class structure now matches, so the spec generator produces the correct schema without a manual override.classandstack-tracefields to theSerializedThrowableschema override inOpenApiSpecGenerator, using public constants fromSerializedThrowableSerializer.Verifying this change
This change is already covered by existing tests, such as
OpenApiSpecGeneratorTestandMetricCollectionResponseBodyTest.Does this pull request potentially affect one of the following parts:
@Public(Evolving): noMetricCollectionResponseBodycustom serializer/deserializer removed, replaced by extendingArrayList<Metric>which Jackson serializes identically as a JSON array)Documentation