What else is in view in the event? The way recognized-object tagging works on the server end, the AI looks at the entire view for objects it recognizes, and if it sees any, it tags the event accordingly. The detected motion isn’t necessarily what the AI recognized and tagged. Recognized objects can be either moving or stationary, anywhere in frame.
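To illustrate the point, here is a toy sketch of the behavior described above. It is not the actual server code; `Detection`, `tag_event`, and the label set are made-up names, and the only thing it is meant to show is that the tag depends on what is recognized anywhere in frame, not on what triggered the motion.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str    # what the (hypothetical) recognizer thinks the object is
    moving: bool  # whether this object is what caused the motion event


def tag_event(detections, known_labels):
    """Return tags for every recognized label in frame, moving or not."""
    # Note that the `moving` flag is deliberately ignored here.
    return sorted({d.label for d in detections if d.label in known_labels})


# Assumed label set, for illustration only.
KNOWN = {"person", "vehicle", "pet", "package"}

# A parked car is in view while a blowing leaf triggers the motion event.
detections = [
    Detection("vehicle", moving=False),
    Detection("leaf", moving=True),
]
tags = tag_event(detections, KNOWN)
print(tags)  # ['vehicle']
```

In this example the event ends up tagged as a vehicle event even though the only thing that moved was the leaf.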
For this use case, I’d recommend multiple fixed cameras with overlapping coverage of the area of importance. I have several pan cams and ended up only ever using them as glorified fixed cameras, since I disabled the motion tracking and pan scanning. I left them in use because I could still scan around manually if I wanted to, but I added another camera to get 100% coverage of an area.
Also, you should check out the Wishlist, as it looks like your request is covered by a couple of existing topics: