The AI is not tagging “motion”. The cam starts recording when it detects motion, where “motion” just means a user-defined threshold of pixels changing within a particular time window in a particular area. That recording is then sent to Wyze servers, where each video frame is scanned for particular objects of interest, and any that are found get tagged on the video data.
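To make the “threshold of pixels changing” idea concrete, here is a rough sketch of that kind of check. This is not Wyze's actual firmware logic, which I have not seen; the function names, the zone format, and the default thresholds are all made up for illustration, and the “particular time” part is simplified down to comparing two consecutive frames.

```python
from typing import List, Tuple

Frame = List[List[int]]           # grayscale values 0-255, indexed [row][col]
Zone = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def changed_pixels(prev: Frame, curr: Frame, zone: Zone, pixel_delta: int) -> int:
    """Count pixels inside the zone whose brightness changed by more than pixel_delta."""
    top, left, bottom, right = zone
    count = 0
    for r in range(top, bottom):
        for c in range(left, right):
            if abs(curr[r][c] - prev[r][c]) > pixel_delta:
                count += 1
    return count


def is_motion(prev: Frame, curr: Frame, zone: Zone,
              pixel_delta: int = 25, min_changed: int = 4) -> bool:
    """Hypothetical sensitivity setting: enough changed pixels in the zone counts as motion."""
    return changed_pixels(prev, curr, zone, pixel_delta) >= min_changed


# Tiny demo: an 8x8 dark frame, then the same frame with a bright 3x3 blob.
blank = [[0] * 8 for _ in range(8)]
moved = [row[:] for row in blank]
for r in range(2, 5):
    for c in range(2, 5):
        moved[r][c] = 200

whole_frame = (0, 0, 8, 8)
corner_zone = (5, 5, 8, 8)  # detection zone that does not contain the blob
print(is_motion(blank, moved, whole_frame))
print(is_motion(blank, moved, corner_zone))
```

Notice there is nothing in there about what changed or where it is going. It is only a count of pixels that differ, which is why shadows, headlights, and rain can all trip it.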
Nothing is tracking contiguous motion (yes, contiguous with a ‘G’) at any point, not for the initial event that triggers the cam to begin recording, and especially not for any AI-detected objects. These things essentially have no concept of object permanence, or even a rough model of such a thing. With current tech, each frame with an object in it is treated as almost fully independent of every other frame. Rare is the consumer-grade software that understands or cares that Bob is still Bob the entire time he’s walking from the left side of your yard to the right side. Rather, there are 200 frames with a person detected in each, and you just happen to understand that all 200 of them are Bob, recorded at 20fps for about 10 seconds.
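Here is a toy sketch of what “no object permanence” looks like in data terms. The detector below is a hypothetical stand-in, not any real Wyze API; it just returns a “person” box for whatever single frame it is handed. The point is the shape of the output: one detection per frame, with no field anywhere that says two detections are the same person.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

FPS = 20  # frame rate from the example above


@dataclass(frozen=True)
class Detection:
    frame_index: int
    label: str
    box: Tuple[int, int, int, int]  # (x, y, width, height)
    # Deliberately no track_id: nothing links this to any other frame.


def detect_objects(frame_index: int) -> List[Detection]:
    """Hypothetical per-frame detector. It only ever sees one frame."""
    x = frame_index * 3  # Bob drifting from left to right
    return [Detection(frame_index, "person", (x, 100, 40, 90))]


def tag_event(num_frames: int) -> Tuple[List[Detection], Set[str]]:
    """Run the detector on every frame independently and collect the labels."""
    detections: List[Detection] = []
    for i in range(num_frames):
        detections.extend(detect_objects(i))  # no state carried between calls
    labels = {d.label for d in detections}
    return detections, labels


detections, labels = tag_event(200)
seconds = len(detections) / FPS
print(len(detections), labels, seconds)
```

The event ends up tagged “person” because at least one frame had a person in it. Knowing it was one person walking across the yard, rather than 200 different people each standing still for a twentieth of a second, is something only the human watching brings to it.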