It would be great if you could detect clapping.

1 clap: action
2 claps: action


And be able to set them separately on different cameras.

Like a high tech clapper.

This is a fun idea, but without visual confirmation I think it’d be too easily triggered by a log falling, a door slamming, a branch breaking, or other natural sounds.

Combine it with AI that visually identifies a clap along with the sound, and you have something that generates fewer false positives and is more useful. Then you could release it on April Fool’s Day, but the prank’s on you: it’s a real thing.

[I do see how adding the visual component could be annoying for people who don’t realize the camera has to both see and hear the clap, but pick your poison… I think disambiguating sounds is hard.]

The one concern I have about a visual identification is that it becomes more limiting.

For example, we are told that for the AI to recognize a face, the face needs to cover at least 300 pixels to maintain 90% confidence. This means your face has to be roughly within 6 feet of the camera, depending on the size of the face and other factors. It can detect faces a little farther away (8 feet or so), but it quickly becomes less accurate, and farther than 8 feet away it can’t detect smaller faces.

And a face is pretty big. So for the AI to accurately pick up on a visual cue it would require the visual cue be larger than a human head, or be closer than 6 feet to the camera. Those limits make a visual cue less useful and too constraining.

I think the sound alone should be okay. You wouldn’t want to allow a single clap to do anything because there would be too many false detections, but I actually have a real “Clapper” device where plug one works on 2 claps and plug two works with three claps. You have to clap at the right speed, and you can change the sensitivity settings for it, but it works pretty accurately overall. The Wyze cams could be trained to do something quite similar with similar accuracy to the old clapper devices that required 2 or 3 claps. That would make for an awesome AI trigger to start a routine through the cameras.
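
For what it’s worth, the timing part of a clapper-style trigger can be sketched in a few lines. This is only a rough Python illustration, not how Wyze or the original Clapper actually does it. It assumes something upstream has already turned the audio into a list of loud-transient timestamps, and the function names, gap values, and action labels are all made up for the example.

```python
from typing import Dict, List

# Hypothetical mapping: 2 claps -> plug one, 3 claps -> plug two.
# A single clap is deliberately left out to limit false detections.
ACTIONS: Dict[int, str] = {2: "plug_one", 3: "plug_two"}


def count_clap_groups(onsets: List[float],
                      min_gap: float = 0.15,
                      max_gap: float = 0.6) -> List[int]:
    """Group transient onset times (in seconds) into bursts.

    Two onsets belong to the same burst only if the gap between them
    falls inside [min_gap, max_gap], i.e. the claps come "at the right
    speed". The gap values are assumed defaults, not measured ones.
    Returns the number of onsets in each burst.
    """
    groups: List[int] = []
    current = 0
    last = None
    for t in sorted(onsets):
        if last is not None and min_gap <= t - last <= max_gap:
            current += 1          # same burst, one more clap
        else:
            if current:
                groups.append(current)
            current = 1           # start a new burst
        last = t
    if current:
        groups.append(current)
    return groups


def actions_for(onsets: List[float]) -> List[str]:
    """Return the action for every burst whose clap count is mapped."""
    return [ACTIONS[n] for n in count_clap_groups(onsets) if n in ACTIONS]


# Two claps, then three claps, then one stray bang that gets ignored.
print(actions_for([0.0, 0.4, 3.0, 3.3, 3.6, 10.0]))
```

Tightening or loosening `min_gap` and `max_gap` plays the role of a sensitivity setting here; telling a clap from a hammer would still need to happen before this step.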

I agree with your point that 2+ claps should generally be sufficient, but it’s still challenging: Is it someone clapping or is it a neighbor hammering? A nail gun? Heels on a tile floor? A car hitting a dip? A large metal trash can being emptied by a mechanical trash truck (they routinely do a double-tap on the trash container)? Or any number of other sounds.

If you don’t disambiguate between those sounds and actual claps, would users recognize that limitation? Would they understand that the environment may create many sounds similar to two claps? So how do you disambiguate?

Well, for machine learning, that sounds like (pun intended) a lot of inputs for percussive sounds that aren’t claps (to eliminate false positives) and a lot of training on what a double-clap or triple-clap actually sound like from various distances in highly variable sound environments (an echoey driveway, a cityscape or a business?).

I’m not sure facial recognition is the best comparison since here we’d be talking about time/velocity of pixel change vs exact pattern match but ultimately I agree with you. It’d probably be hard to implement a visual-AI piece consistently enough to be considered a real feature instead of a gimmick.

And ultimately that might be my concern with implementing something like this. Superficially, it sounds really easy, but when tied into the entire Wyze ecosystem you have to do a lot of edge-case detection and consider whether people really should be able to unlock their door with the ‘clapper trigger.’

So ultimately dev-cost feels unjustified to do it really well and not just as a gimmick. But I don’t work for Wyze so who cares what I think! I do like considering cool technical problems, though.

An Aside
There is a deeper problem begging to be solved here. Is there a way to signal a Wyze camera to execute a (preferably non-security implicating) trigger? Right? I’m outside without my phone, working in the yard and want the backyard lights turned on because it’s getting dark.

What ‘signal’ would work? Is the easiest answer just to allow voice commands through Wyze camera microphones? “Wyze turn on outdoor lights” – I don’t think the cameras could do Alexa/Google Voice/Siri level voice detection because that seems really hardware-intensive but could it do more basic ‘voice transcription over the internet’ to allow for really simple standard commands?
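
To make the “really simple standard commands” idea concrete, here is a minimal Python sketch. It assumes some cloud service has already turned the camera audio into text, which is the hard part and is not shown. The wake word, the command phrases, and the action names are hypothetical, and nothing here is an actual Wyze API.

```python
import re
from typing import Optional

WAKE_WORD = "wyze"

# Hypothetical allowlist of exact phrases. Anything security-related
# (locks, alarms) is simply not in the table, so it can never match.
COMMANDS = {
    "turn on outdoor lights": "outdoor_lights_on",
    "turn off outdoor lights": "outdoor_lights_off",
}


def match_command(transcript: str) -> Optional[str]:
    """Map a transcribed sentence to an action name, or None."""
    # Lowercase and strip punctuation so "Wyze, turn on..." still matches.
    words = re.sub(r"[^a-z0-9 ]+", " ", transcript.lower()).split()
    if not words or words[0] != WAKE_WORD:
        return None
    return COMMANDS.get(" ".join(words[1:]))


print(match_command("Wyze, turn on outdoor lights"))
print(match_command("Wyze unlock the front door"))
```

An exact-match allowlist like this is crude, but it keeps the camera from acting on anything that was not explicitly set up in advance.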

Ooooo, that would be kind of cool. Though, same worry about security implications and development costs.

Fun-time-speculation-over :wink:

I proposed a similar voice transcription suggestion (and trigger options) to the Wyze AI team when they solicited suggestions for the AI here:

So yes, I have similar thoughts to yours about how that could be cool.

As for visual cues, Wyze had some interesting announcements here:

But the information in that video has basically been the extent of what they’ve said to the public so far.

I don’t know enough about what it would take to distinguish claps from other sounds, but I can say that, as far as I can remember, my clapper has never been activated by anything other than a deliberate clap. So I would assume an AI could be trained to tell the difference between any of those things, given sufficient samples of correct and incorrect detections plus sensitivity settings, at least well enough to be as reliable as clappers are. They could launch a pilot or beta test and have users submit events with the various inputs, and the AI would do all the work, just like it does for meowing, barking, and crying detection, etc.
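
As a rough idea of how user-submitted samples could feed a sensitivity setting, here is a small Python sketch. It assumes each submitted event comes with a clap-confidence score between 0 and 1 from some model plus a user label saying whether it really was a clap. The scores, labels, and function names are invented for illustration.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, bool]  # (model score, user says it was a real clap)


def pick_threshold(samples: List[Sample]) -> Optional[float]:
    """Lowest real-clap score that sits above every labeled non-clap.

    Triggering at or above this score gives zero false positives on the
    submitted samples. Returns None if no real clap scores that high.
    """
    highest_false = max((s for s, is_clap in samples if not is_clap),
                        default=0.0)
    candidates = sorted(s for s, is_clap in samples
                        if is_clap and s > highest_false)
    return candidates[0] if candidates else None


def recall_at(samples: List[Sample], threshold: float) -> float:
    """Fraction of real claps that would still trigger at this threshold."""
    claps = [s for s, is_clap in samples if is_clap]
    if not claps:
        return 0.0
    return sum(s >= threshold for s in claps) / len(claps)


# Made-up beta-test submissions: three real claps and two non-claps.
submitted = [(0.95, True), (0.80, True), (0.55, True),
             (0.60, False), (0.30, False)]
t = pick_threshold(submitted)
print(t, recall_at(submitted, t))
```

The trade-off shows up right away: a threshold strict enough to reject every submitted non-clap also misses the quietest real clap, which is where a user-adjustable sensitivity would come in.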

It would be cool to have some more AI triggers, though. Currently we basically have triggers for DETECTS:

  • Sound
  • Motion
  • Smoke Alarm
  • CO Alarm
  • Pet
  • Vehicle
  • Package
  • Person

I’d love to see more options we can set as triggers.


