VED-X: A Versatile Event Detection Framework

VED-X is a novel few-shot learning framework capable of learning specific visual events (as opposed to objects only) from only a few samples. To train a model that can recognize an event, the user provides only a few positive samples (snapshots taken while the event is happening) and a few negative samples (snapshots taken while it is not). The trained model can then recognize the event with good accuracy, and if it makes a mistake (a false positive or false negative), the user can flag the error. That feedback is incorporated into the model, so the model keeps improving over time.
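The train → predict → flag → retrain loop described above can be sketched with a toy nearest-centroid classifier. VED-X's actual model is not public; the class below (`EventDetector`, its `flag_mistake` method, and the feature vectors) is a hypothetical illustration of the workflow, not the real implementation.

```python
import numpy as np

class EventDetector:
    """Toy few-shot event detector: nearest-centroid over feature vectors.

    A stand-in for the real VED-X model, which is not public.
    """

    def __init__(self, positives, negatives):
        # Each sample is a feature vector extracted from a snapshot.
        self.positives = [np.asarray(p, dtype=float) for p in positives]
        self.negatives = [np.asarray(n, dtype=float) for n in negatives]

    def predict(self, sample):
        # The event is "happening" if the sample is closer to the
        # centroid of the positives than to the centroid of the negatives.
        s = np.asarray(sample, dtype=float)
        pos_c = np.mean(self.positives, axis=0)
        neg_c = np.mean(self.negatives, axis=0)
        return np.linalg.norm(s - pos_c) < np.linalg.norm(s - neg_c)

    def flag_mistake(self, sample, event_was_happening):
        # User feedback: file the misclassified snapshot under the
        # correct label so future predictions improve.
        target = self.positives if event_was_happening else self.negatives
        target.append(np.asarray(sample, dtype=float))

# A few positive and negative samples are enough to start.
det = EventDetector(positives=[[1.0, 0.9], [0.9, 1.1]],
                    negatives=[[0.0, 0.1], [0.1, 0.0]])
print(det.predict([0.6, 0.6]))           # True (a false positive)
det.flag_mistake([0.6, 0.6], event_was_happening=False)
print(det.predict([0.6, 0.6]))           # False after the flag is absorbed
```

The second prediction flips because the flagged snapshot pulls the negative centroid toward it, which mirrors how user feedback makes the deployed model "keep getting better over time."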

First, watch some of our pre-recorded demos to see how it works. Then run a live experiment below using a webcam (on desktop).

Entrance Door Open

Hot Tub Left Uncovered (double constraint)

Person Wearing A Mask (demoing user feedback)

Faucet Left Running (double constraint)

Dog On The Couch

Package Being Picked Up

Grill Left Uncovered

Gas Stove Left On

Recognizing A Specific Person

Car Door Open

Person Taking Pills

Lights On

Real-Time Prediction
Flagged predictions will show up here.
Event Name
Positive Samples
Negative Samples
  • The model learns your event of interest based only on the samples you provide. At a high level, it tries to guess your event of interest by finding the most distinctive aspects of the positive samples that are absent from the negative samples. So the main distinction between your positive and negative samples must be your event of interest.
  • The trained model is capable of generalizing to a good degree. In other words, it is robust to some variations it has not seen in the training samples. However, depending on the event, some variations might reduce the accuracy. Adding more diversity to your samples increases the model's robustness to new variations.
  • Make sure the main difference between your positive and negative samples is your event of interest, otherwise the model might focus on a different aspect. For example, if your event is "person riding a bike" and in most of your negative samples, there is no person or bike, the model may think "person appearing" or "bike appearing" is your event of interest.
  • Your event of interest must be visible in all your positive samples. Currently, we are streaming at a low resolution (400px by 400px), so your event must be clearly visible at that resolution.
  • We recommend using Chrome or Firefox to run a live demo. Some functionalities may not work as expected on some of the other web browsers.
  • Does the model incorporate temporal behavior?
    Currently, the model is limited to static events, i.e., events that can be characterized by a single snapshot. We have found that this covers most visual events of interest.
  • Can I train a model to recognize a specific person?
    The model is not capable of learning distinct facial features very accurately, but it can distinguish between people with different appearances.
  • Can the models run at the edge?
    It would be possible to deploy the models at the edge depending on the hardware specifications and the desired performance.
  • How can I access the API?
    Please email us at for access to the API.