
    Budget-Friendly AI Tool Extracts Data from Screencasts

    Cheap AI Solution Offers Accurate Data Extraction from Screen Recordings. Read for Details!

    AI researcher Simon Willison recently tackled a common problem using “video scraping.”

    Instead of manually inputting scattered payment values from multiple emails, he recorded a short video of his emails and fed it into Google’s AI Studio.

    Google’s Gemini 1.5 Pro and Gemini 1.5 Flash models extracted the data into JSON, which Willison then converted into a CSV table for use in a spreadsheet. He was surprised by both the accuracy and the low cost of the process: the extraction cost less than one-tenth of a cent.
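
    The JSON-to-CSV step is easy to reproduce. The sketch below is not Willison's actual code; it assumes the model returned a JSON array of flat objects, and the field names in the example (date, description, amount) are hypothetical. It uses only Python's standard json and csv modules.

```python
import csv
import io
import json


def json_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV text.

    Assumes the model returned a list of dictionaries. Every key seen in
    any record becomes a column, in first-seen order.
    """
    rows = json.loads(json_text)
    # Collect keys across all rows so a field missing from the first
    # record is not silently dropped.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    buffer = io.StringIO()
    # Keys absent from a row are written as empty cells (DictWriter's
    # default restval is "").
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical extracted payments; not data from Willison's experiment.
    sample = (
        '[{"date": "2024-10-01", "description": "Payment A", "amount": 12.5},'
        ' {"date": "2024-10-08", "amount": 7}]'
    )
    print(json_to_csv(sample))
```

    Writing the returned text to a .csv file produces a table that any spreadsheet application can open.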


    Video scraping, or feeding screen recordings into AI models for data extraction, highlights the potential of large language models like Google’s Gemini and GPT-4o, which can process audio, video, image, and text inputs.

    These models convert multimedia inputs into tokens and then predict what comes next in the token sequence, which arguably makes “token prediction model” (TPM) a more fitting term than “large language model.”
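
    Because video is turned into tokens, the cost of processing a screen recording can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope calculator, not an official formula. Its default rates (one sampled frame per second, 258 tokens per frame, 32 tokens per second of audio) are assumptions modeled on Google's published guidance for Gemini 1.5 video input and may change, and the price per million tokens is a parameter the caller must supply from current pricing.

```python
import math

# Assumed rates, modeled on Google's guidance for Gemini 1.5 video input.
# These are illustrative defaults; verify them against current documentation.
FRAMES_PER_SECOND = 1
TOKENS_PER_FRAME = 258
AUDIO_TOKENS_PER_SECOND = 32


def estimate_video_tokens(duration_s: float, include_audio: bool = True) -> int:
    """Estimate input tokens for a recording of duration_s seconds.

    Prompt text and any per-file metadata are not included.
    """
    frames = math.ceil(duration_s * FRAMES_PER_SECOND)
    tokens = frames * TOKENS_PER_FRAME
    if include_audio:
        tokens += math.ceil(duration_s * AUDIO_TOKENS_PER_SECOND)
    return tokens


def estimate_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Convert a token count to US dollars at a caller-supplied price."""
    return tokens * usd_per_million_tokens / 1_000_000


if __name__ == "__main__":
    tokens = estimate_video_tokens(30, include_audio=False)  # a silent 30-second clip
    print(tokens)  # 7740
    # Hypothetical price, chosen only to illustrate the arithmetic.
    cost = estimate_cost_usd(tokens, usd_per_million_tokens=0.10)
    print(f"${cost:.6f} ({cost * 100:.4f} cents)")
```

    Under these assumed rates, a short clip stays in the thousands of tokens, so at low per-token prices the bill lands in fractions of a cent, which is in line with the cost Willison reported.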

    In a previous experiment, Willison used a seven-second video of his bookshelves, which Gemini analyzed to list all visible book titles.

    As a data journalist, Willison values tools that turn unstructured data into structured formats. Because video scraping works from whatever is displayed on screen, it can also sidestep traditional barriers such as website authentication and anti-scraping technology.
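
    Model replies do not always arrive as clean JSON: the data may be wrapped in a Markdown code fence, or amounts may come back as strings such as "$12.50". The hypothetical helper below sketches one defensive way to turn such a reply into validated records. The field names date and amount are illustrative assumptions, not part of any Gemini API or of Willison's workflow.

```python
import json
import re

# Matches a fenced block such as ```json ... ```. Writing `{3} instead of
# three literal backticks keeps this snippet easy to embed in Markdown.
_FENCE_RE = re.compile(r"`{3}(?:json)?\s*(.*?)\s*`{3}", re.DOTALL | re.IGNORECASE)


def parse_extracted_records(reply: str, required_fields=("date", "amount")) -> list[dict]:
    """Parse a model reply into a list of records and check required fields.

    Hypothetical helper: the default field names are illustrative only.
    """
    match = _FENCE_RE.search(reply)
    # Use the fenced payload if present; otherwise treat the whole reply as JSON.
    payload = match.group(1) if match else reply.strip()
    records = json.loads(payload)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")

    cleaned = []
    for i, record in enumerate(records):
        if not isinstance(record, dict):
            raise ValueError(f"record {i} is not a JSON object")
        missing = [field for field in required_fields if field not in record]
        if missing:
            raise ValueError(f"record {i} is missing fields: {missing}")
        row = dict(record)
        if "amount" in row:
            # Normalize values like "$1,234.50" or 3 to a float.
            row["amount"] = float(str(row["amount"]).replace("$", "").replace(",", ""))
        cleaned.append(row)
    return cleaned


if __name__ == "__main__":
    fence = "`" * 3
    reply = (
        "Here are the payments:\n" + fence + "json\n"
        + '[{"date": "2024-10-01", "amount": "$1,234.50"}]\n' + fence
    )
    print(parse_extracted_records(reply))
```

    Failing loudly on a missing field is a deliberate choice here: a silently incomplete row is harder to spot once the data is in a spreadsheet.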

    This technique marks a shift in how users might interact with AI models. Rather than having users type in data or describe scenarios in text, AI applications can increasingly work directly with visual information captured on screen.

    Major AI labs are already exploring similar “video understanding” or “vision” techniques.

    OpenAI has demonstrated a prototype ChatGPT Mac App with screen interaction capabilities, and Microsoft has shown a “Copilot Vision” concept.

    However, public video input features are not yet widely available due to the computational costs involved.

    For now, Google is absorbing much of the cost of this computing, but as AI computing gets cheaper, these capabilities are expected to reach a broader audience.

    Giving AI models a view of users’ screens also raises privacy concerns. The same capability that enables useful applications could be turned to privacy invasions or automated spying.

    Apps like Rewind AI on Mac and Microsoft’s Recall for Windows 11 illustrate the privacy challenges of this approach. They store screen-recorded data for later recall, and that archive becomes a security risk if it is ever compromised. Willison, by contrast, is cautious: he uploads only specific video clips to AI models, and only when he needs to.

    Future AI models may run locally, without a cloud connection, giving users more control over their privacy. Willison expects to use video scraping more often and predicts that the technique will see broader use as AI technology advances.

    Sanchita Das
    Sanchita has growing experience in troubleshooting and tech-related issues, pursues interests in technology, gaming, media, and storytelling, and is always ready to take on new challenges.
