example-video-search

This is an example of building a video question-answering (QA) system using Jina.

The indexed data consists of videos with subtitle information. After indexing, you can ask questions in natural language and retrieve the related video together with the timestamp at which the corresponding answer appears.

Prerequisites

We use a YouTube video as a toy example. Install the dependencies and download the data:

pip install -r requirements.txt
bash scripts/download_data.sh

Usage

By default, the video file toy-data/mnnC37ewQI8.mkv is indexed:

python app.py index

To query with questions, run:

python app.py query

To run the video search frontend, first set it up locally. You should have Node and Yarn installed on your machine.

cd frontend
yarn

This will install the necessary dependencies.

Then start the search frontend:

yarn dev

You can see the search frontend at http://localhost:3000/.

How it works

The index flow works as follows. Sentences are extracted from the subtitle file, and each sentence is encoded into an embedding by the DPRTextEncoder. The meta information of the sentences is stored in the SimpleIndexer together with their embeddings.
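The extraction step can be sketched in plain Python. This is a minimal sketch that assumes the subtitles are available as SRT text. The repository's own extraction may read them differently, for example from the subtitles embedded in the .mkv file. `SubtitleSentence` and `extract_sentences` are hypothetical names. Each cue's text is stored alongside its start and end times, because that timestamp is the meta information needed later to locate the answer in the video.

```python
import re
from dataclasses import dataclass
from typing import List

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,500".
_TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class SubtitleSentence:
    """Hypothetical record: one subtitle cue plus the meta info kept with its embedding."""
    text: str
    start: float  # seconds from the beginning of the video
    end: float
    video_uri: str


def _to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def extract_sentences(srt_text: str, video_uri: str) -> List[SubtitleSentence]:
    """Split SRT text into cues; the text lines of each cue become one sentence."""
    sentences = []
    # SRT cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        for i, line in enumerate(lines):
            match = _TIMING.search(line)
            if match:
                groups = match.groups()
                # Everything after the timing line is the cue text.
                text = " ".join(lines[i + 1:])
                if text:
                    sentences.append(
                        SubtitleSentence(
                            text=text,
                            start=_to_seconds(*groups[:4]),
                            end=_to_seconds(*groups[4:]),
                            video_uri=video_uri,
                        )
                    )
                break
    return sentences


# Usage with a tiny, made-up SRT snippet.
srt = """1
00:00:01,000 --> 00:00:04,500
This is the first cue.

2
00:01:02,250 --> 00:01:05,000
This cue
spans two lines.
"""
sents = extract_sentences(srt, "toy-data/mnnC37ewQI8.mkv")
```

In the real flow, each `text` would then be passed to the DPRTextEncoder. The start and end times would travel with the sentence as metadata into the SimpleIndexer.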

The query flow works as follows:

The input query, a natural-language question, is encoded into an embedding by the DPRTextEncoder.
The query embedding is used to retrieve candidate sentences from the SimpleIndexer.
The DPRReaderRanker ranks the candidate sentences and extracts the exact answers from them.
Text2Frame looks up the timestamp and video URI for each answer candidate.
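The retrieval and timestamp-lookup steps can be illustrated with a small, dependency-free sketch. It assumes brute-force cosine similarity as a stand-in for the SimpleIndexer search and uses toy 2-D vectors in place of real DPR embeddings. The DPRReaderRanker step is left out, so candidates are ordered by retrieval score alone. `retrieve` and `to_frame` are hypothetical helpers, not the repository's executors.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_emb: Vector, index: List[Tuple[Vector, Dict]], top_k: int = 2) -> List[Dict]:
    """Hypothetical stand-in for SimpleIndexer: exhaustive nearest-neighbour search."""
    scored = sorted(index, key=lambda item: cosine(query_emb, item[0]), reverse=True)
    return [meta for _, meta in scored[:top_k]]


def to_frame(meta: Dict) -> Tuple[str, float]:
    """Hypothetical Text2Frame-style lookup: video URI and start time of a match."""
    return meta["video_uri"], meta["start"]


# Toy index: (embedding, sentence metadata) pairs; values are illustrative only.
uri = "toy-data/mnnC37ewQI8.mkv"
index = [
    ([1.0, 0.0], {"text": "first sentence", "video_uri": uri, "start": 1.0}),
    ([0.6, 0.8], {"text": "second sentence", "video_uri": uri, "start": 62.25}),
    ([0.0, 1.0], {"text": "third sentence", "video_uri": uri, "start": 90.0}),
]
hits = retrieve([0.0, 1.0], index, top_k=2)
best_uri, best_start = to_frame(hits[0])
```

A real deployment would use an approximate nearest-neighbour index for large collections. Exhaustive search keeps the sketch easy to follow.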

How to index my own data?

First, download the video together with its subtitle files:

youtube-dl --write-sub --embed-subs -o toy-data/zvXkQkqd2I8 https://www.youtube.com/watch\?v\=zvXkQkqd2I8

If no subtitle file has been uploaded manually, replace --write-sub with --write-auto-sub to use the subtitles that YouTube generates automatically.
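If you script the download from Python, building the argument list yourself avoids shell quoting. In the command above, the backslashes before `?` and `=` only keep shells such as zsh from treating those characters specially. The following is a minimal sketch. The helper `youtube_dl_command` and its `auto_sub` parameter are hypothetical, and they simply reproduce the flags shown above.

```python
from typing import List


def youtube_dl_command(video_id: str, out_dir: str = "toy-data", auto_sub: bool = False) -> List[str]:
    """Hypothetical helper that rebuilds the youtube-dl command shown above.

    auto_sub=True swaps --write-sub for --write-auto-sub, i.e. it falls back
    to YouTube's automatically generated subtitles.
    """
    sub_flag = "--write-auto-sub" if auto_sub else "--write-sub"
    return [
        "youtube-dl",
        sub_flag,
        "--embed-subs",
        "-o",
        f"{out_dir}/{video_id}",
        # No escaping needed: the list is passed to the process without a shell.
        f"https://www.youtube.com/watch?v={video_id}",
    ]


cmd = youtube_dl_command("zvXkQkqd2I8")
auto_cmd = youtube_dl_command("zvXkQkqd2I8", auto_sub=True)
```

With youtube-dl installed, `subprocess.run(cmd, check=True)` runs the download and raises an error if youtube-dl exits with a non-zero status.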

Then run the following:

python app.py index
python app.py query
