Search Google for documents hosted on a domain in order to extract their metadata.

Last update: Dec 16, 2022

Overview

MetaFinder - Metadata search through Google

   _____               __             ___________ .__               .___                   
  /     \     ____   _/  |_  _____    \_   _____/ |__|   ____     __| _/   ____   _______  
 /  \ /  \  _/ __ \  \   __\ \__  \    |    __)   |  |  /    \   / __ |  _/ __ \  \_  __ \ 
/    Y    \ \  ___/   |  |    / __ \_  |     \    |  | |   |  \ / /_/ |  \  ___/   |  | \/ 
\____|__  /  \___  >  |__|   (____  /  \___  /    |__| |___|  / \____ |   \___  >  |__|    
        \/       \/               \/       \/               \/       \/       \/          
        
|_ Author: @JosueEncinar
|_ Description: Search for documents in a domain through Google. The objective is to extract metadata
|_ Usage: python3 metafinder.py -d domain.com -l 100 -o /tmp

Installation:

> pip3 install metafinder

To upgrade an existing installation:

> pip3 install metafinder --upgrade
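
To confirm which version ended up installed, one option is to query the package metadata from Python. This is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is made up for illustration and is not part of MetaFinder.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if metafinder is installed, otherwise None.
print(installed_version("metafinder"))
```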

Usage

CLI

metafinder -d domain.com -l 20 -o folder [-t 10] [-v]

Parameters:

-d: Target domain.
-l: Maximum number of results to search.
-o: Path where the report is saved.
-t: Optional. Number of threads (4 by default).
-v: Optional. Also displays the results on the screen.
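
The flags above can be mirrored in a small `argparse` parser, which may help when wrapping the tool in a script. This is only a sketch of how such options could be parsed, not MetaFinder's actual parser: the `dest` names are hypothetical, and treating `-d`, `-l` and `-o` as required is an assumption based on the usage line, where only `-t` and `-v` appear in brackets.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags (not MetaFinder's own code)."""
    parser = argparse.ArgumentParser(prog="metafinder")
    parser.add_argument("-d", dest="domain", required=True,
                        help="target domain")
    parser.add_argument("-l", dest="limit", type=int, required=True,
                        help="maximum number of results to search")
    parser.add_argument("-o", dest="output", required=True,
                        help="path where the report is saved")
    parser.add_argument("-t", dest="threads", type=int, default=4,
                        help="number of threads (4 by default)")
    parser.add_argument("-v", dest="verbose", action="store_true",
                        help="also display the results on the screen")
    return parser


# Same arguments as the usage line, without the optional flags.
args = build_parser().parse_args(["-d", "domain.com", "-l", "20", "-o", "folder"])
print(args.domain, args.limit, args.output, args.threads, args.verbose)
# prints: domain.com 20 folder 4 False
```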

In Code

import metafinder.extractor as metadata_extractor

# Search Google for documents hosted on the domain and extract their metadata.
documents_limit = 5
domain = "target_domain"
data = metadata_extractor.extract_metadata_from_google_search(domain, documents_limit)
for k, v in data.items():
    print(f"{k}:")
    print(f"|_ URL: {v['url']}")
    for metadata, value in v['metadata'].items():
        print(f"|__ {metadata}: {value}")

# Extract the metadata of a single local document.
document_name = "test.pdf"
try:
    metadata_file = metadata_extractor.extract_metadata_from_document(document_name)
    for k, v in metadata_file.items():
        print(f"{k}: {v}")
except FileNotFoundError:
    print("File not found")
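
Once the results are in a dictionary, a common next step is to group documents by one metadata field. The sketch below assumes the shape suggested by the loop above (document name mapped to a dict with `url` and `metadata` keys). The helper `collect_metadata_values`, the sample data and the field names `Author` and `Producer` are invented for illustration; the keys actually present depend on each document.

```python
from collections import defaultdict


def collect_metadata_values(data, field):
    """Map each distinct value of `field` to the names of the documents carrying it."""
    found = defaultdict(list)
    for name, info in data.items():
        value = info.get("metadata", {}).get(field)
        if value:  # skip documents where the field is missing or empty
            found[str(value)].append(name)
    return dict(found)


# Made-up sample in the assumed result shape; not real MetaFinder output.
sample = {
    "report.pdf": {
        "url": "https://example.com/report.pdf",
        "metadata": {"Author": "alice", "Producer": "ExampleWriter 1.0"},
    },
    "slides.pptx": {
        "url": "https://example.com/slides.pptx",
        "metadata": {"Author": "bob"},
    },
    "notes.docx": {
        "url": "https://example.com/notes.docx",
        "metadata": {"Author": "alice"},
    },
    "scan.pdf": {
        "url": "https://example.com/scan.pdf",
        "metadata": {},
    },
}

for value, documents in collect_metadata_values(sample, "Author").items():
    print(f"{value}: {', '.join(documents)}")
```

With real results, the same call would take the `data` dictionary returned by the search instead of `sample`.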

Author

This project has been developed by:

Josué Encinar García -- @JosueEncinar

Contributors

Félix Brezo Fernández -- @febrezo

Disclaimer!

This software has been developed for teaching purposes and for use only with the permission of the target. The author is not responsible for any illegitimate use.
