Softcatala/softcatala-web-dataset

Repository description

This repository contains Softcatalà website content (articles and program descriptions).

The datasets are available in the dataset directory.

Dataset size:

  • articles.json contains 623 articles with 373233 words
  • programes.json contains 330 program descriptions with 49868 words
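
The figures above can be roughly cross-checked with a short script. This is only a sketch: the JSON schema is not documented here, so instead of assuming field names it walks every string value in the file and counts whitespace-separated tokens. The helper names (`iter_strings`, `count_words`, `summarize`) are hypothetical, and the totals will probably differ from the numbers listed above, since titles, URLs, and any markup are counted too.

```python
import json
from pathlib import Path


def iter_strings(node):
    """Yield every string value found anywhere inside a decoded JSON value."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def count_words(node):
    """Count whitespace-separated tokens across all string values."""
    return sum(len(text.split()) for text in iter_strings(node))


def summarize(path):
    """Return (number of top-level entries, word count) for one dataset file.

    Assumption: the top-level JSON value is a list or an object.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return len(data), count_words(data)


if __name__ == "__main__":
    # Assumption: the script is run from the repository root.
    for name in ("articles.json", "programes.json"):
        path = Path("dataset") / name
        if path.exists():
            items, words = summarize(path)
            print(f"{name}: {items} entries, {words} words")
```

Walking all string values keeps the script independent of the field names that wp-to-json.py actually emits, at the cost of a less precise count.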

The data is licensed under Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).

How to update the dataset

  • Export the programes and articles items from the WordPress admin interface
  • Save the raw files into the /raw directory
  • Run ./filter.sh to filter out sensitive data
  • Run pip install -r requirements.txt
  • Run python wp-to-json.py
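
The command-line steps above could be chained in a small wrapper, sketched below. It is not part of the repository: `UPDATE_COMMANDS` and `run_update` are hypothetical names, the WordPress export and the copy into the raw directory remain manual, and `pip` is invoked as `python -m pip` through the current interpreter, which is a substitution for the plain `pip install` in the list.

```python
import subprocess
import sys

# Commands taken from the steps above. The export from WordPress and the
# copy into the raw directory are manual and are not automated here.
UPDATE_COMMANDS = [
    ["./filter.sh"],
    [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"],
    [sys.executable, "wp-to-json.py"],
]


def run_update(dry_run=False, cwd="."):
    """Run the update commands in order and return them as strings.

    With dry_run=True nothing is executed. Otherwise check=True makes
    subprocess.run raise CalledProcessError at the first failing command,
    so later steps never run on unfiltered data.
    """
    for command in UPDATE_COMMANDS:
        if not dry_run:
            subprocess.run(command, cwd=cwd, check=True)
    return [" ".join(command) for command in UPDATE_COMMANDS]


if __name__ == "__main__":
    # Print the plan only; call run_update() from the repository root to execute it.
    print("\n".join(run_update(dry_run=True)))
```

Stopping at the first failure matters mostly for the filtering step: if ./filter.sh fails, the conversion should not go ahead on raw files that may still contain sensitive data.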
