Overview


Python-based Wikidata framework for easy dataframe extraction

wikirepo is a Python package that provides a framework to easily source and leverage standardized Wikidata information. The goal is to create an intuitive interface so that Wikidata can function as a common read-write repository for public statistics.

Installation

wikirepo can be downloaded from PyPI via pip or sourced directly from this repository:

pip install wikirepo

git clone https://github.com/andrewtavis/wikirepo.git
cd wikirepo
python setup.py install

import wikirepo

Data

wikirepo's data structure is built around Wikidata.org. Human-readable access to Wikidata statistics is achieved by converting requests into Wikidata's item identifiers (QIDs) and property identifiers (PIDs), with the Python package wikidata serving as the basis for data loading and indexing. See the documentation for a structured overview of the currently available properties.
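
As a minimal sketch of the underlying access pattern, the wikidata client can be used directly (the QIDs and PIDs below are real Wikidata identifiers; the snippet illustrates the package wikirepo builds on, not wikirepo's internal code):

from wikidata.client import Client

client = Client()

# Load the Wikidata item for Germany by its QID
germany = client.get("Q183", load=True)

# Properties are themselves entities, fetched by PID
capital_prop = client.get("P36")  # P36: capital

# Indexing an entity by a property returns the claim's value
print(germany[capital_prop].label)  # Berlin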

Query Data

wikirepo's main access function, wikirepo.data.query, returns a pandas.DataFrame of locations and property data across time.

Each query needs the following inputs:

  • locations: the locations that data should be queried for
    • Strings are accepted for Earth, continents, and countries
    • Get all country names with wikirepo.data.incl_lctn_lbls(lctn_lvls='country')
    • Wikidata QIDs can also be passed directly
  • depth: the geographic level of the given locations to query
    • A depth of 0 is the locations themselves
    • Greater depths correspond to lower geographic levels (states of countries, etc.)
    • A dictionary of locations is generated for lower depths (see the second example below)
  • timespan: start and end datetime.date objects defining when the data should come from
    • If not provided, the most recent data will be retrieved, annotated with when it is from
  • interval: "yearly", "monthly", "weekly", or "daily" as strings
  • Further arguments: the names of modules in the wikirepo/data directories
    • These are passed to the arguments corresponding to their directories
    • Data will be queried for these properties for the given locations, depth, timespan, and interval, with the results merged as dataframe columns

Queries can also access information on Wikidata sub-pages for locations. For example, if the inflation rate is not found on a location's main page, then wikirepo checks the location's economic topic page, as inflation_rate.py is found in wikirepo/data/economic (see Germany and economy of Germany).
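
This fallback can be pictured roughly as follows. The sketch is hypothetical (the helper name and arguments are illustrative, not wikirepo's actual implementation), and ents_dict here stands in for a plain mapping from QIDs to Wikidata's standard entity JSON:

def claims_with_topic_fallback(ents_dict, qid, pid, topic_qid=None):
    """Return the claims for pid on the entity qid, falling back to the
    entity's topic page (e.g. economy of Germany) if none are found."""
    claims = ents_dict[qid]["claims"].get(pid, [])
    if not claims and topic_qid is not None:
        claims = ents_dict[topic_qid]["claims"].get(pid, [])
    return claims

# e.g. inflation rate (P1279) for Germany (Q183), with the QID of
# "economy of Germany" resolved beforehand:
# claims_with_topic_fallback(ents_dict, "Q183", "P1279", topic_qid=economy_qid)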

wikirepo further provides a unique dictionary class, EntitiesDict, that stores all Wikidata entities loaded during a query. This speeds up data retrieval, as each entity is loaded once and then accessed from the EntitiesDict object for any other needed properties.
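
The caching idea can be sketched in a few lines (illustrative only, assuming the wikidata package's Client; the actual EntitiesDict implementation may differ):

from wikidata.client import Client

class CachingEntitiesDict(dict):
    """Sketch of an entity cache: each QID is fetched once, then served
    from the dictionary for every later property lookup."""

    def __init__(self):
        super().__init__()
        self._client = Client()

    def load(self, qid):
        if qid not in self:  # only hit the network on the first request
            self[qid] = self._client.get(qid, load=True)
        return self[qid]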

Examples of wikirepo.data.query follow:

Querying Information for Given Countries

import wikirepo
from wikirepo.data import wd_utils
from datetime import date

ents_dict = wd_utils.EntitiesDict()
# Strings must match their Wikidata English page names
countries = ["Germany", "United States of America", "People's Republic of China"]
# countries = ["Q183", "Q30", "Q148"] # we could also pass QIDs
# data.incl_lctn_lbls(lctn_lvls='country') # or all countries
depth = 0
timespan = (date(2009, 1, 1), date(2010, 1, 1))
interval = "yearly"

df = wikirepo.data.query(
    ents_dict=ents_dict,
    locations=countries,
    depth=depth,
    timespan=timespan,
    interval=interval,
    climate_props=None,
    demographic_props=["population", "life_expectancy"],
    economic_props="median_income",
    electoral_poll_props=None,
    electoral_result_props=None,
    geographic_props=None,
    institutional_props="human_dev_idx",
    political_props="executive",
    misc_props=None,
    verbose=True,
)

col_order = [
    "location",
    "qid",
    "year",
    "executive",
    "population",
    "life_exp",
    "human_dev_idx",
    "median_income",
]
df = df[col_order]

df.head(6)
| location                   | qid  | year | executive      | population  | life_exp | human_dev_idx | median_income |
|----------------------------|------|------|----------------|-------------|----------|---------------|---------------|
| Germany                    | Q183 | 2010 | Angela Merkel  | 8.1752e+07  | 79.9878  | 0.921         | 33333         |
| Germany                    | Q183 | 2009 | Angela Merkel  | nan         | 79.8366  | 0.917         | nan           |
| United States of America   | Q30  | 2010 | Barack Obama   | 3.08746e+08 | 78.5415  | 0.914         | 43585         |
| United States of America   | Q30  | 2009 | George W. Bush | nan         | 78.3902  | 0.91          | nan           |
| People's Republic of China | Q148 | 2010 | Wen Jiabao     | 1.35976e+09 | 75.236   | 0.706         | nan           |
| People's Republic of China | Q148 | 2009 | Wen Jiabao     | nan         | 75.032   | 0.694         | nan           |

Querying Information for all US Counties

# Note: >3000 regions, expect a 45 minute runtime
import wikirepo
from wikirepo.data import lctn_utils, wd_utils
from datetime import date

ents_dict = wd_utils.EntitiesDict()
country = "United States of America"
# country = "Q30" # we could also pass its QID
depth = 2  # 2 for counties, 1 for states and territories
sub_lctns = True  # for all
# Only valid sub-locations given the timespan will be queried
timespan = (date(2016, 1, 1), date(2018, 1, 1))
interval = "yearly"

us_counties_dict = lctn_utils.gen_lctns_dict(
    ents_dict=ents_dict,
    locations=country,
    depth=depth,
    sub_lctns=sub_lctns,
    timespan=timespan,
    interval=interval,
    verbose=True,
)

df = wikirepo.data.query(
    ents_dict=ents_dict,
    locations=us_counties_dict,
    depth=depth,
    timespan=timespan,
    interval=interval,
    climate_props=None,
    demographic_props="population",
    economic_props=None,
    electoral_poll_props=None,
    electoral_result_props=None,
    geographic_props="area",
    institutional_props="capital",
    political_props=None,
    misc_props=None,
    verbose=True,
)

df[df["population"].notnull()].head(6)
| location                 | sub_lctn   | sub_sub_lctn        | qid     | year | population  | area_km2 | capital      |
|--------------------------|------------|---------------------|---------|------|-------------|----------|--------------|
| United States of America | California | Alameda County      | Q107146 | 2018 | 1.6602e+06  | 2127     | Oakland      |
| United States of America | California | Contra Costa County | Q108058 | 2018 | 1.14936e+06 | 2078     | Martinez     |
| United States of America | California | Marin County        | Q108117 | 2018 | 263886      | 2145     | San Rafael   |
| United States of America | California | Napa County         | Q108137 | 2018 | 141294      | 2042     | Napa         |
| United States of America | California | San Mateo County    | Q108101 | 2018 | 774155      | 1919     | Redwood City |
| United States of America | California | Santa Clara County  | Q110739 | 2018 | 1.9566e+06  | 3377     | San Jose     |
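
For reference, the generated locations dictionary is roughly a nested mapping from QIDs to labels and sub-locations. The keys below are illustrative rather than lctn_utils' exact schema; the QIDs come from the output above:

us_counties_dict_shape = {
    "Q30": {
        "lbl": "United States of America",
        "sub_lctns": {
            "Q99": {
                "lbl": "California",
                "sub_lctns": {
                    "Q107146": {"lbl": "Alameda County"},
                    # ... one entry per county
                },
            },
            # ... one entry per state or territory
        },
    }
}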

Upload Data (WIP)

wikirepo.data.upload will be the core of wikirepo's eventual upload feature. The goal is to record the edits a user makes to a previously queried or baseline dataframe so that these changes can be pushed back to Wikidata. With the addition of Wikidata login credentials as a wikirepo feature (WIP), the unique information in the edited dataframe could then be uploaded to Wikidata for all to use.

The same process used to query information from Wikidata could be reversed for uploads. Dataframe columns could be linked to their corresponding Wikidata properties; whether a value's time qualifier is a point in time or a span using start time and end time could be derived from the variables defined in the module header; and other qualifiers needed for proper data indexing could also be included. Source information could further be added in columns corresponding to the given property edits.
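
As an illustration, a property module's header might carry the needed linking information as below. The variable names are hypothetical, though P1082 (population), P585 (point in time), P580 (start time), and P582 (end time) are Wikidata's actual properties:

# Hypothetical header of a module such as data/demographic/population.py
pid = "P1082"            # the Wikidata property the module queries
col_name = "population"  # the dataframe column its values are written to
span = False             # False: values qualified with point in time (P585)
                         # True: values qualified with start (P580) and end time (P582)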

Pseudocode for how this process could function follows:

In the first example, changes are made to a df.copy() of a queried dataframe. pandas is then used to compare the new and original dataframes after the user has added information that they have access to.

import pandas as pd

import wikirepo
from wikirepo.data import lctn_utils, wd_utils
from datetime import date

credentials = wd_utils.login()  # WIP: Wikidata login is not yet implemented

ents_dict = wd_utils.EntitiesDict()
country = "Country Name"
depth = 2
sub_lctns = True
timespan = (date(2000, 1, 1), date(2018, 1, 1))
interval = "yearly"

lctns_dict = lctn_utils.gen_lctns_dict(
    ents_dict=ents_dict,
    locations=country,
    depth=depth,
    sub_lctns=sub_lctns,
    timespan=timespan,
    interval=interval,
)

df = wikirepo.data.query(
    ents_dict=ents_dict,
    locations=lctns_dict,
    depth=depth,
    timespan=timespan,
    interval=interval,
    # plus property arguments as in the query examples above
)
df_copy = df.copy()

# The user checks df_copy for NaNs and adds data they have access to

# Rows appearing in only one of the two dataframes are the user's edits
df_edits = pd.concat([df, df_copy]).drop_duplicates(keep=False)

wikirepo.data.upload(df_edits, credentials)  # WIP: upload is not yet implemented

In the next example, data.data_utils.gen_base_df is used to create a dataframe whose dimensions match a time series the user has access to. The data is then added to the column corresponding to the property it should be written to. Source information could further be added via a structured dictionary generated for the user.

import wikirepo
from wikirepo.data import data_utils, wd_utils
from datetime import date

credentials = wd_utils.login()  # WIP: Wikidata login is not yet implemented

locations = "Country Name"
depth = 0
# The user defines the time parameters based on their data
timespan = (date(1995, 1, 2), date(2010, 1, 2))  # (first Monday, last Sunday)
interval = "weekly"

# A dataframe with one row per location and interval, but no property data
base_df = data_utils.gen_base_df(
    locations=locations, depth=depth, timespan=timespan, interval=interval
)
base_df["data"] = data_for_matching_time_series  # the user's own time series

# A structured source dictionary generated for the user
source_data = wd_utils.gen_source_dict("Source Information")
base_df["data_source"] = [source_data] * len(base_df)

wikirepo.data.upload(base_df, credentials)  # WIP: upload is not yet implemented

Put simply: a full-featured wikirepo.data.upload function would realize the potential of a single read-write repository for all public information.

Maps (WIP)

wikirepo/maps is a further goal of the project, as it would combine wikirepo's focus on easily accessed open-source data with quick, high-level analytics.

Query Maps

As in wikirepo.data.query, passing the locations, depth, timespan, and interval arguments could access GeoJSON files stored on Wikidata, providing mapping files in parallel to the user's data. These files could then be leveraged with existing Python plotting libraries for detailed presentations of geographic analysis.
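
A sketch of how such files might be leveraged once available (wikirepo.maps.query does not yet exist; the geopandas usage, filename, and shared qid column are assumptions):

import geopandas as gpd

def plot_choropleth(geojson_path, df, prop):
    """Join a GeoJSON from a hypothetical maps query to a wikirepo
    dataframe on a shared qid column and plot one queried property."""
    gdf = gpd.read_file(geojson_path).merge(df, on="qid")
    gdf.plot(column=prop, legend=True)

# e.g. plot_choropleth("german_states.geojson", df, "population")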

Upload Maps

Similar to the potential of adding statistics through wikirepo.data.upload, GeoJSON map files could also be uploaded to Wikidata using appropriate arguments. Given locations, depth, timespan, and interval information, a myriad of map variants could be generated, allowing wikirepo users to get the exact mapping file they need for a given task.

Examples

wikirepo can be used as a foundation for countless projects, with its usefulness and practicality only improving as more properties are added and more data is uploaded to Wikidata.

Current usage examples include:

  • Sample notebooks for the Python package poli-sci-kit show how to use wikirepo as a basis for political election and parliamentary appointment analysis; the notebooks can be found in the examples for poli-sci-kit or on Google Colab
  • Pull requests with other examples will gladly be accepted

To-Do

Please see the contribution guidelines if you are interested in contributing to this project. Work that is in progress or could be implemented includes:

Expanding wikirepo

  • Creating an outline of the package's structure for the readme (see issue)

  • Integrating current Python tools with wikirepo structures for uploads to Wikidata

  • Adding a query of property descriptions to data.data_utils.incl_dir_idxs (see issue)

  • Adding multiprocessing support to the wikirepo.data.query process and data.lctn_utils.gen_lctns_dict

  • Potentially converting wikirepo.data.query and data.lctn_utils.gen_lctns_dict over to generated Wikidata SPARQL queries

  • Optimizing wikirepo.data.query:

    • Potentially converting EntitiesDict and LocationsDict to slotted object classes for memory savings
    • Identifying and optimizing other slow parts of the query process
  • Adding access to GeoJSON files for mapping via wikirepo.maps.query

  • Designing and adding GeoJSON files indexed by time properties to Wikidata

  • Creating, improving and sharing examples

  • Improving tests for greater code coverage

  • Improving code quality by refactoring large functions and checking conventions

Expanding Wikidata

The growth of wikirepo's database relies on that of Wikidata. Through data.wd_utils.dir_to_topic_page, wikirepo can access properties on location sub-pages, allowing statistics on any topic to be linked to a location. Beyond including entries for already existing properties (see this issue), the following are examples of property types that could be added:

  • Climate statistics could be added to data/climate

    • This would allow for easy modeling of global warming and its effects
    • Planning would be needed to decide whether lower intervals are necessary or whether daily averages suffice
  • Properties for electoral polling and results for locations

    • This would allow direct access to all needed election information in a single function call
  • A property that links political parties and their regions in data/political

    • For easy professional presentation of electoral results (e.g. loading in party hex colors, abbreviations, and alignments)
  • data/demographic properties such as:

    • age, education, religious, and linguistic diversities across time
  • data/economic properties such as:

    • female workforce participation, workforce industry diversity, wealth diversity, and total working age population across time
  • Distinct properties for Freedom House and Press Freedom indexes, as well as other descriptive metrics

Similar Projects

Python

JavaScript

Java

Powered By


Wikimedia · Wikibase · Wikidata