Extract city and country mentions from text like GeoText, but without regex, using FlashText, an Aho-Corasick implementation.

Last update: Dec 16, 2022

Overview

flashgeotext ⚡ 🌍

Extract and count countries and cities (+ their synonyms) from text, like GeoText on steroids, using FlashText, an Aho-Corasick implementation. Flashgeotext is a fast, batteries-included (and BYOD), native Python library that extracts one or more sets of given city and country names (+ synonyms) from an input text.

documentation: https://flashgeotext.iwpnd.pw/
introductory blogpost: https://iwpnd.pw/articles/2020-02/flashgeotext-library
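
To make the idea of regex-free, synonym-aware lookup concrete, here is a minimal pure-Python sketch of trie-based keyword matching. It is not FlashText's or flashgeotext's actual code: the names `build_trie`, `extract`, and `is_word_char` are made up for illustration, matching is case-sensitive, and the scan restarts at every word boundary instead of using the single-pass automaton a real implementation would use.

```python
# Illustrative trie-based keyword matcher (an assumption-laden sketch,
# not the code used by FlashText or flashgeotext).
END = "_end_"  # multi-character key, so it can never clash with a single character


def is_word_char(char):
    """Characters that count as part of a word for boundary checks."""
    return char.isalnum() or char == "_"


def build_trie(lookup):
    """Build a character trie from {canonical_name: [synonym, ...]}."""
    trie = {}
    for canonical, synonyms in lookup.items():
        for synonym in synonyms:
            node = trie
            for char in synonym:
                node = node.setdefault(char, {})
            node[END] = canonical  # a complete keyword ends at this node
    return trie


def extract(text, trie):
    """Return {canonical_name: {'count', 'span_info', 'found_as'}} for text."""
    results = {}
    i, n = 0, len(text)
    while i < n:
        # Only start a match at the beginning of a word.
        if i > 0 and is_word_char(text[i - 1]):
            i += 1
            continue
        node, j, match = trie, i, None
        while j < n and text[j] in node:
            node = node[text[j]]
            j += 1
            # Accept only if the keyword also ends at a word boundary;
            # later (longer) matches overwrite earlier ones.
            if END in node and (j == n or not is_word_char(text[j])):
                match = (node[END], j)
        if match is None:
            i += 1
            continue
        canonical, end = match
        entry = results.setdefault(
            canonical, {"count": 0, "span_info": [], "found_as": []}
        )
        entry["count"] += 1
        entry["span_info"].append((i, end))
        entry["found_as"].append(text[i:end])
        i = end  # continue scanning after the matched keyword
    return results


lookup = {"United States": ["US", "USA"], "Washington, D.C.": ["Washington"]}
print(extract("The USA and the US.", build_trie(lookup)))
```

Because every synonym stores its canonical name at the end node, "US" and "USA" are both counted under "United States" while `found_as` keeps the spelling that actually appeared in the text.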

Usage

from flashgeotext.geotext import GeoText

geotext = GeoText()

input_text = '''Shanghai. The Chinese Ministry of Finance in Shanghai said that China plans
                to cut tariffs on $75 billion worth of goods that the country
                imports from the US. Washington welcomes the decision.'''

geotext.extract(input_text=input_text)
>> {
    'cities': {
        'Shanghai': {
            'count': 2,
            'span_info': [(0, 8), (45, 53)],
            'found_as': ['Shanghai', 'Shanghai'],
            },
        'Washington, D.C.': {
            'count': 1,
            'span_info': [(175, 185)],
            'found_as': ['Washington'],
            }
        },
    'countries': {
        'China': {
            'count': 1,
            'span_info': [(64, 69)],
            'found_as': ['China'],
            },
        'United States': {
            'count': 1,
            'span_info': [(171, 173)],
            'found_as': ['US'],
            }
        }
    }
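
A result in this shape is easy to post-process. The sketch below flattens it into one row per mention, ordered by position in the text. The helper name `flatten_matches` is hypothetical, and the sample `result` is hand-written in the same shape as the output above for a shorter text; it was not produced by running the library.

```python
def flatten_matches(result, text):
    """Turn an extract()-style dict into (category, name, start, end, surface) rows."""
    rows = []
    for category, entries in result.items():
        for name, info in entries.items():
            for start, end in info.get("span_info", []):
                # Slice the original text to recover what was actually written.
                rows.append((category, name, start, end, text[start:end]))
    # Sort by start offset so rows follow reading order.
    return sorted(rows, key=lambda row: row[2])


# Hand-written sample in the shape shown above (assumption, not library output).
text = "Berlin is in Germany."
result = {
    "cities": {
        "Berlin": {"count": 1, "span_info": [(0, 6)], "found_as": ["Berlin"]},
    },
    "countries": {
        "Germany": {"count": 1, "span_info": [(13, 20)], "found_as": ["Germany"]},
    },
}

for row in flatten_matches(result, text):
    print(row)
```

Each span is a half-open `(start, end)` pair, so `text[start:end]` should reproduce the corresponding `found_as` entry.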

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Installing

pip:

pip install flashgeotext

conda:

conda install flashgeotext

for development:

git clone https://github.com/iwpnd/flashgeotext.git
cd flashgeotext/
poetry install

Running the tests

poetry run pytest . -v

Authors

Benjamin Ramser - Initial work - iwpnd

See also the list of contributors who participated in this project.

License

This project is licensed under the MIT License. See the LICENSE.md file for details.

Demo Data cities from http://www.geonames.org licensed under the Creative Commons Attribution 3.0 License.

Acknowledgments

Hat tip to @vi3k6i5 for his paper and implementation

Comments

Cannot install 0.3.0 or 0.3.1

Hi:

I tried to install flashgeotext 0.3.1 via pip or pipenv, but I get the following error:

COMMAND: pipenv install flashgeotext~=0.3.0

ERROR:
ERROR: Could not find a version that satisfies the requirement flashgeotext~=0.3.0
ERROR: No matching distribution found for flashgeotext~=0.3.0

How can I solve it?

Thanks in advance. Best.

opened by LuciaTajuelo 6
Publishing package on conda-forge

Hi @iwpnd, this is a great package! Unfortunately, it is not available on conda-forge, while flashtext is. I'm building a library around it and I can only fetch packages from conda-forge, so I'm wondering whether you would accept a PR to publish it on conda-forge eventually. For now the workaround is to embed all of flashgeotext statically as a module, but I would love to declare it in the dependencies.

opened by francbartoli 6
Debugging seems to cause some fatal errors

Hello, your method helps me a lot. I just wanted to debug at some point, but much to my surprise, it shows some fatal errors once I start debugging. How can we solve this problem?

opened by BriskyGates 5
Missing cities and countries
Hello,

I have started trying out this library, but it seems to be missing cities and countries mostly from South America. What's the best way to update the cities.json and countries.json files? Is it ok just to add the data in there manually?

Also, how can this library map Shanghai to China? Where is that relation mapped? Why does it not behave the same for Caracas?

>>> geotext.extract(input_text="Living in Caracas", span_info=True)
{'cities': {'Caracas': {'count': 1, 'span_info': [(10, 17)]}}, 'countries': {}}

Thanks in advance!
enhancement
opened by pythobot 5
chore(deps-dev): bump awscli from 1.25.67 to 1.25.68
Bumps awscli from 1.25.67 to 1.25.68.

Changelog

Sourced from awscli's changelog.

1.25.68

api-change:identitystore: Documentation updates for the Identity Store CLI Reference.

api-change:sagemaker: This release adds HyperParameterTuningJob type in Search API.

Commits

5df5380 Merge branch 'release-1.25.68'

5a9ece8 Bumping version to 1.25.68

2d83b15 Update changelog based on model updates

aa1b1b1 Merge branch 'release-1.25.67' into develop

See full diff in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

dependencies
opened by dependabot[bot] 3
TypeError: slice indices must be integers or None or have an __index__ method when span_info is False
As per the docs at https://flashgeotext.iwpnd.pw/geotext/, the extract method of the GeoText class has an optional argument span_info : bool - return span_info. Defaults to True. However, when span_info is passed as False, GeoText fails to parse the text and throws a slice index error.

Below are the steps to reproduce the error:

from flashgeotext.geotext import GeoText

geotext2 = GeoText()
input_text = '''Shanghai. The Chinese Ministry of Finance in Shanghai said that China plans to cut tariffs on $75 billion worth of goods that the country imports from the US. Washington welcomes the decision.'''
geotext2.extract(input_text=input_text, span_info=False)

Output:

"found_as": [input_text[span_start:span_end]],
TypeError: slice indices must be integers or None or have an __index__ method
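
The message itself is what Python raises whenever a string is sliced with something that is not an integer. The sketch below reproduces it in isolation and shows one defensive pattern. It is only an assumption that the library ends up with non-integer values in `span_start` and `span_end` when `span_info=False`; the helper names `found_as` and `safe_found_as` are hypothetical and not part of flashgeotext.

```python
def found_as(text, span_start, span_end):
    """Mirror of the slicing expression from the traceback."""
    return text[span_start:span_end]


# Slicing with integers works as expected.
print(found_as("Shanghai said", 0, 8))

# Slicing with non-integers (here: single characters) raises the reported error.
try:
    found_as("Shanghai said", "h", "a")
except TypeError as exc:
    message = str(exc)
    print(message)


def safe_found_as(text, span_start, span_end):
    """Return the matched text, or None when no integer span is available."""
    if isinstance(span_start, int) and isinstance(span_end, int):
        return text[span_start:span_end]
    return None
```

A guard like `safe_found_as` only hides the symptom; the release notes for v0.4.1 further down mention a fix for parsing the extract when span_info is false.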
bug
opened by shreyjakhmolacactus 3
Initial impressions and questions of flashgeotext for extracting countries from affiliations
Thanks for posting at https://github.com/elyase/geotext/issues/23#issuecomment-593490351 letting me know about this package. I'm interested in it as a way to extract countries referred to in author affiliations.

For example, here is an affiliation:

'Multimodal Computing and Interaction', Saarland University & Department for Computational Biology and Applied Computing, Max Planck Institute for Informatics, Saarbrücken, 66123 Saarland, Germany, Ray and Stephanie Lane Center for Computational Biology, Carnegie Mellon University, Pittsburgh, 15206 PA, USA, Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany, Université Pierre et Marie Curie, UMR7238, CNRS-UPMC, Paris, France and CNRS, UMR7238, Laboratory of Computational and Quantitative Biology, Paris, France.

For my project, I'd like to know what countries are mentioned (either directly or inferred from a place mention inside that country).

If I run the following (with v0.2.0):

import flashgeotext.geotext

geotexter = flashgeotext.geotext.GeoText(use_demo_data=True)
affil = """\
'Multimodal Computing and Interaction', Saarland University & Department for Computational Biology and Applied Computing, Max Planck Institute for Informatics, Saarbrücken, 66123 Saarland, Germany, \
Ray and Stephanie Lane Center for Computational Biology, Carnegie Mellon University, Pittsburgh, 15206 PA, USA, \
Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany, Université Pierre et Marie Curie, UMR7238, CNRS-UPMC, Paris, France and CNRS, UMR7238, Laboratory of Computational and Quantitative Biology, Paris, France.
"""
geo_text = geotexter.extract(affil, span_info=False)
geo_text

I get the following output:

2020-03-02 18:50:46.475 | DEBUG | flashgeotext.lookup:add:194 - cities added to pool
2020-03-02 18:50:46.479 | DEBUG | flashgeotext.lookup:add:194 - countries added to pool
2020-03-02 18:50:46.480 | DEBUG | flashgeotext.lookup:_add_demo_data:225 - demo data loaded for: ['cities', 'countries']
{'cities': {'University': {'count': 2},
            'Saarbrücken': {'count': 1},
            'Carnegie': {'count': 1},
            'Pittsburgh': {'count': 1},
            'Berlin': {'count': 2},
            'Parys': {'count': 2}},
 'countries': {'Germany': {'count': 2},
               'United States': {'count': 1},
               'France': {'count': 2}}}

Some impressions / questions?

Are the city mentions counting towards country mentions? If yes, why does "United States" not have a count of 2 for "Pittsburgh" and "USA"?

Is "Parys" for "Paris"... not sure why this conversion is made.

Counting "University" as a city will almost always be a false positive for us, although I'm guessing this is a source data issue.

Thanks for considering this feedback / helping answer any of these questions.
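
In the output above, cities and countries are counted separately: "United States" has a count of 1 even though "Pittsburgh" was also found. One way to get country totals that include inferred mentions is to post-process the result with a city-to-country table. The sketch below assumes such a table exists; `CITY_TO_COUNTRY` and `countries_with_inferred` are hypothetical names, and the mapping is hand-written for this example, not shipped with the library.

```python
from collections import Counter

# Hypothetical, hand-written mapping; not part of flashgeotext's demo data.
CITY_TO_COUNTRY = {
    "Pittsburgh": "United States",
    "Berlin": "Germany",
    "Saarbrücken": "Germany",
}


def countries_with_inferred(result, city_to_country):
    """Add city counts to the count of the country each city maps to."""
    totals = Counter(
        {name: info["count"] for name, info in result.get("countries", {}).items()}
    )
    for city, info in result.get("cities", {}).items():
        country = city_to_country.get(city)
        if country is not None:  # unmapped cities (e.g. false positives) are skipped
            totals[country] += info["count"]
    return dict(totals)


# The result reported above (span_info=False), copied as a literal.
result = {
    "cities": {
        "University": {"count": 2},
        "Saarbrücken": {"count": 1},
        "Carnegie": {"count": 1},
        "Pittsburgh": {"count": 1},
        "Berlin": {"count": 2},
        "Parys": {"count": 2},
    },
    "countries": {
        "Germany": {"count": 2},
        "United States": {"count": 1},
        "France": {"count": 2},
    },
}

print(countries_with_inferred(result, CITY_TO_COUNTRY))
```

Leaving "University", "Carnegie", and "Parys" out of the mapping also keeps the likely false positives from inflating any country's total.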
bug question
opened by dhimmel 3
chore(deps-dev): bump markdown-include from 0.7.0 to 0.8.0
Bumps markdown-include from 0.7.0 to 0.8.0.

Release notes

Sourced from markdown-include's releases.

v0.8.0

What's Changed

multiple templates on one line by @Umaaz in cmacmackin/markdown-include#27

✨ Add support for specifying lines and line ranges by @tiangolo in cmacmackin/markdown-include#31

New Contributors

@Umaaz made their first contribution in cmacmackin/markdown-include#27

@tiangolo made their first contribution in cmacmackin/markdown-include#31

Full Changelog: https://github.com/cmacmackin/markdown-include/compare/v0.7.2...v0.8.0

v0.7.2

Project CI fix only

Full Changelog: https://github.com/cmacmackin/markdown-include/compare/v0.7.1...v0.7.2

v0.7.1

What's Changed

Automate publishing by @ZedThree in cmacmackin/markdown-include#37

New Contributors

@ZedThree made their first contribution in cmacmackin/markdown-include#37

Full Changelog: https://github.com/cmacmackin/markdown-include/compare/v0.7.0...v0.7.1

Commits

fd3c00a Merge pull request #40 from cmacmackin/apply-black

490d3cd CI: Automate black

7bf110f Add git blame ignore file

b7b87a7 Apply black formatting

993e858 Merge pull request #39 from cmacmackin/lines-and-line-ranges

f5e754d Add tests for including lines

03c4ea0 Merge branch 'master' into lines-and-line-ranges

692e499 Merge pull request #38 from cmacmackin/unittests-ci

40109fb Delete outdated README.rst file

6ae487c CI: Run tests

Additional commits viewable in compare view


dependencies python
opened by dependabot[bot] 2
chore(deps-dev): bump awscli from 1.27.13 to 1.27.17
Bumps awscli from 1.27.13 to 1.27.17.

Changelog

Sourced from awscli's changelog.

1.27.17

api-change:backup: AWS Backup introduces support for legal hold and application stack backups. AWS Backup Audit Manager introduces support for cross-Region, cross-account reports.

api-change:cloudwatch: Update cloudwatch command to latest version

api-change:drs: Non breaking changes to existing APIs, and additional APIs added to support in-AWS failing back using AWS Elastic Disaster Recovery.

api-change:ecs: This release adds support for ECS Service Connect, a new capability that simplifies writing and operating resilient distributed applications. This release updates the TaskDefinition, Cluster, Service mutation APIs with Service connect constructs and also adds a new ListServicesByNamespace API.

api-change:efs: Update efs command to latest version

api-change:iot-data: This release adds support for MQTT5 properties to AWS IoT HTTP Publish API.

api-change:iot: Job scheduling enables the scheduled rollout of a Job with start and end times and a customizable end behavior when end time is reached. This is available for continuous and snapshot jobs. Added support for MQTT5 properties to AWS IoT TopicRule Republish Action.

api-change:iotwireless: This release includes a new feature for customers to calculate the position of their devices by adding three new APIs: UpdateResourcePosition, GetResourcePosition, and GetPositionEstimate.

api-change:kendra: Amazon Kendra now supports preview of table information from HTML tables in the search results. The most relevant cells with their corresponding rows, columns are displayed as a preview in the search result. The most relevant table cell or cells are also highlighted in table preview.

api-change:logs: Updates to support CloudWatch Logs data protection and CloudWatch cross-account observability

api-change:mgn: This release adds support for Application and Wave management. We also now support custom post-launch actions.

api-change:oam: Amazon CloudWatch Observability Access Manager is a new service that allows configuration of the CloudWatch cross-account observability feature.

api-change:organizations: This release introduces delegated administrator for AWS Organizations, a new feature to help you delegate the management of your Organizations policies, enabling you to govern your AWS organization in a decentralized way. You can now allow member accounts to manage Organizations policies.

api-change:rds: This release enables new Aurora and RDS feature called Blue/Green Deployments that makes updates to databases safer, simpler and faster.

api-change:textract: This release adds support for classifying and splitting lending documents by type, and extracting information by using the Analyze Lending APIs. This release also includes support for summarized information of the processed lending document package, in addition to per document results.

api-change:transcribe: This release adds support for 'inputType' for post-call and real-time (streaming) Call Analytics within Amazon Transcribe.

1.27.16

api-change:grafana: This release includes support for configuring a Grafana workspace to connect to a datasource within a VPC as well as new APIs for configuring Grafana settings.

api-change:rbin: This release adds support for Rule Lock for Recycle Bin, which allows you to lock retention rules so that they can no longer be modified or deleted.

1.27.15

api-change:appflow: Adding support for Amazon AppFlow to transfer the data to Amazon Redshift databases through Amazon Redshift Data API service. This feature will support the Redshift destination connector on both public and private accessible Amazon Redshift Clusters and Amazon Redshift Serverless.

api-change:kinesisanalyticsv2: Support for Apache Flink 1.15 in Kinesis Data Analytics.

1.27.14

api-change:route53: Amazon Route 53 now supports the Asia Pacific (Hyderabad) Region (ap-south-2) for latency records, geoproximity records, and private DNS for Amazon VPCs in that region.

Commits

4c3c2f3 Merge branch 'release-1.27.17'

068265f Bumping version to 1.27.17

bb73db9 Update changelog based on model updates

99f1f30 Merge branch 'release-1.27.16'

11eab4e Merge branch 'release-1.27.16' into develop

e6b2e68 Bumping version to 1.27.16

0d18491 Update changelog based on model updates

1b0005a Merge branch 'release-1.27.15' into develop

140ef2e Merge branch 'release-1.27.15'

f963121 Bumping version to 1.27.15

Additional commits viewable in compare view


dependencies python
opened by dependabot[bot] 2
chore(deps-dev): bump poethepoet from 0.16.4 to 0.16.5
Bumps poethepoet from 0.16.4 to 0.16.5.

Release notes

Sourced from poethepoet's releases.

v0.16.5

Fixes

Restore changes from v0.16.1 that were reverted in v0.16.2

Fix various issues on windows #106

docs: use poe --group dev instead of poe --dev by @ubmit in nat-n/poethepoet#98

Only use tomli in python<3.11 by @KotlinIsland in nat-n/poethepoet#100

Add python 3.11 to the CI and update 'dev-dependencies' to 'group.dev.dependencies' by @KotlinIsland in nat-n/poethepoet#101

Format code with isort by @KotlinIsland in nat-n/poethepoet#102

Fix typo in --help output by @howeaj in nat-n/poethepoet#105

New Contributors

@ubmit made their first contribution in nat-n/poethepoet#98

@KotlinIsland made their first contribution in nat-n/poethepoet#100

@howeaj made their first contribution in nat-n/poethepoet#105

Full Changelog: https://github.com/nat-n/poethepoet/compare/v0.16.4...v0.16.5

Commits

d266433 Bump version to 0.16.5

15edd33 Fix issues on windows and restore changes from 0.16.1 (#106)

e41072c Fix typo in --help output (#105)

1a6973a Add isort (#102)

f730b26 Add python 3.11 to the CI and update 'dev-dependencies' to 'group.dev.depende...

53083b5 Only use tomli in python<3.11 (#100)

6924b81 docs: use poetry add --group dev instead of poetry add --dev (#98)

See full diff in compare view


dependencies python
opened by dependabot[bot] 2
chore(deps): bump codecov/codecov-action from 3.1.0 to 3.1.1
Bumps codecov/codecov-action from 3.1.0 to 3.1.1.

Release notes

Sourced from codecov/codecov-action's releases.

3.1.1

What's Changed

Update deprecation warning by @slifty in codecov/codecov-action#661

Create codeql-analysis.yml by @mitchell-codecov in codecov/codecov-action#593

build(deps): bump node-fetch from 3.2.3 to 3.2.4 by @dependabot in codecov/codecov-action#714

build(deps-dev): bump typescript from 4.6.3 to 4.6.4 by @dependabot in codecov/codecov-action#713

README: fix typo by @Evalir in codecov/codecov-action#712

build(deps): bump github/codeql-action from 1 to 2 by @dependabot in codecov/codecov-action#724

build(deps-dev): bump @types/jest from 27.4.1 to 27.5.0 by @dependabot in codecov/codecov-action#717

fix: Remove a blank row by @johnmanjiro13 in codecov/codecov-action#725

Update README.md with correct badge version by @gsheni in codecov/codecov-action#726

build(deps-dev): bump @types/node from 17.0.25 to 17.0.33 by @dependabot in codecov/codecov-action#729

build(deps-dev): downgrade @types/node to 16.11.35 by @dependabot in codecov/codecov-action#734

build(deps): bump actions/checkout from 2 to 3 by @dependabot in codecov/codecov-action#723

build(deps): bump @actions/github from 5.0.1 to 5.0.3 by @dependabot in codecov/codecov-action#733

build(deps): bump @actions/core from 1.6.0 to 1.8.2 by @dependabot in codecov/codecov-action#732

build(deps-dev): bump @types/node from 16.11.35 to 16.11.36 by @dependabot in codecov/codecov-action#737

Create scorecards-analysis.yml by @mitchell-codecov in codecov/codecov-action#633

build(deps): bump ossf/scorecard-action from 1.0.1 to 1.1.0 by @dependabot in codecov/codecov-action#749

fix: add more verbosity to validation by @thomasrockhu-codecov in codecov/codecov-action#747

build(deps-dev): bump typescript from 4.6.4 to 4.7.3 by @dependabot in codecov/codecov-action#755

Regenerate scorecards-analysis.yml by @mitchell-codecov in codecov/codecov-action#750

build(deps-dev): bump @types/node from 16.11.36 to 16.11.39 by @dependabot in codecov/codecov-action#759

build(deps-dev): bump @types/node from 16.11.39 to 16.11.40 by @dependabot in codecov/codecov-action#762

build(deps-dev): bump @vercel/ncc from 0.33.4 to 0.34.0 by @dependabot in codecov/codecov-action#746

build(deps): bump ossf/scorecard-action from 1.1.0 to 1.1.1 by @dependabot in codecov/codecov-action#757

build(deps): bump openpgp from 5.2.1 to 5.3.0 by @dependabot in codecov/codecov-action#760

build(deps): bump actions/upload-artifact from 2.3.1 to 3.1.0 by @dependabot in codecov/codecov-action#748

build(deps-dev): bump typescript from 4.7.3 to 4.7.4 by @dependabot in codecov/codecov-action#766

Switch to v3 by @thomasrockhu in codecov/codecov-action#774

Fix network entry in table by @kevmoo in codecov/codecov-action#783

Trim arguments after splitting them by @mitchell-codecov in codecov/codecov-action#791

build(deps): bump openpgp from 5.3.0 to 5.4.0 by @dependabot in codecov/codecov-action#799

build(deps): bump @actions/core from 1.8.2 to 1.9.1 by @dependabot in codecov/codecov-action#798

Plumb failCi into verification function. by @RobbieMcKinstry in codecov/codecov-action#769

release: update changelog and version to 3.1.1 by @thomasrockhu-codecov in codecov/codecov-action#828

New Contributors

@slifty made their first contribution in codecov/codecov-action#661

@Evalir made their first contribution in codecov/codecov-action#712

@johnmanjiro13 made their first contribution in codecov/codecov-action#725

@gsheni made their first contribution in codecov/codecov-action#726

@kevmoo made their first contribution in codecov/codecov-action#783

@RobbieMcKinstry made their first contribution in codecov/codecov-action#769

Full Changelog: https://github.com/codecov/codecov-action/compare/v3.1.0...v3.1.1

Changelog

Sourced from codecov/codecov-action's changelog.

3.1.1

Fixes

#661 Update deprecation warning

#593 Create codeql-analysis.yml

#712 README: fix typo

#725 fix: Remove a blank row

#726 Update README.md with correct badge version

#633 Create scorecards-analysis.yml

#747 fix: add more verbosity to validation

#750 Regenerate scorecards-analysis.yml

#774 Switch to v3

#783 Fix network entry in table

#791 Trim arguments after splitting them

#769 Plumb failCi into verification function.

Dependencies

#713 build(deps-dev): bump typescript from 4.6.3 to 4.6.4

#714 build(deps): bump node-fetch from 3.2.3 to 3.2.4

#724 build(deps): bump github/codeql-action from 1 to 2

#717 build(deps-dev): bump @types/jest from 27.4.1 to 27.5.0

#729 build(deps-dev): bump @types/node from 17.0.25 to 17.0.33

#734 build(deps-dev): downgrade @types/node to 16.11.35

#723 build(deps): bump actions/checkout from 2 to 3

#733 build(deps): bump @actions/github from 5.0.1 to 5.0.3

#732 build(deps): bump @actions/core from 1.6.0 to 1.8.2

#737 build(deps-dev): bump @types/node from 16.11.35 to 16.11.36

#749 build(deps): bump ossf/scorecard-action from 1.0.1 to 1.1.0

#755 build(deps-dev): bump typescript from 4.6.4 to 4.7.3

#759 build(deps-dev): bump @types/node from 16.11.36 to 16.11.39

#762 build(deps-dev): bump @types/node from 16.11.39 to 16.11.40

#746 build(deps-dev): bump @vercel/ncc from 0.33.4 to 0.34.0

#757 build(deps): bump ossf/scorecard-action from 1.1.0 to 1.1.1

#760 build(deps): bump openpgp from 5.2.1 to 5.3.0

#748 build(deps): bump actions/upload-artifact from 2.3.1 to 3.1.0

#766 build(deps-dev): bump typescript from 4.7.3 to 4.7.4

#799 build(deps): bump openpgp from 5.3.0 to 5.4.0

#798 build(deps): bump @actions/core from 1.8.2 to 1.9.1

Commits

d9f34f8 release: update changelog and version to 3.1.1 (#828)

0e9e7b4 Plumb failCi into verification function. (#769)

7f20bd4 build(deps): bump @actions/core from 1.8.2 to 1.9.1 (#798)

13bc253 build(deps): bump openpgp from 5.3.0 to 5.4.0 (#799)

5c0da1b Trim arguments after splitting them (#791)

68d5f6d Fix network entry in table (#783)

2a829b9 Switch to v3 (#774)

8e09eaf build(deps-dev): bump typescript from 4.7.3 to 4.7.4 (#766)

39e2229 build(deps): bump actions/upload-artifact from 2.3.1 to 3.1.0 (#748)

b2b7703 build(deps): bump openpgp from 5.2.1 to 5.3.0 (#760)

Additional commits viewable in compare view


dependencies github_actions
opened by dependabot[bot] 2
chore(deps): bump actions/cache from 3.2.2 to 3.2.3
Bumps actions/cache from 3.2.2 to 3.2.3.

Release notes

Sourced from actions/cache's releases.

v3.2.3

What's Changed

Add Mint example by @uhooi in actions/cache#1051

Fixed broken link by @kotewar in actions/cache#1057

Add support to opt-in enable cross-os caching on windows by @Phantsure in actions/cache#1056

Release support for cross-os caching as opt-in feature by @Phantsure in actions/cache#1060

New Contributors

@uhooi made their first contribution in actions/cache#1051

Full Changelog: https://github.com/actions/cache/compare/v3...v3.2.3

Changelog

Sourced from actions/cache's changelog.

3.2.2

Reverted the changes made in 3.2.1 to use gnu tar and zstd by default on windows.

3.2.3

Support cross os caching on Windows as an opt-in feature.

Fix issue with symlink restoration on Windows for cross-os caches.

Commits

58c146c Release support for cross-os caching as opt-in feature (#1060)

6fd2d45 Add support to opt-in enable cross-os caching on windows (#1056)

1f41429 Fixed broken link (#1057)

365406c Merge pull request #1051 from uhooi/feature/add_mint_example

d621756 Update Mint example

84e5400 Merge remote-tracking branch 'origin/main' into feature/add_mint_example

See full diff in compare view


dependencies github_actions
opened by dependabot[bot] 0

Releases(v0.4.2)

v0.4.2(Feb 1, 2022)
just fixing conda dependencies

v0.4.1(Jan 31, 2022)
Fix

Parse extract when span_info is false (d497b57)


v0.4.0(Sep 13, 2021)

Summary

Removed the option to show the span_info; it is now always included in the output.
Additionally, .extract now returns the word (or synonym) under which each keyword was found in the input text.
Because the original configuration handling was a little clumsy, I updated how the configuration is passed when initializing GeoText.

Refactor

Add found synonym (fb06232)

Fix

Make configuration better (e57150f)

from flashgeotext.geotext import GeoText, GeoTextConfiguration

# Build the configuration explicitly and pass it to GeoText on init.
config = GeoTextConfiguration(**{"use_demo_data": True})
geotext = GeoText(config=config)

input_text = '''Shanghai. The Chinese Ministry of Finance in Shanghai said that China plans
                to cut tariffs on $75 billion worth of goods that the country
                imports from the US. Washington welcomes the decision.'''

geotext.extract(input_text=input_text)
>> {
    'cities': {
        'Shanghai': {
            'count': 2,
            'span_info': [(0, 8), (45, 53)],
            'found_as': ['Shanghai', 'Shanghai'],
            },
        'Washington, D.C.': {
            'count': 1,
            'span_info': [(175, 185)],
            'found_as': ['Washington'],
            }
        },
    'countries': {
        'China': {
            'count': 1,
            'span_info': [(64, 69)],
            'found_as': ['China'],
            },
        'United States': {
            'count': 1,
            'span_info': [(171, 173)],
            'found_as': ['US'],
            }
        }
    }
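
The nested result can be flattened for reporting. The following is a minimal sketch that assumes only the output shape shown in the example; `summarize_mentions` is a hypothetical helper, not part of flashgeotext, and the `sample` dict is copied by hand from the output above rather than produced by the library.

```python
from typing import Dict, List, Tuple


def summarize_mentions(result: Dict[str, dict]) -> List[Tuple[str, str, int]]:
    """Flatten an extract()-style result into (category, name, count) rows."""
    rows = [
        (category, name, info["count"])
        for category, places in result.items()
        for name, info in places.items()
    ]
    # Sort by descending count, then by name so ties have a stable order.
    return sorted(rows, key=lambda row: (-row[2], row[1]))


# Hand-copied from the example output above (not generated by the library here).
sample = {
    "cities": {
        "Shanghai": {
            "count": 2,
            "span_info": [(0, 8), (45, 53)],
            "found_as": ["Shanghai", "Shanghai"],
        },
        "Washington, D.C.": {
            "count": 1,
            "span_info": [(175, 185)],
            "found_as": ["Washington"],
        },
    },
    "countries": {
        "China": {"count": 1, "span_info": [(64, 69)], "found_as": ["China"]},
        "United States": {"count": 1, "span_info": [(171, 173)], "found_as": ["US"]},
    },
}

for category, name, count in summarize_mentions(sample):
    print(f"{category:<10} {name:<18} {count}")
```

Keeping the helper separate from the extraction call means it works on any dict of this shape, including results that were stored and reloaded.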

v0.3.2(Apr 13, 2021)
While switching package management to Poetry, I accidentally introduced Python 3.8 as the minimum requirement. This release reverts the minimum requirement to Python 3.7.

Some housekeeping

v0.3.1(Apr 7, 2021)

The log level is now set to WARNING by default and can be changed using environment variables.

For example:

export LOGURU_LEVEL=DEBUG to enable debug logging.

Allowed levels are ERROR, WARNING, DEBUG and INFO
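
The same variable can also be set from Python. This is a minimal sketch using only the standard library; `set_log_level` is a hypothetical helper, and it assumes the variable is read when the logging library is first imported, so it should run before `flashgeotext` is imported.

```python
import os

# Levels listed in the release note above.
ALLOWED_LEVELS = {"ERROR", "WARNING", "DEBUG", "INFO"}


def set_log_level(level: str) -> str:
    """Validate the level and export it as LOGURU_LEVEL for this process."""
    normalized = level.upper()
    if normalized not in ALLOWED_LEVELS:
        raise ValueError(f"unsupported log level: {level!r}")
    os.environ["LOGURU_LEVEL"] = normalized
    return normalized


# Assumption: call this before importing flashgeotext so the setting is picked up.
set_log_level("debug")
```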

v0.3.0(Feb 28, 2021)

Following up on #20, you can now choose to ignore the case of the lookup data.

from flashgeotext.geotext import GeoText

# case_sensitive=False lets lowercase "berlin" match the lookup entry "Berlin"
geotext = GeoText(config = { "use_demo_data": True, "case_sensitive": False })
text = "berlin ist ne tolle stadt"
geotext.extract(input_text=text, span_info=True)
>> { "cities": { "Berlin": [(0,6)] } }
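
To illustrate what ignoring case means for a lookup, here is a simplified sketch. It uses a plain dictionary over whitespace-separated tokens, not FlashText or flashgeotext itself, and `find_cities` is a hypothetical name chosen for this illustration.

```python
from typing import Dict, List, Tuple


def find_cities(
    text: str, cities: List[str], case_sensitive: bool = True
) -> Dict[str, List[Tuple[int, int]]]:
    """Naive token lookup: canonical city name -> list of (start, end) spans."""
    normalize = (lambda s: s) if case_sensitive else str.lower
    lookup = {normalize(city): city for city in cities}
    found: Dict[str, List[Tuple[int, int]]] = {}
    position = 0
    for token in text.split(" "):
        canonical = lookup.get(normalize(token))
        if canonical is not None:
            found.setdefault(canonical, []).append((position, position + len(token)))
        position += len(token) + 1  # account for the separating space
    return found


text = "berlin ist ne tolle stadt"
print(find_cities(text, ["Berlin"], case_sensitive=True))   # {}
print(find_cities(text, ["Berlin"], case_sensitive=False))  # {'Berlin': [(0, 6)]}
```

The canonical spelling is kept as the result key in both modes, so downstream code does not have to care how the name was written in the text.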

v0.2.0(Mar 2, 2020)
[0.2.0] - 2020-03-02

Added a script argument to LookupData to specify which script the characters in the lookup data belong to (see usage). This makes sure that flashgeotext works properly with different character sets (Latin, Cyrillic, Thai, etc.).

Demo data will (for now) use the default character set as non_word_boundary, see @issue-13.
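
Why the character set matters can be shown with a simplified sketch. It is a plain substring scan with a boundary check, not the Aho-Corasick matcher used by FlashText, and both character sets are assumptions made for illustration: the Latin set mirrors what I understand to be FlashText's default (ASCII letters, digits, underscore), and the Cyrillic extension is hypothetical.

```python
import string

# Assumed default: only ASCII letters, digits and underscore count as word characters.
LATIN_WORD_CHARS = set(string.ascii_letters + string.digits + "_")
# Hypothetical extension covering the basic Cyrillic letters U+0410..U+044F.
CYRILLIC_WORD_CHARS = LATIN_WORD_CHARS | {chr(c) for c in range(0x0410, 0x0450)}


def whole_word_spans(text, keyword, word_chars):
    """Return spans where keyword occurs without a word character on either side."""
    spans = []
    start = text.find(keyword)
    while start != -1:
        end = start + len(keyword)
        before_ok = start == 0 or text[start - 1] not in word_chars
        after_ok = end == len(text) or text[end] not in word_chars
        if before_ok and after_ok:
            spans.append((start, end))
        start = text.find(keyword, start + 1)
    return spans


# The city name is only a prefix of the adjective here, so it should not match.
text = "Берлинский медведь"
print(whole_word_spans(text, "Берлин", LATIN_WORD_CHARS))     # [(0, 6)], a false positive
print(whole_word_spans(text, "Берлин", CYRILLIC_WORD_CHARS))  # []
```

With a Latin-only set, every Cyrillic letter looks like a word boundary, so partial words are reported as matches; telling the matcher which script is in use avoids that.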

v0.1.0(Feb 25, 2020)

Initial release of flashgeotext

Extract and count countries and cities from text, like GeoText on steroids using FlashText, an Aho-Corasick implementation. Flashgeotext is a fast, batteries-included (and BYOD) and native python library that extracts one or more sets of given city and country names (+ synonyms) from an input text.

see docs

Owner

Ben

geographer turned spatial engineer turned data-something turned software developer

GitHub Repository https://flashgeotext.iwpnd.pw

Chinese question generator using the Delta Reading Comprehension Dataset (DRCD)

Transformer QG on DRCD The inputs of the model refers to we integrate C and A into a new C' in the following form. C' = [c1, c2, ..., [HL], a1, ..., a

1 Oct 22, 2021

Pretty-doc - Composable text objects with python

pretty-doc from __future__ import annotations from dataclasses import dataclass

2 Jan 17, 2022

Edge-Augmented Graph Transformer

Edge-augmented Graph Transformer Introduction This is the official implementation of the Edge-augmented Graph Transformer (EGT) as described in https:

21 Dec 14, 2022

🌸 fastText + Bloom embeddings for compact, full-coverage vectors with spaCy

floret: fastText + Bloom embeddings for compact, full-coverage vectors with spaCy floret is an extended version of fastText that can produce word repr

222 Dec 16, 2022

Perform sentiment analysis and keyword extraction on Craigslist listings

craiglist-helper synopsis Perform sentiment analysis and keyword extraction on Craigslist listings Background I love Craigslist. I've found most of my

1 Nov 08, 2021

This is a MD5 password/passphrase brute force tool

CROWES-PASS-CRACK-TOOl This is a MD5 password/passphrase brute force tool How to install: Do 'git clone https://github.com/CROW31/CROWES-PASS-CRACK-TO

9 Mar 02, 2022

Code for paper "Which Training Methods for GANs do actually Converge? (ICML 2018)"

GAN stability This repository contains the experiments in the supplementary material for the paper Which Training Methods for GANs do actually Converg

884 Nov 11, 2022

Lyrics generation with GPT2-based Transformer

HuggingArtists - Train a model to generate lyrics Create AI-Artist in just 5 minutes! 🚀 Run the demo notebook to train 🚀 Run the GUI demo to test Di

65 Dec 19, 2022

Problem: Given a nepali news find the category of the news

Classification of category of nepali news category using different algorithms Problem: Multiclass Classification Approaches: TFIDF for vectorization

2 Jan 09, 2022

Implementation of paper Does syntax matter? A strong baseline for Aspect-based Sentiment Analysis with RoBERTa.

RoBERTaABSA This repo contains the code for NAACL 2021 paper titled Does syntax matter? A strong baseline for Aspect-based Sentiment Analysis with RoB

106 Nov 28, 2022

Simple tool/toolkit for evaluating NLG (Natural Language Generation) offering various automated metrics.

Simple tool/toolkit for evaluating NLG (Natural Language Generation) offering various automated metrics. Jury offers a smooth and easy-to-use interface. It uses datasets for underlying metric computa

129 Jan 06, 2023

Unsupervised intent recognition

INTENT author: steeve LAQUITAINE description: deployment pattern: currently batch only Setup & run git clone https://github.com/slq0/intent.git bash

1 Apr 08, 2022

Reproduction process of BERT on SST2 dataset

BERT-SST2-Prod Reproduction process of BERT on SST2 dataset Installation instructions Download the repository git clone https://github.com/JunnYu/BERT-SST2-Prod Enter the folder and install requirements pip ins

1 Nov 18, 2021

Text-Summarization-using-NLP - Text Summarization using NLP to fetch BBC News Article and summarize its text and also it includes custom article Summarization

Text-Summarization-using-NLP Text Summarization using NLP to fetch BBC News Arti

21 Aug 06, 2022

It analyzes the sentiment of the user, whether it is positive or negative.

Sentiment-Analyzer-Tool It analyzes the sentiment of the user, whether it is positive or negative. It uses streamlit library for creating this sentiment

18 Dec 17, 2022

This project converts your human voice input to its text transcript and to an automated voice too.

Human Voice to Automated Voice & Text Introduction: In this project, whenever you'll speak, it will turn your voice into a robot voice and furthermore

3 Oct 15, 2021

Basic yet complete Machine Learning pipeline for NLP tasks

Basic yet complete Machine Learning pipeline for NLP tasks This repository accompanies the article on building basic yet complete ML pipelines for sol

20 Aug 22, 2022

Data pipelines, 2021.

This repo illustrates a simple process for automating data transformation and modeling through a pipeline using Luigi. Stack princip

8 May 19, 2022

An official repository for tutorials of Probabilistic Modelling and Reasoning (2021/2022) - a University of Edinburgh master's course.

PMR computer tutorials on HMMs (2021-2022) This is a repository for computer tutorials of Probabilistic Modelling and Reasoning (2021/2022) - a Univer

10 Dec 06, 2022

Code Implementation of "Learning Span-Level Interactions for Aspect Sentiment Triplet Extraction".

Span-ASTE: Learning Span-Level Interactions for Aspect Sentiment Triplet Extraction ***** New March 31th, 2022: Scikit-Style API for Easy Usage *****

111 Dec 23, 2022
