Bidirectionally transformed strings

Overview

bistring

Build status Documentation status

The bistring library provides non-destructive versions of common string processing operations like normalization, case folding, and find/replace. Each bistring remembers the original string, and how its substrings map to substrings of the modified version.

For example:

>>> from bistring import bistr
>>> s = bistr('π•Ώπ–π–Š π––π–šπ–Žπ–ˆπ–, π–‡π–—π–”π–œπ–“ 🦊 π–π–šπ–’π–•π–˜ π–”π–›π–Šπ–— π–™π–π–Š π–‘π–†π–Ÿπ–ž 🐢')
>>> s = s.normalize('NFKD')     # Unicode normalization
>>> s = s.casefold()            # Case-insensitivity
>>> s = s.replace('🦊', 'fox')  # Replace emoji with text
>>> s = s.replace('🐢', 'dog')
>>> s = s.sub(r'[^\w\s]+', '')  # Strip everything but letters and spaces
>>> s = s[:19]                  # Extract a substring
>>> s.modified                  # The modified substring, after changes
'the quick brown fox'
>>> s.original                  # The original substring, before changes
'π•Ώπ–π–Š π––π–šπ–Žπ–ˆπ–, π–‡π–—π–”π–œπ–“ 🦊'

Languages

PyPI version npm version

bistring is available in multiple languages, currently Python and JavaScript/TypeScript. Ports to other languages are planned for the near future.

The code is structured similarly in each language to make it easy to share algorithms, tests, and fixes between them. The main differences come from trying to mirror the language's built-in string API. If you want to contribute a bug fix or a new feature, feel free to implement it in any one of the supported languages, and we'll try to port it to the rest of them.

Demo

Click here for a live demo of the bistring library in your browser.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Comments
  • Whitespace at start of string mishandled by SentenceTokenizer?

    Whitespace at start of string mishandled by SentenceTokenizer?

    I'm not sure if this is expected behaviour or a bug but the following code illustrates my uncertainty:

    pre_split = bistring.bistr(" \tFoo. \t\n \tBar. \t") \
        .sub(r"^\s+", "") \
        .sub(r"\s*\n\s*", "\n") \
        .sub(r"\s+$", "\n")
    
    post_split = bistring.bistr.join(
        [s.text for s in bistring.SentenceTokenizer("en_GB").tokenize(pre_split)]
    )
    
    # These should print True but actually print False
    print(pre_split == post_split)
    print(pre_split.original == post_split.original)
    
    # This should print True and does print True
    print(pre_split.modified == post_split.modified)
    
    # This should print False but actually prints True
    print(pre_split.original[2:] == post_split.original)
    

    In summary, I was expecting the result of re-joining the tokens produced by SentenceTokenizer to yield an identical bistr to the one that existed prior to the splitting. This appears to be true with the only (known) exception being whitespace at the start of the first sentence is being lost. Whitespace at the end of the string, and whitespace between sentences within the string, are retained as expected.

    Is this expected behaviour?

    Produces behaviour using Python 3.7, bistring 0.4.0, pyicu 2.6, and icu 68.1 (all installed via conda-forge).

    question 
    opened by qtdaniel 10
  • Composition of no-op replacements produces incorrect (or confusing?) alignment

    Composition of no-op replacements produces incorrect (or confusing?) alignment

    >>> from bistring import bistr
    >>> b = bistr("abc")
    >>> b1 = b.replace("bc", "bc")
    >>> b2 = b1.replace("ab", "ab")
    >>> b2
    bistr('abc', 'abc', Alignment([(0, 0), (1, 2), (3, 3)]))
    >>> b2[:2].original
    'a'
    

    Both individual replacements effectively don't change the contents of the original string. I think b2[:2].original "should" return 'abc' here. This would probably be achieved with a coarser composed alignment Alignment([(0, 0), (3, 3)]).

    opened by maxhgerlach 3
  • PyPI Release?

    PyPI Release?

    Thank you very much for this library! It's really useful in many text processing use cases for NLP.

    The last release 0.4 on PyPI is from September 2019 and on master there's been at least a fix for bistr.join() since then: #20 Would you consider cutting a new release?

    opened by maxhgerlach 3
  • Arithmetic progressions

    Arithmetic progressions

    This optimizes the representation of Alignments by detecting and compressing any arithmetic sub-sequences that are found. This implicitly compresses any runs with the identity mapping, as well as other potentially common ones like 2β†’1 char mappings from UTF-16 to UTF-8 or non-BMP to BMP.

    • [x] Python
    • [ ] JS
    opened by tavianator 3
  • Fix readthedocs build

    Fix readthedocs build

    • js: Update dependencies
    • js: Work around https://github.com/rollup/plugins/issues/934
    • docs: Update npm dependencies
    • docs: Work around https://github.com/readthedocs/readthedocs-docker-images/issues/107
    opened by tavianator 2
  • build(deps): bump urllib3 from 1.26.4 to 1.26.5 in /docs

    build(deps): bump urllib3 from 1.26.4 to 1.26.5 in /docs

    Bumps urllib3 from 1.26.4 to 1.26.5.

    Release notes

    Sourced from urllib3's releases.

    1.26.5

    :warning: IMPORTANT: urllib3 v2.0 will drop support for Python 2: Read more in the v2.0 Roadmap

    • Fixed deprecation warnings emitted in Python 3.10.
    • Updated vendored six library to 1.16.0.
    • Improved performance of URL parser when splitting the authority component.

    If you or your organization rely on urllib3 consider supporting us via GitHub Sponsors

    Changelog

    Sourced from urllib3's changelog.

    1.26.5 (2021-05-26)

    • Fixed deprecation warnings emitted in Python 3.10.
    • Updated vendored six library to 1.16.0.
    • Improved performance of URL parser when splitting the authority component.
    Commits
    • d161647 Release 1.26.5
    • 2d4a3fe Improve performance of sub-authority splitting in URL
    • 2698537 Update vendored six to 1.16.0
    • 07bed79 Fix deprecation warnings for Python 3.10 ssl module
    • d725a9b Add Python 3.10 to GitHub Actions
    • 339ad34 Use pytest==6.2.4 on Python 3.10+
    • f271c9c Apply latest Black formatting
    • 1884878 [1.26] Properly proxy EOF on the SSLTransport test suite
    • See full diff in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies python 
    opened by dependabot[bot] 2
  • build(deps): bump lodash from 4.17.15 to 4.17.19 in /docs

    build(deps): bump lodash from 4.17.15 to 4.17.19 in /docs

    ⚠️ Dependabot is rebasing this PR ⚠️

    If you make any changes to it yourself then they will take precedence over the rebase.


    Bumps lodash from 4.17.15 to 4.17.19.

    Release notes

    Sourced from lodash's releases.

    4.17.16

    Commits
    Maintainer changes

    This version was pushed to npm by mathias, a new releaser for lodash since your current version.


    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 2
  • build(deps): bump minimist from 1.2.5 to 1.2.6 in /js

    build(deps): bump minimist from 1.2.5 to 1.2.6 in /js

    Bumps minimist from 1.2.5 to 1.2.6.

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies javascript 
    opened by dependabot[bot] 1
  • build(deps): bump node-notifier from 8.0.0 to 8.0.1 in /js

    build(deps): bump node-notifier from 8.0.0 to 8.0.1 in /js

    Bumps node-notifier from 8.0.0 to 8.0.1.

    Changelog

    Sourced from node-notifier's changelog.

    v8.0.1

    • fixes possible injection issue for notify-send
    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 1
  • build(deps): bump highlight.js from 10.1.2 to 10.4.1 in /docs

    build(deps): bump highlight.js from 10.1.2 to 10.4.1 in /docs

    Bumps highlight.js from 10.1.2 to 10.4.1.

    Release notes

    Sourced from highlight.js's releases.

    10.4.1

    Security fixes:

    • (fix) Exponential backtracking fixes for: Josh Goebel
      • cpp
      • handlebars
      • gams
      • perl
      • jboss-cli
      • r
      • erlang-repl
      • powershell
      • routeros
    • (fix) Polynomial backtracking fixes for: Josh Goebel
      • asciidoc
      • reasonml
      • latex
      • kotlin
      • gcode
      • d
      • aspectj
      • moonscript
      • coffeescript/livescript
      • csharp
      • scilab
      • crystal
      • elixir
      • basic
      • ebnf
      • ruby
      • fortran/irpf90
      • livecodeserver
      • yaml
      • x86asm
      • dsconfig
      • markdown
      • ruleslanguage
      • xquery
      • sqf

    Very grateful to Michael Schmidt for all the help.

    10.4.0 - November 2020

    A largish release with many improvements and fixes from quite a few different contributors. Enjoy!

    Deprecations:

    ... (truncated)

    Changelog

    Sourced from highlight.js's changelog.

    Version 10.4.1 (tentative)

    Security

    • (fix) Exponential backtracking fixes for: Josh Goebel
      • cpp
      • handlebars
      • gams
      • perl
      • jboss-cli
      • r
      • erlang-repl
      • powershell
      • routeros
    • (fix) Polynomial backtracking fixes for: Josh Goebel
      • asciidoc
      • reasonml
      • latex
      • kotlin
      • gcode
      • d
      • aspectj
      • moonscript
      • coffeescript/livescript
      • csharp
      • scilab
      • crystal
      • elixir
      • basic
      • ebnf
      • ruby
      • fortran/irpf90
      • livecodeserver
      • yaml
      • x86asm
      • dsconfig
      • markdown
      • ruleslanguage
      • xquery
      • sqf

    Very grateful to Michael Schmidt for all the help.

    Version 10.4.0

    A largish release with many improvements and fixes from quite a few different contributors. Enjoy!

    ... (truncated)

    Commits
    • e96b915 bump 10.4.1
    • 065f65f chore(release) allow release script to handle production releases
    • 68509fc chore(docs) bump SECURITY mention to 9.18.5
    • aa0fb85 chore(docs) Version 9 has reached EOL.
    • fb0a626 enh(ci): Add tests for polynomial regex issues
    • fa46dd1 fix(reasonml) fix poly backtracking issue
    • d496052 fix(latex) fix poly backtracking issue
    • d9f1cdb fix(javascript/typescript) fix poly backtracking issue
    • fdec037 fix(asciidoc) fix poly backtracking issue
    • 02ca487 fix(kotlin) fix poly backtracking issue
    • Additional commits viewable in compare view
    Maintainer changes

    This version was pushed to npm by joshgoebel, a new releaser for highlight.js since your current version.


    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 1
  • build(deps): bump lodash from 4.17.15 to 4.17.19 in /js

    build(deps): bump lodash from 4.17.15 to 4.17.19 in /js

    Bumps lodash from 4.17.15 to 4.17.19.

    Release notes

    Sourced from lodash's releases.

    4.17.16

    Commits
    Maintainer changes

    This version was pushed to npm by mathias, a new releaser for lodash since your current version.


    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 1
  • build(deps): bump json5 from 2.2.1 to 2.2.3 in /js

    build(deps): bump json5 from 2.2.1 to 2.2.3 in /js

    Bumps json5 from 2.2.1 to 2.2.3.

    Release notes

    Sourced from json5's releases.

    v2.2.3

    v2.2.2

    • Fix: Properties with the name __proto__ are added to objects and arrays. (#199) This also fixes a prototype pollution vulnerability reported by Jonathan Gregson! (#295).
    Changelog

    Sourced from json5's changelog.

    v2.2.3 [code, diff]

    v2.2.2 [code, diff]

    • Fix: Properties with the name __proto__ are added to objects and arrays. (#199) This also fixes a prototype pollution vulnerability reported by Jonathan Gregson! (#295).
    Commits
    • c3a7524 2.2.3
    • 94fd06d docs: update CHANGELOG for v2.2.3
    • 3b8cebf docs(security): use GitHub security advisories
    • f0fd9e1 docs: publish a security policy
    • 6a91a05 docs(template): bug -> bug report
    • 14f8cb1 2.2.2
    • 10cc7ca docs: update CHANGELOG for v2.2.2
    • 7774c10 fix: add proto to objects and arrays
    • edde30a Readme: slight tweak to intro
    • 97286f8 Improve example in readme
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies javascript 
    opened by dependabot[bot] 0
  • build(deps): bump certifi from 2021.10.8 to 2022.12.7 in /docs

    build(deps): bump certifi from 2021.10.8 to 2022.12.7 in /docs

    Bumps certifi from 2021.10.8 to 2022.12.7.

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies python 
    opened by dependabot[bot] 0
  • Add CodeQL workflow for GitHub code scanning

    Add CodeQL workflow for GitHub code scanning

    Hi microsoft/bistring!

    This is a one-off automatically generated pull request from LGTM.com :robot:. You might have heard that we’ve integrated LGTM’s underlying CodeQL analysis engine natively into GitHub. The result is GitHub code scanning!

    With LGTM fully integrated into code scanning, we are focused on improving CodeQL within the native GitHub code scanning experience. In order to take advantage of current and future improvements to our analysis capabilities, we suggest you enable code scanning on your repository. Please take a look at our blog post for more information.

    This pull request enables code scanning by adding an auto-generated codeql.yml workflow file for GitHub Actions to your repository β€” take a look! We tested it before opening this pull request, so all should be working :heavy_check_mark:. In fact, you might already have seen some alerts appear on this pull request!

    Where needed and if possible, we’ve adjusted the configuration to the needs of your particular repository. But of course, you should feel free to tweak it further! Check this page for detailed documentation.

    Questions? Check out the FAQ below!

    FAQ

    Click here to expand the FAQ section

    How often will the code scanning analysis run?

    By default, code scanning will trigger a scan with the CodeQL engine on the following events:

    • On every pull request β€” to flag up potential security problems for you to investigate before merging a PR.
    • On every push to your default branch and other protected branches β€” this keeps the analysis results on your repository’s Security tab up to date.
    • Once a week at a fixed time β€” to make sure you benefit from the latest updated security analysis even when no code was committed or PRs were opened.

    What will this cost?

    Nothing! The CodeQL engine will run inside GitHub Actions, making use of your unlimited free compute minutes for public repositories.

    What types of problems does CodeQL find?

    The CodeQL engine that powers GitHub code scanning is the exact same engine that powers LGTM.com. The exact set of rules has been tweaked slightly, but you should see almost exactly the same types of alerts as you were used to on LGTM.com: we’ve enabled the security-and-quality query suite for you.

    How do I upgrade my CodeQL engine?

    No need! New versions of the CodeQL analysis are constantly deployed on GitHub.com; your repository will automatically benefit from the most recently released version.

    The analysis doesn’t seem to be working

    If you get an error in GitHub Actions that indicates that CodeQL wasn’t able to analyze your code, please follow the instructions here to debug the analysis.

    How do I disable LGTM.com?

    If you have LGTM’s automatic pull request analysis enabled, then you can follow these steps to disable the LGTM pull request analysis. You don’t actually need to remove your repository from LGTM.com; it will automatically be removed in the next few months as part of the deprecation of LGTM.com (more info here).

    Which source code hosting platforms does code scanning support?

    GitHub code scanning is deeply integrated within GitHub itself. If you’d like to scan source code that is hosted elsewhere, we suggest that you create a mirror of that code on GitHub.

    How do I know this PR is legitimate?

    This PR is filed by the official LGTM.com GitHub App, in line with the deprecation timeline that was announced on the official GitHub Blog. The proposed GitHub Action workflow uses the official open source GitHub CodeQL Action. If you have any other questions or concerns, please join the discussion here in the official GitHub community!

    I have another question / how do I get in touch?

    Please join the discussion here to ask further questions and send us suggestions!

    opened by lgtm-com[bot] 1
  • Problems with PyICU when installing bistring 0.4.0 with pip

    Problems with PyICU when installing bistring 0.4.0 with pip

    Summary

    I ran into what's apparently a known issue with installing PyICU over pip while trying to pip install bistring==0.4.0. Contrary to the error message and recommendations on that thread, installing pkg-config and libicu-dev didn't fix the issue for me. Only installing python3-icu (as recommended in the official PyICU docs) finally fixed it.

    This is obviously not an issue with bistring itself, but it makes it difficult to install bistring because the ICU dependency can't be automatically installed by pip. If there is nothing else that can be done about it, maybe a note about this could at least be added to the Readme file, so people can avoid the frustration of running into the pip error?

    More Details

    Here is the output I got from pip install bistring==0.4.0:

    Collecting bistring==0.4.0
      Downloading bistring-0.4.0-py3-none-any.whl (22 kB)
    Collecting pyicu
      Downloading PyICU-2.8.tar.gz (299 kB)
         |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 299 kB 2.1 MB/s
      Installing build dependencies ... done
      Getting requirements to build wheel ... error
      ERROR: Command errored out with exit status 1:
       command: /usr/bin/python3 /tmp/tmprkr9c6uy get_requires_for_build_wheel /tmp/tmp5tnhtmha
           cwd: /tmp/pip-install-x_54yndb/pyicu
      Complete output (64 lines):
      (running 'icu-config --version')
      (running 'pkg-config --modversion icu-i18n')
      Traceback (most recent call last):
        File "setup.py", line 63, in <module>
          ICU_VERSION = os.environ['ICU_VERSION']
        File "/usr/lib/python3.8/os.py", line 675, in __getitem__
          raise KeyError(key) from None
      KeyError: 'ICU_VERSION'
    
      During handling of the above exception, another exception occurred:
    
      Traceback (most recent call last):
        File "setup.py", line 66, in <module>
          ICU_VERSION = check_output(('icu-config', '--version')).strip()
        File "setup.py", line 19, in check_output
          return subprocess_check_output(popenargs)
        File "/usr/lib/python3.8/subprocess.py", line 411, in check_output
          return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
        File "/usr/lib/python3.8/subprocess.py", line 489, in run
          with Popen(*popenargs, **kwargs) as process:
        File "/usr/lib/python3.8/subprocess.py", line 854, in __init__
          self._execute_child(args, executable, preexec_fn, close_fds,
        File "/usr/lib/python3.8/subprocess.py", line 1702, in _execute_child
          raise child_exception_type(errno_num, err_msg, err_filename)
      FileNotFoundError: [Errno 2] No such file or directory: 'icu-config'
    
      During handling of the above exception, another exception occurred:
    
      Traceback (most recent call last):
        File "setup.py", line 69, in <module>
          ICU_VERSION = check_output(('pkg-config', '--modversion', 'icu-i18n')).strip()
        File "setup.py", line 19, in check_output
          return subprocess_check_output(popenargs)
        File "/usr/lib/python3.8/subprocess.py", line 411, in check_output
          return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
        File "/usr/lib/python3.8/subprocess.py", line 489, in run
          with Popen(*popenargs, **kwargs) as process:
        File "/usr/lib/python3.8/subprocess.py", line 854, in __init__
          self._execute_child(args, executable, preexec_fn, close_fds,
        File "/usr/lib/python3.8/subprocess.py", line 1702, in _execute_child
          raise child_exception_type(errno_num, err_msg, err_filename)
      FileNotFoundError: [Errno 2] No such file or directory: 'pkg-config'
    
      During handling of the above exception, another exception occurred:
    
      Traceback (most recent call last):
        File "/tmp/tmprkr9c6uy", line 280, in <module>
          main()
        File "/tmp/tmprkr9c6uy", line 263, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/tmp/tmprkr9c6uy", line 114, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/tmp/pip-build-env-p3nxngyx/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 162, in get_requires_for_build_wheel
          return self._get_build_requires(
        File "/tmp/pip-build-env-p3nxngyx/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 143, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-p3nxngyx/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 158, in run_setup
          exec(compile(code, __file__, 'exec'), locals())
        File "setup.py", line 71, in <module>
          raise RuntimeError('''
      RuntimeError:
      Please install pkg-config on your system or set the ICU_VERSION environment
      variable to the version of ICU you have installed.
    
      ----------------------------------------
    ERROR: Command errored out with exit status 1: /usr/bin/python3 /tmp/tmprkr9c6uy get_requires_for_build_wheel /tmp/tmp5tnhtmha Check the logs for full command output
    
    opened by outofthecave 4
  • Rust

    Rust

    • [x] Add a Rust project
    • [x] Implement Alignment
    • [ ] Implement BiString
    • [ ] Implement slices
    • [x] Add a README
    • [ ] Benchmarks
    • [ ] Try arithmetic progression compression
    • [ ] Unify unbounded slice behaviour with Python and JS
    • [x] CI
    opened by tavianator 0
  • Transliterate

    Transliterate

    I was hoping you might advise me on how to incorporate transliteration into a text transformation pipeline.

    Let's say I want to use a 3rd party library like from unidecode import unidecode. I could create a bistring with new_bistr = bistr(text.modified, unidecode(text.modified)) but I would loose all the previous operations.

    Is there a way to fold in a modified string that is calculated outside bistring's capabilities?

    enhancement question 
    opened by christian-storm 4
Owner
Microsoft
Open source projects and samples from Microsoft
Microsoft
Split large XML files into smaller ones for easy upload

Split large XML files into smaller ones for easy upload. Works for WordPress Posts Import and other XML files.

Joseph Adediji 1 Jan 30, 2022
Adventura is an open source Python Text Adventure Engine

Adventura Adventura is an open source Python Text Adventure Engine, Not yet uplo

5 Oct 02, 2022
Fuzz a language by mixing up only few words.

afasi Fuzz a language by mixing up only few words. Status Beta. Note: The default branch is default. Use Examples Version General Help Translate Help

Stefan Hagen 2 Dec 14, 2022
strbind - lapidary text converter for translate an text file to the C-style string

strbind strbind - lapidary text converter for translate an text file to the C-style string. My motivation is fast adding large text chunks to the C co

Mihail Zaytsev 1 Oct 22, 2021
Text Summarizationcls app with python

Text Summarizationcls app This is the repo for the Text Summarization AI Project. It makes use of pre-trained Hugging Face models Packages Used The pa

Edem Gold 1 Oct 23, 2021
A minimal python script for generating multiple onetime use bip39 seed phrases

seed_signer_ontimes WARNING This project has mainly been used for local development, and creation should be ran on a air-gapped machine. A minimal pyt

CypherToad 4 Sep 12, 2022
A collection of pre-commit hooks for handling text files.

texthooks A collection of pre-commit hooks for handling text files. In particular, hooks for handling unicode characters which may be undesirable in a

Stephen Rosen 5 Oct 28, 2022
An extension to detect if the articles content match its title.

Clickbait Detector An extension to detect if the articles content match its title. This was developed in a period of 24-hours in a hackathon called 'H

Arvind Krishna 5 Jul 26, 2022
A query extract python package

A query extract python package

Fayas Noushad 4 Nov 28, 2021
pydantic-i18n is an extension to support an i18n for the pydantic error messages.

pydantic-i18n is an extension to support an i18n for the pydantic error messages

Boardpack 48 Dec 21, 2022
Repository containing the code for An-Gocair text normaliser

Scottish Gaelic Text Normaliser The following project contains the code and resources for the Scottish Gaelic text normalisation project. The repo can

3 Jun 28, 2022
Map Reduce Wordcount in Python using gRPC

This project is implemented in Python using gRPC. The input files are given in .txt format and the word count operation is performed.

Divija 4 Dec 05, 2022
Text to ASCII and ASCII to text

Text2ASCII Description This python script (converter.py) contains two functions: encode() is used to return a list of Integer, one item per character

4 Jan 22, 2022
A pipeline for making highlighted text stand-alone.

title emoji colorFrom colorTo sdk app_file pinned decontextualizer πŸ“€ green gray streamlit main.py false Decontextualizer As a second step in improvin

Paul Bricman 26 Dec 17, 2022
Returns unicode slugs

Python Slugify A Python slugify application that handles unicode. Overview Best attempt to create slugs from unicode strings while keeping it DRY. Not

Val Neekman 1.3k Jan 04, 2023
A minimal code sceleton for a textadveture parser written in python.

Textadventure sceleton written in python Use with a map file generated on https://www.trizbort.io Use the following Sockets for walking directions: n

1 Jan 06, 2022
PyMultiDictionary is a Dictionary Module for Python 3+ to get meanings, translations, synonyms and antonyms of words in 20 different languages

PyMultiDictionary PyMultiDictionary is a Dictionary Module for Python 3+ to get meanings, translations, synonyms and antonyms of words in 20 different

Pablo Pizarro R. 19 Dec 26, 2022
An implementation of figlet written in Python

All of the documentation and the majority of the work done was by Christopher Jones ([emai

Peter Waller 1.1k Jan 02, 2023
Search for terms(word / table / field name or any) under Snowflake schema names

snowflake-search-terms-in-ddl-views Search for terms(word / table / field name or any) under Snowflake schema names Version : 1.0v How to use ? Run th

Igal Emona 1 Dec 15, 2021
Username reconnaisance tool that checks the availability of a specified username on over 200 websites.

Username reconnaisance tool that checks the availability of a specified username on over 200 websites. Installation & Usage Clone from Github: $ git c

Richard Mwewa 20 Oct 30, 2022