Bidirectionally transformed strings

Overview

bistring

The bistring library provides non-destructive versions of common string processing operations like normalization, case folding, and find/replace. Each bistring remembers the original string, and how its substrings map to substrings of the modified version.

For example:

>>> from bistring import bistr
>>> s = bistr('𝕿𝖍𝖊 𝖖𝖚𝖎𝖈𝖐, 𝖇𝖗𝖔𝖜𝖓 🦊 𝖏𝖚𝖒𝖕𝖘 𝖔𝖛𝖊𝖗 𝖙𝖍𝖊 𝖑𝖆𝖟𝖞 🐶')
>>> s = s.normalize('NFKD')     # Unicode normalization
>>> s = s.casefold()            # Case-insensitivity
>>> s = s.replace('🦊', 'fox')  # Replace emoji with text
>>> s = s.replace('🐶', 'dog')
>>> s = s.sub(r'[^\w\s]+', '')  # Strip everything but letters and spaces
>>> s = s[:19]                  # Extract a substring
>>> s.modified                  # The modified substring, after changes
'the quick brown fox'
>>> s.original                  # The original substring, before changes
'𝕿𝖍𝖊 𝖖𝖚𝖎𝖈𝖐, 𝖇𝖗𝖔𝖜𝖓 🦊'

Languages

bistring is available in multiple languages, currently Python and JavaScript/TypeScript. Ports to other languages are planned for the near future.

The code is structured similarly in each language to make it easy to share algorithms, tests, and fixes between them. The main differences come from trying to mirror the language's built-in string API. If you want to contribute a bug fix or a new feature, feel free to implement it in any one of the supported languages, and we'll try to port it to the rest of them.

Demo

Click here for a live demo of the bistring library in your browser.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Comments
  • Whitespace at start of string mishandled by SentenceTokenizer?

    I'm not sure if this is expected behaviour or a bug but the following code illustrates my uncertainty:

    import bistring

    pre_split = bistring.bistr(" \tFoo. \t\n \tBar. \t") \
        .sub(r"^\s+", "") \
        .sub(r"\s*\n\s*", "\n") \
        .sub(r"\s+$", "\n")

    # join() is called on a separator, like str.join; use an empty bistr here
    post_split = bistring.bistr("").join(
        [s.text for s in bistring.SentenceTokenizer("en_GB").tokenize(pre_split)]
    )
    
    # These should print True but actually print False
    print(pre_split == post_split)
    print(pre_split.original == post_split.original)
    
    # This should print True and does print True
    print(pre_split.modified == post_split.modified)
    
    # This should print False but actually prints True
    print(pre_split.original[2:] == post_split.original)
    

    In summary, I was expecting that re-joining the tokens produced by SentenceTokenizer would yield a bistr identical to the one that existed prior to the splitting. This appears to hold, with the only (known) exception that whitespace at the start of the first sentence is lost. Whitespace at the end of the string, and whitespace between sentences within the string, is retained as expected.

    Is this expected behaviour?

    Reproduced using Python 3.7, bistring 0.4.0, pyicu 2.6, and icu 68.1 (all installed via conda-forge).

    question 
    opened by qtdaniel 10
  • Composition of no-op replacements produces incorrect (or confusing?) alignment

    >>> from bistring import bistr
    >>> b = bistr("abc")
    >>> b1 = b.replace("bc", "bc")
    >>> b2 = b1.replace("ab", "ab")
    >>> b2
    bistr('abc', 'abc', Alignment([(0, 0), (1, 2), (3, 3)]))
    >>> b2[:2].original
    'a'
    

    Neither replacement changes the contents of the string. I think b2[:2].original "should" return 'abc' here, which would probably be achieved with a coarser composed alignment, Alignment([(0, 0), (3, 3)]).
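
To make the difference between the two alignments concrete, here is a minimal standard-library sketch of how a slice of the modified text might be mapped back to the original. The helper `original_bounds` is a hypothetical, simplified stand-in written for this illustration; it is not bistring's implementation. It assumes each alignment pair is (original position, modified position), as in the repr above.

```python
from bisect import bisect_left, bisect_right


def original_bounds(alignment, start, end):
    """Map the modified-text span [start, end) to a span of the original.

    Hypothetical helper: picks the last alignment point at or before
    `start` and the first alignment point at or after `end`.
    """
    modified_positions = [m for _, m in alignment]
    first = bisect_right(modified_positions, start) - 1
    last = bisect_left(modified_positions, end)
    return alignment[first][0], alignment[last][0]


text = "abc"
fine = [(0, 0), (1, 2), (3, 3)]   # alignment reported for b2
coarse = [(0, 0), (3, 3)]         # coarser alignment suggested above

lo, hi = original_bounds(fine, 0, 2)
print(repr(text[lo:hi]))    # 'a'

lo, hi = original_bounds(coarse, 0, 2)
print(repr(text[lo:hi]))    # 'abc'
```

Under these assumptions, the point (1, 2) in the finer alignment cuts the original span short at position 1, which is why the slice maps back to 'a' rather than 'abc'.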

    opened by maxhgerlach 3
  • PyPI Release?

    Thank you very much for this library! It's really useful in many text processing use cases for NLP.

    The last release on PyPI, 0.4, is from September 2019, and master has since gained at least one fix, for bistr.join() (#20). Would you consider cutting a new release?

    opened by maxhgerlach 3
  • Arithmetic progressions

    This optimizes the representation of Alignments by detecting and compressing any arithmetic sub-sequences that are found. This implicitly compresses any runs with the identity mapping, as well as other potentially common ones like 2→1 char mappings from UTF-16 to UTF-8 or non-BMP to BMP.

    • [x] Python
    • [ ] JS
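
The idea of compressing arithmetic sub-sequences can be sketched roughly as follows. This is an illustration only: `compress` and `expand` are hypothetical names, and the (first pair, step, count) run format is an assumption made for this example rather than the data structure used in the pull request.

```python
def compress(pairs):
    """Greedily group (original, modified) pairs into arithmetic runs.

    Each run is (first_pair, step, count), where `step` is the constant
    difference between consecutive pairs within the run.
    """
    runs = []
    i = 0
    while i < len(pairs):
        if i + 1 == len(pairs):
            # A single trailing pair forms a run of length one.
            runs.append((pairs[i], (0, 0), 1))
            break
        step = (pairs[i + 1][0] - pairs[i][0], pairs[i + 1][1] - pairs[i][1])
        j = i + 1
        # Extend the run while the difference between neighbours stays the same.
        while j + 1 < len(pairs) and (
            pairs[j + 1][0] - pairs[j][0],
            pairs[j + 1][1] - pairs[j][1],
        ) == step:
            j += 1
        runs.append((pairs[i], step, j - i + 1))
        i = j + 1
    return runs


def expand(runs):
    """Rebuild the full list of pairs from the compressed runs."""
    return [
        (o + k * do, m + k * dm)
        for (o, m), (do, dm), count in runs
        for k in range(count)
    ]


identity = [(i, i) for i in range(5)]
two_to_one = [(0, 0), (2, 1), (4, 2), (6, 3)]

print(compress(identity))     # [((0, 0), (1, 1), 5)]
print(compress(two_to_one))   # [((0, 0), (2, 1), 4)]
```

In this sketch an identity run of any length, or a steady 2→1 mapping, collapses to a single run, and `expand(compress(pairs))` returns the original list.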
    opened by tavianator 3
  • Fix readthedocs build

    • js: Update dependencies
    • js: Work around https://github.com/rollup/plugins/issues/934
    • docs: Update npm dependencies
    • docs: Work around https://github.com/readthedocs/readthedocs-docker-images/issues/107
    opened by tavianator 2
  • build(deps): bump urllib3 from 1.26.4 to 1.26.5 in /docs

    Bumps urllib3 from 1.26.4 to 1.26.5.

    Release notes

    Sourced from urllib3's releases.

    1.26.5

    :warning: IMPORTANT: urllib3 v2.0 will drop support for Python 2: Read more in the v2.0 Roadmap

    • Fixed deprecation warnings emitted in Python 3.10.
    • Updated vendored six library to 1.16.0.
    • Improved performance of URL parser when splitting the authority component.

    If you or your organization rely on urllib3 consider supporting us via GitHub Sponsors

    Changelog

    Sourced from urllib3's changelog.

    1.26.5 (2021-05-26)

    • Fixed deprecation warnings emitted in Python 3.10.
    • Updated vendored six library to 1.16.0.
    • Improved performance of URL parser when splitting the authority component.
    Commits
    • d161647 Release 1.26.5
    • 2d4a3fe Improve performance of sub-authority splitting in URL
    • 2698537 Update vendored six to 1.16.0
    • 07bed79 Fix deprecation warnings for Python 3.10 ssl module
    • d725a9b Add Python 3.10 to GitHub Actions
    • 339ad34 Use pytest==6.2.4 on Python 3.10+
    • f271c9c Apply latest Black formatting
    • 1884878 [1.26] Properly proxy EOF on the SSLTransport test suite
    • See full diff in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies python 
    opened by dependabot[bot] 2
  • build(deps): bump lodash from 4.17.15 to 4.17.19 in /docs

    โš ๏ธ Dependabot is rebasing this PR โš ๏ธ

    If you make any changes to it yourself then they will take precedence over the rebase.


    Bumps lodash from 4.17.15 to 4.17.19.

    Release notes

    Sourced from lodash's releases.

    4.17.16

    Commits
    Maintainer changes

    This version was pushed to npm by mathias, a new releaser for lodash since your current version.


    dependencies 
    opened by dependabot[bot] 2
  • build(deps): bump minimist from 1.2.5 to 1.2.6 in /js

    Bumps minimist from 1.2.5 to 1.2.6.

    Commits

    dependencies javascript 
    opened by dependabot[bot] 1
  • build(deps): bump node-notifier from 8.0.0 to 8.0.1 in /js

    Bumps node-notifier from 8.0.0 to 8.0.1.

    Changelog

    Sourced from node-notifier's changelog.

    v8.0.1

    • fixes possible injection issue for notify-send
    Commits

    dependencies 
    opened by dependabot[bot] 1
  • build(deps): bump highlight.js from 10.1.2 to 10.4.1 in /docs

    Bumps highlight.js from 10.1.2 to 10.4.1.

    Release notes

    Sourced from highlight.js's releases.

    10.4.1

    Security fixes:

    • (fix) Exponential backtracking fixes (Josh Goebel) for:
      • cpp
      • handlebars
      • gams
      • perl
      • jboss-cli
      • r
      • erlang-repl
      • powershell
      • routeros
    • (fix) Polynomial backtracking fixes (Josh Goebel) for:
      • asciidoc
      • reasonml
      • latex
      • kotlin
      • gcode
      • d
      • aspectj
      • moonscript
      • coffeescript/livescript
      • csharp
      • scilab
      • crystal
      • elixir
      • basic
      • ebnf
      • ruby
      • fortran/irpf90
      • livecodeserver
      • yaml
      • x86asm
      • dsconfig
      • markdown
      • ruleslanguage
      • xquery
      • sqf

    Very grateful to Michael Schmidt for all the help.

    10.4.0 - November 2020

    A largish release with many improvements and fixes from quite a few different contributors. Enjoy!

    Deprecations:

    ... (truncated)

    Commits
    • e96b915 bump 10.4.1
    • 065f65f chore(release) allow release script to handle production releases
    • 68509fc chore(docs) bump SECURITY mention to 9.18.5
    • aa0fb85 chore(docs) Version 9 has reached EOL.
    • fb0a626 enh(ci): Add tests for polynomial regex issues
    • fa46dd1 fix(reasonml) fix poly backtracking issue
    • d496052 fix(latex) fix poly backtracking issue
    • d9f1cdb fix(javascript/typescript) fix poly backtracking issue
    • fdec037 fix(asciidoc) fix poly backtracking issue
    • 02ca487 fix(kotlin) fix poly backtracking issue
    • Additional commits viewable in compare view
    Maintainer changes

    This version was pushed to npm by joshgoebel, a new releaser for highlight.js since your current version.


    dependencies 
    opened by dependabot[bot] 1
  • build(deps): bump lodash from 4.17.15 to 4.17.19 in /js

    Bumps lodash from 4.17.15 to 4.17.19.

    Release notes

    Sourced from lodash's releases.

    4.17.16

    Commits
    Maintainer changes

    This version was pushed to npm by mathias, a new releaser for lodash since your current version.


    dependencies 
    opened by dependabot[bot] 1
  • build(deps): bump json5 from 2.2.1 to 2.2.3 in /js

    Bumps json5 from 2.2.1 to 2.2.3.

    Release notes

    Sourced from json5's releases.

    v2.2.3

    v2.2.2

    • Fix: Properties with the name __proto__ are added to objects and arrays. (#199) This also fixes a prototype pollution vulnerability reported by Jonathan Gregson! (#295).
    Changelog

    Sourced from json5's changelog.

    v2.2.3 [code, diff]

    v2.2.2 [code, diff]

    • Fix: Properties with the name __proto__ are added to objects and arrays. (#199) This also fixes a prototype pollution vulnerability reported by Jonathan Gregson! (#295).
    Commits
    • c3a7524 2.2.3
    • 94fd06d docs: update CHANGELOG for v2.2.3
    • 3b8cebf docs(security): use GitHub security advisories
    • f0fd9e1 docs: publish a security policy
    • 6a91a05 docs(template): bug -> bug report
    • 14f8cb1 2.2.2
    • 10cc7ca docs: update CHANGELOG for v2.2.2
    • 7774c10 fix: add proto to objects and arrays
    • edde30a Readme: slight tweak to intro
    • 97286f8 Improve example in readme
    • Additional commits viewable in compare view

    dependencies javascript 
    opened by dependabot[bot] 0
  • build(deps): bump certifi from 2021.10.8 to 2022.12.7 in /docs

    Bumps certifi from 2021.10.8 to 2022.12.7.

    Commits

    dependencies python 
    opened by dependabot[bot] 0
  • Add CodeQL workflow for GitHub code scanning

    Hi microsoft/bistring!

    This is a one-off automatically generated pull request from LGTM.com :robot:. You might have heard that we've integrated LGTM's underlying CodeQL analysis engine natively into GitHub. The result is GitHub code scanning!

    With LGTM fully integrated into code scanning, we are focused on improving CodeQL within the native GitHub code scanning experience. In order to take advantage of current and future improvements to our analysis capabilities, we suggest you enable code scanning on your repository. Please take a look at our blog post for more information.

    This pull request enables code scanning by adding an auto-generated codeql.yml workflow file for GitHub Actions to your repository; take a look! We tested it before opening this pull request, so all should be working :heavy_check_mark:. In fact, you might already have seen some alerts appear on this pull request!

    Where needed and if possible, we've adjusted the configuration to the needs of your particular repository. But of course, you should feel free to tweak it further! Check this page for detailed documentation.

    Questions? Check out the FAQ below!

    FAQ

    How often will the code scanning analysis run?

    By default, code scanning will trigger a scan with the CodeQL engine on the following events:

    • On every pull request, to flag up potential security problems for you to investigate before merging a PR.
    • On every push to your default branch and other protected branches, which keeps the analysis results on your repository's Security tab up to date.
    • Once a week at a fixed time, to make sure you benefit from the latest updated security analysis even when no code was committed or PRs were opened.

    What will this cost?

    Nothing! The CodeQL engine will run inside GitHub Actions, making use of your unlimited free compute minutes for public repositories.

    What types of problems does CodeQL find?

    The CodeQL engine that powers GitHub code scanning is the exact same engine that powers LGTM.com. The exact set of rules has been tweaked slightly, but you should see almost exactly the same types of alerts as you were used to on LGTM.com: we've enabled the security-and-quality query suite for you.

    How do I upgrade my CodeQL engine?

    No need! New versions of the CodeQL analysis are constantly deployed on GitHub.com; your repository will automatically benefit from the most recently released version.

    The analysis doesn't seem to be working

    If you get an error in GitHub Actions that indicates that CodeQL wasn't able to analyze your code, please follow the instructions here to debug the analysis.

    How do I disable LGTM.com?

    If you have LGTM's automatic pull request analysis enabled, then you can follow these steps to disable the LGTM pull request analysis. You don't actually need to remove your repository from LGTM.com; it will automatically be removed in the next few months as part of the deprecation of LGTM.com (more info here).

    Which source code hosting platforms does code scanning support?

    GitHub code scanning is deeply integrated within GitHub itself. If you'd like to scan source code that is hosted elsewhere, we suggest that you create a mirror of that code on GitHub.

    How do I know this PR is legitimate?

    This PR is filed by the official LGTM.com GitHub App, in line with the deprecation timeline that was announced on the official GitHub Blog. The proposed GitHub Action workflow uses the official open source GitHub CodeQL Action. If you have any other questions or concerns, please join the discussion here in the official GitHub community!

    I have another question / how do I get in touch?

    Please join the discussion here to ask further questions and send us suggestions!

    opened by lgtm-com[bot] 1
  • Problems with PyICU when installing bistring 0.4.0 with pip

    Summary

    I ran into what's apparently a known issue with installing PyICU via pip while trying to pip install bistring==0.4.0. Contrary to the error message and the recommendations on that thread, installing pkg-config and libicu-dev didn't fix the issue for me. Only installing python3-icu (as recommended in the official PyICU docs) finally fixed it.

    This is obviously not an issue with bistring itself, but it makes it difficult to install bistring because the ICU dependency can't be automatically installed by pip. If there is nothing else that can be done about it, maybe a note about this could at least be added to the Readme file, so people can avoid the frustration of running into the pip error?

    More Details

    Here is the output I got from pip install bistring==0.4.0:

    Collecting bistring==0.4.0
      Downloading bistring-0.4.0-py3-none-any.whl (22 kB)
    Collecting pyicu
      Downloading PyICU-2.8.tar.gz (299 kB)
         |████████████████████████████████| 299 kB 2.1 MB/s
      Installing build dependencies ... done
      Getting requirements to build wheel ... error
      ERROR: Command errored out with exit status 1:
       command: /usr/bin/python3 /tmp/tmprkr9c6uy get_requires_for_build_wheel /tmp/tmp5tnhtmha
           cwd: /tmp/pip-install-x_54yndb/pyicu
      Complete output (64 lines):
      (running 'icu-config --version')
      (running 'pkg-config --modversion icu-i18n')
      Traceback (most recent call last):
        File "setup.py", line 63, in <module>
          ICU_VERSION = os.environ['ICU_VERSION']
        File "/usr/lib/python3.8/os.py", line 675, in __getitem__
          raise KeyError(key) from None
      KeyError: 'ICU_VERSION'
    
      During handling of the above exception, another exception occurred:
    
      Traceback (most recent call last):
        File "setup.py", line 66, in <module>
          ICU_VERSION = check_output(('icu-config', '--version')).strip()
        File "setup.py", line 19, in check_output
          return subprocess_check_output(popenargs)
        File "/usr/lib/python3.8/subprocess.py", line 411, in check_output
          return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
        File "/usr/lib/python3.8/subprocess.py", line 489, in run
          with Popen(*popenargs, **kwargs) as process:
        File "/usr/lib/python3.8/subprocess.py", line 854, in __init__
          self._execute_child(args, executable, preexec_fn, close_fds,
        File "/usr/lib/python3.8/subprocess.py", line 1702, in _execute_child
          raise child_exception_type(errno_num, err_msg, err_filename)
      FileNotFoundError: [Errno 2] No such file or directory: 'icu-config'
    
      During handling of the above exception, another exception occurred:
    
      Traceback (most recent call last):
        File "setup.py", line 69, in <module>
          ICU_VERSION = check_output(('pkg-config', '--modversion', 'icu-i18n')).strip()
        File "setup.py", line 19, in check_output
          return subprocess_check_output(popenargs)
        File "/usr/lib/python3.8/subprocess.py", line 411, in check_output
          return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
        File "/usr/lib/python3.8/subprocess.py", line 489, in run
          with Popen(*popenargs, **kwargs) as process:
        File "/usr/lib/python3.8/subprocess.py", line 854, in __init__
          self._execute_child(args, executable, preexec_fn, close_fds,
        File "/usr/lib/python3.8/subprocess.py", line 1702, in _execute_child
          raise child_exception_type(errno_num, err_msg, err_filename)
      FileNotFoundError: [Errno 2] No such file or directory: 'pkg-config'
    
      During handling of the above exception, another exception occurred:
    
      Traceback (most recent call last):
        File "/tmp/tmprkr9c6uy", line 280, in <module>
          main()
        File "/tmp/tmprkr9c6uy", line 263, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/tmp/tmprkr9c6uy", line 114, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/tmp/pip-build-env-p3nxngyx/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 162, in get_requires_for_build_wheel
          return self._get_build_requires(
        File "/tmp/pip-build-env-p3nxngyx/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 143, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-p3nxngyx/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 158, in run_setup
          exec(compile(code, __file__, 'exec'), locals())
        File "setup.py", line 71, in <module>
          raise RuntimeError('''
      RuntimeError:
      Please install pkg-config on your system or set the ICU_VERSION environment
      variable to the version of ICU you have installed.
    
      ----------------------------------------
    ERROR: Command errored out with exit status 1: /usr/bin/python3 /tmp/tmprkr9c6uy get_requires_for_build_wheel /tmp/tmp5tnhtmha Check the logs for full command output
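The traceback above shows PyICU's setup.py looking for the ICU version in three places, in order: the ICU_VERSION environment variable, then icu-config, then pkg-config. A minimal preflight sketch that mirrors that order is given below. The function name icu_version_source is hypothetical and not part of PyICU or bistring, and the check only tests whether the tools are on PATH. It does not prove the build will succeed: as the report above notes, installing pkg-config and libicu-dev was not enough in this case.

```python
import os
import shutil


def icu_version_source(environ=None, which=shutil.which):
    """Return the first lookup that looks available, or None if all three are missing.

    Hypothetical helper: it follows the order seen in the traceback
    (ICU_VERSION, then icu-config, then pkg-config) but only checks for
    presence, not that the tool can actually report an ICU version.
    """
    environ = os.environ if environ is None else environ
    if "ICU_VERSION" in environ:
        return "ICU_VERSION"
    for tool in ("icu-config", "pkg-config"):
        if which(tool) is not None:
            return tool
    return None


# Report what this machine offers before attempting `pip install bistring`.
source = icu_version_source()
if source is None:
    print("No ICU version source found; building PyICU from source would likely fail.")
else:
    print(f"ICU version could be looked up via {source}.")
```

If the function returns None, installing a prebuilt PyICU package first (python3-icu in the report above) avoids the source build altogether.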
    
    opened by outofthecave 4
  • Rust

    • [x] Add a Rust project
    • [x] Implement Alignment
    • [ ] Implement BiString
    • [ ] Implement slices
    • [x] Add a README
    • [ ] Benchmarks
    • [ ] Try arithmetic progression compression
    • [ ] Unify unbounded slice behaviour with Python and JS
    • [x] CI
    opened by tavianator 0
  • Transliterate

    I was hoping you might advise me on how to incorporate transliteration into a text transformation pipeline.

    Let's say I want to use a third-party library like unidecode (from unidecode import unidecode). I could create a bistring with new_bistr = bistr(text.modified, unidecode(text.modified)), but I would lose all the previous operations.

    Is there a way to fold in a modified string that is calculated outside bistring's capabilities?
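To make the question concrete, here is a standard-library-only sketch of the idea, not bistring's API. It assumes the external transformation can be applied one character at a time, and it keeps, for every modified character, the range of the original string that produced it. The helper names (strip_accents, apply_per_char, original_slice) are hypothetical, and strip_accents is only a stand-in for a real transliterator such as unidecode.

```python
import unicodedata


def strip_accents(ch):
    """Stand-in for an external transliterator (assumption, not unidecode itself)."""
    decomposed = unicodedata.normalize("NFKD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def apply_per_char(modified, spans, func):
    """Apply func to each character and carry the original spans forward.

    spans[i] is the (start, end) range of the original string that
    produced modified[i]. Every output character inherits the span of
    the input character it came from.
    """
    new_chars, new_spans = [], []
    for ch, span in zip(modified, spans):
        for out in func(ch):
            new_chars.append(out)
            new_spans.append(span)
    return "".join(new_chars), new_spans


def original_slice(original, spans, start, end):
    """Map the non-empty modified range [start, end) back to the original text."""
    return original[spans[start][0]:spans[end - 1][1]]


original = "Stra\u00dfe Caf\u00e9"  # "Straße Café"
modified = original
spans = [(i, i + 1) for i in range(len(original))]

# An earlier operation (case folding), then the external transliteration.
modified, spans = apply_per_char(modified, spans, str.casefold)
modified, spans = apply_per_char(modified, spans, strip_accents)

print(modified)                                # strasse cafe
print(original_slice(original, spans, 8, 12))  # Café
```

Because each step only rewrites the span list, earlier operations are preserved. One limitation of this sketch: a character that the function maps to an empty string loses its span, so deletions are not tracked.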

    enhancement question 
    opened by christian-storm 4
Owner
Microsoft
Open source projects and samples from Microsoft