Python port of Google's libphonenumber

Overview

phonenumbers Python Library

Coverage Status

This is a Python port of Google's libphonenumber library It supports Python 2.5-2.7 and Python 3.x (in the same codebase, with no 2to3 conversion needed).

Original Java code is Copyright (C) 2009-2015 The Libphonenumber Authors.

Release HISTORY, derived from upstream release notes.

Installation

Install using pip with:

pip install phonenumbers

Example Usage

The main object that the library deals with is a PhoneNumber object. You can create this from a string representing a phone number using the parse function, but you also need to specify the country that the phone number is being dialled from (unless the number is in E.164 format, which is globally unique).

>>> import phonenumbers
>>> x = phonenumbers.parse("+442083661177", None)
>>> print(x)
Country Code: 44 National Number: 2083661177 Leading Zero: False
>>> type(x)
<class 'phonenumbers.phonenumber.PhoneNumber'>
>>> y = phonenumbers.parse("020 8366 1177", "GB")
>>> print(y)
Country Code: 44 National Number: 2083661177 Leading Zero: False
>>> x == y
True
>>> z = phonenumbers.parse("00 1 650 253 2222", "GB")  # as dialled from GB, not a GB number
>>> print(z)
Country Code: 1 National Number: 6502532222 Leading Zero(s): False

The PhoneNumber object that parse produces typically still needs to be validated, to check whether it's a possible number (e.g. it has the right number of digits) or a valid number (e.g. it's in an assigned exchange).

>>> z = phonenumbers.parse("+120012301", None)
>>> print(z)
Country Code: 1 National Number: 20012301 Leading Zero: False
>>> phonenumbers.is_possible_number(z)  # too few digits for USA
False
>>> phonenumbers.is_valid_number(z)
False
>>> z = phonenumbers.parse("+12001230101", None)
>>> print(z)
Country Code: 1 National Number: 2001230101 Leading Zero: False
>>> phonenumbers.is_possible_number(z)
True
>>> phonenumbers.is_valid_number(z)  # NPA 200 not used
False

The parse function will also fail completely (with a NumberParseException) on inputs that cannot be uniquely parsed, or that can't possibly be phone numbers.

>>> z = phonenumbers.parse("02081234567", None)  # no region, no + => unparseable
Traceback (most recent call last):
  File "phonenumbers/phonenumberutil.py", line 2350, in parse
    "Missing or invalid default region.")
phonenumbers.phonenumberutil.NumberParseException: (0) Missing or invalid default region.
>>> z = phonenumbers.parse("gibberish", None)
Traceback (most recent call last):
  File "phonenumbers/phonenumberutil.py", line 2344, in parse
    "The string supplied did not seem to be a phone number.")
phonenumbers.phonenumberutil.NumberParseException: (1) The string supplied did not seem to be a phone number.

Once you've got a phone number, a common task is to format it in a standardized format. There are a few formats available (under PhoneNumberFormat), and the format_number function does the formatting.

>>> phonenumbers.format_number(x, phonenumbers.PhoneNumberFormat.NATIONAL)
'020 8366 1177'
>>> phonenumbers.format_number(x, phonenumbers.PhoneNumberFormat.INTERNATIONAL)
'+44 20 8366 1177'
>>> phonenumbers.format_number(x, phonenumbers.PhoneNumberFormat.E164)
'+442083661177'

If your application has a UI that allows the user to type in a phone number, it's nice to get the formatting applied as the user types. The AsYouTypeFormatter object allows this.

>>> formatter = phonenumbers.AsYouTypeFormatter("US")
>>> formatter.input_digit("6")
'6'
>>> formatter.input_digit("5")
'65'
>>> formatter.input_digit("0")
'650'
>>> formatter.input_digit("2")
'650 2'
>>> formatter.input_digit("5")
'650 25'
>>> formatter.input_digit("3")
'650 253'
>>> formatter.input_digit("2")
'650-2532'
>>> formatter.input_digit("2")
'(650) 253-22'
>>> formatter.input_digit("2")
'(650) 253-222'
>>> formatter.input_digit("2")
'(650) 253-2222'

Sometimes, you've got a larger block of text that may or may not have some phone numbers inside it. For this, the PhoneNumberMatcher object provides the relevant functionality; you can iterate over it to retrieve a sequence of PhoneNumberMatch objects. Each of these match objects holds a PhoneNumber object together with information about where the match occurred in the original string.

>>> text = "Call me at 510-748-8230 if it's before 9:30, or on 703-4800500 after 10am."
>>> for match in phonenumbers.PhoneNumberMatcher(text, "US"):
...     print(match)
...
PhoneNumberMatch [11,23) 510-748-8230
PhoneNumberMatch [51,62) 703-4800500
>>> for match in phonenumbers.PhoneNumberMatcher(text, "US"):
...     print(phonenumbers.format_number(match.number, phonenumbers.PhoneNumberFormat.E164))
...
+15107488230
+17034800500

You might want to get some information about the location that corresponds to a phone number. The geocoder.area_description_for_number does this, when possible.

>>> from phonenumbers import geocoder
>>> ch_number = phonenumbers.parse("0431234567", "CH")
>>> geocoder.description_for_number(ch_number, "de")
'Zürich'
>>> geocoder.description_for_number(ch_number, "en")
'Zurich'
>>> geocoder.description_for_number(ch_number, "fr")
'Zurich'
>>> geocoder.description_for_number(ch_number, "it")
'Zurigo'

For mobile numbers in some countries, you can also find out information about which carrier originally owned a phone number.

>>> from phonenumbers import carrier
>>> ro_number = phonenumbers.parse("+40721234567", "RO")
>>> carrier.name_for_number(ro_number, "en")
'Vodafone'

You might also be able to retrieve a list of time zone names that the number potentially belongs to.

>>> from phonenumbers import timezone
>>> gb_number = phonenumbers.parse("+447986123456", "GB")
>>> timezone.time_zones_for_number(gb_number)
('Atlantic/Reykjavik', 'Europe/London')

For more information about the other functionality available from the library, look in the unit tests or in the original libphonenumber project.

Memory Usage

The library includes a lot of metadata, potentially giving a significant memory overhead. There are two mechanisms for dealing with this.

  • The normal metadata for the core functionality of the library is loaded on-demand, on a region-by-region basis (i.e. the metadata for a region is only loaded on the first time it is needed).
  • Metadata for extended functionality is held in separate packages, which therefore need to be explicitly loaded separately. This affects:
    • The geocoding metadata, which is held in phonenumbers.geocoder and used by the geocoding functions (geocoder.description_for_number, geocoder.description_for_valid_number or geocoder.country_name_for_number).
    • The carrier metadata, which is held in phonenumbers.carrier and used by the mapping functions (carrier.name_for_number or carrier.name_for_valid_number).
    • The timezone metadata, which is held in phonenumbers.timezone and used by the timezone functions (time_zones_for_number or time_zones_for_geographical_number).

The phonenumberslite version of the library does not include the geocoder, carrier and timezone packages, which can be useful if you have problems installing the main phonenumbers library due to space/memory limitations.

If you need to ensure that the metadata memory use is accounted for at start of day (i.e. that a subsequent on-demand load of metadata will not cause a pause or memory exhaustion):

  • Force-load the normal metadata by calling phonenumbers.PhoneMetadata.load_all().
  • Force-load the extended metadata by importing the appropriate packages (phonenumbers.geocoder, phonenumbers.carrier, phonenumbers.timezone).

The phonenumberslite version of the package does not include the geocoding, carrier and timezone metadata, which can be useful if you have problems installing the main phonenumbers package due to space/memory limitations.

Project Layout

  • The python/ directory holds the Python code.
  • The resources/ directory is a copy of the resources/ directory from libphonenumber. This is not needed to run the Python code, but is needed when upstream changes to the master metadata need to be incorporated.
  • The tools/ directory holds the tools that are used to process upstream changes to the master metadata.
Owner
David Drysdale
David Drysdale
The first online catalogue for Arabic NLP datasets.

Masader The first online catalogue for Arabic NLP datasets. This catalogue contains 200 datasets with more than 25 metadata annotations for each datas

ARBML 94 Dec 26, 2022
[EMNLP 2021] Mirror-BERT: Converting Pretrained Language Models to universal text encoders without labels.

[EMNLP 2021] Mirror-BERT: Converting Pretrained Language Models to universal text encoders without labels.

Cambridge Language Technology Lab 61 Dec 10, 2022
Code for "Generative adversarial networks for reconstructing natural images from brain activity".

Reconstruct handwritten characters from brains using GANs Example code for the paper "Generative adversarial networks for reconstructing natural image

K. Seeliger 2 May 17, 2022
Fake news detector filters - Smart filter project allow to classify the quality of information and web pages

fake-news-detector-1.0 Lists, lists and more lists... Spam filter list, quality keyword list, stoplist list, top-domains urls list, news agencies webs

Memo Sim 1 Jan 04, 2022
Protein Language Model

ProteinLM We pretrain protein language model based on Megatron-LM framework, and then evaluate the pretrained model results on TAPE (Tasks Assessing P

THUDM 77 Dec 27, 2022
LSTM based Sentiment Classification using Tensorflow - Amazon Reviews Rating

LSTM based Sentiment Classification using Tensorflow - Amazon Reviews Rating (Dataset) The dataset is from Amazon Review Data (2018)

Immanuvel Prathap S 1 Jan 16, 2022
Code for Text Prior Guided Scene Text Image Super-Resolution

Code for Text Prior Guided Scene Text Image Super-Resolution

82 Dec 26, 2022
Sequence model architectures from scratch in PyTorch

This repository implements a variety of sequence model architectures from scratch in PyTorch. Effort has been put to make the code well structured so that it can serve as learning material. The train

Brando Koch 11 Mar 28, 2022
Perform sentiment analysis on textual data that people generally post on websites like social networks and movie review sites.

Sentiment Analyzer The goal of this project is to perform sentiment analysis on textual data that people generally post on websites like social networ

Madhusudan.C.S 53 Mar 01, 2022
A telegram bot to translate 100+ Languages

🔥 GOOGLE TRANSLATER 🔥 The owner would not be responsible for any kind of bans due to the bot. • ⚡ INSTALLING ⚡ • • 🔰 Deploy To Railway 🔰 • • ✅ OFF

Aɴᴋɪᴛ Kᴜᴍᴀʀ 5 Dec 20, 2021
Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation (SIGGRAPH Asia 2021)

Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation This repository contains the implementation of the following paper: Live Speech

OldSix 575 Dec 31, 2022
原神抽卡记录数据集-Genshin Impact gacha data

提要 持续收集原神抽卡记录中 可以使用抽卡记录导出工具导出抽卡记录的json,将json文件发送至[email protected],我会在清除个人信息后

117 Dec 27, 2022
A simple Speech Emotion Recognition (SER) API created using Flask and running in a Docker container.

keyword_searching Steps to use this Python scripts: (1)Paste this script into the file folder containing the PDF files you need to search from; (2)Thi

2 Nov 11, 2022
Black for Python docstrings and reStructuredText (rst).

Style-Doc Style-Doc is Black for Python docstrings and reStructuredText (rst). It can be used to format docstrings (Google docstring format) in Python

Telekom Open Source Software 13 Oct 24, 2022
This repository contains data used in the NAACL 2021 Paper - Proteno: Text Normalization with Limited Data for Fast Deployment in Text to Speech Systems

Proteno This is the data release associated with the corresponding NAACL 2021 Paper - Proteno: Text Normalization with Limited Data for Fast Deploymen

37 Dec 04, 2022
TPlinker for NER 中文/英文命名实体识别

本项目是参考 TPLinker 中HandshakingTagging思想,将TPLinker由原来的关系抽取(RE)模型修改为命名实体识别(NER)模型。

GodK 113 Dec 28, 2022
Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".

STEMM: Self-learning with Speech-Text Manifold Mixup for Speech Translation This is a PyTorch implementation for the ACL 2022 main conference paper ST

ICTNLP 29 Oct 16, 2022
A NLP program: tokenize method, PoS Tagging with deep learning

IRIS NLP SYSTEM A NLP program: tokenize method, PoS Tagging with deep learning Report Bug · Request Feature Table of Contents About The Project Built

Zakaria 7 Dec 13, 2022
An assignment from my grad-level data mining course demonstrating some experience with NLP/neural networks/Pytorch

NLP-Pytorch-Assignment An assignment from my grad-level data mining course (before I started personal projects) demonstrating some experience with NLP

David Thorne 0 Feb 06, 2022
DELTA is a deep learning based natural language and speech processing platform.

DELTA - A DEep learning Language Technology plAtform What is DELTA? DELTA is a deep learning based end-to-end natural language and speech processing p

DELTA 1.5k Dec 26, 2022