ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab

Overview

AliceMind

This repository provides pre-trained encoder-decoder models and their related optimization techniques developed by Alibaba's MinD (Machine IntelligeNce of Damo) Lab.

The family of AliceMind:

  • Language understanding model: StructBERT (ICLR 2020)
  • Generative language model: PALM (EMNLP 2020)
  • Cross-lingual language model: VECO (ACL 2021)
  • Cross-modal language model: StructVBERT (CVPR 2020 VQA Challenge Runner-up)
  • Structural language model: StructuralLM (ACL 2021)
  • Chinese language understanding model with multi-granularity inputs: LatticeBERT (NAACL 2021)
  • Pre-training table model: SDCUP (Under Review)

News

  • March, 2021: AliceMind released!
  • May, 2021: VECO and StructuralLM were accepted by ACL 2021.
  • September, 2021: The first Chinese pre-training table model SDCUP released!

Models

  • StructBERT (March 15, 2021): pre-trained models for natural language understanding (NLU). We extend BERT to a new model, StructBERT, by incorporating language structures into pre-training. Specifically, we pre-train StructBERT with two auxiliary tasks to make the most of the sequential order of words and sentences, which leverage language structures at the word and sentence levels, respectively. "StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding" (ICLR 2020)

  • PALM (March 15, 2021): pre-trained models for natural language generation (NLG). We propose a novel scheme that jointly pre-trains an autoencoding and autoregressive language model on a large unlabeled corpus, specifically designed for generating new text conditioned on context. It achieves new SOTA results in several downstream tasks. "PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation" (EMNLP 2020)

  • VECO v0 (March 15, 2021): pre-trained models for cross-lingual (x) natural language understanding (x-NLU) and generation (x-NLG). VECO (v0) achieves new SOTA results on various cross-lingual understanding tasks of the XTREME benchmark, covering text classification, sequence labeling, question answering, and sentence retrieval. For cross-lingual generation tasks, it also outperforms all existing cross-lingual models and state-of-the-art Transformer variants on the WMT14 English-to-German and English-to-French translation datasets, with gains of up to 1~2 BLEU. "VECO: Variable Encoder-decoder Pre-training for Cross-lingual Understanding and Generation" (ACL 2021)

  • StructVBERT (March 15, 2021): pre-trained models for vision-language understanding. We propose a new single-stream visual-linguistic pre-training scheme that leverages multi-stage progressive pre-training and multi-task learning. StructVBERT obtained the 2020 VQA Challenge Runner-up award and the SOTA result on the VQA 2020 public Test-standard benchmark (June 2020). "Talk Slides" (CVPR 2020 VQA Challenge Runner-up).

  • StructuralLM (March 15, 2021): pre-trained models for document-image understanding. We propose a new pre-training approach, StructuralLM, to jointly leverage cell and layout information from scanned documents. The pre-trained StructuralLM achieves new state-of-the-art results in different types of downstream tasks. "StructuralLM: Structural Pre-training for Form Understanding" (ACL 2021)

  • LatticeBERT (March 15, 2021): pre-trained models for Chinese language understanding. We propose a novel pre-training paradigm for Chinese, Lattice-BERT, which explicitly incorporates word representations alongside those of characters and can therefore model a sentence in a multi-granularity manner. "Lattice-BERT: Leveraging Multi-Granularity Representations in Chinese Pre-trained Language Models" (NAACL 2021)

  • SDCUP (September 6, 2021): pre-trained models for table understanding. We design a schema dependency pre-training objective to impose the desired inductive bias into the learned representations for table pre-training. We further propose a schema-aware curriculum learning approach to alleviate the impact of noise and learn effectively from the pre-training data in an easy-to-hard manner. The experiment results on SQUALL and Spider demonstrate the effectiveness of our pre-training objective and curriculum in comparison to a variety of baselines. "SDCUP: Schema Dependency Enhanced Curriculum Pre-Training for Table Semantic Parsing" (Under Review)
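
The StructBERT entry above mentions two auxiliary tasks built on the order of words and sentences. A minimal sketch of how such training examples might be constructed follows, assuming the formulation described in the StructBERT paper: a short span of tokens (a trigram here) is shuffled and the model has to restore the original order, and a sentence pair is labeled as next, previous, or random. The function names, the label encoding, and the default span length are illustrative assumptions and are not taken from this repository's code.

```python
import random
from typing import List, Tuple


def shuffle_span(tokens: List[str], rng: random.Random,
                 span: int = 3) -> Tuple[List[str], int, List[str]]:
    """Word-level example: shuffle one random window of `span` tokens.

    Returns (corrupted tokens, start index, original window). The original
    window is the reconstruction target. If the input is shorter than
    `span`, it is returned unchanged with start index -1.
    """
    if len(tokens) < span:
        return list(tokens), -1, []
    start = rng.randrange(len(tokens) - span + 1)
    original = tokens[start:start + span]
    shuffled = original[:]
    rng.shuffle(shuffled)  # may occasionally leave the order unchanged
    corrupted = tokens[:start] + shuffled + tokens[start + span:]
    return corrupted, start, original


def make_sentence_pair(doc: List[str], idx: int, random_sentence: str,
                       rng: random.Random) -> Tuple[str, str, int]:
    """Sentence-level example: build (first, second, label) around doc[idx].

    label 0: second is the sentence that follows doc[idx]
    label 1: second is the sentence that precedes doc[idx]
    label 2: second is a sentence sampled from another document
    """
    if not 0 < idx < len(doc) - 1:
        raise ValueError("idx needs both a previous and a next sentence")
    label = rng.randrange(3)
    second = (doc[idx + 1], doc[idx - 1], random_sentence)[label]
    return doc[idx], second, label


rng = random.Random(0)
tokens = "we extend bert by using word order".split()
corrupted, start, target = shuffle_span(tokens, rng)
print(corrupted, start, target)
```

In an actual pre-training pipeline these examples would be combined with masked language modeling and fed to the encoder; that part is omitted here.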

Contact Information

AliceMind Official Website: https://nlp.aliyun.com/portal#/alice

AliceMind Open Platform: https://alicemind.aliyuncs.com

Please submit a GitHub issue if you need help or have issues using ALICE.

For more information, you can join the AliceMind Users Group on DingTalk to contact us. The number of the DingTalk group is 35738533.

For other business communications, please contact [email protected]

License

AliceMind is released under the Apache 2.0 license.

Copyright 1999-2020 Alibaba Group Holding Ltd.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at the following link.

     http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Comments
  • fail to load structbert.en.large while trying to reproduce the result of GLUE

    Hi, I downloaded structbert.en.large through the given link (https://alice-open.oss-cn-zhangjiakou.aliyuncs.com/StructBERT/en_model), but the error below occurred at run time.

    RuntimeError: Error(s) in loading state_dict for BertForSequenceClassificationMultiTask: Missing key(s) in state_dict: "classifier.0.weight", "classifier.0.bias". Unexpected key(s) in state_dict: "lm_bias", "linear.weight", "linear.bias", "LayerNorm.gamma", "LayerNorm.beta", "classifier.weight", "classifier.bias".

    Do you have any idea why this happens? Thank you very much.

    opened by LemX1 10
  • Will you share pre-training code of StructBERT

    Hi, I'm trying to code StructBERT from scratch, but I couldn't find any pre-training code examples for StructBERT. In the repository I've only found code for fine-tuning on various datasets.

    Are you planning to share the pre-training model code for StructBERT, similar to BertForPreTraining in the Transformers library?

    Thanks in advance 🙂

    opened by kaansonmezoz 5
  • What is CLEVER?

    I found StructBERT + CLEVER in the GLUE benchmark. Is that a technique for pre-training or for fine-tuning? Can you provide more information about CLEVER? Thanks a lot.

    opened by Akeepers 2
  • Hyper parameters for ChildTuning

    Thanks a lot for all the details you provide in Appendix B for reproducibility! However, I still encounter some difficulties in reproducing the experiment. I noticed that you apply grid search. Could you please provide the specific , and learning rate for each task?

    opened by jiangllan 2
  • SOFA-models

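    (https://github.com/alibaba/AliceMind/tree/main/SOFA/sofa/models)/init.py: Why are there only three models here, and why does roberta have no init? The README.md has no results for roberta either, so does that mean roberta is not supported yet? And if I need to add another model (such as GPT-2), how should I do that?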

    opened by Wangxh-07 1
  • How to reproduce the result of StructBERT on STS-B?

    Hi, I cannot reproduce the result reported in the paper with the example command:

    python run_classifier_multi_task.py \
      --task_name STS-B \
      --do_train \
      --do_eval \
      --do_test \
      --lr_decay_factor 1 \
      --dropout 0.1 \
      --do_lower_case \
      --detach_index -1 \
      --core_encoder bert \
      --data_dir data \
      --vocab_file config/vocab.txt \
      --bert_config_file config/large_bert_config.json \
      --init_checkpoint model/en_model \
      --max_seq_length 128 \
      --train_batch_size 32 \
      --learning_rate 2e-5 \
      --num_train_epochs 3 \
      --fast_train \
      --gradient_accumulation_steps 1 \
      --output_dir output \
      --amp_type O1
    

    Did I set any hyperparameters incorrectly?

    opened by sangyx 1
  • fail to download pretrained model and data

    Hi, Author

    Thanks for your great contribution.

    However, I can't download the pretrained model and data whose links start with http://119608.oss-cn-hangzhou-zmf.aliyuncs.com

    For example, the following links cannot be downloaded:

    http://119608.oss-cn-hangzhou-zmf.aliyuncs.com/structvbert/pretrained_model.tar.gz
    http://119608.oss-cn-hangzhou-zmf.aliyuncs.com/structvbert/data.tar.gz

    Any advice is appreciated! Thanks.

    opened by cssddnnc9527 1
  • Will you consider pushing your work to the Hugging Face model hub?

    It is somewhat painful to use your models, such as StructBERT.

    There are some minor code modifications compared with Hugging Face's BERT.

    So I wouldn't say it is safe to call Hugging Face's from_pretrained API directly on your released model checkpoint, yet it is inconvenient to use your modeling code, where BertModel does not inherit from Hugging Face's PreTrainedModel.

    Any advice?

    opened by tangzhy 1
  • structVbert model

    Hi, the link for downloading the structvbert.en.base model does not work (http://119608.oss-cn-hangzhou-zmf.aliyuncs.com/structvbert/pretrained_model.tar.gz). Could you please fix it?

    Thank you

    opened by iukorolev 1
  • Experimental configuration of Child-Tuning

    Hi, I want to reproduce the Child-Tuning experiments. I saw "We report the averaged results over 10 random seeds" in the paper (3.2). Could you share the seed sequence? Thank you, looking forward to your reply.

    opened by chaochen99 1
  • Mismatch between the given base checkpoint and the description in the original paper

    In the original paper, PALM-base has 6 encoder layers and 6 decoder layers. However, when I run the given code with the given base checkpoint, it prints 12 encoder layers and 12 decoder layers. Am I wrong?

    opened by 311dada 1
  • Question about visual grounding fine-tuning

    Hello, while fine-tuning the model I found that it reports _IncompatibleKeys (mainly all layers of fusion_encoder). Does this affect training? I followed the documented procedure; training stopped automatically at epoch=10, and the result of the tenth epoch is: {'train_lr1': 1.9676295349747303e-05, 'train_lr2': 4.931851652578033e-06, 'train_loss_seq': 0.08181070101696078, 'miou': 0.6400532953981678, 'accu': 0.7921358685619346, 'epoch': 10}

    This result differs from the accu value in the paper. Are the two metrics the same?

    opened by xzdong-2019 0
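  • Hello! I don't quite understand some of the code in mPLUG and would appreciate your help.

    The code in question is from image-text retrieval. What does self.distill mean? Is it for distillation? Also, self._momentum_update() and self._dequeue_and_enqueue(image_feat_m, text_feat_m, idx) look like code from momentum contrastive learning, and I don't understand what they do either. If possible, please explain the meaning of this code in as much detail as you can. Thank you very much!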

    opened by luzhuflower 0
  • Question about the Chinese version of mPLUG

    Hi, I noticed that there is a Chinese version of mPLUG on ModelScope. Could you please tell me some details about this model, such as the pre-training dataset? Many thanks.

    opened by ZihaoZheng98 1
  • Pretrained weights for downstream tasks for mPLUG?

    Currently, only the pretrained weights before fine-tuning on downstream tasks for mPLUG are released. Is it possible to release the pretrained weights for downstream tasks after fine-tuning, like visual question answering and image captioning?

    Thanks!

    opened by qiaomu-miao 1
  • CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks whether all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions, you may contact us through this project's lead researcher, Kasimir Schulz.

    opened by TrellixVulnTeam 1
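
The CVE-2007-4559 issue above describes a check that every tar member will be extracted inside the target directory, raising an exception otherwise. A minimal sketch of that idea with the standard library follows. It is not the patch from the pull request; the helper names `is_within_directory` and `safe_extractall` are hypothetical, and the check covers member paths only (it does not inspect symlink or hard-link targets).

```python
import os
import tarfile


def is_within_directory(directory: str, target: str) -> bool:
    """Return True if `target` resolves to a path inside `directory`."""
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_extractall(tar: tarfile.TarFile, path: str = ".") -> None:
    """Extract all members, refusing archives that escape `path`."""
    # Validate every member before writing anything to disk.
    for member in tar.getmembers():
        member_path = os.path.join(path, member.name)
        if not is_within_directory(path, member_path):
            raise ValueError(
                f"Blocked path traversal in tar member: {member.name}")
    tar.extractall(path)
```

A call such as `safe_extractall(tarfile.open("pretrained_model.tar.gz"), "model")` would then replace a bare `extractall()`. Recent Python versions also provide extraction filters through the `filter` argument of `extractall()`, which may be preferable where available.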
Releases(v1.0)
Owner
Alibaba
Alibaba Open Source