Webb14 apr. 2024 · Imbalanced and noisy are two essential issues that need to be addressed in Vietnamese social media texts. Graph Convolutional Networks can address the problems of imbalanced and noisy data in... Webb20 nov. 2024 · In this work, the authors proposed an effective method to classify Vietnamese texts leveraging the TextRank algorithm and Jaccard similarity coefficient. TextRank ranks words and sentences...
Dat Quoc Nguyen - GitHub Pages
Webb5 okt. 2024 · This problem of auto-inserting accent marks fits nicely into a token classification problem (similar to, for example, ... there’s another good model pretrained on only Vietnamese text: PhoBERT. The main reason I preferred the XLM model over this was due to PhoBERT’s tokenization scheme. Webbments collected from Vietnamese social media. Secondly, a novel hate speech detection (HSD) model, which is the combination of a pre-trained PhoBERT model and a Text-CNN model, was proposed for solving tasks in Vietnamese. Thirdly, EDA techniques are applied to deal with imbalanced data to improve the performance of classifica-tion models. chip perryman
[2003.00744] PhoBERT: Pre-trained language models for …
Webbsep_token (str, optional, defaults to "") — The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for sequence classification or for a text and a question for question answering.It is also used as the last token of a sequence built with special tokens. cls_token (str, optional, defaults to "") … Webb16 nov. 2024 · PhoBert-Sentiment-Classification. Sentiment classification for Vietnamese text using PhoBert. Overview. This project shows how to finetune the recently released … Webb6 juli 2024 · Here, we employ XLM-R and PhoBERT —two recent state-of-the-art pre-trained language models that support Vietnamese—as the encoders. Table 2: Results on the test set. “Intent Acc.” and “Sent.Acc.” denote intent detection accuracy and … gra online stumble guys