added to README files, added full dataset versions to data

2021-04-15 20:19:09 +02:00
parent cf40ad15fb
commit 1ea0677029
9 changed files with 61 additions and 543 deletions

@@ -1,4 +1,4 @@
-#Vocab
+# Vocab
To create vocab.txt file, run **make_new_vocab.py**
# Prep dataset
@@ -8,7 +8,7 @@ To create vocab.txt file, run **make_new_vocab.py**
**language_modeling**
-#Vocab Files:
+# Vocab Files:
**bert-base-german-cased_tokenizer.json**: original bert-base-german-cased tokenizer file
**bert_vocab.txt**: original bert-base-german-cased vocab
**used_ingredients**: all ingredients in dataset