2021-04-11 23:28:41 +02:00

# Vocab
To create the **vocab.txt** file, run **make_new_vocab.py**.
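
The README does not describe how **make_new_vocab.py** builds the vocabulary. A minimal sketch, assuming the German FoodBERT vocabulary is the original BERT vocabulary with the dataset's ingredient names appended as extra tokens, and that both input files are UTF-8 text with one entry per line. The helper names `build_vocab`, `read_lines` and `write_vocab` and the append-only strategy are assumptions, not the script's confirmed behavior:

```python
from typing import Iterable, List


def build_vocab(base_tokens: Iterable[str], ingredients: Iterable[str]) -> List[str]:
    """Hypothetical helper: append ingredient names missing from the base vocabulary.

    The original tokens keep their order, so their ids (= line numbers) stay the
    same. Ingredient names are stripped; empty names and duplicates are skipped.
    """
    vocab = list(base_tokens)
    seen = set(vocab)
    for name in ingredients:
        name = name.strip()
        if name and name not in seen:
            vocab.append(name)
            seen.add(name)
    return vocab


def read_lines(path: str) -> List[str]:
    """Read a UTF-8 text file as a list of lines without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def write_vocab(base_path: str, ingredients_path: str, out_path: str) -> None:
    """Hypothetical driver: base vocab + ingredient list -> new vocab file."""
    vocab = build_vocab(read_lines(base_path), read_lines(ingredients_path))
    with open(out_path, "w", encoding="utf-8") as f:
        for token in vocab:
            f.write(token + "\n")
```

Note that a multi-word ingredient name can never match a single vocabulary entry, because BERT's basic tokenizer splits on whitespace before WordPiece lookup; how the original script handles such names is not stated here.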
# Prepare dataset
**prep_dataset_training**: Formats and splits the dataset so that it can be used for training. Adapt the script to select which dataset version to create.
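
The README does not say which format **prep_dataset_training** produces. A minimal sketch, assuming the output is plain text with one example per line (a common input format for masked-language-model training) and a random train/evaluation split. The function names, the 10 % evaluation fraction and the seed are illustrative assumptions:

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(
    texts: Sequence[str], eval_fraction: float = 0.1, seed: int = 42
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: clean, shuffle reproducibly, and split into (train, eval)."""
    if not 0.0 < eval_fraction < 1.0:
        raise ValueError("eval_fraction must be between 0 and 1")
    # Collapse all whitespace (including newlines) so every example fits on one line.
    cleaned = [" ".join(text.split()) for text in texts]
    cleaned = [text for text in cleaned if text]  # drop empty examples
    random.Random(seed).shuffle(cleaned)  # fixed seed -> same split on every run
    n_eval = int(len(cleaned) * eval_fraction)
    return cleaned[n_eval:], cleaned[:n_eval]


def write_lines(path: str, lines: Sequence[str]) -> None:
    """Write one example per line (UTF-8)."""
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
```

A fixed seed keeps the split reproducible, which matters when several dataset versions are created and compared.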
# Train German FoodBERT
**language_modeling**: Trains German FoodBERT.
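
Since German FoodBERT is a BERT model, **language_modeling** presumably trains it with a masked-language-modeling objective; the README does not confirm this. The sketch below shows the standard BERT masking scheme on a list of token ids: about 15 % of the non-special tokens are selected, and of those 80 % become `[MASK]`, 10 % a random token and 10 % stay unchanged. The function name and its parameters are hypothetical:

```python
import random
from typing import AbstractSet, List, Optional, Sequence, Tuple

IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def mask_tokens(
    token_ids: Sequence[int],
    mask_id: int,
    vocab_size: int,
    special_ids: AbstractSet[int] = frozenset(),
    mlm_probability: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """BERT-style masking: returns (model inputs, labels) for one sequence."""
    if rng is None:
        rng = random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, token in enumerate(token_ids):
        # Never mask special tokens such as [CLS]/[SEP]; select the rest with mlm_probability.
        if token in special_ids or rng.random() >= mlm_probability:
            continue
        labels[i] = token  # the loss is computed only at selected positions
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id  # 80 %: replace with [MASK]
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10 %: random token
        # remaining 10 %: keep the original token unchanged
    return inputs, labels
```

In the Hugging Face `transformers` library, `DataCollatorForLanguageModeling` with `mlm=True` applies the same scheme to whole batches; whether **language_modeling** relies on it is not stated here.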
# Vocab files
- **bert-base-german-cased_tokenizer.json**: original bert-base-german-cased tokenizer file
- **bert_vocab.txt**: original bert-base-german-cased vocabulary
- **used_ingredients**: all ingredients used in the dataset
- **vocab.txt**: German FoodBERT vocabulary
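
**vocab.txt** presumably follows the same format as **bert_vocab.txt**, the standard BERT WordPiece vocabulary format: one token per line, with the token id equal to the zero-based line number. A minimal sketch for loading such a file and listing ingredient names that are not single vocabulary entries; the helper names `parse_vocab`, `load_vocab` and `missing_ingredients` are hypothetical:

```python
from typing import Dict, Iterable, List


def parse_vocab(lines: Iterable[str]) -> Dict[str, int]:
    """Map each token to its id (the zero-based line number)."""
    vocab: Dict[str, int] = {}
    for index, line in enumerate(lines):
        vocab[line.rstrip("\n")] = index
    return vocab


def load_vocab(path: str) -> Dict[str, int]:
    """Load a BERT-style vocab file (UTF-8, one token per line)."""
    with open(path, encoding="utf-8") as f:
        return parse_vocab(f)


def missing_ingredients(vocab: Dict[str, int], ingredients: Iterable[str]) -> List[str]:
    """Hypothetical check: ingredient names without their own vocabulary entry."""
    return [name for name in ingredients if name not in vocab]
```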