Files
MasterarbeitCode/train_model

Vocab

To create the vocab.txt file, run make_new_vocab.py.

Prep dataset

prep_dataset_training: Formats and splits the dataset so it can be used for training. Before running, set which dataset version should be built.
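As a rough illustration of the split step, here is a minimal sketch. It is not the code in prep_dataset_training; the function name `split_dataset`, the 90/10 ratio, and the fixed seed are assumptions made only for this example.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(
    samples: Sequence[str],
    train_fraction: float = 0.9,  # assumed ratio, not taken from the repo
    seed: int = 42,               # assumed seed, only for reproducibility
) -> Tuple[List[str], List[str]]:
    """Shuffle the samples reproducibly and split them into train and test parts."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(samples)
    # A dedicated Random instance keeps the shuffle independent of global state.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Calling `split_dataset(recipes)` on a list of recipe strings returns a `(train, test)` pair; each part can then be written to its own text file for the language modeling step.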

Train German FoodBERT

language_modeling

Vocab Files:

bert-base-german-cased_tokenizer.json: original bert-base-german-cased tokenizer file

bert_vocab.txt: original bert-base-german-cased vocab

used_ingredients: all ingredients in dataset

vocab.txt: German FoodBERT vocabulary
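One plausible way these files relate is sketched below: the ingredient list is appended to the original vocabulary to produce the new one. This is an assumption for illustration, not a description of make_new_vocab.py; the helper `merge_vocab` is hypothetical.

```python
from typing import Iterable, List


def merge_vocab(base_vocab: Iterable[str], ingredients: Iterable[str]) -> List[str]:
    """Append ingredient tokens that are missing from the base vocabulary.

    Hypothetical helper: existing tokens keep their positions, so their
    token ids stay valid; new ingredients are added at the end.
    """
    merged = [token.strip() for token in base_vocab if token.strip()]
    seen = set(merged)
    for ingredient in ingredients:
        token = ingredient.strip()
        # Skip blank lines and tokens the vocabulary already contains.
        if token and token not in seen:
            merged.append(token)
            seen.add(token)
    return merged
```

With the lines of bert_vocab.txt as `base_vocab` and the entries of used_ingredients as `ingredients`, writing the result one token per line would give a file in the same format as vocab.txt.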