# Vocab
To create the vocab.txt file, run **make_new_vocab.py**.

# Prep dataset
**prep_dataset_training**: Formats and splits the dataset so it can be used for training. Adapt which dataset version to make before running it!

# Train German FoodBERT
**language_modeling**

# Vocab Files:
- **bert-base-german-cased_tokenizer.json**: original bert-base-german-cased tokenizer file
- **bert_vocab.txt**: original bert-base-german-cased vocab
- **used_ingredients**: all ingredients in the dataset
- **vocab.txt**: German FoodBERT vocabulary
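
As a rough illustration of how the vocab files above might relate, here is a minimal sketch that extends a base vocabulary with ingredient names that are not already in it. This is an assumption about the general approach, not a copy of what **make_new_vocab.py** does. The function names `extend_vocab` and `write_vocab` and the sample tokens are hypothetical.

```python
from pathlib import Path


def extend_vocab(base_tokens, ingredients):
    """Return base_tokens followed by every ingredient not already present.

    Hypothetical helper: the real make_new_vocab.py may work differently.
    """
    seen = set(base_tokens)
    extended = list(base_tokens)
    for ingredient in ingredients:
        token = ingredient.strip()
        # Skip empty entries and tokens that are already in the vocabulary.
        if token and token not in seen:
            seen.add(token)
            extended.append(token)
    return extended


def write_vocab(tokens, path):
    """Write one token per line, the layout BERT vocab.txt files use."""
    Path(path).write_text("\n".join(tokens) + "\n", encoding="utf-8")


# Tiny made-up sample standing in for bert_vocab.txt and used_ingredients.
base = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "Salz"]
ingredients = ["Salz", "Zucchini", " Quark ", "", "Zucchini"]
vocab = extend_vocab(base, ingredients)
```

Appending new tokens at the end keeps the ids of the original bert-base-german-cased tokens unchanged, so existing embeddings could still be reused. Calling `write_vocab(vocab, "vocab.txt")` would then store the result one token per line.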