From 98c88fdfff7fe8960a7e23f81f3ccc08ce2a539a Mon Sep 17 00:00:00 2001
From: franziska
Date: Thu, 15 Apr 2021 20:31:40 +0200
Subject: [PATCH] added to READMEs

---
 final_Versions/README.md | 2 ++
 train_model/README.md    | 6 ++++++
 2 files changed, 8 insertions(+)

diff --git a/final_Versions/README.md b/final_Versions/README.md
index f552da8..ac8c654 100644
--- a/final_Versions/README.md
+++ b/final_Versions/README.md
@@ -3,6 +3,8 @@
 Unzip German FoodBERT models here!
 They can be found under https://cloud.marquis.site/s/ZUVIIIQv6yznBj6
 
+For each version, the actual model is located in "output". "eval" contains the generated substitute recommendations for each version.
+
 ## Datasets
 
 Each model has a folder "dataset" with the following files:
diff --git a/train_model/README.md b/train_model/README.md
index 38ab5b7..9852941 100644
--- a/train_model/README.md
+++ b/train_model/README.md
@@ -7,6 +7,12 @@ To create vocab.txt file, run **make_new_vocab.py**
 # train German FoodBERT
 **language_modeling**
 
+This was executed on Google Colab with the following parameters:
+
+!python /content/drive/MyDrive/masterarbeit/language_modeling.py --output_dir="/content/drive/MyDrive/masterarbeit/output" --model_type=bert --model_name=bert-base-german-cased --do_train --train_data_file="/content/drive/MyDrive/masterarbeit/data/training_data.txt" --do_eval --eval_data_file="/content/drive/MyDrive/masterarbeit/data/testing_data.txt" --mlm --line_by_line --per_device_train_batch_size=8 --gradient_accumulation_steps=2 --per_device_eval_batch_size=8 --save_total_limit=5 --save_steps=10000 --logging_steps=10000 --evaluation_strategy=epoch --model_name_or_path="bert-base-german-cased"
+
+The exclamation mark at the beginning of the line is only needed on Google Colab and can be omitted when executing locally. Paths need to be adjusted when executing.
+
 # Vocab Files:
 
 **bert-base-german-cased_tokenizer.json**: original bert-base-german-cased tokenizer file