[NeMo W 2024-03-18 05:25:14 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
    See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
      ret = run_job(

[NeMo I 2024-03-18 05:25:14 train_gpt_sft:118]

************** Experiment configuration ***********
[NeMo I 2024-03-18 05:25:14 train_gpt_sft:119]
name: gemma-7b-sql-nemo
trainer:
  num_nodes: 1
  devices: 8
  accelerator: gpu
  precision: bf16
  sft:
    max_epochs: 1
    max_steps: -1
    val_check_interval: 1000
    save_interval: ${.val_check_interval}
    limit_val_batches: 40
    gradient_clip_val: 1.0
  logger: false
  enable_checkpointing: false
  use_distributed_sampler: false
  max_time: null
  max_epochs: ${.sft.max_epochs}
  max_steps: ${.sft.max_steps}
exp_manager:
  explicit_log_dir: models/gemma-7b-sql-nemo
  exp_dir: null
  name: ${name}
  create_wandb_logger: false
  wandb_logger_kwargs:
    project: null
    name: null
  resume_if_exists: true
  resume_ignore_no_checkpoint: true
  create_checkpoint_callback: true
  checkpoint_callback_params:
    monitor: validation_loss
    save_top_k: 5
    mode: min
    save_nemo_on_train_end: true
    filename: megatron_gpt_sft--{${.monitor}:.3f}-{step}-{consumed_samples}-{epoch}
    model_parallel_size: ${model.tensor_model_parallel_size}
    save_best_model: false
model:
  seed: 1234
  tensor_model_parallel_size: 4
  pipeline_model_parallel_size: 1
  restore_from_path: /workspace/models/pytorch-7b-pt.nemo
  resume_from_checkpoint: null
  save_nemo_on_validation_end: true
  sync_batch_comm: false
  megatron_amp_O2: true
  encoder_seq_length: 4096
  sequence_parallel: false
  activations_checkpoint_granularity: null
  activations_checkpoint_method: null
  activations_checkpoint_num_layers: null
  activations_checkpoint_layers_per_pipeline: null
  answer_only_loss: true
  gradient_as_bucket_view: false
  seq_len_interpolation_factor: null
  use_flash_attention: null
  hidden_dropout: 0.0
  attention_dropout: 0.0
  ffn_dropout: 0.0
  peft:
    peft_scheme: none
    restore_from_path: null
    lora_tuning:
      target_modules:
      - attention_qkv
      adapter_dim: 32
      adapter_dropout: 0.0
      column_init_method: xavier
      row_init_method: zero
      layer_selection: null
      weight_tying: false
      position_embedding_strategy: null
  data:
    chat: false
    chat_prompt_tokens:
      system_turn_start: "\0"
      turn_start: "\x11"
      label_start: "\x12"
      end_of_turn: "\n"
      end_of_name: "\n"
    sample: false
    num_workers: 0
    dataloader_type: single
    train_ds:
      file_path: nsql.jsonl
      global_batch_size: 128
      micro_batch_size: 1
      shuffle: true
      memmap_workers: null
      max_seq_length: 8192
      min_seq_length: 1
      drop_last: true
      label_key: output
      add_eos: true
      add_sep: false
      add_bos: false
      truncation_field: input
      index_mapping_dir: null
      prompt_template: '{input} {output}'
      hf_dataset: false
      truncation_method: right
    validation_ds:
      file_path: nsql.jsonl
      global_batch_size: 128
      micro_batch_size: 1
      shuffle: false
      memmap_workers: ${model.data.train_ds.memmap_workers}
      max_seq_length: ${model.data.train_ds.max_seq_length}
      min_seq_length: 1
      drop_last: true
      label_key: ${model.data.train_ds.label_key}
      add_eos: ${model.data.train_ds.add_eos}
      add_sep: ${model.data.train_ds.add_sep}
      add_bos: ${model.data.train_ds.add_bos}
      truncation_field: ${model.data.train_ds.truncation_field}
      index_mapping_dir: null
      prompt_template: ${model.data.train_ds.prompt_template}
      hf_dataset: false
      truncation_method: right
      output_original_text: true
  optim:
    name: distributed_fused_adam
    lr: 5.0e-06
    weight_decay: 0.01
    betas:
    - 0.9
    - 0.98
    sched:
      name: CosineAnnealing
      warmup_steps: 10
      constant_steps: 1000
      min_lr: 9.0e-07
  bias_activation_fusion: true

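To make the data settings above concrete, here is a minimal sketch, assuming a hypothetical record (the actual contents of nsql.jsonl are not shown in this log), of how `prompt_template`, `label_key`, `answer_only_loss`, and `truncation_field` fit together:

```python
import json

# Hypothetical nsql.jsonl record (illustrative only); the field names follow the
# configured prompt_template '{input} {output}' and label_key 'output'.
record = {
    "input": "CREATE TABLE head (age INTEGER) -- How many heads of departments are older than 56?",
    "output": "SELECT COUNT(*) FROM head WHERE age > 56",
}
print(json.dumps(record))  # one line of nsql.jsonl would look like this

# Each example is rendered with the configured prompt_template before tokenization.
prompt_template = "{input} {output}"
text = prompt_template.format(**record)
print(text)

# answer_only_loss: true  -> loss is computed only on the 'output' (label_key) tokens.
# truncation_field: input -> the 'input' side is truncated first if the rendered
#                            example exceeds max_seq_length.
```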
[NeMo W 2024-03-18 05:25:14 exp_manager:630] There were no checkpoints found in checkpoint_dir or no checkpoint folder at checkpoint_dir :models/gemma-7b-sql-nemo/checkpoints. Training from scratch.
[NeMo I 2024-03-18 05:25:14 exp_manager:396] Experiments will be logged at models/gemma-7b-sql-nemo
[NeMo I 2024-03-18 05:25:14 exp_manager:856] TensorboardLogger has been set up
[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_overlap in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: barrier_with_L1_time in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo I 2024-03-18 05:25:57 megatron_init:241] Rank 3 has data parallel group : [3, 7]
[NeMo I 2024-03-18 05:25:57 megatron_init:247] Rank 3 has combined group of data parallel and context parallel : [3, 7]
[NeMo I 2024-03-18 05:25:57 megatron_init:252] All data parallel group ranks with context parallel combined: [[0, 4], [1, 5], [2, 6], [3, 7]]
[NeMo I 2024-03-18 05:25:57 megatron_init:255] Ranks 3 has data parallel rank: 0
[NeMo I 2024-03-18 05:25:57 megatron_init:272] Rank 3 has context parallel group: [3]
[NeMo I 2024-03-18 05:25:57 megatron_init:275] All context parallel group ranks: [[0], [1], [2], [3], [4], [5], [6], [7]]
[NeMo I 2024-03-18 05:25:57 megatron_init:276] Ranks 3 has context parallel rank: 0
[NeMo I 2024-03-18 05:25:57 megatron_init:287] Rank 3 has model parallel group: [0, 1, 2, 3]
[NeMo I 2024-03-18 05:25:57 megatron_init:288] All model parallel group ranks: [[0, 1, 2, 3], [4, 5, 6, 7]]
[NeMo I 2024-03-18 05:25:57 megatron_init:298] Rank 3 has tensor model parallel group: [0, 1, 2, 3]
[NeMo I 2024-03-18 05:25:57 megatron_init:302] All tensor model parallel group ranks: [[0, 1, 2, 3], [4, 5, 6, 7]]
[NeMo I 2024-03-18 05:25:57 megatron_init:303] Rank 3 has tensor model parallel rank: 3
[NeMo I 2024-03-18 05:25:57 megatron_init:317] Rank 3 has pipeline model parallel group: [3]
[NeMo I 2024-03-18 05:25:57 megatron_init:329] Rank 3 has embedding group: [3]
[NeMo I 2024-03-18 05:25:57 megatron_init:335] All pipeline model parallel group ranks: [[0], [1], [2], [3], [4], [5], [6], [7]]
[NeMo I 2024-03-18 05:25:57 megatron_init:336] Rank 3 has pipeline model parallel rank 0
[NeMo I 2024-03-18 05:25:57 megatron_init:337] All embedding group ranks: [[0], [1], [2], [3], [4], [5], [6], [7]]
[NeMo I 2024-03-18 05:25:57 megatron_init:338] Rank 3 has embedding rank: 0
[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_overlap in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: barrier_with_L1_time in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo I 2024-03-18 05:25:57 tokenizer_utils:191] Getting SentencePiece with model: /tmp/tmpe7phpf8c/c1f49ba929c24b7e95b7219ca958f881_tokenizer-final.model
[NeMo I 2024-03-18 05:25:57 megatron_base_model:520] Padded vocab_size: 256000, original vocab_size: 256000, dummy tokens: 0.
[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: context_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: expert_model_parallel_size in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_overlap in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_ag in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_split_rs in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: tp_comm_bulk_dgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: finalize_model_grads_func in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-18 05:25:57 megatron_base_model:1078] The model: GPTSFTModel() does not have field.name: barrier_with_L1_time in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-18 05:25:57 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: num_moe_experts in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-18 05:25:57 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: bias_gelu_fusion in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-18 05:25:57 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: fp8_wgrad in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-18 05:25:57 megatron_base_model:492] The model: GPTSFTModel() does not have field.name: clone_scatter_output_in_embedding in its cfg. Add this key to cfg or config_mapping to make to make it configurable.
[NeMo W 2024-03-18 05:25:57 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/module/base.py:611: UserWarning: To guarantee overlapping TP and SP collectives with the backward GEMMs, set environment variable CUDA_DEVICE_MAX_CONNECTIONS = 1
      warnings.warn(

[NeMo I 2024-03-18 05:27:29 nlp_overrides:1100] Model GPTSFTModel was successfully restored from /workspace/models/pytorch-7b-pt.nemo.
[NeMo I 2024-03-18 05:27:29 train_script_utils:169] Running full finetuning since no peft scheme is given.
  | Name  | Type          | Params
----------------------------------------
0 | model | Float16Module | 2.1 B
----------------------------------------
2.1 B     Trainable params
0         Non-trainable params
2.1 B     Total params
8,538.206 Total estimated model params size (MB)
[NeMo I 2024-03-18 05:27:29 text_memmap_dataset:116] Building data files
[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:158] Loading data files
[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:249] Loading nsql.jsonl
[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:161] Time loading 1 mem-mapped files: 0:00:00.000700
[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:165] Computing global indices
[NeMo I 2024-03-18 05:27:31 text_memmap_dataset:116] Building data files
[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:158] Loading data files
[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:249] Loading nsql.jsonl
[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:161] Time loading 1 mem-mapped files: 0:00:00.000550
[NeMo I 2024-03-18 05:27:34 text_memmap_dataset:165] Computing global indices
[NeMo I 2024-03-18 05:27:34 builders:327] Building dataloader with consumed samples: 0
[NeMo W 2024-03-18 05:27:34 experimental:26] `<class 'nemo.collections.nlp.data.language_modeling.megatron.megatron_batch_samplers.MegatronPretrainingRandomBatchSampler'>` is experimental and not ready for production yet. Use at your own risk.
[NeMo I 2024-03-18 05:27:34 builders:327] Building dataloader with consumed samples: 0
[NeMo W 2024-03-18 05:27:34 experimental:26] `<class 'nemo.collections.nlp.data.language_modeling.megatron.megatron_batch_samplers.MegatronPretrainingRandomBatchSampler'>` is experimental and not ready for production yet. Use at your own risk.
[NeMo I 2024-03-18 05:27:40 megatron_gpt_model:1296] Pipeline model parallel rank: 0, Tensor model parallel rank: 3, Number of model parameters on device: 2.13e+09. Total number of model parameters: 8.54e+09.
[NeMo I 2024-03-18 05:27:40 modelPT:723] Optimizer config = MegatronDistributedFusedAdam (
    Parameter Group 0
        betas: [0.9, 0.98]
        bias_correction: True
        eps: 1e-08
        lr: 5e-06
        weight_decay: 0.01

    Parameter Group 1
        betas: [0.9, 0.98]
        bias_correction: True
        eps: 1e-08
        lr: 5e-06
        weight_decay: 0.0
    )
[NeMo I 2024-03-18 05:27:40 lr_scheduler:915] Scheduler "<nemo.core.optim.lr_scheduler.CosineAnnealing object at 0x7ec8c934d8d0>"
    will be used during training (effective maximum steps = 613) -
    Parameters :
    (warmup_steps: 10
    constant_steps: 1000
    min_lr: 9.0e-07
    max_steps: 613
    )
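For reference, the 613-step budget reported by the scheduler follows from the data configuration rather than from `max_steps` (which is -1): with `max_epochs: 1`, `global_batch_size: 128`, and `drop_last: true`, the effective step count is simply the number of training examples divided by the global batch size. A rough sanity check, assuming a dataset size inferred from the reported step count:

```python
# Back-of-the-envelope check of "effective maximum steps = 613" above.
# The exact size of nsql.jsonl is an assumption here; any value in
# [613 * 128, 614 * 128) reproduces the reported step count.
num_train_examples = 78_500   # assumed, not taken from the log
global_batch_size = 128
max_epochs = 1

steps_per_epoch = num_train_examples // global_batch_size  # drop_last: true
effective_max_steps = steps_per_epoch * max_epochs
print(effective_max_steps)  # -> 613
```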
|
|