Camera classification Project

[211130] PyTorch: gradient vanishing problem

홍시맛쿠키 2021. 12. 1. 12:52

#Problem

The existing classifier outputs all zeros. It looks like the values are getting too small in the FC layers.

 

#Notes

1. The conda pip path I always forget

/home/rmc1010/anaconda3/envs/mislgan/bin/pip install ...

 

2. Visualizing gradient using tensorboard

https://deeplizard.com/learn/video/pSexXMdruFM

 


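from torch.utils.tensorboard import SummaryWriter

tb = SummaryWriter()
...
model = comparator()  # renamed so the instance does not shadow the comparator class
...
# add_graph traces the model, so it needs an example input batch
# (sample_input is a placeholder name, not from the original code)
tb.add_graph(model, sample_input)

for epoch in range(opt.epoch):
    ...
    for _, label in enumerate(train_target):
        ...  # training code

    # once per epoch, log the distribution of the fc1 weights
    tb.add_histogram('fc1', model.fc1.weight, epoch)

tb.close()  # flush pending events to disk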
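
The snippet above logs weights, but for a vanishing-gradient check the gradients themselves are more telling. A rough sketch of how that could look: log_grad_histograms is a helper name I made up, and RecordingWriter is only a hypothetical stand-in so the example runs without TensorBoard installed. In real use you would pass the SummaryWriter instance instead, since it has the same add_histogram(tag, values, global_step) method.

```python
import torch
import torch.nn as nn


def log_grad_histograms(writer, model, step):
    """Log every available parameter gradient as a histogram; return the tags logged."""
    tags = []
    for name, param in model.named_parameters():
        if param.grad is not None:  # grad is None until backward() has run
            tag = f"grad/{name}"
            writer.add_histogram(tag, param.grad.detach().cpu(), step)
            tags.append(tag)
    return tags


class RecordingWriter:
    """Hypothetical stand-in for SummaryWriter; it only remembers what was logged."""

    def __init__(self):
        self.calls = []

    def add_histogram(self, tag, values, global_step=None):
        self.calls.append((tag, global_step))


# Tiny made-up network, just to have some gradients to log.
net = nn.Sequential(nn.Linear(8, 4), nn.Sigmoid(), nn.Linear(4, 2))
loss = net(torch.randn(5, 8)).sum()
loss.backward()

writer = RecordingWriter()
logged = log_grad_histograms(writer, net, step=0)
print(logged)
```

Call it right after loss.backward() and before optimizer.zero_grad(), otherwise the gradients are already cleared.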

 

If the model is wrapped in DataParallel for multi-GPU use, its layers cannot be accessed directly.

[model].module.[layer].weight <- go through .module
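
A small sketch of the .module access, assuming a toy network. TinyNet and unwrap are names I invented for illustration; only nn.DataParallel and its .module attribute come from PyTorch itself.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the real classifier."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(8, 4)

    def forward(self, x):
        return self.fc1(x)


def unwrap(model):
    """Return the underlying module whether or not it is wrapped in DataParallel."""
    return model.module if isinstance(model, nn.DataParallel) else model


plain = TinyNet()
wrapped = nn.DataParallel(plain)

# The wrapper itself has no fc1 attribute; the layer lives under .module.
print(hasattr(wrapped, "fc1"))            # False
print(unwrap(wrapped).fc1.weight.shape)   # torch.Size([4, 8])
```

With a helper like unwrap, the same logging code works for both the single-GPU and the DataParallel case.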

 

...Nothing shows up on screen, maybe because the CUDA version doesn't match.

Let's try visualizing with plots instead..

 

3. Visualizing using plt graphs

https://discuss.pytorch.org/t/check-gradient-flow-in-network/15063/8
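
A minimal sketch in the spirit of that thread, not the exact code posted there: take the mean absolute gradient of each weight tensor after backward() and draw it as a bar chart. The function names, the toy sigmoid network, and the Agg backend choice (so it runs on a server without a display) are all my own assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed because training runs on a remote box
import matplotlib.pyplot as plt
import torch
import torch.nn as nn


def mean_abs_grads(model):
    """Collect (names, mean |grad|) for every weight tensor that has a gradient."""
    names, values = [], []
    for name, param in model.named_parameters():
        if param.grad is not None and "bias" not in name:
            names.append(name)
            values.append(param.grad.detach().abs().mean().item())
    return names, values


def plot_grad_flow(model):
    """Bar chart of mean |grad| per layer; returns the matplotlib figure."""
    names, values = mean_abs_grads(model)
    fig, ax = plt.subplots()
    ax.bar(range(len(values)), values)
    ax.set_xticks(range(len(values)))
    ax.set_xticklabels(names, rotation=90)
    ax.set_xlabel("layer")
    ax.set_ylabel("mean |grad|")
    fig.tight_layout()
    return fig


# Made-up deep sigmoid MLP, only there to produce gradients to look at.
torch.manual_seed(0)
layers = []
for _ in range(6):
    layers += [nn.Linear(16, 16), nn.Sigmoid()]
net = nn.Sequential(*layers, nn.Linear(16, 1))

loss = net(torch.randn(32, 16)).pow(2).mean()
loss.backward()

names, values = mean_abs_grads(net)
fig = plot_grad_flow(net)
plt.close(fig)
```

Calling fig.savefig("grad_flow.png") before closing writes the chart to disk. If the bars for the early layers are far smaller than those for the last layers, that points toward vanishing gradients.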

 


 

4. When you need to plot with PLT while training on the GPU

image.detach().cpu().numpy()

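- detach(): cuts the tensor off from the computation graph (not a copy; the result shares the same memory)

- cpu(): GPU -> CPU

- numpy(): tensor -> numpy 

numpy() has to come last: calling it directly on a GPU tensor, or on a tensor that still requires grad, raises an error.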
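
A quick sketch to reproduce this, assuming a random tensor in place of a real image. It falls back to the CPU when no GPU is available; requires_grad=True is set so that the direct numpy() call fails in either case.

```python
import torch

# Use the GPU if there is one; the example also runs on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
image = torch.rand(3, 4, 4, device=device, requires_grad=True)

# Calling numpy() directly fails: the tensor requires grad (and may live on the GPU).
direct_call_failed = False
try:
    image.numpy()
except (RuntimeError, TypeError) as err:
    direct_call_failed = True
    print("direct numpy() failed:", type(err).__name__)

# Detach from the graph, move to host memory, then convert.
array = image.detach().cpu().numpy()
print(type(array).__name__, array.shape)
```

The resulting array can then be passed to plt.imshow or plt.plot as usual (after transposing to height x width x channels for an image).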

 

After checking, it does look like a gradient vanishing problem.

I need to study this some more...