[211122] Pytorch: data sampling
#Problem
Training results are poor because of a data imbalance problem.
#References
1. Stratified sampling
How to enable the dataloader to sample from each class with equal probability (discuss.pytorch.org)
Sample so that every class appears in a batch with equal probability.
The label vector of the dataset being sampled must be passed as a parameter.
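A minimal sketch of this idea using `WeightedRandomSampler`, with a hypothetical toy dataset (the 90/10 class split, tensor shapes, and batch size are all assumptions for illustration); the per-sample weights are built from exactly the label vector mentioned above:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)

# Hypothetical imbalanced dataset: 90 samples of class 0, 10 of class 1.
labels = torch.cat([torch.zeros(90, dtype=torch.long), torch.ones(10, dtype=torch.long)])
data = torch.randn(100, 3)
dataset = TensorDataset(data, labels)

# Inverse-frequency weight per sample, computed from the label vector.
class_counts = torch.bincount(labels)              # tensor([90, 10])
sample_weights = 1.0 / class_counts[labels].float()

# replacement=True lets minority-class samples be drawn repeatedly,
# so both classes appear with roughly equal probability per batch.
sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels), replacement=True)
loader = DataLoader(dataset, batch_size=20, sampler=sampler)

# Labels actually drawn over one epoch; class 1 should now make up ~half.
drawn = torch.cat([y for _, y in loader])
```

Note that `sampler` and `shuffle=True` are mutually exclusive in `DataLoader`, so shuffling is left to the sampler itself.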
2. Batches with same label
Batches of points with the same label on Pytorch (stackoverflow.com)
https://stackoverflow.com/questions/60725571/batches-of-points-with-the-same-label-on-pytorch
Since the train and val set indices are already split, the per-class indices must be found within the train / val sets rather than over the whole dataset.
Build a dataloader for each class, collecting them into a dataloader list.
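The two steps above can be sketched as follows; the toy tensors and the `train_indices` split are assumptions (in practice they come from your own dataset and train/val split), and per-class indices are searched only inside the split:

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# Hypothetical toy data; in practice `labels` comes from your dataset.
labels = torch.tensor([0, 1, 0, 1, 2, 2, 0, 1])
data = torch.randn(8, 3)
dataset = TensorDataset(data, labels)

# Pre-split train indices (val indices would be handled the same way).
train_indices = [0, 1, 2, 3, 4, 5]

# For each class, collect its indices *within the train split only*,
# then wrap them in a Subset and its own DataLoader.
class_loaders = []
for c in labels.unique().tolist():
    idx = [i for i in train_indices if labels[i].item() == c]
    if idx:
        class_loaders.append(DataLoader(Subset(dataset, idx), batch_size=2))
```

Every batch drawn from a given loader in `class_loaders` then carries a single label.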
3. Merging dataloaders
How to merge two torch.utils.data dataloaders with a single operation (stackoverflow.com)
Merge the dataloader list into a single dataloader.
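A sketch of two common ways to do the merge, using two stand-in per-class loaders (the datasets and batch sizes are assumptions): chaining the existing loaders without redefining the datasets, or building one new `DataLoader` over a `ConcatDataset` of the underlying datasets:

```python
from itertools import chain

import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Two hypothetical per-class loaders (stand-ins for the dataloader list above).
ds_a = TensorDataset(torch.randn(4, 3), torch.zeros(4, dtype=torch.long))
ds_b = TensorDataset(torch.randn(6, 3), torch.ones(6, dtype=torch.long))
loader_a = DataLoader(ds_a, batch_size=2)
loader_b = DataLoader(ds_b, batch_size=2)

# Option 1: iterate the loaders back to back, no dataset redefinition.
n_samples = sum(y.numel() for _, y in chain(loader_a, loader_b))

# Option 2: one DataLoader over the concatenated underlying datasets,
# which also allows shuffling across the original class boundaries.
merged = DataLoader(ConcatDataset([loader_a.dataset, loader_b.dataset]),
                    batch_size=2, shuffle=True)
```

Option 1 yields class-pure batches in sequence; Option 2 mixes classes, so the choice depends on whether same-label batches are still wanted after merging.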
#Next problem
Even after balancing the data, the results are still poor. I inspected the inputs with Visdom, and the images themselves look fine.
Changing the learning rate / decay rate makes no noticeable difference.
It seems I need to look for a different model or a different approach.