
I use the implementation of yolov8:
In the menu it says it is at iteration 2, even though the console log on the web page says it is at epoch 8.
@<1523701118159294464:profile|ExasperatedCrab78> Sure! Here is my train file:
from ultralytics import YOLO

# Load a model
model = YOLO(model="yolov8m.pt")  # load a pretrained model (recommended for training)

# Train the model
model.train(
    data="data.yaml",
    epochs=200,
    imgsz=640,
    label_smoothing=0.1,
    shear=0.01,
    perspective=0.0001,
    mosaic=0.5,
    mixup=0.1,
)
And here is an excerpt from the source code for yolov8:
# Ultralytics YOLO :rocket:, GPL-3.0 lic...
@<1523701070390366208:profile|CostlyOstrich36> Do you know what the problem might be?
It actually works on my local machine, where I am using Python 3.9. The issue happens when training on my university's GPU cluster, where I am using Python 3.8.2. I will try to create a new virtual environment with Python 3.10.4 and see if it works then 🙂
Thank you for your assistance! These are the packages installed in the environment:
absl-py==1.4.0
aiofiles==22.1.0
aiosqlite==0.18.0
anyio==3.6.2
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
arrow==1.2.3
asttokens==2.2.1
attrs==22.2.0
Babel==2.12.1
backcall==0.2.0
beautifulsoup4==4.12.2
bleach==6.0.0
cachetools==5.3.0
certifi==2022.12.7
cffi==1.15.1
charset-normalizer==3.1.0
clearml==1.10.3
cmake==3.26.1
comm==0.1.3
contourpy==1.0.7
cycler==0.11.0
debugpy==1.6.7
decorator==5.1.1
defusedx...
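Since the run works locally but not on the cluster, it can help to compare the two environments directly rather than reading through a full package list. Below is a minimal sketch using only the standard library (`importlib.metadata`, available from Python 3.8). The helper names `environment_snapshot` and `diff_snapshots` and the choice of packages to check are illustrative assumptions, not part of ClearML or Ultralytics.

```python
import sys
from importlib import metadata  # standard library in Python 3.8+


def environment_snapshot(packages):
    """Return the interpreter version plus the installed version of each package."""
    snapshot = {"python": ".".join(str(part) for part in sys.version_info[:3])}
    for name in packages:
        try:
            snapshot[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            snapshot[name] = None  # not installed in this environment
    return snapshot


def diff_snapshots(local, cluster):
    """Return {name: (local_version, cluster_version)} for entries that differ."""
    names = sorted(set(local) | set(cluster))
    return {
        name: (local.get(name), cluster.get(name))
        for name in names
        if local.get(name) != cluster.get(name)
    }


if __name__ == "__main__":
    # The package names below are an illustrative choice, not a required set.
    packages = ["clearml", "ultralytics", "tensorboard", "torch"]
    for name, version in environment_snapshot(packages).items():
        print(f"{name}=={version}")
```

Running the script in both environments and passing the two resulting dictionaries to `diff_snapshots` lists only the entries that differ, which narrows down where a version mismatch might be.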
The first image shows how it should look. In the second image, however, the model is actually training on the 7th epoch, but the scalars are not updated; they are just stuck at iteration 0.
With TensorBoard I get these plots:
However, I get these warnings:
TensorFlow installation not found - running with reduced feature set.
/cluster/home/project_tdt4265/.venv/lib/python3.8/site-packages/tensorboard_data_server/bin/server: /lib64/libc.so.6: version `GLIBC_2.29' not found (required by /cluster/home/project_tdt4265/.venv/lib/python3.8/site-packages/tensorboard_data_server/bin/server)
/cluster/home/haakobh/project_tdt4265/.venv/lib/python3.8/site-packages/tensorboard_data_server/bin/server: /lib64/libc.so.6: version...
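The `GLIBC_2.29' not found` message says that the `tensorboard_data_server` binary was built against glibc 2.29, which the cluster's `/lib64/libc.so.6` does not provide. A quick way to confirm which glibc version Python sees is `platform.libc_ver()` from the standard library. The sketch below is only a diagnostic aid; the helper names `parse_version` and `glibc_satisfies` are made up for illustration.

```python
import platform


def parse_version(text):
    """Turn a dotted version string such as '2.29' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def glibc_satisfies(found, required="2.29"):
    """Return True if the found glibc version is at least the required one."""
    if not found:
        return False  # no glibc detected (for example on a non-glibc platform)
    return parse_version(found) >= parse_version(required)


if __name__ == "__main__":
    lib, version = platform.libc_ver()
    print("detected libc:", lib or "unknown", version or "unknown")
    print("meets GLIBC_2.29:", glibc_satisfies(version))
```

If the check reports an older glibc, the warning is expected on that machine. Whether it is related to the stuck scalars is not established by this thread.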
@<1523701118159294464:profile|ExasperatedCrab78> Hey, I deleted the virtual environment and created a new one with Python 3.9 and the necessary dependencies, and now it seems to work! 😄 Thanks for your help! Maybe some packages were conflicting, or something was off with the Python 3.8 version.