
Thank you for your assistance! These are the packages installed in the environment:
absl-py==1.4.0
aiofiles==22.1.0
aiosqlite==0.18.0
anyio==3.6.2
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
arrow==1.2.3
asttokens==2.2.1
attrs==22.2.0
Babel==2.12.1
backcall==0.2.0
beautifulsoup4==4.12.2
bleach==6.0.0
cachetools==5.3.0
certifi==2022.12.7
cffi==1.15.1
charset-normalizer==3.1.0
clearml==1.10.3
cmake==3.26.1
comm==0.1.3
contourpy==1.0.7
cycler==0.11.0
debugpy==1.6.7
decorator==5.1.1
defusedx...
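When comparing this environment with one where the scalars do update, it may be easier to print only the versions of the packages most likely to matter rather than diffing the whole list. This is a minimal sketch using the standard-library importlib.metadata (Python 3.8+); the helper name installed_versions and the choice of package names are illustrative assumptions, not something taken from the thread.

```python
from importlib import metadata


def installed_versions(names):
    """Return {name: version or None} for each distribution name."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    # Package names picked for illustration; adjust to your own setup.
    wanted = ["clearml", "ultralytics", "tensorboard", "torch"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Running it in both the working and the failing environment gives two short lists that can be compared by eye.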
The first image shows how it should look. In the second image, however, the model is actually training on the 7th epoch, but the scalars are not updated; they are stuck at iteration 0.
@<1523701070390366208:profile|CostlyOstrich36> Do you know what the problem might be?
@<1523701118159294464:profile|ExasperatedCrab78> Sure! Here is my train file:
from ultralytics import YOLO

# Load a model
model = YOLO(model="yolov8m.pt")  # load a pretrained model (recommended for training)

# Train the model
model.train(
    data="data.yaml",
    epochs=200,
    imgsz=640,
    label_smoothing=0.1,
    shear=0.01,
    perspective=0.0001,
    mosaic=0.5,
    mixup=0.1,
)
And here is an excerpt from the YOLOv8 source code:
# Ultralytics YOLO :rocket:, GPL-3.0 lic...
With TensorBoard I get these plots:
@<1523701118159294464:profile|ExasperatedCrab78> Hey, I deleted the virtual environment and created a new one with Python 3.9 and the necessary dependencies, and now it seems to work! 😄 Thanks for your help! Maybe some packages were conflicting, or something was off with the Python 3.8 version.
However, I get these warnings:
TensorFlow installation not found - running with reduced feature set.
/cluster/home/project_tdt4265/.venv/lib/python3.8/site-packages/tensorboard_data_server/bin/server: /lib64/libc.so.6: version `GLIBC_2.29' not found (required by /cluster/home/project_tdt4265/.venv/lib/python3.8/site-packages/tensorboard_data_server/bin/server)
/cluster/home/haakobh/project_tdt4265/.venv/lib/python3.8/site-packages/tensorboard_data_server/bin/server: /lib64/libc.so.6: version...
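The warning above says the tensorboard_data_server binary needs GLIBC_2.29, which the cluster's /lib64/libc.so.6 does not provide. A quick way to see which glibc the running Python is linked against is the standard-library platform.libc_ver(). The sketch below is only a diagnostic; the helper names parse_version and glibc_at_least are made up for illustration.

```python
import platform


def parse_version(text):
    """Turn a dotted version string such as '2.29' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def glibc_at_least(required, found=None):
    """Return True if the detected (or given) glibc version is >= required.

    Returns None when no glibc version can be determined,
    for example on systems that do not use glibc.
    """
    if found is None:
        lib, found = platform.libc_ver()
        if lib != "glibc" or not found:
            return None
    return parse_version(found) >= parse_version(required)


if __name__ == "__main__":
    print("libc:", platform.libc_ver())
    print("glibc >= 2.29:", glibc_at_least("2.29"))
```

If this reports False on the cluster, the binary cannot load there regardless of the Python version, so the warning would be expected to persist until a build compatible with the older glibc is used.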
And the menu says it is at iteration 2, even though the console log on the web page says it is at epoch 8.
I use this implementation of YOLOv8:
It actually works on my local machine too, where I am using Python 3.9. The issue happens when training on my university's GPU cluster, where I am using Python 3.8.2. I will try creating a new virtual environment with Python 3.10.4 and see if it works then 🙂
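Since the behaviour differs between two machines, it can be worth confirming which interpreter each run really uses before rebuilding anything. A small sketch, using only the standard-library sys module; the function name interpreter_summary is a hypothetical helper, not part of any library mentioned here.

```python
import sys


def interpreter_summary():
    """Collect the facts that tell two Python environments apart."""
    return {
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "executable": sys.executable,
        # Inside a venv, sys.prefix points at the venv and differs from base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    for key, value in interpreter_summary().items():
        print(f"{key}: {value}")
```

Printing this at the top of the training script on both the local machine and the cluster shows whether the job is picking up the intended virtual environment.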