If you are using the latest RC: pip install clearml==0.17.5rc5
You can pass True and it will use the "files_server" as configured in your clearml.conf
I used the http link as a filler to point to the files_server.
Make sense ?
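A minimal sketch, assuming the True flag discussed here is the output_uri argument of Task.init (the project/task names are placeholders):
from clearml import Task
# output_uri=True uploads artifacts/models to the "files_server" set in clearml.conf
task = Task.init(project_name='examples', task_name='demo', output_uri=True)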
Hi @<1651395720067944448:profile|GiddyHedgehong81>
However, for a YOLOv8 (object detection with around 20k jpgs and .txt files) I need the data.yaml file:
Just add the entire folder with your files to a dataset, then get it in your code.
Add files (you can do that from the CLI, for example):
clearml-data add --files my_folder_with_files
Then from code:
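A minimal sketch of the code side (the dataset project/name are placeholders):
from clearml import Dataset
# fetch a read-only local copy of all the files in the dataset
dataset = Dataset.get(dataset_project='examples', dataset_name='yolov8_data')
local_path = dataset.get_local_copy()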
but then the error occurs, after the training and the validation were successfully completed
It seems it is failing on the last eval? Could it be the test set is missing? Is it the same dataset? Can you verify the file is there? (Notice I see a mix of / and \ in the file name, which is odd: Windows uses \ and Linux/macOS use /, you should never have a mix.)
Seems like a credentials error.
Do you have everything set up correctly in your ~/clearml.conf?
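For reference, a minimal sketch of the relevant section of ~/clearml.conf (server URLs and keys are placeholders):
api {
    web_server: https://app.clear.ml
    api_server: https://api.clear.ml
    files_server: https://files.clear.ml
    credentials {
        access_key: "<your_access_key>"
        secret_key: "<your_secret_key>"
    }
}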
I think so (you can also comment out the Task.init() just to verify this is not a clearml issue)
Ohh StraightCoral86 did you check clearml-task? This is exactly what it does
(this is the CLI, from code you basically call Task.create & Task.enqueue)
Will this solve it ?
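A minimal sketch of the code equivalent (the repo, script, and queue names are placeholders):
from clearml import Task
# create a task from an existing repo + script, then send it to an agent queue
task = Task.create(
    project_name='examples',
    task_name='remote run',
    repo='https://github.com/user/repo.git',
    script='train.py',
)
Task.enqueue(task, queue_name='default')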
WackyRabbit7
I do 'pkill -f trains' but it's the same...
If you need to debug and test, run with --foreground and just hit Ctrl-C to end the process (it will never switch to background...). Helps?
You need trains-server support, so if trains v0.15 is working with an older backend it will revert to the "training" type
LovelyHamster1 verified, this is a UI bug with an old limitation enforced.
I will make sure they know about it, it should be fixed for the upcoming release 🙂
Hi LovelyHamster1
As you noted, passing overrides in Args/overrides, for example ['training.max_epochs=1000'],
should work when running with the agent.
Could you verify with the latest RC? There was a fix to support the latest hydra version: pip install clearml==0.17.5rc5
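A minimal sketch of setting that override from code before enqueuing (the task id and queue name are placeholders; Args/overrides is the parameter named above):
from clearml import Task
base = Task.get_task(task_id='<base_task_id>')
cloned = Task.clone(source_task=base)
# set the hydra overrides the agent will apply on the cloned run
cloned.set_parameter('Args/overrides', "['training.max_epochs=1000']")
Task.enqueue(cloned, queue_name='default')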
I think the only way is using the API, with Task.query_tasks and a filter. Would that have helped?
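A minimal sketch (the project and name pattern are placeholders):
from clearml import Task
# returns a list of matching task IDs
task_ids = Task.query_tasks(project_name='examples', task_name='train.*')
print(task_ids)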
Sure 🙂
BTW: clearml-agent will mount your host's .ssh into the docker container at /root/.ssh by default, so there is no need to do that manually.
Like, if you google "dagster and clearml" or "prefect and clearml" or "airflow and clearml" -- I don't find any blogs written by people talking about how they use both of them together.
Oh yeah I see your point, I think the main reason is that a lot of the DAG capabilities and the orchestration are already folded into clearml's capabilities (i.e. pipelines + clearml-agent etc.)
That said, I'm pretty sure I have seen just adding Task.init into each of the above frameworks' steps, in order to t...
(This is why we recommend using pip, because it is stable and clearml-agent takes care of pytorch/cuda versions)
@<1610083503607648256:profile|DiminutiveToad80> try to turn on:
enable_git_ask_pass: true
I'll try to find the link...
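A minimal sketch of where that flag goes in ~/clearml.conf (placement in the agent section is my assumption):
agent {
    # let the agent pass git credentials to git via the askpass mechanism
    enable_git_ask_pass: true
}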
@<1587253076522176512:profile|HollowPeacock33>
Is this a commercial ad? This seems out of scope for this channel.
Can you expand?
But there is no need for 2FA for cloning a repo
My clearml-server crashed for some reason
😞 No worries
Seems like something is not working with the server, i.e. it cannot connect with one of the docker containers.
May I suggest carefully going through all the steps here, making sure nothing was missed:
https://github.com/allegroai/trains-server/blob/master/docs/install_linux_mac.md
Especially number (4)
"erasing" all the packages that had been set in the base task I'm cloning from. I
Set is not add: if you are calling set_packages, you are overwriting all of the packages with this single call.
You can however do:
task_data = task.export_task()
# pip requirements are stored as a single newline-separated string
requirements = task_data["script"]["requirements"]["pip"]
# append the new package on its own line instead of overwriting the string
requirements += "\nnew_package==1.0"
task.set_packages(requirements)
I guess we should have get_requirements ?!
Okay this is very close to what the agent is building:
Could you start a new conda env,
then install cudatoolkit=11.1
then run:
conda env update -p <conda_env_path_here> --file the_env_yaml.yml
What would be the best way to get all the models trained using a certain Task? I know we can use query_models to filter models based on Project and Task, but is it the best way?
On the Task object itself you have all the models:
Task.get_task(task_id='aabb').models['output']
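For example, a minimal sketch iterating over them (task id taken from the line above):
from clearml import Task
task = Task.get_task(task_id='aabb')
# 'output' holds the models the task stored; 'input' the ones it used
for model in task.models['output']:
    print(model.name, model.url)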
Glad to hear that, indeed an odd issue... is this reproducible i.e. can we get something to fix it?
Exactly! nice 🎉
Hi BoredGoat1
from this warning: "TRAINS Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring" it seems trains failed to load the nvidia .so library that does the GPU monitoring:
This is based on pynvml, and I think it is trying to access "libnvidia-ml.so.1"
Basically saying, if you can run nvidia-smi from inside the container, it should work.
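A minimal sketch to check this directly (pynvml is the library trains uses under the hood):
import pynvml
try:
    # this is the call that fails when libnvidia-ml.so.1 cannot be loaded
    pynvml.nvmlInit()
    print('GPUs visible:', pynvml.nvmlDeviceGetCount())
    pynvml.nvmlShutdown()
except pynvml.NVMLError as error:
    print('GPU monitoring unavailable:', error)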