
FYI: if you need to query stuff you can always look directly in the RestAPI:
https://github.com/allegroai/clearml/blob/master/clearml/backend_api/services/v2_9/projects.py
https://allegro.ai/clearml/docs/rst/references/clearml_api_ref/index.html
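For example, the same project queries can be made from Python through the SDK's `APIClient` wrapper. A hedged sketch (the project name below is a placeholder):

```python
# Sketch of querying the projects endpoint via the SDK's REST wrapper.
# The "name" filter follows the services definition linked above;
# the project name here is a placeholder.
def project_query(name_pattern):
    """Build the keyword arguments for projects.get_all (trivial helper)."""
    return {"name": name_pattern}

if __name__ == "__main__":
    from clearml.backend_api.session.client import APIClient

    client = APIClient()  # uses credentials from ~/clearml.conf
    for p in client.projects.get_all(**project_query("examples")):
        print(p.id, p.name)
```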
clearml.conf is the file that clearml-init is supposed to create, right?
Correct, specifically ~/clearml.conf
I guess that was never the intention of the function, it just returns the internal representation. Actually my question would be, how do you use it, and why? :)
Thanks GiganticTurtle0 !
I will try to reproduce with the example you provided. Regardless, I already took a look at the code, and I'm pretty sure I know what the issue is. We will be pushing a few fixes after the weekend; I'm hoping this one will be included as well 🙂
Hi GrievingTurkey78
How are you getting a different version than what is used at runtime? It analyzes the PYTHONPATH just as Python does. How can I reproduce it? Currently you can use Task.add_requirements(package_name, package_version=None)
This will not force it though, it is a recommendation (used if it fails to find the package itself). Maybe we can add a force option?! What do you think?
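A minimal sketch of the call mentioned above; the package name and version are placeholders, and note that it must run before `Task.init()` so the requirements analysis picks it up:

```python
# Sketch: record a recommended package/version before Task.init().
# This is a recommendation, not a hard override (see the note above).
def requirement_spec(package_name, package_version=None):
    """pip-style spec string, just to show what the agent should end up with."""
    if package_version is None:
        return package_name
    return f"{package_name}=={package_version}"

if __name__ == "__main__":
    from clearml import Task

    # e.g. ask the agent to install pandas==1.3.5 (placeholder values)
    Task.add_requirements("pandas", package_version="1.3.5")
    task = Task.init(project_name="examples", task_name="pinned-deps")
```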
Is there still an issue? Could it be the browser cannot access the file server directly?
I think they (DevOps) said something about next week, internal roll-out is this week (I think)
LOL AlertBlackbird30 had a PR and pulled it 🙂
Major release due next week; after that we will put a roadmap on the main GitHub page.
Anything specific you have in mind ?
It makes sense to add it to the docker run command by default if GPUs are specified for the agent.
I think this is an arch thing, --privileged is not needed on ubuntu flavor, that said you can always have it if you add it here:
https://github.com/allegroai/clearml-agent/blob/178af0dee84e22becb9eec8f81f343b9f2022630/docs/clearml.conf#L149
clearml-agent daemon --gpus 0 --queue default --docker
But docker still sees all GPUs.
Yes, --gpus should be enough. Are you sure regarding the --privileged flag?
Hmm, maybe the original Task was executed with older versions? (before the section names were introduced)
Let's try:
DiscreteParameterRange('epochs', values=[30])
Does that give a warning?
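If the original Task predates parameter sections, the bare name may work where the section-prefixed one warns. A hedged sketch (the base task id and metric names are placeholders):

```python
# Sketch: hyper-parameter names may need a section prefix ("General/epochs")
# on newer tasks, or the bare name ("epochs") on tasks created before sections.
def qualified(name, section="General"):
    return f"{section}/{name}"

if __name__ == "__main__":
    from clearml.automation import DiscreteParameterRange, HyperParameterOptimizer

    optimizer = HyperParameterOptimizer(
        base_task_id="<base-task-id>",  # placeholder
        hyper_parameters=[DiscreteParameterRange(qualified("epochs"), values=[30])],
        objective_metric_title="validation",  # placeholder metric
        objective_metric_series="loss",
        objective_metric_sign="min",
    )
```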
SarcasticSparrow10 sure see "execute_remotely" it does exactly that:
https://allegro.ai/docs/task.html#trains.task.Task.execute_remotely
It will stop the current process (after syncing everything) and launch itself remotely (i.e. enqueue itself)
When the same code is run by the "trains-agent", the execute_remotely call becomes a no-op and is basically skipped
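The flow described above can be sketched like this (project, task, and queue names are placeholders):

```python
# Sketch: locally, execute_remotely() syncs everything, stops this process,
# and enqueues the task; when the agent runs the same script, the call is a
# no-op and execution falls through to the training code.
def run_training():
    # placeholder for the actual training loop
    return "trained"

if __name__ == "__main__":
    from clearml import Task

    task = Task.init(project_name="examples", task_name="remote-demo")
    task.execute_remotely(queue_name="default")  # local run ends here
    run_training()  # only reached when executed by the agent
```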
We are working hard on release 1.7; once that is out we will push an RC for review (I hope) 🙂
Hmm that is odd, can you send an email to support@clear.ml ?
at the end of the manual execution
Wait even without the pipeline decorator this function creates the warning?
I have a task where I create a dataset, but I also create a set of matplotlib figures, some numeric statistics, and a pandas table that describe the data, which I wish to have associated with the dataset and viewable from the ClearML web page for the dataset.
Oh sure, use https://clear.ml/docs/latest/docs/references/sdk/dataset#get_logger ; they will be visible on the Dataset page for the version in question
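A hedged sketch of attaching stats and a table to the dataset version via its logger (project, name, and folder are placeholders):

```python
# Sketch: report a table against a Dataset version so it shows up on the
# Dataset page in the UI. Names below are placeholders.
def summarize(values):
    """Tiny stats helper, just to have something to report."""
    return {"count": len(values), "mean": sum(values) / len(values)}

if __name__ == "__main__":
    import pandas as pd
    from clearml import Dataset

    ds = Dataset.create(dataset_project="data", dataset_name="my-dataset")
    ds.add_files("./data")  # placeholder folder

    stats = summarize([1.0, 2.0, 3.0])
    table = pd.DataFrame([stats])
    ds.get_logger().report_table("stats", "summary", iteration=0, table_plot=table)

    ds.upload()
    ds.finalize()
```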
MagnificentSeaurchin79 do you have the "." package listed under "installed packages" after you reset the Task ?
VexedCat68
So the checkpoints just added up. I've stopped the training for now. I need to delete all of those checkpoints before I start training again.
Are you uploading the checkpoints manually as artifacts, or are they auto-logged and uploaded?
Also, why not reuse and overwrite the older checkpoints?
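One way to avoid checkpoints piling up is to always save to the same path (a sketch; the file names are illustrative):

```python
import os

# Sketch: save every epoch's checkpoint to one fixed file so older ones are
# overwritten instead of accumulating. keep_history=True keeps per-epoch files.
def checkpoint_path(out_dir, epoch, keep_history=False):
    name = f"checkpoint_epoch{epoch}.pt" if keep_history else "checkpoint_last.pt"
    return os.path.join(out_dir, name)

# With keep_history=False, epochs 3 and 7 target the same file:
assert checkpoint_path("ckpts", 3) == checkpoint_path("ckpts", 7)
```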
PompousParrot44 did you manage to get it working ?
I think the reason is that the "original" task is already the right type. I'll make sure we fix it, and always set the system tag
Hi @<1524560082761682944:profile|MammothParrot39>
By default you have the last 100 iterations there (not sure why you are only seeing the last 3), but this is configurable:
None
I can definitely feel you!
(I think the implementation is not trivial: metrics data size is collected and stored as a cumulative value on the account, and going over it per Task is actually quite taxing for the backend. Maybe it should be an async request? Like "get me a list of the X largest Tasks"? How would the UI present it? FYI, keeping some sort of bookkeeping per task is not trivial either, hence the main issue.)
Hi BurlyPig26
I think you can easily change the Web port, but not the API (8008) or files (8081) port
How are you deploying it?
Can you let me know if I can override the docker image using template.yaml?
No, you cannot.
But you can pass the OS environment variable CLEARML_DOCKER_IMAGE to set a different default one
And the agent section on this machine is:
api_server:
web_server:
files_server:
Is that correct?
but instead, they cannot be run if the files they produce were not committed.
The thing with git: if you have new files and you did not add them, they will not appear in the git diff, hence they are missing when running from the agent. Does that sound like your case?
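To check for this before enqueueing, you can list untracked files programmatically (a sketch using `git status --porcelain`; assumes git is on the PATH):

```python
import subprocess

# Sketch: untracked files never show up in `git diff`, so the agent won't see
# them. This lists them ("??" entries in porcelain status) so you can `git add`.
def untracked_files(repo="."):
    out = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return [line[3:] for line in out.splitlines() if line.startswith("??")]
```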
BTW: any specific reason for going the RestAPI way and not using the python SDK ?
Hi JitteryCoyote63
The NVIDIA_VISIBLE_DEVICES variable is set automatically for the process the trains-agent spins up, so from your code it is transparent; you can only "see" GPU 0.
(Obviously not using docker you can forcefully change the OS environment in runtime, but you should avoid that ;))
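Inside a task run by the agent you can confirm what the process sees (a sketch):

```python
import os

# Sketch: the agent exports NVIDIA_VISIBLE_DEVICES for the task process, so
# with `--gpus 0` this should print "0"; outside the agent it may be unset.
def visible_gpus(env=None):
    env = os.environ if env is None else env
    return env.get("NVIDIA_VISIBLE_DEVICES", "<unset>")

print("NVIDIA_VISIBLE_DEVICES =", visible_gpus())
```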