Hi FunnyTurkey96
Which pip are you using, basically pip changed the dependency resolver after 20.1
Change: https://github.com/allegroai/clearml-agent/blob/aede6f4bac71c8fc56e7cf982318a48527953a3c/docs/clearml.conf#L57pip_version: "<20.2"
See if that helps
Basically it gives it direct access to the host, this is why it is considered less safe (access on other levels as well, like network)
SmarmySeaurchin8 regarding the original question:task.set_project(project_id)
Task.get_projects() to get all the project names/ids
SmugOx94 Yes, we just introduced it 🙂 with 0.16.3
Discussion was here (I'll make sure to update the issue that the version is out)
https://github.com/allegroai/trains/issues/222
In your trains.conf
add the following line:sdk.development.store_code_diff_from_remote = true
It will store the diff from the remote HEAD instead of the local one.
without the ClearML Server in-between.
You mean the upload/download is slow? What is the reasoning behind removing the ClearML server ?
ClearML Agent per step
You can use the ClearML agent to build a socker per Task, so all you need is just to run the docker. will that help ?
Maybe WackyRabbit7 is a better approach as you will get a new object (instead of the runtime copy that is being used)
Hi CurvedDolphin95
I would first check the free space on the instance (it might be that git is reporting an inaccurate error and it's free space not permission that causing it to fail the clone).
I would also check your GitHub account, notice that the now only support user/api-key (and not user/pass), which means you need to create an api-key and add it as your password in the clearml.conf.
Any chance that for some reason some of the Tasks are running from a diff user? or not using a docker ?
Hi JitteryCoyote63
Do you have a specific example in mind ?
Does a pipeline step behave differently?
Are you disabling it in the pipeline step ?
(disabling it for the pipeline Task has no effect on the pipeline steps themselves)
Quick update, I found the issue, working on a fix 🙂
JitteryCoyote63
I am setting up a new machine with two rtx 3070 GPU
Nice! you are one of the lucky few who managed to buy them 🙂
Which makes me think that the wrong torch package is installed
I think that torch 1.3.1 is does not support cuda 11 😞
I'm kind of at a point where I don't know a lot of what to even search for.
we feel you 💗 , yes there still isn't a very good source of information on where to get started...
This is because the entire field is constantly changing and evolving, and one solution will usually only apply to specific use case...
I would start with the mlops community slack channel, and youtube talks (specifically those by companies describe how they built their own internal infrastructure, i...
Yes 🙂 https://discuss.pytorch.org/t/shm-error-in-docker/22755
add either "--ipc=host" or "--shm-size= 8g " to the docker args (on the Task or globally in the clearml.conf extra_docker_args)
notice the 8g depends on the GPU
It seems to follow a structure specific to clearml,
Actually plotly.js 🙂
Hmm good question, I'm actually not sure if you can pass 24GB (this is not a limit on the GPU memory, this affects the memblock size, I think)
mostly out of curiosity, what is the motivation behind introducing this as an environment variable knob rather then a flag with some default in Task.init?
DepressedChimpanzee34 we will deprecate the demo server (not exactly sure when) as we have the free community one that gives better service and stores the data. It was originally set for easy on-boarding and testing, but I think that now the user experience might be better with using the community free tier.
Make sense ? btw: what ...
WackyRabbit7 basically starting v1.1 if you are running code without any configuration file, you will get an error (in contrast to previous versions where it defaulted to the demo-server)
I assume so 🙂 Datasets are kind of agnostic to the data itself, for the Dataset it's basically a file hierarchy
Hi JitteryCoyote63
Just making sure, the package itself it installed as part of the "Installed packages", and it also installs a command line utility ?
Regrading the project name:
set_project will support project_name in the next version 🙂 project_id=[p.id for p in Task.get_projects() if p.name==project_name][0]
it will constantly try to resend logs
Notice this happens in the background, in theory you will just get stderr messages when it fails to send but the training should continue
which to my understanding has to be given before a call to an argparser,
SmarmySeaurchin8 You can call argparse before Task.init, no worries it will catch the arguments and trains-agent
will be able to override them :)
Yes including this. (There was a fix to an issue with trains-agent
and disabling frameworks, it is already part of 0.16.3 )
I see it's a plotly plot, even though I report a matplotlib one
ClearML tries to convert matplotlib into plotly objects so they are interactive, it it fails it falls back into a static image as in matplotlib
Legit, if you have a cached_file (i.e. exists and accessible), you can return it to the caller