So this is very odd, it looks like a pip bug:
The agent is trying to install torch==2.1.0.*
because by default it ignores the 4th+ version parts (they are unstable and torch has a tendency to remove them), and for some reason pip will not match 2.1.0.*
with, for example, "2.1.0.dev20230306+cu118"
but based on the docs it should work:
see here: None
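If you want to check the matching yourself, here is a small sketch using the packaging library (this is my reading of PEP 440 pre-release handling, not a confirmed diagnosis of the pip issue):
from packaging.specifiers import SpecifierSet
from packaging.version import Version

spec = SpecifierSet("==2.1.0.*")
candidate = Version("2.1.0.dev20230306+cu118")

# dev builds count as pre-releases, so by default the specifier filters them out
print(spec.contains(candidate))                    # False
print(spec.contains(candidate, prereleases=True))  # True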
As a workaround you can always edit it and change it to the final URL, for example: so ...
ReassuredTiger98 no, but I might be missing something.
What do you mean by project-specific?
Seems like someone is sitting in the middle and rerouting the request (maybe both the https and the port)?!
Hi BroadMole98
A bit hacky but doable 🙂
task = Task.get_task(task_id='aabbcc')
task.get_logger().report_scalar(...)
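For reference, a fuller sketch of that hack (the task ID, metric names and values are placeholders):
from clearml import Task

# fetch an existing (even completed) task and report directly into it
task = Task.get_task(task_id='aabbcc')
logger = task.get_logger()
logger.report_scalar(title='loss', series='validation', value=0.123, iteration=10)
logger.flush()  # make sure the report is sent before the script exits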
yey 🙂 Notice that when executed by the agent, the call to execute_remotely
is skipped, and so is the if statement I added (since running_locally() will return False when the process is executed by the agent)
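If it helps, a minimal sketch of that pattern (project, task and queue names are placeholders):
from clearml import Task

task = Task.init(project_name='examples', task_name='remote run')

if Task.running_locally():
    # True only on the dev machine; when the agent runs the task this branch is skipped
    task.execute_remotely(queue_name='default', exit_process=True)

# from here on the code runs on the agent's machine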
Hi SubstantialElk6
where exactly in the log do you see the credentials ?
/tmp/.clearml_agent.234234e24s.cfg
What's the exact setup ? (I mean, are you using the glue? If that's the case, I think the temp config file is only created inside the pod/docker, so upon completion it will be deleted alongside the pod.)
Can you clone the git repo with the .ssh credentials on the host machine ?
If so, can you do the same manually inside a docker (i.e. spin up a docker with -v /home/hostuser/.ssh:/root/.ssh mounted) ?
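i.e. something along these lines (the image name and repo URL are placeholders):
docker run -it -v /home/hostuser/.ssh:/root/.ssh ubuntu:20.04 bash
# then inside the container (install git if the image does not have it):
apt-get update && apt-get install -y git
git clone git@github.com:your-org/your-repo.git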
Hi WackyRabbit7
I believe this is fixed in clearml-server 1.1 (this is a plotly color issue), releasing later today or tomorrow 🙂
Okay, I was able to reproduce it (this is odd) let me check ...
ClumsyElephant70
Can you manually run the same command ?
['python3.6', '-m', 'virtualenv', '/home/user/.clearml/venvs-builds/3.6']
Basically:
python3.6 -m virtualenv /home/user/.clearml/venvs-builds/3.6
and: " clearml_agent: ERROR: 'charmap' codec can't encode character '\u0303' in position 5717: character maps to <undefined>Β "
Ohh that's the issue with LC_ALL missing in the docker itself (i.e. unicode characters will break it)
Add locales into the container: in your clearml.conf
add the following:
agent.extra_docker_shell_script: ["apt-get install -y locales",]
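i.e. inside clearml.conf on the agent's machine it would look something like:
agent {
    # shell commands executed inside the docker before the task starts
    extra_docker_shell_script: ["apt-get install -y locales"]
}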
Let me know if that solves the issue (as you pointed out, it has nothing to do with importing package X)
Hmm, are you getting the warning on the client side, or in the clearml-server ?
I execute the clearml-session with the --docker flag.
This is to control the docker image the agent will spin up for you (think of the dev environment you want to work in, like the NVIDIA PyTorch container that already has everything you need)
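For example, something like this (the queue and image names are just placeholders):
clearml-session --queue default --docker nvcr.io/nvidia/pytorch:23.03-py3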
If the only issue is this line:
task.execute_remotely(..., exit_process=True)
It has to finish the static analysis of the entire repository (which usually happens in the background, but now we have to wait for it). If the repo is large this could actually take ~20 seconds (depending on the CPU/drive of the machine itself)
An example for something like spacy would be useful for the community.
That's awesome, any chance you can PR something? (no need for it to be perfect, we can take it from there)
with a remote machine where the code actually runs (you know, the PyCharm Pro remote).
Are you using the pycharm plugin ? (to sync the local git changes with clearml)
https://github.com/allegroai/clearml-pycharm-plugin
So this is Optuna 🙂 the idea is that it tests which parameters have potential (with early stopping), then launches a subset of the selected parameters
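Roughly, the flow with ClearML's optimizer looks like this sketch (the base task ID, parameter names, metric names and budgets are all placeholders):
from clearml.automation import HyperParameterOptimizer, UniformParameterRange, DiscreteParameterRange
from clearml.automation.optuna import OptimizerOptuna

optimizer = HyperParameterOptimizer(
    base_task_id='aabbcc',  # the template experiment to clone
    hyper_parameters=[
        UniformParameterRange('General/lr', min_value=1e-4, max_value=1e-1),
        DiscreteParameterRange('General/batch_size', values=[32, 64, 128]),
    ],
    objective_metric_title='validation',
    objective_metric_series='loss',
    objective_metric_sign='min',
    optimizer_class=OptimizerOptuna,  # Optuna picks promising combinations and prunes early
    execution_queue='default',
    max_number_of_concurrent_tasks=2,
    total_max_jobs=20,
)
optimizer.start()
optimizer.wait()
optimizer.stop()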
Could you give an example of such configurations ?
(e.g. what would be diff from one to another)
Hi ClumsyElephant70
What's the clearml version you are using ?
(The first error is a byproduct of a python process.Event being created before the forkserver is created, some internal python issue. I thought it was solved, let me take a look at the code you attached)
Where did you add the Task.init call ?
This task is picked up by the first agent; it runs the DDP launch script for itself and then creates clones of itself with task.create_function_task() and passes its address as an argument to the function
Hi UnevenHorse85
Interesting use case, just for my understanding, the idea is to use ClearML for the node allocation/scheduling and PyTorch DDP for the actual communication, is that correct ?
passes its address as an argument to the function
This seems like a great solution.
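If I read the flow correctly, a minimal sketch of that pattern (the function, task names and address are placeholders):
from clearml import Task

task = Task.init(project_name='examples', task_name='ddp main')

def worker(master_addr):
    # runs as its own ClearML task, picked up by another agent
    print('joining the DDP group at', master_addr)

# clone the current task as a function task and pass the master address to it
child = task.create_function_task(func=worker, func_name='ddp_worker', task_name='ddp worker',
                                  master_addr='10.0.0.1:29500')
# if the new task is created as a draft, enqueue it for another agent to pick up
Task.enqueue(child, queue_name='default')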
the queu...
Hi LivelyLion31
Yes, this is exactly why we designed Trains with an automagic integration: so users do not need to learn another package, and with almost no effort you get most of the benefits.
Regarding the TB files, from experience most users will use the TB files shortly after they executed the experiment, usually for debugging and in-depth capabilities (like the network debugger, profiler, etc.); metric view is something that is much easier to do on a centralized server (like on...
I'll try to go with this option, I think it's actually perfect for my needs
Great!