Reputation
Badges 1
25 × Eureka!Hi JitteryCoyote63
Could it be a python mismatch ? can you send the full log?
BTW: when I dopip3.8 install pytorch3d==
I get the following versions:pytorch3d== (from versions: 0.0.1, 0.1.1, 0.2.0, 0.2.5, 0.3.0)
Do people use ClearML with huggingface transformers? The code is std transformers code.
I believe they do 🙂
There is no real way to differentiate between, "storing model" using torch.save
and storing configuration ...
WackyRabbit7 I might be missing something here, but the pipeline itself should be launched on the "pipelines" queue, is the pipeline itself running? or is it the step itself that is stuck in ""queued" state?
and the step is "queued" or is it "queued" in the pipeline state (i.e. the visualization did not update) ?
Would love to just cap it at a fixed amount for a month for API calls.
Try the timeout configuration, I think this shoud solve all your issues, and will be fairly easy to set for everyone
You can do:task = Task.get_task(task_id='uuid_of_experiment')
task.get_logger().report_scalar(...)
Now the only question is who will create the initial Task, so that the others can report to it. Do you have like a "master" process ?
Where again does clearml place the venv?
Usually ~/.clearml/venvs-builds/<python version>/
Multiple agents will be venvs-builds.1
and so on
Thanks @<1523701868901961728:profile|ReassuredTiger98>
From the log this is what conda is installing, it should have worked
/tmp/conda_env1991w09m.yml:
channels:
- defaults
- conda-forge
- pytorch
dependencies:
- blas~=1.0
- bzip2~=1.0.8
- ca-certificates~=2020.10.14
- certifi~=2020.6.20
- cloudpickle~=1.6.0
- cudatoolkit~=11.1.1
- cycler~=0.10.0
- cytoolz~=0.11.0
- dask-core~=2021.2.0
- decorator~=4.4.2
- ffmpeg~=4.3
- freetype~=2.10.4
- gmp~=6.2.1
- gnutls~=3.6.13
- imageio~=2.9.0
-...
SmallDeer34 the function Task.get_models() incorrectly returned the input model "name" instead of the object itself. I'll make sure we push a fix.
I found a different solution (hardcoding the parent tasks by hand),
I have to wonder, how does that solve the issue ?
we have a separate cache
Why? they can share
The downstream stages are rankN scripts, they are waiting for the IP address of the first stage.
Is this like a multi-node training, rather than a pipeline ?
Thanks RipeGoose2 !
clearml logging starts from n+n (thats how it seems) for non explicit
I have to say it looks like the expected behavior , I think.
Basically matching the TB, no?
Hi @<1719524641879363584:profile|ThankfulClams64>
I am using ClearML Pro and pretty regularly I will restart an experiment and nothing will get logged to ClearML.
I use ClearML with pytorch 1.7.1, pytorch-lightning 1.2.2 and Tensorboard auto
All ClearML has the latest stable updates. (clearml 1.7.4, clearml-agent 1.7.2)
Is this still happening with the latest clearml ( clearml==1.16.3rc2
) ?
What is the TB version?
I remember a fix regrading lightining support
Also just making s...
Yes I was thinking a separate branch.
The main issue with telling git to skip submodules is that it will be easily forgotten and will break stuff. BTW the git repo itself is cached so the second time there is no actual pull. Lastly it's not clear on where one could pass a git argument per task. Wdyt?
Because submodules inside a git are basically a requirement for a git repo to run. Skipping over a few or selecting manually will break the agent. That said maybe shallow clone might be easier or faster. Regardless it should be an environment passed per Task. Feel free to add a GH issue request, if this is not a unique edge case we will add it
Hi @<1694157594333024256:profile|DisturbedParrot38>
You mean how to tell the agent to pull only some submodules of your git?
If this is the case you can actually remove them on your git branch, submodule is a file with a soft link. Wdyt?
Omg that's a lot of submodules!
It has nothing with what the tasks sees if you are inside a git repo you will have to cone it on the remote machine. Let me check in the code maybe you have a workaround
I double checked the code it's always being passed 😞
Hi RipeGoose2
Can you try with the latest from git ?pip install -U git+
None
See: Add an experiment hyperparameter:
and add gpu
: True
Hi @<1726410010763726848:profile|DistinctToad76>
Why not just report scalars, the x-axis you can use as "iterations" if this is a running in real time to collect the prompts.
If this is a summary then just report a scatter plot (you can also specify the names of the axis and the series)
None
This code will give you one graph titled "loss" with two series: (1) trains (2) loss
You can always click on the name of the series and remove it for display.
Why would you need three graphs?
What do you mean by "tag" / "sub-tags"?
Can my request be made as new feature so that we can tag same type of graphs under one main tag
Sure, open a Git Issue :)
Before this line, call Task.init
It will not create another 100 tasks, they will all use the main Task. Think of it as they "inherit" it from the main process. If the main process never created a task (i.e. no call to Tasl.init) then they will create their own tasks (i.e. each one will create its own task and you will end up with 100 tasks)
And you want all of them to log into the same experiment ? or do you want an experiment per 60sec (i.e. like the scheduler)