Hi GloriousPenguin2
Had to do some Linux updates and redeploy the ClearML server; now I can access the web UI & the service only if I do port-forwarding to that remote machine
So you are saying before you were able to directly browse to the server, but now you need a "jump box" ?
Okay, I was able to reproduce it (this is odd) let me check ...
This task is picked up by first agent; it runs DDP launch script for itself and then creates clones of itself with task.create_function_task() and passes its address as argument to the function
Hi UnevenHorse85
Interesting use case, just for my understanding, the idea is to use ClearML for the node allocation/scheduling and PyTorch DDP for the actual communication, is that correct ?
passes its address as argument to the function
This seems like a great solution.
the queu...
I was able to successfully enqueue the task, but only the entrypoint script is visible to it and nothing else.
So you passed a repository link and it did not show on the Task ?
What exactly is missing and how the Task was created ?
These are both specific cases of the glue, and yes both need to be fixed.
(1) I think is actually a feature, nonetheless we should support it.
FriendlySquid61 could you verify specifically on (2)
Hi ColossalDeer61 ,
the next trains-agent RC (solving the #196 issue) will also solve the double install issue
I assume every fit starts reporting from step 0 , so they override one another. Could it be?
WickedGoat98 is this related to plotly opening a web page when you call the show() method ?
You can do:
if not Task.running_locally():
    fig.show()
but I still need the load balancer ...
No, you are good to go. As long as someone registers the pods' IPs automatically on a DNS service (local/public) you can use the registered address instead of the IP itself (obviously with the port suffix)
Thanks for your support
With pleasure!
Ohh so you are saying you can store it properly, but only editing in the UI is limited ? (Maybe this is just a UI thing)
Hi TartBear70
I'm setting up reproducibility myself but when I call Task.init() the seed is changed
Correct
. Is it possible to tell clearml not to initialize any rng? It appears that task.set_random_seed() doesn't change anything.
I think this is now fixed (meaning should be part of the post weekend release)
. Is this documented?
Hmm i'm not sure (actually we should write it, maybe in Task.init docstring?)
Specifically the function that is being called is:
https://gi...
Hi GrotesqueDog77
and after some time I want to delete artifact with
You can simply upload with the same local file name and same artifact name, it will override the target storage. wdyt?
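As a toy illustration of that override semantics (the storage layout and function below are made up for illustration, not the clearml implementation):

```python
# Toy model of artifact override semantics: re-uploading under the same
# artifact name and local file name replaces the stored object instead of
# creating a duplicate entry.
storage = {}

def upload_artifact(name, local_path, content):
    # The target key is derived from artifact name + file name, so a
    # re-upload with the same pair overwrites the previous copy.
    storage[(name, local_path)] = content

upload_artifact("dataset", "data.csv", b"v1")
upload_artifact("dataset", "data.csv", b"v2")  # overrides the first upload
```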
But they are all running inside the same pod, correct ?
This is odd because the screen grab points to CUDA 10.2 ...
It just seems frozen at the place where it should be spinning up the tasks within the pipeline
And is there an agent for those ? usually there is one agent for running logic tasks (like pipelines) running with --services-mode
which means multiple Tasks can be executed by the same agent. And other agents for compute Tasks, which run a single Task per agent (but you can run multiple agents on the same machine)
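For example, the layout could look like this (queue names here are illustrative):

```shell
# One services-mode agent for lightweight logic Tasks (e.g. pipeline
# controllers); it can run multiple Tasks concurrently:
clearml-agent daemon --queue services --services-mode

# A regular agent for compute Tasks, one Task at a time:
clearml-agent daemon --queue default
```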
The function sends a delete request with a raise_on_errors=False flag.
Are you saying we should expose raise_on_errors in the _delete_artifacts() function itself?
If so, sure seems logic to me, any chance you want to PR it? (please just make sure the default value is still False so we keep backwards compatibility)
wdyt?
Is there a reason clearml will use the demo server when there is no ~/clearml.conf ?
It's the default server for easy getting started journey, e.g. you run some sample code and it works , with zero configuration.
that said, you can set an environment flag to disable the default server behavior: CLEARML_NO_DEFAULT_SERVER=1
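For example:

```shell
# Disable the default demo-server fallback before running your script:
export CLEARML_NO_DEFAULT_SERVER=1
```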
ReassuredTiger98
wdyt?
BTW:
it will push potentially proprietary data to the public demo server.
The server if su...
BTW: if you could implement _AzureBlobServiceStorageDriver
with the new Azure package, it will be great:
Basically update this class:
https://github.com/allegroai/clearml/blob/6c96e6017403d4b3f991f7401e68c9aa71d55aa5/clearml/storage/helper.py#L1620
I'm suggesting to make it public.
Actually I'm thinking of enabling users to register Drivers in runtime, expanding the capability to support any type of URL link, meaning you can register "azure://" with AzureDriver, and the StorageHelper will automatically use the driver you provide.
This will make sure Any part of the system will be able to transparently use any custom driver.
wdyt?
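A rough sketch of that runtime-registration idea (this is a hypothetical design illustration, not the ClearML StorageHelper API): map a URL scheme such as "azure" to a driver class, and resolve the right driver transparently from the link.

```python
# Hypothetical scheme-based driver registry: register a driver class per
# URL scheme, then resolve it transparently from any link.
from urllib.parse import urlparse

_DRIVERS = {}

def register_driver(scheme, driver_cls):
    """Register a storage driver class for a URL scheme, e.g. 'azure'."""
    _DRIVERS[scheme] = driver_cls

def get_driver_for(url):
    """Instantiate the driver registered for the URL's scheme."""
    scheme = urlparse(url).scheme
    try:
        return _DRIVERS[scheme]()
    except KeyError:
        raise ValueError("no driver registered for scheme: %s" % scheme)

class AzureDriver:
    """Stand-in for a driver implementing the storage interface."""
    def download(self, url):
        return "downloaded from %s" % url

register_driver("azure", AzureDriver)
driver = get_driver_for("azure://container/blob.bin")
```

With this in place, any part of the system can use a custom driver just by passing a URL with the registered scheme.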
hmmm I see...
It seems to miss the fact that your process does use the GPU.
Maybe the GPU usage only starts later?
Does that make sense ?
Yes, I find myself trying to select "points" on the overview tab. And I find myself wanting to see more interesting info in the tooltip.
Yep that's a very good point.
The Overview panel would be extremely well suited for the task of selecting a number of projects for comparing them.
So what you are saying, this could be a way to multi select experiments for detailed comparison (i.e. selecting the "dots" on the overview graph), is this what you had in mind?
WittyOwl57
To get Task IDs, use (e.g. all the tasks of a specific project):
task_ids = Task.query_tasks(project_name="examples", task_filter={'status': ["completed"]})
Then per task:
` for t_id in task_ids:
    t = Task.get_task(t_id)
    conf_dict = t.get_configuration_as_dict(name="filter")
    task_param = t.get_parameters()
    task_param['filter'] = conf_dict
    # this enables forcefully updating parameters post execution
    t.mark_started(force=True)
    # update hyper-parame...
. Is it possible for two agents to be utilizing the same GPU?
It is, as long as memory wise they do not limit one another.
(If you are using k8s and clearml enterprise, then it supports GPU slicing and dynamic memory allocation)
Hi WittyOwl57
I'm guessing clearml is trying to unify the histograms for each iteration, but the result is in this case not useful.
I think you are correct, the TB histograms are actually 3D histograms (i.e. 2D histograms over time, which would be the default for kernel/bias etc.)
is there a way to ungroup the result by iteration, and, is it possible to group it by something else (e.g. the tags of the two plots displayed below side by side).
Can you provide a toy example...
Try to upload something to the file server ?