BroadSeaturtle49 agent RC is out with a fix:
pip3 install clearml-agent==1.5.0rc0
Let me know if it solved the issue
Hi UnevenDolphin73
Can one compare experiments/tasks from different projects?
Yes, the easiest way is to go to the parent project ("all projects" if they have no common parent), then search for the specific Tasks (i.e. filter or use the search bar), then multi-select them.
wdyt?
FrothyShark37 any chance you can share snippet to reproduce?
Can you post here the actual line? seems like we can fix it to also support this scenario (if we could test it)
FrothyShark37 what was different in your script ?
Thanks FrothyShark37
I just verified, this would work as well. I suspect what was missing is the plt.show call; this is the actual call that triggers clearml
Hi FrothyShark37
Can you verify with the latest version?
pip install -U clearml
This one seems to work
from clearml import Task
task = Task.init(...)
import matplotlib.pyplot as plt
import numpy as np
plt.style.use('_mpl-gallery')
# make data:
np.random.seed(10)
D = np.random.normal((3, 5, 4), (0.75, 1.00, 0.75), (200, 3))
# plot:
fig, ax = plt.subplots()
vp = ax.violinplot(D, [2, 4, 6], widths=2,
                   showmeans=False, showmedians=False, showextrema=False)
# styling:
for body in vp['bodies']:
    body.set_alpha(0.9)
ax.set(xlim=(0, 8), xticks=np.arang...
Should not be complicated, it's basically here
https://github.com/allegroai/clearml/blob/1eee271f01a141e41542296ef4649eeead2e7284/clearml/task.py#L2763
wdyt?
Hi GrievingTurkey78 yes, /opt/clearml should contain everything.
That said, back up only after you spin down the DBs, so they serialize everything.
but this will be invoked before fil-profiler starts generating them
I thought it would flush in the background :)
You can however configure the profiler to a specific folder, then mount the folder to the host machine:
In the "base docker args" section add -v /host/folder/for/profiler:/inside/container/profile
Thanks FlutteringWorm14, checking :)
DepressedChimpanzee34 something along the lines of:
from multiprocessing.pool import ThreadPool
p = ThreadPool()
def get_last_metric(t):
    return t.get_last_scalar_metrics()
task_scalars_list = p.map(get_last_metric, top_tasks)
p.close()
We parallelized network connection as I'm assuming the delay is fetching
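A self-contained sketch of the thread-pool pattern above, so it can be tried without a server. `FakeTask` is a hypothetical stand-in for a Task object and is not part of ClearML. Only the `get_last_scalar_metrics()` call comes from the message. The shape of the returned dict and the sleep that simulates network latency are assumptions.

```python
import time
from multiprocessing.pool import ThreadPool


class FakeTask:
    """Hypothetical stand-in for a Task object (not a ClearML class)."""

    def __init__(self, name, value):
        self.name = name
        self._value = value

    def get_last_scalar_metrics(self):
        time.sleep(0.05)  # simulate the network round trip
        # Assumed shape: title -> series -> {"last": value}
        return {"loss": {"train": {"last": self._value}}}


def get_last_metric(t):
    return t.get_last_scalar_metrics()


def fetch_all(tasks, workers=8):
    # Threads suit this job because the time is spent waiting on I/O.
    # map() returns results in the same order as the input tasks.
    with ThreadPool(workers) as pool:
        return pool.map(get_last_metric, tasks)


top_tasks = [FakeTask(f"task-{i}", i / 10) for i in range(8)]
task_scalars_list = fetch_all(top_tasks)
print(task_scalars_list[2])
```

With eight workers the eight simulated requests overlap, so the total wait is close to a single round trip instead of eight.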
PompousParrot44
Check out the task.execute_remotely()
You can call it right after the task init, and it will enqueue your running Task, and leave the process (if you want).
https://github.com/allegroai/trains/blob/65a4aa7aa90fc867993cf0d5e36c214e6c044270/trains/task.py#L1437
Then by default it is the home folder (`~/.clearml`) that is running out of free space
DilapidatedDucks58 so is this more like a pipeline DAG that is built ?
I'm assuming this is more than just grouping ?
(by that I mean, accessing a Task's artifact does necessarily point to a "connection", no? Is it a single Task everyone is accessing, or a "type" of a Task?)
Is this process fixed, i.e. for a certain project we have a flow: (1) execute a Task of type (A), then (2) a Task of type (B) using the artifacts from Task (A). This implies we might have multiple Tasks of types A/B but they are alw...
Is it possible to substitute these steps using containers instead.
I'm not sure I follow, could you expand ?
DefiantHippopotamus88 you are sending the curl to the wrong port, it should be 9090 on your setup (based on what I remember from the unified docker compose)
Like what would be the exact query given an endpoint, for requests per sec.
You mean in Grafana ?
Please hit Ctrl-F5 to refresh the entire page, and see if it is still empty...
Woot woot
ChubbyLouse32 when you get it working please PR it, this is very very cool!
(I'll be happy to help :) )
Task.current_task().connect(training_args, name='hugggingface args')
And you should be able to change them when launching remotely :)
SmallDeer34 btw: "set_parameters_as_dict" will replace all the arguments (and is one way) ...
Hurrah Hurrah
PricklyJellyfish35
Do you mean the original OmegaConf, before the overrides ? or the configuration files used to create the OmegaConf ?
GiganticTurtle0 fix was just pushed to GitHub :)
pip install git+
Train Data Params/a = {}
Train Data Params/b = ...
Then maybe we could "hack" it so that if you edit it in the UI like so:
Train Data Params/a = {'new': 'value'}
Train Data Params/b = ...
You end up with
param = {'a': {'new': 'value'}, 'b' : ... }
What do you think?
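To make the proposal concrete, here is a minimal sketch of the mapping from flat "Section/key" entries back to a nested dict. It is only an illustration of the idea, not how ClearML implements it. The helper names `parse_value` and `unflatten` are hypothetical, and it assumes the edited values come back from the UI as strings.

```python
import ast


def parse_value(text):
    """Turn "{'new': 'value'}" or "3" into Python objects; keep other text as-is."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def unflatten(flat, section):
    """Collect the 'section/key' entries of `flat` into one nested dict."""
    prefix = section + "/"
    nested = {}
    for full_key, raw in flat.items():
        if not full_key.startswith(prefix):
            continue  # entry belongs to another section
        *parents, leaf = full_key[len(prefix):].split("/")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = parse_value(raw)
    return nested


# Values as they might look after editing (assumed to be strings).
ui_values = {
    "Train Data Params/a": "{'new': 'value'}",
    "Train Data Params/b": "3",
    "Other/x": "1",
}
param = unflatten(ui_values, "Train Data Params")
print(param)
```

`ast.literal_eval` is used instead of `eval` so that only Python literals are accepted from the edited text.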
Okay. And 110 means 11.1 and not 11.0?
110 means 11.0, the odd thing is, it actually installed 11.1, and from the pytorch website this is exactly how they suggest to install with conda...
Let me know if forcing the CUDA version changes anything
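The compact version code discussed above can be decoded with a tiny helper. This is a sketch that assumes the last digit is always the minor version, which matches the "110 means 11.0" reading in this thread. The function name is hypothetical.

```python
def cuda_code_to_version(code):
    """Decode a compact CUDA code such as '110' into '11.0'.

    Assumes a single-digit minor version (the last character).
    """
    code = str(code)
    if not code.isdigit() or len(code) < 2:
        raise ValueError(f"unexpected CUDA version code: {code!r}")
    major, minor = code[:-1], code[-1]
    return f"{int(major)}.{minor}"


for code in ("110", "111", "101"):
    print(code, "->", cuda_code_to_version(code))
```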
Okay, what you can do is the following:
assuming you want to launch task id aabb12
The actual slurm command will be:
trains-agent execute --full-monitoring --id aabb12
You can test it on your local machine as well.
Make sure the trains.conf is available in the slurm job
(use trains-agent --config-file to point to a globally shared one)
What do you think?
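As a rough sketch, the command line for the slurm job could be assembled like this. The helper name and the `/shared/trains.conf` path are made up for illustration. The flags are the ones quoted above, with `--config-file` placed before the `execute` subcommand.

```python
import shlex


def agent_execute_command(task_id, config_file=None):
    """Build the trains-agent command that a slurm job would run."""
    cmd = ["trains-agent"]
    if config_file:
        # Point every node at one shared configuration file.
        cmd += ["--config-file", config_file]
    cmd += ["execute", "--full-monitoring", "--id", task_id]
    return shlex.join(cmd)  # quotes arguments where the shell needs it


print(agent_execute_command("aabb12"))
print(agent_execute_command("aabb12", config_file="/shared/trains.conf"))
```

The resulting string can be pasted into the job script, or run locally first to check that the Task executes.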
ZanyPig66 you are correct in your assumptions. What exactly do you have in the Task? If there is no git repo, the entire script should be under "uncommitted changes". What is your case?