That does happen when you create a normal local task, that's why I was confused
The parts that are not passed in both cases are the configurations from the conf file. Only the environment is passed (e.g. git, python packages, etc.). For example, if you have storage credentials in your conf file, they are not passed to a remote agent; instead, the credentials from the remote agent's own conf file are used when it runs the task.
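For instance, a sketch of the relevant clearml.conf section on the agent machine (the key names under sdk.aws.s3 are the standard ones, the values here are placeholders):
` # clearml.conf on the agent machine
sdk {
    aws {
        s3 {
            # these are the credentials actually used at runtime,
            # not the ones from the machine that created the task
            key: "AGENT_ACCESS_KEY"
            secret: "AGENT_SECRET_KEY"
        }
    }
} `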
make sense?
PricklyRaven28 basically this is the issue:
python -m fastai.launch <script>
There are multiple copies of the script running, but they are Not aware of one another.
are you getting any reporting from the different GPUs? I'm assuming there is a hidden OS environment variable that signals the "master" node, so all processes can communicate with it. This is what we should automatically capture. There is a workaround for fastai.launch that is probably similar to this one:
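A minimal sketch of what I mean; it assumes fastai.launch (like torch.distributed.launch) exposes a RANK environment variable on every copy of the script:
` import os
from clearml import Task

# every launched copy of the script runs this code;
# only the "master" copy (rank 0) creates and reports the Task
if int(os.environ.get("RANK", "0")) == 0:
    task = Task.init(project_name="examples", task_name="multi-gpu run") `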
When you log in with user/pass in the UI, the same "process" happens and you get back a token to work with; this is the same as secret/key.
Since in both cases you provide credentials and get back an access token, it should work.
(This is of course only if you are setting user/pass manually and disabling pass_hashed, as you have)
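Roughly this flow, as a sketch using the auth.login endpoint (the server URL and credentials are placeholders, and the exact response layout may differ):
` import requests

# exchange credentials for a token, the same flow the UI uses
resp = requests.get(
    "https://api.your-clearml-server.com/auth.login",
    auth=("user_or_access_key", "pass_or_secret_key"),
)
resp.raise_for_status()
token = resp.json()["data"]["token"] `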
Hmm okay let me check that, I think I understand the issue
Hi RobustHippopotamus53
The way "latest from branch" works:
On the Task you specify the branch name (e.g. "master", no need to add the origin/ prefix). The agent then pulls the latest commit from that branch and updates the Task back to the current commit ID (the latest on the branch at the time of execution). This process ensures reproducibility and traceability, as we can always be certain of the exact commit that was executed. Could it be that you "forced-push" a commit/squash, hence the "origina...
suppose I have an S3 bucket where my data is stored and I wish to transfer it to ClearML file server.
Then you first have to download the entire bucket locally, then register the local copy.
Basically:
StorageManager.download_folder("s3://my-bucket/my-data", "/target/folder")  # the bucket URL is a placeholder
# now register the local "/target/folder" with Dataset.add_files
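If it helps, the full flow as a sketch (bucket URL and project/dataset names are placeholders):
` from clearml import StorageManager, Dataset

# download the bucket contents locally
StorageManager.download_folder("s3://my-bucket/my-data", "/target/folder")

# register the local copy as a new dataset
ds = Dataset.create(dataset_project="my_project", dataset_name="my_dataset")
ds.add_files("/target/folder")
ds.upload()    # uploads to the ClearML file server by default
ds.finalize() `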
BoredHedgehog47 you need to make sure "<path here>/train.py" also calls Task.init (again no need to worry about calling it twice with different project/name)
The Task.init call will make sure the auto-connect works.
BTW: if you do os.fork, then there is no need for the Task.init call; the main difference is that Popen starts a whole new process, and we need to make sure the newly created process is auto-connected as well (i.e. by calling Task.init)
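As a sketch (the file names and project/task names are hypothetical):
` # parent.py
import subprocess
from clearml import Task

task = Task.init(project_name="examples", task_name="my run")
# Popen creates a brand-new process, so the child script must call Task.init itself
subprocess.Popen(["python", "train.py"]).wait()

# train.py
from clearml import Task

# safe to call again; it attaches to the already-created task
task = Task.init(project_name="examples", task_name="my run") `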
Hi @<1544853695869489152:profile|NonchalantOx99>
I would assume the clearml-server configuration / access key is misconfigured in your copy of example.env
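Worth double-checking the credentials lines, something like this (a sketch; the variable names follow the clearml-serving example.env, the values are placeholders):
` # example.env
CLEARML_WEB_HOST="https://app.your-clearml-server.com"
CLEARML_API_HOST="https://api.your-clearml-server.com"
CLEARML_FILES_HOST="https://files.your-clearml-server.com"
CLEARML_API_ACCESS_KEY="<your access key>"
CLEARML_API_SECRET_KEY="<your secret key>" `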
This is assuming you can just run two copies of your code, and they will become aware of one another.
so for example, if there was an idle GPU and Q3 takes it, and then a task comes to Q2 for which we specified 3 GPUs, but now Q3 has taken some of those GPUs, what will happen?
This is a standard "race": the first one to come will "grab" the GPU, and the other will wait for it.
I'm pretty sure the enterprise edition has preemption support, but this is not currently part of the open source version (BTW: the dynamic GPU allocation is also, I think, part of the enterprise tier; in the opensource ...
Also, could you explain the difference between trigger.start() and trigger.start_remotely()?
Start will start the trigger process (the one "watching the changes") locally (this makes sense for debugging etc.)
start_remotely will launch the trigger process on the "services" where it should live forever 🙂
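For example, a sketch along the lines of the trigger example in the repo (the task ID, queue, and tag names are placeholders):
` from clearml.automation import TriggerScheduler

trigger = TriggerScheduler(pooling_frequency_minutes=3)
trigger.add_dataset_trigger(
    schedule_task_id="<template-task-id>",
    schedule_queue="default",
    trigger_project="datasets",
    trigger_on_tags=["ready"],
)

trigger.start()  # run locally, good for debugging
# trigger.start_remotely(queue="services")  # or launch on the services queue `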
Okay so when I add trigger_on_tags, the repetition issue is resolved.
Nice!
This problem occurs when I'm scheduling a task. Copies of the task keep being put on the queue ...
Hi VexedCat68
Check this example:
https://github.com/allegroai/clearml/blob/4f9aaa69ed2d5b8ea68ebee5508610d0b1935d5f/examples/scheduler/trigger_example.py#L44
DeterminedCrab71 that is a good point, how does plotly adjust for nans on graphs?
Or am I forced to do a get, check if the latest version is finalized,
A Dataset must be finalized before you can use it. The only situation where it is not finalized is while you are still in the "upload" state.
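Checking that is a one-liner, e.g. (sketch, project/dataset names are placeholders):
` from clearml import Dataset

ds = Dataset.get(dataset_project="project", dataset_name="name")  # latest version by default
assert ds.is_final()  # a finalized dataset can safely be used / extended `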
, then increment the version of that dataset and create my new version?
I'm assuming there is a data processing pipeline pushing new data?! How do you know you have new data to push?
VexedCat68
But what's happening is that I only publish a dataset once, but every time it polls,
this seems wrong (i.e. a bug?!), how do you set up the trigger? Is the Trigger Task constantly running, or are you re-launching it?
Is it only for modified changes and not untracked files?
basically everything that "git diff" will output.
Then the agent will re-apply it on a remote machine
Create a new version of the dataset by choosing what increment in SEMVER standard I would like to add for this version number (major/minor/patch) and upload
Oh, this is already there:
` cur_ds = Dataset.get(dataset_project="project", dataset_name="name")
# if dataset_version is not given, it auto-increases based on semantic
# versioning, incrementing the last number: 1.2.3 -> 1.2.4
new_ds = Dataset.create(dataset_project="project", dataset_name="name", parent_datasets=[cur_ds.id]) `
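and then the usual flow on the new version (sketch, the path is a placeholder):
` new_ds.add_files("/path/to/new/files")
new_ds.upload()
new_ds.finalize() `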
Back to the feature request: if this is taken care of (both adding a missed package and the S3 upload), do you still believe there is room for this kind of feature?
I think RoughTiger69 was discussing this exact scenario
https://clearml.slack.com/archives/CTK20V944/p1629885416175500?thread_ts=1629881415.172600&cid=CTK20V944
wdyt?
Yes... I think that this might be a bit much automagic even for clearml 😄
Thanks NonchalantDeer14 !
BTW: how do you submit the multi-GPU job? Is it multi-GPU or multi-node?
They all "inherit" the same user / environment from one another
CourageousLizard33 column order / specific selection is stored per user. If you press the share button you will have a link with all the definitions embedded in it.
Column resizing and order is in the next version release :)
Hi YummyMoth34 they will keep on trying to send reports.
I think they try for at least several hours.
I'll make sure we have conda ignore git:// packages, and pass them to the second pip stage.
Hi EnviousStarfish54
Color coding of the entire UI is stored per user (I think in your local cookies, but I might be wrong). Anyhow, any title/series combination will have the selected color regardless of the project.
This way you can configure once that loss is red and accuracy is green, etc.
Hi PungentLouse55 ,
Yes, we have had integration with Hydra on the todo list since it was first released; we actually know the guy behind Hydra, he is awesome!
What are your thoughts on integration? We would love to get feedback and pointers (Hydra itself is quite capable, and we were waiting until we had multiple-configuration support; with v0.16 it was added, so now it is actually possible).
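In the meantime, manually logging the Hydra config is straightforward; a minimal sketch (the config path/name and project/task names are hypothetical):
` import hydra
from omegaconf import DictConfig, OmegaConf
from clearml import Task

@hydra.main(config_path="conf", config_name="config")
def main(cfg: DictConfig):
    task = Task.init(project_name="examples", task_name="hydra run")
    # connect the resolved config so it shows up in the UI
    task.connect(OmegaConf.to_container(cfg, resolve=True), name="hydra config")

if __name__ == "__main__":
    main() `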
So, good news: (1) a dashboard is being worked on as we speak. (2) We released clearml-serving doing exactly that; the next release of clearml-serving will include integration with kfserving (under the hood), essentially managing the serving endpoints on top of the k8s cluster. Wdyt?
it seems it's following the path of the script I'm using in task.create, e.g.:
The folder it should run in is derived from the script path you are passing (i.e. "script=ep_fn," )
A wrong path would imply that it is not finding the correct repository; is that the case?
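For reference, a sketch of being explicit about it (the repo URL, paths, and names are placeholders):
` from clearml import Task

task = Task.create(
    project_name="examples",
    task_name="remote run",
    repo="https://github.com/user/repo.git",
    script="path/inside/repo/train.py",
    working_directory=".",  # relative to the repository root
) `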