The configuration tab -> configuration objects -> pipeline is empty
That's the reason it is doing nothing 😞
How come it is empty if you cloned the local one?
LudicrousParrot69
I "think" I have a better handle on what you wish to do.
Is it a kind of generic "serving" solution?
FYI:
A model artifact is usually a weights/model file; the idea is that later you will be able to access it and serve it. Now the problem is (and I think this is what you are referring to) that there is usually a specific piece of code tied to that model that knows how to use it (a.k.a. pyfunc)
A few ideas:
These days everyone is trying to build their models with a generic interface, so that scik...
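For context, a minimal sketch of what that model-specific piece of code usually looks like (names here are hypothetical, not any specific library's API):
```
import pickle


class ModelWrapper:
    """Hypothetical generic serving interface: the 'piece of code tied to the model'."""

    def __init__(self, weights_path):
        # load the weights/model artifact that was produced at training time
        with open(weights_path, "rb") as f:
            self.model = pickle.load(f)

    def predict(self, inputs):
        # model-specific pre/post processing lives here, so any serving
        # layer only needs to know about load + predict
        return self.model.predict(inputs)
```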
By your description it seems to make no difference whether I added the files via sync or add, since I will have to create a new dataset either way.
Sync is designed to take local folder(s) and add/remove files from a dataset based on the local changes (it does that automatically, based on file existence/content)
The changes (i.e. added files) are uploaded as delta changes relative to the parent version, this means we are not always uploading all files.
Add on the other hand means you...
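To make the difference concrete, here is a minimal sketch using the clearml Dataset API (project/dataset names, ids and paths are placeholders):
```
from clearml import Dataset

# sync: diff a local folder against the parent version, add/remove accordingly
ds = Dataset.create(
    dataset_name="my_dataset", dataset_project="datasets",
    parent_datasets=["<parent_dataset_id>"],
)
ds.sync_folder(local_path="/data/my_folder")  # computes the delta automatically

# add: you explicitly state which files go into the new version
# ds.add_files(path="/data/new_files")

ds.upload()    # only the delta relative to the parent is uploaded
ds.finalize()
```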
Sure:
```
task = Task.init(..., auto_connect_arg_parser={'arg_not_to_log': False})
```
This will cause all argparse arguments to be automatically logged (and later editable), with the exception of the argument arg_not_to_log
Notice that if you have --arg-something, to exclude it add 'arg_something': False to the dict
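Putting it together, a minimal sketch (the argument names are just examples):
```
import argparse

from clearml import Task

parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.01)
parser.add_argument("--arg-not-to-log", default="secret")

# everything is auto-logged except the excluded argument
# (note: '--arg-not-to-log' becomes 'arg_not_to_log' in the dict)
task = Task.init(
    project_name="examples", task_name="argparse exclude",
    auto_connect_arg_parser={"arg_not_to_log": False},
)
args = parser.parse_args()
```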
VictoriousPenguin97 basically spin down serverA (this should flush all DBs), then copy /opt/clearml to the new server and spin it up with docker-compose. As long as the new server is on the same address as the previous one, everything should work out of the box
Hmm, might be, check if your files server is running and configured properly
Hi UnevenBee3
the optuna study is stored on the optuna class
https://github.com/allegroai/clearml/blob/fcad50b6266f445424a1f1fb361f5a4bc5c7f6a3/clearml/automation/optuna/optuna.py#L186
And actually you could store and restore it
https://github.com/allegroai/clearml/blob/fcad50b6266f445424a1f1fb361f5a4bc5c7f6a3/clearml/automation/optuna/optuna.py#L104
I think we should improve the interface though, maybe also add get_study(), wdyt?
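In the meantime, assuming you can get a handle on the study object (e.g. via a future get_study()), a sketch of persisting/restoring it with joblib, which is the approach from Optuna's own FAQ:
```
import joblib
import optuna

# persist a study between runs
study = optuna.create_study(direction="minimize")
joblib.dump(study, "study.pkl")

# ...later, restore it and keep optimizing from where it stopped
study = joblib.load("study.pkl")
```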
Assuming the git repo looks something like:
```
.git
readme.txt
module
 |
 +---- script.py
```
The working directory should be "."
The script path should be: "-m module.script"
And under the Configuration/Args, you should have:
```
args1 = value
args2 = another_value
```
Make sense?
Hi MortifiedCrow63
Sorry getting GS credentials is taking longer than expected 🙂
Nonetheless it should not be an issue (model upload is essentially using the same StorageManager internally)
Oh :)
```
task.get_parameters_as_dict()
```
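For example (the task id is a placeholder):
```
from clearml import Task

task = Task.get_task(task_id="<task_id>")
params = task.get_parameters_as_dict()
print(params.get("Args", {}))  # e.g. the argparse section
```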
Additionally, what I found is that the clearml==1.0.5 package is able to find these partial changes; newer versions find nothing at all, maybe because they are always comparing against the remote
Hmm it was always from remote...
It is actually doing the following:
```
git rev-parse --abbrev-ref --symbolic-full-name @{u}
```
Then, with the branch name output:
```
git diff --submodule=diff <add_branch_name_here>
```
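If you want to reproduce the same check locally, a rough Python equivalent (a sketch only; clearml's actual implementation differs):
```
import subprocess

def git(*args):
    return subprocess.check_output(["git", *args], text=True).strip()

# resolve the upstream (remote tracking) branch ...
upstream = git("rev-parse", "--abbrev-ref", "--symbolic-full-name", "@{u}")
# ... then diff the working tree (including submodules) against it
diff = git("diff", "--submodule=diff", upstream)
print(diff or "no changes detected")
```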
As I also noticed that uploads are sometimes slow, and I see here max_connections=2
Makes sense to me, please go ahead and add that as well (basically the same thing on _AzureBlobServiceStorageDriver.upload_object, and an additional variable on the AzureContainerConfigurations class).
Could you PR a tested draft ? we will be able to take from there
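For clarity, a rough, hypothetical sketch of the direction (the actual clearml storage code differs; names and signatures here are only illustrative):
```
# hypothetical sketch only, not the actual clearml implementation
class AzureContainerConfigurations:
    def __init__(self, account_name, account_key, max_connections=2):
        self.account_name = account_name
        self.account_key = account_key
        # new: expose the parallel-upload connection count instead of hard-coding it
        self.max_connections = max_connections
```
_AzureBlobServiceStorageDriver.upload_object would then pass config.max_connections through to the Azure SDK upload call instead of the hard-coded 2.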
Hi @<1631102016807768064:profile|ZanySealion18>
ClearML (remote execution) sometimes doesn't "pick-up" GPU. After I rerun the task it picks it up.
What do you mean by "does not pick up"? Is it that the container is up but not executed with --gpus, so there is no GPU access?
Hi SubstantialElk6
where exactly in the log do you see the credentials ?
/tmp/.clearml_agent.234234e24s.cfg
What's the exact setup? (I mean, are you using the glue? If that's the case, I think the temp config file is only created inside the pod/docker, so upon completion it will be deleted along with the pod.)
Actually I saw that the `RuntimeError: context has already been set` appears when the task is initialised outside `if __name__ == "__main__":`
Is this when you execute the code manually, or when the agent runs it?
Also what's the OS of your machine/ agent ?
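In other words, the safe pattern is roughly:
```
from clearml import Task

def main():
    task = Task.init(project_name="examples", task_name="mp safe")
    # ... training code that may spawn worker processes ...

if __name__ == "__main__":
    # initialising inside the main guard avoids the
    # "context has already been set" error when multiprocessing
    # re-imports the module in spawned processes
    main()
```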
Hi JitteryCoyote63
So that I could simply do
task._update_requirements(".[train]")
but when I do this, the clearml agent (latest version) does not try to grab the matching cuda version, it only takes the cpu version. Is it a known bug?
The easiest way to go about it is to add:
```
Task.add_requirements("torch", "==1.11.0")
task = Task.init(...)
```
Then it will auto-detect your custom package, and will always add the torch version. The main issue with relying on the package...
WickedGoat98
The trains-agent-services docker is always CPU; the idea is to put long-lasting services there (like the auto-cleanup, Slack integration, HPO, etc.)
To spin an agent with GPU on any machine (regardless of where the trains-server is) you can check the trains-agent readme.
https://github.com/allegroai/trains-agent#running-the-trains-agent
DeliciousBluewhale87 this is exactly how it works,
The glue creates a k8s job with the requested docker image (the one on the Task); the job itself (the k8s job) starts the agent inside the requested docker, and the agent inside the docker then installs all the required packages.
BTW, we figured out that the `'` belongs to the echo
yep, when seeing the full command it is apparent
So if you are using the latest clearml (i.e. v1.3+), re-enqueuing the pipeline will automatically continue it from where it stopped.
With previous versions (which is your case, I think), you clone the pipeline Task, change the parameter and enqueue it.
(The state itself of the pipeline is stored on the Task, and when you clone it, you are cloning the state as well).
Make sense ?
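For the older-version route, a minimal sketch (the task id, parameter name and queue are placeholders):
```
from clearml import Task

# clone the pipeline controller task (the state is cloned with it)
cloned = Task.clone(source_task="<pipeline_task_id>")
# change whatever parameter you need
cloned.set_parameters({"Args/my_parameter": "new_value"})
# and enqueue the clone
Task.enqueue(cloned, queue_name="services")
```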
BurlyRaccoon64 by default, if .ssh exists in the host user folder, it should be mounted into the container (actually a copy of it is mounted). Do you have a log of two tasks from two different machines, one failing, one passing? Because this is quite odd (assuming the setup itself is identical)
ElegantCoyote26 could be, if the Task run is under 30sec?!
GrittyKangaroo27 any chance you can open a GitHub issue so this is not forgotten ?
(BTW: I think 1.1.6 is going to be released later today; then we will have a few RCs with improvements on the pipeline, and I will make sure we add that as well)
Okay, so you want to take the jupyter notebook (aka colab) and have that experiment show on Trains, then use the Trains UI to launch it remotely on one of the machines running the trains-agent. Is that correct?
Hi @<1688721797135994880:profile|ThoughtfulPeacock83>
the configuration vault parameters of a pipeline step with the add_function_step method?
The configuration vault is set per user/project/company and applied at execution time.
What would be the value you need to override ? and what is the use case?
SmugOx94 could you please open a GitHub issue with this request, otherwise we might forget 🙂
We might also get some feedback from other users
and if you add --skip-task-init ?
I think what happens is that clearml-task adds a Task.init call without the output_uri, which is called before "your" Task.init, and this is what causes it to be ignored. Could that be the case?
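If that's the case, with --skip-task-init your own call should be the first one, so the output_uri takes effect, e.g. (the bucket is a placeholder):
```
from clearml import Task

# call this before anything else creates/initialises the Task
task = Task.init(
    project_name="examples", task_name="my task",
    output_uri="s3://my-bucket/models",  # placeholder destination
)
```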
Ohh, I see now: the force SSH did not replace the user in the SSH link (it only does if the original was http), right?