Ohh... I would not delete them then ... 😞
Maybe some kind of heuristic (e.g. files created more than a week ago can be deleted)?
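A minimal sketch of such a heuristic, assuming the file's last-modification time is a good enough proxy for "created" and that we only list candidates instead of deleting them. The function name `files_older_than` and the 7-day default are made up for illustration:

```python
import time
from pathlib import Path


def files_older_than(folder, days=7, now=None):
    """Return the files under `folder` last modified more than `days` days ago."""
    now = time.time() if now is None else now
    cutoff = now - days * 24 * 60 * 60
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )
```

Review the returned list before deleting anything, since this only looks at timestamps and knows nothing about which Tasks still reference the files.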
Hi LazyLeopard18 ,
So long story short, yes it does.
Longer version: to really accomplish full federated learning with control over data at the "compute points" you need some data abstraction layer. Without a data abstraction layer, federated learning is just averaging derivatives from different locations, which can easily be done with any distributed learning framework, such as Horovod, PyTorch distributed or TF distributed.
If what you are after is, can I launch multiple experiments with the sam...
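To make the "just averaging derivatives" point concrete, here is a plain-Python sketch of the averaging step. It is an illustration only and not how Horovod, PyTorch distributed or TF distributed implement it; `average_gradients` is a hypothetical helper and each site is assumed to send a gradient vector of the same length:

```python
def average_gradients(per_site_gradients):
    """Element-wise mean of gradient vectors, one vector per site."""
    if not per_site_gradients:
        raise ValueError("need at least one gradient vector")
    length = len(per_site_gradients[0])
    if any(len(g) != length for g in per_site_gradients):
        raise ValueError("all gradient vectors must have the same length")
    n_sites = len(per_site_gradients)
    # zip(*...) groups the i-th component of every site's gradient together
    return [sum(values) / n_sites for values in zip(*per_site_gradients)]
```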
Hi @<1556450111259676672:profile|PlainSeaurchin97>
Is there any simple way to use `argparse` to pass a clearml task name?
need to call `args = task.connect(args)`.
noooo 🙂 there is no need to do that, the arguments are automatically detected
see for yourself
args = parse_args()
task = Task.init(task_name=args.task_name)
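A slightly fuller sketch of the same idea, for reference. The `--task-name` and `--lr` flags, their defaults and the `"examples"` project name are all placeholders, and `main()` needs the `clearml` package plus a configured server to actually run:

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # hypothetical flag: the name the Task should get in the UI
    parser.add_argument("--task-name", default="my experiment")
    # hypothetical hyperparameter, just to have something to log
    parser.add_argument("--lr", type=float, default=0.01)
    return parser.parse_args(argv)


def main(argv=None):
    from clearml import Task  # imported here so parse_args() works without clearml

    args = parse_args(argv)
    # no task.connect(args) call, as explained above
    task = Task.init(project_name="examples", task_name=args.task_name)
    return task
```

Running the script with `--task-name "my run"` and calling `main()` should then create a Task with that name.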
Hi @<1729309120315527168:profile|ShallowLion60>
How did you create those credentials ?
Sorry my bad: config_obj['sdk']['stuff']['here'] = value
Hi @<1690896098534625280:profile|NarrowWoodpecker99>
Once a model is loaded into GPU memory for the first time, does it stay loaded across subsequent requests,
yes it does.
Are there configuration options available that allow us to control this behavior?
I'm assuming you're thinking of dynamically loading/unloading models from memory based on requests?
I wish Triton added that 🙂 this is not trivial, and in reality, to be fast enough, the model has to live in RAM and then be moved to GPU (...
It seems there is some async behavior going on. After ending a run, this prompt just hangs for a long time:
2021-04-18 22:55:06,467 - clearml.Task - INFO - Waiting to finish uploads
And there's no sign of updates on the dashboard
Hmm that could point to an issue uploading the last images (which are larger than regular scalars), could you try flushing and waiting?
i.e. `task.flush()` followed by `sleep(45)`
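Spelled out as a small helper, assuming `task` is the object returned by `Task.init(...)`. The `flush_and_wait` name is made up and the 45 seconds is just the value suggested above, not a recommended setting:

```python
import time


def flush_and_wait(task, seconds=45):
    """Ask the task to flush pending reports, then give uploads time to finish."""
    task.flush()
    time.sleep(seconds)
```

Call it as the last line of the script, e.g. `flush_and_wait(task)`, and check whether the images show up on the dashboard afterwards.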
UnevenDolphin73 I have a suspicion we have a few terms mixed:
hyperparameters :
These are essentially key/value.
When you call `Task.connect(dict_with_params)`, clearml will flatten the dict and you end up with key/value pairs
configuration objects :
These are actually blobs of text, the UI will show them as is
When you call `my_local_file = Task.connect_configuration(name, "path/to/config/file")`
the entire content of the config file is stored on the Task object itself.
Back to the use case, instead ...
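To illustrate what "flatten the dict" means for hyperparameters, here is a rough sketch. It is not clearml's actual implementation, and the `/` separator and the `flatten` helper are assumptions made for the example:

```python
def flatten(params, prefix="", sep="/"):
    """Turn a nested dict into a flat {"a/b": value} mapping."""
    flat = {}
    for key, value in params.items():
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # recurse into nested sections, carrying the path so far
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat
```

So a nested `{"opt": {"lr": 0.1}}` ends up as a single `opt/lr` key with value `0.1`, whereas a configuration object would keep the original text untouched.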
using only a subset of the features
ShallowGoldfish8 if you have some parameter that controls it (i.e. selects different features) then you can launch it with two sets of parameters.
Am I missing something?
for example:
` from clearml import Task

my_features_select = {"type": "set_a"}
# connect the dict so the value can be overridden per run
Task.current_task().connect(my_features_select)
if my_features_select["type"] == "set_a":
    ...  # do something
else:
    ...  # do something else `
wdyt?
WackyRabbit7 I'll make sure it is fixed
ColossalAnt7 I would do the following:
Configure trains-server user/pass, mounting the API server configuration file as pointed in the trains-server documentation (intermediate temporary step)
Start by providing the ML guys with a VPN access that allows them to access directly the trains-server api/web/file pos (caveat is the IP/sub-domain needs to be solved)
Configure a ConfigMap to do the routing/ingest (this solves the IP/Sub-Domain issue) and allow the VPN to access the single entrypoint...
Okay I think I know what's going on (there is a race that for some reason on CoLab acts differently).
As a quick hack you can do the following:
Task._report_subprocess_enabled = False
task = Task.init(...)
task.set_initial_iteration(0)
Also I would suggest using Task.execute_remotely
https://clear.ml/docs/latest/docs/references/sdk/task#execute_remotely
Can you post here the docker-compose.yml you are spinning? Maybe it is the wrong one?
Step 4 here:
https://github.com/thepycoder/asteroid_example#deployment-phase
current task fetches the good Task
Assuming you fork the process, then the "global instance" is passed to the subprocess. Assuming the sub-process was spawned (e.g. Popen), then an environment variable with the Task's unique ID is passed. Then when you call "Task.current_task" it "knows" the Task was already created, and it will fetch the state from the clearml-server and create a new Task object for you to work with.
BTW: please use the latest RC (we fixed an issue with exactly this...
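The spawned-process case can be pictured with a generic sketch like the one below. It only shows the mechanism of handing an ID to a child through the environment; `EXAMPLE_TASK_ID` is a made-up variable name, not the one clearml actually uses:

```python
import os
import subprocess
import sys

ENV_KEY = "EXAMPLE_TASK_ID"  # hypothetical name, for illustration only


def spawn_child(task_id):
    """Start a child interpreter that reads the ID back from its environment."""
    env = dict(os.environ)
    env[ENV_KEY] = task_id
    child_code = f"import os; print(os.environ[{ENV_KEY!r}])"
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

The child never sees the parent's Python objects, only the ID string, which is why it has to fetch the Task state from the server again.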
YummyWhale40 you mean like continue training?
https://github.com/allegroai/trains/issues/160
I don't know how I would be able to get the description and name?
Good point, how about doing that in code, then you have all the information and you can store it in jsons / pickle next to the data folder?
wdyt?
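Something along these lines, as a sketch. The `<folder>.meta.json` file name and the two helper functions are just one possible layout, nothing clearml-specific:

```python
import json
from pathlib import Path


def save_metadata(data_folder, name, description):
    """Write name/description to '<folder>.meta.json' next to the data folder."""
    data_folder = Path(data_folder)
    meta_path = data_folder.parent / f"{data_folder.name}.meta.json"
    meta_path.write_text(json.dumps({"name": name, "description": description}, indent=2))
    return meta_path


def load_metadata(data_folder):
    """Read back what save_metadata() stored for this data folder."""
    data_folder = Path(data_folder)
    meta_path = data_folder.parent / f"{data_folder.name}.meta.json"
    return json.loads(meta_path.read_text())
```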
Hi WickedGoat98
"Failed uploading to //:8081/files_server:"
Seems like the problem. What do you have defined as files_server in the trains.conf?
Try: task.update_requirements('\n'.join([".", ]))
ERROR: Could not install packages due to an EnvironmentError:
[Errno 28] No space left on device
BTW: @<1523703080200179712:profile|NastySeahorse61> this sounds like docker running out of space on the main disk (`/var/`), where it stores all the images and temp file systems
This will cause your code to fail, as any runtime change to the container file system will raise this out of disk space error
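A quick way to check this from Python on the host, as a sketch. The `/var` default follows the guess above and may not be where your docker setup keeps its data, and the 10 GiB threshold is arbitrary:

```python
import shutil


def free_space_gib(path="/var"):
    """Free space, in GiB, on the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / 2**30


def has_room(path="/var", needed_gib=10.0):
    """True if at least `needed_gib` GiB are free (threshold is an arbitrary example)."""
    return free_space_gib(path) >= needed_gib
```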
Hi ResponsiveCamel97
The agent generates a new configuration file to be mounted into the docker, with all the new folders as they will be seen inside the docker itself. One of the changes is the system_site_packages as inside the docker we want the new venv to inherit everything from the docker system installed packages.
Make sense ?
I assume issue: None
Yeah this is odd I noticed as well. Let me ask the guys to take a look
Hi @<1535069219354316800:profile|PerplexedRaccoon19>
What do you mean by simulate?
You can manually setup and run a Task if you need,
`clearml-agent execute --id <task_id>`, add `--docker` for docker mode.
This will setup the env and run the task
Quick update: Nexus supports direct http upload, which means that, as CostlyOstrich36 mentioned, just pointing to the Nexus http upload endpoint would work: output_uri="http://<nexus>:<port>/repository/something/"
See docs:
https://support.sonatype.com/hc/en-us/articles/115006744008-How-can-I-programmatically-upload-files-into-Nexus-3-
DeliciousBluewhale87 my apologies you are correct 😞
We should probably add support for that, do you feel like adding a GitHub issue, so we do not forget?
When I'm setting up my Pipeline, I can't go "here are some brand new tasks, please run them",
I think this is the main point. Can you create those Tasks via Task.create and get what you want? If so, then sure you can do that:
` from clearml import Task

def create_step_task(a_node):
    # build and return a brand-new Task for this pipeline node
    task = Task.create(...)
    return task

pipe.add_step(
    name="stage_process",
    parents=["stage_data"],
    base_task_factory=create_step_task
) `
wdyt?
As for the node, the confusing bit is that this is text from the docs...
ShinyWhale52 any time 🙂
Feel free to follow up with more questions