Hi GiddyPeacock64
If you already have K8s set up and are already using ClearML, put this in your Kubeflow YAML:
trains-agent execute --id <task_id> --full-monitoring
This will install everything your Task needs inside the docker container. Just make sure that you pass the env variables that set up ClearML, see here:
https://github.com/allegroai/clearml-server/blob/6434f1028e6e7fd2479b22fe553f7bca3f8a716f/docker/docker-compose.yml#L127
And voilà, you get a full trace including Git and uncommitted changes, Python packages, and the ability to change arguments from the UI 🙂
Hi RipeGoose2
Yes, the "services-mode" of an agent will take multiple Tasks. That said, these are "service" Tasks, i.e. light CPU tasks; think pipeline controllers etc.
SteadyFox10 TRAINS_CONFIG_FILE or CLEARML_CONFIG_FILE
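As a rough sketch of how that variable can be used from Python (the path below is hypothetical, and setting it before the import is a precaution I am assuming is the safe order):

```python
import os

# Hypothetical path; point this at your own configuration file.
os.environ["CLEARML_CONFIG_FILE"] = "/path/to/custom_clearml.conf"

# Import the SDK only after the variable is set, so it is visible
# when the configuration is loaded.
# from clearml import Task
```

Exporting the variable in the shell before launching the script should work the same way.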
Hi MelancholyElk85
Can I manually delete .zip files with datasets in the .clearml/cache/storage_manager/datasets directory?
Yes, you can. I "think" the .zip is stored for easier access, but you can delete it; as long as the "extracted" folder exists, it should be fine.
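If you want to script the cleanup, something along these lines should do. It is only a sketch: `remove_cached_zips` is a made-up helper name, it only looks at .zip files directly inside the folder you pass in, and it defaults to a dry run so nothing is deleted by accident.

```python
from pathlib import Path


def remove_cached_zips(cache_dir, dry_run=True):
    """Return the .zip files directly under cache_dir; delete them unless dry_run."""
    cache_dir = Path(cache_dir).expanduser()
    zips = sorted(p for p in cache_dir.glob("*.zip") if p.is_file())
    if not dry_run:
        for zip_path in zips:
            zip_path.unlink()  # extracted folders are left untouched
    return zips


# Path taken from the question above; adjust it if your cache lives elsewhere.
# remove_cached_zips("~/.clearml/cache/storage_manager/datasets", dry_run=False)
```

Run it once with the default `dry_run=True` and check the returned list before deleting for real.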
So the thing with IAM roles: they are designed to allow AWS instances to get "automatic" permissions (based on the IAM role). They are not actually designed to generate a key/secret, as I think the lifetime is by default relatively short. Since the actual request to S3 comes from the client browser (i.e. outside of the AWS cluster), the IAM role cannot apply, and you have to provide the key/secret. The easiest way is to generate S3 keys regardless of the IAM roles, to be used with the clients (sp...
Notice the configuration parameters:
https://github.com/allegroai/clearml/blob/34c41cfc8c3419e06cd4ac954e4b23034667c4d9/examples/services/monitoring/slack_alerts.py#L160
https://github.com/allegroai/clearml/blob/34c41cfc8c3419e06cd4ac954e4b23034667c4d9/examples/services/monitoring/slack_alerts.py#L162
https://github.com/allegroai/clearml/blob/34c41cfc8c3419e06cd4ac954e4b23034667c4d9/examples/services/monitoring/slack_alerts.py#L156
StorageManager
Oh, it has no remove 😞 StorageHelper.delete is the only way
Ohh StraightCoral86 did you check clearml-task? This is exactly what it does
(this is the CLI, from code you basically call Task.create & Task.enqueue)
Will this solve it ?
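For the "from code" route, a minimal sketch could look like this. The wrapper name `create_and_enqueue` and the example values in the usage comment are placeholders, not anything from your setup:

```python
def create_and_enqueue(project_name, task_name, repo, script, queue_name):
    """Create a new Task from a script in a git repository and enqueue it."""
    # Imported lazily so the function can be defined without a configured server.
    from clearml import Task

    task = Task.create(
        project_name=project_name,
        task_name=task_name,
        repo=repo,      # git repository URL
        script=script,  # entry point, relative to the repository root
    )
    Task.enqueue(task, queue_name=queue_name)
    return task


# Hypothetical usage:
# create_and_enqueue("my_project", "remote run",
#                    "https://github.com/me/my_repo.git", "train.py", "default")
```

An agent listening on that queue then picks the Task up, same as with the CLI.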
ShallowCat10 try something similar to this one. Do notice that it might take a while to get all the task objects, so I would start with a single one 🙂
`
from trains import Task
tasks = Task.get_tasks(project_name='my_project')
for task in tasks:
    scalars = task.get_reported_scalars()
    for x, y in zip(scalars['title']['original_series']['x'], scalars['title']['original_series']['y']):
        task.get_logger().report_scalar(title='title', series='new_series', value=y, iteration=...
In theory, one could go over previously executed tasks, and create a copy of a specific scalar metric.
ShallowCat10 does that make sense in your scenario ?
Hi TrickyRaccoon92 , yes the examples folder is a special case, I'm not sure you can directly delete it.
Can you archive individual experiments in it ?
I use YAML config for data and model. Each of them would be a nested YAML (could be more than 2 layers), so it won't be a flexible solution and I would need to manually flatten the dictionary
Yes, you are correct, the recommended option would be to store it with task.connect_configuration
Its goal is to store these types of configuration files/objects.
You can also store the YAML file itself directly: just pass a Path object instead of a dict/string
Yes EnviousStarfish54, the comparison is line by line and each experiment is compared only to the left one. Like any multi-way comparison, you have to set a baseline, which is always the left column here. Do notice you can reorder the columns and the comparison will be updated.
Hi EnviousStarfish54
I think this is what you are after
task.connect_configuration(my_dict_here, name='my_section_name')
BTW:
if you do task.connect(a_flat_dict, name='new section') you will have the key/value pairs in a section called "new section"
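If you do want to go the flat-dict route with a nested YAML, a small helper like the one below saves the manual flattening. It is just a sketch: `flatten`, the "/" separator and the sample config are my own choices for illustration.

```python
def flatten(nested, parent_key="", sep="/"):
    """Flatten a nested dict into a single level, joining keys with sep."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical nested config, e.g. the result of yaml.safe_load(...)
config = {
    "data": {"path": "/tmp/data", "loader": {"batch_size": 32}},
    "model": {"depth": 18},
}
a_flat_dict = flatten(config)

# Then, with an initialized task:
# task.connect(a_flat_dict, name='new section')
```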
If we have the time maybe we could PR a fix?!
Hi @<1541954607595393024:profile|BattyCrocodile47>
This looks like a docker issue running on mac m2
wdyt?
I saw documentation, but I can't make the proper dict object for hyperparams
I see, this is what you are after (I think)
https://github.com/allegroai/clearml/blob/fb644fe9ec6be36b8f2f70a34256fbdc593d663a/clearml/backend_api/services/v2_20/tasks.py#L3138
Simple file transfer test gives me approximately 1 GBit/s transfer rate between the server and the agent, which is to be expected from the 1Gbit/s network.
Ohhh I missed that. What speed do you get when uploading the artifacts to the server? (You can test it with simple toy artifact upload code.)
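Something like this could serve as the toy test. It is a sketch: `make_toy_file` and `timed_mbit_per_s` are made-up helper names, and in the usage comment I am assuming your clearml version supports `wait_on_upload=True` so that the call blocks until the upload finishes.

```python
import os
import tempfile
import time


def make_toy_file(size_mb=100):
    """Write size_mb MiB of random bytes to a temp file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    chunk = os.urandom(1024 * 1024)
    with os.fdopen(fd, "wb") as f:
        for _ in range(size_mb):
            f.write(chunk)
    return path


def timed_mbit_per_s(upload_fn, path):
    """Call upload_fn(path) and return the observed throughput in Mbit/s."""
    size_bits = os.path.getsize(path) * 8
    start = time.perf_counter()
    upload_fn(path)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return size_bits / elapsed / 1e6


# Hypothetical usage with an initialized task:
# path = make_toy_file(100)
# rate = timed_mbit_per_s(
#     lambda p: task.upload_artifact("toy", artifact_object=p, wait_on_upload=True),
#     path,
# )
# print(f"{rate:.0f} Mbit/s")
```

Comparing that number to the raw file transfer rate should show whether the slowdown is in the network or in the upload path.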
WackyRabbit7 This is a json representation of the entire plot (basically how plotly sees it).
What you are after is: full_json[0]['cells']['values']
which is a list of lists (row order) of the table
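A quick sketch of pulling the values out. The `full_json` below is a made-up miniature of such a plot JSON, only there to show the access pattern, and `table_values` is a hypothetical helper name:

```python
def table_values(full_json):
    """Return the list of lists stored under the first trace's cells/values."""
    return full_json[0]["cells"]["values"]


# Hypothetical, heavily trimmed example of the JSON structure
full_json = [
    {
        "type": "table",
        "header": {"values": ["name", "score"]},
        "cells": {"values": [["a", 1], ["b", 2]]},
    }
]

for entry in table_values(full_json):
    print(entry)
```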
You mean to add the extra index url?
you could use :
https://github.com/allegroai/clearml-agent/blob/5f0d51d485629e9dfc2d826622524461e3fcae8a/docs/clearml.conf#L63
If it cannot find the Task ID I'm guessing it is trying to connect to the demo server and not your server (i.e. configuration is missing)
JitteryCoyote63
are the calls from the agents made asynchronously/in a non blocking separate thread?
You mean, is request processing on the apiserver multi-threaded / multi-processed?
Multi-threaded multi-processes multi-nodes 🙂
Hmm, I see the jump from 50 to 100. Is that consistent with the last iteration on the aborted Task (before continuing)?
Hi SourOx12
How do you set the iteration when you continue the experiment? Is it with Task.init continue_last_task?
Yes, docker was not installed on the machine
Okay, makes sense, we should definitely check that you have docker before starting the daemon 😉
Ok, it would be nice to have a --user-folder-mounted that does the linking automatically
It might be misleading if you are running on a k8s cluster, where one cannot just -v mount a volume...
What do you think?