default is clearml data server
Yes the default is the clearml files server, what did you configure it to ? (e.g. should be something like None )
As long as the ~/.aws is configured, I "think" it should work. (I'm assuming you are referring to IAM roles?)
I would also suggest using the latest aws_autoscaler (basically it adds a CLI wizard, I think the functionality is very much the same)
What is the proper way to change a clearml.conf ?
Inside a container you can mount an external clearml.conf, or override everything with OS environment variables:
https://clear.ml/docs/latest/docs/configs/env_vars#server-connection
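For example, a minimal sketch of the env-var route (the hosts and keys below are placeholders, set them before Task.init so they take precedence over clearml.conf):
import os
os.environ["CLEARML_API_HOST"] = "https://api.example.com"       # placeholder api server
os.environ["CLEARML_WEB_HOST"] = "https://app.example.com"       # placeholder web server
os.environ["CLEARML_FILES_HOST"] = "https://files.example.com"   # placeholder files server
os.environ["CLEARML_API_ACCESS_KEY"] = "<access_key>"
os.environ["CLEARML_API_SECRET_KEY"] = "<secret_key>"
from clearml import Task
task = Task.init(project_name="examples", task_name="env var override")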
EmbarrassedSpider34
Sync_folder and upload
Several times along the code and then
Do notice they overwrite one another...
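For reference, a short sketch of that flow using the clearml Dataset interface (the project, name and path are placeholders):
from clearml import Dataset
# create a new dataset version
ds = Dataset.create(dataset_project="examples", dataset_name="my_dataset")
# sync the local folder into the dataset (repeated syncs overwrite the previous state, as noted above)
ds.sync_folder(local_path="/path/to/local/folder")
# upload the files and close the version
ds.upload()
ds.finalize()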
AbruptHedgehog21 could it be the console log itself is huge ?
A few examples here:
None
Grafana model performance example:
browse to
login with: admin/admin
create a new dashboard
select Prometheus as data source
Add a query: 100 * increase(test_model_sklearn:_latency_bucket[1m]) / increase(test_model_sklearn:_latency_sum[1m])
Change the type to heatmap, and select on the right-hand side under "Data Format" s...
I think your "files_server" is misconfigured somewhere, I cannot explain how you ended up with this broken link...
Check the clearml.conf on the machines or the env vars ?
The versions don't need to match, any combination will work.
What is the link you are seeing there?
The task pod (experiment) started reaching out to an IP associated with malicious activity. The IP was associated with 1000+ domain names. The activity was identified in AWS guard duty with a high severity level.
BoredHedgehog47 What is the pod container itself ?
EDIT:
Are you suggesting the default "ubuntu:18.04" is somehow contaminated ?
https://hub.docker.com/layers/library/ubuntu/18.04/images/sha256-d5c260797a173fe5852953656a15a9e58ba14c5306c175305b3a05e0303416db?context=explore
Hi WorriedParrot51 , what do you mean by "call get_parameters_as_dict() from agent" ?
Do you mean like changing the trains-agent to run the task differently?
Or inside your code while the trains agent runs it?
From the code itself (regardless of how you run it) you can always call task.get_parameters_as_dict() and get the current state's parameters (i.e. from the backend if running with trains-agent, or copied from the code, if running manually).
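For example, a short sketch (the project/task names are placeholders; the same call works under the agent and when running manually):
from clearml import Task
task = Task.init(project_name="examples", task_name="params demo")
params = task.get_parameters_as_dict()
print(params)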
You mean to add the extra index url?
you could use :
https://github.com/allegroai/clearml-agent/blob/5f0d51d485629e9dfc2d826622524461e3fcae8a/docs/clearml.conf#L63
well that depends on you, what did you write there to know it is the best one ? file name ? added some metric ?
I found something btw, let me check...
A definite maybe, they may or may not be used, but we'd like to keep that option
The precursor to the question is the idea of storing local files as "input artifacts" on the Task, which means that if the Task is cloned the links go with it. Let's assume for a second this is the case, how would you upload these artifacts in the first place?
PungentLouse55 from the screenshot I assume the experiment template you are trying to optimize is not the one from the trains/examples 🙂
In that case, and based on the screenshots, the prefix is "Args/" as this is the section name.
Regarding the objective metric, again based on your screenshots:
objective_metric_title="Accuracy"
objective_metric_series="Validation"
Make sense ?
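Something along these lines, a minimal sketch using the clearml/trains HyperParameterOptimizer (the base task id, parameter range and queue name are placeholders):
from clearml.automation import HyperParameterOptimizer, UniformParameterRange
optimizer = HyperParameterOptimizer(
    base_task_id="<template experiment id>",   # the experiment you are optimizing
    hyper_parameters=[
        # "Args/" prefix because the section name is Args
        UniformParameterRange("Args/learning_rate", min_value=1e-4, max_value=1e-1),
    ],
    objective_metric_title="Accuracy",
    objective_metric_series="Validation",
    objective_metric_sign="max",
    execution_queue="default",
)
optimizer.start()
optimizer.wait()
optimizer.stop()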
So in theory you can clone yourself 2 extra times and push into an execution queue, but the issue might be actually making sure the resources are available. what did you have in mind?
Hmm there was a commit there that was "fixing" some stuff, apparently it also broke it
Let me see if we can push an RC (I know a few things are already waiting to be released)
Are you seeing the entire jupyter notebook in the "uncommitted changes" section ?
Hi ClumsyElephant70
extra_docker_shell_script: ["export SECRET=SECRET", ]
I think ${SECRET} will not get resolved; you have to specifically have the text value there.
That said it is a good idea to resolve it if possible, wdyt?
The wheel you download from pip, for example this one torch-1.11.0-cp38-cp38-manylinux1_x86_64.whl
is actually both CPU and CUDA 11.7
I always have my notebooks in git repo but suddenly it's not running them correctly.
What do you mean?
Can I switch off git diff (change detection?)
Yes, Task.init(..., auto_connect_frameworks={"detect_repository": False})
The easiest would be as an artifact (I think).
Let's assume you put it into a csv file (with pandas or manually)
To upload (from the pipeline Task itself):
task.upload_artifact(name='summary', artifact_object='~/my/summary.csv')
Then if you want to grab it from anywhere else:
task = Task.get_task(task_id='HPO controller Task id here')
my_csv = task.artifacts['summary'].get_local_copy()
If you want to store it as a dict it might be even easier:
task.upload_artifact(name='summary', artifa...
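For reference, a short sketch of the dict variant (the keys and values below are placeholders):
from clearml import Task
task = Task.current_task()
summary = {'best_epoch': 12, 'val_accuracy': 0.93}  # placeholder content
task.upload_artifact(name='summary', artifact_object=summary)
# and to read it back from anywhere else:
summary = Task.get_task(task_id='HPO controller Task id here').artifacts['summary'].get()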
Okay fixed, you will be able to override it with output_uri=False (which is ignored on remote execution if you have a project default or Task output uri set in the UI).
Make sense ?
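i.e. something along these lines (a sketch; the project/task names are placeholders):
from clearml import Task
# explicitly disable the automatic output destination for this run
task = Task.init(project_name='examples', task_name='local only', output_uri=False)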
PungentLouse55 could you test with 0.15.2rc0 see if there is any difference ?
Hi @<1689446563463565312:profile|SmallTurkey79>
App Credentials now disappear on restart.
You mean in the web UI?
Hi SweetGiraffe8
could you try with the latest RC
pip install 0.17.5rc2
VexedCat68
. So the checkpoints just added up. I've stopped the training for now. I need to delete all of those checkpoints before I start training again.
Are you uploading the checkpoints manually with artifacts? or is it autologged & uploaded ?
Also, why not reuse and overwrite older checkpoints ?
Correct, you can pass it as keys on the "task_filter" argument, e.g.:
Task.get_tasks(..., task_filter={'status': ['failed']})
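For example (a short sketch; the project name is a placeholder):
from clearml import Task
# all failed tasks in a given project
failed_tasks = Task.get_tasks(
    project_name="examples",
    task_filter={"status": ["failed"]},
)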
Hi @<1556812486840160256:profile|SuccessfulRaven86>
Please notice that clearml-serving is not designed for public exposure: it lacks a security layer and is designed for easy internal deployment. If you feel you need the extra security layer, I suggest either adding external JWT-like authentication, or talking to the ClearML people; their paid tiers include enterprise-grade security on top