Has anyone done this exact use case - updates to datasets triggering pipelines?
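For reference, clearml's TriggerScheduler can wire this up. A rough sketch (the pipeline controller task ID, queue, and project names are placeholders):
```
from clearml.automation import TriggerScheduler

# Poll the server every few minutes for dataset changes
trigger = TriggerScheduler(pooling_frequency_minutes=3)

# Enqueue the pre-existing pipeline controller task whenever a new
# dataset version appears in the "datasets" project
trigger.add_dataset_trigger(
    name="retrain-on-new-data",
    schedule_task_id="<pipeline_controller_task_id>",
    schedule_queue="services",
    trigger_project="datasets",
)
trigger.start()  # blocks; use start_remotely() to run it on an agent
```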
Hi TrickySheep9, seems like this is following a different thread, am I missing something?
You can try pulling just the "metric" section of the Task, but I cannot imagine network bandwidth is the issue?
Could it be load on the clearml-server (i.e. it needs to handle lots of requests)?
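If it is bandwidth, here is a minimal sketch of pulling only the metrics via the APIClient, so the rest of the Task object is never fetched (the task ID is a placeholder):
```
from clearml.backend_api.session.client import APIClient

client = APIClient()
# only_fields keeps the response small: just the last reported metrics
tasks = client.tasks.get_all(id=["<task_id>"], only_fields=["last_metrics"])
print(tasks[0].last_metrics)
```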
Once the team is happy with the logging functionality, we'll move on to remote execution and things will update.
🙂
While I do have the access and secret defined in clearml.conf, and even in the WebUI, I still get similar errors
and you have your credentials in the browser when deleting a Task?
Because of that, I cannot create a task in this project programmatically locally because it tries to access the bucket and fails. And there is no easy way to change the default output location (not in the web UI, not in the sdk)
JitteryCoyote63 hmm that is a pickle ...
let me check the code ...
yep, that's the reason it is failing. How did you train the model itself?
I get a popup saying that the actual files weren't deleted from S3 (so presumably only the metadata on the server gets deleted).
Hi QuaintPelican38
The browser client actually issues the delete "command" (the idea is separation of the metadata and the data, e.g. artifacts). That means you have to provide the key/secret to the UI (see the profile page)
PungentLouse55, make sure you fix the metric objective and the args:
Add a "General/" prefix to the list of arguments to optimize, and change the objective metric from "Accuracy" to "epoch_accuracy"
Correct (with the port mapping service in it)
TartLeopard58 EnthusiasticCow4
Notice that when you are spinning up multiple agents on the same GPU, the Tasks should request the "correct" fractional GPU container; if they pick a "regular" container there is no memory limit.
So something like:
```
CLEARML_WORKER_NAME=host-gpu0a clearml-agent daemon --gpus 0 clearml/fractional-gpu:u22-cu12.3-2gb
CLEARML_WORKER_NAME=host-gpu0b clearml-agent daemon --gpus 0 clearml/fractional-gpu:u22-cu12.3-2gb
```
Copy-paste the trains.conf from any machine; it just needs the definition of the trains-server address.
Specifically, if you run in offline mode there is no need for the trains.conf at all, and you can just copy the one on GitHub.
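e.g. a minimal offline-mode sketch (assuming a trains version that already supports offline mode):
```
from trains import Task

# No trains.conf / server needed; everything is recorded locally
Task.set_offline(offline_mode=True)
task = Task.init(project_name="examples", task_name="offline run")
# ... training code ...
# The run is stored as a local zip that can later be imported
# with Task.import_offline_session()
```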
DeterminedToad86 I suspect that since it was executed on SageMaker, it registered a specific package that is unique to SageMaker (not to worry, installed packages can be edited after you clone/reset the Task)
WackyRabbit7 This is a JSON representation of the entire plot (basically how plotly sees it).
What you are after is: `full_json[0]['cells']['values']`
Which is a list of lists (row order) with the table's cell values.
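If it helps, a rough sketch of getting there from Python (assumes a clearml version with Task.get_reported_plots(), that the table is the first reported plot, and a placeholder task ID):
```
import json
from clearml import Task

task = Task.get_task(task_id="<task_id>")
plot = task.get_reported_plots()[0]
# "plot_str" holds the raw plotly JSON; its "data" list is the
# full_json referred to above
full_json = json.loads(plot["plot_str"])["data"]
values = full_json[0]["cells"]["values"]
```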
The agent cannot use another user (it literally has no way of getting credentials). I suspect this is all a by-product of the actual mount point.
CooperativeFox72 a bit of info on how it works:
In "manual" execution (i.e. without an agent)
path = task.connect_configuration(local_path, name=name
path = local_path , and the content of local_path is stored on the Task
In "remote" execution (i.e. agent)
path = task.connect_configuration(local_path, name=name
"local_path" is ignored, path is a temp file, and the content of the temp file is the content that is stored (or edited) on the Task configuration.
Make sense ?
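In code, a minimal sketch (the file name is just an example):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="config demo")

# Manual run: returns "config.yaml" itself and stores its content on the Task.
# Under an agent: returns a temp file filled with the (possibly edited)
# configuration stored on the Task; the local file content is ignored.
config_path = task.connect_configuration("config.yaml", name="my_config")
with open(config_path) as f:
    config = f.read()
```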
I just tested the master with https://github.com/jkhenning/ignite/blob/fix_trains_checkpoint_n_saved/examples/contrib/mnist/mnist_with_trains_logger.py on the latest ignite master and Trains, it passed, but so did the previous commit...
If this is how the repo links look like, do not set anything in the clearml.conf
It "should" use the ssh for the ssh links, and http for the http links.
Any reason not to do so in the conf file ?
BroadSeaturtle49 agent RC is out with a fix: `pip3 install clearml-agent==1.5.0rc0`
Let me know if it solved the issue
I'm with you on this one 🙂 it's better to make a company-wide decision on these things and not allow too much flexibility (just two options to choose from should be enough, I think)
Hi CooperativeFox72
I think the upload reporting (files over 5MB) was added after version 0.17, hence the log.
The default upload-chunk reporting threshold is 5MB and it is not configurable; maybe we should add it to the clearml.conf? wdyt?
Interesting, do you think you could PR a "fixed" version ?
https://github.com/allegroai/clearml-web/blob/2b6aa6043c3f36e3349c6fe7235b77a3fddd[…]app/webapp-common/shared/single-graph/single-graph.component.ts
GrumpyPenguin23 could you help and point us to an overview/getting-started video?
Actually no, it is not. Alpine is not a good baseline; it is very slim and missing a ton of stuff.
I would use bullseye or slim (depending on how many aux things you need in the container)
https://hub.docker.com/_/python/tags?page=1&name=bullseye
https://hub.docker.com/_/python/tags?page=1&name=slim-bullseye
Hi EcstaticPelican93
Sure, the model deployment itself (i.e. the serving engine) can be executed on any private network (basically like any other agent)
Make sense?
yes I'm with you, we need to fix it asap
BeefyHippopotamus73 are you saying that on a remote machine you cannot set AWS_PROFILE? Or is it that the clearml.conf is missing? (not sure I follow how / who spins up the remote machine)