I prepared my own image and want to use this venv
No worries, it creates a "transparent" venv that uses everything from the docker (the penalty of creating a new venv is negligible 🙂 , you end up with the exact same set of packages)
Hi VirtuousFish83
Apologies for the documentation in the docs 🙂 It sounds complicated but actually should be relatively simple. Based on what I understand, you already have the server set up and your code integrated. The question is "can you see an experiment in the UI"? If you do, then you can right click it, clone the experiment, edit parameters and send for execution (enqueue). If the experiment is not in the UI you can either (1) run the code with the Task.init call, it will automatica...
but I believe it should have worked with 0.14.1 as well
Correct
It seems to follow a structure specific to clearml,
Actually plotly.js 🙂
Can you also make sure you did not check "Disable local machine git detection" in the clearml PyCharm plugin?
I'm assuming you are looking for the AWS autoscaler, spinning EC2 instances up/down and running daemons on them.
https://github.com/allegroai/clearml/blob/master/examples/services/aws-autoscaler/aws_autoscaler.py
https://clear.ml/docs/latest/docs/guides/services/aws_autoscaler
So why is it trying to upload to "//:8081/files_server:" ?
What do you have in the trains.conf on the machine running the experiment ?
Hi GrievingTurkey78
task.models['output'][-1]
should return the last stored model.
What do you have under task.models['output'][-1].url ?
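A minimal sketch of the check above, in case it helps. The helper names (last_output_model_url, fetch_task) and the fake task object are made up for illustration; only task.models['output'][-1] and its .url come from the thread. It assumes the clearml package (on older setups the import would be from trains) and that the task id is known.

```python
from types import SimpleNamespace


def last_output_model_url(task):
    """Return the URL of the most recently stored output model, or None."""
    output_models = task.models["output"]
    if len(output_models) == 0:
        return None
    return output_models[-1].url


def fetch_task(task_id):
    # Imported lazily so the helper above can be tried without clearml installed.
    from clearml import Task
    return Task.get_task(task_id=task_id)


# Stand-in object for illustration only; a real Task would come from
# fetch_task("<your task id>").
fake_task = SimpleNamespace(
    models={
        "output": [
            SimpleNamespace(url="s3://bucket/model_v1.pkl"),
            SimpleNamespace(url="s3://bucket/model_v2.pkl"),
        ]
    }
)
print(last_output_model_url(fake_task))
```

If the printed URL points at an unexpected host or bucket, that usually narrows the problem down to the storage configuration rather than the training code.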
with ?
multipart: false
secure: false
If so, can you post here your aws.s3 section of the clearml.conf? (of course replacing the actual sensitive information with *s)
Hmmm, that actually connects with something we were thinking about: introducing sections to the hyper parameters. This way we could easily differentiate between the command line arguments and other types of parameters. DilapidatedDucks58 what do you think?
MysteriousBee56 I would do Task.create()
you can get the full Task internal representation with task.data
Then call task._edit(script={'repo': ...}) to edit/update all the Task entries.
You can check the full details of the task object here: https://github.com/allegroai/trains/blob/master/trains/backend_api/services/v2_8/tasks.py#L954
BTW: when you have a sample script working, consider PR-ing it, I'm sure it will be useful for others 🙂 (also a great way to get us involved with debuggin...
So the issue is that you have two reference branches on the local git, one to gitlab one to gitea and it fails to understand which one is the correct remote ...
I wonder if "git ls-remote --get-url" will always work ?!
DepressedChimpanzee34 <character> will almost always be converted into \ because otherwise it will not support \t or \n etc.
What I'm looking here is some logic that will allow us not to break backwards compatibility on the one hand, but still will allow you to have something like "first\second" entry.
WDYT? any ideas? (I really want to make sure we fix it as soon as possible)
"what's the trains/trains-agent/trains-server versions ?" how can I check it?
trains/trains-agent are pip packages, so: pip freeze | grep trains
trains-server you can check in the /profile page top left corner
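If you prefer to check the pip side from Python, something like this should be roughly equivalent to the pip freeze line. It is a sketch using only the standard library (Python 3.8+); the function name installed_versions is made up here, and it has to run in the same environment the packages are installed in.

```python
from importlib import metadata


def installed_versions(packages=("trains", "trains-agent")):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}=={version}" if version else f"{name} is not installed")
```

The trains-server version is not a pip package on the client machine, so it still has to be read from the /profile page.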
So how do I solve the problem? Should I just relaunch the agents? Because they can't execute jobs now
Are you running in docker mode ?
If so you can actually delete mapped files (they will still be available inside the docker), just make sure you delete them X hours after they were created, and you should be fine.
wdyt?
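A rough sketch of that cleanup, assuming the mapped files live under a single host folder and that the modification time is a good enough stand-in for the creation time. The function name delete_older_than and the example path are hypothetical; this is not something the agent ships.

```python
import time
from pathlib import Path


def delete_older_than(root, max_age_hours, now=None):
    """Delete files under root last modified more than max_age_hours ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    removed = []
    # Collect the paths first so deleting does not disturb the directory walk.
    for path in list(Path(root).rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed


# Example (hypothetical path): delete_older_than("/path/to/mapped/folder", 24)
```

Running it from cron every hour or so would keep the folder bounded; pick the age so it is safely longer than your longest job.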
Hmm we might need more detailed logs ...
When you say there is a lag, what exactly does that mean? if you have enough apiserver instances answering the requests, the bottleneck might be the mongo or the elastic ?
Woot woot
ChubbyLouse32 when you get it working please PR it, this is very very cool!
(I'll be happy to help 🙂 )
SoggyBeetle95 you can configure the credentials in the clearml.conf
running on the agent machines:
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L320
(I'm assuming these are storage credentials)
If you need general purpose env variables, you can add them here:
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L149
with ["-e", "MY_VAR=MY_VALUE"]
LazyLeopard18 are you using the StorageManager to access azure:// links?
One thing though - I am running agent on behalf of a regular user.
Oh that might be credentials / docker service issue (i.e. the user might not have the ability to run a docker with --gpus, but as you mentioned, that seems like an arch thing 🙂 )
So if I am not using remote machine can I disable this?
yes I think you can, add to your clearml.conf
sdk.development.store_jupyter_notebook_artifact = false
BTW: why would you turn it off ?
I'm checking the preview HTML and it seems like it was not uploaded...
Is there a way I could move the JWT authentication (not authorization) logic into an API Gateway or Load Balancer?
Hmm in theory, but not in practice 😞
if ClearML is following OAuth 2.0, t
This is for the SSO part, not for the API, API is only using JWT for verification, the login process itself is with external SSO (OAuth 2.0). But the open-source version does not support SSO 😞
Why are you trying to add another ELB with JWT verification on it ? ...
AdventurousButterfly15
Despite having manually installed this torch version, during task execution agent still tries to install it somehow and fails:
Are you running the agent in venv mode? or docker mode?
Notice that in docker mode it inherits the python packages from the container, and adds/reinstalls missing packages. In venv mode it creates a new clean venv (there is no way to inherit a venv, a venv can only inherit from system-wide installed packages).
The idea is that you cannot e...
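To illustrate the "venv can only inherit from system-wide packages" point with plain Python: the standard library venv module has a single inheritance switch, system_site_packages. This is only a stdlib illustration, not the agent's actual code, and the function name create_inheriting_venv is made up.

```python
import tempfile
import venv
from pathlib import Path


def create_inheriting_venv(env_dir):
    """Create a venv that can see system-wide packages; return its pyvenv.cfg."""
    # with_pip=False keeps the sketch fast; the inheritance flag is what matters.
    venv.create(env_dir, system_site_packages=True, with_pip=False)
    return Path(env_dir, "pyvenv.cfg").read_text()


with tempfile.TemporaryDirectory() as tmp:
    cfg = create_inheriting_venv(Path(tmp) / "env")
    print(cfg)
```

The printed pyvenv.cfg contains an include-system-site-packages entry. There is no equivalent option for pointing a new venv at another venv, which is why packages installed manually into a different venv are not picked up.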
If that's the case check the free space in the monitoring of the experiment, you will find the free space in GB logged
Could not find a version that satisfies the requirement pytorch~=1.7.1
Seems like pytorch 1.7.1 has no package for python 3.7 ?
Oh that is odd. Is this reproducible? NuttyLobster9 what was the flow that required another Task.init?
ScantChimpanzee51 what's the use case for the full path without specific artifact?
GrievingTurkey78 yes, you are correct on both.
Will the sweep functionality work?
Yes it should, that said, it will not use the trains-agent
so you are limited to the machine running the sweep.
If you want to do HPO on multi-node, check out this example 🙂
https://github.com/allegroai/trains/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py