After it finishes the 1st optimization task, what's the next job that will be pulled?
The one in the highest-priority queue (if you have multiple queues)
If you use fairness it will pull in round-robin order from all queues (obviously inside every queue it is based on the order of the jobs).
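To make the fairness behaviour concrete, here is a minimal pure-Python sketch of the idea (the queue names and the helper function are illustrative only, not the agent's actual implementation):

```python
from collections import deque
from itertools import cycle

def pull_jobs_fair(queues):
    """Round-robin across queues; FIFO order inside each queue."""
    pools = {name: deque(jobs) for name, jobs in queues.items()}
    order = []
    for name in cycle(list(pools)):
        if not any(pools.values()):
            break  # every queue is drained
        if pools[name]:
            order.append(pools[name].popleft())
    return order

# Jobs interleave across queues, but keep their order within each queue
print(pull_jobs_fair({"high": ["h1", "h2"], "low": ["l1", "l2", "l3"]}))
# → ['h1', 'l1', 'h2', 'l2', 'l3']
```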
FYI, you can reorder the jobs inside the queue from the UI 🙂
DeliciousBluewhale87 wdyt?
It is currently only enabled when using ports mode; it should be enabled by default, i.e. a new feature :)
Hi DeliciousBluewhale87
Hmm, good question.
Basically the idea is that if you have an ingestion service on the pods (i.e. as part of the yaml template used by the k8s glue), you can specify to the glue what the exposed ports are, so it knows (1) the maximum number of instances it can spin, e.g. one per port, and (2) it will set the external port number on the Task, so that the running agent/code will be aware of the exposed port.
A use case for it would be combining the clearml-session with the k8s gl...
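A rough pure-Python sketch of that port bookkeeping (illustrative only; the real k8s glue's API and field names differ):

```python
def schedule_with_ports(exposed_ports, pending_tasks):
    """One pod per exposed port: the number of ports caps concurrency (1),
    and each launched task is told its external port (2); the rest wait."""
    launched = dict(zip(pending_tasks, exposed_ports))
    still_queued = pending_tasks[len(exposed_ports):]
    return launched, still_queued

launched, queued = schedule_with_ports([30022, 30023], ["task-a", "task-b", "task-c"])
print(launched)  # {'task-a': 30022, 'task-b': 30023}
print(queued)    # ['task-c']
```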
It is not possible to specify the full output destination, right?
Correct 🙂
it certainly does not use tensorboard python lib
Hmm, yes I assume this is why the automagic is not working 🙂
Does it have a pythonic interface for the metrics?
UnevenDolphin73 FYI: clearml-data is documented, unfortunately only on GitHub:
https://github.com/allegroai/clearml/blob/master/docs/datasets.md
yup, it's there in draft mode so I can get the latest git commit when it's used as a base task
Yes that seems to be the problem, if it is in draft mode, you have no outputs...
Sure, just set up clearml-agent
on any machine 🙂
(The app.community server is the control plane)
I think this is the issue: it was search-and-replaced. The thing is, I'm not sure the helm chart was updated to clearml. Let me check
That might be me, let me check...
MagnificentPig49 quick update: the front-end guys updated me that with the next trains-server update they will have the web client code available on the repository, ETA probably mid-May or so :)
This is the reason you are getting an error 🙂
Basically the session asks the agent to setup a new SSH server with credentials on the remote machine, this is not an issue inside a container, as this is an isolated environment, but when running in venv mode the User running the agent is not root, hence it cannot spin/configure an SSH server.
Make sense?
Well, I guess you can say this is definitely not a self-explanatory line 🙂
but it is actually asking whether we should extract the code; think of it as: `if extract_archive and cached_file: return cls._extract_to_cache(cached_file, name)`
Hi TrickyRaccoon92
BTW: check out the HP optimization example, it might make things even easier 🙂 https://github.com/allegroai/trains/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py
It fails during the `add_step` stage for the very first step, because `task_overrides` contains invalid keys
I see; yes, I guess it makes sense to mark the pipeline as Failed 🙂
Could you add a GitHub issue on this behavior, so we do not miss it ?
but this gives me an idea, I will try to check if the notebook is considered as trusted, perhaps it isn't and that causes issues?
This is exactly what I was thinking (communication with the jupyter service is done over http, to localhost, sometimes AV/Firewall software will block it, false-positive detection I assume)
Hi ArrogantBlackbird16
but it returns a task handle even after the Task has been closed.
It should not ... That is a good point!
Let's fix that 🙂
So assuming they are all on the same LB IP, you should do:
LB 8080 (https) -> instance 8080
LB 8008 (https) -> instance 8008
LB 8081 (https) -> instance 8081
It might also work with:
LB 443 (https) -> instance 8080
Hi FloppyDeer99
What is the meaning of "no real scheduling"?
I think the meaning is that from the moment a k8s job is created, k8s is in charge of actually spinning the container. Since k8s has no real priority/order, the scheduling order is not guaranteed from this point.
The idea of the clearml-k8s glue is that the glue will launch a job on the k8s cluster only if it is sure there are enough resources to actually spin the job now (as opposed to sometime in the future), this mea...
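The gating idea can be sketched like this (hypothetical numbers and function names, not the glue's actual code; the real check queries the cluster state):

```python
def can_launch_now(job_cpu, job_mem_gb, free_cpu, free_mem_gb):
    """Create the k8s job only when it can start immediately;
    otherwise leave it in the ClearML queue for the next poll."""
    return job_cpu <= free_cpu and job_mem_gb <= free_mem_gb

print(can_launch_now(4, 16, free_cpu=8, free_mem_gb=32))   # True → launch now
print(can_launch_now(16, 64, free_cpu=8, free_mem_gb=32))  # False → keep queued
```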
GiddyTurkey39
A flag would be really cool, just in case there's any problem with the package analysis.
Trying to think if this should be a system-wide flag (i.e. trains.conf) or a flag in Task.init.
What do you think?
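Whichever home the flag gets, the usual precedence would be a per-task argument overriding the config-file default. A hypothetical sketch (the flag name and helper are made up, not the actual trains API):

```python
CONF = {"package_analysis": True}  # stand-in for a trains.conf setting

def resolve_package_analysis(conf, task_init_arg=None):
    """A Task.init-style argument, if given, wins over the config file."""
    if task_init_arg is not None:
        return task_init_arg
    return conf.get("package_analysis", True)

print(resolve_package_analysis(CONF))                       # True (config default)
print(resolve_package_analysis(CONF, task_init_arg=False))  # False (per-task override)
```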
How can the first process corrupt the second?
I think that something went wrong and both Agents are using the same "temp" folder to setup the experiment.
why doesn't this occur if I run pipeline from command line?
The services queue is creating new dockers with everything in them, so they cannot step on each other's toes (so to speak)
I run all the processes as administrator. However, I've tested running the pipeline from the command line in non-administrator mode, and it works fine...
MagnificentSeaurchin79
Do notice that the pipeline controller assumes you have an agent running
@<1597762318140182528:profile|EnchantingPenguin77> can you provide the full log?
My question is, which version of docker-compose do you need?
Ohh sorry, there is no real restriction, we just wanted easy copy-paste for the installation process.
Thanks! A few thoughts below 🙂
- "not true — you can specify the image you want for each step": My apologies, looking at the release notes, it was added a while back and I had not noticed 🙂
- re: role-based access control, "see Outerbounds Platform that provides a layer of security and auth features required by enterprises": Role-based access here means limiting access in Metaflow, i.e. specific users/groups can only access specific projects, etc. ...
It will store the entire content of the file, then you can edit it in the UI, and when running remotely it will return a new local copy of the file (based on the data in the UI) for you to read.
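That round-trip can be mimicked with a small self-contained sketch (purely illustrative: `Task.connect_configuration` is the real clearml API, but the helper below is not its implementation):

```python
import os
import tempfile

def connect_configuration(local_path, server_store, running_remotely):
    """Locally: store the file content server-side (editable in the UI).
    Remotely: write the (possibly edited) UI content to a fresh local copy
    and return its path for the code to read."""
    if not running_remotely:
        with open(local_path) as f:
            server_store["config"] = f.read()
        return local_path
    fd, new_path = tempfile.mkstemp(suffix=".cfg")
    with os.fdopen(fd, "w") as f:
        f.write(server_store["config"])  # the UI version wins when remote
    return new_path

store = {}
with tempfile.NamedTemporaryFile("w", suffix=".cfg", delete=False) as src:
    src.write("lr=0.1")
# First (local) run: content is captured, original path is returned
assert connect_configuration(src.name, store, running_remotely=False) == src.name
store["config"] = "lr=0.01"  # pretend it was edited in the UI
# Remote run: a fresh local copy carrying the UI edits is returned
remote_copy = connect_configuration(src.name, store, running_remotely=True)
print(open(remote_copy).read())  # lr=0.01
```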
I mean the python package, not the trains-server version.