HurtWoodpecker30 currently in the open source only AWS is supported, I know the SaaS pro version supports it (I'm assuming enterprise as well).
You can however manually spin an instance on GCP and launch an agent on the instance (like you would on any machine)
Actually, no. This is ti spin the clearml-server on GCP, not the agent
Hi QuaintPelican38
Assuming you have open the default SSH port 10022 on the ec2 instance (and assuming the AWS premissions are set so that you can access it). You need to use the --public-ip
flag when running the clearml-session. Otherwise it "thinks" it is running on a local network and it registers itself with the local IP. With the flag on it gets the public IP of the machine, then the clearml-session running on your machine can connect to it.
Make sense ?
Try this one 🙂HyperParameterOptimizer.start_locally(...)
https://clear.ml/docs/latest/docs/references/sdk/hpo_optimization_hyperparameteroptimizer#start_locally
with ?
multipart: false
secure: false
If so, can you post here your aws.s3 section of the clearml.conf? (of course replacing the actual sensitive information with *s)
owning the agent helps, but still it's much better if the credentials don't show up in logs,
They are not, they are always filtered out,
- how does
force_git_ssh_protocol
help please? it doesn't solve the issue of the agent simply not having accessIt automatically maps the host .ssh into the container, so that git can use SSH to clone.
What exactly is not working?
and how are you configuring it?
Hi SmarmySeaurchin8
StorageManager docs is broken in the example notebook here:
Thanks 🙂 I'll make sure we fix it
I want to display is already stored locally
Sure you can:Logger.current_logger().report_image('title','series', iteration=0, local_path='/my_file/is_here.jpg')
BTW is it cheaper than ec2 instance? Why not use the aws autoscaler ?
Hi @<1556812486840160256:profile|SuccessfulRaven86>
Please notice that the clearml serving is not designed for public exposure, it lacks security layer, and is designed for easy internal deployment. If you feel you need the extra security layer I sugget either add external JWT alike authentication, or talk to the clearml people, their paid tiers include enterprise grade security on top
Hi @<1661904968040321024:profile|SpotlessOwl43>
My problem is that when the AWS virtual machine is killed, my Pipelines and Scheduling stop working because of the killed ClearML agent,
are you using the ClearML AWS autoscaler to spin that machine ? or are you spinning it manually ?
Ohh then use the AWS autoscaler, basically it what you want, spin an EC2 and set an agent there, then if the EC2 goes down (for example if this is a spot), it will spin it up again automatically with the running Task on it.
wdyt?
When is clearml-deploy coming to the open source release?
Currently available under clearml-serving (more features are being worked on, i.e. additional stats and backends)
https://github.com/allegroai/clearml-serving
BitingKangaroo95 can you post here the entire console output of clearml-session (including full command line) ?
BitingKangaroo95 nice work 🎊
I think that what did it was:
change the sshd_config
so that it allows port forwarding
, agent forwarding
and x11 forwarding
But just in case, it might be there was a pre existing SSH identifier on your machine, and hence the error.
clear known_hosts under ~/.ssh was also something I would try 🙂
Hi @<1545216070686609408:profile|EnthusiasticCow4>
Oh dear, I think this argument is not exposed 😞
- You can open a GH
- If you want to add a PR this is very simple:None
include_archived=False,
):
if not include_archived:
system_tags = ["__$all", cls.__tag, "__$not", "archived"]
else:
system_tags = [cls.__tag]
...
system_tag...
Hi UnevenDolphin73
This differentiable storage - does it only work on file additions/removal, or also on intra-file changes?
This is on a file level, meaning you change a single byte in the file, the entire file will be packaged in the new version.
Make sense ?
Hi GrotesqueOctopus42
creates a graph of the neural network and would be nice to have it on the experiment logs aswell
I think the main issue is displaying later in the UI, thoughts?
BTW: is this useful for you outside f very local TF debugging ?
Maybe I can plot it using other lib.
I remember a while back there was integration with network visualization but it was hard to support and failed to many times...
If you have library that converts the network into html or image you can report it as debug sample?
Hi @<1535069219354316800:profile|PerplexedRaccoon19>
On debugging, it looks like indices are corrupt.
ishhhhh, any chance you have a backup?
LazyTurkey38 I think this is caused by new versions of pip to report the wrong link:
https://github.com/bwoodsend/pip/commit/f533671b0ca9689855b7bdda67f44108387fe2a9
Hi SmilingFrog76
Great question, sadly multi-node is never simple 🙂
Let's start with the basic, let's assume one worker is available and the other is not, what would you want to happen? (p.s. I'm not aware of flexible multi-node training frameworks, i.e. a framework that can detect another node is available and connect with it mid training, that said, it might exist 🙂 )
no, at least not yet, someone definitely needs to do that though haha
Currently all the unit tests are internal (the hardest part is providing server they can run against and verify the results, hence the challange)
For example, if ClearML would offer a
TestSession
that is local and does not communicate to any backend
Offline mode? it stores everything into a folder, then zips it, you can access the target folder or the zip file and verify all the data/states
What do you mean by a custom queue ?
In the queues page you have a plus button, this will just create a new queue
I want to be able to install the venv in multiple servers and start the "simple" agents in each one on them. You can think of it as some kind of one-off agent for a specific (distributed) hyperparameter search task
ExcitedFish86 Oh if this is the case:
in your cleaml.conf:agent.package_manager.type: conda agent.package_manager.conda_env_as_base_docker: true
https://github.com/allegroai/clearml-agent/blob/36073ad488fc141353a077a48651ab3fabb3d794/docs/clearml.conf#L60
https://git...
when u say useÂ
Task.current_task()
 you for logging? which i’m guessing that the fastai binding should do right?
right, this is a fancy way to say, make sure the actual sub-process is initializing ClearML so all the automagic kicks in, since this is not "forked" but a whole new process, calling Task.current_task is the equivalent of calling Task.init with the same arguments (which you can also do, I'm not sure which one is more straight forward, wdyt?)
Regarding this, does this work if the task is not running locally and is being executed by the trains agent?
This line: "if task.running_locally():" makes sure that when the code is executed by the agent it will not reset it's own requirements (the agent updates the requirements/installed_packages after it installs them from the requiremenst.txt, so that later you know exactly which packages/versions were used)
Regrading the missing packages, you might want to test with:force_analyze_entire_repo: false
https://github.com/allegroai/trains/blob/c3fd3ed7c681e92e2fb2c3f6fd3493854803d781/docs/trains.conf#L162
Or if you have a full venv you like to store instead:
https://github.com/allegroai/trains/blob/c3fd3ed7c681e92e2fb2c3f6fd3493854803d781/docs/trains.conf#L169
BTW:
What is the missed package?
CooperativeFox72 btw, are you guys running those 20 experiments manually or through trains-agent ?