-- I've been running my script from VSCode for the first time,
In the initial Task (the one created when running inside VSCode) do you have all the packages listed in the "Installed Packages" section ?
Could it be there is a Task.init being called before this code snippet ?
RoundCat60 I'm assuming we are still talking about the S3 credentials, sadly no 😞
Are you familiar with boto and IAM roles ?
I suggest a bump in the GitHub issue
Still not supported 😞
As we can’t create keys in our AWS due to infosec requirements
Hmmm
Guys, any chance you can verify the RC solves the issue?
pip install clearml==1.0.2rc0
The problem comes from ClearML thinking it starts from iteration 420 and then adding the iteration number again (421), so it starts logging from 420+421=841
JitteryCoyote63 Is this the issue ?
So I shouldn’t even need to call the task.set_initial_iteration function
I think just removing this call should solve it. What's probably going on is that it's being called twice (once internally, once manually by your code)
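Roughly something like this (just a sketch, assuming you are using continue_last_task; the project/task names are made up) — let ClearML restore the iteration on its own:

from clearml import Task

# Continuing a previous run: ClearML restores the last iteration automatically
task = Task.init(project_name="examples", task_name="training", continue_last_task=True)

# If you also set the offset manually, it is added on top of the restored one,
# which is exactly how 420 + 421 = 841 happens:
# task.set_initial_iteration(420)  # <- remove this call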
Yes I was thinking a separate branch.
The main issue with telling git to skip submodules is that it will be easily forgotten and will break stuff. BTW the git repo itself is cached, so the second time there is no actual pull. Lastly, it's not clear where one could pass a git argument per task. Wdyt?
Hi DisturbedParrot38
You mean how to tell the agent to pull only some submodules of your git?
If this is the case you can actually remove them on your git branch; a submodule is just a file with a soft link. Wdyt?
I double checked the code it's always being passed 😞
Is it possible to launch a task from Machine C to the queue that Machine B's agent is listening to?
Yes, that's the idea
Do I have to have anything installed (aside from the trains PIP package) on Machine C to do so?
Nothing, pure magic 🙂
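For example, on Machine C (a minimal sketch; the queue/project names are just placeholders, Machine B's agent should be listening to that queue):

from trains import Task  # only the trains pip package is needed here

task = Task.init(project_name="examples", task_name="remote run")
# Stop local execution and enqueue the task; the agent on Machine B will pull it
# from whatever queue it is listening to ("default" is just an example name)
task.execute_remotely(queue_name="default")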
strange ...
Check the log to see exactly where it downloaded torch from. Just making sure it used the right repository and did not default to pip, where it might have gotten a CPU-only version...
See if this helps
Do you think this is better ? (the API documentation is coming directly from the python doc-string, so the code will always have the latest documentation)
https://github.com/allegroai/clearml/blob/c58e8a4c6a1294f8acec6ed9cba81c3b91aa2abd/clearml/datasets/dataset.py#L633
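For reference, the usage those doc-strings describe looks roughly like this (a minimal sketch; names and paths are illustrative):

from clearml import Dataset

dataset = Dataset.create(dataset_name="my_dataset", dataset_project="examples")
dataset.add_files(path="./data")   # register local files
dataset.upload()                   # upload the file contents to storage
dataset.finalize()                 # close this dataset version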
That is quite neat! You can also put a soft link from the main repo to the submodule for better visibility
Hi GleamingGrasshopper63
How well can the ML Ops component handle job queuing on a multi-GPU server?
This is fully supported 🙂
You can think of queues as a way to simplify resources for users (you can do more than that, but let's start simple)
Basically you can create a queue per type of GPU, for example a list of queues could be: on_prem_1gpu, on_prem_2gpus, ..., ec2_t4, ec2_v100
Then when you spin up the agents, you attach each agent to the "correct" queue for its machine type.
Int...
Hi DefeatedCrab47
You should be able to change the web server port, but the API port (8008) cannot be changed. If you can log in to the web app and create a project, it means everything is okay. Just make sure that when you configure trains (trains-init) the port numbers are correct 🙂
PompousBeetle71 cool, next RC will have the argparse exclusion feature :)
So a bit of explanation on how conda is supported. First, conda is not recommended; the reason is that it is very easy to create a setup with conda that conda itself cannot reproduce (yes, exactly that). So what trains-agent does is try to install all the packages it can with conda first (not one by one, because that would break conda's dependency resolution), and then install the packages that conda failed to install using pip.
Try adding this environment variable:export TRAINS_CUDA_VERSION=0
How can I turn off git diff uploading?
Sure, see here
Please send the full log, I just tested it here, and it seems to be working
Hmm what do you have here?
os.system("cat /var/log/studio/kernel_gateway.log")