Do you have your Task.init
call inside the "train.py" script ? (and if you do, what are you getting in the Execution tab of the task) ?
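For reference, a minimal sketch of what I mean by having it inside train.py (the project/task names below are just placeholders):
from clearml import Task
# first lines of train.py, before any training code runs
task = Task.init(project_name="examples", task_name="train")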
I think for it to work you have to have ssh running on the host machine (the socket client itself), no?
OddAlligator72 quick question:
suggest that you implement a simple entry-point API
How would the system get the correct packages / git repo / arguments if you are only passing a single function entrypoint ?
I assume issue: None
Yeah this is odd I noticed as well. Let me ask the guys to take a look
Yes I was thinking a separate branch.
The main issue with telling git to skip submodules is that it will be easily forgotten and will break stuff. BTW the git repo itself is cached, so the second time there is no actual pull. Lastly, it's not clear where one could pass a git argument per Task. Wdyt?
Hi @<1614069770586427392:profile|FlutteringFrog26>
So since you have the Task id. you do:
task = Task.get_task("task id here")
Then to get the models
models = task.models["output"]
the models object is both a list and a dict; if you want the last one you do last_model = models[-1]
if you know the best model name you do model = models["best model"]
(notice the model name is the exact one you see in the UI. Once you have the model object you can get a copy with `model.get_lo...
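Putting it together, something along these lines (the task id and model name are placeholders):
from clearml import Task
task = Task.get_task("task id here")        # the Task id you see in the UI
models = task.models["output"]              # output models reported by that Task
last_model = models[-1]                     # the last reported model
named_model = models["best model"]          # or access by the exact name from the UI
local_path = last_model.get_local_copy()    # download a local copy of the model file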
Hi @<1603198134261911552:profile|ColossalReindeer77>
I would also check this one: None
check if the fileserver docker is running with docker ps
why is pushing into the services queue required ...
The services queue is usually connected with an agent running in "services mode" which means this agent is executing multiple tasks in parallel (as opposed to regular agent that only launches one Task at a time, the assumption is that "service" Tasks are usually not heavy on cpu/ram so multiple instances make sense)
Okay this is a bit hacky but will work
@PipelineDecorator.component(...)
def step(...):
    import sys
    import os
    # make the local "projects/main" folder importable inside the component
    sys.path.append(os.path.join(os.path.abspath(os.path.dirname(__file__)), "projects", "main"))
    from file import something
BTW: the new documentation should contain a full search over the docstring
I see,
@<1571308003204796416:profile|HollowPeacock58> can you please send the full log?
(The odd thing is it is trying to install the python 3.10 version of torch, when your command line suggests it is running python 3.8)
I want to be able to delete only the logs since they are taking a lot of space in my case.
I see... I do not think this is possible 😞
You can disable the auto logging though ... pass auto_connect_streams=False
to Task.init
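Something like this (a sketch, the project/task names are placeholders):
from clearml import Task
# disable the automatic capture of stdout/stderr console logs
task = Task.init(
    project_name="examples",
    task_name="no console logging",
    auto_connect_streams=False,
)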
Hi @<1643060801088524288:profile|HarebrainedOstrich43>
try this RC let me know if it works 🙂
pip install clearml==1.13.3rc1
WickedElephant66 this seems like a general network issue, like the docker service is missing your company's firewall certificate.
Can you pull any container from docker hub ?
Thanks @<1523704157695905792:profile|VivaciousBadger56> ! great work on the docstring, I also really like the extended example. Let me make sure someone merges it
- Artifacts and models will be uploaded to the output URI; debug images are uploaded to the default file server. This can be changed via the Logger.
- Hmm is this like a configuration file?
You can do:
local_text_file = task.connect_configuration('filenotingit.txt')
Then open the 'local_text_file'; it will create a local copy of the data at runtime, and the content will be stored on the Task itself (see the sketch below). - This is how the agent installs the python packages, but if the docker already contains th...
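Regarding connect_configuration, a rough sketch of the flow (the file name is just a placeholder):
from clearml import Task
task = Task.init(project_name="examples", task_name="config example")
# stores the file content on the Task and returns a path to a local copy at runtime
local_text_file = task.connect_configuration('filenotingit.txt')
with open(local_text_file, 'rt') as f:
    content = f.read()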
Hi @<1658281093108862976:profile|EncouragingPenguin15>
Should work, I'm assuming multiple nodes are running agents ? or are you saying Ray spins the jobs and clearml logs them ?
I guess last follow-up question: is there a way to cap costs?
Scale tier ? (I know it is not per usage, but it is probably more than $15 per user 🙂 )
no, i just commented it and it worked fine
Yeah, we should add a comment saying "optional" because it looks as if you need to have it there if you are using Azure.
LOL, if you can get it to run any python code, I can help with the rest. We just need to make sure we can capture the output, and then start the VScode remote debugging feature directly from the extension.
Seems like that would be a really good starting place.
This is actually JS (typescript) ... not python, not sure on how to continue from there 😞
OddAlligator72 okay, that is possible, how would you specify the main python script entry point? (wouldn't that make more sense rather than a function call?)
How do you determine which packages to require now?
Analysis of the actual repository (i.e. it will actually look for imports 🙂 ); this way you get the exact versions you have, but not the clutter of the entire virtual environment
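If the analysis misses something, I think you can also add a requirement explicitly before Task.init (a hedged sketch; the package name is a placeholder):
from clearml import Task
# manually add a package on top of the automatic repository analysis
Task.add_requirements("some_package")
task = Task.init(project_name="examples", task_name="explicit requirement")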
Yes, the agent's mode is global, i.e. all tasks are either inside docker or in venv. In theory you can have two agents on the same machine, one venv and one docker, listening to two diff queues
So a bit of explanation on how conda is supported. First, conda is not recommended; the reason is that it is very easy to create a setup with conda that is un-reproducible by conda (yes, exactly that). So what trains-agent does is try to install all the packages it can first with conda (not one by one, because that would break conda dependencies), then the packages it failed to install with conda it will install using pip.
data is going to S3 as well as EBS. Why? It should only go to S3
This sounds odd; if this is mounted then it goes to S3 (the link will point to the file server, but it will be stored on the mounted drive, i.e. S3)
wdyt?