Reputation
Badges 1
25 × Eureka!or me it sounds like the starting of the service is completed but I don't really see if the autoscaler is actually running. Also I don't see any output in the console of the autoscaler.
Do notice the autoscaler code itself needs to run somewhere, by default it will be running on your machine, or on a remote agent,
ok, but this happens in my local machine, not in the agent
resource monitoring is always running in the background, even on local machines. (of course you can turn it off)
SmarmySeaurchin8
Something like this one:vector_series = np.random.randint(10, size=10).reshape(2,5) logger.report_vector(title='vector example', series='vector series', values=vector_series, iteration=0, labels=['A','B'], xaxis='X axis label', yaxis='Y axis label')
Merged, is it working for you now?
--docker or in clearml.conf https://github.com/allegroai/clearml-agent/blob/21c4857795e6392a848b296ceb5480aca5f98e4b/docs/clearml.conf#L153
Correct,
Notice that the glue has it's own defaults and the ability to override containers from the UI
Any recommendation or working combinations of AMI
I would take the deeplearning AMIs from Nvidia AWS , I think they work on both CPU and GPU machines.
In terms of dockers, python dockers for CPU and nvidia runtime for GPU
[https://hub.docker.com/layers/library/python/3.11.2-bullseye/images/sha256-6128ea86d[…]d2c01646d599352f6ddd9893420eb815a06c3b90619f8?context=explore](https://hub.docker.com/layers/library/python/3.11.2-bullseye/images/sha256-6128ea86db7f6b1b286d2c01646d599352f6ddd98...
MelancholyElk85 notice there is the pipeline controller queue (i.e. which agent will run the logic of the pipeline), and the default queue for the pipeline steps (i.e. the actual steps of the pipeline).
The default queue for the pipeline logic itself is services . you can change it ( pipeline.start(..., queue='another_q') )
Make sense ?
that is odd..
So if you have 3 agents, how many concurrent experiment are they running ? (actually running, not registered as running)
Hi SubstantialElk6
I can't see that is was removed, could you send the full log ?
Sure, in that case, wait until tomorrow, when the github repo is fully synced
Verified, you are correct "." in label enumeration will break the clone .
I'll make sure this bug is passed to backend guys to fix. Thanks TenseOstrich47 !
meanwhile maybe "_" instead ? 😁
I can definitely feel you!
(I think the implementation is not trivial, metrics data size is collected and stored as commutative value on the account, going over per Task is actually quite taxing for the backend, maybe it should be an async request ? like get me a list of the X largest Tasks? How would the UI present it? As fyi, keeping some sort of book keeping per task is not trivial either, hence the main issue)
JitteryCoyote63 what's the clearml version ?
Are you always seeing the "model uploaded completed" message ?
What's the python version you are using?
MelancholyElk85 I'm assuming you have the agent setup and everything in the example code works, is that correct ?
Where is it failing on your pipelines ?
Hi RoughTiger69
A. Yes makes total sense . Basically you can use Task.export Task.import to do achieve this process (notice we assume the dataset artifacts links are available on both, usually this is the case)
B. The easiest way would be to use Process , then one subprocess is exporting from dev , where the credentials and configuration is passed with os environment. The another subprocess imports it to the prod server (again with os environment pointing to the prod server). Make sense?
Basically if I pass an arg with a default value of False, which is a bool, it'll run fine originally, since it just accepted the default value.
I think this is the nargs="?" , is that right ?
So as you say, it seems hydra kills these
Hmm let me check in the code, maybe we can somehow hook into it
Hi @<1552101447716311040:profile|SteadySeahorse58>
ValueError: Could not find queue named "services"
Did you set an agent / auto-scaler ? where is the pipeline and its components will be running ?
GreasyPenguin14
Is it possible in ClearML to have a main task (the complete cross validation) and subtasks (one for each fold)?
You mean to see it as nested in the UI? or Auto logged by the code ?
JitteryCoyote63 hacky but sure 🙂
` from trains.config import config_obj
print(config_obj) `
Hi @<1730033904972206080:profile|FantasticSeaurchin8>
You mean in the UI , or when reporting on the SDK?
but here I can tell them: return a dictionary of what you want to save
If this is the case you have two options, either store the dict as an artifact (this makes sense if this is not standalone model you would like to later use), or store as an artifact.
Artifact example:
https://github.com/allegroai/clearml/blob/master/examples/reporting/artifacts.py
getting them back
https://github.com/allegroai/clearml/blob/master/examples/reporting/artifacts_retrieval.py
Model example:
https:/...
Hi DilapidatedDucks58 ,
Are you running in docker or venv mode?
Do the works share a folder on the host machine?
It might be syncing issue (not directly related to the trains-agent but to the facts you have 4 processes trying to simultaneously access the same resource)
BTW: the next trains-agent RC will have a flag (default off) for torch-nightly repository support 🙂