looks like a great idea, I'll make sure to pass it along and that someone replies 🙂
Very Cool!
BTW guys, are you using task.models[] to continue from the last checkpoint, or is it task.artifacts[] ?
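For reference, a minimal sketch of the task.models[] route (assuming the checkpoints were registered as output models on the previous Task; the task ID is illustrative):

from clearml import Task

# Assumption: checkpoints were stored as output models on the previous Task
prev_task = Task.get_task(task_id="previous_training_task_id")

# task.models holds "input" and "output" model lists; the last output
# model is usually the latest checkpoint
last_checkpoint = prev_task.models["output"][-1]
weights_path = last_checkpoint.get_local_copy()  # downloads to the local cache
print(weights_path)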
Worker just installs by name from pip, and it installs a package that isn't mine!
Oh dear ...
Did you configure additional pip repositories in the Agent's clearml.conf ? https://github.com/allegroai/clearml-agent/blob/178af0dee84e22becb9eec8f81f343b9f2022630/docs/clearml.conf#L77 It might be that this alone is not enough, as pip will first try to find the package in the public pip repository, and only then in the private one. To avoid that, in your code you can point directly to an https URL of your package Ta...
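For reference, the relevant section in the Agent's clearml.conf could look roughly like this (a sketch; the repository URL is illustrative):

agent {
    package_manager {
        # additional pip repositories the agent searches besides PyPI
        # (URL below is illustrative)
        extra_index_url: ["https://my-private-pypi.example.com/simple"]
    }
}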
This is very odd, can you also put here the file names? maybe an odd character is causing it?
Can you also test it with the latest clearml version (1.8.0) ?
And you are seeing a bunch of the GS SSL errors?
that is because my own machine has 10.2 (not the docker, the machine the agent is on)
No, that has nothing to do with it; the CUDA is inside the container. I'm referring to this image https://allegroai-trains.slack.com/archives/CTK20V944/p1593440299094400?thread_ts=1593437149.089400&cid=CTK20V944
Assuming this is the output from your code running inside the docker, it points to CUDA version 10.2
Am I missing something ?
LOL 🙂
Make sure that when you train the model (or create it manually) you set the default output_uri :
task = Task.init(..., output_uri=True)
or
task = Task.init(..., output_uri="s3://...")
output_uri=True uploads the model to the default files server; a storage URI such as s3://... uploads it there instead.
So inside the pipeline logic you can do Task.current_task().id
Or inside a component Task.current_task().parent
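Putting the two together, a small sketch (assuming the code runs as part of a decorated pipeline):

from clearml import Task

# Inside the pipeline controller logic: the controller's own Task ID
pipeline_id = Task.current_task().id

# Inside a component: the component runs as its own Task, and its
# parent points back to the controller's Task ID
controller_id = Task.current_task().parent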
Okay this is a bit hacky but will work
@PipelineDecorator.component(...)
def step(...):
    import sys
    import os
    # make the repo's "projects/main" folder importable inside the component
    sys.path.append(os.path.join(os.path.abspath(os.path.dirname(__file__)), "projects", "main"))
    from file import something
I think RC should be out in a day or two, meanwhile pip install git+https://github.com/allegroai/clearml.git
Oh I see, yes the "metrics" include scalars, plots & console outputs.
I also think they are updated only once a day (or maybe twice a day?), so even if you delete them it will take a while for the numbers to update.
(Archiving is not deleting; you then need to go to the archived view and delete the task from there.)
No, if you need the cloud-ready install (which you do), follow the instructions in the repo readme (not the easy single-node setup in the docs, which we will be updating soon)
https://github.com/allegroai/clearml-server-helm-cloud-ready
Hi TrickySheep9
You should probably check the new https://github.com/allegroai/clearml-server-helm-cloud-ready helm chart 😉
Hi @<1603198134261911552:profile|ColossalReindeer77>
I would also check this one: None
that machine will be able to pull and report multiple trials without restarting
What do you mean by "pull and report multiple trials" ? Spawn multiple processes with different parameters ?
If this is the case: the internals of the optimizer could be synced to the Task so you can access them, but this is basically the internal representation, which is optimizer dependent. Which optimizer did you have in mind?
Another option is to pull Tasks from a dedicated queue and use LocalClearmlJob ...
Basically two options. The first is to spin the clearml-k8s-glue as a k8s service; this service pulls clearml jobs and creates a k8s job on your cluster for each one.
The second option is to statically spin agents inside pods; inside the pods the agents work in venv mode.
I know the enterprise edition has more sophisticated k8s integration where the glue also retains the clearml scheduling capabilities.
https://github.com/allegroai/clearml-agent/#kubernetes-integration-optional
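As a rough sketch, the glue can be launched along the lines of the examples/k8s_glue_example.py script in the clearml-agent repo (assuming default constructor arguments; the queue name is illustrative):

from clearml_agent.glue.k8s import K8sIntegration

# Defaults are taken from clearml.conf / environment variables
k8s = K8sIntegration()
# Pull tasks from the "k8s_scheduler" queue and create a k8s job per task
k8s.k8s_daemon("k8s_scheduler")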
Oh I see
but now I'm confused: if this is from code, why aren't you copying the Pipeline ID from the UI?
Regarding the query, it should be something like
task_to_schedule = Task.get_task(project_name='MyProject/.pipelines/PipelineName', task_name='PipelineName')
then I will have to rerun the pipeline code, then manually get the id and update the task.
Makes total sense to me!
Failed auto-generating package requirements: _PyErr_SetObject: exception SystemExit() is not a BaseException subclass
Not sure why you are getting this one?!
ValueError: No projects found when searching for
MyProject/.pipelines/PipelineName
hmm, what are you getting with:
task = Task.get_task(pipeline_uid_here)
print(task.get_project_name())
Hmmm, are you running inside pycharm, or similar ?
Hmm yeah I think that makes sense. Can you post here the arguments?
I'm assuming you have something like '1.23a' in the arguments?
Hi @<1562973083189383168:profile|GrievingDuck15>
Thanks for noticing, yes the api is always versioned, we should make that clear in the docs. Also if you need the latest one, use version 999 ; it will default to the latest one the server can support
Hi VexedCat68
Could it be the python version is not the same? (this is the only reason not to find a specific python package version)
Wait, even without the pipeline decorator this function creates the warning?
Hi @<1691620877822595072:profile|FlutteringMouse14>
Yes, Feast has been integrated by at least a couple of users, if I remember correctly.
Basically there are two modes: offline and online feature transformation. For offline transformation your pipeline is exactly what I would recommend. The main difference is online transformation, where I think Feast is a great starting point
@<1523701868901961728:profile|ReassuredTiger98> if you use the latest RC I sent and run with --debug ,
in the log you will see the full content of /tmp/conda_envaz1ne897.yml
Here it is copied from your log, do you want to see if this one works:
channels:
- defaults
- conda-forge
- pytorch
dependencies:
- blas~=1.0
- bzip2~=1.0.8
- ca-certificates~=2020.10.14
- certifi~=2020.6.20
- cloudpickle~=1.6.0
- cudatoolkit~=11.1.1
- cycler~=0.10.0
- cytoolz~=0.11.0
- dask-core~=2021.2.0
- de...
Okay, I'll make sure we change the default image to the runtime flavor of nvidia/cuda
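In the meantime, you can point the agent at a runtime image yourself in clearml.conf (a sketch; the tag is illustrative):

agent {
    default_docker {
        # runtime flavor instead of the devel flavor (tag is illustrative)
        image: "nvidia/cuda:11.1.1-runtime-ubuntu20.04"
    }
}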
Hi @<1643060801088524288:profile|HarebrainedOstrich43>
try this RC let me know if it works 🙂
pip install clearml==1.13.3rc1