You can also specify additional packages on the decorator:
@PipelineDecorator.component(..., packages=["tqdm>=2.1", "scikit-learn"])
def step_one(...):
    # code here
Hi SubstantialElk6
No need for that, you can use the helm chart (or spin them up once with kubectl), then they take care of scheduling by themselves.
You can also use the k8s glue (basically spinning kubernetes pods automatically for you, based on the Tasks that you push into the ClearML queue)
https://github.com/allegroai/clearml-agent/blob/master/examples/k8s_glue_example.py
In short, two possible deployments
Static k8s pod running the agent (then the agent runs all the experiments inside t...
Awesome! Any chance you feel like contributing it? I'm sure ppl would be thrilled.
Thanks @doru! BTW if you are running code from outside the trains repo, do you still get the double package?
LOL
Make sure that when you train the model or create it manually you set the default "output_uri"
task = Task.init(..., output_uri=True)
or
task = Task.init(..., output_uri="s3://...")
Hi @<1661542579272945664:profile|SaltySpider22> I'm not sure I understand the answer to my parallel question
Well, in that case, just change the order, it should solve it (I'll make sure we have that as the default):
conda_channels: ["pytorch", "conda-forge", "defaults", ]
It should solve the issue.
Would be cool to let it get untracked as well, especially if we want to as an option
How would you decide what should be tracked?
Are you asking regarding the k8s integration?
(This is not a must, you can run the clearml-agent bare-metal on any OS)
How did you add the args? Is it argparse? If so, the help is automatically picked up, so you can see it in the UI. BTW, the ability to provide a list of options is a really cool feature to have, I'll make sure to pass it to product.
Hi @<1727497172041076736:profile|TightSheep99>
I think you are correct! It will use the internal per-file upload retry, but it does not let you control it.
Could you please open a github issue so that we do not forget to add it?
Hi @<1684010629741940736:profile|NonsensicalSparrow35>
however for the remote file it always creates the name with the following pattern:
{filename_prefix}checkpoint{n}.pt
..
Is this the main issue?
Notice that the model name (i.e. the entry on the Task itself) is not directly connected with the stored file name on the target file server (or S3)
@<1523701601770934272:profile|GiganticMole91> really nice!
but can we schedule a new task here?
@<1523701260895653888:profile|QuaintJellyfish58> do you mean schedule a Task from the scheduled function? if yes, you can do something similar to @<1523701601770934272:profile|GiganticMole91> , you create/clone existing Task, change arguments and push it into an execution queue. wdyt?
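A minimal sketch of the clone / change arguments / enqueue flow described above, assuming the standard ClearML `Task` class methods (`get_task`, `clone`, `set_parameter`, `enqueue`). The clone name "scheduled clone", the `task_cls` injection parameter, and any parameter names you pass in `overrides` (e.g. "Args/lr") are hypothetical and only there for illustration:

```python
def clone_and_enqueue(template_task_id, overrides, queue_name, task_cls=None):
    """Clone an existing Task, override some arguments, push it to a queue."""
    if task_cls is None:
        # Imported lazily so the sketch can be exercised without a ClearML server.
        from clearml import Task as task_cls

    template = task_cls.get_task(task_id=template_task_id)
    # "scheduled clone" is a placeholder name, pick whatever fits your project.
    cloned = task_cls.clone(source_task=template, name="scheduled clone")
    for name, value in overrides.items():
        # argparse arguments usually live under the "Args/" section,
        # e.g. "Args/lr" (hypothetical argument name).
        cloned.set_parameter(name, value)
    task_cls.enqueue(cloned, queue_name=queue_name)
    return cloned
```

Calling something like this from inside the scheduled function would push a fresh copy of the template Task into the given queue every time the schedule fires.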
- Yes the challenge is mostly around defining the interface. Regarding packaging, I'm thinking a similar approach to the pipeline decorator, wdyt?
- ClearML agents will be running on k8s, but the main caveat is that I cannot think of a way to help with the deployment. In the end it will be kubectl that users have to call in order to spin up the containers with the agents. Maybe a simple CLI to do that for you?
Hi ReassuredTiger98
To separate between minio and S3 we use:
s3://bucket/file for the AWS S3 service and s3://server:port/bucket/file for minio.
This means that if your S3 links had been s3://<minio-address>:<port>/bucket/file.bin, the UI would have popped up the credentials window.
Make sense ?
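To make the convention concrete, here is a rough sketch (not ClearML's actual parsing code) that tells the two forms apart by checking whether the part right after s3:// carries an explicit port. The function name `classify_s3_url` and the returned dict layout are made up for this example:

```python
from urllib.parse import urlparse


def classify_s3_url(url):
    """Split an s3:// link into its parts, following the host:port convention."""
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError("expected an s3:// URL, got: %r" % url)

    if parsed.port is not None:
        # s3://server:port/bucket/file -> minio (or other S3-compatible server)
        bucket, _, key = parsed.path.lstrip("/").partition("/")
        return {"kind": "minio", "host": parsed.hostname,
                "port": parsed.port, "bucket": bucket, "key": key}

    # s3://bucket/file -> AWS S3, the first component is the bucket itself
    return {"kind": "aws", "host": None, "port": None,
            "bucket": parsed.netloc, "key": parsed.path.lstrip("/")}
```

So classify_s3_url("s3://my-bucket/file.bin") is treated as AWS, while the same call with an explicit host and port is treated as minio.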
Woo, what a doozy.
yeah those "broken" pip versions are making our life hard ...
poetry stores git related data in ... you get an internal package we have with its version, but no git reference, i.e. internal_module==1.2.3 instead of
internal_module @H4dr1en
This seems like a bug with poetry (and I think I have run into this one), worth reporting it, no?
I'm assuming these are only the packages that are imported directly (i.e. pandas requires other packages, but the code imports pandas, so this is what is listed).
The way ClearML detects packages: it first tries to understand whether this is a "standalone" script. If it is, then only the imports in the main script are logged. If it "thinks" this is not a standalone script, it will analyze the entire repository.
Make sense?
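As a rough illustration of the "standalone script" case (only what the script itself imports gets listed), here is a small sketch using Python's `ast` module. This is not how ClearML implements it; the function name `direct_imports` is hypothetical, and it returns module names, not PyPI package names (e.g. it would give sklearn, not scikit-learn):

```python
import ast


def direct_imports(source):
    """Return the top-level module names imported by a single script."""
    tree = ast.parse(source)
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import os.path" -> "os"
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports ("from . import x"), they are local code.
            if node.level == 0 and node.module:
                found.add(node.module.split(".")[0])
    return sorted(found)
```

Packages that pandas itself depends on never show up in the result, because the script does not import them directly.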
- Artifacts and models will be uploaded to the output URI, debug images are uploaded to the default file server. It can be changed via the Logger.
- Hmm is this like a configuration file?
You can do:
local_text_file = task.connect_configuration('filenotingit.txt')
Then open 'local_text_file'. It will create a local copy of the data at runtime, and the content will be stored on the Task itself.
- This is how the agent installs the python packages, but if the docker already contactains th...
If you create an initial code base maybe we can merge it?
Does Task.connect send each element of the dictionary as a separate API request? Has anyone else encountered this issue?
Hi SuperiorPanda77
The task.connect ends up as a single call, with all the data being sent in a single request.
That said, maybe the connect dict is not the best solution for a thousand-key dictionary ...
Maybe an artifact, or connect_configuration, is better suited?
wdyt?
"
This is not an S3 endpoint... what is the file server you configured for it?
But that should not mean you cannot write to them, no?!
JitteryCoyote63 you mean? (notice no brackets)
task.update_requirements(".")
Either pass a text or a list of lines:
The safest would be '\n'.join(all_req_lines)
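A small sketch of the "safest" option, i.e. normalizing whatever you have into one newline-joined text before handing it over. The helper name `requirements_text` is made up here, and the call to task.update_requirements itself is left out since it needs a live Task:

```python
def requirements_text(requirements):
    """Accept either a text blob or a list of lines, return one clean text."""
    if isinstance(requirements, str):
        lines = requirements.splitlines()
    else:
        lines = list(requirements)
    # Strip stray whitespace/newlines and drop empty lines before joining.
    cleaned = [line.strip() for line in lines if line.strip()]
    return "\n".join(cleaned)
```

The result can then be passed as the text argument, e.g. requirements_text(all_req_lines), following the update_requirements call mentioned above.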
Seems like Task.create is the correct use-case then, since again this is about testing flows using e.g. pytest,
Make sense
This seems to be fine for now, ...
Sounds good! thanks UnevenDolphin73
No worries, you open the issue on pypa/pip and I will do my best to push forward.
We also have to be realistic: I have a PR that has been waiting for almost a year now (that said, it is a major one and needed to wait until a few more features were merged). Basically what I'm saying is that the best case scenario is a month to get a PR merged.
oh dear, if that's the case I think you should open an issue on pypa/pip, I'm not sure what we can do other than that ...
If we have the time maybe we could PR a fix?!