To note, the latest code has been pushed to the GitLab repo.
Yes of course, it's a long one.
Unfortunately, our security posture is so strict that we cannot have an agent git user with unfettered read access to all repos.
Hi, clearml-agent==0.17.2rc3 did work. I'm on a 1.19 k8s cluster, and I get this error when a task is pulled. Is the glue not compatible with 1.19?
```
Pulling task 3a90802d1dfa4ec09fbccba0beffbaa8 launching on kubernetes cluster
Pushing task 3a90802d1dfa4ec09fbccba0beffbaa8 into temporary pending queue
Kubernetes scheduling task id=3a90802d1dfa4ec09fbccba0beffbaa8
kubectl output:
Flag --replicas has been deprecated, has no effect and will be removed in the future.
Flag --generator has been depre...
```
Ok, that worked. So every time I have changes in the code, I will have to rerun the experiment on my own machine that doesn't have any GPUs?
Kinda defeats the purpose of using ClearML Agent.
Hi SuccessfulKoala55, I was referring to Task.init() or any other SDK API that we use in our training code.
Hi, we did a check. Only 7.16.1 and 6.8.21 and above mitigate the attack. What's the current version that ClearML is using?
No, I can't see the files. But I can see them if I don't use ':port' in the URL when uploading. I can't access the machine today; I'll try to check the S3 logs when I'm back.
Can I dig into the MongoDB or ES to pull this data?
Hi SuccessfulKoala55, I managed to install clearml-agent==1.0.1rc5. However, the same issues occur.
Setting the credentials on the agent machine means the users cannot use their own credentials, since a k8s glue agent serves multiple users.
Referencing your suggestion, we can configure output_uri on task.set_base_docker(), but how should we do this for the credentials?
Thanks. That's easy to miss as it's not quite apparent in the main docs. How should I pass in env variables with the Task?
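To make my question concrete, this is roughly what I'm picturing (just a sketch; the bucket, image, and variable names are placeholders, and I'm assuming env variables can be forwarded through the docker arguments passed to set_base_docker()):
```python
from clearml import Task

# Sketch only: output_uri points at a placeholder S3-compatible endpoint.
task = Task.init(
    project_name="examples",
    task_name="train-with-remote-artifacts",
    output_uri="s3://my-s3-server:9000/clearml-models",  # placeholder
)

# Assumption: the env variables ride along as docker arguments, so the agent
# injects them when it starts the container for this task.
task.set_base_docker(
    "nvidia/cuda:11.1.1-runtime-ubuntu20.04 -e MY_TRAINING_VAR=foo -e AWS_DEFAULT_REGION=us-east-1"
)
```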
I'm also beginning to think this is related to https://clearml.slack.com/archives/CTK20V944/p1620664770492400 . Previously, when I set force_repo_requirements_txt=true and system_site_packages: true, it seemed to work. Upgrading to v1.0.2 seems to have changed things. The relevant config block is sketched below.
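For reference, these are the two agent settings I was toggling; in clearml.conf they sit roughly like this (sketch of just that block, other keys omitted):
```
agent {
    package_manager {
        # resolve packages from the repo's requirements.txt instead of the recorded list
        force_repo_requirements_txt: true
        # let the task venv see packages already installed in the system / base image
        system_site_packages: true
    }
}
```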
My assumption is that the agent will have pulled that off the client's clearml.conf.
Alright, thanks. It's important we clarify that it works before we migrate the infra.
Hi AgitatedDove14, do you mean the configuration tab in the UI? No, I don't see it.
Hi TimelyPenguin76,
If you notice in the last screenshot, it states the bucket name to be http://ecs.ai . It then tries to open http://s3.amazonaws.com/ecs.ai/clearml-models/artifact/uploading_file?X-Amz-Algorithm= ....
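In case it helps, this is roughly how the non-AWS endpoint is configured on my side (a sketch; the host, bucket, and keys are placeholders, not my real values):
```
sdk {
    aws {
        s3 {
            credentials: [
                {
                    # S3-compatible (non-AWS) endpoint; note the port in the host
                    host: "ecs.ai:443"       # placeholder endpoint
                    bucket: "clearml-models" # placeholder bucket
                    key: "****"
                    secret: "****"
                    multipart: false
                    secure: true
                }
            ]
        }
    }
}
```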
Hi thanks.
So I suppose ClearML makes use of the information in the .git folder at the root of the script folder to gather that info.
I have yet to go through ClearML Agent thoroughly. TimelyPenguin76, so if I run a training with uncommitted changes and don't commit/push after, when I clone the task, isn't the ClearML agent unable to pull that script from the git repo?
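Just so we're talking about the same flow, this is what I'm picturing on the cloning side (a sketch; the task id and queue name are placeholders, and my assumption is that the uncommitted diff recorded at Task.init() time gets re-applied by the agent):
```python
from clearml import Task

# The original run was created with Task.init(), which records the repo, commit,
# and (as I understand it) the uncommitted diff at the time of the run.
original = Task.get_task(task_id="<original-task-id>")  # placeholder

# Clone the recorded task and hand it to an agent queue; the agent then
# re-creates the environment and (assumption) re-applies the recorded diff.
cloned = Task.clone(source_task=original, name="clone of my run")
Task.enqueue(cloned, queue_name="default")  # placeholder queue
```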
Yes for both clearml and clearml-agent
Can you please verify that you have all the required packages installed locally?
It's not installed on the image that runs the experiment. But it's reflected in the requirements.txt.
What is the setting of agent.package_manager.system_site_packages?
True.
I see, I understand better now. Thanks.
I've been reading the documentation for a while and I don't quite get the following.
Given open source code, say, huggingface: I want to do some training and track my experiments using ClearML. The obvious choice would be to use Explicit Reporting in ClearML. But the part on sending my training job and letting ClearML orchestrate it is vague. I would appreciate being pointed to the right documentation on this.
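By Explicit Reporting I mean something along these lines (a minimal sketch; the project, task, and metric names are made up):
```python
from clearml import Task

task = Task.init(project_name="huggingface-experiments", task_name="bert-finetune")
logger = task.get_logger()

# Inside the training loop, report whatever the framework exposes.
for step in range(100):
    loss = 1.0 / (step + 1)  # dummy value standing in for the real training loss
    logger.report_scalar(title="train", series="loss", value=loss, iteration=step)
```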
Hi SuccessfulKoala55, thanks. Opened an issue on the clearml-agent GitHub at https://github.com/allegroai/clearml-agent/issues/67
For context, I realise I'll need to catalogue all the dataset ids created by people separately in a spreadsheet, and for each experiment I'll need to go into the code commit to see which id is being used. But on the other hand, I thought I've seen advertised use cases where the experiment can be directly linked to the dataset id being used. The brain's a bit rusty to recall how it was done.
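The kind of direct linkage I was thinking of looks roughly like this (a sketch; the dataset id is a placeholder, and recording it via task.connect() is my own guess at how the association would surface on the experiment):
```python
from clearml import Task, Dataset

task = Task.init(project_name="examples", task_name="train-on-versioned-data")

# Record which dataset version this experiment uses, so it lives on the task
# itself rather than only in a spreadsheet.
params = {"dataset_id": "<dataset-id>"}  # placeholder
task.connect(params)

# Fetch a local copy of that exact dataset version for training.
dataset_path = Dataset.get(dataset_id=params["dataset_id"]).get_local_copy()
```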
Thanks. Gonna try that out. But I hit another snag. Strangely, the Agent is not creating the right venv. This is what the Agent created:
```
pip:
- asn1crypto==0.24.0
- attrs==20.3.0
- certifi==2020.12.5
- chardet==4.0.0
- cryptography==2.1.4
- Cython==0.29.22
- furl==2.1.0
- future==0.18.2
- humanfriendly==9.1
- idna==2.6
- importlib-metadata==3.7.0
- jsonschema==3.2.0
- keyring==10.6.0
- keyrings.alt==3.0
- orderedmultidict==1.0.1
- pathlib2==2.3.5
- psutil==5.8.0
- pycrypto==2.6.1
- pygobject...
```
Ok, I noted this is due to the venv_update setting. It needs to be disabled, as it has a dependency on an internet URL. We can close this.
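For the record, the bit I disabled sits in the agent section of clearml.conf, roughly like this (sketch of just that block):
```
agent {
    # venv_update bootstraps itself from an internet URL, so it has to stay off
    # in our air-gapped setup.
    venv_update {
        enabled: false
    }
}
```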
Hi Erez, I think I would want to reference the code that transformed the data. Take for example: I received 10k images, performed some transformation, and saved it as a next version before splitting it up for my ML training. Some time later, I receive a new set of 10k images and want to apply the same transformation and then append it to the previous 10k as another version. clearml-data does well for the data-versioning part, but in terms of data provenance, it's not clear how I can associate t...
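To make the provenance question concrete, this is the shape of what I do today (a sketch; the names, paths, and ids are placeholders, and the comment about the missing piece is my own, not a ClearML feature):
```python
from clearml import Dataset

# New batch of ~10k transformed images, appended as a child of the previous version.
ds = Dataset.create(
    dataset_name="images-v2",
    dataset_project="datasets/raw-images",
    parent_datasets=["<previous-version-dataset-id>"],  # placeholder
)
ds.add_files(path="./transformed_batch_2")  # placeholder path
ds.upload()
ds.finalize()

# Missing piece for provenance: a first-class way to attach a pointer to the
# transformation code / commit that produced this version.
```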
Yeah, that sounds good. But from a user perspective, especially an untrained one, they wouldn't know what to point to. For example, some may think it's an exe, some think it's a zip bundle, and others think it's any GitHub repo with the word vscode.