And out of curiosity, what did you think we were talking about? 'Cos I didn't see anywhere else that might print the secrets.
Can this issue be solved with vault? It doesn't make sense to expose secrets like that.
It's 1.0.0, as printed at the top of the logs in the ClearML Server UI.
Hi SuccessfulKoala55, I managed to install clearml-agent==1.0.1rc5. However, the same issues occur.
```
[root@2c7498711bef elasticsearch]# curl
{
  "cluster_name" : "clearml",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 4,
  "active_shards" : 4,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 8,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" ...
```
OK, I guess I will have to kill the whole thing and refresh it.
Transform feature engineering and data processing code into recurring data ingestion workflows. Start building data stores, develop, automate, and schedule complex data processing jobs.
Yeah that'll cover the first two points, but I don't see how it'll end up as a dataset catalogue as advertised.
I see. Is there a more elaborate code example that describes the above interactions?
Hi SuccessfulKoala55, can I confirm the following in the docker-compose.yml?
And after that, can I run the docker-compose commands below without loss of data?
```
docker-compose down
docker-compose up
```
docker-compose.yml
```
version: "3.6"
services:
  apiserver:
    command:
    - apiserver
    container_name: clearml-apiserver
    image: allegroai/clearml:latest
    restart: unless-stopped
    volumes:
    - /opt/clearml/logs:/var/log/clearml
    - /opt/clearml/config:/opt/clearml/config
    #...
```
Hi SuccessfulKoala55, thanks, tested the patch and it's working as expected now.
OK thanks, looking forward to it. Could you share more about the bug you encountered?
Thanks SuccessfulKoala55. I verified your last comment and it works.
I think the default behaviour of the clearml-agent k8s glue when running a task is to create a virtual env and install the dependencies. So I'm just checking how to change that behaviour to use the globally installed packages instead.
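A minimal sketch of the clearml.conf setting that should control this, assuming the glue-spawned pods honour the agent config (worth verifying; the key name is from the agent's reference config):

```
agent {
    package_manager {
        # reuse packages already installed in the system environment
        # instead of resolving everything into an isolated venv
        system_site_packages: true
    }
}
```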
Hi, how may I call Task.init() within these subprocesses without write access to the 3rd-party scripts and Python executables?
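One possible pattern, sketched under the assumption that you control the entry point that launches the 3rd-party script: initialize the task in a thin wrapper, then execute the unmodified script in the same process. The script name and argv below are placeholders.

```python
import runpy
import sys

from clearml import Task

# initialize tracking once in the wrapper; no changes to the 3rd-party code
task = Task.init(project_name="examples", task_name="wrapped-run")  # placeholder names

# give the target script the argv it expects, then run it as __main__
sys.argv = ["third_party_script.py", "--epochs", "10"]  # placeholder script/args
runpy.run_path("third_party_script.py", run_name="__main__")
```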
yeah, someone should call them out.
Thanks. That seems to work. I've got a question: does it save the best model or the model from the last epoch?
I see, so it's a path. Another question: as far as I can tell, clearml-data will download the entire dataset before starting training. This isn't very ideal when we are dealing with datasets containing billions of items (e.g. we might want to download a subset at a time, send it to the GPU for training, and then use the CPU to concurrently pull another subset). Any comments on this?
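A rough sketch of that overlap pattern, assuming a clearml version whose Dataset.get_local_copy() accepts part/num_parts for partial downloads (worth checking against your installed version); train_on() and the project/dataset names are placeholders:

```python
from concurrent.futures import ThreadPoolExecutor

from clearml import Dataset

NUM_PARTS = 8  # placeholder: how many chunks to split the download into
ds = Dataset.get(dataset_project="examples", dataset_name="images")  # placeholder names

def fetch(part):
    # assumption: this clearml version supports partial downloads via part/num_parts
    return ds.get_local_copy(part=part, num_parts=NUM_PARTS)

def train_on(path):  # placeholder for the actual GPU training step
    print("training on", path)

with ThreadPoolExecutor(max_workers=1) as pool:
    pending = pool.submit(fetch, 0)
    for part in range(NUM_PARTS):
        chunk = pending.result()                     # wait for the current chunk
        if part + 1 < NUM_PARTS:
            pending = pool.submit(fetch, part + 1)   # CPU prefetches the next one
        train_on(chunk)                              # GPU trains while prefetch runs
```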
So the clearml-agent daemon needs higher privileges?
ok thanks.
Yup, in this case it wasn't root. Removing that USER directive and the -u in pip solves the problem. However, in our production images, we are required to remove root access.
```
FROM nvidia/cuda:10.1-cudnn7-devel
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && apt-get install -y \
    python3-opencv ca-certificates python3-dev git wget sudo ninja-build
RUN ln -sv /usr/bin/python3 /usr/bin/python

# create a non-root user
ARG USER_ID=1000
RUN useradd -m --no-log-init --system --uid ${USER_ID} a...
```
Got that, thanks. Just to better understand: when clearml-data uploads my recursive folder of image data, it converts it into a compressed form with a different folder structure than the original dataset.
When my software pulls the data, I'm returned a str. How would we manipulate the data from there?
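For what it's worth, that str is the local path of the extracted copy, so ordinary filesystem tools apply from there; a small sketch (the project/dataset names and the *.jpg pattern are placeholders):

```python
from pathlib import Path

from clearml import Dataset

ds = Dataset.get(dataset_project="examples", dataset_name="images")  # placeholder names
local_root = Path(ds.get_local_copy())  # the returned str is just a directory path

# from here it is an ordinary directory tree
image_files = sorted(local_root.rglob("*.jpg"))  # placeholder file pattern
print(len(image_files), "images under", local_root)
```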
Hi, is this currently not working? http://app.community.clear.ml ? I noticed that the ClearML UI will cache in the browser, and if the backend is not running, it's not clear to the user that something is wrong (except for broken pages).
I also think it makes sense that when you do certain definitive CI actions like publish, it would support running some custom scripts.
I'm using this feature. In this case I would create 2 agents, one with a cpu-only queue and the other with a gpu queue, and then decide at the code level which queue to send to.
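One way to express that code-level decision, as a sketch: Task.execute_remotely() hands the current script over to an agent queue, while the queue names and the needs_gpu condition here are my placeholders:

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="train")  # placeholder names

needs_gpu = True  # placeholder condition for choosing a queue

# stop local execution and enqueue this task for the matching agent
task.execute_remotely(queue_name="gpu" if needs_gpu else "cpu", exit_process=True)
```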
Thanks. Gonna try that out. But I hit another snag: strangely, the agent is not creating the right venv. This is what the agent created.
```
pip:
- asn1crypto==0.24.0
- attrs==20.3.0
- certifi==2020.12.5
- chardet==4.0.0
- cryptography==2.1.4
- Cython==0.29.22
- furl==2.1.0
- future==0.18.2
- humanfriendly==9.1
- idna==2.6
- importlib-metadata==3.7.0
- jsonschema==3.2.0
- keyring==10.6.0
- keyrings.alt==3.0
- orderedmultidict==1.0.1
- pathlib2==2.3.5
- psutil==5.8.0
- pycrypto==2.6.1
- pygobject...
```
For example, it would be useful to integrate https://github.com/whylabs/whylogs#features into ClearML as part of data and model monitoring. WhyLogs would have its own static page that would preferably be displayed as a new custom tab (besides logs, scalars and plots).
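In the meantime, a rough sketch of attaching a whylogs profile to an experiment as a plain artifact; this assumes the whylogs 1.x API and uses placeholder project/file names, and the custom-tab rendering itself would still need ClearML-side support:

```python
import pandas as pd
import whylogs as why  # assumption: whylogs >= 1.0 API
from clearml import Task

task = Task.init(project_name="examples", task_name="data-profile")  # placeholder names

df = pd.read_csv("train.csv")  # placeholder input data
profile_view = why.log(df).view()

# attach the profile summary to the experiment as a regular artifact;
# it shows up under Artifacts rather than in a dedicated tab
task.upload_artifact(name="whylogs_profile", artifact_object=profile_view.to_pandas())
```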
It's hard to tell, but the agent change was a significant one. Unless Python versions have something to do with it.
Sorry, I don't quite understand this. The task itself was submitted when I ran the code on the client. I suppose the dependency requirements would be copied over as the experiment is cloned?
Sorry, take that back. Just realised that this argument only works when running the agent; when you enqueue a task into this agent, the argument is not passed on to the container that the agent spawns.
This is the same issue for the docker image. It reverts back to nvidia/cuda:10.1-runtime-ubuntu18.04 despite me setting something else.
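A possible workaround sketch: pin the image on the task itself rather than relying on the agent default. Task.set_base_docker() is an existing ClearML API, but whether it overrides your particular glue setup is an assumption worth testing; the project/task names are placeholders:

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="train")  # placeholder names

# explicitly pin the container image the agent should use for this task,
# instead of falling back to the agent's default image
task.set_base_docker("nvidia/cuda:10.1-cudnn7-devel")
```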