This is the installation for a locally used package in the task FYI, so it's imported from the training script.
Seems like this was a hidden SSH key error that wasn't being revealed; it was using a cached repo rather than cloning the remote repo.
AgitatedDove14 Yeah, I added it into the initial bash script to test whether that would fix the issue. The task is created using the SDK in the model training script, i.e. `Task.init()`. I was under the impression the local package would be installed due to replication of the environment I initialised the task under; however, I've tried the `add_requirements("leap")` function and just seem to be getting an `IsADirectoryError`. I also tried manually adding `leap==0.4.1` in the task...
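For anyone hitting the same thing, here's a minimal sketch of the call order that `Task.add_requirements()` expects — it must run *before* `Task.init()` so the requirement lands in the task's recorded package list. The package name and version (`leap`, `0.4.1`) come from this thread; the import guard is only there so the snippet loads cleanly where clearml isn't installed:

```python
# Sketch only, not the thread author's exact code.
try:
    from clearml import Task

    # Record the pinned requirement; this must happen BEFORE Task.init()
    # is called in the training script so the agent installs it remotely.
    # Pinning name+version (rather than passing a directory path) is one
    # plausible way to avoid an IsADirectoryError.
    Task.add_requirements("leap", package_version="0.4.1")
except ImportError:
    Task = None  # clearml not available in this environment
```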
AgitatedDove14 FYI, I do install build-essential manually in the logs I just sent you, and it still fails.
Installing build-essential at startup didn't work either, unfortunately.
AgitatedDove14 Unfortunately that didn't work either. I agree that should run the setup.py correctly, but something still seems to be breaking; I've sent you the most recent logs. I think it is to do with the build-essential issue. Let me talk you through the process:
1. Run a docker image locally called keras-hannd-cml (i.e. the one that is then used by the agent as the base image later on).
2. Run the training script to register the task, which works fine; all dependencies work, i.e. the C++ packages are working correctly on that container.
3. Execute the task on an agent running in docker mode with the same image that the task was registered with, i.e. keras-hannd-...
We have

```python
ext_modules=[
    Extension(
        'leap.learn.data_tools.file_io.extio',
        sources=['leap/learn/data_tools/file_io/extio.cpp'],
        depends=['leap/learn/data_tools/file_io/samples.h'],
        define_macros=[('NPY_NO_DEPRECATED_API', 'NPY_1_9_API_VERSION')],
        extra_compile_args=['-std=c++11'],
        libraries=['rt'] if platform.system() == 'Linux' else [],
        include_dirs=[GetNumpyIncludeDirectoryLazy()],
        optional=True
    ),
]
```

in our setup.py, which I belie...
AgitatedDove14 DM'd you the log file for the failed task. I have tried using a task startup script to install g++, gcc, etc. but it didn't seem to work; I'll try build-essential too. I'm also interested in the way that the environments are set up in ClearML. I read in the docs that the task looks for a requirements.txt file to construct the env, but does this prevent a local package being built correctly, i.e. through setup.py, when running a remote task?
Update on this: it seems to be an error in our code which isn't being appropriately raised, by the looks of things! I'll dig into it further, but for now this can be left. Thanks for replying!
and it's clearml version 1.7.2
AgitatedDove14 Sorry, what .so are you referring to here? I can't see that in the logs. The docker image installs the package by first installing requirements, i.e. `RUN pip install --no-cache-dir -r /tmp/requirements.txt`; the repo is copied locally, and then leap is installed through `RUN cd /opt/keras-hannd && pip install --no-deps .`
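Putting the steps just described together, the image build presumably looks something like this sketch (the base image, requirements path, and `/opt/keras-hannd` path are taken from the thread; everything else is an assumption):

```dockerfile
# Sketch of the build described above, not the actual Dockerfile.
FROM python:3.9-slim

# build-essential provides the gcc/g++ toolchain needed to compile the
# C++ extension declared in setup.py
RUN apt-get update && apt-get install -y --no-install-recommends build-essential

# install pinned dependencies first (path from the thread)
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt

# copy the repo and install the local package, running its setup.py
COPY . /opt/keras-hannd
RUN cd /opt/keras-hannd && pip install --no-deps .
```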
Solved this, but going to leave it up in case it's useful to anyone: just used the pod template in the values.yaml for the clearml-agent helm chart to mount the hostPath as a volume mount, i.e.:
```yaml
podTemplate:
  # -- volumes definition for pods spawned to consume ClearML Task (example in values.yaml comments)
  volumes:
    - name: x11-host-dir
      hostPath:
        path: /tmp/.X11-unix
  volumeMounts:
    - name: x11-host-dir
      mountPath: '/tmp/.X11-unix'
```
AgitatedDove14 The issue seems to be that the setup.py containing the `Extension` module we need isn't being run in the ClearML virtual environment within the docker container. What is the correct process for installing local packages so they're replicated correctly when running remotely on an agent?
Hmm, yeah, I have monitored some of the resource metrics and it didn't seem to be an issue. I'll attempt to install Prometheus/Grafana. This is a PoC, however, so I was hoping not to have to install too many tools.
The code running is basically this:
```python
if __name__ == "__main__":
    # initiate ClearML task
    task = Task.init(
        project_name="hannd-0.1",
        task_name="train-endtoend-0.2",
        auto_connect_streams={'stdout': True, 'stderr': True, 'logging': True}
    )
    tas...
```
```
error: could not write config file /root/.gitconfig: Device or resource busy
Using cached repository in "/root/.clearml/vcs-cache/{repo}.git.{commit}/{repo}.git"
```
I have noticed this, is there a reason it's using a cached repo here?
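A one-off workaround sketch, if the cached copy is the problem: delete the agent's VCS cache so the next run clones the remote repo fresh. The path comes from the log line above; clearml-agent also has a `agent.vcs_cache.enabled` setting in its config that can disable the cache entirely (check your clearml.conf before relying on that).

```shell
# Force a fresh clone on the next task run by removing the cached repos.
# Path taken from the "Using cached repository in ..." log line.
rm -rf ~/.clearml/vcs-cache
```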
CostlyOstrich36 I use `task.set_base_docker(docker_image="some_image")` to set the docker image for the task for future experiment runs; I don't think ClearML detects the image I'm running on locally when registering the task.
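For context, a minimal sketch of where that call sits — `set_base_docker()` is called on the task object after `Task.init()`, and as noted it must be set explicitly because the local container image isn't auto-detected. `"some_image"` is the placeholder from the message above, not a real image name:

```python
# Sketch only: pin the container an agent should use for remote runs.
def pin_base_docker(task):
    # "some_image" is a placeholder; use the image the task was
    # registered under, e.g. keras-hannd-cml from earlier in the thread.
    task.set_base_docker(docker_image="some_image")
```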
Yeah just checked this, the commit checks out on a different machine
SuccessfulKoala55 Agent ver is 1.4.1, clearml sdk 1.7.2
pushed to a branch