PompousBeetle71 is this an argparse argument or a connected dictionary?
I have to leave; I'll be back online in a couple of hours.
Meanwhile, check that the ports are correct (just curl all the ports and see if you get an answer). If everything is okay, try running the text example again.
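If you'd rather script the port check, here is a minimal sketch, assuming the default trains-server layout (8080 web, 8008 api, 8081 files) on localhost:
```python
# Minimal port probe; host and ports are assumptions (default trains-server layout).
import socket

for port in (8080, 8008, 8081):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(2)
        answering = s.connect_ex(("localhost", port)) == 0
        print(f"port {port}: {'answering' if answering else 'no answer'}")
```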
Hi WickedGoat98
"Failed uploading to //:8081/files_server:"
That seems like the problem. What do you have defined as files_server in trains.conf?
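For reference, the relevant section of trains.conf typically looks like this (the host and ports below are placeholders for a default local server):
```
api {
    web_server: http://localhost:8080
    api_server: http://localhost:8008
    files_server: http://localhost:8081
}
```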
Hi ColossalAnt7
Try Ctrl-F5 to hard-refresh the page?!
It seems you are missing a few buttons 😉
Assuming you are using docker-compose, the console output is a good start
You can query the system and get all the experiments based on date, then grab the machine GPU metrics.
DefeatedCrab47 check the cleanup service, it queries the system with the APIClient.
https://github.com/allegroai/trains/blob/10ec4d56fb4a1f933128b35d68c727189310aae8/examples/services/cleanup/cleanup_service.py#L72
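A rough sketch of that pattern, based on the linked cleanup service (the date filter is a placeholder):
```python
# Sketch only: list tasks whose status changed before a given date,
# paging through results the same way the cleanup service does.
from trains.backend_api.session.client import APIClient

client = APIClient()
page = 0
while True:
    tasks = client.tasks.get_all(
        status_changed=["<2020-12-01"],  # placeholder date filter
        order_by=["-last_update"],
        page_size=100,
        page=page,
    )
    if not tasks:
        break
    for task in tasks:
        # machine GPU metrics are typically reported as scalars
        # under the ':monitor:gpu' title on each task
        print(task.id, task.name, task.last_update)
    page += 1
```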
Hi DefeatedCrab47
You mean by trains-agent, or accumulated over all experiments?
Yes, that makes sense. I think what happened is that one of the processes completed the Task (i.e. closed it) before the others did, and so they threw an exception.
I switched to running each task in a separate process
I think that's probably the best (performance-wise as well), nice!
Hi GrotesqueOctopus42
Despite having reuse_last_task_id=True on Task.init, it always creates a new task ID. Has anyone ever had this issue?
So the way reuse_last_task_id=True works is: if there are no artifacts on the Task, it will be reused; but when running inside Jupyter the Task always has artifacts (the notebook itself), so it starts a new Task.
You can however pass a specific Task ID and it will reuse it: reuse_last_task_id="aabb11". Would that help?
Hmm, I'm sorry, it might be continue_last_task. Can you try:
```
Task.init(..., continue_last_task="aabb11")
```
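A minimal sketch of the full call ("aabb11" stands in for the Task ID from the thread; the project and task names are placeholders):
```python
from clearml import Task

task = Task.init(
    project_name="examples",          # placeholder
    task_name="notebook experiment",  # placeholder
    continue_last_task="aabb11",      # continue logging into this existing Task
)
```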
Hi GrotesqueOctopus42 ,
BTW: is it better to post the long error message in a reply, to avoid polluting the channel?
Yes, that is appreciated 🙂
Basically, post the logs in the thread of the initial message.
To fix this I had to spin up the agent using the --cpu-only flag (--docker --cpu-only)
Yes, if you do not specify --cpu-only, the agent will default to trying to access the GPUs.
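For example (the queue name here is just a placeholder):
```
clearml-agent daemon --queue default --docker --cpu-only
```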
Nice!
Hmm, can you run the agent in debug mode and check the specific console log?
```
clearml-agent --debug daemon --foreground ...
```
Did you set force_git_ssh_protocol: true?
https://github.com/allegroai/clearml-agent/blob/249b51a31bee97d63f41c6d5542e657962008b68/docs/clearml.conf#L39
Actually, it is better to leave it as is; it will just automatically mount the .ssh folder into the container. I will make sure the docs point to this option first.
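For reference, the option in the linked clearml.conf looks like this:
```
agent {
    # force git cloning over SSH regardless of the repository URL
    force_git_ssh_protocol: true
}
```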
WickedGoat98 is this related to plotly opening a web page when you call the show() method?
You can do:
```
if not Task.running_locally():
    fig.show()
```
Okay, I was able to reproduce it (this is odd) let me check ...
WickedGoat98 what's the clearml version you are using?
Hi GrotesqueOctopus42
In theory it can be built; the main hurdle is getting the ELK/Mongo/Redis containers for arm64 ...
Hi WickedGoat98
I'm trying to write an article on Medium about ClearML and I'm facing a problem with plotly figures.
This is awesome !
I ran the plotly_reporting.py example locally and the uploaded plot was ok.
So are you saying the same example code from the repository worked okay on your server, but showed nothing on the hosted server?
WickedGoat98 until the next RC release (should not take long), this will solve it:
```
df = pd.concat([tickerDf.Close, tickerDf_Change.Close_pcent], axis=1)
df = df[1:]
df.index = df.index.astype(str)
setattr(df, 'ticker', args.symbol)
```
Basically, removing the NaN and converting the datetime to a string representation (so plotly.js likes it).
Hey WickedGoat98
I found the bug: it is due to the fact that the numpy data (passed to plotly) contains both datetime and NaN values, and plotly.js does not like that. I'll make sure this is fixed; in the meantime you can just remove the first row (it contains the NaN):
```
df = pd.concat([tickerDf.Close, tickerDf_Change.Close_pcent], axis=1)
df = df[1:]
```
WickedGoat98 Nice!!!
BTW: The fix should solve both (i.e. no need to manually cast), I'll make sure the fix is on GitHub so you'll be able to verify 🙂
WickedGoat98 Same for me, let me ask the UI guys, I think this is a UI bug.
Also maybe before you post the article we could release a fix to both, what do you think?
EDIT:
Never mind 🙂 I just saw the Medium link, very cool!!!
WickedGoat98 this is awesome! Let me know how I could help 🙂
BTW: I checked regarding the plot comparison; this is a backend issue due to the size of the plot. I was told a fix will be deployed in a day or two.
WickedGoat98 give me a minute, I'm not sure it is not ClearML related
One last thing: make sure you spin up the pod container in privileged mode, because the trains-agent docker will spin up a sibling docker for your actual experiment.
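A sketch of what that could look like in the pod spec (the image name and the docker-socket mount are assumptions; the key part is privileged: true):
```
spec:
  containers:
    - name: trains-agent
      image: allegroai/trains-agent   # assumed image name
      securityContext:
        privileged: true              # lets the agent spin sibling dockers
      volumeMounts:
        - name: docker-sock
          mountPath: /var/run/docker.sock
  volumes:
    - name: docker-sock
      hostPath:
        path: /var/run/docker.sock
```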
WickedGoat98 Basically you have two options:
1. Build a docker image with wget installed, then in the UI specify this image as the "Base Docker Image".
2. Configure the trains.conf file on the machine running the trains-agent with the above script (see the sketch after this list). This will cause trains-agent to install wget on any container it runs, so it is available for you to use (saving you the trouble of building your own container).

With either of these two, by the time your code is executed, wget is installed and available.
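For option 2, the trains.conf piece could look like this (a sketch using the agent's extra_docker_shell_script option):
```
agent {
    # commands executed inside every container the agent spins up,
    # before the experiment itself starts
    extra_docker_shell_script: ["apt-get update", "apt-get install -y wget"]
}
```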
trains-agent should be deployed to GPU instances, not the trains-server.
The purpose of trains-agent is to let you send jobs to a GPU instance (at least in most cases).
The trains-server is the control plane: it basically tells the agent what to run (by storing the execution queue and tasks). Make sense?
WickedGoat98 sorry, I missed the thread...
"... that the trains.conf has to be located on the node running the trains-agent"
Correct 🙂
The easiest way to check is to see if you can curl to the ip:port from inside the docker.
If that fails, it is probably the wrong IP.
The IP you need to use is the IP of the machine running the docker-compose (not the IP of the docker inside that machine).
Make sense?